Nokogiri vs Goliath... or can they get along?

I have a project that has to parse literally hundreds of thousands of HTML and XML documents.

I thought it would be a great opportunity to learn Ruby fibers and the new Goliath system.

Obviously, though, Goliath breaks down if you use blocking libraries. The problem is that I don't know how to tell what is "thread-safe" (if that is even the right term in Goliath's case).

So my question is: can Nokogiri cause problems with Goliath, or with threads/fibers in general?

If so, is there anything safer to use than Nokogiri?

Thanks!

1 answer

Goliath is a web server framework, so I assume you plan to ingest these documents over HTTP? Each request runs inside its own Ruby fiber, but the server effectively runs in a single reactor (one thread).
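To make the fiber-per-request, single-reactor model concrete, here is a minimal sketch using only core Ruby's `Fiber`. The `run_reactor` method is a hypothetical toy round-robin scheduler written for illustration; it is not Goliath's or EventMachine's API. It only shows that fibers share one thread and switch only at the points where a fiber explicitly yields (in a real server, typically while waiting on I/O).

```ruby
# Toy single-threaded "reactor" (hypothetical, for illustration only):
# resume each fiber in turn and re-queue it while it is still alive.
def run_reactor(fibers)
  queue = fibers.dup
  until queue.empty?
    fiber = queue.shift
    fiber.resume
    queue << fiber if fiber.alive?
  end
end

log = []

# Two fake "requests", each doing two steps of work.
requests = %w[a b].map do |name|
  Fiber.new do
    2.times do |i|
      log << "#{name}#{i}"
      Fiber.yield # hand control back, as a fiber would while waiting on I/O
    end
  end
end

run_reactor(requests)
p log # => ["a0", "b0", "a1", "b1"]
```

The two "requests" interleave only because each one yields; nothing preempts a fiber that keeps running.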

So, to answer your question: as far as I know, Nokogiri is thread-safe, but that doesn't really matter here. What you need to watch out for is this: while a document is being parsed, the CPU is pinned and Goliath does not accept any new requests. So you will need to implement the right logic for your specific case. For example, you could do streaming parsing on chunks of data as they arrive from the socket, load-balance across several Goliath servers, or both... :-)
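A rough sketch of why a CPU-bound parse blocks everything else, and how chunked processing helps, again with core Ruby fibers only. `run_reactor` is the same hypothetical toy scheduler as before (not Goliath's API), and `parse_chunk` is a hypothetical stand-in for CPU-heavy parsing of one piece of a document; in a real program that chunk would presumably be fed to a streaming parser such as Nokogiri's SAX push parser, and the yield would come from the event loop rather than an explicit `Fiber.yield`.

```ruby
# Toy round-robin scheduler (hypothetical, not Goliath/EventMachine API).
def run_reactor(fibers)
  queue = fibers.dup
  until queue.empty?
    fiber = queue.shift
    fiber.resume
    queue << fiber if fiber.alive?
  end
end

# Hypothetical stand-in for CPU-heavy parsing: counts "<" characters.
def parse_chunk(chunk)
  chunk.count("<")
end

# Runs one "parsing" fiber next to one other "request" and returns the
# order in which work happened. With chunked: false the parser never
# yields, so the other request waits until the whole document is done.
def run(chunked:)
  log = []
  chunks = ["<a>", "<b>", "<c>"]

  parser = Fiber.new do
    tags = 0
    chunks.each do |chunk|
      tags += parse_chunk(chunk)
      log << "parsed #{chunk}"
      Fiber.yield if chunked # let other fibers run between chunks
    end
    log << "done: #{tags} tags"
  end

  other_request = Fiber.new { log << "other request served" }

  run_reactor([parser, other_request])
  log
end

p run(chunked: false)
# => ["parsed <a>", "parsed <b>", "parsed <c>", "done: 3 tags", "other request served"]
p run(chunked: true)
# => ["parsed <a>", "other request served", "parsed <b>", "parsed <c>", "done: 3 tags"]
```

Chunking does not make parsing faster; it only bounds how long any single request can hold the reactor. For sustained CPU-heavy load, running several server processes behind a load balancer is the complementary fix.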

Source: https://habr.com/ru/post/1347545/