Building a vertical search engine with Bixo

I came across the open source crawler Bixo. Has anyone tried it? Could you share your experience? Can a focused (directed) crawler be built with it reasonably easily, compared to Nutch / Heritrix? Thanks.

+3
1 answer

I used Bixo in production at a large social networking site (100M page views / day) to classify user content (basically anything a user created that contained a link).

It was a fairly complex workflow, using Cascading to:

  • dedupe URLs,
  • have Bixo fetch the contents of each page,
  • , - ..
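
To make the first two steps concrete, here is a minimal sketch in plain Python. It does not use the Bixo or Cascading APIs; the names `normalize_url`, `dedupe_urls` and `fetch_pages` are hypothetical, and the `fetch` callable is an assumed stand-in for whatever actually downloads a page (Bixo, in the setup described above).

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host and drop the fragment, so trivial variants compare equal.

    Assumption: this level of normalization is enough for the sketch; a real
    crawler would usually apply more rules (default ports, tracking params, ...).
    """
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def dedupe_urls(urls: Iterable[str]) -> List[str]:
    """Return normalized URLs with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def fetch_pages(urls: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch each distinct URL exactly once using the supplied (hypothetical) fetcher."""
    return {url: fetch(url) for url in dedupe_urls(urls)}
```

Passing the fetcher in as a parameter keeps the dedupe logic testable without network access; in a Cascading flow the same two steps would be separate pipe stages rather than function calls.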

Cascading, Bixo Cascading, , , URL- .

, , , , - . , , . Bixo, Cascading, .

. ( ) , ( "" URL-). , , Bixo , .

. 6-9 , , .

+8

Source: https://habr.com/ru/post/1755209/

