Nutch 2.1: injecting URLs hangs forever

I am trying to deploy Nutch 2.1 on Ubuntu 12.04, following this tutorial. Everything goes fine until I try to inject the URLs into the database. When I type $ bin/nutch inject urls and press Enter, I get

    InjectorJob: starting
    InjectorJob: urlDir: urls

and it stays there (for several hours) until I cancel the execution. urls is a directory that contains a file with URLs. I added the proxy host and port to the nutch-site.xml file as suggested here, but that did not solve it. I also tried Apache Nutch 2.2.1 and the problem persisted.

If you know how to fix this problem, please help me!

Thanks in advance.


On Ubuntu, your computer's hostname resolves to the IP address 127.0.1.1. HBase, however, needs it to resolve to 127.0.0.1.

By default, Ubuntu's /etc/hosts looks like this (myComputerName is the name of your computer):

    127.0.0.1   localhost
    127.0.1.1   myComputerName
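To check which address your hostname actually maps to, here is a minimal Python sketch (not part of the original answer; the function name lookup_hosts is hypothetical). It parses /etc/hosts-style text and reports the first IP listed for the machine's hostname, flagging the 127.0.1.1 case described above.

```python
import socket
from typing import Optional


def lookup_hosts(hosts_text: str, name: str) -> Optional[str]:
    """Return the first IP mapped to `name` in /etc/hosts-style text, or None."""
    for raw in hosts_text.splitlines():
        # Ignore comments and blank lines
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are names/aliases
        ip, *names = line.split()
        if name in names:
            return ip
    return None


if __name__ == "__main__":
    hostname = socket.gethostname()
    try:
        with open("/etc/hosts") as f:
            ip = lookup_hosts(f.read(), hostname)
    except OSError:
        ip = None  # /etc/hosts missing or unreadable
    print(f"{hostname} -> {ip}")
    if ip == "127.0.1.1":
        # The Ubuntu default that the answer identifies as the cause
        print("hostname maps to 127.0.1.1, not 127.0.0.1")
```

Run it on the Ubuntu machine where Nutch and HBase are installed; if it prints the 127.0.1.1 warning, the hosts file matches the default shown above.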

Open the hosts file with sudo gedit /etc/hosts and change it to:

    127.0.0.1   localhost
    127.0.0.1   myComputerName
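The same edit can be expressed as a small text transformation. This is a minimal Python sketch under the assumption that only the 127.0.1.1 line for your own hostname should change (the function name fix_hosts is hypothetical, not from the answer); every other line, including comments and spacing, is kept as-is.

```python
def fix_hosts(hosts_text: str, hostname: str) -> str:
    """Remap the 127.0.1.1 entry for `hostname` to 127.0.0.1, keeping other lines."""
    out = []
    for line in hosts_text.splitlines(keepends=True):
        # Fields of the line with any trailing comment removed
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "127.0.1.1" and hostname in fields[1:]:
            # Replace only the leading address; spacing and aliases are preserved
            line = line.replace("127.0.1.1", "127.0.0.1", 1)
        out.append(line)
    return "".join(out)
```

The function only returns the new text; review it and then save it to /etc/hosts as root (for example through sudo gedit /etc/hosts, as above).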

Then restart Ubuntu. Nutch should now be able to inject the URLs into HBase.


Source: https://habr.com/ru/post/1536362/

