Connect MySQL to Apache Nutch

I am using Apache Nutch for the first time. How can I store data in a MySQL database after a scan? I want to be able to easily use data in other web applications.

I found a related question, but I don't understand which part of the code should be replaced with the MySQL connector. Please help with a short code example.

+3
3 answers

Get the source from http://mirror.nyi.net/apache//nutch/apache-nutch-1.2-src.zip

Open the org.apache.nutch.crawl.Crawl class in an editor.

Search for the variable Path crawlDb = new Path(dir + "/crawldb");

The places where this variable is used hint at where to replace the code with your own class, such as CustomMySQLCrawl.

crawlDbTool.update(crawlDb, segs, true, true); // update crawldb
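A minimal sketch of what the MySQL side of such a custom class could look like, using plain JDBC. Everything named here is an assumption for illustration, not part of Nutch: the class PageStore, the table pages with columns url and html, and url being a primary or unique key so that REPLACE overwrites the row on a re-crawl.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class PageStore {
    // Hypothetical schema:
    //   CREATE TABLE pages (url VARCHAR(255) PRIMARY KEY, html LONGTEXT)
    // REPLACE is MySQL-specific: it overwrites the row that has the same key.
    static final String SAVE_SQL = "REPLACE INTO pages (url, html) VALUES (?, ?)";

    // Writes one fetched page using a caller-supplied JDBC connection.
    static void save(Connection conn, String url, String html) throws SQLException {
        // try-with-resources closes the statement even if the update fails
        try (PreparedStatement ps = conn.prepareStatement(SAVE_SQL)) {
            ps.setString(1, url);
            ps.setString(2, html);
            ps.executeUpdate();
        }
    }
}
```

The connection itself would typically come from DriverManager.getConnection with a jdbc:mysql:// URL, which requires the MySQL Connector/J driver on the classpath. Host, database name and credentials depend on your setup.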

+3

: Lucene, Nutch, ( , Nutch 2.0) .

Lucene , . , , Nutch.

+1

Nutch, -readseg . , html . .

Nutch Eclipse, Fetcher.

pstatus = output(fit.url, fit.datum, content, status, CrawlDatum.STATUS_FETCH_SUCCESS);
updateStatus(content.getContent().length);

In Fetcher, the fetched html is available through:

content.getContent();

html , String . : Nutch UTF-8 Nutch. , , Eclipse. , , "charset" :

String yourContent = new String(content.getContent(), encodingYouFound);
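A minimal sketch of how encodingYouFound could be obtained, assuming the page declares its encoding with a charset= attribute near the top of the markup. HtmlDecoder and its methods are hypothetical names, not part of Nutch. When no usable charset is found, the sketch falls back to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlDecoder {
    // Matches charset=NAME, with or without quotes, as in <meta charset="...">
    // or <meta http-equiv="Content-Type" content="text/html; charset=...">.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the declared charset, or UTF-8 if none is declared or it is unsupported.
    static Charset detectCharset(byte[] raw) {
        // ISO-8859-1 maps every byte to one char, so the ASCII markup stays readable
        // whatever the real encoding is. Only the first 4096 bytes are scanned.
        int n = Math.min(raw.length, 4096);
        String head = new String(raw, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the raw page bytes (for example, the result of content.getContent()).
    static String decode(byte[] raw) {
        return new String(raw, detectCharset(raw));
    }
}
```

With this helper, the line above would become String yourContent = HtmlDecoder.decode(content.getContent());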

"encoding" String, , "". , charset, , ​​ UTF-8.

+1

Source: https://habr.com/ru/post/1785062/

