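Nutch's -readseg option can dump segment data, but that is not a convenient way to get each page's HTML as a string.
If you run Nutch in Eclipse, open the Fetcher class and look for these lines: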
pstatus = output(fit.url, fit.datum, content, status, CrawlDatum.STATUS_FETCH_SUCCESS);
updateStatus(content.getContent().length);
Both lines are in Fetcher. The HTML itself comes from:
content.getContent();
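This returns the HTML as bytes, so it has to be converted to a String. One caveat: do not assume the bytes Nutch hands you are UTF-8; stepping through in Eclipse shows what you actually get. Once you have found the page's encoding, for example from its "charset" value, build the string with it: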
String yourContent = new String(content.getContent(), encodingYouFound);
The encoding is passed as a String and must not be "". If you cannot find a charset, fall back to UTF-8.
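The charset lookup and UTF-8 fallback described above could be sketched roughly as follows. This is only an illustration, not Nutch code: HtmlDecoder, charsetFrom and decode are hypothetical names, and the sketch assumes the charset appears in a Content-Type style string such as "text/html; charset=ISO-8859-1". In Fetcher you would pass content.getContent() as the byte array.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it is not part of Nutch.
class HtmlDecoder {
    // Matches charset=... in a value such as "text/html; charset=ISO-8859-1".
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    // Returns the declared charset, or UTF-8 when none is declared or it is unusable.
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: fall through to the default.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the raw page bytes using the charset found above.
    static String decode(byte[] raw, String contentType) {
        return new String(raw, charsetFrom(contentType));
    }

    public static void main(String[] args) {
        // Stand-in for content.getContent(): a page encoded as ISO-8859-1.
        byte[] raw = "<html>caf\u00e9</html>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(raw, "text/html; charset=ISO-8859-1"));
    }
}
```

Passing a Charset object instead of an encoding name avoids the checked UnsupportedEncodingException that new String(byte[], String) declares, and an unrecognised name simply falls back to UTF-8.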