Libraries for the following natural language processing tasks?

- Named entity extraction (extract people, cities, organizations)
- Content tagging (extract topic tags by scanning the document)
- Structured data extraction
- Topic categorization (taxonomy classification by scanning the document, e.g. Bayesian)
- Text extraction (HTML page cleaning)

Are there any libraries that I can use to perform any of the above NLP functions?

I don't really feel like paying money for AlchemyAPI.


There are, in fact, many freely available open source natural language processing toolkits. Here is a short list, organized by the language each toolkit is implemented in:

If you don't know which one to go with, I would recommend starting with NLTK. The package is quite easy to use and has excellent online documentation, including a free book.

You should be able to use NLTK to easily perform the NLP tasks you listed, such as named entity recognition (NER), extracting tags for documents, and document categorization.
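The question mentions Bayesian topic categorization. NLTK ships a `NaiveBayesClassifier` for this; the sketch below instead implements the same naive Bayes idea (here the multinomial, bag-of-words variant with add-one smoothing) in plain Python, so it runs without installing NLTK or downloading any data. The class name `NaiveBayesCategorizer`, the regex tokenizer, and the toy training set are illustrative assumptions, not NLTK's API.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, documents, labels):
        self.doc_counts = Counter(labels)         # number of documents per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, document):
        tokens = tokenize(document)
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # log P(category) + sum over tokens of log P(token | category)
            score = math.log(n_docs / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (illustrative only).
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
]
labels = ["sports", "sports", "finance", "finance"]
clf = NaiveBayesCategorizer().fit(docs, labels)
print(clf.predict("a goal in the final match"))  # sports
```

Working in log space avoids underflow when multiplying many small word probabilities; a real taxonomy classifier would need far more training data and usually better features than raw word counts.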

What the Alchemy people call structured data extraction looks like it is just HTML scraping that is robust to changes in the underlying HTML, as long as the page still renders the same visually. So it's not really an NLP task.

To extract text from HTML, just use boilerpipe. It's fast, good, and free.
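boilerpipe itself is a Java library. As a rough illustration of the kind of heuristic it is built on (classifying text blocks as content or boilerplate using shallow features such as word count and link density), here is a minimal sketch using only Python's standard `html.parser`. The tag sets and the `min_words` / `max_link_density` thresholds are illustrative assumptions; this is much cruder than boilerpipe's extractors and is not its API.

```python
from html.parser import HTMLParser

# Assumed tag sets: content inside SKIP_TAGS is ignored, BLOCK_TAGS end a text block.
SKIP_TAGS = {"script", "style", "noscript", "head"}
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "td", "tr", "table", "section",
              "article", "header", "footer", "nav", "aside", "h1", "h2",
              "h3", "h4", "h5", "h6", "br", "body", "html"}


class TextBlockParser(HTMLParser):
    """Split a page into text blocks, counting how many words sit inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (text, word_count, linked_word_count)
        self._words = []
        self._linked = 0
        self._skip_depth = 0   # > 0 while inside <script>, <style>, ...
        self._link_depth = 0   # > 0 while inside <a>

    def _flush(self):
        if self._words:
            self.blocks.append((" ".join(self._words), len(self._words), self._linked))
        self._words, self._linked = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return
        words = data.split()
        self._words.extend(words)
        if self._link_depth:
            self._linked += len(words)

    def close(self):
        super().close()
        self._flush()  # emit any trailing block


def extract_main_text(html, min_words=5, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    kept = [text for text, n_words, n_linked in parser.blocks
            if n_words >= min_words and n_linked / n_words <= max_link_density]
    return "\n".join(kept)


page = """
<html><head><title>Demo</title><script>var x = 1;</script></head>
<body>
  <nav><a href="/">Home</a> <a href="/news">News</a> <a href="/about">About us</a></nav>
  <p>Boilerplate removal keeps the main article text and drops menus and footers.</p>
  <footer>Copyright Example Corp <a href="/terms">Terms</a></footer>
</body></html>
"""
print(extract_main_text(page))
```

Here the navigation block is all link text and the footer is too short, so only the paragraph survives.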


The Apache UIMA project, originally created by IBM, provides an NLP framework very similar to GATE. A variety of annotators have been built for UIMA.
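In UIMA, annotators run in a pipeline over a shared analysis structure (the CAS) that holds the document text plus typed annotations marked by begin/end character offsets, so later annotators can build on what earlier ones found. UIMA itself is Java; the following is only a hypothetical Python sketch of that idea. `Annotation`, `AnalysisStructure`, and the two annotator functions are illustrative names, not UIMA's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str    # annotation type, e.g. "Token" or "Capitalized"
    begin: int   # start character offset in the document text
    end: int     # end offset (exclusive)


@dataclass
class AnalysisStructure:
    """Hypothetical shared container, loosely modeled on UIMA's CAS."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


def token_annotator(cas):
    """First stage: mark every run of word characters as a Token."""
    for m in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation("Token", m.start(), m.end()))


def capitalized_annotator(cas):
    """Second stage: reads Token annotations and marks the capitalized ones."""
    for tok in cas.select("Token"):
        if cas.covered_text(tok)[0].isupper():
            cas.annotations.append(Annotation("Capitalized", tok.begin, tok.end))


def run_pipeline(text, annotators):
    cas = AnalysisStructure(text)
    for annotate in annotators:
        annotate(cas)  # each annotator adds to the same shared structure
    return cas


cas = run_pipeline("Alice visited Paris with Bob.", [token_annotator, capitalized_annotator])
print([cas.covered_text(a) for a in cas.select("Capitalized")])  # ['Alice', 'Paris', 'Bob']
```

Because annotations are stored as offsets rather than copies of the text, independent annotators can be added or swapped without changing the ones before them.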


Source: https://habr.com/ru/post/1307505/

