Term extraction for non-English languages

I am looking for an open source project that performs term extraction in several languages.

I already found the Yahoo BOSS Term Extraction Web Service, and it is good. However, it does not handle languages other than English.

Do you know of any open source term extraction project that supports more languages?

Thanks!

+4
3 answers

Of the packages that I have used in production or just played with, the most comprehensive and most actively supported were:

  • GATE (General Architecture for Text Engineering) - a framework for a wide range of natural language processing tasks, available under the GNU LGPL

  • LingPipe (Java) - a set of Java libraries for linguistic analysis of human language, which can link entity mentions to database entries, discover relationships, cluster documents, ...

  • OpenNLP (Java) - a machine-learning-based Java toolkit for natural language processing (NLP). It supports the most common NLP tasks.

  • NLTK (Python) - a leading platform for building Python programs that work with human language data.

  • Proxem Antelope (.NET) - Advanced Natural Language Object-oriented Processing Environment

  • ScalaNLP (Scala)

  • Stanford NLP (Java)

In addition, there are some good web APIs such as:

+2

GATE - General Architecture for Text Engineering: http://gate.ac.uk/

It does term extraction, keyword ranking and extraction, sentiment analysis, all that good stuff.

It is open source, free, and comes from the UK. It supports a number of languages, including Arabic.
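
To make concrete what "term extraction with keyword ranking" means, here is a minimal, language-agnostic sketch in plain Python. It does not use GATE or any of its APIs; the function name `rank_terms` and the tiny stopword sets are made up for illustration, and real tools use much richer linguistic processing.

```python
import re
from collections import Counter

# Tiny illustrative stopword sets (an assumption); real lists are far longer.
STOPWORDS = {
    "en": {"the", "a", "of", "and", "is", "in", "to"},
    "de": {"der", "die", "das", "und", "ist", "in", "zu", "von"},
}

def rank_terms(text, lang, top_n=5):
    """Return the top_n most frequent non-stopword tokens with their counts."""
    # \w is Unicode-aware in Python 3, so accented and non-Latin letters work.
    tokens = re.findall(r"\w+", text.lower())
    stop = STOPWORDS.get(lang, set())
    # Drop stopwords and very short tokens, then count what is left.
    counts = Counter(t for t in tokens if t not in stop and len(t) > 2)
    return counts.most_common(top_n)

print(rank_terms("Die Katze und die Maus. Die Katze ist in der Küche.", "de", 3))
```

Supporting another language in this sketch only means supplying another stopword set, which is roughly why frequency-based approaches are a common baseline for multilingual term extraction.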

0

You can try Linnaeus. It was designed in part to extract species names from scientific articles, but I think you can give it your own dictionaries and use it for other domains and tasks.
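
As a rough idea of what dictionary-based matching looks like, here is a small sketch in plain Python. It is not Linnaeus's actual API; `find_dictionary_terms` is a hypothetical helper that shows the general technique of looking up your own term list in a text.

```python
import re

def find_dictionary_terms(text, dictionary):
    """Return (start, end, matched_text) for each dictionary entry found."""
    if not dictionary:
        return []
    # Try longer entries first so "Homo sapiens" wins over "Homo".
    entries = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        # Lookarounds keep matches on word boundaries, also for non-ASCII text.
        r"(?<!\w)(?:" + "|".join(re.escape(e) for e in entries) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

terms = {"Homo", "Homo sapiens", "Escherichia coli"}
for start, end, term in find_dictionary_terms(
    "Escherichia coli und Homo sapiens", terms
):
    print(start, end, term)
```

Because the matching only depends on the dictionary you pass in, the same approach works for any language or domain you have a term list for.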

0

Source: https://habr.com/ru/post/1333775/

