Unsupervised Named Entity Recognition (NER) with a custom controlled vocabulary for cross-reference suggestions in Java

I am looking for a Java library that can perform Named Entity Recognition (NER) using a custom controlled vocabulary, without requiring labeled training data first. I searched Stack Exchange for this, but most of the questions are fairly non-specific.

Consider the following use case:

  • the editor enters articles into the CMS (about 500 words).
  • the text may contain references (in plain text) to entities of a specific domain, e.g.:
    • names of points of interest, such as bars and restaurants, as well as neighborhoods, etc.
  • there is a controlled vocabulary of these entities (about 5,000 entities).
    • I assume that the entity should be “top” in the dictionary
  • after finishing the text, the editor should be able to save the document.
  • Saving triggers a workflow that scans the text against the vocabulary, comparing fragments of the text with the entity names. It does not require a 100% match: 97% on Jaro-Winkler or some other measure would do (I am not familiar with what NER algorithms use). I need this threshold to be configurable.
  • Hits are returned to the server-side controller, which returns JSON to the client containing the entities; these are presented to the editor as suggested cross-references.
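
To make the matching step above concrete, here is a minimal sketch in plain Java with no NER library. It is only an illustration under stated assumptions, not what any particular library does: the names `DictionaryMatcher`, `Hit` and `suggest` are hypothetical, text is tokenized on whitespace, comparison is case-insensitive, and each entity is compared against word windows with the same number of words as the entity name. The Winkler prefix scale of 0.1 and prefix cap of 4 are the commonly used defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: brute-force fuzzy lookup of vocabulary entries in a text.
final class DictionaryMatcher {

    /** One suggested cross-reference: the text span, the entity it resembles, the score. */
    static final class Hit {
        public final String span;
        public final String entity;
        public final double score;
        Hit(String span, String entity, double score) {
            this.span = span; this.entity = entity; this.score = score;
        }
    }

    /** Jaro similarity in [0, 1]. */
    public static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int la = a.length(), lb = b.length();
        if (la == 0 || lb == 0) return 0.0;
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] ma = new boolean[la], mb = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window), hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!mb[j] && a.charAt(i) == b.charAt(j)) {
                    ma[i] = true; mb[j] = true; matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int k = 0, outOfOrder = 0;
        for (int i = 0; i < la; i++) {
            if (!ma[i]) continue;
            while (!mb[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / la + m / lb + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    /** Jaro-Winkler: Jaro boosted for a shared prefix of up to 4 characters. */
    public static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        int max = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    /** Returns every word window whose similarity to an entity name reaches the threshold. */
    public static List<Hit> suggest(String text, List<String> vocabulary, double threshold) {
        String[] tokens = text.trim().split("\\s+");
        List<Hit> hits = new ArrayList<>();
        for (String entity : vocabulary) {
            int n = entity.trim().split("\\s+").length;
            String target = entity.toLowerCase();
            for (int i = 0; i + n <= tokens.length; i++) {
                String span = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                // Strip leading/trailing punctuation such as commas and full stops.
                String cleaned = span.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
                double score = jaroWinkler(cleaned.toLowerCase(), target);
                if (score >= threshold) hits.add(new Hit(cleaned, entity, score));
            }
        }
        return hits;
    }
}
```

Calling `DictionaryMatcher.suggest(articleText, vocabulary, 0.97)` would return the candidate hits, which the controller could then serialize to JSON with whatever library it already uses. With about 5,000 entities and 500 words this brute-force scan is a few million string comparisons per save, which is probably acceptable; a library with a trie-based dictionary would scale better.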

Ideally, I am looking for an existing project that uses NER to suggest cross-links in a CMS environment and that I could build on. (I’m sure plugins like this exist for WordPress, for example.) I’m not so sure that something similar exists in Java.

Any other, more general pointers to NER libraries that work with custom controlled vocabularies are also welcome.

+6
2 answers

For people who find this in the future:

See "Approximate Dictionary-Based Chunking" in the LingPipe named entity tutorial: http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html

(URL edited.)

+3

Not sure if these will be useful: http://www-nlp.stanford.edu/software/CRF-NER.shtml and http://cogcomp.cs.illinois.edu/page/software

+1

Source: https://habr.com/ru/post/898678/

