Named entity recognition from a personal gazetteer using Python

I am trying to do named entity recognition in Python using NLTK. I want to extract skills from text: I have a list of skills and I want to find them in a job requisition and tag them. I noticed that NLTK has an NER tagger for standard entity types like Person, Location, etc. Is there an external gazetteer tagger in Python that I can use? Any idea how to make this more sophisticated than a plain search for terms (some of which are multi-word terms)?

Thanks, Assaf

+4
2 answers

I have not used NLTK recently, but if you already have a list of words that you know are skills, you don't need NER, just a text search.

You could use Lucene or another search library to find the terms and then annotate them. That is a lot of work, but it may be worthwhile if you are working with a lot of data. Alternatively, you can hack together a regular expression search, which will be slower but probably works fine for smaller amounts of data and is much easier to implement.
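A minimal sketch of the regular expression route, using only the standard library. The skill list, the sample sentence, and the helper names `build_skill_pattern` and `find_skills` are made up for illustration. Longer terms are tried first so that a multi-word skill is not cut short by a shorter one, and lookarounds are used instead of `\b` so that terms ending in punctuation such as "C++" still match.

```python
import re

def build_skill_pattern(skills):
    # Try longer terms first so a multi-word skill wins over a shorter prefix.
    ordered = sorted(set(skills), key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    # (?<!\w) and (?!\w) behave like word boundaries but also work for "C++".
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)

def find_skills(text, skills):
    # Return (start, end, matched_text) for every skill mention in the text.
    pattern = build_skill_pattern(skills)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

# Hypothetical gazetteer and requisition text.
skills = ["Python", "machine learning", "C++", "SQL"]
text = "We need a Python developer with machine learning and C++ experience."
matches = find_skills(text, skills)
for start, end, skill in matches:
    print(start, end, skill)
```

The character offsets are what you would use to annotate the original text. For a very large skill list, a search library or a trie-based matcher would likely scale better than one big alternation.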

+1

Take a look at RegexpTagger and then RegexpParser. I think that is exactly what you are looking for.

You can create your own POS-style tags, i.e. map skill words to a custom tag, and then easily define a chunk grammar over those tags.
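Roughly, that could look like the sketch below. The tag names (`SKILL`, `SKILL_MOD`, `SKILL_HEAD`, `O`), the chunk label `SKILL_CHUNK`, the word lists, and the sample sentence are all made up for illustration. `RegexpTagger` assigns a tag to each token from the first pattern that matches it, and `RegexpParser` then groups tagged tokens into chunks, which is how multi-word skills are picked up. The tokenizer here is a plain regex, so no NLTK data downloads are assumed.

```python
import re
from nltk import RegexpParser, RegexpTagger

# Token-level patterns: the first match wins, the last pattern is a catch-all.
patterns = [
    (r"(?i)^(python|sql)$", "SKILL"),          # single-word skills
    (r"(?i)^(machine|deep)$", "SKILL_MOD"),    # first word of a two-word skill
    (r"(?i)^learning$", "SKILL_HEAD"),         # second word of a two-word skill
    (r".*", "O"),                              # everything else
]
tagger = RegexpTagger(patterns)

# Chunk grammar over the custom tags: two rules for the same chunk label.
grammar = r"""
SKILL_CHUNK: {<SKILL_MOD><SKILL_HEAD>}
             {<SKILL>}
"""
parser = RegexpParser(grammar)

# Hypothetical requisition text, tokenized with a simple regex.
text = "Looking for a Python developer with machine learning and SQL experience"
tokens = re.findall(r"\w+", text)

tagged = tagger.tag(tokens)
tree = parser.parse(tagged)

# Collect the text of every SKILL_CHUNK subtree.
found = [
    " ".join(token for token, _tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "SKILL_CHUNK")
]
print(found)
```

With a long skill list you would probably generate the patterns from the list rather than write them by hand, and the grammar lets you add rules that a plain term search cannot express, such as requiring a modifier before a head word.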

There is some sample code for the tagger in this PDF.

+1

Source: https://habr.com/ru/post/1340878/

