Is there a way to tell NLTK that a particular word is not a proper noun, but a noun?

I do some NLP where I find out when patients were diagnosed with multiple sclerosis.

I would like to use NLTK to tell me that the noun in the sentence is multiple sclerosis. The problem is that doctors often refer to multiple sclerosis as MS, which NLTK tags as a proper noun.

For example, the sentence "His MS was diagnosed in 1999." is tagged as:

[('His', 'PRP$'), ('MS', 'NNP'), ('was', 'VBD'), ('diagnosed', 'VBN'), ('in', 'IN'), ('1999', 'CD'), ('.', '.')]

MS should be a noun here. Any suggestions?

1 answer

To summarize, you have the following options:

  • Fix the tag in post-processing - a little ugly, but quick and easy.
  • Use an external named entity recognizer (Stanford NER, as @Bob Dylan thoughtfully suggested) - this is more involved, since Stanford NER is written in Java and is not particularly fast.
  • Retrain the POS tagger on domain-specific data (do you have a large enough annotated dataset for this?).
  • Use a WSD (Word Sense Disambiguation) approach - for this you first need a good dictionary for the domain.
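
The first option, fixing the tag in post-processing, could look roughly like the sketch below. The names `DOMAIN_TAG_OVERRIDES` and `fix_domain_tags` are hypothetical and not part of NLTK; the input is simply the list of (token, tag) pairs quoted in the question, so the snippet runs without NLTK installed. It assumes an exact, case-sensitive match on the token is good enough for your data.

```python
# Hypothetical override table (not an NLTK feature): token -> tag to force.
# Matching is exact and case-sensitive, so "Ms" or "ms" are left untouched.
DOMAIN_TAG_OVERRIDES = {"MS": "NN"}


def fix_domain_tags(tagged_tokens, overrides=DOMAIN_TAG_OVERRIDES):
    """Return a new list of (token, tag) pairs, replacing the tag of any
    token that appears in the override table."""
    return [(token, overrides.get(token, tag)) for token, tag in tagged_tokens]


# Tagger output exactly as quoted in the question.
tagged = [('His', 'PRP$'), ('MS', 'NNP'), ('was', 'VBD'), ('diagnosed', 'VBN'),
          ('in', 'IN'), ('1999', 'CD'), ('.', '.')]

fixed = fix_domain_tags(tagged)
print(fixed)
```

In a real pipeline you would pass the tagger's output straight into `fix_domain_tags`. Note that a blanket override like this also retags any "MS" that really is a proper noun or stands for something else, so it only makes sense when the abbreviation is unambiguous in your corpus.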

Source: https://habr.com/ru/post/1241080/

