Extracting nouns, noun phrases, adjectives, and verbs from a corpus of text files using Visual C#

I am working on a project in which I have to extract nouns, noun phrases, adjectives, and verbs from text files (.doc). I have a corpus of about 75 of these files. I searched the web and came across POS tagging in Python using NLTK. Since my project is in C# (using Visual Studio 2008), I need C# code for this. I tried the WordNet API and even SharpNLP, but since I'm new to this, I found them hard to integrate with my project. Can anyone suggest simpler code to do this, using something like a dictionary? Please help. Thanks.

+3
3 answers

I worked in NLP (Natural Language Processing) at an industry leader for a while, and what you want to do is not a trivial task. I know one of the creators of NLTK, and I have used it myself; it's a high-quality open source tool, and I would recommend using it (do you have a particularly good reason to use C#?).

POS tagging is usually implemented by training a language model on manually annotated data, then applying that model to new text to predict the part of speech of each word, along with a confidence for each prediction. NLTK has tools that do this, and it also ships with some pre-trained models (if I'm not mistaken).
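To make the train-then-apply idea concrete, here is a minimal sketch in plain Python. It is not NLTK's API and not what a production tagger does: it is a toy "most frequent tag per word" baseline, and the training sentences, the tag names (`DET`, `ADJ`, `NOUN`, `VERB`), and the function names `train`, `tag`, and `extract` are all made up for illustration. The confidence here is simply the share of training occurrences that carried the winning tag.

```python
from collections import Counter, defaultdict

# Tiny hand-annotated corpus, invented for this example: (word, tag) pairs.
TRAINING_DATA = [
    [("the", "DET"), ("quick", "ADJ"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("lazy", "ADJ"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
]


def train(sentences):
    """Count how often each (lowercased) word was annotated with each tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return counts


def tag(model, words, default_tag="NOUN"):
    """Return (word, predicted_tag, confidence) for each word.

    Unknown words get default_tag with confidence 0.0 (an arbitrary
    fallback chosen for this sketch).
    """
    tagged = []
    for word in words:
        tag_counts = model.get(word.lower())
        if not tag_counts:
            tagged.append((word, default_tag, 0.0))
            continue
        best_tag, best_count = tag_counts.most_common(1)[0]
        confidence = best_count / sum(tag_counts.values())
        tagged.append((word, best_tag, confidence))
    return tagged


def extract(tagged, wanted=("NOUN", "ADJ", "VERB")):
    """Group tagged words by the parts of speech we care about."""
    result = {pos: [] for pos in wanted}
    for word, pos, _confidence in tagged:
        if pos in result:
            result[pos].append(word)
    return result


model = train(TRAINING_DATA)
tagged = tag(model, "the lazy dog runs".split())
by_tag = extract(tagged)
```

A word like "run", which appears once as a verb and once as a noun in the toy data, comes back with a confidence of 0.5, which is exactly the kind of ambiguity that real taggers resolve by looking at the surrounding context rather than the word alone.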

, ++, Java Python. , -!

. Wikipedia, , , .

+3

, . , OpenNLP .NET- PoS. , , . , OpenNLP Tools Models 1.5 .

.NET OpenNLP

+2

Integrating SharpNLP with a C# Visual Studio project

In this article, I give a step-by-step guide to integrating SharpNLP with a C# project, with code snippets that address your specific problem: sentence splitting, tokenizing, and POS tagging.

Try it, and I can help with any problems you run into.

0

Source: https://habr.com/ru/post/1774511/

