How to remove verbs, prepositions, conjunctions, etc. from my text?

For most of my text, I want to keep only the nouns and delete the other parts of speech.

I don't think there is an automated way to do this. If there is, please suggest one.

If there is no automated way, I can do it manually, but for that I will need lists of all the relevant words: verbs, prepositions, conjunctions, adjectives, etc. Can someone suggest a source where I can get such lists?

+4
2 answers

You can do this automatically with NLTK, which includes a part-of-speech tagger. An example from the NLTK home page:

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]

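Once the text is tagged, keep only the tokens whose tag starts with N, i.e. the nouns. The tagger is not perfect, though; depending on your text you may also need to handle other tags, for example foreign words (FW).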
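A minimal sketch of that filtering step, assuming Penn Treebank tags as shown in the output above (noun tags such as NN and NNP all begin with N). The helper name `keep_nouns` is hypothetical, and the tagged list is hardcoded from the example so that the snippet runs without NLTK installed:

```python
# Tagged tokens copied from the example output above, as (word, tag) pairs.
tagged = [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
          ('Thursday', 'NNP'), ('morning', 'NN')]

def keep_nouns(tagged_tokens):
    """Return only the words whose POS tag starts with 'N' (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged_tokens if tag.startswith('N')]

nouns = keep_nouns(tagged)
print(nouns)  # ['Thursday', 'morning']
```

In real use you would pass the full result of `nltk.pos_tag(tokens)` instead of the hardcoded list.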


+14

If you need word lists, one possible source is a Wiktionary dump:

https://dumps.wikimedia.org/enwiktionary/20140609/


In Python:

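import xml.etree.ElementTree as ET

# Placeholder: set this to the tag you want (tags in the dump may carry an XML namespace prefix)
your_target_tag = 'page'

# iterparse streams the file, so the whole dump is never held in memory
for event, elem in ET.iterparse('/path/to/wiktionary.xml'):
    if elem.tag == your_target_tag:
        # do magic: extract the word and its part of speech here
        elem.clear()  # free the element once it has been handled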
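To make the streaming pattern concrete, here is a small self-contained sketch. It runs on a tiny, hypothetical XML snippet, not the real Wiktionary dump (whose schema is different and much larger); the tag names `entry`, `word`, `pos` and the function `words_by_pos` are assumptions made purely for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature XML; the real Wiktionary dump uses a different schema.
SAMPLE = b"""<entries>
  <entry><word>cat</word><pos>noun</pos></entry>
  <entry><word>run</word><pos>verb</pos></entry>
  <entry><word>under</word><pos>preposition</pos></entry>
</entries>"""

def words_by_pos(source, target_tag="entry"):
    """Stream `source` with iterparse and group words by part of speech."""
    groups = {}
    for event, elem in ET.iterparse(source):  # yields 'end' events by default
        if elem.tag == target_tag:
            groups.setdefault(elem.findtext("pos"), []).append(elem.findtext("word"))
            elem.clear()  # release the processed element
    return groups

result = words_by_pos(io.BytesIO(SAMPLE))
print(result)
```

For the real dump, the "magic" step would instead have to pull the part of speech out of each page's wiki markup, which takes more work than this sketch shows.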


+1

Source: https://habr.com/ru/post/1545975/

