NLP - information extraction in Python (spaCy)

Question

I am trying to extract this type of information from the following paragraph structure:

women_ran  men_ran  kids_ran  walked
        1        2         1       3
        2        4         3       1
        3        6         5       2

text = [
    "On Tuesday, one women ran on the street while 2 men ran and 1 child ran on the sidewalk. Also, there were 3 people walking.",
    "One person was walking yesterday, but there were 2 women running as well as 4 men and 3 kids running.",
    "The other day, there were three women running and also 6 men and 5 kids running on the sidewalk. Also, there were 2 people walking in the park.",
]

I am using Python's spaCy as my NLP library. I am new to NLP and would appreciate some guidance on the best way to extract this tabular information from sentences like these.

If I only needed to determine whether people were running or walking, I would just fit an sklearn classification model, but the information I need to extract is clearly more granular than that (I am trying to get the subcategories and a value for each). Any guidance would be greatly appreciated.


python nlp information-extraction spacy

kathystehl Nov 06 '16 at 19:21


syllogism_ · Accepted Answer · 2016-11-06T19:44:55+0000

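You can use the dependency parse for this.

The rules you need can be written in several different ways, much as there is more than one way to write an XPath query, a DOM selector, and so on.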

Something like this should work:

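import spacy

# 'en' was the model shortcut when this was written; later spaCy versions
# load the model by package name instead, e.g. 'en_core_web_sm'.
nlp = spacy.load('en')
docs = [nlp(t) for t in text]
for i, doc in enumerate(docs):
    for j, sent in enumerate(doc.sents):
        # Nominal subjects in this sentence, e.g. "men" in "2 men ran"
        subjects = [w for w in sent if w.dep_ == 'nsubj']
        for subject in subjects:
            # Numeric modifiers attached to the left of the subject, e.g. "2"
            numbers = [w for w in subject.lefts if w.dep_ == 'nummod']
            if len(numbers) == 1:
                # subject.head is the word the subject attaches to (the action)
                print('document.sentence: {}.{}, subject: {}, action: {}, numbers: {}'.format(
                    i, j, subject.text, subject.head.text, numbers[0].text))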

For the examples in text, this prints:

document.sentence: 0.0, subject: men, action: ran, numbers: 2
document.sentence: 0.0, subject: child, action: ran, numbers: 1
document.sentence: 0.1, subject: people, action: walking, numbers: 3
document.sentence: 1.0, subject: person, action: walking, numbers: One
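
To get from matches like these to the table in the question, one option is a small post-processing step in plain Python. The sketch below is only an illustration: the lookup tables and the names `to_int`, `column_for` and `build_rows` are hypothetical, and it assumes each match has already been collected as a (document, subject, action, number) tuple instead of being printed. It normalises number words such as "One", maps each subject and action to one of the question's column names, and sums the counts per document.

```python
from collections import defaultdict

# Hypothetical lookup tables; extend them to cover your own data.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
SUBJECT_GROUPS = {"woman": "women", "women": "women",
                  "man": "men", "men": "men",
                  "child": "kids", "children": "kids",
                  "kid": "kids", "kids": "kids"}
RUN_WORDS = {"ran", "run", "running"}
WALK_WORDS = {"walked", "walk", "walking"}


def to_int(number_text):
    """Convert '2' or 'One' to an int; return None if it is not recognised."""
    lowered = number_text.lower()
    if lowered.isdigit():
        return int(lowered)
    return NUMBER_WORDS.get(lowered)


def column_for(subject, action):
    """Map a subject/action pair to a column name, or None if no column fits."""
    action = action.lower()
    if action in WALK_WORDS:
        # The question has a single 'walked' column for everyone.
        return "walked"
    group = SUBJECT_GROUPS.get(subject.lower())
    if action in RUN_WORDS and group is not None:
        return group + "_ran"
    return None


def build_rows(records):
    """Sum counts per document from (doc_index, subject, action, number) tuples."""
    rows = defaultdict(dict)
    for doc_index, subject, action, number_text in records:
        column = column_for(subject, action)
        count = to_int(number_text)
        if column is not None and count is not None:
            rows[doc_index][column] = rows[doc_index].get(column, 0) + count
    return dict(rows)


# The four matches printed above, as (document, subject, action, number).
records = [
    (0, "men", "ran", "2"),
    (0, "child", "ran", "1"),
    (0, "people", "walking", "3"),
    (1, "person", "walking", "One"),
]
table = build_rows(records)
print(table)
```

Columns missing from a row (for example women_ran for the first document) correspond to phrases the parsing rule did not match, so the extraction rules would need to be broadened before the full table can be filled in.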
