How can I get the noun phrase in a sentence that is the object of a particular verb?

I work with pharmaceutical label data. The text is always structured around the phrase “indicated for”.

For instance:

sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis" 

I have already used spaCy to keep only the sentences containing the phrase “indicated for”.

Now I need a function that accepts the sentence and returns the phrase that is the object of "indicated for". For this example, the function, which I called extract(), would work as follows:

 extract(sentence) >> 'relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis' 

Is there a spaCy function for this?

EDIT: Simple splitting after “indicated for” will not work for complex examples.

Here are some examples:

'buprenorphine and naloxone sublingual tablets are indicated for supportive treatment of opioid dependence and should be used as part of a complete treatment plan, including counseling and psychosocial support for buprenorphine and naloxone sublingual tablets containing buprenorphine - a partial opioid agonist and naloxone - opioid antagonist for the supportive treatment of opioid dependence'

'Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria, under the conditions listed below, conjunctivitis of gram-positive bacteria, gram-negative bacteria, staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae enterobacter cloacae haemophilus influenzae proteus mirabilis pseudomonas aeruginosa aeruginosa aeruginosa pseudomonas aeruginosa serratia marcescens'

where I only need the object of “indicated for”, e.g. 'supportive treatment of opioid dependence' in the first example.

Score: +5

4 answers
    import spacy

    nlp = spacy.load('en_core_web_sm')

    text = 'Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.'
    doc = nlp(text)

    # Print the full subtree of every prepositional object (pobj) token
    for word in doc:
        if word.dep_ == 'pobj':
            subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
            print(subtree_span.text)

Output:

    relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
    the signs and symptoms of osteoarthritis and rheumatoid arthritis
    osteoarthritis and rheumatoid arthritis

The output has several lines because the sentence contains several prepositional objects (pobj), and the subtree of each one is printed.
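If you only want the object of the “for” that hangs off “indicated”, you can filter that same loop by looking up each pobj's head instead of printing all of them. This is a minimal sketch, assuming spaCy v3 and a parse in which “for” is attached to “indicated” as a prep (the usual shape of the en_core_web_sm parse, though another model or version may attach it differently). `indicated_for_object` is a hypothetical helper name.

```python
def indicated_for_object(doc):
    """Return the phrase governed by "indicated for", or None.

    Hypothetical helper; works on a parsed spaCy Doc or Span. It walks
    upward from each prepositional object (pobj) and keeps only the one
    whose head is "for" and whose grandparent is "indicated".
    """
    for word in doc:
        if word.dep_ != "pobj":
            continue
        prep = word.head
        if prep.lower_ == "for" and prep.head.lower_ == "indicated":
            # In a projective parse the subtree is contiguous, so slicing the
            # underlying Doc from left edge to right edge gives the phrase.
            return word.doc[word.left_edge.i : word.right_edge.i + 1].text
    return None
```

Called as `indicated_for_object(nlp(text))`, it should return only the first of the three lines above, provided the model attaches “relief” to “for”.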

Edit 2:

    import spacy

    nlp = spacy.load('en_core_web_sm')

    para = ('Meloxicam tablet is indicated for relief of the signs and symptoms of '
            'osteoarthritis and rheumatoid arthritis. Ofloxacin ophthalmic solution '
            'is indicated for the treatment of infections caused by susceptible '
            'strains of the following bacteria in the conditions listed below.')
    doc = nlp(para)

    # Extract the sentences that contain the key phrase
    indicated_for_sents = [sent for sent in doc.sents if 'indicated for' in sent.text]
    print(indicated_for_sents)
    print()

    # Print the subtree of every prepositional object (pobj) in the document
    for word in doc:
        if word.dep_ == 'pobj':
            subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
            print(subtree_span.text)

Output:

    [Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis. , Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.]

    relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
    the signs and symptoms of osteoarthritis and rheumatoid arthritis
    osteoarthritis and rheumatoid arthritis
    the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below
    infections caused by susceptible strains of the following bacteria in the conditions listed below
    susceptible strains of the following bacteria in the conditions listed below
    the following bacteria in the conditions listed below
    the conditions listed below

See also this subject/object extraction script:

https://github.com/NSchrading/intro-spacy-nlp/blob/master/subject_object_extraction.py

Score: +3

You need to use spaCy's dependency parser. Each selected sentence containing “indicated for” has to be dependency-parsed with spaCy, which shows the relationships between all of its words. You can see a dependency-parse visualization of the example sentence from your question here.
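To look at the parse yourself, spaCy's built-in displacy visualizer can render the dependency tree. This is a minimal sketch assuming spaCy v3; `dependency_svg` is a hypothetical helper name, and parsing real text requires a trained pipeline such as en_core_web_sm to be installed.

```python
from spacy import displacy


def dependency_svg(doc):
    """Return the dependency-parse visualization of `doc` as SVG markup.

    jupyter=False makes displacy return the markup as a string instead of
    trying to display it inline in a notebook.
    """
    return displacy.render(doc, style="dep", jupyter=False)
```

In a script, `displacy.serve(nlp(sentence), style="dep")` instead starts a local web server that shows the same picture in a browser.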

After spaCy returns the parse, find the “indicated” token (the verb) and look at its children in the dependency tree. See an example here. In your case, you would match “indicated” as the verb and collect its children (the “for” preposition and, below it, its object) instead of the “xcomp” or “ccomp” children used in the GitHub example.
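Here is a minimal sketch of that top-down walk, assuming spaCy v3 and a parse in which the verb has a prep child “for” whose pobj child heads the phrase you want. `object_of_verb` and its `verb`/`prep` parameters are hypothetical names, not part of spaCy or of the linked GitHub example.

```python
def object_of_verb(doc, verb="indicated", prep="for"):
    """Walk down from `verb` to its `prep` child and return the phrase
    headed by that preposition's object (pobj), or None.

    Hypothetical helper; `doc` is a parsed spaCy Doc or Span.
    """
    for token in doc:
        if token.lower_ != verb:
            continue
        for child in token.children:
            if child.dep_ == "prep" and child.lower_ == prep:
                for obj in child.children:
                    if obj.dep_ == "pobj":
                        # Slice the underlying Doc over the object's subtree.
                        span = obj.doc[obj.left_edge.i : obj.right_edge.i + 1]
                        return span.text
    return None
```

Usage would be `object_of_verb(nlp(sentence))` with `nlp = spacy.load("en_core_web_sm")`. The result is the object's entire subtree, so anything the parser attaches under the object (for example a trailing participle phrase) comes along with it.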

Score: 0

You don't need spaCy for this. You can use a regex or just split:

    >>> sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"
    >>> sentence.split('indicated for ')[1]
    'relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis'

This relies on assumptions about the string: for example, that “indicated for” appears exactly once and that everything after it is what you need.
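A regex version of the same idea might look like this. It is a minimal sketch with the same assumptions as `split()`: it takes everything after the first “indicated for”, drops a final period, and returns `None` instead of raising `IndexError` when the phrase is missing.

```python
import re

# Everything after the first "indicated for" (case-insensitive), up to an
# optional final period and trailing whitespace at the end of the string.
# DOTALL lets the captured text span line breaks.
INDICATED_FOR = re.compile(r"indicated for\s+(.+?)\.?\s*$", re.IGNORECASE | re.DOTALL)


def extract(sentence):
    """Return the text following "indicated for", or None if it is absent."""
    match = INDICATED_FOR.search(sentence)
    return match.group(1) if match else None
```

As the question's EDIT points out, this breaks on longer labels: for the buprenorphine example it also returns the trailing “and should be used as part of a complete treatment plan, ...” clause.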

Grammar note: what you are looking for is an object (strictly, the object of the preposition “for”, which spaCy labels pobj), not a subject. The subject is “Meloxicam tablet”.

Score: -1

Try looking at noun chunks in spaCy: https://spacy.io/usage/linguistic-features#noun-chunks . I am not a spaCy expert, but that should help.
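A minimal sketch of the noun-chunk idea, assuming spaCy v3 and a parsed Doc (for example from en_core_web_sm, which supplies the POS tags and dependencies that `doc.noun_chunks` needs). `noun_chunk_after` is a hypothetical helper. Noun chunks are base noun phrases, so after “indicated for” you typically get just “relief”; the optional `expand` flag widens the chunk to its root token's full subtree to recover the whole object phrase.

```python
def noun_chunk_after(doc, phrase="indicated for", expand=False):
    """Return the first noun chunk starting after `phrase`, or None.

    Hypothetical helper; `doc` must be a parsed spaCy Doc. With
    expand=True, return the full subtree of the chunk's root token
    instead of the base noun phrase.
    """
    start = doc.text.lower().find(phrase.lower())
    if start == -1:
        return None
    phrase_end = start + len(phrase)
    for chunk in doc.noun_chunks:
        if chunk.start_char >= phrase_end:
            if expand:
                root = chunk.root
                return doc[root.left_edge.i : root.right_edge.i + 1].text
            return chunk.text
    return None
```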

Score: -1

Source: https://habr.com/ru/post/1276139/

