spaCy Pipeline?

So lately I have been playing with a Wikipedia dump. I pre-processed it and trained a Word2Vec model on it with Gensim.

Does anyone know if there is a single script in spaCy that will produce tokenization, sentence segmentation, part-of-speech tags, lemmatization, dependency parsing and named entity recognition all at the same time?

I could not find clear documentation on this. Thank you!

+6
3 answers

spaCy gives you all of this just by using en_nlp = spacy.load('en'); doc = en_nlp(sentence). The documentation gives you detailed information on how to access each of the elements.

The following is an example:

In [1]: import spacy
   ...: en_nlp = spacy.load('en')

In [2]: en_doc = en_nlp(u'Hello, world. Here are two sentences.')

Sentences can be obtained via doc.sents:

In [4]: list(en_doc.sents)
Out[4]: [Hello, world., Here are two sentences.]

Noun phrases via doc.noun_chunks:

In [6]: list(en_doc.noun_chunks)
Out[6]: [two sentences]

Named entities via doc.ents:

In [11]: [(ent, ent.label_) for ent in en_doc.ents]
Out[11]: [(two, u'CARDINAL')]

Tokenization: iterate over the doc; each token's surface string is available as token.orth_:

In [12]: [tok.orth_ for tok in en_doc]
Out[12]: [u'Hello', u',', u'world', u'.', u'Here', u'are', u'two', u'sentences', u'.']

Part-of-speech tags via token.tag_:

In [13]: [tok.tag_ for tok in en_doc]
Out[13]: [u'UH', u',', u'NN', u'.', u'RB', u'VBP', u'CD', u'NNS', u'.']

Lemmas via token.lemma_:

In [15]: [tok.lemma_ for tok in en_doc]
Out[15]: [u'hello', u',', u'world', u'.', u'here', u'be', u'two', u'sentence', u'.']

Dependency parse: use token.dep_, token.head, token.rights and token.lefts. For example:

In [19]: for token in en_doc:
    ...:     print(token.orth_, token.dep_, token.head.orth_, [t.orth_ for t in token.lefts], [t.orth_ for t in token.rights])
    ...:     
(u'Hello', u'ROOT', u'Hello', [], [u',', u'world', u'.'])
(u',', u'punct', u'Hello', [], [])
(u'world', u'npadvmod', u'Hello', [], [])
...
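To answer the question directly: a single call to the pipeline populates all of these annotation layers at once, and you only need to read the attributes back. Here is a minimal sketch reusing the en_doc from above (it uses only the attributes already shown, plus token.ent_type_ for per-token entity labels):

for token in en_doc:
    print(token.orth_,      # surface form (tokenization)
          token.lemma_,     # lemma
          token.tag_,       # fine-grained POS tag
          token.dep_,       # dependency label
          token.ent_type_)  # entity type, empty string if the token is not in an entity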


+12

spaCy also has an alpha release of v2.0.0 available (spacy-alpha).
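
For reference, a minimal sketch of the same pipeline under the v2.x API; the model name en_core_web_sm is my assumption (installed with python -m spacy download en_core_web_sm), not something stated in this answer:

import spacy

# In spaCy v2.x you load a packaged model rather than the 'en' shortcut.
nlp = spacy.load('en_core_web_sm')  # assumed model name
doc = nlp(u'Hello, world. Here are two sentences.')

# One call to nlp() runs the whole pipeline; read the annotations back:
print(list(doc.sents))                               # sentence segmentation
print([(ent.text, ent.label_) for ent in doc.ents])  # named entities
for token in doc:
    print(token.text, token.lemma_, token.tag_, token.dep_)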

0

Source: https://habr.com/ru/post/1651461/
