There is another solution: call as_doc() on the Span object ( https://spacy.io/api/span#as_doc ):
import spacy

nlp = spacy.load('en_core_web_lg')
content = "This is my sentence. And here another one."
doc = nlp(content)
for i, sent in enumerate(doc.sents):
    # each sentence is a Span of the parent Doc
    print(i, "a", sent, type(sent))
    # as_doc() copies the Span into a standalone Doc
    doc_sent = sent.as_doc()
    print(i, "b", doc_sent, type(doc_sent))
This prints:
0 a This is my sentence. <class 'spacy.tokens.span.Span'>
0 b This is my sentence. <class 'spacy.tokens.doc.Doc'>
1 a And here another one. <class 'spacy.tokens.span.Span'>
1 b And here another one. <class 'spacy.tokens.doc.Doc'>
(The snippet is written out in full for clarity; it can be shortened.)
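For instance, a shortened version might collect the per-sentence Doc objects in a list comprehension. This is only a sketch under a few assumptions that are not in the answer above: it assumes spaCy v3 or later, and it swaps en_core_web_lg for a blank English pipeline with the rule-based sentencizer so it runs without downloading a model. The name sentence_docs is made up for illustration.

```python
import spacy

# Assumption: spaCy v3+. A blank pipeline plus the rule-based "sentencizer"
# stands in for en_core_web_lg so that no model download is needed.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is my sentence. And here another one.")

# One standalone Doc per sentence Span (sentence_docs is a hypothetical name)
sentence_docs = [sent.as_doc() for sent in doc.sents]

for sentence_doc in sentence_docs:
    print(type(sentence_doc).__name__, "|", sentence_doc.text)
```

With a trained pipeline such as en_core_web_lg the sentence boundaries come from the parser rather than the sentencizer, so the split may differ on less tidy text.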