Python NLTK Shakespeare Corpus

Question

I am trying to import sentences from the NLTK Shakespeare corpus, following this help page, but I am struggling to access the sentences (to train a word2vec model):

from nltk.corpus import shakespeare  # an XMLCorpusReader
shakespeare.fileids()
# ['a_and_c.xml', 'dream.xml', 'hamlet.xml', 'j_caesar.xml', ...]

play = shakespeare.xml('dream.xml')  # ElementTree Element (the PLAY root)
print(play)
# <Element 'PLAY' at ...>

# Print the tag and the direct text of the first nine children of PLAY
for i in range(9):
    print('%s: %s' % (play[i].tag, play[i].text))

Returns the following:

TITLE: A Midsummer Night's Dream
PERSONAE: 

SCNDESCR: SCENE  Athens, and a wood near it.
PLAYSUBT: A MIDSUMMER NIGHT'S DREAM
ACT: None
ACT: None
ACT: None
ACT: None
ACT: None

Why is the text of all the acts missing?

None of the methods defined here ( http://www.nltk.org/howto/corpus.html#data-access-methods ) (sents(), tagged_sents(), chunked_sents(), parsed_sents()) seems to work when applied to the shakespeare XMLCorpusReader.

I would like to understand:
1 / how to get the sentences

2 / how I could have known to look for them in an ElementTree object

python nlp nltk

Romain g May 01 '17 at 14:55

1 answer

David Michael Gang · Accepted Answer · 2017-05-01T15:12:30+0000

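play[i].text holds only the text directly inside an element, before its first child. The ACT elements contain nothing but child elements, so their text is None.

To see the nested text, iterate over the elements and use itertext():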

for p in play:
    print('%s: %s' % (p.tag, list(p.itertext())))
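
The difference between .text and itertext() can be checked without the corpus. Below is a minimal sketch that uses a made-up XML snippet (SAMPLE is hypothetical, not the contents of dream.xml) with the same tag names as the play files:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the play's structure (not the real corpus file).
SAMPLE = """<PLAY>
<TITLE>A Sample Play</TITLE>
<ACT><TITLE>ACT I</TITLE>
<SCENE><TITLE>SCENE I</TITLE>
<SPEECH><SPEAKER>THESEUS</SPEAKER>
<LINE>Now, fair Hippolyta, our nuptial hour</LINE>
<LINE>Draws on apace</LINE>
</SPEECH>
</SCENE>
</ACT>
</PLAY>"""

play = ET.fromstring(SAMPLE)
act = play.find('ACT')

# .text is only the text before the first child element; ACT has none.
direct_text = act.text

# itertext() yields the text of the element and of all its descendants.
nested_text = [t.strip() for t in act.itertext() if t.strip()]

print(repr(direct_text))
print(nested_text)
```

Here direct_text is None, just like the ACT rows in the question, while nested_text contains the titles, the speaker and the lines.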

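For the word2vec goal in the question, one possible approach is to treat each LINE element as a unit and split it into lowercase tokens. This is only a sketch under assumptions: SAMPLE is a made-up stand-in for the corpus file, line_tokens is a hypothetical helper, the regex tokenizer is a simplification, and a line of verse is not necessarily a full sentence. With the real corpus, shakespeare.xml('dream.xml') would take the place of ET.fromstring(SAMPLE).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature of the play's structure (not the real corpus file).
SAMPLE = """<PLAY>
<ACT><TITLE>ACT I</TITLE>
<SCENE><TITLE>SCENE I</TITLE>
<SPEECH><SPEAKER>THESEUS</SPEAKER>
<LINE>Now, fair Hippolyta, our nuptial hour</LINE>
<LINE>Draws on apace</LINE>
</SPEECH>
</SCENE>
</ACT>
</PLAY>"""


def line_tokens(root, tag='LINE'):
    """Return one list of lowercase word tokens per element named `tag`."""
    token_lists = []
    # iter(tag) walks the whole tree, so the nesting depth does not matter.
    for line in root.iter(tag):
        # Join all nested text in case the line contains child elements.
        text = ''.join(line.itertext())
        tokens = re.findall(r"[a-z']+", text.lower())
        if tokens:
            token_lists.append(tokens)
    return token_lists


root = ET.fromstring(SAMPLE)
sentences = line_tokens(root)
print(sentences)
```

The result is a list of token lists, which is the input shape that word2vec implementations such as gensim's Word2Vec accept.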