I am trying to load sentences from the Shakespeare corpus in NLTK, following the corpus how-to page, but I can't work out how to get at the sentences (I need them to train a word2vec model):
from nltk.corpus import shakespeare
shakespeare.fileids()
['a_and_c.xml', 'dream.xml', 'hamlet.xml', 'j_caesar.xml', ...]
play = shakespeare.xml('dream.xml')
print(play)
<Element 'PLAY' at ...>
for i in range(9):
    print('%s: %s' % (play[i].tag, play[i].text))
This prints the following:
TITLE: A Midsummer Night's Dream
PERSONAE:
SCNDESCR: SCENE Athens, and a wood near it.
PLAYSUBT: A MIDSUMMER NIGHT'S DREAM
ACT: None
ACT: None
ACT: None
ACT: None
ACT: None
Why is the text of every ACT element None?
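A minimal sketch of how the tree could be inspected, assuming the plays are laid out as PLAY > ACT > SCENE > SPEECH > LINE. The SAMPLE string and the outline helper are hypothetical stand-ins, not part of NLTK; in practice the element returned by shakespeare.xml('dream.xml') would be passed instead. In ElementTree, .text only holds the text that sits directly inside an element before its first child, so an element that contains nothing but nested elements reports None.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a play's layout (an assumption, not the real file).
SAMPLE = (
    "<PLAY><TITLE>Sample Play</TITLE>"
    "<ACT><TITLE>ACT I</TITLE>"
    "<SCENE><TITLE>SCENE I</TITLE>"
    "<SPEECH><SPEAKER>THESEUS</SPEAKER>"
    "<LINE>Now, fair Hippolyta, our nuptial hour</LINE>"
    "<LINE>Draws on apace; four happy days bring in</LINE>"
    "</SPEECH></SCENE></ACT></PLAY>"
)

def outline(element, depth=0, max_depth=3):
    """Return indented 'TAG: text' rows for an element and its descendants."""
    rows = ["%s%s: %r" % ("  " * depth, element.tag, element.text)]
    if depth < max_depth:
        for child in element:
            rows.extend(outline(child, depth + 1, max_depth))
    return rows

play = ET.fromstring(SAMPLE)
act = play.find("ACT")

print(act.text)                      # text directly inside <ACT>, before its first child
print([child.tag for child in act])  # tags of the elements nested in the act
print("\n".join(outline(play)))      # tag/text overview down to max_depth
```

Printing the child tags level by level like this is one way to discover which elements actually carry the text.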
None of the data access methods described here ( http://www.nltk.org/howto/corpus.html#data-access-methods ), such as sents(), tagged_sents(), chunked_sents() and parsed_sents(), seem to work on the XMLCorpusReader used for the shakespeare corpus.
I would like to understand:
1 / how to get the sentences
2 / how I could have worked out where to look for them in an ElementTree object
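For the first point, a hedged sketch of one possible approach: walk the tree with Element.iter() and turn every LINE element into a list of word tokens, which is the list-of-token-lists shape that word2vec implementations usually take. The SAMPLE string, the line_tokens helper and the regex tokenizer are assumptions made for illustration; the real input would be the element returned by shakespeare.xml('dream.xml'), and the tag names should be checked against the actual file.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature of a play's layout (an assumption, not the real file).
SAMPLE = (
    "<PLAY><TITLE>Sample Play</TITLE>"
    "<ACT><TITLE>ACT I</TITLE>"
    "<SCENE><TITLE>SCENE I</TITLE>"
    "<SPEECH><SPEAKER>THESEUS</SPEAKER>"
    "<LINE>Now, fair Hippolyta, our nuptial hour</LINE>"
    "<LINE>Draws on apace; four happy days bring in</LINE>"
    "</SPEECH></SCENE></ACT></PLAY>"
)

def line_tokens(play, tag="LINE"):
    """Yield each element with the given tag as a list of lowercase word tokens."""
    for line in play.iter(tag):           # iter() searches all descendants, at any depth
        text = "".join(line.itertext())   # also collects text of nested elements
        tokens = re.findall(r"[a-z']+", text.lower())
        if tokens:
            yield tokens

play = ET.fromstring(SAMPLE)
sentences = list(line_tokens(play))
print(sentences)
```

Note that a LINE is a verse line, not a grammatical sentence. If real sentences matter, the lines of each SPEECH could be joined first and then split with a sentence tokenizer.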