Python NLTK download gives a parsing error

I am trying to run the following command

import nltk
nltk.download('all')

But I get this error

Traceback (most recent call last):
  File "./update.py", line 3, in <module>
    nltk.download('all')
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 534, in incr_download
    try: info = self._info_or_id(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 508, in _info_or_id
    return self.info(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 875, in info
    self._update_index()
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 825, in _update_index
    ElementTree.parse(compat.urlopen(self._url)).getroot())
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143

I'm new to Python, so I'm not quite sure what to do. I looked at the module source reported in the traceback and noticed that it was trying to load an XML file. So I ran the command below, and it did not give me any errors.

compat.urlopen('https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml')

So I assume the problem is not in fetching the file but in parsing it. Can anyone suggest how to get past this?

2 answers

There was a typo in index.xml. It has already been fixed. I just tested it, and nltk.download('all') works fine now.

See: nltk/nltk_data#70


The problem is with the XML index file that NLTK downloads.

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143

If you look at line 23, column 143, you will see that an "=" is missing:

... unzip="1" unzipped_size"1917" url="https...

This has to be fixed on NLTK's side.
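
To check for yourself where the parser stops, you can read the position from the ParseError and print the offending line. This is only a minimal sketch: the helper name find_xml_error and the sample XML are made up for illustration (the sample imitates the missing "=" shown above and is not the real index.xml).

```python
import xml.etree.ElementTree as ET


def find_xml_error(text):
    """Return (line, column, line_text) for the first parse error, or None.

    Hypothetical helper for illustration; not part of NLTK.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is the (line, column) pair that also appears
        # in the error message; the line number is 1-based.
        line, column = err.position
        lines = text.splitlines()
        line_text = lines[line - 1] if 0 < line <= len(lines) else ""
        return line, column, line_text
    return None


# Made-up sample with the same kind of typo: no "=" after unzipped_size.
sample = (
    '<packages>\n'
    '  <package id="demo" unzip="1" unzipped_size"1917" '
    'url="https://example.invalid/demo.zip" />\n'
    '</packages>'
)

result = find_xml_error(sample)
if result is not None:
    line, column, line_text = result
    print("parse error at line", line, "column", column)
    print(line_text)
```

To apply the same check to the real file, download index.xml first and pass its decoded text to the helper instead of the sample string.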


Source: https://habr.com/ru/post/1016549/

