Gensim Getting Started Error: There is no such file or directory: 'text8'

I am studying the word2vec and GloVe models in Python, so I am working through the example available here.

After I ran the following code step by step in IDLE 3:

>>> from gensim.models import word2vec
>>> import logging
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
>>> sentences = word2vec.Text8Corpus('text8')
>>> model = word2vec.Word2Vec(sentences, size=200)

I get this error:

2017-01-13 11:15:41,471 : INFO : collecting all words and their counts
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    model = word2vec.Word2Vec(sentences, size=200)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 469, in __init__
    self.build_vocab(sentences, trim_rule=trim_rule)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 533, in build_vocab
    self.scan_vocab(sentences, progress_per=progress_per, trim_rule=trim_rule)  # initial survey
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 545, in scan_vocab
    for sentence_no, sentence in enumerate(sentences):
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 1536, in __iter__
    with utils.smart_open(self.fname) as fin:
  File "/usr/local/lib/python3.5/dist-packages/smart_open-1.3.5-py3.5.egg/smart_open/smart_open_lib.py", line 127, in smart_open
    return file_smart_open(parsed_uri.uri_path, mode)
  File "/usr/local/lib/python3.5/dist-packages/smart_open-1.3.5-py3.5.egg/smart_open/smart_open_lib.py", line 558, in file_smart_open
    return open(fname, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'text8'

How can I fix this? Thanks in advance for your help.

1 answer

It seems you are missing the corpus file that this code reads. Text8Corpus('text8') tries to open a file named text8 in the current working directory and cannot find it (hence the FileNotFoundError).
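A relative name such as 'text8' is resolved against the interpreter's current working directory, which in IDLE is not necessarily the folder you expect. The short check below is only a sketch for seeing where Python is actually looking; describe_path is a made-up helper name, not part of gensim.

```python
import os


def describe_path(fname):
    """Return (absolute path, whether a regular file exists there).

    Hypothetical helper for illustration only.
    """
    full = os.path.abspath(fname)  # relative names resolve against os.getcwd()
    return full, os.path.isfile(full)


full, exists = describe_path("text8")
print("working directory:", os.getcwd())
print("looking for:", full)
print("file exists:", exists)
```

If the last line prints False, the file is either not downloaded yet or sits in a different directory from the one shown.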

You can download the file from the URL given in the documentation for Text8Corpus:

 Docstring: Iterate over sentences from the "text8" corpus, unzipped from http://mattmahoney.net/dc/text8.zip.

Download the archive, extract it, and then pass the path to the extracted file as the argument to Text8Corpus:

 sentences = word2vec.Text8Corpus('/path/to/text8') 
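The download and extraction steps can also be scripted. The following is a minimal sketch using only the standard library; ensure_text8 is a hypothetical helper (not a gensim function), and it assumes the archive at the URL from the docstring contains a single file named text8.

```python
import os
import urllib.request
import zipfile

# URL quoted in the Text8Corpus docstring
TEXT8_URL = "http://mattmahoney.net/dc/text8.zip"


def ensure_text8(target_dir="."):
    """Return the absolute path to 'text8', fetching and unzipping it if absent.

    Hypothetical helper; assumes the archive holds a member named 'text8'.
    """
    target_dir = os.path.abspath(target_dir)
    corpus_path = os.path.join(target_dir, "text8")
    if os.path.isfile(corpus_path):
        return corpus_path  # already extracted, nothing to do

    os.makedirs(target_dir, exist_ok=True)
    zip_path = os.path.join(target_dir, "text8.zip")
    if not os.path.isfile(zip_path):
        # Only download when no local copy of the archive exists
        urllib.request.urlretrieve(TEXT8_URL, zip_path)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)

    if not os.path.isfile(corpus_path):
        raise FileNotFoundError("archive did not contain a file named 'text8'")
    return corpus_path
```

With the file in place, the call above becomes word2vec.Text8Corpus(ensure_text8('/path/to/data')), where the directory is whatever location you choose. Returning an absolute path means the result no longer depends on the working directory IDLE happens to start in.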

Source: https://habr.com/ru/post/1262724/

