How to speed up the loading time of the Gensim Word2vec model?

I am creating a chatbot, so I need to vectorize user input using Word2Vec.

I am using Google's pretrained model with 3 million words (GoogleNews-vectors-negative300).

So, I load the model using Gensim:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

The problem is that loading the model takes about 2 minutes. I can't make the user wait that long.

So what can I do to speed up the load time?

I was thinking about putting each of the 3 million words and its corresponding vector into a MongoDB database. That would probably speed things up, but my intuition tells me it is not a good idea.


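In recent gensim versions you can load a subset, starting from the front of the file, using the optional limit parameter of load_word2vec_format(). (The GoogleNews vectors appear to be ordered roughly from most to least frequent, so the first N are usually the N-sized subset you want. Use limit=500000 to get the vectors of the 500,000 most frequent words, still a fairly large vocabulary, while saving 5/6 of the memory and load time.)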
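To illustrate why a limit helps: the word2vec.c binary format is a header line followed by one word-plus-vector record after another, so a reader can simply stop after the first N records. The sketch below is a simplified standard-library reader written for illustration, not gensim's implementation. The function name read_first_vectors is made up here, and it assumes little-endian float32 values.

```python
import struct

def read_first_vectors(path, limit):
    """Read at most `limit` (word, vector) pairs from a word2vec.c binary file."""
    vectors = {}
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <vector_size>\n"
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(min(limit, vocab_size)):
            # Each record is the word, one space, then `dim` float32 values.
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file")
                if ch != b"\n":  # some writers put a newline before each word
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", errors="ignore")
            vectors[word] = struct.unpack("<%df" % dim, f.read(4 * dim))
    return vectors
```

Everything after the first `limit` records is never read, which is where the saving in load time and memory comes from.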

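That may help a bit. But if you reload for every web request, you are still limited by the IO-bound loading speed, and every reload keeps another redundant copy in memory.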

There are some tricks you can combine to help.

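Note that after loading such vectors in their original word2vec.c-originated format, you can re-save them with gensim's native save(). If you save them uncompressed, and the backing array is large enough (the GoogleNews set definitely is), the backing array is dumped to a separate file in raw binary format. That file can later be memory-mapped from disk using gensim's native load(filename, mmap='r') option.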

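At first this makes the load seem fast: instead of reading the whole array from disk, the OS just maps virtual address regions to the data on disk, so that later, when code accesses those memory locations, the necessary ranges are read from disk. So far, so good!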
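The lazy behaviour of a memory mapping can be seen with Python's standard mmap module. This is only an illustration of the mechanism (gensim's mmap='r' option relies on NumPy's memory mapping rather than on code like this). The file layout (rows of little-endian float32 values) and the name read_row are assumptions made for the sketch.

```python
import mmap
import struct

DIM = 300  # assumed vector size (the GoogleNews vectors have 300 dimensions)

def read_row(path, row, dim=DIM):
    """Fetch one float32 vector from a raw array file via a read-only mapping."""
    row_bytes = 4 * dim
    with open(path, "rb") as f:
        # Creating the mapping is cheap: no data is read from disk until
        # a page of the mapped region is actually touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = row * row_bytes
            return struct.unpack("<%df" % dim, mapped[start:start + row_bytes])
```

Reading a single row touches only the pages that hold it, which is why opening a mapped model returns almost immediately.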

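However, if you perform typical operations such as most_similar(), you will still hit big delays, just a little later. That operation requires an initial scan and calculation over all the vectors (on the first call, to create unit-normalized vectors for every word), and then another scan and calculation over all the normalized vectors (on every call, to find the N most similar vectors). Those full-scan accesses page the whole array into RAM, again costing a couple of minutes of disk IO.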
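The two scans can be sketched in plain Python on toy data. This is not gensim's code: unit_normalize and this most_similar are illustrative stand-ins, and the three 2-dimensional vectors are invented. It shows that normalization only needs to happen once, while the similarity ranking has to visit every vector on every call.

```python
import math

def unit_normalize(vectors):
    """Return a copy of `vectors` with every vector scaled to length 1."""
    normed = {}
    for word, vec in vectors.items():
        length = math.sqrt(sum(x * x for x in vec))
        normed[word] = [x / length for x in vec]
    return normed

def most_similar(normed, query, topn=3):
    """Rank all other words by cosine similarity to `query`.

    With unit-length vectors, cosine similarity is just the dot product,
    but every stored vector still has to be visited once per call.
    """
    q = normed[query]
    scores = [
        (word, sum(a * b for a, b in zip(q, vec)))
        for word, vec in normed.items()
        if word != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Toy data standing in for the real 3-million-word table.
toy = {"king": [3.0, 4.0], "queen": [6.0, 8.0], "apple": [4.0, -3.0]}
normed = unit_normalize(toy)             # first scan: done once, up front
ranking = most_similar(normed, "king")   # second scan: repeated on each call
```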

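What you want is to avoid redoing that unit normalization and to pay the IO cost only once. That requires keeping the vectors in memory for reuse by all subsequent web requests (or even several parallel web requests). Fortunately, memory mapping can help here too, with a few extra preparation steps.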

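First, load the word2vec.c-format vectors with load_word2vec_format(). Then use model.init_sims(replace=True) to force the unit normalization destructively, in place (overwriting the non-normalized vectors).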

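Then save the model under a new filename prefix: model.save('GoogleNews-vectors-gensim-normed.bin'). (Note that this actually creates several files on disk, which must be kept together for the model to be reloaded.)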

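Now write a short Python program that both memory-maps the vectors and forces the full array into memory. The program should also hang until it is terminated externally (keeping the mapping alive), and it must take care not to recalculate the already-normalized vectors. That needs one more trick, because the loaded KeyedVectors does not actually know that the vectors are normalized. (Normally only the raw vectors are saved, and the normalized versions are recalculated whenever needed.)

Roughly the following should work: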

from gensim.models import KeyedVectors
from threading import Semaphore
model = KeyedVectors.load('GoogleNews-vectors-gensim-normed.bin', mmap='r')
model.syn0norm = model.syn0  # prevent recalc of normed vectors
model.most_similar('stuff')  # any word will do: just to page all in
Semaphore(0).acquire()  # just hang until process killed

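This still takes a while, but it only has to be done once, before and outside any web request. While the process is alive, the vectors stay mapped into memory. And unless there is other virtual-memory pressure, the vectors should stay loaded in memory. That matters for the next step.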

Finally, in your web request-handling code, you can now simply do the following:

model = KeyedVectors.load('GoogleNews-vectors-gensim-normed.bin', mmap='r')
model.syn0norm = model.syn0  # prevent recalc of normed vectors
# … plus whatever else you wanted to do with the model

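Multiple processes can share read-only memory-mapped files. (That is, once the OS knows that file X is in RAM at a certain position, every other process that wants a read-only mapping of X is directed to reuse that data at that position.)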

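So this web-request load(), and any subsequent accesses, can reuse the data that the earlier process already brought into address space and active memory. Operations that need similarity calculations against every vector still take the time to access several GB of RAM and to do the calculations and sorting, but they no longer need extra disk IO or redundant re-normalization.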

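If the system is under other memory pressure, ranges of the array may drop out of memory until the next read pages them back in. And if the machine does not have enough RAM to ever load the vectors fully, every scan will involve a mix of paging in and out, and performance will be frustratingly poor no matter what. (In that case: get more RAM or work with a smaller vector set.)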

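But if you do have enough RAM, this ends up making the original, natural load-and-use-directly code "just work" quite fast, without an extra web-service interface, because the machine's shared file-mapped memory acts as the service interface.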


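I run into this problem whenever I use the Google News dataset. The issue is that the dataset contains far more words than you will ever need, including a huge number of typos and the like. What I do is scan the data I am working with, build a dictionary of, say, the 50k most common words, get their vectors with Gensim, and save the dictionary. Loading that dictionary is far faster than the 2 minutes needed for the full model.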

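If you have no specific dataset, you can use the 50k or 100k most common words from a large dataset, such as a news dataset from WMT, to get started.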
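The reduced-dictionary approach could look roughly like the sketch below. It uses only the standard library; lookup_vector is a hypothetical callable standing in for a lookup in the fully loaded model, and the function names, the whitespace tokenization, and the choice of pickle for storage are all assumptions made for illustration.

```python
import pickle
from collections import Counter

def build_small_table(texts, lookup_vector, top_n=50000):
    """Keep vectors only for the `top_n` most frequent words in `texts`."""
    counts = Counter(word for text in texts for word in text.split())
    table = {}
    for word, _ in counts.most_common(top_n):
        vec = lookup_vector(word)  # hypothetical: returns None if unknown
        if vec is not None:
            table[word] = vec
    return table

def save_table(table, path):
    """Write the reduced word-to-vector table to disk."""
    with open(path, "wb") as f:
        pickle.dump(table, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_table(path):
    """Load a table written by save_table()."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The slow full-model load happens once, when the table is built; after that the application only ever calls load_table(). Only unpickle files you created yourself, since pickle is not safe for untrusted data.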

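Another option is to keep Gensim running all the time. You can create a FIFO for a script that runs Gensim. The script acts as a "server" that reads from a file to which a "client" writes, watching for vector requests.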

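I think the most elegant solution is to run a web service that provides word embeddings. Check out the word2vec API as an example. After installing it, getting the embedding for "restaurant" is as simple as: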

curl http://127.0.0.1:5000/word2vec/model?word=restaurant
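To show the shape of such a service, here is a minimal sketch using only Python's standard library. It is not the code of the word2vec API project mentioned above: the handler, the JSON response format, and the one-word placeholder table are assumptions made for illustration. A real service would load the full model once at startup instead of using VECTORS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Placeholder table; a real service would load the full model once, here.
VECTORS = {"restaurant": [0.1, 0.2, 0.3]}

class VectorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Take the word from the "?word=..." query parameter.
        word = parse_qs(urlparse(self.path).query).get("word", [""])[0]
        vector = VECTORS.get(word)
        status = 200 if vector is not None else 404
        body = json.dumps({"word": word, "vector": vector}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

def make_server(port=5000):
    """Create the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), VectorHandler)
```

Because the process stays alive between requests, the cost of loading the model is paid once rather than per request.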

A method that worked for me:

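from gensim.models import KeyedVectors

# One-time conversion: load the word2vec.c file, normalize, re-save natively.
model = KeyedVectors.load_word2vec_format('wikipedia-pubmed-and-PMC-w2v.bin', binary=True)
model.init_sims(replace=True)  # unit-normalize in place
model.save('bio_word')

# Later loads memory-map the saved arrays instead of reading them fully.
model = KeyedVectors.load('bio_word', mmap='r')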

Reference: https://groups.google.com/forum/#!topic/gensim/OvWlxJOAsCo


Source: https://habr.com/ru/post/1673057/

