Doc2vec - How to infer document vectors faster?

I trained paragraph vectors for about 2,300 paragraphs (between 2,000 and 12,000 words each), with a vector size of 300. Now I need to infer paragraph vectors for about 100,000 sentences, which I treat as paragraphs (each sentence is around 10-30 words, and each corresponds to one of the 2,300 paragraphs already trained).

So I use

model.infer_vector(sentence)
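One thing worth checking: gensim's infer_vector() expects a list of word tokens (doc_words), not a raw string, and passing a plain string is a common mistake. A minimal sketch, assuming the training corpus was tokenized by lowercasing and splitting on word characters; the `tokenize` helper is hypothetical and should mirror whatever preprocessing was actually used for training, so that inference tokens match the model's vocabulary:

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, then keep runs of word characters.

    Assumption: this mirrors the preprocessing used when the Doc2Vec model
    was trained; tokens unknown to the training vocabulary are ignored.
    """
    return re.findall(r"\w+", text.lower())


if __name__ == "__main__":
    print(tokenize("Hello, World! It's 2 AM"))  # ['hello', 'world', 'it', 's', '2', 'am']
```

In use, this would look like `vector = model.infer_vector(tokenize(sentence))`.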

But the problem is that this takes too much time, and infer_vector() doesn't accept arguments such as workers. Is there a way to speed up the process, by streaming or some other way? I use a machine with 8 GB of RAM, and when I checked the number of available cores with

cores = multiprocessing.cpu_count()

it returns 8.

I need this to answer some multiple-choice questions. Also, are there other libraries or models similar to doc2vec that could help with this task?

Thanks in advance for your time.

1 answer

You might get a slight speedup by calling infer_vector() from multiple threads, each working on a different subset of the new data you need to infer vectors for. There would still be a lot of thread contention, preventing full use of all cores, due to the Python Global Interpreter Lock ("GIL").
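A minimal thread-pool sketch of that idea, using only the standard library. `infer_all_threaded` is a hypothetical helper: `infer_fn` would be `model.infer_vector` and `docs` a list of tokenized sentences. As noted above, the GIL limits how much this helps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def infer_all_threaded(
    infer_fn: Callable[[T], R], docs: Sequence[T], workers: int = 8
) -> list[R]:
    """Apply infer_fn (e.g. model.infer_vector) to every doc using a thread pool.

    Results come back in the same order as docs. Only work that releases
    the GIL can actually run in parallel across the threads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(infer_fn, docs))
```

Usage would be something like `vectors = infer_all_threaded(model.infer_vector, tokenized_sentences, workers=cores)`.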

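If you have enough RAM to do so without swapping, you could save the model to disk, then load it into 8 separate processes, and have each process infer vectors for 1/8 of the new data. That would be the best way to saturate all 8 CPU cores.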
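A minimal sketch of the multi-process approach, using only the standard library. `load_model` is any zero-argument callable returning the model; in real use it might be `functools.partial(Doc2Vec.load, "my_doc2vec.model")`, where the file name is hypothetical and the model is assumed to have been saved earlier with `model.save(...)`. The sketch also assumes the "fork" start method, which is available on Linux and macOS but not on Windows.

```python
import multiprocessing as mp
from typing import Any, Callable, Sequence

# Per-process model; set once in each worker by _init_worker.
_model: Any = None


def _init_worker(load_model: Callable[[], Any]) -> None:
    # Runs once in every worker process, so each one loads its own model copy.
    global _model
    _model = load_model()


def _infer_chunk(chunk: Sequence[list[str]]) -> list[Any]:
    # Infer one vector per tokenized document in this worker's slice.
    return [_model.infer_vector(doc) for doc in chunk]


def split_into_chunks(items: Sequence[Any], n: int) -> list[list[Any]]:
    """Split items into n contiguous slices whose sizes differ by at most 1."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def infer_all_multiprocess(
    load_model: Callable[[], Any], docs: Sequence[list[str]], processes: int = 8
) -> list[Any]:
    """Give each of `processes` workers 1/processes of docs; return vectors in input order."""
    chunks = split_into_chunks(docs, processes)
    ctx = mp.get_context("fork")  # assumption: a POSIX system where fork is available
    with ctx.Pool(processes, initializer=_init_worker, initargs=(load_model,)) as pool:
        per_chunk = pool.map(_infer_chunk, chunks, chunksize=1)
    # Flatten back into one list, preserving the original document order.
    return [vec for chunk in per_chunk for vec in chunk]
```

With processes=8 and 100,000 sentences, each worker handles 12,500 sentences. Each process holds its own copy of the model, which is why the RAM caveat matters on an 8 GB machine.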

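Any larger speedup would require optimizing the infer_vector() implementation in gensim, which is an open issue on the project that would welcome contributed improvements.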


Source: https://habr.com/ru/post/1655115/

