I trained paragraph vectors for about 2,300 paragraphs (2,000 to 12,000 words each), each with a vector size of 300. Now I need to infer paragraph vectors for about 100,000 sentences, which I treat as paragraphs (each sentence is around 10-30 words, each corresponding to the 2,300 paragraphs already trained).
So I use
model.infer_vector(sentence)
But the problem is that it takes too long, and it does not accept any arguments such as "workers". Is there a way to speed up the process by threading or some other way? I use a machine with 8 GB of RAM, and when I checked the available cores with
cores = multiprocessing.cpu_count()
it returns 8.
I need this for answering multiple-choice questions. Also, are there other libraries or models, similar to doc2vec, that can help with this task?
Thanks in advance for your time.
Tarun