Well, you need to rethink and re-engineer the way you store your data; in other words, implement the "orthodox" version of your inverted index.
Your bottleneck is the on-the-fly calculation of document frequency (DF) for the terms. A more reasonable approach is to maintain it incrementally: every time you update your corpus (the collection of documents), do some processing and update the DF of each term in the new document (and, of course, save the results persistently, e.g. in a database).
The only structure you need is a nested dictionary like this:
{ "term1": { "DF": x, "some_doc_id": tf, "some_other_doc_id": tf, ... }, "term2": ... }
kept correctly updated every time you "feed" your corpus.
And, of course, keep the size of your corpus (the total number of documents) stored somewhere as well.
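To make the idea concrete, here is a minimal in-memory sketch of such an index. The class name `InvertedIndex`, the whitespace tokenizer, and the `log(N / DF)` weighting are my own assumptions for illustration, not something taken from your code; in practice you would persist the dictionary and the document count in your database instead of keeping them in memory. It also assumes each document is added exactly once and that no document id is literally `"DF"`.

```python
import math
from collections import Counter


class InvertedIndex:
    """Hypothetical helper: term -> {"DF": n, doc_id: tf, ...}."""

    def __init__(self):
        self.index = {}        # the nested dictionary described above
        self.doc_ids = set()   # ids already indexed
        self.doc_count = 0     # total number of documents in the corpus

    def add_document(self, doc_id, text):
        """Index one new document, updating TF and DF incrementally."""
        if doc_id in self.doc_ids:
            raise ValueError(f"document {doc_id!r} is already indexed")
        tokens = text.lower().split()  # naive tokenizer (assumption)
        for term, tf in Counter(tokens).items():
            entry = self.index.setdefault(term, {"DF": 0})
            entry["DF"] += 1           # one more document contains this term
            entry[doc_id] = tf         # term frequency within this document
        self.doc_ids.add(doc_id)
        self.doc_count += 1

    def tf_idf(self, term, doc_id):
        """TF-IDF from stored values only; no scan over the corpus."""
        entry = self.index.get(term)
        if entry is None or doc_id not in entry:
            return 0.0
        idf = math.log(self.doc_count / entry["DF"])
        return entry[doc_id] * idf


idx = InvertedIndex()
idx.add_document("doc1", "the cat sat")
idx.add_document("doc2", "the dog sat down")
print(idx.index["cat"])
print(idx.tf_idf("cat", "doc1"))
```

The point is that DF is just a counter bumped at indexing time, so a query reads it in constant time instead of recomputing it over all documents.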
As a hobby and as part of my job, I am implementing a small search engine with Python and Redis. You might get some other ideas from it. Take a look here.