Inverted Index Search Algorithm

Suppose there are 10 million words that people have searched for on Google. For each word you have a sorted list of the identifiers of all documents that contain it. The lists look like this:

[Word 1]->[doc_i1,doc_j1,.....]
[Word 2]->[doc_i2,doc_j2,.....]
...
...
...
[Word N]->[doc_in,doc_jn,.....]

I am looking for an algorithm that finds 100 rare word pairs. A rare pair is a pair of words that occur together (not necessarily adjacent) in exactly one document.

I am looking for something better than O(n^2), if possible.

1 answer
  • , . , , , . , , , .
  • , . , , . . , , , , , .
  • , , (1.). - , , , . , , ​​ .

, , 100 , , . , , (1.), , , , . O (N * log (N1)), N1 - , , 100 . , , , .


Source: https://habr.com/ru/post/1525458/

