Inverted Search: Document Phrases

Question

I have a database full of phrases (80-100 characters each) and several long documents (50-100 KB each). For a given document, I need a ranked list of the phrases that match it, instead of the usual search-engine output, which is a list of documents for a given phrase.

I have used MySQL full-text indexing before, and I have looked at Lucene but never used it. Both seem oriented toward comparing something short (a search term) with something long (a document).

How would you do the opposite?

+3

mysql search indexing full-text-search lucene

Tourch Dec 31 '09 at 17:37

4 answers

, ?

, , ( |) . , . .

0

Claudiu Dec 31 '09 at 17:42

? , .

:

. . , , , , . 5 , 5 , . , , - (, "XX" ), .
, ( ) , , , .
.
, .
, . , .

0

Larry Watanabe Dec 31 '09 at 17:56

, . , , itsadok.

0

Yuval F 03 . '09 10:49

itsadok · Accepted Answer · 2009-12-31T18:43:39+0000

- ~ 50 . , , , .

, , .

. , . , , . .

, 1,2,.., n , . , .

, , , .

, whet, , :

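            // Fragment from a larger method, written against the Lucene 2.x-era
            // Token API. params, Utils and count are defined elsewhere in the
            // author's code and are not shown here.
            // Needs: java.io.StringReader,
            //        java.util.ArrayList, java.util.HashSet, java.util.LinkedList,
            //        org.apache.lucene.analysis.Token,
            //        org.apache.lucene.analysis.standard.StandardTokenizer

            // Hashes of lexicon phrases found in this document (each kept once).
            HashSet<Long> foundHashes = new HashSet<Long>();

            // Sliding window over the last maxPhrase tokens, padded with "" at the start.
            LinkedList<String> words = new LinkedList<String>();
            for (int i = 0; i < params.maxPhrase; i++) words.addLast("");

            StandardTokenizer st = new StandardTokenizer(new StringReader(docText));
            Token t = new Token();
            while (st.next(t) != null) {
                // Slide the window forward by one token.
                String token = new String(t.termBuffer(), 0, t.termLength());
                words.addLast(token);
                words.removeFirst();

                // Check every n-gram that ends at the current token.
                // Note: as written, len stops at maxPhrase - 1; use <= if
                // maxPhrase is meant to be inclusive.
                for (int len = params.minPhrase; len < params.maxPhrase; len++) {
                    String term = Utils.join(new ArrayList<String>(
                            words.subList(params.maxPhrase - len, params.maxPhrase)), " ");

                    long hash = Utils.longHash(term);

                    if (params.lexicon.isTermHash(hash)) {
                        foundHashes.add(hash);
                    }
                }
            }

            // Each phrase found in this document adds one to its running count.
            for (long hash : foundHashes) {
                if (count.containsKey(hash)) {
                    count.put(hash, count.get(hash) + 1);
                } else {
                    count.put(hash, 1);
                }
            }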
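
For readers without the helper classes used in the fragment above, here is a minimal self-contained sketch of the same sliding-window idea using only the JDK. It is not the answer author's implementation: the class name `PhraseMatcher`, the methods `countPhrases` and `rank`, the regex-based tokenizer, and ranking by raw occurrence count are all assumptions made for illustration. It also assumes the phrases are already lowercased and single-spaced, and it looks phrases up as strings rather than as hashes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhraseMatcher {

    /**
     * Counts how often each known phrase occurs in the text.
     * maxWords is the length, in words, of the longest phrase to look for.
     */
    public static Map<String, Integer> countPhrases(String text, Set<String> phrases, int maxWords) {
        Map<String, Integer> counts = new HashMap<>();
        Deque<String> window = new ArrayDeque<>();

        // Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Keep only the last maxWords tokens.
            window.addLast(token);
            if (window.size() > maxWords) {
                window.removeFirst();
            }

            // Build every n-gram that ends at the current token, shortest first,
            // and look each one up in the phrase set.
            List<String> words = new ArrayList<>(window);
            String ngram = "";
            for (int i = words.size() - 1; i >= 0; i--) {
                ngram = ngram.isEmpty() ? words.get(i) : words.get(i) + " " + ngram;
                if (phrases.contains(ngram)) {
                    counts.merge(ngram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Orders phrases by descending count, breaking ties alphabetically. */
    public static List<Map.Entry<String, Integer>> rank(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }
}
```

The work per document is proportional to the number of tokens times `maxWords`, regardless of how many phrases are in the set, which is what makes this direction practical when the phrase database is large. Unlike the fragment above, which records each phrase at most once per document, this sketch counts every occurrence.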
