Search for the longest dictionary at the end of a line

I am looking for the best way to find a string of alphabetic characters for a word with the longest dictionary at the end of the string.

Example: For the qbehugejackhammer line qbehugejackhammer result should be jackhammer instead of hammer .

One way to do this somewhat efficiently would be to save the words in reverse in an indexed table and repeat them one letter at a time until it no longer matches anything:

 SELECT word FROM dictionary WHERE word LIKE 'remmahkca%'; SELECT word FROM dictionary WHERE word LIKE 'remmahkcaj%'; # last match SELECT word FROM dictionary WHERE word LIKE 'remmahkcaje%'; 

It looks and looks like a hack and, most likely, not an optimal solution. Is there a faster and / or better way to do this? My selection tools are PHP and MySQL, but if some other language or DBMS is better than me, I’m all ears.

+4
source share
4 answers

Quick hacker answer: load the dictionary into map or regardless of the data structure equivalent to php (English dictionary - ~ 50 thousand words in total, easily fits into RAM v, and the map is much faster, faster than calling the database). Then iterate forward 1 character at a time, checking each substring on the map until you find a match.

Depending on how long your lines are, you can optimize by first checking the longest word in the dictionary (you can get this while loading the dictionary) and running the appropriate distance. I am sure that there are other similar optimizations that can use a too long start character, etc.

Edit: "card" must be "installed."

+3
source

This may seem a bit evil, but you will probably get better performance by loading the dictionary into an array in the form of a dictionary tree, but in the reverse order of words, for example:

 array( 'r' => array( 'u' => array(), // -- words ending in 'ur' would end up in here 'a' => array(), // -- words ending in 'ar' would end up here 'e' => array( // -- words ending in 'er' would end up in here 'm' => array( 'm' => array( // -- jackhammer will be kept further up here 

Then, looking up.

 $reverseWord = ""; // -- Incoming 'word' string goes here, in reverse. $dictionary = [structure above]; $dictionaryPosition = $dictionary; $dictionaryHistory = ""; for( $i = 0, $l = strlen($reverseWord); $i < $l; $i++ ) { $char = $reverseWord[$i]; // -- If this character doesn't exist in this dictionary position, we've reached the end if( !isset($dictionaryPosition[$char]) ) break; // -- log this character $dictionaryHistory = $char . $dictionaryHistory; // -- Climb up the tree $dictionaryPosition = $dictionaryPosition[$char]; } // -- $dictionaryHistory now contains the word you're looking for. 

Each array should contain no more than 26 entries (taking into account only alphabetic characters), so you see, at best, 26 * n queries of one character each. Even with a word depth of 20 characters, it is infinitely better than repeating several words with a length of 50 thousand words.

+4
source

You can start by searching for a word that matches the entire line and save deleting the letter at the beginning of the line until you find a match:

 SELECT word FROM dictionary WHERE word = 'qbehugejackhammer'; --no match SELECT word FROM dictionary WHERE word = 'behugejackhammer'; --no match SELECT word FROM dictionary WHERE word = 'ehugejackhammer'; --no match SELECT word FROM dictionary WHERE word = 'hugejackhammer'; --no match --... SELECT word FROM dictionary WHERE word = 'jackhammer'; --found it! 
+4
source

Load the dictionary into a PHP array. For each input word, use in_array ( reference ) on successively smaller substrings, as described below, until you find a match.

For example, consider your entry qbehugejackhammer . First find an array for qbehugejackhammer , then for behugejackhammer , then for ehugejackhammer and so on until you find a match. You can stop as soon as you find the first match.

+2
source

Source: https://habr.com/ru/post/1344957/


All Articles