The following solutions need a table containing serial numbers from 1 to (at least) the length of your word_column . Assuming word_column VARCHAR(190) you need a table with numbers from 1 to 190. If you use MariaDB with a sequence plugin, you can use the table seq_1_to_190 . If you do not have it, there are many ways to create it. One easy way is to use the information_schema.columns table:
create table if not exists seq_1_to_190 (seq tinyint unsigned auto_increment primary key) select null as seq from information_schema.columns limit 190;
You can also create it on the fly in a subquery, but that will complicate your queries.
I will use the @word session @word to store the search string.
set @word = 'StackExch_bla_bla_bla';
But you can replace all its occurrences with a constant search string.
Now we can use the sequence table to create all the substrings using
select seq as l, left(@word, seq) as substr from seq_1_to_190 s where s.seq <= char_length(@word)
http://rextester.com/BWU18001
and use it for the LIKE condition when you join it to the words table:
select w.word_column from ( select seq as l, left(@word, seq) as substr from seq_1_to_190 s where s.seq <= char_length(@word) ) s join words w on w.word_column like concat(replace(s.substr, '_', '\_'), '%') order by sl desc limit 1
http://rextester.com/STQP82942
Note that _ is a placeholder, and you need to escape it in the search bar with \_ . You should also do this for % if your line can contain it, but I will skip this part in my answer.
A request can also be written without a subquery:
select w.word_column from seq_1_to_190 s join words w on w.word_column like concat(replace(left(@word, seq), '_', '\_'), '%') where s.seq <= char_length(@word) order by s.seq desc limit 1
http://rextester.com/QVZI59071
These queries do the job, and in theory they should also be fast. But MySQL (in my case, MariaDB 10.0.19) creates a poor execution plan and does not use the index for the ORDER BY . Both queries are executed after approximately 1.8 seconds with a data set of 100 thousand lines.
The best I could do to improve performance with a single query is
select ( select word_column from words w where w.word_column like concat(replace(left(@word, s.seq), '_', '\_'), '%') limit 1 ) as word_column from seq_1_to_190 s where s.seq <= char_length(@word) having word_column is not null order by s.seq desc limit 1
http://rextester.com/APZHA8471
It's faster, but it still takes 670 ms. Note that the CORD request for Gordons is completed in 125 ms, although this requires a full table / index scan and file server.
However, I managed to get the engine to use the index for the ORDER BY with an indexed temporary table:
drop temporary table if exists tmp; create temporary table tmp( id tinyint unsigned auto_increment primary key, pattern varchar(190) ) engine=memory select null as id, left(@word, seq) as pattern from seq_1_to_190 s where s.seq <= char_length(@word) order by s.seq desc; select w.word_column from tmp force index for order by (primary) join words w on w.word_column >= tmp.pattern and w.word_column < concat(tmp.pattern, char(127)) order by tmp.id asc limit 1
http://rextester.com/OOE82089
This query is "instantaneous" (less than 1 ms) in my test pattern of rows of 100 thousand rows. If I delete FORCE INDEX or use the LIKE clause, it will be slow again.
Note that char(127) seems to work for ASCII strings. You may need to find another character according to your character set.
After all this, I have to say that my first thought was to use the UNION ALL query that Gordon Linoff also suggested. However, here is just the SQL solution:
set @subquery = '( select word_column from words where word_column like {pattern} limit 1 )'; set session group_concat_max_len = 1000000; set @sql = ( select group_concat( replace( @subquery, '{pattern}', replace(quote(concat(left(@word, seq), '%')), '_', '\_') ) order by s.seq desc separator ' union all ' ) from seq_1_to_190 s where s.seq <= char_length(@word) ); set @sql = concat(@sql, ' limit 1'); prepare stmt from @sql; execute stmt;
http://rextester.com/OPTJ37873
He is also "instant."
If you like stored procedures / functions, here is the function:
create function get_with_similar_begin(search_str text) returns text begin declare l integer; declare res text; declare pattern text; set l = char_length(search_str); while l > 0 and res is null do set pattern = left(search_str, l); set pattern = replace(pattern, '_', '\_'); set pattern = replace(pattern, '%', '\%'); set pattern = concat(pattern, '%'); set res = (select word_column from words where word_column like pattern); set l = l - 1; end while; return res; end
Use it like
select get_with_similar_begin('StackExch_bla_bla_bla'); select get_with_similar_begin('StackO_bla_bla_bla');
http://rextester.com/CJTU4629
This is most likely the fastest way. Although for long lines, some sort of split and conquer algorinthm can reduce the average number of searches. But it can also be just redundant.
If you want to test your queries in a large table, I used the following code to create my test table (for MariaDB with a sequence plugin):
drop table if exists words; create table words( id mediumint auto_increment primary key, word_column varchar(190), index(word_column) ); insert into words(word_column) select concat('Stack', rand(1)) as word_column from seq_1_to_100000; insert into words(word_column)values('StackOferflow'),('StackExchange'),('MetaStackExchange');