SQL: select rows that have words in common

Suppose I have a table of strings, for example:

    VAL
    -----------------
    Content of values
    Values identity
    Triple combo
    my combo
    sub-zero combo

I want to find rows that share at least one word. The result set should look like

    VAL                MATCHING_VAL
    ------------------ ------------------
    Content of values  Values identity
    Triple combo       My combo
    Triple combo       sub-zero combo

or at least something like that. Can you help?

3 answers

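One way is a regular-expression hack: replace the spaces in one value with | so that its words become an alternation pattern, then match that pattern against the other value: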

    select t1.val, t2.val
    from t t1
    join t t2
      on regexp_like(t1.val, replace(t2.val, ' ', '|'));

You might also want the comparison to ignore case:

      on regexp_like(lower(t1.val), replace(lower(t2.val), ' ', '|'));
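
As written, the alternation pattern above also pairs every row with itself and matches words inside longer words (for example, "of" inside "profile"). The following is a minimal sketch of one way to tighten it, assuming Oracle and the sample data from the question. The inline table t and the column alias matching_val are only there to make the example self-contained. Both sides are padded with spaces so that only whole words match, and t1.val < t2.val keeps each pair once.

```sql
-- Sketch only: sample rows are inlined so the query runs on its own.
with t as (
  select 'Content of values' val from dual union all
  select 'Values identity'   val from dual union all
  select 'Triple combo'      val from dual union all
  select 'my combo'          val from dual union all
  select 'sub-zero combo'    val from dual
)
select t1.val, t2.val as matching_val
from t t1
join t t2
  on t1.val < t2.val   -- skip self-matches and mirrored pairs
 and regexp_like(
       ' ' || lower(t1.val) || ' ',                          -- pad so every word has a space on both sides
       ' (' || replace(lower(t2.val), ' ', '|') || ') '      -- e.g. ' (my|combo) '
     );
```

Values that contain regular-expression metacharacters (such as . or parentheses) are still interpreted as pattern syntax, so this remains a hack rather than a general solution.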

You can use a combination of SUBSTRING and LIKE.

Use CHARINDEX(' ', ...) to find the spaces and break the string into words with SUBSTRING, if that is what you want to do.
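
In Oracle, the counterparts of CHARINDEX and SUBSTRING are INSTR and SUBSTR. The sketch below is one possible way to apply the idea, not necessarily what this answer had in mind: it splits each value into words with a recursive WITH clause and then joins on equal words instead of using LIKE. It assumes Oracle 11gR2 or later and single spaces between words. The names t, split, word, rest and matching_val are made up for the example.

```sql
-- Sketch only: sample rows are inlined so the query runs on its own.
with t (val) as (
  select 'Content of values' from dual union all
  select 'Values identity'   from dual union all
  select 'Triple combo'      from dual union all
  select 'my combo'          from dual union all
  select 'sub-zero combo'    from dual
),
split (val, word, rest) as (
  -- anchor member: the first word, and everything after the first space
  select val,
         substr(val || ' ', 1, instr(val || ' ', ' ') - 1),
         substr(val || ' ', instr(val || ' ', ' ') + 1)
  from t
  union all
  -- recursive member: peel the next word off the remainder
  select val,
         substr(rest, 1, instr(rest, ' ') - 1),
         substr(rest, instr(rest, ' ') + 1)
  from split
  where rest is not null   -- an empty string is NULL in Oracle, so this ends the recursion
)
select distinct a.val, b.val as matching_val
from split a
join split b
  on lower(a.word) = lower(b.word)   -- same word, ignoring case
 and a.val < b.val                   -- each pair once, no self-matches
order by a.val, b.val;
```

Because the join compares whole words, it avoids the substring matches that a LIKE '%word%' comparison would produce.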


You could also use some of the built-in Oracle similarity functions found in UTL_MATCH ( https://docs.oracle.com/database/121/ARPLS/u_match.htm#ARPLS71219 ) for matching ...

This logic is better suited to matching names or descriptions that are "similar", where phonetic spellings or typos would otherwise keep the entries from matching.
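
To get a feel for the scales involved, here is a small sketch that calls the four UTL_MATCH functions on one pair of sample values. The column aliases are arbitrary. EDIT_DISTANCE returns the number of single-character edits, JARO_WINKLER returns a score between 0 and 1, and the two _SIMILARITY variants return an integer between 0 and 100.

```sql
-- Sketch only: compare one pair of strings with each UTL_MATCH function.
select utl_match.edit_distance('my combo', 'sub-zero combo')            as edit_dist,  -- raw edit count
       utl_match.edit_distance_similarity('my combo', 'sub-zero combo') as edit_sim,   -- 0 to 100
       utl_match.jaro_winkler('my combo', 'sub-zero combo')             as jw,         -- 0 to 1
       utl_match.jaro_winkler_similarity('my combo', 'sub-zero combo')  as jw_sim      -- 0 to 100
from dual;
```

A threshold such as .5 only makes sense against JARO_WINKLER. With the _SIMILARITY functions the equivalent cut-off would be 50.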

By adjusting the .5 threshold below, you can see how a higher value brings you closer and closer to perfect matches.

    with cte as (
      select 'Content of values' val from dual union all
      select 'Values identity' val from dual union all
      select 'triple combo' from dual union all
      select 'my combo' from dual union all
      select 'sub-zero combo' from dual
    )
    select a.*, b.*,
           utl_match.edit_distance_similarity(a.val, b.val) c,
           UTL_MATCH.JARO_WINKLER(a.val, b.val) JW
    from cte a
    cross join cte b
    where UTL_MATCH.JARO_WINKLER(a.val, b.val) > .5
    order by utl_match.edit_distance_similarity(a.val, b.val) desc

and a screenshot of the query / output.

Or we could use an inner join with > if we want only one-way comparisons, so that each pair appears once ...

    select a.*, b.*,
           utl_match.edit_distance_similarity(a.val, b.val) c,
           UTL_MATCH.JARO_WINKLER(a.val, b.val) JW
    from cte a
    inner join cte b
      on A.Val > B.Val
    where utl_match.jaro_winkler(a.val, b.val) > .5
    order by utl_match.edit_distance_similarity(a.val, b.val) desc

This returns the 3 required records.

But this does not explicitly check whether any individual word matches, which was your basic requirement. I just wanted you to know about the alternatives.


Source: https://habr.com/ru/post/1238316/

