MySQL query to exclude rows with similar names

This query has bothered me for the past 10 hours. Here we go:

I want to compare some rows that I pull from a table. I select the names, and I want to drop names that are similar to one already returned, so that they do not appear in the query result.

Example:

I have the following names:

  • Seaside heights
  • Seaside HGTS
  • Talladega
  • Torncal Center
  • Tornkal ctr
  • Yonkers
  • Zebraville

I want it to return like this:

  • Seaside heights
  • Talladega
  • Torncal Center
  • Yonkers
  • Zebraville

Basically, I think it should take a substring, something like SUBSTRING(name, 1, 8) (positions in MySQL start at 1), to get the first 8 characters, then compare those 8 characters against the next entry and, if they match, skip that entry.

Perhaps I am overthinking this. Any insights or concepts that might work would be appreciated.
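The prefix idea can be sketched outside the database as follows. This is only an illustration, assuming the names arrive sorted as in the example; the function name `dedupe_by_prefix` and the sample list are made up for the sketch.

```python
from typing import List

# Sample data copied from the example in the question.
NAMES = [
    "Seaside heights", "Seaside HGTS", "Talladega",
    "Torncal Center", "Tornkal ctr", "Yonkers", "Zebraville",
]


def dedupe_by_prefix(names: List[str], length: int = 8) -> List[str]:
    """Keep a name only if its first `length` characters (compared
    case-insensitively) differ from those of the last kept name.
    Assumes `names` is already sorted so similar names are adjacent."""
    kept: List[str] = []
    last_prefix = None
    for name in names:
        prefix = name[:length].lower()
        if prefix != last_prefix:
            kept.append(name)
            last_prefix = prefix
    return kept
```

Note that in the sample data "Torncal" and "Tornkal" already differ at the fifth character, so an 8-character prefix merges only the two Seaside rows; a shorter prefix such as 4 merges both pairs, at the cost of more false matches on real data.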

4 answers

First, fetch all the data.

Then, for each returned record, run the LCS (longest common subsequence) algorithm against the other records.

If the length of the longest common subsequence between two different records reaches a threshold of your choice, you can classify them as similar.

http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

Edit: it turns out there is a handy PHP function for this: http://php.net/manual/en/function.similar-text.php
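A minimal sketch of the LCS approach, under a few assumptions that are not in the answer itself: names are compared case-insensitively, the score is 2 * LCS / (len(a) + len(b)), and 0.75 is an arbitrary threshold chosen for this sample. The function names are hypothetical.

```python
from typing import List

# Sample data copied from the example in the question.
NAMES = [
    "Seaside heights", "Seaside HGTS", "Talladega",
    "Torncal Center", "Tornkal ctr", "Yonkers", "Zebraville",
]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]: twice the LCS length over the combined length."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def dedupe_by_lcs(names: List[str], threshold: float = 0.75) -> List[str]:
    """Keep a name only if it is not similar to any name kept so far."""
    kept: List[str] = []
    for name in names:
        if all(similarity(name, other) < threshold for other in kept):
            kept.append(name)
    return kept
```

On the sample list this keeps the five names the question asks for. Every name is compared with every kept name, so the cost grows quadratically with the number of rows, and the threshold would need tuning on real data.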


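If the differences between the rows are like those in your example, try grouping on the first word. MIN() picks one name per group, which also keeps the query valid when ONLY_FULL_GROUP_BY is enabled:

SELECT MIN(names) AS names FROM tablename GROUP BY SUBSTRING_INDEX(names, ' ', 1);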

Perhaps you should take a look at SOUNDEX. It will not be perfect, but it can get you into the ballpark.
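To show what a Soundex comparison does with the sample names, here is a sketch of the classic four-character American Soundex. It is an assumption that this is close enough for illustration: MySQL's own SOUNDEX() function may return codes of a different length or differ in details. The helper names are hypothetical.

```python
from typing import Dict, List

# Sample data copied from the example in the question.
NAMES = [
    "Seaside heights", "Seaside HGTS", "Talladega",
    "Torncal Center", "Tornkal ctr", "Yonkers", "Zebraville",
]

# Letter groups of classic American Soundex.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES: Dict[str, str] = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(text: str) -> str:
    """Classic 4-character Soundex over the letters A-Z in `text`
    (spaces and other characters are ignored)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
            if len(result) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return result.ljust(4, "0")


def dedupe_by_soundex(names: List[str]) -> List[str]:
    """Keep the first name seen for each Soundex code."""
    seen = set()
    kept: List[str] = []
    for name in names:
        code = soundex(name)
        if code not in seen:
            seen.add(code)
            kept.append(name)
    return kept
```

Because only the first four code characters are kept, long names that start alike will collide, which is the imperfection the answer warns about.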


If the differences between the rows are limited to a small set of abbreviations (HGTS ↔ Heights, CTR ↔ Center, etc.), you can simply keep a lookup table of those abbreviations, replace each abbreviation with its full version, and then check for uniqueness.
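The lookup-table idea might look like this. It is a sketch only: the `ABBREVIATIONS` mapping holds just the two pairs mentioned above, and the function names are invented for illustration.

```python
from typing import Dict, List

# Sample data copied from the example in the question.
NAMES = [
    "Seaside heights", "Seaside HGTS", "Talladega",
    "Torncal Center", "Tornkal ctr", "Yonkers", "Zebraville",
]

# Hypothetical lookup table: abbreviation -> full word (both lowercase).
ABBREVIATIONS: Dict[str, str] = {"hgts": "heights", "ctr": "center"}


def normalize(name: str) -> str:
    """Lowercase the name and expand every known abbreviation word by word."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in name.lower().split())


def dedupe_by_normalized(names: List[str]) -> List[str]:
    """Keep the first name seen for each normalized form."""
    seen = set()
    kept: List[str] = []
    for name in names:
        key = normalize(name)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept
```

On the sample list this removes "Seaside HGTS" but keeps "Tornkal ctr", because "Tornkal" and "Torncal" differ in spelling and not only in an abbreviation; that pair would need one of the fuzzy approaches as well.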


Source: https://habr.com/ru/post/911816/

