MySQL, select entries with at least X characters

I am trying to do the following. Let's say we have a table containing these fields (ID, content)

1 | apple

2 | pineapple

3 | application

4 | nation

Now, I am looking for a function that will tell me all possible common matches. For example, if the argument is "3", the function will return all possible three-character strings that appear in more than one record.

In this case, I get: "app", "ppl", "ple", "ati", "tio", "ion"

If the argument is "4", I get: "appl", "pple", "atio", "tion"

If the argument is "5", I get: "apple", "ation"

If the argument is "6", nothing is returned.

So far, I have not found a function that performs this.

thanks!

Additional info: I use this in a PHP script with a MySQL database. I just want to pass the number of characters as an argument and, of course, the table to search.

3 answers

Well, this is rather ugly, but it works fine. It is generic SQL and will work in any environment. Simply generate one select per substring start position, using more positions than the maximum length of the field you are reading. Change the number 50 in the function to a number that exceeds your field length. It may produce a very long query but, as I said, it will work fine. Here is an example in Python:

import sqlite3

c = sqlite3.connect('test.db')

c.execute('create table myTable (id integer, content varchar(50))')
for id, content in ((1,'apple'),(2,'pineapple'),(3,'application'),(4,'nation')):
    c.execute('insert into myTable values (?,?)', [id,content])

c.commit()

def GenerateSQL(substrSize):
    # substr() positions are 1-based in SQL, so start at 1: starting at 0 can count the leading substring twice.
    # 50 must exceed the length of the longest content value.
    subqueries = ["select substr(content,%i,%i) AS substr, count(*) AS myCount from myTable where length(substr(content,%i,%i))=%i group by substr(content,%i,%i) " % (i,substrSize,i,substrSize,substrSize,i,substrSize)  for i in range(1, 50)]
    # MySQL requires an alias on the derived table, hence "AS t".
    sql = 'select substr FROM \n\t(' + '\n\tunion all '.join(subqueries) + ') AS t \nGROUP BY substr HAVING sum(myCount) > 1'
    return sql

print(GenerateSQL(3))

print(c.execute(GenerateSQL(3)).fetchall())

The generated query looks like this:

select substr FROM 
    (select substr(content,1,3) AS substr, count(*) AS myCount from myTable where length(substr(content,1,3))=3 group by substr(content,1,3) 
    union all select substr(content,2,3) AS substr, count(*) AS myCount from myTable where length(substr(content,2,3))=3 group by substr(content,2,3) 
    union all select substr(content,3,3) AS substr, count(*) AS myCount from myTable where length(substr(content,3,3))=3 group by substr(content,3,3) 
    union all select substr(content,4,3) AS substr, count(*) AS myCount from myTable where length(substr(content,4,3))=3 group by substr(content,4,3) 
    union all select substr(content,5,3) AS substr, count(*) AS myCount from myTable where length(substr(content,5,3))=3 group by substr(content,5,3) 
    ... ) AS t 
GROUP BY substr HAVING sum(myCount) > 1

And the results it produces:

[('app',), ('ati',), ('ion',), ('ple',), ('ppl',), ('tio',)]
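
Rather than hard-coding 50, the upper bound could be read from the table first. The following is only a minimal sketch, assuming the same myTable layout as above and an in-memory SQLite database; the helper names max_content_length and generate_sql are made up for illustration.

```python
import sqlite3


def max_content_length(conn):
    """Return the length of the longest content value, or 0 for an empty table."""
    row = conn.execute("select max(length(content)) from myTable").fetchone()
    return row[0] or 0


def generate_sql(substr_size, max_len):
    """Build one subquery per 1-based start position that can hold a full substring."""
    last_start = max_len - substr_size + 1
    if last_start < 1:
        return None  # no value is long enough for a substring of this size
    subqueries = [
        "select substr(content,%d,%d) AS part, count(*) AS myCount from myTable "
        "where length(content) >= %d group by part" % (i, substr_size, i + substr_size - 1)
        for i in range(1, last_start + 1)
    ]
    return ("select part from (" + " union all ".join(subqueries) + ") AS t "
            "group by part having sum(myCount) > 1 order by part")


conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (id integer, content varchar(50))")
conn.executemany("insert into myTable values (?,?)",
                 [(1, "apple"), (2, "pineapple"), (3, "application"), (4, "nation")])

sql = generate_sql(4, max_content_length(conn))
result = [row[0] for row in conn.execute(sql)]
print(result)
```

For the sample data and a length of 4 this prints ['appl', 'atio', 'pple', 'tion'], and the query contains only as many subqueries as the longest value needs.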

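This is not PHP, but here is an approach in C# 3.5.

Pseudocode: count every substring of the given length across all strings, then return those with count > 1: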

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {

        string[] data = { "apple", "pineapple", "application", "nation" };
        string[] result = my_func(3,data);

        foreach (string str in result)
        {
            Console.WriteLine(str);
        }
        Console.ReadKey();
    }

    // Returns every substring of length l that occurs more than once, most frequent first.
    private static string[] my_func(int l, string[] data)
    {
        Dictionary<string,int> dict = new Dictionary<string,int>();
        foreach (string str in data)
        {
            // Slide a window of length l over the string; shorter strings are skipped.
            for (int i = 0; i < str.Length - l + 1; i++)
            {
                string part = str.Substring(i, l);
                if (dict.ContainsKey(part))
                {
                    dict[part]++;
                }
                else
                {
                    dict.Add(part,1);
                }
            }
        }
        var result = from k in dict.Keys
                where dict[k] > 1
                orderby dict[k] descending
                select k;

        return result.ToArray();
    }
}
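
The same counting idea, sketched in Python for comparison. One assumption differs from the C# above: each string contributes a given substring at most once (via a set), so the count reflects how many records contain it, which is what the question asks for. The function name common_substrings is made up for illustration.

```python
from collections import Counter


def common_substrings(length, data):
    """Return substrings of the given length found in more than one string."""
    counts = Counter()
    for text in data:
        # A set, so a substring repeated inside one string is counted once.
        parts = {text[i:i + length] for i in range(len(text) - length + 1)}
        counts.update(parts)
    # Most widely shared first, ties broken alphabetically.
    shared = [(part, n) for part, n in counts.items() if n > 1]
    return [part for part, n in sorted(shared, key=lambda item: (-item[1], item[0]))]


data = ["apple", "pineapple", "application", "nation"]
print(common_substrings(5, data))
```

With the sample data, a length of 5 gives ['apple', 'ation'] and a length of 6 gives an empty list, matching the expectations in the question.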

One obvious option is to use REGEXP. I have no experience with it, but it may help you: http://dev.mysql.com/doc/refman/5.1/en/regexp.html

You need to find the right expression to match what you need.
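
A regular expression can confirm a candidate substring but does not enumerate the candidates by itself. As a rough illustration only, here is a Python sketch that counts how many rows contain a given substring; the rows list stands in for the table, and records_containing is a made-up helper. In MySQL the comparable test would be a REGEXP or LIKE condition per candidate.

```python
import re


def records_containing(candidate, rows):
    """Count the rows whose content contains the candidate substring."""
    # re.escape keeps characters such as '.' or '*' literal.
    pattern = re.compile(re.escape(candidate))
    return sum(1 for _id, content in rows if pattern.search(content))


rows = [(1, "apple"), (2, "pineapple"), (3, "application"), (4, "nation")]
print(records_containing("appl", rows))
print(records_containing("nat", rows))
```

A candidate qualifies as a common match when the returned count is greater than 1; the candidates themselves still have to come from somewhere else, such as the substring generation shown in the other answers.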


Source: https://habr.com/ru/post/1712985/

