SQL Server: Best Method for Matching Phrases and Ordering Results

What is the best way to rank a SQL varchar column by the number (count) of words it shares with a parameter, using four distinct criteria? This is probably not a trivial question, but I find it hard to order strings by “best fit” using my criteria.

Column: description varchar(100). Parameter: @MyParameter varchar(100).

Result order preference:

  • Exact match (whole string match) - always first
  • Starts with (depends on the length of the matching parameter)
  • Ranked by the count of matching words, with adjacent matching words ranking above non-adjacent ones for the same number of matching words
  • Words match anywhere (not adjacent to each other)
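The four tiers can be prototyped outside SQL before committing to a query. Below is a minimal Python sketch (not T-SQL), assuming whitespace-separated words compared case-insensitively with surrounding punctuation removed. `words` and `rank_key` are hypothetical names, and "starts with" is interpreted here as "shares one or more leading words with the phrase".

```python
import string


def words(text: str) -> list:
    """Lower-case words with surrounding punctuation removed (assumed normalization)."""
    cleaned = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in cleaned if w]


def rank_key(description: str, phrase: str) -> tuple:
    """Sort key for the four tiers; smaller tuples sort first."""
    d, p = words(description), words(phrase)
    if d == p:
        return (0, 0, 0)                      # tier 0: exact whole-string match
    lead = 0
    for a, b in zip(d, p):                    # leading words shared with the phrase
        if a != b:
            break
        lead += 1
    phrase_set = set(p)
    hits = [w in phrase_set for w in d]
    count, run, best_run = sum(hits), 0, 0
    for hit in hits:                          # longest run of adjacent matching words
        run = run + 1 if hit else 0
        best_run = max(best_run, run)
    if lead > 0:
        return (1, -lead, -count)             # tier 1: starts with the phrase
    if best_run >= 2:
        return (2, -count, -best_run)         # tier 2: adjacent matching words
    if count > 0:
        return (3, -count, 0)                 # tier 3: words match anywhere
    return (4, 0, 0)                          # no match at all


rows = ["lavage then aspiration", "gastric lavage", "unrelated text",
        "tube aspiration lavage", "gastric intubation",
        "Gastric Intubation Aspiration Lavage"]
ranked = sorted(rows, key=lambda r: rank_key(r, "gastric intubation aspiration lavage"))
```

Using a tuple key keeps the tier boundaries strict: a row can only move up or down within its own tier.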

Words may NOT match exactly, since partial matches are acceptable and likely. A lesser value should be applied to partial-word matches for ranking, but this is not critical (for example, 'pot' would match each of pot, potter, and depot). Rows that start with a match and also have further word matches should rank above those with no subsequent matches, but this is not a deal breaker.
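A sketch of giving partial words a lesser value, in Python for illustration. The 0.5 weight (`PARTIAL_WEIGHT`) and the "search word occurs inside the candidate word" test are assumptions, not part of the question; `word_score` and `phrase_score` are hypothetical names.

```python
PARTIAL_WEIGHT = 0.5  # assumed weight for a partial-word match


def word_score(search_word: str, candidate_word: str) -> float:
    """1.0 for a whole-word match, PARTIAL_WEIGHT if the search word occurs inside
    the candidate word (pot -> potter, depot), otherwise 0."""
    s, c = search_word.lower(), candidate_word.lower()
    if s == c:
        return 1.0
    if s in c:
        return PARTIAL_WEIGHT
    return 0.0


def phrase_score(search: str, candidate: str) -> float:
    """Sum, over the search words, of each word's best score against the candidate."""
    cand_words = candidate.lower().split()
    return sum(max((word_score(s, c) for c in cand_words), default=0.0)
               for s in search.lower().split())
```

Each search word contributes at most once, so a candidate cannot outrank an exact match just by repeating a partial hit.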

I would like a ranking method where the column “begins with” the value in the parameter. Let's say I have the following string:

'This is my value string as a test template to rank on.' 

First, I would like the rows ranked by the largest number of matching words.

Second, rank by how much matches at the start (best match), like:

    'This is my string as a test template to rank on.' - first
    'This is my string as a test template to rank on even though not exact.' - second
    'This is my string as a test template to rank' - third
    'This is my string as a test template to' - next
    'This is my string as a test template' - next
    etc.
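One way to get exactly this order for the sample string above is to sort by shared leading words, then by matching word count, then by the number of extra (non-matching) words. This Python sketch assumes punctuation is stripped, so `on.` and `on` compare equal; `norm_words` and `starts_with_key` are hypothetical names.

```python
import string


def norm_words(text: str) -> list:
    """Lower-case words with punctuation stripped, so 'on.' and 'on' compare equal."""
    cleaned = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in cleaned if w]


def starts_with_key(row: str, search: str) -> tuple:
    """Exact match first, then more shared leading words, more matches, fewer extras."""
    r, s = norm_words(row), norm_words(search)
    lead = 0
    for a, b in zip(r, s):           # number of leading words shared with the search
        if a != b:
            break
        lead += 1
    search_set = set(s)
    count = sum(w in search_set for w in r)   # row words that occur in the search
    extra = len(r) - count                     # row words that do not occur in it
    return (r != s, -lead, -count, extra)
```

With the search string 'This is my value string as a test template to rank on.', every sample row shares three leading words, so the match count and then the extra-word count decide the order.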

Next (perhaps as a second data set/group after the first “starts with” group; this is desirable):

I want to rank (sort) strings by the number of words from @MyParameter that occur in the column, where adjacent matching words rank higher than the same count of non-adjacent words.

Thus, for the sample string above, 'is my string as shown' would rank higher than 'is not my other string as', because of the “better match” of adjacent words, even though both contain the same number of matching words. Rows with more matching words should rank above rows with fewer.
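The adjacency rule can be expressed as a secondary sort key: the longest run of consecutive row words that all occur in the search string. A Python sketch, assuming case-insensitive words with punctuation stripped; `adjacency_key` is a hypothetical name and "adjacent" is taken to mean adjacent in the row, not in the search string.

```python
import string


def norm_words(text: str) -> list:
    """Lower-case words with punctuation stripped."""
    cleaned = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in cleaned if w]


def adjacency_key(row: str, search: str) -> tuple:
    """More matching words first; for equal counts, the longer adjacent run wins."""
    search_set = set(norm_words(search))
    hits = [w in search_set for w in norm_words(row)]
    run = best = 0
    for hit in hits:                 # length of the longest run of consecutive hits
        run = run + 1 if hit else 0
        best = max(best, run)
    return (-sum(hits), -best)
```

'is my string as shown' has four matches in one run of four, while 'is not my other string as' has four matches whose longest run is two, so the first sorts ahead.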

If possible, I would like to do this in a single query.

No string should appear twice in the result.

As for performance, the table will have no more than 10,000 rows.

The values in the table are fairly static, with a few changes, but not completely static.

I can't change the structure at this time, but I would consider doing so later (for example, a word/phrase table).

To make this a little more complicated, the list of words is in two tables (I could create a view for this), but results from one table (the smaller list) should come before the second, larger data set for the same match. There will be duplicates across these tables, and also within each table, and I need only distinct values. Selecting DISTINCT is not easy, because I want to return one column (sourceTable) that may well make otherwise identical rows distinct; in that case, select only the row from the first (smaller) table. DISTINCT over all the other columns is desirable (disregard the fact that this column takes part in the “distinct” evaluation).
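The "keep the row from the smaller table" rule amounts to de-duplicating on every column except sourceTable, with the first table taking priority. A Python sketch of that rule; `dedupe_prefer_first` is a hypothetical name and the sample rows are invented for illustration.

```python
def dedupe_prefer_first(*tables):
    """Tables are passed in priority order (smaller favorites table first). Rows are
    compared on every column except sourceTable, and the first occurrence wins."""
    seen = set()
    result = []
    for table in tables:
        for row in table:
            key = tuple(sorted((k, v) for k, v in row.items() if k != "sourceTable"))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


# Hypothetical sample data: the same code/description exists in both tables.
favorites = [{"procedureCode": "A1", "description": "Gastric lavage", "sourceTable": "0CPTMore"}]
master = [
    {"procedureCode": "A1", "description": "Gastric lavage", "sourceTable": "2MasterCPT"},
    {"procedureCode": "A1", "description": "Gastric lavage", "sourceTable": "2MasterCPT"},
    {"procedureCode": "B2", "description": "Gastric intubation", "sourceTable": "2MasterCPT"},
]
rows = dedupe_prefer_first(favorites, master)
```

In T-SQL, one common counterpart is `ROW_NUMBER() OVER (PARTITION BY` the compared columns `ORDER BY sourceTable)`, keeping only rows numbered 1; the '0CPTMore' and '2MasterCPT' labels used in the procedure below already sort in the preferred order.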

Pseudo columns in the table:

    procedureCode VARCHAR(50),
    description   VARCHAR(100),  -- this is the sort/evaluation column
    category      VARCHAR(50),
    rvu           VARCHAR(50),
    charge        VARCHAR(15),
    active        bit,
    sourceTable   VARCHAR(50)    -- just shows which of the two tables it comes from

There is no unique index or identity column.

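Rows whose procedureCode appears in a third table should be excluded:

    SELECT * FROM tableone
    WHERE procedureCode NOT IN (SELECT procedureCode FROM tablethree WHERE procedureCode IS NOT NULL)
    UNION ALL
    SELECT * FROM tabletwo
    WHERE procedureCode NOT IN (SELECT procedureCode FROM tablethree WHERE procedureCode IS NOT NULL)
    -- the IS NOT NULL guard matters: NOT IN returns no rows if the subquery yields a NULL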

EDIT: In an attempt to solve this problem, I created a table-valued parameter like this:

    0   Gastric Intubation & Aspiration/Lavage, Treatmen
    1   Gastric%Intubation%Aspiration%Lavage%Treatmen
    2   Gastric%Intubation%Aspiration%Lavage
    3   Gastric%Intubation%Aspiration
    4   Gastric%Intubation
    5   Gastric
    6   Intubation%Aspiration%Lavage%Treatmen
    7   Intubation%Aspiration%Lavage
    8   Intubation%Aspiration
    9   Intubation
    10  Aspiration%Lavage%Treatmen
    11  Aspiration%Lavage
    12  Aspiration
    13  Lavage%Treatmen
    14  Lavage
    15  Treatmen

where the actual phrase is in row 0.
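The rows of that parameter are every contiguous run of words joined by `%`, longest first for each starting word. If the C# caller builds the parameter, the generation could look like this Python sketch, assuming a "word" is a run of letters and digits (so `&`, `/` and `,` are dropped); `search_phrases` is a hypothetical name.

```python
import re


def search_phrases(phrase: str) -> list:
    """Row 0 is the raw phrase; then, for each starting word, every contiguous
    run of words from the longest down to one word, joined with '%'."""
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    rows = [phrase]
    for start in range(len(words)):
        for end in range(len(words), start, -1):
            rows.append("%".join(words[start:end]))
    return rows


rows = search_phrases("Gastric Intubation & Aspiration/Lavage, Treatmen")
```

For n words this produces 1 + n(n+1)/2 rows (16 for the five-word example), which stays small for a varchar(100) description.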

Here is my current attempt:

    CREATE PROCEDURE [GetProcedureByDescription]
    (
        @IncludeMaster BIT,
        @ProcedureSearchPhrases CPTFavorite READONLY
    )
    AS
    DECLARE @myIncludeMaster BIT;
    SET @myIncludeMaster = @IncludeMaster;

    CREATE TABLE #DistinctMatchingCpts
    (
        procedureCode VARCHAR(50),
        description   VARCHAR(100),
        category      VARCHAR(50),
        rvu           VARCHAR(50),
        charge        VARCHAR(15),
        active        VARCHAR(15),
        sourceTable   VARCHAR(50),
        sequenceSet   VARCHAR(2)
    )

    -- A SELECT TOP 1 ... ORDER BY is not allowed directly inside UNION ALL,
    -- so those branches are wrapped in derived tables.
    IF @myIncludeMaster = 0
    BEGIN -- Excluding master from search
        INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet)
        SELECT DISTINCT sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet
        FROM (
            SELECT * FROM (
                SELECT TOP 1
                    LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                    LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                    'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '01' AS sequenceSet
                FROM @ProcedureSearchPhrases PP
                INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] = PP.[LEVEL]
                WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
                  AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
                  AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
                ORDER BY PP.CODE
            ) AS More01
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM([CHARGE])) AS charge,
                'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '02' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
            WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
              AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '03' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE '%' + PP.[LEVEL] + '%'
            WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
              AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        ) AS CPTS
        ORDER BY procedureCode, sourceTable, [description]
    END -- Excluded master from search
    ELSE
    BEGIN -- Including master in search, but present favorites before master for each code
        -- Get matching procedures, ordered by code, source (favorites first), and description.
        -- There probably will be procedures with duplicated code+description, so we will filter
        -- duplicates shortly.
        INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet)
        SELECT DISTINCT sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet
        FROM (
            SELECT * FROM (
                SELECT TOP 1
                    LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                    LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                    'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '00' AS sequenceSet
                FROM @ProcedureSearchPhrases PP
                INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] = PP.[LEVEL]
                WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
                  AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
                  AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
                ORDER BY PP.CODE
            ) AS More00
            UNION ALL
            SELECT * FROM (
                SELECT TOP 1
                    LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                    LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                    COALESCE(CASE [ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END,'True') AS active,
                    LTRIM(RTRIM([RVU])) AS rvu, '2MasterCPT' AS sourceTable, '00' AS sequenceSet
                FROM @ProcedureSearchPhrases PP
                INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] = PP.[LEVEL]
                WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
                  AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
                ORDER BY PP.CODE
            ) AS Master00
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '01' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] = PP.[LEVEL]
            WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
              AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                COALESCE(CASE [ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END,'True') AS active,
                LTRIM(RTRIM([RVU])) AS rvu, '2MasterCPT' AS sourceTable, '01' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] = PP.[LEVEL]
            WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            UNION ALL
            SELECT * FROM (
                SELECT TOP 1
                    LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                    LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                    'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '02' AS sequenceSet
                FROM @ProcedureSearchPhrases PP
                INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
                WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
                  AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
                  AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
                ORDER BY PP.CODE
            ) AS More02
            UNION ALL
            SELECT * FROM (
                SELECT TOP 1
                    LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                    LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                    COALESCE(CASE [ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END,'True') AS active,
                    LTRIM(RTRIM([RVU])) AS rvu, '2MasterCPT' AS sourceTable, '02' AS sequenceSet
                FROM @ProcedureSearchPhrases PP
                INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
                WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
                  AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
                ORDER BY PP.CODE
            ) AS Master02
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '03' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
            WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
              AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                COALESCE(CASE [ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END,'True') AS active,
                LTRIM(RTRIM([RVU])) AS rvu, '2MasterCPT' AS sourceTable, '03' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
            WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                'True' AS active, LTRIM(RTRIM([RVU])) AS rvu, '0CPTMore' AS sourceTable, '04' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE '%' + PP.[LEVEL] + '%'
            WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor','MOD','CATEGORY','Types','Bundles'))
              AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            UNION ALL
            SELECT
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                COALESCE(CASE [ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END,'True') AS active,
                LTRIM(RTRIM([RVU])) AS rvu, '2MasterCPT' AS sourceTable, '04' AS sequenceSet
            FROM @ProcedureSearchPhrases PP
            INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] LIKE '%' + PP.[LEVEL] + '%'
            WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN ('0', '')
              AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        ) AS CPTS
        ORDER BY sequenceSet, sourceTable, [description]
    END

    /* Final select: the ORDER BY must be on the outermost query, because SQL Server
       does not guarantee the order in which rows come back from a table or derived table */
    SELECT TOP 500 procedureCode, description, category, rvu, charge, active
    FROM #DistinctMatchingCpts
    ORDER BY sequenceSet, sourceTable, description

    DROP TABLE #DistinctMatchingCpts

However, this does NOT meet the criterion of best match by number of words (as with the value in row 1 of the sample): rows containing the most of those words should rank best.

I have full control over the form/format of the table-valued parameter, if that matters.

I am returning this result to a C# program, if that is useful.

4 answers

You should be able to split the strings to solve this problem. I prefer a numbers-table approach to splitting a string in T-SQL.

For my code below to work (as well as my split function), you need to do this one-time table setup:

    SELECT TOP 10000 IDENTITY(int,1,1) AS Number
        INTO Numbers
        FROM sys.objects s1
        CROSS JOIN sys.objects s2
    ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)

Once the Numbers table is set up, create this split function:

    CREATE FUNCTION [dbo].[FN_ListToTable]
    (
         @SplitOn  char(1)       --REQUIRED, the character to split the @List string on
        ,@List     varchar(8000) --REQUIRED, the list to split apart
    )
    RETURNS TABLE
    AS
    RETURN
    (
        ----------------
        --SINGLE QUERY-- --this will not return empty rows
        ----------------
        SELECT
            ListValue
            FROM (SELECT
                      LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
                      FROM (
                               SELECT @SplitOn + @List + @SplitOn AS List2
                           ) AS dt
                          INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
                      WHERE SUBSTRING(List2, number, 1) = @SplitOn
                 ) dt2
            WHERE ListValue IS NOT NULL AND ListValue!=''
    );
    GO

Feel free to use your own split function, but you still need a Numbers table for my solution to work.

Now you can easily split a CSV string into a table and join on it:

 select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,') 

OUTPUT:

    ListValue
    -----------------------
    1
    2
    3
    4
    5
    6777

    (6 row(s) affected)
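To see what FN_ListToTable does step by step, here is a Python model of the same idea, where string positions play the role of the Numbers table. It is an illustration, not a replacement for the T-SQL function; `list_to_table` is a hypothetical name, and `str.strip()` also trims tabs and newlines, whereas LTRIM/RTRIM trim spaces.

```python
def list_to_table(split_on: str, text: str) -> list:
    """Wrap the list in delimiters, visit every delimiter position (what the join to
    Numbers does), take the text up to the next delimiter, trim it, drop empties."""
    list2 = split_on + text + split_on
    values = []
    for number in range(len(list2)):           # 0-based positions stand in for Numbers
        if list2[number] != split_on:
            continue
        nxt = list2.find(split_on, number + 1)
        if nxt == -1:                           # the trailing delimiter starts no value
            continue
        value = list2[number + 1:nxt].strip()
        if value:                               # like WHERE ListValue != ''
            values.append(value)
    return values
```

Calling it with the CSV example from the answer returns the same six values as the T-SQL output.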

Try the following:

    DECLARE @BaseTable table (RowID int primary key, RowValue varchar(100))
    set nocount on
    INSERT @BaseTable VALUES ( 1,'The cows came home empty handed')
    INSERT @BaseTable VALUES ( 2,'This is my string as a test template to rank') -- third
    INSERT @BaseTable VALUES ( 3,'pencil pen paperclip eraser')
    INSERT @BaseTable VALUES ( 4,'wow')
    INSERT @BaseTable VALUES ( 5,'no dice here')
    INSERT @BaseTable VALUES ( 6,'This is my string as a test template to rank on even though not exact.') -- second
    INSERT @BaseTable VALUES ( 7,'apple banana pear grape lemon orange kiwi strawberry peach watermellon')
    INSERT @BaseTable VALUES ( 8,'This is my string as a test template') -- 5th
    INSERT @BaseTable VALUES ( 9,'rat cat bat mat sat fat hat pat ')
    INSERT @BaseTable VALUES (10,'house home pool roll')
    INSERT @BaseTable VALUES (11,'This is my string as a test template to') -- 4th
    INSERT @BaseTable VALUES (12,'talk wisper yell scream sing hum')
    INSERT @BaseTable VALUES (13,'This is my string as a test template to rank on.') -- first
    INSERT @BaseTable VALUES (14,'aaa bbb ccc ddd eee fff ggg hhh')
    INSERT @BaseTable VALUES (15,'three twice three once twice three')
    set nocount off

    DECLARE @SearchValue varchar(100)
    SET @SearchValue='This is my value string as a test template to rank on.'
    ;WITH SplitBaseTable AS --expand each @BaseTable row into one row per word
    (SELECT
         b.RowID, b.RowValue, s.ListValue
         FROM @BaseTable b
             CROSS APPLY dbo.FN_ListToTable(' ',b.RowValue) AS s
    )
    , WordMatchCount AS --for each @BaseTable row that has a word in common with the search string, get the count of matching words
    (SELECT
         s.RowID,COUNT(*) AS CountOfWordMatch
         FROM dbo.FN_ListToTable(' ',@SearchValue) v
             INNER JOIN SplitBaseTable s ON v.ListValue=s.ListValue
         GROUP BY s.RowID
         HAVING COUNT(*)>0
    )
    , SearchLen AS --get one row for each possible length of the search string
    (SELECT
         n.Number,SUBSTRING(@SearchValue,1,n.Number) AS PartialSearchValue
         FROM Numbers n
         WHERE n.Number<=LEN(@SearchValue)
    )
    , MatchLen AS --for each @BaseTable row, get the max starting length that matches the search string
    (SELECT
         b.RowID,MAX(l.Number) MatchStartLen
         FROM @BaseTable b
             LEFT OUTER JOIN SearchLen l ON LEFT(b.RowValue,l.Number)=l.PartialSearchValue
         GROUP BY b.RowID
    )
    SELECT --return the final search results
        b.RowValue,w.CountOfWordMatch,m.MatchStartLen
        FROM @BaseTable b
            LEFT OUTER JOIN WordMatchCount w ON b.RowID=w.RowID
            LEFT OUTER JOIN MatchLen m ON b.RowID=m.RowID
        WHERE w.CountOfWordMatch>0
        ORDER BY w.CountOfWordMatch DESC,m.MatchStartLen DESC,LEN(b.RowValue) DESC,b.RowValue ASC

OUTPUT:

    RowValue                                                                CountOfWordMatch MatchStartLen
    ----------------------------------------------------------------------- ---------------- -------------
    This is my string as a test template to rank on.                        11               11
    This is my string as a test template to rank on even though not exact.  10               11
    This is my string as a test template to rank                            10               11
    This is my string as a test template to                                 9                11
    This is my string as a test template                                    8                11

    (5 row(s) affected)

Note that the “starts with” part is slightly different from a word match: it looks at the number of characters at the beginning of the string that match, not the number of words.
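To check the ranking logic outside SQL Server, here is a Python model of the query above: the pairwise word join for CountOfWordMatch, the character-level common prefix for MatchStartLen, and the same ORDER BY. It is a sketch with assumptions: comparisons are case-sensitive (many SQL Server databases use a case-insensitive collation), and `len()` counts trailing spaces, unlike `LEN()`. `rank_like_answer` is a hypothetical name.

```python
import os


def rank_like_answer(rows, search):
    """Return (RowValue, CountOfWordMatch, MatchStartLen) tuples in the query's order."""
    search_words = search.split()                  # like FN_ListToTable(' ', ...)
    results = []
    for value in rows:
        row_words = value.split()
        # the INNER JOIN counts every (search word, row word) pair that is equal
        count = sum(1 for v in search_words for w in row_words if v == w)
        if count == 0:                             # WHERE w.CountOfWordMatch > 0
            continue
        start_len = len(os.path.commonprefix([value, search]))  # characters, not words
        results.append((value, count, start_len))
    # ORDER BY CountOfWordMatch DESC, MatchStartLen DESC, LEN(RowValue) DESC, RowValue
    results.sort(key=lambda t: (-t[1], -t[2], -len(t[0]), t[0]))
    return results
```

Because words are split on spaces only, 'on.' and 'on' are different words, which is why the "even though not exact" row counts 10 matches rather than 11.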

Once this works, you can try to optimize it by creating a static, indexed table to replace SplitBaseTable, perhaps kept up to date by a trigger on your base table.
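The idea of a precomputed split table kept current by triggers can be sketched in memory as an inverted index from word to row IDs. This Python sketch is only an analogy for the table-plus-trigger design, and `WordIndex` is a hypothetical class. Unlike the answer's join, it counts each search word at most once per row.

```python
from collections import defaultdict


class WordIndex:
    """Hypothetical in-memory stand-in for a precomputed (RowID, word) table."""

    def __init__(self):
        self.rows = {}                       # RowID -> RowValue
        self.by_word = defaultdict(set)      # word -> set of RowIDs containing it

    def add(self, row_id, value):
        """What an INSERT trigger would do: store the row and index its words."""
        self.rows[row_id] = value
        for word in value.split():
            self.by_word[word].add(row_id)

    def remove(self, row_id):
        """What a DELETE trigger would do: unindex the row's words."""
        for word in self.rows.pop(row_id).split():
            self.by_word[word].discard(row_id)

    def match_counts(self, search):
        """Per row, how many search words it contains (each row word counted once)."""
        counts = defaultdict(int)
        for word in search.split():
            for row_id in self.by_word.get(word, ()):
                counts[row_id] += 1
        return dict(counts)
```

The search then touches only rows that share at least one word with the search string, instead of re-splitting every row per query.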


It sounds like you're looking for a suitable algorithm, which can be difficult to build without using stored procedures. From past experience, edit distance algorithms (for example, Levenshtein) are very useful for determining similarity. They return a number (sometimes several) describing the differences between strings, on which you can build your own weighting equation to produce a score. You can then define ratings or score thresholds to reduce false negatives/positives.
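A minimal Python sketch of the edit-distance idea: the classic dynamic-programming Levenshtein distance, plus one possible normalization to a 0..1 score. The normalization by the longer string's length is an assumption; any weighting equation could be substituted. In this setup it would typically live in the C# caller or a CLR function rather than plain T-SQL.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))             # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                              # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1 (1 means identical); this weighting is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A threshold on `similarity` (for example, keeping only scores above some cutoff) is one way to trade false positives against false negatives.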


I had a similar question a while ago. I was trying to determine how many words matched between two different columns, and to rank by the highest percentage of matching words. It was beyond me, but I got a fantastic response from Martin.
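Martin's answer is not reproduced here; this is only a Python sketch of the "percentage of matching words" idea under stated assumptions: case-insensitive whitespace-split words, distinct words only, and the larger word set as the denominator. `word_match_percent` is a hypothetical name.

```python
def word_match_percent(a: str, b: str) -> float:
    """Percentage of distinct words shared by two strings, relative to the larger set."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return 100.0 * len(wa & wb) / max(len(wa), len(wb))
```

Dividing by the larger set penalizes long descriptions that happen to contain the search words; dividing by the search phrase's word count instead would rank those rows higher.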

See his answer to my question here.


One answer to all your questions: use Sphinx (http://sphinxsearch.com) and don't solve it in SQL.

Sphinx is open source and works with all databases and all operating systems.

It is what Craigslist uses.

It is the best full-text search engine available at the time of posting. It will order your results by the relevance you ask for, and you won't need fancy SQL tables or SQL procedures. Give it a try.


Source: https://habr.com/ru/post/891448/

