What is the best way to rank a sql varchar column by the number (count) / match of words in a parameter with four different unique criteria. This is probably not a trivial question, but it’s hard for me to order strings based on “best fit” using my criteria.
column: varchar (100) description Parameter: @MyParameter varchar (100)
Conclusion with this order preference:
- Exact match (whole string match) - always first
- Starts with (depends on the length of the match parameter)
- Graph of rank words with related words ranking above for the same number of matching words
- Words match anywhere (don't touch)
Words may NOT exactly match, as partial matches are possible and possible, the lessor's value should be applied to partial words for ranking, but not critical (the bank will correspond to each in: bank, potter, tray, depot, deposit for example). It starts with other matches of words that should be higher than those that do not have subsequent matches, but this is not a kill / super deal.
I would like to have a ranking method where the column “begins with” the value in the parameter. Let's say I have the following line:
'This is my value string as a test template to rank on.'
In the first case, I would like to have the column / row rank where the largest number of words exist.
And second in rank, based on occurrence (best match) at the start, like:
'This is my string as a test template to rank on.' - first 'This is my string as a test template to rank on even though not exact.'-second 'This is my string as a test template to rank' - third 'This is my string as a test template to' - next 'This is my string as a test template' - next etc.
Secondly: (perhaps the second data set / group after the first (starts with) - this is desirable
I want to rank (sort) strings by the number of words in @MyParameter that occur in @MyParameter with a rank where adjacent words have a higher rank than the same counter.
Thus, for the sample line above, 'is my string as shown' will be of a higher rank than 'is not my other string as' due to the “better match” of the adjacent string (words together) with the same number of words. Higher match lines (the number of words that occur) rank the smallest match first.
If possible, I would like to do this in one request.
No string should occur twice as a result.
For performance reasons, the table will have no more than 10,000 rows.
The values in the table are quite static with a few changes, but not completely.
I can’t change the structure at this time, but I would think about it later (for example, a word / phrase)
To make this a little more complicated, the list of words is in two tables, but I could create a view for this, but one result of the table (smaller list) should happen before the second, larger result of the data set with the same match - There will be duplicates from these tables, and also inside the table, and I need only individual values. Selecting DISTINCT is not easy, because I want to return a single column (sourceTable), which may well make the rows different and in this case only select from the first (smaller) table, but all other DISTINCT columns are desirable (do not take into account the fact that the column is in “excellent” grade .
Psuedo columns in the table:
procedureCode VARCHAR(50), description VARCHAR(100), -- this is the sort/evaluation column category VARCHAR(50), relvu VARCHAR(50), charge VARCHAR(15), active bit sourceTable VARCHAR(50) - just shows which table it comes from of the two
No unique index exists as identifier column
Corresponds to NOT in the third table, which should be excluded SELECT * FROM (select * from tableone where procedureCode not in (select procedureCode from tablethree)) UNION ALL (select * from tabletwo where procedureCode not in (select procedureCode from tablethree))
EDIT: in an attempt to solve this problem, I created a table value parameter like this:
0 Gastric Intubation & Aspiration/Lavage, Treatmen 1 Gastric%Intubation%Aspiration%Lavage%Treatmen 2 Gastric%Intubation%Aspiration%Lavage 3 Gastric%Intubation%Aspiration 4 Gastric%Intubation 5 Gastric 6 Intubation%Aspiration%Lavage%Treatmen 7 Intubation%Aspiration%Lavage 8 Intubation%Aspiration 9 Intubation 10 Aspiration%Lavage%Treatmen 11 Aspiration%Lavage 12 Aspiration 13 Lavage%Treatmen 14 Lavage 15 Treatmen
where the actual phrase is on line 0
Here is my current attempt:
CREATE PROCEDURE [GetProcedureByDescription] ( @IncludeMaster BIT, @ProcedureSearchPhrases CPTFavorite READONLY ) AS DECLARE @myIncludeMaster BIT; SET @myIncludeMaster = @IncludeMaster; CREATE TABLE #DistinctMatchingCpts ( procedureCode VARCHAR(50), description VARCHAR(100), category VARCHAR(50), rvu VARCHAR(50), charge VARCHAR(15), active VARCHAR(15), sourceTable VARCHAR(50), sequenceSet VARCHAR(2) ) IF @myIncludeMaster = 0 BEGIN -- Excluding master from search INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet ) SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet FROM ( SELECT TOP 1 LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''01'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] = PP.[LEVEL] WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) ORDER BY PP.CODE UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM([CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''02'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''03'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%'' WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) ) AS CPTS ORDER BY procedureCode, sourceTable, [description] END -- Excluded master from search ELSE BEGIN -- Including master in search, but present favorites before master for each code -- Get matching procedures, ordered by code, source (favorites first), and description. -- There probably will be procedures with duplicated code+description, so we will filter -- duplicates shortly. INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet) SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet FROM ( SELECT TOP 1 LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''00'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] = PP.[LEVEL] WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) ORDER BY PP.CODE UNION ALL SELECT TOP 1 LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, LTRIM(RTRIM([RVU])) AS rvu, ''2MasterCPT'' AS sourceTable, ''00'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] = PP.[LEVEL] WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) ORDER BY PP.CODE UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''01'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] = PP.[LEVEL] WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, LTRIM(RTRIM([RVU])) AS rvu, ''2MasterCPT'' AS sourceTable, ''01'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] = PP.[LEVEL] WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) UNION ALL SELECT TOP 1 LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''02'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) ORDER BY PP.CODE UNION ALL SELECT TOP 1 LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, LTRIM(RTRIM([RVU])) AS rvu, ''2MasterCPT'' AS sourceTable, ''02'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) ORDER BY PP.CODE UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''03'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, LTRIM(RTRIM([RVU])) AS rvu, ''2MasterCPT'' AS sourceTable, ''03'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%'' WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[COMBO])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, ''True'' AS active, LTRIM(RTRIM([RVU])) AS rvu, ''0CPTMore'' AS sourceTable, ''04'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [CPTMORE] AS CPT ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%'' WHERE (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles'')) AND CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) UNION ALL SELECT LTRIM(RTRIM(CPT.[CODE])) AS procedureCode, LTRIM(RTRIM(CPT.[LEVEL])) AS description, LTRIM(RTRIM(CPT.[CATEGORY])) AS category, LTRIM(RTRIM(CPT.[CHARGE])) AS charge, COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active, LTRIM(RTRIM([RVU])) AS rvu, ''2MasterCPT'' AS sourceTable, ''04'' AS sequenceSet FROM @ProcedureSearchPhrases PP INNER JOIN [MASTERCPT] AS CPT ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%'' WHERE CPT.[CODE] IS NOT NULL AND CPT.[CODE] NOT IN (''0'', '''') AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL) ) AS CPTS ORDER BY sequenceSet, sourceTable, [description] END /* Final select - uses artificial ordering from the insertion ORDER BY */ SELECT procedureCode, description, category, rvu, charge, active FROM ( SELECT TOP 500 *-- procedureCode, description, category, rvu, charge, active FROM #DistinctMatchingCpts ORDER BY sequenceSet, sourceTable, description ) AS CPTROWS DROP TABLE #DistinctMatchingCpts
However, this does NOT meet the criteria for the best match for the number of words (as in the value of line 1 in the sample), which should match the best (most) words found in this line.
I have full control over the form / format of the table value parameter, if that matters.
I am returning this result to a C # program if it is useful.