Column intersection across two tables

I am trying to do something similar to a column intersection across two tables. The tables:

  • LogTag: a log can have zero or more tags
  • MatchingRule: a matching rule consists of one or more tags that define a rule

A log can match zero or more rules. I will pass in a MatchingRuleID and expect to get back all logs that satisfy that rule.

Expected result: a set of matching LogIDs. For example, passing MatchingRuleID = 30 must return LogID 101, and MatchingRuleID = 31 must return LogIDs 101 and 100.

In addition, the LogTag table can have millions of rows, so an efficient query is preferred.

Question: How do I find all LogIDs that match a given rule definition?
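What the question describes is relational division. A log satisfies a rule when every tag of the rule appears among the log's tags. Extra tags, such as tag 4 on log 101, do not matter.

The minimal Python sketch below works over the sample rows and spells out the expected results as set containment. The helper names `group_pairs` and `logs_matching_rule` are hypothetical, used only for illustration:

```python
from collections import defaultdict

# Sample rows copied from the question's dbo.LogTag and dbo.MatchingRule inserts.
LOG_TAG = [(100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3)]
MATCHING_RULE = [(30, 1), (30, 2), (30, 3), (31, 1)]


def group_pairs(pairs):
    """Group (key, tag_id) rows into {key: set_of_tag_ids} (hypothetical helper)."""
    grouped = defaultdict(set)
    for key, tag_id in pairs:
        grouped[key].add(tag_id)
    return grouped


def logs_matching_rule(rule_id):
    """Return the sorted LogIDs whose tags include every tag of the rule."""
    log_tags = group_pairs(LOG_TAG)
    rule_tags = group_pairs(MATCHING_RULE).get(rule_id, set())
    if not rule_tags:
        # Assumption: an unknown (tagless) rule matches nothing, because an
        # empty set would otherwise be a subset of every log's tags.
        return []
    # Subset test: all of the rule's tags must be present on the log.
    return sorted(log_id for log_id, tags in log_tags.items() if rule_tags <= tags)


print(logs_matching_rule(30))  # [101]
print(logs_matching_rule(31))  # [100, 101]
```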


Schema:

CREATE TABLE dbo.Tag
(
    TagID INT,
    TagName NVARCHAR(50)
)
INSERT INTO dbo.Tag (TagID, TagName)
VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3')

CREATE TABLE dbo.LogTag
(
    LogID INT,
    TagID INT
)
INSERT INTO dbo.LogTag (LogID, TagID)
VALUES (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3)  

CREATE TABLE dbo.MatchingRule
(
    MatchingRuleID INT,
    TagID INT
)
INSERT INTO dbo.MatchingRule (MatchingRuleID, TagID)
VALUES (30, 1), (30, 2), (30, 3), (31, 1)
5 answers

It is important to have the right clustered index on the tables. I added an alternative key for #log_tag in a comment; it may improve performance on large data sets. Since I do not have a realistic test data set, you will need to check which one performs better.

CREATE TABLE #tag(tag_id INT PRIMARY KEY,tag_name NVARCHAR(50));
INSERT INTO #tag (tag_id,tag_name)VALUES
    (1,'tag1'),(2,'tag2'),(3,'tag3');

-- Try this key for large sets: PRIMARY KEY(tag_id,log_id));
CREATE TABLE #log_tag(log_id INT,tag_id INT,PRIMARY KEY(log_id,tag_id))
INSERT INTO #log_tag (log_id,tag_id)VALUES
    (100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);

CREATE TABLE #matching_rule(matching_rule_id INT,tag_id INT,PRIMARY KEY(matching_rule_id,tag_id));
INSERT INTO #matching_rule(matching_rule_id,tag_id)VALUES
    (30,1),(30,2),(30,3),(31,1);

DECLARE @matching_rule_id INT=31;

;WITH required_tags AS (
    SELECT tag_id
    FROM #matching_rule
    WHERE matching_rule_id=@matching_rule_id
)
SELECT lt.log_id
FROM required_tags AS rt 
     INNER JOIN #log_tag AS lt ON
         lt.tag_id=rt.tag_id
GROUP BY lt.log_id
HAVING COUNT(*)=(SELECT COUNT(*) FROM required_tags);

DROP TABLE #log_tag;
DROP TABLE #matching_rule;
DROP TABLE #tag;

The results match your expected results for 30 and 31.
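The sketch below checks the query above outside SQL Server. It runs the same join/GROUP BY/HAVING logic against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is swapped in only so the example is self-contained.

The `ORDER BY` and the helper `matching_logs` are hypothetical additions for illustration. `COUNT(*)` is safe here only because the primary keys rule out duplicate (log, tag) and (rule, tag) rows.

```python
import sqlite3

# Schema and data mirror the #log_tag / #matching_rule temp tables of the answer.
SCHEMA = """
CREATE TABLE log_tag (log_id INT, tag_id INT, PRIMARY KEY (log_id, tag_id));
CREATE TABLE matching_rule (matching_rule_id INT, tag_id INT,
                            PRIMARY KEY (matching_rule_id, tag_id));
INSERT INTO log_tag VALUES (100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);
INSERT INTO matching_rule VALUES (30,1),(30,2),(30,3),(31,1);
"""

# Join on the rule's tags, then keep only logs that matched as many tags
# as the rule has (same shape as the answer's T-SQL query).
QUERY = """
WITH required_tags AS (
    SELECT tag_id FROM matching_rule WHERE matching_rule_id = :rule_id
)
SELECT lt.log_id
FROM required_tags AS rt
JOIN log_tag AS lt ON lt.tag_id = rt.tag_id
GROUP BY lt.log_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM required_tags)
ORDER BY lt.log_id
"""


def matching_logs(conn, rule_id):
    """Return the LogIDs that carry every tag of the given rule."""
    return [row[0] for row in conn.execute(QUERY, {"rule_id": rule_id})]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(matching_logs(conn, 30))  # [101]
print(matching_logs(conn, 31))  # [100, 101]
```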

Execution plan for the index used in the script:

[Image: execution plan]
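The two key orders suggested for #log_tag can be compared in the same spirit. This sketch assumes SQLite's `WITHOUT ROWID` tables as a rough stand-in for a clustered primary key, since their rows are stored in key order. It prints the `EXPLAIN QUERY PLAN` detail for each layout.

On seven rows the planner's choice says nothing reliable about millions of rows, so the plan output is only printed, not asserted. The helper `build` is hypothetical.

```python
import sqlite3

DATA = [(100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3)]
RULES = [(30, 1), (30, 2), (30, 3), (31, 1)]

QUERY = """
WITH required_tags AS (
    SELECT tag_id FROM matching_rule WHERE matching_rule_id = :rule_id
)
SELECT lt.log_id
FROM required_tags AS rt
JOIN log_tag AS lt ON lt.tag_id = rt.tag_id
GROUP BY lt.log_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM required_tags)
ORDER BY lt.log_id
"""


def build(key_order):
    """Create log_tag stored in the given key order (WITHOUT ROWID ~ clustered PK)."""
    conn = sqlite3.connect(":memory:")
    # key_order only ever comes from the fixed tuple below, never from user input.
    conn.execute(
        f"CREATE TABLE log_tag (log_id INT, tag_id INT, PRIMARY KEY ({key_order})) WITHOUT ROWID"
    )
    conn.execute(
        "CREATE TABLE matching_rule (matching_rule_id INT, tag_id INT,"
        " PRIMARY KEY (matching_rule_id, tag_id)) WITHOUT ROWID"
    )
    conn.executemany("INSERT INTO log_tag VALUES (?, ?)", DATA)
    conn.executemany("INSERT INTO matching_rule VALUES (?, ?)", RULES)
    return conn


results = {}
for key_order in ("log_id, tag_id", "tag_id, log_id"):
    conn = build(key_order)
    plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, {"rule_id": 30}).fetchall()
    # The last column of each plan row is the human-readable detail text.
    print(key_order, "->", [row[-1] for row in plan])
    results[key_order] = [row[0] for row in conn.execute(QUERY, {"rule_id": 30})]

print(results)
```

Both layouts must return the same LogIDs. Which one is faster still has to be measured with SQL Server's actual execution plans on realistic data.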


Try this query:


DECLARE @InputMatchingRuleId  INT = 30
;WITH CTE1
AS
(
    SELECT DENSE_RANK() OVER(ORDER BY LT.TAGID) AS RN,LT.TagID,LT.LOGID 
    FROM MatchingRule MR INNER JOIN LogTag LT ON LT.TagID = MR.TagID 
    WHERE MatchingRuleID=@InputMatchingRuleId

),
CTE2
AS
(
    SELECT 1 AS RN2,LOGID FROM CTE1 C1 WHERE C1.RN=1
    UNION ALL
    SELECT RN2+1 as RN2,C2.LOGID 
    FROM CTE1 C1 INNER JOIN CTE2 C2 ON C1.RN = C2.RN2+1 AND C1.LOGID = C2.LOGID
)

  -- Keep only logs whose chain reached every tag of the rule
  SELECT DISTINCT LOGID FROM CTE2
  WHERE RN2 = (SELECT COUNT(DISTINCT TagID) FROM MatchingRule WHERE MatchingRuleID = @InputMatchingRuleId)
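The recursive CTE walks the rule's tags in TagID order and extends a chain for a log only while that log has the next tag. The sketch below reproduces this idea in SQLite via Python's `sqlite3`, which needs `WITH RECURSIVE` where T-SQL uses a plain `WITH`.

It adds one hypothetical extra log, 103, carrying only tags 1 and 2. That log shows why the final filter should compare the chain length with the number of tags in the rule. With a test such as `RN2 > 1`, log 103 would also be reported for rule 30, even though it lacks tag 3. The helper `chain_logs` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LogTag (LogID INT, TagID INT);
CREATE TABLE MatchingRule (MatchingRuleID INT, TagID INT);
-- Question's sample data plus hypothetical log 103, which has only tags 1 and 2.
INSERT INTO LogTag VALUES (100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3),(103,1),(103,2);
INSERT INTO MatchingRule VALUES (30,1),(30,2),(30,3),(31,1);
""")

QUERY = """
WITH RECURSIVE
cte1 AS (
    -- Rank the rule's tags 1..n; one row per (tag, log) pair that carries the tag.
    SELECT DENSE_RANK() OVER (ORDER BY lt.TagID) AS rn, lt.TagID, lt.LogID
    FROM MatchingRule mr JOIN LogTag lt ON lt.TagID = mr.TagID
    WHERE mr.MatchingRuleID = :rule_id
),
cte2 (rn2, LogID) AS (
    -- Start a chain for every log with rank 1; extend it only if the log has the next rank.
    SELECT 1, LogID FROM cte1 WHERE rn = 1
    UNION ALL
    SELECT c2.rn2 + 1, c2.LogID
    FROM cte1 c1 JOIN cte2 c2 ON c1.rn = c2.rn2 + 1 AND c1.LogID = c2.LogID
)
-- A complete chain has exactly as many links as the rule has tags.
SELECT DISTINCT LogID FROM cte2
WHERE rn2 = (SELECT COUNT(DISTINCT TagID) FROM MatchingRule WHERE MatchingRuleID = :rule_id)
ORDER BY LogID
"""


def chain_logs(rule_id):
    """Return the LogIDs whose tag chain covers every tag of the rule."""
    return [row[0] for row in conn.execute(QUERY, {"rule_id": rule_id})]


print(chain_logs(30))  # [101]
print(chain_logs(31))  # [100, 101, 103]
```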

SQL Server 2008+:

DECLARE @RuleID INT
SELECT @RuleID = 30

SELECT LogID
FROM LogTag lt
    INNER JOIN (
        SELECT TagID, MatchingRuleID, COUNT(*) OVER (PARTITION BY MatchingRuleID) TagCount
        FROM MatchingRule
    ) mr 
    ON lt.TagID = mr.TagID
        AND mr.MatchingRuleID = @RuleID
GROUP BY LogID, TagCount
HAVING COUNT(*) = TagCount

So basically I join on every TagID in the specified matching rule. Then, to confirm that all of the rule's tags matched, I check whether the number of tags in the rule (from MatchingRule, filtered and counted per rule) equals the number of matched rows from LogTag for each log.
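The windowed count can be reproduced in SQLite as well; window functions need SQLite 3.25 or later. This is a sketch with a hypothetical helper `logs_for_rule`, not the answerer's code verbatim.

`COUNT(*) OVER (PARTITION BY MatchingRuleID)` attaches each rule's total tag count to its rows. The HAVING clause then keeps a log only when it matched that many tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LogTag (LogID INT, TagID INT);
CREATE TABLE MatchingRule (MatchingRuleID INT, TagID INT);
INSERT INTO LogTag VALUES (100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);
INSERT INTO MatchingRule VALUES (30,1),(30,2),(30,3),(31,1);
""")

# TagCount is computed per rule inside the derived table, so every joined row
# carries the total number of tags its rule requires.
QUERY = """
SELECT lt.LogID
FROM LogTag lt
JOIN (
    SELECT TagID, MatchingRuleID,
           COUNT(*) OVER (PARTITION BY MatchingRuleID) AS TagCount
    FROM MatchingRule
) mr ON lt.TagID = mr.TagID AND mr.MatchingRuleID = ?
GROUP BY lt.LogID, mr.TagCount
HAVING COUNT(*) = mr.TagCount
ORDER BY lt.LogID
"""


def logs_for_rule(rule_id):
    """Return the LogIDs that matched as many tags as the rule defines."""
    return [row[0] for row in conn.execute(QUERY, (rule_id,))]


print(logs_for_rule(30))  # [101]
print(logs_for_rule(31))  # [100, 101]
```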


It should be something like this:

declare @MatchingRuleID int = 30;

; with rules as
(
    select  TagID, cnt = sum(count(*)) over()
    from    dbo.MatchingRule
    where   MatchingRuleID  = @MatchingRuleID
    group by TagID
)
select  LogID
from    rules r
    inner join LogTag lt    on  r.TagID = lt.TagID
group by LogID, cnt
having  count(*) = r.cnt
select l.LogID
from dbo.MatchingRule r
inner join dbo.LogTag l on l.TagID = r.TagID
where r.MatchingRuleID = 31

Another approach is to collect the rule's tags first, and then:

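declare @Tags table (TagID int primary key);
insert into @Tags (TagID) select TagID from dbo.MatchingRule where MatchingRuleID = 31;

select l.LogID
from dbo.LogTag l
where exists(select 1 from @Tags t where t.TagID = l.TagID)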
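Both queries in this answer keep a log as soon as it carries any one of the rule's tags. Without DISTINCT, they also return one row per matching tag. That happens to give the expected result for rule 31, which has a single tag, but not for rule 30.

The small SQLite sketch below shows the difference. It assumes a subquery on MatchingRule in place of the @Tags table variable, and it adds DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LogTag (LogID INT, TagID INT);
CREATE TABLE MatchingRule (MatchingRuleID INT, TagID INT);
INSERT INTO LogTag VALUES (100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);
INSERT INTO MatchingRule VALUES (30,1),(30,2),(30,3),(31,1);
""")

# "Any tag" semantics: a log qualifies if it has at least one of the rule's tags.
ANY_TAG = """
SELECT DISTINCT l.LogID
FROM LogTag l
WHERE EXISTS (SELECT 1 FROM MatchingRule r
              WHERE r.MatchingRuleID = ? AND r.TagID = l.TagID)
ORDER BY l.LogID
"""

any_30 = [row[0] for row in conn.execute(ANY_TAG, (30,))]
any_31 = [row[0] for row in conn.execute(ANY_TAG, (31,))]
print(any_30)  # [100, 101, 102]
print(any_31)  # [100, 101]
```

For rule 30 this returns 100, 101 and 102, whereas the question expects only 101. Requiring all of the tags needs a count or chain check like the ones in the other answers.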

Source: https://habr.com/ru/post/1628594/

