Best SQL 2005 query to get related items?

I have a small video site where I want to show related videos, ranked by how many tags they have in common. What would be the best MSSQL 2005 query for getting related videos?

A LINQ query will also be appreciated.


Schema:

CREATE TABLE Videos
    (VideoID bigint not null , 
    Title varchar(100) NULL, 
    isActive bit NULL  )

CREATE TABLE Tags
    (TagID bigint not null , 
    Tag varchar(100) NULL )

CREATE TABLE VideoTags
    (VideoID bigint not null , 
    TagID bigint not null )

Each video can have multiple tags. I want to find the videos related to a given video by tag, ranked by the number of tags they share: the videos with the most matching tags should appear at the top and those with fewer matches below them. A video with no matching tags should not be returned at all.

I also want to know whether this schema will hold up with, say, more than a million videos and 10-20 tags per video.


SQL:

SELECT v.VideoID, v.Title, v.isActive
FROM Videos v
  JOIN 
(
  SELECT vt.VideoID, Count(*) as MatchCount
  FROM VideoTags vt
  WHERE vt.TagID in
  (
    SELECT TagID
    FROM Tags t
    WHERE t.Tag in ('horror', 'scifi')
  )
  GROUP BY vt.VideoID
) as sub
  ON v.VideoID = sub.VideoID
ORDER BY sub.MatchCount desc

LINQ:

List<string> TagList = new List<string>() {"horror", "scifi"};

  //find tag ids.
var tagQuery =
  from t in db.Tags
  where TagList.Contains(t.Tag)
  select t.TagID;

  //find matching video ids, count matches for each
var videoTagQuery =
  from vt in db.VideoTags
  where tagQuery.Contains(vt.TagID)
  group vt by vt.VideoID into g
  select new { VideoID = g.Key, matchCount = g.Count() };

  //fetch videos where matches were found
  //ordered by the number of matches, best match first
var videoQuery =
  from v in db.Videos
  join x in videoTagQuery on v.VideoID equals x.VideoID
  orderby x.matchCount descending
  select v;

  //hit the database and pull back the results
List<Video> result = videoQuery.ToList();

Ah, the input is a video, not a tag list. OK:

SELECT v.VideoID, v.Title, v.isActive
FROM Videos v
  JOIN 
(
  SELECT vt.VideoID, Count(*) as MatchCount
  FROM VideoTags vt
  WHERE vt.TagID in
  (
    SELECT TagID
    FROM VideoTags vt2
    WHERE vt2.VideoID = @VideoID
  )
  AND vt.VideoID <> @VideoID  -- leave out the input video itself
  GROUP BY vt.VideoID
) as sub
  ON v.VideoID = sub.VideoID
ORDER BY sub.MatchCount desc
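
To see the ranking on concrete rows, here is a small self-contained sketch that runs the same query shape against table variables. The table variables, titles and tag ids are made-up sample data, and the filter that leaves out the input video is an assumption about what "related" should mean here. It sticks to syntax available in SQL Server 2005 (no multi-row VALUES, no DECLARE with an initial value).

```sql
-- Hypothetical sample data; these table variables stand in for Videos and VideoTags
DECLARE @Videos TABLE (VideoID bigint NOT NULL PRIMARY KEY, Title nvarchar(100) NOT NULL);
DECLARE @VideoTags TABLE (VideoID bigint NOT NULL, TagID bigint NOT NULL, PRIMARY KEY (VideoID, TagID));

INSERT INTO @Videos (VideoID, Title)
SELECT 1, N'Video A' UNION ALL
SELECT 2, N'Video B' UNION ALL
SELECT 3, N'Video C' UNION ALL
SELECT 4, N'Video D';

-- Video 1 has tags 1 and 2; video 2 shares both, video 4 shares one, video 3 shares none
INSERT INTO @VideoTags (VideoID, TagID)
SELECT 1, 1 UNION ALL
SELECT 1, 2 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 3 UNION ALL
SELECT 4, 1;

DECLARE @VideoID bigint;
SET @VideoID = 1;

SELECT v.VideoID, v.Title, sub.MatchCount
FROM @Videos v
  JOIN
(
  -- count, per video, how many of the input video's tags it carries
  SELECT vt.VideoID, COUNT(*) AS MatchCount
  FROM @VideoTags vt
  WHERE vt.TagID IN (SELECT vt2.TagID FROM @VideoTags vt2 WHERE vt2.VideoID = @VideoID)
    AND vt.VideoID <> @VideoID   -- assumption: the input video is not its own "related" video
  GROUP BY vt.VideoID
) AS sub
  ON v.VideoID = sub.VideoID
ORDER BY sub.MatchCount DESC;
```

With this sample data the result is Video B (MatchCount 2) followed by Video D (MatchCount 1). Video C shares no tags with video 1, so the inner join drops it.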

For LINQ, only the tag query changes:

long myVideoID = 4;

  //find tag ids.
var tagQuery =
  from t in db.VideoTags
  where t.VideoID == myVideoID
  select t.TagID;

: SQL, , .


- , ?

String horror = "Horror";
String thriller = "Thriller";

var results =
    from v in db.Videos
    join vt in db.VideoTags on v.VideoID equals vt.VideoID
    join t in db.Tags on vt.TagID equals t.TagID
    where
        t.Tag == horror || t.Tag == thriller
    select v;

, ( ):

select videos.videoId, Title, count(*) nroOfTags
from videos, VideoTags, Tags
where
videoTags.videoid = videos.videoID 
and tags.tagId = videoTags.tagId
and tags.tag in ('horror','action','adventure')
group by videos.videoId, Title
order by count(*) desc

, . , .


DDL:

CREATE TABLE [Tags](
    [TagID] [bigint] IDENTITY(1,1) NOT NULL,
    [Tag] [nvarchar](100) NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [TagID] ASC
),
 CONSTRAINT [UC_Tags] UNIQUE NONCLUSTERED 
(
    [Tag] ASC
)
)

GO

CREATE TABLE [Videos](
    [VideoID] [bigint] IDENTITY(1,1) NOT NULL,
    [Title] [nvarchar](100) NOT NULL,
    [isActive] [bit] NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [VideoID] ASC
),
 CONSTRAINT [UC_Videos] UNIQUE NONCLUSTERED 
(
    [Title] ASC
)
)

GO

CREATE TABLE [VideoTags](
    [VideoID] [bigint] NOT NULL,
    [TagID] [bigint] NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [VideoID] ASC,
    [TagID] ASC
)
)

GO

ALTER TABLE [VideoTags]  WITH CHECK ADD FOREIGN KEY([TagID])
REFERENCES [Tags] ([TagID])
GO

ALTER TABLE [VideoTags]  WITH CHECK ADD FOREIGN KEY([VideoID])
REFERENCES [Videos] ([VideoID])
GO
  • I used nvarchar for the text columns.
  • I made the id columns IDENTITY.
  • I would make the Tag and Title columns unique. You do not want duplicate titles or tags.
  • I would make all these columns NOT NULL. It makes no sense to have a video or tag with an unknown name, and a video is either active or inactive, never "maybe" or "unknown."
  • I added a primary key to VideoTags to prevent duplication.
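
On the scale question (a million videos, 10-20 tags each): the clustered primary key on VideoTags is ordered by (VideoID, TagID), so it finds the tags of a video quickly but not the videos carrying a given tag, and that second lookup is what the related-video queries in this thread do. One possible way to support it is a nonclustered index in the opposite column order. This is only a sketch: the index name is made up, and whether it pays off should be checked against the actual execution plan and data.

```sql
-- Hypothetical index name; supports "which videos carry tag X" lookups
CREATE NONCLUSTERED INDEX [IX_VideoTags_TagID_VideoID]
    ON [VideoTags] ([TagID] ASC, [VideoID] ASC)
GO
```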

For an SQL query, I would try the following. I cannot be sure what you want without test data:

;
WITH VIDEO_TAG_COUNTS(VideoID,TagCount)
AS
(
    SELECT v.VideoID, COUNT(*)
    FROM Videos V
    INNER JOIN VideoTags VT ON V.VideoID = VT.VideoID
    GROUP BY V.VideoID
)
SELECT V.VideoID, V.Title
FROM Videos V 
INNER JOIN VIDEO_TAG_COUNTS VTC ON V.VideoID = VTC.VideoID
WHERE V.isActive = 1
ORDER BY VTC.TagCount

Source: https://habr.com/ru/post/1711406/
