I used sys.dm_fts_parser below to split sentences into words. There are many TSQL partitioning functions around if you are not on SQL Server 2008 or find that for some reason this is not suitable.
The requirement that each A.id can only be paired with a B.id that has not been used before, and vice versa, I could not think of an efficient set-based solution.
;WITH A(Id, sentence) As ( SELECT 1,'What other text in here' UNION ALL SELECT 2,'What am I doing here' UNION ALL SELECT 3,'I need to find another job' UNION ALL SELECT 4,'Other text in here' ), B(Id, sentence) As ( SELECT 5,'Other text in here' UNION ALL SELECT 6,'I am doing what here' UNION ALL SELECT 7,'Purple unicorns' UNION ALL SELECT 8,'What are you doing in here' ), A_Split AS (SELECT Id AS A_Id, display_term, COUNT(*) OVER (PARTITION BY Id) AS A_Cnt FROM A CROSS APPLY sys.dm_fts_parser('"' + REPLACE(sentence, '"', '""')+'"',1033, 0,0)), B_Split AS (SELECT Id AS B_Id, display_term, COUNT(*) OVER (PARTITION BY Id) AS B_Cnt FROM B CROSS APPLY sys.dm_fts_parser('"' + REPLACE(sentence, '"', '""')+'"',1033, 0,0)), Joined As (SELECT A_Id, B_Id, B_Cnt, Cnt = COUNT(*), CAST(COUNT(*) as FLOAT)/B_Cnt AS PctMatchBToA, CAST(COUNT(*) as FLOAT)/A_Cnt AS PctMatchAToB from A_Split A JOIN B_Split B ON A.display_term = B.display_term GROUP BY A_Id, B_Id, B_Cnt, A_Cnt) SELECT IDENTITY(int, 1, 1) as id, * INTO
Returns
A_Id B_Id Cnt ----------- ----------- ----------- 1 8 3 2 6 5 4 5 4