Joining two tables after sampling them

I need a sample after I join the two tables, for example:

SELECT  *
from 
A left join B 
on A.col=B.col
sample 1000000

The problem is that A and B are huge (over 3 billion rows), and when I run the join, I run out of spool space.

Is there a way to do the join after the sample, so that it joins smaller tables? For example, select a sample of 10,000,000 rows from each of A and B, inner join them, and select 1,000,000 rows from the join, hoping the join yields at least 1,000,000 rows.

PS: I am using Teradata.

+4
4 answers

You can do what you suggested and apply SAMPLE in a derived table:

SELECT  *
from 
 (
  SELECT * FROM A 
  SAMPLE 10000000
 ) AS A
left join B 
on A.col=B.col

Similarly for an inner join:

SELECT  *
from 
 (
  SELECT * FROM A
  SAMPLE 100000000 -- larger sample than needed 
 ) AS A
join B 
on A.col=B.col
sample 10000000
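
Whether the final SAMPLE can return the requested number of rows depends on how many rows the inner join of the oversample actually produces. A minimal sketch for checking that first, assuming the same tables A and B and join column col as in the question; the alias joined_rows is only illustrative, and selecting just col in the derived table is a choice made here to keep the intermediate result small:

```sql
-- Sketch only: count the rows produced by the inner join of the oversample.
-- A, B and col come from the question; joined_rows is an illustrative alias.
SELECT COUNT(*) AS joined_rows
FROM
 (
  SELECT col FROM A
  SAMPLE 100000000 -- same oversample size as in the query above
 ) AS A
JOIN B
ON A.col = B.col;
```

Each run draws a different random sample, so the count is only an estimate. If it comes out below the size of the final sample, the oversample needs to be larger.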
+1

Apply SAMPLE to each table in a derived table, then join:

SELECT *
FROM
(SELECT * FROM A SAMPLE 1000) t1
LEFT JOIN
(SELECT * FROM B SAMPLE 1000) t2
    ON t1.col = t2.col
0
SELECT * INTO #A FROM A SAMPLE 1000000;

followed by

SELECT * FROM #A left join B on #A.Col = B.col;

I mean, your original query seems to ask for 1,000,000 rows of A, which are then LEFT JOINed to B where there is a match, returning NULL for B's columns where there is none. I am assuming this is a 1-to-1 or 1-to-0 relationship; otherwise this does not match your original intent.
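
The SELECT ... INTO #A form is SQL Server temp-table syntax, and the question is about Teradata. A rough Teradata equivalent, as a sketch only, would materialize the sample in a volatile table; the name A_sample is hypothetical, and A, B and col are taken from the question:

```sql
-- Sketch only, assuming Teradata; A_sample is a hypothetical table name.
-- Materialize the sample of A once in a session-local volatile table.
CREATE VOLATILE TABLE A_sample AS
 (
  SELECT * FROM A
  SAMPLE 1000000
 )
WITH DATA
ON COMMIT PRESERVE ROWS;

-- Join the small sampled table to B.
SELECT *
FROM A_sample
LEFT JOIN B
ON A_sample.col = B.col;
```

The volatile table lasts only for the session, and ON COMMIT PRESERVE ROWS keeps its rows available after the creating statement commits, so the join runs against the same sample every time it is re-executed.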

0
SELECT  *
from 
(select * from A sample 1000000) A left join B 
on A.col=B.col
0

Source: https://habr.com/ru/post/1660098/

