SQL query with several possible joins (or an OR condition in a join)

I am trying to find people who have old accounts with an outstanding balance but who have since opened a new account. I need to match them by SSN. The problem is that each account has a primary and an additional contact, so there are two potential SSNs per account. I need to find the match even if, for example, the person was the primary contact on the old account but is the additional contact on the new one.

Here was my first attempt. For now I only count the rows to check the joins and conditions; I will select the actual data later. The idea is that one copy of the personal table is joined to the active accounts and another copy to the delinquent accounts. The two copies of personal are then compared on the four possible ways the SSNs can match.

select count(*)
from personal pa
join consumer c
  on c.cust_nbr = pa.cust_nbr
 and c.per_acct = pa.acct
join personal pu
  on pu.ssn = pa.ssn
  or pu.ssn = pa.addl_ssn
  or pu.addl_ssn = pa.ssn
  or pu.addl_ssn = pa.addl_ssn
join uncol_acct u
  on u.cust_nbr = pu.cust_nbr
 and u.per_acct = pu.acct
where u.curr_bal > 0
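To experiment with this query away from production data, here is a minimal sketch that runs it with Python's built-in sqlite3 module. The schema (only the columns the query touches) and the sample rows are hypothetical, and SQLite merely stands in for whatever database is actually in use.

```python
import sqlite3

def build_db():
    """In-memory SQLite stand-in; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    create table personal   (cust_nbr text, acct text, ssn text, addl_ssn text);
    create table consumer   (cust_nbr text, per_acct text);                  -- active
    create table uncol_acct (cust_nbr text, per_acct text, curr_bal numeric); -- delinquent
    """)
    conn.executemany("insert into personal values ('C1', ?, ?, ?)", [
        ("OLD1", "111", "222"),  # delinquent, owes money
        ("OLD2", "333", None),   # delinquent, balance 0
        ("OLD3", "444", "555"),  # delinquent, owes money
        ("NEW1", "222", "999"),  # OLD1's additional contact is now primary
        ("NEW2", "333", "777"),  # matches OLD2, which owes nothing
        ("NEW3", "555", "444"),  # primary/additional swapped vs. OLD3
        ("NEW4", "888", None),   # no match at all
    ])
    conn.executemany("insert into consumer values ('C1', ?)",
                     [("NEW1",), ("NEW2",), ("NEW3",), ("NEW4",)])
    conn.executemany("insert into uncol_acct values ('C1', ?, ?)",
                     [("OLD1", 500), ("OLD2", 0), ("OLD3", 100)])
    return conn

QUERY = """
select count(*)
from personal pa
join consumer c on c.cust_nbr = pa.cust_nbr and c.per_acct = pa.acct
join personal pu
  on pu.ssn = pa.ssn or pu.ssn = pa.addl_ssn
  or pu.addl_ssn = pa.ssn or pu.addl_ssn = pa.addl_ssn
join uncol_acct u on u.cust_nbr = pu.cust_nbr and u.per_acct = pu.acct
where u.curr_bal > 0
"""

conn = build_db()
count = conn.execute(QUERY).fetchone()[0]
print(count)  # 2: NEW1/OLD1 and NEW3/OLD3
```

NEW3/OLD3 satisfies two of the four OR branches but still produces a single row, because the OR is evaluated once per candidate pair.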

This works, but it takes 20 minutes to run. I found the question "Is having an 'OR' in an INNER JOIN condition a bad idea?", so I tried rewriting it as four queries (one per SSN combination) and combining them. That took 30 minutes.

Is there a better way to do this, or is this simply an inherently slow operation?

Update: After trying some of the options here and a few other experiments, I think I found the problem. Our software vendor encrypts SSNs in the database and provides a view that decrypts them. Since I have to work through this view, decrypting the SSNs and then comparing them takes a very long time.
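If decryption inside the join is the bottleneck, one idea worth testing (an assumption; the question does not say whether this is permitted) is to decrypt each row exactly once into a temporary table, index the plaintext SSNs, and join against that instead of the view. The sketch below simulates both paths in SQLite and counts calls to decryption. `personal_enc`, the `decrypt()` function (which only reverses the string) and the `personal` view are hypothetical stand-ins for the vendor's objects.

```python
import sqlite3

calls = 0

def fake_decrypt(value):
    """Hypothetical stand-in for the vendor's decryption: reverses the string."""
    global calls
    calls += 1
    return None if value is None else value[::-1]

def fake_encrypt(value):
    return None if value is None else value[::-1]

conn = sqlite3.connect(":memory:")
conn.create_function("decrypt", 1, fake_decrypt)
conn.executescript("""
pragma trusted_schema = on;  -- let the view call an app-defined function
create table personal_enc (cust_nbr text, acct text, ssn text, addl_ssn text);
create table consumer     (cust_nbr text, per_acct text);
create table uncol_acct   (cust_nbr text, per_acct text, curr_bal numeric);
-- Hypothetical vendor view: decrypts on every read.
create view personal as
  select cust_nbr, acct, decrypt(ssn) as ssn, decrypt(addl_ssn) as addl_ssn
  from personal_enc;
""")
rows = [("OLD1", "111", "222"), ("OLD2", "333", None), ("OLD3", "444", "555"),
        ("NEW1", "222", "999"), ("NEW2", "333", "777"), ("NEW3", "555", "444"),
        ("NEW4", "888", None)]
conn.executemany("insert into personal_enc values ('C1', ?, ?, ?)",
                 [(a, fake_encrypt(s), fake_encrypt(x)) for a, s, x in rows])
conn.executemany("insert into consumer values ('C1', ?)",
                 [("NEW1",), ("NEW2",), ("NEW3",), ("NEW4",)])
conn.executemany("insert into uncol_acct values ('C1', ?, ?)",
                 [("OLD1", 500), ("OLD2", 0), ("OLD3", 100)])

QUERY = """
select count(*)
from {t} pa
join consumer c on c.cust_nbr = pa.cust_nbr and c.per_acct = pa.acct
join {t} pu
  on pu.ssn = pa.ssn or pu.ssn = pa.addl_ssn
  or pu.addl_ssn = pa.ssn or pu.addl_ssn = pa.addl_ssn
join uncol_acct u on u.cust_nbr = pu.cust_nbr and u.per_acct = pu.acct
where u.curr_bal > 0
"""

# 1) Through the decrypting view: decrypt() runs inside the join itself.
calls = 0
via_view = conn.execute(QUERY.format(t="personal")).fetchone()[0]
view_calls = calls

# 2) Decrypt every row once into an indexed temp table, then join that.
calls = 0
conn.executescript("""
create temp table personal_plain as
  select cust_nbr, acct, decrypt(ssn) as ssn, decrypt(addl_ssn) as addl_ssn
  from personal_enc;
create index temp.pp_ssn  on personal_plain(ssn);
create index temp.pp_addl on personal_plain(addl_ssn);
""")
via_temp = conn.execute(QUERY.format(t="personal_plain")).fetchone()[0]
temp_calls = calls  # two calls per row: one for ssn, one for addl_ssn

print(via_view, view_calls, via_temp, temp_calls)
```

Both paths find the same two matches. The temp-table path calls `decrypt()` a fixed two times per row, however the join is executed. Whether the real vendor view allows this, and whether plaintext SSNs may be stored even temporarily, has to be checked against your environment.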

1 answer

If you run separate joins and then merge the results, you may run into problems. What happens if the same pair of records satisfies two or more of the conditions? You will get duplicates in the result.
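The duplicate problem is easy to reproduce. The sketch below uses Python's sqlite3 with hypothetical toy rows; since the question does not show its four-query rewrite, the rewrite here is an assumed version with one branch per SSN combination. UNION ALL counts a pair once per condition it satisfies, while UNION over the pair's key columns keeps each pair once, matching the OR join.

```python
import sqlite3

def build_db():
    """In-memory SQLite stand-in; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    create table personal   (cust_nbr text, acct text, ssn text, addl_ssn text);
    create table consumer   (cust_nbr text, per_acct text);
    create table uncol_acct (cust_nbr text, per_acct text, curr_bal numeric);
    """)
    conn.executemany("insert into personal values ('C1', ?, ?, ?)", [
        ("OLD1", "111", "222"), ("OLD2", "333", None), ("OLD3", "444", "555"),
        ("NEW1", "222", "999"), ("NEW2", "333", "777"), ("NEW3", "555", "444"),
        ("NEW4", "888", None)])
    conn.executemany("insert into consumer values ('C1', ?)",
                     [("NEW1",), ("NEW2",), ("NEW3",), ("NEW4",)])
    conn.executemany("insert into uncol_acct values ('C1', ?, ?)",
                     [("OLD1", 500), ("OLD2", 0), ("OLD3", 100)])
    return conn

# One branch per SSN combination; each returns the key columns of the pair.
BRANCH = """
select pa.cust_nbr as new_cust, pa.acct as new_acct,
       pu.cust_nbr as old_cust, pu.acct as old_acct
from personal pa
join consumer c on c.cust_nbr = pa.cust_nbr and c.per_acct = pa.acct
join personal pu on {cond}
join uncol_acct u on u.cust_nbr = pu.cust_nbr and u.per_acct = pu.acct
where u.curr_bal > 0
"""
CONDS = ["pu.ssn = pa.ssn", "pu.ssn = pa.addl_ssn",
         "pu.addl_ssn = pa.ssn", "pu.addl_ssn = pa.addl_ssn"]

def count_pairs(conn, set_op):
    sql = f" {set_op} ".join(BRANCH.format(cond=c) for c in CONDS)
    return conn.execute(f"select count(*) from ({sql})").fetchone()[0]

conn = build_db()
with_dupes = count_pairs(conn, "union all")  # NEW3/OLD3 matches two branches
deduped = count_pairs(conn, "union")         # distinct (new, old) pairs only
print(with_dupes, deduped)  # 3 2
```

UNION removes the duplicates, but it does so with an extra sort or hash step over all branch results, which is one plausible reason the four-query version was not faster.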

I believe your first approach is workable, but remember that you are joining four tables. If the tables contain A, B, C and D rows respectively, then in the worst case the RDBMS has to consider up to A * B * C * D row combinations. With many records in your database, this will take a lot of time.
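To make the multiplicative growth concrete, the following plain-Python sketch (made-up data, restricted to the two copies of personal) counts how many (new, old) pairs a naive nested-loop evaluation of the four-way OR has to check. For contrast it also shows a swapped-in technique: indexing the old accounts by each non-NULL SSN and probing with the new accounts' SSNs, which is roughly what a single equality condition allows and what an OR across four column pairs often prevents the database from doing directly.

```python
from itertools import product

def nested_loop_pairs(new_rows, old_rows):
    """Check every (new, old) combination against the four-way OR."""
    checks, pairs = 0, set()
    for (n_acct, n_ssn, n_addl), (o_acct, o_ssn, o_addl) in product(new_rows, old_rows):
        checks += 1
        # NULL never equals anything in SQL, so None values are ignored.
        if ({n_ssn, n_addl} - {None}) & ({o_ssn, o_addl} - {None}):
            pairs.add((n_acct, o_acct))
    return checks, pairs

def hash_probe_pairs(new_rows, old_rows):
    """Index old accounts by each non-NULL SSN, then probe with new SSNs."""
    by_ssn = {}
    for o_acct, o_ssn, o_addl in old_rows:
        for ssn in {o_ssn, o_addl} - {None}:
            by_ssn.setdefault(ssn, []).append(o_acct)
    probes, pairs = 0, set()
    for n_acct, n_ssn, n_addl in new_rows:
        for ssn in {n_ssn, n_addl} - {None}:
            probes += 1
            for o_acct in by_ssn.get(ssn, []):
                pairs.add((n_acct, o_acct))  # a pair matching twice is kept once
    return probes, pairs

# Hypothetical data: 1,000 new and 1,000 old accounts.
new_rows = [(f"N{i}", f"{i:09d}", f"{i + 1:09d}") for i in range(1000)]
old_rows = [(f"O{i}", f"{2 * i:09d}", None) for i in range(1000)]

checks, slow_pairs = nested_loop_pairs(new_rows, old_rows)
probes, fast_pairs = hash_probe_pairs(new_rows, old_rows)
print(checks, probes)            # 1000000 2000
print(slow_pairs == fast_pairs)  # True
```

Both approaches find the same pairs, but the nested loop's work grows with the product of the table sizes, while the probe grows roughly with their sum plus the number of matches.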

Of course, you can optimize your query by adding indexes to some columns, which is a good idea if they are not indexed already. But keep in mind that an index makes reads from a column faster and writes to it slower. If your workload is mostly reads (SELECTs), you should index your columns, but not blindly: learn a little about indexing before you start.
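One way to see whether an index is actually used is to compare the query plan before and after creating it. The sketch below does this in SQLite with `EXPLAIN QUERY PLAN` on the join between personal and uncol_acct; the index names are hypothetical, and other databases have their own plan commands with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table personal   (cust_nbr text, acct text, ssn text, addl_ssn text);
create table uncol_acct (cust_nbr text, per_acct text, curr_bal numeric);
""")

QUERY = """
select count(*)
from personal pu
join uncol_acct u on u.cust_nbr = pu.cust_nbr and u.per_acct = pu.acct
where u.curr_bal > 0
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return "\n".join(row[-1] for row in conn.execute("explain query plan " + sql))

before = plan(QUERY)
print(before)  # no user-defined index exists yet

# Hypothetical index names on the join keys.
conn.executescript("""
create index idx_personal_acct on personal(cust_nbr, acct);
create index idx_uncol_acct    on uncol_acct(cust_nbr, per_acct);
""")
after = plan(QUERY)
print(after)   # one side of the join is now looked up through an idx_* index
```

Each of these indexes also has to be updated on every insert and update of its table, which is the write cost mentioned above.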

In addition, since you are joining four tables (personal, consumer, personal again, and uncol_acct), you can do something like this:

Write a query containing two subqueries, named t1 and t2. The first subquery joins personal with consumer; its result is t1. The second subquery joins the second occurrence of personal with uncol_acct, and the WHERE clause goes inside this second subquery; its result is t2. The main query then joins t1 and t2. This way the main query only has to consider pairs of rows that are already valid in t1 and t2.

Also, if the WHERE clause is outside, as in your example query, then, at least logically, the four-way join is built first and the filter is applied only afterwards. That is why the WHERE clause should go inside the second subquery, so that it is applied before the main join. If the condition is rarely met, you can even filter uncol_acct in its own subquery nested inside the second subquery, so that only the matching rows reach the join with personal. Many optimizers push such filters down on their own, so check the execution plan to see whether this changes anything in your database.
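The t1/t2 rewrite can be sketched as follows, again in SQLite via Python with hypothetical toy rows. The filter on curr_bal sits in a subquery nested inside t2, as suggested for a rarely met condition; a plain WHERE inside t2 would be equivalent. The rewrite is checked against the original query.

```python
import sqlite3

def build_db():
    """In-memory SQLite stand-in; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    create table personal   (cust_nbr text, acct text, ssn text, addl_ssn text);
    create table consumer   (cust_nbr text, per_acct text);
    create table uncol_acct (cust_nbr text, per_acct text, curr_bal numeric);
    """)
    conn.executemany("insert into personal values ('C1', ?, ?, ?)", [
        ("OLD1", "111", "222"), ("OLD2", "333", None), ("OLD3", "444", "555"),
        ("NEW1", "222", "999"), ("NEW2", "333", "777"), ("NEW3", "555", "444"),
        ("NEW4", "888", None)])
    conn.executemany("insert into consumer values ('C1', ?)",
                     [("NEW1",), ("NEW2",), ("NEW3",), ("NEW4",)])
    conn.executemany("insert into uncol_acct values ('C1', ?, ?)",
                     [("OLD1", 500), ("OLD2", 0), ("OLD3", 100)])
    return conn

ORIGINAL = """
select count(*)
from personal pa
join consumer c on c.cust_nbr = pa.cust_nbr and c.per_acct = pa.acct
join personal pu
  on pu.ssn = pa.ssn or pu.ssn = pa.addl_ssn
  or pu.addl_ssn = pa.ssn or pu.addl_ssn = pa.addl_ssn
join uncol_acct u on u.cust_nbr = pu.cust_nbr and u.per_acct = pu.acct
where u.curr_bal > 0
"""

T1_T2 = """
select count(*)
from (
  -- t1: contacts on active accounts
  select pa.cust_nbr, pa.acct, pa.ssn, pa.addl_ssn
  from personal pa
  join consumer c on c.cust_nbr = pa.cust_nbr and c.per_acct = pa.acct
) t1
join (
  -- t2: contacts on delinquent accounts that still owe money
  select pu.cust_nbr, pu.acct, pu.ssn, pu.addl_ssn
  from personal pu
  join (select cust_nbr, per_acct
        from uncol_acct
        where curr_bal > 0) u  -- filtered before joining personal
    on u.cust_nbr = pu.cust_nbr and u.per_acct = pu.acct
) t2
  on t2.ssn = t1.ssn or t2.ssn = t1.addl_ssn
  or t2.addl_ssn = t1.ssn or t2.addl_ssn = t1.addl_ssn
"""

conn = build_db()
original = conn.execute(ORIGINAL).fetchone()[0]
rewritten = conn.execute(T1_T2).fetchone()[0]
print(original, rewritten)  # 2 2
```

The two queries return the same result; whether the rewrite is faster depends on whether your optimizer was already pushing the filter down, which the execution plan will show.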

Cheers!


Source: https://habr.com/ru/post/1497231/

