SQL: row filtering

I am trying to write an SQL query that filters the rows of a table.

The structure of the table is as follows:

    CREATE TABLE person (
        id        INT PRIMARY KEY,
        name      TEXT,
        operation TEXT
    );

I want to return all rows that have not been canceled out. A row is considered “canceled” if its operation is “insert” or “delete” and there is another row with the same name and the opposite operation. Rows cancel in pairs: each insert can cancel only one delete, and vice versa.

For example, suppose I have the following rows:

    id | name | operation
    ---+------+----------
     1 | bob  | insert
     2 | bob  | delete
     3 | bob  | insert

The first two rows “cancel” each other because they have the same name and opposite operations. Therefore, the query should return row 3.

Here is another example:

    id | name | operation
    ---+------+----------
     1 | bob  | insert
     2 | bob  | delete
     3 | bob  | insert
     4 | bob  | delete

In this case, rows 1 and 2 cancel each other, and so do rows 3 and 4. Therefore, the query should return no rows.

Last example:

    id | name | operation
    ---+------+----------
     1 | bob  | insert
     2 | bob  | insert

In this case, rows 1 and 2 are not canceled because their operations are not opposite. Thus, the query should return both rows.
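Before comparing queries, it may help to pin the rule down in a form that can be run. Below is a minimal pure-Python sketch of the cancellation rule as described above; the function name `surviving_ids` is hypothetical, and it assumes that when one operation outnumbers the other, the rows that survive are the latest ones (which matches the first example, where row 3 survives).

```python
from collections import defaultdict


def surviving_ids(rows):
    """Return the ids of the rows that are not canceled out.

    rows: iterable of (id, name, operation) tuples.
    Assumption: when one operation outnumbers the other for a name, the
    surviving rows are the latest ones of the majority operation.
    """
    ids_by_name = defaultdict(lambda: {"insert": [], "delete": []})
    for row_id, name, operation in rows:
        if operation in ("insert", "delete"):
            ids_by_name[name][operation].append(row_id)

    survivors = []
    for ops in ids_by_name.values():
        # Each insert cancels at most one delete, so this many pairs cancel.
        cancel = min(len(ops["insert"]), len(ops["delete"]))
        for ids in ops.values():
            # Drop the earliest `cancel` rows of each operation.
            survivors.extend(sorted(ids)[cancel:])
    return sorted(survivors)


example1 = [(1, "bob", "insert"), (2, "bob", "delete"), (3, "bob", "insert")]
example2 = example1 + [(4, "bob", "delete")]
example3 = [(1, "bob", "insert"), (2, "bob", "insert")]

print(surviving_ids(example1))  # [3]
print(surviving_ids(example2))  # []
print(surviving_ids(example3))  # [1, 2]
```

The rule only depends on how many inserts and deletes each name has, which is why the answers below can be built from per-name counts plus `row_number()`.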

I have the following query, which handles the first two scenarios but not the last one.

Does anyone have any suggestions for a query that can handle all 3 scenarios?

    SELECT MAX(id), name
    FROM person z
    WHERE operation IN ('insert', 'delete')
    GROUP BY name
    HAVING count(1) % 2 = 1;
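To see why this query fails, it can be run against small versions of the scenarios. This sketch assumes SQLite, through Python's standard `sqlite3` module, is an acceptable stand-in for PostgreSQL for this query; the helper `run` and the scenario lists are hypothetical names, and the query itself is the one from the question.

```python
import sqlite3


def run(query, rows):
    """Load rows into an in-memory person table and return the sorted result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person(id INT PRIMARY KEY, name TEXT, operation TEXT)")
    con.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    result = sorted(con.execute(query).fetchall())
    con.close()
    return result


# The query from the question, unchanged apart from layout.
ASKER_QUERY = """
SELECT MAX(id), name FROM person z
WHERE operation IN ('insert', 'delete')
GROUP BY name
HAVING count(1) % 2 = 1
"""

scenario1 = [(1, "bob", "insert"), (2, "bob", "delete"), (3, "bob", "insert")]
scenario3 = [(1, "bob", "insert"), (2, "bob", "insert")]
three_inserts = [(1, "bob", "insert"), (2, "bob", "insert"), (3, "bob", "insert")]

print(run(ASKER_QUERY, scenario1))      # [(3, 'bob')]  as expected
print(run(ASKER_QUERY, scenario3))      # []            both rows were expected
print(run(ASKER_QUERY, three_inserts))  # [(3, 'bob')]  three rows were expected
```

The parity test only works when inserts and deletes alternate. Because `GROUP BY name` collapses each name to one row, the query can never return more than one row per name, and an even number of uncanceled inserts looks the same as a fully canceled history.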
3 answers

One way is to compare the number of operations of each kind. Since you also need to return the number of INSERTs or DELETEs given by InsertCount - deleteCount or deleteCount - InsertCount, and since PostgreSQL supports window functions, you should be able to use row_number().

Note: I have not tested it this way, but according to the PostgreSQL tutorial (Chapter 3, Advanced Features, section 3.5, Window Functions), you can filter on the result of a window function by computing it in a subquery:

    SELECT id, name
    FROM (
        SELECT row_number() OVER (PARTITION BY p.name, p.operation ORDER BY p.id DESC) rn,
               id, p.Name, p.operation,
               operationCounts.InsertCount, operationCounts.deleteCount
        FROM Person p
        INNER JOIN (
            SELECT SUM(CASE WHEN operation = 'insert' THEN 1 ELSE 0 END) InsertCount,
                   SUM(CASE WHEN operation = 'delete' THEN 1 ELSE 0 END) deleteCount,
                   name
            FROM person
            GROUP BY name
        ) operationCounts ON p.name = operationCounts.name
        WHERE operationCounts.InsertCount <> operationCounts.deleteCount
    ) data
    WHERE (rn <= (InsertCount - deleteCount) AND operation = 'insert')
       OR (rn <= (deleteCount - InsertCount) AND operation = 'delete');
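Since this answer is untested, a quick way to check it against all three scenarios is to run it on SQLite through Python's standard `sqlite3` module. This is a sketch under the assumption that SQLite (3.25 or later, for window functions) is an acceptable stand-in for PostgreSQL here; `run`, `CONRAD_QUERY` and the scenario lists are hypothetical names, and the query is this answer's with a shorter alias and explicit column aliases.

```python
import sqlite3


def run(query, rows):
    """Load rows into an in-memory person table and return the sorted result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person(id INT PRIMARY KEY, name TEXT, operation TEXT)")
    con.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    result = sorted(con.execute(query).fetchall())
    con.close()
    return result


# Per name, keep the latest |InsertCount - deleteCount| rows of whichever
# operation is in the majority; names with equal counts are dropped entirely.
CONRAD_QUERY = """
SELECT id, name FROM (
    SELECT row_number() OVER (PARTITION BY p.name, p.operation ORDER BY p.id DESC) AS rn,
           p.id AS id, p.name AS name, p.operation AS operation,
           oc.InsertCount AS InsertCount, oc.deleteCount AS deleteCount
    FROM person p
    INNER JOIN (
        SELECT name,
               SUM(CASE WHEN operation = 'insert' THEN 1 ELSE 0 END) AS InsertCount,
               SUM(CASE WHEN operation = 'delete' THEN 1 ELSE 0 END) AS deleteCount
        FROM person GROUP BY name
    ) oc ON p.name = oc.name
    WHERE oc.InsertCount <> oc.deleteCount
) data
WHERE (rn <= InsertCount - deleteCount AND operation = 'insert')
   OR (rn <= deleteCount - InsertCount AND operation = 'delete')
"""

scenario1 = [(1, "bob", "insert"), (2, "bob", "delete"), (3, "bob", "insert")]
scenario2 = scenario1 + [(4, "bob", "delete")]
scenario3 = [(1, "bob", "insert"), (2, "bob", "insert")]

for rows in (scenario1, scenario2, scenario3):
    print(run(CONRAD_QUERY, rows))
# [(3, 'bob')]
# []
# [(1, 'bob'), (2, 'bob')]
```

Ordering by `id DESC` means the surviving rows are the most recent ones, which matches the expected output of the first scenario.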

For the best speed and the shortest answer: the problem can be reduced to two steps (assuming a name never has more deletes than inserts):

  • count the delete operations for each name (cnt_del);
  • ignore the first cnt_del inserts of that name.

This can be written as a single query as follows (I am not sure every part of this query works):

    select *
    from (
        select id, name, operation,
               row_number() over (partition by name
                                  order by case when operation = 'insert' then id else null end nulls last) as rnk_insert,
               count(case when operation = 'delete' then 1 else null end) over (partition by name) as cnt_del
        from person z
        where operation in ('insert', 'delete')
    ) t
    -- delete rows are ranked after all inserts, so they must be excluded explicitly
    where operation = 'insert'
      and rnk_insert > cnt_del;

If the previous query does not work in PostgreSQL (AFAIK, Oracle can handle it), the same idea can be implemented in a more conservative way:

    select i.id, i.name
    from (select id, name,
                 row_number() over (partition by name order by id) as rnk_insert
          from person z
          where operation = 'insert') i
    left join (select name, count(*) as cnt_del
               from person z
               where operation = 'delete'
               group by name) d on d.name = i.name
    where rnk_insert > coalesce(cnt_del, 0);
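The join version can be checked the same way. This is a sketch, again assuming SQLite 3.25+ (through Python's `sqlite3` module) as a stand-in for PostgreSQL; `run`, `JOIN_QUERY` and the scenario lists are hypothetical names. Note that the query needs `row_number()` with parentheses and a `GROUP BY name` in the delete-count subquery to run at all.

```python
import sqlite3


def run(query, rows):
    """Load rows into an in-memory person table and return the sorted result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person(id INT PRIMARY KEY, name TEXT, operation TEXT)")
    con.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    result = sorted(con.execute(query).fetchall())
    con.close()
    return result


# Number each name's inserts by id, then skip as many of them as the name
# has deletes; names without deletes get cnt_del = NULL, treated as 0.
JOIN_QUERY = """
SELECT i.id, i.name
FROM (SELECT id, name,
             row_number() OVER (PARTITION BY name ORDER BY id) AS rnk_insert
      FROM person WHERE operation = 'insert') i
LEFT JOIN (SELECT name, count(*) AS cnt_del
           FROM person WHERE operation = 'delete'
           GROUP BY name) d ON d.name = i.name
WHERE rnk_insert > coalesce(cnt_del, 0)
"""

scenario1 = [(1, "bob", "insert"), (2, "bob", "delete"), (3, "bob", "insert")]
scenario2 = scenario1 + [(4, "bob", "delete")]
scenario3 = [(1, "bob", "insert"), (2, "bob", "insert")]
deletes_only = [(1, "bob", "delete"), (2, "bob", "delete")]

for rows in (scenario1, scenario2, scenario3, deletes_only):
    print(run(JOIN_QUERY, rows))
# [(3, 'bob')]
# []
# [(1, 'bob'), (2, 'bob')]
# []
```

The last case shows the limitation of this approach: it only ever returns inserts, so a name whose deletes outnumber its inserts yields no rows at all.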

Testing showed that my initial query was slower than @Conrad's excellent query. Puzzled, I tried several things and came up with a query that is actually simpler and faster.

Test setup

    INSERT INTO person
    SELECT i,
           'name' || (random() * 500)::int::text,
           CASE WHEN random() >= 0.5 THEN 'insert' ELSE 'delete' END
    FROM generate_series(1, 10000) AS i;

Query:

    SELECT id, name, operation
    FROM (
        SELECT row_number() OVER (PARTITION BY name, operation ORDER BY id) AS rn,
               id, name, operation, y.cancel
        FROM (
            SELECT name, least(ct_del, ct_all - ct_del) AS cancel
            FROM (
                SELECT name,
                       count(*) AS ct_all,
                       count(NULLIF(operation, 'insert')) AS ct_del
                FROM person
                GROUP BY 1
            ) x
            WHERE (ct_all - ct_del) <> ct_del
        ) y
        JOIN person USING (name)
    ) p
    WHERE rn > cancel;
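Here is the same query run against the three scenarios from the question. This is a sketch that assumes SQLite 3.25+ (through Python's standard `sqlite3` module) as a stand-in for PostgreSQL; SQLite has no `least()`, so its two-argument scalar `min()` is swapped in. `run`, `SIMPLIFIED_QUERY` and the scenario lists are hypothetical names.

```python
import sqlite3


def run(query, rows):
    """Load rows into an in-memory person table and return the sorted result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person(id INT PRIMARY KEY, name TEXT, operation TEXT)")
    con.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    result = sorted(con.execute(query).fetchall())
    con.close()
    return result


# cancel = number of insert/delete pairs per name; fully balanced names are
# removed before the join, then the first `cancel` rows of each operation drop.
SIMPLIFIED_QUERY = """
SELECT id, name, operation FROM (
    SELECT row_number() OVER (PARTITION BY name, operation ORDER BY id) AS rn,
           id, name, operation, y.cancel AS cancel
    FROM (
        SELECT name, min(ct_del, ct_all - ct_del) AS cancel  -- min() stands in for least()
        FROM (SELECT name, count(*) AS ct_all,
                     count(NULLIF(operation, 'insert')) AS ct_del
              FROM person GROUP BY 1) x
        WHERE (ct_all - ct_del) <> ct_del
    ) y
    JOIN person USING (name)
) p
WHERE rn > cancel
"""

scenario1 = [(1, "bob", "insert"), (2, "bob", "delete"), (3, "bob", "insert")]
scenario2 = scenario1 + [(4, "bob", "delete")]
scenario3 = [(1, "bob", "insert"), (2, "bob", "insert")]
deletes_only = [(1, "bob", "delete"), (2, "bob", "delete")]

for rows in (scenario1, scenario2, scenario3, deletes_only):
    print(run(SIMPLIFIED_QUERY, rows))
# [(3, 'bob', 'insert')]
# []
# [(1, 'bob', 'insert'), (2, 'bob', 'insert')]
# [(1, 'bob', 'delete'), (2, 'bob', 'delete')]
```

Unlike the insert-only join approach, this query also returns surplus deletes, because it ranks each operation separately and only the minority operation is fully consumed by `cancel`.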

It ended up looking like @Conrad's query with a few simplifications and improvements. The crucial point is to eliminate names that cancel out completely as early as possible.
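One way to confirm that the simplified query returns the same rows as @Conrad's is to run both on random data shaped roughly like the test setup above. This sketch assumes SQLite 3.25+ (through Python's `sqlite3` module) as a stand-in for PostgreSQL, swaps `least()` for SQLite's two-argument `min()`, and generates the data in Python with a fixed seed; `random_rows`, `CONRAD_QUERY` and `SIMPLIFIED_QUERY` are hypothetical names. It checks only that the results agree, not how fast either query is.

```python
import random
import sqlite3

# @Conrad's query: keep the latest |inserts - deletes| rows of the majority operation.
CONRAD_QUERY = """
SELECT id, name FROM (
    SELECT row_number() OVER (PARTITION BY p.name, p.operation ORDER BY p.id DESC) AS rn,
           p.id AS id, p.name AS name, p.operation AS operation,
           oc.InsertCount AS InsertCount, oc.deleteCount AS deleteCount
    FROM person p
    INNER JOIN (
        SELECT name,
               SUM(CASE WHEN operation = 'insert' THEN 1 ELSE 0 END) AS InsertCount,
               SUM(CASE WHEN operation = 'delete' THEN 1 ELSE 0 END) AS deleteCount
        FROM person GROUP BY name
    ) oc ON p.name = oc.name
    WHERE oc.InsertCount <> oc.deleteCount
) data
WHERE (rn <= InsertCount - deleteCount AND operation = 'insert')
   OR (rn <= deleteCount - InsertCount AND operation = 'delete')
"""

# The simplified query: drop the first `cancel` rows of each operation.
SIMPLIFIED_QUERY = """
SELECT id, name, operation FROM (
    SELECT row_number() OVER (PARTITION BY name, operation ORDER BY id) AS rn,
           id, name, operation, y.cancel AS cancel
    FROM (
        SELECT name, min(ct_del, ct_all - ct_del) AS cancel
        FROM (SELECT name, count(*) AS ct_all,
                     count(NULLIF(operation, 'insert')) AS ct_del
              FROM person GROUP BY 1) x
        WHERE (ct_all - ct_del) <> ct_del
    ) y
    JOIN person USING (name)
) p
WHERE rn > cancel
"""


def random_rows(n=10_000, names=500, seed=42):
    """Roughly mirror the PostgreSQL test setup: random names, about half inserts."""
    rng = random.Random(seed)
    return [
        (i, f"name{rng.randint(0, names)}",
         "insert" if rng.random() >= 0.5 else "delete")
        for i in range(1, n + 1)
    ]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person(id INT PRIMARY KEY, name TEXT, operation TEXT)")
con.executemany("INSERT INTO person VALUES (?, ?, ?)", random_rows())
conrad_ids = sorted(row[0] for row in con.execute(CONRAD_QUERY))
simplified_ids = sorted(row[0] for row in con.execute(SIMPLIFIED_QUERY))
con.close()

print(conrad_ids == simplified_ids)  # True
```

Both queries keep the most recent surplus rows per name (one by ranking `id DESC` and keeping the first few, the other by ranking `id` ascending and skipping the first `cancel`), so they should select exactly the same ids.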


Source: https://habr.com/ru/post/905737/

