How to remove duplicate rows in SQLite when every column, including the id, is duplicated?

I am using SQLite. I am importing a dataset whose identifier is assigned externally: the data is loaded into a temporary table first and then inserted into my permanent table. The permanent table uses the external identifier (RunId) as its key and has no other id column.

I imported a CSV file into a new table, Book1, where C15 is the identifier column. Then I ran the insert:

INSERT INTO PrimusRuns (RunId, TransientName, RunDateStart, RunType, TestDateStart, Gross, CPS, Shares, MaxExposure, PercentWin, Duration) SELECT a.C15, a.C1, JULIANDAY(a.C2), a.C3,JULIANDAY(a.C4), a.C6, a.C8, a.C9, a.C10, a.C11, a.C14 FROM Book1 as a; 

However, I get a primary key constraint error:

 [19] [SQLITE_CONSTRAINT_PRIMARYKEY] A PRIMARY KEY constraint failed (UNIQUE constraint failed: PrimusRuns.RunID) 

At first, I thought some of these rows were already in the table, but this query:

 SELECT * FROM Book1 WHERE C15 IN( SELECT RunID from PrimusRuns ); 

returns nothing.

Then I realized that the imported data itself contains repeated rows:

 SELECT * FROM Book1 GROUP BY C15 HAVING COUNT(*) > 1 

This aggregate query returns 95 rows, which means I need to delete at least 95 rows. How can I delete them to get rid of the duplicates?

NOTE: There are other questions like this one, but mine is different in that the identifier is duplicated as well. The other questions group by all the other columns and delete the row with max(id). In my case, selecting by max(id) returns both rows, not one.

1 answer

If the only goal is to remove rows that repeat a C15 value, you can find min(id) for each C15 group, which identifies exactly one row per C15 value, and delete all the others. For example:

  delete from book1 where id not in ( select min(id) from Book1 group by C15) 
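The question suggests Book1 was imported from CSV and may have no id column. In an ordinary SQLite table (one not declared WITHOUT ROWID) the implicit rowid can play the same role. Below is a minimal sketch of that variant using Python's built-in sqlite3 module. The two-column table and the sample values are made up for illustration; only the names Book1, C1 and C15 come from the question.

```python
import sqlite3

# In-memory stand-in for the imported table (hypothetical sample data;
# only two of Book1's columns are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book1 (C1 TEXT, C15 INTEGER)")
conn.executemany(
    "INSERT INTO Book1 (C1, C15) VALUES (?, ?)",
    [("run-a", 1), ("run-a", 1), ("run-b", 2), ("run-c", 3), ("run-c", 3)],
)

# Keep the row with the lowest implicit rowid for each C15 value
# and delete every other row.
conn.execute(
    "DELETE FROM Book1 "
    "WHERE rowid NOT IN (SELECT MIN(rowid) FROM Book1 GROUP BY C15)"
)
conn.commit()

remaining = conn.execute("SELECT C1, C15 FROM Book1 ORDER BY C15").fetchall()
print(remaining)  # [('run-a', 1), ('run-b', 2), ('run-c', 3)]
```

Because rowid differs even between rows that are identical in every declared column, this works when the identifier itself is duplicated.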

But if the rows are exact duplicates in every column, you can use these steps:

1) Create a temporary table that holds one distinct copy of each duplicated row, for example:

 create table my_temp_distinct as select col1, col2 ... from Book1 group by col1, col2, ... having count(*)> 1 

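2) Then delete every copy of the duplicated rows from Book1 (matching on min(id) here would instead delete one row from every C15 group, including the rows that are not duplicated):

  delete from Book1 where C15 in ( select C15 from my_temp_distinct) 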

3) Finally, insert the saved distinct rows back from the temporary table:

 insert into Book1 (col1, col2 ....) select col1, col2, ... from my_temp_distinct 
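
The three steps can be tried end to end with the sketch below, again using Python's sqlite3 module. It rests on two assumptions: rows that share a C15 value are exact duplicates in every column, and step 2 removes them by matching C15 against the temporary table. The two-column table and the sample values are hypothetical.

```python
import sqlite3

# In-memory stand-in for the imported table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book1 (C1 TEXT, C15 INTEGER)")
conn.executemany(
    "INSERT INTO Book1 (C1, C15) VALUES (?, ?)",
    [("run-a", 1), ("run-a", 1), ("run-b", 2),
     ("run-c", 3), ("run-c", 3), ("run-c", 3)],
)

# 1) Save one copy of each row that occurs more than once.
conn.execute(
    "CREATE TEMP TABLE my_temp_distinct AS "
    "SELECT C1, C15 FROM Book1 GROUP BY C1, C15 HAVING COUNT(*) > 1"
)

# 2) Delete every copy of those rows from Book1.
#    Assumption: rows sharing a C15 value are identical in all columns.
conn.execute("DELETE FROM Book1 WHERE C15 IN (SELECT C15 FROM my_temp_distinct)")

# 3) Put the single saved copies back and clean up.
conn.execute("INSERT INTO Book1 (C1, C15) SELECT C1, C15 FROM my_temp_distinct")
conn.execute("DROP TABLE my_temp_distinct")
conn.commit()

rows = conn.execute("SELECT C1, C15 FROM Book1 ORDER BY C15").fetchall()
print(rows)  # [('run-a', 1), ('run-b', 2), ('run-c', 3)]
```

Rows that were never duplicated, such as the one with C15 = 2 above, are left untouched by all three steps.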

Source: https://habr.com/ru/post/1275648/

