How to remove duplicate rows in SQL Server 2008?

Question

How to remove duplicate rows in SQL Server 2008?

+3

sql-server-2008

Gold Oct 31 '09 at 19:28


4 answers

Use a CTE (common table expression).

Template:

WITH numbered AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY [dupe-column-list] ORDER BY [dupe-column-list]) AS _dupe_num FROM [table-name] WHERE 1=1
)
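DELETE FROM numbered WHERE _dupe_num > 1;

"dupe-column-list" is the list of columns whose combined values should be unique. The ORDER BY decides which row within each set of duplicates is numbered 1 and is therefore kept. (The "WHERE 1=1" is only a personal preference.)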

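This works because SQL Server keeps an internal reference to the source row for each row in the CTE. When the DELETE executes, it knows exactly which source row to remove, regardless of what else is in the CTE's select list. (To be cautious, you can first change "DELETE" to "SELECT *" and inspect the result.)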
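To make the role of the ORDER BY concrete, here is a minimal sketch. The #contacts table and its columns are hypothetical, invented only for illustration; the assumption is that rows sharing an email count as duplicates and the most recently updated one should survive.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #contacts (email varchar(100), name varchar(50), updated_at datetime);
INSERT INTO #contacts
    VALUES ('a@example.com', 'Ann (old)', '20090101')
         , ('a@example.com', 'Ann (new)', '20091001')
         , ('b@example.com', 'Bob',       '20090601');

-- Within each email, the newest row gets _dupe_num = 1 and is kept.
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY email ORDER BY updated_at DESC) AS _dupe_num
    FROM #contacts
)
DELETE FROM numbered WHERE _dupe_num > 1;

-- 'Ann (new)' and 'Bob' remain.
SELECT email, name, updated_at FROM #contacts;
```

Changing the ORDER BY to updated_at ASC would keep the oldest row instead.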

Example:

CREATE TABLE ##_dupes (col1 int, col2 int, col3 varchar(50));
INSERT INTO ##_dupes 
    VALUES (1, 1, 'one,one')
        , (2, 2, 'two,two')
        , (3, 3, 'three,three')
        , (1, 1, 'one,one')
        , (1, 2, 'one,two')
        , (3, 3, 'three,three')
        , (1, 1, 'one,one')
        , (1, 2, '1,2');

That gives 8 rows with 5 distinct combinations; 3 rows are duplicates. To list the duplicated combinations:

SELECT col1
    , col2
    , col3
    , COUNT(1) AS _total 
    FROM ##_dupes 
    WHERE 1=1 
    GROUP BY col1, col2, col3
    HAVING COUNT(1) > 1
    ORDER BY _total DESC;

Now delete the duplicates, keeping 1 row of each combination:

WITH numbered AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY col1, col2, col3 ORDER BY col1, col2, col3) AS _dupe_num FROM ##_dupes WHERE 1=1
)
DELETE FROM numbered WHERE _dupe_num > 1;

That leaves 5 rows, all of them unique.
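As a cautious variant, the same CTE can be run with a SELECT in place of the DELETE to preview which rows would be removed. This sketch assumes the ##_dupes table from the example above still holds its original 8 rows, i.e. it is run before the DELETE; the extra columns in the select list are added only so the preview is readable.

```sql
-- Preview only: nothing is deleted.
WITH numbered AS (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                              ORDER BY col1, col2, col3) AS _dupe_num
    FROM ##_dupes
)
SELECT col1, col2, col3, _dupe_num
FROM numbered
WHERE _dupe_num > 1;  -- 3 rows: two extra (1, 1, 'one,one') and one extra (3, 3, 'three,three')
```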

+11

Granger 23 . '11 21:14

Without a primary key, you can use the row's physical location (%%physloc%%):

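-- %%physloc%% is an undocumented virtual column holding the row's physical location.
-- The row with the lowest location in each group is kept.
delete from tablename
where tablename.%%physloc%%
      not in (select min(b.%%physloc%%)
              from tablename b
              group by b.column1, b.column2, b.column3);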

+3

Yugndhar Feb 13 '13 at 6:00


Assuming you have a primary key called id and the other columns are col2 ... coln, and by "duplicate" rows you mean rows where every column value except the PK is duplicated:

delete from A where id not in
(select min(id) from A
group by col2, col3, ...coln)

i.e. group by all columns other than the PK.

0

davek Oct 31 '09 at 19:32


AWhitford · Accepted Answer · 2009-10-31T22:07:02+0000

Add a primary key. Seriously, every table should have one. It can be an identity column, and you can otherwise ignore it, but make sure that every single table has a primary key defined.

Imagine you have a table like:

create table T (
    id int identity,
    colA varchar(30) not null,
    colB varchar(30) not null
)

Then you can say something like:

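-- Use the alias in DELETE, and compare with < (not <>) so that the
-- row with the lowest id in each group survives.
delete t1
from T t1
where exists
(select null from T t2
where t2.colA = t1.colA
and t2.colB = t1.colB
and t2.id < t1.id)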

Another trick is to select the distinct records with the minimum id and keep those:

delete T
where id not in
(select min(id) from T
group by colA, colB)

(Sorry, I have not tested them, but one of these ideas may lead you to your solution.)
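As a small worked sketch of the minimum-id approach: #T below is a hypothetical temp-table copy of T, and the sample rows are made up for illustration.

```sql
-- Hypothetical temp copy of T with made-up sample rows.
CREATE TABLE #T (
    id int identity primary key,
    colA varchar(30) not null,
    colB varchar(30) not null
);
INSERT INTO #T (colA, colB)
    VALUES ('a', 'b')
         , ('a', 'b')   -- duplicate
         , ('c', 'd')
         , ('a', 'b');  -- duplicate

-- Keep the lowest id per (colA, colB) combination, delete the rest.
DELETE FROM #T
WHERE id NOT IN (SELECT MIN(id) FROM #T GROUP BY colA, colB);

-- One ('a', 'b') row and the ('c', 'd') row remain.
SELECT id, colA, colB FROM #T ORDER BY id;
```

NOT IN is safe here because id is a non-nullable key; if the subquery could return NULL, NOT IN would match no rows at all.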

Note: if you do not have a primary key, the only way to do this is to use a pseudo-column such as ROWID, but I'm not sure whether SQL Server 2008 offers one.
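One possible way around a missing key, sketched under assumptions: add an identity column first, then reuse the minimum-id delete. The #nokey table and its rows are hypothetical, and GO is the batch separator understood by client tools such as SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical keyless table with made-up rows.
CREATE TABLE #nokey (colA varchar(30) not null, colB varchar(30) not null);
INSERT INTO #nokey VALUES ('a', 'b'), ('a', 'b'), ('c', 'd');

-- Add a surrogate key; existing rows receive identity values automatically.
ALTER TABLE #nokey ADD id int IDENTITY(1,1) NOT NULL PRIMARY KEY;
GO  -- new batch, so the statements below are compiled after the column exists

DELETE FROM #nokey
WHERE id NOT IN (SELECT MIN(id) FROM #nokey GROUP BY colA, colB);

-- One ('a', 'b') row and the ('c', 'd') row remain.
SELECT id, colA, colB FROM #nokey;
```

Which of the duplicate rows ends up with the lowest id is not something to rely on; that does not matter when the rows are identical in every other column.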
