Remove duplicate column combinations from data frame in R

Question

I want to remove duplicate combinations of sessionid, qf and qn from the following data:

                               sessionid      qf       qn            city
    1  9cf571c8faa67cad2aa9ff41f3a26e38     cat   biddix          fresno
    2  e30f853d4e54604fd62858badb68113a   caleb     amos
    3  2ad41134cc285bcc06892fd68a471cd7  daniel  folkers
    4  2ad41134cc285bcc06892fd68a471cd7  daniel  folkers
    5  63a5e839510a647c1ff3b8aed684c2a5 charles   pierce           flint
    6  691df47f2df12f14f000f9a17d1cc40e       j    franz prescott+valley
    7  691df47f2df12f14f000f9a17d1cc40e       j    franz prescott+valley
    8  b3a1476aa37ae4b799495256324a8d3d  carrie mascorro            brea
    9  bd9f1404b313415e7e7b8769376d2705    fred  morales       las+vegas
    10 b50a610292803dc302f24ae507ea853a  aurora      lee
    11 fb74940e6feb0dc61a1b4d09fcbbcb37  andrew    price       yorkville

I read the data into a data.frame called mydata. Below is the code I have so far. First, I need to know how to sort the data.frame correctly. Second, I need to remove duplicate combinations of sessionid, qf, and qn. Finally, I want to plot a histogram of the number of characters in the qf column.

    sortDATA <- function(name) {
      # sort the code by session Id, first name, then last name
      sort1.name <- name[order("sessionid", "qf", "qn"), ]
      # create a vector of the lengths of the first names
      sname <- nchar(sort1.name$qf)
      hist(sname)
    }

thanks!

r dataframe

megv Dec 7 '11 at 20:58

4 answers

In your example, the repeated rows are identical in every column, so unique() is enough: it has a method for data.frames that drops any row identical to an earlier one.

 udf <- unique( my.data.frame )
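One caveat worth checking, sketched below with a small made-up data frame (the values are hypothetical): unique() compares every column, so two rows that share sessionid, qf and qn but differ in city are both kept. Subsetting with duplicated() on just the key columns drops the later one.

```r
# Hypothetical toy data: rows 2 and 3 share the same key but differ in city
toy <- data.frame(
  sessionid = c("s1", "s2", "s2"),
  qf        = c("cat", "daniel", "daniel"),
  qn        = c("biddix", "folkers", "folkers"),
  city      = c("fresno", "flint", "brea"),
  stringsAsFactors = FALSE
)

# unique() looks at all columns, so both "s2" rows survive
by_all_cols <- unique(toy)

# duplicated() on the key columns only keeps the first "s2" row
key_cols <- c("sessionid", "qf", "qn")
by_key <- toy[!duplicated(toy[key_cols]), ]

nrow(by_all_cols)  # 3
nrow(by_key)       # 2
```

For the data in the question both approaches give the same result, because the repeated rows also agree on city.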

As for sorting, joran just posted an answer.

John Dec 7 '11 at 21:18

To solve the sorting problem, first read in your example data:

    dat <- read.table(text = "
       sessionid                        qf      qn       city
    1  9cf571c8faa67cad2aa9ff41f3a26e38 cat     biddix   fresno
    2  e30f853d4e54604fd62858badb68113a caleb   amos     NA
    3  2ad41134cc285bcc06892fd68a471cd7 daniel  folkers  NA
    4  2ad41134cc285bcc06892fd68a471cd7 daniel  folkers  NA
    5  63a5e839510a647c1ff3b8aed684c2a5 charles pierce   flint
    6  691df47f2df12f14f000f9a17d1cc40e j       franz    prescott+valley
    7  691df47f2df12f14f000f9a17d1cc40e j       franz    prescott+valley
    8  b3a1476aa37ae4b799495256324a8d3d carrie  mascorro brea
    9  bd9f1404b313415e7e7b8769376d2705 fred    morales  las+vegas
    10 b50a610292803dc302f24ae507ea853a aurora  lee      NA
    11 fb74940e6feb0dc61a1b4d09fcbbcb37 andrew  price    yorkville
    ", sep = "", header = TRUE)

and then you can use arrange() from the plyr package:

    library(plyr)
    arrange(dat, sessionid, qf, qn)

or, using base R functions:

 with(dat,dat[order(sessionid,qf,qn),])
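For reference, the question's order("sessionid","qf","qn") passes three one-element strings rather than the columns, so order() returns just 1 and the subset keeps only the first row; with() works because it lets order() see the columns by name. If the column names are only available as strings (for example, passed into a function), one base R option is to hand the selected columns to order() through do.call(). A minimal sketch on a small hypothetical data frame; sort_by_names is a hypothetical helper, not part of any package:

```r
# Hypothetical toy data frame, used only to illustrate the sorting calls
toy <- data.frame(
  sessionid = c("b", "a", "b", "a"),
  qf        = c("x", "y", "w", "y"),
  qn        = c("p", "q", "r", "o"),
  stringsAsFactors = FALSE
)

# Three length-one strings are three one-element sort keys,
# so order() returns 1 and toy[1, ] would keep only the first row
order("sessionid", "qf", "qn")  # [1] 1

# Hypothetical helper: sort by columns whose names are held as strings.
# as.list() turns the selected columns into a list of vectors; unname()
# drops the names so a column called e.g. "decreasing" cannot be matched
# to one of order()'s own arguments
sort_by_names <- function(df, cols) {
  keys <- unname(as.list(df[cols]))
  df[do.call(order, keys), , drop = FALSE]
}

sorted <- sort_by_names(toy, c("sessionid", "qf", "qn"))
rownames(sorted)  # "4" "2" "3" "1"
```

The result matches with(toy, toy[order(sessionid, qf, qn), ]); the do.call() form is only useful when the sort columns are not known until run time.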

joran Dec 7 '11 at 21:14

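If you instead want to drop every row whose combination occurs more than once (keeping none of the copies), call duplicated() twice, once from each end: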

    > df
      a  b c    d
    1 1  2 A 1001
    2 2  4 B 1002
    3 3  6 B 1002
    4 4  8 C 1003
    5 5 10 D 1004
    6 6 12 D 1004
    7 7 13 E 1005
    8 8 14 E 1006
    > df[!(duplicated(df[c("c","d")]) | duplicated(df[c("c","d")], fromLast = TRUE)), ]
      a  b c    d
    1 1  2 A 1001
    4 4  8 C 1003
    7 7 13 E 1005
    8 8 14 E 1006
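To make the difference between the variants explicit, here is a small sketch on the same data frame as in this answer (re-created in code), comparing keep-first, keep-last and drop-all on the c/d combination:

```r
# Same data as in the answer above
df <- data.frame(a = 1:8,
                 b = c(2, 4, 6, 8, 10, 12, 13, 14),
                 c = c("A", "B", "B", "C", "D", "D", "E", "E"),
                 d = c(1001, 1002, 1002, 1003, 1004, 1004, 1005, 1006))

key <- df[c("c", "d")]

# Default: flag later copies, so the first of each combination is kept
keep_first <- df[!duplicated(key), ]                   # rows 1, 2, 4, 5, 7, 8
# fromLast = TRUE flags earlier copies, so the last one is kept
keep_last  <- df[!duplicated(key, fromLast = TRUE), ]  # rows 1, 3, 4, 6, 7, 8
# OR-ing both flags every row whose combination occurs more than once
drop_all   <- df[!(duplicated(key) |
                   duplicated(key, fromLast = TRUE)), ] # rows 1, 4, 7, 8
```

For the question, which asks to remove repeated combinations rather than every copy of them, keep_first is usually the variant wanted.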

Prakhar agarwal Jun 22 '16 at 14:13

Josh o'brien · Accepted Answer · 2011-12-07T21:07:11+0000

duplicated() has a method for data.frames that is designed for exactly this kind of task:

    df <- data.frame(a = c(1:4, 1:4),
                     b = c(4:1, 4:1),
                     d = LETTERS[1:8])

    df[!duplicated(df[c("a", "b")]),]
    #   a b d
    # 1 1 4 A
    # 2 2 3 B
    # 3 3 2 C
    # 4 4 1 D
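Putting the answers together, a minimal sketch of the three steps from the question (sort, drop repeated sessionid/qf/qn combinations, histogram of first-name lengths) might look like this. It assumes the columns shown in the question; the data is re-typed from the question with the empty city values entered as NA, and clean_and_plot is a hypothetical name:

```r
mydata <- data.frame(
  sessionid = c("9cf571c8faa67cad2aa9ff41f3a26e38", "e30f853d4e54604fd62858badb68113a",
                "2ad41134cc285bcc06892fd68a471cd7", "2ad41134cc285bcc06892fd68a471cd7",
                "63a5e839510a647c1ff3b8aed684c2a5", "691df47f2df12f14f000f9a17d1cc40e",
                "691df47f2df12f14f000f9a17d1cc40e", "b3a1476aa37ae4b799495256324a8d3d",
                "bd9f1404b313415e7e7b8769376d2705", "b50a610292803dc302f24ae507ea853a",
                "fb74940e6feb0dc61a1b4d09fcbbcb37"),
  qf   = c("cat", "caleb", "daniel", "daniel", "charles", "j", "j",
           "carrie", "fred", "aurora", "andrew"),
  qn   = c("biddix", "amos", "folkers", "folkers", "pierce", "franz", "franz",
           "mascorro", "morales", "lee", "price"),
  city = c("fresno", NA, NA, NA, "flint", "prescott+valley", "prescott+valley",
           "brea", "las+vegas", NA, "yorkville"),
  stringsAsFactors = FALSE
)

# Hypothetical helper combining the three steps from the question
clean_and_plot <- function(df) {
  # 1. Sort by session id, then first name, then last name
  #    (order() leaves exact ties in their original order)
  sorted <- df[order(df$sessionid, df$qf, df$qn), ]
  # 2. Keep only the first row of each sessionid/qf/qn combination
  deduped <- sorted[!duplicated(sorted[c("sessionid", "qf", "qn")]), ]
  # 3. Histogram of the number of characters in each first name
  hist(nchar(deduped$qf),
       main = "Length of first names (qf)",
       xlab = "Number of characters")
  deduped
}

clean <- clean_and_plot(mydata)
nrow(clean)  # 9: rows 4 and 7 repeated rows 3 and 6
```

Deduplicating after sorting keeps the first occurrence in sorted order; since the repeated rows here are identical, it makes no difference which copy survives.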
