Summarizing a data frame while ignoring duplicates

I have a data frame in which one column contains duplicate records. I want to summarize the other columns based on this column, counting each unique record once rather than counting every row. For example, in the data frame below, to answer the question "how many respondents are young, midage and old?", "RefID" 1-1 should count as 1 toward "ageclass" = young, not as 5.

RefID   Altitude    Sex ageclass
1-1 Low F   young
1-1 Low F   young
1-1 Low F   young
1-1 Low F   young
1-1 Low F   young
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-7 Low F   old
1-7 Low F   old
1-7 Low F   old
1-7 Low F   old
1-8 Low F   old
1-8 Low F   old
1-9 Low F   old
1-9 Low F   old
1-9 Low F   old

Thanks.

+3
3 answers

To get the unique rows of a data frame, see ?unique:

Data <- unique(Mydata)

You can then summarize each age class with by():

by(Data,Data$ageclass,summary)

See ?summary for details. If you only want counts, use table:

table(Data$RefID,Data$ageclass)

To get the totals per age class, take the column margin:

margin.table(table(Data$RefID,Data$ageclass),margin=2)

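Note that unique() compares whole rows. If RefID 1-1 has different values in some other column, it is still counted more than once. To avoid that, keep only the relevant columns first: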

with(unique(Data[c(1,4)]),margin.table(table(RefID,ageclass),margin=2))
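As a rough, self-contained sketch of the same idea, the snippet below rebuilds the question's table from its row counts per RefID and then counts each RefID once per age class. The helper names `reps`, `ages`, `pairs` and `counts` are illustrative and not part of the original answer; columns are selected by name instead of by position, which is an assumption about what reads more safely.

```r
# Rows per RefID, as they appear in the question's table (63 rows in total).
reps <- c("1-1" = 5, "1-2" = 12, "1-3" = 18, "1-4" = 12,
          "1-5" = 7, "1-7" = 4, "1-8" = 2, "1-9" = 3)
# Age class of each RefID, in the same order as reps.
ages <- c("young", "midage", rep("old", 6))

Data <- data.frame(
  RefID    = rep(names(reps), times = reps),
  Altitude = "Low",
  Sex      = "F",
  ageclass = rep(ages, times = reps),
  stringsAsFactors = FALSE
)

# Keep one row per RefID/ageclass pair, then count RefIDs per age class.
pairs  <- unique(Data[c("RefID", "ageclass")])
counts <- table(pairs$ageclass)
print(counts)
```

Selecting only the two columns before calling unique() means differences in Altitude or Sex cannot inflate the count.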

The plyr package offers another way to do this.

+2

The plyr package handles this well. For example:

> require(plyr)
> ddply( df, .(ageclass), summarise, Num = length(unique(RefID)))
  ageclass Num
1   midage   1
2      old   6
3    young   1
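If plyr is not available, the same count of distinct RefID values per age class can probably be obtained in base R with tapply. This is a minimal sketch on a small made-up data frame, not the question's data; the name `num` is illustrative.

```r
# Small made-up example: RefID repeats within each age class.
df <- data.frame(
  RefID    = c("1-1", "1-1", "1-2", "1-2", "1-3", "1-4"),
  ageclass = c("young", "young", "midage", "midage", "old", "old"),
  stringsAsFactors = FALSE
)

# For each age class, count how many distinct RefID values occur.
num <- tapply(df$RefID, df$ageclass, function(ids) length(unique(ids)))
print(num)
```

Here `num` is a named vector with one entry per age class, playing the role of the Num column in the ddply output.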
+2

You can also combine subset with duplicated to keep only the first row for each ID. Using an example data frame:

df <- data.frame(
   ID=rep(1:5,each=5),
   attitude="low",
   sex=c(rep("F",10),rep("M",15)),
   age=c(rep("young",5),rep("middle",10),rep("old",10)),
   stringsAsFactors=TRUE  # needed in R >= 4.0 so summary() counts factor levels
   )

Then create a subset that keeps only the first occurrence of each ID:

df.sub <- subset(df,!duplicated(df$ID))
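To see why this works, it may help to look at what duplicated() returns on its own. The vector `ids` below is a made-up example, and `dup` and `first_rows` are illustrative names.

```r
# duplicated() marks every element that has already appeared earlier.
ids <- c(1, 1, 2, 3, 3, 3)
dup <- duplicated(ids)
print(dup)

# Negating it selects the first occurrence of each value.
first_rows <- which(!dup)
print(first_rows)
```

Passing `!duplicated(df$ID)` to subset therefore keeps exactly one row per ID, namely the first one encountered.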

Then you can summarize:

> summary(df.sub$age)
middle    old  young 
     2      2      1 
0

Source: https://habr.com/ru/post/1792511/

