Add a column to the data frame that shows the frequency of the variable

In R, little things always bother me.

Say I have a data frame like this:

      location species
    1  seattle       A
    2  buffalo       C
    3  seattle       D
    4   newark       J
    5   boston       Q

I would like to add a column to this frame that shows the number of times a location appears in the dataset, with a result similar to this:

      location species freq-loc
    1  seattle       A        2   # there are 2 entries with location=seattle
    2  buffalo       C        1   # there is 1 entry with location=buffalo
    3  seattle       D        2
    4   newark       J        1
    5   boston       Q        1

I know that using table(data$location) can give me a contingency table. But I don't know how to match each value in the table with the corresponding record in the data frame. Can anyone help?

Update

Thank you so much for your help! Just for fun, I ran a benchmark to see how the merge, plyr and ave solutions compare with each other. The test set is a 10,000-row subset of my original dataset, which is of size 10 by ~7 mil:

    Unit: milliseconds
      expr        min         lq     median        uq       max neval
     MERGE 110.877337 111.989406 112.585420 113.51679 120.23588   100
      PLYR  26.305645  27.080403  27.576580  27.87157  68.40763   100
       AVE   2.994528   3.117255   3.179898   3.35834  10.02955   100
4 answers

I'm sure someone will soon post an (ugly ;)) ave or plyr solution, but here is a data.table one:

    library(data.table)
    dt = data.table(your_df)
    dt[, `freq-loc` := .N, by = location]
    # note: using `-quotes around your var name, because of the "-" in the name

Here's the base R way with ave:

 transform(d, freq.loc = ave(seq(nrow(d)), location, FUN=length)) 
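For anyone who wants to try this on the sample data from the question, here is a minimal self-contained sketch. The data frame is rebuilt by hand, and the result name `d2` is only an illustrative choice.

```r
# Sample data from the question; `d` matches the name used in the snippet above
d <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group of `location` and returns a vector
# as long as its first argument, aligned with the original rows
d2 <- transform(d, freq.loc = ave(seq(nrow(d)), location, FUN = length))
print(d2)
```

The row order of `d` is preserved, so the new column lines up with the original records.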

Trying to work with dashes in column names will be very painful. It is better to use underscores or "periods".

    dat$freq_loc <- ave(as.numeric(dat[[1]]), dat[["location"]], FUN = length)

I tried using ave without as.numeric on the first column, but to my surprise I got cryptic messages related to factor levels.
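A likely explanation (my reading of how ave behaves, not something stated in this answer) is that ave writes the per-group results back into its first argument, so a factor there cannot hold the counts. A small sketch, assuming the location column is a factor, shows two first arguments that avoid the problem. The names `loc`, `freq_idx` and `freq_num` are made up for illustration.

```r
# Assumption: location is stored as a factor, as it would be by default
# in older versions of R
loc <- factor(c("seattle", "buffalo", "seattle", "newark", "boston"))

# Integer positions are a neutral first argument; only the group sizes matter
freq_idx <- ave(seq_along(loc), loc, FUN = length)

# Converting the factor to its numeric codes, as above, gives the same counts
freq_num <- ave(as.numeric(loc), loc, FUN = length)

print(freq_idx)
print(freq_num)
```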


Merge:

    merge(data, data.frame(table(location = data$location)), by = c("location"))
    #   location species Freq
    # 1   boston       Q    1
    # 2  buffalo       C    1
    # 3   newark       J    1
    # 4  seattle       A    2
    # 5  seattle       D    2

Also, I heard the request for plyr:

    library(plyr)
    join(data, data.frame(table(location = data$location)))
    # Joining by: location
    #   location species Freq
    # 1  seattle       A    2
    # 2  buffalo       C    1
    # 3  seattle       D    2
    # 4   newark       J    1
    # 5   boston       Q    1
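Both variants build a count table and join it back onto the data. As a sketch of the "match each table value to its record" step the question asks about, the same table can also be indexed by name without a join. This is an alternative I am adding for illustration, not part of the answer, and the name `counts` is made up.

```r
# Sample data from the question
data <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# One count per distinct location, named by location
counts <- table(data$location)

# Look up each row's location by name; as.vector() drops the table attributes
data$Freq <- as.vector(counts[as.character(data$location)])
print(data)
```

Unlike merge, this keeps the original row order.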

Source: https://habr.com/ru/post/1485489/

