Finding the majority vote per item in R

I need to compute the majority vote for each item in R, and I have no idea how to approach this.

I have a data frame of items and the categories assigned to them. For each item I need the category that was assigned most often. How can I do this?

Data frame:

```
item category
1    2
1    3
1    2
1    2
2    2
2    3
2    1
2    1
```

The result should be:

```
item majority_vote
1    2
2    1
```
4 answers

There are two pieces to this. First, here is how to get the most frequent element of a vector:

```r
v <- c(1, 1, 1, 2, 2)
names(which.max(table(v)))
#> [1] "1"
```

This returns the value as a character string (the names of the table), but we can easily convert it with as.numeric if needed.
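If you would rather get the value back in its original type instead of round-tripping through names(), a common base-R idiom matches each element against the unique values and counts the matches with tabulate. This is a minimal sketch; most_frequent is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper (not from any package): most frequent value of a vector,
# returned in the vector's own type rather than as a character string.
most_frequent <- function(x) {
  ux <- unique(x)                                        # candidates, in order of first appearance
  counts <- tabulate(match(x, ux), nbins = length(ux))   # number of votes per candidate
  ux[which.max(counts)]                                  # first candidate with the highest count
}

most_frequent(c(1, 1, 1, 2, 2))   # numeric 1, no as.numeric() needed
most_frequent(c("b", "a", "a"))   # "a"
```

Note the tie-breaking differs from the table() version: here a tie goes to the value that appears first in x, whereas table() sorts the categories, so which.max picks the one that sorts first.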

Once we can do that, we can use the grouping functionality of the data.table package to find the most common category for each item. Here is the code for your example:

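```r
library(data.table)

dt <- data.table(item     = c(1, 1, 1, 1, 2, 2, 2, 2),
                 category = c(2, 3, 2, 2, 2, 3, 1, 1))
dt
#>    item category
#> 1:    1        2
#> 2:    1        3
#> 3:    1        2
#> 4:    1        2
#> 5:    2        2
#> 6:    2        3
#> 7:    2        1
#> 8:    2        1

# Most frequent category per item; as.numeric turns the table name back into a number
dt[, as.numeric(names(which.max(table(category)))), by = item]
#>    item V1
#> 1:    1  2
#> 2:    2  1
```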

The new column V1 holds the most common category for each item, as a number. If you want to give the column a proper name, the syntax is a little uglier:

```r
dt[, list(mostFreqCat = as.numeric(names(which.max(table(category))))), by = item]
#>    item mostFreqCat
#> 1:    1           2
#> 2:    2           1
```
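One thing worth checking with this expression is what happens on a tie: which.max returns the first maximum, and table() lists the categories in sorted order, so a tie is silently resolved in favour of the category that sorts first. A quick check on made-up data, using base R's tapply instead of data.table so it runs without extra packages (votes and winner are illustrative names):

```r
# Made-up data: item 1 has a tie, categories 1 and 3 both get two votes.
votes <- data.frame(item     = c(1, 1, 1, 1, 2, 2, 2),
                    category = c(3, 1, 3, 1, 2, 2, 5))

# Same per-group expression as inside dt[...], applied with tapply.
winner <- tapply(votes$category, votes$item,
                 function(x) as.numeric(names(which.max(table(x)))))
winner   # item 1 -> 1 (the smaller of the tied categories), item 2 -> 2
```

If ties matter for your data, detect them explicitly rather than relying on this ordering.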

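A one-liner using plyr:

```r
library(plyr)

# tabulate() counts positive integers, so this assumes the categories are
# integer codes 1, 2, ...; which.max then returns the winning code itself.
ddply(dt, .(item), function(x) which.max(tabulate(x$category)))
```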
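With base R's tapply (dat is the question's data frame):

```r
# For each item, sort the category counts and keep the name of the top one.
# names() is needed: without it the result is the vote count, not the category.
tdat <- tapply(dat$category, dat$item,
               function(vec) names(sort(table(vec), decreasing = TRUE))[1])
data.frame(item = rownames(tdat), plurality_vote = tdat)
#>   item plurality_vote
#> 1    1              2
#> 2    2              1
```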

Distinguishing a plurality (the most votes, possibly tied) from a true majority (more than half of the votes) requires a more complex function.
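Such a function might look like the sketch below. It is only one way to do it, under the assumption that a tie for first place should yield no winner; vote_winner and its require_majority argument are hypothetical names. With the question's data it also shows that item 2's winner (category 1, with 2 of 4 votes) is a plurality but not a strict majority.

```r
# Hypothetical helper (not from any package): winning category of one item, or NA.
# NA is returned on a tie for first place, and also when require_majority = TRUE
# and the leader has no strict majority (more than half of the non-NA votes).
vote_winner <- function(x, require_majority = TRUE) {
  counts <- table(x)                           # votes per category; NAs are dropped
  if (length(counts) == 0) return(NA_character_)
  best <- max(counts)
  leaders <- names(counts)[counts == best]     # every category tied for first place
  if (length(leaders) > 1) return(NA_character_)
  if (require_majority && best <= sum(counts) / 2) return(NA_character_)
  leaders                                      # the single winner, as a string
}

dat <- data.frame(item     = c(1, 1, 1, 1, 2, 2, 2, 2),
                  category = c(2, 3, 2, 2, 2, 3, 1, 1))
majority  <- tapply(dat$category, dat$item, vote_winner)
plurality <- tapply(dat$category, dat$item, vote_winner, require_majority = FALSE)
majority    # item 1 -> "2", item 2 -> NA
plurality   # item 1 -> "2", item 2 -> "1"
```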


If you have a function that computes the mode, such as Mode in the prettyR package, you can use aggregate:

```r
library(prettyR)

# d is the question's data frame (columns item and category)
aggregate(d$category, by = list(item = d$item), FUN = Mode)
#>   item x
#> 1    1 2
#> 2    2 1
```
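If you would rather not add a package dependency, any base-R mode function can be plugged into aggregate in the same way. This sketch uses the formula interface, which also names the result column after the original variable; mode_of is a hypothetical stand-in, not prettyR's implementation.

```r
d <- data.frame(item     = c(1, 1, 1, 1, 2, 2, 2, 2),
                category = c(2, 3, 2, 2, 2, 3, 1, 1))

# Hypothetical stand-in for a Mode() function: most frequent value as a number
# (ties go to the category that sorts first, as with table()).
mode_of <- function(x) as.numeric(names(which.max(table(x))))

res <- aggregate(category ~ item, data = d, FUN = mode_of)
res   # one row per item: item 1 -> 2, item 2 -> 1
```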

Source: https://habr.com/ru/post/1487159/

