Most votes in R

Question

I need to calculate the majority of the votes for an element in R, and I have no clue how to approach this.

I have a data frame with elements and assigned categories. I need the category that has been assigned most often. How can I do it?

Data frame:

    item category
    1    2
    1    3
    1    2
    1    2
    2    2
    2    3
    2    1
    2    1

The result should be:

    item majority_vote
    1    2
    2    1

+4

r

nantoki Jun 19 '13 at 21:33

4 answers

One liner (using plyr ):

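    library(plyr)
    # tabulate() counts positive integers, so this assumes category codes are 1, 2, 3, ...
    ddply(dt, .(item), function(x) which.max(tabulate(x$category)))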

+3

topchef Jun 20 '13 at 4:47

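    # names(...)[1] returns the winning category; indexing the sorted table alone returns its count
    tdat <- tapply(dat$category, dat$item,
                   function(vec) names(sort(table(vec), decreasing=TRUE))[1])
    data.frame(item=rownames(tdat), plurality_vote=tdat)
      item plurality_vote
    1    1              2
    2    2              1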

A more complex function is needed to distinguish a plurality (possibly with ties) from a true majority.
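One way to separate a plurality from a true majority can be sketched in base R. The helper name `vote_summary` and its output columns are hypothetical, chosen only for illustration. It treats a "majority" as strictly more than half of the non-missing votes and reports ties for first place instead of picking a winner.

```r
# Hypothetical helper: summarise the votes for a single item.
vote_summary <- function(vec) {
  counts <- table(vec)                          # NA votes are dropped by table()
  top <- names(counts)[counts == max(counts)]   # every category tied for first place
  data.frame(
    winner   = if (length(top) == 1) top else NA_character_,
    tied     = length(top) > 1,
    majority = length(top) == 1 && max(counts) > sum(counts) / 2,
    stringsAsFactors = FALSE
  )
}

# Data from the question
dat <- data.frame(item = c(1, 1, 1, 1, 2, 2, 2, 2),
                  category = c(2, 3, 2, 2, 2, 3, 1, 1))

groups <- split(dat$category, dat$item)
res <- do.call(rbind, lapply(groups, vote_summary))
res <- data.frame(item = names(groups), res, row.names = NULL,
                  stringsAsFactors = FALSE)
res
```

With the question's data, item 1 has a true majority for category 2 (3 of 4 votes), while item 2 only has a plurality for category 1 (2 of 4 votes).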

+1

42- Jun 19 '13 at 22:11

If you have a function to calculate the mode, as in the prettyR package, you can use aggregate :

    require(prettyR)
    aggregate(d$category, by=list(item=d$item), FUN=Mode)
    #  item x
    #1    1 2
    #2    2 1
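If prettyR is not available, the same `aggregate` call should work with a small hand-written mode function. The sketch below is an assumption rather than prettyR's implementation: `most_frequent` is a hypothetical name, and on ties it simply returns the value that appears first in the data.

```r
# Hypothetical stand-in for a mode function: most frequent value of x.
# On ties, the value that occurs first in x wins.
most_frequent <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Data from the question
d <- data.frame(item = c(1, 1, 1, 1, 2, 2, 2, 2),
                category = c(2, 3, 2, 2, 2, 3, 1, 1))

res <- aggregate(d$category, by = list(item = d$item), FUN = most_frequent)
res
```

Unlike `tabulate(x$category)` on its own, matching against `unique(x)` does not require the categories to be positive integers.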

+1

Ferdinand.kraft Jun 20 '13 at 1:31

asieira · Accepted Answer · 2013-06-19T21:51:49+0000

You can use two things here. Firstly, this is how you get the most frequent element in a vector:

    > v = c(1,1,1,2,2)
    > names(which.max(table(v)))
    [1] "1"

The result is a character value, but we can easily apply as.numeric to it if necessary.
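One caveat worth keeping in mind: `which.max` returns the position of the first maximum, and `table` orders its entries by sorted value, so a tie is resolved silently in favour of the smallest category. A minimal illustration with a made-up vector:

```r
# Made-up votes: categories 1 and 3 both appear twice
v_tied <- c(3, 3, 1, 1, 2)

# table() orders entries as 1, 2, 3; which.max() picks the first maximum
winner <- as.numeric(names(which.max(table(v_tied))))
winner
```

Here category 1 is reported even though category 3 has the same count, so ties need separate handling if they matter.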

Once we know how to do this, we can use the grouping functionality of the data.table package to find the most common category for each item. Here is the code for your example above:

    > library(data.table)
    > dt = data.table(item=c(1,1,1,1,2,2,2,2), category=c(2,3,2,2,2,3,1,1))
    > dt
       item category
    1:    1        2
    2:    1        3
    3:    1        2
    4:    1        2
    5:    2        2
    6:    2        3
    7:    2        1
    8:    2        1
    > dt[,as.numeric(names(which.max(table(category)))),by=item]
       item V1
    1:    1  2
    2:    2  1

The new column V1 contains a numeric version of the most common category for each item. If you want to give it a meaningful name, the syntax is a little uglier:

    > dt[,list(mostFreqCat=as.numeric(names(which.max(table(category))))),by=item]
       item mostFreqCat
    1:    1           2
    2:    2           1
