Plotting the top 5 values from a table in R

Question

I am very new to R, so this might be a simple question. I have a data table containing frequency counts like these:

 Acidobacteria 47
 Actinobacteria 497
 Apicomplexa 7
 Aquificae 16
 Arthropoda 26
 Ascomycota 101
 Bacillariophyta 1
 Bacteroidetes 50279
 ...

The table has about 50 species. As you can see, some of the values are much larger than others. I would like a stacked barplot showing the top 5 species by percentage plus one "other" category that holds the sum of all the remaining percentages. So my barplot would have only 6 categories (the top 5 and "other").

I have 3 additional data sets (sample sites) that I would like to treat the same way, keeping only the first data set's top 5 species in each of them, and put them all on the same chart. The final chart would have four stacked bars showing how the top species in the first data set change in each additional data set.

I made a sample plot manually (I tabulated the data outside of R and just loaded the final percentage table) to give you an idea of what I'm looking for: http://dl.dropbox.com/u/1938620/phylumSum2.jpg

I would like to put these steps into an R script so that I can create these graphs for many data sets.

Thanks!

+6

r plot

helicase Sep 7 '11 at 18:36

2 answers

We should make it a habit to use data.table wherever possible:

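 library(data.table)
 DT <- data.table(DF, key="Count")
 # Relabel every species ranked below the top 5 by Count as "Other"
 # (assumes the table has more than 5 rows)
 DT[order(-rank(Count), Species)[6:nrow(DT)], Species:="Other"]
 # Total count and share of the grand total per species
 DT <- DT[, list(Count=sum(Count), Pcnt=sum(Count)/DT[,sum(Count)]), by="Species"]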

+1

andrekos Nov 29 '13 at 1:58

Brian diggs · Accepted Answer · 2011-09-07T18:57:33+0000

Say your data is in the data.frame DF:

 DF <- read.table(textConnection(
 "Acidobacteria 47
 Actinobacteria 497
 Apicomplexa 7
 Aquificae 16
 Arthropoda 26
 Ascomycota 101
 Bacillariophyta 1
 Bacteroidetes 50279"), stringsAsFactors=FALSE)
 names(DF) <- c("Species","Count")

Then you can determine which species are in the top 5 with

 top5Species <- DF[rev(order(DF$Count)),"Species"][1:5]

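Each of the data sets can then be reduced to these 5 species and "Other" using

 library(plyr)  # provides ddply(), summarise and .()
 # Label each species as one of the top 5 or "Other"
 DF$Group <- ifelse(DF$Species %in% top5Species, DF$Species, "Other")
 # Fix the level order: top 5 from largest to smallest, then "Other"
 DF$Group <- factor(DF$Group, levels=c(top5Species, "Other"))
 # Sum the counts per group and convert them to proportions
 DF.summary <- ddply(DF, .(Group), summarise, total=sum(Count))
 DF.summary$prop <- DF.summary$total / sum(DF.summary$total)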

Making Group a factor keeps the groups in the same order in DF.summary (from the largest to the smallest in the first data set).
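If plyr is not available, the summary step can be approximated in base R. The following is only a sketch that swaps ddply() for aggregate(); it rebuilds the eight sample rows from the question so that it runs on its own, and it assumes the same column names (Species, Count) as above.

```r
# Sample data from the question (first eight rows only).
DF <- data.frame(
  Species = c("Acidobacteria", "Actinobacteria", "Apicomplexa", "Aquificae",
              "Arthropoda", "Ascomycota", "Bacillariophyta", "Bacteroidetes"),
  Count = c(47, 497, 7, 16, 26, 101, 1, 50279),
  stringsAsFactors = FALSE
)

# Same grouping as in the answer: top 5 species by Count, everything else "Other".
top5Species <- DF[rev(order(DF$Count)), "Species"][1:5]
DF$Group <- ifelse(DF$Species %in% top5Species, DF$Species, "Other")
DF$Group <- factor(DF$Group, levels = c(top5Species, "Other"))

# aggregate() sums Count within each Group; rows come back in factor level order.
DF.summary <- aggregate(Count ~ Group, data = DF, FUN = sum)
names(DF.summary)[2] <- "total"
DF.summary$prop <- DF.summary$total / sum(DF.summary$total)

print(DF.summary)
```

One difference to keep in mind: aggregate() drops groups that have no rows, so a top-5 species that is missing from a later data set would not appear in that data set's summary.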

Then you simply stack them and plot them in the same way as in your example.
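The stacking and plotting step is not spelled out above, so here is one possible sketch in base R. The second site and its counts are made up purely for illustration, groupProportions() is a hypothetical helper name, and barplot() is just one way to draw the stacked bars; the top 5 species are taken from the first data set only, as the question asks.

```r
# First site: the sample rows from the question.
site1 <- data.frame(
  Species = c("Acidobacteria", "Actinobacteria", "Apicomplexa", "Aquificae",
              "Arthropoda", "Ascomycota", "Bacillariophyta", "Bacteroidetes"),
  Count = c(47, 497, 7, 16, 26, 101, 1, 50279),
  stringsAsFactors = FALSE
)
# Second site: hypothetical counts, invented for this example.
site2 <- data.frame(
  Species = site1$Species,
  Count = c(120, 300, 2, 40, 10, 500, 5, 9000),
  stringsAsFactors = FALSE
)

# The top 5 species are chosen from the first data set and reused for all sites.
top5Species <- site1[rev(order(site1$Count)), "Species"][1:5]
groupLevels <- c(top5Species, "Other")

# Hypothetical helper: proportion of each group within one data set.
# A group with no rows in a data set gets a proportion of zero.
groupProportions <- function(df) {
  group <- ifelse(df$Species %in% top5Species, df$Species, "Other")
  totals <- vapply(groupLevels,
                   function(g) sum(df$Count[group == g]),
                   numeric(1))
  totals / sum(totals)
}

# Matrix with one row per group and one column per site.
props <- sapply(list(Site1 = site1, Site2 = site2), groupProportions)

# barplot() stacks the rows of a matrix within each column by default.
barplot(props, legend.text = rownames(props),
        ylab = "Proportion of counts",
        args.legend = list(x = "topright"))
```

With the four real data sets, the list passed to sapply() would simply hold four data frames instead of two.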
