Creating a Pareto chart with ggplot2 and R

I have been struggling with how to make a Pareto chart in R using the ggplot2 package. In many cases, when making a bar chart or a histogram, we want the items ordered along the X axis. In a Pareto chart, we want the items sorted in descending order of their value on the Y axis. Is there a way to get ggplot to plot the items ordered by their Y value? I first tried sorting the data frame, but ggplot seems to reorder them.

Example:

    library(ggplot2)
    val <- read.csv("http://www.cerebralmastication.com/wp-content/uploads/2009/11/val.txt")
    val <- with(val, val[order(-Value), ])
    p <- ggplot(val)
    p + geom_bar(aes(State, Value, fill = variable), stat = "identity", position = "dodge") +
      scale_fill_brewer(palette = "Set1")

The val data frame is sorted, but the output looks like this:

[Image: the resulting bar chart (source: cerebralmastication.com)]

Hadley correctly pointed out that this provides a much better graph for displaying actual and forecasted values:

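    # Older ggplot2 releases accepted subset = .(variable == "estimate") inside a layer;
    # current releases take the subset through the layer's data argument instead.
    ggplot(val, aes(State, Value)) +
      geom_bar(data = subset(val, variable == "estimate"), stat = "identity", fill = "grey70") +
      geom_crossbar(data = subset(val, variable == "actual"), aes(ymin = Value, ymax = Value))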

which returns:

[Image: the resulting plot (source: cerebralmastication.com)]

But this is still not a Pareto chart. Any tips?

+19
r graph ggplot2
Nov 14 '09 at 20:46
7 answers

The bars in ggplot2 are ordered by the order of the levels in the factor.

    # unique() guards against duplicated levels when a State has one row per variable
    val$State <- factor(val$State, levels = unique(val$State[order(-val$Value)]))
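
A small illustration of the point about factor levels, as a sketch on made-up data. The `State` and `Value` column names mirror the question, but the four rows below are hypothetical and are not taken from val.txt.

```r
# Hypothetical data for illustration only (not the val.txt file from the question).
df <- data.frame(
  State = c("AL", "CA", "NY", "TX"),
  Value = c(0.2, 0.9, 0.5, 0.7),
  stringsAsFactors = FALSE
)

# By default the levels are sorted alphabetically,
# which is the order a discrete axis would use.
default_levels <- levels(factor(df$State))

# Setting the levels in descending order of Value changes that order.
df$State <- factor(df$State, levels = df$State[order(-df$Value)])
sorted_levels <- levels(df$State)

print(default_levels)
print(sorted_levels)
```

Sorting the rows of the data frame alone would not change `levels(df$State)`, which is why the sorted data frame in the question still plots alphabetically.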
+15
Nov 15 '09 at 0:37

First, subset and sort your data:

    valact <- subset(val, variable == 'actual')
    valsort <- valact[order(-valact[, "Value"]), ]

From there it is just a standard barplot() with a very manual cumulative line on top:

    op <- par(mar = c(3, 3, 3, 3))
    bp <- barplot(valsort[, "Value"], ylab = "", xlab = "", ylim = c(0, 1),
                  names.arg = as.character(valsort[, "State"]), main = "How that?")
    lines(bp, cumsum(valsort[, "Value"]) / sum(valsort[, "Value"]),
          ylim = c(0, 1.05), col = 'red')
    axis(4)
    box()
    par(op)

which should look like this

[Image: the resulting plot (source: eddelbuettel.com)]

and it doesn't even need an overplotting trick, since lines() happily annotates the initial plot.

+23
Nov 14 '09 at 21:20

A traditional Pareto chart in ggplot2.

Developed after reading Cano, EL, Moguerza, JM, and Redchuk, A. (2012). Six Sigma with R. (G. Robert, K. Hornik and G. Parmigiani, Ed.) Springer.

    library(ggplot2); library(grid)

    counts  <- c(80, 27, 66, 94, 33)
    defects <- c("price code", "schedule date", "supplier code", "contact num.", "part num.")
    dat <- data.frame(count = counts, defect = defects, stringsAsFactors = FALSE)

    # Sort by count and fix the factor levels in that order
    dat <- dat[order(dat$count, decreasing = TRUE), ]
    dat$defect <- factor(dat$defect, levels = dat$defect)
    dat$cum <- cumsum(dat$count)
    count.sum <- sum(dat$count)
    dat$cum_perc <- 100 * dat$cum / count.sum

    # Top panel: cumulative percentage
    p1 <- ggplot(dat, aes(x = defect, y = cum_perc, group = 1))
    p1 <- p1 + geom_point(aes(colour = defect), size = 4) + geom_path()
    p1 <- p1 + ggtitle('Pareto Chart') +
      theme(axis.ticks.x = element_blank(), axis.title.x = element_blank(),
            axis.text.x = element_blank())
    p1 <- p1 + theme(legend.position = "none")

    # Bottom panel: counts (stat = "identity" because y already holds the bar heights)
    p2 <- ggplot(dat, aes(x = defect, y = count, colour = defect, fill = defect))
    p2 <- p2 + geom_bar(stat = "identity")
    p2 <- p2 + theme(legend.position = "none")

    # Stack the two panels using grid viewports
    plot.new()
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(2, 1)))
    print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
    print(p2, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
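
To make the cumulative percentage concrete, here is the same arithmetic in base R on the five defect counts from this answer, without any plotting. The variable names `sorted_counts`, `cum` and `cum_perc` are chosen here for illustration.

```r
# The five defect counts from the answer above.
counts <- c(80, 27, 66, 94, 33)

# A Pareto chart sorts the categories from largest to smallest first.
sorted_counts <- sort(counts, decreasing = TRUE)

# Running total, then expressed as a percentage of the grand total (300).
cum <- cumsum(sorted_counts)
cum_perc <- 100 * cum / sum(sorted_counts)

print(sorted_counts)  # 94 80 66 33 27
print(cum)            # 94 174 240 273 300
print(round(cum_perc, 2))
```

The three largest categories already account for 80% of all defects, which is the kind of reading a Pareto chart is meant to support.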
+7
Oct 11 '12 at 12:28

With a simple example:

    > data
        PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9    PC10
    0.29056 0.23833 0.11003 0.05549 0.04678 0.03788 0.02770 0.02323 0.02211 0.01925

barplot(data) does everything right

ggplot equivalent "should be": qplot(x=names(data), y=data, geom='bar')

But this incorrectly reorders / sorts the columns in alphabetical order ... because levels(factor(names(data))) will be ordered.

Solution: qplot(x=factor(names(data), levels=names(data)), y=data, geom='bar')

Phew!

+4
Mar 30 '10 at 18:47

Also see the qcc package, which has a pareto.chart() function. It looks like it also uses base graphics, so start your bounty for a ggplot2 solution :-)

+3
Nov 14 '09 at 22:03

To simplify things, let's consider just the estimates.

 estimates <- subset(val, variable == "estimate") 

First, we reorder the factor levels so that State is displayed in descending order of Value.

 estimates$State <- with(estimates, reorder(State, -Value)) 

Similarly, we reorder the data set and compute the cumulative value.

    estimates <- estimates[order(estimates$Value, decreasing = TRUE), ]
    estimates$cumulative <- cumsum(estimates$Value)

Now we are ready to draw a plot. The trick to getting a line and a bar on the same axes is to convert the State (factor) variable to numeric.

    # stat = "identity" because Value already holds the bar heights
    p <- ggplot(estimates, aes(State, Value)) +
      geom_bar(stat = "identity") +
      geom_line(aes(as.numeric(State), cumulative))
    p

As already mentioned in this question, trying to draw two Pareto plots for the two variable groups side by side is not straightforward. You are probably better off using faceting if you want several Pareto plots.
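
As an alternative to the as.numeric() conversion, a cumulative line can also be drawn over a discrete axis by mapping `group = 1`. The following is a minimal sketch under stated assumptions: the defect names and counts are hypothetical, `geom_col()` requires a reasonably recent ggplot2, and the plot is only drawn if ggplot2 is installed.

```r
# Hypothetical defect counts, for illustration only.
dat <- data.frame(
  defect = c("scratch", "dent", "misprint", "crack"),
  count  = c(12, 45, 7, 30),
  stringsAsFactors = FALSE
)

# Sort rows by count and fix the factor levels in that order.
dat <- dat[order(dat$count, decreasing = TRUE), ]
dat$defect <- factor(dat$defect, levels = dat$defect)
dat$cumulative <- cumsum(dat$count)

# Draw only if ggplot2 is available. group = 1 tells geom_line()
# to join the points across the discrete x axis into one line.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(defect, count)) +
    geom_col(fill = "grey70") +
    geom_line(aes(y = cumulative, group = 1)) +
    geom_point(aes(y = cumulative))
  print(p)
}
```

Bars and line share one count axis here, so the line ends at the grand total; dividing `cumulative` by `sum(dat$count)` would give shares instead, at the cost of needing a second scale.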

+1
Sep 28 '10 at 10:09

    library(ggplot2)

    freqplot = function(x, by = NULL, right = FALSE) {
      if (is.null(by)) stop('A value for "by" must be specified.')
      breaks = seq(min(x), max(x), by = by)
      ecd = ecdf(x)
      den = ecd(breaks)
      # Relative frequency of x in each interval
      table = table(cut(x, breaks = breaks, right = right))
      table = table / sum(table)
      intervs = factor(names(table), levels = names(table))
      freq = as.numeric(table / sum(table))
      acum = as.numeric(cumsum(table))
      # Rescale a vector to the range [0, 1]
      normalize.vec = function(x) {
        (x - min(x)) / (max(x) - min(x))
      }
      dados = data.frame(classe = intervs, freq = freq, acum = acum,
                         acum_norm = normalize.vec(acum))
      # Bars for the frequencies, points and a line for the cumulative curve
      p = ggplot(dados) +
        geom_bar(aes(classe, freq, fill = classe), stat = 'identity') +
        geom_point(aes(classe, acum_norm, group = '1'), shape = I(1), size = I(3), colour = 'gray20') +
        geom_line(aes(classe, acum_norm, group = '1'), colour = I('gray20'))
      p
    }
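
The binning step inside freqplot() can be tried on its own in base R. This is a sketch on a hypothetical vector `x` with `by = 3`, reusing the same `cut()`, `table()` and `cumsum()` calls as the function.

```r
# Hypothetical input, for illustration only.
x  <- c(1, 2, 2, 3, 5, 6, 8, 9, 10)
by <- 3

# Interval boundaries: 1, 4, 7, 10
breaks <- seq(min(x), max(x), by = by)

# With right = FALSE the intervals are [1,4), [4,7), [7,10),
# so the maximum value (10) falls outside the last interval
# and is not counted by table().
tab <- table(cut(x, breaks = breaks, right = FALSE))

freq <- as.numeric(tab / sum(tab))  # relative frequency per interval
acum <- cumsum(freq)                # cumulative relative frequency

print(names(tab))
print(freq)
print(acum)
```

Because the largest observation is dropped under these settings, the frequencies are computed over 8 of the 9 values.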
0
Feb 21 '13 at 19:49
