Stacked area chart in R

I ran a Pig job on a Hadoop cluster that crunched a lot of data down into something R could handle, so that I could perform a cohort analysis. I have the following script, and as of its second-to-last line, my data has this format:

    > names(data)
    [1] "VisitWeek" "ThingAge"  "MyMetric"

VisitWeek is the date. ThingAge and MyMetric are integers.

The data looks like this:

    2010-02-07 49 12345

Here is the script so far:

    # Load ggplot2 for charting
    library(ggplot2)
    # Our file has headers (column names)
    data <- read.table('weekly_cohorts.tsv', header = TRUE, sep = "\t")
    # Print the names
    names(data)
    # Convert to dates
    data$VisitWeek <- as.Date(data$VisitWeek)
    data$ThingCreation <- as.Date(data$ThingCreation)
    # Fill in the age column (in days)
    data$ThingAge <- as.integer(data$VisitWeek - data$ThingCreation)
    # Keep thing ages up to 10 weeks (70 days), plus a sanity check for >= 0,
    # and drop the creation week column
    data <- subset(data, data$ThingAge <= 70, c("VisitWeek", "ThingAge", "MyMetric"))
    data <- subset(data, data$ThingAge >= 0)
    print(ggplot(data, aes(x = VisitWeek, y = MyMetric, fill = ThingAge)) + geom_area())

The last line does not work. I have tried many variations (bars, histograms), but, as usual, the R docs defeat me.

I want it to display a standard Excel-style stacked area chart: one time series per ThingAge, stacked on top of each other, with the weeks (VisitWeek) on the x-axis and MyMetric on the y-axis. Here is an example of this type of chart: http://upload.wikimedia.org/wikipedia/commons/a/a1/Mk_Zuwanderer.png

I read the docs at http://had.co.nz/ggplot2/geom_area.html and http://had.co.nz/ggplot2/geom_histogram.html, as well as this blog post: http://chartsgraphs.wordpress.com/2008/10/05/r-lattice-plot-beats-excel-stacked-area-trend-chart/, but I can't get it to work for me.

How can I achieve this?
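The age and filter steps of the script above can be sanity-checked on a few made-up rows. This is a sketch under assumptions: the rows are hypothetical, and the two subset() calls are merged into one equivalent call. Subtracting two Date objects yields a difftime in days, which is why 70 days serves as the 10-week cut-off.

```r
# Hypothetical rows for one thing created on 2010-01-31: two valid visits,
# one visit before creation (negative age) and one beyond 70 days
data <- data.frame(
  VisitWeek     = c("2010-02-07", "2010-04-11", "2010-01-24", "2010-05-02"),
  ThingCreation = rep("2010-01-31", 4),
  MyMetric      = c(12345L, 678L, 5L, 9L)
)
data$VisitWeek     <- as.Date(data$VisitWeek)
data$ThingCreation <- as.Date(data$ThingCreation)

# Date minus Date gives a difftime in days; as.integer() drops the unit
data$ThingAge <- as.integer(data$VisitWeek - data$ThingCreation)

# Keep ages from 0 to 70 days in one subset() call and drop ThingCreation
data <- subset(data, ThingAge >= 0 & ThingAge <= 70,
               select = c("VisitWeek", "ThingAge", "MyMetric"))
print(data)
```

Only the visits at 7 and 70 days survive; the -7 and 91 day rows are filtered out.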

4 answers
    library(ggplot2)
    set.seed(134)
    df <- data.frame(
      VisitWeek = rep(as.Date(seq(Sys.time(), length.out = 5, by = "1 day")), 3),
      ThingAge  = rep(1:3, each = 5),
      MyMetric  = sample(100, 15))
    ggplot(df, aes(x = VisitWeek, y = MyMetric)) +
      geom_area(aes(fill = factor(ThingAge)))

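This gives me a stacked area chart with one band per ThingAge. I suspect your problem is specifying the fill mapping for the area chart correctly: use fill=factor(ThingAge). A plain integer column is treated as continuous, so ggplot2 has no discrete variable to split the data into separate series; a factor gives it one group per ThingAge value.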
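The grouping effect of factor() can be checked without drawing anything: ggplot_build() returns the computed layer data, including a group column. A minimal sketch with hypothetical data in the question's shape (the ages 7, 14 and 21 and the metric values are made up):

```r
library(ggplot2)

# Hypothetical data: 3 ages x 5 weekly dates
df <- data.frame(
  VisitWeek = rep(seq(as.Date("2010-02-07"), by = "week", length.out = 5), 3),
  ThingAge  = rep(c(7L, 14L, 21L), each = 5),
  MyMetric  = c(5, 6, 7, 8, 9,  3, 3, 4, 4, 5,  1, 2, 2, 3, 3)
)

# factor() makes ThingAge discrete, so ggplot2 creates one group
# (one stacked band) per distinct age
p <- ggplot(df, aes(x = VisitWeek, y = MyMetric)) +
  geom_area(aes(fill = factor(ThingAge)))

# Computed data for the first (and only) layer
built <- ggplot_build(p)$data[[1]]
length(unique(built$group))  # one group per age
```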

---

    # data.set, time, value and type stand for your own data frame and columns
    ggplot(data.set, aes(x = time, y = value, color = type)) +
      geom_area(aes(fill = type), position = 'stack')

You need to give geom_area a fill aesthetic as well as position = 'stack' (although stacking is already the default for geom_area).

Found here: http://www.mail-archive.com/r-help@r-project.org/msg84857.html
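What stacking does can be sketched numerically: within each x value, each series' band starts at the running total of the series below it and ends at that total plus its own value. The sketch below computes those band edges in base R for hypothetical columns x, type and value. It stacks the series in alphabetical order from the bottom, which is an assumption of this sketch, not necessarily the order ggplot2 itself uses.

```r
# Hypothetical long-format data: three series over the same four x values
df <- data.frame(
  x     = rep(1:4, times = 3),
  type  = rep(c("a", "b", "c"), each = 4),
  value = c(1, 2, 3, 4,   2, 2, 2, 2,   5, 4, 3, 1)
)

# Sort by x, then by series, so running totals go bottom-up within each x
df <- df[order(df$x, df$type), ]

# Top edge of each band: running total of value within each x
df$ymax <- ave(df$value, df$x, FUN = cumsum)

# Bottom edge of each band: top edge of the band below (0 for the first)
df$ymin <- df$ymax - df$value

# The top of each stack equals the column total at that x
tapply(df$ymax, df$x, max)
```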

---

I managed to get my result:

I loaded the stackedPlot() function from https://stat.ethz.ch/pipermail/r-help/2005-August/077475.html

The function (not mine; see the link):

    stackedPlot = function(data, time = NULL, col = 1:length(data), ...) {
      # Default x values: the row index
      if (is.null(time)) time = 1:length(data[[1]])
      # Empty plot whose y range covers the tallest stack
      plot(0, 0, xlim = range(time), ylim = c(0, max(rowSums(data))), type = "n", ...)
      # Draw from the tallest cumulative band down, so each lower band paints over it
      for (i in length(data):1) {
        # Sum up to the current column
        prep.data = rowSums(data[1:i])
        # The polygon must have its first and last points on the zero line
        prep.y = c(0, prep.data, 0)
        prep.x = c(time[1], time, time[length(time)])
        polygon(prep.x, prep.y, col = col[i], border = NA)
      }
    }

Then I reshaped my data into wide format, and it worked!

    # One row per ThingAge, one MyMetric.<week> column per VisitWeek
    wide = reshape(data, idvar = "ThingAge", timevar = "VisitWeek", direction = "wide")
    stackedPlot(wide)
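For the chart described in the question (weeks on the x-axis, one band per ThingAge), the reshape could instead be keyed on VisitWeek, so that each ThingAge becomes its own column. Note also that wide in the call above still contains the ThingAge column itself, which stackedPlot() would stack as one more band, since it uses every column of its data argument. A sketch on hypothetical rows, assuming that missing week/age combinations should count as 0:

```r
# Hypothetical long data in the question's format; some week/age pairs are missing
df <- data.frame(
  VisitWeek = as.Date(c("2010-02-07", "2010-02-07", "2010-02-14",
                        "2010-02-14", "2010-02-21")),
  ThingAge  = c(0L, 7L, 0L, 7L, 14L),
  MyMetric  = c(10, 20, 30, 40, 50)
)

# One row per VisitWeek, one MyMetric.<age> column per ThingAge
wide <- reshape(df, idvar = "VisitWeek", timevar = "ThingAge", direction = "wide")
wide <- wide[order(wide$VisitWeek), ]

# Keep only the series columns and treat missing combinations as 0,
# because an NA would break the running sums used for the polygons
series <- wide[, -1]
series[is.na(series)] <- 0
print(series)
```

series could then be passed to stackedPlot() as its data argument, with time = wide$VisitWeek.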
---

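For me, it worked once I converted the integers to factors and used bars (geom_bar) instead of geom_area. In current ggplot2 versions, geom_col() is the bar geom to use when the bar heights come from a y column:

    library(ggplot2)
    # 10 x 6 grid of integer coordinates with a random value per cell
    df <- expand.grid(x = 1:10, y = 1:6)
    df <- cbind(df, val = runif(60))
    # Integers as factors, so each x value becomes its own discrete fill group
    df$fx <- factor(df$x)
    df$fy <- factor(df$y)
    # One bar per fy, stacked by fx (geom_col stacks by default)
    ggplot(df, aes(x = fy, y = val, fill = fx)) + geom_col()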

Source: https://habr.com/ru/post/905314/

