I need to compute the cumulative frequency, expressed as a percentage, along a continuous variable. For instance:
    data <- data.frame(n = sample(1:12),
                       d = seq(10, 120, by = 10),
                       Site = rep(c("FirstSite", "SecondSite"), 6),
                       Plot = rep(c("Plot1", "Plot1", "Plot2", "Plot2"), 3))
    data <- with(data, data[order(Site, Plot), ])
    data <- transform(data, G = ((pi * (d/2)^2) * n) / 10000)
    data
        n   d       Site  Plot           G
    1   7  10  FirstSite Plot1  0.05497787
    5   9  50  FirstSite Plot1  1.76714587
    9  12  90  FirstSite Plot1  7.63407015
    3  10  30  FirstSite Plot2  0.70685835
    7   5  70  FirstSite Plot2  1.92422550
    11  1 110  FirstSite Plot2  0.95033178
    2   3  20 SecondSite Plot1  0.09424778
    6   8  60 SecondSite Plot1  2.26194671
    10  6 100 SecondSite Plot1  4.71238898
    4   4  40 SecondSite Plot2  0.50265482
    8   2  80 SecondSite Plot2  1.00530965
    12 11 120 SecondSite Plot2 12.44070691
I need the cumulative frequency of the G column, grouped by the Plot and Site factors, so that I can draw a ggplot geom_step graph of G versus d for each plot and site.
I calculated the total G for each Plot/Site combination:
    data.ss <- by(data[, "G"], data[, c("Plot", "Site")], function(x) cumsum(x))
    # Gtot
    (data.ss.tot <- sapply(data.ss, max))
    [1]  9.456194  3.581416  7.068583 13.948671
Now I need to express each plot's G in the range [0..1], where 1 corresponds to Gtot for that plot. I suppose I should divide G by its plot's Gtot and then apply a new cumsum to the result. How can I do this?
Please note that I have to compute this cumulative frequency with respect to d, not G, so this is not a proper ecdf.
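For illustration, here is a minimal sketch of the idea described above, assuming base R's `ave()` is acceptable for the grouping. The values of `n` are fixed to those shown in the printed table (instead of `sample(1:12)`) so that the result is reproducible, and the column name `Gcum` is made up for this example. Sorting by `d` within each group makes the running sum accumulate along `d` rather than along `G`.

```r
# Example data with n fixed to the values from the table above
# (an assumption made only to keep the sketch reproducible).
data <- data.frame(
  n    = c(7, 3, 10, 4, 9, 8, 5, 2, 12, 6, 1, 11),
  d    = seq(10, 120, by = 10),
  Site = rep(c("FirstSite", "SecondSite"), 6),
  Plot = rep(c("Plot1", "Plot1", "Plot2", "Plot2"), 3)
)

# Order by group and then by d, so cumsum() runs along increasing d.
data <- data[order(data$Site, data$Plot, data$d), ]
data$G <- pi * (data$d / 2)^2 * data$n / 10000

# Running share of G within each Site/Plot group: cumulative sum divided
# by the group total, so each group ends at 1. Gcum is a hypothetical name.
data$Gcum <- ave(data$G, data$Site, data$Plot,
                 FUN = function(g) cumsum(g) / sum(g))
```

Multiplying `Gcum` by 100 would give percentages. The column could then presumably be mapped to `y` (with `d` on `x`) in `geom_step`, faceting by Site and Plot.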
Thanks.
mbask