I have the following data structure.
pos <- c(4532568,4541529,4586529,4591235,4712360,4732504,4740231,10532655,10542365,10564587,45312567,45326354,45369874,124832658,124845829,124869874)
cm <- c(2.21,2.25,2.26,2.29,3.31,3.35,3.36,4.32,4.35,4.39,5.23,5.27,5.29,7.36,7.45,7.49)
data <- cbind(pos,cm)
pos cm
[1,] 4532568 2.21
[2,] 4541529 2.25
[3,] 4586529 2.26
[4,] 4591235 2.29
[5,] 4712360 3.31
[6,] 4732504 3.35
[7,] 4740231 3.36
[8,] 10532655 4.32
[9,] 10542365 4.35
[10,] 10564587 4.39
[11,] 45312567 5.23
[12,] 45326354 5.27
[13,] 45369874 5.29
[14,] 124832658 7.36
[15,] 124845829 7.45
[16,] 124869874 7.49
I want to group the rows into bins of 100,000 units of the "pos" column and compute the mean of the "cm" column for each bin. For this example, the result would look like this:
pos <- c(4500000,4700000,10500000,45300000,124800000)
cm <- c(2.2525,3.34,4.35333,5.26333,7.43333)
newdata <- cbind(pos,cm)
pos cm
[1,] 4500000 2.25250
[2,] 4700000 3.34000
[3,] 10500000 4.35333
[4,] 45300000 5.26333
[5,] 124800000 7.43333
I do not know how to automate this so that it works on a very large data frame.
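To make the target concrete, here is a minimal base R sketch of the binning I have in mind. It assumes that each bin is labelled by its lower edge (pos rounded down to the nearest 100,000); the column name `bin` is only an illustrative name I made up, and this is not necessarily the most efficient approach for huge data.

```r
# Example data from above, stored as a data frame instead of a matrix
pos <- c(4532568, 4541529, 4586529, 4591235, 4712360, 4732504, 4740231,
         10532655, 10542365, 10564587, 45312567, 45326354, 45369874,
         124832658, 124845829, 124869874)
cm <- c(2.21, 2.25, 2.26, 2.29, 3.31, 3.35, 3.36, 4.32, 4.35, 4.39,
        5.23, 5.27, 5.29, 7.36, 7.45, 7.49)
data <- data.frame(pos, cm)

# Round each position down to the nearest 100,000 (hypothetical "bin" column)
data$bin <- floor(data$pos / 1e5) * 1e5

# Mean of cm within each bin; aggregate() returns one row per bin, sorted by bin
newdata <- aggregate(cm ~ bin, data = data, FUN = mean)
print(newdata)
```

On the example data this should give five rows, one per 100,000-unit bin, matching the result shown above.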
In reply to akrun's answer: if I run the following script on my real data set:
Ch1<- ch1 %>%
as.data.frame %>%
group_by(Pos = plyr::round_any(Pos, 1e5, f = floor))
Then I get the following result (only the first 10 rows are shown):
structure(list(Chr = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "1", class = "factor"), Pos = c(0, 0, 0,
2e+05, 5e+05, 5e+05, 5e+05, 5e+05, 5e+05, 7e+05), CM = c(0, 0.080572,
0.092229, 0.439456, 1.478148, 1.478214, 1.480558, 1.488889, 1.489481,
1.931794)), .Names = c("Chr", "Pos", "CM"), row.names = c(NA,
-10L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = "Pos", drop = TRUE, indices = list(
0:2, 3L, 4:8, 9L), group_sizes = c(3L, 1L, 5L, 1L), biggest_group_size = 5L, labels = structure(list(
Pos = c(0, 2e+05, 5e+05, 7e+05)), row.names = c(NA, -4L), class = "data.frame", vars = "Pos", drop = TRUE, .Names = "Pos"))
However, if I run the full script to get the mean of Ch1$CM for each group:
Ch1<- ch1 %>%
as.data.frame %>%
group_by(Pos = plyr::round_any(Pos, 1e5, f = floor)) %>%
summarise(cm = mean(cm))
Then I get the following data frame:
structure(list(Pos = c(0, 2e+05, 5e+05, 7e+05, 8e+05, 9e+05,
1e+06, 1100000, 1200000, 1300000), cm = c(4.528498, 4.528498,
4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498,
4.528498)), .Names = c("Pos", "cm"), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
As you can see, the averages are wrong: every group has the same value. I do not know why this is happening.