Summarizing a matrix: getting the mean value for each class of 100,000 units

I have the following data structure.

pos <- c(4532568,4541529,4586529,4591235,4712360,4732504,4740231,10532655,10542365,10564587,45312567,45326354,45369874,124832658,124845829,124869874)
cm <- c(2.21,2.25,2.26,2.29,3.31,3.35,3.36,4.32,4.35,4.39,5.23,5.27,5.29,7.36,7.45,7.49)
data <- cbind(pos,cm)

            pos   cm
 [1,]   4532568 2.21
 [2,]   4541529 2.25
 [3,]   4586529 2.26
 [4,]   4591235 2.29
 [5,]   4712360 3.31
 [6,]   4732504 3.35
 [7,]   4740231 3.36
 [8,]  10532655 4.32
 [9,]  10542365 4.35
[10,]  10564587 4.39
[11,]  45312567 5.23
[12,]  45326354 5.27
[13,]  45369874 5.29
[14,] 124832658 7.36
[15,] 124845829 7.45
[16,] 124869874 7.49

I want to group the rows into classes of 100,000 units on the "pos" column and get the mean of the "cm" column for each class. For this example, the result should look like this:

pos <- c(4500000,4700000,10500000,45300000,124800000)
cm <- c(2.2525,3.34,4.35333,5.26333,7.43333)
newdata <- cbind(pos,cm)

           pos      cm
[1,]   4500000 2.25250
[2,]   4700000 3.34000
[3,]  10500000 4.35333
[4,]  45300000 5.26333
[5,] 124800000 7.43333

I do not know how to automate this so that it works on a huge data frame.

In response to akrun's answer: if I use the following script on my real data set:

 Ch1<- ch1 %>%
 as.data.frame %>% 
 group_by(Pos = plyr::round_any(Pos, 1e5, f = floor))

Then I get the following result (only the first 10 rows):

 structure(list(Chr = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L), .Label = "1", class = "factor"), Pos = c(0, 0, 0, 
 2e+05, 5e+05, 5e+05, 5e+05, 5e+05, 5e+05, 7e+05), CM = c(0, 0.080572, 
 0.092229, 0.439456, 1.478148, 1.478214, 1.480558, 1.488889, 1.489481, 
 1.931794)), .Names = c("Chr", "Pos", "CM"), row.names = c(NA, 
 -10L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = "Pos", drop = TRUE, indices = list(
 0:2, 3L, 4:8, 9L), group_sizes = c(3L, 1L, 5L, 1L), biggest_group_size = 5L, labels = structure(list(
 Pos = c(0, 2e+05, 5e+05, 7e+05)), row.names = c(NA, -4L), class = "data.frame", vars = "Pos", drop = TRUE, .Names = "Pos"))

However, if I use the whole script to get the mean values of Ch1$CM:

 Ch1<- ch1 %>%
 as.data.frame %>% 
 group_by(Pos = plyr::round_any(Pos, 1e5, f = floor)) %>% 
 summarise(cm = mean(cm))

Then I get the following data frame:

 structure(list(Pos = c(0, 2e+05, 5e+05, 7e+05, 8e+05, 9e+05, 
 1e+06, 1100000, 1200000, 1300000), cm = c(4.528498, 4.528498, 
 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 
 4.528498)), .Names = c("Pos", "cm"), row.names = c(NA, -10L), class = c("tbl_df", 
 "tbl", "data.frame"))

As you can see, the averages are incorrect because they are all equal. I do not know why this is happening.
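
One plausible explanation, judging only from the dput output above: the real data has a column named CM (upper case), while the summarise call asks for mean(cm) (lower case). When dplyr does not find cm among the columns, it looks in the calling environment, so a leftover global vector called cm would give the same mean for every group. The sketch below reproduces that effect with made-up data; the small ch1 data frame and the global cm vector are assumptions for illustration, not the real data set.

```r
library(dplyr)

# Hypothetical stand-in for the real data; note the upper-case column name CM
ch1 <- data.frame(Pos = c(10, 20, 150000, 160000), CM = c(1, 2, 3, 4))

# Hypothetical leftover global vector with the lower-case name
cm <- c(10, 20, 30)

# cm is not a column of ch1, so mean(cm) uses the global vector:
# every group receives the same value
wrong <- ch1 %>%
  group_by(Pos = floor(Pos / 1e5) * 1e5) %>%
  summarise(cm = mean(cm))

# Referring to the real column name gives one mean per group
right <- ch1 %>%
  group_by(Pos = floor(Pos / 1e5) * 1e5) %>%
  summarise(cm = mean(CM))
```

If this is the cause, writing mean(CM), or mean(.data$CM) so that a misspelled column name raises an error instead of silently falling back to a global object, should restore per-group means.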


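We can use round_any from plyr to floor pos to the nearest 100,000, group by the result, and take the mean of cm: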

library(dplyr)
data %>%
    as.data.frame %>% 
    group_by(grp = plyr::round_any(pos, 1e5, f = floor)) %>% 
    summarise(cm = mean(cm))
# A tibble: 5 x 2
#        grp       cm
#      <dbl>    <dbl>
#1   4500000 2.252500
#2   4700000 3.340000
#3  10500000 4.353333
#4  45300000 5.263333
#5 124800000 7.433333
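
For readers who prefer not to depend on dplyr or plyr, the same binning can be sketched in base R: floor each position to a multiple of 100,000 and pass the result to aggregate. The names df, bin and res are illustrative choices, not part of the original answer.

```r
# Example data from the question
pos <- c(4532568, 4541529, 4586529, 4591235, 4712360, 4732504, 4740231,
         10532655, 10542365, 10564587, 45312567, 45326354, 45369874,
         124832658, 124845829, 124869874)
cm <- c(2.21, 2.25, 2.26, 2.29, 3.31, 3.35, 3.36, 4.32, 4.35, 4.39,
        5.23, 5.27, 5.29, 7.36, 7.45, 7.49)
df <- data.frame(pos, cm)

# Floor each position to the lower multiple of 100,000
df$bin <- floor(df$pos / 1e5) * 1e5

# Mean of cm within each bin; the formula refers to columns of df
res <- aggregate(cm ~ bin, data = df, FUN = mean)
```

Here floor(pos / 1e5) * 1e5 plays the same role as plyr::round_any(pos, 1e5, f = floor) in the answer above.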

Source: https://habr.com/ru/post/1691347/

