Combine data columns using two conditions using a population

Question

I have a matrix like this:

    P A B C
    1 2 0 5
    2 1 1 3
    3 0 4 7
    1 1 1 0
    3 1 1 0
    3 0 2 1
    2 3 3 4

I want to aggregate the rows by P, separately for each column. Each P value should appear once per column, and that row should hold the sum of that column for that P, with 0 in the other columns. The result should be:

    P A B C
    1 3 0 0
    1 0 1 0
    1 0 0 5
    2 4 0 0
    2 0 4 0
    2 0 0 7
    3 1 0 0
    3 0 7 0
    3 0 0 8

I already tried aggregate, but it only sums each P value across all columns at once, so I end up with a single row for each P.
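For reference, a minimal sketch of the aggregate attempt described above. The data frame name `df` is an assumption made for illustration; the values are those of the matrix in the question.

```r
# Rebuild the question's matrix as a data frame (the name `df` is assumed)
df <- data.frame(
  P = c(1L, 2L, 3L, 1L, 3L, 3L, 2L),
  A = c(2L, 1L, 0L, 1L, 1L, 0L, 3L),
  B = c(0L, 1L, 4L, 1L, 1L, 2L, 3L),
  C = c(5L, 3L, 7L, 0L, 0L, 1L, 4L)
)

# Sum every column within each P: one row per P, all sums side by side
agg <- aggregate(. ~ P, data = df, FUN = sum)
agg
#   P A B C
# 1 1 3 1 5
# 2 2 4 4 7
# 3 3 1 7 8
```

These are the right sums; what is still missing is spreading each row into one row per column, with the sum on the diagonal.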

+5

r aggregate

Miguel123 Dec 19 '16 at 9:53

4 answers

One idea is to split the data frame by P and apply a user-defined function (fun1) that creates a zero matrix and fills its diagonal with the column sums, i.e.

    fun1 <- function(x) {
      # square zero matrix with one row/column per value column (all but P)
      m1 <- matrix(0, ncol = ncol(x) - 1, nrow = ncol(x) - 1)
      # put the column sums on the diagonal
      diag(m1) <- sapply(x[-1], sum)
      return(m1)
    }

    l1 <- split(df, df$P)
    do.call(rbind, lapply(l1, fun1))
    #      [,1] [,2] [,3]
    # [1,]    3    0    0
    # [2,]    0    1    0
    # [3,]    0    0    5
    # [4,]    4    0    0
    # [5,]    0    4    0
    # [6,]    0    0    7
    # [7,]    1    0    0
    # [8,]    0    7    0
    # [9,]    0    0    8

Or, to get the desired result,

    final_df <- as.data.frame(cbind(rep(names(l1), each = ncol(df) - 1),
                                    do.call(rbind, lapply(l1, fun1))))
    names(final_df) <- names(df)
    final_df
    #  P A B C
    #1 1 3 0 0
    #2 1 0 1 0
    #3 1 0 0 5
    #4 2 4 0 0
    #5 2 0 4 0
    #6 2 0 0 7
    #7 3 1 0 0
    #8 3 0 7 0
    #9 3 0 0 8

+4

Sotos Dec 19 '16 at 10:20

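First get the maximum frequency of the values in column 'P' ('i1') and aggregate the columns grouped by 'P' to get the sums ('df2'). Then replicate the rows of 'df2' 'i1' times, split the data set by 'P', set the off-diagonal elements of the other columns to 0 and return the result as a data.frame. Finally, order by 'P' and reset the row names to NULL. Note that 'i1' is 3 here, which coincides with the number of value columns; that is why each group forms a 3 x 3 block.

    i1 <- max(table(df1$P))
    df2 <- aggregate(. ~ P, df1, sum)
    # replicate the rows of sums i1 times (note the comma: rows, not columns)
    df3 <- df2[rep(1:nrow(df2), i1), ]
    res <- unsplit(lapply(split(df3, df3$P), function(x) {
      # keep only the diagonal; diag(3) matches the three value columns A, B, C
      x[-1] <- diag(3) * x[-1]
      x
    }), df3$P)
    res1 <- res[order(res$P), ]
    row.names(res1) <- NULL
    res1
    #  P A B C
    #1 1 3 0 0
    #2 1 0 1 0
    #3 1 0 0 5
    #4 2 4 0 0
    #5 2 0 4 0
    #6 2 0 0 7
    #7 3 1 0 0
    #8 3 0 7 0
    #9 3 0 0 8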

Or, using data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)), loop through the Subset of Data.table (.SD) to get the sums grouped by 'P', replicate the rows of the summarised data set and set the off-diagonal elements to 0 (as in the first solution).

    library(data.table)
    setDT(df1)[, lapply(.SD, sum), by = P
              ][rep(1:.N, i1)
              ][, .SD * diag(ncol(df1) - 1), by = P]
    #   P A B C
    #1: 1 3 0 0
    #2: 1 0 1 0
    #3: 1 0 0 5
    #4: 2 4 0 0
    #5: 2 0 4 0
    #6: 2 0 0 7
    #7: 3 1 0 0
    #8: 3 0 7 0
    #9: 3 0 0 8

Or using dplyr

    library(dplyr)
    library(purrr)
    d1 <- as.data.frame(diag(i1))
    # summarise_each()/funs() are deprecated in newer dplyr releases;
    # summarise_all(sum) is an equivalent there
    df2 <- df1 %>%
      group_by(P) %>%
      summarise_each(funs(sum)) %>%
      replicate(i1, ., simplify = FALSE) %>%
      bind_rows() %>%
      arrange(P)
    # multiply each value column by the matching (recycled) diagonal column
    df2[-1] <- map2(df2[-1], d1, ~ .x * .y)
    df2
    # A tibble: 9 × 4
    #      P     A     B     C
    #  <int> <dbl> <dbl> <dbl>
    #1     1     3     0     0
    #2     1     0     1     0
    #3     1     0     0     5
    #4     2     4     0     0
    #5     2     0     4     0
    #6     2     0     0     7
    #7     3     1     0     0
    #8     3     0     7     0
    #9     3     0     0     8
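In this data max(table(df1$P)) happens to be 3, the same as the number of value columns. A minimal base R sketch that replicates by the number of value columns instead, so it should also apply when the two differ. The helper name `expand_by_column`, its `key` argument and the toy data are hypothetical and not part of the answer above.

```r
# Hypothetical helper: one output row per (P, value column) pair,
# with the group sum on the diagonal and 0 elsewhere.
expand_by_column <- function(d, key = "P") {
  value_cols <- setdiff(names(d), key)
  # per-group sums, one row per key value (sorted by key)
  sums <- aggregate(d[value_cols], by = d[key], FUN = sum)
  n <- length(value_cols)
  # repeat every row of sums once per value column
  out <- sums[rep(seq_len(nrow(sums)), each = n), ]
  # identity matrix stacked once per group; zeroes the off-diagonal entries
  mask <- diag(n)[rep(seq_len(n), times = nrow(sums)), , drop = FALSE]
  out[value_cols] <- as.matrix(out[value_cols]) * mask
  row.names(out) <- NULL
  out
}

# Assumed toy data: P = 1 occurs three times, but there are only two value columns
toy <- data.frame(P = c(1, 1, 1, 2), X = c(1, 2, 3, 4), Y = c(10, 20, 30, 40))
res_toy <- expand_by_column(toy)
res_toy
```

For the toy data this should give four rows, two per P, rather than the three per P that replicating by the maximum frequency would produce.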

+3

akrun Dec 19 '16 at 9:57

Unless I am missing something, the following works as well. Start by computing the per-column sums for each "P":

    s = as.matrix(rowsum(dat[-1], dat$P))

Create the final matrix by repeating each row of sums once per column:

    k = s[rep(1:nrow(s), each = ncol(s)), ]

Compute the positions that are off the diagonal of each block and set them to 0:

    k[col(k) != (row(k) - 1) %% ncol(k) + 1] = 0
    k
    #  A B C
    #1 3 0 0
    #1 0 1 0
    #1 0 0 5
    #2 4 0 0
    #2 0 4 0
    #2 0 0 7
    #3 1 0 0
    #3 0 7 0
    #3 0 0 8

Data:

    dat = structure(list(P = c(1L, 2L, 3L, 1L, 3L, 3L, 2L),
                         A = c(2L, 1L, 0L, 1L, 1L, 0L, 3L),
                         B = c(0L, 1L, 4L, 1L, 1L, 2L, 3L),
                         C = c(5L, 3L, 7L, 0L, 0L, 1L, 4L)),
                    .Names = c("P", "A", "B", "C"),
                    class = "data.frame",
                    row.names = c(NA, -7L))

Having computed s, a simpler alternative suggested by user20650:

    matrix(diag(ncol(s)), nrow(s) * ncol(s), ncol(s), byrow = TRUE) * c(t(s))

or, playing around with other interesting alternatives on the same idea:

    kronecker(rep_len(1, nrow(s)), diag(ncol(s))) * c(t(s))

    diag(ncol(s))[rep(1:ncol(s), nrow(s)), ] * s[rep(1:nrow(s), each = ncol(s)), ]
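As a quick sanity check, the sketch below rebuilds s from the data above and compares the alternatives with the masking approach. The names `k`, `m1`, `m2`, `m3` and `same` are chosen here for illustration only.

```r
# Data from the answer, rebuilt with data.frame() for self-containment
dat <- data.frame(
  P = c(1L, 2L, 3L, 1L, 3L, 3L, 2L),
  A = c(2L, 1L, 0L, 1L, 1L, 0L, 3L),
  B = c(0L, 1L, 4L, 1L, 1L, 2L, 3L),
  C = c(5L, 3L, 7L, 0L, 0L, 1L, 4L)
)

# Per-P column sums
s <- as.matrix(rowsum(dat[-1], dat$P))

# Masking approach: repeat rows, then zero everything off the block diagonals
k <- s[rep(1:nrow(s), each = ncol(s)), ]
k[col(k) != (row(k) - 1) %% ncol(k) + 1] <- 0

# The three alternatives built on a stacked identity matrix
m1 <- matrix(diag(ncol(s)), nrow(s) * ncol(s), ncol(s), byrow = TRUE) * c(t(s))
m2 <- kronecker(rep_len(1, nrow(s)), diag(ncol(s))) * c(t(s))
m3 <- diag(ncol(s))[rep(1:ncol(s), nrow(s)), ] * s[rep(1:nrow(s), each = ncol(s)), ]

# Element-wise comparison; stopifnot() raises an error if any pair differs
same <- all(k == m1) && all(m1 == m2) && all(m2 == m3)
stopifnot(same)
```

The multiplication by c(t(s)) relies on R recycling the row-wise sums down each column of the stacked identity matrix, which is why the transpose is needed.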

+1

alexis_laz Dec 19 '16 at 12:00

Chirayu chamoli · Accepted Answer · 2016-12-19T10:53:15+0000

Another idea is to use the diag function to create one diagonal matrix per P value and then bind these matrices together.

    xx = aggregate(. ~ P, df, sum)
    yy = xx[, -1]
    yy = as.data.frame(t(yy))   # one column per P value
    # label each block with its P value (repeated once per value column)
    cbind(rep(xx$P, each = nrow(yy)),
          do.call("rbind", lapply(yy, function(xx)
            diag(xx, nrow = nrow(yy), ncol = nrow(yy)))))
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    3    0    0
    # [2,]    1    0    1    0
    # [3,]    1    0    0    5
    # [4,]    2    4    0    0
    # [5,]    2    0    4    0
    # [6,]    2    0    0    7
    # [7,]    3    1    0    0
    # [8,]    3    0    7    0
    # [9,]    3    0    0    8
