Generating a matrix by applying a function to every possible combination of variables in R

Here is my small data set, and here is the function:

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

Function

syfun <- function (x, y){

if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
}
if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
    sxy = 0.5
}
if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
    sxy = 0
}
return(sxy)
}

out <- rep (NA, NROW(dat))

for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,1])
}

mean(out)
1

Here, I apply the function to the first column (variable A1) paired with itself (A1 again) and average the output values. I want to save this average in a matrix cell.

Similarly between A1 and B1.

for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,2])
}

mean(out)
0.25

Now, as with a correlation matrix, I want to compute this value for every possible pair of variables and store the results in a matrix like this:

         A1    B1    C1
A1       1.0  0.25  0.5
B1       0.25  1.0  NA
C1       0.5   NA   1.0

Edit: a more complete function that does not produce NA

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  if (x == "BB" & y == "AB"| x == "AB" & y == "BB"){
        sxy = 0.5
  }

  if(x == "AB" & y ==  "AB") {
    sxy = 0.5
    }
  return(sxy)
}
2 answers

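First, syfun should return NA when none of its conditions is met. To do that, initialise the result at the top of the function: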

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  return(sxy)
}
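
The same "NA unless a pair is explicitly listed" idea can also be written as a lookup table. The following is only a sketch, not part of the original answer: the names `scores` and `syfun_lookup` are made up for illustration, and the scores are copied from the three conditions above. Because `paste()` and vector indexing are vectorized, this variant accepts whole columns at once.

```r
# Hypothetical lookup-table variant of syfun (names are assumptions).
# Scores are keyed by the two genotypes pasted together with a space.
scores <- c("AA AA" = 1,   "BB BB" = 1,
            "AA AB" = 0.5, "AB AA" = 0.5,
            "AA BB" = 0,   "BB AA" = 0)

syfun_lookup <- function(x, y) {
  # Indexing by a name that is not in 'scores' yields NA,
  # so uncovered pairs such as AB/AB come back as NA.
  unname(scores[paste(x, y)])
}

syfun_lookup("AA", "AB")                                 # 0.5
syfun_lookup(c("AA", "BB", "AB"), c("AA", "AA", "AB"))   # 1 0 NA
```

Adding a further genotype pair then only requires a new entry in `scores`.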

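Second, use outer to apply the function to every combination of column names. Vectorize is needed because neither syfun nor the per-pair mean is vectorized: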

mat <- outer(names(dat), names(dat), function(x, y) 
  Vectorize(function(a, b) mean(Vectorize(syfun)(dat[[a]], dat[[b]])))(x,y))
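
If the nested Vectorize calls are hard to read, roughly the same computation can be sketched with explicit loops over the column names. This is an illustrative rewrite under a few assumptions: the helper name `pair_mean` and the result name `mat2` are invented here, the columns are kept as character vectors via `stringsAsFactors = FALSE`, and `syfun` is restated with the scalar operators `&&` and `||` so that the block runs on its own.

```r
dat <- data.frame(
  A1 = c("AA", "AA", "AA", "AA"),
  B1 = c("BB", "BB", "AB", "AB"),
  C1 = c("AB", "BB", "AA", "AB"),
  stringsAsFactors = FALSE)

# Scalar scoring function; returns NA for pairs it does not cover
syfun <- function(x, y) {
  sxy <- NA_real_
  if ((x == "AA" && y == "AA") || (x == "BB" && y == "BB")) sxy <- 1
  if ((x == "AA" && y == "AB") || (x == "AB" && y == "AA")) sxy <- 0.5
  if ((x == "AA" && y == "BB") || (x == "BB" && y == "AA")) sxy <- 0
  sxy
}

# Mean score for one pair of columns, compared row by row (hypothetical helper)
pair_mean <- function(a, b) mean(mapply(syfun, dat[[a]], dat[[b]]))

# The outer sapply walks the column variables, the inner one the row variables;
# sapply keeps the names, so the matrix is already labelled.
mat2 <- sapply(names(dat), function(b)
  sapply(names(dat), function(a) pair_mean(a, b)))

mat2["A1", "B1"]   # 0.25
mat2["A1", "C1"]   # 0.5
```

Cells whose pair includes a combination that syfun does not cover, such as B1 with C1, come out as NA, because `mean()` propagates NA by default.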

Third, set the diagonal to 1:

diag(mat) <- 1

Finally, set the row and column names:

dimnames(mat) <- list(names(dat), names(dat))

The result:

     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00  NA
C1 0.50   NA 1.0

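Alternatively, you can count the As in each column and compare the number of As between columns. Here is one way to do that (is this what you are after?):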

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

## This function takes two columns from dat, pastes all the genotypes in each
## column together, and counts how often each letter appears. It then divides
## the smaller 'A' count by the larger one to give a similarity proportion.
## (It only does this for 'A' right now, but it could be extended to other
## letters if necessary.)

fun <- function(x, y){
  # split each pasted column into single letters and tabulate them
  x.prop <- table(unlist(strsplit(paste(x, collapse = ""), "")))
  y.prop <- table(unlist(strsplit(paste(y, collapse = ""), "")))
  # smaller 'A' count over the larger one (NA if a column contains no 'A')
  ans <- min(x.prop['A'], y.prop['A']) / max(x.prop['A'], y.prop['A'])
  return(ans)
}
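
To make the arithmetic concrete, here is a small worked sketch for the A1 and B1 columns. The helper `count_letters` and the variable names are assumptions introduced only for this illustration; the data values are the ones from `dat`.

```r
a1 <- c("AA", "AA", "AA", "AA")
b1 <- c("BB", "BB", "AB", "AB")

# Paste a column into one string, split it into single letters, tabulate them
count_letters <- function(x) table(strsplit(paste(x, collapse = ""), "")[[1]])

n_a1 <- count_letters(a1)[["A"]]   # 8 As in column A1
n_b1 <- count_letters(b1)[["A"]]   # 2 As in column B1

min(n_a1, n_b1) / max(n_a1, n_b1)  # 2 / 8 = 0.25
```

Note that this measure compares overall letter counts per column rather than genotypes row by row, so it need not agree with the syfun-based average in general.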

n <- ncol(dat)
final_mat <- matrix(NA_real_, nrow = n, ncol = n,
                    dimnames = list(colnames(dat), colnames(dat))) ## creates an empty final matrix

### this applies 'fun' to each pair of column names; combn() returns the pairs
### in the same order in which lower.tri() fills the matrix (column by column)
pair_vals <- apply(combn(colnames(dat),2),2,function(x) fun(dat[,x[1]], dat[,x[2]]))
final_mat[lower.tri(final_mat)] <- pair_vals

### mirror the lower triangle into the upper triangle
final_mat[upper.tri(final_mat)] <- t(final_mat)[upper.tri(final_mat)]

diag(final_mat) <- 1

final_mat
     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00 0.5
C1 0.50 0.50 1.0

Source: https://habr.com/ru/post/1532867/

