Split the characters in each column and name the resulting columns

I want to split two-character strings into separate columns. My real data frame is large, so here is a small example that shows what needs to be done.

    mydf <- data.frame(name = c("L1", "L2", "L3"),
                       M1 = c("AC", "AT", NA),
                       M2 = c("CC", "--", "TC"),
                       M3 = c("AT", "TT", "AG"))

I want to split the characters of variables M1 to M3 into two columns each (the real dataset has more than 6000 variables), so that the result looks like this:

    name M1a M1b M2a M2b M3a M3b
    L1   A   C   C   C   A   T
    L2   A   T   -   -   T   T
    L3   NA  NA  T   C   A   G

I tried the following code:

    func <- function(x) {
      sapply(strsplit(x, ""), match, table = c("A", "C", "T", "G", "--", NA))
    }
    odataframe <- data.frame(apply(mydf, 1, func))
    colnames(odataframe) <- paste(rep(names(mydf), each = 2), c("a", "b"), sep = "")
    odataframe
2 answers

Here you go:

    splitCol <- function(x){
      x <- as.character(x)
      x[is.na(x)] <- "$$"
      z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
      z[z=="$"] <- NA
      z
    }

    newdf <- as.data.frame(do.call(cbind, lapply(mydf[, -1], splitCol)))
    names(newdf) <- paste(rep(names(mydf[, -1]), each=2), c("a", "b"), sep="")
    newdf <- data.frame(mydf[, 1, drop=FALSE], newdf)

    newdf
      name  M1a  M1b M2a M2b M3a M3b
    1   L1    A    C   C   C   A   T
    2   L2    A    T   -   -   T   T
    3   L3 <NA> <NA>   T   C   A   G
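
For comparison, here is a minimal sketch of the same split done with `substr()` instead of `strsplit()`. It is not part of the answer above: `splitBySubstr` is a hypothetical helper name, and the sketch assumes that every non-missing value has exactly two characters and that the first column holds the row labels. Because `substr()` returns `NA` for `NA` input, no placeholder string is needed.

```r
# Example data from the question
mydf <- data.frame(name = c("L1", "L2", "L3"),
                   M1 = c("AC", "AT", NA),
                   M2 = c("CC", "--", "TC"),
                   M3 = c("AT", "TT", "AG"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: split each column in `vars` into its first and
# second character; all other columns are kept unchanged
splitBySubstr <- function(df, vars = names(df)[-1]) {
  halves <- lapply(df[vars], function(x) {
    x <- as.character(x)
    # substr() propagates NA, so missing values stay missing in both halves
    data.frame(a = substr(x, 1, 1), b = substr(x, 2, 2),
               stringsAsFactors = FALSE)
  })
  out <- do.call(cbind, unname(halves))
  # Two new names per original column: M1a, M1b, M2a, M2b, ...
  names(out) <- paste0(rep(vars, each = 2), c("a", "b"))
  data.frame(df[setdiff(names(df), vars)], out, stringsAsFactors = FALSE)
}

res <- splitBySubstr(mydf)
```

With the example data, `res` should have the columns `name`, `M1a`, `M1b`, `M2a`, `M2b`, `M3a` and `M3b`, with `NA` in both `M1a` and `M1b` for row `L3`. Whether this is faster than the `strsplit()` version on 6000+ columns has not been measured here.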

Andrie's code as a reusable function:

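    splitCol <- function(dataframe, splitVars = names(dataframe)) {
      # drop = FALSE keeps data frames even when only one column is selected
      split.DF <- dataframe[, splitVars, drop = FALSE]
      keep.DF <- dataframe[, !names(dataframe) %in% splitVars, drop = FALSE]
      # Split each two-character string into a two-column character matrix
      X <- function(x) {
        x <- as.character(x)
        x[is.na(x)] <- "$$"  # placeholder so that NA still yields two cells
        z <- matrix(unlist(strsplit(x, split = "")), ncol = 2, byrow = TRUE)
        z[z == "$"] <- NA
        z
      }
      newdf <- as.data.frame(do.call(cbind, lapply(split.DF, X)))
      names(newdf) <- paste(rep(names(split.DF), each = 2), c(".a", ".b"), sep = "")
      data.frame(keep.DF, newdf)
    }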

Try it:

    splitCol(mydf)
    splitCol(mydf, c('M1','M2'))

Please do not mark this as the accepted answer. Andrie's answer is, of course, the first correct one; this is simply an extension of his code to cover more situations. Thanks for the question, and thanks to Andrie for the code.


Source: https://habr.com/ru/post/1379219/

