Paste together two data frames element by element in R

Question


I need to paste together, element by element, the contents of two data frames for input into another program. I have a data frame of means and a data frame of standard errors of the mean.

I tried using R's paste() function, but it doesn't seem to cope with data frames: it appears to collapse all the elements of one column into a single string and all the elements of the next column into a separate string. Instead, I need each pair of corresponding elements in the two data frames to be pasted together.

Any suggestions on how to approach this? I have included dummy input (datMean and datSE) and my desired result (datNew). My real data frames are about 10 rows by 150 columns.

 # means and SEM
 datMean <- data.frame(a=rnorm(10, 3), b=rnorm(10, 3), d=rnorm(10, 3))
 datSE <- data.frame(a=rnorm(10, 3)/100, b=rnorm(10, 3)/100, d=rnorm(10, 3)/100)

 # what the output should look like
 # i've chosen some arbitrary values here, and show only the first row.
 datNew <- data.frame(a="2.889-2.926", b="1.342-1.389", d="2.569-2.576")

The idea is that each element in datNew is a range consisting of "mean - SE" and "mean + SE", separated by a dash "-". The paste() function can do this for a single element, but how can it be done across the entire data frame?

 paste(datMean[1,1] - datSE[1,1], datMean[1,1] + datSE[1,1], sep="-")

EDIT 1: Looking at some of the answers, I realize that I left important information out of this question. Each row of the original data frames is named, and I need the final data frame to carry these names. For instance:

 rownames(datMean) <- LETTERS[1:10]
 rownames(datSE) <- LETTERS[1:10]

I need datNew to end up with these 10 row names as well. This could be a problem for some of the solutions that use melt().

+6

r

Steve Jun 20 '11 at 7:25


4 answers

Here's how to do it without manually specifying each column. First, we make the data and put it into an array using the abind package, rounding to 3 decimal places because it looks better:

 datMean <- data.frame(a=rnorm(10, 3), b=rnorm(10, 3), d=rnorm(10, 3))
 datSE <- data.frame(a=rnorm(10, 3)/100, b=rnorm(10, 3)/100, d=rnorm(10, 3)/100)

 library(abind)
 datArray <- round(abind(datMean,datSE,along=3),3)

Then we can apply the paste function over each row and column of this array:

 apply(datArray, 1:2, function(x) paste(x[1] - x[2], "-", x[1] + x[2]))

       a               b               d
  [1,] "3.537 - 3.581" "3.358 - 3.436" "3.282 - 3.312"
  [2,] "2.452 - 2.516" "1.372 - 1.44"  "3.041 - 3.127"
  [3,] "3.017 - 3.101" "3.14 - 3.228"  "5.238 - 5.258"
  [4,] "3.397 - 3.451" "2.783 - 2.839" "3.381 - 3.405"
  [5,] "1.918 - 1.988" "2.978 - 3.02"  "3.44 - 3.504"
  [6,] "4.01 - 4.078"  "3.014 - 3.068" "1.914 - 1.954"
  [7,] "3.475 - 3.517" "2.117 - 2.159" "1.871 - 1.929"
  [8,] "2.551 - 2.619" "3.907 - 3.975" "1.588 - 1.614"
  [9,] "1.707 - 1.765" "2.63 - 2.678"  "1.316 - 1.348"
 [10,] "4.051 - 4.103" "3.532 - 3.628" "3.235 - 3.287"
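For readers without abind installed, the same idea can probably be sketched in base R. This is only an illustration under assumptions, not the answer's own code: the `set.seed()` call, the layer labels "mean" and "se", and the name `res` are added here, and the row names come from the question's EDIT 1.

```r
# Hypothetical example data; set.seed() is added only for reproducibility
set.seed(1)
datMean <- data.frame(a = rnorm(10, 3), b = rnorm(10, 3), d = rnorm(10, 3))
datSE <- data.frame(a = rnorm(10, 3) / 100, b = rnorm(10, 3) / 100, d = rnorm(10, 3) / 100)
rownames(datMean) <- LETTERS[1:10]
rownames(datSE) <- LETTERS[1:10]

# Stack the two data frames into a rows x columns x 2 array (base R only).
# The layer labels "mean" and "se" are illustrative names.
datArray <- array(
  c(as.matrix(datMean), as.matrix(datSE)),
  dim = c(dim(datMean), 2),
  dimnames = c(dimnames(as.matrix(datMean)), list(c("mean", "se")))
)
datArray <- round(datArray, 3)

# For each [row, column] cell, x holds the pair c(mean, se)
res <- apply(datArray, 1:2, function(x) paste(x[1] - x[2], x[1] + x[2], sep = "-"))
```

Because the first two dimnames of the array are carried through `apply()`, `res` should keep the row names A to J and the column names a, b, d.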

+6

Sacha epskamp Jun 20 '11 at 8:05


You can do this for every row at once, but you have to work on paired columns from the two data.frames. Since you have a specific paste task that needs to be done each time, define a function:

 pfun <- function(x, y) paste(x - y, x + y, sep = "-")

and then create a new data.frame using the function:

 datNew <- data.frame(a = pfun(datMean$a, datSE$a),
                      b = pfun(datMean$b, datSE$b),
                      d = pfun(datMean$d, datSE$d))

There are ways to automate this, but perhaps the explicit version helps you understand it better. You can pass whole columns to paste(), but not whole data.frames.

Use a loop to fill in all the columns of the result without specifying them individually.

First, create a list to store all the columns; we will convert it to a data.frame with the right column names.

 datNew <- vector("list", ncol(datMean))

The naming assumes that the number, names and order of the columns match exactly between the two input data frames.

 names(datNew) <- names(datMean)
 for (i in 1:ncol(datMean)) {
   datNew[[i]] <- pfun(datMean[[i]], datSE[[i]])
 }

Convert to data.frame:

 datNew <- as.data.frame(datNew)
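The loop above can likely be condensed with base R's `Map()`, which walks the two data frames column by column. The following is a minimal sketch under assumptions: the `set.seed()` call and the `stopifnot()` guard are additions for illustration, and the row names are the ones from the question's EDIT 1.

```r
# Hypothetical example data; set.seed() is added only for reproducibility
set.seed(1)
datMean <- data.frame(a = rnorm(10, 3), b = rnorm(10, 3), d = rnorm(10, 3))
datSE <- data.frame(a = rnorm(10, 3) / 100, b = rnorm(10, 3) / 100, d = rnorm(10, 3) / 100)
rownames(datMean) <- LETTERS[1:10]
rownames(datSE) <- LETTERS[1:10]

pfun <- function(x, y) paste(x - y, x + y, sep = "-")

# Guard (an addition): the pairing below only makes sense if the columns line up
stopifnot(identical(names(datMean), names(datSE)))

# Map() calls pfun on the first pair of columns, then the second pair, and so on;
# the result is a list named after the columns of datMean
datNew <- as.data.frame(Map(pfun, datMean, datSE),
                        row.names = rownames(datMean),
                        stringsAsFactors = FALSE)
```

Passing `row.names` explicitly is one way to carry the row names over, since the list returned by `Map()` has no row names of its own.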

+2

mdsumner Jun 20 '11 at 7:39


This is how I understand your problem. I melted the data for the means and the SEs from multiple columns into one column using reshape2::melt.

 library(reshape2)
 datMean <- melt(datMean)$value
 datSE <- melt(datSE)$value
 dat <- cbind(datMean, datSE)
 apply(X = dat, MARGIN = 1, FUN = function(x) {
   paste(x[1] - x[2], x[1] + x[2], sep = " - ")
 })

And the result

 [1] "3.03886802467251 - 3.08551547263516"
 [2] "3.01803172559258 - 3.05247871975711"
 [3] "3.4609230722069 - 3.56097173966387"
 [4] "1.35368243309618 - 1.45548512578821"
 [5] "2.39936853846605 - 2.47570756724791"
 [6] "3.21849170272184 - 3.29653660329785"

EDIT

This solution keeps the dimensions of your original data. I build a 3D array and work on one cell at a time, holding the row and column fixed and taking both layers of the third dimension ([x, y, 1:2]).

 dat <- array(c(datMean, datSE), dim = c(10, 3, 2))
 datNEW <- matrix(rep(NA, nrow(dat)*ncol(dat)), ncol = ncol(dat))
 for (column in seq(ncol(dat))) {
   cls <- rep(NA, nrow(dat))
   for (rows in seq(nrow(dat))) {
     tmp <- dat[rows, column, 1:2]
     cls[rows] <- paste(tmp[1] - tmp[2], tmp[1] + tmp[2], sep = " - ")
   }
   datNEW[, column] <- cls
 }

+2

Roman Luštrik Jun 20 '11 at 7:50


Aaron · Accepted Answer · 2011-06-20T17:44:07+0000

If you first convert to matrices, you can do this without any apply calls or loops at all.

 MdatMean <- as.matrix(datMean)
 MdatSE <- as.matrix(datSE)
 matrix( paste(MdatMean - MdatSE, MdatMean + MdatSE, sep="-"),
         nrow=nrow(MdatMean), dimnames=dimnames(MdatMean) )

You can also consider formatC for better formatting, for example a fixed number of decimal places.

 lo <- formatC(MdatMean - MdatSE, format="f", digits=3)
 hi <- formatC(MdatMean + MdatSE, format="f", digits=3)
 matrix( paste(lo, hi, sep="-"),
         nrow=nrow(MdatMean), dimnames=dimnames(MdatMean) )

If you want a data.frame at the end, just wrap the last line in as.data.frame().
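Put together, the matrix approach with formatC and the final conversion might look like the sketch below. The `set.seed()` call, the `stringsAsFactors = FALSE` argument and the row names from the question's EDIT 1 are assumptions added for illustration.

```r
# Hypothetical example data; set.seed() is added only for reproducibility
set.seed(42)
datMean <- data.frame(a = rnorm(10, 3), b = rnorm(10, 3), d = rnorm(10, 3))
datSE <- data.frame(a = rnorm(10, 3) / 100, b = rnorm(10, 3) / 100, d = rnorm(10, 3) / 100)
rownames(datMean) <- LETTERS[1:10]
rownames(datSE) <- LETTERS[1:10]

MdatMean <- as.matrix(datMean)
MdatSE <- as.matrix(datSE)

# Matrix arithmetic is element-wise, so no apply() or loop is needed
lo <- formatC(MdatMean - MdatSE, format = "f", digits = 3)
hi <- formatC(MdatMean + MdatSE, format = "f", digits = 3)

# paste() returns a plain character vector, so the shape and the
# row/column names are restored via matrix(..., dimnames = ...)
datNew <- as.data.frame(
  matrix(paste(lo, hi, sep = "-"),
         nrow = nrow(MdatMean), dimnames = dimnames(MdatMean)),
  stringsAsFactors = FALSE
)
```

Since the dimnames of MdatMean are passed along, datNew should come out with the row names A to J and the columns a, b, d.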
