How to turn multiple columns into observations

I have a data frame like this:

structure(list(one = structure(1:4, .Label = c("a", "b", "c", 
"d"), class = "factor"), two = c(2, 4, 7, 3), x.1 = c("x1a", 
"x1b", "x1c", "x1d"), x.2 = c("x2a", "x2b", "x2c", "x2d"), x.3 = c("x3a", 
"x3b", "x3c", "x3d"), y.1 = c(NA, "y1b", "y1c", NA), y.2 = c(NA, 
"y2b", "y2c", NA), y.3 = c(NA, "y3b", "y3c", NA)), .Names = c("one", 
"two", "x.1", "x.2", "x.3", "y.1", "y.2", "y.3"), row.names = c(NA, 
-4L), class = "data.frame")

As you can see, the observations for events a, b, c and d (the "one" variable) are spread across columns: the prefix x or y identifies an individual observation, and the suffix 1, 2 or 3 identifies the variable. The variable "two" does not matter here.

I would like to reshape this data frame into a tidy form, in which each observation has its own row and each variable has its own column.

The final data frame should look like this:

structure(list(one = structure(c(1L, 2L, 2L, 3L, 3L, 4L), .Label = c("a", 
"b", "c", "d"), class = "factor"), two = c(2, 4, 2, 7, 5, 3), 
var1 = c("x1a", "x1b", "y1b", "x1c", "y1c", "x1d"), var2 = c("x2a", 
"x2b", "y2b", "x2c", "y2c", "x2d"), var3 = c("x3a", "x3b", 
"y3b", "x3c", "y3c", "x3d")), .Names = c("one", "two", "var1", 
"var2", "var3"), row.names = c(1L, 2L, 5L, 3L, 6L, 4L), class = "data.frame")

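For comparison, base R's stats::reshape() can produce the target layout above without extra packages. This is a sketch that does not come from the question or the answers. It assumes the data frame is called df.in (as in the question's own attempt) and rebuilds it with data.frame() instead of the dput() output. The name "xy" for the x/y column is made up for illustration.

```r
# Question's data, rebuilt with data.frame() instead of the dput() output
df.in <- data.frame(
  one = factor(c("a", "b", "c", "d")),
  two = c(2, 4, 7, 3),
  x.1 = c("x1a", "x1b", "x1c", "x1d"),
  x.2 = c("x2a", "x2b", "x2c", "x2d"),
  x.3 = c("x3a", "x3b", "x3c", "x3d"),
  y.1 = c(NA, "y1b", "y1c", NA),
  y.2 = c(NA, "y2b", "y2c", NA),
  y.3 = c(NA, "y3b", "y3c", NA),
  stringsAsFactors = FALSE
)

# Each list element groups the wide columns that become one long column;
# 'times' records which prefix (x or y) each long row came from.
long <- reshape(
  df.in,
  direction = "long",
  varying   = list(c("x.1", "y.1"), c("x.2", "y.2"), c("x.3", "y.3")),
  v.names   = c("var1", "var2", "var3"),
  timevar   = "xy",      # hypothetical name for the x/y column
  times     = c("x", "y"),
  idvar     = "one"
)

# Drop rows where all three value columns are missing, then sort by 'one'
value_cols <- c("var1", "var2", "var3")
long <- long[rowSums(!is.na(long[value_cols])) > 0, ]
long <- long[order(long$one), ]
print(long)
```

The helper column "xy" can be dropped afterwards if only var1 to var3 are needed.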
I am somewhat familiar with the cast and melt functions from the reshape packages, but so far I have not found a clean way to reshape the data frame. This is what I have so far:

library(reshape2)
df.between <- melt(df.in, id.vars=c("one", "two"))
# strip the "x." / "y." prefix, leaving only the variable number
df.between$variable <- gsub("x.|y.", "", df.between$variable)
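
The sketch below shows what the gsub() step does to the names melt() stores in the "variable" column. It assumes those are the original column names x.1 to y.3 (each repeated once per row of df.in in the real melted data). Both x.1 and y.1 collapse to "1", so a later cast cannot tell the x and y observations apart unless the prefix is kept in a separate column.

```r
# Column names that melt() places in 'variable' for the question's data
variable <- c("x.1", "x.2", "x.3", "y.1", "y.2", "y.3")

# The question's regex: "x." and "y." match the letter plus any one
# character, so only the trailing number survives
number_only <- gsub("x.|y.", "", variable)
print(number_only)

# Keeping the prefix (everything before the dot) preserves x vs y
prefix <- sub("\\..*$", "", variable)
print(prefix)
```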
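
This turns "variable" into just the numbers (1, 2, 3), so I lose track of which rows came from x and which from y. I am stuck at this point, possibly something with grepl would help.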

We can use melt from the devel version of data.table, i.e. v1.9.5, which accepts multiple patterns for the measure columns.

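library(data.table)
# df1 is the question's data frame; setDT() converts it to a data.table by reference.
# Each pattern collects one set of columns (x.1/y.1, x.2/y.2, x.3/y.3) into one value column.
melt(setDT(df1), measure.vars=patterns('.1', '.2', '.3'),
     na.rm=TRUE, value.name=paste0('var', 1:3))[, variable:=NULL][order(one)]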
#   one two var1 var2 var3
#1:   a   2  x1a  x2a  x3a
#2:   b   4  x1b  x2b  x3b
#3:   b   4  y1b  y2b  y3b
#4:   c   7  x1c  x2c  x3c
#5:   c   7  y1c  y2c  y3c
#6:   d   3  x1d  x2d  x3d

EDIT: removed the c() around patterns (as suggested by @Jaap).
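
The arguments to patterns() are regular expressions matched against the column names, and each one yields a group of columns that is stacked into one value column. The base R sketch below uses grep() to approximate that matching; treating patterns() as equivalent to grep() on the names is an assumption about its behaviour, not data.table's own code. Because "." matches any character, anchored expressions such as "\\.1$" are a safer choice if other columns could contain a 1, 2 or 3.

```r
cols <- c("one", "two", "x.1", "x.2", "x.3", "y.1", "y.2", "y.3")

# Column positions picked up by each pattern, as in the answer above
groups <- lapply(c(".1", ".2", ".3"), grep, x = cols)
print(lapply(groups, function(i) cols[i]))

# Anchored versions: a literal dot followed by the digit at the end
strict <- lapply(c("\\.1$", "\\.2$", "\\.3$"), grep, x = cols)
print(identical(groups, strict))
```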

Besides melt from "data.table", another option is merged.stack from the "splitstackshape" package:

library(splitstackshape)
na.omit(merged.stack(mydf, var.stubs = c(".1", ".2", ".3"),
                     sep = "var.stubs", atStart = FALSE))

#    one two .time_1  .1  .2  .3
# 1:   a   2       x x1a x2a x3a
# 2:   b   4       x x1b x2b x3b
# 3:   b   4       y y1b y2b y3b
# 4:   c   7       x x1c x2c x3c
# 5:   c   7       y y1c y2c y3c
# 6:   d   3       x x1d x2d x3d
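
The na.omit() call in this answer drops every row that contains an NA in any column. In the question's data the missing y observations are entirely NA, so the result is correct, but a partially missing observation would be dropped as well. The sketch below uses small hypothetical data (not from the question) to show the difference between dropping rows with any NA and dropping only rows where all value columns are NA.

```r
# Hypothetical stacked data: row 2 is an all-missing y observation,
# row 3 is missing only one of its two values
stacked <- data.frame(
  one  = c("a", "a", "e"),
  time = c("x", "y", "y"),
  v1   = c("x1a", NA, "y1e"),
  v2   = c("x2a", NA, NA),
  stringsAsFactors = FALSE
)

# na.omit() removes rows 2 and 3
dropped_any <- na.omit(stacked)

# Removing only rows where every value column is NA keeps row 3
dropped_all <- stacked[!(is.na(stacked$v1) & is.na(stacked$v2)), ]

print(dropped_any)
print(dropped_all)
```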

You were almost there, so I finished your approach for you. All you need is a column that distinguishes the x and y observations (you can easily drop it later if you do not need it). I left the empty rows in because they are easy to remove, and keeping them avoids silently dropping missing data.

library(reshape2)
df.between <- melt(df.in, id.vars=c("one", "two"))
# Replace the x./y. prefix with 'var' so the new column names are not numeric.
df.between$variable_n <- gsub("x.|y.", "var", df.between$variable)
# Keep the x/y prefix so the two observations per event stay apart.
df.between$variable_xy <- gsub(".[0-9]","",df.between$variable)

res <- dcast(one+two+variable_xy~variable_n,value.var="value",data=df.between)

> res
  one two variable_xy var1 var2 var3
1   a   2           x  x1a  x2a  x3a
2   a   2           y <NA> <NA> <NA>
3   b   4           x  x1b  x2b  x3b
4   b   4           y  y1b  y2b  y3b
5   c   7           x  x1c  x2c  x3c
6   c   7           y  y1c  y2c  y3c
7   d   3           x  x1d  x2d  x3d
8   d   3           y <NA> <NA> <NA>
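
The empty rows and the helper column can be removed afterwards, as mentioned above. The sketch below rebuilds res by hand from the printed output so that it runs without reshape2. It relies on the fact that, in this data, a missing y observation has all three values missing, so checking var1 alone is enough; with partially missing rows you would need to test all value columns.

```r
# 'res' as printed above, rebuilt by hand
res <- data.frame(
  one         = c("a", "a", "b", "b", "c", "c", "d", "d"),
  two         = c(2, 2, 4, 4, 7, 7, 3, 3),
  variable_xy = rep(c("x", "y"), 4),
  var1 = c("x1a", NA, "x1b", "y1b", "x1c", "y1c", "x1d", NA),
  var2 = c("x2a", NA, "x2b", "y2b", "x2c", "y2c", "x2d", NA),
  var3 = c("x3a", NA, "x3b", "y3b", "x3c", "y3c", "x3d", NA),
  stringsAsFactors = FALSE
)

# Drop the all-missing rows and the x/y helper column in one step
tidy <- subset(res, !is.na(var1), select = -variable_xy)
print(tidy)
```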

Source: https://habr.com/ru/post/1608042/