Convert row to column data by a specific row name in R

Question

Hey, so I'm pretty new to R and only know some of its features. I have raw data of about 2,000,000 rows.

The raw data looks like this. Each product has up to four types of tariff (AHS, BND, MFN, PRF); some items have PRF and some do not. The goal is to turn each item's tariffs into columns, one per tariff type.

    AHS 3.00
    BND 3.80
    MFN 4.00
    PRF 2.00
    AHS 4.00
    BND 3.80
    MFN 4.00

How can I convert the raw data into the following?

     AHS  BND  MFN  PRF
    3.00 3.80 4.00 2.00
    4.00 3.80 4.00   NA

I tried rbind, but for the items that have no PRF, R puts the AHS value in the PRF column.

Can someone tell me how to do this conversion? Many thanks!

+5

merge r transpose transformation

StatCC Oct 3 '14 at 23:13

2 answers

You can use ave in base R, or a comparable approach from a package, to create an id variable. Since some "PRF" values are missing, you probably also need to use cummax during the id creation step.

Here are a few alternatives, all of which use @G.Grothendieck's sample data. My vote goes to the "data.table" approach.

    DF <- data.frame(
      V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
      V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
      stringsAsFactors = FALSE)
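To make the id step concrete, here is a minimal sketch of what ave and cummax do on their own. It passes seq_along(DF$V1) as the first argument to ave, which is my own choice so that the counter stays an integer, and the vector types2 is a made-up ordering (not from the question) in which the first item has no "PRF".

```r
# Sample data from this answer.
DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE
)

# Running count within each tariff type: the n-th "AHS" gets n, and so on.
within_type <- ave(seq_along(DF$V1), DF$V1, FUN = seq_along)
id <- cummax(within_type)
print(id)

# Hypothetical ordering: the first item lacks "PRF", the second has it.
types2 <- c("AHS", "BND", "MFN", "AHS", "BND", "MFN", "PRF")
raw2 <- ave(seq_along(types2), types2, FUN = seq_along)
id2 <- cummax(raw2)
print(raw2)  # the lone "PRF" row is counted as the first of its type
print(id2)   # cummax() keeps that row with the item it follows
```

In the hypothetical case the plain counter would send the "PRF" row back to item 1, while the cummax version leaves it in item 2.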

Base R: `reshape`

Notorious for its syntax ... and probably not recommended for working with 2M rows ....

    reshape(within(DF, {
      id <- cummax(ave(V1, V1, FUN = seq_along))
    }), direction = "wide", idvar = "id", timevar = "V1")
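One thing to be aware of is that reshape prefixes the new columns with the value column's name ("V2.AHS", "V2.BND", ...). The sketch below is one possible way to tidy that up. The id is built with an integer variant of the ave call, and the names wide and the cleanup steps are my own additions rather than part of the approach above.

```r
DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE
)

# Item id: running count per tariff type, kept non-decreasing by cummax().
DF$id <- cummax(ave(seq_along(DF$V1), DF$V1, FUN = seq_along))

wide <- reshape(DF, direction = "wide", idvar = "id", timevar = "V1")

# Drop the "V2." prefix that reshape() adds, and reset the row names.
names(wide) <- sub("V2.", "", names(wide), fixed = TRUE)
rownames(wide) <- NULL
print(wide)
```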

Base R: `xtabs`

The syntax is easier to remember, but it is less flexible. It also returns a matrix (strictly, a table), so you will need to use as.data.frame.matrix if you want a data.frame. It fills in missing values with "0", which may be undesirable.

    xtabs(V2 ~ id + V1, within(DF, {
      id <- cummax(ave(V1, V1, FUN = seq_along))
    }))
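If you do go the xtabs route, the conversion and the "0" issue could be handled along these lines. This is only a sketch: the helper names tab, wide and is_missing are mine, and using a count table to tell a missing cell from a real tariff of 0 is an assumption about what you want.

```r
DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE
)
DF$id <- cummax(ave(seq_along(DF$V1), DF$V1, FUN = seq_along))

tab <- xtabs(V2 ~ id + V1, data = DF)

# Convert the table to a data.frame with one column per tariff type.
wide <- as.data.frame.matrix(tab)

# Cells with no underlying row have a count of 0; mark only those as NA,
# so that a genuine tariff of 0 would be left alone.
is_missing <- unclass(table(DF$id, DF$V1)) == 0
wide[is_missing] <- NA
print(wide)
```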

"data.table"

Fast. dcast.data.table behaves predictably, following the behavior long established by dcast from "reshape2".

    library(data.table)
    dcast.data.table(
      as.data.table(DF)[, id := sequence(.N), by = V1][, id := cummax(id)],
      id ~ V1, value.var = "V2")
    #    id AHS BND MFN PRF
    # 1:  1   3 3.8   4   2
    # 2:  2   4 3.8   4  NA

+3

A5C1D2H2I1M1N2O1R2T1 Oct 4 '14 at 4:21

G. grothendieck · Accepted Answer · 2014-10-03T23:35:10+0000

Create a grp variable that is 1 for the first group, 2 for the second, etc. Then use tapply:

    grp <- cumsum(DF$V1 == "AHS")
    tapply(DF$V2, list(grp, DF$V1), sum)

giving:

      AHS BND MFN PRF
    1   3 3.8   4   2
    2   4 3.8   4  NA

We used this as data:

    DF <- data.frame(
      V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
      V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
      stringsAsFactors = FALSE)
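For reference, here is a small sketch of the same tapply approach with the grouping vector printed and the result turned into a data.frame. The name wide_df and the as.data.frame step are my additions. Note that grp only starts a new group at an "AHS" row, so this relies on every item beginning with "AHS", as it does in the sample data.

```r
DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE
)

# A new group starts at each "AHS" row.
grp <- cumsum(DF$V1 == "AHS")
print(grp)

# One row per group, one column per tariff type; empty cells come back as NA.
wide <- tapply(DF$V2, list(grp, DF$V1), sum)

# tapply() returns a matrix; convert it if a data.frame is needed.
wide_df <- as.data.frame(wide)
print(wide_df)
```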
