You can use ave in base R, or a comparable approach from a package, to create an id variable. Since some "PRF" values are missing, you will probably also need cummax when creating the id.
Here are a few alternatives, all of which use @G.Grothendieck's sample data. My vote goes to the "data.table" approach.
DF <- data.frame( V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"), V2 = c(3, 3.8, 4, 2, 4, 3.8, 4), stringsAsFactors = FALSE)
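To make the role of cummax concrete, here is a minimal sketch of the id step on its own. V1 is the column from the sample data; V1_alt is a hypothetical reordering (not from the question) in which "PRF" is missing from the first group instead of the second.

```r
# Same V1 column as the sample data: "PRF" appears only once.
V1 <- c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN")

# Running count per V1 value. ave() returns character here because V1 is
# character, so convert explicitly to integer.
raw_id <- as.integer(ave(V1, V1, FUN = seq_along))
raw_id

# Hypothetical variant: "PRF" is missing from the first group.
V1_alt <- c("AHS", "BND", "MFN", "AHS", "BND", "MFN", "PRF")
raw_alt <- as.integer(ave(V1_alt, V1_alt, FUN = seq_along))
raw_alt                    # the lone "PRF" is counted as 1, although it sits in the second group
fixed_alt <- cummax(raw_alt)
fixed_alt                  # cummax carries the highest id seen so far forward
```

Note that this relies on the rows already being in group order, since cummax only looks at the values that came before.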
Base R: reshape
Notorious for its syntax ... and probably not recommended for working with 2M rows ...
reshape(within(DF, { id <- cummax(ave(V1, V1, FUN = seq_along)) }), direction = "wide", idvar = "id", timevar = "V1")
Base R: xtabs
The syntax is easier to remember, but it is less flexible. It also returns a matrix-like table rather than a data.frame, so you will need as.data.frame.matrix if you want a data.frame. It fills in missing values with 0, which may be undesirable.
xtabs(V2 ~ id + V1, within(DF, { id <- cummax(ave(V1, V1, FUN = seq_along)) }))
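As a rough sketch of the conversion mentioned above, the xtabs result can be passed to as.data.frame.matrix. The names tab and wide are only illustrative, and the id is built with an explicit as.integer step for clarity.

```r
DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE)

# Build the id as before: running count per V1 value, then cummax.
DF$id <- cummax(as.integer(ave(DF$V1, DF$V1, FUN = seq_along)))

tab <- xtabs(V2 ~ id + V1, DF)     # matrix-like table, not a data.frame
wide <- as.data.frame.matrix(tab)  # one row per id, one column per V1 value
wide

# The missing "PRF" value for id 2 shows up as 0, not NA.
wide["2", "PRF"]
```

Because a real 0 and a missing value look the same in this result, the approach is only safe when 0 cannot occur as a genuine V2 value.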
"data.table"
Fast. dcast.data.table behaves predictably, following the behavior long established by dcast from "reshape2".
library(data.table)
dcast.data.table(
  as.data.table(DF)[, id := sequence(.N), by = V1][, id := cummax(id)],
  id ~ V1, value.var = "V2")
#    id AHS BND MFN PRF
# 1:  1   3 3.8   4   2
# 2:  2   4 3.8   4  NA