Splitting a long line into smaller lines

Question


I have a dataframe that includes a column of numbers like this:

 360010001001002
 360010001001004
 360010001001005
 360010001001006

I would like to split each number into pieces of 2, 3, 5, 1, and 4 digits:

 36 001 00010 0 1002
 36 001 00010 0 1004
 36 001 00010 0 1005
 36 001 00010 0 1006

It seems like it should be simple, but after reading the strsplit documentation I cannot figure out how to split by length.

+6

string split r

Amanda May 07 '13 at 10:07 PM


5 answers

You can use substring (assuming the string / number length is fixed):

 xx <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
 out <- do.call(rbind, lapply(xx, function(x)
   as.numeric(substring(x, c(1, 3, 6, 11, 12), c(2, 5, 10, 11, 15)))))
 out <- as.data.frame(out)

+8

Arun May 07, '13 at 22:14


Functional Version:

 split.fixed.len <- function(x, lengths) {
   cum.len <- c(0, cumsum(lengths))
   start <- head(cum.len, -1) + 1
   stop <- tail(cum.len, -1)
   mapply(substring, list(x), start, stop)
 }

 a <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
 split.fixed.len(a, c(2, 3, 5, 1, 4))
 #      [,1] [,2]  [,3]    [,4] [,5]
 # [1,] "36" "001" "00010" "0"  "1002"
 # [2,] "36" "001" "00010" "0"  "1004"
 # [3,] "36" "001" "00010" "0"  "1005"
 # [4,] "36" "001" "00010" "0"  "1006"
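Since the question starts from a data frame column, here is a minimal sketch of attaching the pieces back to the data frame in base R. The function is repeated so the block runs on its own; the column names part1 to part5 and the two-row sample data are illustrative assumptions, not part of the original answer.

```r
# Same helper as above, repeated so this sketch is self-contained.
split.fixed.len <- function(x, lengths) {
  cum.len <- c(0, cumsum(lengths))
  start <- head(cum.len, -1) + 1
  stop <- tail(cum.len, -1)
  mapply(substring, list(x), start, stop)
}

# Hypothetical sample data, stored as character to avoid any numeric formatting issues.
df <- data.frame(value = c("360010001001002", "360010001001004"),
                 stringsAsFactors = FALSE)

# One row per input value, one column per piece.
parts <- split.fixed.len(df$value, c(2, 3, 5, 1, 4))
colnames(parts) <- paste0("part", 1:5)  # hypothetical column names

# Bind the character pieces next to the original column.
df <- data.frame(df, parts, stringsAsFactors = FALSE)
```

Note that with a single input value mapply() returns a plain vector rather than a matrix, so this sketch assumes the column has at least two rows.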

+4

flodel May 07, '13 at 22:32


(Wow, this task is incredibly awkward and painful compared to Python. Anyhoo ...)

PS Now I see that your main intention was to convert the substring length vector to index pairs. You can use cumsum() and then sort the indices together:

 ll <- c(2,3,5,1,4)
 sort( c(1, cumsum(ll), (cumsum(ll)+1)[1:(length(ll)-1)]) )
 # now extract these as pairs.

But it is rather painful. flodel's answer is better for this.
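For completeness, the "extract these as pairs" step could be sketched like this in base R. The names idx, pairs and pieces are illustrative, and reshaping the sorted indices with matrix() is one possible way to pair them up, not necessarily what the answer had in mind.

```r
ll <- c(2, 3, 5, 1, 4)

# Sorted start/stop positions, as in the snippet above: 1 2 3 5 6 10 11 11 12 15
idx <- sort(c(1, cumsum(ll), (cumsum(ll) + 1)[1:(length(ll) - 1)]))

# Consecutive entries form (start, stop) pairs, so fill a two-column matrix row by row.
pairs <- matrix(idx, ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("start", "stop")))

# substring() is vectorised over the start and stop positions.
x <- "360010001001002"
pieces <- substring(x, pairs[, "start"], pairs[, "stop"])
pieces
# [1] "36"    "001"   "00010" "0"     "1002"
```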

As for the real problem of splitting a data frame column into several columns, and doing it efficiently:

stringr::str_sub() blends elegantly with plyr::ddply() / ldply

 require(plyr)
 require(stringr)

 df <- data.frame(value=c(360010001001002,360010001001004,360010001001005,360010001001006))
 df$valc = as.character(df$value)

 df <- ddply(df, .(value), mutate,
   chk1=str_sub(valc,1,2),
   chk3=str_sub(valc,3,5),
   chk6=str_sub(valc,6,10),
   chk11=str_sub(valc,11,11),
   chk14=str_sub(valc,12,15)
 )

 #             value            valc chk1 chk3  chk6 chk11 chk14
 # 1 360010001001002 360010001001002   36  001 00010     0  1002
 # 2 360010001001004 360010001001004   36  001 00010     0  1004
 # 3 360010001001005 360010001001005   36  001 00010     0  1005
 # 4 360010001001006 360010001001006   36  001 00010     0  1006

0

smci Mar 09 '14 at 15:18


You can use stri_sub() from the stringi package:

 library(stringi)
 splitpoints <- cumsum(c(2, 3, 5, 1, 4))
 stri_sub("360010001001002", c(1, splitpoints[-length(splitpoints)] + 1), splitpoints)

0

bartektartanus Mar 13 '14 at 11:43


G. grothendieck · Accepted Answer · 2013-05-08T01:05:27+0000

Assuming this data:

 x <- c("360010001001002", "360010001001004", "360010001001005", "360010001001006")

try the following:

 read.fwf(textConnection(x), widths = c(2, 3, 5, 1, 4))

If x is numeric, replace x with as.character(x) in this statement.
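One thing to watch with this approach: by default read.fwf() lets read.table() guess the column types, so pieces such as "001" come back as the integer 1. If the leading zeros matter, a minimal sketch is to pass colClasses = "character" through to read.table(). The two-element sample vector and the names con and parts are illustrative assumptions.

```r
x <- c("360010001001002", "360010001001004")

# Open the text connection explicitly so it can be closed afterwards.
con <- textConnection(x)

# colClasses is forwarded to read.table() and keeps every piece as a string.
parts <- read.fwf(con, widths = c(2, 3, 5, 1, 4), colClasses = "character")
close(con)

parts
#   V1  V2    V3 V4   V5
# 1 36 001 00010  0 1002
# 2 36 001 00010  0 1004
```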
