Splitting a string into substrings of fixed lengths

I have a dataframe that includes a column of numbers like this:

 360010001001002
 360010001001004
 360010001001005
 360010001001006

I would like to break each number into pieces of 2, 3, 5, 1, and 4 digits:

 36 001 00010 0 1002
 36 001 00010 0 1004
 36 001 00010 0 1005
 36 001 00010 0 1006

It seems like it should be simple, but I have been reading the strsplit documentation and cannot figure out how to split by length.


Assuming this data:

 x <- c("360010001001002", "360010001001004", "360010001001005", "360010001001006") 

try the following:

 read.fwf(textConnection(x), widths = c(2, 3, 5, 1, 4), colClasses = "character")  # keep "001" rather than 1

If x is numeric, replace x with as.character(x) in this statement.
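One caveat when the column is numeric (an addition, not part of the original answer): as.character() can produce scientific notation for some doubles (for example, 1e5 becomes "1e+05"), which would break fixed-width parsing. A minimal sketch, assuming the values are whole numbers below 2^53 so they are stored exactly, formats them with sprintf("%.0f", ...) and reads every piece as character so leading zeros survive. The column names p1 to p5 are hypothetical.

```r
# Numeric input, as it might come from a data frame column
x_num <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)

# Format as plain integers; "%.0f" never switches to scientific notation
x_chr <- sprintf("%.0f", x_num)

# Parse the fixed-width fields; colClasses = "character" keeps leading zeros
# such as "001" instead of converting them to 1
parts <- read.fwf(textConnection(x_chr),
                  widths = c(2, 3, 5, 1, 4),
                  col.names = c("p1", "p2", "p3", "p4", "p5"),
                  colClasses = "character")
print(parts)
```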


You can use substring (assuming the string / number length is fixed):

 xx <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
 out <- do.call(rbind, lapply(xx, function(x)
   as.numeric(substring(x, c(1, 3, 6, 11, 12), c(2, 5, 10, 11, 15)))))
 out <- as.data.frame(out)
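Note that as.numeric() turns "001" into 1. A variant of the same substring() idea that keeps the pieces as character loops over the five positions instead of over the numbers, since substring() is already vectorized over its text argument. This is a sketch under the same fixed-length assumption; first and last are simply the start and stop positions used in the answer.

```r
xx <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
s <- sprintf("%.0f", xx)        # plain digits as text, no scientific notation

first <- c(1, 3, 6, 11, 12)     # start position of each piece
last  <- c(2, 5, 10, 11, 15)    # stop position of each piece

# One substring() call per piece; each returns one element per number,
# so mapply() assembles a 4 x 5 character matrix
pieces <- mapply(function(f, l) substring(s, f, l), first, last)
out <- as.data.frame(pieces, stringsAsFactors = FALSE)
print(out)
```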

Functional Version:

 split.fixed.len <- function(x, lengths) {
   cum.len <- c(0, cumsum(lengths))
   start <- head(cum.len, -1) + 1
   stop <- tail(cum.len, -1)
   mapply(substring, list(x), start, stop)
 }
 a <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
 split.fixed.len(a, c(2, 3, 5, 1, 4))
 #      [,1] [,2]  [,3]    [,4] [,5]
 # [1,] "36" "001" "00010" "0"  "1002"
 # [2,] "36" "001" "00010" "0"  "1004"
 # [3,] "36" "001" "00010" "0"  "1005"
 # [4,] "36" "001" "00010" "0"  "1006"
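substring() silently returns shorter or empty pieces when a string is shorter than sum(lengths), for example a value that lost a leading zero when it was stored as a number. A hedged variant with the hypothetical name split_fixed_checked validates the total length first and returns a data frame with caller-supplied column names (defaulting to the hypothetical p1, p2, ...):

```r
# Hypothetical helper (not from the answer above): split fixed-width strings
# into a data frame, stopping with an error if any string has the wrong length
split_fixed_checked <- function(x, lengths,
                                col_names = paste0("p", seq_along(lengths))) {
  # Format numbers as plain digits so scientific notation cannot sneak in
  x <- if (is.numeric(x)) sprintf("%.0f", x) else as.character(x)

  bad <- nchar(x) != sum(lengths)
  if (any(bad)) {
    stop("unexpected length in: ", paste(x[bad], collapse = ", "))
  }

  stops  <- cumsum(lengths)       # last position of each piece
  starts <- stops - lengths + 1   # first position of each piece

  # One substring() call per piece, each vectorized over all strings
  pieces <- vapply(seq_along(lengths),
                   function(i) substring(x, starts[i], stops[i]),
                   character(length(x)))
  # vapply() returns a plain vector for a single string; restore the matrix shape
  pieces <- matrix(pieces, nrow = length(x))

  out <- as.data.frame(pieces, stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

a <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
res <- split_fixed_checked(a, c(2, 3, 5, 1, 4))
print(res)
```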

(Wow, this task is incredibly awkward and painful compared to Python. Anyhoo ...)

PS: I now see that your main difficulty was converting the vector of substring lengths into (start, stop) index pairs. You can use cumsum() and then sort the start and stop indices together:

 ll <- c(2, 3, 5, 1, 4)
 sort(c(1, cumsum(ll), (cumsum(ll) + 1)[1:(length(ll) - 1)]))
 # now extract these as pairs.

But this is rather clumsy; flodel's answer handles it better.
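To finish the "now extract these as pairs" step: the sorted vector alternates start and stop positions, so filling a two-column matrix row by row yields the pairs. A shorter route, shown for comparison, computes the starts directly as cumsum(ll) - ll + 1 with no sorting at all.

```r
ll <- c(2, 3, 5, 1, 4)
idx <- sort(c(1, cumsum(ll), (cumsum(ll) + 1)[1:(length(ll) - 1)]))

# Consecutive elements of the sorted vector form (start, stop) pairs
pairs <- matrix(idx, ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("start", "stop")))
print(pairs)

# Direct computation: each piece ends at the running total of the lengths
# and starts (length - 1) positions earlier
stops  <- cumsum(ll)
starts <- stops - ll + 1
```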

As for the real problem of splitting a data frame column into several columns efficiently:

stringr::str_sub() combines neatly with plyr::ddply() / ldply():

 require(plyr)
 require(stringr)
 df <- data.frame(value = c(360010001001002, 360010001001004, 360010001001005, 360010001001006))
 df$valc <- as.character(df$value)
 df <- ddply(df, .(value), mutate,
             chk1  = str_sub(valc, 1, 2),
             chk3  = str_sub(valc, 3, 5),
             chk6  = str_sub(valc, 6, 10),
             chk11 = str_sub(valc, 11, 11),
             chk14 = str_sub(valc, 12, 15))
 #             value            valc chk1 chk3  chk6 chk11 chk14
 # 1 360010001001002 360010001001002   36  001 00010     0  1002
 # 2 360010001001004 360010001001004   36  001 00010     0  1004
 # 3 360010001001005 360010001001005   36  001 00010     0  1005
 # 4 360010001001006 360010001001006   36  001 00010     0  1006
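Because ddply() here groups by value and handles one row at a time, a vectorized alternative may be faster on large data: base R's substr() already works on a whole column at once. A base-R sketch (no plyr or stringr), reusing the chk* column names from the answer and computing positions with cumsum():

```r
df <- data.frame(value = c(360010001001002, 360010001001004,
                           360010001001005, 360010001001006))
df$valc <- sprintf("%.0f", df$value)   # digits as text, no scientific notation

widths   <- c(2, 3, 5, 1, 4)
stops    <- cumsum(widths)             # last position of each piece
starts   <- stops - widths + 1         # first position of each piece
new_cols <- c("chk1", "chk3", "chk6", "chk11", "chk14")

# One vectorized substr() call per new column, over all rows at once
df[new_cols] <- Map(function(a, b) substr(df$valc, a, b), starts, stops)
print(df)
```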

You can use stri_sub() from the stringi package:

 library(stringi)
 splitpoints <- cumsum(c(2, 3, 5, 1, 4))
 stri_sub("360010001001002", c(1, splitpoints[-length(splitpoints)] + 1), splitpoints)
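The same start/stop trick also works with base R's substring(), which is vectorized over its first and last arguments (recycling the single string), so stringi is not strictly required for this case. A sketch:

```r
splitpoints <- cumsum(c(2, 3, 5, 1, 4))                # stop positions: 2 5 10 11 15
starts <- c(1, splitpoints[-length(splitpoints)] + 1)  # start positions: 1 3 6 11 12

# One string, several (start, stop) pairs -> one piece per pair
pieces <- substring("360010001001002", starts, splitpoints)
print(pieces)
```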

Source: https://habr.com/ru/post/944484/

