Split a string at the first number

I would like to split each string between the last letter and the first number:

    dat <- read.table(text = "
      x         y
      a1        0.1
      a2        0.2
      a3        0.3
      a4        0.4
      df1       0.1
      df2       0.2
      df13      0.3
      df24      0.4
      fcs111    0.1
      fcs912    0.2
      fcs113    0.3
      fcsb8114  0.4
    ", header = TRUE, stringsAsFactors = FALSE)

    desired.result <- read.table(text = "
      x1    x2    y
      a     1     0.1
      a     2     0.2
      a     3     0.3
      a     4     0.4
      df    1     0.1
      df    2     0.2
      df    13    0.3
      df    24    0.4
      fcs   111   0.1
      fcs   912   0.2
      fcs   113   0.3
      fcsb  8114  0.4
    ", header = TRUE, stringsAsFactors = FALSE)

There are a number of similar questions on StackOverflow, but I cannot find this exact situation. I realize this must be a basic question, and I could probably figure it out if I put a couple of hours into it. Sorry, and thank you for any suggestions. I prefer base R. If this is a duplicate, I can delete it.

+6
4 answers

You can use the strsplit function and pass a regex pattern as the split argument:

    cbind(dat, do.call(rbind, strsplit(dat$x, split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)))
    ##           x   y    1    2
    ## 1        a1 0.1    a    1
    ## 2        a2 0.2    a    2
    ## 3        a3 0.3    a    3
    ## 4        a4 0.4    a    4
    ## 5       df1 0.1   df    1
    ## 6       df2 0.2   df    2
    ## 7      df13 0.3   df   13
    ## 8      df24 0.4   df   24
    ## 9    fcs111 0.1  fcs  111
    ## 10   fcs912 0.2  fcs  912
    ## 11   fcs113 0.3  fcs  113
    ## 12 fcsb8114 0.4 fcsb 8114
+4

You can use lookarounds (a lookbehind for a letter followed by a lookahead for a digit):

 (?<=[a-zA-Z])(?=[0-9]) 
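As a rough illustration of how this pattern can be applied in base R, the sketch below inserts a space at the zero-width position between the last letter and the first digit, then parses the result with read.table. The sample vector and the names spaced and parts are made up for this example. It assumes each string is a run of letters followed by a run of digits.

```r
# Hypothetical sample input: letters followed by digits.
x <- c("a1", "df13", "fcsb8114")

# The lookbehind requires a letter just before the position,
# the lookahead requires a digit just after it; nothing is consumed.
pattern <- "(?<=[a-zA-Z])(?=[0-9])"

# Lookarounds need the PCRE engine, hence perl = TRUE.
spaced <- sub(pattern, " ", x, perl = TRUE)
print(spaced)

# Parse the space-separated pieces into two columns.
parts <- read.table(text = spaced, col.names = c("x1", "x2"),
                    stringsAsFactors = FALSE)
print(parts)
```

Because read.table infers column types, x2 comes back as an integer column rather than as text.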
+5

A method using gsub and strsplit:

    data.frame(do.call(rbind, strsplit(gsub("([a-zA-Z])([0-9])", "\\1_\\2", dat$x), "_")), y = dat$y)
    ##      X1   X2   y
    ## 1     a    1 0.1
    ## 2     a    2 0.2
    ## 3     a    3 0.3
    ## 4     a    4 0.4
    ## 5    df    1 0.1
    ## 6    df    2 0.2
    ## 7    df   13 0.3
    ## 8    df   24 0.4
    ## 9   fcs  111 0.1
    ## 10  fcs  912 0.2
    ## 11  fcs  113 0.3
    ## 12 fcsb 8114 0.4

This shows what happens at each stage:

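    (a <- gsub("([a-zA-Z])([0-9])", "\\1_\\2", dat$x))  # insert "_" between letter and digit
    (b <- strsplit(a, "_"))                             # split each string on "_"
    (d <- do.call(rbind, b))                            # stack the pieces into a matrix
    data.frame(d, y = dat$y)                            # attach the y column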
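The "_" separator only works if the data contains no underscore already. A possible variant, sketched here under the assumption that every string is letters followed by digits, skips the separator and strips each half with sub instead. The sample vector and the names x1 and x2 are illustrative.

```r
# Hypothetical sample input: letters followed by digits.
x <- c("a1", "df13", "fcsb8114")

# Drop the trailing digits to keep the letter prefix.
x1 <- sub("[0-9]+$", "", x)

# Drop the leading letters and convert what remains to numeric.
x2 <- as.numeric(sub("^[a-zA-Z]+", "", x))

print(data.frame(x1, x2, stringsAsFactors = FALSE))
```

This also yields a numeric x2 directly, whereas the strsplit route produces character columns.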
+2

The stringr package makes this somewhat more readable. In the following example, [[:alpha:]] and [[:digit:]] are locale-independent character classes for letters and digits, respectively.

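    library(stringr)

    # Column 1 of the result is the full match; columns 2 and 3 are the capture groups.
    # The "+" after [[:digit:]] is needed so that multi-digit numbers are captured whole.
    matches <- str_match(dat$x, "([[:alpha:]]+)([[:digit:]]+)")
    desired.result <- data.frame(
      x1 = matches[, 2],
      x2 = as.numeric(matches[, 3]),
      y = dat$y
    )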
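Since the question asks for base R, the same capture-group idea can probably be written with regexec and regmatches instead of stringr. This is a sketch only: the three-row dat below is a shortened stand-in for the question's data, and the names m, mat and result are made up for the example.

```r
# Shortened stand-in for the question's data frame.
dat <- data.frame(x = c("a1", "df13", "fcsb8114"),
                  y = c(0.1, 0.3, 0.4),
                  stringsAsFactors = FALSE)

# regexec records the positions of the full match and of each capture group;
# regmatches extracts them as one character vector per input string.
m <- regmatches(dat$x, regexec("^([[:alpha:]]+)([[:digit:]]+)$", dat$x))

# Assumes every string matches; a non-matching string would yield character(0)
# and break the row alignment of the rbind below.
mat <- do.call(rbind, m)

result <- data.frame(x1 = mat[, 2],
                     x2 = as.numeric(mat[, 3]),
                     y = dat$y,
                     stringsAsFactors = FALSE)
print(result)
```

As with str_match, the first column of the matrix holds the full match, so the letter and digit parts are in columns 2 and 3.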
+1

Source: https://habr.com/ru/post/957317/

