Split a string without losing characters

I want to split strings at a specific character while keeping that character in the second resulting string. I can do almost everything I need, except that I lose the characters I pass to strsplit, which I assume are called the delimiter.

Is there a way to ask strsplit to keep the delimiter? Or do I need some kind of regular expression? Thanks for any advice. This seems like a very simple question, so sorry if it is a duplicate. I prefer to use base R.

Here is an example showing what I have so far:

    my.table <- read.table(text = '
                                                                model npar     AICc
      AA(~region+state+county+city)BB(~region+state+county+city)CC(~1)   17 11111.11
      AA(~region+state+county)BB(~region+state+county)CC(~123)           14 22222.22
      AA(~region+state)BB(~region+state)CC(~33)                          13 33333.33
      AA(~region)BB(~region)CC(~4321)                                     6 44444.44
    ', header = TRUE, stringsAsFactors = FALSE)

    desired.result <- read.table(text = '
                                                           model        CC npar     AICc
      AA(~region+state+county+city)BB(~region+state+county+city)    CC(~1)   17 11111.11
      AA(~region+state+county)BB(~region+state+county)            CC(~123)   14 22222.22
      AA(~region+state)BB(~region+state)                           CC(~33)   13 33333.33
      AA(~region)BB(~region)                                     CC(~4321)    6 44444.44
    ', header = TRUE, stringsAsFactors = FALSE)

    split.model <- strsplit(my.table$model, 'CC\\(')

    split.models <- matrix(unlist(split.model), ncol=2, byrow=TRUE,
                           dimnames = list(NULL, c("model", "CC")))

    desires.result2 <- data.frame(split.models, my.table[,2:ncol(my.table)])
    desires.result2

    #                                                        model     CC npar     AICc
    # 1 AA(~region+state+county+city)BB(~region+state+county+city)    ~1)   17 11111.11
    # 2           AA(~region+state+county)BB(~region+state+county)  ~123)   14 22222.22
    # 3                         AA(~region+state)BB(~region+state)   ~33)   13 33333.33
    # 4                                     AA(~region)BB(~region) ~4321)    6 44444.44
3 answers

The basic idea is to use regular-expression look-around assertions in strsplit to get the desired result. However, this is a bit trickier than it looks with strsplit and a positive lookahead alone. Read this great post from @JoshO'Brien for an explanation.

    pattern <- "(?<=\\))(?=CC)"
    strsplit(my.table$model, pattern, perl=TRUE)
    # [[1]]
    # [1] "AA(~region+state+county+city)BB(~region+state+county+city)"
    # [2] "CC(~1)"
    #
    # [[2]]
    # [1] "AA(~region+state+county)BB(~region+state+county)"
    # [2] "CC(~123)"
    #
    # [[3]]
    # [1] "AA(~region+state)BB(~region+state)" "CC(~33)"
    #
    # [[4]]
    # [1] "AA(~region)BB(~region)" "CC(~4321)"

Of course, I leave the do.call(rbind, ...) and cbind steps needed to get the final desired.result to you.
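
For completeness, here is a minimal sketch of those remaining steps. It assumes a cut-down, two-row version of my.table from the question and that every model string contains exactly one CC term; the names parts and result are illustrative only.

```r
# Two rows of the question's data, rebuilt here so the block runs on its own.
my.table <- data.frame(
  model = c("AA(~region)BB(~region)CC(~4321)",
            "AA(~region+state)BB(~region+state)CC(~33)"),
  npar = c(6, 13),
  AICc = c(44444.44, 33333.33),
  stringsAsFactors = FALSE
)

# Zero-width split point: preceded by ")" and followed by "CC".
pattern <- "(?<=\\))(?=CC)"
parts <- strsplit(my.table$model, pattern, perl = TRUE)

# Stack the two-element vectors into a two-column character matrix.
split.models <- do.call(rbind, parts)
colnames(split.models) <- c("model", "CC")

# Bind the split columns to the remaining columns of the table.
result <- cbind(as.data.frame(split.models, stringsAsFactors = FALSE),
                my.table[, -1])
print(result)
```

If some rows could lack a CC term, the pieces would have different lengths and do.call(rbind, ...) would recycle values, so that case would need to be handled before binding.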


Almost immediately after posting, I thought of using gsub to insert a space before the delimiter and then splitting on that space. I like Arun's answer better, though.

    my.table <- read.table(text = '
                                                                model npar     AICc
      AA(~region+state+county+city)BB(~region+state+county+city)CC(~1)   17 11111.11
      AA(~region+state+county)BB(~region+state+county)CC(~123)           14 22222.22
      AA(~region+state)BB(~region+state)CC(~33)                          13 33333.33
      AA(~region)BB(~region)CC(~4321)                                     6 44444.44
    ', header = TRUE, stringsAsFactors = FALSE)

    my.table$model <- gsub("CC", " CC", my.table$model)

    split.model <- strsplit(my.table$model, ' ')

    split.models <- matrix(unlist(split.model), ncol=2, byrow=TRUE,
                           dimnames = list(NULL, c("model", "CC")))

    desires.result <- data.frame(split.models, my.table[,2:ncol(my.table)])
    desires.result

    #                                                        model        CC npar     AICc
    # 1 AA(~region+state+county+city)BB(~region+state+county+city)    CC(~1)   17 11111.11
    # 2           AA(~region+state+county)BB(~region+state+county)  CC(~123)   14 22222.22
    # 3                         AA(~region+state)BB(~region+state)   CC(~33)   13 33333.33
    # 4                                     AA(~region)BB(~region) CC(~4321)    6 44444.44
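
The inserted-space trick relies on the model strings containing no spaces of their own. A possible variant that avoids a spare separator character is to locate the delimiter with regexpr and cut with substr. This is only a sketch, assuming every string contains "CC(" at least once; the vector models and the names left and right are made up for illustration.

```r
# Example strings in the same shape as the question's model column.
models <- c("AA(~region)BB(~region)CC(~4321)",
            "AA(~region+state)BB(~region+state)CC(~33)")

# Position of the first literal "CC(" in each string (-1 would mean no match).
pos <- regexpr("CC(", models, fixed = TRUE)

# Everything before the delimiter, and everything from the delimiter onward.
left  <- substr(models, 1, pos - 1)
right <- substr(models, pos, nchar(models))

print(data.frame(model = left, CC = right, stringsAsFactors = FALSE))
```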

... why not just paste the delimiter back on afterwards? That would seem to save a lot of trouble with regular expressions.

    split.model <- lapply(strsplit(my.table$model, 'CC\\('), function(x) {
      x[2] <- paste0("CC(", x[2])
      x
    })
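
One thing to watch with pasting back: if a string does not contain the delimiter, x[2] is NA and paste0 turns it into the literal text "CC(NA". A guarded version might look like the sketch below; split_keep is a hypothetical helper name, not a base R function, and it treats the delimiter as a fixed string rather than a regular expression.

```r
# Hypothetical helper: split on a fixed delimiter and paste it back onto
# every piece after the first, leaving strings without the delimiter alone.
split_keep <- function(x, delim) {
  lapply(strsplit(x, delim, fixed = TRUE), function(p) {
    if (length(p) > 1) {
      p[-1] <- paste0(delim, p[-1])
    }
    p
  })
}

# The second string has no "CC(" term and should come back unchanged.
models <- c("AA(~region)BB(~region)CC(~4321)", "AA(~region)BB(~region)")
res <- split_keep(models, "CC(")
print(res)
```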

Source: https://habr.com/ru/post/1491190/

