Split a column of a table into multiple columns based on input delimiters

Perhaps what I want is impossible, so I am asking this question to find out whether there is a way. After reading questions here on Stack Overflow, I saw that there are ways to split a column into several columns, but not in the way I need. I have an application where tables can contain values such as:

Phones                              price
Nokia 1234D - J298 6732 - LM 2      103$
Samsung 3342L - J2YY 4372 - YU 3    130$
Samsung 3042X - IKAA 3221 - GN 4    102$

The user should be able to split the values in the Phones column however they like. My idea was to let the user type the delimiters, something like (" ", " - ", " ", " - "), meaning: split Nokia, 1234D, J298, 6732 and LM 2 into 5 columns, using these delimiters in order.

Here is some sample code:

library(data.table)
library(stringr)
c = c(" ", " - ", " ", " - ")   # delimiters, to be applied in this order
mytable <- data.table(Phones = c("Nokia 1234D - J298 6732 - LM 2",
                                 "Samsung 3342L - J2YY 4372 - YU 3",
                                 "Samsung 3042X - IKAA 3221 - GN 4"),
                      price = c("103$", "130$", "102$"))
aux = str_split_fixed(mytable$Phones, c, 5)
mytable <- data.table(aux, mytable$price)

But I get the following result, which is not what I want: the strings are not split with my delimiters the way I intended, and the first row appears twice (4 rows instead of 3):

              V1        V2   V3   V4          V5   V2
1:         Nokia     1234D    - J298 6732 - LM 2 103$
2: Samsung 3342L J2YY 4372 YU 3                  130$
3:       Samsung     3042X    - IKAA 3221 - GN 4 102$
4:   Nokia 1234D J298 6732 LM 2                  103$     

If you have a better solution, that would be very helpful.

1 answer

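We can use separate() from tidyr with 5 output columns. With extra = "merge", the pieces beyond the fifth are not dropped but kept together in the last column, so it holds "LM 2", "YU 3", and so on: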

library(tidyr)
library(dplyr)
mytable %>% 
  separate(Phones, into = paste0("V", 1:5), remove = FALSE, extra = "merge")
#                             Phones      V1    V2   V3   V4   V5 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$
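Why extra = "merge" matters here: separate() splits on sep = "[^[:alnum:]]+" by default, that is, on any run of characters that are neither letters nor digits, so the space inside "LM 2" is also a split point. A quick check with base R's strsplit() and the same regex (base R uses a different regex engine than tidyr, but for this plain ASCII input the split should be the same):

```r
# Apply tidyr's default separator regex to one of the phone strings
x <- "Nokia 1234D - J298 6732 - LM 2"
pieces <- strsplit(x, "[^[:alnum:]]+")[[1]]
pieces
length(pieces)  # 6 pieces for 5 requested columns, so the last two must be merged
```

Without extra = "merge", separate() would warn about the extra piece and drop it, leaving only "LM" in V5.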

Or use extract() with a regular expression:

mytable %>%
   extract(Phones, into = paste0("V", 1:4), remove = FALSE, 
     "^(\\w+\\s+\\w+)\\s*-\\s*(\\w+)\\s+(\\w+)\\s*-\\s*(\\w+\\s+\\w+)")
#                             Phones            V1   V2   V3   V4 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$

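The pattern is anchored at the start of the string (^), and each part to keep is wrapped in a capture group ((...)): (\\w+\\s+\\w+) captures a word, whitespace and another word ("Nokia 1234D", "LM 2"), while (\\w+) captures a single word. The \\s*-\\s* parts match the " - " separator with optional spaces around the hyphen; they sit outside any group, so they are dropped. Each capture group becomes one of the columns V1 to V4.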
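To see what the capture groups pick up before passing the pattern to extract(), the same regex can be tried with base R's regexec() and regmatches(). This is only a sanity check of the pattern, not a replacement for extract(); perl = TRUE is used so that \\w and \\s are interpreted as in PCRE:

```r
phones <- c("Nokia 1234D - J298 6732 - LM 2",
            "Samsung 3342L - J2YY 4372 - YU 3",
            "Samsung 3042X - IKAA 3221 - GN 4")
pattern <- "^(\\w+\\s+\\w+)\\s*-\\s*(\\w+)\\s+(\\w+)\\s*-\\s*(\\w+\\s+\\w+)"
# regexec() records the full match plus each capture group; regmatches() extracts the text
m <- regmatches(phones, regexec(pattern, phones, perl = TRUE))
# Drop element 1 (the full match), keep the four groups, one row per phone
groups <- do.call(rbind, lapply(m, `[`, -1))
groups
```

Each row of groups should line up with the V1 to V4 columns that extract() produces.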

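If you want to split with the user-supplied delimiter vector c instead, apply the delimiters one at a time: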

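library(data.table)
library(stringr)
# mytable here is the original table with columns Phones and price
c <- c(" "," - "," "," - ") #it is better to avoid function names for object names
# Split each string at the first occurrence of splt (matched literally via fixed())
# and return list(part before it, remainder)
fsplit <- function(str1, splt) {
  lst <- str_split(str1, fixed(splt), n = 2)
  v1 <- sapply(lst, `[`, 1)
  v2 <- sapply(lst, `[`, 2)
  list(v1, v2)
}

mytable[, V5 := Phones]                # V5 holds the part not yet split
nm1 <- paste0("V", seq_along(c))
for (i in seq_along(c)) {
  tmp <- fsplit(mytable$V5, c[i])
  mytable[, (nm1[i]) := tmp[[1]]]      # piece before delimiter i
  mytable[, V5 := tmp[[2]]]            # everything after delimiter i
}
setcolorder(mytable, c("Phones", nm1, "V5", "price"))
mytable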
#                             Phones      V1    V2   V3   V4   V5 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$
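The loop above can also be packaged as a reusable function that accepts whatever delimiter vector the user types in. A minimal base-R sketch, assuming delimiters should be matched literally and applied left to right; split_by_delims is a hypothetical name, not part of any package. When a delimiter is missing from a string, the remainder stays in the current column and the later columns become NA:

```r
# Hypothetical helper: split each string in x at the delimiters in delims,
# applied left to right and matched literally (fixed = TRUE).
# Returns a data.frame with length(delims) + 1 columns named V1, V2, ...
split_by_delims <- function(x, delims) {
  out <- vector("list", length(delims) + 1)
  rest <- x                                   # part of each string not yet split
  for (i in seq_along(delims)) {
    # position of the first match, -1 if absent; as.vector() drops regexpr attributes
    pos <- as.vector(regexpr(delims[i], rest, fixed = TRUE))
    found <- pos > 0
    # piece before delimiter i (whole remainder if the delimiter is missing)
    out[[i]] <- ifelse(found, substr(rest, 1, pos - 1), rest)
    # text after delimiter i becomes the new remainder (NA if missing)
    rest <- ifelse(found, substr(rest, pos + nchar(delims[i]), nchar(rest)),
                   NA_character_)
  }
  out[[length(delims) + 1]] <- rest
  names(out) <- paste0("V", seq_along(out))
  as.data.frame(out, stringsAsFactors = FALSE)
}

phones <- c("Nokia 1234D - J298 6732 - LM 2",
            "Samsung 3342L - J2YY 4372 - YU 3",
            "Samsung 3042X - IKAA 3221 - GN 4")
res <- split_by_delims(phones, c(" ", " - ", " ", " - "))
res
```

Assuming mytable is the original data.table, cbind(mytable, split_by_delims(mytable$Phones, c)) attaches the new columns next to Phones and price. Matching literally avoids surprises if a user types a regex metacharacter such as "." or "|" as a delimiter.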

Source: https://habr.com/ru/post/1684288/

