How to split a string on a general delimiter pattern?

I need to split a long string. The split points have nothing in common except that each is a date followed by a time, so I need to split wherever a specific pattern appears, namely dd/mm/yy, hh:mm. I know about strsplit and the related string manipulation functions, but they don't seem to help. Sample data below.

25/06/15, 21:37 - kjadshjabsdjab
25/06/15, 21:39 - bsadhi2342/342jbjsd
25/06/15, 21:40 -hkgsad/213/1sadjaa
25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj
25/06/15, 21:42 - jkadbsh2:/\sdsadjv
25/06/15, 21:42 -
+4
3 answers

We can split with a regular expression. Assuming str1 holds the whole text as one string, a lookbehind splits right after each time (hh:mm):

strsplit(str1, "(?<=[0-9]{2}:[0-9]{2})", perl = TRUE)

If the lookbehind should require the date as well:

strsplit(str1, "(?<=[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2})", perl = TRUE)

If the date and time are not needed in the result:

setdiff(strsplit(str1, "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}\\s*-\\s*")[[1]], "")
#[1] "kjadshjabsdjab"               "bsadhi2342/342jbjsd" 
#[3] "hkgsad/213/1sadjaa"           "hsdjhakhjbk12/21s/sda:sdfjbj" 
#[5] "jkadbsh2:/\\sdsadjv" 
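
If the goal is instead to keep each timestamp together with its text, one option is to locate where every timestamp starts and cut the string at those positions. The following is only a sketch: str1 here is a hypothetical, shortened version of the sample data, and it assumes the timestamp pattern never occurs inside the message text.

```r
# Hypothetical input: three of the sample records joined into one string
str1 <- paste0(
  "25/06/15, 21:37 - kjadshjabsdjab",
  "25/06/15, 21:39 - bsadhi2342/342jbjsd",
  "25/06/15, 21:40 -hkgsad/213/1sadjaa"
)

pattern <- "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}"

# Start position of every timestamp (gregexpr returns -1 when nothing matches,
# so real input should be checked for that case first)
starts <- as.integer(gregexpr(pattern, str1)[[1]])

# Each record ends one character before the next timestamp begins;
# the last record runs to the end of the string
ends <- c(starts[-1] - 1L, nchar(str1))

records <- substring(str1, starts, ends)
print(records)
```

Cutting by position sidesteps the way strsplit handles zero-length matches, at the cost of a few extra lines.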
+3

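Alternatively, read the text line by line and use a regex match to capture the timestamp and the rest of the line separately; a mutate + sub step can then strip the leading "-":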

library(stringi)
library(purrr)

lines <- readLines(textConnection('25/06/15, 21:37 - kjadshjabsdjab\n25/06/15, 21:39 - bsadhi2342/342jbjsd\n25/06/15, 21:40 -hkgsad/213/1sadjaa\n25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj\n25/06/15, 21:42 - jkadbsh2:/\\sdsadjv\n25/06/15, 21:42 -'))

stri_match_all_regex(lines, "([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{2}, [[:digit:]]{2}:[[:digit:]]{2})(.*)") %>%
  map_df(~setNames(as.list(.[,2:3]), c("ts", "string")))
## # A tibble: 6 × 2
##                ts                          string
##             <chr>                           <chr>
## 1 25/06/15, 21:37                - kjadshjabsdjab
## 2 25/06/15, 21:39           - bsadhi2342/342jbjsd
## 3 25/06/15, 21:40             -hkgsad/213/1sadjaa
## 4 25/06/15, 21:41  - hsdjhakhjbk12/21s/sda:sdfjbj
## 5 25/06/15, 21:42           - jkadbsh2:/\\sdsadjv
## 6 25/06/15, 21:42                               -
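
The string column above still carries the leading " - ". A minimal base R sketch of the clean-up step, using a hypothetical vector of values copied from that column rather than the tibble itself:

```r
# Hypothetical values as they appear in the "string" column
string <- c(" - kjadshjabsdjab", " -hkgsad/213/1sadjaa", " -")

# Remove optional whitespace, the dash, and any whitespace after it,
# but only at the start of each value
cleaned <- sub("^[[:space:]]*-[[:space:]]*", "", string)
print(cleaned)
```

Inside a dplyr pipeline the same sub call could be placed in mutate; the pattern is anchored with ^ so dashes inside the message text are left alone.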
+2
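
You can also split on " - " and then drop the last 15 characters (the next timestamp) from each piece, using sapply and substr. Note that the first element comes out empty, and the last keeps a stray "25" because the final timestamp is followed by " -", which makes the tail 17 characters long rather than 15: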

> ss = "25/06/15, 21:37 - kjadshjabsdjab25/06/15, 21:39 - bsadhi2342/342jbjsd25/06/15, 21:40 - hkgsad/213/1sadjaa25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj25/06/15, 21:42 - jkadbsh2:sdsadjv25/06/15, 21:42 -"
> 
> sapply(strsplit(ss, " - "), function(x) substr(x, 1, nchar(x)-15))
     [,1]                          
[1,] ""                            
[2,] "kjadshjabsdjab"              
[3,] "bsadhi2342/342jbjsd"         
[4,] "hkgsad/213/1sadjaa"          
[5,] "hsdjhakhjbk12/21s/sda:sdfjbj"
[6,] "jkadbsh2:sdsadjv25"          
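
One way to avoid the hard-coded 15 is to strip the trailing timestamp with a regex instead of counting characters. This is a sketch under the same assumptions as above (ss is a hypothetical, shortened input, and the message text itself never contains " - "):

```r
# Hypothetical shortened input, ending with a timestamp that has no text
ss <- paste0(
  "25/06/15, 21:37 - kjadshjabsdjab",
  "25/06/15, 21:39 - bsadhi2342/342jbjsd",
  "25/06/15, 21:42 -"
)

# Split on the literal separator between timestamp and text
pieces <- strsplit(ss, " - ", fixed = TRUE)[[1]]

# Each piece ends with the next record's timestamp, optionally followed
# by a dangling " -"; remove that tail wherever it occurs
tail_ts <- "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}( -)?$"
texts <- sub(tail_ts, "", pieces)

# Drop the empty strings left by pieces that were only a timestamp
texts <- texts[nzchar(texts)]
print(texts)
```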
+1

Source: https://habr.com/ru/post/1664833/

