Split a string into substrings of a given length with the remainder

For a string such as:

text <- "abcdefghijklmnopqrstuvwxyz" 

I would like to cut the string into substrings of a given length, for example 10, and keep the remainder:

 "abcdefghij" "klmnopqrst" "uvwxyz" 

None of the methods I know for creating substrings gives me the leftover 6-character substring. I have tried answers to previous similar questions, such as:

 > substring(text, seq(1, nchar(text), 10), seq(10, nchar(text), 10))
 [1] "abcdefghij" "klmnopqrst" ""

Any advice on how to get all the substrings of the desired length plus the remaining characters would be greatly appreciated.

3 answers

The vectors passed as the first and last arguments of substring can run past the number of characters in the string without causing errors or warnings. So you can do

 text <- "abcdefghijklmnopqrstuvwxyz"
 sq <- seq.int(to = nchar(text), by = 10)
 substring(text, sq, sq + 9)
 # [1] "abcdefghij" "klmnopqrst" "uvwxyz"
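If the chunk length varies, the same idea can be wrapped in a small function. This is only a sketch: the name split_fixed and its width argument are hypothetical, and the guard for an empty string is an added assumption, because seq.int(1, 0, by = width) would otherwise fail.

```r
# Hypothetical helper (not from the answer above): split a single string `x`
# into pieces of `width` characters; the last piece holds the remainder.
split_fixed <- function(x, width) {
  n <- nchar(x)
  # Assumption: an empty string yields an empty character vector.
  if (n == 0) return(character(0))
  starts <- seq.int(1, n, by = width)
  # End positions may run past nchar(x); substring() simply stops at the end.
  substring(x, starts, starts + width - 1)
}

split_fixed("abcdefghijklmnopqrstuvwxyz", 10)
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
```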

Try

 strsplit(text, '(?<=.{10})', perl=TRUE)[[1]]
 #[1] "abcdefghij" "klmnopqrst" "uvwxyz"

Or, for a faster approach, you can use the stringi package:

 library(stringi)
 stri_extract_all_regex(text, '.{1,10}')[[1]]
 #[1] "abcdefghij" "klmnopqrst" "uvwxyz"
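Both patterns above hard-code the width 10. As a rough sketch, the lookbehind pattern for strsplit could be built from a variable instead. The variable names width, pattern and chunks are hypothetical, and the width of 5 is chosen only for illustration.

```r
text <- "abcdefghijklmnopqrstuvwxyz"
width <- 5  # hypothetical chunk length, not from the original answer

# Build "(?<=.{5})": a zero-width lookbehind that matches at a position
# preceded by `width` characters.
pattern <- sprintf("(?<=.{%d})", width)
chunks <- strsplit(text, pattern, perl = TRUE)[[1]]
chunks
# [1] "abcde" "fghij" "klmno" "pqrst" "uvwxy" "z"
```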

Here is a way to do it with strapplyc and a fairly simple regular expression. It works because .{1,10} always matches the longest string that is no longer than 10 characters:

 library(gsubfn)
 strapplyc(text, ".{1,10}", simplify = c)

giving:

 [1] "abcdefghij" "klmnopqrst" "uvwxyz" 
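The same greedy-match idea can also be tried without any extra package. The sketch below swaps in base R's gregexpr and regmatches in place of strapplyc; it is an alternative shown for comparison, not part of the gsubfn answer, and the variable name pieces is hypothetical.

```r
text <- "abcdefghijklmnopqrstuvwxyz"

# gregexpr() finds every non-overlapping match of .{1,10};
# regmatches() extracts the matched substrings (one list element per input string).
pieces <- regmatches(text, gregexpr(".{1,10}", text))[[1]]
pieces
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
```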

Visualization. This regular expression is simple enough that it does not really need one, but here it is anyway:

 .{1,10}

[Image: regular expression visualization, Debuggex demo]


Source: https://habr.com/ru/post/979653/

