Extract fixed-length substrings in R

I have an attribute consisting of DNA sequences and would like to translate each sequence into its name. To do that, I need to break each sequence into substrings of a fixed length, which is 3. Here is an example of the data:

data=c("AATAGACGT","TGACCC","AAATCACTCTTT")

How can I extract the following?

[1] "AAT" "AGA" "CGT"
[2] "TGA" "CCC" 
[3] "AAA" "TCA" "CTC" "TTT"

So far, I have only found how to split a string by specifying a regular expression as the delimiter.

+4
4 answers

Try

strsplit(data, '(?<=.{3})', perl=TRUE)

or

library(stringi)
stri_extract_all_regex(data, '.{1,3}')
+5
as.list(gsub("(.{3})", "\\1 ", data))
[[1]]
[1] "AAT AGA CGT "

[[2]]
[1] "TGA CCC "

[[3]]
[1] "AAA TCA CTC TTT "

or

regmatches(data, gregexpr(".{3}", data))
[[1]]
[1] "AAT" "AGA" "CGT"

[[2]]
[1] "TGA" "CCC"

[[3]]
[1] "AAA" "TCA" "CTC" "TTT"
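
One detail worth checking is what happens when a sequence length is not a multiple of 3. The sketch below uses only base R and compares the pattern ".{3}" with ".{1,3}"; the helper names full_only and with_tail are hypothetical and are introduced here only for illustration.

```r
# Base R only. full_only and with_tail are hypothetical helper names.
data <- c("AATAGACGT", "TGACCC", "AAATCACTCTTT")

# ".{3}" keeps only complete triplets.
full_only <- function(x) regmatches(x, gregexpr(".{3}", x))

# ".{1,3}" is greedy, so it takes triplets first and then keeps a shorter final piece.
with_tail <- function(x) regmatches(x, gregexpr(".{1,3}", x))

x <- "AATAG"          # 5 characters: one full triplet plus 2 left over
full_only(x)[[1]]     # "AAT"
with_tail(x)[[1]]     # "AAT" "AG"
```

For the example data both patterns give the same result, because every length is a multiple of 3.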
+3

Another option, using substring (wrapped in lapply):

lapply(data, function(u) substring(u, seq(1, nchar(u), 3), seq(3, nchar(u), 3)))
#[[1]]
#[1] "AAT" "AGA" "CGT"

#[[2]]
#[1] "TGA" "CCC"

#[[3]]
#[1] "AAA" "TCA" "CTC" "TTT"
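
The substring call above relies on every length being a multiple of 3; otherwise the two seq() results have different lengths and substring() recycles the shorter one. A minimal sketch of a more defensive variant follows, assuming a trailing shorter piece should be kept. The name split_fixed and its width argument are hypothetical, not part of base R.

```r
# split_fixed is a hypothetical helper name, not a base R function.
split_fixed <- function(x, width = 3) {
  lapply(x, function(u) {
    n <- nchar(u)
    if (n == 0) return(character(0))    # seq(1, 0, by = width) would raise an error
    starts <- seq(1, n, by = width)     # start positions: 1, 4, 7, ...
    # substring() stops at the end of the string, so the last piece may be shorter
    substring(u, starts, starts + width - 1)
  })
}

split_fixed(c("AATAGACGT", "AATAG", ""))
```

Computing the end positions from the start positions keeps the two vectors the same length, so nothing is recycled.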
+3

Another option, using the gsubfn package:

library(gsubfn)
strapply(data, "...")
+1

Source: https://habr.com/ru/post/1584659/