Split a large string into substrings

I have a huge line like:

ABCDEFGHIJKLM ...

and I would like to split it into substrings of length 5 as follows:

>1
ABCDE
>2
BCDEF
>3
CDEFG

[...]

UPDATE

Solution:
OK, thank you all, I found a fast way to do it. This is my solution, combining several ideas from here:

str="ABCDEFGHIJKLMNOP"
splitfive() { echo "$1" | cut -c "$2"- | sed -r 's/(.{5})/\1\n/g'; }
# cut numbers characters from 1, so the starting offsets run from 1 to 5
for ((i = 1; i <= 5; i++)); do splitfive "$str" "$i"; done | grep -v "^$"

+6
8 answers
${string:position:length} 

Extracts $length characters of substring from $string, starting at $position.

 stringZ=abcABC123ABCabc
 #       0123456789.....
 #       0-based indexing.

 echo ${stringZ:0}      # abcABC123ABCabc
 echo ${stringZ:1}      # bcABC123ABCabc
 echo ${stringZ:7}      # 23ABCabc
 echo ${stringZ:0:5}    # abcAB
                        # Five characters of substring.

Then loop over the string, advancing the position by 1 each time, to extract each substring of length 5:

 # stop at the last offset that still leaves a full 5-character substring
 for i in $(seq 0 $(( ${#stringZ} - 5 ))); do
   echo "${stringZ:$i:5}"
 done
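The same idea can be wrapped in a function so that the window width is a parameter and short inputs produce no output at all. This is only a sketch: the name sliding_windows and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every full window of width $2 (default 5)
# taken from the string $1, one window per line.
sliding_windows() {
  local s=$1 width=${2:-5} i
  # Stop at the last offset where a full window still fits.
  for (( i = 0; i + width <= ${#s}; i++ )); do
    printf '%s\n' "${s:i:width}"
  done
}

sliding_windows "ABCDEFG" 5
```

A string shorter than the window width simply yields no lines, which avoids the truncated substrings a loop over the full string length would print.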

All of Bash String Processing

+12

sed can do it in one shot:

 kent$ echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1 /g'
 abcde fghij klmno pqr

or, depending on your needs:

 kent$ echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1\n/g'
 abcde
 fghij
 klmno
 pqr
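For this non-overlapping case, the same chunks can also be produced with fold, a standard utility that wraps lines at a given width. This is a different tool from the sed substitution above, shown only as a sketch; the function name chunk_fixed is made up for illustration, and the width counting assumes single-byte characters.

```shell
# Hypothetical helper: break the string $1 into consecutive,
# non-overlapping pieces of width $2 (default 5), one per line.
chunk_fixed() {
  printf '%s\n' "$1" | fold -w "${2:-5}"
}

chunk_fixed "abcdefghijklmnopqr" 5
```

The last piece is shorter than the others when the string length is not a multiple of the width, just as with the sed version.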

Update

I thought it was just a matter of splitting the line; I didn't read the question very carefully. This should now give what you need.

Another one-liner, this time with awk:

 kent$ echo "abcdefghijklmnopqr"|awk '{while(length($0)>=5){print substr($0,1,5);gsub(/^./,"")}}'
 abcde
 bcdef
 cdefg
 defgh
 efghi
 fghij
 ghijk
 hijkl
 ijklm
 jklmn
 klmno
 lmnop
 mnopq
 nopqr
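The awk loop above shortens the line by one character on every pass. A variant that leaves the line untouched and moves a start index instead might look like the sketch below; the function name sliding_awk and the variable w are assumptions for illustration, and whether it is actually faster on a very long line has not been measured here.

```shell
# Hypothetical helper: print every full window of width $2 (default 5)
# from the string $1, using an index instead of rewriting the line.
sliding_awk() {
  printf '%s\n' "$1" | awk -v w="${2:-5}" '{
    # substr() is 1-based; i covers every start position that
    # still leaves room for a full window.
    for (i = 1; i + w - 1 <= length($0); i++)
      print substr($0, i, w)
  }'
}

sliding_awk "abcdefg" 5
```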
+6

In bash:

 s=ABCDEFGHIJ
 for (( i=0; i < ${#s}-4; i++ )); do
   printf ">%d\n%s\n" $((i+1)) ${s:$i:5}
 done

Output:

 >1
 ABCDE
 >2
 BCDEF
 >3
 CDEFG
 >4
 DEFGH
 >5
 EFGHI
 >6
 FGHIJ
+2
 str=ABCDEFGHIJKLM
 splitfive(){ echo "${1:$2:5}" ; }
 for (( i=0 ; i < ${#str} ; i++ )) ; do
   splitfive "$str" $i
 done

Or maybe you want to do something more intelligent with the results:

 #!/usr/bin/env bash

 splitstr(){
   printf '%s\n' "${1:$2:$3}"
 }

 n=$1
 offset=$2

 declare -a by_fives

 while IFS= read -r str ; do
   for (( i=0 ; i < ${#str} ; i++ )) ; do
     by_fives=("${by_fives[@]}" "$(splitstr "$str" $i $n)")
   done
 done

 echo ${by_fives[$offset]}

And then call it:

 $ split-by 5 2 <<<"ABCDEFGHIJKLM"
 CDEFG

You can adapt it from there.
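One possible adaptation, sketched under assumptions: the script above rebuilds the whole array and starts a subshell for every substring, which may become slow on a huge line. Appending with += and expanding the substring directly avoids both. The function name collect_windows and the global array name windows are invented for this example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fill the global array "windows" with every full
# window of width $2 taken from the string $1.
windows=()
collect_windows() {
  local s=$1 width=$2 i
  windows=()
  for (( i = 0; i + width <= ${#s}; i++ )); do
    # += appends one element in place; no subshell is needed.
    windows+=( "${s:i:width}" )
  done
}

collect_windows "ABCDEFGHIJKLM" 5
echo "${windows[2]}"
```

Unlike the script above, this version keeps only full-width windows, so the array holds fewer elements than the string has characters.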

EDIT: trivial version in C, for performance comparison:

 #include <stdio.h>

 int main(void){
     FILE* f;
     int n=0;
     char five[6];
     five[5] = '\0';

     f = fopen("inputfile", "r");
     if(f!=0){
         fread(&five, sizeof(char), 5, f);
         while(!feof(f)){
             printf("%s\n", five);
             fseek(f, ++n, SEEK_SET);
             fread(&five, sizeof(char), 5, f);
         }
     }

     return 0;
 }

Forgive my bad C, I really do not speak the language.

+1

Will this do?

 $ sed 's/\(.....\)/\1\n/g' < filecontaininghugestring 
+1

... or use the split command:

 $ ls
 $ echo "abcdefghijklmnopqr" | split -b5
 $ ls
 xaa  xab  xac  xad
 $ cat xaa
 abcde

split also works with files ...
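By default the pieces land in the current directory under the x prefix. A minimal sketch of keeping them elsewhere, assuming a throwaway directory from mktemp and a hypothetical chunk_ prefix; printf is used instead of echo so that no trailing newline ends up in the last piece.

```shell
# Hypothetical helper: write the string $1 as $2-byte files named
# chunk_aa, chunk_ab, ... inside the directory $3.
split_into_files() {
  # "-" tells split to read standard input; the last argument is the
  # output file name prefix.
  printf '%s' "$1" | split -b "$2" - "$3/chunk_"
}

tmpdir=$(mktemp -d)
split_into_files "abcdefghijklmnopqr" 5 "$tmpdir"
cat "$tmpdir/chunk_aa"
# Remove "$tmpdir" once the pieces are no longer needed.
```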

+1

sed can do this:

 sed -nr ':a;h;s/(.{5}).*/\1/p;g;s/.//;ta;' <<<"ABCDEFGHIJKLM" | # split string
   sed '=' | sed '1~2s/^/>/' # add line numbers and insert '>'
+1

You can use cut and specify characters instead of fields, and then change the output delimiter to whatever you need, such as a newline:

 echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$'\n' -c1-5,6-10,11-15

Output:

 ABCDE
 FGHIJ
 KLMNO

or

 echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$':' -c1-5,6-10,11-15

Output:

 ABCDE:FGHIJ:KLMNO 
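Typing the range list by hand does not scale to a huge string. One way to generate it, sketched with a made-up helper name cut_ranges that takes the total length and the chunk width:

```shell
# Hypothetical helper: print a cut-style range list that covers
# positions 1..$1 in pieces of width $2, e.g. 1-5,6-10,11-15.
cut_ranges() {
  local len=$1 width=$2 start=1 end list=
  while [ "$start" -le "$len" ]; do
    end=$(( start + width - 1 ))
    # Clamp the final range to the end of the string.
    if [ "$end" -gt "$len" ]; then end=$len; fi
    list="${list:+$list,}${start}-${end}"
    start=$(( start + width ))
  done
  printf '%s\n' "$list"
}

cut_ranges 16 5
```

The printed list can then be passed to cut in place of the hand-written ranges, for example as -c "$(cut_ranges "${#str}" 5)" with str holding the input. Note that, unlike the hand-written list above, it also covers the leftover characters at the end of the string.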
0

Source: https://habr.com/ru/post/898138/

