Split a large string into substrings

Question

I have a huge line like:

ABCDEFGHIJKLM ...

and I would like to split it into substrings of length 5 as follows:

>1
ABCDE
>2
BCDEF
>3
CDEFG
[...]

UPDATE

Solution:
OK, thank you all, I was able to find a fast way to do it. This is my solution, combining several ideas from here:

 str="ABCDEFGHIJKLMNOP"
 # cut positions are 1-based, so the offsets run from 1 to 5
 splitfive() { echo "$1" | cut -c "$2"- | sed -r 's/(.{5})/\1\n/g'; }
 for (( i = 1; i <= 5; i++ )); do splitfive "$str" $i; done | grep -v "^$"

string bash shell

didymos Sep 27 '11 at 11:08

8 answers

chown · Answer 1 · 2011-09-27T11:15:11+0000

${string:position:length}

Extracts $length characters of substring from $string at $position.

 stringZ=abcABC123ABCabc
 #       0123456789.....
 #       0-based indexing.

 echo ${stringZ:0}      # abcABC123ABCabc
 echo ${stringZ:1}      # bcABC123ABCabc
 echo ${stringZ:7}      # 23ABCabc
 echo ${stringZ:0:5}    # abcAB
                        # Five characters of substring.

Then use a loop that advances the position by 1 each time to extract each substring of length 5.

 # Stop at the start of the last full 5-character window.
 for i in $(seq 0 $(( ${#stringZ} - 5 ))); do
   echo "${stringZ:$i:5}"
 done

Source for all of the above: Bash String Processing.
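For reuse, the same substring expansion can be wrapped in a small function. This is only a sketch: the name sliding_windows and the default width of 5 are my own assumptions, not part of the answer above.

```shell
#!/usr/bin/env bash
# sliding_windows is a hypothetical helper name (not from the answer above).
# Prints every full window of WIDTH characters from STRING, one per line.
sliding_windows() {
    local string=$1 width=${2:-5} i
    # i is the 0-based start offset; stop once a full window no longer fits.
    for (( i = 0; i + width <= ${#string}; i++ )); do
        printf '%s\n' "${string:i:width}"
    done
}

sliding_windows "ABCDEFGH" 5
# prints ABCDE, BCDEF, CDEFG and DEFGH, one per line
```

Bounding the loop with i + width <= length avoids the short trailing pieces you get when the loop runs all the way to the end of the string.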

Kent · Answer 2 · 2011-09-27T11:56:35+0000

sed can do it in one shot:

 kent$ echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1 /g'
 abcde fghij klmno pqr

or, depending on your needs:

 kent$ echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1\n/g'
 abcde
 fghij
 klmno
 pqr

Update

I thought it was just a matter of splitting the line; I didn't read the question carefully enough. This should give what you need:

Another one-shot, but with awk this time:

 kent$ echo "abcdefghijklmnopqr"|awk '{while(length($0)>=5){print substr($0,1,5);gsub(/^./,"")}}'
 abcde
 bcdef
 cdefg
 defgh
 efghi
 fghij
 ghijk
 hijkl
 ijklm
 jklmn
 klmno
 lmnop
 mnopq
 nopqr
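A variation on the awk approach, in case the input is really large: instead of deleting the first character of the line on every pass, walk an index over the line with substr. It also prints the ">N" headers the question asks for. This is a sketch only; number_windows is a made-up wrapper name and the width argument is my own addition.

```shell
#!/usr/bin/env bash
# number_windows is a hypothetical wrapper name; the width is passed to awk via -v.
number_windows() {
    awk -v w="${1:-5}" '{
        n = length($0) - w + 1      # number of full windows on this line
        for (i = 1; i <= n; i++)    # awk strings are 1-based
            printf ">%d\n%s\n", i, substr($0, i, w)
    }'
}

echo "ABCDEFG" | number_windows 5
# prints >1, ABCDE, >2, BCDEF, >3, CDEFG, one per line
```

The line itself is never modified, so each window costs one substr call rather than a rewrite of the whole remaining string.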

glenn jackman · Answer 3 · 2011-09-27T13:30:31+0000

In bash:

 s=ABCDEFGHIJ
 for (( i=0; i < ${#s}-4; i++ )); do
   printf ">%d\n%s\n" $((i+1)) ${s:$i:5}
 done

outputs

 >1
 ABCDE
 >2
 BCDEF
 >3
 CDEFG
 >4
 DEFGH
 >5
 EFGHI
 >6
 FGHIJ

Sorpigal · Answer 4 · 2011-09-27T11:16:38+0000

 str=ABCDEFGHIJKLM
 splitfive(){ echo "${1:$2:5}" ; }
 for (( i=0 ; i < ${#str} ; i++ )) ; do
   splitfive "$str" $i
 done

Or maybe you want to do something more intelligent with the results:

 #!/usr/bin/env bash
 splitstr(){
   printf '%s\n' "${1:$2:$3}"
 }
 n=$1
 offset=$2
 declare -a by_fives
 while IFS= read -r str ; do
   for (( i=0 ; i < ${#str} ; i++ )) ; do
     by_fives=("${by_fives[@]}" "$(splitstr "$str" $i $n)")
   done
 done
 echo ${by_fives[$offset]}

And then call it:

 $ split-by 5 2 <<<"ABCDEFGHIJKLM"
 CDEFG

You can adapt it from there.

EDIT: trivial version in C, for performance comparison:

 #include <stdio.h>

 int main(void){
   FILE* f;
   int n=0;
   char five[6];
   five[5] = '\0';
   f = fopen("inputfile", "r");
   if(f!=0){
     fread(&five, sizeof(char), 5, f);
     while(!feof(f)){
       printf("%s\n", five);
       fseek(f, ++n, SEEK_SET);
       fread(&five, sizeof(char), 5, f);
     }
   }
   return 0;
 }

Forgive my bad C, I really do not speak the language.

holygeek · Answer 5 · 2011-09-27T12:00:53+0000

Will this do it?

 $ sed 's/\(.....\)/\1\n/g' < filecontaininghugestring

Fredrik pihl · Answer 6 · 2011-09-27T12:05:56+0000

... or use the split command:

 $ ls
 $ echo "abcdefghijklmnopqr" | split -b5
 $ ls
 xaa  xab  xac  xad
 $ cat xaa
 abcde

split also works with files ...
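If you want the same non-overlapping 5-character pieces on standard output rather than in xaa, xab, ... files, fold is one option. To be clear, this swaps in a different tool from the split shown above, and chunk_string is just a name I made up for the sketch.

```shell
#!/usr/bin/env bash
# chunk_string is a hypothetical helper name.
# It wraps the string at WIDTH columns with fold, giving non-overlapping pieces.
chunk_string() {
    printf '%s\n' "$1" | fold -w "${2:-5}"
}

chunk_string "abcdefghijklmnopqr" 5
# prints abcde, fghij, klmno and pqr, one per line
```

Note that fold -w counts columns while split -b counts bytes, so the two only agree for plain single-byte text like the example.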

potong · Answer 7 · 2011-09-27T16:25:55+0000

sed can do this:

 sed -nr ':a;h;s/(.{5}).*/\1/p;g;s/.//;ta;' <<<"ABCDEFGHIJKLM" |  # split string
 sed '=' | sed '1~2s/^/>/'                                         # add line numbers and insert '>'

stefanB · Answer 8 · 2013-10-30T05:59:25+0000

You can use cut and specify characters instead of fields, and then change the output delimiter to whatever you need, such as a newline:

 echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$'\n' -c1-5,6-10,11-15

Output

 ABCDE
 FGHIJ
 KLMNO

or

 echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$':' -c1-5,6-10,11-15

Output

 ABCDE:FGHIJ:KLMNO
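For a string of arbitrary length, the range list given to -c can be generated instead of typed by hand. A rough sketch, assuming GNU cut (for --output-delimiter) and bash; cut_ranges is a hypothetical helper name of mine, not something cut provides.

```shell
#!/usr/bin/env bash
# cut_ranges is a hypothetical helper: prints "1-5,6-10,..." covering LENGTH characters.
cut_ranges() {
    local length=$1 width=${2:-5} start list=
    (( width > 0 )) || return 1          # a zero width would never advance
    for (( start = 1; start <= length; start += width )); do
        # Append "start-end", with a comma before every range except the first.
        list+="${list:+,}${start}-$(( start + width - 1 ))"
    done
    printf '%s\n' "$list"
}

cut_ranges 16 5
# prints 1-5,6-10,11-15,16-20

# Assumes GNU cut; the last range may run past the end of the string.
str="ABCDEFGHIJKLMNOP"
echo "$str" | cut --output-delimiter=$'\n' -c"$(cut_ranges "${#str}")"
```

Like the hand-written version, this gives non-overlapping pieces, not the sliding windows from the question.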
