Get specific rows from a repeating range pattern in a text file

Question

The title makes this sound more complicated than it really is.

I have text files that have basically this layout:

 Stimulus
 ...
 ...
 ...
 ...
 Response
 Stimulus
 ...
 ...
 ...
 ...
 Response

I use sed to print everything between the two markers, and then pull the extra information I need out of that:

 sed -n -e '/Stimulus/,/Response/ p'

However, sometimes the participants do not respond, and in this case the file looks as follows:

 Stimulus
 ...
 ...
 ...
 ...
 Stimulus
 ...
 ...
 ...
 ...
 Response

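In this special case my script does not return what I'm looking for, because the range also captures the unanswered Stimulus block. So I'm looking for a way to extract a block if and only if pattern1 (Stimulus) is followed by pattern2 (Response), and not by another pattern1.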
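
A minimal reproduction of the problem may make it concrete. The sample lines below are made up for illustration (`a` and `b` are placeholders, not real data):

```shell
# Hypothetical sample: trial 1 has no Response, trial 2 does.
sample=$(printf '%s\n' 'Stimulus 1' 'a' 'Stimulus 2' 'b' 'Response 2')

# The range opens at "Stimulus 1" and closes only at the first Response,
# so the unanswered trial is printed together with the answered one.
naive=$(printf '%s\n' "$sample" | sed -n -e '/Stimulus/,/Response/ p')
printf '%s\n' "$naive"
```

All five lines come back, starting with `Stimulus 1`, although only the last three form a complete Stimulus/Response pair.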

Let me know if anything is unclear. I am more than happy to provide further information.

+6

regex awk pattern-matching perl sed

Andrej Jun 28 '13 at 13:36

6 answers

This is pure bash

 tmp=()
 while read l; do
   [[ $l =~ ^Stimulus ]] && tmp=("$l") && continue
   [ ${#tmp[@]} -eq 0 ] && continue
   tmp+=("$l")
   [[ $l =~ ^Response ]] && printf "%s\n" "${tmp[@]}" && tmp=()
 done <infile

It starts filling the tmp array when a line beginning with Stimulus is found. If another Stimulus arrives, it clears tmp and starts over. When a Response is found, it prints the contents of the tmp array. No explicit loop is needed for printing, because the printf builtin reuses its format string for every remaining argument.
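
The implicit printf loop can be checked in isolation. This is only a small sketch with made-up values; it uses positional parameters instead of the tmp array so that it also runs in a plain POSIX shell, but `"${tmp[@]}"` expands the same way in bash:

```shell
# Hypothetical block contents, for illustration only.
set -- "Stimulus 2" "..." "Response 2"

# One format string, three arguments: printf reuses "%s\n" for each of them,
# so every element ends up on its own line.
block=$(printf "%s\n" "$@")
printf '%s\n' "$block"
```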

Input:

 cat >infile <<XXX
 ...
 Response 0
 ...
 Stimulus 1
 ...
 Stimulus 2
 ...
 Response 2
 ...
 Stimulus 3
 ...
 Response 3
 ...
 Response 4
 XXX

Output:

 Stimulus 2
 ...
 Response 2
 Stimulus 3
 ...
 Response 3

+5

Truey Jun 28 '13 at 14:08

Another option is to switch to perl and its flip-flop (range operator):

 perl -lne '
     BEGIN {
         ## Create regular expression to match the initial and final words.
         ($from_re, $to_re) = map { qr/\A$_/ } qw|Stimulus Response|;
     }
     ## Range, similar to "sed".
     if ( $r = ( m/$from_re/o ... m/$to_re/o ) ) {
         ## If inside the range and found the initial word again, remove
         ## all lines saved.
         if ( $r > 1 && m/$from_re/o ) {
             @data = ();
         }
         ## Save line.
         push @data, $_;
         ## At the end of the range, print all lines saved.
         if ( $r =~ m/E0\z/ ) {
             printf qq|%s\n|, join qq|\n|, @data;
             @data = ();
         }
     }
 ' infile

Assuming an input file as:

 Stimulus 1...
 ...
 ...
 ...
 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3
 Stimulus 4...
 ...
 ...
 ...
 Stimulus 5...

This gives:

 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3

+4

Birei Jun 28 '13 at 14:06

Here's a clean bash version that tries to avoid silly side effects:

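 #!/bin/bash
 out=()
 # -r keeps backslashes literal; l is the variable that receives each line.
 while read -r l; do
     case "$l" in
         # A new Stimulus always restarts the buffer.
         Stimulus*) out=( "$l" ) ;;
         # Print only if a Stimulus opened the buffer, then reset it.
         Response*) ((${#out[@]}!=0)) && { printf "%s\n" "${out[@]}" "$l"; out=(); } ;;
         # Any other line is kept only while a block is open.
         *) ((${#out[@]}!=0)) && out+=( "$l" ) ;;
     esac
 done < infile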

It also handles the case where a Response appears without a preceding Stimulus.

+4

gniourf_gniourf Jun 28 '13 at 14:19

Updated to handle isolated responses

 awk '
 /Response/ {
     if (p==1) {
         for(;k<length(a);) {
             print a[++k]
         }
         print $0
     }
     # Reset i as well, so the next block is stored from a[1] again.
     delete a; i=k=p=0
 }
 /Stimulus/ {
     if (p==1) {
         delete a; i=0
     }
     p=1
 }
 p { a[++i]=$0 }' log

+4

jaypal singh Jun 28 '13 at 14:47

A nice and easy job for GNU sed, in a single pass, without extra pipes or tools:

 sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file

Input file:

 Stimulus 1 ...
 bad
 bad
 bad
 Stimulus 2 ...
 ...
 ...
 ...
 Response 2
 Stimulus 3 ...
 ...
 ...
 ...
 Response 3
 Stimulus 4 ...
 bad
 bad
 bad
 bad
 Stimulus 5 ...
 ...
 ...
 ...
 ...
 Response 5
 bad
 bad
 bad
 bad
 Response 6
 bad
 bad
 bad

And the output:

 $ sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file
 Stimulus 2 ...
 ...
 ...
 ...
 Response 2
 Stimulus 3 ...
 ...
 ...
 ...
 Response 3
 Stimulus 5 ...
 ...
 ...
 ...
 ...
 Response 5
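
The one-liner is dense, so here are the same sed commands spread over several lines with comments. This is meant only as a reading aid: the layout, the comments, the made-up sample lines and the wrapper name `answered_blocks` are not part of the original answer.

```shell
# Hypothetical wrapper around the same sed commands, one per line.
answered_blocks() {
  sed -n '
    # Append the current line to the hold space.
    H
    /^Stimulus/ {
      # A new trial starts: overwrite the hold space with this line only.
      h
      d
    }
    /^Response/ {
      # Swap: the pattern space now holds the collected block,
      # the hold space keeps only this Response line.
      x
      # A block that starts with Response had no Stimulus before it.
      s/^Response//
      # After a successful substitution, jump over the print.
      tk
      p
      :k
      d
    }
  ' "$@"
}

# Made-up sample: trial 1 is unanswered, "Response 6" has no Stimulus.
printf '%s\n' 'Stimulus 1' 'bad' 'Stimulus 2' 'x' 'Response 2' 'bad' 'Response 6' |
  answered_blocks
```

One caveat, as far as I can tell from tracing the script: a Response that comes before the very first Stimulus in the file is still printed, preceded by an empty line, because the hold space then starts with a newline rather than with Response.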

And my code for GNU awk:

 awk '{a[++i]=$0};/^Response/ && a[1] !~ /^Response/ {for (k=1; k<=i; k++) {print a[k]}};/^Stimulus|^Response/ { delete a; i=0; a[++i]=$0}' file

As you can see, the awk version needs considerably more code ...

+4

captcha Jun 28 '13 at 19:45

Birei · Accepted Answer · 2013-06-28T13:58:21+0000

One dirty way, although it seemed to work in my test, is to reverse the contents of the file, select each range from Response back to Stimulus, and then reverse the result again.

Assuming the following input:

 Stimulus 1...
 ...
 ...
 ...
 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3
 Stimulus 4...
 ...
 ...
 ...
 Stimulus 5...

Command:

 tac infile | sed -ne '/Response/,/Stimulus/ p' | tac -

Output:

 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3

EDIT: To handle isolated Response lines as well, filter twice (based on the OP's comment):

 tac infile | sed -ne '/Response/,/Stimulus/ p' | tac - | sed -ne '/Stimulus/,/Response/ p'
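
A small check of the double filter on a case with an isolated Response. The sample lines and the function name `answered_only` are made up for illustration, and `tac` is assumed to be available (it ships with GNU coreutils):

```shell
# Reverse, keep Response..Stimulus ranges, restore the order, then keep only
# what also sits inside a Stimulus..Response range in the forward direction.
answered_only() {
  tac "$@" | sed -ne '/Response/,/Stimulus/ p' | tac - | sed -ne '/Stimulus/,/Response/ p'
}

# Made-up sample: trial 1 is unanswered, "Response 3" has no Stimulus.
printf '%s\n' 'Stimulus 1' 'a' 'Stimulus 2' 'b' 'Response 2' 'c' 'Response 3' |
  answered_only
```

This prints `Stimulus 2`, `b` and `Response 2`. With only the first filter, the trailing `c` and `Response 3` would remain, since in the reversed stream the range already opens at `Response 3`.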
