Get specific rows from a repeating range pattern in a text file

The title makes this sound more complicated than it probably is.

I have text files that have basically this layout:

 Stimulus
 ...
 ...
 ...
 ...
 Response
 Stimulus
 ...
 ...
 ...
 ...
 Response

I use sed to get everything between the two markers, and then pull out the extra information I need.

 sed -n -e '/Stimulus/,/Response/ p' 

However, sometimes the participants do not respond, and in this case the file looks as follows:

 Stimulus
 ...
 ...
 ...
 ...
 Stimulus
 ...
 ...
 ...
 ...
 Response

In this special case, my script does not return what I'm looking for. So I'm looking for a way to extract a block only if pattern1 is followed by pattern2, with no other pattern1 in between.

Let me know if anything is unclear. I am more than happy to provide further information.

6 answers

One dirty way, which seemed to work in my test, is to reverse the contents of the file, search from Response to Stimulus, and then reverse the result again.

Assuming the following input:

 Stimulus 1...
 ...
 ...
 ...
 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3
 Stimulus 4...
 ...
 ...
 ...
 Stimulus 5...

Command:

 tac infile | sed -ne '/Response/,/Stimulus/ p' | tac - 

Output:

 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3

EDIT: To handle isolated Response lines as well, filter twice (based on the OP's comment):

 tac infile | sed -ne '/Response/,/Stimulus/ p' | tac - | sed -ne '/Stimulus/,/Response/ p' 
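A quick way to check the double filter is sketched below. The sample data is made up for illustration (an isolated Response, an unanswered Stimulus, then one complete pair), and it assumes GNU `tac` is available.

```shell
#!/bin/bash
# Hypothetical sample: an isolated Response, an unanswered Stimulus,
# then one complete Stimulus/Response pair.
infile=$(mktemp)
printf '%s\n' 'Response 0' 'Stimulus 1' '...' 'Stimulus 2' '...' 'Response 2' > "$infile"

# First pass (reversed file): keep Response..Stimulus ranges,
# which drops the unanswered "Stimulus 1" block.
# Second pass (forward again): keep Stimulus..Response ranges,
# which drops the isolated "Response 0".
result=$(tac "$infile" \
    | sed -n '/Response/,/Stimulus/p' \
    | tac \
    | sed -n '/Stimulus/,/Response/p')

printf '%s\n' "$result"
rm -f "$infile"
```

With this input only the `Stimulus 2` block should survive both passes.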

This is a pure bash solution:

 tmp=()
 while read l; do
     [[ $l =~ ^Stimulus ]] && tmp=("$l") && continue
     [ ${#tmp[@]} -eq 0 ] && continue
     tmp+=("$l")
     [[ $l =~ ^Response ]] && printf "%s\n" "${tmp[@]}" && tmp=()
 done <infile

It starts filling the tmp array when a line starting with Stimulus is found. If another Stimulus arrives, it resets tmp and starts over. When a Response is found, it prints the contents of tmp. No explicit loop is needed for printing, because the printf builtin reuses its format for each argument.

Input:

 cat >infile <<XXX
 ...
 Response 0
 ...
 Stimulus 1
 ...
 Stimulus 2
 ...
 Response 2
 ...
 Stimulus 3
 ...
 Response 3
 ...
 Response 4
 XXX

Output:

 Stimulus 2
 ...
 Response 2
 Stimulus 3
 ...
 Response 3

Another option is to switch to perl and its flip-flop (range operator):

 perl -lne '
     BEGIN {
         ## Create regular expression to match the initial and final words.
         ($from_re, $to_re) = map { qr/\A$_/ } qw|Stimulus Response|;
     }
     ## Range, similar to "sed".
     if ( $r = ( m/$from_re/o ... m/$to_re/o ) ) {
         ## If inside the range and found the initial word again, remove
         ## all lines saved.
         if ( $r > 1 && m/$from_re/o ) {
             @data = ();
         }
         ## Save line.
         push @data, $_;
         ## At the end of the range, print all lines saved.
         if ( $r =~ m/E0\z/ ) {
             printf qq|%s\n|, join qq|\n|, @data;
             @data = ();
         }
     }
 ' infile

Assuming an input file as:

 Stimulus 1...
 ...
 ...
 ...
 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3
 Stimulus 4...
 ...
 ...
 ...
 Stimulus 5...

This gives:

 Stimulus 2...
 ...
 ...
 ...
 Response 2
 Stimulus 3...
 ...
 ...
 ...
 Response 3

Here's a clean bash version that tries to minimize silly side effects:

 #!/bin/bash

 out=()
 while read -r l; do
     case "$l" in
         Stimulus*) out=( "$l" ) ;;
         Response*) ((${#out[@]}!=0)) && { printf "%s\n" "${out[@]}" "$l"; out=(); } ;;
         *) ((${#out[@]}!=0)) && out+=( "$l" ) ;;
     esac
 done < infile

It also handles the case where there is a Response but no preceding Stimulus.
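To see the isolated-Response handling in action, the loop can be wrapped in a function and fed a small made-up sample. This is only a sketch: `extract_pairs` is a hypothetical name, it reads stdin instead of `infile`, and `IFS=` is added so leading whitespace in lines is preserved.

```shell
#!/bin/bash
# extract_pairs is a hypothetical wrapper around the loop above; it reads stdin.
extract_pairs() {
    local out=() l
    while IFS= read -r l; do
        case "$l" in
            # A Stimulus always starts a fresh block.
            Stimulus*) out=( "$l" ) ;;
            # A Response prints the block only if a Stimulus opened one.
            Response*) if ((${#out[@]})); then printf '%s\n' "${out[@]}" "$l"; out=(); fi ;;
            # Any other line is kept only while a block is open.
            *) if ((${#out[@]})); then out+=( "$l" ); fi ;;
        esac
    done
}

# Made-up sample: "Response 0" and "Response 2" have no Stimulus before them.
result=$(printf '%s\n' 'Response 0' '...' 'Stimulus 1' '...' 'Response 1' 'Response 2' \
    | extract_pairs)
printf '%s\n' "$result"
```

Only the `Stimulus 1` block should be printed; both stray Response lines are skipped because the array is empty when they arrive.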


Updated to handle isolated responses:

 awk '
 /Response/ {
     if (p==1) {
         for(;k<length(a);) { print a[++k] }
         print $0
     }
     # Reset the buffer and all counters, including the insert index i.
     delete a; i=k=p=0
 }
 /Stimulus/ {
     if (p==1) { delete a; i=0 }
     p=1
 }
 p { a[++i]=$0 }' log

A really nice and easy job for GNU sed, as a one-liner, without unnecessary pipes and tools:

 sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file 

Input file:

 Stimulus 1 ...
 bad
 bad
 bad
 Stimulus 2 ...
 ...
 ...
 ...
 Response 2
 Stimulus 3 ...
 ...
 ...
 ...
 Response 3
 Stimulus 4 ...
 bad
 bad
 bad
 bad
 Stimulus 5 ...
 ...
 ...
 ...
 ...
 Response 5
 bad
 bad
 bad
 bad
 Response 6
 bad
 bad
 bad

And the output:

 $ sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file
 Stimulus 2 ...
 ...
 ...
 ...
 Response 2
 Stimulus 3 ...
 ...
 ...
 ...
 Response 3
 Stimulus 5 ...
 ...
 ...
 ...
 ...
 Response 5
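The sed one-liner is dense, so here is the same command sequence spelled out one command per line. The comments are my reading of what each step does, and `extract_pairs_sed` is a hypothetical wrapper name that reads from stdin instead of `file`.

```shell
# extract_pairs_sed is a hypothetical wrapper; it reads from stdin.
extract_pairs_sed() {
    sed -n '
        # Append every input line to the hold space.
        H
        /^Stimulus/ {
            # A Stimulus restarts the block: the hold space becomes this line only.
            h
            d
        }
        /^Response/ {
            # Swap, so the collected block is now in the pattern space.
            x
            # A block that begins with Response had no Stimulus before it:
            # the substitution succeeds and "t" jumps past the print.
            s/^Response//
            tk
            p
            :k
            d
        }
    '
}

# Made-up sample in the style of the input file above.
result=$(printf '%s\n' 'Stimulus 1' 'bad' 'Stimulus 2' '...' 'Response 2' 'Response 6' 'bad' \
    | extract_pairs_sed)
printf '%s\n' "$result"
```

One caveat that follows from this reading: lines that appear before the first Stimulus are also collected, so the approach assumes the file starts with a Stimulus line, as the sample input does.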

And my code for GNU awk:

 awk '{a[++i]=$0};/^Response/ && a[1] !~ /^Response/ {for (k=1; k<=i; k++) {print a[k]}};/^Stimulus|^Response/ { delete a; i=0; a[++i]=$0}' file 

As you can see, I need too much awk code ...
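For comparison, the awk version can be shortened if the block is collected in a single string instead of an array. This is a sketch rather than the code from the answer: `buf` and `extract_pairs_awk` are names chosen here, and it assumes a block never needs to be empty.

```shell
# extract_pairs_awk is a hypothetical helper; it reads from stdin.
extract_pairs_awk() {
    awk '
        # A new Stimulus restarts the block, discarding any unanswered one.
        /^Stimulus/ { buf = $0; next }
        # While a block is open, append the current line to it.
        buf != ""   { buf = buf "\n" $0 }
        # A Response closes the block; print it only if a Stimulus opened one.
        /^Response/ { if (buf != "") print buf; buf = "" }
    '
}

# Made-up sample: one unanswered Stimulus and one isolated Response.
result=$(printf '%s\n' 'Stimulus 1' 'bad' 'Stimulus 2' '...' 'Response 2' 'Response 6' 'bad' \
    | extract_pairs_awk)
printf '%s\n' "$result"
```

Keeping the block in one string avoids the index bookkeeping, at the cost of building a larger string for long blocks.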


Source: https://habr.com/ru/post/948321/

