Awk combining multiple lines conditionally

I want to combine values from several lines of different lengths into one line when they share the same identifier.

Example input:

    ID: Value:
    a-1 49
    a-2 75
    b-1 120
    b-2 150
    b-3 211
    c-1 289
    d-1 301
    d-2 322

An example of the desired output:

    ID: Value:
    a 49,75
    b 120,150,211
    c 289
    d 301,322

How can I write an awk expression (or sed, grep, or something else) that checks whether the identifiers match and then prints all of their values on one line? Of course, I could just print them into separate columns and combine them later, so the real problem is printing conditionally: appending a value only when its identifier matches the previous one, and starting a new line otherwise.

5 answers

In awk, if your identifiers are grouped together:

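    awk 'NR == 1 { print; next }                      # header as is
         { sub(/-.*/, "", $1) }                        # "a-1" -> "a"
         NR == 2 { prev = $1; printf "%s %s", $1, $2; next }
         $1 == prev { printf ",%s", $2; next }         # same ID: append
         { prev = $1; printf "\n%s %s", $1, $2 }       # new ID: new line
         END { if (NR > 1) print "" }' your_input_file # final newline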
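The printf approach above writes each group piecemeal. A different design, sketched here under the same assumption that lines with the same ID are adjacent, is to buffer the current group in a variable and print it only when the ID changes, so every output line is written whole. group_adjacent is a hypothetical helper name, and the fields are assumed to be separated by whitespace with a one-line header.

```shell
# group_adjacent: hypothetical helper name.
# Assumes whitespace-separated "ID Value" lines, a one-line header,
# and that lines sharing an ID prefix are adjacent.
group_adjacent() {
  awk 'NR == 1 { print; next }              # header passes through unchanged
       { id = $1; sub(/-.*/, "", id) }      # "a-1" -> "a"
       id != prev {                         # a new ID starts here
         if (line != "") print line         # flush the previous group
         line = id " " $2; prev = id; next
       }
       { line = line "," $2 }               # same ID: append with a comma
       END { if (line != "") print line }' "$@"   # flush the last group
}

# Usage on the sample data from the question:
printf 'ID: Value:\na-1 49\na-2 75\nb-1 120\nb-2 150\nb-3 211\nc-1 289\nd-1 301\nd-2 322\n' | group_adjacent
```

Because each line is complete before it is printed, the output can be piped directly into other line-oriented tools.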

Given your input:

    awk 'NR == 1 { print; next }
         {
           split($1, a, /-/)
           sep = values[a[1]] == "" ? "" : ","
           values[a[1]] = values[a[1]] sep $2
         }
         END { for (key in values) print key, values[key] }' your_input_file

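produces the following (the order of the groups may differ, because for (key in values) visits keys in an unspecified order):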

    ID: Value:
    a 49,75
    b 120,150,211
    c 289
    d 301,322

A language with hashes of lists is also convenient here. Here is a Perl version:

    perl -lne '
      if ($. == 1) { print; next }
      if (/^(.+?)-\S+\s+(.*)/) { push @{ $values{$1} }, $2 }
      END {
        $, = " ";
        foreach my $key (sort keys %values) {
          print $key, join(",", @{ $values{$key} });
        }
      }
    ' your_input_file
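The awk hash version visits its keys in an unspecified order. If the output must follow the order in which IDs first appear, even when the input is not grouped, one option is to record each new ID in a separate, numerically indexed array. This is a minimal sketch, assuming whitespace-separated fields and a one-line header; group_any_order is a hypothetical helper name.

```shell
# group_any_order: hypothetical helper name.
# Assumes whitespace-separated "ID Value" lines and a one-line header.
# Lines with the same ID do NOT need to be adjacent.
group_any_order() {
  awk 'NR == 1 { print; next }
       {
         split($1, a, "-")                  # "b-2" -> a[1] = "b"
         key = a[1]
         if (key in values)
           values[key] = values[key] "," $2 # known ID: append
         else {
           order[++n] = key                 # remember first appearance
           values[key] = $2
         }
       }
       END { for (i = 1; i <= n; i++) print order[i], values[order[i]] }' "$@"
}

# Usage with interleaved IDs:
printf 'ID: Value:\nb-1 120\na-1 49\nb-2 150\na-2 75\n' | group_any_order
```

The extra order array costs one entry per distinct ID and makes the output order deterministic across awk implementations.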

In sed, assuming identifiers are grouped together:

 sed -n -e '1p;2{s/-.* / /;h};3,${H;x;s/\(.*\) \(.*\)\n\1-.* /\1 \2,/;/\n/{P;s/.*\n//;s/-.* / /};x};${x;p}' your_input_file 

Below is the same sed script with comments. Save it to a file named script and run it with sed -n -f script your_input_file:

    # Print the 1st line as is.
    1p
    # For the 2nd line, remove what follows - in the ID and save it in the hold space.
    2{s/-.* / /;h;}
    # For all the other lines...
    3,${
      # Append the line to the hold space, then swap so the accumulated text is in the pattern space.
      H;x
      # If the new line has the same ID, replace the newline and the repeated ID with a comma.
      s/\(.*\) \(.*\)\n\1-.* /\1 \2,/
      # A newline left in the pattern space means a new ID: print the old group and prepare the new one.
      /\n/{P;s/.*\n//;s/-.* / /;}
      # Save what remains in the hold space for the next line.
      x
    }
    # For the last line, print what is left in the hold space.
    ${x;p;}
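To try the commented script end to end without creating files by hand, it can be written to a temporary file from a quoted here-document (so the backslashes reach sed unchanged) and fed the sample data on standard input. This is a minimal sketch; run_commented_sed is a hypothetical helper name, and it assumes mktemp is available.

```shell
# run_commented_sed: hypothetical helper name. Writes the commented sed
# script to a temporary file, runs it on standard input, then removes it.
run_commented_sed() {
  script=$(mktemp) || return 1
  cat > "$script" <<'EOF'
# Print the 1st line as is.
1p
# For the 2nd line, strip the -N suffix and save it in the hold space.
2{s/-.* / /;h;}
3,${
# Append the line to the hold space and swap it into the pattern space.
H;x
# Same ID: replace the newline and the repeated ID with a comma.
s/\(.*\) \(.*\)\n\1-.* /\1 \2,/
# New ID: print the old group and prepare the new one.
/\n/{P;s/.*\n//;s/-.* / /;}
x
}
# Last line: print what is left.
${x;p;}
EOF
  sed -n -f "$script"
  status=$?
  rm -f "$script"
  return "$status"
}

printf 'ID: Value:\na-1 49\na-2 75\nb-1 120\nb-2 150\nb-3 211\nc-1 289\nd-1 301\nd-2 322\n' | run_commented_sed
```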

Given your input in the file input.txt (NR > 1 skips the header line):

    awk 'NR > 1 { split($1, a, "-"); hsh[a[1]] = hsh[a[1]] $2 "," }
         END { for (i in hsh) print i " " hsh[i] }' input.txt | sed 's/,$//'

Output (the order of the groups may vary):

    a 49,75
    b 120,150,211
    c 289
    d 301,322

A solution using only standard tools, as an alternative to the excellent solutions above:

    $ for INDEX in $(cut -f1 input | uniq); do echo -n "$INDEX "; grep "^$INDEX" input | cut -f2 | tr '\n' ' '; echo; done
    a 49 75
    b 120 150 211
    c 289
    d 301 322

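This uses slightly modified input, without the header and with the -N suffix removed from each ID, created with:

    awk 'NR>1' input | sed 's/-[0-9]*//'

which gives:

    a 49
    a 75
    b 120
    b 150
    b 211
    c 289
    d 301
    d 322

Note that cut -f splits on tabs by default, so this assumes the fields are tab-separated.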
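The loop above joins the values with spaces. To get the comma-separated format from the question with the same standard tools, each group's values can be joined with paste -s -d, instead of tr. This is a minimal sketch, assuming the same tab-separated, header-free file with grouped IDs; join_with_paste is a hypothetical helper name. Matching the tab after the ID also keeps an ID such as a from matching ab.

```shell
# join_with_paste: hypothetical helper name. Expects one argument: a file of
# tab-separated "ID<TAB>Value" lines, no header, with IDs grouped together.
join_with_paste() {
  tab=$(printf '\t')
  for id in $(cut -f1 "$1" | uniq); do
    # Select this ID's rows (the tab after the ID avoids prefix matches),
    # keep the value column, and join the values with commas.
    printf '%s %s\n' "$id" "$(grep "^$id$tab" "$1" | cut -f2 | paste -s -d, -)"
  done
}

# Usage with a temporary sample file:
sample=$(mktemp)
printf 'a\t49\na\t75\nb\t120\nb\t150\nb\t211\nc\t289\nd\t301\nd\t322\n' > "$sample"
join_with_paste "$sample"
rm -f "$sample"
```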

Source: https://habr.com/ru/post/895082/

