I am trying to extract specific fields from my file. The output should contain only the fields whose header matches an expression, starting with the rows that come after the matched header line.
This is an example of my input. The fields are sometimes in a different order, and the number of lines before the heading I'm trying to match also varies.
I had a hard time working out how to do this with the cut and sed commands, and I could not find a way to do it with awk.
CGATS.17
FORMAT_VERSION 1
KEYWORD "SampleID"
KEYWORD "SAMPLE_NAME"
NUMBER_OF_FIELDS 45
WEIGHTING_FUNCTION "ILLUMINANT, D50"
WEIGHTING_FUNCTION "OBSERVER, 2 degree"
BEGIN_DATA_FORMAT
SampleID SAMPLE_NAME CMYK_C CMYK_M CMYK_Y CMYK_K LAB_L LAB_A LAB_B nm380 nm390 nm400
END_DATA_FORMAT
NUMBER_OF_SETS 182
BEGIN_DATA
1 1 40 40 40 0 62.5 6.98 4.09 0.195213 0.205916 0.212827
2 2 0 40 40 0 73.69 25.48 24.89 0.200109 0.211081 0.218222
3 3 40 40 0 0 63.95 12.14 -20.91 0.346069 0.365042 0.377148
4 4 0 70 70 0 58.91 47.69 35.54 0.080033 0.084421 0.087317
END_DATA
This is the quick-and-dirty command I used. It basically does the job, but it picks the fields by fixed position instead of searching for the field header. The awk command is only there to remove the empty lines surrounding the output.
cut -f 7-9 -s input.txt | sed -E 's/(LAB_.)//g' | awk 'NF' > file.txt
This is the result I expect. It is still tab-delimited and contains only the values of the fields whose header starts with LAB_.
62.5 6.98 4.09
73.69 25.48 24.89
63.95 12.14 -20.91
58.91 47.69 35.54
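For reference, here is a rough sketch of the kind of awk approach I am looking for, not a tested solution for every CGATS file. It assumes the file is tab-delimited with Unix line endings, that all column names sit on a single line between BEGIN_DATA_FORMAT and END_DATA_FORMAT, and that the data rows sit between BEGIN_DATA and END_DATA. The function name `extract_cols` and its optional pattern argument are made up for illustration.

```shell
# Hypothetical helper: print only the columns whose header matches a regex
# (default: names starting with LAB_), for the rows between BEGIN_DATA and END_DATA.
# Usage: extract_cols FILE [HEADER_REGEX]
extract_cols() {
  awk -F'\t' -v OFS='\t' -v pat="${2:-^LAB_}" '
    # Track whether we are inside the column-name section.
    $1 == "BEGIN_DATA_FORMAT" { in_fmt = 1; next }
    $1 == "END_DATA_FORMAT"   { in_fmt = 0; next }
    # Remember the position of every column whose name matches the pattern.
    in_fmt {
      for (i = 1; i <= NF; i++) if ($i ~ pat) cols[++n] = i
      next
    }
    # Track whether we are inside the data section.
    $1 == "BEGIN_DATA" { in_data = 1; next }
    $1 == "END_DATA"   { in_data = 0; next }
    # Print the remembered columns, tab-separated, for each data row.
    in_data && n {
      for (j = 1; j <= n; j++) printf "%s%s", $(cols[j]), (j < n ? OFS : ORS)
    }
  ' "$1"
}
```

Something like `extract_cols input.txt > file.txt` would then replace the cut/sed/awk pipeline. Because the columns are looked up by name, this should not depend on the field order or on how many lines precede the heading.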