I am dealing with specific file names and must extract information from them.
The file name structure is similar to: "20100613_M4_28007834.005_F_RANDOMSTR.raw.gz"
where RANDOMSTR is a string of at most 22 characters that may or may not contain a substring of the form "-W[0-9].[0-9]{2}.[0-9]{3}". This substring always starts with -W, which marks it uniquely.
What I need to extract is RANDOMSTR with this optional substring removed.
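This small bash check makes the optional substring concrete. It assumes the -W substring, when present, always sits at the very end of RANDOMSTR, as in the example file name. `has_w_suffix` is a hypothetical helper name. The first dot is escaped because the example shows a literal "." there. The second is left as "." (any character) because the example has "+" in that position.

```shell
#!/usr/bin/env bash
# has_w_suffix (hypothetical helper): succeeds if the given RANDOMSTR ends
# with the optional "-W" substring. Assumes the substring is always at the end.
has_w_suffix() {
  # First "." escaped (literal dot in the example); second "." left as
  # "any character" because the example has "+" in that position.
  local suffix_re='-W[0-9]\.[0-9]{2}.[0-9]{3}$'
  # The pattern is kept in a variable: quoting it on the right of =~ would
  # make bash compare it as a literal string instead of a regex.
  [[ $1 =~ $suffix_re ]]
}

for s in SOME-STRING OTHER-STRING-W0.40+045; do
  if has_w_suffix "$s"; then
    echo "$s: has -W suffix"
  else
    echo "$s: no -W suffix"
  fi
done
```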
I want to implement this in a bash script, and so far the best option I've found is to use gawk with a regex. My best attempt still fails:
gawk --re-interval '{match ($0,"([0-9]{8})_(M[0-9])_([0-9]{8}\\.[0-9]{3})_(.)_(.*)(-W.*)?.raw.gz",arr); print arr[5]}' <<< "20100613_M4_28007834.005_F_OTHER-STRING-W0.40+045.raw.gz"
OTHER-STRING-W0.40+045
Expected results (here $regexp stands for the pattern I am looking for):
gawk --re-interval '{match ($0,$regexp,arr); print arr[5]}' <<< "20100613_M4_28007834.005_F_SOME-STRING.raw.gz"
SOME-STRING
gawk --re-interval '{match ($0,$regexp,arr); print arr[5]}' <<< "20100613_M4_28007834.005_F_OTHER-STRING-W0.40+045.raw.gz"
OTHER-STRING
How can I get the desired result?
Thanks.