I rarely have to write scripts, so I am running up against my lack of knowledge with this problem.
I have a text file of more than 500 MB that is well delimited, but I know it contains 5-10 "bad" sections. A person can easily judge the data in a section by eye, but I do not know how to do this programmatically.
I take the known-good value from #FIELD MyField; if that value does not appear in #FIELD LOCATION, something went wrong.
An example of two sections within the file follows. The first is bad and the second is good.
#START Descriptor
#FIELD LOCATION="http://path.to/file/here&Value=FOO&OtherValue=BLAH"
#FIELD AnythingElse
#FIELD MyField="BAR"
#END
#START Descriptor
#FIELD LOCATION="http://path.to/file/here&Value=BAR&OtherValue=BLAH"
#FIELD AnythingElse
#FIELD MyField="BAR"
#END
Sections begin and end logically, with #START and #END.
If #FIELD LOCATION does not exist, continue to the next section.
If #FIELD MyField="BAR" and #FIELD LOCATION does not contain BAR, print all the lines from this section into a new file.
Note: #FIELD MyField="BAR" is a control value that I insert myself to record other information about the data as the file is created. In my case it is a language indicator such as EN or DE, so it will literally be #FIELD MyField="EN". Any other value in this field should be ignored; such a section is not an entry that matches my criteria.
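To make the rules above concrete, here is a minimal sketch of the logic in Python (not Awk or Perl). It assumes each #START, #FIELD and #END marker sits on its own line, that field values are double-quoted as in the sample, and that "contains" means a plain case-sensitive substring test. The function names (iter_sections, is_bad, find_bad_sections, write_bad_sections) and the control_values parameter are made up for illustration.

```python
import re

# Assumed field syntax, taken from the sample sections: #FIELD Name="value"
LOCATION_RE = re.compile(r'#FIELD LOCATION="([^"]*)"')
MYFIELD_RE = re.compile(r'#FIELD MyField="([^"]*)"')


def iter_sections(lines):
    """Yield each section as a list of lines, from #START through #END.

    Works on any iterable of lines, so a large file is streamed
    rather than loaded into memory at once.
    """
    section = None
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("#START"):
            section = [line]
        elif section is not None:
            section.append(line)
            if stripped.startswith("#END"):
                yield section
                section = None


def is_bad(section, control_values):
    """True if LOCATION exists but does not contain the MyField value."""
    location = None
    my_field = None
    for line in section:
        match = LOCATION_RE.search(line)
        if match:
            location = match.group(1)
        match = MYFIELD_RE.search(line)
        if match:
            my_field = match.group(1)
    # No LOCATION, or a MyField value we do not care about: skip the section.
    if location is None or my_field not in control_values:
        return False
    # Assumption: a simple substring test is what "contains" means here.
    return my_field not in location


def find_bad_sections(lines, control_values=("BAR",)):
    """Return the list of bad sections found in an iterable of lines."""
    return [s for s in iter_sections(lines) if is_bad(s, control_values)]


def write_bad_sections(in_path, out_path, control_values=("BAR",)):
    """Copy every bad section from in_path to out_path; return the count."""
    count = 0
    with open(in_path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for section in iter_sections(src):
            if is_bad(section, control_values):
                dst.writelines(section)
                count += 1
    return count
```

For the real data this might be called as write_bad_sections("input.txt", "bad_sections.txt", control_values=("EN", "DE")), where both file names are placeholders. A short value such as EN could also match by accident elsewhere in the URL, so a stricter test (for example, looking for "Value=EN") may be safer if the URL layout is fixed.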
I believe this can be done in Awk or Perl. I can manage very simple one-liners, but this is beyond my skills.