How to write a sed script to grep information from a text file

I am trying to do my homework, which restricts me to using only sed to filter an input file into a specific output format. Here is the input file (named stocks):

 Symbol;Name;Volume
 ================================================

 BAC;Bank of America Corporation Com;238,059,612
 CSCO;Cisco Systems, Inc.;28,159,455
 INTC;Intel Corporation;22,501,784
 MSFT;Microsoft Corporation;23,363,118
 VZ;Verizon Communications Inc. Com;5,744,385
 KO;Coca-Cola Company (The) Common;3,752,569
 MMM;3M Company Common Stock;1,660,453

 ================================================

And the output should be:

 BAC, CSCO, INTC, MSFT, VZ, KO, MMM 

I came up with a solution, but it is inefficient. Here is my sed script (named try.sed):

 /.*;.*;[0-9].*/ {
   N
   N
   N
   N
   N
   N
   s/\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*/\1, \2, \3, \4, \5, \6, \7/gp
 }

The command that I run on the shell:

 $ sed -nf try.sed stocks 

My question is: is there a better way to use sed to get the same result? The script I wrote only works with exactly 7 rows of data; if there is more data, I have to edit the script again. I'm not sure how to do this better, so I'm asking for help!

Thanks for any recommendations.

4 answers

Another way to do it with sed:

 sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' stocks 

Output:

 BAC, CSCO, INTC, MSFT, VZ, KO, MMM 

Explanation:

 -ne                 # Process each input line without printing it, and execute the following commands...
 /^====/,/^====/     # For all lines between these two...
 {
   /;/               # If the line has a semicolon...
   {
     s/;.*$//        # Remove the characters from the first semicolon to the end of the line.
     H               # Append the content to the hold space.
   }
 };
 $                   # On the last input line...
 {
   g                 # Copy the content of the hold space into the pattern space to work with it.
   s/\n//            # Remove the first newline character.
   s/\n/, /g         # Substitute the remaining newlines with the output separator, a comma in this case.
   p                 # Print to output.
 }
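The annotated breakdown above is meant for reading and will not run as-is, because sed does not allow a comment directly after an address. As a minimal sketch, here are the same commands saved as a standalone sed script file with every comment on its own line. It assumes a POSIX-style sed such as GNU sed; the variable names stocks, script and result are illustrative only, and the sample data is recreated in a temporary file.

```shell
#!/bin/sh
# Sketch: the answer's sed commands as a commented script file.
# Assumption: a POSIX-style sed (e.g. GNU sed). Names are illustrative.

# Recreate the sample input from the question in a temporary file.
stocks=$(mktemp)
cat > "$stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

# Write the sed program; comments sit on their own lines.
script=$(mktemp)
cat > "$script" <<'EOF'
# Work only on the lines from the first ==== line to the next one.
/^====/,/^====/{
  # Data lines are the ones containing a semicolon.
  /;/{
    # Keep the symbol: drop everything from the first semicolon on.
    s/;.*$//
    # Append a newline plus the symbol to the hold space.
    H
  }
}
# On the last input line, assemble and print the result.
${
  # Copy the hold space into the pattern space.
  g
  # Remove the leading newline added by the first H.
  s/\n//
  # Turn the remaining newlines into the ", " separator.
  s/\n/, /g
  p
}
EOF

# -n suppresses automatic printing, so only the final p produces output.
result=$(sed -n -f "$script" "$stocks")
printf '%s\n' "$result"
rm -f "$stocks" "$script"
```

Keeping the program in a file (run with sed -n -f) makes room for comments without changing what the one-liner does.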

Edit: I revised my algorithm because I had neglected to account for the header and footer (I thought they were just there for our benefit).

By design, sed reads each line of the input file and then executes commands on the lines that match some address (or on every line, if no address is given). If you tailor your script to a specific number of lines, you are almost certainly doing something wrong! Since this is homework, I won't write the script for you, but here is the general idea of one way to do it. The steps are listed in the order in which they should appear in the script.

  • Skip the first three lines using d, which deletes the pattern space and immediately moves on to the next line.
  • For each line that is not empty, do the following. (This will all go inside one set of braces.)
    • Replace the first semicolon (;) and everything after it with a comma and a space (", ") using the s (substitute) command.
    • Append the current pattern space to the hold space (see H).
    • Delete the pattern space and continue to the next line, as in step 1.
  • For each line that reaches this point in the script (this should be the first empty line), copy the contents of the hold space into the pattern space. (This goes after the braces above.)
  • Delete all newlines in the pattern space (substitute them with nothing).
  • Then replace the last comma and space in the pattern space with nothing.
  • Finally, quit so that no more lines are processed. My script worked without this, but I'm not 100% sure why.
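One possible reading of these steps, written out as a sketch. This is a hypothetical reconstruction, not the answerer's own 10-line script; it assumes the file has exactly three lines before the data (header, ==== line, empty line) and that the data rows are followed by a truly empty line with no trailing spaces. Variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical sketch of the listed steps (not the answerer's own script).
# Assumes three leading lines and a truly empty line after the data rows.

stocks=$(mktemp)
cat > "$stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

result=$(sed '
# Step 1: skip the header, the ==== line and the first empty line.
1,3d
# Every non-empty line is treated as a data row.
/^$/!{
  # Replace the first semicolon and the rest of the line with ", ".
  s/;.*/, /
  # Append the result to the hold space.
  H
  # Delete the pattern space and move on to the next line.
  d
}
# The first empty line gets here: fetch the collected symbols.
g
# Remove all newlines added by H.
s/\n//g
# Drop the trailing comma and space.
s/, $//
# Quit; without -n, sed prints the pattern space before exiting.
q
' "$stocks")
printf '%s\n' "$result"
rm -f "$stocks"
```

Tracing this sketch on the sample data hints at why the final quit was not strictly needed there: the only line left after the empty line is the closing ==== line, which is not empty, so it is appended to the hold space and deleted inside the braces without ever being printed.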

That's not to say this is the only way to do it. sed often offers several methods of varying complexity for completing a task. The solution I wrote using this method is 10 lines long.

As a note, I don't suppress automatic printing (with -n) or print manually (with p); each line is printed by default. My script works as follows:

 $ sed -f companies.sed companies
 BAC, CSCO, INTC, MSFT, VZ, KO, MMM

This sed command extracts the symbols, although it prints them one per line rather than as a comma-separated list:

 sed -rn '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt 

Or, on macOS:

 sed -En '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt 
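A sketch for checking what this command prints on the sample data. It assumes a sed that accepts -E (GNU sed and the BSD sed on macOS both do). The second command is an illustrative extension, not part of the original answer: it keeps the same address and substitution but collects the symbols in the hold space and joins them on the last line, so the comma-separated form is produced with sed alone.

```shell
#!/bin/sh
# Sketch: run the answer's command on the sample data, then an
# illustrative extension that joins the symbols with ", ".

stocks=$(mktemp)
cat > "$stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

# The answer's command: print the first field of every line ending in a digit.
result_lines=$(sed -En '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' "$stocks")
printf '%s\n' "$result_lines"

# Extension (not from the answer): append each symbol to the hold space,
# then on the last line swap it in, drop the leading newline and join.
result_joined=$(sed -En '
/[0-9]+$/{
  s/^([^;]*).*$/\1/
  H
}
${
  x
  s/^\n//
  s/\n/, /g
  p
}' "$stocks")
printf '%s\n' "$result_joined"
rm -f "$stocks"
```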

This might work for you:

 sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stocks 
  • We don't want the header, so delete it: 1d
  • All data fields are separated by ;, so focus on the lines that contain it: /;/
  • On those lines, delete everything from the first ; to the end of the line, then append what remains to the hold space (HS): {s/;.*//;H}
  • When you reach the last line, overwrite it with the HS using the g command, delete the first newline (introduced by the H command), replace all remaining newlines with a comma and a space, and print what remains (q prints the pattern space before quitting): ${g;s/.//;s/\n/, /g;q}
  • Delete everything else: d
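The one-liner packs several commands onto one line and closes braces directly after a command, which GNU sed accepts; some other implementations, such as the BSD sed on macOS, may be stricter about that. A sketch of the same commands with one command per line, which should sidestep the issue, is shown here. It is an editorial rewrite rather than the answerer's command, assumes a POSIX-style sed, and swaps s/.// for s/^\n// to make the removal of the leading newline explicit.

```shell
#!/bin/sh
# Sketch: the one-liner spread over several lines, one command per line.
# Assumption: a POSIX-style sed; names are illustrative.

stocks=$(mktemp)
cat > "$stocks" <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF

result=$(sed '
# Delete the "Symbol;Name;Volume" header line.
1d
# Data lines contain a semicolon: keep the symbol, append it to the hold space.
/;/{
  s/;.*//
  H
}
# On the last line: fetch the symbols, drop the leading newline, join, quit.
${
  g
  s/^\n//
  s/\n/, /g
  q
}
# Every other line is deleted without being printed.
d
' "$stocks")
printf '%s\n' "$result"
rm -f "$stocks"
```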

Here's a terminal session showing the sed command being refined step by step:

 cat <<! >stock # paste the file into a here doc and pass it on to a file
 > Symbol;Name;Volume
 > ================================================
 >
 > BAC;Bank of America Corporation Com;238,059,612
 > CSCO;Cisco Systems, Inc.;28,159,455
 > INTC;Intel Corporation;22,501,784
 > MSFT;Microsoft Corporation;23,363,118
 > VZ;Verizon Communications Inc. Com;5,744,385
 > KO;Coca-Cola Company (The) Common;3,752,569
 > MMM;3M Company Common Stock;1,660,453
 >
 > ================================================
 > !
 sed '1d;/;/!d' stock # delete headings and everything but data lines
 BAC;Bank of America Corporation Com;238,059,612
 CSCO;Cisco Systems, Inc.;28,159,455
 INTC;Intel Corporation;22,501,784
 MSFT;Microsoft Corporation;23,363,118
 VZ;Verizon Communications Inc. Com;5,744,385
 KO;Coca-Cola Company (The) Common;3,752,569
 MMM;3M Company Common Stock;1,660,453
 sed '1d;/;/{s/;.*//p};d' stock # delete all non-essential data
 BAC
 CSCO
 INTC
 MSFT
 VZ
 KO
 MMM
 sed '1d;/;/{s/;.*//;H};${g;l};d' stock # use the l command to see what's really there!
 \nBAC\nCSCO\nINTC\nMSFT\nVZ\nKO\nMMM$
 sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;l};d' stock # refine, refine
 BAC, CSCO, INTC, MSFT, VZ, KO, MMM$
 sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stock # all done!
 BAC, CSCO, INTC, MSFT, VZ, KO, MMM

Source: https://habr.com/ru/post/907603/

