How to restrict sed to replace only data appearing after the first closing square bracket?

I have a CSV file that uses a highly customized format. Here, each number stands for the data in one of the four columns:

1 2 [3] 4 

I need to restrict sed to finding and modifying only the data in the fourth column. Essentially, it should ignore everything on the line before the first occurrence of a closing square bracket followed by a space ("] ") and change only the data that appears after it. For example, file1.txt may contain the following:

 penguin bird [lives in Antarctica] The penguin lives in cold places.
 wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.

The replacement might be sed 's/penguin/animal/g' file1.txt. After running it, the output should look like this:

 penguin bird [lives in Antarctica] The animal lives in cold places.
 wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.

In this case, every occurrence of penguin before the first ] was ignored, and only the occurrences after it were changed.

  • Additional closing brackets may appear later in the line, but only the first one should be treated as the divider.

How can I make sed ignore the first three columns of this custom CSV format when it finds and replaces text?

I have GNU sed version 4.2.1.

+4
4 answers

I usually do this the way another answer here describes (when I am just typing a quick sed command line), but it has a drawback: once you start matching part of the input in order to preserve it (with \1 and so on), you have to match and replace everything and can no longer use simple substitutions such as s/penguin/animal/. If you would rather keep plain substitutions, you can stash the line in the hold space and put its beginning back afterwards:

 sed -e 'h' \
     -e 's/.*\] //' \
     -e 's/penguin/animal/' \
     -e 'x' \
     -e 's/\] .*/] /' \
     -e 'G' \
     -e 's/\n//'

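h stores the original line in the hold space. We then strip the prefix and apply any substitution (such as the one in your example), or a series of substitutions, to the rest of the line. x then swaps the edited tail with the saved copy. We strip the original tail from the saved copy and use G to join the two again. G inserts a newline that we do not want, so we delete it. Note that .* is greedy, so s/.*\] // strips up to the last "] " on the line; if more closing brackets can follow the first, s/[^]]*\] // strips only up to the first one.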
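As a rough sketch, the hold-space approach can be tried on sample data modelled on the question. Two details are assumptions added here and are not part of the answer: the /\] /!b guard that skips lines containing no "] ", and the prefix pattern [^]]*\] used in place of .*\] so that the split happens at the first "] " rather than the last. The third input line is made up to exercise that case.

```shell
# Sample input: the two records from the question plus one made-up line
# that has a second closing bracket later on.
input='penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.
a b [c] penguin [d] penguin'

# /\] /!b             skip lines without "] " (added guard, an assumption)
# h                   save the whole line in the hold space
# s/[^]]*\] //        drop everything up to and including the first "] "
# s/penguin/animal/g  edit only the remaining tail
# x                   swap: pattern space = original line, hold = edited tail
# s/\] .*/] /         keep only the prefix up to the first "] "
# G, then s/\n//      append the edited tail and remove the joining newline
result=$(printf '%s\n' "$input" | sed \
  -e '/\] /!b' \
  -e 'h' \
  -e 's/[^]]*\] //' \
  -e 's/penguin/animal/g' \
  -e 'x' \
  -e 's/\] .*/] /' \
  -e 'G' \
  -e 's/\n//')

printf '%s\n' "$result"
```

Without the guard, a line that has no "] " would come out doubled, because neither prefix substitution matches and G still appends the edited copy.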

+2

You tell sed to look for the combination "] ", then .* (anything), and as part of your replacement you put the "] " characters back.

The only problem is that sed usually "thinks" the ] char is part of a character class definition, so you need to escape it. Try

 echo "ab [c] d" | sed 's/\] .*$/\] XYZ/'
 ab [c] XYZ

Note that because there is no opening [ to start a character class definition, you can get away with

 echo "ab [c] d" | sed 's/] .*$/] XYZ/'
 ab [c] XYZ

Edit

To change only the 4th word,

 echo "ab [c] d e" | sed 's/\] [^ ][^ ]*/\] XYZ/'
 ab [c] XYZ e

The added [^ ][^ ]* says "any char that is not a space" followed by any number of "any char that is not a space", so when the matcher reaches the next space, the match stops.
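To see what [^ ][^ ]* changes, here is a small comparison between replacing everything after "] " and replacing only the first word after it. The input line and variable names are made up for illustration.

```shell
# Made-up input with two words after the closing bracket.
line='ab [c] delta echo'

# .* runs to the end of the line, so the whole tail is replaced.
whole_tail=$(printf '%s\n' "$line" | sed 's/] .*$/] XYZ/')

# [^ ][^ ]* stops at the next space, so only one word is replaced.
first_word=$(printf '%s\n' "$line" | sed 's/] [^ ][^ ]*/] XYZ/')

printf '%s\n%s\n' "$whole_tail" "$first_word"
```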

Final edit

 echo "penguin bird [lives in Antarctica] The penguin lives in cold places. wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
 | sed 's/\] The penguin \(.*$\)/] The animal \1/'

And since you are using GNU sed, its -r option (extended regular expressions) means you don't need to escape the ( ... ).

 echo "penguin bird [lives in Antarctica] The penguin lives in cold places. wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
 | sed -r 's/\] The penguin (.*$)/] The animal \1/'

Output

 penguin bird [lives in Antarctica] The animal lives in cold places. wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins. 
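One way to check that the basic and extended spellings of the group behave the same is to run both on a single line. This is only a sketch: it assumes a sed that accepts -E for extended regular expressions (GNU sed also spells it -r), the variable names are made up, and the trailing $ is dropped because .* already runs to the end of the line.

```shell
line='penguin bird [lives in Antarctica] The penguin lives in cold places.'

# Basic regular expressions: the group is written \( ... \).
bre=$(printf '%s\n' "$line" | sed 's/\] The penguin \(.*\)/] The animal \1/')

# Extended regular expressions: the group is written ( ... ).
ere=$(printf '%s\n' "$line" | sed -E 's/\] The penguin (.*)/] The animal \1/')

printf '%s\n%s\n' "$bre" "$ere"
```

Without -E or -r, an unescaped ( is a literal parenthesis in sed, so the second expression would not define group \1.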

This depends on the version of sed you are using. There is a pretty big difference between the sed on AIX, the one on Solaris, and the GNU sed usually found on Linux.

If you have other questions about using sed, it is usually helpful to include the output of sed --version or sed -V. If those commands print nothing useful, try what sed. Otherwise, include the OS name reported by uname.

IHTH
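A minimal sketch of collecting that information in one step, assuming only that a sed without --version exits with a non-zero status; sed_info is a made-up variable name.

```shell
# Prefer the first line of sed --version when the option is supported.
if sed --version >/dev/null 2>&1; then
  sed_info=$(sed --version | head -n 1)
else
  # Fall back to naming the operating system.
  sed_info="sed without --version on $(uname -s)"
fi

printf '%s\n' "$sed_info"
```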

+3

Assuming each line contains a single closing bracket, I would use awk for this:

 awk 'BEGIN {FS=OFS="]"} { gsub(/penguin/, "animal", $2) }1' file.txt 

Results:

 penguin bird [lives in Antarctica] The animal lives in cold places.
 wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.
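If more closing brackets can follow the first one, as the question allows, a possible variation is to loop over every field after the first instead of touching only $2. This is a sketch and not part of the answer; the second input line is made up to show the multi-bracket case.

```shell
input='penguin bird [lives in Antarctica] The penguin lives in cold places.
a b [c] penguin [d] penguin'

result=$(printf '%s\n' "$input" | awk '
  BEGIN { FS = OFS = "]" }
  {
    # Field 1 is everything before the first "]", so it is left alone.
    for (i = 2; i <= NF; i++) gsub(/penguin/, "animal", $i)
  }
  1
')

printf '%s\n' "$result"
```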
+2

This may work for you (GNU sed):

 sed -i 's/\]/&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*.\n//' file 

Explanation:

  • s/\]/&\n/ splits the line by inserting a \n marker after the first ]
  • h copies the line to the hold space
  • s/.*\n// deletes the part of the line that you do not want to change
  • s/penguin/animal/g changes the part you do want to change
  • H;g appends the changed part to the saved line and brings the result back
  • s/\n.*.\n// deletes the marker and the unchanged copy of the part you edited

This is applied to every line; if the change should happen only on lines that contain a ], use:

 sed -i '/\]/!b;s//&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*.\n//' file 
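The conditional one-liner can also be written out one command per line, which makes the steps easier to follow. This sketch assumes GNU sed (for \n in the replacement text) and reads from a pipe instead of editing a file with -i; the second input line is made up to show that lines without a ] pass through untouched.

```shell
input='penguin bird [lives in Antarctica] The penguin lives in cold places.
no brackets here, just a penguin'

result=$(printf '%s\n' "$input" | sed '
  # Lines without a ] are printed unchanged.
  /\]/!b
  # The empty pattern reuses /\]/: put a newline marker after the first ].
  s//&\n/
  # Keep a copy of the marked line in the hold space.
  h
  # Drop the protected prefix, then edit what is left.
  s/.*\n//
  s/penguin/animal/g
  # Hold space becomes: prefix, marker, old tail, newline, new tail.
  H
  g
  # Cut out the marker and the old tail.
  s/\n.*.\n//
')

printf '%s\n' "$result"
```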

Alternative (possibly a simpler method):

 sed ':a;s/\(\].*\)penguin/\1animal/;ta' file 
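A short sketch of how the loop behaves, using a made-up line that has penguin both inside the brackets and twice after them. Each pass replaces the last remaining penguin that follows the first ], and ta repeats until nothing is left to replace.

```shell
input='wolf dog [lives in Antarctica with penguins] The penguin and the penguins.'

# :a defines a label, and ta jumps back to it whenever the substitution succeeded.
result=$(printf '%s\n' "$input" | sed ':a;s/\(\].*\)penguin/\1animal/;ta')

printf '%s\n' "$result"
```

One caveat with this form: if the replacement text itself contains the search text (for example replacing penguin with penguins), the substitution keeps succeeding and the loop never ends.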
+1

Source: https://habr.com/ru/post/1434507/

