How to escape an arbitrary search string for sed?

I am trying to escape an arbitrary search string, which can contain any character, and pass it to sed, but I cannot figure out how to make it safe for sed. In sed we write s/search/replace/, and I want to search for exactly the characters in the search string without sed interpreting them (for example, the '/' in 'my / path' should not close the sed expression).

I read this related question about how to escape the replacement string. I assumed the same approach would work for the search string, but apparently it does not, because sed complains.

Here is an example program that creates a file called "my_searches". It then reads each line of this file and performs a search and replace with sed.

    #!/bin/bash

    # The contents of this heredoc will be the lines of our file.
    read -d '' SAMPLES << 'EOF'
    /usr/include
    P@$$W0RD$?
    "I didn't", said Jane O'Brien.
    `ls -l`
    ~!@#$%^&*()_+-=:'}{[]/.,`"\|
    EOF
    echo "$SAMPLES" > my_searches

    # Now for each line in the file, do some search and replace
    while read line
    do
        echo "------===[ BEGIN $line ]===------"

        # Escape every character in $line (eg, ab/c becomes \a\b\/\c). I got
        # this solution from the accepted answer in the linked SO question.
        ES=$(echo "$line" | awk '{gsub(".", "\\\\&");print}')

        # Search for the line we read from the file and replace it with
        # the text "replaced"
        sed 's/'"$ES"'/replaced/' < my_searches     # Does not work

        # Search for the text "Jane" and replace it with the line we read.
        sed 's/Jane/'"$ES"'/' < my_searches         # Works

        # Search for the line we read and replace it with itself.
        sed 's/'"$ES"'/'"$ES"'/' < my_searches      # Does not work

        echo "------===[ END ]===------"
        echo
    done < my_searches

When you run the program, you get sed: xregcomp: Invalid content of \{\} for the last line of the file when it is used as the search term, but not when it is used as the replacement. I marked the lines that give this error with # Does not work above.

    ------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
    sed: xregcomp: Invalid content of \{\}
    ------===[ END ]===------
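The error itself most likely comes from over-escaping. In a POSIX basic regular expression (BRE) an unescaped { is an ordinary character, but \{ opens an interval expression such as \{2,3\}, so escaping every character turns a literal { into a malformed interval. A minimal demonstration, using the made-up sample string a{b:

```shell
#!/bin/sh
# Unescaped "{" is an ordinary character in a BRE, so this substitution matches.
printf '%s\n' 'a{b' | sed 's/a{b/X/'    # prints: X

# "\{" starts an interval expression and "\{b" is not a valid one,
# so sed rejects the script and exits with a non-zero status.
if printf '%s\n' 'a{b' | sed 's/a\{b/X/' 2>/dev/null; then
    echo "sed accepted the over-escaped pattern"
else
    echo "sed rejected the over-escaped pattern"
fi
```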

If I do not escape the characters in $line at all (i.e. sed 's/'"$line"'/replaced/' < my_searches), I get this error instead, because sed tries to interpret various characters:

    ------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
    sed: bad format in substitution expression
    sed: No previous regexp.
    ------===[ END ]===------

So, how do I escape the search string for sed so that the user can supply any arbitrary text to search for? Or, more precisely, what can I replace the ES= line in my code with so that the sed command works for arbitrary text from a file?

I am using sed because I am limited to the subset of utilities included in busybox. Although I could use another method (for example, a C program), it would be nice to know whether there is a solution to this problem.

+4
8 answers

This is a relatively well-known problem: given a string, create a pattern that matches only that string. Some languages make this easier than others, and sed is one of the annoying ones. My advice: avoid sed and write a custom program in another language.

  • You can write a custom C program using the standard library function strstr. If this is not fast enough, you can use any of the Boyer-Moore implementations you can find on Google; they search very quickly (sublinear time).

  • You can write this fairly easily in Lua:

         -- Prefix every non-alphanumeric character with '%' so it matches literally.
         local function quote(s) return (s:gsub('%W', '%%%1')) end
         local function replace(first, second, s)
           -- '%' is special in the replacement string too, so double it.
           return (s:gsub(quote(first), (second:gsub('%%', '%%%%'))))
         end
         for l in io.lines() do io.write(replace(arg[1], arg[2], l), '\n') end

    If this is not fast enough, speed it up by applying quote to arg[1] only once and inlining the function replace.
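Since busybox also ships awk, the strstr idea from the first bullet can be sketched without any regular expression at all: awk's index() is a plain substring search. This is a minimal sketch, not taken from the answer above. The function name literal_replace is hypothetical, it replaces only the first occurrence on each line (like s/// without g), and it assumes the awk in use supports ENVIRON (POSIX awk does; that busybox awk does is an assumption worth checking).

```shell
#!/bin/sh
# literal_replace SEARCH REPLACEMENT < input   (hypothetical helper)
# Replaces the first literal occurrence of SEARCH on each line; no regex is
# involved, so no character in SEARCH or REPLACEMENT needs escaping.
# The strings travel through the environment because awk -v would
# process backslash escapes in them.
literal_replace() {
    SEARCH=$1 REPL=$2 awk '
        BEGIN { s = ENVIRON["SEARCH"]; r = ENVIRON["REPL"] }
        {
            line = $0
            i = (s == "") ? 0 : index(line, s)   # plain substring search
            if (i > 0)
                line = substr(line, 1, i - 1) r substr(line, i + length(s))
            print line
        }'
}

# Usage with the hardest sample line from the question:
printf '%s\n' '~!@#$%^&*()_+-=:'"'"'}{[]/.,`"\|' |
    literal_replace '*()_+-=:' 'replaced'
```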

+1

This: echo "$line" | awk '{gsub(".", "\\\\&");print}' escapes every character in $line, which is wrong! Run echo $ES afterwards and you will see that $ES is \/\u\s\r\/\i\n\c\l\u\d\e. Then, when you get to the next sed (below)

 sed 's/'"$ES"'/replaced/' my_searches 

it will not work, because no line matches the pattern \/\u\s\r\/\i\n\c\l\u\d\e. The right way is:

    $ sed 's|\([@$#^&*!~+-={}/]\)|\\\1|g' file
    \/usr\/include
    P\@\$\$W0RD\$?
    "I didn't", said Jane O'Brien.
    \`ls -l\`
    \~\!\@\#\$%\^\&\*()_\+-\=:'\}\{[]\/.,\`"\|

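Put all the characters you want to escape inside [], and choose a sed delimiter that is not in your character class; here I chose "|". Then use the "g" (global) flag. (Note that inside [] a hyphen between two characters forms a range, so +-= also matches everything from + to = in the C locale, including , - . / the digits : ; and <. Put the - first or last in the class to match it literally.)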

Let us know what you are actually trying to do, that is, the actual problem you are trying to solve.
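A narrower variant of this character-class idea is sketched below, assuming a POSIX basic regular expression and / as the s/// delimiter. In the search part only [ ] \ . ^ $ * and the delimiter are special; in the replacement only \ & and the delimiter are. Escaping just those characters, and nothing else, avoids turning ordinary characters into operators such as \{. The helper names escape_search and escape_repl are hypothetical, and the sketch assumes the input contains no newlines (true for lines read with read).

```shell
#!/bin/sh
# Hypothetical helpers, assuming a POSIX BRE and "/" as the s/// delimiter.
# Search side: escape only [ ] \ / . ^ $ *. Every other character is already
# literal in a BRE, and escaping it (e.g. "\{") could change its meaning.
escape_search() {
    printf '%s\n' "$1" | sed 's/[][\/.^$*]/\\&/g'
}

# Replacement side: only \ / and & are special.
escape_repl() {
    printf '%s\n' "$1" | sed 's/[\/&]/\\&/g'
}

line='~!@#$%^&*()_+-=:'"'"'}{[]/.,`"\|'
ES=$(escape_search "$line")
ER=$(escape_repl "$line")

# Search for the line and replace it with itself: the line comes back unchanged.
printf '%s\n' "$line" | sed "s/$ES/$ER/"
# Search for the line and replace it with a fixed word.
printf '%s\n' "$line" | sed "s/$ES/replaced/"
```

In the question's loop this would mean ES=$(escape_search "$line") for the search part and a separate escape_repl for the replacement part, since the two sides of s/// have different special characters.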

0

As ghostdog mentioned, awk '{gsub(".", "\\\\&");print}' is wrong, because it also escapes characters that are not special. What you really want is probably something like:

 awk 'gsub(/[^[:alpha:]]/, "\\\\&")' 

This escapes all non-alphabetic characters. For some reason I have yet to determine, I still cannot replace "I didn't", said Jane O'Brien. even though my code above correctly escapes it to

\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\.

This is pretty weird, because this works fine:

    $ echo "\"I didn't\", said Jane O'Brien." | sed s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/
    replaced
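The difference is probably quoting. In the command above the sed script is unquoted, so the shell removes each backslash before sed runs and sed receives the plain text (in which . simply matches any character, including a literal dot). In the script, ES is expanded inside quotes, so the backslashes reach sed unchanged and sed has to interpret sequences such as \' and \", whose meaning POSIX leaves undefined; GNU sed, for example, treats \' as an anchor for the end of the pattern space. Printing the arguments shows what sed actually receives (the short ES value is a made-up example):

```shell
#!/bin/sh
# Unquoted: the shell strips every backslash before sed is started.
printf '%s\n' s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/

# Quoted expansion: the backslashes inside ES reach sed as-is.
ES='O\'"'"'Brien'
printf '%s\n' 's/'"$ES"'/replaced/'
```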
0

Something like this works with FreeBSD sed:

    # using FreeBSD & Mac OS X sed
    ES="$(printf "%q" "${line}")"
    ES="${ES//+/\\+}"
    sed -E s$'\777'"${ES}"$'\777'replaced$'\777' < my_searches
    sed -E s$'\777'Jane$'\777'"${line}"$'\777' < my_searches
    sed -E s$'\777'"${ES}"$'\777'"${line}"$'\777' < my_searches
0

The -E option of FreeBSD sed enables extended regular expressions.

GNU sed offers the same via its -r (or --regexp-extended) option.

For the differences between basic and extended regular expressions, see, for example:

http://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
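With extended regular expressions the set of special characters grows: besides [ ] \ . ^ $ *, the characters + ? ( ) { } | are operators too, and in an ERE a backslash before any of them makes it literal. A minimal sketch for sed -E, assuming / as the delimiter; the helper name escape_search_ere is hypothetical, and recent GNU sed accepts -E as well as -r.

```shell
#!/bin/sh
# Hypothetical helper for sed -E (extended regular expressions), assuming "/"
# as the delimiter. Compared with the BRE case, + ? ( ) { } | are added to
# the set of characters that get a backslash.
escape_search_ere() {
    printf '%s\n' "$1" | sed 's/[][\/.^$*+?(){}|]/\\&/g'
}

ES=$(escape_search_ere 'a+b(c)|d')
printf '%s\n' "$ES"                                   # the escaped pattern
printf '%s\n' 'x a+b(c)|d y' | sed -E "s/$ES/Z/"      # literal match replaced by Z
```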

Perhaps you could use the FreeBSD-compatible minised instead of GNU sed?

    # example using FreeBSD-compatible minised,
    # http://www.exactcode.de/site/open_source/minised/

    # escape some punctuation characters with printf
    # help printf
    printf "%s\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
    printf "%q\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'

    # example line
    line='!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~ ... and Jane ...'

    # escapes in regular expression
    ES="$(printf "%q" "${line}")"
    # escape some punctuation characters
    ES="${ES//./\\.}"              # . -> \.
    ES="${ES//\\\\(/(}"            # \( -> (
    ES="${ES//\\\\)/)}"            # \) -> )

    # escapes in replacement string
    lineEscaped="${line//&/\&}"    # & -> \&

    minised s$'\777'"${ES}"$'\777'REPLACED$'\777' <<< "${line}"
    minised s$'\777'Jane$'\777'"${lineEscaped}"$'\777' <<< "${line}"
    minised s$'\777'"${ES}"$'\777'"${lineEscaped}"$'\777' <<< "${line}"
0

To avoid possible backslash confusion, we could (or rather should) use a backslash variable, as follows:

    backSlash='\\'
    ES="${ES//${backSlash}(/(}"    # \( -> (
    ES="${ES//${backSlash})/)}"    # \) -> )

(By the way, using variables this way seems like a good approach to working around problems with parameter expansion ...)

0

... or to eliminate backslash confusion ...

    backSlash='\\'
    lineEscaped="${line//${backSlash}/${backSlash}}"   # double backslashes
    lineEscaped="${lineEscaped//&/\&}"                 # & -> \&
0

If you have bash and you are only doing a pattern replacement, just do it natively in bash. Bash's ${parameter/pattern/string} expansion will work very well for you, since you can simply use variables in place of "pattern" and "string", and, as long as the variable is quoted inside the expansion, its contents are safe from further expansion. It is that expansion that makes piping to sed such a hassle. :)

This will be faster than forking a child process and piping to it anyway. You already know how to do the whole while read line loop, so by creatively applying Bash's features as described in its parameter expansion documentation, you can reproduce almost anything you can do with sed. Open the Bash man page to get started ...
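A minimal sketch of the native approach, assuming bash (whether busybox ash treats a quoted pattern the same way is an assumption to check). Quoting "$search" inside the expansion makes it literal; left unquoted it is treated as a glob pattern, where ? * and [ are special. The helper name literal_sub is hypothetical.

```shell
#!/bin/bash
line='abc a?c'
pat='a?c'

printf '%s\n' "${line/$pat/X}"      # unquoted: glob, so "a?c" matches "abc" first
printf '%s\n' "${line/"$pat"/X}"    # quoted: literal, matches only "a?c"

# Hypothetical helper: first literal occurrence, like sed 's/SEARCH/REPL/'.
# The replacement is quoted too, so "&" stays literal even where bash's
# patsub_replacement option would otherwise give it a meaning.
literal_sub() {
    local text=$1 search=$2 repl=$3
    printf '%s\n' "${text/"$search"/"$repl"}"
}

literal_sub '~!@#$%^&*()_+-=:'"'"'}{[]/.,`"\|' '*()_+-=:' 'replaced'
```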

0

Source: https://habr.com/ru/post/1302312/

