Replace a string with its lowercased substring using sed / awk / tr / perl?

I have a plain-text file containing multiple instances of the pattern $$DATABASE_*$$, where the asterisk can be any string of characters. I would like to replace each entire instance with whatever matches the asterisk part, but in lowercase.

Here is the test file:

 $$DATABASE_GIBSON$$ test me $$DATABASE_GIBSON$$ test me $$DATABASE_GIBSON$$ test $$DATABASE_GIBSON$$ test $$DATABASE_GIBSON$$ $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$ 

Here is the desired result:

 gibson test me gibson test me gibson test gibson test gibson gibsongibson 

How to do this with sed / awk / tr / perl?

9 answers

Here's the perl version that I ended up using.

 perl -p -i.bak -e 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' inputFile 

This works with complex examples.

 perl -ple 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' filename.txt 

And for simpler examples:

 echo '$$DATABASE_GIBSON$$' | sed 's@\$\$DATABASE_\(.*\)\$\$@\L\1@'

In GNU sed, \L converts the rest of the replacement to lowercase (\E stops the conversion if necessary).
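A small sketch of the \E part, assuming GNU sed (\L and \E are GNU extensions, so BSD/macOS sed will not honor them). lower_tag is a hypothetical helper name: everything after \L is lowercased until \E, so the word captured after the tag keeps its original case.

```bash
# Assumes GNU sed: \L and \E are GNU extensions (BSD/macOS sed does not support them).
# lower_tag is a hypothetical helper name.
lower_tag() {
    # \L lowercases the replacement from here on; \E switches that off again,
    # so the second capture (END) keeps its original case.
    printf '%s\n' "$1" | sed 's/\$\$DATABASE_\([^$]*\)\$\$ \(END\)/\L\1\E \2/'
}

lower_tag 'X $$DATABASE_GIBSON$$ END'    # X gibson END
```

Without the \E, the same command would print "X gibson end".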


Unfortunately, there is no easy, reliable way with awk, but here is one approach:

 $ cat tst.awk
 {
     gsub(/[$][$]/,"\n")
     head = ""
     tail = $0
     while ( match(tail, "\nDATABASE_[^\n]+\n") ) {
         head = head substr(tail,1,RSTART-1)
         trgt = substr(tail,RSTART,RLENGTH)
         tail = substr(tail,RSTART+RLENGTH)
         gsub(/\n(DATABASE_)?/,"",trgt)
         head = head tolower(trgt)
     }
     $0 = head tail
     gsub("\n","$$")
     print
 }
 $ cat file
 The quick brown $$DATABASE_FOX$$ jumped over the lazy $$DATABASE_DOG$$s back.
 The grey $$DATABASE_SQUIRREL$$ ate $$DATABASE_NUT$$s under a $$DATABASE_TREE$$.
 Put a dollar $$DATABASE_DOL$LAR$$ in the $$ string.
 $ awk -f tst.awk file
 The quick brown fox jumped over the lazy dogs back.
 The grey squirrel ate nuts under a tree.
 Put a dollar dol$lar in the $$ string.

Note the trick of converting $$ to a newline character so that we can negate that character in the regex match. Without it (i.e. if we used ".+" instead of "[^\n]+"), the greedy nature of regex matching means that if the pattern appeared twice on the same input line, the matched string would run from the start of the first tag to the end of the second.
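The greedy-match problem is easy to see with a portable sed -E stand-in for the awk regex (the <...> markers only show what the capture grabbed; greedy and negated are hypothetical helper names). Note that [^$] here is a simplification: unlike the newline trick, it would break on a name containing a single $, such as DOL$LAR above.

```bash
# Illustration only: sed -E stands in for the awk regex; <...> marks what the capture grabbed.
line='$$DATABASE_A$$ and $$DATABASE_B$$'

# Greedy: .* runs from the first tag to the LAST $$ on the line.
greedy() { printf '%s\n' "$1" | sed -E 's/[$][$]DATABASE_(.*)[$][$]/<\1>/'; }

# Negated class: [^$]* cannot cross a $, so each tag is matched on its own.
negated() { printf '%s\n' "$1" | sed -E 's/[$][$]DATABASE_([^$]*)[$][$]/<\1>/g'; }

greedy "$line"     # <A$$ and $$DATABASE_B>
negated "$line"    # <A> and <B>
```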


Using only awk:

 > echo '$$DATABASE_AWESOME$$' | awk '{sub(/.*_/,"");sub(/\$\$$/,"");print tolower($0);}'
 awesome

Note that I'm on FreeBSD, so this is not GNU awk.

But this can also be done in bash alone:

 [ghoti@pc ~]$ foo='$$DATABASE_AWESOME$$'
 [ghoti@pc ~]$ foo=${foo##*_}
 [ghoti@pc ~]$ foo=${foo%\$\$}
 [ghoti@pc ~]$ foo=${foo,,}
 [ghoti@pc ~]$ echo $foo
 awesome

Of the above substitutions, all but the last (${foo,,}) will work in a standard Bourne shell. If you don't have bash, you can use tr for this step instead:

 $ echo $foo
 AWESOME
 $ foo=$(echo "$foo" | tr '[:upper:]' '[:lower:]')
 $ echo $foo
 awesome
 $
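Put together, the steps above fit in a small function that uses only POSIX parameter expansion, with tr doing the lowercasing (db_name_lower is a hypothetical name). Like ${foo##*_}, it strips everything up to the last underscore, so a name such as MY_DB would come out as db.

```bash
# Hypothetical helper; POSIX parameter expansion only, so it also runs under plain sh.
db_name_lower() {
    name=${1##*_}         # drop everything up to the last underscore
    name=${name%'$$'}     # drop the trailing $$
    printf '%s\n' "$name" | tr '[:upper:]' '[:lower:]'
}

db_name_lower '$$DATABASE_AWESOME$$'    # awesome
```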

UPDATE

From the comments, it appears the OP actually wants to extract the substring from whatever text it is embedded in; that is, our solutions should allow for leading or trailing text before or after the string he gave in his question.

 > echo 'foo $$DATABASE_KITTENS$$ bar' | sed -nE '/\$\$[^$]+\$\$/{;s/.*\$\$DATABASE_//;s/\$\$.*//;p;}' | tr '[:upper:]' '[:lower:]'
 kittens

And if you have pcregrep in your path (from the devel/pcre FreeBSD port), you can use it instead with lookaheads:

 > echo 'foo $$DATABASE_KITTENS$$ bar' | pcregrep -o '(?!\$\$DATABASE_)[A-Z]+(?=\$\$)' | tr '[:upper:]' '[:lower:]'
 kittens

(For Linux users reading this: this is equivalent to using grep -P.)

And in pure bash:

 $ shopt -s extglob
 $ foo='foo $$DATABASE_KITTENS$$ bar'
 $ foo=${foo##*(?)\$\$DATABASE_}
 $ foo=${foo%%\$\$*(?)}
 $ foo=${foo,,}
 $ echo $foo
 kittens

Note that NONE of these three updated solutions will handle the case where multiple database-name tags appear on the same input line. That isn't stated as a requirement in the question either, but I'm just saying...
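If several tags on one line do matter, a plain POSIX-sh loop can handle them. This is a hypothetical sketch (lower_db_tags is not from any answer here); it assumes each tag ends at the next $$, so it copes with a single $ inside a name (DOL$LAR) but not with a name that itself contains $$.

```bash
# Hypothetical helper: lowercase every $$DATABASE_*$$ tag in a string, POSIX sh only.
lower_db_tags() {
    rest=$1 out=
    while :; do
        # Stop once no "$$DATABASE_ ... $$" remains in the unprocessed text.
        case $rest in
            *'$$DATABASE_'*'$$'*) ;;
            *) break ;;
        esac
        out=$out${rest%%'$$DATABASE_'*}   # text before the first tag
        rest=${rest#*'$$DATABASE_'}      # drop up to and including "$$DATABASE_"
        name=${rest%%'$$'*}               # tag name, up to the next $$
        rest=${rest#*'$$'}                # text after the closing $$
        out=$out$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    done
    printf '%s\n' "$out$rest"
}

lower_db_tags 'foo $$DATABASE_KITTENS$$ bar $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$'
# foo kittens bar gibsongibson
```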


You can do this pretty neatly with the super-cool cut command :)

 echo '$$DATABASE_AWESOME$$' | cut -d'$' -f3 | cut -d_ -f2 | tr 'A-Z' 'a-z'

This may work for you (GNU sed):

 sed 's/$\$/\n/g;s/\nDATABASE_\([^\n]*\)\n/\L\1/g;s/\n/$$/g' file 

Here is the shortest (GNU) awk solution that I could come up with that does everything the OP asks for:

 awk -vRS='[$][$]DATABASE_([^$]+[$])+[$]' '{ORS=tolower(substr(RT,12,length(RT)-13))}1' 

Even if the string matched by the asterisk (*) contains one or more dollar signs ($) and/or line breaks, this solution should still work.

 awk '{gsub(/\$\$DATABASE_GIBSON\$\$/,"gibson")}1' file
 gibson test me gibson test me gibson test gibson test gibson gibsongibson

 echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}'

awk takes whatever it is given as input (in this case the whole line), applies the tolower function to it, and prints the result.

For your bash script, you can do something like this and then use the DBLOWER variable:

 DBLOWER=$(echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}')

Source: https://habr.com/ru/post/1442126/

