Replace string with lowercase substring with sed / awk / tr / perl?

Question

I have a plain-text file containing multiple instances of the template $$DATABASE_*$$, where the asterisk can be any string of characters. I would like to replace each instance with whatever is in the asterisk part, converted to lower case.

Here is the test file:

 $$DATABASE_GIBSON$$ test me $$DATABASE_GIBSON$$ test me $$DATABASE_GIBSON$$ test $$DATABASE_GIBSON$$ test $$DATABASE_GIBSON$$ $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$

Here is the desired result:

 gibson test me gibson test me gibson test gibson test gibson gibsongibson

How can I do this with sed / awk / tr / perl?

+4

awk perl sed tr

Dynamitereeed Oct 25 '12 at 17:10

9 answers

This works with complex examples.

 perl -ple 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' filename.txt

And for simpler examples:

 echo '$$DATABASE_GIBSON$$' | sed 's@\$\$DATABASE_\(.*\)\$\$@\L\1@'

In GNU sed, \L converts the rest of the replacement to lowercase (use \E to stop the conversion if necessary).
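A minimal sketch of how \L and \E behave, assuming GNU sed (both escapes are GNU extensions, not POSIX). The function name lower_tags and the literal <DB> suffix are hypothetical and only there to make the effect of \E visible. Using [^$]* instead of .* keeps each match inside a single tag, so several tags on one line are handled, at the cost of not allowing a $ inside the name.

```shell
# Assumes GNU sed: \L and \E are GNU extensions.
# lower_tags is a hypothetical helper name used only for this sketch.
lower_tags() {
    # \L starts lowercasing the replacement text and \E ends it, so the
    # literal "<DB>" appended after \E keeps its original case.
    sed 's/\$\$DATABASE_\([^$]*\)\$\$/\L\1\E<DB>/g'
}

printf '%s\n' '$$DATABASE_GIBSON$$ test $$DATABASE_FOO$$' | lower_tags
```

With GNU sed this prints gibson<DB> test foo<DB>; dropping \E<DB> from the replacement gives the plain lowercase substitution asked for in the question.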

+1

Gilles quenot Oct 25 '12 at 17:16

Unfortunately, there is no easy, reliable way with awk, but here is one approach:

 $ cat tst.awk
 {
    gsub(/[$][$]/,"\n")
    head = ""
    tail = $0

    while ( match(tail, "\nDATABASE_[^\n]+\n") ) {
       head = head substr(tail,1,RSTART-1)
       trgt = substr(tail,RSTART,RLENGTH)
       tail = substr(tail,RSTART+RLENGTH)

       gsub(/\n(DATABASE_)?/,"",trgt)

       head = head tolower(trgt)
    }

    $0 = head tail

    gsub("\n","$$")

    print
 }

 $ cat file
 The quick brown $$DATABASE_FOX$$ jumped over the lazy $$DATABASE_DOG$$s back.
 The grey $$DATABASE_SQUIRREL$$ ate $$DATABASE_NUT$$s under a $$DATABASE_TREE$$.
 Put a dollar $$DATABASE_DOL$LAR$$ in the $$ string.

 $ awk -f tst.awk file
 The quick brown fox jumped over the lazy dogs back.
 The grey squirrel ate nuts under a tree.
 Put a dollar dol$lar in the $$ string.

Note the trick of converting $$ to a newline character so that we can negate that character in the matching RE. Without it (i.e. if we used ".+" instead of "[^\n]+"), the greedy RE matching would mean that, when the same pattern appears twice on one input line, the matched string runs from the start of the first pattern to the end of the second.

+1

Ed morton Oct 25 '12 at 19:47

Using only awk:

 > echo '$$DATABASE_AWESOME$$' | awk '{sub(/.*_/,"");sub(/\$\$$/,"");print tolower($0);}'
 awesome

Note that I'm on FreeBSD, so this is not GNU awk.

But this can also be done using only bash:

 [ghoti@pc ~]$ foo='$$DATABASE_AWESOME$$'
 [ghoti@pc ~]$ foo=${foo##*_}
 [ghoti@pc ~]$ foo=${foo%\$\$}
 [ghoti@pc ~]$ foo=${foo,,}
 [ghoti@pc ~]$ echo $foo
 awesome

Of the above substitutions, all but the last ( ${foo,,} ) will work in any POSIX shell; ${foo,,} requires bash 4 or later. If you don't have bash, you can use tr for that step instead:

 $ echo $foo
 AWESOME
 $ foo=$(echo "$foo" | tr '[:upper:]' '[:lower:]')
 $ echo $foo
 awesome

UPDATE

From the comments, it seems that the OP really wants to extract the substring from whatever text surrounds it; that is, our solutions should allow for leading or trailing text around the string given in the question.

 > echo 'foo $$DATABASE_KITTENS$$ bar' | sed -nE '/\$\$[^$]+\$\$/{;s/.*\$\$DATABASE_//;s/\$\$.*//;p;}' | tr '[:upper:]' '[:lower:]'
 kittens

And if you have pcregrep in your path (from the devel/pcre FreeBSD port), you can use it instead with lookaheads:

 > echo 'foo $$DATABASE_KITTENS$$ bar' | pcregrep -o '(?!\$\$DATABASE_)[A-Z]+(?=\$\$)' | tr '[:upper:]' '[:lower:]'
 kittens

(For Linux users reading this: this is equivalent to using grep -P .)

And in pure bash:

 $ shopt -s extglob
 $ foo='foo $$DATABASE_KITTENS$$ bar'
 $ foo=${foo##*(?)\$\$DATABASE_}
 $ foo=${foo%%\$\$*(?)}
 $ foo=${foo,,}
 $ echo $foo
 kittens

Note that NONE of these three updated solutions handles multiple database tags on the same input line. The question does not state this as a requirement, but it is worth pointing out.
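For completeness, here is a rough sketch of how several tags on one line could be handled using only shell parameter expansion plus tr for the case conversion. The function name lower_all_tags and its variable names are hypothetical; the variables are global because POSIX sh has no local, and a name containing $$ is not supported.

```shell
# lower_all_tags is a hypothetical helper name; plain POSIX sh plus tr.
lower_all_tags() {
    rest=$1
    out=
    while :; do
        # Stop once no complete $$DATABASE_...$$ tag remains.
        case $rest in
            *\$\$DATABASE_*\$\$*) ;;
            *) break ;;
        esac
        before=${rest%%\$\$DATABASE_*}   # text before the first tag
        after=${rest#*\$\$DATABASE_}     # text after the first tag opener
        name=${after%%\$\$*}             # tag name, up to the closing $$
        rest=${after#*\$\$}              # whatever follows the closing $$
        out=$out$before$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    done
    printf '%s\n' "$out$rest"
}

lower_all_tags 'foo $$DATABASE_KITTENS$$ bar $$DATABASE_CATS$$$$DATABASE_DOGS$$'
```

This prints foo kittens bar catsdogs. Spawning tr once per tag is slow on large inputs, so the sed or perl one-liners remain the better choice for whole files.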

0

ghoti Oct 25 '12 at 17:48

You can do this pretty well with the supercool cut command :)

 echo '$$DATABASE_AWESOME$$' | cut -d'$' -f3 | cut -d_ -f2 | tr 'A-Z' 'a-z'

0

miono Oct 25 '12 at 19:59

This may work for you (GNU sed):

 sed 's/\$\$/\n/g;s/\nDATABASE_\([^\n]*\)\n/\L\1/g;s/\n/$$/g' file

0

potong Oct 26 '12 at 8:29

Here is the shortest (GNU) awk solution that I could come up with that does everything the OP asks for:

 awk -vRS='[$][$]DATABASE_([^$]+[$])+[$]' '{ORS=tolower(substr(RT,12,length(RT)-13))}1'

Even if the string represented by the asterisk ( * ) contains one or more dollar signs ( $ ) and/or line breaks, this solution should still work.

0

mschilli Aug 28 '13 at 10:10

 $ awk '{gsub(/\$\$DATABASE_GIBSON\$\$/,"gibson")}1' file
 gibson test me gibson test me gibson test gibson test gibson gibsongibson

0

Claes wikner May 01 '16 at 12:08

 echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}'

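awk takes whatever input it is given, in this case the text coming through the pipe, applies the tolower function to it, and prints the result. The string is single-quoted because the shell would otherwise expand $$ to its process ID.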

For your bash script, you can do something like this and then use the DBLOWER variable:

 DBLOWER=$(echo '$$DATABASE_WOOLY$$' | awk '{print tolower($0)}');

-1

Adam Oct 25 '12 at 17:22

Dynamitereeed · Accepted Answer · 2012-10-25T19:25:49+0000

Here's the perl version that I ended up using. The -i.bak option edits the file in place and keeps the original as inputFile.bak.

 perl -p -i.bak -e 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' inputFile
