Why does sed not replace overlapping patterns?

Question

I have a database upload file whose fields are separated by a <TAB> character. I run this file through sed to replace every occurrence of <TAB><TAB> with <TAB>\N<TAB>, so that when the file is loaded into MySQL the \N is interpreted as NULL.

The sed command 's/\t\t/\t\\N\t/g;' almost works, except that it only replaces the first instance, e.g. "...<TAB><TAB><TAB>..." becomes "...<TAB>\N<TAB><TAB>...".

If I use 's/\t\t/\t\\N\t/g; s/\t\t/\t\\N\t/g;' it replaces more instances.

I suspect that, despite the /g modifier, this happens because the end of one match is the beginning of the next.

Can someone explain what is happening and suggest a sed command that works, or do I need to write a loop?

I know I could probably switch to awk, perl, or python, but I want to know what is happening in sed.

+6

unix shell sed

hairyone Sep 14 '11 at 18:23

5 answers

As a workaround, replace every tab with tab + \N; then remove every occurrence of \N that is not immediately followed by a tab.

 sed -e 's/\t/\t\\N/g' -e 's/\\N\([^\t]\)/\1/g'

... if your sed wants a backslash before grouping parentheses (there are sed dialects which don't want the backslashes; try without them if this doesn't work for you).
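To see the two passes at work, here is a small sketch. It assumes GNU sed (which understands \t in the pattern, the replacement, and inside brackets); the sample row and the variable names are made up for illustration.

```shell
# Hypothetical sample row: three consecutive tabs, i.e. two empty fields between a and b.
row=$(printf 'a\t\t\tb')

# Pass 1 appends \N after every tab; pass 2 strips each \N that is
# followed by something other than a tab (a field that actually has data).
fixed=$(printf '%s\n' "$row" | sed -e 's/\t/\t\\N/g' -e 's/\\N\([^\t]\)/\1/g')

# Show tabs as <TAB> so the result is readable.
visible=$(printf '%s\n' "$fixed" | sed 's/\t/<TAB>/g')
printf '%s\n' "$visible"   # a<TAB>\N<TAB>\N<TAB>b
```

Note that an empty field at the very start of a line has no tab in front of it, so this approach leaves it alone.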

+2

tripleee Sep 14 '11 at 18:49

That's right: even with /g, sed will not re-match text it has already replaced. So it reads <TAB><TAB>, outputs <TAB>\N<TAB>, and then carries on reading from the rest of the input. See http://www.grymoire.com/Unix/Sed.html#uh-7
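The behaviour is easy to reproduce. A minimal sketch, assuming GNU sed (for \t) and a made-up sample row with three consecutive tabs:

```shell
# Hypothetical sample row with three consecutive tabs.
row=$(printf 'a\t\t\tb')

# One global substitution: the first two tabs are consumed by the first
# match, so the third tab has no partner left and nothing else matches.
one_pass=$(printf '%s\n' "$row" | sed 's/\t\t/\t\\N\t/g')

# Running the same substitution a second time picks up the leftover pair.
two_pass=$(printf '%s\n' "$row" | sed 's/\t\t/\t\\N\t/g; s/\t\t/\t\\N\t/g')

# Show tabs as <TAB> for readability.
printf '%s\n' "$one_pass" "$two_pass" | sed 's/\t/<TAB>/g'
# a<TAB>\N<TAB><TAB>b
# a<TAB>\N<TAB>\N<TAB>b
```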

In a regex language that supports lookaheads, you can get around this with lookahead.

+1

Thom blake Sep 14 '11 at 18:39

Well, sed simply works as designed. The input line is scanned once, not multiple times. Maybe it helps to look at the consequences if sed rescanned the input line by default to handle overlapping patterns: in that case even simple substitutions would behave quite differently (some might say counter-intuitively), for example:

s/^/ /                   inserting a space at the beginning of a line would never terminate
s/$/foo/                 appending foo to each line, likewise
s/[A-Z][A-Z]*/CENSORED/  replacing uppercase words with CENSORED, likewise

There are probably many other situations. Of course, all this could be fixed, say, with a substitution modifier, but at the time sed was designed, the current behavior was chosen.
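A quick check of the single-scan behaviour, as a sketch with made-up input: each substitution is applied once and its own output is not scanned again, which is why these commands terminate at all.

```shell
# CENSORED is itself an uppercase word; with rescanning it would match again forever.
censored=$(printf 'HELLO world\n' | sed 's/[A-Z][A-Z]*/CENSORED/')

# The start of line still exists after the insertion, but it is not matched a second time.
indented=$(printf 'abc\n' | sed 's/^/ /')

printf '[%s]\n' "$censored" "$indented"
# [CENSORED world]
# [ abc]
```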

+1

Jens Sep 14 '11 at 18:41

Not unlike the perl solution, this works for me using pure sed:

 sed ':repeat; /\t\t/{ s|\t\t|\t\n\t|g; b repeat }'

Description

:repeat is a label, used as the target of branch commands, similar to a label in a batch file.
/\t\t/ matches two consecutive tabs. If the pattern matches, the command following the second / is executed.
{} - a command group. Here the command following the match is a group, so all commands in the group are executed whenever the pattern matches.
s|\t\t|\t\n\t|g; - the standard substitution, replacing two tabs with tab-newline-tab. I still use the global flag because if you have, say, 15 tabs in a row, you only need to loop twice rather than 14 times.
b repeat means always go to (branch to) the label repeat.

And so it goes: keep repeating (goto repeat) for as long as the two-tab pattern still matches.

While it could be argued that you can just run two identical global substitutions and call it good, the same technique also works in more complicated scenarios.

As @thom-blake notes, sed simply doesn't support advanced features such as lookahead, so you need a loop like this.
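The command above inserts a newline between the tabs, whereas the question asks for a literal \N. Here is a sketch of the same loop with the replacement swapped to \\N (that swap is my assumption about what is wanted, not part of the command above). It assumes GNU sed for \t and splits the script across -e options so that the label and the branch each end at the end of their own fragment. The sample row is hypothetical.

```shell
# Hypothetical sample row with three consecutive tabs.
row=$(printf 'a\t\t\tb')

# Same structure as above: while two adjacent tabs remain,
# substitute globally and branch back to the label.
looped=$(printf '%s\n' "$row" | sed -e ':repeat' \
    -e '/\t\t/{' \
    -e 's|\t\t|\t\\N\t|g' \
    -e 'b repeat' \
    -e '}')

# Show tabs as <TAB> for readability.
printf '%s\n' "$looped" | sed 's/\t/<TAB>/g'   # a<TAB>\N<TAB>\N<TAB>b
```

The loop ends because, once every pair of adjacent tabs has \N between them, the address /\t\t/ no longer matches.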

Short version

This can be shortened to:

 sed ':r;/\t\t/{s|\t\t|\t\n\t|g; br}'

MacOS

And the Mac version (still compatible with Linux/Windows):

 sed $':r\n/\t\t/{ s|\t\t|\t\\\n\t|g; br\n}'

In BSD sed, tabs must be literal.
Newlines must be both literal and escaped at the same time, hence the single backslash (written as \\ so that, once the $'...' quoting has been processed, it becomes a single literal backslash) plus the \n, which becomes an actual newline.
Both label names (:r) and branch commands (br) must end with a newline. Semicolons and spaces are consumed as part of the label name or branch command in BSD sed, which makes it all very confusing.
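One way to sidestep most of these quoting issues is to let printf produce the tab and to pass each sed command as its own -e fragment. The following is only a sketch under those assumptions: it emits the literal \N from the question rather than a newline, it uses the standard t command (branch if a substitution was made) in place of the address-plus-b loop, and the variable names and sample row are hypothetical.

```shell
# A literal tab, produced without relying on sed understanding \t.
tab=$(printf '\t')

# Hypothetical sample row with three consecutive tabs.
row=$(printf 'a\t\t\tb')

# Inside double quotes the shell turns \\\\ into \\, which sed's
# replacement then turns into a single literal backslash before N.
portable=$(printf '%s\n' "$row" | sed -e ':r' \
    -e "s/${tab}${tab}/${tab}\\\\N${tab}/g" \
    -e 't r')

# Show tabs as <TAB> for readability.
visible=$(printf '%s\n' "$portable" | sed "s/${tab}/<TAB>/g")
printf '%s\n' "$visible"   # a<TAB>\N<TAB>\N<TAB>b
```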

+1

Andy Jun 27 '17 at 17:43

KevinDTimm · Accepted Answer · 2011-09-14T18:36:59+0000

I know you want sed, but sed doesn't like this at all; it seems that it specifically (see here) won't do what you want. However, perl will do it (AFAIK):

perl -pe 'while (s#\t\t#\t\\N\t#) {}' <filename>
