With reasonably modern versions of sed, edit the standard input to produce the standard output with
$ echo 'τέχνη βιβλίο γη κήπος' | sed -E -e 's/[[:blank:]]+/\n/g'
τέχνη
βιβλίο
γη
κήπος
If your vocabulary words are in files named lesson1 and lesson2, redirect sed's standard output to the file all-vocab with
sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab
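To see the multi-file form end to end, here is a small sketch that builds two sample lesson files in a scratch directory and runs the command above over them. It assumes GNU sed (for \n in the replacement); the word lists and the scratch directory are invented for illustration.

```shell
#!/bin/sh
# Assumes GNU sed, which turns \n in the replacement into a newline.
# The word lists below are invented sample data.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

printf 'τέχνη βιβλίο\n' > lesson1    # space-separated
printf 'γη\tκήπος\n' > lesson2       # tab-separated; [[:blank:]] covers both

# sed reads lesson1 and then lesson2 as one stream; the shell
# redirection collects everything in all-vocab.
sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab

cat all-vocab    # τέχνη, βιβλίο, γη, κήπος, one per line
```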
What it means:
- The character class [[:blank:]] matches either a single space character or a single tab character.
- Use [[:space:]] instead to match any single whitespace character (commonly space, tab, newline, carriage return, form feed, and vertical tab).
- The + quantifier means match one or more of the previous pattern.
- So [[:blank:]]+ is a sequence of one or more characters, each of which is a space or a tab.
- The \n in the replacement is the newline that you want.
- The /g modifier at the end means perform the substitution as many times as possible rather than just once.
- The -E option tells sed to use POSIX extended regular expression syntax, in particular, for this case, the + quantifier. Without -E, your sed command becomes sed -e 's/[[:blank:]]\+/\n/g' . (Note the use of \+ rather than a plain + ; \+ in a basic regex is a GNU extension.)
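The last point can be checked directly. This sketch, assuming GNU sed (where both \n in the replacement and \+ in a basic regex are accepted), runs the -E spelling and the plain basic-regex spelling over the same input and compares the results. The helper names split_ere and split_bre are hypothetical.

```shell
#!/bin/sh
# Assumes GNU sed: \n in the replacement and \+ in a basic regex are GNU extensions.

# Hypothetical helper: extended regex (-E), so + is a quantifier as written.
split_ere() {
    printf '%s\n' "$1" | sed -E -e 's/[[:blank:]]+/\n/g'
}

# Hypothetical helper: basic regex, so the same quantifier is spelled \+ .
split_bre() {
    printf '%s\n' "$1" | sed -e 's/[[:blank:]]\+/\n/g'
}

# Two spaces, then a tab: each run of blanks collapses to a single newline.
words=$(printf 'one  two\tthree')

split_ere "$words"    # one, two, three on separate lines
if [ "$(split_ere "$words")" = "$(split_bre "$words")" ]; then
    echo 'ERE and BRE spellings agree'
fi
```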
Perl Compatible Regexes
For those familiar with Perl-compatible regular expressions and a sed that supports PCRE-style escapes such as \s (GNU sed does), use \s+ to match runs of at least one whitespace character, as in
sed -E -e 's/\s+/\n/g' old > new
or
sed -e 's/\s\+/\n/g' old > new
These commands read input from the file old and write the result to a file named new in the current directory.
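One practical difference between \s and [[:blank:]]: \s matches any whitespace, including the carriage return left at the end of a line with a Windows-style CRLF ending, while [[:blank:]] matches only space and tab. The sketch below assumes GNU sed, which accepts \s; the sample line is invented.

```shell
#!/bin/sh
# Assumes GNU sed, which understands \s (any whitespace character).
# Sample line with a trailing carriage return, as in a file saved on Windows.
line=$(printf 'alpha\t beta\r')

# \s+ also consumes the trailing carriage return.
printf '%s\n' "$line" | sed -E -e 's/\s+/\n/g'    # alpha, beta, then an empty line

# [[:blank:]]+ leaves the carriage return attached to "beta" (visible with od -c).
printf '%s\n' "$line" | sed -E -e 's/[[:blank:]]+/\n/g' | od -c
```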
Maximum portability, maximum cruftiness
Going back to almost any version of sed since Version 7 Unix, the command invocation is a bit more baroque.
$ echo 'τέχνη βιβλίο γη κήπος' | sed -e 's/[ \t][ \t]*/\
/g'
τέχνη
βιβλίο
γη
κήπος
Notes:
- Here we do not even assume the existence of the humble + quantifier, and simulate it with a single space-or-tab ( [ \t] ) followed by zero or more of them ( [ \t]* ). (Strictly, POSIX gives \t no special meaning inside a bracket expression, so some seds read [ \t] as space, backslash, or t; for true portability, type a literal tab there.)
- Similarly, assuming sed does not understand \n for newline, we have to include the newline on the command line verbatim.
- The \ at the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line.
- Note: there must be no whitespace between the backslash and the escaped newline. That is, the first line must end with exactly a backslash followed by the end of the line.
- This error-prone process helps one appreciate why the world moved to visible characters, and you will want to take some care when trying out the command with copy-and-paste.
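If copy-and-paste mangles the escaped newline, one workaround is to let the shell build the portable script, so that what sed receives contains a literal tab and a literal backslash-newline and no \t or \n escapes at all. This is a sketch relying only on POSIX printf and parameter expansion; the variable names tab, nl and script are our own.

```shell
#!/bin/sh
# Let the shell produce the invisible characters instead of typing them.
tab=$(printf '\t')
# $(...) strips trailing newlines, so add a sentinel character and remove it.
nl=$(printf '\nx')
nl=${nl%x}

# Same shape as the V7-style script above, with a real tab in place of \t:
#   s/[ <tab>][ <tab>]*/\<newline>/g
script="s/[ $tab][ $tab]*/\\$nl/g"

echo 'τέχνη βιβλίο γη κήπος' | sed -e "$script"
```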
Note on backslashes and quoting
The commands above all used single quotes ( '' ) rather than double quotes ( "" ). Consider:
$ echo '\\\\' "\\\\"
\\\\ \\
That is, the shell applies different escaping rules to single-quoted strings than to double-quoted strings. You typically want to protect the backslashes common in regular expressions by using single quotes.
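The difference is easiest to see with printf '%s\n', which prints its arguments unchanged (echo in some shells, dash for example, interprets backslash escapes itself, which muddies the demonstration). The sketch also shows a case where the quoting changes what sed must be given: matching a literal backslash. The sample path C:\dir is invented.

```shell
#!/bin/sh
# Single quotes keep every backslash; inside double quotes, \\ becomes \ .
single='\\\\'
double="\\\\"
printf '%s\n' "$single" "$double"    # prints \\\\ then \\

# Replace each backslash with a slash. sed must receive the regex \\ (an
# escaped backslash), which takes four backslashes inside double quotes.
printf '%s\n' 'C:\dir' | sed -e 's/\\/\//g'      # prints C:/dir
printf '%s\n' 'C:\dir' | sed -e "s/\\\\/\//g"    # prints C:/dir

# What sed actually receives from the double-quoted version:
printf '%s\n' "s/\\\\/\//g"                      # prints s/\\/\//g
```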
Greg Bacon Dec 05 '09 at 18:40