Transliterate part of a line

I have a bunch of files that I am migrating from one wiki (based on Markdown) to another (based on Creole). I have written several sed scripts for things like converting link formats and header formats. But the new wiki allows a directory structure, and I would rather use it than the existing pseudo-directory structure. I have already renamed the files, but I need to convert all links from using _ as the delimiter to using / .

Basic information:

 Creole link: [[url]] [[url|name]] 

I want to convert only the links that do not contain . or / .

I would really appreciate it if you could explain what the command you give means, so that I can learn from it.


Example

 this is a line with a [[Link_to_something]] and [[Something_else|something else]] this site is cool [[http://example.com/this_page]] 

to

 this is a line with a [[Link/to/something]] and [[Something/else|something else]] this site is cool [[http://example.com/this_page]] 

What I tried

y/// only works on the entire line.

s//\u\2 only supports case translations.

3 answers

I think I would use Perl. This can be done as a single line, thus:

 perl -pe 's{\[\[([^/.|]+)(\|[^]]+)?\]\]}{$x=$1;$y=$2;$x=~s%_%/%g;"[[$x$y]]"}gex;' <<'EOF'
 this is a line with a [[Link_to_something]] and [[Something_else|something else]] this site is cool [[http://example.com/this_page]]
 EOF

The output from this is:

 this is a line with a [[Link/to/something]] and [[Something/else|something else]] this site is cool [[http://example.com/this_page]] 

Whether this is good style is completely open to discussion.

I will explain this version of the code, which is isomorphic to the code above:

 perl -e 'use strict; use warnings;
 while (my $line = <>)
 {
     # $2 is undefined when a link has no |name part, hence the // "" default
     $line =~ s{ \[\[ ([^/.|]+) (\|[^]]+)? \]\] }
               { my($x, $y) = ($1, $2 // "");
                 $x =~ s%_%/%g;
                 "[[$x$y]]"
               }gex;
     print $line;
 }'

The while loop basically corresponds to the -p option in the first version. I explicitly named the input variable $line instead of using the implicit $_ , as the first version does. I also had to declare $x and $y because of use strict; use warnings; .

The substitute command takes the form s{pattern}{replacement} because the regular expressions themselves contain slashes. The x modifier allows (insignificant) whitespace in both parts, which makes the command easier to lay out. The g modifier repeats the replacement as often as the pattern matches. The e modifier says: "Treat the right side of the substitution as an expression."
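For readers more at home in Python, the three modifiers have rough counterparts in the re module. The sketch below is only an analogy on a toy input, not a translation of the one-liner; the names NUMBER and double_numbers are made up for the illustration.

```python
import re

# re.VERBOSE plays the role of Perl's x modifier:
# whitespace and comments inside the pattern are ignored.
NUMBER = re.compile(r"""
    \d+     # one or more digits
""", re.VERBOSE)

def double_numbers(text):
    # A function as the replacement is the analogue of Perl's e modifier:
    # its return value becomes the replacement text.
    # re.sub replaces every match by default, like Perl's g modifier.
    return NUMBER.sub(lambda m: str(int(m.group(0)) * 2), text)

print(double_numbers("a1 b22"))  # prints: a2 b44
```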

The match pattern looks for a pair of open square brackets, then remembers a sequence of characters other than / , . or | , optionally followed by | and a sequence of characters other than ] , ending in a pair of close square brackets. There are two captures: $1 and $2 .
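To see what the two captures hold, here is a sketch of the same pattern in Python's verbose syntax, run against the example line from the question. One deliberate difference, which is my assumption and not part of the Perl pattern: the first group also excludes ] so that a match cannot run past a closing ]] into the following text. The name LINK is made up.

```python
import re

LINK = re.compile(r"""
    \[\[              # a pair of opening square brackets
    ([^/.|\]]+)       # group 1: characters other than / . | (and ], an added restriction)
    (\|[^\]]+)?       # group 2, optional: | followed by characters other than ]
    \]\]              # a pair of closing square brackets
""", re.VERBOSE)

line = ("this is a line with a [[Link_to_something]] and "
        "[[Something_else|something else]] this site is cool "
        "[[http://example.com/this_page]]")

# The http link contains / and . so it never matches.
for m in LINK.finditer(line):
    print(repr(m.group(1)), repr(m.group(2)))
# prints:
# 'Link_to_something' None
# 'Something_else' '|something else'
```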

The replacement expression stores the values of $1 and $2 in the variables $x and $y . It then applies a simpler substitution to $x , changing the underscores to slashes. The result is then the string [[$x$y]] . You cannot modify $1 or $2 directly in a replacement expression, and the inner substitution clobbers $1 and $2 , so I needed $x and $y .
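The same capture, modify, reassemble idea can be sketched in Python, where the replacement is a function that receives the match object. This is an illustrative equivalent rather than the answer's own code: the names LINK, convert_links and replace are made up, and the first group additionally excludes ] (an assumption on my part) so a match stays inside one pair of brackets. Because each call gets its own match object, nothing is clobbered and no copies are needed.

```python
import re

LINK = re.compile(r"\[\[([^/.|\]]+)(\|[^\]]+)?\]\]")

def convert_links(line):
    def replace(match):
        # Group 1 is the link target: turn underscores into slashes.
        target = match.group(1).replace("_", "/")
        # Group 2 is the optional |name part; it is None when absent.
        name = match.group(2) or ""
        return "[[" + target + name + "]]"
    return LINK.sub(replace, line)

line = ("this is a line with a [[Link_to_something]] and "
        "[[Something_else|something else]] this site is cool "
        "[[http://example.com/this_page]]")
print(convert_links(line))
```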

Perhaps there is another way to do this. This is Perl, so TMTOWTDI: there is more than one way to do it. But this one at least works.


This might work for you:

 awk -vORS='' -vRS='[[][[][^].]*[]][]]' '{gsub(/_/,"/",RT);print $0 RT}' file
 this is a line with a [[Link/to/something]] and [[Something/else|something else]] this site is cool [[http://example.com/this_page]]
  • Set the output record separator to the empty string
  • Set the record separator to [[...]] (where ... does not contain ] or . )
  • Replace every _ with / in RT , the variable (specific to GNU awk) that holds the text matched by the record separator
  • Print the record followed by the record separator, i.e. $0 RT
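The split-on-the-separator idea in the steps above can be sketched in Python with re.split, which returns the matched separators too when the pattern has a capturing group. This only illustrates the technique and does not replace the awk command; SEPARATOR and convert are made-up names.

```python
import re

# Same separator as the awk RS: [[ ... ]] where ... contains neither ] nor .
# The capturing group makes re.split return the separators as well.
SEPARATOR = re.compile(r"(\[\[[^\].]*\]\])")

def convert(text):
    pieces = SEPARATOR.split(text)
    # Even indexes hold ordinary text, odd indexes hold the matched links
    # (the counterpart of RT), so only those are transliterated.
    return "".join(
        piece.replace("_", "/") if index % 2 else piece
        for index, piece in enumerate(pieces)
    )

line = ("this is a line with a [[Link_to_something]] and "
        "[[Something_else|something else]] this site is cool "
        "[[http://example.com/this_page]]")
print(convert(line))
```

As with the awk version, underscores in ordinary text between links are left alone, and a link containing . is never treated as a separator.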

This is the sed solution:

 sed 's/\[\[[^].]*]]/\a\n&\a\n/g' file | sed '/^\[\[[^]]*\]\]\a/y/_/\//;H;$!d;g;s/\a\n//g;s/.//'
 this is a line with a [[Link/to/something]] and [[Something/else|something else]] this site is cool [[http://example.com/this_page]]
  • Surround each [[...]] with \a\n . N.B. \a is chosen as a character that is unlikely to appear in the file.
  • Translate _ to / in lines starting with [[
  • Delete all occurrences of \a\n

If you have GNU sed, this will do:

 sed '/\[\[[^].]*]]/{s||'\''$(sed "y/_/\\//" <<<"&")'\''|g;s/.*/echo '\''&'\''/e}' file
 this is a line with a [[Link/to/something]] and [[Something/else|something else]] this site is cool [[http://example.com/this_page]]

You can use Python to simplify the regex:

 $ python3 -c '
 > import re
 > import sys
 > for line in sys.stdin:
 >     print(re.sub(r"\[\[(?!http).*?\]\]", lambda m: m.group(0).replace("_", "/"), line), end="")
 > ' <input.txt
 this is a line with a [[Link/to/something]] and [[Something/else|something else]] this site is cool [[http://example.com/this_page]]

Note: the $ and > at the beginning of the lines are the shell prompts.
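Note that the substitution above replaces underscores anywhere between the brackets, including in the display name after | . If display names should be left untouched (an assumption, the question does not say either way), a variant along these lines might do; LINK and convert_targets are made-up names.

```python
import re

# Same idea as above: any [[...]] that does not start with http.
LINK = re.compile(r"\[\[(?!http)(.*?)\]\]")

def convert_targets(line):
    def replace(match):
        # Split "target|name" at the first | ; bar and name are empty
        # strings when the link has no display name.
        target, bar, name = match.group(1).partition("|")
        return "[[" + target.replace("_", "/") + bar + name + "]]"
    return LINK.sub(replace, line)

print(convert_targets("see [[Some_page|a_name]] and [[http://example.com/this_page]]"))
# prints: see [[Some/page|a_name]] and [[http://example.com/this_page]]
```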


You can also do this in vim visually:

 /\[\[\(http\)\@!.\{-}\]\]
 :% s@@\=substitute(submatch(0), '_', '/', 'g')@g

Source: https://habr.com/ru/post/1398307/

