Regular expression - replace all spaces at the beginning of the line with periods

I don't mind whether this is done with vim, sed, awk, python, etc. I have tried everything and could not get it to work.

For input of this type:

    top f1 f2 f3
       sub1 f1 f2 f3
       sub2 f1 f2 f3
          sub21 f1 f2 f3
       sub3 f1 f2 f3

I want to get:

    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

Then I want to load this into Excel (space-delimited) and still be able to see the hierarchy in the first column!

I tried a lot of things, but in the end I always lose the information about the hierarchy.

5 answers

With this as an input:

    $ cat file
    top f1 f2 f3
       sub1 f1 f2 f3
       sub2 f1 f2 f3
          sub21 f1 f2 f3
       sub3 f1 f2 f3

Try:

    $ sed -E ':a; s/^( *) ([^ ])/\1.\2/; ta' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

How it works:

  • :a

    This creates a label named a.

  • s/^( *) ([^ ])/\1.\2/

    If a line starts with spaces, this replaces the last space in the leading spaces with a period.

    In more detail, ^( *) matches all leading spaces except the last one and saves them in group 1. The regular expression  ([^ ]) (which, however it may be rendered here, consists of a space followed by ([^ ])) matches a space followed by a non-space character and saves that non-space character in group 2.

    \1.\2 replaces the matched text with group 1, followed by a period, followed by group 2.

  • ta

    If the preceding substitute command made a substitution, jump back to label a and try again.
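
The loop above can be traced step by step. As a minimal sketch (in Python rather than sed, with a hypothetical helper name dot_leading_spaces_stepwise that is not part of the answer), the function below applies the same substitution once per iteration and records every intermediate line. It assumes the indentation consists only of spaces and that the line is passed without its trailing newline:

```python
import re

# Same pattern as the sed command: group 1 is every leading space except
# the last, then one space, then a non-space character (group 2).
PATTERN = re.compile(r'^( *) ([^ ])')

def dot_leading_spaces_stepwise(line):
    """Hypothetical helper: return the list of intermediate lines."""
    steps = [line]
    while True:
        # Replace at most one occurrence, like a single run of s/.../.../
        new_line, n = PATTERN.subn(r'\1.\2', line, count=1)
        if n == 0:
            # Nothing was substituted, so 'ta' would not branch again.
            break
        line = new_line
        steps.append(line)
    return steps

for step in dot_leading_spaces_stepwise('   sub1 f1 f2 f3'):
    print(repr(step))
```

Each pass turns the last remaining leading space into a period, so a line indented by three spaces needs three passes before the loop stops.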

Compatibility:

  • The above was tested with a modern GNU sed. With BSD / OSX sed, you may or may not need to use:

     sed -E -e :a -e 's/^( *) ([^ ])/\1.\2/' -e ta file 

    With older versions of GNU sed, you need to use -r instead of -E:

     sed -r ':a; s/^( *) ([^ ])/\1.\2/; ta' file 
  • The above assumes that the leading whitespace consists of spaces. If it contains tabs, you will need to decide what your tab stop is and make the appropriate replacements.
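
If the indentation does contain tabs, one option is to expand them to spaces first and convert afterwards. The sketch below is in Python and uses str.expandtabs; the tab stop of 4 and the helper name dot_indent are assumptions chosen for illustration and are not part of the sed answer:

```python
import re

def dot_indent(line, tabsize=4):
    """Hypothetical helper: expand tabs, then turn leading spaces into dots."""
    # expandtabs is column-aware and expands every tab in the line,
    # not only the ones in the indentation.
    expanded = line.expandtabs(tabsize)
    # Replace the whole run of leading spaces with as many periods.
    return re.sub(r'^ +', lambda m: '.' * len(m.group(0)), expanded)

print(dot_indent('\tsub1 f1 f2 f3'))
print(dot_indent('  \tsub21 f1'))
```

With a different tab stop, pass it explicitly, for example dot_indent(line, tabsize=8).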


There are two different ways in vim to do this.

  • With regex:

     :%s/^\s\+/\=repeat('.', len(submatch(0))) 

    This is fairly straightforward, but a little verbose. It uses an expression replacement ( \= ) to generate a string of '.' characters with the same length as the run of whitespace at the beginning of each line.

  • With the norm command:

     :%norm ^hviwr. 

    This is a much shorter command, although it is a little harder to understand. It visually selects the whitespace at the beginning of a line and replaces the entire selection with dots. If there is no leading whitespace, the command fails at ^h because the cursor tries to move out of bounds, so the line is left unchanged.

    To see how this works, type ^hviwr. on a line with leading spaces and watch what happens.


Since you said python :

    #!/usr/bin/env python
    import re, sys
    for line in sys.stdin:
        sys.stdout.write(re.sub('^ +', lambda m: len(m.group(0)) * '.', line))

(For each line, we replace the longest run of leading spaces, '^ +', with a string of dots of the same length, len(m.group(0)) * '.'.)

With the end result:

    $ ./dottify.py <file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

Since you said awk :

    $ awk '{ match($0,/^ +/); p=substr($0,1,RLENGTH); gsub(" ",".",p); print p""substr($0,RLENGTH+1) }' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

(Here, for each line, we find the longest prefix of spaces with match, extract it with substr (awk string positions start at 1), replace each space with a dot using gsub, and print the modified prefix p followed by the rest of the input line. RSTART and RLENGTH are set by match() and hold the starting position and length of the matched text; RLENGTH is -1 when nothing matches, which leaves the line unchanged.)
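
For comparison, the same match / extract / replace / print steps can be mirrored in Python. This is only a sketch: the function name dot_prefix is hypothetical, and m.end() plays the role that RLENGTH plays in awk.

```python
import re

def dot_prefix(line):
    """Hypothetical helper mirroring the awk steps described above."""
    m = re.match(r' +', line)      # like match($0, /^ +/)
    if m is None:                  # no leading spaces: line is unchanged
        return line
    prefix = line[:m.end()]        # the matched run of leading spaces
    rest = line[m.end():]          # everything after the prefix
    # Like gsub(" ", ".", p) followed by printing p and the rest.
    return prefix.replace(' ', '.') + rest

print(dot_prefix('      sub21 f1 f2 f3'))
```

Unlike the awk version, the no-match case is handled with an explicit check instead of relying on the value of RLENGTH.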


In awk. It keeps replacing the first space with a period for as long as that space is preceded only by periods:

    $ awk '{while(/^\.* / && sub(/ /,"."));}1' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

and here is one in perl:

    $ perl -p -e 'while(s/(^\.*) /$1./){;}' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

A bit long but fun exercise nonetheless:

    # Function to count the number of leading spaces in a string.
    # Basically, this counts the consecutive characters at the start
    # of the string that are spaces.
    def count_leading_spaces(s):
        idx = 0
        # Stop at the first non-space character or at the end of the
        # string (this also covers empty and all-space strings).
        while idx < len(s) and s[idx] == ' ':
            idx += 1
        return idx

Finally, open the file and do some work:

    with open('file.txt', 'r') as f:
        data = []
        for i, line in enumerate(f):
            # Don't do anything to the field names
            if i == 0:
                new_line = line.rstrip()
            else:
                n_leading_spaces = count_leading_spaces(line)
                # Impute periods for spaces
                new_line = ('.'*n_leading_spaces + line.lstrip()).rstrip()
            data.append(new_line)

Results:

    >>> print('\n'.join(data))
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

You can also do it much more simply:

    with open('file.txt', 'r') as f:
        data = []
        for i, line in enumerate(f):
            # Don't do anything to the field names
            if i == 0:
                new_line = line.rstrip()
            else:
                n_leading_spaces = len(line) - len(line.lstrip())
                # Impute periods for spaces
                new_line = line.lstrip().rjust(len(line), '.').rstrip()
            data.append(new_line)

Source: https://habr.com/ru/post/1272318/

