Regular expression - replace all spaces at the beginning of the line with periods

Question

I don't care whether I achieve this with vim, sed, awk, python, etc. I tried everything and could not do it.

For input of this type:

    top f1 f2 f3
       sub1 f1 f2 f3
       sub2 f1 f2 f3
          sub21 f1 f2 f3
       sub3 f1 f2 f3

I want to:

    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

Then I want to load this into Excel (space-delimited) and still see the hierarchy in the first column!

I tried a lot of things, but I always end up losing the hierarchy information.

+5

python vim regex awk sed

shikhanshu Oct 3 '17 at 23:31

5 answers

There are two different ways in vim to do this.

With regex:
```
 :%s/^\s\+/\=repeat('.', len(submatch(0))) 
```
This is fairly straightforward, if a little verbose. It uses expression evaluation (`\=`) in the replacement to generate a string of `.` characters with the same length as the run of whitespace at the beginning of each line.
With the norm command:
```
 :%norm ^hviwr. 
```
This is a much shorter command, although it is a little harder to understand. It visually selects the whitespace at the beginning of a line and replaces the entire selection with dots. If there is no leading whitespace, the command fails at `^h` because the cursor tries to move out of bounds, so the rest of it does not run on that line.
To see how this works, try typing `^hviwr.` on a line with leading spaces.

+5

DJMcMayhem Oct 3 '17 at 23:46

Since you said python:

    #!/usr/bin/env python
    import re, sys
    for line in sys.stdin:
        sys.stdout.write(re.sub('^ +', lambda m: len(m.group(0)) * '.', line))

(For each line, we replace the longest run of leading spaces, `^ +`, with a string of dots of the same length, `len(m.group(0)) * '.'`.)

With the end result:

    $ ./dottify.py <file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

Since you said awk:

    $ awk '{ match($0,/^ +/); p=substr($0,1,RLENGTH); gsub(" ",".",p); print p""substr($0,RLENGTH+1) }' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

(For each line, we find the longest run of leading spaces with `match`, extract it with `substr`, replace each space with a dot using `gsub`, and print the modified prefix `p` followed by the rest of the input line. `RSTART` and `RLENGTH` are set by `match()` and hold the starting position and length of the matched pattern.)

+3

randomir Oct 4 '17 at 12:41

In awk. This keeps replacing the first space with a period for as long as that space is preceded only by periods:

    $ awk '{while(/^\.* / && sub(/ /,"."));}1' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

and here is one in perl:

    $ perl -p -e 'while(s/(^\.*) /\1./){;}' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3
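
For comparison, the same loop can be sketched in Python. This is only an illustration of the idea behind the awk and perl one-liners; the function name `dottify_loop` is made up for this example.

```python
import re

def dottify_loop(line):
    """Hypothetical helper mirroring the awk/perl loop above."""
    # As long as the line starts with zero or more dots followed by a space,
    # turn the first space on the line into a dot.
    while re.match(r'\.* ', line):
        line = line.replace(' ', '.', 1)
    return line

for sample in ['top f1 f2 f3', '   sub1 f1 f2 f3', '      sub21 f1 f2 f3']:
    print(dottify_loop(sample))
```

Like the one-liners, this also rewrites a line that already begins with dots followed by a space, which may or may not matter for your data.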

+3

James brown Oct 4 '17 at 3:49

A bit long but fun exercise nonetheless:

    # Count the number of leading spaces in a string,
    # i.e. the number of consecutive space characters at its start
    def count_leading_spaces(s):
        idx = 0
        # The bounds check also covers empty and all-space strings
        while idx < len(s) and s[idx] == ' ':
            idx += 1
        return idx

Finally, open the file and do some work:

    with open('file.txt', 'r') as f:
        data = []
        for i, line in enumerate(f):
            # Don't do anything to the field names
            if i == 0:
                new_line = line.rstrip()
            else:
                n_leading_spaces = count_leading_spaces(line)
                # Impute periods for spaces
                new_line = ('.'*n_leading_spaces + line.lstrip()).rstrip()
            data.append(new_line)

Results:

    >>> print('\n'.join(data))
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

You can also do this much more simply:

    with open('file.txt', 'r') as f:
        data = []
        for i, line in enumerate(f):
            # Don't do anything to the field names
            if i == 0:
                new_line = line.rstrip()
            else:
                # Impute periods for spaces: pad the stripped line on the
                # left with dots back to its original length
                new_line = line.lstrip().rjust(len(line), '.').rstrip()
            data.append(new_line)

+1

blacksite Oct 4 '17 at 0:00

John1024 · Accepted Answer · 2017-10-03T23:41:58+0000

With this as an input:

    $ cat file
    top f1 f2 f3
       sub1 f1 f2 f3
       sub2 f1 f2 f3
          sub21 f1 f2 f3
       sub3 f1 f2 f3

Try:

    $ sed -E ':a; s/^( *) ([^ ])/\1.\2/; ta' file
    top f1 f2 f3
    ...sub1 f1 f2 f3
    ...sub2 f1 f2 f3
    ......sub21 f1 f2 f3
    ...sub3 f1 f2 f3

How it works:

`:a`
This creates a label `a`.
`s/^( *) ([^ ])/\1.\2/`
If a line starts with spaces, this replaces the last of the leading spaces with a period.
In more detail, `^( *)` matches all the leading spaces except the last one and saves them in group 1. The regular expression ` ([^ ])` (which, despite how Stack Overflow renders it, consists of a space followed by `([^ ])`) matches a space followed by a non-space and saves the non-space in group 2.
`\1.\2` replaces the matched text with group 1, followed by a period, followed by group 2.
`ta`
If the substitute command made a substitution, branch back to label `a` and try again.
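
The label/substitute/branch loop described above can be mimicked in Python to watch it work one step at a time. This is a minimal sketch, not part of the sed answer: the helper name `dottify_like_sed` is hypothetical, and it assumes each line is passed in without its trailing newline (sed also strips the newline before matching).

```python
import re

# Same pattern as the sed command: group 1 holds every leading space except
# the last, then comes one space, then a non-space character (group 2).
LAST_LEADING_SPACE = re.compile(r'^( *) ([^ ])')

def dottify_like_sed(line):
    """Hypothetical helper: repeat the substitution until nothing changes."""
    while True:
        # count=1 mirrors sed's s command without the g flag
        line, n = LAST_LEADING_SPACE.subn(r'\1.\2', line, count=1)
        if n == 0:  # no substitution was made, so sed's `ta` would not branch
            return line

for sample in ['top f1 f2 f3', '   sub1 f1 f2 f3', '      sub21 f1 f2 f3']:
    print(dottify_like_sed(sample))
```

Each pass converts exactly one space, working from the rightmost leading space towards the start of the line, so a line indented by six spaces takes six passes plus a final pass that fails to match.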

Compatibility:

The above was tested on modern GNU sed. For BSD/OSX sed, you might or might not need to use:
```
 sed -E -e :a -e 's/^( *) ([^ ])/\1.\2/' -e ta file 
```
On ancient GNU sed, you need to use `-r` in place of `-E`:
```
 sed -r ':a; s/^( *) ([^ ])/\1.\2/; ta' file 
```
The above assumes that the leading whitespace consists of spaces. If it contains tabs, you will need to decide what your tabstop is and make the appropriate substitutions.
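
If the indentation does contain tabs, one possible approach is sketched below in Python. It is an illustration under stated assumptions rather than part of the sed answer: the helper name `dottify_with_tabs` is hypothetical, and the default tab stop of 8 is only a guess that you should replace with whatever your file actually uses.

```python
import re

def dottify_with_tabs(line, tabsize=8):
    """Hypothetical helper: measure the indent with tabs expanded, then dot it."""
    # Leading run of spaces and tabs (may be empty)
    indent = re.match(r'[ \t]*', line).group(0)
    # str.expandtabs pads each tab with spaces up to the next tab stop
    width = len(indent.expandtabs(tabsize))
    return '.' * width + line[len(indent):]

print(dottify_with_tabs('\tsub1 f1 f2 f3', tabsize=3))
print(dottify_with_tabs('   sub2 f1 f2 f3'))
```

Only the leading indent is expanded, so tabs that appear later in the line are left as they are.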