Python regex split that keeps the split pattern

The easiest way to explain this is with an example. I have this string: "Docs/src/Scripts/temp", which I know how to split in two different ways:

    re.split('/', 'Docs/src/Scripts/temp')   -> ['Docs', 'src', 'Scripts', 'temp']
    re.split('(/)', 'Docs/src/Scripts/temp') -> ['Docs', '/', 'src', '/', 'Scripts', '/', 'temp']

Is there a way to split on the forward slash, but keep the slash as part of the words? For example, I want the above string to look like this:

 ['Docs/', '/src/', '/Scripts/', '/temp'] 

Any help would be appreciated!

+6
6 answers

An interesting question, I would suggest doing something like this:

    >>> 'Docs/src/Scripts/temp'.replace('/', '/\x00/').split('\x00')
    ['Docs/', '/src/', '/Scripts/', '/temp']

The idea here is to first replace every / character with two / characters separated by a special character that cannot be part of the original string, and then split on that special character. I used a null byte ('\x00'), but you could change it to something else.
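The replace-then-split idea can be wrapped in a small helper. This is only a minimal sketch: the function name `split_keep_sep`, its parameters, and the sentinel check are my own additions for illustration, not part of the answer above.

```python
def split_keep_sep(s, sep='/', sentinel='\x00'):
    """Split s on sep, keeping sep on both sides of every cut."""
    # The trick only works if the sentinel never occurs in the input.
    if sentinel in s:
        raise ValueError('sentinel character occurs in the input string')
    # 'a/b' -> 'a/<sentinel>/b' -> ['a/', '/b']
    return s.replace(sep, sep + sentinel + sep).split(sentinel)


print(split_keep_sep('Docs/src/Scripts/temp'))
# ['Docs/', '/src/', '/Scripts/', '/temp']
```

The explicit check guards against the one case where the trick silently goes wrong: an input that already contains the sentinel.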

Regex isn't a great fit here: you cannot split on zero-length matches (re.split() only gained support for empty matches in Python 3.7), and re.findall() does not find overlapping matches, so you would potentially need several passes over the string.
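To see the overlap limitation concretely, here is a small sketch. The pattern is one I made up for illustration: it asks for an optional slash on both sides of each word, yet every slash ends up in only one match.

```python
import re

s = 'Docs/src/Scripts/temp'

# Each slash is consumed by the match that ends with it, so the next
# match cannot start with that same slash.
non_overlapping = re.findall(r'/?[^/]+/?', s)
print(non_overlapping)
# ['Docs/', 'src/', 'Scripts/', 'temp']
```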

In addition, re.split('/', s) will do the same as s.split('/') , but the second is more efficient.

+8

A solution without split() but with lookaheads:

    >>> s = 'Docs/src/Scripts/temp'
    >>> r = re.compile(r"(?=((?:^|/)[^/]*/?))")
    >>> r.findall(s)
    ['Docs/', '/src/', '/Scripts/', '/temp']

Explanation:

    (?=         # Assert that it's possible to match...
     (          # and capture...
      (?:^|/)   # the start of the string or a slash
      [^/]*     # any number of non-slash characters
      /?        # and (optionally) an ending slash.
     )          # End of capturing group
    )           # End of lookahead

Since the lookahead assertion is tried at every position in the string and doesn't consume any characters, it has no problem with overlapping matches.
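A quick way to check that the lookahead consumes nothing is to look at the match positions with `re.finditer()`. This is just an illustrative sketch built on the pattern above; the variable name `hits` is mine.

```python
import re

s = 'Docs/src/Scripts/temp'
r = re.compile(r'(?=((?:^|/)[^/]*/?))')

# Every match is zero-width (start == end); the text of interest is in group 1.
hits = [(m.start(), m.end(), m.group(1)) for m in r.finditer(s)]
for hit in hits:
    print(hit)
# (0, 0, 'Docs/')
# (4, 4, '/src/')
# (8, 8, '/Scripts/')
# (16, 16, '/temp')
```

Each match starts on a slash (or at the start of the string) and has zero width, which is why the same slash can show up at the end of one captured group and the start of the next.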

+5

1) You do not need regular expressions to split on a single fixed character:

    >>> 'Docs/src/Scripts/temp'.split('/')
    ['Docs', 'src', 'Scripts', 'temp']

2) Consider this method:

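    import os.path

    def components(path):
        # Index where the current component starts (on its leading separator, if any)
        start = 0
        for end, c in enumerate(path):
            if c == os.path.sep:  # '/' on POSIX systems
                # Emit the component including the closing separator
                yield path[start:end+1]
                # The next component begins at this same separator
                start = end
        # Whatever follows the last separator
        yield path[start:]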

It does not rely on clever tricks such as split-join-splitting, which makes it more readable in my opinion.
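For reference, here is how the generator might be used. Note that this sketch takes the separator as an explicit `sep` parameter instead of `os.path.sep`; that parameter is my own assumption, added so the example behaves the same on every platform.

```python
def components(path, sep='/'):
    """Yield path components, each keeping its neighbouring separators."""
    start = 0
    for end, c in enumerate(path):
        if c == sep:
            # Include the closing separator in this component...
            yield path[start:end + 1]
            # ...and start the next component on that same separator.
            start = end
    yield path[start:]


parts = list(components('Docs/src/Scripts/temp'))
print(parts)
# ['Docs/', '/src/', '/Scripts/', '/temp']
```

Because it is a generator, the pieces are produced lazily; wrap the call in `list()` when a list is needed.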

+3

If you don't insist on a slash on both sides, this is actually pretty simple:

    >>> re.findall(r"([^/]*/)", 'Docs/src/Scripts/temp')
    ['Docs/', 'src/', 'Scripts/']

Neither re nor split is really cut out for overlapping strings, so if that is what you really want, I would just add a slash to the beginning of every result except the first.
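The "add a slash to every result except the first" step could look roughly like this. As an assumption on my part, the sketch uses the slightly different pattern `[^/]+/?`, so that the last component (which has no trailing slash) is not dropped.

```python
import re

s = 'Docs/src/Scripts/temp'

# [^/]+/? also picks up the last component, which has no trailing slash.
parts = re.findall(r'[^/]+/?', s)

# Prepend a slash to every piece except the first.
result = parts[:1] + ['/' + p for p in parts[1:]]
print(result)
# ['Docs/', '/src/', '/Scripts/', '/temp']
```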

+2

Try the following:

 re.split(r'(/)', 'Docs/src/Scripts/temp') 

From the Python documentation:

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)
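A short sketch of the two behaviours the documentation describes, capturing groups and `maxsplit`. The variable names are mine.

```python
import re

s = 'Docs/src/Scripts/temp'

# A capturing group makes re.split() return the separators as well.
with_seps = re.split(r'(/)', s)
print(with_seps)
# ['Docs', '/', 'src', '/', 'Scripts', '/', 'temp']

# With maxsplit=1 only the first slash is split on; the rest stays intact.
first_only = re.split(r'(/)', s, maxsplit=1)
print(first_only)
# ['Docs', '/', 'src/Scripts/temp']
```

Note that the separators come back as separate list items, so this alone does not produce the overlapping pieces the question asks for.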

+2

I'm not sure there is an easy way to do this. This is the best I could come up with ...

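    import re

    lSplit = re.split('/', 'Docs/src/Scripts/temp')
    # First part gets a trailing slash, middle parts get a slash on both sides,
    # and the last part gets a leading slash.
    print([lSplit[0] + '/'] + ['/' + x + '/' for x in lSplit[1:-1]] + ['/' + lSplit[-1]])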

Kind of a mess, but it does what you want.

+1

Source: https://habr.com/ru/post/910945/