Python Regex Split That Keeps the Separator Pattern

Question

The easiest way to explain this is with an example. I have this string: 'Docs/src/Scripts/temp'. I know how to split it in two different ways:

 re.split('/', 'Docs/src/Scripts/temp') -> ['Docs', 'src', 'Scripts', 'temp']
 re.split('(/)', 'Docs/src/Scripts/temp') -> ['Docs', '/', 'src', '/', 'Scripts', '/', 'temp']

Is there a way to split on the slash but keep the slash as part of each word? For example, I want the string above to end up like this:

 ['Docs/', '/src/', '/Scripts/', '/temp']

Any help would be appreciated!

python regex

user1274774 Mar 16 '12 at 19:08

6 answers

Andrew Clark · Answer 1 · 2012-03-16T19:15:41+0000

An interesting question. I would suggest doing something like this:

 >>> 'Docs/src/Scripts/temp'.replace('/', '/\x00/').split('\x00')
 ['Docs/', '/src/', '/Scripts/', '/temp']

The idea here is to first replace every / character with two / characters separated by a special character that cannot be part of the original string, and then split on that special character. I used a null byte ('\x00'), but you could change it to something else.

Regex is actually not a great fit here: re.split() could not split on zero-length matches (this changed in Python 3.7), and re.findall() does not find overlapping matches, so you would likely need several passes over the string.

In addition, re.split('/', s) does the same thing as s.split('/'), but the latter is more efficient.
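The replace-then-split idea can be wrapped in a small function. This is only a sketch: the name `split_keep_sep` and the `sentinel` parameter are made up for illustration, and the guard assumes the caller wants an error when the sentinel already occurs in the input.

```python
def split_keep_sep(s, sep='/', sentinel='\x00'):
    """Split s on sep, keeping sep on both sides of each cut."""
    # Hypothetical guard: the trick silently breaks if the sentinel
    # already appears somewhere in the input string.
    if sentinel in s:
        raise ValueError('sentinel occurs in the input string')
    # 'a/b' -> 'a/\x00/b' -> ['a/', '/b']
    return s.replace(sep, sep + sentinel + sep).split(sentinel)


print(split_keep_sep('Docs/src/Scripts/temp'))
# ['Docs/', '/src/', '/Scripts/', '/temp']
```

A string with no separator comes back as a single-element list, the same as with a plain str.split().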

Tim Pietzcker · Answer 2 · 2012-03-16T22:25:20+0000

A solution without split() but with lookaheads:

 >>> import re
 >>> s = 'Docs/src/Scripts/temp'
 >>> r = re.compile(r"(?=((?:^|/)[^/]*/?))")
 >>> r.findall(s)
 ['Docs/', '/src/', '/Scripts/', '/temp']

Explanation:

 (?=          # Assert that it is possible to match...
  (           # and capture...
   (?:^|/)    # the start of the string or a slash
   [^/]*      # any number of non-slash characters
   /?         # and (optionally) an ending slash.
  )           # End of capturing group
 )            # End of lookahead

Since the lookahead assertion is tried at every position in the string and does not consume any characters, it has no problem with overlapping matches.
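To see the zero-width, overlapping behaviour directly, one can iterate over the match objects instead of calling findall(). This is just an illustrative sketch using the same pattern; the variable names are arbitrary.

```python
import re

pattern = re.compile(r"(?=((?:^|/)[^/]*/?))")
s = 'Docs/src/Scripts/temp'

for m in pattern.finditer(s):
    # The overall match is empty (start == end); only group 1 holds text.
    assert m.start() == m.end()
    print(m.span(1), m.group(1))

# (0, 5) Docs/
# (4, 9) /src/
# (8, 17) /Scripts/
# (16, 21) /temp
```

Each captured span begins one character before the previous one ends, which is exactly the shared slash.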

hop · Answer 3 · 2012-03-16T19:44:16+0000

1) You do not need regular expressions to split on a single fixed character:

 >>> 'Docs/src/Scripts/temp'.split('/')

['Docs', 'src', 'Scripts', 'temp']

2) Consider this method:

 import os.path

 def components(path):
     start = 0
     for end, c in enumerate(path):
         if c == os.path.sep:
             yield path[start:end+1]
             start = end
     yield path[start:]

It does not rely on clever tricks such as split-join-splitting, which makes it more readable in my opinion.
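One thing to keep in mind is that os.path.sep is '\\' on Windows, so the generator only splits on '/' on POSIX systems. A possible variant, assuming you always want to split on a forward slash, takes the separator as a parameter (the `sep` argument is an addition for illustration, not part of the answer):

```python
def components(path, sep='/'):
    """Yield the pieces of path, each keeping its neighbouring separators."""
    start = 0
    for end, c in enumerate(path):
        if c == sep:
            # Include the separator at the end of this piece...
            yield path[start:end + 1]
            # ...and start the next piece on the same separator.
            start = end
    yield path[start:]


print(list(components('Docs/src/Scripts/temp')))
# ['Docs/', '/src/', '/Scripts/', '/temp']
```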

alexis · Answer 4 · 2012-03-16T21:51:19+0000

If you don't insist on a slash on both sides, this is actually pretty simple:

 >>> re.findall(r"([^/]*/)", 'Docs/src/Scripts/temp')
 ['Docs/', 'src/', 'Scripts/']

Neither re nor split is really cut out for overlapping strings, so if that is what you really want, I would just add a slash to the beginning of every result except the first.
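A rough sketch of that suggestion follows. Note that the pattern above drops the final component ('temp'), so this version adds a second alternative, `[^/]+$`, to pick it up; that alternative and the function name `split_overlapping` are additions made for this example.

```python
import re


def split_overlapping(path):
    # First alternative: a component with its trailing slash.
    # Second alternative (an addition): a last component without a slash.
    pieces = re.findall(r"[^/]*/|[^/]+$", path)
    # Prepend a slash to every piece except the first.
    return pieces[:1] + ['/' + p for p in pieces[1:]]


print(split_overlapping('Docs/src/Scripts/temp'))
# ['Docs/', '/src/', '/Scripts/', '/temp']
```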

sodas tsai · Answer 5 · 2012-05-06T04:15:03+0000

Try the following:

 re.split(r'(/)', 'Docs/src/Scripts/temp')

From the Python documentation:

re.split(pattern, string, maxsplit=0, flags=0)
Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)
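A short sketch of the two documented behaviours, the capturing group and maxsplit, on the string from the question. The separators come back as their own list items, which is the second form already shown in the question rather than the overlapping form it asks for.

```python
import re

s = 'Docs/src/Scripts/temp'

# Without a capturing group the separators are discarded.
print(re.split(r'/', s))
# ['Docs', 'src', 'Scripts', 'temp']

# With a capturing group each separator is returned as its own item.
print(re.split(r'(/)', s))
# ['Docs', '/', 'src', '/', 'Scripts', '/', 'temp']

# maxsplit=1 stops after the first split; the rest stays in one piece.
print(re.split(r'(/)', s, maxsplit=1))
# ['Docs', '/', 'src/Scripts/temp']
```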

b10hazard · Answer 6 · 2012-03-16T19:23:40+0000

I'm not sure there is an easy way to do this. This is the best I could come up with ...

 import re

 lSplit = re.split('/', 'Docs/src/Scripts/temp')
 # first piece + middle pieces wrapped in slashes + last piece
 print([lSplit[0] + '/'] + ['/' + x + '/' for x in lSplit][1:-1] + ['/' + lSplit[-1]])

Kind of a mess, but it does what you want.
