Python split() without removing the delimiter

Question


This code almost does what I need:

 for line in all_lines:
     s = line.split('>')

However, it also removes every '>' delimiter.

So,

 <html><head>

becomes

 ['<html','<head']

Is there a way to use the split() method but keep the separator instead of removing it?

That is, I want these results:

 ['<html>','<head>']

+48

python split delimiter

some1 Oct 23 '11 at 12:28


4 answers

If you are parsing HTML with split(), you are most likely doing it wrong, unless you are writing a one-off script aimed at fixed, trusted content. If it has to work on arbitrary HTML input, how will you handle something like <a title='growth > 8%' href='#something'>?
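To make that failure concrete, here is a small sketch (not part of the original answer) comparing the naive split with a real parser, Python's standard-library html.parser.HTMLParser, which is a swapped-in technique rather than what this answer uses. The class name StartTagCollector and the sample string are illustrative assumptions. The naive split cuts the <a> tag in two at the quoted '>', while the parser keeps it whole.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Collect the raw source text of every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared in the input
        self.tags.append(self.get_starttag_text())


source = "<p><a title='growth > 8%' href='#something'>"

# Naive split: the '>' inside the quoted title value cuts the <a> tag in two
naive = [part + ">" for part in source.split(">") if part]
print(naive)  # ['<p>', "<a title='growth >", " 8%' href='#something'>"]

collector = StartTagCollector()
collector.feed(source)
collector.close()
print(collector.tags)  # ['<p>', "<a title='growth > 8%' href='#something'>"]
```

This only collects start tags; end tags and the text between tags would need handle_endtag and handle_data as well.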

In any case, the following works for me:

 >>> import re
 >>> re.split('(<[^>]*>)', '<body><table><tr><td>')[1::2]
 ['<body>', '<table>', '<tr>', '<td>']
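The trick here is the capturing group: when the pattern passed to re.split contains parentheses, the matched separators are returned in the result list too. The sketch below (the sample string is made up for illustration) shows the list before slicing, which is why [1::2] picks out exactly the tags. Any text between tags sits at the even indices and is discarded by that slice.

```python
import re

sample = "<body>Hello<br>world</body>"

# With a capturing group, re.split returns the text between matches
# (even indices) interleaved with the matched tags (odd indices)
parts = re.split(r"(<[^>]*>)", sample)
print(parts)  # ['', '<body>', 'Hello', '<br>', 'world', '</body>', '']

tags = parts[1::2]   # ['<body>', '<br>', '</body>']
text = parts[0::2]   # ['', 'Hello', 'world', '']

# Keeping every non-empty element splits the string without losing anything
pieces = [p for p in parts if p]
print(pieces)  # ['<body>', 'Hello', '<br>', 'world', '</body>']
```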

+22

gb. Oct 23 '11 at


How about this:

 import re
 s = '<html><head>'
 # each match is a run of non-'>' characters plus the '>' that ends it
 re.findall('[^>]+>', s)   # ['<html>', '<head>']
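A sketch of this inside the question's loop, assuming all_lines comes from a file so that each line ends in '\n' (io.StringIO stands in for that file here, and the sample content is made up). findall only returns pieces that end in '>', so a trailing fragment without one, such as the newline, is silently dropped, and text before a tag stays attached to that tag ('text</p>').

```python
import io
import re

# Stand-in for the question's file: each line keeps its trailing '\n'
all_lines = io.StringIO("<html><head>\n<body><p>text</p>\n").readlines()

results = []
for line in all_lines:
    # Every match is a run of non-'>' characters plus the '>' that ends it
    s = re.findall(r"[^>]+>", line)
    results.append(s)

print(results)  # [['<html>', '<head>'], ['<body>', '<p>', 'text</p>']]
```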

+8

Óscar López Oct 23 '11 at 12:45


Just split it, then append the '>' back to every element of the resulting list except the last.
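A minimal sketch of that approach; the function name split_keep_delim is hypothetical. Dropping an empty last element is an added assumption: that element appears whenever the line ends with the delimiter, as '<html><head>' does. A non-empty last piece is kept without a '>' added.

```python
def split_keep_delim(line, d=">"):
    """Split line on d, keeping d at the end of every piece but the last."""
    parts = line.split(d)
    # Each piece except the last was followed by d in the original string
    result = [p + d for p in parts[:-1]]
    # The last piece had no d after it; keep it only if it is non-empty
    if parts[-1]:
        result.append(parts[-1])
    return result

print(split_keep_delim("<html><head>"))  # ['<html>', '<head>']
print(split_keep_delim("a>b"))           # ['a>', 'b']
```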

0

orangething Oct 23 '11 at 12:33


P.Melch · Accepted Answer · 2011-10-23 12:38

 d = ">"
 for line in all_lines:
     # split on d, drop empty pieces, then put d back on each remaining piece
     s = [e + d for e in line.split(d) if e]
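One caveat, shown in the sketch below under the assumption that all_lines holds lines read from a file (the sample data is made up for illustration): the trailing '\n' of each line is a non-empty piece, so it also gets '>' appended. Stripping the newline first avoids that, although the code still assumes that every piece was originally followed by '>'.

```python
d = ">"
# Hypothetical sample: lines as read from a file keep their trailing newline
all_lines = ["<html><head>\n", "<body>\n"]

naive = []
fixed = []
for line in all_lines:
    # Accepted answer as written: the '\n' piece also gets '>' appended
    naive.append([e + d for e in line.split(d) if e])
    # Strip the newline first so it does not become its own piece
    fixed.append([e + d for e in line.rstrip("\n").split(d) if e])

print(naive)  # [['<html>', '<head>', '\n>'], ['<body>', '\n>']]
print(fixed)  # [['<html>', '<head>'], ['<body>']]
```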
