Python split() without removing the delimiter

This code almost does what I need:

 for line in all_lines:
     s = line.split('>')

except that it removes every '>' delimiter.

So,

 <html><head> 

becomes

 ['<html','<head'] 

Is there a way to use the split() method but keep the delimiter instead of removing it?

So that I get this result:

 ['<html>','<head>'] 
+48
python split delimiter
Oct 23 '11 at 12:28
4 answers
 d = ">"
 for line in all_lines:
     s = [e + d for e in line.split(d) if e]
+27
Oct 23 '11 at 12:38

If you are parsing HTML with splits, you are most likely doing it wrong, unless you are writing a one-off script aimed at fixed, trusted content. If it has to work on arbitrary HTML input, how will you handle something like <a title='growth > 8%' href='#something'> ?

In any case, the following works for me:

 >>> import re
 >>> re.split('(<[^>]*>)', '<body><table><tr><td>')[1::2]
 ['<body>', '<table>', '<tr>', '<td>']
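
For the attribute case mentioned above (a '>' inside a quoted value), a real HTML parser is the safer route. Here is a minimal sketch using the standard library's html.parser module. TagCollector and start_tags are names made up for this example, and it only collects the raw text of start tags, which is an assumption about what you actually need:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the raw source text of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the text of the most recently opened
        # start tag exactly as it appeared in the input.
        self.tags.append(self.get_starttag_text())


def start_tags(markup):
    """Return the start tags found in markup, delimiters included."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(start_tags("<html><head><a title='growth > 8%' href='#something'>"))
```

Unlike a split on '>', this keeps the whole <a ...> tag in one piece, because the parser knows the '>' inside the quotes belongs to the attribute value.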
+22
Oct 23 '11 at

How about this:

 import re
 s = '<html><head>'
 re.findall('[^>]+>', s)
+8
Oct 23 '11 at 12:45

Just split it, then append the trailing ">" to each element of the list except the last.
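
A rough sketch of that idea, in case it helps. split_keep is a made-up name, and dropping an empty last piece is my own assumption about the desired output:

```python
def split_keep(text, sep=">"):
    """Split text on sep, keeping sep at the end of each piece."""
    parts = text.split(sep)
    # Every piece except the last was followed by a separator in the input,
    # so put the separator back on those.
    pieces = [part + sep for part in parts[:-1]]
    # The last piece had no separator after it; keep it only if non-empty.
    if parts[-1]:
        pieces.append(parts[-1])
    return pieces


print(split_keep("<html><head>"))
```

Since the last piece is left alone, input that does not end with the delimiter does not get an extra ">" added, and joining the pieces gives back the original string.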

0
Oct 23 '11 at 12:33


