import re
DATA = 'Joe, Dave, Professional, Ph.D. and Someone else'
regx = re.compile('\s*(?:,|and)\s*')
print regx.split(DATA)
result
['Joe', 'Dave', 'Professional', 'Ph.D.', 'Someone else']
Where is the difficulty?
Note that with the non-capturing group (?:,|and) the delimiters do not appear in the result, whereas with the capturing group (,|and) the result will be
['Joe', ',', 'Dave', ',', 'Professional', ',', 'Ph.D.', 'and', 'Someone else']
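The difference between the two groups can be checked side by side. This is a minimal sketch in Python 3 syntax; the variable names `non_capturing` and `capturing` are mine, not from the original snippet.

```python
import re

DATA = 'Joe, Dave, Professional, Ph.D. and Someone else'

# (?:...) groups without capturing: the delimiters are dropped by split().
non_capturing = re.compile(r'\s*(?:,|and)\s*')
# (...) captures: split() inserts each captured delimiter into the result.
capturing = re.compile(r'\s*(,|and)\s*')

without_delims = non_capturing.split(DATA)
with_delims = capturing.split(DATA)

print(without_delims)
print(with_delims)
```

Only the text matched inside the parentheses is kept, so the surrounding whitespace consumed by `\s*` never shows up in either list.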
Edit 1
errrr .... the difficulty is that with
DATA = 'Joe, Dave, Professional, Handicaped, Ph.D. and Someone else'
result
['Joe', 'Dave', 'Professional', 'H', 'icaped', 'Ph.D.', 'Someone else']
Fixed:
regx = re.compile('\s+and\s+|\s*,\s*')
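To see why the fix works, here is a small sketch (Python 3 syntax, with the names `naive` and `fixed` chosen for illustration) that runs both patterns on the same string. The first pattern matches "and" anywhere, including inside a word; the second requires whitespace on both sides of "and".

```python
import re

DATA = 'Joe, Dave, Professional, Handicaped, Ph.D. and Someone else'

# Matches "and" even in the middle of a word such as "Handicaped".
naive = re.compile(r'\s*(?:,|and)\s*')
# Requires at least one whitespace character on each side of "and".
fixed = re.compile(r'\s+and\s+|\s*,\s*')

naive_result = naive.split(DATA)
fixed_result = fixed.split(DATA)

print(naive_result)
print(fixed_result)
```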
Edit 2
errrr .. ah ... ah ...
Sorry, I had not noticed that "Professional, Ph.D." must not be split. But what is the criterion for not splitting on a comma in this string?
I chose this criterion: a comma is not a separator when it is followed by text that contains dots before the next comma.
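This criterion can be expressed as a negative lookahead placed right after the comma. The sketch below isolates only that part of the idea (Python 3 syntax; the name `comma` and the shortened test string are assumptions for illustration) and ignores "and" and the leading and trailing whitespace.

```python
import re

# A comma is a separator unless what follows it, up to the next comma,
# consists of one or more chunks that each end with a dot.
comma = re.compile(r'\s*,(?!(?:[^,.]+?\.)+[^,.]*?,)\s*')

DATA = 'Dave, Professional, Ph.D., Handicapped'
parts = comma.split(DATA)
print(parts)
```

The comma after "Professional" is followed by " Ph.D." and then another comma, so the lookahead rejects it as a separator; the other two commas are not, so they split the string.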
Another problem is the mixture of whitespace and the word "and".
There is also the problem of leading and trailing whitespace.
Finally, I managed to write a regex pattern that handles many more cases than the previous one, even if some of them are somewhat artificial (for example, a stray "and" at the end of the string, and why not at the beginning too? etc.):
import re
regx = re.compile('\s*,(?!(?:[^,.]+?\.)+[^,.]*?,)(?:\sand[,\s]+|\s)*|\s*and[,\s]+|[.\s]*\Z|\A\s*')
DATA = ' Joe ,and, Dave , Professional, Ph.D., Handicapped and handyman , and Someone else and . .'
print repr(DATA)
print
print regx.split(DATA)
result
' Joe ,and, Dave , Professional, Ph.D., Handicapped and handyman , and Someone else and . .'

['', 'Joe', '', 'Dave', 'Professional, Ph.D.', 'Handicapped', 'handyman', 'Someone else', '', '']
With print [x for x in regx.split(DATA) if x] we get:
['Joe', 'Dave', 'Professional, Ph.D.', 'Handicapped', 'handyman', 'Someone else']
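For anyone on Python 3, the same pattern and filter could be wrapped as follows. This is only a sketch: the function name `split_names` is hypothetical, and the pattern is copied unchanged apart from the raw-string prefix. Note that since Python 3.7 `re.split` also splits on empty matches, so the unfiltered list can contain more empty strings than under Python 2; dropping the empty strings makes that difference irrelevant.

```python
import re

REGX = re.compile(
    r'\s*,(?!(?:[^,.]+?\.)+[^,.]*?,)(?:\sand[,\s]+|\s)*'  # separating comma
    r'|\s*and[,\s]+'                                      # the word "and"
    r'|[.\s]*\Z'                                          # trailing dots/spaces
    r'|\A\s*'                                             # leading spaces
)

def split_names(text):
    """Split text with REGX and drop the empty strings it leaves behind."""
    return [part for part in REGX.split(text) if part]

DATA = (' Joe ,and, Dave , Professional, Ph.D., Handicapped and handyman ,'
        ' and Someone else and . .')
names = split_names(DATA)
print(names)
```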
Compare this with the result of Qtax's regex on the same string:
[' Joe ', 'and', 'Dave ', 'Professional, Ph.D.', 'Handicapped', 'handyman ', 'and Someone else', '. .']