I am trying to extract first names, assuming they appear in the form Firstname Lastname. The code below works well for ASCII names, but I would also like to catch international names such as Pär Åberg. I found some solutions, but unfortunately they do not work with Python-flavored regular expressions. Does anyone know how to do this?
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
text = """
This is a text containing names of people in the text such as
Hillary Clinton or Barack Obama. My problem is with names that uses stuff
outside A-Z like Swedish names such as Pär Åberg."""
# Match two or more capitalized words in a row; [A-Z] only covers ASCII capitals.
for name in re.findall(r"(([A-Z])[\w-]*(\s+[A-Z][\w-]*)+)", text):
    firstname = name[0].split()[0]
    print(firstname)
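For reference, here is a minimal sketch of one possible workaround, not a definitive answer. It assumes Python 3, where `\w` is Unicode-aware for `str` patterns. The standard `re` module has no `\p{Lu}` class, so the sketch finds words with `\w` and tests the first letter with `str.isupper()`. The names `WORD` and `first_names` are made up for this illustration. The third-party `regex` module supports `\p{Lu}` and may be a simpler route.

```python
import re
import unicodedata

text = """
This is a text containing names of people in the text such as
Hillary Clinton or Barack Obama. My problem is with names that uses stuff
outside A-Z like Swedish names such as Pär Åberg."""

# A word starts with any Unicode letter ([^\W\d_] is "\w minus digits and _")
# and continues with word characters or hyphens.
WORD = re.compile(r"[^\W\d_][\w-]*")

def first_names(text):
    """Return the first word of every run of 2+ capitalized words (hypothetical helper)."""
    # Compose accented letters so that "a" + combining diaeresis becomes one letter.
    text = unicodedata.normalize("NFC", text)
    names, run, last_end = [], [], 0
    for m in WORD.finditer(text):
        word = m.group()
        capitalized = word[0].isupper()
        # Only whitespace may separate the words of one name.
        adjacent = text[last_end:m.start()].isspace()
        if not (capitalized and run and adjacent):
            # The current run ends here; keep it if it had at least two words.
            if len(run) >= 2:
                names.append(run[0])
            run = []
        if capitalized:
            run.append(word)
        last_end = m.end()
    if len(run) >= 2:
        names.append(run[0])
    return names

for firstname in first_names(text):
    print(firstname)
```

Like the original pattern, this treats any run of two or more capitalized words as a name, so it has the same false positives (for example, a capitalized word at the start of a sentence followed directly by a name).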