Matching names in the form Firstname Lastname with international characters

Question

I am trying to extract first names on the assumption that names have the form Firstname Lastname. The code below works well for that, but I would also like to catch international names, for example Pär Åberg. I found some solutions, but unfortunately they do not work with Python's regular expression flavor. Does anyone know how to do this?

#!/usr/bin/python
# -*- coding: utf-8 -*- 
import re

text = """
This is a text containing names of people in the text such as 
Hillary Clinton or Barack Obama. My problem is with names that uses stuff 
outside A-Z like Swedish names such as Pär Åberg."""

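# Raw string so that \w and \s reach the regex engine unchanged
for name in re.findall(r"(([A-Z])[\w-]*(\s+[A-Z][\w-]*)+)", text):
    firstname = name[0].split()[0]  # name[0] is the full match; keep its first word
    print(firstname)  # the parenthesized form runs on Python 2 and 3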

python regex

cowboyvspirate Nov 16 '15 at 16:17

1 answer

Wiktor Stribiżew · Accepted Answer · 2015-11-16T17:45:28+0000

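You need an alternative regex library, such as the regex module from PyPI, because it supports \p{L} (any Unicode letter) and \p{Lu} (any uppercase Unicode letter). Python's built-in re module does not support \p{...} classes.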

Then use the following pattern (the ur'' prefix is Python 2 syntax; in Python 3, write r'' instead):

ur'\p{Lu}[\w-]*(?:\s+\p{Lu}[\w-]*)+'
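If installing a third-party module is not an option, the same idea can be roughly approximated with the standard library alone. The sketch below is not the method from this answer: it assumes Python 3, where str.isupper() is Unicode-aware, and the helper name find_names is made up for illustration. It splits the text at punctuation and collects runs of two or more whitespace-separated words that start with an uppercase letter.

```python
import re


def find_names(text):
    """Return runs of two or more capitalized words, e.g. 'Pär Åberg'.

    Hypothetical helper: a stdlib-only approximation of
    \\p{Lu}[\\w-]*(?:\\s+\\p{Lu}[\\w-]*)+ for Python 3.
    """
    names = []
    # Split at punctuation so a name cannot span a sentence boundary;
    # word characters, whitespace and hyphens are kept.
    for run in re.split(r"[^\w\s-]+", text):
        current = []
        for word in run.split():
            if word[0].isupper():  # Unicode-aware in Python 3
                current.append(word)
                continue
            if len(current) >= 2:
                names.append(" ".join(current))
            current = []
        if len(current) >= 2:
            names.append(" ".join(current))
    return names


text = """
This is a text containing names of people in the text such as
Hillary Clinton or Barack Obama. My problem is with names that uses stuff
outside A-Z like Swedish names such as Pär Åberg."""

for name in find_names(text):
    print(name.split()[0])  # first word of each matched name
```

On the sample text from the question this prints Hillary, Barack and Pär. Unlike the regex pattern, the sketch collapses whitespace inside a match to single spaces, and isupper() is only an approximation of \p{Lu}.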

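With a Unicode string as the pattern, the regex module applies the UNICODE flag by default:

If neither the ASCII, LOCALE nor UNICODE flag is specified, it will default to UNICODE if the regex pattern is a Unicode string and ASCII if it's a bytestring.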
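What the UNICODE and ASCII flags mentioned above change in practice can be illustrated with the standard library's re module, used here only as a stand-in and assuming Python 3. There, str patterns match Unicode word characters by default, while re.ASCII or a bytes pattern restricts \w to ASCII.

```python
import re

sample = "Pär Åberg"

# str pattern: \w is Unicode-aware by default in Python 3
unicode_words = re.findall(r"\w+", sample)

# re.ASCII restricts \w to [a-zA-Z0-9_], so ä and Å split the words
ascii_words = re.findall(r"\w+", sample, re.ASCII)

# bytes pattern: ASCII-only matching against the UTF-8 encoded text
byte_words = re.findall(rb"\w+", sample.encode("utf-8"))

print(unicode_words)  # ['Pär', 'Åberg']
print(ascii_words)    # ['P', 'r', 'berg']
print(byte_words)     # [b'P', b'r', b'berg']
```

This is why the question's pattern has to be matched in Unicode mode: with ASCII matching, even [\w-]* stops at the first non-ASCII letter.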
