What does "(?u)" in a regular expression do?

Question

I looked at how tokenization is implemented in scikit-learn and found this regex (source):

token_pattern = r"(?u)\b\w\w+\b"

The regex is pretty simple, but I've never seen the (?u) part before. Can someone explain to me what this part does?

python regex

fwind Jan 27 '16 at 16:40

1 answer

Martijn Pieters · Accepted Answer · 2016-01-27T16:41:18+0000

It enables the re.U (re.UNICODE) flag for this expression.

From the documentation:

(?iLmsux)
(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
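
To make the equivalence concrete, here is a minimal sketch assuming Python 3. It compares the inline (?u) form from the question with passing re.UNICODE to re.compile(). The sample text and the variable names are made up for illustration. Note that in Python 3 str patterns are Unicode-aware by default, so (?u) is redundant there; it mattered in Python 2, where \w and \b were ASCII-only unless the flag was set. The re.ASCII run is included only for contrast.

```python
import re

# The pattern from the question, with the flag set inline.
inline = re.compile(r"(?u)\b\w\w+\b")
# The same pattern with the flag passed to re.compile() instead.
explicit = re.compile(r"\b\w\w+\b", re.UNICODE)

text = "naïve café a"  # hypothetical sample text

inline_tokens = inline.findall(text)
explicit_tokens = explicit.findall(text)

# For contrast: re.ASCII restricts \w and \b to ASCII characters,
# so accented letters split the words apart.
ascii_tokens = re.findall(r"\b\w\w+\b", text, re.ASCII)

print(inline_tokens)    # tokens with Unicode-aware \w
print(explicit_tokens)  # same result as the inline form
print(ascii_tokens)     # differs: accented letters are not word characters
```

Both compiled patterns report re.UNICODE in their .flags attribute, and the single-letter word "a" is dropped in every case because the pattern requires at least two word characters.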
