How to implement \ p {L} in python regex

Question

How to implement \ p {L} in python regex

I tried to match the entire string containing one word in any language. My search led me to \ p {...}, which was missing from the python Re module. But I found https://pypi.python.org/pypi/regex . It should work with the commands \ p {...}. Although this is not so.

I tried to parse these lines:

7652167371 apéritif 78687 attaché 78687 époque 78678 kunngjøre 78678 ærbødig 7687 vår 12312 dfsdf 23123 322432 1321 23123 2312  32211

WITH

 def Pattern_compile(pattern_array): regexes = [regex.compile(p) for p in pattern_array] return regexes def main(): for line in sys.stdin: for regexp in Pattern_compile(p_a): if regexp.search (line): print line.strip('\n') if __name__ == '__main__': p_a = ['^\d+\t(\p{L}|\p{M})+$', ] main()

The result is only a Latin word:

 12312 dfsdf

+4

python python-2.7 regex

antonavy Jul 11 '13 at 14:23

source share

1 answer

falsetru · Accepted Answer · 2013-07-11T14:41:40+0000

You must pass unicode. (Both regex and string)

 import sys import regex def main(patterns): patterns = [regex.compile(p) for p in patterns] for line in sys.stdin: line = line.decode('utf8') for regexp in patterns: if regexp.search (line): print line.strip('\n') if __name__ == '__main__': main([ur'^\d+\t(\p{L}|\p{M})+$', ])

How to implement \ p {L} in python regex

More articles: