How to get a list of matching characters from the regex class

Question

Given a regex character class (character set), how can I get a list of all the characters it matches in Python 3? For instance:

[\dA-C]

should give

['0','1','2','3','4','5','6','7','8','9','A','B','C']

python string python-3.x regex

eyaler Oct 17 '16 at 19:54

3 answers

Moinuddin Quadri · Answer 1 · 2016-10-17T20:08:55+0000

I think you are looking for string.printable, which contains all printable ASCII characters. For instance:

>>> import string
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

Then, to find which of these characters your regular expression matches, you can use re.findall:

>>> import re
>>> x = string.printable
>>> pattern = r'[\dA-C]'
>>> print(re.findall(pattern, x))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C']

string.printable is a combination of digits, letters, punctuation, and whitespace. See String Constants in the documentation for the full list of constants available in the string module.
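The same idea can be wrapped in a small reusable function. This is a minimal sketch: matching_chars is a hypothetical helper name, and it uses re.fullmatch on each candidate so that every character is tested on its own rather than scanned as part of one long string. Only the characters in the alphabet you pass in (string.printable by default) are ever considered.

```python
import re
import string

def matching_chars(char_class, alphabet=string.printable):
    """Return the characters of `alphabet` that `char_class` matches.

    matching_chars is a hypothetical helper; by default it only looks at
    string.printable, so non-ASCII characters are never tested.
    """
    pattern = re.compile(char_class)
    # fullmatch requires the whole one-character string to match
    return [c for c in alphabet if pattern.fullmatch(c)]

print(matching_chars(r'[\dA-C]'))
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C']
```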

To get a list of all Unicode characters instead, you can use:

import sys
# range() excludes its end, so add 1 to include sys.maxunicode itself
unicode_list = [chr(i) for i in range(sys.maxunicode + 1)]

This list is very large, since sys.maxunicode (the highest code point) is:

>>> sys.maxunicode
1114111

For more on how the Unicode space is organized, see Unicode Character Ranges.
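Combining the full Unicode list with the regex test gives every matching code point. This is a minimal sketch, assuming a brute-force scan is acceptable: unicode_matches is a hypothetical name, and testing over a million code points typically takes on the order of a second.

```python
import re
import sys

def unicode_matches(char_class):
    """Return every code point from 0 to sys.maxunicode that char_class matches.

    unicode_matches is a hypothetical helper; it is a plain brute-force scan.
    """
    pattern = re.compile(char_class)
    # range() excludes its end, so add 1 to include sys.maxunicode itself
    return [c for c in map(chr, range(sys.maxunicode + 1)) if pattern.fullmatch(c)]

digits = unicode_matches(r'\d')
print(len(digits))             # depends on the Unicode version of your Python
print(''.join(digits[:20]))    # 0123456789٠١٢٣٤٥٦٧٨٩
```

The second line shows that in Python 3, \d also matches non-ASCII decimal digits such as the Arabic-Indic ones.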

MooingRawr · Answer 2 · 2016-10-17T19:59:57+0000

import re

x = '123456789ABCDE'
pattern = r'[\dA-C]'
print(re.findall(pattern,x))    
#prints ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C']

Is this what you are looking for?

If you want x to contain all ASCII uppercase letters and digits, you can use:

import re
import string

x = string.ascii_uppercase + string.digits
pattern = r'[\dA-C]'
print(re.findall(pattern,x))

If you want to type in the pattern yourself, use:

pattern = input()  # works with either example above
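A pattern typed at the prompt may not compile, and re.findall then raises re.error. A minimal sketch that reports the error instead of crashing; find_matches is a hypothetical helper name, and in practice you would call it as find_matches(input()).

```python
import re
import string

# Candidate characters: ASCII uppercase letters followed by digits
x = string.ascii_uppercase + string.digits

def find_matches(pattern, text=x):
    """Return re.findall(pattern, text), or None if the pattern is invalid.

    find_matches is a hypothetical helper for user-supplied patterns.
    """
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        # e.g. '[A-' raises re.error: unterminated character set
        print(f'invalid pattern {pattern!r}: {exc}')
        return None

print(find_matches(r'[\dA-C]'))
# ['A', 'B', 'C', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
print(find_matches('[A-'))  # prints an error message, then None
```

Because x lists the letters before the digits, the matches come back in that order.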

alexis · Answer 3 · 2016-10-17T21:22:36+0000

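It depends on which characters you want to consider, because many classes match a very large set: for example \S, [^abc\d], or even (?![aeiou])\w (not strictly a character class, but it behaves like one).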

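With Unicode the sets get much larger: \w, for instance, matches word characters from every script in Unicode, and [^abc\d] matches almost the entire Unicode range. So a list of matching characters only makes sense relative to a set of candidate characters. Decide which Unicode ranges you care about, for example [0000-024F] (Latin scripts) and [0590-074F] (Hebrew, Arabic and Syriac).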
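To see how much the candidate set matters, compare the default Unicode-aware behaviour of str patterns in Python 3 with the re.ASCII flag, which restricts \d, \w and \s to ASCII. The sample string below is made up for illustration.

```python
import re

# '7', ARABIC-INDIC DIGIT SEVEN, EXTENDED ARABIC-INDIC DIGIT SEVEN, 'A', 'b', 'é'
text = '7\u0667\u06f7Ab\u00e9'

digits_unicode = re.findall(r'\d', text)           # ['7', '٧', '۷']
digits_ascii = re.findall(r'\d', text, re.ASCII)   # ['7']
words_unicode = re.findall(r'\w', text)            # ['7', '٧', '۷', 'A', 'b', 'é']
words_ascii = re.findall(r'\w', text, re.ASCII)    # ['7', 'A', 'b']

print(digits_unicode, digits_ascii)
print(words_unicode, words_ascii)
```

So the same class [\dA-C] picks up non-ASCII digits unless you pass re.ASCII.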

You can then iterate over each code point in these ranges and check which ones match your regular expression:

import re

myregexp = r"[\dA-C]"
interest = [ (0x0000, 0x024F),
             (0x0590, 0x06FF) ]


pattern = re.compile(myregexp)
matched = []    
for low, high in interest:
    matched.extend(chr(p) for p in range(low, high+1) if pattern.match(chr(p)))

>>> print("".join(matched))
0123456789ABC٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹
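The extra digits in that output are easier to identify with their Unicode names. A minimal sketch over the same two ranges, using unicodedata.name with a default so that unnamed code points do not raise ValueError; it uses fullmatch in place of match, which gives the same result for a single-character class like this one.

```python
import re
import unicodedata

pattern = re.compile(r'[\dA-C]')
# The same ranges as above: Latin blocks, then Hebrew and Arabic
interest = [(0x0000, 0x024F), (0x0590, 0x06FF)]

labels = []
for low, high in interest:
    for p in range(low, high + 1):
        c = chr(p)
        if pattern.fullmatch(c):
            labels.append(f'U+{p:04X} {c} {unicodedata.name(c, "<unnamed>")}')

for label in labels:
    print(label)
# first line: U+0030 0 DIGIT ZERO
```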
