How to check that a string contains only ASCII characters?

How can I verify that a string contains only ASCII characters in Python? Something like Ruby's ascii_only? method.

I want to find out whether string data read from a file is ASCII.

4 answers
    >>> all(ord(char) < 128 for char in 'string')
    True
    >>> all(ord(char) < 128 for char in 'Tannhäuser')
    False
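For something closer to Ruby's ascii_only?, Python 3.7 and later ship a built-in str.isascii() method. The following is a minimal sketch assuming Python 3; the helper name only_ascii is made up for illustration and falls back to the ord() check on older versions:

```python
def only_ascii(text):
    """Return True if every character in text is ASCII (code point < 128)."""
    try:
        # str.isascii() is available on Python 3.7 and later.
        return text.isascii()
    except AttributeError:
        # Fallback for older Python 3 versions: compare code points directly.
        return all(ord(char) < 128 for char in text)


print(only_ascii('string'))
print(only_ascii('Tannhäuser'))
print(only_ascii(''))
```

Note that both branches treat the empty string as ASCII-only, because there is no character in it that violates the condition.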

Another version, for Python 2, that handles both unicode and byte strings:

    >>> def is_ascii(text):
    ...     # Python 2: unicode is the text type, str is the byte-string type.
    ...     if isinstance(text, unicode):
    ...         try:
    ...             text.encode('ascii')
    ...         except UnicodeEncodeError:
    ...             return False
    ...     else:
    ...         try:
    ...             text.decode('ascii')
    ...         except UnicodeDecodeError:
    ...             return False
    ...     return True
    ...
    >>> is_ascii('text')
    True
    >>> is_ascii(u'text')
    True
    >>> is_ascii(u'text-é')
    False
    >>> is_ascii('text-é')
    False
    >>> is_ascii(u'text-é'.encode('utf-8'))
    False
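The version above relies on the unicode type, which exists only in Python 2. A rough Python 3 equivalent might look like the sketch below; the name is_ascii_py3 is hypothetical, and it assumes the input is either str or a bytes-like object with a decode() method (bytes or bytearray):

```python
def is_ascii_py3(data):
    """Return True if data (str, bytes or bytearray) contains only ASCII."""
    try:
        if isinstance(data, str):
            # Text: try to represent it with the ASCII codec.
            data.encode('ascii')
        else:
            # Bytes: try to interpret them with the ASCII codec.
            data.decode('ascii')
    except UnicodeError:
        # UnicodeEncodeError and UnicodeDecodeError both derive from UnicodeError.
        return False
    return True


print(is_ascii_py3('text'))
print(is_ascii_py3('text-é'))
print(is_ascii_py3(b'text'))
print(is_ascii_py3('text-é'.encode('utf-8')))
```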

You can also use a regular expression. The character class [\x00-\x7F] matches a single ASCII character:

    >>> import re
    >>> # True only for a non-empty string made up entirely of ASCII characters
    >>> OnlyAscii = lambda s: re.match(r'^[\x00-\x7F]+$', s) is not None
    >>> OnlyAscii('string')
    True
    >>> OnlyAscii('Tannhäuser')
    False
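The same character class can be negated to find out where the non-ASCII characters are, which may help when cleaning up data. This is only a sketch assuming Python 3; NON_ASCII and non_ascii_positions are illustrative names, not part of any library:

```python
import re

# Matches any single character outside the ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7F]')


def non_ascii_positions(text):
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(match.start(), match.group()) for match in NON_ASCII.finditer(text)]


print(non_ascii_positions('string'))
print(non_ascii_positions('Tannhäuser'))
```

An empty result means the string is ASCII-only; unlike the + quantifier in the pattern above, this approach also accepts the empty string.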

If you have Unicode strings, you can call the encode() method and catch the exception:

    try:
        mynewstring = mystring.encode('ascii')
    except UnicodeEncodeError:
        print("there are non-ascii characters in there")

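If you have bytes, you can use the third-party chardet module to guess the encoding. The detection is heuristic; for input that contains only ASCII bytes it reports 'ascii':

    import chardet

    # Guess the encoding of the byte string
    enc = chardet.detect(mystring)['encoding']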
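For the file case from the question, a guess is not strictly needed: the raw bytes can be checked directly with the standard library. The sketch below assumes Python 3; file_is_ascii is a hypothetical helper, and the temporary file exists only to make the example self-contained:

```python
import os
import tempfile


def file_is_ascii(path):
    """Return True if every byte in the file is in the ASCII range (0-127)."""
    with open(path, 'rb') as handle:
        data = handle.read()
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True


# Demonstration with a throwaway file containing UTF-8 encoded text.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write('Tannhäuser'.encode('utf-8'))
result = file_is_ascii(tmp.name)
os.remove(tmp.name)
print(result)
```

This reads the whole file into memory, which is fine for small files; for large ones, reading and checking in chunks would be a reasonable variation.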

One way to approach this is to try to encode the string with a specific encoding.

For instance:

 'H€llø'.encode('utf-8') 

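In Python 2, where this literal is a byte string, encode() first decodes it implicitly with the ASCII codec, so the call raises the following error (in Python 3 the same call simply succeeds):

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)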

Now you can catch "UnicodeDecodeError" to determine that the string does not contain only ASCII characters.

    try:
        'H€llø'.encode('utf-8')
    except UnicodeDecodeError:
        print('This string contains more than just the ASCII characters.')
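The example above depends on Python 2 implicitly decoding a byte string as ASCII before encoding it. In Python 3 a string literal is already text, so encoding it to UTF-8 does not fail. A possible Python 3 adaptation, offered as a sketch with made-up variable names, targets the ascii codec instead and catches UnicodeEncodeError:

```python
text = 'H€llø'

# In Python 3, encoding ordinary text to UTF-8 succeeds, so it reveals nothing.
utf8_bytes = text.encode('utf-8')

# Encoding to ASCII fails as soon as a non-ASCII character is encountered.
first_bad_index = None
try:
    text.encode('ascii')
    contains_only_ascii = True
except UnicodeEncodeError as error:
    contains_only_ascii = False
    # error.start is the index of the first character that could not be encoded.
    first_bad_index = error.start

if not contains_only_ascii:
    print('This string contains more than just the ASCII characters.')
    print('First non-ASCII character at index', first_bad_index)
```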

Source: https://habr.com/ru/post/1244726/

