Make sure the string contains only ASCII characters?

Question

How can I verify that a string contains only ASCII characters in Python? I am looking for something like Ruby's ascii_only? method.

I want to find out whether string data read from a file is ASCII.

+5

python python-2.7

Java Mar 09 '16 at 10:52

4 answers

You can also use a regular expression to check for ASCII-only content. The character class [\x00-\x7F] matches a single ASCII character:

    >>> import re
    >>> OnlyAscii = lambda s: re.match(r'^[\x00-\x7F]+$', s) is not None
    >>> OnlyAscii('string')
    True
    >>> OnlyAscii('Tannhäuser')
    False
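For readers on Python 3, here is a minimal sketch of the same idea using re.fullmatch, which makes the explicit anchors unnecessary. The name only_ascii_re is made up for illustration. Unlike the lambda above, it treats the empty string as ASCII (the * quantifier), which is an assumption about the desired behaviour.

```python
import re

# Precompiled pattern: zero or more characters in the 7-bit ASCII range.
_ASCII_RE = re.compile(r'[\x00-\x7F]*')

def only_ascii_re(s):
    """Return True if every character of the text string s is ASCII."""
    # fullmatch succeeds only if the whole string matches the pattern.
    return _ASCII_RE.fullmatch(s) is not None

print(only_ascii_re('string'))      # True
print(only_ascii_re('Tannhäuser'))  # False
```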

+2

Quinn Mar 09 '16 at 15:30

If you have Unicode strings, you can use the "encode" function and then catch the exception:

    try:
        mynewstring = mystring.encode('ascii')
    except UnicodeEncodeError:
        print("there are non-ascii characters in there")

If you have bytes, you can import the chardet module and check the encoding:

    import chardet

    # Get the encoding
    enc = chardet.detect(mystring)['encoding']
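Since the question is about data read from a file, a sketch without third-party dependencies may also help: open the file in binary mode and try to decode the bytes as ASCII. This is a standard-library approach swapped in for chardet and written for Python 3. The function name file_is_ascii and the temporary file are hypothetical and used only for illustration.

```python
import os
import tempfile

def file_is_ascii(path):
    """Return True if every byte in the file at path is below 0x80."""
    with open(path, 'rb') as f:   # binary mode: no implicit decoding
        data = f.read()
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True

# Usage with a throwaway file that contains a UTF-8 encoded 'ä'.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write('Tannhäuser'.encode('utf-8'))
print(file_is_ascii(tmp.name))  # False
os.remove(tmp.name)
```

This reads the whole file into memory, which is fine for small files. For large files, reading in chunks would be the obvious refinement.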

+1

rotten Mar 09 '16 at 11:35

One workaround is to try encoding the string and see whether the call fails.

For instance:

 'H€llø'.encode('utf-8')

In Python 2, where this literal is a byte string, encode first decodes it implicitly as ASCII, which raises the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

Now you can catch "UnicodeDecodeError" to determine that the string does not contain only ASCII characters.

    try:
        'H€llø'.encode('utf-8')
    except UnicodeDecodeError:
        print 'This string contains more than just the ASCII characters.'
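The traceback above is specific to Python 2. As a hedged sketch of how this plays out on Python 3 (an assumption, since the question is tagged python-2.7): there the literal is a text string, encoding it to UTF-8 succeeds, and the check has to target the 'ascii' codec and catch UnicodeEncodeError instead. The helper name encodes_as_ascii is hypothetical.

```python
def encodes_as_ascii(text):
    """Return True if the text string can be encoded with the ASCII codec."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

# On Python 3 the UTF-8 encode does not raise, so it cannot serve as the test.
print('H€llø'.encode('utf-8'))     # b'H\xe2\x82\xacll\xc3\xb8'
print(encodes_as_ascii('H€llø'))   # False
print(encodes_as_ascii('Hello'))   # True
```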

0

Girish jadhav Mar 09 '16 at 11:45

warvariuc · Accepted Answer · 2016-03-09T11:28:01+0000

    >>> all(ord(char) < 128 for char in 'string')
    True
    >>> all(ord(char) < 128 for char in 'H€llø')
    False

Another version:

    >>> def is_ascii(text):
    ...     if isinstance(text, unicode):
    ...         try:
    ...             text.encode('ascii')
    ...         except UnicodeEncodeError:
    ...             return False
    ...     else:
    ...         try:
    ...             text.decode('ascii')
    ...         except UnicodeDecodeError:
    ...             return False
    ...     return True
    ...
    >>> is_ascii('text')
    True
    >>> is_ascii(u'text')
    True
    >>> is_ascii(u'text-ø')
    False
    >>> is_ascii('text-ø')
    False
    >>> is_ascii(u'text-ø'.encode('utf-8'))
    False
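On Python 3.7 and later, str and bytes both provide an isascii() method, which covers the two branches of is_ascii above in one call. A minimal sketch, assuming Python 3.7+ (the question itself targets Python 2.7, where this method does not exist). The name is_ascii_py3 is made up here.

```python
def is_ascii_py3(value):
    """Return True if a str or bytes value contains only ASCII."""
    # str.isascii() and bytes.isascii() were added in Python 3.7;
    # both return True for an empty value.
    return value.isascii()

print(is_ascii_py3('text'))                    # True
print(is_ascii_py3('text-ø'))                  # False
print(is_ascii_py3('text-ø'.encode('utf-8')))  # False
print(is_ascii_py3(b'text'))                   # True
```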
