How to remove hexadecimal values ​​in python string with regular expressions?

I have an array of cells in matlab

columns = {'MagX', 'MagY', 'MagZ', ...
           'AccelerationX',  'AccelerationX',  'AccelerationX', ...
           'AngularRateX', 'AngularRateX', 'AngularRateX', ...
           'Temperature'}

I use these scripts that use the matlab hdf5write function to save an array in hdf5 format.

Then I read in the hdf5 file in python using pytables. An array of cells is represented as a numeric array of strings. I convert to a list and this is the result:

>>>columns
['MagX\x00\x00\x00\x08\x01\x008\xe6\x7f',
 'MagY\x00\x7f\x00\x00\x00\xee\x0b9\xe6\x7f',
 'MagZ\x00\x00\x00\x00\x001',
 'AccelerationX',
 'AccelerationY',
 'AccelerationZ',
 'AngularRateX',
 'AngularRateY',
 'AngularRateZ',
 'Temperature']

These hexadecimal values ​​fall into strings from somewhere, and I would like to remove them. They do not always appear in the first three paragraphs of the list, and I need a good way to deal with them or find out why they are there in the first place.

>>>print columns[0]
Mag8 
>>>columns[0]
'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f'
>>>repr(columns[0])
"'MagX\\x00\\x00\\x00\\x08\\x01\\x008\\xe6\\x7f'"
>>>print repr(columns[0])
'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f'

I tried using regex to remove hexadecimal values, but no luck.

>>>re.sub('(\w*)\\\\x.*', '\1', columns[0])
'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f'
>>>re.sub('(\w*)\\\\x.*', r'\1', columns[0])
'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f'
>>>re.sub(r'(\w*)\\x.*', '\1', columns[0])
'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f'
>>>re.sub('([A-Za-z]*)\x00', r'\1', columns[0])
'MagX\x08\x018\xe6\x7f'
>>>re.sub('(\w*?)', '\1', columns[0])
'\x01M\x01a\x01g\x01X\x01\x00\x01\x00\x01\x00\x01\x08\x01\x01\x01\x00\x018\x01\xe6\x01\x7f\x01'

Any suggestions on how to handle this?

+3
3

, , :

>>> re.sub(r'[^\w]', '', 'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f')
'MagX8'

[^\w] , , . re.sub ​​ , .

, , , , . :

>>> re.sub(r'[^\x20-\x7e]', '', 'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f')
'MagX8'

[^\x20-\x7e] [^ -~], , .

, .*, :

>>> re.sub(r'[^ -~].*', '', 'MagX\x00\x00\x00\x08\x01\x008\xe6\x7f')
'MagX'
+7

: , Python - .

, , , :

>>> import re
>>> re.sub(r'\s', '?', "foo\x00bar")
'foo\x00bar'
>>> print re.sub(r'\s', '?', "foo\x00bar")
foobar

, , :

>>> re.sub(r'[\xa0\s]+', ' ', input_str)
+1

re. . ascii:

good_string = ''.join(c if ord(c) < 129 else '?' for c in bad_string)

0

Source: https://habr.com/ru/post/1796369/


All Articles