Regular expression for a pattern of 45 hexadecimal numbers OR 48 hexadecimal numbers - Python

Question

My file contains either 45 hexadecimal numbers, separated by spaces, or 48 hexadecimal numbers, separated by spaces. I need ALL of these numbers individually, and not as a whole. I am currently using brute force to get 45 numbers.

pattern = re.compile("([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s")

However, even so, I still cannot figure out how to extract the remaining three numbers in an instance with 48 hexadecimal numbers. Could you help me simplify this problem?

I would like to avoid solutions like the one below (I haven't tested whether it works), since I would then have to split the line for each instance, and that is assuming it gives the correct output at all!

 (((?:[0-9a-f]{2})\s){48})|(((?:[0-9a-f]{2})\s){45})

Thanks!

+4

python regex

Proteen 25 sept. '12 at 13:10

6 answers

Wouldn't it be easier to just use two patterns? That way you do not need complex logic to work with subgroups.

    pattern1 = re.compile(r"([0-9a-f]{2}\s){45}")
    pattern2 = re.compile(r"([0-9a-f]{2}\s){48}")
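A minimal sketch of how the two patterns might be applied, assuming one record per line and whitespace after every number (which is what the patterns above require). The helper name `split_record` is hypothetical, the groups are made non-capturing, and `fullmatch` (Python 3.4+) is used so that a line with 46 or 47 numbers is rejected rather than partially matched.

```python
import re

# Same shape as the two patterns above, with non-capturing groups.
pattern45 = re.compile(r"(?:[0-9a-f]{2}\s){45}")
pattern48 = re.compile(r"(?:[0-9a-f]{2}\s){48}")

def split_record(line):
    """Return the hex strings if line holds exactly 45 or 48 numbers, else None."""
    for pattern in (pattern48, pattern45):
        if pattern.fullmatch(line):
            return line.split()
    return None

print(len(split_record("0f " * 45)))  # 45
print(split_record("0f " * 46))       # None
```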

+5

Tim lamballais 25 sept. '12 at 13:21

I believe you are probably looking for re.findall.

Depending on what the rest of the line looks like, this worked for me to get a list of strings, one per hex number:

    import re

    # Note: each match includes the trailing whitespace, and a final number
    # with no whitespace after it is not matched.
    reg = re.compile(r"[0-9a-f]{2}\s")
    hexes = "ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12 ab 12"
    hexList = re.findall(reg, hexes)

This gives you a list of all the two-character hex numbers. From there it is trivial to split it into groups of 45 or 48, depending on what other data is in the line that you are capturing.

This will not work if you have a ton of data sitting on a single line.
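As a rough sketch of the findall route, assuming one record per line: the variant below swaps the trailing `\s` for `\b` word boundaries so that the last number is captured too and no whitespace ends up in the results. The names `HEX_BYTE` and `extract_hex` are made up for illustration.

```python
import re

# Two hex digits standing alone as a token (word boundary on both sides).
HEX_BYTE = re.compile(r"\b[0-9a-f]{2}\b")

def extract_hex(line):
    """Return every two-digit hex token on the line, insisting on 45 or 48 of them."""
    numbers = HEX_BYTE.findall(line)
    if len(numbers) not in (45, 48):
        raise ValueError("expected 45 or 48 numbers, got %d" % len(numbers))
    return numbers

numbers = extract_hex(" ".join(["ab"] * 48))
print(len(numbers))  # 48
```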

Alternatively, although you said you did not want to do this, it is really very trivial to do something like this:

    reg = re.compile("([0-9a-f]{2}\s){45,48}")  # Edit: missed an open paren
    match = reg.search(hexes)
    if match:
        splitList = match.group().split()

And then you have a list of all the numbers that are nicely separated.

+4

Tadgh 25 sept. '12 at 13:40

I like your hard-wired approach (for your specific needs, that is), but I would generate the pattern string by multiplication. In my example, groups of 3 and groups of 5 are expected (just for testing):

    pattern = re.compile(
        r'(?:' + r'\s+'.join([r'([a-f0-9]{2})'] * 5) +
        r')|(?:' + r'\s+'.join([r'([a-f0-9]{2})'] * 3) + r')')
    m1 = pattern.match('ab cd ef')
    m2 = pattern.match('ab cd ef 34 56')

The result of m.groups() will look like (None, None, None, None, None, 'ab', 'cd', 'ef') for groups of 3 and like ('ab', 'cd', 'ef', '34', '56', None, None, None) for groups of 5. Thus, you can check whether m.groups()[0] is None to find out which version matched (45 or 48), and then use either groups()[:48] or groups()[48:].

Make sure the larger count (48) comes before the smaller one (45) in the alternation.

This pattern can, of course, be used with findall, search, finditer or similar, if you have a way of knowing where one group of hex numbers ends and the next begins. In this example, the whitespace between hex numbers must be a space or a tab; anything else (for example, newlines) separates groups of hex numbers from each other:

    # \s has been replaced by [ \t]
    pattern = re.compile(
        r'(?:' + r'[ \t]+'.join([r'([a-f0-9]{2})'] * 5) +
        r')|(?:' + r'[ \t]+'.join([r'([a-f0-9]{2})'] * 3) + r')')
    print([i.groups() for i in pattern.finditer(
        'ab cd ef 34 56\nab cd ef 34 56\nab cd ef\nab cd ef\n')])

→

    [('ab', 'cd', 'ef', '34', '56', None, None, None),
     ('ab', 'cd', 'ef', '34', '56', None, None, None),
     (None, None, None, None, None, 'ab', 'cd', 'ef'),
     (None, None, None, None, None, 'ab', 'cd', 'ef')]
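To see why the larger count has to come first in the alternation, here is a small sketch using the same 5-and-3 test sizes. The helper `build` is hypothetical and simply generalizes the pattern construction above to any list of counts.

```python
import re

def build(counts):
    """Build an alternation with one capturing group per number, in the given order."""
    group = r'([a-f0-9]{2})'
    return re.compile('|'.join(
        r'(?:' + r'\s+'.join([group] * n) + r')' for n in counts))

line = 'ab cd ef 34 56'

longer_first = build([5, 3]).match(line)
shorter_first = build([3, 5]).match(line)

print(longer_first.group(0))   # ab cd ef 34 56
print(shorter_first.group(0))  # ab cd ef
```

With the shorter alternative first, the regex engine is satisfied after three numbers and never tries the longer alternative, so the last two numbers are silently dropped.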

+1

Alfe 25 sept. '12 at 14:10

Could you use re.findall?

    >>> import re
    >>> pat = r'([0-9A-Fa-f]+)'
    >>> s = '45f 567B 45C67'
    >>> for i in re.findall(pat, s):
    ...     print(i)
    ...
    45f
    567B
    45C67

With this method it does not matter how many numbers you have in your file.

0

Emmanuel 25 sept. '12 at 13:17

If you know that the file contains only hexadecimal data, just read the entire file into a string and then split it on whitespace. This works with 45, 48 or any other count of numbers.

    import re

    splitter = re.compile(r'\s+')
    with open(filename, 'r') as f:
        # strip() avoids empty strings at the ends of the result
        data = splitter.split(f.read().strip())
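For plain whitespace splitting, a regex is not strictly needed: `str.split()` with no arguments splits on runs of whitespace and ignores leading and trailing whitespace. A minimal sketch, with a hypothetical helper `split_hex` that also checks that each token parses as hexadecimal:

```python
def split_hex(text):
    """Split text on whitespace and check that every token parses as hexadecimal."""
    tokens = text.split()
    for token in tokens:
        int(token, 16)  # raises ValueError for a non-hex token
    return tokens

sample = "0a ff 3c\n7e 10\n"
print(split_hex(sample))  # ['0a', 'ff', '3c', '7e', '10']
```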

0

sizzzzlerz 25 sept. '12 at 13:31

Fred foo · Accepted Answer · 2012-09-25T13:31:19+0000

When writing long REs, use re.VERBOSE to make them more readable.

    pattern = re.compile(r"""
        ^(
        [0-9a-fA-F]{2} (?: \s [0-9a-fA-F]{2} ){44}
        (?:(?: \s [0-9a-fA-F]{2} ){3} )?
        )$
        """, re.VERBOSE)

Read as: two hexadecimal digits, followed by 44 times (space followed by two hexadecimal digits), optionally followed by 3 times (space followed by two hexadecimal digits).

Test:

    >>> pattern.match(" ".join(["0f"] * 44))
    >>> pattern.match(" ".join(["0f"] * 45))
    <_sre.SRE_Match object at 0x7fd8f27e0738>
    >>> pattern.match(" ".join(["0f"] * 46))
    >>> pattern.match(" ".join(["0f"] * 47))
    >>> pattern.match(" ".join(["0f"] * 48))
    <_sre.SRE_Match object at 0x7fd8f27e0990>
    >>> pattern.match(" ".join(["0f"] * 49))

Then, finally, to get the individual numbers, call .group(0).split() on the match result. This is much easier than writing an RE that puts all the numbers in separate groups.
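Put together, the validate-then-split step might look like the sketch below. The function name `parse_line` and the conversion to integers with `int(n, 16)` are additions for illustration, not part of the answer above.

```python
import re

pattern = re.compile(r"""
    ^(
    [0-9a-fA-F]{2} (?: \s [0-9a-fA-F]{2} ){44}
    (?:(?: \s [0-9a-fA-F]{2} ){3} )?
    )$
    """, re.VERBOSE)

def parse_line(line):
    """Return the 45 or 48 numbers on the line as integers, or None if it does not match."""
    match = pattern.match(line)
    if match is None:
        return None
    return [int(n, 16) for n in match.group(0).split()]

values = parse_line(" ".join(["0f"] * 48))
print(len(values), values[0])  # 48 15
```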

EDIT: OK, here's how to solve the original problem. Just build the RE dynamically.

    chunk = r"""([0-9a-fA-F]{2}\s)"""
    pattern = re.compile(chunk * 45 + "(?:" + chunk * 3 + ")?")
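A sketch of how the generated pattern might be used, assuming every number (including the last) is followed by whitespace, as `chunk` requires. The helper name `numbers_from` is hypothetical. Because `chunk` captures the trailing whitespace, each group is stripped, and the three optional groups are dropped when they are `None`. Note that the pattern is not anchored at the end, so extra trailing input is ignored rather than rejected.

```python
import re

chunk = r"([0-9a-fA-F]{2}\s)"
pattern = re.compile(chunk * 45 + "(?:" + chunk * 3 + ")?")

def numbers_from(line):
    """Return the 45 or 48 captured numbers, or None if fewer than 45 are present."""
    match = pattern.match(line)
    if match is None:
        return None
    # Groups 46-48 are None when only 45 numbers matched.
    return [g.strip() for g in match.groups() if g is not None]

print(len(numbers_from("0f " * 45)))  # 45
print(len(numbers_from("0f " * 48)))  # 48
```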