Using Python and Regex to Extract Different Date Formats

I have the following code for date matching:

 import re

 date_reg_exp2 = re.compile(r'\d{2}([-/.])(\d{2}|[a-zA-Z]{3})\1(\d{4}|\d{2})|\w{3}\s\d{2}[,.]\s\d{4}')
 matches_list = date_reg_exp2.findall("23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015")
 print(matches_list)

Expected result:

 ["23-SEP-2015","23-09-2015","23-09-15","Sep 23, 2015"] 

I get:

 [('-', 'SEP', '2015'), ('-', '09', '2015'), ('-', '09', '15'), ('', '', '')] 

Please check the regex link here.

3 answers

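The problem is that, when the pattern contains capturing groups, re.findall returns only the text captured by those groups, not the whole match (group 0). That is also why the last tuple is ('', '', ''): that match came from the second alternative, in which none of the groups took part. Since you need the whole match, use re.finditer and call group() on each match: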

 matches_list = [x.group() for x in date_reg_exp2.finditer("23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015")] 

See the IDEONE demo.

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings ... If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string.
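To make the quoted behaviour concrete, here is a small sketch. The sample string and the simplified patterns are my own, chosen only to show how the number of groups changes what re.findall returns:

```python
import re

# Sample text and patterns are illustrative assumptions, not the asker's data.
text = "23-09-2015 and 24-10-2016"

# No groups: findall returns the whole matches.
no_groups = re.findall(r'\d{2}-\d{2}-\d{4}', text)

# One group: findall returns only what that group captured.
one_group = re.findall(r'\d{2}-\d{2}-(\d{4})', text)

# Two groups: findall returns a list of tuples, one tuple per match.
two_groups = re.findall(r'(\d{2})-\d{2}-(\d{4})', text)

# finditer yields match objects, so group() still gives the whole match.
whole = [m.group() for m in re.finditer(r'(\d{2})-\d{2}-(\d{4})', text)]

print(no_groups)   # ['23-09-2015', '24-10-2016']
print(one_group)   # ['2015', '2016']
print(two_groups)  # [('23', '2015'), ('24', '2016')]
print(whole)       # ['23-09-2015', '24-10-2016']
```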


You can try this regex:

 date_reg_exp2 = re.compile(r'(\d{2}(/|-|\.)\w{3}(/|-|\.)\d{4})|([a-zA-Z]{3}\s\d{2}(,|-|\.)?\s\d{4})|(\d{2}(/|-|\.)\d{2}(/|-|\.)\d+)') 

Then use re.finditer():

 for m in re.finditer(date_reg_exp2, "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"):
     print(m.group())

The output will be:

23-SEP-2015
23-09-2015
23-09-15
Sep 23, 2015
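If you would rather keep calling findall, one option is to make every group non-capturing with (?:...). With no capturing groups left, findall returns the whole matches. The following is only a sketch of that idea applied to the regex from this answer; the names date_re, text and dates_found are mine:

```python
import re

# Same alternatives as the regex in this answer, but every group is
# non-capturing, so findall has no groups to return and gives whole matches.
date_re = re.compile(
    r'(?:\d{2}(?:/|-|\.)\w{3}(?:/|-|\.)\d{4})'     # e.g. 23-SEP-2015
    r'|(?:[a-zA-Z]{3}\s\d{2}(?:,|-|\.)?\s\d{4})'   # e.g. Sep 23, 2015
    r'|(?:\d{2}(?:/|-|\.)\d{2}(?:/|-|\.)\d+)'      # e.g. 23-09-2015, 23-09-15
)

text = "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"
dates_found = date_re.findall(text)
print(dates_found)
# ['23-SEP-2015', '23-09-2015', '23-09-15', 'Sep 23, 2015']
```

Note that this does not work for the pattern in the question as written, because its \1 backreference needs group 1 to stay a capturing group.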


Try this:

 import re

 # The first group, (\d{2}-([A-Z]{3}|\d{2})-(\d{4}|\d{2})), matches the first three types of dates;
 # the rest matches the last type.
 dates = "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"
 for x in re.finditer(r'((\d{2}-([A-Z]{3}|\d{2})-(\d{4}|\d{2}))|([a-zA-Z]{3}\s\d{1,2},\s\d{4}))', dates):
     print(x.group(1))
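The same outer-group idea can probably be applied to the pattern from the question while still using findall. This is a sketch under one assumption worth stating: wrapping the whole pattern in an extra group shifts the group numbers, so the backreference \1 has to become \2. The names wrapped, tuples and whole_matches are mine:

```python
import re

# The asker's pattern wrapped in one outer group. The separator group is now
# group 2, so the backreference is \2 instead of \1.
wrapped = re.compile(
    r'(\d{2}([-/.])(\d{2}|[a-zA-Z]{3})\2(\d{4}|\d{2})|\w{3}\s\d{2}[,.]\s\d{4})'
)

text = "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"

# findall still returns tuples, but element 0 is now the whole date.
tuples = wrapped.findall(text)
whole_matches = [t[0] for t in tuples]

print(tuples[0])      # ('23-SEP-2015', '-', 'SEP', '2015')
print(whole_matches)  # ['23-SEP-2015', '23-09-2015', '23-09-15', 'Sep 23, 2015']
```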

Source: https://habr.com/ru/post/1237916/

