Parsing a date in Python and finding the correct locale setting

I have the following date string: '3 févr. 2015 14:26:00 CET '

datetime.datetime.strptime('03 févr. 2015 14:26:00', '%d %b %Y %H:%M:%S') 

Parsing it fails with an error:

 ValueError: time data '03 f\xc3\xa9vr. 2015 14:26:00' does not match format '%d %b %Y %H:%M:%S' 

I tried iterating over all the locales in locale.locale_alias:

 for l in locale.locale_alias:
     try:
         locale.setlocale(locale.LC_TIME, l)
         print l, datetime.datetime.strptime('03 févr. 2015 14:26:00', '%d %b %Y %H:%M:%S')
         break
     except Exception as e:
         print e

but I could not find the right one.

2 answers

To parse a localized date/time string, you can use the ICU date/time format:

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 from datetime import datetime

 import icu  # PyICU
 import pytz  # $ pip install pytz

 tz = icu.ICUtzinfo.getDefault()  # any ICU timezone will do here
 df = icu.DateFormat.createDateTimeInstance(icu.DateFormat.MEDIUM,
                                            icu.DateFormat.MEDIUM,
                                            icu.Locale.getFrench())
 df.setTimeZone(tz.timezone)

 ts = df.parse(u'3 févr. 2015 14:26:00 CET')  # NOTE: CET is ignored
 naive_dt = datetime.fromtimestamp(ts, tz).replace(tzinfo=None)
 dt = pytz.timezone('Europe/Paris').localize(naive_dt, is_dst=None)
 print(dt)  # -> 2015-02-03 14:26:00+01:00

df.applyPattern() can be used to set a different date/time pattern (df.toPattern() returns the current one), or you can use icu.SimpleDateFormat to create df from a pattern and a locale directly.

You must use an explicit ICU time zone (so that df.parse() and .fromtimestamp() use the same UTC offset) because icu and datetime may use different time zone definitions.

pytz is used here to get the correct UTC offset for past/future dates (some time zones may have had different UTC offsets in the past, or may have them in the future, including for reasons unrelated to DST transitions).


Your date string contains a month abbreviation that uses 4 characters and a trailing dot:

 '03 févr. 2015 14:26:00'
 #      ^^

but if I set the locale to fr_FR and format the same date:

 >>> import locale, datetime
 >>> locale.setlocale(locale.LC_TIME, ('fr', 'UTF-8'))
 'fr_FR.UTF-8'
 >>> datetime.datetime(2015, 2, 3, 14, 26).strftime('%d %b %Y %H:%M:%S')
 '03 f\xc3\xa9v 2015 14:26:00'
 >>> print datetime.datetime(2015, 2, 3, 14, 26).strftime('%d %b %Y %H:%M:%S')
 03 fév 2015 14:26:00

Notice that only 3 characters are used and no dot is included. Parsing only supports the same 3-character abbreviations:

 >>> datetime.datetime.strptime('03 fév 2015 14:26:00', '%d %b %Y %H:%M:%S')
 datetime.datetime(2015, 2, 3, 14, 26)
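Because the abbreviations that %b accepts come from the platform's locale data, it can help to list them before parsing. The following is a minimal sketch in Python 3 syntax; month_abbreviations is a hypothetical helper (not part of the standard library), and the locale name you pass must be one that is installed on your system.

```python
import datetime
import locale


def month_abbreviations(locale_name):
    """Return the 12 strings that %b produces under locale_name.

    Returns None if the locale is not available on this system.
    The previous LC_TIME setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
    except locale.Error:
        return None
    try:
        return [datetime.date(2015, month, 1).strftime('%b')
                for month in range(1, 13)]
    finally:
        locale.setlocale(locale.LC_TIME, saved)


if __name__ == '__main__':
    # 'fr_FR.UTF-8' is an assumed locale name; the output depends on the platform.
    print(month_abbreviations('fr_FR.UTF-8'))
```

If the list shows a spelling different from the one in your input, strptime() with %b will not match it under that locale.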

You could try the parsedatetime library; others have had success parsing French dates with it.
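If you would rather not depend on the system locale or on an extra library, another option is to translate the month abbreviation yourself and let strptime() handle only the numeric fields. This is a rough sketch in Python 3 syntax, not something from the answers above: parse_french_datetime and the FRENCH_MONTHS table are hypothetical names, and the list of spellings is an assumption that you may need to extend for your data.

```python
import re
from datetime import datetime

# Assumed spellings of French month abbreviations, without the trailing dot.
FRENCH_MONTHS = {
    'janv': 1, 'févr': 2, 'fév': 2, 'mars': 3, 'avr': 4, 'mai': 5,
    'juin': 6, 'juil': 7, 'août': 8, 'sept': 9, 'oct': 10, 'nov': 11,
    'déc': 12,
}

# day, month word (optional dot), 4-digit year, HH:MM:SS; the rest is ignored
_PATTERN = re.compile(
    r'\s*(\d{1,2})\s+([^\W\d]+)\.?\s+(\d{4})\s+(\d{2}:\d{2}:\d{2})')


def parse_french_datetime(text):
    """Parse e.g. '3 févr. 2015 14:26:00 CET' into a naive datetime.

    A trailing time zone abbreviation such as CET is ignored.
    """
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError('unrecognized date string: %r' % text)
    day, month_name, year, clock = match.groups()
    month = FRENCH_MONTHS.get(month_name.lower())
    if month is None:
        raise ValueError('unknown month abbreviation: %r' % month_name)
    # Only numeric directives remain, so the current locale does not matter.
    numeric = '%s %d %s %s' % (day, month, year, clock)
    return datetime.strptime(numeric, '%d %m %Y %H:%M:%S')


if __name__ == '__main__':
    print(parse_french_datetime('3 févr. 2015 14:26:00 CET '))
```

The result is a naive datetime; attaching a time zone is a separate step.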


Source: https://habr.com/ru/post/957907/

