Using dateutil.parser to parse a date in another language

dateutil is a great tool for parsing dates given as strings. For example,

 from dateutil.parser import parse
 parse("Tue, 01 Oct 2013 14:26:00 -0300")

returns

 datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800)) 

but,

 parse("Ter, 01 Out 2013 14:26:00 -0300")  # in Portuguese

gives this error:

 ValueError: unknown string format 

Does anyone know how to make dateutil locale-aware?

3 answers

As far as I can see, dateutil isn't locale-aware (yet!).

I can think of three alternative suggestions:

  • The day and month names are hardcoded in dateutil.parser (as part of the parserinfo class). You can subclass parserinfo and replace them with the corresponding Portuguese names.

  • Change dateutil so that it gets the day and month names from the user's locale. Then you could do something like

     import locale
     locale.setlocale(locale.LC_ALL, "pt_PT")
     from dateutil.parser import parse
     parse("Ter, 01 Out 2013 14:26:00 -0300")

    I've started a fork that gets the names from the calendar module (which is locale-aware) to work on this: https://github.com/alexwlchan/dateutil

    It now works for Portuguese (or seems to), but I want to think about it a bit more before I submit the patch to the main branch. In particular, odd things may happen if it encounters characters that aren't used in Western European languages. I haven't tested that yet. (See fooobar.com/questions/587868 / ... )

  • If you are not tied to the dateutil module, you can use datetime instead, which already supports locales:

     from datetime import datetime, date
     import locale
     locale.setlocale(locale.LC_ALL, "pt_PT")
     datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300", "%a, %d %b %Y %H:%M:%S %z")

    (Note that the %z token is not consistently supported in datetime.)
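
The first suggestion above (subclassing parserinfo) could look roughly like the sketch below. The class name PortugueseParserInfo and the lists of Portuguese names are my own assumptions for illustration, not something dateutil ships. The sketch relies on the WEEKDAYS and MONTHS class attributes of dateutil.parser.parserinfo and on the parserinfo argument of parse().

```python
from dateutil.parser import parse, parserinfo


class PortugueseParserInfo(parserinfo):
    # Hypothetical subclass: the name lists are illustrative and may need
    # extending (for example with "-feira" forms) for real-world input.
    # Weekdays are listed Monday first, as in the base class.
    WEEKDAYS = [
        ("Seg", "Segunda"),
        ("Ter", "Terça"),
        ("Qua", "Quarta"),
        ("Qui", "Quinta"),
        ("Sex", "Sexta"),
        ("Sáb", "Sábado"),
        ("Dom", "Domingo"),
    ]
    MONTHS = [
        ("Jan", "Janeiro"),
        ("Fev", "Fevereiro"),
        ("Mar", "Março"),
        ("Abr", "Abril"),
        ("Mai", "Maio"),
        ("Jun", "Junho"),
        ("Jul", "Julho"),
        ("Ago", "Agosto"),
        ("Set", "Setembro"),
        ("Out", "Outubro"),
        ("Nov", "Novembro"),
        ("Dez", "Dezembro"),
    ]


# Pass an instance of the subclass; the global locale is left untouched.
dt = parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=PortugueseParserInfo())
print(dt.isoformat())  # -> 2013-10-01T14:26:00-03:00
```

Since the names live in the subclass, this handles only the language you spell out; mixed-language input would need the English names added to the tuples as well.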


You can use PyICU to parse a localized date/time string in a given format:

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 from datetime import datetime
 import icu  # PyICU

 df = icu.SimpleDateFormat(
     'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
 ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
 print(datetime.utcfromtimestamp(ts))
 # -> 2013-10-01 17:26:00 (UTC)

It works on Python 2/3. It does not change the global state (locale).

If your actual input time string does not contain an explicit UTC offset, you must explicitly specify the time zone for ICU to use; otherwise you may get the wrong result (ICU and datetime may use different time zone definitions).

If you only need to support Python 3, and you don't mind setting the locale, you can use datetime.strptime() as @alexwlchan suggested:

 #!/usr/bin/env python3
 import locale
 from datetime import datetime

 locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
 print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                         "%a, %d %b %Y %H:%M:%S %z"))  # works on Python 3.2+
 # -> 2013-10-01 14:26:00-03:00
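
If changing the process-wide locale is a concern, one way to limit the side effect is to set LC_TIME only for the duration of the call and restore it afterwards. This is only a sketch under a few assumptions: the helper names time_locale and parse_localized are made up here, the requested locale (for example "pt_PT.UTF-8") has to be installed on the system, and because locale.setlocale() affects the whole process it is not safe to use from several threads at once.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def time_locale(name):
    """Temporarily switch LC_TIME to `name` (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        yield locale.setlocale(locale.LC_TIME, name)
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # always restore the old value


def parse_localized(text, fmt, locale_name):
    """Parse `text` with datetime.strptime under the given LC_TIME locale."""
    with time_locale(locale_name):
        return datetime.strptime(text, fmt)


# Demonstrated with the always-available "C" locale and English names;
# pass e.g. "pt_PT.UTF-8" with the Portuguese string if that locale is installed.
fmt = "%a, %d %b %Y %H:%M:%S %z"
dt = parse_localized("Tue, 01 Oct 2013 14:26:00 -0300", fmt, "C")
print(dt)  # -> 2013-10-01 14:26:00-03:00
```

If the locale is not installed, locale.setlocale() raises locale.Error, and the previous setting is still restored before the error propagates.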
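
Another thing to try is fuzzy=True:

 from dateutil.parser import parse
 parse("Ter, 01 Out 2013 14:26:00 -0300", fuzzy=True)

This does not raise an error, but the unrecognized Portuguese names are ignored, so the date comes out wrong (it is not 1 October 2013). Result: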

 datetime.datetime(2013, 1, 28, 14, 26, tzinfo=tzoffset(None, -10800)) 

Source: https://habr.com/ru/post/957905/

