Configure dateutil.parser century logic

I am working with old text files that contain 2-digit years, where the default century logic in dateutil.parser does not work well. For example, the attack on Pearl Harbor did not happen on dparser.parse("12/7/41") (which returns 2041-12-07).

The built-in "threshold" for rolling back into the 1900s seems to occur at 66:

import dateutil.parser as dparser
print(dparser.parse("12/31/65")) # goes forward to 2065-12-31 00:00:00
print(dparser.parse("1/1/66")) # goes back to 1966-01-01 00:00:00

For my purposes, I would like to set this “threshold” to 17 so that:

  • "12/31/16" parses to 2016-12-31 (yyyy-mm-dd)
  • "1/1/17" parses to 1917-01-01

But I would like to continue using this module, as its fuzzy match seems to work well.

The documentation does not identify the parameter for this ... is there an argument that I am missing?


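This is not particularly well documented, but you can override this behaviour in dateutil.parser. The second argument to parse is a parserinfo object, and the method you care about is convertyear. The default implementation is what causes the problem: it bases the century on the current year, plus or minus fifty years. That is why you see the transition at 1966. Next year it will be at 1967. :)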

Since the logic is simple, you can subclass parserinfo and override that method yourself, for example:

from dateutil.parser import parse, parserinfo

class MyParserInfo(parserinfo):
    def convertyear(self, year, *args, **kwargs):
        if year < 100:
            year += 1900
        return year

parse('1/21/47', MyParserInfo())
# datetime.datetime(1947, 1, 21, 0, 0)
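The same override can be adapted to the cut-off of 17 asked for in the question. The following is a minimal sketch, not part of dateutil itself: the class name PivotParserInfo and its pivot attribute are made up for illustration, and it assumes that parse calls convertyear with the two-digit year, optionally followed by a century_specified flag.

```python
from dateutil.parser import parse, parserinfo


class PivotParserInfo(parserinfo):
    """parserinfo with a fixed two-digit-year cut-off (hypothetical helper)."""

    def __init__(self, pivot=17, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Two-digit years below `pivot` go to 20xx, the rest to 19xx.
        self.pivot = pivot

    def convertyear(self, year, century_specified=False):
        # Leave four-digit years (and explicitly specified centuries) alone.
        if year < 100 and not century_specified:
            year += 2000 if year < self.pivot else 1900
        return year


info = PivotParserInfo(pivot=17)
for text in ["12/31/16", "1/1/17", "12/7/41"]:
    print(parse(text, info))
# 2016-12-31 00:00:00
# 1917-01-01 00:00:00
# 1941-12-07 00:00:00
```

Unlike the default, this cut-off does not move as the current year changes, which may or may not be what you want.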

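Alternatively, you can post-process the parsed result and subtract 100 years from any date whose year is later than 2016: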

import dateutil.parser as dparser

THRESHOLD = 2016

date_strings = ["12/31/65", "1/1/66", "12/31/16", "1/1/17"]
for date_string in date_strings:
    dt = dparser.parse(date_string)
    if dt.year > THRESHOLD:
        dt = dt.replace(year=dt.year - 100)
    print(dt)

1965-12-31 00:00:00
1966-01-01 00:00:00
2016-12-31 00:00:00
1917-01-01 00:00:00

Besides overriding parserinfo.convertyear, as suggested in the other answers, you can also set the parserinfo attributes _century and _year *):

from dateutil.parser import parse, parserinfo
info = parserinfo()
info._century = 1900
info._year  = 1965
parse('12/31/65', parserinfo=info)
=> 1965-12-31 00:00:00

_century is the value added to a two-digit year to obtain a full year, e.g. 65 + 1900 = 1965.

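_year is the centre of the window, +/- 50 years. Any resulting year that is 50 or more years away from _year is shifted by a century:

  • if it is < _year, it is moved to the next century
  • if it is >= _year, it is moved to the previous century.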

As a timeline:

1900          1916          1965          2015
+--- (...) ---+--- (...) ---+--- (...) ---+
^             ^             ^             ^
_century      _year - 49    _year         _year + 50

parsed years:
              16,17,...             99,00,...15

In other words, the years 00, 01, ..., 99 are mapped onto the range _year - 49 .. _year + 50, with _year set in the middle of this 100-year period. Using these two parameters, you can specify any cut-off you like.

*) Please note that these two variables are undocumented, but they are used in the default implementation of parserinfo.convertyear in the newest stable version at the time of writing, 2.5.3. IMHO the default implementation is pretty smart.
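To check a choice of _century and _year before relying on it, the windowing rule can be sketched in plain Python. This is only a model of the rule as described in this answer, not the dateutil source; window_year is a made-up helper name, and other dateutil releases may place the boundary a year differently.

```python
def window_year(two_digit_year, century, pivot_year):
    """Map a two-digit year into the 100-year window around pivot_year."""
    year = two_digit_year + century
    # Years 50 or more away from the pivot are pulled one century closer.
    if abs(year - pivot_year) >= 50:
        year += 100 if year < pivot_year else -100
    return year


# Settings from this answer: window 1916 .. 2015
print(window_year(16, 1900, 1965))  # 1916
print(window_year(15, 1900, 1965))  # 2015

# Cut-off asked for in the question: window 1917 .. 2016
print(window_year(16, 1900, 1966))  # 2016
print(window_year(17, 1900, 1966))  # 1917
```

Under this model, _century = 1900 with _year = 1966 would give the cut-off of 17 from the question, but it is worth confirming against the installed dateutil version.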


Source: https://habr.com/ru/post/1649079/

