Quick automatic guessing of date strings

Question

For a huge number of huge CSV files (100M+ lines) from different sources, I need a fast snippet or library that automatically guesses the date format and converts each date to broken-down time or a Unix timestamp. Once the format has been guessed, the snippet must be able to check subsequent occurrences of the date field for validity, because the date format is likely to change within a file.

The set of date formats to test should be variable, though compiling an optimal decision tree, or something similar, from the given date formats is fine.
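To make the requirement concrete, here is a minimal C sketch of "guess once, then validate, and re-guess only on failure" over a variable format table. It is not an existing library: the pattern language (`Y`, `M`, `D` stand for one digit each, everything else is literal) and the names `match_format`, `guess_format` and `parse_tracked` are assumptions made up for illustration, and the validity check is a plain range check.

```c
#include <stddef.h>

/* Hypothetical pattern language: Y, M and D each stand for one digit of
 * the year, month or day; any other character must match literally.
 * Returns 1 and fills y/m/d if s matches fmt, 0 otherwise. */
int match_format(const char *fmt, const char *s, int *y, int *m, int *d)
{
    int yy = 0, mm = 0, dd = 0;

    for (; *fmt; fmt++, s++) {
        if (*fmt == 'Y' || *fmt == 'M' || *fmt == 'D') {
            if (*s < '0' || *s > '9')
                return 0;               /* also catches a short input */
            int digit = *s - '0';
            if (*fmt == 'Y')
                yy = yy * 10 + digit;
            else if (*fmt == 'M')
                mm = mm * 10 + digit;
            else
                dd = dd * 10 + digit;
        } else if (*s != *fmt) {
            return 0;                   /* separator mismatch */
        }
    }
    /* Reject trailing text and out-of-range fields.  This is a range
     * check only; days per month and leap years are not checked. */
    if (*s != '\0' || mm < 1 || mm > 12 || dd < 1 || dd > 31)
        return 0;
    *y = yy;
    *m = mm;
    *d = dd;
    return 1;
}

/* Return the index of the first format that accepts s, or -1. */
int guess_format(const char *const *fmts, size_t n, const char *s)
{
    int y, m, d;

    for (size_t i = 0; i < n; i++)
        if (match_format(fmts[i], s, &y, &m, &d))
            return (int)i;
    return -1;
}

/* Validate s against the current guess and re-guess only when that fails.
 * *cur holds the index of the current format, or -1 before the first guess. */
int parse_tracked(const char *const *fmts, size_t n, int *cur,
                  const char *s, int *y, int *m, int *d)
{
    if (*cur >= 0 && match_format(fmts[*cur], s, y, m, d))
        return 1;
    *cur = guess_format(fmts, n, s);
    return *cur >= 0 && match_format(fmts[*cur], s, y, m, d);
}
```

Calling `parse_tracked` once per line with a table such as `{ "YYYY-MM-DD", "DD/MM/YYYY", "MM/DD/YYYY" }` keeps the common case to a single match. The table order decides ties, so an ambiguous value is read with whichever matching format comes first.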

I have come to the conclusion that nothing of the kind exists, but I still need to do my "market research", hence this question.

My first attempt was to mimic getdate() for the 23 different date formats I have seen so far, and to replace the number parsers with optimized versions that take date characteristics into account ("4" through "9" cannot occur in the tens place of the day, "3" through "9" cannot occur in the tens place of the month, etc.).
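As an illustration of such a specialised number parser, here is a minimal C sketch, not my actual code. The names `parse_day2` and `parse_month2` are made up for this example, and it assumes fixed-width, two-digit ASCII fields. Note that it rejects every month tens digit other than 0 and 1.

```c
/* Hypothetical helpers: parse exactly two ASCII digits as a day or a month.
 * Each returns the value, or -1 if the text cannot be a valid field. */
int parse_day2(const char *s)
{
    /* The tens digit of a day is 0..3, so anything else fails at once. */
    if (s[0] < '0' || s[0] > '3')
        return -1;
    if (s[1] < '0' || s[1] > '9')
        return -1;
    int d = (s[0] - '0') * 10 + (s[1] - '0');
    return (d >= 1 && d <= 31) ? d : -1;
}

int parse_month2(const char *s)
{
    /* The tens digit of a month is 0 or 1. */
    if (s[0] != '0' && s[0] != '1')
        return -1;
    if (s[1] < '0' || s[1] > '9')
        return -1;
    int m = (s[0] - '0') * 10 + (s[1] - '0');
    return (m >= 1 && m <= 12) ? m : -1;
}
```

The early test on the first character is the whole point: a field that cannot be a day or a month is discarded after one comparison, which also helps rule out candidate formats quickly.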

Has anyone encountered a similar problem or even created code of this kind?

+3

c string date format parsing

hroptatyr Jul 19 '10 at 10:47

2 answers

[Most of this answer was lost in extraction; only fragments survive.] ... (CSV) Perl script ... (> 10K lines/sec, ~60-100 chars per line) ... b) ... c) ... C ... Perl ... for example, 10/04/05 is ambiguous: it could be DD/MM/YY or MM/DD/YY, ...
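To illustrate the 10/04/05 ambiguity: a single value cannot decide between DD/MM/YY and MM/DD/YY, but several values from the same column often can. The following is a minimal C sketch, not taken from the Perl script mentioned above. The names `candidate_mask` and `narrow_formats` are hypothetical, and it assumes exactly these two slash-separated formats.

```c
#include <stddef.h>

enum { FMT_DMY = 1, FMT_MDY = 2 };

/* Return a bit mask of the formats (DD/MM/YY, MM/DD/YY) that s could be,
 * or 0 if s is not of the form NN/NN/NN. */
int candidate_mask(const char *s)
{
    for (int i = 0; i < 8; i++) {
        if (i == 2 || i == 5) {
            if (s[i] != '/')
                return 0;
        } else if (s[i] < '0' || s[i] > '9') {
            return 0;                   /* also catches a short input */
        }
    }
    if (s[8] != '\0')
        return 0;

    int a = (s[0] - '0') * 10 + (s[1] - '0');   /* first field  */
    int b = (s[3] - '0') * 10 + (s[4] - '0');   /* second field */
    int mask = 0;

    if (a >= 1 && a <= 31 && b >= 1 && b <= 12)
        mask |= FMT_DMY;
    if (a >= 1 && a <= 12 && b >= 1 && b <= 31)
        mask |= FMT_MDY;
    return mask;
}

/* Intersect the candidates over several samples from the same column.
 * A sample that fits neither format empties the mask. */
int narrow_formats(const char *const *samples, size_t n)
{
    int mask = FMT_DMY | FMT_MDY;

    for (size_t i = 0; i < n && mask; i++)
        mask &= candidate_mask(samples[i]);
    return mask;
}
```

If both bits are still set after all samples, the column really is ambiguous and the choice has to come from outside the data, for example from knowledge about the source.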

+1

Roaker 19 . '10 20:43

hroptatyr · Accepted Answer · 2010-08-13T19:32:08+0000

After two weeks of searching the googl^Wweb, I came to the conclusion that I have to write this myself. FWIW, here is my first go at it: http://github.com/hroptatyr/glod
