Python dateutil parser, ignore non-fixed part of string

I use dateutil to parse image filenames and sort the images by date. Since not all of my photos have metadata, I let dateutil guess the date from the filename to decide where to put them.

Most of my photos are named in this format: 2007-09-10_0001.jpg, 2007-09-10_0002.jpg, etc.

    fileName = os.path.splitext(file)[0]
    print("Guesssing date from ", fileName)
    try:
        dateString = dateParser.parse(file, fuzzy=True)
        print("Guessed date", dateString)
        year = dateString.year
        month = dateString.month
        day = dateString.day
    except ValueError:
        print("Unable to determine date of ", file)

The output I get is the following:

    ('Guesssing date from ', '2007-09-10_00005')
    ('Unable to determine date of ', '2007-09-10_00005.jpg')

I could strip everything after the underscore, but I would like a more robust solution, if possible, in case I have photos in a different format. I thought fuzzy parsing would look for any date in the string and match it, but apparently it does not work that way.

Is there an easy way to get the parser to find something that looks like a date and stop after that? If not, what is the easiest way to make the parser ignore everything after the underscore? Or is there a way to define multiple date formats with sections to ignore?

Thanks!

2 answers

You can try to "reduce" the string until you can decode it:

    from dateutil import parser

    def reduce_string(string):
        # Strip one trailing run of digits, then the run of non-digits before it.
        # The i >= 0 checks keep the index from wrapping around to the end.
        i = len(string) - 1
        while i >= 0 and '0' <= string[i] <= '9':
            i -= 1
        while i >= 0 and not ('0' <= string[i] <= '9'):
            i -= 1
        return string[:i + 1]

    def find_date(string):
        # Keep shortening the string until the parser accepts what is left.
        while string:
            try:
                dateString = parser.parse(string, fuzzy=True)
                return (dateString.year, dateString.month, dateString.day)
            except ValueError:
                pass
            string = reduce_string(string)
        return None

    date = find_date('2007-09-10_00005')
    if date:
        print(date)
    else:
        print("can't decode")

The idea is to strip the end of the string (a trailing run of digits, then the run of non-digits before it) repeatedly, until the parser can decode what remains as a valid date.
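To make the reduction concrete, here is a minimal sketch that only lists the candidate strings this approach would hand to the parser, without calling dateutil at all. The reduce_string variant below adds explicit bounds checks, and reduction_steps is a hypothetical helper introduced just for this illustration.

```python
def reduce_string(string):
    # Strip one trailing run of digits, then the run of non-digits before it.
    i = len(string) - 1
    while i >= 0 and string[i].isdigit():
        i -= 1
    while i >= 0 and not string[i].isdigit():
        i -= 1
    return string[:i + 1]


def reduction_steps(string):
    # Hypothetical helper: collect every candidate, longest first.
    steps = []
    while string:
        steps.append(string)
        string = reduce_string(string)
    return steps


print(reduction_steps('2007-09-10_00005'))
# ['2007-09-10_00005', '2007-09-10', '2007-09', '2007']
```

Each candidate is strictly shorter than the previous one, so the loop always terminates, and the first candidate the parser accepts wins.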


Commenting here for future readers, to give some insight into this problem.

While dateutil's fuzzy parsing is pretty good at picking dates out of normal natural language, it fails on strings like the one above, which contain a lot of numeric/character noise. Moreover, with later versions of dateutil, running:

    >>> from dateutil.parser import parse
    >>> parse('2007-09-10_00005.jpg', fuzzy=True)

parse fails with TypeError: 'NoneType' object is not iterable, which is not a very idiomatic way to report a parse failure.
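Because the exception type can differ between versions, one defensive option is to wrap the call and treat several exception types as "no date found". The sketch below is only an illustration: try_parse and broken_parse are hypothetical names, and broken_parse is a stand-in that simulates the failure described above so that the example runs without dateutil installed.

```python
def try_parse(text, parse_func):
    """Return parse_func(text), or None if it raises a parse-related error."""
    try:
        return parse_func(text)
    except (ValueError, TypeError, OverflowError):
        return None


def broken_parse(text):
    # Stand-in for a parser that fails with the TypeError described above.
    raise TypeError("'NoneType' object is not iterable")


print(try_parse('2007-09-10_00005.jpg', broken_parse))  # None
print(try_parse('42', int))                             # 42
```

With dateutil installed, the second argument could be something like functools.partial(parse, fuzzy=True), where parse is imported from dateutil.parser.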

Another alternative is simply to search for a known date format using a regular expression. Of course, it depends on the use case, but the OP mentioned that the dates are in the format YYYY-MM-DD, which makes them ideal for a regular-expression search:

    from dateutil.parser import parse
    import re

    # Raw string so the backslashes reach the regex engine unchanged.
    date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

    def extract_date(filename):
        # search() finds the date anywhere in the name, not only at the start.
        matches = date_pattern.search(filename)
        if matches:
            return parse(matches.group(0))
        else:
            return None

    extract_date('2007-09-10_00005.jpg')
    # datetime.datetime(2007, 9, 10, 0, 0)
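The question also asks about handling several date formats. The regex idea can be extended to that case with only the standard library, as in the sketch below. The KNOWN_FORMATS list and the extract_date_multi name are assumptions made for illustration; the second format (YYYYMMDD) is a hypothetical example and does not come from the question.

```python
import re
from datetime import datetime

# Hypothetical (regex, strptime format) pairs; extend to match your own files.
KNOWN_FORMATS = [
    (re.compile(r'\d{4}-\d{2}-\d{2}'), '%Y-%m-%d'),  # e.g. 2007-09-10
    (re.compile(r'\d{8}'), '%Y%m%d'),                # e.g. 20070910
]


def extract_date_multi(filename):
    # Try each known format in order and return the first valid date found.
    for pattern, fmt in KNOWN_FORMATS:
        for match in pattern.finditer(filename):
            try:
                return datetime.strptime(match.group(0), fmt)
            except ValueError:
                # Looked like a date but was not one (e.g. month 13).
                continue
    return None


print(extract_date_multi('2007-09-10_00005.jpg'))   # 2007-09-10 00:00:00
print(extract_date_multi('IMG_20070910_1234.jpg'))  # 2007-09-10 00:00:00
print(extract_date_multi('holiday.jpg'))            # None
```

Everything outside the matched span is ignored, so trailing counters such as _00005 never reach the date parser. The order of the list decides which format wins when more than one matches.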

Source: https://habr.com/ru/post/946899/

