I am trying to extract movie metadata (title and year) from file names.
The naming pattern is not standardized, but it is not random either, so I am trying to cover as many cases as possible.
To give you an idea, here are some example file names:
samples = [
    'The Movie Title.avi',
    'The Movie Title DVDRIP. Useless.info.avi',
    'The Movie Title [2005].avi',
    'The Movie Title (2005) [Useless.info].avi',
    'The Movie Title 2005 H264 DVDRip Useless-Info.avi',
    'The Movie Title 2005 XviD Useless info.avi',
    'The Movie Title {2005} DVDRIP. UselessInfo.avi',
    'The.Movie.Title.2005.Useless.info.avi',
    '[Useless.info]_The.Movie.Title.2005.Useless.avi',
]
I wrote "Useless info" in some places because what appears there can be anything; it changes from file to file and cannot be used to extract information. Also note that 'The Movie Title' may contain numbers or non-alphabetic characters, for example 'The Movie Title 2 - The Return'.
The expected output should be as follows:
metadata = {'title': 'The Movie Title', 'year': '2005'}
I am currently using a chain of regexes, but I don’t know whether that is the best way to do this.
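To make the problem concrete, here is a minimal sketch of one possible approach; it is only an illustration, not necessarily the best one. It normalizes the separators, then cuts the title at the first year or known release tag, whichever comes first. The function name `parse_movie_name` and the tag list are my own assumptions, and the list is certainly not exhaustive. For files without a year it returns `None` for 'year'. A title that itself contains a year-like number (for example 'Blade Runner 2049') would be split in the wrong place.

```python
import os
import re

# Assumed, non-exhaustive list of release tags that mark the end of a title.
TAG_RE = re.compile(r"\b(?:dvdrip|xvid|h264|x264|bluray|brrip|hdtv)\b", re.IGNORECASE)
# A four-digit year (1900-2099), optionally wrapped in (), [] or {}.
YEAR_RE = re.compile(r"[\[({]?\b((?:19|20)\d{2})\b[\])}]?")


def parse_movie_name(filename):
    """Return {'title': ..., 'year': ...}; 'year' is None when no year is found."""
    # Drop the file extension.
    stem = os.path.splitext(filename)[0]
    # Drop a leading bracketed group such as '[Useless.info]'.
    stem = re.sub(r"^\s*\[[^\]]*\]", "", stem)
    # Treat dots and underscores as word separators.
    text = re.sub(r"[._]+", " ", stem)

    year_match = YEAR_RE.search(text)
    tag_match = TAG_RE.search(text)

    # The title ends where the first year or release tag begins.
    cut_points = [m.start() for m in (year_match, tag_match) if m]
    title = text[:min(cut_points)] if cut_points else text

    return {
        "title": title.strip(" -"),
        "year": year_match.group(1) if year_match else None,
    }


print(parse_movie_name("[Useless.info]_The.Movie.Title.2005.Useless.avi"))
```

The last line prints the metadata for the final sample above; the other samples can be passed in the same way.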