Get name and year from file name using regex

How do I write a regular expression that extracts the title and, if available, the year from a file name? See the examples below.

This solution works for PHP, but I'm having trouble translating it to JavaScript: "Extract movie name and year from video file name".

The.Great.Gatsby.2013.BluRay.1080p.DTS.x264-CHD.mkv
The Forbidden Girl 2013 BRRIP Xvid AC3-BHRG.avi
Pain.&.Gain.2013.720p.BluRay.DD5.1.x264-HiDt.mkv
Se7en.avi
Se7en.(1995).avi
How to train your dragon 2.mkv
10,000BC (2010).1080p.avi
1 answer

The solution below works for all the test cases you provided (plus a few extra ones and a titlelize helper, see the code below) and is meant to be customized.

In short, try the snippet below:

    // Live test
    var input = document.getElementById('input');
    var output = document.getElementById('output');
    input.oninput = function () {
        output.textContent = extractData(input.value);
    };

    // Samples
    var tests = [
        'The.Great.Gatsby.2013.BluRay.1080p.DTS.x264-CHD.mkv',
        'The Forbidden Girl 2013 BRRIP Xvid AC3-BHRG.avi',
        'Pain.&.Gain.2013.720p.BluRay.DD5.1.x264-HiDt.mkv',
        'Se7en.(1995).avi',
        'How to train your dragon 2.mkv',
        '10,000BC (2010).1080p.avi',
        'The.Great.Gatsby.BluRay.1080p.DTS.x264-CHD.mkv',
        'Se7en.avi',
        '2001 A Space Odyssey.BluRay.1080p.DTS.x264-CHD.mkv',
        'Sand.Castle.2017.FRENCH.1080.WEBRip.AAC2.0-NEWCiNE-WwW.Zone-Telechargement.Ws.mkv'
    ];
    var list = document.getElementById('list');
    var t;
    while ((t = tests.pop())) {
        list.innerHTML += '<b>INPUT</b>: "' + t + '"<br>';
        list.innerHTML += extractData(t, true) + '<hr>';
    }

    // Capitalize each word and turn runs of dots/spaces into a single space
    function titlelize(title) {
        return title.replace(/(^|[. ]+)(\S)/g, function (all, pre, c) {
            return (pre ? ' ' : '') + c.toUpperCase();
        });
    }

    function extractData(it, html) {
        var regex = /^(.+?)[.( \t]*(?:(19\d{2}|20(?:0\d|1[0-9])).*|(?:(?=bluray|\d+p|brrip|webrip)..*)?[.](mkv|avi|mpe?g|mp4)$)/i;
        var out = '&#8627;&nbsp;';
        var m = regex.exec(it);
        if (m) {
            var title = titlelize(m[1]) || '-';
            var year = m[2] || '-';
            out += '<font color="green"><b>Title</b>: "' + title + '"&emsp; <b>Year</b>: "' + year + '"</font>';
        } else {
            out += '<font color="red">No match</font>';
        }
        // The replace is a hack to strip the HTML for the live input text
        return html ? out : out.replace(/<[^>]+>|&[^;]+;/g, '');
    }

    <mark><b>Paste and Try!</b></mark> &rArr;
    <input id="input" type="text" size="70" />
    <br>&#8627;&emsp;<span id="output" style="line-height:40px;">No Match</span>
    <hr>
    <div id="list"></div>

Description

Assuming the file name is structured roughly like this:

Name * || [ Year * ] || [ Codec ] Extension
Fields enclosed in square brackets are optional (for example, [ field1 ])
*: captured field

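The key idea is to match everything as the title up to the first valid year that follows it (valid years: 1900-2019 in the snippet above) or, when there is no year, up to the file extension (a dot followed by one of the listed extensions, which is easy to change if necessary). Because the title needs at least one character, a year at the very start of the name, as in "2001 A Space Odyssey", stays part of the title.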

Exceptions: when the file name contains no valid year, everything from a section starting with (case-insensitively) bluray, [0-9]+p (for example, 720p, 1080p), brrip or webrip onward is dropped from the title.
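
To try the matching rule outside the browser, here is a minimal sketch. The name parseFileName and the shape of the returned object are assumptions made for illustration; the pattern is the one from the snippet above, and the title keeps its original casing instead of going through titlelize.

```javascript
// Hypothetical helper (not part of the original answer): same pattern as the
// snippet, but it returns plain data instead of HTML.
var FILE_NAME_REGEX = /^(.+?)[.( \t]*(?:(19\d{2}|20(?:0\d|1[0-9])).*|(?:(?=bluray|\d+p|brrip|webrip)..*)?[.](mkv|avi|mpe?g|mp4)$)/i;

function parseFileName(fileName) {
    var m = FILE_NAME_REGEX.exec(fileName);
    if (!m) {
        return null; // neither a valid year nor a known extension was found
    }
    return {
        // Runs of dots and spaces between words become single spaces
        title: m[1].replace(/[. ]+/g, ' ').trim(),
        // Group 2 is undefined when the extension branch matched
        year: m[2] ? Number(m[2]) : null
    };
}

console.log(parseFileName('The.Great.Gatsby.2013.BluRay.1080p.DTS.x264-CHD.mkv'));
console.log(parseFileName('How to train your dragon 2.mkv'));
```

Returning null for a failed match and for a missing year keeps the two cases easy to tell apart for the caller.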

Regular expression breakdown

    /^
    (.+?)                                   # Save title into group $1
    [.( \t]*                                # Skip separators
    (?:                                     # Non-capturing group
      (19\d{2}|20(?:0\d|1[0-9])).*          # Save year (1900-2019) in $2
      |                                     # OR
      (?:(?=bluray|\d+p|brrip|webrip)..*)?  # Match a section starting with bluray, brrip, webrip, 720p...
      [.](mkv|avi|mpe?g|mp4)$               # Match extension (.mkv, .avi, .mpeg, .mp4); add your own
    )
    /i                                      # Make the regex case-insensitive
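
JavaScript regular expressions have no free-spacing (x) flag, so a commented layout like the one above cannot be pasted into a regex literal as is. One way to keep the comments, sketched here with hypothetical variable names, is to assemble the pattern from string fragments with the RegExp constructor. The fragments follow the pattern used in the snippet; note that backslashes must be doubled inside string literals.

```javascript
// Each fragment carries its own comment; join them into one pattern string.
var parts = [
    '^(.+?)',                                 // title, as short as possible -> group 1
    '[.( \\t]*',                              // separators before the year or codec
    '(?:',                                    // start of the alternatives
    '(19\\d{2}|20(?:0\\d|1[0-9])).*',         // year -> group 2, the rest is ignored
    '|',                                      // OR
    '(?:(?=bluray|\\d+p|brrip|webrip)..*)?',  // optional codec/quality section
    '[.](mkv|avi|mpe?g|mp4)$',                // extension -> group 3
    ')'                                       // end of the alternatives
];
var composed = new RegExp(parts.join(''), 'i');

var m = composed.exec('Se7en.(1995).avi');
console.log(m[1], m[2]);
```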

Regular expression customization

The lists of exceptions and extensions can easily be extended with new values as your tests require. For example, to accept .wmv and .flv, add them to the extension group: (mkv|avi|mpe?g|mp4|wmv|flv). To accept any extension instead, replace that part with [.]\w{3,4}$ .
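
As a rough illustration of this customization, the extension and exception lists could be kept in arrays and the pattern built from them. buildRegex and its parameters are hypothetical names, and the entries are inserted as regex fragments without escaping, so they must already be valid regex source.

```javascript
// Hypothetical builder: list entries are regex fragments, not literal strings.
// Passing null for extensions selects the generic 3-4 character extension.
function buildRegex(extensions, exceptions) {
    var ext = extensions === null
        ? '[.]\\w{3,4}$'
        : '[.](' + extensions.join('|') + ')$';
    return new RegExp(
        '^(.+?)[.( \\t]*(?:(19\\d{2}|20(?:0\\d|1[0-9])).*|' +
        '(?:(?=' + exceptions.join('|') + ')..*)?' + ext + ')',
        'i'
    );
}

var exceptions = ['bluray', '\\d+p', 'brrip', 'webrip'];
var withFlv = buildRegex(['mkv', 'avi', 'mpe?g', 'mp4', 'wmv', 'flv'], exceptions);
var anyExt = buildRegex(null, exceptions);

console.log(withFlv.exec('Se7en.flv')[1]);
console.log(anyExt.exec('Se7en.BluRay.720p.webm')[1]);
```

With the generic extension there is no third capture group, so code that reads the extension from group 3 would need adjusting.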


Source: https://habr.com/ru/post/1240297/

