Python regex bad range of characters.

I use the following regular expression to match different date patterns. It works great at regex101.com. But when I import into python, I get an "invalid character range" exception.

pattern = ur"((?:\b((?:(january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)['\s\.]{0,4}(?:\d{4}|\d{2})|(?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)|((?:0[1-9]|[1-3][0-9]|[0-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[\s-/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))))(?:(?![\r\n])\s){0,4})[-/–to]{0,2}(?:(?![\r\n])\s){0,4}(((?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)[-'\s\.]{0,4}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|[1-3][0-9]|[1-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[\s-/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))))))" https://regex101.com/r/rU3cE9/1 
+10
source share
1 answer

The problem is mainly due to the hyphen present inside the character class [\s-/'] , so Python interprets it as a character spacing (as in [az] ). I suggest you put the hyphen in the first or last position inside the character class [-\s/'] or escape it to avoid ambiguity.

 >>> reg = re.compile(ur"((?:\b((?:(january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)['\s\.]{0,4}(?:\d{4}|\d{2})|(?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)|((?:0[1-9]|[1-3][0-9]|[0-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[-\s/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))))(?:(?![\r\n])\s){0,4})[-/to–]{0,2}(?:(?![\r\n])\s){0,4}(((?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)[-'\s\.]{0,4}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|[1-3][0-9]|[1-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[-\s/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))))))") >>> 
+19
source

Source: https://habr.com/ru/post/982548/


All Articles