Function for creating regular expressions corresponding to a range of numbers

I am working with the Amazon Mechanical Turk API, which lets me use regular expressions to filter a data field.

I would like to pass an integer range to a function, for example 256-311 or 45-1233, and get back a regex that matches only numbers in that range.

The regular expression matching 256-321 would be:

\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b 
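One way to gain confidence in a hand-written pattern like this is to compare it with ordinary integer comparison over a window of numbers. The following is a minimal C++ sketch of that idea; the function name `matches_exactly` and the window bounds are illustrative choices, not part of the question. It assumes non-negative numbers written without leading zeros, since `\b` also fires between a `-` and a digit.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` finds a match in exactly the numbers
// lo..hi among all values window_lo..window_hi, each written in plain decimal.
// regex_search is used so that \b behaves as it would inside a larger text.
bool matches_exactly(const std::string& pattern, int lo, int hi,
                     int window_lo, int window_hi) {
    const std::regex re(pattern);
    for (int n = window_lo; n <= window_hi; ++n) {
        const bool expected = (lo <= n && n <= hi);
        const bool actual = std::regex_search(std::to_string(n), re);
        if (expected != actual) {
            return false;  // found a value the pattern gets wrong
        }
    }
    return true;
}
```

Running it over 0..9999 with the pattern above and the bounds 256 and 321 should return `true`; dropping any one of the four alternatives makes it return `false`.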

Writing this by hand is fairly simple, but I am having trouble writing a loop that generates such a regex.

I am trying to build a function defined as follows:

 function getRangeRegex( int fromInt, int toInt) { return regexString; } 

I searched online and was surprised that nobody seems to have solved this before. It is a harder problem than it looks...

Thank you for your time.

6 answers

Here is a quick hack:

    <?php
    function regex_range($from, $to) {
        if ($from < 0 || $to < 0) {
            throw new Exception("Negative values not supported");
        }
        if ($from > $to) {
            throw new Exception("Invalid range $from..$to, from > to");
        }
        $ranges = array($from);
        $increment = 1;
        $next = $from;
        $higher = true;
        while (true) {
            $next += $increment;
            if ($next + $increment > $to) {
                if ($next <= $to) {
                    $ranges[] = $next;
                }
                $increment /= 10;
                $higher = false;
            } else if ($next % ($increment * 10) === 0) {
                $ranges[] = $next;
                $increment = $higher ? $increment * 10 : $increment / 10;
            }
            if (!$higher && $increment < 10) {
                break;
            }
        }
        $ranges[] = $to + 1;
        $regex = '/^(?:';
        for ($i = 0; $i < sizeof($ranges) - 1; $i++) {
            $str_from = (string)($ranges[$i]);
            $str_to = (string)($ranges[$i + 1] - 1);
            for ($j = 0; $j < strlen($str_from); $j++) {
                if ($str_from[$j] == $str_to[$j]) {
                    $regex .= $str_from[$j];
                } else {
                    $regex .= "[" . $str_from[$j] . "-" . $str_to[$j] . "]";
                }
            }
            $regex .= "|";
        }
        return substr($regex, 0, strlen($regex) - 1) . ')$/';
    }

    function test($from, $to) {
        try {
            printf("%-10s %s\n", $from . '-' . $to, regex_range($from, $to));
        } catch (Exception $e) {
            echo $e->getMessage() . "\n";
        }
    }

    test(2, 8);
    test(5, 35);
    test(5, 100);
    test(12, 1234);
    test(123, 123);
    test(256, 321);
    test(256, 257);
    test(180, 195);
    test(2, 1);
    test(-2, 4);
    ?>

which produces:

    2-8        /^(?:[2-7]|8)$/
    5-35       /^(?:[5-9]|[1-2][0-9]|3[0-5])$/
    5-100      /^(?:[5-9]|[1-9][0-9]|100)$/
    12-1234    /^(?:1[2-9]|[2-9][0-9]|[1-9][0-9][0-9]|1[0-2][0-3][0-4])$/
    123-123    /^(?:123)$/
    256-321    /^(?:25[6-9]|2[6-9][0-9]|3[0-2][0-1])$/
    256-257    /^(?:256|257)$/
    180-195    /^(?:18[0-9]|19[0-5])$/
    Invalid range 2..1, from > to
    Negative values not supported

Not properly tested, use at your own risk!

And yes, the generated regular expressions could be written more compactly in many cases, but I leave that as an exercise for the reader :)
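The output above shows why the warning matters: for 256-321 the last alternative, `3[0-2][0-1]`, accepts 300, 301, 310, 311, 320 and 321 but misses values such as 305 and 319, and for 12-1234 the alternative `1[0-2][0-3][0-4]` misses 1005. A different construction, sketched below in C++ under the assumption 0 <= from <= to, first splits the range by digit count and then recursively splits each part on its first differing digit, so that every alternative is a literal prefix, one digit class, and a run of `[0-9]`. This is a swapped-in technique rather than a fix of the loop above, and the names `range_regex`, `same_length`, `digit_class` and `any_digits` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// "7" for a single digit, "[2-5]" for a span of digits.
std::string digit_class(char a, char b) {
    return a == b ? std::string(1, a) : "[" + std::string(1, a) + "-" + b + "]";
}

// n unconstrained digits: "", "[0-9]" or "[0-9]{n}".
std::string any_digits(std::size_t n) {
    if (n == 0) return "";
    if (n == 1) return "[0-9]";
    return "[0-9]{" + std::to_string(n) + "}";
}

// Appends alternatives covering lo..hi, where lo and hi are decimal strings of
// equal length with lo <= hi; `prefix` is the literal text fixed so far.
void same_length(const std::string& lo, const std::string& hi,
                 const std::string& prefix, std::vector<std::string>& out) {
    if (lo.empty()) {
        out.push_back(prefix);
        return;
    }
    if (lo[0] == hi[0]) {  // shared leading digit: keep it literally
        same_length(lo.substr(1), hi.substr(1), prefix + lo[0], out);
        return;
    }
    const std::size_t rest = lo.size() - 1;
    const std::string zeros(rest, '0');
    const std::string nines(rest, '9');
    char first = lo[0];
    char last = hi[0];
    // Low edge (e.g. 256..299 for 256..321), unless lo ends in zeros.
    if (lo.substr(1) != zeros) {
        same_length(lo.substr(1), nines, prefix + lo[0], out);
        ++first;
    }
    // High edge (e.g. 300..321), unless hi ends in nines.
    const bool high_edge = hi.substr(1) != nines;
    if (high_edge) {
        --last;
    }
    // Middle: whole blocks such as [3-4][0-9]{2}.
    if (first <= last) {
        out.push_back(prefix + digit_class(first, last) + any_digits(rest));
    }
    if (high_edge) {
        same_length(zeros, hi.substr(1), prefix + hi[0], out);
    }
}

// Builds an anchored regex for the decimal numbers from..to (no leading zeros).
std::string range_regex(int from, int to) {
    if (from < 0 || from > to) {
        throw std::invalid_argument("expected 0 <= from <= to");
    }
    std::vector<std::string> parts;
    long long lo = from;
    while (lo <= to) {
        const std::string lo_str = std::to_string(lo);
        // Largest number with as many digits as lo, e.g. 999 for 256.
        const long long widest = std::stoll(std::string(lo_str.size(), '9'));
        const long long hi = std::min<long long>(to, widest);
        same_length(lo_str, std::to_string(hi), "", parts);
        lo = hi + 1;
    }
    std::string regex = "^(?:";
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            regex += '|';
        }
        regex += parts[i];
    }
    return regex + ")$";
}
```

With this sketch, `range_regex(256, 321)` gives `^(?:25[6-9]|2[6-9][0-9]|3[0-1][0-9]|32[0-1])$`, the same alternatives as the hand-written regex in the question, and `range_regex(12, 1234)` gives `^(?:1[2-9]|[2-9][0-9]|[1-9][0-9]{2}|1[0-1][0-9]{2}|12[0-2][0-9]|123[0-4])$`.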


For those who, like me, were looking for a JavaScript version of @Bart Kiers's excellent answer above:

    //Credit: Bart Kiers 2011
    function regex_range(from, to){
        if(from < 0 || to < 0) {
            //throw new Exception("Negative values not supported");
            return null;
        }
        if(from > to) {
            //throw new Exception("Invalid range from..to, from > to");
            return null;
        }

        var ranges = [];
        ranges.push(from);
        var increment = 1;
        var next = from;
        var higher = true;

        while(true){
            next += increment;
            if(next + increment > to) {
                if(next <= to) {
                    ranges.push(next);
                }
                increment /= 10;
                higher = false;
            }else{
                if(next % (increment*10) == 0) {
                    ranges.push(next);
                    increment = higher ? increment*10 : increment/10;
                }
            }
            if(!higher && increment < 10) {
                break;
            }
        }

        ranges.push(to + 1);

        var regex = '/^(?:';

        for(var i = 0; i < ranges.length - 1; i++) {
            var str_from = ranges[i];
            str_from = str_from.toString();
            var str_to = ranges[i + 1] - 1;
            str_to = str_to.toString();
            for(var j = 0; j < str_from.length; j++) {
                if(str_from[j] == str_to[j]) {
                    regex += str_from[j];
                }
                else {
                    regex += "[" + str_from[j] + "-" + str_to[j] + "]";
                }
            }
            regex += "|";
        }

        return regex.substr(0, regex.length - 1 ) + ')$/';
    }

Is there a reason this has to be a regex? Can't you do something like this:

    if ($number >= 256 && $number <= 321) {
        // do something
    }
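When the value arrives as text (as from a form field), the same plain comparison still works once the text has been parsed. A minimal C++17 sketch; `parse_int` and `in_range` are hypothetical names, and the sketch accepts only plain base-10 input (an optional leading minus, no `+`, no surrounding whitespace).

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the whole of `text` as a base-10 int.
// Returns std::nullopt for empty input, trailing characters, or overflow.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* const end = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(text.data(), end, value);
    if (ec != std::errc() || ptr != end) {
        return std::nullopt;
    }
    return value;
}

// The plain comparison suggested above, applied to text input.
bool in_range(std::string_view text, int lo, int hi) {
    const std::optional<int> value = parse_int(text);
    return value && *value >= lo && *value <= hi;
}
```

For example, `in_range("300", 256, 321)` is true, while `"3210"` and `"30x"` are rejected.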

Update:

There is a simple but ugly way to do this with range():

    function getRangeRegex($from, $to) {
        $range = implode('|', range($from, $to)); // returns: 256|257|...|321
        return $range;
    }
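The same brute-force idea in C++, as a sketch: `alternation_regex` is a hypothetical name, and it assumes from <= to and a small range. The `\b(?:...)\b` wrapper is an added design choice: without it, a search over text such as `3210` would be satisfied by its `321` substring.

```cpp
#include <cassert>
#include <string>

// Hypothetical brute-force builder: lists every value from..to as an
// alternative, like implode('|', range($from, $to)), then adds word boundaries.
std::string alternation_regex(int from, int to) {
    std::string body;
    for (int n = from; n <= to; ++n) {
        if (!body.empty()) {
            body += '|';
        }
        body += std::to_string(n);
    }
    return "\\b(?:" + body + ")\\b";
}
```

For 256-321 this is already 66 alternatives, and the pattern grows linearly with the width of the range, which is what the digit-by-digit constructions in the other answers avoid.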

This has already been done.

Check out this site. It contains a link to a python script that automatically generates this regular expression for you.


This answer is duplicated from this question. I also wrote this blog post.


Using Regular Expressions to Validate a Number Range

To be clear: when a simple if statement is enough, use it:

 if(num < -2055 || num > 2055) { throw new IllegalArgumentException("num (" + num + ") must be between -2055 and 2055"); } 

Using regular expressions to test number ranges is not recommended.

In addition, since regular expressions operate on strings, a number must first be converted to a string before it can be tested (the exception is when the number already is a string, for example when the user enters it at the console).

(To ensure that the string is a number, you can use org.apache.commons.lang3.math.NumberUtils#isNumber(s).)

Despite this, figuring out how to check ranges of numbers with regular expressions is interesting and instructive.

A one-number range

Rule: The number must be exactly 15.

The simplest range there is. The regex matches exactly this value:

 \b15\b 

Word boundaries are needed to avoid matching the 15 inside 8215242.
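The effect of the boundaries is easy to see with `std::regex_search`, which looks for a match anywhere in the text. A small C++ sketch; `contains_fifteen` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `text` contains 15 as a separate number rather than inside longer digits.
bool contains_fifteen(const std::string& text) {
    static const std::regex with_boundaries(R"(\b15\b)");
    return std::regex_search(text, with_boundaries);
}
```

Here `contains_fifteen("page 15 of 20")` is true and `contains_fifteen("8215242")` is false, whereas a bare `15` pattern would be found inside `8215242`.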

A two-number range

Rule: The number must be between 15 and 16. Three possible regexes:

    \b(15|16)\b
    \b1(5|6)\b
    \b1[5-6]\b
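Since the three forms are meant to be interchangeable, an exhaustive comparison over small numbers can confirm it. A C++ sketch; `forms_agree` is a hypothetical name and the 0..999 window is an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <regex>
#include <string>

// Checks that the three 15..16 patterns accept exactly 15 and 16 in 0..999.
bool forms_agree() {
    const std::array<std::regex, 3> forms = {
        std::regex(R"(\b(15|16)\b)"),
        std::regex(R"(\b1(5|6)\b)"),
        std::regex(R"(\b1[5-6]\b)"),
    };
    for (int n = 0; n <= 999; ++n) {
        const std::string text = std::to_string(n);
        const bool expected = (n == 15 || n == 16);
        for (const std::regex& re : forms) {
            if (std::regex_search(text, re) != expected) {
                return false;
            }
        }
    }
    return true;
}
```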

A range "mirrored" around zero

Rule: The number must be between -12 and 12.

Here is the regex for 0 through 12, positive only:

 \b(\d|1[0-2])\b 

Free-spaced:

    \b(         //The beginning of a word (or number), followed by either
       \d       // Any digit 0 through 9
    |           //Or
       1[0-2]   // A 1 followed by any digit between 0 and 2.
    )\b         //The end of a word

Making this work for both negative and positive numbers is as simple as adding an optional dash at the start:

 -?\b(\d|1[0-2])\b 
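With `std::regex_match`, which requires the whole string to match, the optional dash behaves as described. A C++ sketch; `matches_pm12` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The signed -12..12 pattern, applied so that the whole input must match.
bool matches_pm12(const std::string& text) {
    static const std::regex re(R"(-?\b(\d|1[0-2])\b)");
    return std::regex_match(text, re);
}
```

For example, `-12`, `0` and `12` are accepted, while `-13` and `13` are not.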

(This assumes there are no invalid characters before the dash.)

To forbid negative numbers, a negative lookbehind is necessary:

 (?<!-)\b(\d|1[0-2])\b 

Leaving out the lookbehind would cause the 11 in -11 to match. (The first example in this post should have this added.)
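C++'s `std::regex` (its ECMAScript grammar) has no lookbehind, so this sketch shows the problem and one workaround instead. The workaround, which requires the start of the text or a character other than `-` before the number, is a swapped-in technique rather than the lookbehind itself: unlike a lookbehind, it consumes that preceding character. `has_nonnegative_0_to_12` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Emulates (?<!-)\b(\d|1[0-2])\b: the number must start the text or follow a
// character other than '-'. That preceding character becomes part of the match.
bool has_nonnegative_0_to_12(const std::string& text) {
    static const std::regex guarded(R"((?:^|[^-])\b(\d|1[0-2])\b)");
    return std::regex_search(text, guarded);
}
```

The unguarded `\b(\d|1[0-2])\b` finds `11` inside `-11`; the guarded version does not, but still finds it in `11` or `x 11`.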

Note: \d versus [0-9]

To be compatible with all regex flavors, every \d should be changed to [0-9]. For example, .NET considers non-ASCII digits, such as those of other scripts, to be legal values for \d. Except in the last example, it is left as \d for brevity.

(Thanks to TimPietzcker at /fooobar.com / ... )

Three digits, with all but the first digit equal to zero

Rule: Must be between 0 and 400.

Possible regex:

 (?<!-)\b([1-3]?\d{1,2}|400)\b 
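The pattern can be checked with `std::regex_match`. Because C++'s `std::regex` has no lookbehind, this sketch drops `(?<!-)` and relies on `regex_match` requiring the whole string to match, which already rejects a leading `-`. `matches_0_to_400` is a hypothetical name. One thing such a check reveals is that leading zeros, such as `05`, are also accepted.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The 0..400 pattern without (?<!-); regex_match must consume the whole text.
bool matches_0_to_400(const std::string& text) {
    static const std::regex re(R"(\b([1-3]?\d{1,2}|400)\b)");
    return std::regex_match(text, re);
}
```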

Free-spaced:

    (?<!-)          //Something not preceded by a dash
    \b(             //Word-start, followed by either
       [1-3]?       // No digit, or the digit 1, 2, or 3
       \d{1,2}      // Followed by one or two digits (between 0 and 9)
    |               //Or
       400          // The number 400
    )\b             //Word-end

Another possibility, which should never be used:

 \b(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|210|211|212|213|214|215|216|217|218|219|220|221|222|223|224|225|226|227|228|229|230|231|232|233|234|235|236|237|238|239|240|241|242|243|244|245|246|247|248|249|250|251|252|253|254|255|256|257|258|259|260|261|262|263|264|265|266|267|268|269|270|271|272|273|274|275|276|277|278|279|280|281|282|283|284|285|286|287|288|289|290|291|292|293|294|295|296|297|298|299|300|301|302|303|304|305|306|307|308|309|310|311|312|313|314|315|316|317|318|319|320|321|322|323|324|325|326|327|328|329|330|331|332|333|334|335|336|337|338|339|340|341|342|343|344|345|346|347|348|349|350|351|352|353|354|355|356|357|358|359|360|361|362|363|364|365|366|367|368|369|370|371|372|373|374|375|376|377|378|379|380|381|382|383|384|385|386|387|388|389|390|391|392|393|394|395|396|397|398|399|400)\b 

Final example: four digits, mirrored around zero, that don't end with zeros.

Rule: Must be between -2055 and 2055.

This is from a question on Stack Overflow.

Regex:

 -?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b 

Free-spaced:

    -?                  //Optional dash
    \b(                 //Followed by word boundary, followed by either of the following
       20(              // "20", followed by either
          5[0-5]        // A "5" followed by a digit 0-5
       |                // or
          [0-4][0-9]    // A digit 0-4, followed by any digit
       )
    |                   //OR
       1?[0-9]{1,3}     // An optional "1", followed by one through three digits (0-9)
    )\b                 //Followed by a word boundary.
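As a cross-check of this pattern, here is a C++ sketch that applies it with `std::regex_match` (the name `matches_pm2055` is hypothetical). Comparing it against plain integer comparison over, say, -3000..3000 is a quick way to confirm that it accepts exactly -2055 through 2055.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The -2055..2055 pattern, applied so that the whole string must match.
bool matches_pm2055(const std::string& text) {
    static const std::regex re(R"(-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b)");
    return std::regex_match(text, re);
}
```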

Here is a visual representation of this regular expression:

[Image: visual representation of the regular expression]

And here you can try it yourself: Debuggex demo

(Thanks to PlasmaPower on Stack Overflow for help with debugging.)

Final note

Depending on what you are capturing, all subgroups should probably be made non-capturing. For example, this:

 (-?\b(?:20(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b) 

Instead of this:

 -?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b 
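The difference shows up in what group 1 holds. The following C++ sketch uses `std::smatch`; the names `group_one`, `kOriginal` and `kNonCapturing` are hypothetical. With the second pattern, group 1 of `-300` is just `300`, because the dash sits outside every group. With the first (non-capturing) version, group 1 is the whole signed number `-300`, and it is the only capturing group.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The two versions of the -2055..2055 pattern shown above.
const std::string kOriginal = R"(-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b)";
const std::string kNonCapturing = R"((-?\b(?:20(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b))";

// Returns capture group 1 when `pattern` matches the whole text, or "" otherwise.
std::string group_one(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (std::regex_match(text, m, std::regex(pattern))) {
        return m[1].str();
    }
    return "";
}
```

`std::regex::mark_count()` reports 2 capturing groups for the original and 1 for the non-capturing version.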

Java implementation example

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.commons.lang.math.NumberUtils;

    /**
     * <P>Confirm a user-input number is a valid number by reading a string and
     * testing that it is numeric before converting it to an int. This loops
     * until a valid number is provided.</P>
     *
     * <P>{@code java UserInputNumInRangeWRegex}</P>
     **/
    public class UserInputNumInRangeWRegex {
        public static final void main(String[] ignored) {
            int num = -1;
            boolean isNum = false;
            int iRangeMax = 2055;

            // "": Dummy string, to reuse the matcher
            Matcher mtchrNumNegThrPos = Pattern.compile("-?\\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\\b").matcher("");

            // One Scanner for all reads, so input it has already buffered is not lost
            Scanner scanner = new Scanner(System.in);
            do {
                System.out.print("Enter a number between -" + iRangeMax + " and " + iRangeMax + ": ");
                String strInput = scanner.next();
                if (!NumberUtils.isNumber(strInput)) {
                    System.out.println("Not a number. Try again.");
                } else if (!mtchrNumNegThrPos.reset(strInput).matches()) {
                    System.out.println("Not in range. Try again.");
                } else {
                    // Safe to convert
                    num = Integer.parseInt(strInput);
                    isNum = true;
                }
            } while (!isNum);

            System.out.println("Number: " + num);
        }
    }

Output

    [C:\java_code\]java UserInputNumInRangeWRegex
    Enter a number between -2055 and 2055: tuhet
    Not a number. Try again.
    Enter a number between -2055 and 2055: 283837483
    Not in range. Try again.
    Enter a number between -2055 and 2055: -200000
    Not in range. Try again.
    Enter a number between -2055 and 2055: -300
    Number: -300

I converted Bart Kiers's answer to C++. The function takes two integers as input and generates a regular expression for a range of numbers.

    #include <iostream>
    #include <string>
    #include <vector>

    std::string regex_range(int from, int to);

    int main(int argc, char **argv)
    {
        std::string regex = regex_range(1, 100);
        std::cout << regex << std::endl;
        return 0;
    }

    std::string regex_range(int from, int to) //Credit: Bart Kiers 2011
    {
        if (from < 0 || to < 0) {
            std::cout << "Negative values not supported. Exiting." << std::endl;
            return "";  // empty string signals an invalid range
        }
        if (from > to) {
            std::cout << "Invalid range, from > to. Exiting." << std::endl;
            return "";
        }
        std::vector<int> ranges;
        ranges.push_back(from);
        int increment = 1;
        int next = from;
        bool higher = true;
        while (true) {
            next += increment;
            if (next + increment > to) {
                if (next <= to) {
                    ranges.push_back(next);
                }
                increment /= 10;
                higher = false;
            } else if (next % (increment * 10) == 0) {
                ranges.push_back(next);
                increment = higher ? increment * 10 : increment / 10;
            }
            if (!higher && (increment < 10)) {
                break;
            }
        }
        ranges.push_back(to + 1);
        std::string regex("^(?:");
        for (std::size_t i = 0; i + 1 < ranges.size(); i++) {
            std::string str_from = std::to_string(ranges.at(i));
            std::string str_to = std::to_string(ranges.at(i + 1) - 1);
            for (std::size_t j = 0; j < str_from.length(); j++) {
                // Append one character at a time (a single digit or a [a-b] class)
                if (str_from.at(j) == str_to.at(j)) {
                    regex += str_from.at(j);
                } else {
                    regex += '[';
                    regex += str_from.at(j);
                    regex += '-';
                    regex += str_to.at(j);
                    regex += ']';
                }
            }
            regex += '|';
        }
        regex = regex.substr(0, regex.length() - 1);
        regex.append(")$");
        return regex;
    }

Source: https://habr.com/ru/post/892847/

