I wrote a small benchmark to measure performance:
- A NOP method (to get an idea of the baseline loop overhead);
- The original method provided by the OP;
- RegExp;
- Compiled RegExp;
- The version provided by @maraca (without toLowerCase and substring);
- A "fastIsHex" version (based on a switch), which I added just for fun.
The test machine was configured as follows:
- JVM: Java(TM) SE runtime (version 1.8.0_101-b13)
- Processor: Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz
And here are the results that I got for the original test string "0x123fa" and 10,000,000 iterations:
    Method "NOP" => #10000000 iterations in 9ms
    Method "isHexadecimal (OP)" => #10000000 iterations in 300ms
    Method "RegExp" => #10000000 iterations in 4270ms
    Method "RegExp (Compiled)" => #10000000 iterations in 1025ms
    Method "isHexadecimal (maraca)" => #10000000 iterations in 135ms
    Method "fastIsHex" => #10000000 iterations in 107ms
As you can see, the OP's original method is faster than the RegExp method (at least when using the RegExp implementation provided by the JDK).
Benchmark code (for reference):
    // Requires: import java.util.regex.Pattern;
    // longHexText (the string under test) is not declared in this snippet;
    // it is assumed to be a static String field of the enclosing class.
    public static void main(String[] argv) throws Exception {
        // Number of iterations
        final int ITERATIONS = 10000000;

        // NOP
        benchmark(ITERATIONS, "NOP", () -> nop(longHexText));

        // isHexadecimal
        benchmark(ITERATIONS, "isHexadecimal (OP)", () -> isHexadecimal(longHexText));

        // Un-compiled regexp
        benchmark(ITERATIONS, "RegExp", () -> longHexText.matches("0x[0-9a-fA-F]+"));

        // Pre-compiled regexp
        final Pattern pattern = Pattern.compile("0x[0-9a-fA-F]+");
        benchmark(ITERATIONS, "RegExp (Compiled)", () -> {
            pattern.matcher(longHexText).matches();
        });

        // isHexadecimal (maraca)
        benchmark(ITERATIONS, "isHexadecimal (maraca)", () -> isHexadecimalMaraca(longHexText));

        // FastIsHex
        benchmark(ITERATIONS, "fastIsHex", () -> fastIsHex(longHexText));
    }

    public static void benchmark(int iterations, String name, Runnable block) {
        // Start time
        long stime = System.currentTimeMillis();

        // Benchmark
        for (int i = 0; i < iterations; i++) {
            block.run();
        }

        // Done
        System.out.println(
            String.format("Method \"%s\" => #%d iterations in %dms", name, iterations, (System.currentTimeMillis() - stime))
        );
    }
NOP Method:
    public static boolean nop(String value) {
        return true;
    }
fastIsHex method:
    public static boolean fastIsHex(String value) {
        // Value must be at least 4 characters long (0x00)
        if (value.length() < 4) {
            return false;
        }

        // Compute where the data starts (skip an optional leading '-')
        int start = ((value.charAt(0) == '-') ? 1 : 0) + 2;

        // Check prefix
        if (value.charAt(start - 2) != '0' || value.charAt(start - 1) != 'x') {
            return false;
        }

        // Verify data
        for (int i = start; i < value.length(); i++) {
            switch (value.charAt(i)) {
                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
                case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
                    continue;
                default:
                    return false;
            }
        }
        return true;
    }
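One caveat worth checking: fastIsHex and the regular expression do not accept exactly the same inputs, because fastIsHex allows a leading minus sign and requires at least four characters. Both agree on the strings used in the benchmark, so the timings stay comparable. The sketch below compares the two on a few sample strings; the class name FastIsHexCheck, the helper regexIsHex and the sample inputs are my own illustrative choices, not part of the benchmark.

```java
public class FastIsHexCheck {
    // Copy of the fastIsHex method from this answer, behavior unchanged.
    public static boolean fastIsHex(String value) {
        if (value.length() < 4) {
            return false;
        }
        int start = ((value.charAt(0) == '-') ? 1 : 0) + 2;
        if (value.charAt(start - 2) != '0' || value.charAt(start - 1) != 'x') {
            return false;
        }
        for (int i = start; i < value.length(); i++) {
            switch (value.charAt(i)) {
                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
                case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
                    continue;
                default:
                    return false;
            }
        }
        return true;
    }

    // Hypothetical helper wrapping the regular expression used in the benchmark.
    public static boolean regexIsHex(String value) {
        return value.matches("0x[0-9a-fA-F]+");
    }

    public static void main(String[] args) {
        // Illustrative inputs: valid, invalid digit, short value, negative value.
        String[] samples = {"0x123fa", "0x12g4", "0x1", "-0x1f"};
        for (String s : samples) {
            System.out.println(s + " fastIsHex=" + fastIsHex(s) + " regex=" + regexIsHex(s));
        }
    }
}
```

The two checks disagree on "0x1" (too short for fastIsHex) and on "-0x1f" (the regex has no optional sign), so which one is "correct" depends on the input format you actually need to accept.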
So the answer to the question is no: for short strings and the task at hand, RegExp is not faster.
With longer strings the balance is quite different. Below are the results for a long hexadecimal string (8192) generated with:
hexdump -n 8196 -v -e '/1 "%02X"' /dev/urandom
and 10,000 iterations:
    Method "NOP" => #10000 iterations in 2ms
    Method "isHexadecimal (OP)" => #10000 iterations in 1512ms
    Method "RegExp" => #10000 iterations in 1303ms
    Method "RegExp (Compiled)" => #10000 iterations in 1263ms
    Method "isHexadecimal (maraca)" => #10000 iterations in 553ms
    Method "fastIsHex" => #10000 iterations in 530ms
As you can see, the handwritten methods (maraca's and my fastIsHex) still beat RegExp, but the original method does not (because of the substring() and toLowerCase() calls).
Sidenote:
This test is very simple and only covers the "worst case" scenario (i.e. a fully valid string, which has to be scanned to the end). Real-life results with mixed input lengths and a different valid/invalid ratio can be completely different.
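To get a feel for such mixed workloads, one could generate test inputs with varying lengths and a configurable share of invalid strings. The following is only a sketch under my own assumptions: the class MixedInputSketch, the generate method and its parameters are hypothetical, and none of the numbers in this answer were produced with it.

```java
import java.util.Random;

public class MixedInputSketch {
    private static final char[] HEX = "0123456789abcdefABCDEF".toCharArray();

    // Builds `count` strings of the form 0x<digits> with 1..maxDigits digits.
    // Roughly `invalidRatio` of them get one digit replaced by 'g' (not a hex digit).
    public static String[] generate(int count, int maxDigits, double invalidRatio, long seed) {
        Random rnd = new Random(seed); // fixed seed keeps runs repeatable
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            int digits = 1 + rnd.nextInt(maxDigits);
            StringBuilder sb = new StringBuilder("0x");
            for (int d = 0; d < digits; d++) {
                sb.append(HEX[rnd.nextInt(HEX.length)]);
            }
            if (rnd.nextDouble() < invalidRatio) {
                // Corrupt one digit at a random position after the "0x" prefix.
                sb.setCharAt(2 + rnd.nextInt(digits), 'g');
            }
            out[i] = sb.toString();
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = generate(1000, 64, 0.25, 42L);
        int valid = 0;
        for (String s : inputs) {
            if (s.matches("0x[0-9a-fA-F]+")) {
                valid++;
            }
        }
        System.out.println(valid + " of " + inputs.length + " generated inputs are valid");
    }
}
```

Feeding such an array to each method in turn, instead of one fixed string, would show how early rejection of invalid input changes the picture.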
Update:
I also tried a char[] array version:
    char[] chars = value.toCharArray();
    for (idx += 2; idx < chars.length; idx++) {
        ...
    }
and it was even a little slower than the charAt(i) version:
    Method "isHexadecimal (maraca) char[] array version" => #10000000 iterations in 194ms
    Method "fastIsHex, char[] array version" => #10000000 iterations in 164ms
My assumption is that this is due to toCharArray() copying the string's contents into a newly allocated char[] on every call.
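The copying itself is easy to observe: String.toCharArray() returns a fresh array on each call, so changing it never affects the string. A minimal sketch (the class name ToCharArrayCopyDemo and the helper returnsFreshCopy are illustrative, not part of the benchmark):

```java
public class ToCharArrayCopyDemo {
    // Returns true if two consecutive toCharArray() calls yield distinct arrays.
    public static boolean returnsFreshCopy(String value) {
        char[] first = value.toCharArray();
        char[] second = value.toCharArray();
        return first != second;
    }

    // Mutates the array returned by toCharArray() and reports whether
    // the original string is still equal to an untouched copy of itself.
    public static boolean stringUnaffectedByArrayWrite(String value) {
        String before = new String(value);
        char[] chars = value.toCharArray();
        if (chars.length > 0) {
            chars[0] = (char) (chars[0] + 1); // change the copy only
        }
        return value.equals(before);
    }

    public static void main(String[] args) {
        String value = "0x123fa";
        System.out.println("fresh copy per call: " + returnsFreshCopy(value));
        System.out.println("string unaffected:   " + stringUnaffectedByArrayWrite(value));
    }
}
```

This only shows that an allocation and a copy happen per call; whether that fully explains the timing difference above is still my guess.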
Update (#2):
I ran an additional test with the 8k string and 100,000 iterations to see whether there is any real speed difference between the "maraca" and "fastIsHex" methods, and also normalized them to use exactly the same precondition code:
Run #1
    Method "isHexadecimal (maraca) *normalized" => #100000 iterations in 5341ms
    Method "fastIsHex" => #100000 iterations in 5313ms
Run #2
    Method "isHexadecimal (maraca) *normalized" => #100000 iterations in 5313ms
    Method "fastIsHex" => #100000 iterations in 5334ms
I.e., the difference in speed between the two methods is marginal at best and is probably within measurement error (I ran this on my workstation, not in a dedicated clean test environment).
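If you want to reduce this kind of noise without a dedicated machine, one common approach is to run some untimed warm-up iterations first (so the JIT has compiled the code), measure with System.nanoTime(), and consume the results so the calls cannot be optimized away. The following is a minimal sketch under those assumptions; the names WarmupBenchmarkSketch, timeNanos and sink are hypothetical, and this is not the harness that produced the numbers above. A dedicated tool such as JMH handles these issues far more rigorously.

```java
import java.util.function.BooleanSupplier;

public class WarmupBenchmarkSketch {
    // Results are accumulated here so the JIT cannot treat the calls as dead code.
    public static int sink;

    // Runs `warmup` untimed iterations, then times `iterations` calls in nanoseconds.
    public static long timeNanos(int warmup, int iterations, BooleanSupplier block) {
        for (int i = 0; i < warmup; i++) {
            if (block.getAsBoolean()) {
                sink++;
            }
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (block.getAsBoolean()) {
                sink++;
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String text = "0x123fa";
        long ns = timeNanos(100_000, 1_000_000, () -> text.matches("0x[0-9a-fA-F]+"));
        System.out.println("RegExp: " + (ns / 1_000_000) + "ms (sink=" + sink + ")");
    }
}
```

Repeating each measurement several times and comparing the spread gives a rough idea of whether a difference of a few milliseconds means anything.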