You can use the regex engine to efficiently match your keys against the input text and replace them.
First join all of your keys with the alternation operator (|), for example:
var keys = "keyA|keyB|keyC";
Then compile the pattern:
Pattern pattern = Pattern.compile("(" + keys + ")");
Create a matcher for the input text:
Matcher matcher = pattern.matcher(text);
Now apply the regular expression in a loop to find every key in the text, and use appendReplacement (the matcher's in-place replacement method) to replace each one with its corresponding value:
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    matcher.appendReplacement(sb, dictionary.get(matcher.group(0)));
}
matcher.appendTail(sb);
And there you go.
This may seem a bit naive at first, but the regex engine is highly optimized for exactly this task, and since the Java regex implementation also supports in-place replacement, the whole thing works very well.
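Putting the steps above together, a minimal self-contained sketch could look like the one below. The class name MultiReplaceSketch, the method replaceKeys and the sample dictionary are made up for illustration. It assumes the keys contain no regex metacharacters, and wrapping each value in Matcher.quoteReplacement is my own addition so that a "$" or "\" in a value is inserted literally instead of being read as a group reference.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiReplaceSketch {

    // Replaces every dictionary key found in text with its value in a single pass.
    // Assumption: keys contain no regex metacharacters.
    public static String replaceKeys(String text, Map<String, String> dictionary) {
        if (dictionary.isEmpty()) {
            return text; // an empty alternation would match the empty string everywhere
        }
        // Join the keys with the alternation operator: "keyA|keyB|keyC"
        String keys = String.join("|", dictionary.keySet());
        Pattern pattern = Pattern.compile("(" + keys + ")");
        Matcher matcher = pattern.matcher(text);

        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String value = dictionary.get(matcher.group(0));
            // quoteReplacement keeps '$' and '\' in the value literal (my addition)
            matcher.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("keyA", "1");
        dictionary.put("keyB", "2");
        dictionary.put("keyC", "$3");
        System.out.println(replaceKeys("keyA, keyB and keyC", dictionary)); // prints "1, 2 and $3"
    }
}
```

If the same dictionary is applied to many texts, it is probably worth compiling the pattern once and reusing it, since compilation is the part that does not depend on the input.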
I ran a small test, applying a list of color names (~200 different color names), as defined in /usr/share/X11/rgb.txt, against Crime and Punishment by Fyodor Dostoevsky, which I downloaded from Project Gutenberg (~1 MB in size). Using the technique described, it ran about 12 times faster than StringUtils.replaceEach: 900 ms vs 10700 ms for the latter (not counting the compilation time of the pattern).
PS: if your keys may contain characters that are special in a regex, e.g. ^ $ ( ), you must pass each key through Pattern.quote() before adding it to your pattern.
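As a rough sketch of that quoting step, each key can be passed through Pattern.quote before the keys are joined. The class name QuotedKeysSketch, the helper compileKeys and the sample keys are hypothetical and only there for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class QuotedKeysSketch {

    // Builds "(\Qkey1\E|\Qkey2\E|...)" so every key is matched literally.
    public static Pattern compileKeys(List<String> keys) {
        String alternation = keys.stream()
                .map(Pattern::quote)              // escape regex metacharacters in each key
                .collect(Collectors.joining("|"));
        return Pattern.compile("(" + alternation + ")");
    }

    public static void main(String[] args) {
        Pattern pattern = compileKeys(Arrays.asList("$(var)", "a.b"));
        System.out.println(pattern.matcher("cost: $(var)").find()); // prints "true"
        System.out.println(pattern.matcher("axb").find());          // prints "false": the dot is literal
    }
}
```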
Sidenote:
This method tries the keys in the order they appear in the pattern's alternation list. For example, "a => 1 | b => 2 | aa => 3" applied to "welcome to the bazaar" results in "welcome to the 21z11r" rather than "welcome to the 21z3r". If you want the longest match, sort your keys in reverse lexicographic order (or by descending length) before adding them to the pattern (i.e. "b|aa|a"), so that a longer key always comes before any key that is its prefix. This also applies to your original StringUtils.replaceEach() method.
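To make the ordering effect concrete, here is a small sketch that runs the sidenote's three keys in both orders. The class name LongestMatchSketch and the method replaceInOrder are made up for this example. Note that the b key is replaced as well, so the word "bazaar" comes out as "21z11r" in the first case and "21z3r" in the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LongestMatchSketch {

    // Replaces keys in text, trying alternatives in exactly the order given.
    public static String replaceInOrder(String text, List<String> orderedKeys,
                                        Map<String, String> dictionary) {
        if (orderedKeys.isEmpty()) {
            return text;
        }
        String alternation = orderedKeys.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher matcher = Pattern.compile("(" + alternation + ")").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String value = dictionary.get(matcher.group(0));
            matcher.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("a", "1");
        dictionary.put("b", "2");
        dictionary.put("aa", "3");
        String text = "welcome to the bazaar";

        // Keys in the order given: "a" wins before "aa" is ever tried.
        List<String> asGiven = Arrays.asList("a", "b", "aa");
        System.out.println(replaceInOrder(text, asGiven, dictionary)); // welcome to the 21z11r

        // Reverse lexicographic order ("b", "aa", "a") puts longer keys before their prefixes.
        List<String> sorted = new ArrayList<>(asGiven);
        sorted.sort(Comparator.reverseOrder());
        System.out.println(replaceInOrder(text, sorted, dictionary)); // welcome to the 21z3r
    }
}
```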
Update:
The method above should work well for the problem as stated in the original question, i.e. when the size of the replacement map is (relatively) small compared to the size of the input data.
If instead you have a very long dictionary applied to a short text, the linear search performed by StringUtils.replaceEach() may be faster.
I ran an additional benchmark to illustrate this, using a dictionary of 10,000 randomly selected words (4+ characters):
cat /usr/share/dict/words | grep -E "^.{4,}$" | shuf | head -10000
against extracts of 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144 and 524288 characters from the same text, Crime and Punishment.
The results are shown below:
text      Ta(ms)   Tb(ms)   Ta/Tb (speed-up)
--------------------------------------------
1024          99      240   0.4125
2048          43      294   0.1462585
4096         113      721   0.1567267
8192         128     1329   0.0963130
16384        320     2230   0.1434977
32768       2052     3708   0.5533981
65536       6811     6650   1.0242106
131072     32422    12663   2.5603728
262144    150655    23011   6.5470862
524288    614634    29874   20.574211
- Ta - StringUtils.replaceEach() time
- Tb - matcher.appendReplacement() time
Please note that the pattern string is 135537 bytes long (all 10,000 keys joined together).