Write an efficient string replacement function?

For training purposes, I tried to come up with a string replacement function, here is what I have:

let string_replace where what replacement = let wlength = (String.length what - 1) in let length = (String.length where - 1) in let rec loop i shift = let [shift, i] = if wlength == shift then (String.blit replacement 0 where (i - wlength) (wlength + 1); [0, i]) else if (String.get where i == String.get what shift) then [shift + 1, i] else [0, i - shift] in if i < length then loop (i + 1) shift in loop 0 0; where;; 

Now there is a test here: http://raid6.com.au/~onlyjob/posts/arena/ , and therefore I wanted to see how I am doing relatively. And so, I wrote a test:

 let test = let initial = "abcdefgh" ^ "efghefgh" in let initial_time = Sys.time() in let initial_length = String.length initial in let imax = (1024 / (String.length initial)) * 1024 * 4 in let rec loop si = if i < imax then let ns = string_replace (s ^ initial) "efgh" "____" in let sl = initial_length * i in if (sl mod 1024) * 256 == 0 then Printf.printf "%fs | %d Kb |\n" (Sys.time() -. initial_time) (sl / 1024); loop ns (i + 1) in loop initial 0;; test;; 

I tried to run the JavaScript test as close as possible to be able to compare the results (I did some formatting here, as well as to save you from searching):

 function test() { var str = "abcdefgh" + "efghefgh", imax = 1024 / str.length * 1024 * 4, time = new Date(), gstr = "", i = 0, lngth; console.log("exec.tm.sec\tstr.length"); while (i++ < imax + 1000) { gstr += str; gstr = gstr.replace(/efgh/g, "____"); lngth = str.length * i; if (lngth % 1024 * 256 === 0) { var curdate = new Date(); console.log((curdate.getTime() - time.getTime()) / 1000 + "sec\t\t" + lngth / 1024 + "kb"); } } } 

Here are the results I get ...

 | JavaScript | OCaml | String size | |------------+--------------+-------------| | 0.78 s | 14.576784 s | 256 Kb | | 3.266 s | 58.468111 s | 512 Kb | | 7.618 s | 132.954788 s | 768 Kb | | 13.849 s | 238.793697 s | 1024 Kb | | 46.243 s | 379.175356 s | 1280 Kb | | 86.046 s | 550.838260 s | 1536 Kb | 

So my question is: is there anything special in my string replacement function? Or can this speed be expected? I compiled the code using ocamlopt .


Here is my other attempt:

 let string_index_of where what = let length = (String.length where - 1) in let pattern = (1 lsl (String.length what + 1)) - 1 in let rec loop ij mask = if i == length + 1 then (-1) else if mask == pattern then i - j else if (where.[i]) == (what.[j]) then loop (i + 1) (j + 1) ((mask lsl 1) lor 1) else loop (i + 1 - j) 0 1 in loop 0 0 1;; let test = let initial = "abcdefgh" ^ "efghefgh" in let replacement = "____" in let replacement_length = String.length replacement in let initial_time = Sys.time() in let initial_length = String.length initial in let imax = (1024 / (String.length initial)) * 1024 * 4 in let rec loop si = if i < imax then let concatenated = s ^ "efgh" in let sl = initial_length * i in let rec replace_loop st = let index = string_index_of st "efgh" in if index >= 0 then ((String.blit replacement 0 st index replacement_length); replace_loop st) in replace_loop concatenated; if sl mod (1024 * 256) == 0 then Printf.printf "%fs | %d Kb |\n" (Sys.time() -. initial_time) (sl / 1024); loop concatenated (i + 1) in loop initial 0;; test;; 

I also made a large table (including some other languages ​​for comparison):

 | JavaScript V8 | ocamlopt | ocamlopt bitup | SBCL | C gcc 4 -O3 | String size | |---------------+--------------+----------------+-----------+-------------+-------------| | 0.78 s | 14.576784 s | 4.615298 s | 17.463 s | 1 s | 256 Kb | | 3.266 s | 58.468111 s | 18.474191 s | 68.484 s | 4 s | 512 Kb | | 7.618 s | 132.954788 s | 41.673664 s | 153.99 s | 10 s | 768 Kb | | 13.849 s | 238.793697 s | 74.652651 s | 275.204 s | 17 s | 1024 Kb | | 46.243 s | 379.175356 s | 116.517287 s | 431.768 s | 27 s | 1280 Kb | | 86.046 s | 550.838260 s | 166.662663 s | 624.559 s | 38 s | 1536 Kb | 

However, I would hope to do better than that.

+3
source share
2 answers

It's hard for me to compare the performance of Javascript, most engines (js) use ropes as an internal data structure for strings ( https://en.wikipedia.org/wiki/Rope_%28data_structure%29 ).

So, every time you execute a substring, concat, or any function that should create a new line (or internally called), ocaml creates a new line, while javascript uses ropes to refer to the previous one.

If you want to see it, check every time you use create: ( https://github.com/ocaml/ocaml/blob/trunk/stdlib/string.ml ). I wanted to show you how the V8 optimizes its reuse of string fragments, but cannot find it easily, so it will be used as an exercise for you.

Hope this helps to understand the difference and why javscript still works better than C when it doesn't need to handle a lot of ropes.

+2
source

Your string search has complex complexity, you should try a more efficient algorithm for finding strings ( here is a list , I recommend KMP )

You seem to be new to OCaml, so here are some differences from the language you're used to:

  • operator for checking structural equality = , not == , which checks physical equality (there is no difference for int , but this will cause some problems with other types).
  • a tuple is created with ( and ) , not [ and ] , which are for lists, so [shift, i] is a list containing one element, which is a pair. And I think Ocaml complains about this unrecognizable pattern matching.
  • String.get si can be written s.[i] .
+3
source

Source: https://habr.com/ru/post/949365/


All Articles