Using zip to find the difference between two lines in Scala

This is a continuation of my previous question .

I noticed that if I manipulate two lines (for example, to get the longest common prefix, calculate the difference between two lines, etc.), I tend to use zip (as in (s1 zip s2)... ).

It looks good, but probably inefficient (compared to imperative code). Is it correct? Perhaps if performance matters, I should use an imperative algorithm.

Now I am trying to find if the two lines are different. Could you use zip for this?

+4
source share
4 answers

It is less efficient to use (s1,s2).zipped than (s1 zip s2) , since you do not need to create tuples; instead, you create functions that take two arguments.

Even better for efficiency, but not ease of use, is to define your own custom folder with a chain:

 abstract class StrFold[@specialized T](var result: T) { def apply(c1: Char, c2: Char): Unit def done: Boolean } def strfold[@specialized T](s1: String, s2: String)(folder: StrFold[T]): T = { var i = 0 val L = math.min(s1.length, s2.length) while (i < L) { folder(s1.charAt(i), s2.charAt(i)) if (folder.done) return folder.result i += 1 } folder.result } 

Now, if you want to find the length of the longest common prefix, you

 class LcpFind extends StrFold(0) { var done = false def apply(c1: Char, c2: Char) { if (c1 == c2) result += 1 else done = true } } def lcp(s1: String, s2: String) = strfold(s1,s2)(new LcpFind) 

This is a fair job β€” perhaps almost the same as just writing an imperative or recursive function code to compute LCP without a fold-with-escape clause.

But it should be almost as fast as writing down material every time. (The only penalty should be to create a (tiny) LcpFind object every time, and if you really wanted to be able to reuse it by resetting result to zero between calls or you could change StrFold and StrFold to implement and call the reset method by changing the input a parameter called zero and having a separate result method.)

+2
source

Yes, it will be less effective for two reasons.

  • You create a list of character pairs of the same length as the shorter string. This will take up significantly more space than the original two lines. The usual ways to find the longest common prefix or to check if lines are different from one character do not require additional memory.

  • When searching for LCP, you don’t have to iterate over whole lines. Since zip really needs to do this, it will obviously be less efficient to pin them first.

But this does not mean that you need to use imperative algorithms: a normal tail recursive functional algorithm will work just as well.

+2
source

It looks good, but probably inefficient (compared to the imperative code).

It copies both inputs, so the effective O (n) space when searching for the longest common prefix can be the time in O (1) memory. In addition, O (n) time is required, while O (len (LCP)) time will be sufficient (although O (n) time is in the worst case) and it allocates memory; it is an expensive operation.

Now I'm trying to find if two strings differ in only one char. Could you use zip for this?

Depends on how many lines and how important the performance is. You can probably use zip for the first shot.

+2
source

You can use view to have a lazy rating when executing your zip (so that it zips up only what you are doing).

+1
source

Source: https://habr.com/ru/post/1382887/


All Articles