Writing more "functional" code in Scala using immutable collections

I'm porting an algorithm from Java to Scala that performs a range search on a VP tree. Briefly, each node in the tree has coordinates in space and a radius: points within this radius can be found in the left subtree, while points outside it are found in the right subtree. A range search tries to find all objects in the tree within a given distance of the query object.

In Java, I passed the function an ArrayList in which it accumulated results, recursing down one or both subtrees as needed. Here's the direct port to Scala:

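```scala
def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double,
           results: collection.mutable.Set[TObject]): Unit = {
  val dist = distance(query, node.point)
  if (dist < radius) results += node.obj
  // Left subtree: points within node.radius of node.point.
  if (node.left != null && dist <= radius + node.radius)
    search(node.left, query, radius, results)
  // Right subtree: points at least node.radius away; by the triangle
  // inequality it can only hold matches if dist + radius >= node.radius.
  if (node.right != null && dist + radius >= node.radius)
    search(node.right, query, radius, results)
}
```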
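To make the snippets in this thread concrete, here is a minimal runnable version of the mutable-accumulator search. Everything besides the search logic is an assumption for illustration only: the `VPNode` case class (using `null` for a missing child, as the question does), the 1-D `distance` metric, the `leaf` and `searchTree` helpers, and the small hand-built tree. The right-subtree test is written as `dist + radius >= node.radius`: the right subtree holds points at least `node.radius` from `node.point`, so by the triangle inequality it can only contain matches under that condition.

```scala
import scala.collection.mutable

// Hypothetical node type (the question does not show VPNode); null = no child.
final case class VPNode[P, O](point: P, obj: O, radius: Double,
                              left: VPNode[P, O], right: VPNode[P, O])

// Assumed metric: 1-D points compared by absolute difference.
def distance(a: Double, b: Double): Double = math.abs(a - b)

// Mutable-accumulator search: every call adds its matches to the shared set.
def search(node: VPNode[Double, String], query: Double, radius: Double,
           results: mutable.Set[String]): Unit = {
  val dist = distance(query, node.point)
  if (dist < radius) results += node.obj
  if (node.left != null && dist <= radius + node.radius)
    search(node.left, query, radius, results)
  if (node.right != null && dist + radius >= node.radius)
    search(node.right, query, radius, results)
}

def leaf(p: Double, o: String): VPNode[Double, String] = VPNode(p, o, 0.0, null, null)

// Made-up data obeying the VP-tree invariant: left children lie strictly
// inside the parent's radius, right children at or beyond it.
val tree: VPNode[Double, String] = VPNode(5.0, "e", 3.0,
  VPNode(4.0, "d", 1.5, leaf(3.0, "c"), leaf(7.0, "g")),
  VPNode(8.0, "h", 1.5, leaf(9.0, "i"), leaf(1.0, "a")))

// Hypothetical helper: runs one query against the sample tree with a fresh set.
def searchTree(query: Double, radius: Double): mutable.Set[String] = {
  val results = mutable.Set.empty[String]
  search(tree, query, radius, results)
  results
}
```

For example, `searchTree(4.5, 1.0)` collects `"e"` and `"d"`, and `searchTree(7.6, 0.5)` finds `"h"` in the right subtree even though the query point lies inside the root's radius.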

Scala's default collection types are immutable, and I found it a little annoying to write collection.mutable. all the time, so I started looking into it. Using immutable collections seems to be recommended almost always, but I use this code to run millions of queries, and it seems to me that copying and concatenating the result collection would slow it down.
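Not every immutable operation copies, though. As a small illustration (the names below are made up), prepending to an immutable `List` with `::` allocates a single new cell and shares the existing list as its tail, so accumulating results this way costs O(1) per element:

```scala
val tail = List("b", "c")
// Prepend: one new cons cell whose tail points at the existing list.
val withHead = "a" :: tail

// The original list is unchanged, and the new list's tail is the very same
// object rather than a copy.
val shared: Boolean = withHead.tail eq tail
```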

Answers such as this one, for example, suggest that the problem needs to be approached more "functionally".

So what should I do to solve this problem in a more idiomatic Scala way? Ideally, I would like it to be as fast as the Java version, but I'm interested in the solutions regardless (and can always profile them to see whether it matters).

Note that I'm just starting to learn Scala (I figured I might as well cut my teeth on something useful), but I'm not new to functional programming: I've used Haskell before (although I don't think I'm good at it!).

2 answers

I suspect you would get good performance just by using the standard immutable List. All search does is check one node at a time, add the current element if it meets the criterion, and then recurse into both subtrees. So you can thread an immutable accumulator through the calls:

```scala
def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double,
           acc: List[TObject] = Nil): List[TObject] = {
  val dist = distance(query, node.point)
  // Prepend the current object if it lies within the query radius.
  val mid = if (dist < radius) node.obj :: acc else acc
  // Thread the accumulator through the left subtree...
  val midLeft =
    if (node.left != null && dist <= radius + node.radius)
      search(node.left, query, radius, mid)
    else mid
  // ...then through the right subtree (triangle-inequality bound).
  if (node.right != null && dist + radius >= node.radius)
    search(node.right, query, radius, midLeft)
  else midLeft
}
```

As far as I can tell, this only ever prepends to the head of the accumulator, which is a constant-time operation, so it should be fast.
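A self-contained sketch of this accumulator version follows. The `VPNode` case class over 1-D `Double` points with `String` objects, the `distance` metric, the `leaf` helper, and the sample tree are all hypothetical stand-ins, not taken from the question. Each match is prepended with `::`, so no result list is ever copied.

```scala
// Hypothetical node type; null marks a missing child, as in the question.
final case class VPNode[P, O](point: P, obj: O, radius: Double,
                              left: VPNode[P, O], right: VPNode[P, O])

// Assumed metric for 1-D points.
def distance(a: Double, b: Double): Double = math.abs(a - b)

// Immutable accumulator threaded through the recursion; matches are prepended.
def search(node: VPNode[Double, String], query: Double, radius: Double,
           acc: List[String] = Nil): List[String] = {
  val dist = distance(query, node.point)
  val mid = if (dist < radius) node.obj :: acc else acc
  val midLeft =
    if (node.left != null && dist <= radius + node.radius)
      search(node.left, query, radius, mid)
    else mid
  if (node.right != null && dist + radius >= node.radius)
    search(node.right, query, radius, midLeft)
  else midLeft
}

def leaf(p: Double, o: String): VPNode[Double, String] = VPNode(p, o, 0.0, null, null)

// Made-up tree satisfying the VP-tree invariant.
val tree: VPNode[Double, String] = VPNode(5.0, "e", 3.0,
  VPNode(4.0, "d", 1.5, leaf(3.0, "c"), leaf(7.0, "g")),
  VPNode(8.0, "h", 1.5, leaf(9.0, "i"), leaf(1.0, "a")))
```

The result comes back in reverse visiting order; if order matters to the caller, a single `reverse` at the end is enough.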

Also note that I think it is fine to use a mutable collection internally and return an immutable one to the caller:

```scala
def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double): Vector[TObject] = {
  import collection.immutable.{VectorBuilder => Builder}

  // The builder is mutable but never escapes; callers only see the Vector.
  def rec(n: VPNode[TPoint, TObject], acc: Builder[TObject]): Builder[TObject] = {
    val dist = distance(query, n.point)
    if (dist < radius) acc += n.obj
    if (n.left != null && dist <= radius + n.radius) rec(n.left, acc)
    if (n.right != null && dist + radius >= n.radius) rec(n.right, acc)
    acc
  }

  rec(node, new Builder[TObject]).result()
}
```

Here is what I would consider a more functional approach:

```scala
val emptySet = Set[TObject]()

def search(node: VPNode[TPoint, TObject], query: TPoint, radius: Double): Set[TObject] = {
  val dist = distance(query, node.point)
  val left = Option(node.left)                        // avoid nulls
    .filter(_ => dist <= radius + node.radius)        // do nothing if the predicate fails
    .fold(emptySet)(l => search(l, query, radius))    // otherwise continue the search
  val right = Option(node.right)
    .filter(_ => dist + radius >= node.radius)
    .fold(emptySet)(r => search(r, query, radius))
  left ++ right ++ (if (dist < radius) Set(node.obj) else emptySet)
}
```

Instead of passing a mutable.Set around to every search call, the search function returns a Set[TObject], which is then combined with the other sets. If you drew out the function calls, it would look as if the results from the nodes of your tree were being joined together (for the nodes within your radius).
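For readers new to the Option / filter / fold chain, here is the same idiom on a plain, possibly-null `String`; `lengthIfEnabled` is a made-up example function, not part of the answer's code.

```scala
// Option(s) is None when s is null; filter turns Some into None when the
// predicate fails; fold returns the default for None or applies the function.
def lengthIfEnabled(s: String, enabled: Boolean): Int =
  Option(s).filter(_ => enabled).fold(0)(_.length)
```

So `lengthIfEnabled(null, true)` and `lengthIfEnabled("abc", false)` both return the default `0`, while `lengthIfEnabled("abc", true)` returns `3`.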

This is probably not as efficient as the mutable version. Using a List instead of a Set is likely to be better; you can then convert the final List to a Set when you are done (although this is probably still not as fast as the mutable version).
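A minimal sketch of that List-then-Set variant, keeping the Option / filter / fold shape. `VPNode`, `distance`, `searchList`, `leaf`, and the sample tree are hypothetical; whether this actually beats repeated Set unions on your data is something to measure, not assume.

```scala
// Hypothetical node type; null marks a missing child.
final case class VPNode[P, O](point: P, obj: O, radius: Double,
                              left: VPNode[P, O], right: VPNode[P, O])

// Assumed metric for 1-D points.
def distance(a: Double, b: Double): Double = math.abs(a - b)

// Same structure as the Set version, but each call returns a List.
def searchList(node: VPNode[Double, String], query: Double, radius: Double): List[String] = {
  val dist = distance(query, node.point)
  val left = Option(node.left)
    .filter(_ => dist <= radius + node.radius)
    .fold(List.empty[String])(l => searchList(l, query, radius))
  val right = Option(node.right)
    .filter(_ => dist + radius >= node.radius)
    .fold(List.empty[String])(r => searchList(r, query, radius))
  val here = if (dist < radius) List(node.obj) else Nil
  here ::: left ::: right
}

// Deduplicate once at the end instead of building a Set at every node.
def search(node: VPNode[Double, String], query: Double, radius: Double): Set[String] =
  searchList(node, query, radius).toSet

def leaf(p: Double, o: String): VPNode[Double, String] = VPNode(p, o, 0.0, null, null)

// Made-up tree satisfying the VP-tree invariant.
val tree: VPNode[Double, String] = VPNode(5.0, "e", 3.0,
  VPNode(4.0, "d", 1.5, leaf(3.0, "c"), leaf(7.0, "g")),
  VPNode(8.0, "h", 1.5, leaf(9.0, "i"), leaf(1.0, "a")))
```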

UPDATE: To answer your question about the benefits:

  • Determinism - Since it is immutable, you are always guaranteed the same result when you call this function with the same parameters. That said, your original version should be deterministic too; you just don't know who else might modify your results, although this is probably not a big problem.
  • Is it hard to read? - I think this is more a matter of opinion and of experience with different programming styles. I found your version hard to read, because you are not returning values from the function and you have several if statements. I agree that Option / filter / fold may seem a little strange at first, but once you have used them for a while (like anything else), they become easy to read. I would compare this with learning to read LINQ in .NET.
  • Performance - Using a List, as in @huynhjl's answer, you should get performance equal to, if not better than, your original version. It doesn't seem like you need a Set, which has the overhead of making sure every element in it is unique.
  • Garbage collection - In a purely functional version, you rapidly create new objects and just as rapidly discard them, which means they will most likely not survive past the first (young) generation of the GC. This matters because moving objects between generations causes GC pauses. In the mutable version, you pass around a reference to the original collection, which lives longer and may be promoted to the next generation. This is not the best example, because your mutable version is probably not that long-lived, and who knows what you intend to do with the returned object (maybe keep it for a while). In the mutable version, you will most likely end up with a second-generation collection pointing to second-generation objects, whereas with the immutable version you will get a first-generation collection pointing to second-generation objects. Cleaning up the immutable version will be much faster and without pauses (again, this rests on some broad assumptions about how your objects are used and what the GC does; your mileage may vary).
  • Parallelism - The functional version can easily be parallelized, whereas the mutable version cannot (at least not without synchronizing access to the shared result set). Depending on the size of your tree, this is probably not a big issue.
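As a rough illustration of the parallelism point, the sketch below searches the root's two subtrees in separate `scala.concurrent.Future`s. `VPNode`, `ParSearch`, `parSearch`, `leaf`, and the sample tree are hypothetical, and forking only at the root is a simplification rather than a tuned strategy. It works without locks because each future builds and returns its own immutable List.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical node type; null marks a missing child.
final case class VPNode[P, O](point: P, obj: O, radius: Double,
                              left: VPNode[P, O], right: VPNode[P, O])

object ParSearch {
  // Assumed metric for 1-D points.
  def distance(a: Double, b: Double): Double = math.abs(a - b)

  // Pure sequential search: returns a new List and mutates nothing.
  def search(node: VPNode[Double, String], query: Double, radius: Double): List[String] =
    if (node == null) Nil
    else {
      val dist = distance(query, node.point)
      val here = if (dist < radius) List(node.obj) else Nil
      val left = if (dist <= radius + node.radius) search(node.left, query, radius) else Nil
      val right = if (dist + radius >= node.radius) search(node.right, query, radius) else Nil
      here ::: left ::: right
    }

  // Searches the root's subtrees concurrently; no shared mutable state.
  def parSearch(node: VPNode[Double, String], query: Double, radius: Double): List[String] = {
    val dist = distance(query, node.point)
    val leftF: Future[List[String]] =
      if (dist <= radius + node.radius) Future(search(node.left, query, radius))
      else Future.successful(List.empty[String])
    val rightF: Future[List[String]] =
      if (dist + radius >= node.radius) Future(search(node.right, query, radius))
      else Future.successful(List.empty[String])
    val here = if (dist < radius) List(node.obj) else Nil
    val combined = for { l <- leftF; r <- rightF } yield here ::: l ::: r
    Await.result(combined, 10.seconds)
  }
}

def leaf(p: Double, o: String): VPNode[Double, String] = VPNode(p, o, 0.0, null, null)

// Made-up tree satisfying the VP-tree invariant.
val tree: VPNode[Double, String] = VPNode(5.0, "e", 3.0,
  VPNode(4.0, "d", 1.5, leaf(3.0, "c"), leaf(7.0, "g")),
  VPNode(8.0, "h", 1.5, leaf(9.0, "i"), leaf(1.0, "a")))
```

The functions live in an object so the future bodies only call into code that is already fully initialized, rather than into an enclosing script or application object that may still be starting up.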

Since you seem quite interested, I would recommend reading Functional Programming in Scala. It covers all of these basics in a way I find great for beginners.


Source: https://habr.com/ru/post/951918/

