Is there an implementation of fast parallel syntactic sugar in Scala, e.g. map-reduce?

Messaging with actors is great. But I would like to have even simpler code.

Examples (pseudo code)

    val splicedList: List[List[Int]] = biglist.spliceIntoParts(100)
    val sum: Int = ActorPool.numberOfActors(5).getAllResults(splicedList, foldLeft(_ + _))

where spliceIntoParts turns one large list into 100 smaller part lists; the pool uses 5 actors, each receiving a new task when it finishes its previous one; and getAllResults applies the given method across the list. All of this is done with messaging in the background. Where possible, a getFirstResult would compute only the first result and stop all the other threads (for example, when cracking a password).
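The ActorPool API above is hypothetical, but the getFirstResult behavior (first answer wins, everything else is cancelled) can be sketched with plain java.util.concurrent, which Scala can use directly. The object name, firstResult, and the threads parameter below are made-up names for illustration:

```scala
import java.util.concurrent.{Callable, Executors}

object FirstResult {
  // Sketch of the hypothetical getFirstResult: run all jobs on a fixed
  // thread pool, return the first one that completes successfully,
  // and cancel the rest.
  def firstResult[A](threads: Int)(jobs: List[() => A]): A = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val tasks = new java.util.ArrayList[Callable[A]]
      jobs.foreach(j => tasks.add(new Callable[A] { def call(): A = j() }))
      pool.invokeAny(tasks) // blocks until one task succeeds, cancels the others
    } finally pool.shutdownNow()
  }
}
```

For a password-cracking style search, each job would scan its own slice of the keyspace and return the match it finds.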

+4
source share
5 answers

You can do this with less overhead than spawning actors by using futures:

    import scala.actors.Futures._

    val nums = (1 to 1000).grouped(100).toList
    val parts = nums.map(n => future { n.reduceLeft(_ + _) })
    val whole = (0 /: parts)(_ + _())

You still have to handle decomposing the problem, wrapping the pieces in "future" blocks, and recomposing them into the final answer, but this does make it easy to execute a bunch of small blocks of code in parallel.

(Note that the _() in the fold applies the future, which means: "give me the answer you were computing in parallel!", and it blocks until the answer is available.)

A parallel collections library would automatically decompose the problem and recompose the answer for you (as pmap does in Clojure); it is not yet part of the core API.

+2
source

With Scala Parallel Collections, which will be included in 2.8.1, you can do things like this:

    val spliced = myList.par // obtain a parallel version of your collection
                             // (all operations on it run in parallel)
    spliced.map(process _)   // maps each element to a new element using `process`
    spliced.find(check _)    // searches the collection until it finds an element
                             // for which `check` returns true, at which point the
                             // search stops and the element is returned

and the code will automatically execute in parallel. Other methods found in the regular collection library are also parallelized.

Currently, 2.8.RC2 is very close (due this week or next), and the 2.8 final should follow a few weeks after that, I think. You can try parallel collections by using a nightly build of 2.8.1.
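For reference, a minimal sketch of what this looks like once parallel collections are available. Note the version caveat: on Scala 2.9-2.12, .par ships with the standard library; on 2.13+ it moved to the separate scala-parallel-collections module, which also needs an extra import:

```scala
// Minimal parallel-collections sketch; assumes .par is available
// (standard library in Scala 2.9-2.12, separate module on 2.13+).
val spliced = (1 to 1000).toVector.par // parallel view of the collection
val doubled = spliced.map(_ * 2)       // chunks are mapped on a thread pool
val total   = doubled.sum              // parallel reduction of the results
```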

+4
source

You can use Scalaz's concurrency features to achieve what you want.

    import scalaz._
    import Scalaz._
    import concurrent.strategy.Executor
    import java.util.concurrent.Executors

    implicit val s = Executor.strategy[Unit](Executors.newFixedThreadPool(5))

    val splicedList = biglist.grouped(100).toList
    val sum = splicedList.parMap(_.sum).map(_.sum).get

It would be pretty easy to make this prettier (i.e. write a mapReduce function that does the splitting and folding all in one). Also, parMap over a List is unnecessarily strict: you will want to start folding before the entire list is ready. More like:

    val splicedList = biglist.grouped(100).toStream
    val sum = splicedList.traverse(n => promise(n.sum)).map(_.sum).get
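As the answer suggests, the splitting and folding can be packaged into one helper. Here is a minimal sketch using only java.util.concurrent (no Scalaz); the object and the mapReduce, chunkSize and threads names are made-up for illustration:

```scala
import java.util.concurrent.{Callable, Executors}

object MapReduce {
  // Splits xs into chunks, maps each chunk in parallel on a fixed
  // thread pool, then reduces the per-chunk results sequentially.
  def mapReduce[A, B](xs: List[A], chunkSize: Int, threads: Int)
                     (mapChunk: List[A] => B)(reduce: (B, B) => B): B = {
    require(xs.nonEmpty, "need at least one element to reduce")
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val tasks = new java.util.ArrayList[Callable[B]]
      xs.grouped(chunkSize).foreach { g =>
        tasks.add(new Callable[B] { def call(): B = mapChunk(g) })
      }
      val results = pool.invokeAll(tasks) // blocks until all chunks are done
      var acc = results.get(0).get
      var i = 1
      while (i < results.size) { acc = reduce(acc, results.get(i).get); i += 1 }
      acc
    } finally pool.shutdown()
  }
}
```

Usage, mirroring the question's example: MapReduce.mapReduce((1 to 1000).toList, 100, 5)(_.sum)(_ + _) sums the list 100 elements at a time across 5 threads.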
+3
source

Rather than waiting for Scala 2.8.1 or 2.9, I would rather write my own library or use another one, so I searched some more and found Akka: http://doc.akkasource.org/actors

which has a Futures object with the methods

    awaitAll(futures: List[Future]): Unit
    awaitOne(futures: List[Future]): Future

but http://scalablesolutions.se/akka/api/akka-core-0.8.1/ has no documentation at all. That is bad.

But the good part is that Akka's actors are more compact than Scala's native ones. With all these libraries (including Scalaz) around, it would be really cool if Scala itself could finally unify them officially.

+2
source

At Scala Days 2010, there was a very interesting talk by Aleksandar Prokopec (who works on Scala at EPFL) about Parallel Collections. They will probably be in 2.8.1, but you may have to wait a little longer. I'll see if I can find the presentation to link here for reference.

The idea is to have a collections framework that parallelizes the processing of collections, doing what you propose, but transparently to the user. All you theoretically need to do is change an import from scala.collections to scala.parallel.collections. You obviously still have to verify that what you are doing can actually be parallelized.

+1
source

Source: https://habr.com/ru/post/1308229/

