Finding the largest item in a list with K processes using Erlang?

It is easy to implement an algorithm using one process, however, how can I use several processes to complete a task?

Here is what I have done so far.

find_largest([H], _) -> H; find_largest([H, Q | T], R) -> if H > Q -> find_largest([H | T], [Q | R]); true -> find_largest([Q | T], [H | R]) end. 

thanks

+4
source share
2 answers

Given that Erlang presents lists, it is probably not a good idea to try to do this in parallel. Splitting a list involves many copies (since they are linked by lists), as well as sending these sections to other processes. I expect the comparison to be much cheaper than just copying it twice and then combining the results.

The implementation is also incorrect, you can find it in lists.erl as max / 1

 %% max(L) -> returns the maximum element of the list L -spec max([T,...]) -> T. max([H|T]) -> max(T, H). max([H|T], Max) when H > Max -> max(T, H); max([_|T], Max) -> max(T, Max); max([], Max) -> Max. 

If for some reason your data is already in separate processes, just get the lists: max / 1 or each of the lists and send them to one place, and then get the lists: max / 1 from the list of results. You can also perform comparisons as you get results to avoid creating this staging list.

+5
source

The only process version of your code should be replaced with lists:max/1 . A useful feature for parallelizing code is as follows:

 pmap(Fun, List) -> Parent = self(), P = fun(Elem) -> Ref = make_ref(), spawn_link(fun() -> Parent ! {Ref, Fun(Elem)} end), Ref end, Refs = [P(Elem) || Elem <- List], lists:map(fun(Ref) -> receive {Ref, Elem} -> Elem end end, Refs). 

pmap/2 applies Fun to each List member in parallel and collects the results in input order. To use pmap with this problem, you need to segment the original list in the list of lists and pass it to pmap. e.g. lists:max(pmap(fun lists:max/1, ListOfLists)) . Of course, the action of segmenting lists would be more expensive than just calling lists:max/1 , so this solution would require the list to be pre-segmented. Even then, it is likely that the overhead of copying lists outweighs any advantage of parallelization - especially on a single node.

The inherent problem with your situation is that computing each subtask is tiny compared to the overhead of data management. Tasks that are more intensively calculated (for example, factoring a list of large numbers) are easier to parallelize.

This does not mean that the search for the maximum value cannot be parallelized, but I believe that for this it is necessary that your data be pre-segmented or segmented in such a way as not to require the repetition of each value.

+2
source

Source: https://habr.com/ru/post/1342067/


All Articles