A GridGain application that runs slower than a multi-threaded application on one machine

I implemented my first GridGain application and did not get the performance improvement I expected. Unfortunately, it is slower than the multi-threaded version. I would like help making my implementation faster.

The gist of my application is a brute-force optimization over millions of candidate parameter values, where each function evaluation takes a fraction of a second. I implemented this by dividing the millions of iterations into several groups, and each group runs as one job.

The relevant code snippet is below. The maxAppliedRange function calls the foo function for each value in the range x and returns the maximum; the overall result is the maximum of the maxima found by the individual jobs.

  scalar {
    result = grid !*~
      (for (x <- (1 to threads).map(i => ((i - 1) * iterations / threads, i * iterations / threads)))
        yield () => maxAppliedRange(x, foo), (s: Seq[(Double, Long)]) => s.max)
  }
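To make the split-and-reduce logic concrete without a grid, here is a minimal local sketch in plain Scala. It is not the original implementation: the body of foo is a hypothetical stand-in, the ranges are assumed to be half-open, and ties are resolved in favour of the first maximum found (the tuple max in the snippet above would also compare the argument).

```scala
object ChunkedMax {
  // Hypothetical stand-in for the expensive function being optimized.
  def foo(x: Long): Double = 1000.0 - math.pow(x - 700.0, 2)

  // Keep the pair with the larger function value (first one wins on ties).
  private def better(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    if (b._1 > a._1) b else a

  // Evaluate f on every x in the half-open range [start, end) and return
  // the best (value, argument) pair. Assumes the range is non-empty.
  def maxAppliedRange(range: (Long, Long), f: Long => Double): (Double, Long) =
    (range._1 until range._2).map(x => (f(x), x)).reduce(better)

  // Split [0, iterations) into `jobs` contiguous half-open ranges,
  // using the same arithmetic as the snippet above.
  def split(iterations: Long, jobs: Int): Seq[(Long, Long)] = {
    require(jobs > 0 && iterations >= jobs, "need at least one iteration per job")
    (1 to jobs).map(i => ((i - 1) * iterations / jobs, i * iterations / jobs))
  }

  // Each range plays the role of one grid job; the final reduce plays
  // the role of the reducer passed to the grid call.
  def bruteForceMax(iterations: Long, jobs: Int, f: Long => Double): (Double, Long) =
    split(iterations, jobs).map(r => maxAppliedRange(r, f)).reduce(better)

  def main(args: Array[String]): Unit = {
    println(split(1000L, 4))
    println(bruteForceMax(1000L, 4, foo))
  }
}
```

Because the ranges are contiguous and do not overlap, changing the number of jobs changes only how the work is divided, not the result.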

My code can run either multi-threaded on a single machine or across several GridGain nodes using the code above. When I launch the GridGain version, it starts out as if it is going to be faster, but then the same things always happen:

  • One of the nodes (on another machine) misses a heartbeat, so the node on my main machine gives up on it and starts executing its job a second time.
  • The node that missed the heartbeat keeps working on the same job. Now I have two nodes doing the same thing.
  • Eventually all the jobs are executed on my main machine, but since some of them started later, the whole run takes longer to finish.
  • Sometimes GridGain throws an exception because a node timed out, and the whole task fails.
  • I get annoyed.

, , , , , , , node. , node , . , , node . .

, , , :

  • node, .

, . node , , . , , .

- , ? ?

. 50% - , , . .

gridgain, , , . node .

In the XML configuration file I use this:

    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.multicast.GridMulticastDiscoverySpi">
            <property name="maxMissedHeartbeats" value="20"/>
            <property name="leaveAttempts" value="10"/>
        </bean>
    </property>

, , node . , . IP- , . , , , , .

, :

    <property name="collisionSpi">
        <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
            <property name="activeJobsThreshold" value="2"/>
            <property name="waitJobsThreshold" value="4"/>
            <property name="maximumStealingAttempts" value="10"/>
            <property name="stealingEnabled" value="true"/>
            <property name="messageExpireTime" value="1000"/>
        </bean>
    </property>

    <property name="failoverSpi">
        <bean class="org.gridgain.grid.spi.failover.jobstealing.GridJobStealingFailoverSpi">
            <property name="maximumFailoverAttempts" value="10"/>
        </bean>
    </property>

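In the first bean, activeJobsThreshold tells the node how many jobs it may run at the same time. This is a better way to throttle a node than changing the number of threads in the executor service. It also provides some load balancing: idle nodes can "steal" work from other nodes, so everything finishes sooner.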

. Gridgain node, -, , .

javadocs, , .

.

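First, there is an XML configuration file that controls how a node operates. The default file is GRIDGAIN_HOME/config/default-spring.xml. I can either edit it, or copy it and pass the copy to ggstart.sh when starting a node. I needed to add two things, the first being: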

    <property name="networkTimeout" value="25000"/>

This sets the network timeout to 25 seconds (the value is in milliseconds). The other addition is:

   <property name="executorService">
        <bean class="org.gridgain.grid.thread.GridThreadPoolExecutor">
            <constructor-arg type="int" value="1"/>
            <constructor-arg type="int" value="1"/>
            <constructor-arg type="long">
                <util:constant static-field="java.lang.Long.MAX_VALUE"/>
            </constructor-arg>
            <constructor-arg type="java.util.concurrent.BlockingQueue">
                <bean class="java.util.concurrent.LinkedBlockingQueue"/>
            </constructor-arg>
        </bean>
    </property>

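The first two constructor arguments start the pool with 1 thread and cap it at a maximum of 1. The executor service controls the thread pool that runs GridGain jobs. The default size is 100, which is why my machines were overloaded and heartbeats were missed.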
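To illustrate what a pool with a core and maximum size of 1 means for job execution, here is a small sketch using the plain java.util.concurrent.ThreadPoolExecutor with the same constructor arguments as the bean above. It does not use GridGain's GridThreadPoolExecutor; that the GridGain class queues work the same way, and that the keep-alive value is in milliseconds, are assumptions. The helper name maxConcurrent is made up for this example.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object SingleSlotPool {
  // Submits `jobs` short tasks to a pool of `poolSize` threads and returns
  // the highest number of tasks that were observed running at the same time.
  def maxConcurrent(poolSize: Int, jobs: Int): Int = {
    val pool = new ThreadPoolExecutor(
      poolSize,                 // core pool size
      poolSize,                 // maximum pool size
      Long.MaxValue,            // keep-alive time for idle threads
      TimeUnit.MILLISECONDS,
      new LinkedBlockingQueue[Runnable]()) // unbounded queue: extra jobs wait here

    val lock = new Object
    var running = 0
    var peak = 0

    (1 to jobs).foreach { _ =>
      pool.execute(new Runnable {
        def run(): Unit = {
          lock.synchronized { running += 1; peak = math.max(peak, running) }
          Thread.sleep(20) // stand-in for real work
          lock.synchronized { running -= 1 }
        }
      })
    }

    pool.shutdown()                            // accept no new jobs
    pool.awaitTermination(1, TimeUnit.MINUTES) // wait for queued jobs to finish
    lock.synchronized { peak }
  }

  def main(args: Array[String]): Unit = {
    println(maxConcurrent(poolSize = 1, jobs = 5))
  }
}
```

With a pool size of 1, every submitted job beyond the first waits in the queue, so a machine is never asked to run more than one job at a time no matter how many jobs are sent to it.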

, , :

  scalar.apply("/path/to/gridgain home/config/custom-spring.xml") {
    result = grid !*~
      (for (x <- (1 to threads).map(i => ((i - 1) * iterations / threads, i * iterations / threads)))
        yield () => maxAppliedRange(x, kalmanBruteForceObj.performKalmanIteration), (s: Seq[(Double, Long)]) => s.max)
  }

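Without the .apply call the node starts with all default options, not with the configuration file containing the changes above.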

, . , .

Source: https://habr.com/ru/post/1781082/

