I started a Spark worker on my computer: it has 4 cores and I set the worker memory to 5 GB. The master runs on another machine on the same network, and no worker runs on that machine. My code is as follows:
import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

private void myClass() {
    SparkConf conf = new SparkConf()
            .setAppName("myWork")
            .setMaster("spark://myHostIp:7077")
            .set("spark.driver.allowMultipleContexts", "true");
    JavaSparkContext sc = new JavaSparkContext(conf);

    for (int i = 0; i < 200; i++) {
        System.out.println("===============================================================");
        System.out.println("iteration : " + i);
        System.out.println("===============================================================");

        // One boolean per object to create; only used to drive parallelize().
        ArrayList<Boolean> list = new ArrayList<Boolean>();
        for (int j = 0; j < 1900; j++) {
            list.add(true);
        }

        // Create, set up and move the objects on the cluster, then cache the result.
        JavaRDD<myObj> ratings = sc.parallelize(list, 100)
                .map(bool -> new myObj())
                .map(obj -> this.setupObj(obj))
                .map(obj -> this.moveObj(obj))
                .cache();

        int[] stuff = ratings
                .map(obj -> obj.getStuff())
                .reduce((obj1, obj2) -> this.mergeStuff(obj1, obj2));
        this.setStuff(stuff);

        ArrayList<TabObj> tabObj = ratings
                .map(obj -> this.objToTabObjAsTab(obj))
                .reduce((obj1, obj2) -> this.mergeTabObj(obj1, obj2));
        ratings.unpersist(false);
        this.setTabObj(tabObj);
    }

    sc.close();
}
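For reference, this is how I understand executor resources can be pinned from the driver side with the same SparkConf. The property names are standard Spark settings, but the values below are only illustrative assumptions, not something I have actually applied:

import org.apache.spark.SparkConf;

// Illustrative only: standard properties for executor resources, set from the driver.
// The values are assumptions, not what I currently run with.
SparkConf conf = new SparkConf()
        .setAppName("myWork")
        .setMaster("spark://myHostIp:7077")
        .set("spark.executor.memory", "4g")   // stays below the 5 GB configured on the worker
        .set("spark.cores.max", "4");         // the worker machine has 4 cores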
When I run it, I can see progress in the Spark UI, but it is really slow (I have to set the number of partitions in parallelize() high enough, otherwise I get a timeout). So I thought it was a CPU bottleneck, but the JVM's CPU consumption is actually very low (0% most of the time, sometimes a little more than 5%...).
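I am not sure which timeout is firing; the generic network timeout (spark.network.timeout, default 120s) is the setting I would try raising, but the value below is purely illustrative and I have not verified this is the right one:

import org.apache.spark.SparkConf;

// Assumption: the timeout I hit might be the generic network timeout.
// Illustrative value only; not verified.
SparkConf conf = new SparkConf()
        .set("spark.network.timeout", "300s");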
According to the monitor, the JVM uses about 3 GB of memory, and the Spark UI shows only 19 MB in cache.