I work with ~120 GB of CSV files (from 1 GB to 20 GB each). I am using a machine with 220 GB of RAM and 36 cores.
I was wondering whether it makes sense to use Spark locally for this analysis. I really like Spark's natural parallelism (and the pyspark API), and I have a good laptop environment.
I want to do things like unions and aggregations and then run machine learning on the transformed dataset. Python tools like pandas use only one thread, which seems like a massive waste: using all 36 threads should be much faster.
Yes, it makes sense. Spark runs on a single node just fine; in effect you run a one-node cluster. When you launch your application with ./spark-submit, pass:
--master local[*]
For example:
./spark-submit --master local[*] <your-app-name> <your-apps-args>
This runs Spark on your single node with as many worker threads as there are logical cores on the machine.
Note, however, that by default the driver gets only 512 MB of memory; for data of this size you will want to raise it, either with a command-line option or programmatically via SparkConf.
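For completeness, here is a minimal PySpark sketch of the same setup done from inside a script instead of through spark-submit. The file path, column name, and memory size below are illustrative assumptions, not values from the question or the answer:

from pyspark.sql import SparkSession

# Build a local-mode session: local[*] gives one worker thread per logical core.
# spark.driver.memory must be set before the JVM starts; when launching through
# spark-submit, set it on the command line instead.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("csv-analysis")
    .config("spark.driver.memory", "180g")  # assumed size, leaving headroom out of 220 GB
    .getOrCreate()
)

# Reading with a glob unions all the CSV files into one DataFrame.
# "/data/csv/*.csv" and "some_key" are hypothetical placeholders.
df = spark.read.csv("/data/csv/*.csv", header=True, inferSchema=True)

# The kind of aggregation the question mentions.
summary = df.groupBy("some_key").count()
summary.show()

spark.stop()

One detail worth knowing: in local mode the driver and the executors share a single JVM, so spark.driver.memory is the one memory setting that matters.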
Source: https://habr.com/ru/post/1598245/