Do RDDs in Apache Spark use Tachyon by default?

I am trying to understand Spark's in-memory features. In the process I came across Tachyon, which is basically an in-memory data layer that provides fault tolerance without replication by using lineage, and reduces recomputation by checkpointing datasets. What confuses me is that all of these features also seem achievable with Spark's standard RDDs. So I wonder whether RDDs use Tachyon behind the curtains to implement these features. If not, what is Tachyon for, given that all of its work can be done with standard RDDs? Or am I mistaken in relating the two? A detailed explanation or a link to one would be a great help. Thanks.

1 answer

What is in the paper you linked does not reflect the reality of what is in Tachyon as a released open source project. Parts of that paper only ever existed as research prototypes and were never fully integrated into Spark or Tachyon.

When you persist data at the OFF_HEAP storage level via rdd.persist(StorageLevel.OFF_HEAP), Spark uses Tachyon to write that data into Tachyon's memory space as a file. This removes it from the Java heap, giving Spark more heap memory to work with.

It currently does not write lineage information, so if your data is too large to fit into the memory of your configured Tachyon cluster, portions of the RDD will be lost and your Spark jobs may fail.
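
To make that failure mode concrete, here is a minimal toy model in plain Python. It is not Spark or Tachyon code: `ToyStore`, `read_partition` and `compute` are hypothetical names invented for illustration, and the "capacity" is counted in partitions. It only sketches the difference between a store that can fall back on lineage to recompute a missing partition and one that cannot.

```python
from typing import Callable, Dict, List, Optional


class ToyStore:
    """Hypothetical stand-in for an external memory store with limited capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # maximum number of partitions it can hold
        self.blocks: Dict[int, List[int]] = {}

    def put(self, partition_id: int, data: List[int]) -> bool:
        # Once the store is full, further partitions are simply not kept.
        if len(self.blocks) >= self.capacity:
            return False
        self.blocks[partition_id] = data
        return True

    def get(self, partition_id: int) -> Optional[List[int]]:
        return self.blocks.get(partition_id)


def compute(partition_id: int) -> List[int]:
    """Hypothetical lineage: how a partition is derived from its source."""
    start = partition_id * 3
    return [x * x for x in range(start, start + 3)]


def read_partition(
    store: ToyStore,
    partition_id: int,
    lineage: Optional[Callable[[int], List[int]]] = None,
) -> List[int]:
    """Read a partition, recomputing it from lineage only if lineage is known."""
    data = store.get(partition_id)
    if data is not None:
        return data
    if lineage is None:
        raise LookupError(f"partition {partition_id} is lost and cannot be recomputed")
    return lineage(partition_id)


# Three partitions, but room for only two: partition 2 is dropped.
store = ToyStore(capacity=2)
for pid in range(3):
    store.put(pid, compute(pid))

# With lineage available, the missing partition is rebuilt on demand.
recovered = read_partition(store, 2, lineage=compute)

# Without lineage, the same read fails, which is the case described above.
try:
    read_partition(store, 2)
    failed = False
except LookupError:
    failed = True
```

In this sketch the store itself records nothing about how a partition was produced, so whether a lost partition can be rebuilt depends entirely on whether the reader still has the lineage function.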


Source: https://habr.com/ru/post/985685/