Using SparkR and sparklyr at the same time

As I understand it, these two packages provide similar but largely non-overlapping wrapper functions for Apache Spark. sparklyr is newer and its functionality is still expanding. I therefore believe it is currently necessary to use both packages to get full coverage.

Since both packages essentially hold references to JVM instances of Scala classes, I would think it should be possible to use them in parallel. But is it actually possible? What are your best practices?
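A minimal sketch of the setup the question envisions: attaching both packages in one R session and starting a connection with each. Master URL and setup are illustrative; note that the two packages create separate contexts rather than sharing one.

```r
library(sparklyr)
library(SparkR)  # masks some sparklyr/dplyr names when attached second

# sparklyr connection (assumes a local Spark installation)
sc <- spark_connect(master = "local")

# SparkR session, started independently; it does not reuse
# the connection that sparklyr created above
sparkR.session(master = "local")
```

Even when both calls succeed, each package talks to its own backend, which is exactly the problem the answer below describes.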

1 answer

These two packages use different mechanisms and are not designed to interoperate. Their internals are structured differently, and they do not expose the JVM backend in the same way.

Although one could imagine a solution that allows partial data sharing (for example via global temporary views, backed by a persistent metastore), it would have rather limited applications.
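A hypothetical sketch of that partial-sharing idea: register a global temporary view from the sparklyr side and read it back through SparkR's SQL interface. The table and view names are illustrative, and this only works if both sessions are attached to the same Spark application, which is hard to guarantee; hence the limited applicability.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Copy a local data frame into Spark under an illustrative name
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark")

# Global temporary views live in the reserved global_temp database
DBI::dbExecute(sc, "CREATE GLOBAL TEMPORARY VIEW mtcars_gtv AS
                    SELECT * FROM mtcars_spark")

# From a SparkR session attached to the SAME Spark application,
# the view would (in principle) be reachable as:
# df <- SparkR::sql("SELECT * FROM global_temp.mtcars_gtv")
```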

If you need both, I would recommend splitting your pipeline into several stages and passing data between them through persistent storage.
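A sketch of that staged approach, with illustrative paths and names: stage 1 uses sparklyr and writes its result to persistent storage (Parquet here), then stage 2 starts a fresh SparkR session and reads it back.

```r
# Stage 1: sparklyr writes its output to durable storage
library(sparklyr)
sc <- spark_connect(master = "local")
stage1_tbl <- copy_to(sc, mtcars, "stage1_data")
spark_write_parquet(stage1_tbl, path = "/tmp/stage1_out")
spark_disconnect(sc)

# Stage 2: SparkR picks up where stage 1 left off
library(SparkR)
sparkR.session(master = "local")
df <- read.parquet("/tmp/stage1_out")
head(df)
```

Because each stage owns its own Spark session, there is no reliance on the two packages sharing state, only on a common storage location and format.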


Source: https://habr.com/ru/post/1259690/
