I am new to PySpark and ran into an error while trying to encrypt data in an RDD using the cryptography module. Here is the code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('encrypt').getOrCreate()
df = spark.read.csv('test.csv', inferSchema=True, header=True)
df.show()
df.printSchema()
from cryptography.fernet import Fernet
key = Fernet.generate_key()
f = Fernet(key)
dfRDD = df.rdd
print(dfRDD)
mappedRDD = dfRDD.map(lambda value: (value[0], str(f.encrypt(str.encode(value[1]))), value[2] * 100))
data = mappedRDD.toDF()
data.show()
Everything works fine until I try to map value[1] with str(f.encrypt(str.encode(value[1]))). I get the following error:
PicklingError: Failed to serialize object: TypeError: Failed to sort CompiledFFI objects
I did not find many resources referencing this error, and wanted to ask whether anyone else has encountered it (or whether there is a recommended approach to column encryption in PySpark).
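My current guess is that Spark pickles the lambda's closure, and the Fernet instance f is what fails to serialize, while the key itself is plain bytes and should pickle fine. A minimal sketch of the workaround I am considering (no Spark involved here, just the pattern I would put inside rdd.map; encrypt_row is my own hypothetical helper name):

```python
import pickle
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # raw bytes: picklable, safe to ship to executors
f = Fernet(key)              # holds FFI-backed objects; pickling this is what
                             # (I believe) fails when the lambda closes over it

# In my version of cryptography, this raises; newer versions may differ
try:
    pickle.dumps(f)
except Exception as e:
    print("pickling Fernet failed:", type(e).__name__)

def encrypt_row(row, key=key):
    # Construct Fernet from the (picklable) key inside the function,
    # so only `key` ends up in the serialized closure
    f = Fernet(key)
    return (row[0], f.encrypt(row[1].encode()).decode(), row[2] * 100)

print(encrypt_row(("id", "secret", 2)))
```

Would replacing my lambda with something like dfRDD.map(encrypt_row) be the right approach here, or is there a more idiomatic way (e.g. a broadcast variable for the key)?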