I am new to PySpark and ran into an error while trying to encrypt data in an RDD using the cryptography module. Here is the code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('encrypt').getOrCreate()
df = spark.read.csv('test.csv', inferSchema=True, header=True)
df.show()
df.printSchema()
from cryptography.fernet import Fernet
key = Fernet.generate_key()
f = Fernet(key)
dfRDD = df.rdd
print(dfRDD)
mappedRDD = dfRDD.map(lambda value: (value[0], str(f.encrypt(str.encode(value[1]))), value[2] * 100))
data = mappedRDD.toDF()
data.show()
Everything works fine until I try to map value[1] with str(f.encrypt(str.encode(value[1]))). I get the following error:
PicklingError: Failed to serialize object: TypeError: Failed to sort CompiledFFI objects
I did not find many resources referencing this error, and wanted to ask whether anyone else has encountered it (or whether there is a recommended approach to column encryption in PySpark).
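My current guess is that Spark pickles the lambda's closure, and the Fernet instance f is what fails to serialize, while the key itself is plain bytes and should pickle fine. A minimal sketch of the workaround I am considering (no Spark involved here, just the pattern I would put inside rdd.map; encrypt_row is my own hypothetical helper name):

```python
import pickle
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # raw bytes: picklable, safe to ship to executors
f = Fernet(key)              # holds FFI-backed objects; pickling this is what
                             # (I believe) fails when the lambda closes over it

# In my version of cryptography, this raises; newer versions may differ
try:
    pickle.dumps(f)
except Exception as e:
    print("pickling Fernet failed:", type(e).__name__)

def encrypt_row(row, key=key):
    # Construct Fernet from the (picklable) key inside the function,
    # so only `key` ends up in the serialized closure
    f = Fernet(key)
    return (row[0], f.encrypt(row[1].encode()).decode(), row[2] * 100)

print(encrypt_row(("id", "secret", 2)))
```

Would replacing my lambda with something like dfRDD.map(encrypt_row) be the right approach here, or is there a more idiomatic way (e.g. a broadcast variable for the key)?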