Here is some code that writes the predictions of a LogisticRegression model to JSON:
(predictions
    .drop(feature_col)
    .rdd
    .map(lambda x: Row(weight=x.weight,
                       target=x[target],
                       label=x.label,
                       prediction=x.prediction,
                       probability=DenseVector(x.probability)))
    .coalesce(1)
    .toDF()
    .write
    .json(
        "{}/{}/summary/predictions".format(path, self._model.bestModel.uid)))
Here is an example of one of the resulting JSON objects:
{"label":1.0,"prediction":0.0,"probability":{"type":1,"values":[0.5835784358591029,0.4164215641408972]},"target":"Male","weight":99}
I would like to write the same data to a CSV file, preferably keeping only probability.values[0] (the first element of the values array). However, when I use the same code as above but replace .json with .csv, I get the following result:
1.0,0.0,"[6,1,0,0,280000001c,c00000002,af154d3100000014,a1d5659f3fe2acac,3fdaa6a6]",Male,99
What is happening to the third column? Instead of the probabilities, it contains a quoted array of what look like hex values.
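To make the desired output concrete, here is a small plain-Python sketch (not Spark) that flattens the example JSON record above into the kind of CSV row I am after. The helper name to_csv_row and the column order (copied from the CSV line above) are my own choices for illustration, not part of the real pipeline.

```python
import csv
import io
import json

# The example record from above, exactly as written by .json(...)
record_json = (
    '{"label":1.0,"prediction":0.0,'
    '"probability":{"type":1,"values":[0.5835784358591029,0.4164215641408972]},'
    '"target":"Male","weight":99}'
)


def to_csv_row(line):
    """Flatten one JSON prediction record, keeping only probability.values[0].

    Hypothetical helper: the column order mirrors the CSV output shown above.
    """
    rec = json.loads(line)
    return [
        rec["label"],
        rec["prediction"],
        rec["probability"]["values"][0],  # first element of the values array
        rec["target"],
        rec["weight"],
    ]


# Write the flattened row as a single CSV line into an in-memory buffer
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(to_csv_row(record_json))
csv_line = buf.getvalue().strip()
print(csv_line)
```

In the Spark code itself, I assume the equivalent would be to store a plain float in the Row, for example probability=float(x.probability[0]), rather than a DenseVector, but I have not confirmed that this is the right approach.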