How can I add a custom field (such as a user ID) to the prediction results?
List<org.apache.spark.mllib.regression.LabeledPoint> localTesting = ... ;
DataFrame localTestDF = jsql.createDataFrame(jsc.parallelize(studyData.localTesting), LabeledPoint.class);
DataFrame predictions = model.transform(localTestDF);
Row[] collect = predictions.select("label", "probability", "prediction").collect();
for (Row r : collect) {
int userNo = Integer.parseInt(r.get(0).toString());
double prob = Double.parseDouble(r.get(1).toString());
int prediction = Integer.parseInt(r.get(2).toString());
log.debug(userNo + "," + prob + ", " + prediction);
}
But when I use the following class for localTesting instead of LabeledPoint,
class NoLabeledPoint extends LabeledPoint implements Serializable {
private static final long serialVersionUID = -2488661810406135403L;
int userNo;
public NoLabeledPoint(double label, Vector features) {
super(label, features);
}
public int getUserNo() {
return userNo;
}
public void setUserNo(int userNo) {
this.userNo = userNo;
}
}
List<NoLabeledPoint> localTesting = ... ;
DataFrame localTestDF = jsql.createDataFrame(jsc.parallelize(studyData.localTesting), LabeledPoint.class);
DataFrame predictions = model.transform(localTestDF);
Row[] collect = predictions.select("userNo", "probability", "prediction").collect();
for (Row r : collect) {
int userNo = Integer.parseInt(r.get(0).toString());
double prob = Double.parseDouble(r.get(1).toString());
int prediction = Integer.parseInt(r.get(2).toString());
log.debug(userNo + "," + prob + ", " + prediction);
}
the following exception is thrown:
org.apache.spark.sql.AnalysisException: cannot resolve 'userNo' given input columns rawPrediction, probability, features, label, prediction;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:63)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
To be clear, I want to get not only the prediction data (features, label, probability, ...) but also a custom field of my own, e.g. userNo or user_id, from the result: predictions.select ("......")
Update
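Resolved. createDataFrame derives the DataFrame columns from the bean class passed as its second argument, so with LabeledPoint.class the userNo property was never included. One line needed to change,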
from
DataFrame localTestDF = jsql.createDataFrame(jsc.parallelize(studyData.localTesting), LabeledPoint.class);
to
DataFrame localTestDF = jsql.createDataFrame(jsc.parallelize(studyData.localTesting), NoLabeledPoint.class);
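To illustrate why the class argument matters, here is a minimal sketch that runs without Spark. It assumes that the columns are inferred by JavaBean introspection of the class you pass in, which is my understanding of how the bean-class overload of createDataFrame behaves. BasePoint, UserPoint and columnsFor are hypothetical stand-ins written for this example, not Spark classes or APIs.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanSchemaSketch {

    // Hypothetical stand-in for LabeledPoint (no Spark dependency).
    public static class BasePoint {
        private final double label;
        public BasePoint(double label) { this.label = label; }
        public double getLabel() { return label; }
    }

    // Hypothetical stand-in for NoLabeledPoint: adds a userNo property.
    public static class UserPoint extends BasePoint {
        private int userNo;
        public UserPoint(double label) { super(label); }
        public int getUserNo() { return userNo; }
        public void setUserNo(int userNo) { this.userNo = userNo; }
    }

    // Lists the readable bean properties of the given class, sorted by name.
    // Assumption: this approximates how columns are derived from a bean class.
    public static List<String> columnsFor(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        try {
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass).getPropertyDescriptors()) {
                // Skip getClass(), which every object exposes as a "class" property.
                if (pd.getReadMethod() != null && !"class".equals(pd.getName())) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // The objects may be UserPoint instances, but only the class passed in is inspected.
        System.out.println(columnsFor(BasePoint.class)); // prints [label]
        System.out.println(columnsFor(UserPoint.class)); // prints [label, userNo]
    }
}
```

The same reasoning applies to the fix above: the list contained NoLabeledPoint objects all along, but userNo only became a selectable column once NoLabeledPoint.class was the class being inspected.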