I have two JavaRDDs. The first is
JavaRDD<CustomClass> data
and the second is
JavaRDD<Vector> features
My custom class has two fields: text (a String) and label (an int). I have 1000 instances of CustomClass in the JavaRDD data and 1000 instances of Vector in the JavaRDD features.
I computed these 1000 vectors by applying a map function to the JavaRDD data.
Now I want a new JavaRDD of the form
JavaRDD<LabeledPoint>
The LabeledPoint constructor requires both a label and a vector. I cannot build it with a map function, because the function's call method takes only one argument and so cannot receive both a CustomClass and a Vector.
Can someone tell me how to combine these two JavaRDDs to get a new
JavaRDD<LabeledPoint>
?
Here are some snippets of code that I wrote:
class CustomClass {
    String text;
    int label;
    // Getter used by the map function below.
    String getText() { return text; }
}
JavaRDD<CustomClass> data = getDataFromFile(filename);
final HashingTF hashingTF = new HashingTF();
final IDF idf = new IDF();
final JavaRDD<Vector> td2 = data.map(
    new Function<CustomClass, Vector>() {
        @Override
        public Vector call(CustomClass cd) throws Exception {
            Vector v = new DenseVector(hashingTF.transform(Arrays.asList(cd.getText().split(" "))).toArray());
            return v;
        }
    }
);
final JavaRDD<Vector> features = idf.fit(td2).transform(td2);
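
One possible direction, as a minimal sketch rather than a definitive solution: JavaRDD.zip pairs the elements of two RDDs by position, so a single map over the resulting pairs can see both the CustomClass and its Vector. The sketch below assumes Java 8 lambdas and a local Spark master. The class name ZipToLabeledPoints, the helper toLabeledPoints, the getLabel getter, and the one-dimensional placeholder vectors are hypothetical and only stand in for the TF-IDF features computed above.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;

public class ZipToLabeledPoints {

    // Stand-in for the CustomClass in the question (hypothetical getters).
    // Serializable so that Spark can ship instances between JVMs.
    public static class CustomClass implements Serializable {
        private final String text;
        private final int label;

        public CustomClass(String text, int label) {
            this.text = text;
            this.label = label;
        }

        public String getText() { return text; }
        public int getLabel() { return label; }
    }

    // Pair each record with the vector at the same position,
    // then build one LabeledPoint per pair.
    public static JavaRDD<LabeledPoint> toLabeledPoints(
            JavaRDD<CustomClass> data, JavaRDD<Vector> features) {
        JavaPairRDD<CustomClass, Vector> zipped = data.zip(features);
        return zipped.map(pair -> new LabeledPoint(pair._1().getLabel(), pair._2()));
    }

    public static void main(String[] args) {
        try (JavaSparkContext sc = new JavaSparkContext("local[2]", "zip-sketch")) {
            JavaRDD<CustomClass> data = sc.parallelize(Arrays.asList(
                    new CustomClass("spark is fast", 1),
                    new CustomClass("slow job", 0)), 2);

            // Placeholder features (text length only), derived from data by a map
            // so that both RDDs share the same partitioning.
            JavaRDD<Vector> features = data.map(c -> Vectors.dense(c.getText().length()));

            List<LabeledPoint> points = toLabeledPoints(data, features).collect();
            for (LabeledPoint p : points) {
                System.out.println(p);
            }
        }
    }
}
```

Note that zip expects both RDDs to have the same number of partitions and the same number of elements in each partition. That should hold when one RDD is produced from the other by element-wise transformations such as map, as features is here, but it is worth verifying for your pipeline.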