How to convert Dataset<Tuple2<String, DeviceData>> to Iterator<DeviceData>
I have a Dataset<Tuple2<String, DeviceData>> and want to convert it to an Iterator<DeviceData>.
Below is my code, where I call collectAsList() and then build the Iterator<DeviceData> from the result.
    Dataset<Tuple2<String, DeviceData>> ds = ...;
    List<Tuple2<String, DeviceData>> listTuple = ds.collectAsList();
    ArrayList<DeviceData> myDataList = new ArrayList<DeviceData>();
    for (Tuple2<String, DeviceData> tuple : listTuple) {
        myDataList.add(tuple._2());
    }
    Iterator<DeviceData> myitr = myDataList.iterator();

I cannot use collectAsList() because my data is huge and pulling all of it to the driver will hurt performance. I looked through the Dataset API and also searched here, but could not find a solution. Can someone guide me? A solution in Java would be great. Thanks.
EDIT:
The DeviceData class is a simple JavaBean. Here is the output of printSchema() for ds.
    root
     |-- value: string (nullable = true)
     |-- _2: struct (nullable = true)
     |    |-- deviceData: string (nullable = true)
     |    |-- deviceId: string (nullable = true)
     |    |-- sNo: integer (nullable = true)
1 answer
You can extract DeviceData directly from ds with map, instead of collecting the tuples and building a second list from them.
Java:
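    // Needs org.apache.spark.api.java.function.MapFunction and org.apache.spark.sql.Encoders.
    // In the Java API, Dataset.map takes a MapFunction plus an Encoder for the result type.
    MapFunction<Tuple2<String, DeviceData>, DeviceData> mapDeviceData =
        new MapFunction<Tuple2<String, DeviceData>, DeviceData>() {
            @Override
            public DeviceData call(Tuple2<String, DeviceData> tuple) {
                return tuple._2();
            }
        };
    // Extracts DeviceData from each record
    Dataset<DeviceData> ddDS = ds.map(mapDeviceData, Encoders.bean(DeviceData.class));

Scala:

    val ddDS = ds.map(_._2) // same as ds.map(row => row._2)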
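The question asks for an Iterator<DeviceData>, and mapping alone still leaves a Dataset. One way to get the iterator without collectAsList() may be Dataset.toLocalIterator(), which fetches results one partition at a time. The following is a minimal sketch, not a definitive implementation: the class name DeviceIteratorSketch, the helper methods, the two-field DeviceData bean (a simplified stand-in for the real class), the sample rows, and the local[*] master are all assumptions made for illustration.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class DeviceIteratorSketch {

    // Simplified stand-in for the question's DeviceData bean (assumption).
    public static class DeviceData implements Serializable {
        private String deviceId;
        private String deviceData;

        public DeviceData() {}

        public DeviceData(String deviceId, String deviceData) {
            this.deviceId = deviceId;
            this.deviceData = deviceData;
        }

        public String getDeviceId() { return deviceId; }
        public void setDeviceId(String deviceId) { this.deviceId = deviceId; }
        public String getDeviceData() { return deviceData; }
        public void setDeviceData(String deviceData) { this.deviceData = deviceData; }
    }

    // Drops the String key on the executors, then hands the values to the
    // driver through an iterator that fetches one partition at a time.
    public static Iterator<DeviceData> toDeviceIterator(Dataset<Tuple2<String, DeviceData>> ds) {
        Dataset<DeviceData> values = ds.map(
                (MapFunction<Tuple2<String, DeviceData>, DeviceData>) tuple -> tuple._2(),
                Encoders.bean(DeviceData.class));
        return values.toLocalIterator();
    }

    // Builds a tiny sample Dataset (hypothetical data) and drains the iterator.
    public static List<String> sampleDeviceIds(SparkSession spark) {
        Encoder<Tuple2<String, DeviceData>> encoder =
                Encoders.tuple(Encoders.STRING(), Encoders.bean(DeviceData.class));
        List<Tuple2<String, DeviceData>> rows = Arrays.asList(
                new Tuple2<>("k1", new DeviceData("d1", "on")),
                new Tuple2<>("k2", new DeviceData("d2", "off")));
        Dataset<Tuple2<String, DeviceData>> ds = spark.createDataset(rows, encoder);

        List<String> ids = new ArrayList<>();
        Iterator<DeviceData> it = toDeviceIterator(ds);
        while (it.hasNext()) {
            ids.add(it.next().getDeviceId());
        }
        Collections.sort(ids); // sorted so the result does not depend on partition order
        return ids;
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]") // assumption: local run for illustration
                .appName("DeviceIteratorSketch")
                .getOrCreate();
        try {
            System.out.println(sampleDeviceIds(spark));
        } finally {
            spark.stop();
        }
    }
}
```

With toLocalIterator() the driver needs roughly as much memory as the largest partition rather than the whole Dataset, so repartitioning into smaller partitions beforehand may help if individual partitions are large. The trade-off is that partitions are fetched sequentially, which is usually slower than a single collect when the data does fit in memory.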