How to convert Dataset<Tuple2<String, DeviceData>> to Iterator<DeviceData>

I have a Dataset<Tuple2<String, DeviceData>> and want to convert it to an Iterator<DeviceData>.

Below is my code, where I use the collectAsList() method and then build an Iterator<DeviceData> from the result.

    Dataset<Tuple2<String, DeviceData>> ds = ...;
    List<Tuple2<String, DeviceData>> listTuple = ds.collectAsList();
    ArrayList<DeviceData> myDataList = new ArrayList<DeviceData>();
    for (Tuple2<String, DeviceData> tuple : listTuple) {
        myDataList.add(tuple._2());
    }
    Iterator<DeviceData> myitr = myDataList.iterator();

I cannot use collectAsList() because my data is huge and collecting all of it on the driver will hurt performance. I looked through the Dataset API but could not find a solution. I also searched here, but could not find an answer. Can someone guide me? A solution in Java would be great. Thanks.

EDIT:

The DeviceData class is a simple JavaBean. Here is the output of printSchema() for ds.

    root
     |-- value: string (nullable = true)
     |-- _2: struct (nullable = true)
     |    |-- deviceData: string (nullable = true)
     |    |-- deviceId: string (nullable = true)
     |    |-- sNo: integer (nullable = true)
1 answer

You can extract DeviceData directly from ds with a map, instead of collecting the tuples and then building a second list from them.

Java:

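    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import scala.Tuple2;

    // In Java, Dataset.map takes a MapFunction plus an Encoder for the result type
    MapFunction<Tuple2<String, DeviceData>, DeviceData> mapDeviceData =
        new MapFunction<Tuple2<String, DeviceData>, DeviceData>() {
            @Override
            public DeviceData call(Tuple2<String, DeviceData> tuple) {
                return tuple._2();
            }
        };
    // Extracts DeviceData from each record; Encoders.bean applies because DeviceData is a JavaBean
    Dataset<DeviceData> ddDS = ds.map(mapDeviceData, Encoders.bean(DeviceData.class));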

Scala:

    val ddDS = ds.map(_._2) // same as ds.map(row => row._2)
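
To actually end up with an Iterator<DeviceData>, one option is Dataset.toLocalIterator(), which returns a java.util.Iterator and brings data to the driver one partition at a time rather than all at once, so ddDS.toLocalIterator() should be enough after the map. If you would rather skip the map and call toLocalIterator() on ds itself, the tuples can be unwrapped lazily on the driver. The following is only a minimal sketch of that unwrapping, written so that it runs without Spark: Map.Entry stands in for scala.Tuple2, and the DeviceData class, the mapLazily helper, and the sample values are all hypothetical names made up for the illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class LazyIteratorSketch {

    // Hypothetical stand-in for the DeviceData bean from the question.
    static class DeviceData {
        final String deviceId;

        DeviceData(String deviceId) {
            this.deviceId = deviceId;
        }
    }

    // Wraps an iterator and applies fn to each element only when next() is called.
    // No intermediate list is built, unlike the collectAsList() approach.
    static <A, B> Iterator<B> mapLazily(final Iterator<A> source, final Function<A, B> fn) {
        return new Iterator<B>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public B next() {
                return fn.apply(source.next());
            }
        };
    }

    public static void main(String[] args) {
        // Map.Entry plays the role of Tuple2<String, DeviceData> in this sketch.
        List<Map.Entry<String, DeviceData>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<>("k1", new DeviceData("dev-1")));
        rows.add(new SimpleEntry<>("k2", new DeviceData("dev-2")));

        // With Spark, the source would be ds.toLocalIterator() and the function t -> t._2().
        Iterator<DeviceData> it = mapLazily(rows.iterator(), Map.Entry::getValue);
        while (it.hasNext()) {
            System.out.println(it.next().deviceId);
        }
    }
}
```

Keep in mind that even with toLocalIterator() every record still passes through the driver, so this only helps with memory, not with the total amount of data transferred.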

Source: https://habr.com/ru/post/1264712/

