Hadoop: an array of primitives as the value in a key-value pair

I asked a very similar question in a previous Hadoop thread: How can I have an array of pairs of numbers as the value in a key-value pair?

My problem is that I want to pass a double array as a value from the map phase to the reduce phase. The answer I received was to serialize the array, convert it to Text, pass it to the reducer, and deserialize it there. That works, but it amounts to serializing and deserializing twice.

ArrayWritable only accepts types that implement Writable, such as FloatWritable. So another option is to convert my double array into an array of DoubleWritables. But that takes time, and creating Writables is expensive. Is there a solution as simple as ArrayWritable array = new ArrayWritable(Double.class)?
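For clarity, the boxing route I would rather avoid looks roughly like this (a sketch; BoxedDoubleArrayWritable and fromDoubles are names I made up for illustration):

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Writable;

// ArrayWritable needs a concrete no-arg subclass to be usable as a
// map output value, plus a Writable wrapper around every element.
public class BoxedDoubleArrayWritable extends ArrayWritable {

    public BoxedDoubleArrayWritable() {
        super(DoubleWritable.class);
    }

    // Boxing each primitive double into a DoubleWritable object is the
    // per-record overhead in question.
    public static BoxedDoubleArrayWritable fromDoubles(double[] values) {
        Writable[] boxed = new Writable[values.length];
        for (int i = 0; i < values.length; i++) {
            boxed[i] = new DoubleWritable(values[i]);
        }
        BoxedDoubleArrayWritable result = new BoxedDoubleArrayWritable();
        result.set(boxed);
        return result;
    }
}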

+4
2 answers

Just implement your own Writable.

For instance,

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class DoubleArrayWritable implements Writable {

    private double[] data;

    // Hadoop requires a no-arg constructor to instantiate the value type.
    public DoubleArrayWritable() {
    }

    public DoubleArrayWritable(double[] data) {
        this.data = data;
    }

    public double[] getData() {
        return data;
    }

    public void setData(double[] data) {
        this.data = data;
    }

    // Writes the length first so readFields knows how many doubles follow.
    public void write(DataOutput out) throws IOException {
        int length = 0;
        if (data != null) {
            length = data.length;
        }
        out.writeInt(length);
        for (int i = 0; i < length; i++) {
            out.writeDouble(data[i]);
        }
    }

    public void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        data = new double[length];
        for (int i = 0; i < length; i++) {
            data[i] = in.readDouble();
        }
    }
}
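A minimal usage sketch (the mapper, input format, and parsing below are illustrative assumptions, not part of the original answer):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class VectorMapper extends Mapper<LongWritable, Text, Text, DoubleArrayWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical input line: "label,1.0,2.0,3.0"
        String[] parts = value.toString().split(",");
        double[] vector = new double[parts.length - 1];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = Double.parseDouble(parts[i + 1]);
        }
        context.write(new Text(parts[0]), new DoubleArrayWritable(vector));
    }
}

In the driver, also call job.setMapOutputValueClass(DoubleArrayWritable.class) so the framework knows which value type to instantiate on the reduce side.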
+8

You can use double[] as the value type of a Map:

Map<String, double[]> map = new HashMap<String, double[]>(); // compiles

Java arrays are automatically Serializable if their element type is Serializable, and all primitive types are Serializable.
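A quick sketch of that claim using plain Java serialization (the class name is arbitrary):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class SerializeDoubles {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        double[] original = {1.5, 2.5, 3.5};

        // A double[] serializes directly; no wrapper objects are needed.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            double[] copy = (double[]) in.readObject();
            System.out.println(Arrays.equals(original, copy)); // true
        }
    }
}

Note that this applies to plain Java maps; Hadoop's shuffle uses Writable serialization rather than Java serialization, as in the first answer.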

0

Source: https://habr.com/ru/post/1439595/