Hadoop: How can I have an array of doubles as a value in a key-value pair?

I have a problem: I need to aggregate some vectors to compute statistics. For example, given pairs of vectors, I need to sum them element-wise. My vectors look like this:

    1,0,3,4,5
    2,3,4,5,6
    3,4,5,5,6

My key-value pairs are currently (String, String). But every time I need to add these vectors, I must first convert them to double arrays, add them, and finally convert the aggregated vector back to a string. I think it would be much faster if the key-value pairs had the form (String, double array), since there would be no need to translate back and forth. My problem is that I cannot find a way to use double arrays as values. Is there a simple way to do this without creating a new custom type?
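For reference, the string-to-array round trip described above can be sketched like this (plain Java; the class and method names are illustrative, not from any Hadoop API):

```java
import java.util.StringJoiner;

public class VectorCodec {

    // Parse a comma-separated vector like "1,0,3,4,5" into a double array.
    public static double[] parse(String vector) {
        String[] parts = vector.split(",");
        double[] result = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Double.parseDouble(parts[i]);
        }
        return result;
    }

    // Element-wise sum of two vectors of equal length.
    public static double[] add(double[] a, double[] b) {
        double[] sum = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    // Join a double array back into a comma-separated string.
    public static String join(double[] vector) {
        StringJoiner joiner = new StringJoiner(",");
        for (double d : vector) {
            joiner.add(String.valueOf(d));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        double[] sum = add(parse("1,0,3,4,5"), parse("2,3,4,5,6"));
        System.out.println(join(sum)); // prints 3.0,3.0,7.0,9.0,11.0
    }
}
```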

1 answer

Do you mean something like this?

    Map<String, List<Double>> arrays = new HashMap<String, List<Double>>();

    double[] array = {1.0, 2.0, 3.0};
    // Note: Arrays.asList(array) on a double[] yields a List<double[]> with a
    // single element, not a List<Double>, so the elements must be boxed manually.
    List<Double> boxed = new ArrayList<Double>();
    for (double d : array) {
        boxed.add(d);
    }
    arrays.put("ArrayKey", boxed);

then you can call your map method:

    map(String key, String arrayKey) {
        List<Double> value = arrays.get(arrayKey);
    }

You can also serialize your double array and then deserialize it back:

    package test;

    import org.apache.commons.codec.binary.Base64InputStream;
    import org.apache.commons.codec.binary.Base64OutputStream;

    import java.io.*;
    import java.util.Arrays;

    public class Test {

        public static void main(String[] args) throws IOException, ClassNotFoundException {
            double[] array = {0.0, 1.1, 2.2, 3.3};
            String stringValue = serialize(array);
            map("Key", stringValue);
        }

        public static void map(String key, String value) throws ClassNotFoundException, IOException {
            double[] array = deserialize(value);
            System.out.println("Key=" + key + "; Value=" + Arrays.toString(array));
        }

        // Write the array with Java object serialization, Base64-encoding the
        // bytes so the result is a plain text value.
        public static String serialize(double[] array) throws IOException {
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            Base64OutputStream base64OutputStream = new Base64OutputStream(byteArrayOutputStream);
            ObjectOutputStream oos = new ObjectOutputStream(base64OutputStream);
            oos.writeObject(array);
            oos.flush();
            oos.close();
            return byteArrayOutputStream.toString();
        }

        // Reverse the steps above: Base64-decode the string and read the array back.
        public static double[] deserialize(String stringArray) throws IOException, ClassNotFoundException {
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(stringArray.getBytes());
            Base64InputStream base64InputStream = new Base64InputStream(byteArrayInputStream);
            ObjectInputStream iis = new ObjectInputStream(base64InputStream);
            return (double[]) iis.readObject();
        }
    }

OUTPUT:

 Key=Key; Value=[0.0, 1.1, 2.2, 3.3] 
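If the overhead of Java object serialization matters, a lighter-weight alternative (not from the original answer, just a sketch) is to pack the doubles into a ByteBuffer and Base64-encode the raw bytes with the standard java.util.Base64 (Java 8+), which avoids both the commons-codec dependency and the object-stream header bytes:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Base64;

public class CompactCodec {

    // Pack each double as 8 raw bytes, then Base64-encode the whole buffer.
    public static String serialize(double[] array) {
        ByteBuffer buffer = ByteBuffer.allocate(array.length * Double.BYTES);
        for (double d : array) {
            buffer.putDouble(d);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    // Decode the Base64 string and read the doubles back out of the buffer.
    public static double[] deserialize(String encoded) {
        ByteBuffer buffer = ByteBuffer.wrap(Base64.getDecoder().decode(encoded));
        double[] array = new double[buffer.remaining() / Double.BYTES];
        for (int i = 0; i < array.length; i++) {
            array[i] = buffer.getDouble();
        }
        return array;
    }

    public static void main(String[] args) {
        double[] original = {0.0, 1.1, 2.2, 3.3};
        double[] roundTrip = deserialize(serialize(original));
        System.out.println(Arrays.toString(roundTrip)); // prints [0.0, 1.1, 2.2, 3.3]
    }
}
```

The encoded string is still plain text, so it fits a (String, String) key-value pair exactly like the object-serialization version.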

The map lookup is faster, but serialization is what you need when the arrays have to travel between nodes of a cluster, i.e., to another JVM:

    private static class SpeedTest {

        private static final Map<String, List> arrays = new HashMap<String, List>();

        public static void test(final double[] array) throws IOException, ClassNotFoundException {
            final String str = serialize(array);
            final int amount = 10 * 1000;

            long timeStamp = System.currentTimeMillis();
            for (int i = 0; i < amount; i++) {
                serialize(array);
            }
            System.out.println("Serialize: " + (System.currentTimeMillis() - timeStamp) + " ms");

            timeStamp = System.currentTimeMillis();
            for (int i = 0; i < amount; i++) {
                deserialize(str);
            }
            System.out.println("Deserialize: " + (System.currentTimeMillis() - timeStamp) + " ms");

            arrays.clear();
            timeStamp = System.currentTimeMillis();
            // Prepare a map that holds a reference for each array.
            for (int i = 0; i < amount; i++) {
                arrays.put("key_" + i, Arrays.asList(array));
            }
            // Get each array back by its key.
            for (int i = 0; i < amount; i++) {
                arrays.get("key_" + i).toArray();
            }
            System.out.println("Mapping: " + (System.currentTimeMillis() - timeStamp) + " ms");
        }
    }

OUTPUT:

    Serialize: 298 ms
    Deserialize: 254 ms
    Mapping: 27 ms

Source: https://habr.com/ru/post/1439598/
