Writing null to DataOutput in Hadoop

I am using Hadoop 0.20.2 and writing an X object that implements Writable.

X has several fields that are instances of Integer. For these fields, null carries a particular meaning.

When serializing an object by writing to the DataOutput out in the write method of the Writable interface, is there a way to write null? Or should I use separate booleans that indicate that a value is null?

3 answers

Booleans are the standard way to mark object properties as null.

Consider this case:

    public class LongMessage implements Writable {
        private long tag;
        private String data;
        // interface methods omitted for brevity
    }

Here data may be null for some reason, so I would implement the read/write methods as follows:

    @Override
    public void readFields(DataInput in) throws IOException {
        tag = in.readLong();
        if (in.readBoolean()) {
            data = in.readUTF();
        } else {
            data = null;
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(tag);
        if (data != null) {
            out.writeBoolean(true);
            out.writeUTF(data);
        } else {
            out.writeBoolean(false);
        }
    }
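The same presence-flag pattern applies directly to the asker's nullable Integer fields. Here is a minimal self-contained sketch using plain java.io streams (which DataOutput/DataInput extend); the class and method names are illustrative, not part of any Hadoop API:

```java
import java.io.*;

public class NullableIntDemo {
    // Write an Integer that may be null: a presence flag, then the value.
    public static void writeNullableInt(DataOutput out, Integer v) throws IOException {
        out.writeBoolean(v != null);
        if (v != null) {
            out.writeInt(v);
        }
    }

    // Read it back: the flag tells us whether an int follows.
    public static Integer readNullableInt(DataInput in) throws IOException {
        return in.readBoolean() ? in.readInt() : null;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeNullableInt(out, 42);
        writeNullableInt(out, null);

        DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readNullableInt(in)); // 42
        System.out.println(readNullableInt(in)); // null
    }
}
```

Inside a real Writable you would call these helpers once per nullable field from write and readFields, keeping the field order identical on both sides.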

It is even pretty readable. But keep in mind that you have a fixed overhead of one byte per write, as the JavaDoc for DataOutput#writeBoolean states:

Writes a boolean value to this output stream. If v is true, the value (byte) 1 is written; if v is false, the value (byte) 0 is written.


NullWritable is a special type of Writable because it has a zero-length serialization: no bytes are written to or read from the stream. See Hadoop: The Definitive Guide for further reference.
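To make "zero-length serialization" concrete, here is a sketch of the idea behind NullWritable using only the JDK (this is a stand-in class, not the real org.apache.hadoop.io.NullWritable): a singleton placeholder whose write and readFields do nothing, so it occupies no space in the stream.

```java
import java.io.*;

// Stand-in illustrating the NullWritable concept: a singleton whose
// serialization is empty, so it contributes zero bytes to the stream.
public class NullPlaceholder {
    public static final NullPlaceholder INSTANCE = new NullPlaceholder();
    private NullPlaceholder() {}

    public void write(DataOutput out) throws IOException {
        // intentionally empty: zero-length serialization
    }

    public void readFields(DataInput in) throws IOException {
        // nothing to read
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        INSTANCE.write(new DataOutputStream(buf));
        System.out.println(buf.size()); // 0
    }
}
```

This is why NullWritable works well as a placeholder key or value in MapReduce jobs that only care about one side of the pair; it cannot, however, represent a null field inside another Writable.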


In the Java object serialization protocol, a serialized null takes exactly one byte. So I do not think you will have a problem in the write method of your custom Writable.

Typically, it really depends on what you are trying to model. If the field represents a Boolean and null simply means "not the case", you should probably just default to false. If it is an integer, use whatever default value makes sense for your dataset. So if there is no specific processing tied to the "particular meaning" you mention, writing a default in place of null is fine; otherwise you need to represent the null explicitly.
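The default-value alternative can be sketched like this: pick a sentinel that cannot occur in your real data (the choice of -1 below is an assumption for illustration, as are the class and method names) and map null to it, which avoids the extra boolean flag per field.

```java
import java.io.*;

// Sketch of the default-value alternative to a presence flag.
public class SentinelIntDemo {
    // Assumption: -1 never appears as real data in this dataset.
    public static final int ABSENT = -1;

    public static void writeIntOrDefault(DataOutput out, Integer v) throws IOException {
        out.writeInt(v == null ? ABSENT : v);
    }

    public static Integer readIntOrDefault(DataInput in) throws IOException {
        int v = in.readInt();
        return v == ABSENT ? null : v;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeIntOrDefault(out, 7);
        writeIntOrDefault(out, null);

        DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readIntOrDefault(in)); // 7
        System.out.println(readIntOrDefault(in)); // null
        System.out.println(buf.size());           // 8: two ints, no flag bytes
    }
}
```

The trade-off: this saves one byte per field versus the boolean-flag approach, but it silently corrupts data if the sentinel ever shows up as a legitimate value, so only use it when the domain genuinely excludes the sentinel.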


Source: https://habr.com/ru/post/1469315/
