The Writable interface differs from Serializable in that Serializable does not assume the class of stored values is known, so each instance is tagged with its class. ObjectOutputStream and ObjectInputStream optimize this somewhat, writing 5-byte handles for instances of a class after the first, but sequences of objects written with handles cannot be accessed randomly, since they rely on stream state. This complicates things like sorting.
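For concreteness, here is a minimal sketch of that mechanism in plain Java serialization (the Point class is hypothetical, used only for illustration):

```java
import java.io.*;

// Hypothetical value class, just for illustration.
class Point implements Serializable {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class SerializableDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // The first Point written carries a full class descriptor;
            // subsequent Points are written with a short back-reference
            // (handle) to that descriptor instead of repeating it.
            out.writeObject(new Point(1, 2));
            out.writeObject(new Point(3, 4));
        }
        System.out.println("serialized size: " + bytes.size() + " bytes");
    }
}
```

Reading such a stream back requires replaying it from the start so the handles can be resolved, which is why random access into the middle of it is awkward.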
Writable, on the other hand, assumes that the application knows the expected class. The application must be able to instantiate it in order to call readFields(), so the class need not be stored with each instance. This results in considerably more compact binary files, straightforward random access, and generally higher performance.
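As a rough illustration, here is a minimal sketch of a Writable implementation; PointWritable and its fields are hypothetical, while write() and readFields() are the methods the Writable interface actually defines:

```java
import java.io.*;
import org.apache.hadoop.io.Writable;

// Hypothetical Writable value class, for illustration only.
public class PointWritable implements Writable {
    private int x;
    private int y;

    public PointWritable() { }                  // reader instantiates this before reading
    public PointWritable(int x, int y) { this.x = x; this.y = y; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Only the field values are written; no class name, no type tags.
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // The reader already knows it is reading a PointWritable,
        // so it simply reads the fields back in the same order.
        x = in.readInt();
        y = in.readInt();
    }
}
```

The reading side does something like `PointWritable p = new PointWritable(); p.readFields(in);`, which is only possible because the application knows the class up front.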
Arguably, Hadoop could use Serializable. One could override writeObject or writeExternal for each class whose serialization is performance critical. (MapReduce is very I/O intensive, so nearly every class's serialization is performance critical.) One could implement ObjectOutputStream.writeObjectOverride() and ObjectInputStream.readObjectOverride() to use a more compact representation that, for example, did not need to tag every top-level instance in a file with its class. This would probably require at least as much code as Hadoop has in Writable, ObjectWritable, etc., and that code would be somewhat more complicated, since it would be working around a different model. But it might have the advantage of better built-in versioning. Or would it?
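A sketch of what that hook looks like, assuming the standard writeObjectOverride() mechanism; CompactObjectOutputStream and its string-only encoding are hypothetical:

```java
import java.io.*;

// Minimal sketch: a subclass that bypasses the default serialization protocol.
// The protected no-arg ObjectOutputStream() constructor is what enables
// writeObjectOverride() to be called instead of the default machinery.
class CompactObjectOutputStream extends ObjectOutputStream {
    private final DataOutputStream data;

    CompactObjectOutputStream(OutputStream raw) throws IOException {
        super();                              // skip the default stream protocol and header
        this.data = new DataOutputStream(raw);
    }

    @Override
    protected void writeObjectOverride(Object obj) throws IOException {
        // Application-defined compact encoding: write only the value,
        // with no class descriptor per top-level instance.
        if (obj instanceof String) {
            data.writeUTF((String) obj);
        } else {
            throw new NotSerializableException(String.valueOf(obj));
        }
    }
}
```

A matching ObjectInputStream subclass would implement readObjectOverride() the same way, and the application would have to decide for itself what classes to expect, which is exactly the plumbing Writable and ObjectWritable already provide.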
Serializable's versioning mechanism is for classes to define a static field named serialVersionUID. This protects against incompatible changes, but does not by itself provide backward compatibility. For that, the application must handle versioning explicitly: when reading, it needs to reason about what was written in order to decide what to do. But Serializable's versioning mechanism supports this no better and no worse than Writable's.
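One possible shape of such explicit versioning, sketched as a hypothetical Writable (the class, fields, and version numbers are made up for illustration):

```java
import java.io.*;
import org.apache.hadoop.io.Writable;

// Minimal sketch of application-level versioning inside a Writable.
public class VersionedRecord implements Writable {
    private static final byte CURRENT_VERSION = 2;

    private long id;            // present since version 1
    private String label = "";  // added in version 2

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION);   // record which format this writer produces
        out.writeLong(id);
        out.writeUTF(label);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        byte version = in.readByte();     // reason about what was written
        id = in.readLong();
        if (version >= 2) {
            label = in.readUTF();
        } else {
            label = "";                   // default for data written by older versions
        }
    }
}
```

Nothing in this pattern depends on Writable rather than Serializable; the version byte and the branching on read are work the application has to do either way.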