Understanding LongWritable

Sorry if this is a stupid question, but I could not find the answer with a Google search. How should I understand the LongWritable type? What is it? Can anyone link to a diagram or another useful page?

3 answers

Hadoop needs to be able to serialize data to and from Java types through DataInput and DataOutput (usually I/O streams). Writable classes do this by implementing two methods: write(DataOutput) and readFields(DataInput).

In particular, LongWritable is a Writable class that wraps java long.

In most cases (especially when you are starting out) you can mentally replace LongWritable → Long, i.e. it's just a number. If you decide to define your own data types, you will need to implement the Writable interface yourself.

It looks something like this:

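    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    public interface Writable {
        // Serialize this object's fields to out.
        public void write(DataOutput out) throws IOException;

        // Populate this object's fields from in.
        public void readFields(DataInput in) throws IOException;
    }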
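To make the custom-type case concrete, here is a minimal sketch of a round trip through write and readFields. It assumes nothing from Hadoop itself: the Writable interface below is a local stand-in with the same two methods so the example compiles with only the JDK, and Reading and WritableRoundTrip are hypothetical names invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in with the same shape as Hadoop's Writable (assumption:
// used here only so the sketch compiles without Hadoop on the classpath).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom type: a timestamped temperature reading.
class Reading implements Writable {
    long timestamp;
    int temperature;

    // No-arg constructor so an empty instance can be filled by readFields.
    Reading() {}

    Reading(long timestamp, int temperature) {
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(timestamp);   // 8 bytes
        out.writeInt(temperature);  // 4 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they were written.
        timestamp = in.readLong();
        temperature = in.readInt();
    }
}

class WritableRoundTrip {
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    static void deserialize(byte[] data, Writable target) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new Reading(1_000L, 22));
        Reading copy = new Reading();
        deserialize(data, copy);
        // prints: 12 bytes: 1000, 22
        System.out.println(data.length + " bytes: " + copy.timestamp + ", " + copy.temperature);
    }
}
```

The point to notice is that only the raw field values go on the wire, with no class metadata, and that readFields refills an existing object instead of creating a new one.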

The Mapper class is a generic type with four formal type parameters that define the input key, input value, output key, and output value types of the map function.

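    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
        }
    }

    // reduce() belongs to Reducer, not Mapper, so it goes in its own class
    // (separate file; the class name is chosen here for illustration).
    public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
        }
    }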

In the example code, the input key is a long integer offset and the input value is a line of text. The output key is text, and the output value is an integer. Instead of using Java's built-in types, Hadoop provides its own set of basic types, optimized for network serialization. They are in the package org.apache.hadoop.io.

Here we use LongWritable, which corresponds to Java Long, Text (like Java String) and IntWritable (like Java Integer).
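To get a feel for what the (offset, line) input pairs look like, here is a plain-Java sketch with no Hadoop involved. It assumes the default text input format, where the key is the byte offset at which each line starts, and it assumes UTF-8 content with '\n' line endings. LineOffsets is a hypothetical helper written for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineOffsets {
    // Byte offset at which each '\n'-terminated line starts, assuming UTF-8.
    static List<Long> offsets(String content) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            result.add(offset);
            // Advance past the line's bytes plus one byte for the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "first line\nsecond\nthird\n";
        String[] lines = content.split("\n");
        List<Long> keys = offsets(content);
        for (int i = 0; i < lines.length; i++) {
            // keys.get(i) plays the role of the LongWritable key,
            // lines[i] the role of the Text value.
            System.out.println(keys.get(i) + "\t" + lines[i]);
        }
        // prints:
        // 0	first line
        // 11	second
        // 18	third
    }
}
```

So the LongWritable key in map() is usually just a position in the file, which is why many mappers ignore it and only look at the Text value.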


From the Apache documentation page,

Writable is described as:

A serializable object that implements a simple, efficient serialization protocol based on DataInput and DataOutput.

LongWritable is a WritableComparable for longs.

Need for Writables:

In Hadoop, interprocess communication is built on remote procedure calls (RPC). The RPC protocol uses serialization to render a message into a binary stream at the sender, and the receiver deserializes the binary stream back into the original message.

Java serialization has many flaws in terms of performance and efficiency. It is much slower than using in-memory stores, and it tends to significantly expand the size of the object. It also creates a lot of garbage.
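The size overhead is easy to check for yourself. The sketch below serializes the same long value twice: once as a boxed Long through ObjectOutputStream (standard Java serialization), and once through DataOutput.writeLong, which is the kind of raw write a Writable such as LongWritable performs. SerializedSize is a hypothetical class name, and this only measures bytes, not speed or garbage.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class SerializedSize {
    // Size in bytes of a boxed Long written with standard Java serialization.
    static int javaSerializedSize(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Long.valueOf(value));
        }
        return bytes.size();
    }

    // Size in bytes of the same value written as a raw long via DataOutput.
    static int dataOutputSize(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Java serialization: " + javaSerializedSize(42L) + " bytes");
        System.out.println("DataOutput.writeLong: " + dataOutputSize(42L) + " bytes");
    }
}
```

The raw write is always 8 bytes, while the Java-serialized form also carries a stream header and class description, so it comes out several times larger for a single value.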

Refer to these two posts:

dzone article

https://softwareengineering.stackexchange.com/questions/191269/java-serialization-advantages-and-disadvantages-use-or-avoid

For Hadoop to be effective, the serialization/deserialization process must be optimized, since there are a lot of remote calls between cluster nodes. So the serialization format should be fast, compact, extensible and interoperable. For this reason, the Hadoop framework provides its own I/O classes to use in place of Java's primitive data types, e.g. IntWritable for int, LongWritable for long, Text for String, etc.

You can get more information from the fourth edition of Hadoop: The Definitive Guide.


Source: https://habr.com/ru/post/918378/

