Why is there no 'hadoop fs -head' shell command?

Question

A quick way to inspect files on HDFS is to use tail:

~$ hadoop fs -tail /path/to/file

This displays the last kilobyte of data in the file, which is very useful. However, the opposite command, head, is not part of the shell command collection. I find this very surprising.

My hypothesis is that since HDFS is built for very fast streaming reads of very large files, there is some access-related issue that affects head. This makes me hesitant to do anything to read the head of a file. Does anyone have an answer?

+42

hadoop hdfs

bbengfort Nov 04 '13 at 22:05

4 answers

 hdfs dfs -cat /path | head

is a good way to solve the problem.

+1

xu2mao Apr 21 '15 at 8:58

You can try the following command:

 hadoop fs -cat /path | head -n

where -n takes the number of lines to view (for example, head -n 20).

0

Amey Aug 13 '17 at 7:18

In Hadoop v2:

 hdfs dfs -cat /file/path | head

In Hadoop v1 and v3:

 hadoop fs -cat /file/path | head

0

Ani Menon Dec 02 '17 at 11:16

Chris White · Accepted Answer · 2013-11-04 23:37

I would say this has more to do with efficiency: head can easily be replicated by piping the output of hadoop fs -cat through the Linux head command.

 hadoop fs -cat /path/to/file | head

This is efficient because head closes the underlying stream once the desired number of lines has been output.
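The early-exit behaviour can be seen without a cluster. The sketch below is only an illustration: it assumes a POSIX shell with the standard yes and head utilities, uses an endless local producer as a stand-in for hadoop fs -cat, and first_lines is a hypothetical helper name, not a Hadoop command.

```shell
# Stand-in for `hadoop fs -cat`: `yes` writes lines forever.
# If head did not close the pipe, this pipeline would never finish.
first_lines() {
    # $1 = number of lines to keep (hypothetical helper, not a Hadoop command)
    # `|| true` hides the producer's SIGPIPE exit status, which would
    # otherwise fail the pipeline under `set -o pipefail`.
    { yes "record" || true; } | head -n "$1"
}

first_lines 3
```

The pipeline returns as soon as head has printed its three lines, because the producer is stopped when it next writes to the closed pipe. The same mechanism is what stops hadoop fs -cat from streaming the rest of the file.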

Using tail this way would be considerably less efficient, since you would have to stream the entire file (all HDFS blocks) to find the final x lines.

 hadoop fs -cat /path/to/file | tail

The hadoop fs -tail command, as you note, works on the last kilobyte: Hadoop can efficiently find the last block, skip to the position of the final kilobyte, and then stream the output. Piping through tail cannot easily do this.
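A rough local analogy for the difference, assuming a POSIX shell with mktemp, seq, cat and tail: when tail is given a file path it can seek towards the end (GNU tail does this for regular files), whereas behind a pipe it has to read everything. The helper names last_kb_seek and last_kb_pipe and the sample file are hypothetical, and a local file is only a stand-in for an HDFS file.

```shell
# Hypothetical helpers; neither is a Hadoop command.

# Given a file path, tail can seek close to the end and read only the
# final kilobyte (GNU tail does this for regular files).
last_kb_seek() {
    tail -c 1024 "$1"
}

# Given a pipe, tail cannot seek, so it must consume every byte that
# cat produces before it knows which bytes are last.
last_kb_pipe() {
    cat "$1" | tail -c 1024
}

sample=$(mktemp)            # local stand-in for an HDFS file
seq 1 100000 > "$sample"

# Both approaches yield the same bytes; only the amount read differs.
if [ "$(last_kb_seek "$sample")" = "$(last_kb_pipe "$sample")" ]; then
    echo "same output"
fi
rm -f "$sample"
```

The result is identical either way; the cost is not. On HDFS the piped variant would pull every block across the network before tail could print anything.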
