Why is there no 'hadoop fs -head' shell command?

A quick way to inspect files on HDFS is to use tail:

~$ hadoop fs -tail /path/to/file 

This displays the last kilobyte of data in the file, which is very useful. However, the opposite command, head, does not appear to be part of the shell command collection. I find this very surprising.

My hypothesis is that since HDFS is built for very fast streaming reads of very large files, there is some access-related issue that affects head. This makes me hesitant to do anything that reads the head of a file. Does anyone have an answer?

+42
hadoop hdfs
Nov 04 '13 at 22:05
4 answers

I would say it has more to do with efficiency: head can easily be replicated by piping the output of hadoop fs -cat through the Linux head command.

 hadoop fs -cat /path/to/file | head 

This is efficient, since head closes the underlying stream after the desired number of lines has been output.
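As a minimal sketch, the pipeline can be wrapped in a small shell function. The name hdfs_head and the default of 10 lines are assumptions made for illustration; neither is part of Hadoop.

```shell
# hdfs_head: hypothetical helper (not a Hadoop command) that prints the
# first N lines of an HDFS file. N defaults to 10, like plain head.
hdfs_head() {
    # $1 = HDFS path, $2 = optional number of lines.
    # head exits once it has printed N lines, which closes the pipe,
    # so the upstream cat stops instead of streaming the whole file.
    hadoop fs -cat "$1" | head -n "${2:-10}"
}

# Usage (requires a working Hadoop client):
#   hdfs_head /path/to/file 20
```

Because the function only forwards its arguments, it should behave the same as typing the pipeline by hand.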

Using tail this way would be significantly less efficient, since the entire file (all HDFS blocks) would have to be streamed just to find the last x lines.

 hadoop fs -cat /path/to/file | tail 

The hadoop fs -tail command, as you note, works on the last kilobyte: Hadoop can efficiently find the last block, seek to the position of the last kilobyte, and stream only that output. Piping through tail cannot easily do this.
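Since -tail works in bytes rather than lines, a rough mirror image for the start of a file is to cut the stream off after a fixed number of bytes. The sketch below assumes a head that supports -c (GNU and BSD head do); the name hdfs_head_bytes and the 1024-byte default are made up for illustration.

```shell
# hdfs_head_bytes: hypothetical helper that prints the first N bytes of an
# HDFS file (default 1024, mirroring the kilobyte shown by -tail).
# Assumes a head implementation that supports the -c option.
hdfs_head_bytes() {
    # $1 = HDFS path, $2 = optional number of bytes.
    hadoop fs -cat "$1" | head -c "${2:-1024}"
}

# Usage (requires a working Hadoop client):
#   hdfs_head_bytes /path/to/file 4096
```

Unlike -tail, no seeking is needed here: the bytes of interest are the first ones to come down the stream.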

+102
Nov 04 '13 at 23:37
 hdfs dfs -cat /path | head 

is a good way to solve the problem.

+1
Apr 21 '15 at 8:58

You can try the following command:

 hadoop fs -cat /path | head -n N 

where N is the number of lines to view.

0
Aug 13 '17 at 7:18

With the hdfs command (Hadoop v2 and later):

 hdfs dfs -cat /file/path | head 

With the hadoop command (works in v1, v2 and v3):

 hadoop fs -cat /file/path | head 
0
Dec 02 '17 at 11:16