I have 2 GB of data in my HDFS.
Is it possible to get a random sample of that data, like on the Unix command line?
cat iris2.csv | head -n 50
Native head
hadoop fs -cat /your/file | head
This is efficient here, because cat closes the stream as soon as head has finished reading the lines it needs.
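The early-exit behaviour can be seen locally without a cluster. This is only an illustration: `yes` stands in for a very large input, and the string `row` is a made-up placeholder.

```shell
# `yes` would print "row" forever, but the pipeline ends as soon as
# head has read its 3 lines and exits, so the producer is stopped too.
yes row | head -n 3
```

The same mechanism is what keeps `hadoop fs -cat ... | head` from streaming the whole file.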
To get the tail, Hadoop has a dedicated, efficient command:
hadoop fs -tail /your/file
Unfortunately, it returns the last kilobyte of the data, not a specified number of lines.
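If only a handful of trailing rows are needed, the kilobyte limit can be worked around by piping into the ordinary tail. A minimal sketch, assuming the requested rows fit inside that last kilobyte and the file is larger than 1 KB; the helper name `hdfs_last_lines` is made up for illustration.

```shell
# Hypothetical helper: print the last N complete lines found in the
# final kilobyte of an HDFS file.
#   $1 = HDFS path, $2 = number of lines
hdfs_last_lines() {
  # The kilobyte usually starts mid-row, so drop the first (probably
  # partial) line with `tail -n +2`, then keep the final N lines.
  # Caveat: for files smaller than 1 KB this drops a complete first row.
  hadoop fs -tail "$1" | tail -n +2 | tail -n "$2"
}

# Usage:
# hdfs_last_lines /your/file 5
```

For more rows than fit in one kilobyte, `hadoop fs -cat /your/file | tail -n <N>` also works, at the cost of streaming the whole file.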
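The head and tail commands on Linux print the first 10 and the last 10 lines of a file, respectively. Neither output is a random sample; the lines come out in the same order as in the file.
The Linux shuffle command, shuf, generates a random permutation of its input lines, and it can be combined with the Hadoop commands like this: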
$ hadoop fs -cat <file_path_on_hdfs> | shuf -n <N>
So, in this case, if iris2.csv is a file on HDFS and you want 50 randomly sampled lines from it:
$ hadoop fs -cat /file_path_on_hdfs/iris2.csv | shuf -n 50
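One caveat with shuffling a CSV: if the file starts with a header row, shuf treats it like any other line. Below is a sketch that keeps the header in place, assuming iris2.csv does have one (the thread does not say); the function name `sample_keep_header` is hypothetical.

```shell
# Hypothetical helper: pass the first line (assumed to be a CSV header)
# through unchanged, then emit N randomly chosen lines from the rest.
#   $1 = number of data lines to sample; input arrives on stdin
sample_keep_header() {
  IFS= read -r header || return 1   # consume exactly one line from stdin
  printf '%s\n' "$header"
  shuf -n "$1"                      # shuf reads whatever is left on stdin
}

# Usage:
# hadoop fs -cat /file_path_on_hdfs/iris2.csv | sample_keep_header 50
```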
Note: the Linux sort command can also be used for this, but shuf is faster and gives a better random sample.
Alternatively, load the data into a Hive table and run something like this:
SELECT column1, column2 FROM ( SELECT iris2.column1, iris2.column2, rand() AS r FROM iris2 ORDER BY r ) t LIMIT 50;
EDIT: a simpler version of the query:
SELECT iris2.column1, iris2.column2 FROM iris2 ORDER BY rand() LIMIT 50;
sudo -u hdfs hdfs dfs -cat "path of csv file" | head -n 50
Here 50 is the number of rows; adjust it to your requirements.
hdfs dfs -cat yourFile | shuf -n <number_of_line>
This will do the trick. Note that shuf is not available on macOS by default, but you can get it by installing GNU coreutils.
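A script that has to run on both Linux and macOS can look the command up at run time. This sketch assumes GNU coreutils was installed through Homebrew, which exposes the tool under the prefixed name `gshuf`; the helper name `pick_shuf` is made up.

```shell
# Hypothetical helper: print the name of a usable shuffle command.
# Assumption: on macOS, Homebrew's coreutils provides shuf as `gshuf`.
pick_shuf() {
  if command -v shuf >/dev/null 2>&1; then
    echo shuf
  elif command -v gshuf >/dev/null 2>&1; then
    echo gshuf
  else
    echo "no shuf found; install GNU coreutils" >&2
    return 1
  fi
}

# Usage:
# hdfs dfs -cat yourFile | "$(pick_shuf)" -n 50
```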
Source: https://habr.com/ru/post/1529381/