To process specialized file formats (for example, video) in Hadoop, you need to write a custom InputFormat and RecordReader that understand how to turn a video file into splits (the InputFormat) and then read those splits into values (the RecordReader). This is a non-trivial task and requires some intermediate knowledge of how Hadoop handles the splitting of data. I highly recommend Tom White's book Hadoop: The Definitive Guide (O'Reilly), as well as the videos at http://www.cloudera.com . (Full disclosure: I work at Cloudera.)
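Keep in mind that InputSplits (created by the InputFormat) are, by default, simple byte ranges of the file, so a split boundary can fall in the middle of a frame. A good starting point is the InputFormat API documentation: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/InputFormat.html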
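In summary: the InputFormat knows how to produce a list of InputSplit objects, each (usually) 64 or 128 MB in size. The RecordReader then reads an InputSplit and turns its contents into values that the job can process.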
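To make the split/record distinction concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: `FrameSplitSketch`, `Split`, `computeSplits` and `readFrames` are hypothetical names that only imitate the roles of InputSplit, InputFormat and RecordReader. It also assumes fixed-size, uncompressed frames, which real video formats generally do not have, so treat it as an illustration of the boundary handling rather than a working video reader.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of byte-range splits over a file of fixed-size frames (not Hadoop code). */
final class FrameSplitSketch {

    /** Hypothetical stand-in for an InputSplit: the byte range [start, start + length). */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        long end() {
            return start + length;
        }
    }

    /** InputFormat role: cut the file into byte ranges of at most splitSize bytes, ignoring frames. */
    static List<Split> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    /**
     * RecordReader role: return every frame that STARTS inside the split.
     * The last frame may extend past the end of the split; the reader of the
     * next split skips that partial frame, so each frame is read exactly once.
     */
    static List<byte[]> readFrames(byte[] file, Split split, int frameSize) {
        List<byte[]> frames = new ArrayList<>();
        // First frame boundary at or after the start of the split.
        long first = ((split.start + frameSize - 1) / frameSize) * frameSize;
        // Stop at the split end; a trailing incomplete frame at end of file is dropped.
        for (long pos = first; pos < split.end() && pos + frameSize <= file.length; pos += frameSize) {
            byte[] frame = new byte[frameSize];
            System.arraycopy(file, (int) pos, frame, 0, frameSize);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        int frameSize = 10;
        byte[] file = new byte[100]; // ten frames of ten bytes each
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) (i / frameSize); // every byte stores the index of its frame
        }
        int total = 0;
        for (Split s : computeSplits(file.length, 32)) {
            List<byte[]> frames = readFrames(file, s, frameSize);
            total += frames.size();
            System.out.println("split@" + s.start + " len=" + s.length + " -> " + frames.size() + " frames");
        }
        System.out.println("total frames: " + total);
    }
}
```

The split size of 32 bytes is deliberately not a multiple of the frame size, so splits end in the middle of frames. The rule "a reader owns the frames that start inside its split" is one common way to handle that. In a real job the same logic would live in subclasses of Hadoop's InputFormat and RecordReader, and a compressed format would need some other way to locate frame boundaries.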