Hadoop (HDFS) - file versioning

Question

At the moment, my application uses a custom file system (Apache CMIS). As it grows larger, I am considering a move to Hadoop (HDFS), since we also need to run some statistics on the data. The problem: the current file system provides versioning of files. When I read about Hadoop (HDFS) and file versioning, I mostly found that I would have to write the versioning layer myself. Is there already something available for managing file versions in HDFS, or do I really need to write it myself? I do not want to reinvent the wheel, but I cannot find a suitable solution either.

Answer

For details, see the answers and comments below.

Hadoop (HDFS) does not support file versioning. You can get this functionality by combining Hadoop with Amazon S3: Hadoop then uses S3 as its file system (without chunks, but recovery is provided by S3). This solution comes with the file versioning that S3 provides. Hadoop will still use YARN for distributed processing.

+4

version-control hadoop hdfs

Vandeperre maarten Mar 13 '17 at 9:45

2 answers

HDFS . , "" HDFS.

+1

facha Mar 13 '17 at 13:27

franklinsijo · Accepted Answer · 2017-03-13T13:17:52+0000

File versioning is not possible with HDFS.
Instead, you can use Amazon S3, which provides versioning and is also compatible with Hadoop.
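
If S3 is not an option and the versioning layer has to be written by hand, as the question anticipates, one common approach is to never overwrite a file and instead store each version as a new file under a per-document directory. The following is only a minimal sketch of that idea, not an existing Hadoop feature: the class `VersionedStore` and the `v000001` naming scheme are hypothetical, and a local directory stands in for an HDFS directory. It is also not safe for concurrent writers.

```python
from pathlib import Path
import tempfile


class VersionedStore:
    """Toy versioning layer: every write creates a new immutable version file.

    Assumption: a local directory stands in for an HDFS directory. A real
    implementation would issue the same create/list/read operations through
    an HDFS client instead of pathlib.
    """

    def __init__(self, root):
        self.root = Path(root)

    def _versions(self, name):
        # Version files are named v000001, v000002, ... so they sort lexically.
        folder = self.root / name
        if not folder.is_dir():
            return []
        return sorted(p for p in folder.iterdir() if p.name.startswith("v"))

    def write(self, name, data: bytes) -> int:
        """Store data as the next version of `name` and return its number."""
        number = len(self._versions(name)) + 1
        folder = self.root / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"v{number:06d}").write_bytes(data)
        return number

    def read(self, name, version=None) -> bytes:
        """Return the given version, or the latest one when version is None."""
        versions = self._versions(name)
        if not versions:
            raise FileNotFoundError(name)
        if version is None:
            path = versions[-1]
        else:
            path = self.root / name / f"v{version:06d}"
        return path.read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = VersionedStore(tmp)
        store.write("report.csv", b"first draft")
        store.write("report.csv", b"second draft")
        print(store.read("report.csv"))             # latest version
        print(store.read("report.csv", version=1))  # explicit older version
```

Writing a new file per version fits the way HDFS treats files as write-once, but everything that S3 versioning provides out of the box (listing versions, deleting old ones, handling concurrent writers) would have to be added on top of a sketch like this.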
