At the moment, I have a custom file system in my application (Apache CMIS). As it grows larger, I am considering moving to Hadoop (HDFS), because we also need to run some statistics. Problem: the current file system provides version control of files. When I read about Hadoop (HDFS) and file versioning, I found that most of the time I would have to write the versioning myself. Is there something available for version control of files in HDFS, or do I really need to write it myself? (I do not want to reinvent the wheel, but I cannot find a suitable solution either.)
Answer
Details: see the comments below for answers to follow-up questions.
Hadoop (HDFS) does not support file versioning. You can get this functionality by combining Hadoop with (Amazon) S3: Hadoop then uses S3 as its file system instead of HDFS. Data is not split into HDFS blocks (chunks), and durability and recovery are provided by S3 rather than by HDFS replication. Once versioning is enabled on the S3 bucket, this setup gives you the file versions that S3 provides. Hadoop still uses YARN for distributed processing.
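To make the idea of "file versions that S3 provides" concrete: S3 versioning is a per-bucket setting, and once it is enabled, overwriting an object keeps the earlier data as a previous version, each identified by a version ID. The sketch below is a toy, in-memory model of that behavior only. The class `VersionedBucket` and its methods `put`, `get` and `list_versions` are hypothetical names, not the AWS SDK or the Hadoop FileSystem API. It uses sequential IDs (`v1`, `v2`, …) so the example is deterministic, whereas real S3 version IDs are opaque strings, and it does not model deletes.

```python
from __future__ import annotations

import itertools
from typing import Optional


class VersionedBucket:
    """Toy in-memory model of an object store with versioning enabled.

    Hypothetical illustration only: every put() of an existing key keeps the
    older data as a previous version, similar in spirit to a versioned S3
    bucket. This is not the AWS or Hadoop API.
    """

    def __init__(self) -> None:
        # key -> list of (version_id, data), oldest version first
        self._objects: dict[str, list[tuple[str, bytes]]] = {}
        # Sequential IDs keep the example deterministic; real S3 IDs are opaque.
        self._counter = itertools.count(1)

    def put(self, key: str, data: bytes) -> str:
        """Store a new version of `key` and return its version ID."""
        version_id = f"v{next(self._counter)}"
        self._objects.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        """Return the latest data for `key`, or a specific older version."""
        versions = self._objects[key]  # raises KeyError for unknown keys
        if version_id is None:
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id:
                return data
        raise KeyError(f"{key} has no version {version_id}")

    def list_versions(self, key: str) -> list[str]:
        """Return the version IDs of `key`, newest first."""
        return [vid for vid, _ in reversed(self._objects.get(key, []))]


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("stats/daily.csv", b"rows=100")
    bucket.put("stats/daily.csv", b"rows=120")  # overwrite keeps the old version
    print(bucket.get("stats/daily.csv"))            # → b'rows=120'
    print(bucket.get("stats/daily.csv", first))     # → b'rows=100'
    print(bucket.list_versions("stats/daily.csv"))  # → ['v2', 'v1']
```

The point of the model is that reading without a version ID always returns the newest data, so jobs that only need current files are unaffected, while older versions stay retrievable by ID. This is the property the CMIS-style version control in the question relies on.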