GlusterFS or Ceph as a backend for Hadoop

Has anyone tried using GlusterFS or Ceph as the backend for Hadoop? I don't mean just using a plugin to stitch things together. Is the performance better than HDFS? Is it suitable for production use?

Also, is it really a good idea to merge object storage and Hadoop HDFS storage into a single store, or is it better to keep them separate?

2 answers

I tried Ceph as a "drop-in" replacement for HDFS in Hadoop 2.7. After resolving many integration issues, I found that it was two to three times slower than HDFS with the default replication factor in benchmark tests. I do not know the reason for this. Other people have tried a different approach with a similar result:

http://www.snia.org/sites/default/files/SDC15_presentations/cloud_files/YuanZhou_big_data_analytics_on_object_store_r3.pdf

Is it a good idea to combine object storage and HDFS? I think the question is posed wrongly. Both HDFS (via Ozone and FUSE) and Ceph can be used as an object store and as a regular POSIX file system. Ceph has the edge of also offering block storage, which for HDFS is still under discussion: https://issues.apache.org/jira/browse/HDFS-11118

If the question is "Can I expose my storage as a POSIX FS, an object store, and a block store at the same time?", then the answer is: if your design meets your scalability and high-availability requirements, it may in fact be a great idea.


I have used GlusterFS before. It has some nice features, but in the end I chose HDFS as the distributed file system for Hadoop.

The good thing about GlusterFS is that it does not need a master node. Every node in the cluster is equal, so there is no single point of failure. Another thing I like about GlusterFS is its glusterfs-client module ( http://www.jamescoyle.net/how-to/439-mount-a-glusterfs-volume ): when you want to save a file in GlusterFS, you don't need to interact with GlusterFS directly. You just copy the file to the volume mounted through glusterfs-client, which makes the work very simple.
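To illustrate the "just copy the file to the mounted volume" point, here is a minimal Python sketch. It assumes the volume is already mounted with glusterfs-client; the helper name `save_to_volume`, the file name, and the mount path are hypothetical and not from the linked guide. Nothing in it is GlusterFS-specific, which is the whole point: the mount behaves like an ordinary directory.

```python
import shutil
import tempfile
from pathlib import Path


def save_to_volume(src: Path, mount_point: Path, subdir: str = "") -> Path:
    """Copy src into mount_point/subdir with plain file operations.

    mount_point is assumed to be a directory where a GlusterFS volume
    is already mounted (hypothetical, e.g. /mnt/glusterfs).
    """
    target_dir = mount_point / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    # An ordinary file copy; no GlusterFS API is involved.
    return Path(shutil.copy2(src, target_dir / src.name))


if __name__ == "__main__":
    # A temporary directory stands in for the mounted volume here,
    # so the sketch runs on a machine without GlusterFS.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "events.log"
        src.write_text("hello\n")
        fake_mount = root / "mnt"
        fake_mount.mkdir()
        print(save_to_volume(src, fake_mount, "incoming"))
```

On a real cluster you would pass the actual mount point instead of the temporary directory; replication across nodes is then handled by GlusterFS behind the mount.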

However, I found GlusterFS difficult to integrate with the Hadoop ecosystem (Spark, MapReduce, etc.), whereas HDFS is supported by most components in that ecosystem. I think GlusterFS is a good fit for building a clustered system such as file storage that is independent of Hadoop.


Source: https://habr.com/ru/post/1237242/
