How to store and query very large data sets (beyond relational databases)

We are currently faced with the problem of efficiently storing and retrieving data from very large data sets (billions of records). We have used MySQL and optimized the system, the OS, RAID, queries, indexes, etc., and we have now hit the point where we want to move on.

I need to make an informed decision about which technology to adopt to solve our data problems. I have looked at Map/Reduce over HDFS, and I have also heard good things about HBase, but I cannot help thinking there are other options. Is there a good comparison of the available technologies and their tradeoffs?

If you have links to share, I would appreciate them.

+3
1 answer

There are a few directions worth considering:

- Scale up a single DB. With serious hardware (well-configured RAID arrays) and an engine such as Oracle, one machine can go a long way. The TPC-H benchmark results give a sense of what single-system setups can handle: http://www.tpc.org/tpch/results/tpch_perf_results.asp.
- Hadoop: HDFS + Map/Reduce + Hive. Hive gives you a SQL-like layer that compiles down to MapReduce jobs. Queries run as batch jobs, so latency is high, but it scales out to very large data sets on commodity hardware.
- MPP (massively parallel processing) databases: parallel engines that speak SQL, such as Netezza, Greenplum, Asterdata, and Vertica. They are commercial and not cheap, but they are built for exactly this scale.
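To make the Hadoop option above concrete, here is a minimal single-process sketch of the map/shuffle/reduce model that Hive compiles queries down to. The sample records and phase functions are hypothetical illustrations, not part of any Hadoop API; in a real cluster the framework reads input from HDFS blocks and performs the shuffle across machines.

```python
from collections import defaultdict

# Hypothetical sample records; in a real Hadoop job these would be
# read from HDFS blocks spread across many machines.
records = [
    ("2009-01-01", "clicks", 3),
    ("2009-01-01", "views", 10),
    ("2009-01-02", "clicks", 5),
    ("2009-01-02", "clicks", 2),
]

def map_phase(record):
    """Map: emit (key, value) pairs; here, (metric, count)."""
    date, metric, count = record
    yield (metric, count)

def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key."""
    return (key, sum(values))

# Shuffle: group mapper output by key (the framework does this in Hadoop).
groups = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        groups[key].append(value)

results = dict(reduce_phase(k, v) for k, v in groups.items())
print(results)  # {'clicks': 10, 'views': 10}
```

The equivalent Hive query would be a plain `SELECT metric, SUM(count) ... GROUP BY metric`; the point of the sketch is that every such query becomes an embarrassingly parallel map step plus a per-key reduce, which is why throughput scales but per-query latency stays high.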

+4

Source: https://habr.com/ru/post/1786561/