Recommendation for large-scale storage

I have a large amount of data that I need to store and generate reports from. Each record represents an event on a website (we are talking about 50 events per second, so obviously older data should be aggregated).

I am evaluating approaches to implementing this. It obviously needs to be reliable and as simple as possible to scale. It should also be possible to build reports from the data flexibly and efficiently.

I hope some SOers have experience with this kind of software and can make recommendations and/or point out pitfalls.

Ideally, I would like to deploy this on EC2.

+3
4 answers


HTH

+4

@Simon, a few thoughts:

  • Plan an ETL process to load the raw events into a reporting db.
  • Keep the volume in mind: 50 events/second, 24x7x365, is over 1.5 billion events per year.
  • Commercial engines such as Oracle or MSSQL can handle this, but weigh the cost/licensing.
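As a back-of-envelope check on the volume mentioned above (the 200-byte average row size is a made-up assumption, not from the question):

```python
# Back-of-envelope volume check: 50 events/second, 24x7x365.
# BYTES_PER_EVENT is a hypothetical average row size.
EVENTS_PER_SEC = 50
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
BYTES_PER_EVENT = 200

events_per_year = EVENTS_PER_SEC * SECONDS_PER_YEAR
raw_gb_per_year = events_per_year * BYTES_PER_EVENT / 1e9

print(f"{events_per_year:,} events/year")             # 1,576,800,000 events/year
print(f"~{raw_gb_per_year:.0f} GB/year of raw rows")  # ~315 GB/year of raw rows
```

Even at that rate, a year of raw rows is only a few hundred GB, which is why aggregating older data matters more for query speed than for raw capacity.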
+1

Take a look at Hadoop and HDFS; there is already a fair amount of Q&A about it here on SO worth searching through.

HDFS (which you can run on EC2) is well suited to storing large volumes of raw event data and running batch analytics over it.

EC2 (with capacity you can grow and shrink on demand) is a reasonable fit for that kind of workload.
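As a rough illustration of the kind of batch rollup such a cluster would run (plain Python in map/reduce shape, not the Hadoop API; event fields and timestamps are invented):

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw events: unix timestamp plus page hit.
events = [
    {"ts": 1_700_000_000, "page": "/home"},
    {"ts": 1_700_000_500, "page": "/home"},
    {"ts": 1_700_005_000, "page": "/signup"},
]

def map_event(ev):
    # Map each event to an (hour bucket, page) key.
    hour = datetime.fromtimestamp(ev["ts"], tz=timezone.utc).strftime("%Y-%m-%dT%H:00")
    return (hour, ev["page"])

# Reduce step: count events per (hour, page) key.
hourly_counts = Counter(map_event(ev) for ev in events)
for (hour, page), n in sorted(hourly_counts.items()):
    print(hour, page, n)
```

The output of a job like this (hourly or daily counts) is what you would keep long-term, while the raw events age out.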

+1


This is a classic data-warehousing problem; the established vendors in that space are Oracle and Teradata.

Broadly, you have two options:

  • Take the established data-warehouse approach: buy a commercial platform (proven and powerful, but expensive and complex to operate).

  • Take the homegrown approach: build just what you need right now, and organically grow it all. Start with a simple database and create a web reporting structure. There are many open source software tools and low-cost agencies that do the job.
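A minimal sketch of that homegrown route, using SQLite as the "simple database" (all table and column names here are assumptions for illustration):

```python
import sqlite3

# Raw event table plus a daily rollup; once a day is aggregated,
# the raw rows can be purged, which is the "aggregate older data" step.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE events (ts INTEGER, page TEXT);
    CREATE TABLE daily_rollup (day TEXT, page TEXT, hits INTEGER,
                               PRIMARY KEY (day, page));
""")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(1_700_000_000, "/home"), (1_700_000_500, "/home"),
                (1_700_090_000, "/signup")])

# Aggregate raw events into the rollup, then drop the aggregated rows.
db.execute("""
    INSERT INTO daily_rollup
    SELECT date(ts, 'unixepoch'), page, COUNT(*) FROM events
    GROUP BY 1, 2
""")
db.execute("DELETE FROM events")
for row in db.execute("SELECT * FROM daily_rollup ORDER BY day, page"):
    print(row)  # ('2023-11-14', '/home', 2) then ('2023-11-15', '/signup', 1)
```

The reporting layer then only ever queries the compact rollup tables, and the same two-table pattern carries over to whatever production database replaces SQLite.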

Regarding the EC2 approach: I am not sure how it fits into a data storage strategy. EC2's strength is processing, and that is a limited part of this problem; your main need is efficient storage and retrieval.

0

Source: https://habr.com/ru/post/1698994/

