Database suggestions for 200 million writes/day, monthly summarization queries

I'm looking for help deciding which database system to use. (I've been searching and reading for the last few hours; it now seems worth asking for first-hand advice.)

I need to write about 200 million rows (or more) per 8-hour business day to the database, and then run weekly / monthly / annual summary queries over that data. The summary queries would gather data for things like billing, for example: "How many type A transactions has each user completed this month?" (They could be more complex than that, but that's the general idea.)
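To make the billing example concrete, here is a minimal sketch of that kind of summary query. The transactions table and its columns are hypothetical, and sqlite3 is used only to keep the snippet self-contained and runnable; the same GROUP BY would apply on any SQL engine.

```python
import sqlite3

# Hypothetical schema for illustration only; the column names are
# assumptions, not taken from the original question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        user_id    INTEGER,
        tx_type    TEXT,
        created_at TEXT   -- ISO-8601 timestamp
    )
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "A", "2010-05-03T10:00:00"),
     (1, "A", "2010-05-07T12:30:00"),
     (2, "B", "2010-05-09T09:15:00")],
)

# "How many type A transactions has each user completed this month?"
for row in conn.execute("""
    SELECT user_id, COUNT(*) AS type_a_count
    FROM transactions
    WHERE tx_type = 'A'
      AND created_at >= '2010-05-01'
      AND created_at <  '2010-06-01'
    GROUP BY user_id
"""):
    print(row)
```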

I can distribute the database across several machines if necessary, but I don't think I can simply drop the old data: I will definitely need to query a month's worth of data, and possibly a year's. These queries would be for my own use and would not need to be generated in real time for an end user (they could run overnight if necessary).

Does anyone have any suggestions as to which databases would work well?

P.S. Cassandra looks like it would have no trouble handling the writes, but what about the huge monthly table scans? Is anyone familiar with the performance of Cassandra / Hadoop MapReduce?
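As a rough illustration of what that monthly scan could look like on the MapReduce side, here is a minimal Hadoop Streaming style mapper and reducer in Python. The tab-separated input layout (user_id, tx_type, timestamp) and the hard-coded month filter are assumptions made purely for the sketch.

```python
#!/usr/bin/env python
# mapper.py -- emits "user_id<TAB>1" for every type A transaction in the target month.
import sys

TARGET_MONTH = "2010-05"  # assumed filter; in practice this would be passed in via job configuration

for line in sys.stdin:
    user_id, tx_type, created_at = line.rstrip("\n").split("\t")
    if tx_type == "A" and created_at.startswith(TARGET_MONTH):
        print("%s\t1" % user_id)
```

```python
#!/usr/bin/env python
# reducer.py -- sums the counts per user; the framework sorts mapper output by key first.
import sys

current_user, count = None, 0
for line in sys.stdin:
    user_id, value = line.rstrip("\n").split("\t")
    if user_id != current_user:
        if current_user is not None:
            print("%s\t%d" % (current_user, count))
        current_user, count = user_id, 0
    count += int(value)
if current_user is not None:
    print("%s\t%d" % (current_user, count))
```

The two scripts would be wired together with the standard Hadoop Streaming jar, which handles splitting the input and sorting mapper output by key before it reaches the reducer.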

+3
3 answers

Cassandra + Hadoop should cover this. 200M rows in an 8-hour day works out to roughly 7,000 writes per second, which a modest Cassandra cluster can absorb, and the monthly / annual summaries can be run as Hadoop MapReduce jobs over that data (e.g. with Pig).
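The per-second figure is simple arithmetic on the numbers from the question; below is that back-of-the-envelope check plus a rough sketch of a Cassandra write path using the DataStax Python driver. The contact point, keyspace, table and column names are assumptions for illustration, not anything from the thread.

```python
# Back-of-the-envelope write rate implied by the question.
rows_per_day = 200_000_000
business_day_seconds = 8 * 60 * 60            # 28,800 s
print(rows_per_day / business_day_seconds)    # ~6,944 writes/second

# Hypothetical write path with the DataStax cassandra-driver; keyspace,
# table and columns are assumed for the example.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node-1"])        # assumed contact point
session = cluster.connect("billing")           # assumed keyspace
insert = session.prepare(
    "INSERT INTO transactions (user_id, tx_type, created_at) VALUES (?, ?, ?)"
)

def write_row(user_id, tx_type, created_at):
    # execute_async pipelines requests, which is how a client keeps a
    # small cluster fed with thousands of writes per second.
    return session.execute_async(insert, (user_id, tx_type, created_at))
```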

+1

For what it's worth, I have run a comparable write-heavy workload (data collected by a web-crawling system) on quite conventional hardware: a 2- to 4-CPU Windows 2003 server running SQL Server 2005 with an IIS web server in front of it, and the database files on a RAID 5 SAN. Before committing to a distributed store, it is worth benchmarking a traditional RDBMS on decent SAN storage against your actual insert rate and summary queries.

+2

Greenplum or Teradata would be a good option. These are MPP databases and can handle data at petabyte scale. Greenplum is a distributed PostgreSQL database and also has its own MapReduce implementation. While Hadoop might solve your storage problem, it wouldn't be helpful for running summary queries over your data.
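Since Greenplum is PostgreSQL-compatible, the summary query itself can stay plain SQL. A minimal sketch via psycopg2 follows; the connection string and schema are assumptions made for illustration.

```python
# Hypothetical monthly summary against a PostgreSQL-compatible MPP database
# such as Greenplum; the DSN, table and columns are assumed, not from the thread.
import psycopg2

conn = psycopg2.connect("dbname=billing host=gp-master user=report")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT user_id, COUNT(*) AS type_a_count
        FROM transactions
        WHERE tx_type = 'A'
          AND created_at >= %s AND created_at < %s
        GROUP BY user_id
    """, ("2010-05-01", "2010-06-01"))
    for user_id, type_a_count in cur.fetchall():
        print(user_id, type_a_count)
```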

+1

Source: https://habr.com/ru/post/1743574/

