PostgreSQL and S3QL for storing / accessing large amounts of data

We currently use Postgres 9 on Amazon EC2 and are very pleased with the performance. Now we are looking at adding ~2 TB of data to Postgres, which is more than a small EC2 instance can hold.

I found S3QL and am considering mounting an S3-backed filesystem with it and pointing the Postgres data directory at that mount. Has anyone had experience with this? I'm mostly interested in performance (frequent reads, less frequent writes). Any advice is appreciated, thanks.
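For what it's worth, before committing the data directory to such a mount it may help to measure read latency on the S3QL filesystem directly. Below is a minimal Python sketch (the mount point /mnt/s3ql, the local data-directory path, and the sample file name are assumptions, not from the question) that times random 8 kB block reads, roughly the access pattern of Postgres page fetches:

```python
# Minimal sketch: compare random 8 kB read latency on an S3QL mount vs. local disk.
# Paths and file names below are placeholders, not from the original question.
import os
import random
import time

def sample_read_latency(path, block_size=8192, samples=200):
    """Time random block reads from one file and return mean seconds per read."""
    size = os.path.getsize(path)
    total = 0.0
    with open(path, "rb") as f:
        for _ in range(samples):
            f.seek(random.randrange(0, max(size - block_size, 1)))
            start = time.perf_counter()
            f.read(block_size)
            total += time.perf_counter() - start
    return total / samples

print("local disk:", sample_read_latency("/var/lib/postgresql/9.0/main/base/sample_file"))
print("s3ql mount:", sample_read_latency("/mnt/s3ql/pgdata/base/sample_file"))
```

Bear in mind that the OS page cache and S3QL's own local cache will flatter repeated reads, so use files larger than RAM or drop caches between runs for a fairer comparison.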

+4
1 answer

My advice is: "Don't do this." I don't know anything about the context of your problem, but I suspect the solution does not have to involve pushing bulk data through PostgreSQL. Entire distributed (grid) processing systems were invented to solve the problem of analyzing large data sets. I think you should consider building a system that follows standard BI practices for extracting dimensional data. Then take that normalized data and, assuming it is still quite large, load it into Hadoop/Pig. Do your analysis and aggregation there. Dump the aggregated totals to a file and load them into your PG database alongside the dimensions.
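To make that last step concrete, here is a minimal sketch of loading the aggregated totals back into Postgres. It assumes psycopg2 as the driver, a CSV file produced by the Hadoop/Pig job, and a hypothetical daily_totals summary table; none of these names come from the answer itself.

```python
# Minimal sketch: bulk-load pre-aggregated totals into Postgres with COPY.
# Connection details, table layout, and file name are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=postgres")
with conn, conn.cursor() as cur:
    # Hypothetical summary table keyed by dimensions already stored in Postgres.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS daily_totals (
            day        date,
            product_id integer,
            total      numeric
        )
    """)
    # Load the aggregates emitted by the Hadoop/Pig job.
    with open("daily_totals.csv") as f:
        cur.copy_expert("COPY daily_totals FROM STDIN WITH (FORMAT csv)", f)
conn.close()
```

COPY keeps the load fast even for millions of aggregated rows, and the summary table can then be joined against the dimension tables that stay in Postgres.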

+1

Source: https://habr.com/ru/post/1386379/

