I built my Django models, and after inserting a test/dummy record into my PostgreSQL database I realized that each record is quite large: the data across all fields adds up to about 700 KB per record. I expect around five million records, so the table will grow to roughly 3350 GB (about 3.3 TB). Most of the data consists of large JSON dumps (roughly 70+ KB per field).
I'm not sure whether PostgreSQL automatically compresses this data when it is saved through Django. I'm wondering if I should compress the data myself before inserting it into the database.
Questions: Does PostgreSQL automatically compress my string fields with some compression algorithm when I use a field type like Django's TextField?
Or should I not rely on PostgreSQL and instead compress my data in advance before inserting it? If so, which compression library should I use? I already tried zlib in Python and it works well, but I've read that there is also a gzip library, and I'm confused about which would be more efficient (in terms of compression and decompression speed, as well as compression ratio).
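For what it's worth, this is roughly the comparison I ran between the two; the payload below is just a made-up stand-in for one of my real JSON dumps:

    import gzip
    import json
    import time
    import zlib

    # Made-up payload standing in for one of my real ~70 KB JSON dumps.
    payload = json.dumps({"key_%d" % i: "value" * 50 for i in range(500)}).encode("utf-8")

    for name, compress, decompress in [
        ("zlib", lambda d: zlib.compress(d, 6), zlib.decompress),
        ("gzip", lambda d: gzip.compress(d, 6), gzip.decompress),
    ]:
        start = time.perf_counter()
        blob = compress(payload)
        mid = time.perf_counter()
        decompress(blob)
        end = time.perf_counter()
        print("%s: %d -> %d bytes, compress %.4f s, decompress %.4f s"
              % (name, len(payload), len(blob), mid - start, end - mid))

(As I understand it, gzip output is just zlib/DEFLATE output plus a small header and checksum, which is why the numbers come out nearly identical for the two.)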
EDIT: I read this Django snippet for a CompressedTextField, which is what caused my confusion about which compression library to use. I've seen several people use zlib and others use gzip.
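From reading those snippets, my understanding of what such a field does is roughly the following (a minimal sketch of my own using zlib, not the exact code from the fragment; the base64 step is only there so the compressed bytes survive a text column):

    import base64
    import zlib

    from django.db import models


    class CompressedTextField(models.TextField):
        # Compresses on the way into the database, decompresses on the
        # way out. Base64-encoding keeps the compressed bytes safe in a
        # text column; a real implementation might use a bytea/BinaryField.

        def get_prep_value(self, value):
            value = super().get_prep_value(value)
            if value is None:
                return value
            compressed = zlib.compress(value.encode("utf-8"))
            return base64.b64encode(compressed).decode("ascii")

        def from_db_value(self, value, expression, connection):
            if value is None:
                return value
            return zlib.decompress(base64.b64decode(value)).decode("utf-8")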
EDIT 2: This answer (https://stackoverflow.com/a/4148441/216184) says that PostgreSQL automatically compresses string data.
EDIT 3: PostgreSQL uses pg_lzcompress.c for compression, which implements an algorithm from the LZ family. Is it safe to assume that I don't need any additional compression (zlib or gzip) on the TextField itself, since it will be stored as the text (variable-length string) type in the database?
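One way I figure I can verify this is to compare the raw length of a stored value against its on-disk size, since pg_column_size() reports the size after any automatic TOAST compression (the connection parameters, table, and column names below are placeholders for my actual schema):

    import psycopg2

    # Placeholders for my actual database, table, and JSON text column.
    conn = psycopg2.connect(dbname="mydb", user="myuser")
    with conn, conn.cursor() as cur:
        # octet_length() is the uncompressed string size in bytes;
        # pg_column_size() is what the value actually occupies on disk,
        # after any automatic TOAST compression.
        cur.execute(
            "SELECT octet_length(json_blob), pg_column_size(json_blob) "
            "FROM myapp_record LIMIT 5"
        )
        for raw_size, stored_size in cur.fetchall():
            print("raw: %d bytes, stored: %d bytes" % (raw_size, stored_size))

If the stored size comes out much smaller than the raw size, the automatic compression is clearly kicking in.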