Are null columns taking up extra space in PostgreSQL?

I have a table with 7 columns and 5 of them will be NULL. I will have NULL columns of types int, text, date, boolean and money. This table will contain millions of rows with many NULLs. I am afraid that the NULL values will occupy space.

Also, is it true that Postgres indexes NULL values? I would like to prevent it from indexing NULLs.

+18
null indexing postgresql database-design
Aug 27 '12 at 16:23
3 answers

Basically, NULL values occupy 1 bit in the NULL bitmap. But it's not that simple.

The null bitmap (per row) is only there if at least one column in that row holds a NULL value. This can lead to a paradoxical effect in tables with 9 or more columns: assigning the first NULL value to a column can take up more space on disk than writing a value to it. Conversely, when the last NULL column in a row becomes non-NULL, the null bitmap is dropped for that row.
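A quick way to observe this effect, assuming a typical 64-bit build (MAXALIGN = 8), is to measure anonymous row values with `pg_column_size()`, which reports the size of a composite value including its tuple header. The rows are illustrative only:

```sql
-- Sketch assuming MAXALIGN = 8 (typical 64-bit build).
-- 8 int4 columns: the 1-byte null bitmap fits into the padding after the
-- 23-byte header, so the NULL simply saves its 4 bytes of data.
-- 9 int4 columns: the bitmap needs 2 bytes, so the header grows from
-- 24 to 32 bytes, which costs more than the 4 bytes the NULL saves.
SELECT pg_column_size(ROW(1, 2, 3, 4, 5, 6, 7, 8))             AS eight_no_null,
       pg_column_size(ROW(1, 2, 3, 4, 5, 6, 7, NULL::int))     AS eight_one_null,
       pg_column_size(ROW(1, 2, 3, 4, 5, 6, 7, 8, 9))          AS nine_no_null,
       pg_column_size(ROW(1, 2, 3, 4, 5, 6, 7, 8, NULL::int))  AS nine_one_null;
```

On such a build this should report 56, 52, 60 and 64 bytes: the NULL saves space in the 8-column row but costs space in the 9-column row.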

Physically, the initial null bitmap occupies 1 byte between the HeapTupleHeader (23 bytes) and the actual column data or the row OID (if you should still be using that), which always starts at a multiple of MAXALIGN (typically 8 bytes). This leaves 1 byte of padding that is utilized by the initial null bitmap.

In effect, NULL storage is absolutely free for tables of 8 columns or less.
After that, another MAXALIGN bytes (typically 8) are allocated for the next MAXALIGN * 8 columns (typically 64). Etc.
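The rule above can be written out as plain arithmetic. This sketch assumes MAXALIGN = 8 and no OIDs; it computes the header size (the 23-byte header, plus a null bitmap of ceil(ncols / 8) bytes when the row has any NULL, padded up to a multiple of 8) for a few column counts around the thresholds:

```sql
-- Sketch, assuming MAXALIGN = 8 and no OIDs.
-- Integer division: (x + 7) / 8 * 8 rounds x up to a multiple of 8,
-- and (ncols + 7) / 8 is the null bitmap length in bytes.
SELECT ncols,
       (23 + 7) / 8 * 8                    AS header_without_nulls,
       (23 + (ncols + 7) / 8 + 7) / 8 * 8  AS header_with_nulls
FROM   unnest(ARRAY[1, 8, 9, 72, 73, 136, 137]) AS ncols;
```

Under these assumptions the header stays at 24 bytes up to 8 columns, becomes 32 bytes from 9 to 72 columns, 40 bytes from 73 to 136, and 48 bytes from 137 on.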

More details in the manual and under these related questions:

  • How much disk-space is needed to store a NULL value using postgresql DB?
  • Does not using NULL in PostgreSQL still use a NULL bitmap in the header?
  • How many records can I store in 5 MB of PostgreSQL on Heroku?

Once you understand alignment padding of data types, you can optimize storage further:

  • Calculating and saving space in PostgreSQL

But cases where you can save a substantial amount of space are rare. Normally it's not worth the effort.

@Daniel already covers effects on index size.

+34
Aug 27 '12 at 18:04

Whether NULL values get into the index or not depends at least on the type of index. Basically, the answer is YES for btree and gist index types, NO for hash, and apparently YES or NO for gin index types depending on the PostgreSQL version.

There used to be a boolean column amindexnulls in the pg_catalog.pg_am table that carried this information, but it is gone in 9.1, probably because indexes have become more sophisticated as PG has improved.

In the specific case of your data, the best way to know would be to measure the size difference of the indexes with the pg_relation_size('index_name') function, between contents that are entirely NULL and entirely NOT NULL, with your exact PG version, exact data type, and exact index type and definition. And be aware that a future change in any of these parameters may change the outcome.
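A minimal sketch of that measurement, assuming a single int column and a plain B-tree index; the table and index names are hypothetical and the row count is arbitrary, so substitute your real data type, index definition and PostgreSQL version:

```sql
-- Hypothetical test tables: same column type, one all NULL, one all NOT NULL.
CREATE TABLE t_nulls    (col int);
CREATE TABLE t_notnulls (col int);

INSERT INTO t_nulls    SELECT NULL::int FROM generate_series(1, 1000000);
INSERT INTO t_notnulls SELECT g         FROM generate_series(1, 1000000) AS g;

-- Identical plain B-tree index definitions on both tables.
CREATE INDEX t_nulls_col_idx    ON t_nulls (col);
CREATE INDEX t_notnulls_col_idx ON t_notnulls (col);

-- Compare the on-disk sizes of the two indexes.
SELECT pg_size_pretty(pg_relation_size('t_nulls_col_idx'))    AS all_null,
       pg_size_pretty(pg_relation_size('t_notnulls_col_idx')) AS all_not_null;
```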

But in any case, if you "just" want to avoid indexing NULLs, you can always create a partial index:

 CREATE INDEX partial_idx ON tablename (col) WHERE col IS NOT NULL;

This will take up less space, but whether it helps the performance of your queries depends on those queries.
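A sketch of how such a partial index interacts with queries, on a hypothetical table `items`: the planner may use it only when the query's WHERE clause implies the index predicate `col IS NOT NULL`, and whether it actually picks the index also depends on table statistics.

```sql
-- Hypothetical table with a mostly-NULL column.
CREATE TABLE items (id int PRIMARY KEY, col int);

-- Partial index that leaves NULLs out, as suggested above.
CREATE INDEX items_col_notnull_idx ON items (col) WHERE col IS NOT NULL;

-- Eligible: col = 42 implies col IS NOT NULL, so the predicate is proven.
EXPLAIN SELECT * FROM items WHERE col = 42;

-- Not eligible: the rows sought are exactly the ones left out of the index.
EXPLAIN SELECT * FROM items WHERE col IS NULL;
```

A query for `col IS NULL` has to be answered some other way (for example a sequential scan), so this trade-off only pays off if rows with NULL in that column are rarely looked up through it.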

+11
Aug 27 '12 at 17:16

I believe each one would use a bit in the row's null bitmap. See here: http://www.postgresql.org/docs/9.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE

+2
Aug 27 '12 at 16:31


