PostgreSQL: running out of identifiers in integer columns

Problem

We are building a web application in Java on top of PostgreSQL. It is quite large and successful, and it should keep running for a few more years.

Unfortunately, we (well, I) made a serious mistake early in the design process: all database identifiers are integers issued from one common sequence.

Java's maximum int is 2^31 − 1, so about 2 billion. The same goes for the PostgreSQL integer type. The system currently consumes ~10 thousand IDs every day, and the rate grows as we gain new users.

One day the identifiers will simply run out.

Question

We are looking for ways to remedy the situation. Let me get the obvious option out of the way first: switching to Java long and PostgreSQL bigint is the clean solution, but it is a ton of work. We want to postpone it as long as possible.

Some ideas we have had so far:

  • Do not use one sequence for everything; give each table its own sequence (see the SQL sketch after this list).
    • Pros: this buys us up to N times more time, where N is the number of tables.
    • Cons: we like that every row has a globally unique identifier.
  • Stop using sequence-generated identifiers for some tables. For example, a table of customer events does not really need one: (customer, timestamp) is a perfectly valid primary key.
    • Pros: some of our biggest ID hogs can be converted this way.
    • Cons: a non-trivial amount of work.
  • Stop wasting identifiers on empty records. This happens with some sub-tables, such as customer contact information. Always having a record makes the code simpler, but it means many customers end up with an empty contact-information record.
    • Pros: some of our biggest ID hogs can be fixed this way.
    • Cons: we lose the simplicity of the code.
  • Have each new table use long / bigint with its own new sequence.
    • Pros: at least we don't make it worse.
    • Cons: the interfaces with the rest of the code will be ugly.
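
A minimal SQL sketch of ideas 1, 2, and 4, with hypothetical table and column names:

     -- Idea 1: a dedicated sequence per table instead of one shared sequence.
     CREATE SEQUENCE customer_id_seq;
     CREATE TABLE customer (
         id integer PRIMARY KEY DEFAULT nextval('customer_id_seq')
     );

     -- Idea 2: a natural composite key where no surrogate ID is needed.
     CREATE TABLE customer_event (
         customer_id integer     NOT NULL REFERENCES customer (id),
         occurred_at timestamptz NOT NULL,
         payload     text,
         PRIMARY KEY (customer_id, occurred_at)
     );

     -- Idea 4: new tables get bigint IDs from their own sequence right away.
     CREATE TABLE new_feature (
         id bigserial PRIMARY KEY
     );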

Under these restrictions, what other approaches will delay identifier depletion?

+5
2 answers

Switching to long is far from a clean solution. If you are getting that big, there is only one reasonable option: UUIDs (yes, PostgreSQL comes with a uuid data type).

Yes, 128 bits is four times the size of an integer, but you don't want to go through the entire application again in a few years and do all of this once more, right? UUIDs will keep working when you grow so big that you need to shard your data. At that point you cannot have a common sequence anyway, so UUIDs make sense.

As a bonus, you even keep the property that every row has a unique identifier.
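
A minimal sketch of a uuid-keyed table (the table name is hypothetical; gen_random_uuid() is built in since PostgreSQL 13, and the pgcrypto extension provides it on older versions):

     CREATE TABLE reservation (
         id uuid PRIMARY KEY DEFAULT gen_random_uuid()
     );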


Migration is not that complicated: adding a nullable column in PostgreSQL is cheap, so you can add the column first and then perform an online migration in batches, updating a few thousand records at a time, so there is no downtime.
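
A sketch of such a batched online backfill, assuming a hypothetical customer table with an integer id primary key:

     -- Cheap: the new column starts out all NULLs, no table rewrite.
     ALTER TABLE customer ADD COLUMN uuid_id uuid;

     -- Repeat until it reports UPDATE 0; each batch is a short transaction.
     UPDATE customer
     SET    uuid_id = gen_random_uuid()
     WHERE  id IN (SELECT id FROM customer
                   WHERE  uuid_id IS NULL
                   LIMIT  5000);

     -- Once every row is backfilled:
     ALTER TABLE customer ALTER COLUMN uuid_id SET NOT NULL;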

Then you can test the same code against both kinds of foreign keys. Does Java have something similar to laboratory or scientist?

Will it be a ton of work? Yes, but it is obviously a good sign that you have an application this popular.

I also hope you have learned the lesson about using the same sequence for all tables. Honestly, I don't see the added value. If you want to be able to tell which table an object belongs to, you can just name the primary keys differently (for example, room_id, reservation_id, and so on).

+2

While asking this question, I found a good way to fix half of the problem: the database side. So, for posterity, here is how to do it.

  • Find all database columns of type integer or integer[]. Check the results manually and remove from the list any array columns of other types, e.g. text[].

     -- Parentheses matter: AND binds tighter than OR.
     SELECT *
     FROM   information_schema.columns cls
     JOIN   information_schema.tables  tbl
            ON  cls.table_schema = tbl.table_schema
            AND cls.table_name   = tbl.table_name
     WHERE  cls.table_schema = '<my schema>'
     AND    (cls.data_type = 'integer' OR cls.data_type = 'ARRAY')
     AND    tbl.table_type = 'BASE TABLE';
  • Prepare a type-changing DDL statement for each of these columns (this can be generated from the catalog; see the sketch after this list):

     ALTER TABLE <one of the tables found> ALTER COLUMN <one of its integer columns> TYPE bigint;
  • This works great except for VIEWs: PostgreSQL refuses to change the return type of a column a view depends on. I had to recreate all of them; the sequence is:

    • Drop all views.
    • Change column types.
    • Recreate all views.
  • Stop the application, run the update script from step 3, then fix slow queries by running VACUUM and ANALYZE on all tables.
  • Run the tests and fix problems in the source code; for example, a bigint[] can no longer be appended to an integer[].
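
A sketch of how steps 2 and 3 might be scripted from the system catalogs rather than written by hand; review the generated statements before running them:

     -- Generate the ALTER statements for all plain integer columns:
     SELECT format('ALTER TABLE %I.%I ALTER COLUMN %I TYPE bigint;',
                   table_schema, table_name, column_name)
     FROM   information_schema.columns
     WHERE  table_schema = '<my schema>'
     AND    data_type = 'integer';

     -- Capture every view definition before dropping, so the
     -- "recreate all views" step can be replayed afterwards:
     SELECT format('CREATE VIEW %I.%I AS %s', schemaname, viewname, definition)
     FROM   pg_views
     WHERE  schemaname = '<my schema>';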

Is there a way to export / back up only the VIEWs?

0

Source: https://habr.com/ru/post/1243950/

