Correct way to store different types of data in one column in postgres?

I am currently trying to modify an existing API that interacts with a postgres database. In short, it essentially stores descriptors / metadata to determine where the actual “asset” (usually a file) is stored on the server’s hard drive.

It is currently possible to "tag" these "assets" with any number of undefined key-value pairs (i.e., uploadedBy, addedOn, assetType, etc.). These tags are stored in a separate table with a structure similar to the following:

+---------------+----------------+-------------+
|assetid (text) | tagid(integer) | value(text) |
|---------------+----------------+-------------|
|someStringValue| 1234           | someValue   |
|---------------+----------------+-------------|
|aDiffStringKey | 1235           | a username  |
|---------------+----------------+-------------|
|aDiffStrKey    | 1236           | Nov 5, 1605 |
+---------------+----------------+-------------+

assetid and tagid are foreign keys from other tables. Think of assetid as representing the file, and of the tagid / value pairs as a map of descriptors.

Currently, the API (which is in Java) creates all of these key-value pairs as a Map object. This includes things such as time / date stamps. We would like to be able to store different types of data for the value in a key-value pair, or at least store it differently in the database, so that if we needed to, we could run queries that check date ranges and the like on these tags. However, if the values are stored as text in the db, then we would need to a.) know that a value is actually a date / time / timestamp, and b.) convert it into something that we could actually run such a query against.

So far I have come up with only one idea that does not change the db layout too much.

Expand the tag table (shown above) to have additional columns for the different types (numeric, text, timestamp), allow them to be null, and then, on insert, check the corresponding "key" to find out what type of data it really is. However, I see many problems with such an implementation.

Can any PostgreSQL ninja offer a suggestion on how to approach this problem? I just recently returned to the deep end of interacting with databases, so I admit I'm a little rusty.

+9
4 answers

You basically have two options:

Option 1: Sparse Table

Have one column for each data type, but use only the column that matches the type of data you want to store. Of course, this leaves most of the columns null - a waste of space, but purists love it because of the strong typing. It is a little clunky to have to check each column for null to figure out which data type applies. Also, too bad if you actually want to store a null - then you have to choose a specific value that "means null" - more clunkiness.
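A rough sketch of the sparse layout follows. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for PostgreSQL, so that it runs without a server. The table name asset_tags and the value_* column names are made up for illustration, and in PostgreSQL the typed columns would presumably be declared numeric and timestamp rather than REAL and TEXT.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE asset_tags (
        assetid    TEXT    NOT NULL,
        tagid      INTEGER NOT NULL,
        value_text TEXT,
        value_num  REAL,
        value_ts   TEXT,  -- ISO-8601 string; a timestamp column in PostgreSQL
        -- At most one typed column may be filled per row. SQLite evaluates
        -- IS NOT NULL to 0/1; PostgreSQL would need e.g. num_nonnulls(...).
        CHECK ((value_text IS NOT NULL) + (value_num IS NOT NULL)
               + (value_ts IS NOT NULL) <= 1)
    )
""")

rows = [
    ("someStringValue", 1234, "someValue", None, None),
    ("aDiffStringKey", 1235, "a username", None, None),
    ("aDiffStrKey", 1236, None, None, "1605-11-05T00:00:00"),
]
conn.executemany("INSERT INTO asset_tags VALUES (?, ?, ?, ?, ?)", rows)

# Range query on the typed column; rows with a NULL value_ts never match.
in_range = conn.execute(
    "SELECT assetid FROM asset_tags WHERE value_ts BETWEEN ? AND ?",
    ("1600-01-01T00:00:00", "1700-01-01T00:00:00"),
).fetchall()
```

The point of the typed column is that the date-range check runs in the database without any casting; the cost is the mostly empty columns described above.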

Option 2: two columns - one for content, one for type

Everything can be expressed as text, so have a text column for the value and another column (int or text) for the type, so that your application code can restore the correct value as an object of the correct type. The upside is that you can easily extend the types to something beyond the SQL data types, and you don't have a lot of nulls.
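As a minimal sketch of the application side of this option, the code below restores typed objects from (type, text) pairs. It is written in Python for brevity; the type names ("text", "numeric", "timestamp", "bool") and the helper decode are assumptions, since the answer only says the type column is an int or text.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical type names mapped to functions that parse the stored text.
DECODERS = {
    "text": str,
    "numeric": Decimal,
    "timestamp": datetime.fromisoformat,  # expects ISO-8601 text
    "bool": lambda s: s == "true",
}


def decode(value_type, raw):
    """Turn the stored text back into a typed object; None stays None."""
    if raw is None:
        return None
    return DECODERS[value_type](raw)


# (type, value) pairs as they might come back from the tag table.
rows = [
    ("timestamp", "1605-11-05T00:00:00"),
    ("numeric", "12.345"),
    ("text", "a username"),
]
decoded = [decode(value_type, raw) for value_type, raw in rows]
```

If the range check has to run inside PostgreSQL rather than in the application, the text would need casting in the query (for example `value::timestamp`), restricted to rows whose type column says the value is a timestamp.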

+16

I'm by no means a PostgreSQL expert, but I think that instead of two columns (one for the name and one for the type), you could look at the hstore data type:

data type for storing sets of key/value pairs within a single PostgreSQL value. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. Keys and values are simply text strings.

Of course, you should check how to convert dates and timestamps to and from this type, and see if this is good for you.

+3

You can use two different techniques:

  • If the type of a tag can vary

Store, for each tagid-assetid combination, the name of the table and the id of the row that hold the actual data:

maintable:
+---------------+----------------+-----------------+---------------+
|assetid (text) | tagid(integer) | tablename(text) | table_id(int) |
|---------------+----------------+-----------------+---------------|
|someStringValue| 1234           | tablebool       | 123           |
|---------------+----------------+-----------------+---------------|
|aDiffStringKey | 1235           | tablefloat      | 123           |
|---------------+----------------+-----------------+---------------|
|aDiffStrKey    | 1236           | tablestring     | 123           |
+---------------+----------------+-----------------+---------------+

tablebool
+-------------+-------------+
| id(integer) | value(bool) |
|-------------+-------------|
| 123         | False       |
+-------------+-------------+

tablefloat
+-------------+--------------+
| id(integer) | value(float) |
|-------------+--------------|
| 123         | 12.345       |
+-------------+--------------+

tablestring
+-------------+---------------+
| id(integer) | value(string) |
|-------------+---------------|
| 123         | 'text'        |
+-------------+---------------+

  • If each tag has a fixed type

Create a tag descriptor table:

tag descriptors
+---------------+----------------+-----------------+
|assetid (text) | tagid(integer) | tablename(text) |
|---------------+----------------+-----------------|
|someStringValue| 1234           | tablebool       |
|---------------+----------------+-----------------|
|aDiffStringKey | 1235           | tablefloat      |
|---------------+----------------+-----------------|
|aDiffStrKey    | 1236           | tablestring     |
+---------------+----------------+-----------------+

and the corresponding data tables:

tablebool
+-------------+----------------+-------------+
| id(integer) | tagid(integer) | value(bool) |
|-------------+----------------+-------------|
| 123         | 1234           | False       |
+-------------+----------------+-------------+

tablefloat
+-------------+----------------+--------------+
| id(integer) | tagid(integer) | value(float) |
|-------------+----------------+--------------|
| 123         | 1235           | 12.345       |
+-------------+----------------+--------------+

tablestring
+-------------+----------------+---------------+
| id(integer) | tagid(integer) | value(string) |
|-------------+----------------+---------------|
| 123         | 1236           | 'text'        |
+-------------+----------------+---------------+

All of this is only meant to convey the general idea. You will have to adapt it to your needs.
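For the fixed-type variant, the application has to pick the right data table from the tag descriptor before it writes a value. A rough sketch of that routing step in Python, where the two dictionaries stand in for the descriptor table and target_table is a made-up helper:

```python
# Stand-in for the tag descriptor table: tagid -> data table name.
TAG_TABLES = {1234: "tablebool", 1235: "tablefloat", 1236: "tablestring"}

# Python type each data table is assumed to accept.
TABLE_TYPES = {"tablebool": bool, "tablefloat": float, "tablestring": str}


def target_table(tagid, value):
    """Return the data table for this tag, rejecting values of the wrong type."""
    table = TAG_TABLES[tagid]
    expected = TABLE_TYPES[table]
    if not isinstance(value, expected):
        raise TypeError(
            f"tag {tagid} expects {expected.__name__}, got {type(value).__name__}"
        )
    return table


table = target_table(1235, 12.345)
```

Checking the type before the insert keeps a value from landing in a table whose value column cannot hold it.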

0

Another option, depending on what you are doing, might be to keep a single value column, but wrap the value in a bit of JSON ...

It might look something like this:

  { "type": "datetime", "value": "2019-05-31 13:51:36" } 

You could even take this a step further and use a JSON or XML column.
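A minimal sketch of reading such a wrapped value back in application code, assuming the envelope always carries the "type" and "value" keys shown above. The helper decode_json_value is hypothetical, and only the "datetime" type is handled specially:

```python
import json
from datetime import datetime

# The stored text, in the same shape as the example above.
stored = '{ "type": "datetime", "value": "2019-05-31 13:51:36" }'


def decode_json_value(text):
    """Parse the JSON envelope and convert the value according to its type."""
    envelope = json.loads(text)
    if envelope["type"] == "datetime":
        return datetime.strptime(envelope["value"], "%Y-%m-%d %H:%M:%S")
    # Any other type is returned as JSON decoded it (str, number, bool, ...).
    return envelope["value"]


decoded = decode_json_value(stored)
```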

0

Source: https://habr.com/ru/post/951723/

