How is foreign key data stored in SQL?

I know this is ultra-basic, but it is an assumption I have always relied on, and I would like to confirm that it is true (in general, with details specific to particular implementations).

Let's say I have a table with a text column called Fruits. Only four distinct values appear in this column: pear, apple, banana and strawberry. I have a million rows.

Instead of repeating this data (on average) a quarter of a million times each, if I extract it to another table that has the Fruit column and only these four rows, and then make the original column a foreign key, does it save space?

I assume that the four fruit names are then stored only once, and that the million rows now hold pointers, indexes or some other kind of link into the second table.

If my string values are longer than the short fruit names, I assume the savings and optimizations are even greater.

+6
7 answers

The field data types on both sides of the foreign key relationship must be identical.

If the key field of the parent table is (say) varchar(20), then the foreign key fields in the dependent table must also be varchar(20). That means yes, you would have X million rows of "Apple" and "Pear" and "Banana" repeated in every table that has a foreign key pointing back to the fruit table.
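A minimal sketch of that situation, using Python's built-in sqlite3 module only because it runs without a server. The `orders` table and its columns are hypothetical names introduced for illustration, and other database engines may lay out rows differently. The point it shows is that a foreign key on a varchar parent key holds the string value itself in each dependent row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
    -- Parent table keyed by the fruit name itself (natural key).
    CREATE TABLE fruits (name VARCHAR(20) PRIMARY KEY);
    -- Hypothetical dependent table: the FK column has the same type as the parent key.
    CREATE TABLE orders (
        id    INTEGER PRIMARY KEY,
        fruit VARCHAR(20) NOT NULL REFERENCES fruits(name)
    );
""")
conn.executemany(
    "INSERT INTO fruits (name) VALUES (?)",
    [("pear",), ("apple",), ("banana",), ("strawberry",)],
)
conn.executemany(
    "INSERT INTO orders (fruit) VALUES (?)",
    [("apple",), ("apple",), ("banana",)],
)

# Each dependent row holds the full string value of the key.
stored = [row[0] for row in conn.execute("SELECT fruit FROM orders ORDER BY id")]
total_chars = conn.execute("SELECT SUM(LENGTH(fruit)) FROM orders").fetchone()[0]
print(stored, total_chars)
```

The foreign key here only constrains which values are allowed; it does not replace the stored string with a reference.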

As a rule, it is more efficient to use numeric fields as keys (int, bigint), since they can be compared with very few CPU instructions (a direct integer comparison can generally be done with a single CPU instruction). Strings, on the other hand, require loops and comparatively expensive setup. So yes, you would be better off storing the fruit names in a table somewhere and using their associated numeric ID fields as the foreign key.

Of course, you should benchmark both setups. These are just general rules of thumb, and your specific requirements and setup may work faster with the string version.

+4

That's right.

You should have a table fruits:

id  name
1   Pear
2   Apple
3   Banana
4   Strawberry

Here id is the primary key. In your second table you store only the id from this table. This saves physical space and makes your SELECT statements faster.
In addition, this structure will make it easier for you to add new fruits.
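The layout described in this answer could be sketched as follows, using Python's sqlite3 module as a stand-in for whatever database you use. The `orders` table and the `fruit_id` column are hypothetical names, not something from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
    CREATE TABLE fruits (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Hypothetical second table: it stores only the small integer id.
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        fruit_id INTEGER NOT NULL REFERENCES fruits(id)
    );
    INSERT INTO fruits (id, name) VALUES
        (1, 'Pear'), (2, 'Apple'), (3, 'Banana'), (4, 'Strawberry');
    INSERT INTO orders (fruit_id) VALUES (2), (2), (3);
""")

# Adding a new fruit is a single insert into the lookup table.
conn.execute("INSERT INTO fruits (name) VALUES ('Cherry')")
fruit_count = conn.execute("SELECT COUNT(*) FROM fruits").fetchone()[0]

# The name is recovered with a join when it is needed.
names = [
    row[0]
    for row in conn.execute(
        "SELECT f.name FROM orders o JOIN fruits f ON f.id = o.fruit_id ORDER BY o.id"
    )
]
print(fruit_count, names)
```

The trade-off is the join: reading a fruit name now involves both tables, which is usually cheap for a lookup table this small.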

+5

Instead of repeating this data (on average) a quarter of a million times each, if I extract it to another table that has the Fruit column and just these four rows, and then make the original column a foreign key, does that save space?

No, not if Fruit is the PRIMARY KEY of the lookup table, because then the same value must also be stored as the FOREIGN KEY in the big table.

However, if you make a small surrogate PRIMARY KEY (for example, an integer id) in the lookup table, and then use this as a FOREIGN KEY in the big table, you will save space.
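One rough way to check this for yourself is to build both variants and compare the file sizes. The sketch below does that with SQLite via Python's sqlite3 module; the table name `big`, the helper `build` and the row count are assumptions made for illustration, and the exact numbers are specific to SQLite's storage format, so treat the result as an indication rather than a measurement of your own database:

```python
import os
import sqlite3
import tempfile

FRUITS = ["pear", "apple", "banana", "strawberry"]
ROWS = 100_000  # scaled down from the question's million rows to keep the run short


def build(path, use_surrogate):
    """Create one database variant at `path` and return its file size in bytes."""
    conn = sqlite3.connect(path)
    if use_surrogate:
        conn.executescript("""
            CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
            CREATE TABLE big (
                id       INTEGER PRIMARY KEY,
                fruit_id INTEGER NOT NULL REFERENCES fruits(id)
            );
        """)
        conn.executemany(
            "INSERT INTO fruits (id, name) VALUES (?, ?)",
            list(enumerate(FRUITS, start=1)),
        )
        conn.executemany(
            "INSERT INTO big (fruit_id) VALUES (?)",
            ((i % 4 + 1,) for i in range(ROWS)),
        )
    else:
        conn.executescript("""
            CREATE TABLE fruits (name TEXT PRIMARY KEY);
            CREATE TABLE big (
                id    INTEGER PRIMARY KEY,
                fruit TEXT NOT NULL REFERENCES fruits(name)
            );
        """)
        conn.executemany("INSERT INTO fruits (name) VALUES (?)", [(f,) for f in FRUITS])
        conn.executemany(
            "INSERT INTO big (fruit) VALUES (?)",
            ((FRUITS[i % 4],) for i in range(ROWS)),
        )
    conn.commit()
    conn.close()
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    natural_size = build(os.path.join(tmp, "natural.db"), use_surrogate=False)
    surrogate_size = build(os.path.join(tmp, "surrogate.db"), use_surrogate=True)

print("string foreign key: ", natural_size, "bytes")
print("integer foreign key:", surrogate_size, "bytes")
```

With longer strings than these fruit names, the gap between the two files would be expected to grow, since the integer column stays the same size.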

+2

Normalization is not just about space. It is often about removing redundancy and modelling the behaviour of the data, about updating just one row to make a change, and about reducing the scope of locking by updating only the minimal amount of data.
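The "one row per change" point can be illustrated with a small sketch, again using Python's sqlite3 module. The `orders` table and the rename to 'green apple' are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Hypothetical referencing table with many rows per fruit.
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        fruit_id INTEGER NOT NULL REFERENCES fruits(id)
    );
    INSERT INTO fruits (id, name) VALUES (1, 'pear'), (2, 'apple');
""")
conn.executemany("INSERT INTO orders (fruit_id) VALUES (?)", [(2,)] * 1000)

# Renaming the fruit touches exactly one row in the lookup table...
cur = conn.execute("UPDATE fruits SET name = 'green apple' WHERE id = 2")
rows_touched = cur.rowcount

# ...yet every referencing row sees the new name through the join.
renamed = conn.execute(
    "SELECT COUNT(*) FROM orders o JOIN fruits f ON f.id = o.fruit_id "
    "WHERE f.name = 'green apple'"
).fetchone()[0]
print(rows_touched, renamed)
```

Had the name been stored in every row of `orders`, the same rename would have had to update all 1000 rows.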

+2

First, yes, this will save space, because an INT is 4 bytes and a TINYINT is 1 byte. Second, searching on this field will be faster with an INT type than with VARCHAR. In addition, you can use an ENUM (where your database supports it, as MySQL does) if the set of values will not change in the future. With an ENUM you get the same benefit, perhaps even faster than with an extra table, and you avoid an extra join.

+1

Unfortunately, your assumption is wrong: the values are physically stored again in every referencing table. Some SQL products store the value only once, but most do not, notably the more popular ones, which are based on contiguous storage on disk.

It is for this reason that end users feel the need to implement their own pointers in the form of integer "surrogate keys". A system-provided surrogate would be preferable: it would not be visible to users, in the same way that an index's "values" are maintained by the system and cannot be manipulated directly by users. The problem with rolling your own is that they become part of the logical model.

+1

As I understand it, you do not really want to use foreign keys as such. Ah, Marc B has just posted about the implications of an FK. Still, using the second table as an external "name provider" would save space. You will need an extra pointer to fruit.fruit_id. It will be quite small, and it will be numeric, which is faster than an index on char or varchar.

0

Source: https://habr.com/ru/post/895805/

