How much does storing values as numbers rather than text improve database performance?

Suppose I have text such as "Win", "Lose", "Incomplete", "Forfeit", etc. I can save the text directly in the database. If I instead use numbers such as 0 = Win, 1 = Lose, etc., will I get a significant improvement in database performance? In particular, in queries where the field is part of my WHERE clause?

+4
6 answers

At the CPU level, comparing two fixed-size integers takes a single instruction, while comparing variable-length strings usually involves looping over each character. So for a very large data set, using integers should give a noticeable performance gain.
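To make the difference concrete, here is a rough Python sketch that counts the work each kind of comparison does. The function names are made up for illustration, and Python integers are not fixed-width machine integers, so `ints_equal` is only a stand-in for a single compare instruction.

```python
def strings_equal(a, b):
    """Compare two strings character by character.

    Returns (equal, characters_examined).
    """
    if len(a) != len(b):
        return False, 0
    examined = 0
    for ch_a, ch_b in zip(a, b):
        examined += 1
        if ch_a != ch_b:
            return False, examined
    return True, examined


def ints_equal(a, b):
    """Stand-in for one fixed-width integer compare: always one step."""
    return a == b, 1


print(strings_equal("Forfeit", "Forfeit"))  # every character is examined
print(strings_equal("Forfeit", "Forfeir"))  # mismatch found only at the last character
print(ints_equal(3, 3))                     # one step regardless of the value
```

Real database engines use optimized routines (and collations can make string comparison more expensive still), so treat this as an illustration of the shape of the cost, not a benchmark.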

In addition, a fixed-size integer usually takes up less space and can let the database engine use faster algorithms based on random access.

However, several database systems (MySQL and PostgreSQL, for example) have an enum type, which is designed for cases like yours. In a query, you compare the field against a fixed set of literals, while internally it is stored as an integer.
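Where no native enum type is available, the same idea can be emulated in the application. The sketch below uses Python's `IntEnum` with SQLite (which has no enum type); the table name `games`, the column `outcome`, and the `Outcome` class are assumptions made up for the example, using the numbering from the question.

```python
import sqlite3
from enum import IntEnum


class Outcome(IntEnum):
    # Hypothetical mapping, following the question: 0 = Win, 1 = Lose, ...
    WIN = 0
    LOSE = 1
    INCOMPLETE = 2
    FORFEIT = 3


conn = sqlite3.connect(":memory:")
# The column is a plain integer; the CHECK keeps it inside the enum's range.
conn.execute(
    "CREATE TABLE games ("
    " id INTEGER PRIMARY KEY,"
    " outcome INTEGER NOT NULL CHECK (outcome BETWEEN 0 AND 3))"
)
conn.executemany(
    "INSERT INTO games (outcome) VALUES (?)",
    [(int(o),) for o in (Outcome.WIN, Outcome.LOSE, Outcome.WIN, Outcome.FORFEIT)],
)

# The query is written against a readable name, but the comparison is on integers.
wins = conn.execute(
    "SELECT COUNT(*) FROM games WHERE outcome = ?", (int(Outcome.WIN),)
).fetchone()[0]

# Reading back: convert the stored integers to names.
stored = [Outcome(row[0]).name for row in conn.execute("SELECT outcome FROM games ORDER BY id")]
print(wins, stored)
```

With a native enum type the database does this mapping for you, so the application code would not need the `Outcome` class at all.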

+5

If a column is used in an index, there can be a significant performance boost.
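One way to check whether the index is actually used is to ask the engine for its query plan. This is a small sketch with SQLite's `EXPLAIN QUERY PLAN`; the table `games` and the index `idx_games_status` are hypothetical names, and other engines have their own equivalent (usually some form of `EXPLAIN`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO games (status) VALUES (?)", [(i % 4,) for i in range(1000)]
)

QUERY = "SELECT id FROM games WHERE status = 0"


def plan(connection):
    """Return the human-readable plan text for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    # The last column of each row holds the description of that plan step.
    return " | ".join(row[-1] for row in rows)


plan_without_index = plan(conn)
conn.execute("CREATE INDEX idx_games_status ON games (status)")
plan_with_index = plan(conn)

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

The exact wording of the plan differs between SQLite versions, so the thing to look for is whether the index name appears in it.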

+2

The gain can range from negligible to extremely useful, depending on the size of the table, the number of possible values being enumerated, and the database engine and its configuration.

That said, using a number to represent an enumerated type will almost certainly never perform worse.

+2

Don't guess. Measure.

Performance depends on how selective the index is (how many distinct values it contains), whether the critical information is available in the natural key, how long the natural key is, and so on. You really need to test with representative data.
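A measurement does not have to be elaborate. The following is a minimal sketch of a timing harness using SQLite and synthetic rows; the table names, the row count, and the `timed_count` helper are all assumptions for illustration, and the numbers it prints on one machine say nothing about your engine or your data.

```python
import sqlite3
import time

STATUSES = ["Win", "Lose", "Incomplete", "Forfeit"]
ROWS = 100_000  # assumption: scale this toward your real table size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games_text (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE TABLE games_int (id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO games_text (status) VALUES (?)",
    ((STATUSES[i % 4],) for i in range(ROWS)),
)
conn.executemany(
    "INSERT INTO games_int (status) VALUES (?)",
    ((i % 4,) for i in range(ROWS)),
)
conn.commit()


def timed_count(sql, param, repeats=5):
    """Run the query several times; return (row count, best wall-clock time)."""
    best = float("inf")
    count = None
    for _ in range(repeats):
        start = time.perf_counter()
        count = conn.execute(sql, (param,)).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return count, best


text_count, text_secs = timed_count(
    "SELECT COUNT(*) FROM games_text WHERE status = ?", "Forfeit"
)
int_count, int_secs = timed_count(
    "SELECT COUNT(*) FROM games_int WHERE status = ?", 3
)
print(f"text: {text_count} rows in {text_secs:.4f}s")
print(f"int:  {int_count} rows in {int_secs:.4f}s")
```

For a result you can trust, run the same kind of comparison on your actual database engine, with your real indexes and a realistic distribution of values.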

When I was developing a database for my employer's operational data warehouse, I built a test bed with tables designed around natural keys and tables designed around ID numbers. Both schemas held more than 13 million rows of computer-generated sample data. In a few cases, queries against the ID number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with ID numbers took 30 seconds with natural keys.) But 80% of the test queries had better SELECT performance against the natural key schema. And sometimes it was staggeringly faster: a difference of 30 to 1.

The reason, of course, is that most queries against the natural key schema need no joins at all: the information you need is often already carried in the natural key. (I know that sounds odd, but it happens surprisingly often. How often depends on the application.) And zero joins will often be faster than three joins, even when you are joining on integers.
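Here is a toy version of the two designs, to show where the join comes from. The schema and table names are invented for the example (they are not the warehouse schema described above), and SQLite stands in for whatever engine you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- ID-number design: the readable status lives in a lookup table.
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE games_by_id (
        id INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES statuses(id)
    );

    -- Natural-key design: the readable status is stored in the row itself.
    CREATE TABLE games_by_name (id INTEGER PRIMARY KEY, status TEXT NOT NULL);

    INSERT INTO statuses (id, name)
        VALUES (0, 'Win'), (1, 'Lose'), (2, 'Incomplete'), (3, 'Forfeit');
    INSERT INTO games_by_id (status_id) VALUES (0), (3), (1);
    INSERT INTO games_by_name (status) VALUES ('Win'), ('Forfeit'), ('Lose');
    """
)

# ID-number design: one join just to get the status name back.
with_join = conn.execute(
    "SELECT g.id, s.name FROM games_by_id AS g "
    "JOIN statuses AS s ON s.id = g.status_id ORDER BY g.id"
).fetchall()

# Natural-key design: the same answer with no join at all.
no_join = conn.execute(
    "SELECT id, status FROM games_by_name ORDER BY id"
).fetchall()

print(with_join)
print(no_join)
```

With a single lookup table the extra join is cheap, but a report that touches several such columns needs one join per column in the ID-number design, which is where the difference starts to add up.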

+1

Obviously, if your data structures are shorter, they are faster to compare and faster to store and retrieve.

How much faster? 1x, 2x, 1000x? It all depends on the size of the table and so on.

For example: let's say you have a table with a productId and a varchar text column.

Each row will take roughly 4 bytes for the int, and then another 3 to 24 bytes for the text in your example (depending on whether the column is nullable or Unicode).

Compare this to 5 bytes per row for the same data with a byte status column.

This huge space saving means more rows fit on a page, more data fits in the cache, fewer writes happen when you load or store data, and so on.

In addition, comparing strings is at best as fast as comparing bytes, and in the worst case much slower.

There is a second huge problem with storing text where you intended to have an enumeration. What happens when people start storing Incompete instead of Incomplete?
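As back-of-the-envelope arithmetic, the row sizes above translate into rows per page like this. The 8192-byte page size is an assumption (page sizes differ between engines and configurations), and per-row and per-page overhead is ignored, so real numbers will be lower.

```python
PAGE_BYTES = 8192  # assumption: an 8 KB page, with all overhead ignored


def rows_per_page(row_bytes, page_bytes=PAGE_BYTES):
    """How many rows of the given size fit in one page."""
    return page_bytes // row_bytes


# Row sizes from the estimate above: a 4-byte int plus the status column.
text_row_short = 4 + 3   # shortest text status
text_row_long = 4 + 24   # longest text status
byte_row = 4 + 1         # one-byte status

print(rows_per_page(text_row_short))  # 1170
print(rows_per_page(text_row_long))   # 292
print(rows_per_page(byte_row))        # 1638
```

Under these assumptions the one-byte status fits between about 1.4 and 5.6 times as many rows on a page as the text version.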
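A rough sketch of how a numeric status backed by a lookup table closes that hole, using SQLite with foreign keys switched on. The table names and the `try_insert` helper are hypothetical; a CHECK constraint or a native enum type would serve the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE statuses (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute(
    "CREATE TABLE games ("
    " id INTEGER PRIMARY KEY,"
    " status_id INTEGER NOT NULL REFERENCES statuses(id))"
)
conn.execute("CREATE TABLE games_text (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO statuses (id, name) VALUES (?, ?)",
    [(0, "Win"), (1, "Lose"), (2, "Incomplete"), (3, "Forfeit")],
)

# A free-text column happily stores the misspelling.
conn.execute("INSERT INTO games_text (status) VALUES (?)", ("Incompete",))
typos_stored = conn.execute(
    "SELECT COUNT(*) FROM games_text WHERE status = 'Incompete'"
).fetchone()[0]


def try_insert(status_id):
    """Insert a game; return False if the database rejects the status."""
    try:
        conn.execute("INSERT INTO games (status_id) VALUES (?)", (status_id,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(2)   # 2 = Incomplete, a known status
rejected = try_insert(9)   # 9 is not in the lookup table

print(typos_stored, accepted, rejected)
```

The text column ends up with a value that no `WHERE status = 'Incomplete'` query will ever match, while the constrained column refuses anything outside the known set.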

0

Having a skinnier column means you can fit more rows on a page.

There is a HUGE difference between a varchar(20) and an integer.

0

Source: https://habr.com/ru/post/1341055/

