How much does database performance improve from storing values as numbers rather than text?

Question


Suppose I have text such as "Win", "Lose", "Incomplete", "Forfeit", etc. I can save the text directly in the database. If I instead use numbers such as 0 = Win, 1 = Lose, etc., will I get a significant improvement in database performance? In particular, in queries where the field is part of my WHERE clause?

+4

performance sql database

deltanovember Feb 23 '11 at 23:18


6 answers

If a column is used in an index, there can be a significant performance boost.

+2

drewrobb Feb 23 '11 at 23:31


It can range from negligible to extremely useful, depending on the size of the table, the number of possible values being enumerated, and the database engine / configuration.

However, using a number to represent an enumerated type will almost certainly never perform worse.

+2

patros Feb 23 '11 at 23:38


Don't guess. Measure.

Performance depends on how selective the index is (how many distinct values it contains), whether the critical information is available in the natural key, how long the natural key is, and so on. You really need to test with representative data.
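As a rough illustration of what "measure" can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the row count, and the status-to-integer mapping are assumptions made up for this example, and an in-memory SQLite database is only a stand-in: numbers from your real engine, schema, and data are the ones that matter.

```python
import sqlite3
import time

# Hypothetical status values; the integer codes are an arbitrary mapping.
STATUSES = ["Win", "Lose", "Incomplete", "Forfeit"]

def build(conn, n_rows=200_000):
    """Create two tables holding the same data, one as text and one as integer codes."""
    conn.execute("CREATE TABLE game_text (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE game_int (id INTEGER PRIMARY KEY, status INTEGER)")
    conn.executemany(
        "INSERT INTO game_text (status) VALUES (?)",
        ((STATUSES[i % 4],) for i in range(n_rows)),
    )
    conn.executemany(
        "INSERT INTO game_int (status) VALUES (?)",
        ((i % 4,) for i in range(n_rows)),
    )
    conn.commit()

def timed_count(conn, sql, param, repeats=5):
    """Return (row count, best wall-clock time in seconds) for a counting query."""
    best = float("inf")
    count = None
    for _ in range(repeats):
        start = time.perf_counter()
        count = conn.execute(sql, (param,)).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return count, best

conn = sqlite3.connect(":memory:")
build(conn)
text_count, text_time = timed_count(
    conn, "SELECT COUNT(*) FROM game_text WHERE status = ?", "Lose"
)
int_count, int_time = timed_count(
    conn, "SELECT COUNT(*) FROM game_int WHERE status = ?", 1
)
print(f"text: {text_count} rows in {text_time:.4f}s")
print(f"int:  {int_count} rows in {int_time:.4f}s")
```

The timings will differ from machine to machine, so treat the output as a comparison on your own hardware rather than a general result. Adding an index on each status column and rerunning is a natural next experiment.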

When I was developing a database for my employer's operational data warehouse, I built a test bed with one set of tables designed around natural keys and another set designed around id numbers. Both schemas held more than 13 million rows of computer-generated sample data. In a few cases, queries against the id number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with id numbers took 30 seconds with natural keys.) But 80% of the test queries had better SELECT performance against the natural key schema. And sometimes it was staggeringly faster, a difference of 30 to 1.

The reason, of course, is that most queries against the natural key schema need no joins at all: the information you need is very often already carried in the natural key. (I know that sounds odd, but it happens surprisingly often. How often is application dependent.) And zero joins will often be faster than three joins, even if you are joining on integers.
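The join difference can be sketched with a toy schema. The tables and columns below are hypothetical, invented only to show the shape of the two designs, not taken from the warehouse described above; SQLite is used because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate-key design: the readable status lives in a lookup table.
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE game_by_id (
        id INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES status(id)
    );
    -- Natural-key design: the readable status is stored in the row itself.
    CREATE TABLE game_by_name (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL
    );
    INSERT INTO status (id, name) VALUES (0, 'Win'), (1, 'Lose');
    INSERT INTO game_by_id (status_id) VALUES (0), (1), (0);
    INSERT INTO game_by_name (status) VALUES ('Win'), ('Lose'), ('Win');
""")

# Reporting the readable status needs a join in the surrogate-key design...
with_join = conn.execute(
    "SELECT g.id, s.name FROM game_by_id AS g "
    "JOIN status AS s ON s.id = g.status_id ORDER BY g.id"
).fetchall()

# ...and no join at all in the natural-key design.
without_join = conn.execute(
    "SELECT id, status FROM game_by_name ORDER BY id"
).fetchall()

print(with_join == without_join)
```

Both queries return the same rows; the difference is only in how much work the engine does to produce them, which is what a benchmark on representative data would reveal.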

+1

Mike Sherrill 'Cat Recall' Feb 24 '11 at 12:37


Clearly, if your data structures are smaller, they are faster to compare and faster to store and retrieve.

How much faster? 1x, 2x, 1000x? It all depends on the size of the table and so on.

For example: say you have a table with an int productId column and a varchar text column.

Each row will take roughly 4 bytes for the int, and then another 3 to 24 bytes for the text in your example (depending on whether the column is nullable or is unicode).

Compare that to 5 bytes per row for the same data with a byte status column.

This huge space saving means more rows fit in a page, more data fits in the cache, fewer writes happen when you load and store data, and so on.
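To put rough numbers on the rows-per-page point, here is a back-of-the-envelope calculation. The 8 KiB page size is an assumption (it varies by engine and configuration), and the figures ignore per-row and per-page overhead, so they are upper bounds for illustration only.

```python
PAGE_BYTES = 8192  # assumed page size; real engines vary and reserve header space

def rows_per_page(row_bytes, page_bytes=PAGE_BYTES):
    """Upper bound on rows per page, ignoring per-row and per-page overhead."""
    return page_bytes // row_bytes

int_row = 4 + 1          # 4-byte int id plus a 1-byte status code
text_row_small = 4 + 3   # 4-byte int id plus the shortest text estimate
text_row_large = 4 + 24  # 4-byte int id plus the longest text estimate

for label, size in [("byte status", int_row),
                    ("short text", text_row_small),
                    ("long text", text_row_large)]:
    print(f"{label}: {size} bytes/row, at most {rows_per_page(size)} rows/page")
```

Under these assumptions the byte status layout fits at most 1638 rows per page, against 1170 for the shortest text and 292 for the longest.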

In addition, string comparisons are at best as fast as byte comparisons, and in the worst case much slower.

There is a second huge problem with storing text where you intended to have an enum. What happens when people start storing Incompete as opposed to Incomplete?
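If the values do stay as text, one possible guard against this kind of typo is a CHECK constraint (a lookup table with a foreign key is another). The sketch below uses SQLite through Python's sqlite3 module; the table name and the allowed list are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE game (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL
            CHECK (status IN ('Win', 'Lose', 'Incomplete', 'Forfeit'))
    )
""")
conn.execute("INSERT INTO game (status) VALUES ('Incomplete')")

# The misspelled value violates the CHECK constraint and is refused.
try:
    conn.execute("INSERT INTO game (status) VALUES ('Incompete')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT status FROM game").fetchall()
print(rejected, stored)
```

Only the correctly spelled row ends up in the table. This addresses data quality only; it does nothing for the storage size and comparison cost discussed above.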

0

Sam Saffron Feb 23 '11 at 23:42


Having a skinnier column means you can fit more rows on a page.

There is a HUGE difference between varchar(20) and an integer.

0

Aaron Kempf Feb 26 '11 at 5:49


Blagovest buyukliev · Accepted Answer · 2011-02-23T23:28:44+0000

At the CPU level, comparing two fixed-size integers takes just one instruction, whereas comparing variable-length strings usually involves looping through each character. So for a very large data set, there should be a significant performance gain from using integers.

In addition, a fixed-size integer usually takes less space and can allow the database engine to use faster algorithms based on random seeking.

However, many database systems have an enum type that is meant for cases like yours. In a query you compare the field's value against a fixed set of literals, while internally it is stored as an integer.
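The enum idea can be approximated in application code when the engine lacks a native type. SQLite, used here only because it ships with Python, has no enum type, so this sketch emulates one with Python's IntEnum; the Status class, its codes, and the game table are assumptions for illustration.

```python
import sqlite3
from enum import IntEnum

class Status(IntEnum):
    # Hypothetical codes, following the 0 = Win, 1 = Lose mapping in the question.
    WIN = 0
    LOSE = 1
    INCOMPLETE = 2
    FORFEIT = 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO game (status) VALUES (?)",
    [(int(s),) for s in (Status.WIN, Status.LOSE, Status.WIN, Status.FORFEIT)],
)

# The query is written against a readable name, but compares integers.
wins = conn.execute(
    "SELECT COUNT(*) FROM game WHERE status = ?", (int(Status.WIN),)
).fetchone()[0]

# Convert stored integers back to names when reading.
names = [
    Status(row[0]).name
    for row in conn.execute("SELECT status FROM game ORDER BY id")
]
print(wins, names)
```

With a native enum type the engine performs this name-to-integer mapping itself, so the application code would not need the Status class.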

