SQL Server Int or BigInt Database Identifiers

I am writing a new program and it will require a database (SQL Server 2008). Everything I run for the system now is 64-bit, which brings me to this question. For the identifier columns in the various tables, should I make them all INT or BIGINT? I doubt the system will ever exceed the INT range, though I suppose it is possible in some of the larger financial tables. INT seems to be the standard, though...

+46
sql sql-server
Jan 23
7 answers

OK, let's do a quick math check:

  • INT is 32 bits and gives you roughly 4 billion values; if you only count values greater than zero, that is still 2 billion. Do you have that many employees? Customers? Products in stock? Orders over the whole lifetime of your company? REALLY?

  • BIGINT goes a lot further. Do you really need it? Really?? If you are an astronomer or a particle physicist, maybe. For an average business identity column? I doubt it very much.
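
To make the ranges concrete, here is what SQL Server's integer types give you; the `dbo.Orders` table below is purely illustrative:

```sql
-- SQL Server integer types (documented sizes and ranges):
--   TINYINT   1 byte    0 .. 255
--   SMALLINT  2 bytes   -32,768 .. 32,767
--   INT       4 bytes   -2,147,483,648 .. 2,147,483,647
--   BIGINT    8 bytes   roughly -9.2 .. +9.2 quintillion
CREATE TABLE dbo.Orders
(
    OrderID     INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- ~2.1 billion positive ids
    CustomerID  INT NOT NULL,
    OrderDate   DATETIME NOT NULL
);
```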

Imagine you have a table with, say, 10 million rows (orders for your company). Say the OrderID you made BIGINT is referenced by 5 other tables and used in 5 non-clustered indexes on your Orders table. Not overdone, I think, right?

10 million rows, times 5 referencing tables plus 5 non-clustered indexes: that is 100 million places where you use 8 bytes instead of 4. 400 million extra bytes = roughly 400 MB. A total waste... you need more data and index pages, and your SQL Server has to read more pages from disk and cache more pages... which is not good for your performance, plain and simple.
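
The arithmetic above is easy to check; this is just the same back-of-the-envelope calculation from the paragraph, written out as T-SQL:

```sql
DECLARE @Rows   BIGINT = 10000000;        -- 10 million orders
DECLARE @Copies BIGINT = 10;              -- 5 referencing tables + 5 non-clustered indexes
DECLARE @ExtraBytesPerKey BIGINT = 8 - 4; -- BIGINT vs. INT

-- 10,000,000 * 10 * 4 = 400,000,000 extra bytes, i.e. roughly 400 MB
SELECT @Rows * @Copies * @ExtraBytesPerKey AS ExtraBytes;
```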

PLUS, something most programmers don't think about: yes, disk space is dirt cheap. But that wasted space is just as relevant in your SQL Server's RAM and data cache, and that space is not cheap!

So, to make a very long story short: use the smallest INT type that really suits your needs; if you have 10-20 distinct values to handle, use TINYINT. For an order table, I believe INT should be PLENTY ENOUGH. BIGINT is just a waste of space.

Plus: if any of your tables ever really comes close to 2 or 4 billion rows, you will still have plenty of time to upgrade that table to a BIGINT ID, if it is really necessary...
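
Widening later is possible, though on a big table it is not free: it is a size-of-data operation, and any primary key or index on the column has to be dropped and recreated around it. A rough sketch, with illustrative table and constraint names:

```sql
-- 1. Drop the PK (and any FKs/indexes referencing the column) first.
ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;

-- 2. Widen the column. Every INT value fits in BIGINT, so no data is lost.
ALTER TABLE dbo.Orders ALTER COLUMN OrderID BIGINT NOT NULL;

-- 3. Recreate the PK.
ALTER TABLE dbo.Orders ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID);
```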

+99
Jan 23 '10 at 20:57

You should use the smallest data type that makes sense for the table in question. That includes using smallint or even tinyint if the expected number of rows is small enough.

You save space in both data and indexes, and you get better index performance. Using bigint when all you need is smallint is similar to using varchar(4000) when all you need is varchar(50).

Even if the machine's native word size is 64 bits, that only means 64-bit CPU operations will not be slower than 32-bit operations. Most of the time they will not be faster either; they will be the same. But most databases are not CPU-bound: they are I/O-bound and, to a lesser extent, memory-bound, so saving 50-90% on data size is a very good thing when you have to run index scans over 200 million rows.
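
If you would rather measure than take this on faith, SQL Server will report how much space a table's data and indexes actually occupy (table name is illustrative):

```sql
-- Reports reserved, data, and index sizes for one table.
EXEC sp_spaceused N'dbo.Orders';
```

Comparing the output before and after switching a key column between INT and BIGINT shows the size difference directly.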

+14
Jan 23 '10 at 20:45

Here is an article with some real performance numbers... I prefer to answer questions with hard numbers where possible... If you follow the link below, you will find that at a million or so records there is already a slight difference in disk usage...

http://www.sqlservercentral.com/articles/Performance+Tuning/2753/

Personally, I feel that choosing an appropriate identifier size is important, but also consider that you might have a table with a ton of activity over time. It is not that you are storing a huge amount of data; it is that the key value keeps growing because of how auto-increment works (deletes and inserts over time).

Consider a file repository on a community site or a user comment identifier on a multi-user community site.

I realize most developers build systems that will never touch millions of records, but it is important to note that there are real reasons bigint is required. When designing a schema whose potential growth you cannot know, I remain convinced you should try to anticipate the future and consider bigint if you feel the id value could eventually exceed the maximum int value.

+12
Oct 21 '11

Aligning 32-bit numbers to the x86 architecture, or 64-bit numbers to the x64 architecture, is called data structure alignment.

It makes no difference to the data in the database, because it is disk space, the data cache and the table/index architecture that affect performance there (as mentioned in the other answers).

Remember that it is not the CPU accessing your data as such. It is the database engine code (which may well be aligned, but who cares?) that runs on the CPU and manages your data. When/if your data does pass through the CPU, it certainly will not be in the same structures it has on disk.

+6
Jan 24 '10 at 9:55

Other people have already given convincing answers in favor of 32-bit identifiers.

For some applications, 64-bit identifiers make more sense.

If you want to guarantee the uniqueness of identifiers across a database cluster, 63-bit identifiers can be very convenient. With 32 bits it is very hard to distribute identifier generation across the servers in a cluster, or across data centers; with 64 bits you have enough room to play with that you can easily generate identifiers on each server, without locking, and still guarantee uniqueness.

For examples, see Twitter's Snowflake and the Instagram Engineering blog post "Sharding & IDs at Instagram". Both give good reasons why 63 or 64 bits make more sense for their identifiers than 32-bit counters.
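
As a rough sketch of what such schemes do (the bit widths follow Instagram's published layout of 41 bits of time, 13 bits of shard id, 10 bits of sequence; the shard and sequence values here are placeholders), a 64-bit id packs three fields into one BIGINT:

```sql
DECLARE @CustomEpoch DATETIME = '2011-01-01'; -- illustrative epoch
DECLARE @ShardId     BIGINT   = 5;            -- placeholder: which logical shard
DECLARE @Sequence    BIGINT   = 123;          -- placeholder: per-shard counter value

-- Milliseconds since the custom epoch (second precision is fine for a sketch).
DECLARE @Millis BIGINT =
    CAST(DATEDIFF(SECOND, @CustomEpoch, GETUTCDATE()) AS BIGINT) * 1000;

SELECT (@Millis * 8388608)   -- shift left 23 bits: 41 bits of time
     + (@ShardId * 1024)     -- shift left 10 bits: 13 bits of shard id
     + (@Sequence % 1024)    -- low 10 bits: per-shard sequence
     AS NewId;
```

Because the timestamp occupies the high bits, ids generated this way still sort roughly by creation time, which is friendly to a clustered index.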

+6
Aug 13 '13 at 2:34

You should judge each table separately as to what data type fits its needs. Use INT if it meets the needs of the particular table. If SMALLINT is sufficient, use that. Use a data type that will last, without being excessive.

+4
Jan 23 '10 at 20:47

The first answer is naive if you do not work with multi-terabyte databases or tables under constant, heavy volume. In any database of decent size you will run into problems with INT at some stage in its life. Use BIGINT if you suspect you will need it, as it will save you a lot of trouble down the line. I have seen companies hit the INT limit after just one year of data, and when migrating to a wider type was not an option, it caused massive downtime. I have also seen it hit moderately sized databases that purge old data, in long-lived systems (10+ years) that were never expected to last that long. In most cases where large volumes are expected it is even better to use a GUID, but do not rule out BIGINT if it is required.
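
For the GUID option mentioned above, a sketch with illustrative names: `NEWSEQUENTIALID()` generates ever-increasing GUIDs (it may only be used in a column default), which avoids the page splits that random `NEWID()` values cause in a clustered index:

```sql
CREATE TABLE dbo.AuditLog
(
    AuditId  UNIQUEIDENTIFIER NOT NULL
             CONSTRAINT DF_AuditLog_AuditId DEFAULT NEWSEQUENTIALID(),
    LoggedAt DATETIME2 NOT NULL,
    CONSTRAINT PK_AuditLog PRIMARY KEY CLUSTERED (AuditId)
);
```

The trade-off: a GUID key is 16 bytes, twice the size of BIGINT, so the space arguments in the earlier answers apply with even more force; exhaustion simply stops being a concern.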

+2
Jan 05 '17 at 11:42


