How a SQL Primary Key Works

I am currently using a GUID as a NONCLUSTERED PRIMARY KEY alongside an INT IDENTITY column.

The GUIDs are required to allow offline data creation and synchronization, which is how the entire database is populated.

I know the implications of using a GUID as a clustered primary key, hence the integer clustered index, but does using a GUID as a primary key, and therefore as foreign keys in other tables, have significant performance implications?

Would it be better to use an integer primary / foreign key and keep the GUID as a client identifier with a UNIQUE INDEX on each table? My concern is that, without substantial changes to the existing code, Entity Framework would need to load the navigation properties in order to get the GUID of the related entity.

The database and hardware in question are SQL Azure.

+6
3 answers

Generally speaking, it is preferable to use INT for Primary Key / Foreign Key fields, whether or not those fields are the leading field in clustered indexes. The issue is JOIN performance: even if you make the UNIQUEIDENTIFIER index NONCLUSTERED, or use NEWSEQUENTIALID() to reduce fragmentation, JOINs between INT fields will scale better as the tables grow. (Please note that I am not saying that PK / FK fields should always be INT, since there are sometimes perfectly valid natural keys to use.)

In your case, given the concern about Entity Framework and the fact that the GUIDs are generated in the application rather than in the database, go with your alternative suggestion of using INT for the PK / FK fields. But rather than having a UNIQUEIDENTIFIER in every table, put it only in the main user / customer table. I would think that you should be able to do a one-time lookup of the customer's INT identifier based on the GUID, cache that value, and then use the INT value for all remaining operations. And yes, make sure there is a UNIQUE, NONCLUSTERED index on the GUID field.
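A minimal sketch of that layout, assuming hypothetical table and column names (dbo.Customer, dbo.CustomerOrder, CustomerGuid) that do not appear in the question. It is one possible shape for the idea, not the asker's actual schema: the INT column carries the clustered primary key and the foreign keys, while the GUID sits only on the main table behind a UNIQUE NONCLUSTERED constraint.

```sql
-- Hypothetical names throughout; adjust to the real schema.
CREATE TABLE dbo.Customer
(
    CustomerID   INT IDENTITY(1,1) NOT NULL,
    CustomerGuid UNIQUEIDENTIFIER  NOT NULL,  -- generated by the application
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID),
    CONSTRAINT UQ_Customer_CustomerGuid UNIQUE NONCLUSTERED (CustomerGuid)
);
GO

-- Child tables reference the 4-byte INT key only; no GUID column here.
CREATE TABLE dbo.CustomerOrder
(
    CustomerOrderID INT IDENTITY(1,1) NOT NULL,
    CustomerID      INT NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (CustomerOrderID),
    CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID)
);
GO

-- One-time lookup: resolve the GUID to the INT key. The application can
-- cache the result and use the INT for all later queries.
-- The GUID literal below is an arbitrary example value.
DECLARE @CustomerGuid UNIQUEIDENTIFIER = '6F9619FF-8B86-D011-B42D-00C04FC964FF';

SELECT CustomerID
FROM dbo.Customer
WHERE CustomerGuid = @CustomerGuid;
```

The lookup is a single seek on the unique GUID index, so its cost is paid once per customer rather than on every JOIN.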

That said, if your tables will never (and I mean NEVER, not just in the first 2 years) grow beyond maybe 100,000 rows each, then using UNIQUEIDENTIFIER is less of a concern, because small row counts generally perform OK (given moderately decent hardware that is not overloaded with other processes or short on memory). Obviously, the point at which JOIN performance degrades because of UNIQUEIDENTIFIER will depend heavily on the specifics of the system: the hardware, the types of queries, how the queries are written, and how much load the system is under.

+1

You can also create foreign keys that reference unique key constraints, which gives you the option of pointing a foreign key at the INT identity column as an alternative to the GUID.

i.e.

    CREATE TABLE SomeTable
    (
        UUID UNIQUEIDENTIFIER NOT NULL,
        ID INT IDENTITY(1,1) NOT NULL,
        CONSTRAINT PK PRIMARY KEY NONCLUSTERED (UUID),
        CONSTRAINT UQ UNIQUE (ID)
    )
    GO

    CREATE TABLE AnotherTable
    (
        SomeTableID INT,
        FOREIGN KEY (SomeTableID) REFERENCES SomeTable(ID)
    )
    GO

Edit

Assuming your centralized database is a data mart and that only batch ETL is run from the source databases: if you run your ETL directly against the central database (i.e. not through Entity Framework), then, given that all your tables have UUID FKs after repopulation from the distributed databases, you need to either map the INT UKCs (unique key constraints) during ETL or fix them up after the import (which will require temporarily setting the INT FK constraints to NOCHECK).
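The fix-up-after-import route could look roughly like the sketch below. All names (dbo.Parent, dbo.Child, ParentUUID, ParentID, FK_Child_Parent_ID) are hypothetical, and the sketch assumes the imported rows arrive with INT values that are not yet valid in the central database, which is why the foreign key is switched to NOCHECK for the duration of the load.

```sql
-- Hypothetical schema: both tables keep the UUID as the nonclustered PK
-- and carry a clustered INT identity, as in the question.
CREATE TABLE dbo.Parent
(
    ID   INT IDENTITY(1,1) NOT NULL,
    UUID UNIQUEIDENTIFIER  NOT NULL,
    CONSTRAINT PK_Parent PRIMARY KEY NONCLUSTERED (UUID),
    CONSTRAINT UQ_Parent_ID UNIQUE CLUSTERED (ID)
);
GO

CREATE TABLE dbo.Child
(
    ID         INT IDENTITY(1,1) NOT NULL,
    UUID       UNIQUEIDENTIFIER  NOT NULL,
    ParentUUID UNIQUEIDENTIFIER  NOT NULL,  -- reference as delivered by the ETL
    ParentID   INT               NOT NULL,  -- INT reference, stale until fixed up
    CONSTRAINT PK_Child PRIMARY KEY NONCLUSTERED (UUID),
    CONSTRAINT UQ_Child_ID UNIQUE CLUSTERED (ID),
    CONSTRAINT FK_Child_Parent_ID
        FOREIGN KEY (ParentID) REFERENCES dbo.Parent (ID)
);
GO

-- 1. Stop enforcing the INT foreign key while the batch is loaded.
ALTER TABLE dbo.Child NOCHECK CONSTRAINT FK_Child_Parent_ID;

-- 2. (The batch import into dbo.Parent and dbo.Child runs here.)

-- 3. Map each child row to the central INT key by joining on the UUID.
UPDATE c
SET ParentID = p.ID
FROM dbo.Child AS c
JOIN dbo.Parent AS p
    ON p.UUID = c.ParentUUID;

-- 4. Re-enable the constraint and validate the existing rows.
ALTER TABLE dbo.Child WITH CHECK CHECK CONSTRAINT FK_Child_Parent_ID;
```

Mapping during ETL instead would use the same UUID join inside the INSERT ... SELECT that loads dbo.Child, so the constraint never has to be disabled.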

Once the ETL has loaded and the INT keys have been mapped, I suggest that you ignore / remove the UUIDs from your ORM model. You will need to regenerate your EF navigation properties on the INT keys.

A different solution is required if you update the central database directly, or run continual ETL, and use EF for the ETL itself. In that case, it may cost less total I/O to simply keep the PK GUID as the FK for RI (referential integrity), drop the INT FKs altogether, and choose other suitable columns for clustering (to minimize page reads).

+4

Using a GUID has important implications, yes. Your index is nonclustered, but the index itself will still fragment quickly, and so will the indexes on the foreign keys. Size is also a concern: 16 bytes instead of 4 bytes for an integer.
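As a quick check of the size figures, the following query (a small illustrative sketch, not part of the original answer) reports the storage size of each type:

```sql
-- DATALENGTH returns the number of bytes used to store a value.
SELECT DATALENGTH(NEWID())        AS GuidBytes,  -- 16
       DATALENGTH(CAST(1 AS INT)) AS IntBytes;   -- 4
```

Every foreign key column that references the key stores its own copy of the value, so the 12-byte difference repeats for each referencing row and each index entry on that column.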

You can use the NEWSEQUENTIALID() function as the default value for your column to make it less random and reduce fragmentation.
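A minimal sketch of such a default, using a hypothetical dbo.Item table and assuming the target platform supports NEWSEQUENTIALID(). Note that the function can only be used in a DEFAULT constraint on a UNIQUEIDENTIFIER column, and the default applies only when the INSERT does not supply a value, so GUIDs generated offline by the application would bypass it.

```sql
-- Hypothetical table; the names are for illustration only.
CREATE TABLE dbo.Item
(
    ItemID   INT IDENTITY(1,1) NOT NULL,
    ItemGuid UNIQUEIDENTIFIER  NOT NULL
        CONSTRAINT DF_Item_ItemGuid DEFAULT NEWSEQUENTIALID(),
    Name     NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_Item PRIMARY KEY NONCLUSTERED (ItemGuid),
    CONSTRAINT UQ_Item_ItemID UNIQUE CLUSTERED (ItemID)
);
GO

-- ItemGuid is omitted, so the server fills it in with a sequential GUID.
INSERT INTO dbo.Item (Name) VALUES (N'First'), (N'Second');

SELECT ItemID, ItemGuid, Name
FROM dbo.Item
ORDER BY ItemID;
```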

But yes, I would say that the best solution is to use an integer as the primary key and for references.

+1

Source: https://habr.com/ru/post/959048/

