SQL primary keys

So, a coworker and I are in a debate about which way is better for generating primary keys that are GUIDs.

We use .NET 4.0 with Entities 4 and use stored procs to select / insert / update.

He wants to create the primary key GUID in code and pass it as part of the insert, using the Guid class and/or some sequential GUID generator class.

I want the GUID to be created by SQL Server on insert using either newid() or newsequentialid().

My argument against his approach is that if you need to make multiple inserts, you need a round trip to get a GUID for each insert so that you can maintain the relationships for foreign key constraints. In addition, with this method you must make several calls for each table.

His argument against using SQL is that he does not have access to the key before the insert happens, and he has to wait for the insert to return the primary key before he can use it in other parts of the code. This way you can make one connection to the stored proc and have it handle all the inserts.
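
To make his point about knowing the key up front concrete, here is a minimal sketch. It is written in Python, with sqlite3 and uuid.uuid4() standing in for SQL Server and the .NET Guid class, so it only illustrates the idea. The orders and order_items tables and the insert_order_with_items helper are hypothetical names made up for the example.

```python
import sqlite3
import uuid


def insert_order_with_items(conn, item_names):
    """Insert one parent row and its children using client-generated keys."""
    # The parent key exists before any insert runs, so the child rows
    # can reference it without waiting for the database to return it.
    order_id = str(uuid.uuid4())
    items = [(str(uuid.uuid4()), order_id, name) for name in item_names]
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        conn.executemany(
            "INSERT INTO order_items (id, order_id, name) VALUES (?, ?, ?)",
            items,
        )
    return order_id


# Hypothetical schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE order_items ("
    "id TEXT PRIMARY KEY, "
    "order_id TEXT NOT NULL REFERENCES orders(id), "
    "name TEXT NOT NULL)"
)
order_id = insert_order_with_items(conn, ["keyboard", "mouse"])
```

Note that uuid4() values are random, so this sketch says nothing about how such keys behave as a clustering key on the server.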

So which method is better for single inserts? Which method is better for multiple inserts in a transaction?

2 answers

GUIDs may seem like a natural choice for your primary key - and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I would strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.

You really need to keep two issues apart:

1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies each row in your table. It can be anything, really - an INT, a GUID, a string - pick whatever makes the most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) is a physical storage matter, and here a small, stable, ever-increasing data type is your best choice - INT or BIGINT as the default option.

By default, the primary key on a SQL Server table is also used as the clustering key, but it does not have to be that way! I have personally seen significant performance gains from breaking up a GUID-based primary / clustered key into two separate keys - the primary (logical) key on the GUID and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated many times, a GUID as the clustering key is not optimal: because of its randomness, it leads to massive page and index fragmentation and generally poor performance.
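
The fragmentation effect can be illustrated with a toy model. The sketch below is not how SQL Server stores pages. It assumes fixed-size pages holding a hypothetical 100 keys each, a 50/50 split whenever a key lands in a full page, and a fresh page when a key is appended past the end of the last page. PAGE_CAPACITY and simulate_inserts are names invented for this example.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of keys that fit on one page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert non-negative integer keys into sorted pages.

    Returns (page_splits, average_page_fill).
    """
    pages = [[]]   # each page is a sorted list of keys
    lows = [-1]    # lows[i] is a lower bound for every key on pages[i]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1  # page that should hold key
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Append past the end: start a fresh page, no rows move.
            pages.append([key])
            lows.append(key)
            continue
        # Insert into the middle of a full page: split it in half.
        mid = capacity // 2
        right = page[mid:]
        del page[mid:]
        pages.insert(i + 1, right)
        lows.insert(i + 1, right[0])
        splits += 1
        bisect.insort(right if key >= right[0] else page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, fill


N = 10_000
rng = random.Random(42)
random_keys = [rng.getrandbits(128) for _ in range(N)]  # GUID-like random keys
sequential_keys = list(range(N))                        # IDENTITY-like keys

rand_splits, rand_fill = simulate_inserts(random_keys)
seq_splits, seq_fill = simulate_inserts(sequential_keys)
print(f"random:     {rand_splits} splits, {rand_fill:.0%} average fill")
print(f"sequential: {seq_splits} splits, {seq_fill:.0%} average fill")
```

In this model ever-increasing keys never split a page and leave every page full, while random keys keep splitting pages and leave them partly empty.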

Yes, I know - there is newsequentialid() in SQL Server 2005 and above - but even that is not truly and fully sequential, so it suffers from the same problems as a regular GUID, just a little less noticeably. If you insist on a GUID, then at least use newsequentialid() on the server!

Then there is another issue to consider: the clustering key of a table is added to every entry of every non-clustered index on that table, so you really want to make sure it is as small as possible. As a rule, an INT with its 2+ billion rows should be enough for the vast majority of tables - and compared to a GUID as the clustering key, you can save hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as the primary and clustered key:

  • Base table with 1'000'000 rows (3.8 MB vs 15.26 MB)
  • 6 non-clustered indexes (22.89 MB vs 91.55 MB).

TOTAL: about 27 MB versus 107 MB - and this is only on one table!
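
A minimal sketch reproducing this arithmetic, assuming 4 bytes per INT key, 16 bytes per GUID key, 1 MB = 1024 * 1024 bytes, and counting only the key bytes (no row, page, or index overhead). The function name key_storage_mb is made up for the example.

```python
ROWS = 1_000_000
NONCLUSTERED_INDEXES = 6
MB = 1024 * 1024


def key_storage_mb(key_bytes, rows=ROWS, indexes=NONCLUSTERED_INDEXES):
    """Return (base_table_mb, index_mb, total_mb), counting key bytes only."""
    base = rows * key_bytes / MB   # the key stored once per row
    index = indexes * base         # the key repeated in each non-clustered index
    return base, index, base + index


int_sizes = key_storage_mb(4)     # INT is 4 bytes
guid_sizes = key_storage_mb(16)   # UNIQUEIDENTIFIER is 16 bytes

for label, (base, index, total) in (("INT", int_sizes), ("GUID", guid_sizes)):
    print(f"{label}: base {base:.2f} MB, indexes {index:.2f} MB, total {total:.2f} MB")
```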

Some more food for thought - excellent material from Kimberly Tripp - read it, read it again, digest it! It really is the SQL Server indexing gospel.

Mark


When I have questions like this, I tell myself: "SQL Server is good at sets, so let it do what it is good at," and sometimes "1 is just a special case of N".

Which method is best for one insert?

For a synchronous SQL call, the time for a single insert will be the same with either of your approaches. However, "his" approach will give you more trouble down the line, because his sequential GUID method will not be as good as SQL Server's (and you are likely to lose global uniqueness). It will also split your code base when you inevitably have to do multiple inserts.

Which method is best for multiple inserts in a transaction?

If you compare a set-based insert (insert / select) with single-row inserts (insert), the set-based approach wins on multiple inserts, because the round trip back to the client is the expensive part.

If it were me, I would create an SP that takes a serialized collection of objects to insert, does the insert / select with an OUTPUT clause (check "Example B. Using OUTPUT with identity and computed columns" on this page), lets SQL Server create the GUID (if you are stuck with it), and then either returns the result to the client or runs the next statement in the SP to insert the child rows based on the output table that the insert produced.


Source: https://habr.com/ru/post/1347125/

