Choosing the best primary key + numbering system

We are trying to come up with a numbering system for the asset system that we are creating, there were several heated discussions on this topic in the office, so I decided to ask SO experts.

Given the database design below, that would be a better option.

alt text

Example 1: Using automatic surrogate keys.

================= ================== Road_Number(PK) Segment_Number(PK) ================= ================== 1 1 

Example 2: Using the generated PK program

 ================= ================== Road_Number(PK) Segment_Number(PK) ================= ================== "RD00000001WCK" "00000001.1" 

( 00000001.1 means the first segment of the road, which increases each time you add a new segment, for example 00000001.2 )

Example 3: Using bits of both (adding a new column)

 ======================= ========================== ID(PK) Road_Number(UK) ID(PK) Segment_Number(UK) ======================= ========================== 1 "RD00000001WCK" 1 "00000001.1" 

With a little background information, we will use the Road Number and Segment Number in reports and other documents, so they must be unique .

I always liked to keep simple things, so I prefer Example 1, but I read that you should not disclose your primary keys in reports / documents. So now I think more about the lines of Example 3.

I also tend to example 3, because if we decide to change the way we create the right amount of assets, he will not need to do cascading updates of the primary key.

What do you think we should do?

Thank.

EDIT: Thanks everyone for the great answers, helped me a lot.

+22
database database-design primary-key
Apr 01 '09 at 22:55
source share
13 answers

This is really a discussion about a surrogate (also called technical or synthetic) versus natural primary keys, a subject that has been widely covered. I examined this in database design errors made by AppDevelopers .

Natural keys are keys based on externally significant data that are (supposedly) unique. Common examples are product codes, two-letter state codes (USA), social security numbers, and so on. Surrogate or technical Primary keys are those that have absolutely no meaning outside the system. They were invented solely for object identification and usually automatically increasing fields (SQL Server, MySQL, others) or sequences (primarily Oracle).

In my opinion, you should always use surrogate keys. This question come up with the following questions:

  • How do you like your primary keys?
  • What is the best practice for primary keys in tables?
  • What primary key format would you use in this situation.
  • Surrogate Vs. Natural / business keys
  • Should I have a primary key field highlighted?

Auto number fields are the way to go. If your keys matter outside of your database (for example, asset numbers), this is quite possible to change, and changing the keys will be problematic. Just use indexes for these things in their respective tables.

+62
Apr 05 '09 at 1:51
source share

I would say keep it simple and stay with the auto-increment primary key. If you need something more “readable” in terms of display in the program, then maybe one of your other ideas, but I think that this is just adding unnecessary complexity to the primary key field.

+7
Apr 01 '09 at 22:57
source share

I also have a very strong attitude towards the "Do not use primary keys as meaningful data" camp. Every time I violated this policy, it ended in tears. Sooner or later, meaningful data needs to change, and if that means you need to change the primary key, it can become painful. The primary key is likely to be used in foreign key constraints, and you can spend years trying to figure it out, just for simple data changes.

I always use the GUID / UUID for my primary keys in every table I have ever created, but only personal preference series or such are also good.

+7
Apr 01 '09 at 23:31
source share

Do not put the value in your PK fields unless ...

  • 100% completely impossible that the value will never change and that

  • No one will ever reasonably argue about what value should be used for a particular string.

Go with option 1 and format the value in the application so that it looks like option two or three when it is displayed.

+4
Apr 02 '09 at 0:10
source share

I think it is important to remember that each table in your database / design can have multiple keys. These are the candidate keys . See Wikipedia entry for Candidate Keys

By definition, all Candidate Keys are created equal. Each of them is a unique identifier for this table.

Then your task is to select the best candidate from the candidate Key pool to serve the Primary Key . The primary key will be used by other tables to establish relational constraints, but you can continue to use the Candidate Keys to query the table.

Since primary keys refer to other structures and therefore are used in join operations, the criteria for choosing a primary key are as follows for me (in order of importance):

  • Immutable / Stable . The primary key values ​​must not change. If so, you run the risk of introducing update anomalies.
  • Not null Most DBMS platforms require that primary key attributes are not null.
  • Simple - Simple data types and values ​​for physical storage and performance. Integer values ​​work well here, and this is the data type for most surrogate / auto-generator keys.

Once you have identified Applicant Keys, the above criteria can be used to select the Primary Key. If there is no “Natural” Key for the candidate that meets the criteria, then a Surrogate Key that meets the criteria can be created and used, as indicated in other answers.

+3
Apr 10 '09 at 2:55 april
source share

Follow the Do Not Use policy.

Some problems you may encounter:

You need to generate keys from several hosts.

Someone wants to reserve connecting rooms for sharing.

How significant will the people of this be? Wars are fighting over it, and you are already in the first battle. "It already makes sense, and if we just add two more numbers, we can ..." i.e. You set a design style that will (should) expand.

If you combine the two, you do type techniques that can ruin your query optimizer.

You will need to reclassify the roads and redefine their boundaries (i.e. move the roads), which implies a change in the primary key and, possibly, loss of links.

There are workarounds for all of this, but this is a problem where workarounds grow and get out of hand. And it won’t take a couple more to go beyond Simple.

+1
Apr 01 '09 at 23:09
source share

As mentioned earlier, save your internal primary keys as just keys, whatever the optimal data type is on your platform.

However, you need to resolve the numbering system argument, as this is actually a business requirement, and perhaps let it identify the system for the asset.

If there is only one identifier, add it as a column to the main table. If there are likely to be many identification systems (and assets usually have many), you will need two more tables

     Identifier-type table Identifier-cross-ref table
       type-id ------------> type-id (unique
       type-name identifier-string key)
                                         internal-id


Thus, different people who need to access the asset can identify themselves in their own way. For example, a server command will identify the server differently from a network command and again will be different from project management, accounts, etc.

In addition, you can go to all meetings in which everyone argues with each other.

+1
Apr 05 '09 at 10:50
source share

Another thing to keep in mind is that if you import a lot of data into this system, you may find that things like Road_Number are not as unique as you thought, and there may be effective security checkpoints to fix the problem (repainting road signs, etc.).

0
Apr 01 '09 at 23:28
source share

Although natural keys can make a big difference to business users, unless you have an agreement that these keys are sacred and should not be changed, you will most likely be pulling your hair out while maintaining a database where “product codes need to be changed to host a new product line acquired by the company. " You need to protect the RI of your data, and integers as primary keys with auto-increment is the best way. Performance is also better at indexing and moving integers than char columns.

Although they are not suitable as primary keys, natural keys are very suitable for user consumption, and you can apply uniques through an index. They bring context to the data that will facilitate understanding by all parties. Also, when you need to reload data, natural keys can help verify that your searches are still valid.

0
Apr 10 '09 at 10:16
source share

I would go with a surrogate key, but you might want to have a computed column that "formats" the surrogate key to a more "readable" value if this improves your reporting. The calculated colony could give example 2 from a surrogate key, for example, for display purposes.

I think that the path of the surrogate key is the path, and the only exceptions I make for it are join tables, where the primary key can consist of references to foreign keys. Even in these cases, I found that the primary surrogate key is more useful than not.

0
Apr 10 '09 at 13:42
source share

I suspect that you really need to use option number 3, as many of them have already said. Surrogate PCs (integers or GUIDs) are good practice, even if there are sufficient business keys. Surrogates will reduce maintenance headaches (as you yourself noted).

Talking about this, you might think about whether your database is:

  • focused on data maintenance and transaction processing (i.e. create / update / delete operations)
  • analysis and reporting oriented (i.e. queries)

In other words, are users interested in maintaining active data or finding mostly static data to find answers?

If you are heavily focused on creating an analysis and reporting database (e.g., data / file repositories) that are impacted by technical business users (e.g. report designers) who have a good understanding of business vocabulary, then you might want to consider using natural keys based on significant business values. They help reduce query complexity by eliminating the need for complex joins and helping the user focus on their task, rather than struggling with the database structure.

Otherwise, you are probably focused on the full CRUD DB, which should to some extent cover all the bases - this is the vast majority of situations. In this case, go to option number 3. You can always optimize for queries in the future, but it will be difficult for you to modify for ease of maintenance.

0
Apr 10 '09 at 17:36
source share

I hope you agree with me that each design element should have one goal.

Question: what do you think is the purpose of the PC? If he identifies a unique entry in the table, then the surrogate keys win without any problems. It is simple and straightforward.

As for the new columns in option 3, you should check if they can be calculated (it would be best to perform calculations at the model level so that they can be easily changed than if they were performed in the DBMS), not too many from other elements . For example, you can save the segment number and road number in the corresponding tables, and then use them to generate “00000001.1”. This will allow you to quickly change the numbering of assets.

0
Apr 11 '09 at 21:08
source share

First, option 2 is the absolute worst option. As an index, this is a string , and it slows down. And it is generated based on business rules, which can change and cause a rather big headache.

Personally, I always use a separate primary key column; and I always use the GUID. Some developers prefer a simple INT over a GUID due to a lack of hard drive. However, if a situation arises when you need to combine two databases, GUIDs almost never collide (while INTs can collide).

Primary keys should look NEVER by the user. Making it understandable to the user should not be a concern. Primary keys MUST be used to communicate with foreign keys. That is their goal. The value must be machine readable and should never change after its creation.

0
Apr 12 '09 at 0:59
source share



All Articles