Why use an auto-increment primary key when other unique fields exist?

I am taking a course called "database systems" and for our class project I need to create a website.

Here is an example of the table I created:

    CREATE TABLE users (
        uid INT NOT NULL AUTO_INCREMENT,
        username VARCHAR(60),
        passhash VARCHAR(255),
        email VARCHAR(60),
        rdate DATE,
        PRIMARY KEY(uid)
    );

The professor told me that the "uid" (user ID) was completely useless and unnecessary, and that I had to use the username as the primary key, since no two users could have the same username.

I told him that the user ID was convenient for me, because when I call something like domain.com/viewuser?id=5, I just check the parameter with is_numeric($_GET['id']). It goes without saying that he was not convinced.

Since I have seen user_id and similar attributes (thread_id, comment_id, among others) in a large number of tutorials and in the database schemas of popular software (for example, vBulletin), there must be other (stronger) reasons.

So my question is: how would you justify using a non-null auto-incrementing id as the primary key versus using another attribute such as username?

+46
sql database database-design data-modeling
Nov 05 '10 at 3:42
12 answers

Auto-incrementing primary keys are useful for several reasons:

  • They allow duplicate usernames, as on Stack Overflow
  • They allow you to change the username (or email address if it is used to log in)
  • Selects, joins, and inserts are faster than with varchar primary keys, since it is much faster to maintain a numeric index
  • As you mentioned, validation becomes very simple: if ((int)$id > 0) { ... }
  • Sanitizing input is trivial: $id = (int)$_GET['id']
  • Much less overhead, since foreign keys do not have to duplicate potentially large string values.
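
The validation and sanitizing bullets can be sketched in Python as a rough analogue of the PHP one-liners. The function name parse_user_id is hypothetical, and this version is deliberately stricter than a PHP (int) cast: it rejects any input that is not made up entirely of ASCII digits instead of truncating it.

```python
def parse_user_id(raw):
    """Return a positive integer ID, or None if the input is not one."""
    # Accept only strings made up entirely of ASCII digits.
    if isinstance(raw, str) and raw.isascii() and raw.isdigit():
        value = int(raw)
        # Auto-increment IDs start at 1, so zero is never a valid ID.
        return value if value > 0 else None
    return None


valid = parse_user_id("5")
rejected = parse_user_id("5; DROP TABLE users")
```

A username has no equally simple check: allowed characters, length, and case rules all have to be decided and enforced separately.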

I would say that trying to use any piece of string information as a unique identifier for a record is a bad idea when an auto-incrementing numeric key is so readily available.

Systems with unique usernames work well for a very small number of users, but the Internet has rendered them fundamentally broken. When you consider the sheer number of people named "john" who might interact with a website, it is ridiculous to require each of them to use a unique display name. It leads to the terrible system we see so often, with random numbers and letters decorating the username.

However, even in a system where you enforce unique usernames, the username is still a poor choice for a primary key. Imagine a user with 500 posts: the foreign key in the posts table will contain the username duplicated 500 times. The overhead is prohibitive even before you consider that someone may eventually need to change their username.
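
As a rough illustration of the 500-posts argument, here is a minimal Python sketch that uses the standard sqlite3 module as a stand-in for MySQL. The posts table, its columns, and the sample names are hypothetical and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        uid      INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE posts (
        pid  INTEGER PRIMARY KEY AUTOINCREMENT,
        uid  INTEGER NOT NULL REFERENCES users(uid),  -- small integer, not the name
        body TEXT NOT NULL
    );
""")

# One user with 500 posts; each post stores only the integer uid.
cur = conn.execute("INSERT INTO users (username) VALUES (?)", ("john",))
john_uid = cur.lastrowid
conn.executemany(
    "INSERT INTO posts (uid, body) VALUES (?, ?)",
    [(john_uid, f"post {n}") for n in range(500)],
)

# Renaming the user changes exactly one row in users and nothing in posts.
renamed = conn.execute(
    "UPDATE users SET username = ? WHERE uid = ?", ("john_smith", john_uid)
).rowcount
posts_still_linked = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE uid = ?", (john_uid,)
).fetchone()[0]
```

With the surrogate key, the rename is a single-row update and all 500 posts stay linked without being rewritten.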

+79
Nov 05

If the username is the primary key and the user changes their username, you need to update every table that has a foreign key reference to the users table.
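
To make that cost concrete, here is a small Python sketch using the standard sqlite3 module in place of MySQL. The posts table and the sample names are assumptions for illustration, and ON UPDATE CASCADE is one possible way to propagate the change, not something the question's schema uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        username TEXT NOT NULL PRIMARY KEY
    );
    CREATE TABLE posts (
        pid      INTEGER PRIMARY KEY,
        username TEXT NOT NULL
                 REFERENCES users(username) ON UPDATE CASCADE,
        body     TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO users (username) VALUES ('bob')")
conn.executemany(
    "INSERT INTO posts (username, body) VALUES ('bob', ?)",
    [("first",), ("second",), ("third",)],
)

# Changing the natural key forces every referencing row to be rewritten.
conn.execute("UPDATE users SET username = 'robert' WHERE username = 'bob'")
old_refs = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE username = 'bob'"
).fetchone()[0]
new_refs = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE username = 'robert'"
).fetchone()[0]
```

The cascade keeps the data consistent, but the database still has to touch every child row in every referencing table.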

+15
Nov 05 '10 at 3:46

If you demonstrated to your professor that assigning a unique arbitrary integer to each user is important for your application, then, of course, he is mistaken in stating that it is "completely useless and unnecessary."

However, you may have missed his point. If he told you that the requirement is that "no two users can have the same username", then you have not fulfilled this requirement.

Sincere thanks for posting your SQL DDL; it is very useful, but most people on SO do not bother.

Using your table, I can do this:

    INSERT INTO users (username) VALUES (NULL);
    INSERT INTO users (username) VALUES (NULL);
    INSERT INTO users (username) VALUES (NULL);
    INSERT INTO users (username) VALUES (NULL);
    INSERT INTO users (username) VALUES (NULL);

The result is the following:

    SELECT uid, username, passhash, email, rdate FROM users;

    uid  username  passhash  email   rdate
    1    <NULL>    <NULL>    <NULL>  <NULL>
    2    <NULL>    <NULL>    <NULL>  <NULL>
    3    <NULL>    <NULL>    <NULL>  <NULL>
    4    <NULL>    <NULL>    <NULL>  <NULL>

I think this is the point your professor was trying to make: without enforcing the natural key on username, you really have no data integrity.

If I were the professor, I would also strongly recommend removing nullable columns from your design.
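
One way to keep the surrogate key and still enforce the natural key is sketched below in Python with the standard sqlite3 module standing in for MySQL. The NOT NULL and UNIQUE constraints are my assumption of what the fix would look like, and try_insert and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        uid      INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL UNIQUE,   -- the natural key is now enforced
        passhash TEXT NOT NULL,
        email    TEXT NOT NULL,
        rdate    TEXT NOT NULL
    )
""")


def try_insert(username):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (username, passhash, email, rdate) "
                "VALUES (?, ?, ?, ?)",
                (username, "hash", "user@example.com", "2010-11-05"),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A valid row, then a NULL username, then a duplicate username.
results = [try_insert("alice"), try_insert(None), try_insert("alice")]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

With these constraints in place, the NULL inserts shown above are rejected, and so is a second "alice".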

+10
Nov 05 '10 at

This is usually called a surrogate key, and it has many advantages. One of them is that it insulates your database relationships from changes in application data. More details, along with the associated disadvantages, can be found in the wiki article on surrogate keys.

+7
Nov 05

Because someone may want to change their username (or any other name).

+4
Nov 05 '10 at 3:46

Your professor is right to point out that you should have made the username unique and not null if usernames are required to be unique. The uid can also be a key, but if you are not actually using it, then it is not needed. The more important aspect of the design should be enforcing the natural key. So I agree with your professor's comment.

+4
Nov 05 '10 at 7:59

I would like someone with more database knowledge to back me up on this, but I believe foreign key lookups are faster with an integer key.

In addition, you may later decide that you want to allow username changes, or the requirements for usernames may change (perhaps a longer string?). Using an ID means you do not have to change all the foreign keys.

Let's face it, most projects are not going to grow that much, but do you really want to risk a headache 12 months down the road when you could follow good programming standards now?

+1
Nov 05 '10 at 3:48

For example, an integer lookup (?id=5) is much faster and has higher cardinality than a string lookup (?username=bob). Another example: uid is AUTO_INCREMENT, so you don't need to insert it explicitly; it is incremented automatically on every insert.

PS: Your prof is not so wrong :D

0
Nov 05 '10 at 3:48

We use IDs to prevent data duplication, and they make some operations simpler (for example, updating or deleting data).

If you do not want to use IDs, you can use other fields, but do not forget to make them UNIQUE. That protects your data from duplication.

another way beyond UNIQUE BASE.

0
Nov 05 '10 at 3:50

I agree with all the answers above. I would say that an ID is easy to implement, and when it comes to indexing, an int is always preferable to a varchar. Your professor should know better; why he would say no to having an int ID is beyond me.

0
Nov 05 '10 at 3:51

Because the user ID must be unique (it cannot be duplicated), and it sometimes serves as an index.

0
Nov 05 '10 at 3:57

And do you want to keep your usernames in clear text for someone to steal? I would never consider using a natural key that I might want to encrypt someday (or want to encrypt now).

0
Nov 05 '10 at 16:05


