Web application user table primary key: surrogate key vs username vs email vs customer Id

I am trying to build an e-commerce web application on MySQL and I am having trouble choosing the right primary key for the user table. The example given here is only a sample for illustration.


user table has the following definition

CREATE TABLE IF NOT EXISTS `mydb`.`user` (
    `id` INT NOT NULL,
    `username` VARCHAR(25) NOT NULL,
    `email` VARCHAR(25) NOT NULL,
    `external_customer_id` INT NOT NULL,
    `subscription_end_date` DATETIME NULL,
    `column_1` VARCHAR(45) NULL,
    `column_2` VARCHAR(45) NULL,
    `colum_3` VARCHAR(45) NULL,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `username_UNIQUE` (`username` ASC),
    UNIQUE INDEX `email_UNIQUE` (`email` ASC),
    UNIQUE INDEX `customer_id_UNIQUE` (`external_customer_id` ASC)
) ENGINE = InnoDB

I am having the following problems with primary key candidate columns:

Column Id

pros

  • Has no business meaning (stable primary key)
  • faster table joins
  • more compact index

cons

  • not a "natural" key
  • All attribute tables must be joined with the "main" user table, so direct queries without a join are not possible
  • leads to less "natural" SQL queries
  • Leaks information: i) a user can figure out the number of registered users if the start value is 0 (changing the start value fixes this); ii) a user registers a profile as user_A at time_X and, some time later, another as user_B at time_Y, and can then easily calculate the registration rate over that period: ((Id for user_B) - (Id for user_A)) / (time_Y - time_X)
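The calculation in the last point can be sketched in a few lines of Python. The function name, the sample ids and the dates are made up for illustration, and the estimate assumes ids are handed out sequentially without gaps.

```python
from datetime import datetime


def estimated_signups_per_day(id_a, time_a, id_b, time_b):
    """Estimate signups per day from two sequential ids seen at two times."""
    elapsed_days = (time_b - time_a).total_seconds() / 86400
    if elapsed_days <= 0:
        raise ValueError("time_b must be later than time_a")
    return (id_b - id_a) / elapsed_days


# Hypothetical observations: two profiles registered ten days apart.
rate = estimated_signups_per_day(
    1000, datetime(2024, 1, 1), 1500, datetime(2024, 1, 11)
)
print(rate)  # 50.0
```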

email column

pros

  • None

cons

  • the user must be able to change the email address. Not suitable for a primary key.

username column

pros

  • "natural" primary key
  • Fewer table joins
  • simpler and more "natural" queries

cons

  • A varchar column is slower when joining tables
  • an index on a varchar column is less compact than an int column index
  • It is very difficult to change the username, because the foreign keys depend on its value. Solution: "sync" all foreign keys in the application, or do not allow the user to change the username, e.g. the user must delete the profile and register a new one

external_customer column

pros

  • can be used as an external reference for the customer and carries no information (maybe a non-editable username could be used instead?)

cons

  • may leak information if it is auto-incremented (if that is possible)

  • It is problematic to generate a unique value if an auto-incremented surrogate id is already in use, since the MySQL InnoDB engine does not allow multiple auto_increment columns in a single table.

What is common practice when choosing the user table primary key for a scalable e-commerce web application? All feedback is appreciated.

+6
3 answers

I have nothing to say about some of your analysis. If I have cut some of your pros or cons, it only means that I don't think I have anything useful to add.

Column Id

pros

  • Has no business meaning (stable primary key)
  • faster table joins
  • more compact index

First, any column or set of columns declared NOT NULL UNIQUE has all the properties of a primary key. You can use any of them as the target of a foreign key reference, which is what this is all about.

In your case, your structure allows four columns to be targets of foreign key references: id, username, email and external_customer_id. You don't have to use the same one all the time. It might make sense to use id for 90% of your FK references and email for 10% of them.
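A minimal sketch of this, using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made only to keep the example runnable). The child tables user_order and newsletter_subscription are hypothetical; one references id, the other references email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL UNIQUE
);
-- Hypothetical child table that references the surrogate key.
CREATE TABLE user_order (
    order_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user (id)
);
-- Hypothetical child table that references a NOT NULL UNIQUE column instead.
CREATE TABLE newsletter_subscription (
    email TEXT NOT NULL REFERENCES user (email),
    topic TEXT NOT NULL,
    PRIMARY KEY (email, topic)
);
""")

conn.execute(
    "INSERT INTO user (id, username, email) VALUES (1, 'alice', 'alice@example.com')"
)
conn.execute("INSERT INTO user_order (order_id, user_id) VALUES (10, 1)")
conn.execute(
    "INSERT INTO newsletter_subscription (email, topic) "
    "VALUES ('alice@example.com', 'deals')"
)

# A row pointing at an email that is not in user is rejected by the FK.
try:
    conn.execute(
        "INSERT INTO newsletter_subscription (email, topic) "
        "VALUES ('nobody@example.com', 'deals')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```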

Stability has nothing to do with whether a column has business meaning. Stability is about how often and under what circumstances the value can change. Stable does not mean immutable, unless you are using Oracle. (Oracle cannot do ON UPDATE CASCADE.)

Depending on your table structure and indexing, a natural key may perform faster. Natural keys make some joins unnecessary. I ran tests before building our production database. It will probably be decades before we reach the point where joins on ID numbers outperform fewer joins on natural keys. I wrote about those tests on either SO or DBA.

You have three other unique indexes. (Good for you. I think at least 90% of the people who build a database don't get that right.) So not only is the index on the ID number more compact than any of the other three, it is also an additional index. (In this table.)

email column

pros

  • None

An email address can be considered stable and unique. You cannot stop people from sharing email addresses, regardless of whether the column is the target of a foreign key reference.

But email addresses can be "lost". In the USA, most university students lose their *.edu email addresses after a year or so. If your email address comes through a domain you pay for, and you stop paying, the email address goes away. I imagine it is possible for email addresses like those to be given to new users. Whether that creates an unbearable burden depends on the application.

cons

  • the user must be able to change the email address. Not suitable for a primary key.

All values in a SQL database can be changed. This is only unsuitable if your environment does not let your dbms honor an ON UPDATE CASCADE declaration in a timely manner. My environment does. (But I'm running PostgreSQL on decent, unshared hardware.) YMMV.
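As a rough illustration of ON UPDATE CASCADE, here is a sketch that uses Python's sqlite3 module with an in-memory SQLite database rather than PostgreSQL or MySQL (an assumption made only so the example runs anywhere). The email_preference table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
-- Hypothetical child table keyed on the natural key (email).
CREATE TABLE email_preference (
    email TEXT NOT NULL REFERENCES user (email) ON UPDATE CASCADE,
    wants_newsletter INTEGER NOT NULL
);
INSERT INTO user (id, email) VALUES (1, 'old@example.com');
INSERT INTO email_preference (email, wants_newsletter)
VALUES ('old@example.com', 1);
""")

# Changing the parent value propagates to every referencing row.
conn.execute("UPDATE user SET email = 'new@example.com' WHERE id = 1")
child_emails = [row[0] for row in conn.execute("SELECT email FROM email_preference")]
print(child_emails)  # ['new@example.com']
```

How quickly such a cascade completes on a large production table depends on the dbms, the indexing and the hardware, which is the caveat made above.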

username column

pros

  • "natural" primary key
  • Fewer table joins
  • simpler and more "natural" queries

Fewer joins is important. I have been on consulting gigs where I saw mindless use of ID numbers force people to write queries with 40+ joins. Judicious use of natural keys eliminated up to 75% of them.
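A small sketch of how a natural key removes a join, again using Python's sqlite3 module with an in-memory SQLite database as a stand-in for the real dbms. The two order tables are hypothetical and exist only to compare the two designs side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
-- Design A: child rows carry the surrogate id.
CREATE TABLE order_by_id (
    order_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user (id)
);
-- Design B: child rows carry the natural key.
CREATE TABLE order_by_username (
    order_id INTEGER PRIMARY KEY,
    username TEXT NOT NULL REFERENCES user (username)
);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO order_by_id VALUES (100, 1), (101, 2), (102, 1);
INSERT INTO order_by_username VALUES (100, 'alice'), (101, 'bob'), (102, 'alice');
""")

# Design A needs a join to show who placed each order.
with_join = conn.execute("""
    SELECT o.order_id, u.username
    FROM order_by_id AS o
    JOIN user AS u ON u.id = o.user_id
    ORDER BY o.order_id
""").fetchall()

# Design B already has the username in the child table.
without_join = conn.execute("""
    SELECT order_id, username FROM order_by_username ORDER BY order_id
""").fetchall()

print(with_join == without_join)  # True
```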

It is not important to always use surrogate keys as the target for your foreign keys (unless you are on Oracle), or to always use natural keys as the target. It is important to think.

cons

  • A varchar column is slower when joining tables
  • an index on a varchar column is less compact than an int column index

You cannot really say that joining on varchar() is slower without qualifying that claim. The fact is that although most joins on varchar() are slower than joins on ID numbers, they are not necessarily so slow that you cannot use them. If a query takes 4 ms with ID numbers and 6 ms with varchar(), I don't think that is a good reason to disqualify varchar(). In addition, using a natural key will eliminate a lot of joins, so the overall system response may be faster. (All other things being equal, 40 joins at 4 ms each will be worse than 10 joins at 6 ms each.)

I can't recall a case in my database career (25+ years) where the width of an index was the deciding factor in choosing the target for a foreign key.

external_customer column

pros

  • can be used as an external reference for the customer and carries no information (maybe a non-editable username could be used instead?)

There are actually few systems that let me change my username. Most will let me change my real name (I think), but not my username. I think a non-editable username is perfectly reasonable.

+10

In general, web applications try to hide their database schema from the client, including primary keys. I think you are conflating schema design with authentication methods - nothing stops you from letting users log in with their email address, even if your database design uses an integer to uniquely identify them.

Whenever I have designed systems like this, I have used an ID column - either an integer or a GUID - for the primary key. It is fast, does not change because of pesky real-life situations, and is a familiar idiom for developers.

I then work out the best authentication scheme for the application at hand - most people expect to log in with their email address these days, so I would stick with that. Of course, you could also let them log in with their Facebook, Twitter or Google accounts. That has nothing to do with my primary key, though...
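A hedged sketch of that separation, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL. Combining an integer primary key with a separate GUID column for external use is one possible reading of the above, not something prescribed here; the public_id column and the helper functions register and find_user_id_by_email are made up for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user (
        id INTEGER PRIMARY KEY,          -- internal surrogate key, kept server-side
        public_id TEXT NOT NULL UNIQUE,  -- random GUID, safe to expose (assumption)
        email TEXT NOT NULL UNIQUE       -- login identifier, free to change
    )
""")


def register(email):
    """Create a user and return the GUID that can be shown to the client."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO user (public_id, email) VALUES (?, ?)", (public_id, email)
    )
    return public_id


def find_user_id_by_email(email):
    """Resolve a login email to the internal id, or None if unknown."""
    row = conn.execute("SELECT id FROM user WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None


public_id = register("alice@example.com")
internal_id = find_user_id_by_email("alice@example.com")

# The email can change without touching the key other tables would reference.
conn.execute(
    "UPDATE user SET email = ? WHERE id = ?", ("alice@new.example", internal_id)
)
print(find_user_id_by_email("alice@new.example") == internal_id)  # True
```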

+4

I think the username column has this con as well:

  • The user should be able to change the username. Not suitable for a primary key.

So, for the same reason that you would not use the email, I would not use the username. For me, the best approach is an internal user id.

0

Source: https://habr.com/ru/post/912184/

