SQL table linking: is it better to have a link table or a delimited column?

My database has two tables: one contains a list of users and the other contains a list of roles. Each user will belong to one or several roles, and, of course, each role will have several users in it.

I see two ways to link this information. The first is to add a third table containing the identifiers from both tables. A simple join will then return all users belonging to a role, or all roles to which a user belongs. However, as the database grows, the datasets returned by these simple queries will grow exponentially.

The second method is to add a column to the users table that stores a delimited list of roles. This eliminates the need for a third link table, which could have a positive effect on database growth. The disadvantage is that SQL has no built-in way to work with delimited lists. The only way I have found to process this information is to use a temporary table and a custom function.

Looking at the execution plans, the "table scan" operation is the one that takes up the most resources. It makes sense that removing a table from the equation would speed things up. The function takes less than 1% of the resources.

These tests were conducted on a database with fewer than 20 rows. As the database grows, table scans will take longer, so limiting them is probably the best choice.

Using a delimited list seems like a good approach, so why doesn't anyone do this?

Please tell me which method you prefer (even if it differs from my two) and why.

Thanks.

+4
6 answers

If you have a delimited list, searching for users with a given role becomes very expensive: in effect, you have to perform a FULL scan of the table and inspect the value of this column in every row to see whether it contains the given role.
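To make that cost concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The Users table, the RoleIds column, and the sample rows are made up for illustration. Matching a role inside a delimited string needs a LIKE with a leading wildcard, which an ordinary index cannot help with, so the query plan reports a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: role ids stored as a comma-delimited string.
conn.execute(
    "CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Name TEXT, RoleIds TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(1, "alice", "1,2"), (2, "bob", "12"), (3, "carol", "2,12")],
)

# Wrap the list in delimiters so that role 2 does not match role 12.
QUERY = "SELECT Name FROM Users WHERE ',' || RoleIds || ',' LIKE ? ORDER BY Name"


def users_in_role(role_id):
    """Return the names of users whose delimited list contains role_id."""
    pattern = f"%,{role_id},%"
    return [name for (name,) in conn.execute(QUERY, (pattern,))]


def plan_for_role(role_id):
    """Return SQLite's query plan for the lookup as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (f"%,{role_id},%",))
    return " | ".join(row[-1] for row in rows)


print(users_in_role(2))
print(plan_for_role(2))
```

The exact wording of the plan differs between SQLite versions, but it should describe a scan of Users rather than an index search.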

A separate table (a normalized, many-to-many relationship) is the way to go, and with proper indexing you will not get a full scan.

For example:

User: UserId, Name, ....
Role: RoleId, Name, ....
UserRole: UserRoleId, UserId, RoleId

(UserRoleId is optional; you could also make UserId + RoleId the primary key. I will not get into surrogate versus composite keys here.)

You will need an index on (UserId, RoleId), declared UNIQUE, to ensure there are no duplicates. It will also help any query that checks whether a particular user has a specific role (WHERE UserId = x AND RoleId = y).
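A minimal sketch of this layout, again in Python with sqlite3. The sample rows, the index name UX_UserRole, and the helper functions are assumptions for illustration, and the optional UserRoleId column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (UserId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Role (RoleId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE UserRole (
    UserId INTEGER NOT NULL REFERENCES User(UserId),
    RoleId INTEGER NOT NULL REFERENCES Role(RoleId)
);
-- The unique index rejects duplicate (UserId, RoleId) pairs.
CREATE UNIQUE INDEX UX_UserRole ON UserRole (UserId, RoleId);
INSERT INTO User VALUES (1, 'alice'), (2, 'bob');
INSERT INTO Role VALUES (10, 'admin'), (20, 'editor');
INSERT INTO UserRole VALUES (1, 10), (1, 20), (2, 20);
""")


def roles_of(user_id):
    """Names of all roles assigned to one user."""
    sql = (
        "SELECT r.Name FROM UserRole ur "
        "JOIN Role r ON r.RoleId = ur.RoleId "
        "WHERE ur.UserId = ? ORDER BY r.Name"
    )
    return [name for (name,) in conn.execute(sql, (user_id,))]


def has_role(user_id, role_id):
    """True if the (user, role) pair exists in the link table."""
    sql = "SELECT 1 FROM UserRole WHERE UserId = ? AND RoleId = ?"
    return conn.execute(sql, (user_id, role_id)).fetchone() is not None


def add_role(user_id, role_id):
    """Insert a pair; return False if a constraint such as the unique index rejects it."""
    try:
        conn.execute("INSERT INTO UserRole VALUES (?, ?)", (user_id, role_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(roles_of(1), has_role(2, 10), add_role(1, 10))
```

Here add_role(1, 10) returns False because alice already has that role, which is the duplicate protection the UNIQUE index provides.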

If you are looking up all the roles that a user has, you only need an index whose leading column is UserId; the (UserId, RoleId) index already qualifies.

Conversely, if you are looking up all users in a given role, an index on just RoleId will speed that up. If you never run this query, or run it only rarely, then leaving this index out will make inserts and updates slightly faster, since there is one less index to maintain. Database tuning is a careful balancing act.
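One way to check whether such an index is actually used is to compare query plans before and after creating it. The sketch below does this with SQLite from Python; the index names and the generated rows are hypothetical, and other database engines expose their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserRole (UserId INTEGER NOT NULL, RoleId INTEGER NOT NULL);
CREATE UNIQUE INDEX UX_UserRole ON UserRole (UserId, RoleId);
""")
# Made-up data: 100 users, each assigned one of roles 10, 11, 12.
conn.executemany(
    "INSERT INTO UserRole VALUES (?, ?)",
    [(user_id, 10 + user_id % 3) for user_id in range(1, 101)],
)

QUERY = "SELECT UserId FROM UserRole WHERE RoleId = ?"


def plan():
    """Return SQLite's query plan for the role lookup as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (10,))
    return " | ".join(row[-1] for row in rows)


before = plan()  # no index leads with RoleId yet
conn.execute("CREATE INDEX IX_UserRole_RoleId ON UserRole (RoleId)")
after = plan()   # the planner can now search by RoleId

print("before:", before)
print("after: ", after)
```

The trade-off mentioned above still applies: every extra index has to be updated on each insert, update, and delete against UserRole.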

+10
  • A table scan means that you have no indexes, or that your query does not allow them to be used. In a security database, you rarely need to load the entire list of users or roles, unless it is for an administration application. You need to address this in your design.

  • Delimited lists violate first normal form (1NF) and almost always cause problems in the long run. What happens when you want to get all users in a specific role? How do you write that query? Do not go down this road. Normalize it.

  • If you use the correct column types (i.e. not varchar(4000) or varchar(max) everywhere), disk space really should not be a problem. Yes, it will grow "exponentially" - so what? Databases are good at this kind of scaling. Unless you are trying to run this on a 10 gigabyte hard drive, you have nothing to worry about. And if you are trying to run it on a 10 gigabyte hard drive, you probably have bigger problems to worry about.

Short answer: do not use a delimited list. Normalize.

+8

The first option. It is called a many-to-many join table. It will work fine if you create the appropriate indexes.

Do not go with the second, "denormalised" option.

+6

You could use a separate table, or you could go back to cavemen with chisels. The choice is yours.

+4

A separate table is the way to go; otherwise you are working against your database engine. A separate table is properly normalized, and in general, as the application grows, the better normalized it is, the easier you will find it to work with. What Greg said above is also absolutely right.

+2

I would still highly recommend the normalized approach that everyone else is suggesting. That said, I believe an enum-based role system would let you store a single number in the "roles" column and avoid the need to create another table.
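A rough sketch of what this could look like, assuming "enum-based" means bit flags packed into one integer (the answer does not spell this out). The Role enum, the Users table, and the sample rows are hypothetical.

```python
import sqlite3
from enum import IntFlag


class Role(IntFlag):
    # Assumption: each role occupies its own bit.
    ADMIN = 1
    EDITOR = 2
    VIEWER = 4


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Name TEXT, Roles INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "alice", int(Role.ADMIN | Role.EDITOR)),
        (2, "bob", int(Role.VIEWER)),
        (3, "carol", int(Role.EDITOR | Role.VIEWER)),
    ],
)


def users_with(role):
    """Names of users whose bitmask has the given role bit set."""
    sql = "SELECT Name FROM Users WHERE (Roles & ?) != 0 ORDER BY Name"
    return [name for (name,) in conn.execute(sql, (int(role),))]


def roles_of(user_id):
    """Decode one user's stored integer back into Role flags."""
    row = conn.execute(
        "SELECT Roles FROM Users WHERE UserId = ?", (user_id,)
    ).fetchone()
    return Role(row[0])


print(users_with(Role.EDITOR))
print(roles_of(1))
```

Note that a predicate like (Roles & ?) != 0 generally cannot use an ordinary index, so finding all users in a role still reads every row, and adding a role means changing the enum in code rather than inserting a row.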

0

Source: https://habr.com/ru/post/1299009/

