Database Design - Relationships vs. Properties

I have a database design problem (SQL / MySQL). Suppose we have a user who can have many friends and many posts, and who fills in some information about himself.

For friends we obviously need a pivot table for the n:n relationship, and for posts we need one additional table with a user_id column (a 1:n relationship).

So we need the users, user_friends and posts tables. That much is obvious, and that is how relationships should be handled.
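A minimal sketch of these three tables, using Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere. The column names friend_id, post_id and body and the sample users are assumptions for illustration; friendships are stored in one direction only here.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table layout is what matters.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- n:n: each row links two users; the composite key prevents duplicate links
CREATE TABLE user_friends (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    friend_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (user_id, friend_id)
);
-- 1:n: each post belongs to exactly one user
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    body    TEXT NOT NULL
);
""")

# Hypothetical sample data
conn.executemany("INSERT INTO users (user_id, name) VALUES (?, ?)",
                 [(1, "Marcin"), (2, "Anna"), (3, "Piotr")])
conn.executemany("INSERT INTO user_friends VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO posts (user_id, body) VALUES (?, ?)",
                 [(1, "Hello"), (1, "Second post"), (2, "Hi")])

friends = [row[0] for row in conn.execute(
    "SELECT u.name FROM user_friends f JOIN users u ON u.user_id = f.friend_id "
    "WHERE f.user_id = ? ORDER BY u.name", (1,))]
post_count = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE user_id = ?", (1,)).fetchone()[0]
print(friends, post_count)  # ['Anna', 'Piotr'] 2
```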

But now suppose we want users to have the following data:

  • name - text
  • description - text
  • marital status - select exactly one from a list
  • favourite colour - select exactly one from a list
  • hobby - select up to 3 from a list

For the text fields (name, description) it is obvious that we just create varchar / text columns in the users table, and that's it.

General question: how should the other fields (selected from lists) be handled? Should I create relationships for them, or should I just store them as ordinary data columns?

In my opinion, it makes no sense to create relationship tables for them, because the select lists only restrict what the user can actually insert into the database. In theory, we could let the user type their favourite colour manually (for example, red), and if something were entered incorrectly (for example, reds), we would compare it with the list of allowed colours. The same goes for gender: in my opinion, it makes no sense to create an additional table that holds only "female" and "male" and to create relationships to it.
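Checking manual input against the allowed list comes down to normalising the input and comparing it with the list. A small Python sketch; ALLOWED_COLOURS and normalise_colour are hypothetical names, and it deliberately rejects a typo such as reds rather than guessing the intended value.

```python
from typing import Optional

# Hypothetical list of allowed colours, kept in lower case.
ALLOWED_COLOURS = {"red", "green", "blue", "orange"}

def normalise_colour(raw: str) -> Optional[str]:
    """Return the canonical colour name, or None if the input is not on the list."""
    value = raw.strip().lower()  # tolerate stray spaces and capitalisation
    return value if value in ALLOWED_COLOURS else None

print(normalise_colour("  Red "))  # red
print(normalise_colour("reds"))    # None
```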

The first database design:

I could, for example, create the following columns for properties:

  • marital_status - int
  • fav_colour - int
  • hobby_1 - int
  • hobby_2 - int
  • hobby_3 - int

Plus another table (or even a simple array in PHP or another language) where I store what the codes mean: for example, value 1 of fav_colour is red, value 2 of hobby is music, and so on (how exactly I store these values doesn't matter here; I could also use the ENUM type for this).

As I see it, the advantages of this approach are: I don't create many relationships for what are really properties rather than relationships (as I mentioned above), so there is less work; getting a user's information is easier because no joins are needed, which would matter if a user had, say, 20 or 100 such properties; and I can easily search the users table. The disadvantages are also fairly obvious: the data is not normalized, every multiple-choice field (for example, hobby) needs 3 columns, and if I later decide that a user can choose not one colour but up to 3, I will need to add 2 extra columns.
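A rough sketch of this first design, again with Python's sqlite3 standing in for MySQL. The code-to-label mappings, sample users and the helpers code_for, find_users and hobbies_of are hypothetical. It shows both sides of the trade-off: no joins are needed, but a hobby search has to check all three hobby columns.

```python
import sqlite3
from typing import Dict, List

# Code-to-label mappings kept in application code (the "simple array" option);
# all codes and labels here are hypothetical.
COLOURS = {1: "red", 2: "green", 3: "blue"}
HOBBIES = {1: "music", 2: "sport", 3: "reading"}

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    user_id        INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    marital_status INTEGER,
    fav_colour     INTEGER,
    hobby_1        INTEGER,
    hobby_2        INTEGER,
    hobby_3        INTEGER
)""")
conn.executemany(
    "INSERT INTO users (name, marital_status, fav_colour, hobby_1, hobby_2, hobby_3) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [("Ala", 1, 1, 1, 3, None),
     ("Ola", 2, 1, 2, None, None),
     ("Ela", 1, 3, 1, None, None)])

def code_for(mapping: Dict[int, str], label: str) -> int:
    # Reverse lookup: label -> integer code
    return next(code for code, name in mapping.items() if name == label)

def find_users(colour: str, hobby: str) -> List[str]:
    # A hobby may sit in any of the three columns, so every one must be checked.
    rows = conn.execute(
        "SELECT name FROM users WHERE fav_colour = ? "
        "AND ? IN (hobby_1, hobby_2, hobby_3) ORDER BY name",
        (code_for(COLOURS, colour), code_for(HOBBIES, hobby)))
    return [name for (name,) in rows]

def hobbies_of(name: str) -> List[str]:
    # Decode the integer codes back to labels, skipping empty slots.
    row = conn.execute(
        "SELECT hobby_1, hobby_2, hobby_3 FROM users WHERE name = ?", (name,)).fetchone()
    return [HOBBIES[code] for code in row if code is not None]

print(find_users("red", "music"))  # ['Ala']
print(hobbies_of("Ala"))           # ['music', 'reading']
```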

Alternative database design:

I create additional tables colours, hobbies and marital_statuses, plus 3 pivot tables: user_colours, user_hobbies and user_marital_statuses. Advantages: with these pivot tables I could easily allow the user to select, say, up to 10 colours without redesigning the database at all. Disadvantages: many joins, more complex searches and a lot more work.
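For comparison, a sketch of this alternative design in SQLite (marital status is left out for brevity; table contents are hypothetical). Searching on two properties needs one join chain per property, which is where the extra work comes from; in exchange, a user can have any number of colours or hobbies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE colours (colour_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE hobbies (hobby_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- pivot tables: one row per (user, value) pair, so the count per user is open-ended
CREATE TABLE user_colours (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    colour_id INTEGER NOT NULL REFERENCES colours(colour_id),
    PRIMARY KEY (user_id, colour_id)
);
CREATE TABLE user_hobbies (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    hobby_id INTEGER NOT NULL REFERENCES hobbies(hobby_id),
    PRIMARY KEY (user_id, hobby_id)
);
INSERT INTO users   VALUES (1, 'Ala'), (2, 'Ola'), (3, 'Ela');
INSERT INTO colours VALUES (1, 'red'), (2, 'green'), (3, 'blue');
INSERT INTO hobbies VALUES (1, 'music'), (2, 'sport'), (3, 'reading');
INSERT INTO user_colours VALUES (1, 1), (2, 1), (2, 3), (3, 3);
INSERT INTO user_hobbies VALUES (1, 1), (1, 3), (2, 2), (3, 1);
""")

# "Red as a favourite colour AND music as a hobby": one join chain per property.
query = """
SELECT u.name
FROM users u
JOIN user_colours uc ON uc.user_id = u.user_id
JOIN colours c       ON c.colour_id = uc.colour_id
JOIN user_hobbies uh ON uh.user_id = u.user_id
JOIN hobbies h       ON h.hobby_id = uh.hobby_id
WHERE c.name = ? AND h.name = ?
ORDER BY u.name
"""
matches = [name for (name,) in conn.execute(query, ("red", "music"))]
print(matches)  # ['Ala']
```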

Detailed question

So, to summarize: which solution would be better, given these assumptions:

  • I would probably not change the maximum number of values for a property (if I decide to allow a maximum of 3 hobbies, this will probably never change)
  • The selection lists for most fields will be relatively short (fewer than 10 options for most of them)
  • I need to search such a database a lot. For example, someone wants to find users whose fav_colour is red and who have a particular hobby.

If you see any other solutions, or other advantages / disadvantages, I would appreciate it if you shared them.

4 answers

It looks like you want to restrict certain properties of your users to a fixed set of values. For example, favourite colour must be one of red, green, blue, pink, orange, etc.; marital status must be one of single, divorced, married.

You described one way to do this: lookup tables. That is the best option if the possible values are dynamic and need ongoing maintenance, or if there are many of them. From your description, that is not your situation: your possible values will be fairly static and few.

I recommend using an SQL CHECK constraint. With it, you can restrict a column to a set of allowed values. For instance:

    CREATE TABLE users (
        Name varchar(255) NOT NULL,
        Description varchar(255),
        Marital_Status varchar(10) NOT NULL,
        Color varchar(10) NOT NULL,
        CONSTRAINT chk_Color CHECK (Color in ('Red', 'Blue', 'Green', 'Orange')),
        CONSTRAINT chk_Marriage CHECK (Marital_Status in ('Single', 'Married', 'Divorced'))
    )

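I did not check the syntax of this DDL statement, so it may contain punctuation errors, and the syntax may differ for your specific DBMS. I think this should work for MySQL (note that MySQL only enforces CHECK constraints from version 8.0.16; earlier versions parse them but silently ignore them).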
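The CHECK approach can be tried out with Python's sqlite3 module, since SQLite enforces CHECK constraints and the driver raises sqlite3.IntegrityError when one fails. This is a sketch against SQLite, not MySQL; the try_insert helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    Name           VARCHAR(255) NOT NULL,
    Description    VARCHAR(255),
    Marital_Status VARCHAR(10)  NOT NULL,
    Color          VARCHAR(10)  NOT NULL,
    CONSTRAINT chk_Color    CHECK (Color IN ('Red', 'Blue', 'Green', 'Orange')),
    CONSTRAINT chk_Marriage CHECK (Marital_Status IN ('Single', 'Married', 'Divorced'))
)""")

def try_insert(name: str, status: str, color: str) -> bool:
    """Insert a row; return False if a CHECK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO users (Name, Marital_Status, Color) VALUES (?, ?, ?)",
            (name, status, color))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Jack", "Single", "Red"))     # True
print(try_insert("Jill", "Single", "Purple"))  # False (violates chk_Color)
```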


If users often change their favourite colours / hobbies, I would use lookup tables plus a linking table, which in my example I will call a decode table. All relationships between users and hobbies and between users and colours will be stored in this decode table.

Since a user can only have one marital status, this is a simple one-to-many relationship.

Create a Marital_Status table with two fields, Id (PK) and Status (varchar(n)). No decode table is needed to look up marital status.

Next, I would create a table for colours and a table for hobbies, just as we did for marital status.

    Hobbies: HobbyId, Hobby
    Colors:  ColorId, Color

Whenever you need to add or remove a hobby/colour, do it in these lookup tables.

It's up to you whether you use one decode table for all relationships or several, i.e. Hobby_Decode, Color_Decode, etc.

I will explain the first scenario, a single decode table.

Create your decode table with the following fields:

decode

Item_Type varchar(n) - we will store 'Hobby' or 'Color' in this field

UserId int - self-explanatory; contains the user's identifier for lookups.

LookUpId - contains the identifier of either a hobby or a colour

Let me create some sample data to work with.

Hobbies table data

  | HobbyId | Hobby       |
  |---------|-------------|
  | 1       | Studying    |
  | 2       | Doing Drugs |
  | 3       | Drinking    |

Colors table data

  | ColorId | Color |
  |---------|-------|
  | 1       | Red   |
  | 2       | Blue  |

While we are on it, here is our user table.

Users

  | UserId | Name     |
  |--------|----------|
  | 1      | Marcin   |
  | 2      | CSharper |

I like drinking, doing drugs and the colour red. You are a nerd, so you like studying and the colour blue. In our decode table, we add the following entries to represent this.

decode

  | Item_Type | UserId | LookUpId |
  |-----------|--------|----------|
  | 'Hobby'   | 2      | 2        |
  | 'Hobby'   | 2      | 3        |
  | 'Color'   | 2      | 1        |
  | 'Hobby'   | 1      | 1        |
  | 'Color'   | 1      | 2        |

The decode table on its own tells us nothing. Once we join it to the colours/hobbies tables, the meaning becomes obvious.

If you want to see all my hobbies and my favourite colours, the query looks like this:

Note: this is SQL Server syntax, not MySQL.

    -- Pull hobbies
    SELECT u.Name, dH.Item_Type AS Favorite, h.Hobby AS Item
    FROM Users u
    INNER JOIN decode dH ON dH.UserId = u.UserId AND dH.Item_Type = 'Hobby'
    INNER JOIN Hobbies h ON h.HobbyId = dH.LookUpId
    WHERE u.UserId = 2
    -- Union in colours
    UNION
    SELECT u.Name, dC.Item_Type AS Favorite, c.Color AS Item
    FROM Users u
    INNER JOIN decode dC ON dC.UserId = u.UserId AND dC.Item_Type = 'Color'
    INNER JOIN Colors c ON c.ColorId = dC.LookUpId
    WHERE u.UserId = 2

The result will look like this:

  | Name     | Favorite | Item        |
  |----------|----------|-------------|
  | CSharper | Hobby    | Drinking    |
  | CSharper | Hobby    | Doing Drugs |
  | CSharper | Color    | Red         |

Set up this way, it is very easy to change or update people's favourite hobbies and colours. The decode table handles all of it: each change is just a simple insert into or delete from that table. A user can also have any number of favourite hobbies and colours, because the decode table controls this, not the definition of the Users table.

Adapting your sample query slightly: if we want to find all users who like the colour blue and drinking, the query looks like this:

    SELECT u.Name
    FROM Users u
    INNER JOIN decode dH ON dH.UserId = u.UserId AND dH.Item_Type = 'Hobby'
    INNER JOIN Hobbies h ON h.HobbyId = dH.LookUpId
    INNER JOIN decode dC ON dC.UserId = u.UserId AND dC.Item_Type = 'Color'
    INNER JOIN Colors c ON c.ColorId = dC.LookUpId
    WHERE h.Hobby = 'Drinking' AND c.Color = 'Blue'

Joins like these are perfectly acceptable in practice.
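A runnable sketch of the decode-table design in SQLite, using the sample data from this answer (SQLite in place of SQL Server; the search helper is hypothetical). Because one decode row is either a hobby or a colour, never both, a search on both properties needs the decode table joined twice. The last step shows that giving a user another hobby is only one new decode row.

```python
import sqlite3
from typing import List

# In-memory SQLite stands in for SQL Server; names follow the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users   (UserId  INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Hobbies (HobbyId INTEGER PRIMARY KEY, Hobby TEXT NOT NULL);
CREATE TABLE Colors  (ColorId INTEGER PRIMARY KEY, Color TEXT NOT NULL);
CREATE TABLE decode  (Item_Type TEXT NOT NULL, UserId INTEGER NOT NULL,
                      LookUpId INTEGER NOT NULL);
INSERT INTO Users   VALUES (1, 'Marcin'), (2, 'CSharper');
INSERT INTO Hobbies VALUES (1, 'Studying'), (2, 'Doing Drugs'), (3, 'Drinking');
INSERT INTO Colors  VALUES (1, 'Red'), (2, 'Blue');
INSERT INTO decode  VALUES ('Hobby', 2, 2), ('Hobby', 2, 3), ('Color', 2, 1),
                           ('Hobby', 1, 1), ('Color', 1, 2);
""")

# decode is joined once per property being searched.
SEARCH = """
SELECT u.Name
FROM Users u
JOIN decode dH ON dH.UserId = u.UserId AND dH.Item_Type = 'Hobby'
JOIN Hobbies h ON h.HobbyId = dH.LookUpId
JOIN decode dC ON dC.UserId = u.UserId AND dC.Item_Type = 'Color'
JOIN Colors c  ON c.ColorId = dC.LookUpId
WHERE h.Hobby = ? AND c.Color = ?
"""

def search(hobby: str, color: str) -> List[str]:
    return [name for (name,) in conn.execute(SEARCH, (hobby, color))]

print(search("Drinking", "Red"))     # ['CSharper']
before = search("Drinking", "Blue")  # Marcin likes blue but not drinking yet

# Another hobby for Marcin is one more decode row; Users is untouched.
conn.execute("INSERT INTO decode VALUES ('Hobby', 1, 3)")
after = search("Drinking", "Blue")
print(before, after)                 # [] ['Marcin']
```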


You want to avoid extra tables and joins unless you really need them. This is exactly what ENUM is for: enumerations are stored internally as integers, but in use they look like strings restricted to a set of values.

    create table users (
        user_id bigint unsigned not null auto_increment primary key,
        name varchar(255) not null,
        description varchar(255),
        marital_status enum('single', 'married'),
        favorite_color enum('red', 'green', 'blue'),
        hobby1 enum('painter', 'doctor', 'lawyer'),
        hobby2 enum('painter', 'doctor', 'lawyer'),
        hobby3 enum('painter', 'doctor', 'lawyer')
    );

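Insert a value:

    insert into users (name, marital_status) values ('Jack', 'single');

This statement will fail (in strict SQL mode, the default since MySQL 5.7; in non-strict mode MySQL stores an empty string and issues a warning instead):

    insert into users (name, marital_status) values ('Jack', 'abcd');

Changing the list is easy:

    alter table users modify marital_status enum('single', 'married', 'divorced');

Appending the new value at the end of the list, as here, is usually a cheap in-place change; inserting a value in the middle or reordering the list changes the stored indexes and forces a table rebuild.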
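To illustrate the integer storage behind ENUM, here is a simplified Python model (an illustration only, not MySQL's real implementation; EnumColumn is a hypothetical class). MySQL stores each ENUM value as its 1-based position in the value list, so appending a value leaves existing indexes meaning the same thing, while prepending shifts them.

```python
from typing import List

class EnumColumn:
    """Simplified model of ENUM storage: values are kept as 1-based indexes
    into the permitted list (index 0 stands for the empty-string error value)."""

    def __init__(self, values: List[str]):
        self.values = values

    def encode(self, value: str) -> int:
        return self.values.index(value) + 1  # raises ValueError for unknown values

    def decode(self, index: int) -> str:
        return self.values[index - 1] if index > 0 else ""

old = EnumColumn(["single", "married"])
stored = old.encode("married")  # the integer actually kept in the row

appended = EnumColumn(["single", "married", "divorced"])
prepended = EnumColumn(["divorced", "single", "married"])
print(appended.decode(stored))   # married: existing rows keep their meaning
print(prepended.decode(stored))  # single: the old index now means something else,
                                 # so the rows have to be re-encoded
```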


Whatever you choose, don't get too hung up on normalization.

Personally, I would use 5 tables: users, marital_status, colours, hobbies and user_hobbies.

    CREATE TABLE users (
        user_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        description VARCHAR(255),
        marital_status INT,
        fav_colour INT
    );
    CREATE TABLE marital_status (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    );
    CREATE TABLE colours (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        code VARCHAR(7)
    );
    CREATE TABLE hobbies (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    );
    CREATE TABLE user_hobbies (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id BIGINT,
        hobby_id INT
    );
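A sketch of this five-table layout in SQLite, with a search for the asker's example (a favourite colour plus a given hobby). The sample rows and the find helper are hypothetical. Single-choice properties need at most one join, and only the multi-choice hobby goes through user_hobbies.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marital_status (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE colours (id INTEGER PRIMARY KEY, name TEXT NOT NULL, code TEXT);
CREATE TABLE hobbies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (
    user_id        INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    description    TEXT,
    marital_status INTEGER REFERENCES marital_status(id),
    fav_colour     INTEGER REFERENCES colours(id)
);
CREATE TABLE user_hobbies (id INTEGER PRIMARY KEY, user_id INTEGER, hobby_id INTEGER);

INSERT INTO marital_status VALUES (1, 'single'), (2, 'married');
INSERT INTO colours VALUES (1, 'red', '#FF0000'), (2, 'blue', '#0000FF');
INSERT INTO hobbies VALUES (1, 'music'), (2, 'sport');
INSERT INTO users VALUES (1, 'Ala', NULL, 1, 1), (2, 'Ola', NULL, 2, 1),
                         (3, 'Ela', NULL, 1, 2);
INSERT INTO user_hobbies (user_id, hobby_id) VALUES (1, 1), (1, 2), (2, 2), (3, 1);
""")

# The hobby is checked with EXISTS, so each matching user appears only once.
QUERY = """
SELECT u.name
FROM users u
JOIN colours c ON c.id = u.fav_colour
WHERE c.name = ?
  AND EXISTS (SELECT 1
              FROM user_hobbies uh
              JOIN hobbies h ON h.id = uh.hobby_id
              WHERE uh.user_id = u.user_id AND h.name = ?)
ORDER BY u.name
"""

def find(colour: str, hobby: str) -> List[str]:
    return [name for (name,) in conn.execute(QUERY, (colour, hobby))]

print(find("red", "sport"))  # ['Ala', 'Ola']
print(find("red", "music"))  # ['Ala']
```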

For the reference tables, I suggest creating and populating them separately from the application, for example from the command line, via a message queue, or with a cron job.


Source: https://habr.com/ru/post/976794/

