MySQL table design help and normalization

Note: this question was reworded on 11/19/12 for clarity. I don't usually have this much trouble, but I am struggling to design a new product system for a client site. We offer a set of products that each of our clients can sell to their own customers. We can add new products at any time, but they all follow this format:

  • Category
  • Type
  • Product

To give a real-world example, using the structure above:

  • Baseball equipment
    • Gloves
      • Rawlings
      • Nike
      • Mizuno
    • Bats
      • Easton
      • Louisville Slugger
  • Football equipment
    • Footwear
      • Nike
      • Reebok
      • Adidas
    • Footballs
      • Nike
      • Spalding
      • Wilson
    ....

The list obviously goes on and may get much larger, but it gives the general idea.

I currently store the products that each client can sell in one flat table, as follows:

 ID | clientID | categoryID | typeID | productID | customURL
 =============================================================
 1  | 111      | 1          | 1      | 1         | 1111
 2  | 111      | 1          | 2      | 2         | 2222
 3  | 111      | 1          | 2      | 3         | 3333
 4  | 111      | 2          | 3      | 4         | 4444
 5  | 222      | 1          | 1      | 1         | 5555
 6  | 222      | 2          | 3      | 4         | 6666

  • In the above example, category 1 might be "Baseball Equipment" and category 2 might be "Football Equipment".
  • The names corresponding to categoryID, typeID and productID would be stored in three separate tables with foreign-key (InnoDB) relationships to maintain normalization.
  • Type refers to the second-level items (gloves, bats, shoes, footballs, etc.). These IDs never overlap: two types never share a typeID, even if the general product is the same (baseball shoes have a different ID than football shoes).
  • In this table, clientID 111 can sell 4 products, 3 in category 1 and 1 in category 2. ClientID 222 can sell 2 products, one in each category.
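To make the flat layout concrete, here is a minimal sketch that loads the Figure 1 rows into an in-memory SQLite database (standing in for MySQL, so no InnoDB-specific syntax is used) and counts each client's products per category. The counts should match the last bullet above.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table from Figure 1.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE productsOffered (
        ID         INTEGER PRIMARY KEY,
        clientID   INTEGER NOT NULL,
        categoryID INTEGER NOT NULL,
        typeID     INTEGER NOT NULL,
        productID  INTEGER NOT NULL,
        customURL  TEXT    NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO productsOffered VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 111, 1, 1, 1, "1111"),
        (2, 111, 1, 2, 2, "2222"),
        (3, 111, 1, 2, 3, "3333"),
        (4, 111, 2, 3, 4, "4444"),
        (5, 222, 1, 1, 1, "5555"),
        (6, 222, 2, 3, 4, "6666"),
    ],
)

# How many products can each client sell in each category?
counts = conn.execute("""
    SELECT clientID, categoryID, COUNT(*)
    FROM productsOffered
    GROUP BY clientID, categoryID
    ORDER BY clientID, categoryID
""").fetchall()

for client_id, category_id, n in counts:
    print(f"client {client_id}, category {category_id}: {n} product(s)")
```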

I'm inclined to keep the table structured this way, but I know that in a different design I might split the tables up for normalization purposes. I'm not sure whether that applies here. If I did split them, I would go from one table to four or more, as follows:

productsOffered table

 ID | clientID | productID | customURL
 ======================================
 1  | 111      | 1         | 1111
 2  | 111      | 2         | 2222
 3  | 111      | 3         | 3333
 4  | 111      | 4         | 4444
 5  | 222      | 1         | 5555
 6  | 222      | 4         | 6666

productsDefinition Table

 ID | productID | typeID | productName
 ======================================
 1  | 1         | 1      | rawlings glove
 2  | 2         | 2      | product2
 3  | 3         | 2      | product3
 4  | 4         | 3      | product4

typeDefinition Table

 ID | typeID | categoryID | typeName
 =====================================
 1  | 1      | 1          | Gloves
 2  | 2      | 1          | Bats
 3  | 3      | 2          | Shoes
 4  | 4      | 2          | Footballs

categoryDefinition Table

 ID | categoryID | catName
 =============================
 1  | 1          | Baseball Equipment
 2  | 2          | Football Equipment
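The idea that the split design just needs joins to rebuild Figure 1 can be checked with a small sketch. Assumptions: SQLite instead of MySQL/InnoDB, and, for brevity, the three definition tables use productID, typeID and categoryID directly as their primary keys instead of also carrying a separate ID column. Two joins recover every ID column of Figure 1; the category name would need a third join to categoryDefinition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE categoryDefinition (
        categoryID INTEGER PRIMARY KEY,
        catName    TEXT NOT NULL
    );
    CREATE TABLE typeDefinition (
        typeID     INTEGER PRIMARY KEY,
        categoryID INTEGER NOT NULL REFERENCES categoryDefinition(categoryID),
        typeName   TEXT NOT NULL
    );
    CREATE TABLE productsDefinition (
        productID   INTEGER PRIMARY KEY,
        typeID      INTEGER NOT NULL REFERENCES typeDefinition(typeID),
        productName TEXT NOT NULL
    );
    CREATE TABLE productsOffered (
        ID        INTEGER PRIMARY KEY,
        clientID  INTEGER NOT NULL,
        productID INTEGER NOT NULL REFERENCES productsDefinition(productID),
        customURL TEXT NOT NULL
    );

    INSERT INTO categoryDefinition VALUES (1, 'Baseball Equipment'), (2, 'Football Equipment');
    INSERT INTO typeDefinition VALUES
        (1, 1, 'Gloves'), (2, 1, 'Bats'), (3, 2, 'Shoes'), (4, 2, 'Footballs');
    INSERT INTO productsDefinition VALUES
        (1, 1, 'rawlings glove'), (2, 2, 'product2'), (3, 2, 'product3'), (4, 3, 'product4');
    INSERT INTO productsOffered VALUES
        (1, 111, 1, '1111'), (2, 111, 2, '2222'), (3, 111, 3, '3333'),
        (4, 111, 4, '4444'), (5, 222, 1, '5555'), (6, 222, 4, '6666');
""")

# Rebuild the flat Figure 1 rows: product -> type -> category.
flat = conn.execute("""
    SELECT o.ID, o.clientID, t.categoryID, p.typeID, o.productID, o.customURL
    FROM productsOffered AS o
    JOIN productsDefinition AS p ON p.productID = o.productID
    JOIN typeDefinition     AS t ON t.typeID    = p.typeID
    ORDER BY o.ID
""").fetchall()

for row in flat:
    print(row)
```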

Am I overthinking this? Don't both methods arrive at the same end result (the latter just needs several joins to produce a flat table like the one in Figure 1)?

4 answers

The goal and advantage of normalization is that it makes it difficult (ideally, impossible) to enter anomalous data.

For example, in Figure 1, what prevents you from accidentally storing a row with type 3 and category 1? Nothing but your application code, which would have to be absolutely perfect.
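A minimal sketch of that anomaly, again using SQLite in place of MySQL: the flat table accepts a row that files type 3 under category 1, so it ends up holding two contradictory answers. The row for a hypothetical client 333 plays the role of the application bug. In the normalized layout the fact "type 3 belongs to category 2" lives in exactly one row of typeDefinition, and a second, conflicting row for the same typeID is rejected by the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Flat design: categoryID is repeated on every offered product.
conn.execute("""
    CREATE TABLE productsOffered (
        ID INTEGER PRIMARY KEY, clientID INTEGER, categoryID INTEGER,
        typeID INTEGER, productID INTEGER, customURL TEXT
    )
""")
conn.execute("INSERT INTO productsOffered VALUES (4, 111, 2, 3, 4, '4444')")
# Hypothetical application bug: type 3 (Shoes, category 2) filed under category 1.
conn.execute("INSERT INTO productsOffered VALUES (7, 333, 1, 3, 4, '7777')")

# The table now gives two different answers to "which category is type 3 in?"
categories_for_type_3 = conn.execute(
    "SELECT DISTINCT categoryID FROM productsOffered WHERE typeID = 3 ORDER BY categoryID"
).fetchall()
print(categories_for_type_3)  # [(1,), (2,)]

# Normalized design: the type-to-category fact is stored once, keyed by typeID.
conn.execute("""
    CREATE TABLE typeDefinition (
        typeID INTEGER PRIMARY KEY, categoryID INTEGER NOT NULL, typeName TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO typeDefinition VALUES (3, 2, 'Shoes')")
try:
    # A second, contradictory row for typeID 3 violates the primary key.
    conn.execute("INSERT INTO typeDefinition VALUES (3, 1, 'Shoes')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```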

Also, with the single-table approach, if you ever need to change the parent category of typeID 3, you would have to change the data in potentially millions of rows. That means locking the table during the cleanup; otherwise new rows with the old value could be inserted at the same time.

Normalization eliminates redundant storage of information: if each discrete fact (for example, "typeID 3 belongs to category 2") is stored only once, you can change it atomically, and every reference to that row automatically reflects the change.
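The difference in update cost shows up clearly in a sketch. The data is hypothetical (1000 clients all offering product 4, and a made-up category 5 as the new parent of type 3), flatOffered is an illustrative table name, and SQLite stands in for MySQL; `rowcount` reports how many rows each UPDATE touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flatOffered (
        ID INTEGER PRIMARY KEY, clientID INTEGER,
        categoryID INTEGER, typeID INTEGER, productID INTEGER
    );
    CREATE TABLE typeDefinition (typeID INTEGER PRIMARY KEY, categoryID INTEGER NOT NULL);
    INSERT INTO typeDefinition VALUES (1, 1), (2, 1), (3, 2), (4, 2);
""")

# Hypothetical data: 1000 clients all offer product 4, whose type is 3 (category 2).
conn.executemany(
    "INSERT INTO flatOffered (clientID, categoryID, typeID, productID) VALUES (?, 2, 3, 4)",
    [(client_id,) for client_id in range(1000)],
)

# Move type 3 under a hypothetical category 5 in each design.
flat_rows = conn.execute(
    "UPDATE flatOffered SET categoryID = 5 WHERE typeID = 3"
).rowcount
normalized_rows = conn.execute(
    "UPDATE typeDefinition SET categoryID = 5 WHERE typeID = 3"
).rowcount
print(flat_rows, normalized_rows)  # 1000 1
```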

You're right that more joins are needed, but only because you use pseudokeys everywhere. You don't have to: you can use natural keys instead and declare references to them with cascading foreign keys, so a change in a lookup table automatically updates the tables that reference it.
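Here is a minimal sketch of natural keys with a cascading foreign key, using SQLite, where `PRAGMA foreign_keys = ON` is required before constraints are enforced (InnoDB enforces them by default; in MySQL you would also use VARCHAR rather than TEXT for these key columns). The new category name 'American Football Equipment' is purely illustrative. Renaming the category once in categoryDefinition updates every referencing row in typeDefinition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs (and cascades) only when enabled
conn.executescript("""
    CREATE TABLE categoryDefinition (category TEXT PRIMARY KEY);
    CREATE TABLE typeDefinition (
        type     TEXT PRIMARY KEY,
        category TEXT NOT NULL
                 REFERENCES categoryDefinition(category) ON UPDATE CASCADE
    );
    INSERT INTO categoryDefinition VALUES ('Baseball Equipment'), ('Football Equipment');
    INSERT INTO typeDefinition VALUES
        ('Gloves', 'Baseball Equipment'), ('Bats', 'Baseball Equipment'),
        ('Shoes', 'Football Equipment'), ('Footballs', 'Football Equipment');
""")

# Rename the category once, in the lookup table only...
conn.execute(
    "UPDATE categoryDefinition SET category = 'American Football Equipment' "
    "WHERE category = 'Football Equipment'"
)

# ...and the cascade has rewritten every row that referenced it.
rows = conn.execute("SELECT type, category FROM typeDefinition ORDER BY type").fetchall()
for row in rows:
    print(row)
```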

Note also that the rules of normalization do not require pseudokeys. They say nothing about them at all.


Re your comment: a pseudokey, or surrogate key, is an id column used to identify rows. Its values are typically allocated by an auto-increment mechanism, which guarantees uniqueness even when concurrent transactions insert rows. The id value has no meaning with respect to the row it identifies.


Here is how your tables would look in normal form, but without surrogate keys.

productsOffered table

 client | product        | customURL
 ===================================
 Smith  | Rawlings Glove | 1111
 Smith  | Product 2      | 2222
 Smith  | Product 3      | 3333
 Smith  | Product 4      | 4444
 Jones  | Rawlings Glove | 5555
 Jones  | Product 4      | 6666

productsDefinition Table

 product        | type
 =======================
 Rawlings Glove | Gloves
 Product 2      | Bats
 Product 3      | Bats
 Product 4      | Shoes

typeDefinition Table

 type      | category
 ==============================
 Gloves    | Baseball Equipment
 Bats      | Baseball Equipment
 Shoes     | Football Equipment
 Footballs | Football Equipment

categoryDefinition Table

 category
 ==================
 Baseball Equipment
 Football Equipment
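With natural keys, some queries need fewer joins. A sketch over the first two tables above, using SQLite; the composite primary key (client, product) on productsOffered is an assumption, not part of the answer. Client and product names are readable straight from productsOffered, and the type is a single join away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE productsDefinition (product TEXT PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE productsOffered (
        client    TEXT NOT NULL,
        product   TEXT NOT NULL REFERENCES productsDefinition(product),
        customURL TEXT NOT NULL,
        PRIMARY KEY (client, product)  -- assumed key: one offer per client/product pair
    );
    INSERT INTO productsDefinition VALUES
        ('Rawlings Glove', 'Gloves'), ('Product 2', 'Bats'),
        ('Product 3', 'Bats'), ('Product 4', 'Shoes');
    INSERT INTO productsOffered VALUES
        ('Smith', 'Rawlings Glove', '1111'), ('Smith', 'Product 2', '2222'),
        ('Smith', 'Product 3', '3333'), ('Smith', 'Product 4', '4444'),
        ('Jones', 'Rawlings Glove', '5555'), ('Jones', 'Product 4', '6666');
""")

# Readable product names come straight out of productsOffered: no join needed.
jones = conn.execute(
    "SELECT product, customURL FROM productsOffered WHERE client = 'Jones' ORDER BY product"
).fetchall()

# The type is one join away.
smith_types = conn.execute("""
    SELECT o.product, p.type
    FROM productsOffered AS o
    JOIN productsDefinition AS p ON p.product = o.product
    WHERE o.client = 'Smith'
    ORDER BY o.product
""").fetchall()

print(jones)
print(smith_types)
```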

It is entirely consistent with relational database design and normalization to use a non-integer data type for a primary key column, and therefore for the foreign keys in other tables that reference it.

There are good reasons to use surrogate keys: performance, brevity, or the freedom to change values in other columns. But normalization does not require surrogate keys.


I would go with the normalized approach, since even with the flat approach you must maintain separate lookup tables for category and type names (and possibly other attributes).

You might want to replace category and type with a general tree structure stored in a single table, for example:

  create table product_hierarchy (
    id integer primary key,
    name varchar(100) not null,
    parent_id integer null,
    foreign key (parent_id) references product_hierarchy (id)
  );

... because it will give the customer the flexibility to add more depth to the hierarchy.
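A sketch of how such a tree could be queried, assuming SQLite (MySQL 8.0 also supports WITH RECURSIVE; earlier versions do not) and a handful of hypothetical rows taken from the question's example list. The recursive common table expression walks from a leaf up to its root, however many levels the client adds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product_hierarchy (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES product_hierarchy(id)  -- NULL for top-level nodes
    );
    INSERT INTO product_hierarchy VALUES
        (1, 'Baseball Equipment', NULL),
        (2, 'Gloves', 1),
        (3, 'Rawlings', 2),
        (4, 'Bats', 1),
        (5, 'Easton', 4),
        (6, 'Football Equipment', NULL),
        (7, 'Footballs', 6),
        (8, 'Wilson', 7);
""")

# Walk from a node up to its root, recording how far each ancestor is.
path = conn.execute("""
    WITH RECURSIVE ancestors(id, name, parent_id, depth) AS (
        SELECT id, name, parent_id, 0 FROM product_hierarchy WHERE id = ?
        UNION ALL
        SELECT h.id, h.name, h.parent_id, a.depth + 1
        FROM product_hierarchy AS h
        JOIN ancestors AS a ON h.id = a.parent_id
    )
    SELECT name FROM ancestors ORDER BY depth DESC
""", (5,)).fetchall()

path_names = [name for (name,) in path]
print(" > ".join(path_names))  # Baseball Equipment > Bats > Easton
```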


To try to answer your direct questions:

Am I overthinking this?

That depends on how large your application is and which engine you use to store the data. Since you plan to put it in MySQL tables, your concerns are very relevant.

Don't both methods arrive at the same end result (the latter just needs several joins to produce a flat table like the one in Figure 1)?

Well, yes, but to quote Wikipedia:

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Breaking your data out into the structure you described (which I agree with, by the way) will make it easiest to maintain. Storing category and type data in the same table as the "offered products" creates a lot of redundant data. Admittedly, I can't think of a case where you would need to update this data, but if you did, you would have to update many records. With your proposed structure, the number of records to update is minimal.


In the first approach, you left out a name column for each category, type, and product ID. If you add that information it may work, but the other approach seems more workable. With 4 separate tables, you have more room to extend the design.


Source: https://habr.com/ru/post/1446506/

