Database Modeling: How to catalog products like Amazon?

Suppose I have a large number of products (from several thousand to one hundred thousand) that need to be classified hierarchically. How do I model such a solution in a database?

Will a simple parent-child table work like this:

 product_category (id, parent_id, category_name)

Then in my product table, I would just do this:

 product (id, product_category_id, name, description, price)

My concern is that this will not scale. By the way, I am currently using MySQL.

+2
4 answers
Of course it will scale. This will work fine; it is a commonly used structure.

Include a level_no column. It will help in the code and, more importantly, it is needed to exclude duplicates.

If you need a really complex structure, you need something like the Unix concept of inodes.

You may have difficulty with the code needed to produce the hierarchy, say, starting from a product, but that is a separate problem.

And please change

  • (product_category) id to product_category_id
  • (product) id to product_id
  • parent_id to parent_product_category_id
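
For illustration, here is a minimal sketch of the two tables with the renamed columns plus a level_no column. It uses Python's sqlite3 module with an in-memory database purely so that it runs anywhere (the question targets MySQL). The column types, the sample rows, and the convention that a root category has level_no = 1 are assumptions made for the example, not something stated in this answer.

```python
import sqlite3

# SQLite stands in for MySQL here; types and sample data are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category (
    product_category_id        INTEGER PRIMARY KEY,
    parent_product_category_id INTEGER
        REFERENCES product_category (product_category_id),
    level_no                   INTEGER NOT NULL,  -- 1 = root (assumed convention)
    category_name              TEXT NOT NULL
);
CREATE TABLE product (
    product_id          INTEGER PRIMARY KEY,
    product_category_id INTEGER NOT NULL
        REFERENCES product_category (product_category_id),
    name                TEXT NOT NULL,
    description         TEXT,
    price               NUMERIC NOT NULL
);
INSERT INTO product_category VALUES (1, NULL, 1, 'Electronics');
INSERT INTO product_category VALUES (2, 1,    2, 'Laptops');
INSERT INTO product VALUES (1, 2, 'Example laptop', NULL, 999);
""")

# With fully qualified key names, every join condition says exactly
# which table each key belongs to, even when a table is joined twice.
rows = conn.execute("""
    SELECT p.name, c.category_name, parent.category_name
    FROM product AS p
    JOIN product_category AS c
      ON c.product_category_id = p.product_category_id
    JOIN product_category AS parent
      ON parent.product_category_id = c.parent_product_category_id
""").fetchall()
print(rows)
```

The second join is the kind of query where a bare id or parent_id column starts to get confusing.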

Responses to comments

  • level_no. Take a look at this data model; it is intended for a directory tree structure (for example, a FileManager Explorer window):

    Directory Data Model

    See if you can understand it; this is the Unix concept of inodes. File names must be unique within a Node, hence the second index. The model is actually complete as it stands, but some developers these days will struggle to write the code needed to navigate the hierarchy and its levels. Those developers need level_no to identify which level of the hierarchy they are dealing with.

  • Recommended changes. Yes, this is called good naming conventions. I am strict about it, and I publish it, so it is a naming standard. There are reasons for it, which will become clear to you when you write SQL with 3 or 4 levels of joins, especially when you reach the same parent by two different paths. If you search SO, you will find many questions about this, always with the same answer. It will also be shown in the next model that I write for you.

+4

I tried to deal with the same problem 10 years ago. Here is my personal solution to this problem. But before I begin to explain, I would like to mention its pros and cons.

Pros:

  • You can select the descendants of a given node down to any desired depth at the lowest possible cost.

  • The same can be done for the selection of parent nodes.

  • No special RDBMS feature required. Thus, the same technique can be implemented in any of them.

  • Everything is implemented using one field.

Cons:

  • You must be able to determine the maximum depth of the tree. You also need to determine the maximum number of direct children per node.

  • Restructuring the tree is more expensive than traversing it, but not as expensive as in the Nested Set Model. Adding a new branch is a matter of finding the right value for the field. To move a branch, you need to update the moved node and all of its descendants (direct and indirect). The good news is that deleting a node and its children is as cheap as traversing it (which costs next to nothing).

Technique:

Consider the following table as the holder of the tree:

 CREATE TABLE IF NOT EXISTS `product_category` (
   `product_category_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
   `name` varchar(20) NOT NULL,
   `category_code` varchar(62) NOT NULL,
   PRIMARY KEY (`product_category_id`),
   UNIQUE KEY `uni_category_code` (`category_code`)
 ) DEFAULT CHARSET=utf8;

All the magic happens in the category_code field. You need to encode the address of each branch into a text value as follows:

 node_name                   -> category_code
 Root                        -> 01
 First child                 -> 01:01
 Second child                -> 01:02
 First grandchild            -> 01:01:01
 First child of second child -> 01:02:01

In the above example, each node can have up to 99 direct children (assuming decimal digits). And since category_code is of type varchar(62), the field can hold 21 two-digit segments, which gives (62-2) / 3 = 20 levels below the root. This is a trade-off between the depth you want, the number of direct children each node can have, and the size of your field. Technically speaking, this is a complete tree in which unused branches are not actually created, only reserved.
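
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for the example, and it assumes that segment value 00 is never used.

```python
SEGMENT_WIDTH = 2   # digits per level, as in "01:02"
FIELD_WIDTH = 62    # from category_code varchar(62)


def max_segments(field_width=FIELD_WIDTH, segment_width=SEGMENT_WIDTH):
    """How many segments fit: n segments need n * width + (n - 1) colons."""
    return (field_width + 1) // (segment_width + 1)


def max_children(segment_width=SEGMENT_WIDTH):
    """Direct children per node: 01..99 for two digits (00 assumed unused)."""
    return 10 ** segment_width - 1


def level_of(category_code):
    """Level of a node, counting the root ("01") as level 1."""
    return category_code.count(":") + 1


print(max_segments())        # segments that fit in the field
print(max_segments() - 1)    # levels available below the root
print(max_children())
print(level_of("01:02:01"))
```

Widening the segments to three digits raises the child limit to 999 but reduces the number of segments that fit in the same field.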

Good parts:

Now imagine that you want to select all nodes under 01:02 . You can do this using one query:

 SELECT * FROM product_category WHERE category_code LIKE '01:02:%' 

Selecting only the direct children of 01:02 :

 SELECT * FROM product_category WHERE category_code LIKE '01:02:__' 

Selecting all ancestors of 01:02 :

 SELECT * FROM product_category WHERE '01:02' LIKE CONCAT(category_code, ':%') 
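
To see what these three queries return, here is a small runnable sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL, so CONCAT(a, b) is written with SQLite's || operator; the LIKE patterns are otherwise the same. The sample rows are the ones from the table above plus one extra great-grandchild that I added for the demonstration.

```python
import sqlite3

# SQLite stands in for MySQL; only the concatenation syntax differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_category ("
    " name TEXT NOT NULL,"
    " category_code TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO product_category VALUES (?, ?)",
    [
        ("Root", "01"),
        ("First child", "01:01"),
        ("Second child", "01:02"),
        ("First grandchild", "01:01:01"),
        ("First child of second child", "01:02:01"),
        ("Extra great-grandchild", "01:02:01:01"),  # added for the demo
    ],
)


def codes(sql, node):
    """Run a one-parameter query and return the matching codes, sorted."""
    return sorted(row[0] for row in conn.execute(sql, (node,)))


# Every node below 01:02, at any depth.
descendants = codes(
    "SELECT category_code FROM product_category"
    " WHERE category_code LIKE ? || ':%'", "01:02")

# Direct children only: exactly one more two-character segment.
children = codes(
    "SELECT category_code FROM product_category"
    " WHERE category_code LIKE ? || ':__'", "01:02")

# Ancestors: rows whose code is a proper prefix of the node's code.
ancestors = codes(
    "SELECT category_code FROM product_category"
    " WHERE ? LIKE category_code || ':%'", "01:02")

print(descendants, children, ancestors)
```

The first two patterns have a fixed prefix, so a database can use the unique index on category_code for them. The ancestor query builds its pattern from the column, so it typically scans the table.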

Bad parts:

Inserting a new node into the tree is a matter of finding the right category_code . This can be done using a stored procedure or even in a programming language such as PHP.

Since the tree is limited in both the number of direct children and the depth, an insertion may fail. But I believe that in most practical cases we can accept such a limitation.
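
As a rough sketch of the "finding the right category_code" step, done on the application side in Python rather than PHP: the function name and the policy of reusing the lowest free number are my assumptions for the example, and a real implementation would run the lookup and the insert in one transaction.

```python
def next_child_code(parent_code, existing_codes,
                    segment_width=2, field_width=62):
    """Return the lowest unused child code under parent_code.

    parent_code "" means "create a new root". Raises ValueError when the
    parent is full or the new code would not fit into the field.
    """
    prefix = parent_code + ":" if parent_code else ""

    # Collect the segment numbers already taken by direct children.
    used = set()
    for code in existing_codes:
        if code.startswith(prefix):
            rest = code[len(prefix):]
            if len(rest) == segment_width and rest.isdigit():
                used.add(int(rest))

    # Depth limit: the new code must fit into varchar(field_width).
    if len(prefix) + segment_width > field_width:
        raise ValueError("maximum depth reached")

    # Child limit: 01..99 for two-digit segments (00 assumed unused).
    for n in range(1, 10 ** segment_width):
        if n not in used:
            return f"{prefix}{n:0{segment_width}d}"
    raise ValueError("parent already has the maximum number of children")


existing = ["01", "01:01", "01:02", "01:01:01", "01:02:01"]
print(next_child_code("01", existing))
print(next_child_code("01:02", existing))
```

Both failure cases surface as ValueError here, which matches the point above that an insertion can fail once a limit is hit.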

Greetings.

+3

Your solution uses the adjacency list model of a hierarchy. This is by far the most common one. It will scale to thousands of products. The problem is that resolving a hierarchy of unbounded depth requires either a recursive query or product-specific extensions to SQL.

There are other hierarchy models. In particular, there is the nested set model. The nested set model is good for finding the path to any node in a single query. It is also useful for retrieving any desired subtree. It takes more work to keep it up to date, though. Much more work.
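
For readers who have not seen the nested set model, here is a minimal sketch in Python with an in-memory SQLite database. The table name, the column names lft and rgt, and the five sample categories are all assumptions chosen for the example; each node stores a left and right number such that a node's interval encloses the intervals of all its descendants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    " name TEXT PRIMARY KEY,"
    " lft INTEGER NOT NULL,"
    " rgt INTEGER NOT NULL)"
)
# Numbering comes from a depth-first walk of this hypothetical tree:
# Root(1,10) -> A(2,5) -> A1(3,4), and Root -> B(6,9) -> B1(7,8)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [("Root", 1, 10), ("A", 2, 5), ("A1", 3, 4), ("B", 6, 9), ("B1", 7, 8)],
)


def path_to(name):
    """Path from the root to the node: every interval that encloses it."""
    rows = conn.execute(
        "SELECT parent.name FROM category AS node"
        " JOIN category AS parent"
        "   ON node.lft BETWEEN parent.lft AND parent.rgt"
        " WHERE node.name = ? ORDER BY parent.lft", (name,))
    return [row[0] for row in rows]


def subtree(name):
    """The node and all its descendants: every interval it encloses."""
    rows = conn.execute(
        "SELECT node.name FROM category AS parent"
        " JOIN category AS node"
        "   ON node.lft BETWEEN parent.lft AND parent.rgt"
        " WHERE parent.name = ? ORDER BY node.lft", (name,))
    return [row[0] for row in rows]


print(path_to("A1"))
print(subtree("B"))
```

The maintenance cost mentioned above comes from the numbering: inserting a node means shifting lft and rgt for every row to its right.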

You may want to study it briefly before biting off more than you want to chew.

What are you going to do with the hierarchy?

0

I think your biggest problem is a shortcoming of MySQL. In most RDBMSs that support WITH RECURSIVE, you only need one scan per level. This makes very deep hierarchies a little problematic, but usually not too bad.

I think that to make this work well, you will have to write a rather extensive stored procedure, switch to another tree model, or switch to another RDBMS. For example, this is easy to do with PostgreSQL and WITH RECURSIVE, and it provides much better scalability than many others.
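
A minimal sketch of such a recursive query over the question's parent-child table, run from Python against an in-memory SQLite database so that it is self-contained. The sample categories are made up. The same WITH RECURSIVE form is accepted by PostgreSQL, and MySQL 8.0 and later also support it, so the limitation described above applies to older MySQL versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category (
    id            INTEGER PRIMARY KEY,
    parent_id     INTEGER REFERENCES product_category (id),
    category_name TEXT NOT NULL
);
INSERT INTO product_category VALUES
    (1, NULL, 'Electronics'),
    (2, 1,    'Computers'),
    (3, 2,    'Laptops'),
    (4, 1,    'Phones'),
    (5, NULL, 'Books');
""")

# Start from one category, then repeatedly join children onto the rows
# found so far; each iteration adds one level of the hierarchy.
SUBTREE_SQL = """
WITH RECURSIVE subtree (id, category_name, depth) AS (
    SELECT id, category_name, 0
    FROM product_category
    WHERE id = ?
    UNION ALL
    SELECT c.id, c.category_name, s.depth + 1
    FROM product_category AS c
    JOIN subtree AS s ON c.parent_id = s.id
)
SELECT id, category_name, depth FROM subtree ORDER BY depth, id
"""

rows = conn.execute(SUBTREE_SQL, (1,)).fetchall()
for row in rows:
    print(row)
```

An index on parent_id is what keeps each level's join cheap as the table grows.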

0

Source: https://habr.com/ru/post/1437810/
