Storing an item that is tagged with many categories - bitmask?

Perhaps the solution is obvious, but I cannot find a good one.

My upcoming project will have one main table whose data will be read often. Update / insert / delete speed is not a problem.

Elements in this main table are associated with 4 or more categories. An item can have 50-100 or more relationships within the same category.

The most common operations that will be performed in the database:

  • select all items assigned to categories A, B, C, ... using LIMIT X, Y
  • count all items assigned to categories A, B, C, ...

My first thought on how to create a database for the above was something like this (classic approach, I think):

First, for each of the four categories, I create a table category:

id   - PK, int(11), index   
name - varchar(100)

then I will have one table item:

id   - PK, int(11), index
... some more data fields, about 30 or so ...

and link tables for the categories; there will be 4 or more lookup / many-to-many (MM) tables, for example:

id_item     - int(11)
id_category - int(11)

The queries looked something like this:

select
item.*

from
item

inner join mm_1 on mm_1.id_item = item.id
inner join cat_1 on cat_1.id = mm_1.id_category and cat_1.id in (1, 2, ... , 100)

inner join mm_2 on mm_2.id_item = item.id
inner join cat_2 on cat_2.id = mm_2.id_category and cat_2.id in (50, 51, ... , 90)

Of course, the above approach with MM tables will work, but since the application has to provide very good SELECT performance, I tested it with realistic data volumes (100,000 records in the item table, 50 - 80 relationships in each category). It was not as fast as I expected, even with the indices in place. I also tried using WHERE EXISTS instead of INNER JOIN when selecting.
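The JOIN and EXISTS variants mentioned above can be compared on a small scale. The following is only a minimal sketch: it assumes Python's built-in sqlite3 module instead of MySQL, a single link table mm_1 as in the question, and made-up sample rows. It illustrates one difference between the two forms: a plain inner join returns an item once per matching link row, while EXISTS returns each item once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mm_1 (
        id_item     INTEGER NOT NULL,
        id_category INTEGER NOT NULL,
        PRIMARY KEY (id_category, id_item)  -- composite key, category first
    );
    INSERT INTO item VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- item 1 has categories 1 and 2, item 2 has category 2, item 3 has category 9
    INSERT INTO mm_1 (id_item, id_category) VALUES (1, 1), (1, 2), (2, 2), (3, 9);
""")

# Inner join: one result row per matching link row.
join_ids = [row[0] for row in conn.execute("""
    SELECT item.id FROM item
    INNER JOIN mm_1 ON mm_1.id_item = item.id
    WHERE mm_1.id_category IN (1, 2)
    ORDER BY item.id
""")]

# EXISTS: one result row per item that has at least one matching link row.
exists_ids = [row[0] for row in conn.execute("""
    SELECT item.id FROM item
    WHERE EXISTS (
        SELECT 1 FROM mm_1
        WHERE mm_1.id_item = item.id AND mm_1.id_category IN (1, 2)
    )
    ORDER BY item.id
""")]

print(join_ids)    # item 1 is listed twice
print(exists_ids)  # each item is listed once
```

With the join form, a DISTINCT (or GROUP BY) is needed before LIMIT and COUNT give per-item results, which may be part of the cost observed on larger data.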


My second idea was to use only the item table from above and denormalize the data.

I assigned each category a bit value:

category 1.1 - 1
category 1.2 - 2
category 1.3 - 4
category 1.4 - 8
... etc ...

So an item linked to category 1.1 and category 1.3 has a bitmask of 5, which is stored in item.bitmask and can be queried like this:

select count(*) from item where item.bitmask & 5 = 5
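The bitmask test can be tried out directly. This is a minimal sketch, assuming Python's built-in sqlite3 module instead of MySQL; the constant names and the sample rows are made up for illustration. It shows the "all of these categories" test from the query above next to an "any of these categories" variant.

```python
import sqlite3

# Hypothetical bit values, following the 1, 2, 4, 8 scheme above.
CAT_1_1, CAT_1_2, CAT_1_3, CAT_1_4 = 1, 2, 4, 8

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, bitmask INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO item (id, bitmask) VALUES (?, ?)",
    [
        (1, CAT_1_1 | CAT_1_3),            # 5: exactly the wanted categories
        (2, CAT_1_1 | CAT_1_2 | CAT_1_3),  # 7: wanted categories plus one more
        (3, CAT_1_1),                      # 1: only one of the two
        (4, CAT_1_2 | CAT_1_4),            # 10: neither of the two
    ],
)

wanted = CAT_1_1 | CAT_1_3  # 1 + 4 = 5

# "All of": every wanted bit must be set in the item's mask.
all_count = conn.execute(
    "SELECT COUNT(*) FROM item WHERE (bitmask & ?) = ?", (wanted, wanted)
).fetchone()[0]

# "Any of": at least one wanted bit is set.
any_count = conn.execute(
    "SELECT COUNT(*) FROM item WHERE (bitmask & ?) <> 0", (wanted,)
).fetchone()[0]

print(all_count, any_count)
```

Note that a predicate of this form is generally evaluated row by row; an ordinary B-tree index on the bitmask column does not help with it.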

The problem: in MySQL, item.bitmask would be a BIGINT, which holds only 64 bits, and I need 100 or more.
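One way around the 64-bit limit is to spread the bits over several integer columns and test each one separately. The sketch below is plain Python and only illustrates the arithmetic; the helper names split_mask and matches_all are hypothetical, and the choice of two words is an assumption that covers up to 128 bits.

```python
WORD_BITS = 64  # number of bits assumed per integer column


def split_mask(bit_positions, word_count):
    """Pack 0-based bit positions into word_count integers of WORD_BITS bits each."""
    words = [0] * word_count
    for pos in bit_positions:
        word, offset = divmod(pos, WORD_BITS)
        words[word] |= 1 << offset
    return words


def matches_all(item_words, wanted_words):
    """True if every wanted bit is set in the item, checked word by word."""
    return all((iw & ww) == ww for iw, ww in zip(item_words, wanted_words))


item = split_mask([0, 2, 70, 99], word_count=2)
wanted = split_mask([2, 99], word_count=2)
other = split_mask([2, 64], word_count=2)

print(item)
print(matches_all(item, wanted))  # both wanted bits are set in the item
print(matches_all(item, other))   # bit 64 is not set in the item
```

In SQL this would correspond to one condition per column, for example (bitmask_1 & ?) = ? AND (bitmask_2 & ?) = ?, where bitmask_1 and bitmask_2 are hypothetical column names.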


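Another option would be to add one column per category value to the item table, category_1_1 through category_4_100, each holding 1 or 0, and to combine them with AND in the WHERE clause of the select.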

So, which approach would you recommend? Is there a better one?


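Edit, to clarify what "50 - 100 relationships" means:

An item is tagged with many values within each category (there are 4 or more categories). For example: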

Image:
     - Category "mood":
         - bright
         - happy
         - funny
         - ... 50 or so more ...
     - Category "XYZ":
         - ... 70 or so more ...

In C#, it would look like this:

public class Image {
    public List<Mood> Moods; // can contain 0 - 100 items
    public List<Some> SomeCategory; // can contain 0 - 100 items
    // ...
}

One possible schema:

Item (image)
    Id         PK, int(11)
    Name       varchar(100)

Category (mood, xyz)
    Id         PK, int(11)
    Name       varchar(100)

Relations (happy, funny)
    Id         PK, int(11)
    Name       varchar(100)

ItemCategories
    Id         PK, int(11)
    ItemId     FK, int(11)
    CategoryId FK, int(11)

ItemCategoryRelations
    ItemCategoriesId FK, int(11)
    RelationId       FK, int(11)

SELECT *
  FROM Item 
  JOIN ItemCategories ON Item.Id = ItemCategories.ItemId
 WHERE ItemCategories.CategoryId IN (1, 2, ..., 10)

If each relation belongs to only one category, the schema can be simplified:

Item (image)
    Id         PK, int(11)
    Name       varchar(100)

Category (mood, xyz)
    Id         PK, int(11)
    Name       varchar(100)

Relations (happy, funny)
    Id         PK, int(11)
    CategoryId FK, int(11)
    Name       varchar(100)

ItemRelations 
    ItemId     FK, int(11)
    RelationId FK, int(11)

SELECT *
  FROM Item 
  JOIN ItemRelations ON Item.Id = ItemRelations.ItemId
  JOIN Relations ON Relations.Id = ItemRelations.RelationId
 WHERE Relations.CategoryId IN (1, 2, ..., 10)
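If "assigned to A, B, C" is meant as all of them rather than any of them, an IN filter alone is not enough. One common pattern is to group by item and require that the number of distinct matches equals the number of requested values. The following is a minimal sketch of that pattern on the simplified ItemRelations table, assuming Python's built-in sqlite3 module instead of MySQL and made-up sample rows; it covers both operations from the question, a page of items and a total count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ItemRelations (
        ItemId     INTEGER NOT NULL REFERENCES Item(Id),
        RelationId INTEGER NOT NULL,
        PRIMARY KEY (RelationId, ItemId)
    );
    INSERT INTO Item VALUES (1, 'img1'), (2, 'img2'), (3, 'img3');
    INSERT INTO ItemRelations (ItemId, RelationId) VALUES
        (1, 10), (1, 11), (1, 12),
        (2, 10), (2, 12),
        (3, 11);
""")

wanted = [10, 12]  # relation ids that an item must ALL have
placeholders = ", ".join("?" for _ in wanted)

# Items that carry every wanted relation.
matching = f"""
    SELECT ItemId
      FROM ItemRelations
     WHERE RelationId IN ({placeholders})
     GROUP BY ItemId
    HAVING COUNT(DISTINCT RelationId) = ?
"""

# One page of results (LIMIT 10 OFFSET 0).
page = [row[0] for row in conn.execute(
    matching + " ORDER BY ItemId LIMIT ? OFFSET ?",
    (*wanted, len(wanted), 10, 0),
)]

# Total number of matching items.
total = conn.execute(
    f"SELECT COUNT(*) FROM ({matching}) AS matched",
    (*wanted, len(wanted)),
).fetchone()[0]

print(page, total)
```

The composite key with RelationId first is an assumption made here so that the lookup by relation can use the index; whether it helps in MySQL would need to be measured on real data.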

For example, bright in the mood category would be written as mood\bright.



While I absolutely love bitmasking (microprocessor programmer by day here), and although I always want to use it for db design, it always seems like there is a better way.

How about something like this?

tblItems 
------------------
  item_id
  item_name

tblCategories
------------------
  category_id
  category_name

tblRelations
------------------
  relation_id
  relation_name

tblCategoryRelationLink (link relations to specific categories)
------------------
  cat_rel_id
  category_id
  relation_id

tblItemRelationLink (set relations to items)
------------------
  item_rel_id
  item_id
  rel_id

If each relation is specific to one category, you can simply look up which category it belongs to. If a relation can somehow be associated with two categories, you will also need an additional table (to associate an item with a category).
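Looking up the category of each relation on an item can be sketched as follows. This assumes Python's built-in sqlite3 module instead of MySQL, uses the table and column names from the schema above, leaves out tblItems for brevity, and fills the tables with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCategories (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE tblRelations (relation_id INTEGER PRIMARY KEY, relation_name TEXT);
    CREATE TABLE tblCategoryRelationLink (
        cat_rel_id INTEGER PRIMARY KEY, category_id INTEGER, relation_id INTEGER);
    CREATE TABLE tblItemRelationLink (
        item_rel_id INTEGER PRIMARY KEY, item_id INTEGER, rel_id INTEGER);

    INSERT INTO tblCategories VALUES (1, 'mood'), (2, 'xyz');
    INSERT INTO tblRelations VALUES (1, 'bright'), (2, 'happy'), (3, 'some_xyz_value');
    INSERT INTO tblCategoryRelationLink VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3);
    -- item 100 is bright and has some_xyz_value; item 101 is happy
    INSERT INTO tblItemRelationLink VALUES (1, 100, 1), (2, 100, 3), (3, 101, 2);
""")

# For one item, list each relation together with the category it belongs to.
rows = conn.execute("""
    SELECT c.category_name, r.relation_name
      FROM tblItemRelationLink il
      JOIN tblRelations r             ON r.relation_id = il.rel_id
      JOIN tblCategoryRelationLink cl ON cl.relation_id = r.relation_id
      JOIN tblCategories c            ON c.category_id = cl.category_id
     WHERE il.item_id = ?
     ORDER BY c.category_name, r.relation_name
""", (100,)).fetchall()

print(rows)
```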


Source: https://habr.com/ru/post/1724713/

