Tagging query with GROUP_CONCAT

Using the tagging database schema from the accepted answer to this question, is it possible to write a query with GROUP_CONCAT that still performs well on a lot of data? I need to fetch items together with all their tags, for every item tagged with x. With about 0.5 million tags, the query with GROUP_CONCAT is very slow (more than 15 seconds). Without GROUP_CONCAT (i.e. without the tags) it takes about 0.05 seconds.

As a side question, how does SO solve this problem?

+6
4 answers

This is probably the result of a poor indexing strategy. Adapting the schema shown in the accepted answer to the question you linked:

    CREATE TABLE Items (
        Item_ID    SERIAL,
        Item_Title VARCHAR(255),
        Content    TEXT
    ) ENGINE=InnoDB;

    CREATE TABLE Tags (
        Tag_ID    SERIAL,
        Tag_Title VARCHAR(255)
    ) ENGINE=InnoDB;

    CREATE TABLE Items_Tags (
        Item_ID BIGINT UNSIGNED,
        Tag_ID  BIGINT UNSIGNED,
        PRIMARY KEY (Item_ID, Tag_ID),
        -- table-level FOREIGN KEY clauses: many MySQL versions ignore
        -- a REFERENCES clause written inline in the column definition
        FOREIGN KEY (Item_ID) REFERENCES Items (Item_ID),
        FOREIGN KEY (Tag_ID)  REFERENCES Tags (Tag_ID)
    ) ENGINE=InnoDB;

Note that:

  • The MySQL SERIAL data type is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE and, as such, is indexed;

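  • defining foreign key constraints on Items_Tags makes InnoDB create an index on the foreign-key columns whenever no usable index already exists (here the primary key already starts with Item_ID, so only Tag_ID gets a separate index). Note that many MySQL versions parse but silently ignore a REFERENCES clause written inline in a column definition, so the constraints should be declared as table-level FOREIGN KEY clauses.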
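To make the indexing point concrete, here is a minimal sketch of the same schema together with a typical "items tagged x, with all of their tags" GROUP_CONCAT query. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL/InnoDB: INTEGER PRIMARY KEY replaces SERIAL, and because SQLite does not index foreign-key columns on its own, the index on Items_Tags (Tag_ID) is created explicitly. The question does not show its actual query, so the query shape, the helper names build_demo_db and items_tagged, and the sample rows are all hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL/InnoDB. INTEGER PRIMARY KEY
# replaces SERIAL, and the Tag_ID index is created explicitly because
# SQLite does not index foreign-key columns on its own.
SCHEMA = """
CREATE TABLE Items (
    Item_ID    INTEGER PRIMARY KEY,
    Item_Title TEXT,
    Content    TEXT
);
CREATE TABLE Tags (
    Tag_ID    INTEGER PRIMARY KEY,
    Tag_Title TEXT
);
CREATE TABLE Items_Tags (
    Item_ID INTEGER NOT NULL REFERENCES Items (Item_ID),
    Tag_ID  INTEGER NOT NULL REFERENCES Tags (Tag_ID),
    PRIMARY KEY (Item_ID, Tag_ID)       -- also serves lookups by Item_ID
);
-- Lets "all items tagged x" seek on Tag_ID instead of scanning Items_Tags.
CREATE INDEX Items_Tags_Tag_ID ON Items_Tags (Tag_ID);
"""

# Hypothetical query: items carrying tag :tag_id, each with ALL of its tags.
# Items_Tags is joined twice: x filters by the tag, it collects every tag.
ITEMS_WITH_TAGS = """
SELECT i.Item_ID, i.Item_Title, GROUP_CONCAT(t.Tag_Title) AS Tags
FROM Items_Tags AS x
JOIN Items      AS i  ON i.Item_ID = x.Item_ID
JOIN Items_Tags AS it ON it.Item_ID = i.Item_ID
JOIN Tags       AS t  ON t.Tag_ID = it.Tag_ID
WHERE x.Tag_ID = :tag_id
GROUP BY i.Item_ID, i.Item_Title
ORDER BY i.Item_ID
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Items (Item_ID, Item_Title) VALUES (?, ?)",
                     [(1, "first"), (2, "second"), (3, "third")])
    conn.executemany("INSERT INTO Tags (Tag_ID, Tag_Title) VALUES (?, ?)",
                     [(10, "x"), (20, "y"), (30, "z")])
    conn.executemany("INSERT INTO Items_Tags (Item_ID, Tag_ID) VALUES (?, ?)",
                     [(1, 10), (1, 20), (2, 30), (3, 10), (3, 30)])
    return conn


def items_tagged(conn: sqlite3.Connection, tag_id: int) -> list:
    """Rows of (Item_ID, Item_Title, comma-separated tag titles)."""
    return conn.execute(ITEMS_WITH_TAGS, {"tag_id": tag_id}).fetchall()


if __name__ == "__main__":
    for row in items_tagged(build_demo_db(), 10):
        print(row)  # tag order inside GROUP_CONCAT is not guaranteed
```

The order of titles inside each concatenated string is unspecified, so callers should not rely on it.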

+5

I would suggest a hybrid of normalized and denormalized data. Keeping the normalized structure provided by eggyal, I would add the following denormalized table:

    CREATE TABLE Items_Tags_Denormalized (
        Item_ID BIGINT UNSIGNED,
        Tags    BLOB,
        PRIMARY KEY (Item_ID),
        FOREIGN KEY (Item_ID) REFERENCES Items (Item_ID)
    ) ENGINE=InnoDB;

The Tags column holds all the tags (Tag_Title) of the corresponding Item_ID.
There are two ways to keep it populated:

  • create a cron job that periodically rebuilds the Items_Tags_Denormalized table using GROUP_CONCAT or whatever suits you (advantage: no extra load when inserting into or deleting from the Items_Tags table; disadvantage: the denormalized table will not always be up to date, depending on how often the cron job runs)

  • create insert and delete triggers on the Items_Tags table that update the Items_Tags_Denormalized table (advantage: the denormalized table is always up to date; disadvantage: extra load on every insert into or delete from the Items_Tags table)

Choose whichever solution suits your needs, weighing the advantages and disadvantages.

In the end you have the Items_Tags_Denormalized table, which you simply read from, with no further processing.
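Both maintenance options can be sketched roughly as follows, again assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL; MySQL trigger syntax differs in details, and Tags is stored as TEXT here instead of BLOB. REBUILD is what the cron job would run; the two triggers recompute only the row of the item whose tags changed. The helper names create_db, rebuild and read_tags and the sample rows are hypothetical. One MySQL-specific caveat for either option: GROUP_CONCAT output is truncated to group_concat_max_len, which defaults to 1024 bytes.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL; Tags is TEXT rather than BLOB.
SCHEMA = """
CREATE TABLE Items (Item_ID INTEGER PRIMARY KEY, Item_Title TEXT);
CREATE TABLE Tags (Tag_ID INTEGER PRIMARY KEY, Tag_Title TEXT);
CREATE TABLE Items_Tags (
    Item_ID INTEGER NOT NULL REFERENCES Items (Item_ID),
    Tag_ID  INTEGER NOT NULL REFERENCES Tags (Tag_ID),
    PRIMARY KEY (Item_ID, Tag_ID)
);
CREATE TABLE Items_Tags_Denormalized (
    Item_ID INTEGER PRIMARY KEY REFERENCES Items (Item_ID),
    Tags    TEXT
);
"""

# Option 1 (cron job): throw the table away and rebuild it with GROUP_CONCAT.
REBUILD = """
DELETE FROM Items_Tags_Denormalized;
INSERT INTO Items_Tags_Denormalized (Item_ID, Tags)
SELECT it.Item_ID, GROUP_CONCAT(t.Tag_Title)
FROM Items_Tags AS it JOIN Tags AS t ON t.Tag_ID = it.Tag_ID
GROUP BY it.Item_ID;
"""

# Option 2 (triggers): after each insert/delete on Items_Tags, recompute the
# row of the affected item only. {ref} is NEW for inserts, OLD for deletes.
# GROUP BY makes the INSERT produce no row once an item has no tags left.
TRIGGER_TEMPLATE = """
CREATE TRIGGER {name} AFTER {event} ON Items_Tags
BEGIN
    DELETE FROM Items_Tags_Denormalized WHERE Item_ID = {ref}.Item_ID;
    INSERT INTO Items_Tags_Denormalized (Item_ID, Tags)
    SELECT it.Item_ID, GROUP_CONCAT(t.Tag_Title)
    FROM Items_Tags AS it JOIN Tags AS t ON t.Tag_ID = it.Tag_ID
    WHERE it.Item_ID = {ref}.Item_ID
    GROUP BY it.Item_ID;
END;
"""


def create_db(with_triggers: bool) -> sqlite3.Connection:
    """In-memory database with hypothetical items and tags (no links yet)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    if with_triggers:
        conn.executescript(TRIGGER_TEMPLATE.format(
            name="items_tags_ai", event="INSERT", ref="NEW"))
        conn.executescript(TRIGGER_TEMPLATE.format(
            name="items_tags_ad", event="DELETE", ref="OLD"))
    conn.executemany("INSERT INTO Items VALUES (?, ?)", [(1, "a"), (2, "b")])
    conn.executemany("INSERT INTO Tags VALUES (?, ?)", [(1, "x"), (2, "y")])
    return conn


def rebuild(conn: sqlite3.Connection) -> None:
    """What the periodic cron job would run."""
    conn.executescript(REBUILD)


def read_tags(conn: sqlite3.Connection, item_id: int):
    """Reads are a primary-key lookup; no GROUP_CONCAT at read time."""
    row = conn.execute(
        "SELECT Tags FROM Items_Tags_Denormalized WHERE Item_ID = ?",
        (item_id,)).fetchone()
    return None if row is None else row[0]
```

Deleting and re-inserting the affected row keeps each trigger simple; the cost is one GROUP_CONCAT over a single item's tags per change.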

+3

Why use GROUP_CONCAT for this at all? You said that fetching the list of items for tag x is fast. Fetching all the tags for that list of items should be fast as well. And the list is normally limited anyway: an ordinary website does not show 100,000 entries on one page.

I would suggest:

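    drop temporary table if exists lookup_item;
    create temporary table lookup_item (
        item_id bigint unsigned not null primary key
    );
    -- step 1: the items carrying the tag (fast, and limited to one page)
    insert into lookup_item
    select i.id as item_id
    from items i
    where exists (select * from items_tags
                  where item_id = i.id and tag_id = <tag_id>)
      and <other conditions or limits>;
    -- step 2: every tag of those items, one row per (item, tag);
    -- i.id in the order by keeps each item's rows together
    select *
    from lookup_item l
    inner join items i on i.id = l.item_id
    inner join items_tags it on it.item_id = l.item_id
    inner join tags t on t.id = it.tag_id
    order by i.<priority>, i.id, t.<priority>;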

Here <priority> could be, for example, the last-modified time for items and some relevance weight for tags.

This returns each item together with its tags, one row per item-tag pair. The only work left for the application code is to notice, while iterating over the result rows, where the next item begins.
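A minimal sketch of this two-step approach, including the grouping done in application code, assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL. It follows the answer's naming (items.id, tags.id, items_tags); the modified and weight columns are hypothetical stand-ins for the <priority> placeholders, LIMIT plays the role of <other conditions or limits>, and demo_db and items_with_tags are illustrative names. itertools.groupby only merges adjacent rows, which is why the ORDER BY includes i.id.

```python
import sqlite3
from itertools import groupby

# Sketch only: SQLite stands in for MySQL. Naming follows the answer
# (items.id, tags.id, items_tags); "modified" and "weight" are hypothetical
# stand-ins for the <priority> placeholders.
SCHEMA = """
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, modified INTEGER);
CREATE TABLE tags (id INTEGER PRIMARY KEY, title TEXT, weight INTEGER);
CREATE TABLE items_tags (
    item_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL,
    PRIMARY KEY (item_id, tag_id)
);
CREATE INDEX items_tags_tag_id ON items_tags (tag_id);
"""


def demo_db() -> sqlite3.Connection:
    """In-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                     [(1, "a", 100), (2, "b", 300), (3, "c", 200)])
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                     [(1, "x", 5), (2, "y", 9), (3, "z", 1)])
    conn.executemany("INSERT INTO items_tags VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2)])
    return conn


def items_with_tags(conn: sqlite3.Connection, tag_id: int, limit: int) -> list:
    """Returns [(item_id, title, [tag titles...]), ...] for one page."""
    # Step 1: the (fast) list of items carrying the tag, limited to one page.
    conn.execute("DROP TABLE IF EXISTS temp.lookup_item")
    conn.execute("CREATE TEMP TABLE lookup_item (item_id INTEGER PRIMARY KEY)")
    conn.execute("""
        INSERT INTO lookup_item
        SELECT i.id FROM items AS i
        WHERE EXISTS (SELECT 1 FROM items_tags
                      WHERE item_id = i.id AND tag_id = ?)
        ORDER BY i.modified DESC
        LIMIT ?""", (tag_id, limit))
    # Step 2: one row per (item, tag). i.id in ORDER BY keeps an item's
    # rows contiguous even if two items share the same "modified" value.
    rows = conn.execute("""
        SELECT i.id, i.title, t.title
        FROM lookup_item AS l
        JOIN items AS i ON i.id = l.item_id
        JOIN items_tags AS it ON it.item_id = l.item_id
        JOIN tags AS t ON t.id = it.tag_id
        ORDER BY i.modified DESC, i.id, t.weight DESC, t.id""").fetchall()
    # Step 3 (application code): start a new item whenever the id changes.
    return [(item_id, title, [r[2] for r in group])
            for (item_id, title), group in groupby(rows, key=lambda r: (r[0], r[1]))]
```

For example, items_with_tags(demo_db(), 1, 10) returns item 2 first (newer) with tags x and z, then item 1 with tags y and x, ordered by tag weight.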

+1

If I understand correctly, GROUP_CONCAT is not the only thing you remove when you drop the tags and the query gets faster. Inside GROUP_CONCAT you select Tags.Tag_Title, which forces access to the Tags table (an extra join and lookup for every tag row).

You can try to run GROUP_CONCAT with Items_Tags.Tag_ID to test my theory.
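A minimal sketch of that experiment, assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL. The question does not show its exact query, so both queries below are hypothetical reconstructions that differ only in what GROUP_CONCAT aggregates: WITH_TITLES needs the join to Tags, WITH_IDS reads tag IDs straight from Items_Tags. On real data you would time both and compare their plans (EXPLAIN in MySQL; the query_plan helper below uses SQLite's EXPLAIN QUERY PLAN) to see how much of the cost comes from the Tags lookup.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL, and both queries are hypothetical
# reconstructions (the question does not show its exact query).
SCHEMA = """
CREATE TABLE Tags (Tag_ID INTEGER PRIMARY KEY, Tag_Title TEXT);
CREATE TABLE Items_Tags (
    Item_ID INTEGER NOT NULL,
    Tag_ID  INTEGER NOT NULL,
    PRIMARY KEY (Item_ID, Tag_ID)
);
CREATE INDEX Items_Tags_Tag_ID ON Items_Tags (Tag_ID);
"""

# Concatenates tag titles, so every tag row must also be looked up in Tags.
WITH_TITLES = """
SELECT it.Item_ID, GROUP_CONCAT(t.Tag_Title)
FROM Items_Tags AS x
JOIN Items_Tags AS it ON it.Item_ID = x.Item_ID
JOIN Tags AS t ON t.Tag_ID = it.Tag_ID
WHERE x.Tag_ID = ?
GROUP BY it.Item_ID
ORDER BY it.Item_ID
"""

# Same result shape, but concatenates IDs straight from Items_Tags.
WITH_IDS = """
SELECT it.Item_ID, GROUP_CONCAT(it.Tag_ID)
FROM Items_Tags AS x
JOIN Items_Tags AS it ON it.Item_ID = x.Item_ID
WHERE x.Tag_ID = ?
GROUP BY it.Item_ID
ORDER BY it.Item_ID
"""


def demo_db() -> sqlite3.Connection:
    """In-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Tags VALUES (?, ?)",
                     [(10, "x"), (20, "y"), (30, "z")])
    conn.executemany("INSERT INTO Items_Tags VALUES (?, ?)",
                     [(1, 10), (1, 20), (2, 30), (3, 10), (3, 30)])
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, tag_id: int) -> list:
    """Detail column of SQLite's EXPLAIN QUERY PLAN (MySQL: EXPLAIN)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (tag_id,))]


if __name__ == "__main__":
    conn = demo_db()
    for sql in (WITH_TITLES, WITH_IDS):
        print(conn.execute(sql, (10,)).fetchall())
        print(query_plan(conn, sql, 10))
```

If the ID-only variant is much faster, the titles can be resolved afterwards with a single lookup on the distinct tag IDs of the page.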

+1

Source: https://habr.com/ru/post/945311/

