MySQL: use SET or many columns?

Question

I use PHP and MySQL. I have timestamped entries for events whose types are hierarchical: an event can belong to several categories and subcategories, but the number of categories and subcategories is fixed.

What is the best way to set up the table? Should I have a bunch of columns (30 or so) holding yes/no flags that indicate membership in each category, or should I use the MySQL SET data type? http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html

I am mostly concerned with performance, and I want to be able to get all the event IDs for a given category. I am just looking for the most efficient way to do this.

+6

database php mysql database-design

lollercoaster Jun 2 '11 at 20:50

4 answers

The relationship between events and event types / categories is many-to-many, as echo says, but a simple xref table causes a problem: if you want to query all the descendants of a given node, you have to run recursive queries. On a deep tree, this will be very inefficient.

So, when you say "get all the event IDs for a certain category", if you mean the category and all of its descendants, then you want to use the Nested Set model:

http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

The Nested Set model makes writes a little slower, but it makes selecting subtrees very easy:

To get the Televisions subtree, you ask for all categories with left >= 2 and right <= 9.
Leaf nodes always have left = right - 1.
You can find the number of descendants of a node without fetching those rows: (right - left - 1)/2.
Finding ancestry paths and depth is also very simple (a single query each). See the article for details.
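
A minimal sketch of the three properties above. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the SQL itself is plain enough to carry over. The table name, the sample rows, and the column names lft and rgt are assumptions for illustration (LEFT and RIGHT are reserved words in MySQL, so the columns are usually renamed). Only the Televisions bounds 2 and 9 come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)"
)
# Hypothetical sample tree: Electronics > Televisions > {Tube, LCD, Plasma}
conn.executemany(
    "INSERT INTO category (name, lft, rgt) VALUES (?, ?, ?)",
    [
        ("Electronics", 1, 10),
        ("Televisions", 2, 9),
        ("Tube", 3, 4),
        ("LCD", 5, 6),
        ("Plasma", 7, 8),
    ],
)

# Subtree: every category whose bounds fall inside those of Televisions (2..9)
subtree = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM category WHERE lft >= 2 AND rgt <= 9 ORDER BY lft"
    )
]

# Leaf nodes: left = right - 1
leaves = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM category WHERE lft = rgt - 1 ORDER BY lft"
    )
]

# Descendant count computed from the bounds alone, without reading child rows
descendant_count = conn.execute(
    "SELECT (rgt - lft - 1) / 2 FROM category WHERE name = 'Televisions'"
).fetchone()[0]

print(subtree, leaves, descendant_count)
```

Note that the subtree query as written includes Televisions itself; use strict inequalities (lft > 2 AND rgt < 9) to get descendants only.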

+2

Nicole Jun 2 '11 at 20:57

You can try using a cross-reference (xref) table to create a many-to-many relationship between your events and their types.

    create table event_category_event_xref (
        event_id int,
        event_category_id int,
        foreign key (event_id) references event(id),
        foreign key (event_category_id) references event_category(id)
    );

Membership in a category or subcategory is determined by the rows in this table. So if you have a row with {event_id = 3, event_category_id = 52}, this means that event #3 is in category #52. Similarly, you can have rows for {event_id = 3, event_category_id = 27}, etc.
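
A minimal sketch of the "all event IDs for a category" lookup against such a table, again using Python's sqlite3 module as a stand-in for MySQL. The sample rows and the helper name event_ids_in_category are hypothetical, and the composite primary key with event_category_id first is my assumption (it is not in the answer above): it lets the lookup by category be served from the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE event_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_category_event_xref (
        event_id INTEGER REFERENCES event(id),
        event_category_id INTEGER REFERENCES event_category(id),
        -- Assumption: category first, so lookups by category can use the key
        PRIMARY KEY (event_category_id, event_id)
    );
    -- Hypothetical sample data
    INSERT INTO event (id, title) VALUES (3, 'Launch party'), (4, 'Board meeting');
    INSERT INTO event_category (id, name) VALUES (27, 'Social'), (52, 'Company');
    INSERT INTO event_category_event_xref (event_id, event_category_id)
    VALUES (3, 52), (3, 27), (4, 52);
    """
)


def event_ids_in_category(category_id):
    """Return the IDs of all events linked to one category, in ascending order."""
    cur = conn.execute(
        "SELECT event_id FROM event_category_event_xref "
        "WHERE event_category_id = ? ORDER BY event_id",
        (category_id,),
    )
    return [row[0] for row in cur]


in_52 = event_ids_in_category(52)
in_27 = event_ids_in_category(27)
print(in_52, in_27)
```

This returns only direct members of a category; including subcategories needs either a recursive walk or a hierarchy model such as the nested set described in the other answer.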

+1

echo Jun 2 '11 at 20:56

It's good that the number of categories is fixed. If it were not, you could not use either approach.

Check the "Why you should not use SET" section on the page you linked to. I think it should give you a comprehensive overview.

I think the most important point is the one about indexes. Also, changing a SET definition is a bit trickier.

+1

Halcyon Jun 2 '11 at 21:02

joelhardi · Accepted Answer · 2011-06-02T22:19:41+0000

Sounds like you're mostly concerned with performance.

Several people have suggested splitting this into 3 tables (a category table plus either a simple cross-reference table or a more sophisticated way of modeling the tree hierarchy, such as nested set or materialized path), which is the first thing I thought of when I read your question.

With indexes, a fully normalized approach like this (which adds two JOINs) will still have "good" read performance. One issue is that an INSERT or UPDATE of an event may now also involve one or more INSERT / UPDATE / DELETE statements on the cross-reference table. On MyISAM that means the cross-reference table is locked, and on InnoDB it means the affected rows are locked, so if your database is busy with a significant number of writes, you will have more contention than if only the event rows were locked.

Personally, I would try this fully normalized approach before optimizing. But I will assume that you know what you are doing, that your assumptions are correct (the categories never change), and that you have a usage pattern (many writes) that calls for a less normalized, flat structure. That is perfectly fine, and it is part of what NoSQL is about.

SET versus "large number of columns"

So, as for your actual question, "SET versus a large number of columns": I can say that I have worked with two companies with smart engineers (whose products were CRM web applications; one of them was actually event management), and both used the multi-column approach for this kind of static set data.

My advice would be to think about all the queries you will run against this table (weighted by their frequency) and how the indexes will work for them.

First, with the multi-column approach you will need an index on each of these columns so that you can do SELECT * FROM events WHERE CategoryX = TRUE. With the indexes, this is a super-fast query.

With SET, by contrast, you must use bitwise AND (&), LIKE, or FIND_IN_SET() to perform this query. That means the query cannot use an index and must do a linear search of all rows (you can use EXPLAIN to verify this). Slow query!

That is the main reason SET is a bad idea: its index is only useful if you select by exact groups of categories. SET works great if you are selecting the categories for an event, but not the other way around.
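
A minimal sketch of the two query shapes being compared. SQLite (via Python's sqlite3 module) has no SET type, so an integer bitmask column stands in for it here; that substitution, the table and column names, and the three sample categories are all assumptions for illustration. In MySQL you would run EXPLAIN on your own tables to see which form can use an index.

```python
import sqlite3

CATEGORIES = ["music", "sports", "tech"]  # hypothetical fixed category list

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Multi-column layout: one indexed flag column per category
    CREATE TABLE events_flags (
        id INTEGER PRIMARY KEY, music INTEGER, sports INTEGER, tech INTEGER
    );
    CREATE INDEX idx_flags_sports ON events_flags (sports);

    -- SET-like layout: one integer bitmask column, one bit per category
    CREATE TABLE events_mask (id INTEGER PRIMARY KEY, categories INTEGER);
    CREATE INDEX idx_mask_categories ON events_mask (categories);
    """
)

memberships = {1: {"music"}, 2: {"sports", "tech"}, 3: {"sports"}, 4: set()}
for event_id, cats in memberships.items():
    flags = [int(name in cats) for name in CATEGORIES]
    mask = sum(1 << i for i, name in enumerate(CATEGORIES) if name in cats)
    conn.execute("INSERT INTO events_flags VALUES (?, ?, ?, ?)", (event_id, *flags))
    conn.execute("INSERT INTO events_mask VALUES (?, ?)", (event_id, mask))

sports_bit = 1 << CATEGORIES.index("sports")

# Multi-column: a plain equality on an indexed column
by_column = [
    r[0]
    for r in conn.execute("SELECT id FROM events_flags WHERE sports = 1 ORDER BY id")
]

# SET-like: the column is wrapped in a bitwise expression, so an index on
# the raw column value cannot be used to narrow the search
by_mask = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM events_mask WHERE (categories & ?) != 0 ORDER BY id",
        (sports_bit,),
    )
]

# Exact match on the whole set: the one lookup an index on the mask can serve
exactly_sports = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM events_mask WHERE categories = ? ORDER BY id", (sports_bit,)
    )
]

print(by_column, by_mask, exactly_sports)
```

Both membership queries return the same events; the difference is that the first compares an indexed column to a constant, while the second has to evaluate an expression for every row. The exact-match query finds only events whose category set is precisely {sports}.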

The main problem with the less normalized multi-column approach (compared to a fully normalized one) is that it does not scale. If you have 5 categories and they never change, great, but if you have 500 and they do change, it is a big problem. In your scenario, with around 30 that never change, the main issue is having an index on every column: if you do frequent writes, those queries become slower because of the number of indexes that have to be updated. If you choose this approach, you may want to check the MySQL slow query log to make sure there are no slow queries caused by contention at busy times of day.

In your case, if yours is a typical read-heavy web application, I think going with the multi-column approach (as the two CRM products did, for the same reason) is probably sensible. It is definitely faster than SET for that SELECT query.

TL;DR: Do not use SET, because the "select events by category" query will be slow.
