Fast query in latin1, slow in utf8 - why?

Question

I have a query that looks something like this:

 SELECT DISTINCT table1.id, {long list of fields} FROM table1 
     INNER JOIN table2 ON table1.table2_id = table2.id 
     {... more joins ...} 
     LEFT JOIN table_last ON table_last.id=some_table.last_id
     WHERE ( table_last.id IS NULL) AND {...more conditions...}
     ORDER BY table1.date_entered desc LIMIT 0,6

Against the same database, this query runs fine (<1s runtime) with latin1 as the client encoding, but becomes very slow (I could not wait for it to complete) after SET NAMES 'utf8'. The query matches 70 rows (only part of which are returned, of course, because of the limit), so the size of the result set should not be a problem. I checked all tables in all joins, and all of them seem to have UTF-8 as their encoding (I checked with SHOW CREATE TABLE).

What can cause such strange behavior? Why is utf8 so much worse than latin1 in this case? In case it is relevant, the id field is char(36) everywhere, and the joins have conditions based on such fields as well as on integer and varchar fields.

PS I know DISTINCT may take some time, but I can't remove it, and it's only 70 rows anyway, and it's fast with the default (latin1)! So this looks like something external to the query, but what?

performance sql join mysql

Stasm Jan 29 '11 at 2:13

1 answer

Yzmir ramirez · Answer 1 · 2011-01-29T02:39:32+0000

With utf8, MySQL has to allow for up to 3 bytes per character, so a varchar column can need three times its declared length in bytes for each row (256 * 3 = 768 bytes)!
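As a rough check of that arithmetic, here is a minimal sketch. It assumes MySQL's 3-byte utf8 and uses the char(36) id from the question plus a 256-character varchar purely as example sizes; newer MySQL versions may list the character set as utf8mb3 rather than utf8.

```sql
-- Maximum bytes per character for the character sets in question.
-- Expect MAXLEN = 1 for latin1 and MAXLEN = 3 for utf8 / utf8mb3.
SELECT CHARACTER_SET_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('latin1', 'utf8', 'utf8mb3');

-- Worst-case bytes needed for one value of each example column
-- (declared length in characters * MAXLEN).
SELECT 36 * 1  AS char36_latin1_bytes,
       36 * 3  AS char36_utf8_bytes,
       256 * 1 AS varchar256_latin1_bytes,
       256 * 3 AS varchar256_utf8_bytes;
```

These are upper bounds on the space a value may need, not the size actually stored on disk for every row.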

This means your queries now use more resources, because strings can take up to three times as much space. The same buffers therefore hold fewer rows, and the server may have to swap if many queries run at the same time, which degrades the performance of your query and of the server even further.
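One way to test this theory is to compare the session under both encodings. The statements below are only a sketch; they assume the DISTINCT / ORDER BY in the query is resolved through an internal temporary table, which the question does not confirm.

```sql
-- Character sets in effect for the current connection (changed by SET NAMES).
SHOW VARIABLES LIKE 'character_set%';

-- Size limits for in-memory temporary tables; a temporary table that
-- outgrows them is converted to an on-disk table.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Note these counters, run the query, then check them again.
-- A jump in Created_tmp_disk_tables suggests the temporary table spilled to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';
```

Running the same steps once after SET NAMES 'latin1' and once after SET NAMES 'utf8' should show whether the two cases differ in temporary table usage.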
