Processing a huge MySQL table

I hope you are all doing well. We have a huge MySQL table called "messages". It has about 70,000 records and is about 10 GB in size.

My boss says we need to do something to make this huge table easier to work with, because if the table gets corrupted it will take us a long time to restore it. It is also sometimes slow.

What are the possible solutions to make working with this table easier, in all respects?

The structure of the table is as follows:

CREATE TABLE IF NOT EXISTS `posts` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `thread_id` int(11) unsigned NOT NULL,
  `content` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `first_post` mediumtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `publish` tinyint(1) NOT NULL,
  `deleted` tinyint(1) NOT NULL,
  `movedToWordPress` tinyint(1) NOT NULL,
  `image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `video_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `video_image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `thread_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `section_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `urlToPost` varchar(280) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `posts` int(11) DEFAULT NULL,
  `views` int(11) DEFAULT NULL,
  `forum_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `subject` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `visited` int(11) DEFAULT '0',
  `replicated` tinyint(4) DEFAULT '0',
  `createdOn` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `urlToPost` (`urlToPost`,`forum_name`),
  KEY `thread_id` (`thread_id`),
  KEY `publish` (`publish`),
  KEY `createdOn` (`createdOn`),
  KEY `movedToWordPress` (`movedToWordPress`),
  KEY `deleted` (`deleted`),
  KEY `forum_name` (`forum_name`),
  KEY `subject` (`subject`),
  FULLTEXT KEY `first_post` (`first_post`,`thread_title`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=78773 ;

Thank you.

UPDATED

Note: although I appreciate the answers, almost all of them are about optimizing the current database rather than about how to handle large tables in general. I can optimize the database based on the responses received, but that does not really answer the question about handling huge tables. I am talking about 70,000 records now, but over the next few months, if not weeks, we will grow. Each record can be about 300 KB in size.

+6
3 answers

My answer is in addition to the two previous comments.

You have indexed half of your table. But if you look at some of the indexes (publish, deleted, movedToWordPress), you will notice that those columns only ever hold 1 or 0, so their selectivity (the number of distinct values in the column divided by the number of rows) is very low. Those indexes are a waste of space.
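
As a rough check, assuming the table name from the question, something like the following shows how few distinct values those flag columns really have:

  SELECT COUNT(DISTINCT publish)          AS distinct_publish,
         COUNT(DISTINCT deleted)          AS distinct_deleted,
         COUNT(DISTINCT movedToWordPress) AS distinct_moved,
         COUNT(*)                         AS total_rows
  FROM posts;
  -- Two distinct values over ~70,000 rows means the optimizer will rarely
  -- bother using those indexes; SHOW INDEX FROM posts reports similar
  -- information in its Cardinality column.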

Some things also do not make sense. tinyint(4) does not actually make it a four-digit integer; the number is just the display width. A tinyint is 1 byte, so it has 256 possible values. I guess something went wrong there.
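
A tiny illustration of that point, using a hypothetical throwaway table; how out-of-range values are handled depends on your sql_mode (strict mode rejects them, non-strict clamps them with a warning):

  CREATE TABLE tinyint_demo (
    a TINYINT(4),
    b TINYINT(1)
  );
  -- Both columns accept the full TINYINT range (-128..127); the (4) and (1)
  -- are only display widths (relevant with ZEROFILL), not size limits.
  INSERT INTO tinyint_demo VALUES (127, 127);   -- fine for both columns
  INSERT INTO tinyint_demo VALUES (128, 128);   -- out of range for both columns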

Also, 10 gigs for just 75 thousand records? How did you measure the size? And what hardware are you running on?
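
If you have not measured it from the server itself yet, here is a sketch of one way to do it via information_schema (the schema name is an assumption, replace it with yours):

  SELECT table_name,
         ROUND(data_length  / 1024 / 1024 / 1024, 2) AS data_gb,
         ROUND(index_length / 1024 / 1024 / 1024, 2) AS index_gb
  FROM information_schema.TABLES
  WHERE table_schema = 'your_database'   -- assumption: put your schema name here
    AND table_name   = 'posts';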

Edit regarding your updated question:

There are many ways to scale databases. I will link one question/answer so that you can see what is possible: here it is. Another thing you can do is get better hardware. Typically, the reason a database gets slow as it grows is the disk subsystem and the amount of memory left to work with the data set. The more RAM you have, the faster everything will be.

Another thing you could do is split the table in two, so that one table contains the text data and the other contains the data your system needs for searching and comparisons (that is where the integer fields would go). Using InnoDB, you would get a huge performance boost if the two tables are connected by a foreign key pointing at the primary key. Since InnoDB primary key lookups are fast, this opens up several new possibilities for what you can do with your data set. If your data keeps growing, you can add enough RAM and InnoDB will try to buffer the data set in memory. There is also an interesting thing called HandlerSocket that does some neat magic on servers that have enough RAM and use InnoDB.
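
A very rough sketch of that split, with hypothetical table and column names, assuming InnoDB and keeping only a few of the original columns for brevity:

  -- posts_meta holds the small, searchable fields; posts_content holds the bulky text.
  CREATE TABLE posts_meta (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    thread_id INT UNSIGNED NOT NULL,
    publish   TINYINT(1)   NOT NULL,
    deleted   TINYINT(1)   NOT NULL,
    views     INT          DEFAULT NULL,
    createdOn TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY thread_id (thread_id)
  ) ENGINE=InnoDB;

  CREATE TABLE posts_content (
    post_id INT UNSIGNED NOT NULL,
    content MEDIUMTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
    PRIMARY KEY (post_id),
    CONSTRAINT fk_posts_content_post
      FOREIGN KEY (post_id) REFERENCES posts_meta (id)
  ) ENGINE=InnoDB;

  -- Typical lookup: filter on the narrow table, join the text only when needed.
  SELECT m.id, m.createdOn, c.content
  FROM posts_meta m
  JOIN posts_content c ON c.post_id = m.id
  WHERE m.thread_id = 123 AND m.publish = 1;

The narrow table stays small enough to index, scan and back up quickly, while the bulky text sits off to the side and is only read when you actually need it.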

In the end, it really comes down to what you need to do with the data and how you do it. Since you have not mentioned that, it is hard to say what you should do. My first step towards optimization would certainly be to tune the MySQL instance and to back this large table up.

+6

I think you need to change some columns.

You can start by reducing the varchar columns.

image_src / video_src / video_image_src: VARCHAR(500) is too much, I think (VARCHAR(100) should be enough, I would say).

thread_title is TEXT but should probably be VARCHAR(200?); the same goes for section_title.

OK, and here is your problem: content is LONGTEXT.

Do you really need LONGTEXT? LONGTEXT can take up to 4 GB of space. I think if you change this column to TEXT, the table will be much smaller.

  TINYTEXT          255 bytes
  TEXT           65,535 bytes  (~64 KB)
  MEDIUMTEXT 16,777,215 bytes  (~16 MB)
  LONGTEXT 4,294,967,295 bytes  (~4 GB)
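
If you go that route, the changes might look roughly like the sketch below. The sizes are assumptions, so check your real data first (for example with SELECT MAX(CHAR_LENGTH(image_src)) FROM posts) and take a backup, since ALTER TABLE rebuilds the whole table:

  ALTER TABLE posts
    MODIFY image_src       varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
    MODIFY video_src       varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
    MODIFY video_image_src varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
    MODIFY thread_title    varchar(200) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
    MODIFY section_title   varchar(200) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
    -- TEXT tops out at ~64 KB per value; the update above mentions records of
    -- about 300 KB, so MEDIUMTEXT may be the smallest type that is actually safe here.
    MODIFY content         mediumtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;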

Edit: I see that you are using a full-text index. I am pretty sure it is storing a lot and a lot of data. You should consider a separate mechanism for full-text search.

+2

In addition to what Michael commented on, slowness can also come down to how well the queries are optimized and whether they have the right indexes to match. I would try to find some of the guilty queries that take longer than you expect and post them here on S/O to see if anyone can help optimize them.
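
One possible way to find those queries, assuming you have the privileges to change server settings (the exact values here are just examples):

  SET GLOBAL slow_query_log = 'ON';
  SET GLOBAL long_query_time = 1;   -- log anything that takes longer than 1 second
  -- Then, for a suspect statement, look at how MySQL plans to execute it:
  EXPLAIN SELECT id, subject
  FROM posts
  WHERE forum_name = 'example' AND publish = 1;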

0

Source: https://habr.com/ru/post/891098/
