Is MongoDB or Cassandra better than MySQL for large datasets?

Our (currently MySQL) database contains over 120 million records, and we make frequent use of complex JOIN queries and application-level logic in PHP that touch the database. We are a marketing company whose main focus is data mining, so we have many large reports that need to be run daily, weekly or monthly.

At the same time, customer service runs on a replicated slave of the same database.

We would like these reports to be generated in real time on the web, instead of manually creating spreadsheets for them. However, many of our reports take a considerable amount of time to pull the data (in some cases more than an hour).

We do not operate in the cloud; instead we run on two physical servers in our server room.

Given all this, what is our best option for a database?

+6
3 answers

I think you are going about this the wrong way.

The idea that throwing NoSQL at the problem will give you better performance is not actually true. At the lowest level, you are writing and retrieving a fair chunk of data. This means that your bottleneck is (most likely) hard drive I/O, which is the usual bottleneck.

Sticking with the hardware you have at the moment and using a monolithic data store is not scalable and, as you have noticed, has consequences when you want to do something in real time.

What are your options? You need to scale your server and software setup (you would have to do this with any NoSQL system anyway, for example by putting in faster hard drives at some point). You might also want to look at alternative storage engines besides MyISAM and InnoDB; for example, TokuDB is one of the better engines that appear to turn random I/O into sequential I/O.

A faster disk subsystem would also help (FusionIO, if you have the resources to get it).

Without more information from your end (what the server setup is, what version of MySQL you use, and which storage engines and data sizes you work with), this is all speculation.

+9

Cassandra still needs Hadoop for MapReduce, and MongoDB has limited concurrency for MapReduce ...

... So...

... 120 million records is not that many, and MySQL should handle it easily. My guess is that this is an I/O bottleneck, or that you are doing a lot of random reads instead of sequential reads. I would rather hire a MySQL specialist for a month or so to tune your schema and queries than invest in a new solution.
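To make the random-versus-sequential point concrete, here is a rough back-of-envelope sketch. Only the record count comes from the question; the row size, the sequential throughput and the random reads per second are assumed round numbers for a single spinning disk, so treat the result as an order-of-magnitude illustration rather than a measurement of anyone's hardware.

```python
# Back-of-envelope comparison; every hardware figure below is an assumption.
ROWS = 120_000_000          # record count from the question
ROW_BYTES = 200             # assumed average row size
SEQ_BYTES_PER_SEC = 100e6   # assumed sequential throughput of one spinning disk
RANDOM_IOPS = 150           # assumed random reads per second of one spinning disk


def sequential_scan_seconds(rows, row_bytes, bytes_per_sec):
    """Time to stream the whole table front to back."""
    return rows * row_bytes / bytes_per_sec


def random_read_seconds(rows, iops):
    """Worst case: every row costs one separate disk seek."""
    return rows / iops


seq = sequential_scan_seconds(ROWS, ROW_BYTES, SEQ_BYTES_PER_SEC)
rnd = random_read_seconds(ROWS, RANDOM_IOPS)
print(f"sequential scan: {seq / 60:.0f} minutes")
print(f"one seek per row: {rnd / 86400:.1f} days")
```

Real queries fall somewhere between these two extremes, but under these assumptions the gap is large enough that changing the access pattern matters far more than changing the database product.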

If you provide more information about your cluster, we may be able to help you better. "NoSQL" by itself is not the solution to your problem.

+9

As much as I'm not a fan of MySQL once your data gets big, I have to say that you are nowhere near needing to switch to a NoSQL solution. 120M rows is not a big deal: the database I'm working on now has ~600M rows in one table, and we query it efficiently. Managing that much data from an ops perspective is the problem; querying it is not.

It's all about proper indexes and using them correctly when joining, and secondarily about memory settings. Find your slow queries (the MySQL slow query log FTW!) and learn how to use the EXPLAIN keyword to understand why they are slow. Then tune your indexes to make your queries efficient. Also, make sure you understand the MySQL memory settings. The docs have great pages explaining how they work, and they are not that hard to understand.
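As a minimal sketch of the "explain, then index" loop, the snippet below uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in that runs anywhere. It is not MySQL: there you would run `EXPLAIN SELECT ...` against the real tables, and the output format differs. The `customers` and `orders` tables, the query and the index name are all hypothetical.

```python
import sqlite3

# In-memory SQLite as a stand-in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

# Hypothetical report query that joins the two tables.
query = """
    SELECT c.region, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = 42
    GROUP BY c.region
"""


def plan(connection, sql):
    """Return the human-readable steps of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql)
    return [row[3] for row in rows]  # column 3 holds the detail text


before = plan(conn, query)

# Add an index on the join column, then ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)

print("before:", before)
print("after: ", after)
```

The thing to look for is whether the plan for `orders` mentions the new index after it is created. The same habit applies on MySQL: run EXPLAIN on each query from the slow query log, add or adjust an index on the columns used for joining and filtering, and check the plan again.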

If you have done both of these things and you still have problems, make sure that disk I/O is not the issue. Only then should you look at another solution for querying your data.

NoSQL solutions like Cassandra have many advantages. Cassandra is fantastic at writing data. Scaling your writes is easy: just add more nodes! But the trade-off is that it is harder to get the data back out. From a cost perspective, if you have MySQL expertise in house, it's probably best to leverage that and scale your current solution until it hits a limit before completely switching the underlying architecture.

+4

Source: https://habr.com/ru/post/903846/

