Optimizing write performance for an AWS Aurora instance

I have an AWS Aurora DB cluster whose workload is about 99.9% writes. At peak it will be executing 2-3k writes/sec.

I know that Aurora is somewhat optimized for writes by default, but I wanted to ask, as a relative newcomer to AWS: what are the best practices / tips for write performance with Aurora?

+7
2 answers

In my experience, Amazon Aurora is not a good fit for databases with high write traffic. At least as implemented around 2017; perhaps it has improved since.

In early 2017 I ran some benchmarks for a write-intensive application, and we found that RDS (non-Aurora) far outperformed Aurora in write performance, given our application and database. In fact, Aurora was two orders of magnitude slower than RDS. Amazon's high-performance claims for Aurora appear to be pure marketing.

In November 2016 I attended the Amazon re:Invent conference in Las Vegas. I tried to find a knowledgeable Aurora engineer to answer my questions about performance. All I could find were junior engineers who had been told to repeat the claim that Aurora is magically 5-10 times faster than MySQL.

In April 2017 I attended the Percona Live conference and saw a presentation on how to build a distributed storage architecture like Aurora's using standard MySQL with Ceph as the open-source distributed storage layer. There is a webinar on the same topic: https://www.percona.com/resources/webinars/mysql-and-ceph , co-presented by Yves Trudeau, the engineer I saw speak at the conference.

What became clear from running MySQL on Ceph was that the engineers had to disable the MySQL change buffer, because changes to secondary indexes could not be buffered while the storage was distributed. This caused huge write-performance problems on tables with secondary (non-unique) indexes.

This matched the performance problems we saw when benchmarking our application against Aurora: our database had many secondary indexes.

So if you absolutely must use Aurora for a database with high write traffic, I recommend dropping all of your secondary indexes first.
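A minimal sketch of that step, assuming you already have the output of MySQL's `SHOW INDEX FROM <table>` reduced to `(table, index_name, non_unique)` tuples; the helper name `drop_secondary_index_sql` is my own, not from the answer:

```python
# Hypothetical sketch: generate DROP INDEX statements for the non-unique
# secondary indexes of a table, given rows shaped like SHOW INDEX output
# reduced to (table, index_name, non_unique) tuples.
def drop_secondary_index_sql(index_rows):
    statements = []
    seen = set()
    for table, index_name, non_unique in index_rows:
        if index_name == "PRIMARY":   # never drop the clustered primary key
            continue
        if non_unique == 0:           # keep unique indexes (they are constraints)
            continue
        key = (table, index_name)
        if key in seen:               # multi-column indexes repeat, one row per column
            continue
        seen.add(key)
        statements.append(f"ALTER TABLE `{table}` DROP INDEX `{index_name}`;")
    return statements


rows = [
    ("orders", "PRIMARY", 0),
    ("orders", "idx_customer", 1),
    ("orders", "idx_customer", 1),   # second column of the same index
    ("orders", "uq_order_ref", 0),   # unique: left in place
]
print(drop_secondary_index_sql(rows))
```

Only the non-unique secondary index is targeted; the primary key and unique constraints survive.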

Obviously this is a problem if those indexes are needed to optimize some of your queries. Not only SELECT queries but also some UPDATE and DELETE queries can use secondary indexes.

One strategy might be to make a non-Aurora read replica of your Aurora cluster and create the secondary indexes only on that replica to support your SELECT queries. I have never done this, but it is apparently possible, according to https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/

But this still does not help in cases where your UPDATE/DELETE statements need secondary indexes. I have no suggestion for that scenario; you may be out of luck.

My conclusion is that I would not choose Aurora for a write-intensive application. Maybe that will change in the future.

+24

I had a fairly positive experience with Aurora, for my use case. I believe (some time has passed) that we were pushing close to 20,000 DML operations per second on the largest instance type (db.r3.8xlarge, I think?). Apologies for the vagueness; I no longer have access to the metrics for that particular system.

What we did:

This system did not require an "immediate" response to each insert, so writes were queued to a separate process. That process would collect N queries and break them into M batches, where each batch correlated with a target table. Each batch would be wrapped in a single TXN.

We did this to get the write efficiency of bulk writes and to avoid locking across tables. There were 4 separate (if I recall correctly?) processes doing this queue-and-write behavior.
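The queue-and-batch step described above might be sketched like this; the names (`make_batches`, `flush`, the `conn_execute` callback) are my own, since the answer does not show its code:

```python
from collections import defaultdict


def make_batches(queue, n):
    """Pull up to n queued writes, then split them into per-table batches.

    Each queued item is a (table, sql) pair; each returned batch targets a
    single table, so it can be executed in one transaction without taking
    locks across tables.
    """
    taken = queue[:n]
    del queue[:n]
    by_table = defaultdict(list)
    for table, sql in taken:
        by_table[table].append(sql)
    return list(by_table.values())


def flush(conn_execute, queue, n):
    """Execute each per-table batch inside its own transaction."""
    for batch in make_batches(queue, n):
        conn_execute("BEGIN")
        for sql in batch:
            conn_execute(sql)
        conn_execute("COMMIT")


queue = [("orders", "INSERT ... 1"), ("users", "INSERT ... 2"),
         ("orders", "INSERT ... 3")]
log = []
flush(log.append, queue, 10)   # log.append stands in for a real DB cursor
print(log)
```

In the real system there would be several such writer processes draining a shared queue; this sketch only shows the grouping and one-TXN-per-batch shape.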

Because of the high write load, it was absolutely necessary for us to move all reads to the read replica, since the primary typically sat at 50-60% CPU. We validated this architecture beforehand simply by spinning up processes that wrote random data, simulating the general behavior of the system before committing the real application to it.
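A toy version of that pre-launch load test, with an in-memory SQLite database standing in for the Aurora endpoint (the real test would point the same loop at the cluster, typically from several processes at once):

```python
import random
import sqlite3
import string
import time

# SQLite stands in for the Aurora writer endpoint in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")


def random_payload(k=32):
    return "".join(random.choices(string.ascii_letters, k=k))


def simulate_writer(conn, batch_size, batches):
    """One synthetic writer: insert `batches` batches of random rows,
    one transaction per batch, mimicking the queue-and-batch writers."""
    for _ in range(batches):
        rows = [(random_payload(),) for _ in range(batch_size)]
        with conn:  # one transaction per batch
            conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)


start = time.perf_counter()
simulate_writer(conn, batch_size=500, batches=20)
elapsed = time.perf_counter() - start
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(f"{total} rows in {elapsed:.3f}s ({total / elapsed:,.0f} rows/sec)")
```

The rows/sec figure from a local SQLite run means nothing for Aurora, of course; the point is the shape of the harness, which you aim at the real cluster.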

Almost all writes were INSERT ... ON DUPLICATE KEY UPDATE, and the tables had a number of secondary indexes.
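For writes of that shape, a multi-row statement can be assembled along these lines. This is a hypothetical helper (not from the answer), using the classic MySQL `VALUES()` upsert syntax and `%s` driver placeholders; it builds the SQL string for a single `execute` call with a flattened parameter list:

```python
def upsert_sql(table, columns, update_columns, row_count):
    """Build a multi-row MySQL INSERT ... ON DUPLICATE KEY UPDATE statement.

    Values are left as %s placeholders, so the caller passes the row data
    (flattened) to the driver rather than interpolating it into the SQL.
    """
    placeholders = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([placeholders] * row_count)
    updates = ", ".join(f"{c} = VALUES({c})" for c in update_columns)
    return (f"INSERT INTO {table} ({', '.join(columns)}) "
            f"VALUES {values} ON DUPLICATE KEY UPDATE {updates}")


sql = upsert_sql("counters", ["id", "n"], ["n"], row_count=2)
print(sql)
```

One multi-row upsert per batch is what makes the bulk write cheap; note that `VALUES()` in the UPDATE clause is the long-standing MySQL form (newer MySQL versions also offer a row-alias syntax).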

I suspect this approach worked for us simply because we could tolerate a delay between when information arrived in the system and when readers actually needed it, which let us batch in much larger sizes. YMMV.

+3
source

Source: https://habr.com/ru/post/1272093/
