In short, I am having problems supporting over 5,000 read requests per minute from data APIs using Postgresql, Node.js, and node-postgres. The bottleneck seems to be between the API and the database. Here are the implementation details.
I am using an AWS Postgresql RDS database instance (m4.4xlarge - 64 GB of memory, 16 vCPU, 350 GB SSD, without IOPS initialized) for the data API with Node.js data. By default, RDS max_connections = 5000. The node API is load balanced in two clusters with 4 processes each (2 Ec2s with 4 vCPUs that run the API with PM2 in cluster mode). I am using node-postgres to associate the API with Postgresql RDS and am trying to use its pooling function. The following is an example connection pool code:
var pool = new Pool({ user: settings.database.username, password: settings.database.password, host: settings.database.readServer, database: settings.database.database, max: 25, idleTimeoutMillis: 1000 }); pool.query('SELECT my_column FROM my_table', function(err, result){ });
Using this implementation and testing with a load tester, I can support about 5,000 requests in one minute, with an average response time of about 190 ms (which I expect). As soon as I run more than 5000 requests per minute, the response time increases to more than 1200 ms in the best cases, and in the worst case, the API starts to interrupt time often. Monitoring shows that for EC2 running the Node.js API, CPU utilization remains below 10%. Thus, I focus on the database and binding the API to the database.
I tried to increase (and decrease in this respect) the node-postgres max connection setting, but there was no change in the behavior of the API response / timeout. I also tried providing IOPS on RDS, but no improvement. In addition, interestingly, I scaled RDS to m4.10xlarge (160 GB of memory, 40 vCPU), and while the load of the RDS processor decreased significantly, the overall API performance deteriorated significantly (I couldnโt even support 5000 requests per minute that I could with less RDS).
In many ways, I am in the wrong territory and do not know how best to determine which of these moving parts is the performance of the API with a bottleneck with more than 5000 requests per minute. As already noted, I made various adjustments based on the analysis of Postgresql configuration documentation and node-postgres documentation, but to no avail.
If anyone has any tips on how to diagnose or optimize, I would really appreciate it.
UPDATE
After scaling to m4.10xlarge, I performed a series of load tests, changing the number of requests / min and the maximum number of connections in each pool. Here are some screenshots for monitoring:

