Setup
OK, I am running a Rails application on Heroku (free tier).
I have 2 separate deployments of the app; let's call them Staging and Fake-Production.
In Staging, I use WEBrick as the server. My Procfile is
web: rails s -p $PORT
In Fake-Production, I use Puma as a server. My Procfile is
web: bundle exec puma -C config/puma.rb
I configured Puma to run with 2 workers and 1 thread per worker. My config/puma.rb is shown below (taken from the Heroku Puma web server setup guide):
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 1)
threads threads_count, threads_count
preload_app!
rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || 'development'
on_worker_boot do
My database.yml is configured with a connection pool of 20.
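To make the numbers in this setup easier to reason about, here is a small Ruby sketch of the arithmetic. It assumes that each Puma worker is a separate process with its own ActiveRecord connection pool, so the pool only has to cover the threads inside a single worker. The helper name `puma_capacity` is made up for illustration and is not part of Puma or Rails.

```ruby
# Hypothetical helper: summarizes how many requests a Puma setup can
# work on at once and whether the per-process database pool covers it.
# Assumption: each worker is its own process with its own pool.
def puma_capacity(workers:, threads:, pool:)
  {
    max_concurrent_requests: workers * threads,
    connections_needed_per_worker: threads,
    pool_sufficient: pool >= threads
  }
end

# Worker/thread combinations mentioned in this question, with pool = 20.
[[2, 1], [4, 5], [5, 5]].each do |workers, threads|
  puts puma_capacity(workers: workers, threads: threads, pool: 20)
end
```

Under these assumptions a pool of 20 is more than enough for every combination tried here, and the number of requests handled at the same time is capped at workers times threads.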
Test
For load testing, I ran the ApacheBench tool from my laptop against an API endpoint. The endpoint makes a very simple database query and returns a fixed number of records (the data does not change between requests).
I hit both deployments with the following command:
ab -n 1000 -c 100 https://<some heroku endpoint>?access_token=f73f50514c
Results
The results are the surprising part. I expected the Puma deployment to completely destroy the WEBrick deployment, but in reality they were almost the same. I tried different API endpoints, as well as different combinations of Puma workers and threads (at one point 4 workers and 5 threads), and still saw no visible improvement.
WEBrick results
Server Software: WEBrick/1.3.1
Server Hostname: webbrick-build.herokuapp.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES128-SHA,2048,128
Document Path: /api/v1/packages?access_token=f73f50514c6
Document Length: 488 bytes
Concurrency Level: 100
Time taken for tests: 21.484 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 995000 bytes
HTML transferred: 488000 bytes
Requests per second: 46.55 [
Memory impact
source=web.1 dyno=heroku.1234567899 sample#memory_total=198.41MB sample#memory_rss=197.60MB sample#memory_cache=0.30MB sample#memory_swap=0.51MB sample#memory_pgpgin=103879pages sample#memory_pgpgout=53216pages
Puma results (more or less the same regardless of the number of workers / threads)
Server Software: Cowboy
Server Hostname: puma-build.herokuapp.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES128-SHA,2048,128
Document Path: /api/v1/packages?access_token=fb7168c147adc2ccd83b2
Document Length: 489 bytes
Concurrency Level: 100
Time taken for tests: 23.299 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 943000 bytes
HTML transferred: 489000 bytes
Requests per second: 42.92 [
Memory Impact (4 workers, 5 threads)
source=web.1 dyno=heroku.1234567890 sample#memory_total=406.75MB sample#memory_rss=406.74MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=151515pages sample#memory_pgpgout=47388pages
Across runs, the Puma deployment is sometimes faster than WEBrick and sometimes slower (as in the output shown above). Even when it is faster, the gain is small, probably only 1-5 requests/sec.
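As a sanity check on the two reports, the Ruby sketch below recomputes throughput from the "Complete requests" and "Time taken for tests" figures, and also derives the average time a request spends in flight at concurrency 100. The method names are made up for illustration, and the latency figure assumes that ab keeps all 100 connections busy for the whole run.

```ruby
# Figures copied from the ApacheBench reports in this question.
RUNS = {
  "WEBrick" => { requests: 1000, seconds: 21.484 },
  "Puma"    => { requests: 1000, seconds: 23.299 }
}.freeze
CONCURRENCY = 100 # the -c value passed to ab

# Throughput: completed requests divided by wall-clock time.
def requests_per_second(run)
  run[:requests] / run[:seconds]
end

# Average time one request is in flight when `concurrency` requests
# are always outstanding (concurrency / throughput).
def mean_latency_seconds(run, concurrency)
  concurrency / requests_per_second(run)
end

RUNS.each do |name, run|
  rps = requests_per_second(run)
  latency = mean_latency_seconds(run, CONCURRENCY)
  puts format("%-8s %.2f req/s, ~%.2f s per request", name, rps, latency)
end
```

The recomputed throughput matches the reported 46.55 and 42.92 requests/sec, and in both cases a single request takes a little over 2 seconds on average as seen from the laptop.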
My question is: what am I doing wrong? Is my database pool somehow to blame? Am I benchmarking this incorrectly? Am I using Puma incorrectly?
EDIT:
Highest load average seen for Puma (5 workers, 5 threads each)
source=web.1 dyno=heroku.123456789 sample#load_avg_1m=2.98
Most of the time, however, it is either 0.00 or below 0.1.
In addition, the only code that is called in the controller is:
@package = Package.all
Right after that, the JSON response defined in a HAML template is rendered.
Btw, Package.all only returns about 5 records.
EDIT 2:
Unicorn results
Set up Unicorn following a guide, running 3 Unicorn workers.
Server Software: Cowboy
Server Hostname: unicorn-build.herokuapp.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES128-SHA,2048,128
Document Path: /api/v1/packages?access_token=f73f50514c6b8a3ea
Document Length: 488 bytes
Concurrency Level: 100
Time taken for tests: 22.311 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 942000 bytes
HTML transferred: 488000 bytes
Requests per second: 44.82 [
One thing I've noticed is that running the same load test command several times gives different "requests per second" figures. This applies to both Unicorn and Puma. For both, the best runs are around 48-50 requests per second and the worst are around 25-33.
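Since repeated runs vary this much, comparing a single run per server can be misleading. A minimal Ruby sketch for summarizing several runs per deployment follows. The `summarize` helper and the sample values are hypothetical: the numbers were only chosen to fall inside the ranges described above and are not measurements.

```ruby
# Summarize requests/sec figures from repeated ab runs so deployments
# are compared on their spread, not on one run each.
def summarize(samples)
  mean = samples.sum / samples.size.to_f
  variance = samples.sum { |s| (s - mean)**2 } / samples.size
  {
    min: samples.min,
    max: samples.max,
    mean: mean.round(2),
    stddev: Math.sqrt(variance).round(2) # population standard deviation
  }
end

# HYPOTHETICAL sample values, picked to sit inside the ranges
# mentioned above (best ~48-50, worst ~25-33); not measured data.
example_runs = [49.0, 44.0, 31.0, 48.0, 28.0]
puts summarize(example_runs)
```

If the spread within one server's runs is larger than the gap between the servers' means, the runs do not show a real difference between them.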
In any case, this still makes no sense to me. Why aren't Puma and Unicorn crushing WEBrick?