Puma or Unicorn vs WEBrick performance test shows no improvement

Question

Setup

OK, I am running a Rails application on Heroku (free tier).

I have 2 separate deployments of the app; let me call them Staging and Fake-Production.

In Staging, I use WEBrick as the server. My Procfile is

 web: rails s -p $PORT

In Fake-Production, I use Puma as a server. My Procfile is

 web: bundle exec puma -C config/puma.rb

I configured Puma to run with 2 workers and 1 thread per worker. My config/puma.rb is shown below (taken from Heroku's Puma web server setup guide):

 workers Integer(ENV['WEB_CONCURRENCY'] || 2)
 threads_count = Integer(ENV['MAX_THREADS'] || 1)
 threads threads_count, threads_count
 preload_app!
 rackup      DefaultRackup
 port        ENV['PORT']     || 3000
 environment ENV['RACK_ENV'] || 'development'
 on_worker_boot do
   # Worker specific setup for Rails 4.1+
   # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
   ActiveRecord::Base.establish_connection
 end

My database.yml is configured with a connection pool of 20.
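To put the pool size in context, here is a minimal sketch of how many database connections one dyno might open under the settings described above. It assumes each Puma worker is a separate process with its own ActiveRecord pool and that each thread typically holds at most one connection at a time; `connection_estimate` is a hypothetical helper, not part of Puma or Rails.

```ruby
# Hypothetical helper: rough estimate of database connections opened by one dyno.
# Assumption: every worker process has its own pool, and a thread checks out
# at most one connection at a time.
def connection_estimate(workers:, threads:, pool:)
  per_worker = [threads, pool].min # a worker cannot use more than its pool allows
  {
    per_worker: per_worker,
    per_dyno: workers * per_worker,
    unused_pool_slots_per_worker: pool - per_worker
  }
end

# The two configurations mentioned in this post.
puts connection_estimate(workers: 2, threads: 1, pool: 20)
puts connection_estimate(workers: 4, threads: 5, pool: 20)
```

Under these assumptions a pool of 20 is far larger than a worker with 1 to 5 threads can use, so the pool itself is unlikely to be the limiting factor. The per-dyno total is the number to compare against whatever connection limit the database plan imposes.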

Test

To do load testing, I used the ApacheBench tool from my laptop to hit an API endpoint. The API basically makes a very simple database query that returns a fixed number of records that do not change.

I hit both deployments with the following command:

 ab -n 1000 -c 100 https://<some heroku endpoint>?access_token=f73f50514c

Results

This is where it gets surprising. I expected the Puma deployment to completely destroy the WEBrick deployment, but in reality the two were almost the same. I tried different API endpoints, as well as different combinations of Puma workers and threads (at one point 4 workers and 5 threads), and yet there was no visible improvement.

WEBrick results

 Server Software:        WEBrick/1.3.1
 Server Hostname:        webbrick-build.herokuapp.com
 Server Port:            443
 SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128
 Document Path:          /api/v1/packages?access_token=f73f50514c6
 Document Length:        488 bytes
 Concurrency Level:      100
 Time taken for tests:   21.484 seconds
 Complete requests:      1000
 Failed requests:        0
 Total transferred:      995000 bytes
 HTML transferred:       488000 bytes
 Requests per second:    46.55 [#/sec] (mean)
 Time per request:       2148.360 [ms] (mean)
 Time per request:       21.484 [ms] (mean, across all concurrent requests)
 Transfer rate:          45.23 [Kbytes/sec] received
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:      714  1242 278.1   1214   2012
 Processing:   248   842 493.6    699   2883
 Waiting:      247   809 492.3    677   2876
 Total:       1072  2085 643.5   1929   4845
 Percentage of the requests served within a certain time (ms)
   50%   1929
   66%   2039
   75%   2109
   80%   2168
   90%   2622
   95%   3821
   98%   4473
   99%   4646
  100%   4845 (longest request)

Memory usage

 source=web.1 dyno=heroku.1234567899 sample#memory_total=198.41MB sample#memory_rss=197.60MB sample#memory_cache=0.30MB sample#memory_swap=0.51MB sample#memory_pgpgin=103879pages sample#memory_pgpgout=53216pages

Puma results (more or less the same regardless of the number of workers / threads)

 Server Software:        Cowboy
 Server Hostname:        puma-build.herokuapp.com
 Server Port:            443
 SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128
 Document Path:          /api/v1/packages?access_token=fb7168c147adc2ccd83b2
 Document Length:        489 bytes
 Concurrency Level:      100
 Time taken for tests:   23.299 seconds
 Complete requests:      1000
 Failed requests:        0
 Total transferred:      943000 bytes
 HTML transferred:       489000 bytes
 Requests per second:    42.92 [#/sec] (mean)
 Time per request:       2329.949 [ms] (mean)
 Time per request:       23.299 [ms] (mean, across all concurrent requests)
 Transfer rate:          39.52 [Kbytes/sec] received
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:      743  1304 283.9   1287   2092
 Processing:   253   951 740.3    684   5353
 Waiting:      253   898 729.0    627   5196
 Total:       1198  2255 888.0   1995   7426
 Percentage of the requests served within a certain time (ms)
   50%   1995
   66%   2085
   75%   2213
   80%   2444
   90%   3755
   95%   4238
   98%   5119
   99%   5437
  100%   7426 (longest request)

Memory usage (4 workers, 5 threads)

 source=web.1 dyno=heroku.1234567890 sample#memory_total=406.75MB sample#memory_rss=406.74MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=151515pages sample#memory_pgpgout=47388pages

Based on the above snippets, the Puma deployment is sometimes faster than WEBrick and sometimes slower (as in the run shown here). Even when it is faster, the gain is small, probably only 1-5 requests/sec.
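As a rough cross-check of these figures, ab's mean throughput is simply the concurrency level divided by the mean time per request. The sketch below plugs in the values from the two reports above; `throughput` is a hypothetical helper name, not something ab provides.

```ruby
# Mean "Time per request" values (in seconds) copied from the ab reports above.
CONCURRENCY = 100
MEAN_LATENCY_S = { webrick: 2.148360, puma: 2.329949 }.freeze

# Hypothetical helper: requests per second that `concurrency` clients can
# complete when each request takes `latency_s` seconds on average.
def throughput(concurrency, latency_s)
  concurrency / latency_s
end

MEAN_LATENCY_S.each do |server, latency|
  puts format("%-8s %.2f req/s", server, throughput(CONCURRENCY, latency))
end
```

This reproduces the "Requests per second" lines of both reports. Read this way, with 100 concurrent clients and roughly 2 seconds per request, about 45 requests/sec is the most ab can observe unless the per-request time itself drops.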

My question is: what am I doing wrong? Is my database pool somehow to blame? Am I comparing this wrong? Am I using Puma incorrectly?

EDIT:

Highest CPU utilization for Puma (5 workers and 5 threads each)

 source=web.1 dyno=heroku.123456789 sample#load_avg_1m=2.98

Most of the time, however, it is either 0.00 or less than 0.1.

In addition, the only code that is called in the controller is:

 @package = Package.all

Immediately after that, the JSON response defined in a HAML template is rendered.

Btw, Package.all only returns about 5 records.

EDIT 2:

Unicorn results

Set up Unicorn by following a setup guide. Running 3 Unicorn workers.

 Server Software:        Cowboy
 Server Hostname:        unicorn-build.herokuapp.com
 Server Port:            443
 SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128
 Document Path:          /api/v1/packages?access_token=f73f50514c6b8a3ea
 Document Length:        488 bytes
 Concurrency Level:      100
 Time taken for tests:   22.311 seconds
 Complete requests:      1000
 Failed requests:        0
 Total transferred:      942000 bytes
 HTML transferred:       488000 bytes
 Requests per second:    44.82 [#/sec] (mean)
 Time per request:       2231.135 [ms] (mean)
 Time per request:       22.311 [ms] (mean, across all concurrent requests)
 Transfer rate:          41.23 [Kbytes/sec] received
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:      846  1326 294.5   1304   2720
 Processing:   245   627 342.8    540   3061
 Waiting:      244   532 313.6    470   3057
 Total:       1232  1954 463.0   1874   4875
 Percentage of the requests served within a certain time (ms)
   50%   1874
   66%   2016
   75%   2161
   80%   2250
   90%   2466
   95%   2799
   98%   3137
   99%   3901
  100%   4875 (longest request)

One thing I've noticed is that running the same load test several times gives different "Requests per second" figures. This applies to both Unicorn and Puma. For both, the best runs reach around 48-50 requests per second and the worst around 25-33.
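One way to read the three ab reports side by side is to check how much of the mean total time falls in the Connect row, which ab measures before the request reaches the application server. The sketch below uses only the mean values from the "Connection Times" tables above. The assumption that the Connect figure for an https URL is dominated by network and TLS setup between the laptop and Heroku is mine and is not confirmed by the reports; `connect_share` is a hypothetical helper.

```ruby
# Mean values in milliseconds, copied from the "Connection Times" tables above.
REPORTS = {
  "WEBrick" => { connect: 1242, total: 2085 },
  "Puma"    => { connect: 1304, total: 2255 },
  "Unicorn" => { connect: 1326, total: 1954 }
}.freeze

# Hypothetical helper: percentage of the mean total time spent connecting.
def connect_share(report)
  (100.0 * report[:connect] / report[:total]).round(1)
end

REPORTS.each do |server, report|
  puts "#{server}: #{connect_share(report)}% of mean total time is Connect"
end
```

In all three runs more than half of the mean request time is spent in the Connect phase. If that phase really is outside the application server, swapping the server can only affect the remaining part, and a difference of that size is easily lost in the run-to-run variation described above.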

In any case, this still makes no sense. Why aren't Puma or Unicorn crushing WEBrick?

performance multithreading ruby ruby-on-rails heroku

Tikiboy Jan 27 '16 at 11:27

1 answer

Mihai-Andrei Dinculescu · Answer 1 · 2016-01-27T12:18:58+0000

I hope you have read Heroku's guide, Deploying Rails Applications with the Puma Web Server, closely.

My guess is that either your test setup minimizes the benefits of multithreading, or the HTTP server is bottlenecked by the SQL database.

Your API calls, especially if the database results are cached, may be CPU-intensive. Having 10 threads is no advantage when a single thread already uses 100% of the CPU. The overhead of managing threads can actually hurt in that case.

Multithreading is useful when your threads spend a lot of time waiting for resources (databases, files, etc.) instead of using the CPU.

The second possibility is that your HTTP server is limited by the database. Perhaps WEBrick is already going as fast as the database allows, which leaves no room for improvement from switching to a more efficient HTTP server.
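To illustrate the point about waiting, here is a minimal sketch in plain Ruby, outside Rails and Puma. It assumes `sleep` is a fair stand-in for a database or file wait; the method names are hypothetical. On MRI the global VM lock is released while a thread sleeps or blocks on I/O, so the waits can overlap.

```ruby
require "benchmark"

# Simulated request that spends its time waiting on an external resource.
# sleep stands in for a database or network call (an assumption for illustration).
def io_bound_request
  sleep 0.1
end

# Handle n requests one after another, as a single-threaded worker would.
def run_sequential(n)
  Benchmark.realtime { n.times { io_bound_request } }
end

# Handle n requests on n threads, so the waits overlap.
def run_threaded(n)
  Benchmark.realtime do
    Array.new(n) { Thread.new { io_bound_request } }.each(&:join)
  end
end

sequential = run_sequential(5)
threaded = run_threaded(5)
puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

The threaded run should finish in roughly the time of a single wait, while the sequential run takes about five times as long. If `io_bound_request` were pure Ruby computation instead, the threads would take turns on one core under MRI and the gap would largely disappear.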

You should give this comprehensive performance report a read.

You will notice that Puma is not one of the fastest Rails HTTP servers. If all you care about is speed, try Unicorn, or TorqueBox 4 if you use JRuby.

Here's a guide on how to set up Unicorn on Heroku.
