Reporting by grouped average value for a group of records

Question


My goal is to create a report showing the average garage occupancy (y axis) by day of the week and/or time of day. My data model is as follows:

Garage has_many Cars, and Garage has_many Appointments, through: :cars
Car has_many Appointments
Appointment has fields such as:
- picked_up_at (datetime)
- returned_at (datetime)

In addition, the garage has a capacity (integer) field, which is the maximum number of cars that fit in the garage.

If I have a list of appointments for the last 6 months and want to create a line graph with the x-axis showing each day of the week, divided into 4-hour intervals, and the y-axis showing the average % occupancy (number of cars in the garage / capacity) over the 6-month period for a given day/hour interval, how can I gather the data for that report?
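One way to picture the final aggregation step: assuming you already have occupancy samples as `[time, pct]` pairs (for example, one row per timestamp from a query), grouping them by weekday and 4-hour slot and averaging could be sketched in plain Ruby like this. `average_by_slot` and the input shape are hypothetical, not part of Rails or ActiveRecord.

```ruby
# Hypothetical input: [time, occupancy_pct] pairs, one per sampled timestamp.
def average_by_slot(samples)
  # Each slot accumulates [sum_of_pct, number_of_samples].
  sums = Hash.new { |h, k| h[k] = [0.0, 0] }
  samples.each do |time, pct|
    # Key: day of week (0 = Sunday) and 4-hour slot of the day (0..5).
    key = [time.wday, time.hour / 4]
    sums[key][0] += pct
    sums[key][1] += 1
  end
  # Average each slot over all samples that fell into it.
  sums.each_with_object({}) do |(key, (total, count)), result|
    result[key] = (total / count).round(2)
  end
end
```

Keys come out as `[wday, slot]`, so `[1, 0]` means Monday, 00:00 to 03:59; each key maps to one point on the line graph.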

E.g., a car is "In" from the moment one appointment's car is returned until the next appointment's pickup, and "Out" from an appointment's picked_up_at until its returned_at.
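A minimal sketch of the In/Out definition for a single car, assuming a plain Struct in place of the real Appointment model (the `Appointment` Struct and `in_garage_intervals` are hypothetical): the car is In between one appointment's `returned_at` and the next appointment's `picked_up_at`.

```ruby
# Hypothetical stand-in for the Appointment model (not ActiveRecord).
Appointment = Struct.new(:picked_up_at, :returned_at)

# For one car, return [from, to] pairs during which the car is "In":
# from one appointment's returned_at to the next appointment's picked_up_at.
# Time before the first pickup and after the last return is ignored, and
# every appointment except the last is assumed to have a returned_at.
def in_garage_intervals(appointments)
  sorted = appointments.sort_by(&:picked_up_at)
  sorted.each_cons(2).map do |prev, nxt|
    [prev.returned_at, nxt.picked_up_at]
  end
end
```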

I am having trouble tying these data points together in a way that reports them meaningfully and presents them well to the end user.

I am using Rails 4.1 and Ruby 2.0.

Edit: SQL Fiddle - http://sqlfiddle.com/#!9/a72fe/1


ruby sql ruby-on-rails activerecord postgresql

jackerman09 Dec 16 '15 at 20:24


1 answer

Erwin Brandstetter · Answer 1 · 2015-12-16T21:29:54+0000

This query does it all (adapted to your added fiddle):

    SELECT a.ts, g.*, round((a.ct * numeric '100') / g.capacity, 2) AS pct
    FROM  (
       SELECT ts, c.garage_id, count(*) AS ct
       FROM   generate_series(timestamp '2015-06-01 00:00'  -- lower and
                            , timestamp '2015-12-01 00:00'  -- upper bound of range
                            , interval '4h') ts
       JOIN   appointment a ON a.picked_up_at <= ts  -- incl. lower
                           AND (a.returned_at > ts OR a.returned_at IS NULL)  -- excl. upper bound
       JOIN   car c ON c.id = a.car_id
       GROUP  BY 1, 2
       ) a
    JOIN   garage g ON g.id = a.garage_id
    ORDER  BY 1, 2;

SQL Fiddle

If returned_at IS NULL, this query assumes the car is still in use. So NULL must not occur in any other case, or the calculation will be off.
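To catch data that violates this assumption, a hypothetical check along these lines could flag any open appointment (`returned_at` nil) that is not the latest one for its car. It is a sketch over plain Ruby objects; `suspicious_open_appointments` and the Struct are illustrative names, not part of the fiddle's schema or ActiveRecord.

```ruby
# Hypothetical stand-in for the Appointment model (not ActiveRecord).
Appointment = Struct.new(:car_id, :picked_up_at, :returned_at)

# A nil returned_at means "still in use", which is only plausible for the
# latest appointment of each car. Return the nil ones that are not the latest.
def suspicious_open_appointments(appointments)
  appointments.group_by(&:car_id).flat_map do |_car_id, appts|
    latest = appts.max_by(&:picked_up_at)
    appts.select { |a| a.returned_at.nil? && !a.equal?(latest) }
  end
end
```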

First, I build a time series with the convenient generate_series().
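For readers less familiar with Postgres, here is a rough Ruby analogue of what `generate_series(start, stop, interval '4h')` produces: every timestamp from start to stop in 4-hour steps, with stop included when it falls on a step. `time_series` is a hypothetical helper; it uses UTC `Time` values so daylight-saving shifts do not distort the steps.

```ruby
FOUR_HOURS = 4 * 60 * 60 # step in seconds

# Hypothetical helper: start, start + step, ... up to and including stop.
def time_series(start, stop, step_seconds)
  series = []
  ts = start
  while ts <= stop
    series << ts
    ts += step_seconds
  end
  series
end
```

For the range in the query, `time_series(Time.utc(2015, 6, 1), Time.utc(2015, 12, 1), FOUR_HOURS).size` is 1099: 183 days at 6 steps per day, plus the upper bound.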

Then join to appointments where the timestamp falls inside the booking.
I assume each appointment includes its lower bound and excludes its upper bound, as this is the common convention.
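The join condition amounts to a half-open interval test. A sketch in Ruby, assuming a plain Struct with the two timestamp fields (`in_progress_at?` is a hypothetical name):

```ruby
# Hypothetical stand-in for the Appointment model (not ActiveRecord).
Appointment = Struct.new(:picked_up_at, :returned_at)

# Mirrors the join condition: picked_up_at <= ts (lower bound included)
# and returned_at > ts (upper bound excluded); nil means "not returned yet".
def in_progress_at?(appointment, ts)
  appointment.picked_up_at <= ts &&
    (appointment.returned_at.nil? || appointment.returned_at > ts)
end
```

Because the upper bound is excluded, a car returned at 12:00 and picked up again at 12:00 is counted exactly once at the 12:00 sample.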

Aggregate and count before joining to garage (faster this way). Compare:
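The same "count first, join later" shape, sketched in plain Ruby under hypothetical inputs: `samples` holds `[ts, car_id]` pairs for appointments in progress, and two hashes map cars to garages and garages to capacities. Capacity is looked up once per `(ts, garage_id)` group rather than once per appointment row. `counts_then_join` is an illustrative name only.

```ruby
# samples:    [[ts, car_id], ...] for appointments in progress at ts
# car_garage: { car_id => garage_id }
# garages:    { garage_id => capacity }
def counts_then_join(samples, car_garage, garages)
  # Step 1: aggregate, like GROUP BY ts, garage_id in the subquery.
  counts = Hash.new(0)
  samples.each { |ts, car_id| counts[[ts, car_garage.fetch(car_id)]] += 1 }
  # Step 2: "join" the garage data onto the much smaller grouped result.
  counts.map do |(ts, garage_id), ct|
    [ts, garage_id, ct, garages.fetch(garage_id)]
  end
end
```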

Aggregate a single column in query with many columns

Compute the percentage in the outer SELECT.
I multiply the bigint count by numeric (alternatively real or float) to preserve fractional digits, which integer division would truncate. Then I round to two fractional digits.
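The truncation problem is easy to reproduce in Ruby, where `Integer#/` also truncates. This sketch uses `Rational` as a stand-in for Postgres `numeric`; the values 3 and 7 are made-up example numbers.

```ruby
ct = 3        # example: cars counted at one timestamp
capacity = 7  # example: garage capacity

# Integer division drops the fractional part entirely.
truncated = (ct * 100) / capacity            # => 42

# Rational keeps the exact fraction (similar in spirit to numeric in SQL);
# round(2) then keeps two fractional digits.
pct = Rational(ct * 100, capacity).round(2)  # => (2143/50)
pct_float = pct.to_f                         # => 42.86
```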

Note that this is not exactly the average percentage over each 4-hour period, but the current percentage at each point in time, which only approximates the true average. You might start with an odd timestamp like '2015-06-01 01:17' so the samples don't coincide with bookings that are likely to start or end on the full hour, or with anything else that could increase the approximation error.
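A small sketch that makes the approximation visible, assuming plain Struct appointments and a hypothetical `sampled_average` helper: it samples the number of appointments in progress every `step_seconds`, optionally shifted by `offset_seconds`, and averages the samples over one window.

```ruby
# Hypothetical stand-in for the Appointment model (not ActiveRecord).
Appointment = Struct.new(:picked_up_at, :returned_at)

# Approximate the average number of appointments in progress during
# [from, to) by sampling every step_seconds, starting offset_seconds after from.
def sampled_average(appointments, from, to, step_seconds, offset_seconds = 0)
  samples = []
  ts = from + offset_seconds
  while ts < to
    out = appointments.count do |a|
      a.picked_up_at <= ts && (a.returned_at.nil? || a.returned_at > ts)
    end
    samples << out
    ts += step_seconds
  end
  return nil if samples.empty? # offset pushed every sample past the window
  samples.inject(0, :+).fdiv(samples.size)
end
```

With a single appointment from 08:00 to 10:00 and the window 08:00 to 12:00, one 4-hour sample at 08:00 reports 1.0 car, while 10-minute sampling reports 0.5, which matches the true average.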

You can compute the exact average for 4-hour periods, but it is more complicated. A simple alternative is to shrink the interval to 10 minutes, or whatever granularity is fine enough to capture the full picture.
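For an exact figure, one option (not necessarily the approach in the related answer) is to sum each appointment's overlap with the window and divide by the window length. A sketch under the same Struct assumption; `exact_average` is a hypothetical name, and dividing its result by `capacity` and multiplying by 100 gives the percentage.

```ruby
# Hypothetical stand-in for the Appointment model (not ActiveRecord).
Appointment = Struct.new(:picked_up_at, :returned_at)

# Exact average number of appointments in progress during [from, to):
# total overlap time divided by the window length.
# A nil returned_at is treated as "still out" (clipped to the window end).
def exact_average(appointments, from, to)
  busy = appointments.inject(0.0) do |total, a|
    start = [a.picked_up_at, from].max
    stop  = [a.returned_at || to, to].min
    total + [stop - start, 0].max # no overlap contributes 0 seconds
  end
  busy / (to - from)
end
```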

Related (with example for exact calculation):

Reporting by grouped average value for a group of records
