I have a baseball tool that lets users analyze a player's statistics over a historical timeframe. For example: how many hits does A-Rod have in the last 7 days during night games? I want to expand the timeframe so that a user can analyze a player's batting statistics as far back as 365 days. However, doing so requires some serious performance optimization. Here is my current set of models:

    class AtBat < ActiveRecord::Base
      belongs_to :batter
      belongs_to :pitcher
      belongs_to :weather_condition

      ### DATA MODEL ###
      # id
      # batter_id
      # pitcher_id
      # weather_condition_id
      # hit (boolean)
      ##################
    end

    class BattingStat < ActiveRecord::Base
      belongs_to :batter
      belongs_to :recordable, :polymorphic => true # eg, Batter, Pitcher, WeatherCondition

      ### DATA MODEL ###
      # id
      # batter_id
      # recordable_id
      # recordable_type
      # hits7
      # outs7
      # at_bats7
      # batting_avg7
      # ...
      # hits365
      # outs365
      # at_bats365
      # batting_avg365
      ##################
    end

    class Batter < ActiveRecord::Base
      has_many :batting_stats, :as => :recordable, :dependent => :destroy
      has_many :at_bats, :dependent => :destroy
    end

    class Pitcher < ActiveRecord::Base
      has_many :batting_stats, :as => :recordable, :dependent => :destroy
      has_many :at_bats, :dependent => :destroy
    end

    class WeatherCondition < ActiveRecord::Base
      has_many :batting_stats, :as => :recordable, :dependent => :destroy
      has_many :at_bats, :dependent => :destroy
    end

To keep my question a reasonable length, let me describe what I'm doing to update the batting_stats table rather than paste a bunch of code. Let's start with 7 days:
1. Get all at_bat entries from the last 7 days.
2. Iterate over each at_bat entry.
3. Given an at_bat entry, take its associated batter and its associated weather condition, find the corresponding batting_stat entry (BattingStat.find_or_create_by_batter_and_recordable(batter, weather_condition)), and then update that batting_stat entry.
4. Repeat step 3 for the batter and the pitcher (the other recordables).
Steps 1-4 are repeated for other time periods - 15 days, 30 days, etc.
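
To make the procedure concrete, here is a minimal plain-Ruby sketch of the steps above. It is not my actual script: it assumes each at-bat carries a date (the hypothetical `played_on` field, which is not in the data model listed above), uses a `Struct` in place of the ActiveRecord model, and keeps the counters in a Hash instead of the batting_stats table. The names `WINDOWS` and `rebuild_stats` are made up for illustration.

```ruby
require "date"

# Hypothetical in-memory stand-in for a row of the at_bats table.
# `played_on` is an assumed date field; it is not in the data model above.
AtBat = Struct.new(:batter_id, :pitcher_id, :weather_condition_id, :hit, :played_on)

# Time periods (in days) that get their own set of counter columns.
WINDOWS = [7, 15, 30].freeze

# Rebuilds the counters per (batter, recordable) pair,
# making one full pass over the at-bats for every window.
def rebuild_stats(at_bats, today, windows = WINDOWS)
  # Key: [batter_id, recordable_type, recordable_id]; value: counter columns.
  stats = Hash.new { |hash, key| hash[key] = Hash.new(0) }

  windows.each do |days|
    # Step 1: all at-bats that fall inside the window.
    recent = at_bats.select do |ab|
      age = (today - ab.played_on).to_i
      age >= 0 && age < days
    end

    # Step 2: iterate over each at-bat.
    recent.each do |ab|
      # Steps 3 and 4: one batting_stat row per recordable.
      recordables = [
        ["WeatherCondition", ab.weather_condition_id],
        ["Batter", ab.batter_id],
        ["Pitcher", ab.pitcher_id]
      ]
      recordables.each do |type, id|
        row = stats[[ab.batter_id, type, id]]
        row["at_bats#{days}"] += 1
        row[ab.hit ? "hits#{days}" : "outs#{days}"] += 1
        row["batting_avg#{days}"] = row["hits#{days}"].fdiv(row["at_bats#{days}"])
      end
    end
  end

  stats
end
```

In this sketch the whole set of at-bats is scanned once per window, so the amount of work grows with every time period that is added.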
Now imagine how painful it would be to run this script every day to make these updates if I were to increase the time periods from a manageable 7/15/30 to 7/15/30/45/60/90/180/365.
So my question is: how would you approach this to get the best possible performance?