Algorithm for computing the odds of a team winning a sports match, given the full history

Question

Assumptions:

Teams never change
Teams do not improve in skill
The entire history of each team's performance against some subset of the other teams is known
The number of games between teams is large but potentially sparse (not every team has played every other team)

For instance:

I have a long list of match results that look like this:

Team A beats Team B
Team B beats Team A
Team A beats Team B
Team C beats Team A
Team A beats Team C

Problem:

Predict the correct betting odds for any team beating any other team.

In the above example, perhaps we conclude that A should beat B 66% of the time. This is based on direct observation and is fairly straightforward. However, finding the probability that C beats B seems harder. They have never played each other, but it seems most likely that C > B, with low confidence.
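To make the arithmetic concrete, here is a minimal sketch of the direct-observation estimate. The tuple encoding of results and the function name `direct_win_rate` are illustrative assumptions, not part of the question.

```python
from collections import Counter

# Match history from the example above, encoded as (winner, loser) pairs.
results = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A"), ("A", "C")]

# wins[(x, y)] = number of times x beat y
wins = Counter(results)


def direct_win_rate(x, y):
    """Fraction of head-to-head games that x won, or None if x and y never met."""
    played = wins[(x, y)] + wins[(y, x)]
    return wins[(x, y)] / played if played else None


print(direct_win_rate("A", "B"))  # A won 2 of the 3 games against B
print(direct_win_rate("C", "B"))  # None: C and B have no direct games
```

The `None` result for C versus B is exactly the gap the question is about: direct counting has nothing to say about pairs that never met.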

The research that I performed:

I have read a fair bit about various rating systems for games of skill, such as the Elo and Glicko rating systems for chess. They fall short because they make assumptions about the underlying probability distributions. For example, Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. However, according to Wikipedia, there are other distributions that fit the existing data better.

I do not want to assume a distribution. It seems to me that with 10,000 match results on hand, I should be able either to deduce the distribution from the evidence (I don't know how to do this), or to use some kind of reinforcement learning scheme that does not care what the distribution is.

+5

math algorithm statistics game-theory

John shedletsky Nov 06 '14 at 0:03


7 answers

We need to make some assumptions, as can be seen from the following example:

Team Rock beats Team Scissors
Team Paper beats Team Rock
Team Rock beats Team Scissors

Now we have a match between Team Scissors and Team Paper. Since Team Paper beat Team Rock, who in turn beat Team Scissors twice, we might assume that Paper has the better odds of beating Scissors.

However, in the above we assumed a transitive model, which clearly does not apply here. It may be a better fit for some sports such as football, but it still does not hold exactly.

What Elo does is assume that every team has some “inherent strength” on a scale from 0 to infinity. It should be obvious that no skill really works this way, but it turns out to be a useful model for predicting games. Also note that this model does not work well for Rock, Paper, Scissors.

The next assumption made in chess is that the absolute difference in “inherent strength” determines the probability that one player beats another. Again, this is clearly not exactly true, since things like playing the white or black pieces also have an effect. However, it can be made more accurate (evidence shows) by looking at the chances of winning over several games.

With the above two assumptions we can calculate the odds of winning, and if the model turns out to be a good fit for the game, they can be quite accurate. Unfortunately, this kind of modelling is not an exact science, and no matter what assumptions you make, you can always find a game or situation where they do not apply.
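As an illustration of turning a strength difference into odds, here is a sketch of the commonly used logistic form of Elo. The scale constant 400 and the update factor `k=32` are conventional values assumed here, and the function names are made up for this example.

```python
def elo_expected(rating_a, rating_b):
    """Expected score (roughly, win probability) of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one game without draws."""
    expected_a = elo_expected(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Replay the question's history, starting every team at an arbitrary 1500.
ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
for winner, loser in [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A"), ("A", "C")]:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], True)

print(elo_expected(ratings["C"], ratings["B"]))  # model's guess for C beating B
```

Because every team sits on one scale, the model produces a number for C versus B even though they never met, which is both its appeal and the transitivity assumption in disguise.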

I hope this gives you some inspiration for coming up with more assumptions for your model :) Once you have them, you can test how well the model works by checking whether it can predict some of the results you already have in your data set. There is a whole world of machine learning and statistics literature out there for you :)

+2

Thomas ahle Nov 11 '14 at 22:30


Why not use the Rating Percentage Index (RPI), as described on Wikipedia? You may find it better explained there, but as a quick introduction, it uses the following formula:

RPI = (WP * 0.25) + (OWP * 0.50) + (OOWP * 0.25)

WP : winning percentage, wins / games_played

OWP : the average WP of each of the team's opponents, computed with all games against the team in question removed from the calculation

OOWP : the average OWP of each of the team's opponents.
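The three definitions can be sketched in a few lines. This is an unweighted version (no home/away adjustment); the six-game schedule and the helper names are assumptions chosen for illustration.

```python
# Sample schedule as (winner, loser) pairs; the data is illustrative.
games = [
    ("UConn", "Kansas"), ("UConn", "Duke"), ("UConn", "Minnesota"),
    ("Kansas", "UConn"), ("Duke", "Minnesota"), ("Kansas", "Minnesota"),
]


def wp(team, games, exclude=None):
    """Winning percentage of `team`, ignoring games that involve `exclude`."""
    played = [g for g in games if team in g and exclude not in g]
    if not played:
        return 0.0  # assumption: a team with no remaining games counts as 0
    return sum(1 for winner, _ in played if winner == team) / len(played)


def opponents(team, games):
    """One entry per game played, so repeated opponents are counted repeatedly."""
    return [l if w == team else w for w, l in games if team in (w, l)]


def owp(team, games):
    opps = opponents(team, games)
    return sum(wp(o, games, exclude=team) for o in opps) / len(opps)


def oowp(team, games):
    opps = opponents(team, games)
    return sum(owp(o, games) for o in opps) / len(opps)


def rpi(team, games):
    return 0.25 * wp(team, games) + 0.50 * owp(team, games) + 0.25 * oowp(team, games)


for team in ("UConn", "Kansas", "Duke", "Minnesota"):
    print(team, round(rpi(team, games), 4))
```

Note that RPI is a ranking score rather than a win probability, so it would still need to be mapped to odds by some further assumption.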

This formula also appeared in a Google Code Jam problem.

Hope this algorithm helps you.

All thanks to Wikipedia.

+1

David Sánchez Nov 14 '14 at 8:41


You could try applying http://en.wikipedia.org/wiki/Elo_rating_system , mainly on the grounds that other people have used it to work out strengths. However, any such system is really based on some probabilistic model of what is actually going on, and you would do better to try to come up with one for your particular game. For example, for football, one approach is to model the number of goals scored by a team as a Poisson process that depends on the strength of its attack and the defence of the other side. Once you have a model, you can fit it to the data, for example by maximum likelihood.
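A rough sketch of the Poisson idea follows. It assumes a multiplicative model (expected goals = own attack strength times the opponent's defensive weakness), independent goal counts, and truncation at `max_goals`; all parameter names and values are hypothetical, and fitting them by maximum likelihood is not shown.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def match_outcome_probs(attack_a, weakness_a, attack_b, weakness_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) under the assumed model."""
    lam_a = attack_a * weakness_b  # expected goals scored by A
    lam_b = attack_b * weakness_a  # expected goals scored by B
    p_a = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    p_b = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for goals_a, pa in enumerate(p_a):
        for goals_b, pb in enumerate(p_b):
            if goals_a > goals_b:
                win += pa * pb
            elif goals_a == goals_b:
                draw += pa * pb
            else:
                loss += pa * pb
    return win, draw, loss


# Made-up strengths: A attacks better, B concedes slightly more.
print(match_outcome_probs(1.8, 1.0, 1.2, 1.1))
```

Unlike a single rating per team, this model has two parameters per team, so it can represent a team that scores a lot but also concedes a lot.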

For an example of the model mattering, see http://en.wikipedia.org/wiki/Nontransitive_dice . This is a simple example of a situation in which A usually beats B, B usually beats C, and C usually beats A, which is not what you would expect given a simple one-dimensional strength system.
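The cycle is easy to verify by brute force. The sketch below uses one well-known set of nontransitive dice; the face values are an example choice, and `p_beats` is a name made up here.

```python
from itertools import product


def p_beats(die_x, die_y):
    """Probability that a roll of die_x is strictly higher than a roll of die_y."""
    wins = sum(1 for x, y in product(die_x, die_y) if x > y)
    return wins / (len(die_x) * len(die_y))


A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)

print(p_beats(A, B), p_beats(B, C), p_beats(C, A))  # each is above 0.5
```

No assignment of a single strength number to A, B, and C can reproduce all three of these probabilities at once.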

0

mcdowella Nov 06 '14 at 5:42


=== Step 1 ===

Suppose two teams A and B have played n matches against each other, and A won m of them. Using a flat beta prior, the probability that A wins the next match is (m + 1) / (n + 2).

As you can see, if m and n are large, this is approximately equal to m / n.
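As a quick check of Step 1, here is the formula as a one-line function (the name `rule_of_succession` is just a label chosen for this sketch), applied to the A versus B record from the question.

```python
def rule_of_succession(m, n):
    """Estimated probability of a win next time, after m wins in n games (flat prior)."""
    return (m + 1) / (n + 2)


print(rule_of_succession(2, 3))        # A beat B in 2 of 3 games
print(rule_of_succession(0, 0))        # no data at all
print(rule_of_succession(6600, 10000)) # with many games this approaches m / n
```

With no games the estimate is an even 0.5, and each observed game pulls it away from that starting point.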

=== Step 2 ===

In your case, I propose the following strategy.

Let m = mp + md + mr and n = np + nd + nr

The suffix p means prior, d means direct, and r means indirect.

You can set mp and np to 1 and 2 respectively (assuming a flat prior), or choose them in a more advanced way (described in detail at the end).

md and nd are the wins and games played in direct matches.

mr and nr are computed with some strategy.

In the end, the probability of A winning is (mp + md + mr) / (np + nd + nr).

=== Step 3 ===

How to calculate mr and nr:

You can use some kind of damping. For instance, if A beat C and C beat B, count it as p of a win for A against B. For longer chains, use exponential decay.
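One possible reading of Steps 2 and 3, restricted to chains of length two, is sketched below. Counting the number of chains as a product of win counts, the default `p=0.5`, and the function name are all assumptions of this sketch rather than something the answer specifies.

```python
from collections import Counter

# The question's history as (winner, loser) pairs.
results = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A"), ("A", "C")]


def win_probability(a, b, results, p=0.5, prior=(1.0, 2.0)):
    """Estimate P(a beats b) from prior, direct, and damped indirect evidence."""
    wins = Counter(results)
    teams = {team for match in results for team in match}
    mp, np_ = prior                      # flat prior: 1 win in 2 games
    md = wins[(a, b)]                    # direct wins of a over b
    nd = wins[(a, b)] + wins[(b, a)]     # direct games between a and b
    mr = nr = 0.0
    for c in teams - {a, b}:
        chains_for_a = wins[(a, c)] * wins[(c, b)]  # a beat c, c beat b
        chains_for_b = wins[(b, c)] * wins[(c, a)]  # b beat c, c beat a
        mr += p * chains_for_a
        nr += p * (chains_for_a + chains_for_b)
    return (mp + md + mr) / (np_ + nd + nr)


print(win_probability("C", "B", results))  # no direct games, indirect evidence only
print(win_probability("A", "B", results))  # direct games dominate
```

On the question's data this puts C slightly ahead of B, which matches the intuition "C > B, with low confidence". Longer chains with exponential decay would need a graph search and are left out here.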

The optimal value of p can be found by cross-validation, in which you hold out some portion of the data and choose the p that maximizes the probability of the held-out data. For your specific problem, I suggest holding out the games between a pair of teams, estimating the probability, and comparing it with the actual values.

You can use k * log^2(s / t) as a penalty, where k is the number of games between the held-out pair A and B, s is the predicted and t the actual probability of A winning. You can also use something like the KL divergence.
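The penalty can be written down directly. In this sketch the clamping constant `eps` is an added assumption to keep the logarithm finite when a probability is exactly 0 or 1, and the function names are illustrative.

```python
from math import log


def log_ratio_penalty(k, s, t, eps=1e-6):
    """k * log^2(s / t) for k held-out games, predicted s and observed t."""
    s = min(max(s, eps), 1.0 - eps)
    t = min(max(t, eps), 1.0 - eps)
    return k * log(s / t) ** 2


def total_penalty(held_out):
    """Sum the penalty over (k, s, t) triples, one per held-out pair of teams."""
    return sum(log_ratio_penalty(k, s, t) for k, s, t in held_out)


# Hypothetical held-out pairs: (games played, predicted, observed win rate).
print(total_penalty([(3, 0.60, 0.67), (2, 0.55, 0.50)]))
```

Cross-validation then amounts to computing this total for each candidate value of p and keeping the p with the smallest total.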

=== Step 4 ===

Revisiting the choice of mp and np.

To do this, you need many matches between the same pairs of teams.

For each pair of teams, compute the probability of winning and plot it. If the plot looks flat, using 1 and 2 as mp and np is fine. Otherwise, go through http://en.wikipedia.org/wiki/Beta_distribution and pick the best match.
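Instead of matching the plot by eye, one option is to fit a beta distribution by the method of moments. This is a swapped-in technique, not something the answer prescribes; the function name and the sample win rates are made up for this sketch.

```python
from statistics import mean, pvariance


def fit_beta_prior(win_rates):
    """Method-of-moments beta fit; returns (mp, np) as prior wins and prior games."""
    mu = mean(win_rates)
    var = pvariance(win_rates)
    if var <= 0 or var >= mu * (1 - mu):
        raise ValueError("win rates do not admit a method-of-moments beta fit")
    common = mu * (1 - mu) / var - 1
    alpha = mu * common
    beta = (1 - mu) * common
    # Beta(alpha, beta) corresponds to alpha prior wins in alpha + beta prior games.
    return alpha, alpha + beta


# Hypothetical per-pair win rates observed across many pairs of teams.
print(fit_beta_prior([0.2, 0.4, 0.6, 0.8]))
```

For a perfectly flat set of win rates (mean 1/2, variance 1/12) the same formulas give alpha = beta = 1, which is the mp = 1, np = 2 default from Step 2.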

0

Elkamina Nov 12 '14 at 1:36


This problem can be solved with the help of directed graphs.

Let all the teams be vertices, and let a directed edge from team1 to team2 mean that team1 beat team2.

After that, you can split the graph into strongly connected components and work with each component independently, since they are statistically independent. Or are they? ;)
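For reference, here is a small sketch of the component split using Kosaraju's two-pass algorithm. The choice of algorithm and the function name are assumptions of this example; it uses recursion, so it is only suitable for modest numbers of teams.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Split a win graph, given as (winner, loser) edges, into its SCCs."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for winner, loser in edges:
        graph[winner].append(loser)
        reverse[loser].append(winner)
        nodes.update((winner, loser))

    # Pass 1: depth-first search, recording nodes in order of completion.
    visited, order = set(), []

    def visit(node):
        visited.add(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visit(nxt)
        order.append(node)

    for node in sorted(nodes):
        if node not in visited:
            visit(node)

    # Pass 2: search the reversed graph in reverse completion order.
    assigned, components = set(), []

    def collect(node, component):
        assigned.add(node)
        component.add(node)
        for prev in reverse[node]:
            if prev not in assigned:
                collect(prev, component)

    for node in reversed(order):
        if node not in assigned:
            component = set()
            collect(node, component)
            components.append(component)
    return components


question = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A"), ("A", "C")]
print(strongly_connected_components(question))
```

In the question's data every team has both beaten and lost to A, so all three teams end up in a single component.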

The question we need to ask is: what are the odds that team1 beats team2?

This is easy to answer if the teams you are comparing have played direct matches against each other. In that case you only care about the direct matches, i.e. how many times has team1 beaten team2? The way to answer the question is:

(team1WinsAgainstTeam2)/(matchesPlayedBetweenThem)

Is this right? Should the odds for teamA decrease when we know that teamB has played teamX and won 100 times, but teamX has always beaten teamA? If so, discard everything I have said in this post :-)

The final algorithm should look something like this:

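    // Sketch only: the helper methods and counters used below
    // (isInSameConnectedComponent, isDirectMatchBetween, the win/loss counts,
    // teamsThatTeam1HasPlayedWith) are assumed to exist elsewhere.
    double getOddsTeam1Winning(int team1, int team2)
    {
        if (!isInSameConnectedComponent(team1, team2))
        {
            // If the two teams are not in the same connected component,
            // we can use a heuristic: compare how many matches team1 has won
            // (net of losses) with how many team2 has won.
            double team1Net = team1Wins - team1Loses;
            double team2Net = team2Wins - team2Loses;
            return team1Net / (team1Net + team2Net);
        }

        if (isDirectMatchBetween(team1, team2))
        {
            // Cast to double to avoid integer division.
            return (double)team1WinsOverTeam2 / totalMatchesPlayedBetweenTeam1AndTeam2;
        }

        List<double> odds = new List<double>();
        foreach (var opponentTeam in teamsThatTeam1HasPlayedWith)
        {
            // Note: this recursion can loop forever on cyclic results
            // unless visited teams are tracked.
            var oddsThatOpponentTeamBeatsTeam2 = getOddsTeam1Winning(opponentTeam, team2);
            var oddsThatTeam1BeatsOpponentTeam = getOddsTeam1Winning(team1, opponentTeam);

            // Combine them and push them to the odds list. Multiplying is one
            // possible choice (an assumption; the original left this step open).
            odds.Add(oddsThatTeam1BeatsOpponentTeam * oddsThatOpponentTeamBeatsTeam2);
        }

        return odds.Average(); // taking average of odds?!
    }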

P.S. I put this together in a couple of minutes and am not entirely sure it is mathematically correct, but I think it handles at least the one instance of the problem you listed in your original question :p

0

Erti-Chris Eelmaa Nov 17 '14 at 17:09


Rating Percentage Index (RPI), implemented in C# below:

    // <copyright file="RPICalculator.cs" company="Steve Stokes Consulting, LLC">
    // Copyright © Steve Stokes Consulting, LLC. All Rights Reserved.
    // </copyright>
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;

    namespace RPI.Calculator
    {
        public class RPICalculator
        {
            public void Test()
            {
                //Home       Score  Away       Score
                //UConn      64     Kansas     57
                //UConn      82     Duke       68
                //Minnesota  71     UConn      72
                //Kansas     69     UConn      62
                //Duke       81     Minnesota  70
                //Minnesota  52     Kansas     62

                //resulting in the following ratings:
                //UConn: 0.6660
                //Kansas: 0.6378
                //Duke: 0.3840
                //Minnesota: 0.3403

                List<Team> Teams = new List<Team>();
                List<Match> Matches = new List<Match>();

                Teams.Add(new Team() { Name = "UConn", });
                Teams.Add(new Team() { Name = "Kansas", });
                Teams.Add(new Team() { Name = "Duke", });
                Teams.Add(new Team() { Name = "Minnesota", });

                Matches.Add(new Match()
                {
                    HomeTeam = Teams.Where(t => t.Name == "UConn").First(),
                    AwayTeam = Teams.Where(t => t.Name == "Kansas").First(),
                    Winner = Teams.Where(t => t.Name == "UConn").First(),
                });
                Matches.Add(new Match()
                {
                    HomeTeam = Teams.Where(t => t.Name == "UConn").First(),
                    AwayTeam = Teams.Where(t => t.Name == "Duke").First(),
                    Winner = Teams.Where(t => t.Name == "UConn").First(),
                });
                Matches.Add(new Match()
                {
                    HomeTeam = Teams.Where(t => t.Name == "Minnesota").First(),
                    AwayTeam = Teams.Where(t => t.Name == "UConn").First(),
                    Winner = Teams.Where(t => t.Name == "UConn").First(),
                });
                Matches.Add(new Match()
                {
                    HomeTeam = Teams.Where(t => t.Name == "Kansas").First(),
                    AwayTeam = Teams.Where(t => t.Name == "UConn").First(),
                    Winner = Teams.Where(t => t.Name == "Kansas").First(),
                });
                Matches.Add(new Match()
                {
                    HomeTeam = Teams.Where(t => t.Name == "Duke").First(),
                    AwayTeam = Teams.Where(t => t.Name == "Minnesota").First(),
                    Winner = Teams.Where(t => t.Name == "Duke").First(),
                });
                Matches.Add(new Match()
                {
                    HomeTeam = Teams.Where(t => t.Name == "Minnesota").First(),
                    AwayTeam = Teams.Where(t => t.Name == "Kansas").First(),
                    Winner = Teams.Where(t => t.Name == "Kansas").First(),
                });

                var results = Calculate(Teams, Matches, Sport.NCAA_Basketball);

                foreach (var team in results)
                {
                    Debug.WriteLine(string.Format("{0} - {1}",
                        team.Name.PadRight(Teams.Select(t => t.Name).Max(s => s.Length)),
                        team.RPI));
                }
            }

            private decimal defaultHomeWinValue = 1.0m;
            private decimal defaultAwayWinValue = 1.0m;
            private decimal homeWinValue = 1.0m;
            private decimal awayWinValue = 1.0m;

            public IEnumerable<TTeam> Calculate<TTeam, TMatch>(IEnumerable<TTeam> Teams, IEnumerable<TMatch> Matches, Sport Sport)
            {
                // TODO: transform a user team to our internal team type
                // Throwing keeps this stub compilable until it is implemented.
                throw new NotImplementedException();
            }

            /// <summary>
            /// Calculate the RPI of each team
            /// </summary>
            /// <param name="Teams">The list of teams to calculate RPI for</param>
            /// <param name="Matches">The list of matches and results of the matches the teams played in a period</param>
            /// <param name="Sport">The sport the teams played - this modifies home/away weighting based on NCAA rules</param>
            /// <returns>The list of teams with calculated RPI's</returns>
            public IEnumerable<Team> Calculate(IEnumerable<Team> Teams, IEnumerable<Match> Matches, Sport Sport)
            {
                SetWeightingBasedOnSport(Sport);

                foreach (var team in Teams)
                {
                    // calculate the WP of each team
                    team.WP = CalculateWinPercent(team, Matches, homeWinValue, awayWinValue);

                    // calculate the OWP of each team
                    team.OWP = CalculateOpponentsWinPercent(team, Matches);

                    // calculate the OOWP of each team
                    team.OOWP = CalculateOpponentsOpponentsWinPercent(team, Teams, Matches);

                    // calculate the RPI for each team
                    team.RPI = CalculateRPI(team);
                }

                return Teams.OrderByDescending(t => t.RPI);
            }

            private decimal CalculateRPI(Team team)
            {
                //RPI = (WP * 0.25) + (OWP * 0.50) + (OOWP * 0.25)

                //UConn: 0.6660
                //Kansas: 0.6378
                //Duke: 0.3840
                //Minnesota: 0.3403

                var RPI = (team.WP * 0.25m) + (team.OWP * 0.50m) + (team.OOWP * 0.25m);

                return Math.Round(RPI, 4);
            }

            private decimal CalculateOpponentsOpponentsWinPercent(Team teamInQuestion, IEnumerable<Team> Teams, IEnumerable<Match> Matches)
            {
                //UConn: ((Kansas 0.6667) + (Kansas 0.6667) + (Duke 0.3333) + (Minnesota 0.3889)) / (4 games) = 0.5139
                //Kansas: ((UConn 0.7500) + (UConn 0.7500) + (Minnesota 0.3889)) / (3 games) = 0.6296
                //Duke: ((UConn 0.7500) + (Minnesota 0.3889)) / (2 games) = 0.5694
                //Minnesota: ((UConn 0.7500) + (Duke 0.3333) + (Kansas 0.6667)) / (3 games) = 0.5833

                // get each team i've played this season (not unique)
                var teamsIvePlayed = Matches
                    .Where(m => m.AwayTeam == teamInQuestion || m.HomeTeam == teamInQuestion)
                    .Select(s => s.HomeTeam == teamInQuestion ? s.AwayTeam : s.HomeTeam);

                // get the opponent win percent (OWP) of each team I played
                var teamsIvePlayedOpponentWinPercent = teamsIvePlayed
                    .Select(t => new { Team = t, OWP = CalculateOpponentsWinPercent(t, Matches) });

                // calculate the OOWP
                return (decimal)(teamsIvePlayedOpponentWinPercent.Sum(t => t.OWP) / teamsIvePlayed.Count());
            }

            private decimal CalculateOpponentsWinPercent(Team teamInQuestion, IEnumerable<Match> Matches)
            {
                // get each teams WP without the team in question

                //Home       Score  Away       Score
                //UConn      64     Kansas     57
                //UConn      82     Duke       68
                //Minnesota  71     UConn      72
                //Kansas     69     UConn      62
                //Duke       81     Minnesota  70
                //Minnesota  52     Kansas     62

                //UConn: ((Kansas 1.0) + (Kansas 1.0) + (Duke 1.0) + (Minnesota 0)) / (4 games) = 0.7500
                //Kansas: ((UConn 1.0) + (UConn 1.0) + (Minnesota 0.0)) / (3 games) = 0.6667
                //Duke: ((UConn 0.6667) + (Minnesota 0.0)) / (2 games) = 0.3333
                //Minnesota: ((UConn 0.6667) + (Duke 0.0) + (Kansas 0.5)) / (3 games) = 0.3889

                // get each team i've played this season (not unique)
                var teamsIvePlayed = Matches
                    .Where(m => m.AwayTeam == teamInQuestion || m.HomeTeam == teamInQuestion)
                    .Select(s => s.HomeTeam == teamInQuestion ? s.AwayTeam : s.HomeTeam);

                // get the win percent of each team I played excluding matches with me
                var teamsIvePlayedWinPercent = teamsIvePlayed.Select(t => new
                {
                    Team = t,
                    WP = CalculateWinPercent(
                        t,
                        Matches.Where(m => m.AwayTeam != teamInQuestion && m.HomeTeam != teamInQuestion),
                        defaultHomeWinValue,
                        defaultAwayWinValue)
                });

                // calculate the OWP
                return (decimal)(teamsIvePlayedWinPercent.Sum(t => t.WP) / teamsIvePlayed.Count());
            }

            private decimal CalculateWinPercent(Team teamInQuestion, IEnumerable<Match> Matches, decimal HomeWinValue, decimal AwayWinValue)
            {
                // get the teams win percent - sometimes weighted based on NCAA rules

                //UConn: (0.6 + 0.6 + 1.4 + 0) / (0.6 + 0.6 + 1.4 + 1.4) = 0.6500
                //Kansas: (0 + 0.6 + 1.4) / (1.4 + 0.6 + 1.4) = 0.5882
                //Duke: (0 + 0.6) / (1.4 + 0.6) = 0.3000
                //Minnesota: (0 + 0 + 0) / (0.6 + 1.4 + 0.6) = 0.0000

                // get my wins and sum with weighting
                var wins = Matches
                    .Where(m => m.Winner == teamInQuestion)
                    .Sum(s => s.HomeTeam == teamInQuestion ? HomeWinValue : AwayWinValue);

                // get my games played count weighted
                var gamesPlayed = Matches
                    .Where(m => m.HomeTeam == teamInQuestion || m.AwayTeam == teamInQuestion)
                    .Sum(s => s.HomeTeam == teamInQuestion ? HomeWinValue : AwayWinValue);

                // get the WP
                return wins / gamesPlayed;
            }

            private void SetWeightingBasedOnSport(Sport Sport)
            {
                switch (Sport)
                {
                    case Sport.NCAA_Basketball:
                        homeWinValue = 0.6m;
                        awayWinValue = 1.4m;
                        break;
                    case Sport.NCAA_Baseball:
                        homeWinValue = 0.7m;
                        awayWinValue = 1.3m;
                        break;
                    default:
                        homeWinValue = defaultHomeWinValue;
                        awayWinValue = defaultAwayWinValue;
                        break;
                }
            }
        }

        public enum Sport
        {
            NoHomeOrAwayWeighting = 1,
            NCAA_Basketball = 2,
            NCAA_Baseball = 3,
        }

        public class Team
        {
            public string Name { get; set; }
            public decimal RPI { get; internal set; }
            public decimal WP { get; internal set; }
            public decimal OWP { get; internal set; }
            public decimal OOWP { get; internal set; }
        }

        public class Match
        {
            public Team HomeTeam { get; set; }
            public Team AwayTeam { get; set; }
            public Team Winner { get; set; }
        }
    }

0

Steve stokes Dec 03 '14 at 21:50


Julian · Accepted Answer · 2014-11-13T21:57:18+0000

You want to make the best estimate of a probability (or several probabilities) and continuously update your estimate as more data become available. That calls for Bayesian inference! Bayesian reasoning is based on the observation that the probability (distribution) of two things, A and B, both being the case is equal to the probability (distribution) of A being the case given that B is the case, times the probability that B is the case. In formula form:

P(A, B) = P(A | B) P(B)

and

P(A, B) = P(B | A) P(A)

and therefore

P(A | B) P(B) = P(B | A) P(A)

Move P(B) to the other side and we get the Bayesian update rule:

P(A | B)' = P(B | A) P(A) / P(B)

Typically, A stands for whatever variable you are trying to estimate (e.g. “team x beats team y”), while B stands for your observations (e.g. the full history of matches won and lost between teams). I wrote the prime (i.e. the quote mark in P(A | B)') to indicate that the left-hand side of the equation is an update of your beliefs. To make this concrete: your new estimate of the probability that team x will beat team y, given all the observations so far, is the probability of making those observations given your previous estimate, times your previous estimate, divided by the overall probability of seeing the observations you have seen (i.e. given no assumptions about the relative strength of the teams; for example, one team winning most of the time is less likely than both teams winning about equally often).

The P(A | B)' on the left-hand side of the current update becomes the new P(A) on the right-hand side of the next update. You simply keep repeating this as more data come in. Typically, in order to be as unbiased as possible, you start with a completely flat distribution for P(A). Over time P(A) will become more and more certain, although the algorithm copes fairly well with sudden changes in the underlying probability you are trying to estimate (e.g. if team x suddenly becomes much stronger because a new player joins the team).
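To show the update loop in action, here is a sketch that applies the rule on a discrete grid of candidate win probabilities. Treating the unknown quantity as "the probability q that x beats y", the 101-point grid, and the function name are assumptions of this example.

```python
def bayes_update(prior, grid, x_won):
    """One application of P(A | B)' = P(B | A) P(A) / P(B) over a grid of candidates."""
    # P(B | A): chance of the observed result if the true win probability were q.
    likelihood = [q if x_won else 1.0 - q for q in grid]
    unnormalised = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalised)  # P(B): overall probability of the observation
    return [u / evidence for u in unnormalised]


grid = [i / 100 for i in range(101)]      # candidate values of q
belief = [1.0 / len(grid)] * len(grid)    # completely flat starting belief

# Feed in the x-versus-y results one at a time: win, loss, win.
for x_won in (True, False, True):
    belief = bayes_update(belief, grid, x_won)

posterior_mean = sum(q * p for q, p in zip(grid, belief))
print(posterior_mean)  # close to 0.6 after 2 wins in 3 games
```

Each pass through the loop uses the previous posterior as the new prior, which is exactly the repetition described above.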

The good news is that Bayesian inference works well with the beta distribution, which Elkamina also mentioned. In fact, the two are often combined in artificial intelligence systems that are meant to learn a probability distribution. While the beta distribution is itself still an assumption, it has the advantage that it can take many shapes (including completely flat and extremely spiky), so there is relatively little reason to worry that your choice of distribution will affect your outcome.
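With a beta distribution the Bayesian update reduces to counting. The class below is a minimal sketch of that idea; the class and attribute names are invented for this example.

```python
class BetaBelief:
    """Belief about a win probability, stored as a Beta(alpha, beta) distribution."""

    def __init__(self, alpha=1.0, beta=1.0):
        # Beta(1, 1) is the completely flat distribution.
        self.alpha = alpha
        self.beta = beta

    def update(self, won):
        """Bayesian update for one observed game: a win adds to alpha, a loss to beta."""
        if won:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)


belief = BetaBelief()
for won in (True, False, True):  # x beat y, lost to y, beat y again
    belief.update(won)

print(belief.alpha, belief.beta, belief.mean)
```

Starting from the flat Beta(1, 1), the mean after m wins in n games is (m + 1) / (n + 2), so this agrees with the estimate given in Elkamina's Step 1.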

One piece of bad news is that you still need to make assumptions besides the beta distribution. For example, suppose you have the following variables:

A: team x beats team y
B: team y beats team z
C: team x beats team z

and you have observations from direct matches between x and y and between y and z, but not from matches between x and z. A simple (albeit naive) way to estimate P(C) would be to assume transitivity:

P(C) = P(A) P(B)
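A two-line experiment shows why this estimate deserves the word naive. The function name is made up, and the input probabilities are arbitrary examples.

```python
def naive_transitive(p_xy, p_yz):
    """Naive estimate of P(x beats z) as P(x beats y) * P(y beats z)."""
    return p_xy * p_yz


print(naive_transitive(0.9, 0.9))  # clear favourites in both links
print(naive_transitive(0.5, 0.5))  # three evenly matched teams
```

For three evenly matched teams the product gives 0.25 for x against z, although 0.5 would be the natural expectation, so the product is best read as a rough lower-end guess rather than a calibrated probability.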

No matter how sophisticated your approach, you will need to define some kind of probability structure to deal with the gaps as well as the interdependencies in your data. Whatever structure you choose, it will always be an assumption.

The other piece of bad news is that this approach is rather complicated, and I cannot give you a full account of how to apply it to your problem. Given that you need a structure of interdependent probabilities (the probability of team x beating team y depends on other distributions involving teams x, y, and z), you may want to use a Bayesian network or a related analysis (for example a Markov random field or path analysis).

Hope this helps. In any case, feel free to ask for clarification.
