Abstract
The following refers to the Club Soccer Predictions dataset that can be found here: https://data.fivethirtyeight.com/. I will refer to football rather than soccer from now on. I will attempt to see how I can use this data to make bets against my local bookmakers. The main goal is a minimum profit margin of 20%. However, I realize that my account may be restricted or closed if/when the bookmakers notice that I have a clear winning betting strategy. I live in Namibia and therefore have no access to betting exchanges such as Betfair or Matchbook.
The dataset already contains predictions of win, loss or draw. These predictions are given in percentage form. For example, a game between Chelsea and Liverpool may be displayed as:
Chelsea : 25% Draw : 10% Liverpool : 65%
I will dive deeper to see how best to extract value and achieve the best win rate for the different football leagues. I will focus on home win rates, away win rates and draw rates, comparing the predictions made against the actual results. I will also look at over/under 2.5 goals. There is no column for this in the dataset, so I will have to create one and then compare the predicted goals with the actual goals scored.
The answers were derived using basic arithmetic, groupings and pivot tables, and visualized through tables and charts.
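The sketch below shows one way to derive the over/under 2.5 goals column in plain Python with no third-party libraries. It makes the following assumptions:

- The FiveThirtyEight column names are `proj_score1`/`proj_score2` for projected goals and `score1`/`score2` for actual goals.
- A game counts as a predicted "over" when the projected total exceeds 2.5.
- The new column names `pred_ou`, `actual_ou` and `ou_correct` are hypothetical, so adapt them to your own sheet.

```python
# Assumed FiveThirtyEight column names; check them against your copy of the CSV.
PROJ_HOME, PROJ_AWAY = "proj_score1", "proj_score2"
GOALS_HOME, GOALS_AWAY = "score1", "score2"
LINE = 2.5  # the over/under goal line; a whole-goal total can never land on it


def over_under(total_goals: float, line: float = LINE) -> str:
    """Label a goal total as 'over' or 'under' the line."""
    return "over" if total_goals > line else "under"


def add_over_under_columns(row: dict) -> dict:
    """Return a copy of one match row with hypothetical over/under columns added."""
    out = dict(row)
    out["pred_ou"] = over_under(row[PROJ_HOME] + row[PROJ_AWAY])
    if row.get(GOALS_HOME) is None or row.get(GOALS_AWAY) is None:
        # Unplayed match: nothing to compare against yet.
        out["actual_ou"] = None
        out["ou_correct"] = None
    else:
        out["actual_ou"] = over_under(row[GOALS_HOME] + row[GOALS_AWAY])
        out["ou_correct"] = out["pred_ou"] == out["actual_ou"]
    return out


if __name__ == "__main__":
    # Made-up row for illustration, not taken from the dataset.
    match = {"proj_score1": 1.62, "proj_score2": 1.05, "score1": 2, "score2": 0}
    print(add_over_under_columns(match)["ou_correct"])  # False: 2.67 predicted, 2 scored
```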
Data Preparation and Cleaning
The dataset (https://www.kaggle.com/code/lindleylawrence/fivethirtyeight-all-soccer-data) is already well organised. For the information I need, I only have to add a few extra columns for easier reading and computation.
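The extra columns could include a predicted result (the outcome FiveThirtyEight rated most likely), the actual result and a hit/miss flag. The sketch below makes these assumptions:

- The column names are `prob1`, `probtie`, `prob2`, `score1` and `score2`.
- Unplayed matches have been filtered out first.
- The output column names are hypothetical.

```python
# Assumed FiveThirtyEight column names: prob1 (home win), probtie (draw),
# prob2 (away win), score1 / score2 (final score). Unplayed matches must be
# filtered out beforehand, because their scores are empty.


def predicted_result(row: dict) -> str:
    """Return the outcome FiveThirtyEight rated most likely: 'H', 'D' or 'A'."""
    probs = {"H": row["prob1"], "D": row["probtie"], "A": row["prob2"]}
    return max(probs, key=probs.get)


def actual_result(row: dict) -> str:
    """Convert the final score into 'H', 'D' or 'A'."""
    if row["score1"] > row["score2"]:
        return "H"
    if row["score1"] < row["score2"]:
        return "A"
    return "D"


def add_result_columns(row: dict) -> dict:
    """Copy the row and add hypothetical pred_result, actual_result and correct columns."""
    out = dict(row)
    out["pred_result"] = predicted_result(row)
    out["actual_result"] = actual_result(row)
    out["correct"] = out["pred_result"] == out["actual_result"]
    return out


if __name__ == "__main__":
    # The Chelsea v Liverpool percentages from the abstract, with a made-up 1-3 score.
    game = {"prob1": 0.25, "probtie": 0.10, "prob2": 0.65, "score1": 1, "score2": 3}
    print(add_result_columns(game))
```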
Research Questions
- Which 5 leagues have the best home win rate predictions: min 65%
- Which 5 leagues have the best away win rate predictions: min 65%
- Which 5 leagues have the best win rate for over 2.5 goals predictions: min 65%
- Which 5 leagues have the best win rate for under 2.5 goals predictions: min 65%
- Which of the above (1-4) will give a minimum expected profit of 20% using the formulas
- average win rate x bookmaker odds = expected payoff
- (expected payoff - 1) * 100 = expected profit percentage
I think the most time-consuming part will be searching for the best bookmaker odds per league per game. For example, if the Barclays Premier League has a home win rate of, say, 70%, I would first need to check that the dataset's win probability for the game was high enough, and then find bookmaker odds on the home team of at least 1.72 to have an expected profit of 20%. The workings for the example are as follows:
- average win rate x bookmaker odds = expected payoff
- 0.7 * 1.72 = 1.204
- (expected payoff - 1) * 100 = expected profit percentage
- (1.204 - 1) * 100 = 20.4%
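Here is a sketch of the two formulas. It adds the minimum odds needed for a target profit, found by rearranging them to odds = (1 + target / 100) / win rate. The function names are hypothetical. The minimum odds are rounded up to two decimals because decimal odds are usually quoted that way.

```python
import math


def expected_payoff(win_rate: float, odds: float) -> float:
    """Average return per unit staked: win rate x decimal odds."""
    return win_rate * odds


def expected_profit_pct(win_rate: float, odds: float) -> float:
    """(expected payoff - 1) * 100, the expected profit percentage."""
    return (expected_payoff(win_rate, odds) - 1) * 100


def min_odds(win_rate: float, target_pct: float = 20.0) -> float:
    """Smallest decimal odds (rounded up to 2 decimals) that reach target_pct."""
    exact = (1 + target_pct / 100) / win_rate
    # round() first strips float noise so e.g. 2.0000000001 does not become 2.01
    return math.ceil(round(exact * 100, 9)) / 100


if __name__ == "__main__":
    print(round(expected_profit_pct(0.7, 1.72), 1))  # 20.4, the worked example
    print(min_odds(0.7))  # 1.72
```

This sketch makes no allowance for the bookmaker's margin: the win rate is taken as given, and the odds are whatever the bookmaker offers.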
Other things to consider:
- When are the leagues with the best win rates playing, so that I can make full use of them to forward test the strategy?
- How long should I forward test the strategy before committing actual money?
- What will be the budget, and how much should I wager per game?
- What will the source of funds be? Own? Seek outside funds?
Overall win rate predictions
This is the table of the top 20 leagues with the best average prediction accuracy.
The above table shows that the FA Women's Super League has an overall prediction accuracy of 65%. But this is not good enough, so we will have to go deeper and group the predictions into bins of 10% intervals.
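Below is a minimal sketch of the binning step. The source does not say how probabilities on a boundary are handled, so this assumes a probability is rounded up to the next whole percent. Under that rule, 0.705 lands in "71-80%" and exactly 0.70 lands in "61-70%". The function name `prediction_bin` is hypothetical.

```python
import math


def prediction_bin(prob: float) -> str:
    """Map a probability in [0, 1] to a 10%-wide label such as '71-80%'.

    Assumption: probabilities are rounded up to the next whole percent,
    so 0.705 -> '71-80%' while exactly 0.70 -> '61-70%'.
    """
    pct = round(prob * 100, 6)             # strip float noise such as 70.00000000000001
    idx = max(math.ceil(pct / 10) - 1, 0)  # 0 -> '1-10%', 9 -> '91-100%'
    idx = min(idx, 9)                      # guard against values slightly above 1
    return f"{idx * 10 + 1}-{idx * 10 + 10}%"


if __name__ == "__main__":
    for p in (0.705, 0.70, 0.95):
        print(p, prediction_bin(p))
```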
Overall home win rates per prediction percentage grouping
Once the data is binned, we see that accuracy naturally improves as fivethirtyeight assigns a higher chance of winning to the home team. This is expected, since a high home win probability usually means the stronger team is playing at home.
The above shows that an accuracy of 89% is achieved when fivethirtyeight predicts a 91%-100% chance for the home team to win.
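Binning and then averaging the outcomes is essentially a pivot table. The plain-Python equivalent below assumes each row already carries two hypothetical helper columns:

- `home_bin`, a band label such as "91-100%".
- `home_won`, a flag for whether the home team actually won.

```python
from collections import defaultdict


def win_rate_table(rows, group_keys=("league", "home_bin"), hit_key="home_won"):
    """Pivot-table style summary mapping each group to (matches, win rate %).

    `home_bin` and `home_won` are hypothetical helper columns: the 10% band
    of the home win prediction and whether the home team actually won.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [matches, wins]
    for row in rows:
        group = tuple(row[key] for key in group_keys)
        counts[group][0] += 1
        counts[group][1] += int(bool(row[hit_key]))
    return {group: (n, 100 * wins / n) for group, (n, wins) in counts.items()}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"league": "Japanese J League", "home_bin": "71-80%", "home_won": True},
        {"league": "Japanese J League", "home_bin": "71-80%", "home_won": False},
        {"league": "Japanese J League", "home_bin": "71-80%", "home_won": True},
        {"league": "Japanese J League", "home_bin": "71-80%", "home_won": True},
    ]
    print(win_rate_table(rows))  # {('Japanese J League', '71-80%'): (4, 75.0)}
```

Passing `group_keys=("home_bin",)` gives the overall rate per band across all leagues. Swapping in away, over or under columns gives the other groupings.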
Drill down into home win rates for individual leagues
Drilling into the Japanese J League, we can see the home win predictions below:
You can read the data from row "1" like this: the predictions for the Japanese J League that fall in the 71-80% home win prediction group have an average win rate of 68.01%. So, historically, when fivethirtyeight gave a J League home team a win probability between 71% and 80%, a bet on that team won 68.01% of the time. Not great, not terrible.
We can then sort all the leagues and attempt to answer the first question.
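Judging by the answers below, the leagues are ranked by how many prediction groupings reach the 65% minimum. The sketch below follows that reading. `table` maps (league, band) to (matches, win rate %). `min_matches` is a hypothetical extra knob for ignoring groupings with very few games; its default of 1 means no filtering.

```python
from collections import Counter


def rank_leagues(table, threshold=65.0, min_matches=1, top=5):
    """Count, per league, the prediction bands whose win rate is >= threshold.

    `table` maps (league, band) -> (matches, win rate %). min_matches is a
    hypothetical filter for tiny bands; the default of 1 keeps every band.
    Returns up to `top` (league, qualifying_band_count) pairs, best first.
    """
    qualifying = Counter()
    for (league, _band), (n, rate) in table.items():
        if rate >= threshold and n >= min_matches:
            qualifying[league] += 1
    return qualifying.most_common(top)


if __name__ == "__main__":
    # Made-up figures for illustration only.
    table = {
        ("League A", "81-90%"): (10, 80.0),
        ("League A", "91-100%"): (5, 100.0),
        ("League B", "91-100%"): (3, 66.7),
        ("League C", "71-80%"): (20, 50.0),
    }
    print(rank_leagues(table))
```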
Research Question 1
Q1. Which 5 leagues have the best home win rate predictions: min 65%?
A1. The FA Women's Super League is at the number 1 spot with four home win prediction groupings of at least 65%, followed by eight leagues tied in second place.
Research Question 2
Q2. Which 5 leagues have the best away win rate predictions: min 65%?
A2. The FA Women's Super League and Portuguese Liga share the number 1 spot with four away win prediction groupings of at least 65%, followed by five leagues tied in second place.
Research Question 3
Q3. Which 5 leagues have the best win rate for over 2.5 goals predictions: min 65%?
A3. No league reaches the minimum win rate of 65%. The Dutch Eredivisie achieves the best at 54.67%.
Research Question 4
Q4. Which 5 leagues have the best win rate for under 2.5 goals predictions: min 65%?
A4. No league reaches the minimum win rate of 65%. The South African ABSA Premier League achieves the best at 45.92%.
I expected to find better win rates for the over/under 2.5 goals predictions, but this was not the case. Only eight leagues have an over 2.5 goals win rate above 50%. That means I would need odds of at least 2.24 for the top three over 2.5 goals leagues, and those won't be easy to find. The good win rates for home and away teams were encouraging, and with them I can answer the last research question.
Research Question 5
Q5. Which of the above (1-4) will give an expected profit percentage of at least 20% using the formulas:
average win rate * bookmaker odds = expected payoff
(expected payoff - 1) * 100 = expected profit percentage
A5. 1 (home win rate predictions) and 2 (away win rate predictions).
The best strategy to maximize value will be to forward test the top 2 leagues as per the table below:
Both leagues are at the halfway stage of their seasons, so I will forward test them from January 2023 until the end of the season. The forward test will work as follows:
* NAD500 wager per game
* Wagers to be placed two days before the actual match
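A flat-stake forward test like this can be tracked with a small ledger. In the sketch below, the `Bet` record and `summarise` function are hypothetical names. Profit on a winning bet is taken as stake x (decimal odds - 1), and a losing bet costs the full stake.

```python
from dataclasses import dataclass

STAKE_NAD = 500  # flat wager per game, as in the plan above


@dataclass
class Bet:
    match: str   # free-text label, e.g. "Home team v Away team"
    odds: float  # decimal odds taken when the bet was placed
    won: bool


def summarise(bets, stake=STAKE_NAD):
    """Total staked, profit, hit rate and ROI for flat-stake bets."""
    staked = stake * len(bets)
    # A winning bet returns stake * odds, i.e. a profit of stake * (odds - 1).
    profit = sum(stake * (b.odds - 1) if b.won else -stake for b in bets)
    wins = sum(1 for b in bets if b.won)
    return {
        "bets": len(bets),
        "staked": staked,
        "profit": profit,
        "hit_rate_pct": 100 * wins / len(bets) if bets else 0.0,
        "roi_pct": 100 * profit / staked if staked else 0.0,
    }


if __name__ == "__main__":
    # Made-up bets for illustration only.
    ledger = [Bet("A v B", 1.8, True), Bet("C v D", 2.0, False), Bet("E v F", 1.5, True)]
    print(summarise(ledger))
```

The ROI here is directly comparable with the 20% expected profit target.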
EDIT: The betting sites currently available to me do not offer wagers on the FA Women's Super League, so I am left with the Greek Super League, the Barclays Premier League and the Italy Serie A. All three have seven prediction groupings with an accuracy above 65%. I am choosing the Greek Super League simply because it is less high profile, which often brings favourable odds. Updated table below:
Aftermath
I saved all my work in a Google Sheets file, which can be found here: https://docs.google.com/spreadsheets/d/1uCEb56bqeDytM1UFSWHYUR834ArqRq21U_1FHX0A3GU/edit?usp=sharing. The summary is as follows:
So, not that great for all the effort involved. Also, fivethirtyeight will no longer be providing sports predictions. Comparing the above with other markets over the same period gives mixed results:
From the above data, Bitcoin seems to be the clear winner. The S&P 500 also did well over the 5 month period, while gold came in last but was still decent. The main difference is that with any of these options you would simply have bought and held, which is much less work than betting. It all depends on your risk appetite.
This is just a small study with 296 data points over a 5 month period. I believe there are more extensive studies using more sophisticated methods and with much better results. If you happen to know of any, please reach out to me. For now I will just be betting for fun.