The premise of the #PredictTheKick competition, launched by SAP a while ago, is quite simple, yet a little different from what most people are used to: “predict which country will win each group and by what margin of goals”.

Generally, we fill out a sheet with results for all the individual matches based on our own expectations, and getting the teams and results right later on in the tournament is often rewarded more than in the early stages. As we all know, our own expectations are often biased! The question then becomes whether we can use a more grounded approach to predict match results, based on e.g. history, while also estimating the actual goal differences in matches (i.e. instead of just predicting the winner of a match with certain odds).

A methodology that often comes up when looking for an answer to this question is that of Dixon and Coles (1997), “Modelling Association Football Scores and Inefficiencies in the Football Betting Market”. This approach generally focuses on league betting, where there is a richer match history between the teams and where weighting recent results more heavily is perhaps more meaningful. Still, for the sake of being creative for the #PredictTheKick competition, an attempt was made to pseudo-apply it to predict the winner of each group and the margin of goals over the runner-up of the group.

Using the R models described by Jonas on his “opisthokonta” blog, the following approach was used (note for all predictive-analytics and/or football enthusiasts: the blog contains great reads about statistics and data analysis with a focus on analyzing football results!):

  • First, group the 32 countries into several ‘categories of strength’ (10 categories in total). The reasoning behind this is that there might not be a (representative) historical dataset between individual countries to use for predicting new outcomes. The grouping was based on several factors, such as:

    • Current FIFA-ranking;

    • Sum of the FIFA in-game player statistics with regards to the ‘final’ selection of players of each country.

  • Second, predict the individual group stage results from a historical dataset built as follows:

    • Use the matches between the actual countries (if available) as a base dataset;

    • Enhance the base dataset by adding matches that are similar with regard to ‘opponent’, based on the above-mentioned country grouping, if the matches between the actual countries alone are not sufficient;

    • Include a weighting factor both for time (e.g. matches from more than 4 years ago are exponentially down-weighted as less relevant) and for the nature of the match (i.e. tournament > qualifying > practice, …).
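To make the weighting idea concrete, here is a minimal Python sketch of how a per-match weight could be computed. The post only gives the direction of the weighting (older than roughly 4 years counts less, tournament > qualifying > practice), so the multipliers in `MATCH_TYPE_WEIGHT`, the `decay_rate`, and the function name `match_weight` are all assumptions made for illustration, not the values that were actually used.

```python
import math

# Hypothetical multipliers per match type. The post only states the ordering
# tournament > qualifying > practice, not the actual values.
MATCH_TYPE_WEIGHT = {"tournament": 1.0, "qualifying": 0.7, "practice": 0.4}


def match_weight(years_ago, match_type, cutoff_years=4.0, decay_rate=0.5):
    """Weight of one historical match when fitting the model.

    Matches up to `cutoff_years` old keep full time weight; older matches
    decay exponentially with the number of years beyond the cutoff.
    `decay_rate` is an assumed value.
    """
    excess_years = max(0.0, years_ago - cutoff_years)
    time_weight = math.exp(-decay_rate * excess_years)
    return time_weight * MATCH_TYPE_WEIGHT[match_type]


# Illustrative history, not real data.
history = [
    {"years_ago": 1.0, "type": "tournament"},
    {"years_ago": 6.0, "type": "qualifying"},
    {"years_ago": 10.0, "type": "practice"},
]
weights = [match_weight(m["years_ago"], m["type"]) for m in history]
```

Weights like these would typically multiply each match's contribution to the log-likelihood when the attack and defense parameters are estimated, so that a recent tournament match influences the fit more than an old practice match.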

Some of the key metrics that can then be derived, using the first match, ‘Russia vs. Saudi Arabia’ on June 14th, as an example:

1) The Attack and Defense parameters of each of the two teams;

2) The calculated probability of a win for either Russia or Saudi Arabia, or a draw;

3) The probability distribution of the goal difference between the teams.
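As a rough illustration of how these three metrics relate, the sketch below follows the model form from the Dixon and Coles (1997) paper: each team's goals are Poisson distributed, with a correction factor (`tau`, governed by `rho`) for the low-scoring results 0-0, 1-0, 0-1 and 1-1. It is written in Python rather than R, and the attack, defense, `rho` and home-advantage values are hypothetical placeholders, not the fitted parameters for Russia or Saudi Arabia.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the four low-scoring results."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(lam, mu, rho=0.0, max_goals=10):
    """P(team A scores x, team B scores y), truncated and renormalised."""
    matrix = [
        [tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
         for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]
    total = sum(map(sum, matrix))
    return [[p / total for p in row] for row in matrix]


def outcome_probabilities(matrix):
    """Return (P(A wins), P(draw), P(B wins))."""
    cells = [(x, y, p) for x, row in enumerate(matrix) for y, p in enumerate(row)]
    win_a = sum(p for x, y, p in cells if x > y)
    draw = sum(p for x, y, p in cells if x == y)
    win_b = sum(p for x, y, p in cells if x < y)
    return win_a, draw, win_b


def goal_difference_distribution(matrix):
    """Map goal difference (A minus B) to its probability."""
    dist = {}
    for x, row in enumerate(matrix):
        for y, p in enumerate(row):
            dist[x - y] = dist.get(x - y, 0.0) + p
    return dict(sorted(dist.items()))


# Hypothetical parameters (higher attack = scores more, higher defense = concedes more).
attack = {"Team A": 1.3, "Team B": 0.8}
defense = {"Team A": 0.9, "Team B": 1.2}
home_advantage = 1.0  # assumption: treated as a neutral venue

lam = attack["Team A"] * defense["Team B"] * home_advantage  # expected goals, A
mu = attack["Team B"] * defense["Team A"]                    # expected goals, B
matrix = score_matrix(lam, mu, rho=-0.1)
win_a, draw, win_b = outcome_probabilities(matrix)
diff_dist = goal_difference_distribution(matrix)
```

The same score matrix feeds both the win/draw/loss probabilities and the goal-difference distribution, which is why a model on exact scores gives more than just the odds of a winner.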

The above results for each individual match resulted in the following overall group results and the goal delta between the group winner and runner-up:
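The post does not spell out how the match-level predictions were rolled up into group standings. One possible way, sketched here in Python as an assumption rather than the method actually used, is a small Monte Carlo simulation: sample a score for each of the six group matches, rank the teams, and record who wins and by how much their goal difference exceeds the runner-up's. The team names, the `strength` values and the random tie-break are hypothetical simplifications (the real tournament tie-break rules are more detailed).

```python
import itertools
import math
import random
from collections import Counter


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_group(expected_goals, n_sims=2000, seed=0):
    """expected_goals maps (team_a, team_b) -> (goal rate of a, goal rate of b).

    Returns two Counters: how often each team wins the group, and how often
    each (winner, goal-difference margin over the runner-up) pair occurs.
    """
    rng = random.Random(seed)
    teams = sorted({team for pair in expected_goals for team in pair})
    winners, margins = Counter(), Counter()
    for _ in range(n_sims):
        points = dict.fromkeys(teams, 0)
        goal_diff = dict.fromkeys(teams, 0)
        scored = dict.fromkeys(teams, 0)
        for (a, b), (rate_a, rate_b) in expected_goals.items():
            goals_a = sample_poisson(rate_a, rng)
            goals_b = sample_poisson(rate_b, rng)
            scored[a] += goals_a
            scored[b] += goals_b
            goal_diff[a] += goals_a - goals_b
            goal_diff[b] += goals_b - goals_a
            if goals_a > goals_b:
                points[a] += 3
            elif goals_a < goals_b:
                points[b] += 3
            else:
                points[a] += 1
                points[b] += 1
        # Rank by points, then goal difference, then goals scored;
        # remaining ties are broken at random (a simplification).
        ranking = sorted(
            teams,
            key=lambda t: (points[t], goal_diff[t], scored[t], rng.random()),
            reverse=True,
        )
        first, second = ranking[0], ranking[1]
        winners[first] += 1
        margins[(first, goal_diff[first] - goal_diff[second])] += 1
    return winners, margins


# Hypothetical strengths; in practice the rates would come from the fitted model.
strength = {"A": 1.6, "B": 1.2, "C": 0.9, "D": 0.6}
expected_goals = {
    (a, b): (1.3 * strength[a] / strength[b], 1.3 * strength[b] / strength[a])
    for a, b in itertools.combinations(strength, 2)
}
winners, margins = simulate_group(expected_goals)
most_likely_winner = winners.most_common(1)[0][0]
```

The most frequent entries in `margins` then give a candidate answer to the competition question: which country wins the group, and by what margin of goals.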

And, in a bold attempt to use the same methodology for the remainder of the tournament:

Did you enter the #PredictTheKick competition yourself, or do you have different ideas for predicting the results of the World Cup in Russia, starting today? Do not hesitate to share your remarks or opinions!

Additionally, McCoy's are more than willing to help you on the road of Predictive Analytics in general! Please feel free to contact us for more information.