Microsoft

# Will Your Team Win? Use Modern Excel to Predict Football Results

Every 4 years I have a ritual.

As a diehard fan of the Peruvian Soccer (Fútbol) Team, I’m obsessed with the South American Soccer Qualifying Tournament.  Peru’s last appearance in the finals was 35 years ago, and Slate Magazine describes my support for Peru as "an act of fatalism". However, though potentially delusional, I know this is our year.

Unfortunately for Peru, per DeadSpin, this year’s South American qualifying group is deemed the toughest in the world.  With such a difficult group, who qualifies always comes down to the last few games.  Especially for Peru.  In Peru, we have a saying: “Pray to the calculator gods” because usually whether we qualify depends on the results of other teams. “We’ll make it only if Argentina ties with Chile, AND Colombia beats Brazil by 3 goals, AND…”  Everyone becomes a math expert and creates elaborate result predictions and scenarios. It’s a Peruvian tradition.

That is why every 4 years, my ritual, as we approach the end of the tournament, is to build a model in Excel to predict the score of the final few games and play around with different scenarios. Since I LOVE Get & Transform (a new set of powerful Excel 2016 features which provides easy data gathering and shaping capabilities), I thought that this time around I would build the logic exclusively using this new tool.

My predictions on how the last 2 decisive games will play out, and the results from the model.Currently, there are 2 games left to be played in South America, on October 5th and 10th.  Against all odds, Peru is in good shape this time around, and currently sits in 4th place.  If the standings hold, Peru will qualify.

I’m providing a link to the Excel file at the end of the post, so you can predict your own results.  Note that Get & Transform is available natively as part of Excel 2016 and Office 365 subscriptions. If you have Excel 2010 or Excel 2013, you can also take advantage of these capabilities by downloading the latest Power Query for Excel add-in.

Here is roughly how I put it together.

I sourced some raw data from Wikipedia with the scores from all the previous games.  Since I’m not anticipating the historical scores changing, I simply copied the results from the site into the “Source Data” tab in Excel.  However, if we had more games to go and I wanted the scores to update, I could have used the “From Web” connector. Since we are only 2 games away from the end, I don’t care too much about having a live connection to the website.

Messy/Source Data

Of course, this data is super messy, so the first thing I did was to bring it into “Get & Transform” using the “From Table/Range” button in the “Data Tab”.  Note the ribbon might look a little different depending on which version of Excel you have.

Table/Range Connector

I cleaned the data by applying several transformations in the Query Editor, including removing columns and errors, splitting columns, joining tables, adding conditional columns, etc..  The result was a clean table with just the scores for each game (see below).

First set of transformations in the Query Editor

The fun part was building the logic for Points, Goals and tie breakers exclusively using Get & Transform.  BTW - I know there are multiple ways of building this in Excel, and would love to see other approaches.  The next challenge for myself will be to build the logic using PowerPivot and DAX.

At a high level, the logic that I built is as follows:

• Points: 3 points for a win, 1 point for a draw, 0 points for a loss
• If tied on points, use overall goal difference as the tie breaker
• If still tied, use overall goals scored as the tie breaker

Finally, I combined the historical results table with the score predictions table and brought the standings table back to the grid in Excel.

Final Query

In the process of building this model, I also learned about the “Query Dependencies” view that quickly helped me visualize how all the steps and queries are interconnected (see below).

Dependency View

To access this view, while in the “Query Editor” click on the “Query Dependencies” button in the “View” Tab.

As promised, here is a link to my South American Qualifying spreadsheet for you to build your own predictions. Note that you might get prompted by a "Security Warning" in Excel, click "Enable Content" to proceed.  If you want to see the details behind each of the queries, go to the “Data Tab> Get Data> Launch Query Editor”.  If you want to predict your own scenarios and outcomes you can simply change the scores in the “Score Predictions” table, and click “Refresh All” in the Data Tab. This will update the standings table.

The next challenge will be to build the logic using Power Pivot and DAX.

¡Vamos Perú!

Carlos Otero

- Excel Team

Hi Carlos,

Minor thing - for the majority another warning will be what the queries were built in newer version of Excel. I'm on the latest Insiders Slow Ring build and had such warning.

Microsoft

Good catch.  Were you still able to run the queries succesfully?

Yes, everything works. That's "just in case" warning.

Occasional Visitor

Hi Carlos, greetings from Lima!

Back in Brasil 2014 I built a small model on Excel to predict scores, from the second round through the final, for the betting pool (Polla Mundialista) we made at work. Basically the difference between probabilities of winning the worldcup that each country had, based on historical FIFA rankings which I got from this Deutsche Bank's report: https://www.deutschebank.nl/nl/docs/The_World_Cup_Hackers_Handbook.pdf

Long story short, I won the frist prize and got back home with a nice Pisco Portón bottle.

I'll try to merge your model with mine for Russia 2018 and let's see what happens...of course, la blanquirroja has to be there!

¡Vamos Perú!

Occasional Visitor

Hello Carlos,

what a nice job this is. Is it possible to convert your format to the English premier leauge? I have a love for English football, and sometimes I also place a bet. Do you think it would help me, whtat is your op opinion?