Conditional Data Accumulation for Sport Analytics

Copper Contributor

I'm working on a dataset (Excel version 16.54 on Mac Big Sur 11.5.2) to eventually analyze the effect of shots, shots on target, corners, fouls, and cards on betting odds for a college assignment. As seen in the file, I have data from all English Premier League games from the last 10 seasons. Each game tells the home and away teams along with their game statistics listed as either home or away. I'm hoping to fill the blank columns with accumulated data from throughout each season. For example, I would like to know how many shots a team like Arsenal had accumulated coming into each game by adding their shot totals from each of the previous games in the season. This is difficult because the teams are listed in two different columns (home or away), so the game statistics also vary based on if the team was the home team or away for each given week. Please let me know of any more clarifications needed and/or if there is a formula to do this, as I don't have the time or will power to manually add the data for 3910 rows of games. 

3 Replies

@evwill616 It might take 10 -20 minutes copying, pasting and deleting to prepare the data. Not all that bad. Just make sure that the original data set is not changed so that you can always come back to it to check or start from scratch if something went wrong. You will have to create two rows of data for each game, each row identifying if it is for the Home or Away team.

 

Create two separate sheets with all game data. On each, add a column that will hold a game index number from 1 to 3910. Call one sheet HOME, the other AWAY. On each of them, delete the columns that contain information that do not relate to Home or Away, so that you end up with two data sets. One for the Home teams and one for the Away teams. On each sheet, insert a column marking the teams either as Home or Away. Now you can copy both data sets onto another sheet below each other and sort by game number (then by H or A). Rename the column headers to neutral names (i.e. without a reference to "H" or "A". It will then look something like this (just the first 10 columns shown).

Screenshot 2021-11-15 at 08.43.37.png

(In Excel for Windows, you could use Power Query for this. It will be more sturdy and less prone to error. And, once set-up you can use it over and over again. So, if you can get your hands on a PC, do that instead.)

 

By the way, you need to clean-up your data first as the EPL data is inconsistent. From row 3042 and down, all data shifts one column because of the time information inserted there. Either insert blank space in the rows above it or delete the time information from the rows below it. All columns need to be lined up correctly to begin with.

@evwill616 

=CHOOSE(VLOOKUP(A2,$CG$2:$CH$12,2,FALSE),SUMPRODUCT(($A$2:A2="2011/12")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2012/13")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2013/14")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2014/15")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2015/16")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2016/17")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2017/18")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2018/19")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2019/20")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2020/21")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))),SUMPRODUCT(($A$2:A2="2021/22")*
($L$2:L2)*(($C$2:C2=$CI$2)+($D$2:D2=$CI$2))))

 

Above formula adds up accumulated data of column L (HS) for the team entered in cell CI2. With every new Premier League Season accmulated data starts with 0. Starting with row 3042 the formula returns errors as the layout of the data is different. 

Thank you, this worked!