Round 1 Post | Round 2 Post |
Link to Test 2 Workbook - WITHOUT answers
Link to Test 2 Workbook - WITH answers
Welcome Back! I know this round 2 post is long overdue, but I also know you are a forgiving crowd. I wanted to be sure this is out as everyone prepares for the upcoming WORLD FINALS! You will be able to participate in helping us pick the winner! Click here to learn more
Round 2 calls upon a few more features, like PivotTables and Get & Transform Queries. If you are not familiar with these tools, I hope this walkthrough can give you enough of a foothold to start doing some investigation of your own. I’ll say right now the videos are particularly long. This is not because the solutions require many complex steps, but because I try to provide detailed explanations for anyone new to these tools. Advice to the more experienced: jump to the provided timestamps in the videos to see the solutions. Advice to everyone: jump to the provided timestamps in the videos to minimize hearing my voice.
Before we dive in, here’s a quick refresher on the ground rules to these solution posts…
One last thing before we dive in – in the video I said I would provide links to function definitions – but I realized that would make this long post even longer. Instead, I am going to take the “teach a person to fish” approach and point out that you can quickly get to these function definitions from inside Excel
Challenge 01-01 (Data Relays part 1)
The purpose of this question was to introduce pivot tables, which can take many rows of data and summarize it as a smaller table with some useful data aggregation (like the sum or average of the data). Pivot Tables are incredibly valuable to understand because it is common to have more data than you can processes/understand at one time, often the value of large data sets is in their high-level trends, and sometimes you need to start at the high level to identify areas of concern that require drilling into.
Again, in the provided video I talk A LOT (SURPRISE SURPRISE)…I spend quite a bit of time talking about pivot tables and how to use them in a very general sense. If you want to just see the answer to the question, jump to the time stamps provided with each step
Challenge 01-02 (Data Relays part 2)
GET & TRANSFORM IS HERE!!!!!!! Don’t let my monotonous voice on the video fool you, I am pumped
Sorry I go on a bit in the video before getting to the meaty part of answering the question because I want to explain what Get & Transform is.
A quick note regarding the video is that we continue to modify Excel and those with O365 subscriptions get the latest version. The ribbon I am using in the video may look a little different than what you have on your computer depending on your version of Office.
If you are new to Get and Transform I recommend following in the video. Besides just answering the question I do a bit of an introduction to Get & Transform and show some features I really like. I’d also recommend checking out the tutorials mentioned in the Excel workbook. They are a great starting point for learning more.
If by the end, you don’t love Get & Transform, blame the messenger! Keep exploring the tool and try some more of the challenges. There is so much you can do with Get & Transform.
That’s the end of the challenge. After this step, the video shows a few other introductory aspects of Get & Transform for the new comers…
Not a fan of Get & Transform yet? Keep playing around with it and check out the tests of the later rounds, we are just getting started ;)
Challenge 02 (Chart Gymnastics)
Just like my prior post, I won’t go into a lot of details here, because there is no right answer. If you are new to chart building, go check out some of the references I provided in the round 1 post. I do want to give credit to the community members that spotted my error: I botched the copy/paste so this data doesn’t exactly match the source data I provided. A perfect example of why you should use Get & Transform to gather data whenever possible: you minimize human error. That was not an option in this case, so you are left with my imperfect data set.
We selected this data set to throw a few wrenches into the visualizations (some of these issues also appeared in the round 1 data set):
Take a look at some of our most popular visualizations from this round. Shout out to….
@Efren Peria | @Prerit Ahuja | @Mikko Pöyry |
link to post | link to post | link to post |
You’ll notice they all managed these issues by separating relevant data into different charts, but you can still quickly scan across charts to make comparisons. Now this is not the only approach, but sometimes it is the cleanest. I have to give a special shout out to Prerit for going the extra mile and hypothesizing revenue drivers :)
Challenge 03 (Formula Wrestling)
Round 2 takes formula writing up a notch! The questions require a little more than one quick formula to get the job done. I’ve decided to provide 2 approaches to each problem.
The first approach lays out the steps for processing the data across many cells. It makes it easier to see in the workbook how the data flows, and it’s just how my mind tends to work through these types of problems.
The second approach comes from various submissions from the community to provide a “efficient” solution. By “efficient” I mean the formulas are very compact and use the least amount functions and operations. These solutions relied heavily on array functions. If you are new to array functions, I recommend reading some array function documentation. 1 BIG disclaimer on the compact formulas: some of them depend on the data being ordered in particular ways (like consistently between tables). There was no explicit statement that this would hold true when the data refreshes. That said, I do think the solutions were incredibly resourceful and used some AWESOME mathematics principles, so I wanted to share them.
Approach 2: The Compact Formula (8:50) – This is effectively the same logic as above, but stuffed into 1 cell. The video steps through each portion in detail, but I’ll give the pseudo formula text version a crack. I’ll start from the inside and work our way out to the formula’s result. A few notes on this and the other compact formulas that follow:
Full Solution
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5}
Breakdown pt 1 (11:10)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5}
Breakdown pt 2 (12:10)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} sum_max_&_min_of_athlete_scores: {= SUBTOTAL({9,4,5}, array_of_athlete_scores} Using the SUBTOTAL() function calculate the sum (SUBTOTAL function number 9), the max (SUBTOTAL function number 4), and the min (SUBTOTAL function number 5) for each array of athlete scores
Breakdown pt 3 (13:35)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{
Breakdown pt 4 (15:30)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{
Multiplying each of these proxy scores by 10,000 effectively adds a bunch of zeroes to the right side of the score. For example, a proxy score of 26.38 becomes 263800. This does not impact the relative size of the gymnast scores, so we have another mathematically sound proxy score.
Breakdown pt 5 (16:23)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{
Breakdown pt 6 (17:45) {=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,
Breakdown pt 7 (18:10) {=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(
Breakdown pt 8 (19:00) {=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(
Helper Step (24:25)
4^{th}_place_athlete_number:={ =INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},
Full Solution
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
Breakdown pt 1 (25:10) =LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
4^{th}_place_matches_in_country_table_v1: {=(4^{th}_place_athlete_number = F38:F45)}
By comparing our the 4^{th} place athlete number (our helper step) to every athlete number in Table 2, we produce a TRUE/FALSE array that shows TRUE for the slot corresponding to the 4th place athlete and FALSE for all others
Breakdown pt 2 (25:57)
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
4^{th}_place_matches_in_country_table_v2: {=0/4^{th}_place_matches_in_country_table_v1}
Because TRUE/FALSE is also represented as 1/0, the operation of dividing zero by our TRUE/FALSE array produces an array that shows 0 where we had a TRUE (the slot corresponding to the location of the 4^{th} place athlete) and a dividing by zero error everywhere we had a FALSE
Breakdown pt 3 (27:15)
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
4^{th}_place_country:=LOOKUP(1,4^{th}_place_matches_in_country_table_v2, country_list)
LOOKUP(lookup_value, lookup_vector, result_vector) is similar to INDEX(). It looks for the lookup_value in lookup_vector and returns the corresponding value from the result_vector. For example, LOOKUP(“dog”, {“cat”,”cow”,”dog”}, {“meow”,”moo”, ”woof”}}) first looks for “dog” in {“cat”,”cow”,”dog”}, determines it is the 3^{rd} item in lookup_vector, and returns “woof” (the 3^{rd} item in result_vector). LOOKUP() assumes lookup_vector is sorted in ascending order. If LOOKUP() does not find lookup_value in lookup_vector, it uses the largest value in lookup_vector that is less than or equal to lookup_value (aka closest without going over, aka Price-Is-Right-style). Since our lookup_vector is a bunch of errors and a single zero, our LOOKUP function will use the location of the zero to return a value from the result_vector. In other words, we get the country that corresponds to the 4^{th} place score
Breakdow pt 4 (29:20)
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
ANSWER: 4^{th}_place_country&-4^{th}_place_athlete_number
Again a simple “&” allows us to combine strings. Since the second part of the answer is a number, we can make it negative to produce the “-“ in our final answer.
Calculate the average score for each judge (31:15) – using the AVERAGE() function we can quickly calculate the average score for each judge
Find the lowest average score among the judges (32:45) – using the MIN() function we can find the lowest among all the averages
Find the appropriate judge number (33:30) – again using MATCH function we can find which judge corresponds to the lowest score
Format the judge number (34:15) – since the judge numbers are always “J#”, we can use string concatenation and say “J”&MATCH(…) to get our final answer. Alternatively we could use the INDEX/MATCH combo on the headers of Table 1 to get the judge number. The INDEX/MATCH approach is more sound, because the order of Judge numbers is not locked across the different tables. INDEX/MATCH allows you to do searches and lookups without having specific orders in your data sets. I’ve show the “J”&# approach to be consistent with the compact formula.
Approach 2: The Compact Formula – Again the logic flow is pretty similar to what we accomplished in approach 1, but with fewer function calls thanks to array functions
Full Solution
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1
Breakdown pt 1
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),)
scores_by_judge_array: {=OFFSET(E26,,{1,2,3,4,5},8)}
The formula =OFFSET(E26,,1,8) says start from cell E26, go 1 column to the right and create a cell reference that is 8 cells high. This is effectively the range of cells holding Judge 1’s scores. By entering an array of column offsets ({1,2,3,4,5} rather than just 1), we produce an array of 5 cell ranges. Each cell range is 8 cells high, that is each range represents 1 judge’s scores across all athletes
Breakdown pt 2
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),0)
avg_score_per_judge_array: {=SUBTOTAL(1, scores_by_judge_array)}
Function number 1 of the SUBTOTAL() function is average, so SUBTOTAL(1, ...) calculates the average of a set of numbers or range of cells. Since we are feeding this function an array of cell ranges, the output is an array holding the averages of those ranges. In other words, we get an array that has the average score for each judge
Breakdown pt 3
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))
lowest_avg_score_location: {=FREQUENCY(0, avg_score_per_judge_array)}
Again some clever work from the community! The FRQUENCY() function takes an array of data points and an array of data bins and returns an array representing the number of data points (the frequency) falling into each bin. You might have gone through this process when building a histogram. In this case, we’ve passed a single data point (zero) and bins corresponding to the average scores for each judge. The zero data point lands in the smallest bin, which corresponds to the lowest average score. So FREQUENCY() returns an array of all zeros (no data points landing in the bins), EXCEPT for a 1 corresponding to the location of the lowest average score
Breakdown pt 4
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,
As we said above, we produced an array of zeros with a single 1. MATCH() searches for the 1 in our array and returns its location. So if the 1 was the 4^{th} item in the array, it returns 4.
Breakdown pt 5
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1
Approach 2: The Compact Formula –
Side Note on Array Formulas (42:00) – A little late I know, but if you have never used array formulas before I discuss a little bit about how you actually type them into a cell
Side Note on writing about arrays - You may have noticed I’ll write an array as a data series wrapped in braces {}. We can call the array {1,2,3} 1-dimensional or say it is a 1x3 array (it has 1 “row” with 3 “columns”). We can also call the array {10;20} 1-dimensional or say it is a 2x1 array (it has 2 “rows” with 1 “column” each. The subtle notation difference is that I used commas (,) to separate my “columns” and semicolons (;) to separate my “rows”. If we added our 2 example arrays together, we would get the 2x3 array {11, 12, 13; 21, 22, 23}. I call this out, because understanding the various dimensions of an array factors into my walkthrough below.
Breakdown part 1
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
modified_scores_2Darray_v1:{=F26:J33*10000}
F26:J33 represents the range of scores in Table 1. This is a 8x5 2-dimensional array. Multiplying all of these numbers by 10,000 uniformly removes the decimal point and adds a trailing zero. For example, J1’s score for athlete number 3150 goes from 8.973 to 89730. This modification means scores still have the same relative size (the highest original score becomes the highest modified score, the second highest original score becomes the second highest modified score, etc…), but scores are now whole numbers with the least significant digit open to modify further
Breakdown part 2
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
modified_scores_2Darray_v2:{=modified_scores_2Darr
Breakdown part 3
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
modified_scores_2Darray_v3:{=ROW(1:8)*1000000 + modified_scores_2Darray_v2}
A similar clever trick as we’ve seen on the prior questions. The ROW() formula produces an 8x1 array with numbers 1 through 8 as ROWS. They are then multiplied by 1,000,000 to produce the array {1M; 2M; 3M; 4M; 5M; 6M; 7M; 8M}. When this gets added to the 8x5 array of modified scores, 1M is added to all the scores in the first row, 2M is added to all the scores in the second row, etc… J1’s score for athlete number 3150 (which was in row 1) goes from 89731 to 189731. Now for each of our modified scores, the first digit is the row number (indicating the athlete), the next 5 digits represent the original score, and the last digit is the column number (indicating the judge). It also means the 1^{st} row (where all 5 columns had 1M added) will have the 5 lowest modified scores (ranks 36 through 40), the 2^{nd} row will have the next 5 lowest modified scores (ranks 31 through 35) … the 8^{th} row will have the 5 highest modified scores (ranks 5 through 1). In effect we’ve created a pattern where the highest score for each row (the highest score for each athlete) have rankings 36, 31, 26, 21, 16, 11, 6, 1 when looking across the entire array of scores. Similarly, the lowest scores for each row (the lowest scores for each athlete) have rankings 40, 35, 30, 25, 20, 15, 10, 5.
Breakdown part 4
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*1
max_and_min_ranks:{= {41,45}-ROW(1:8)*5}
Again ROW(1:8) will produce the 8x1 array of numbers from 1 to 8. When this is multiplied by 5, the numbers go from 5 to 40 in increments of 5…{5; 10; 15; 20; 25; 35; 40}. When we take the 1x2 array {41,45} and subtract this 8x1 array with multiples of 5, the result is an 8x2 array where 41 and 45 have a different multiple of 5 subtracted in each row..{36,40; 31,35; 26,30; 21,25; 16,20; 11,15; 6,10; 1,5}. Do these rankings look familiar? :)
Breakdown part 5
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
max_min_scores_2Darray:{=LARGE(modified_scores_2Da
Breakdown part 6
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,
judge_exclusions_txt:{=RIGHT(max_min_scores_2Darra
Breakdown part 7
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,
judge_exclusions_num:{=--judge_exclusions_txt}
The “-- " quickly turns the judge numbers from text strings to numeric values. The first “-“ turns the text into a numeric value and multiples it by -1. The second “-“ multiplies by -1 again so that we are back to positive numbers (the actual judge number)
Breakdown part 8
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+
most_excluded_judge_num: {=MODE(judge_exclusions_num)}
The MODE() function returns the most frequently occurring number in an array. In this case, the most frequently occurring number is the most excluded judge
Breakdown part 9
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10
ANSWER: {=”J” & most_excluded_judge_num}
Of course we finish with some basic string concatenation to format our Judge Number correctly.
Find the country for each judge (44:55) – we’ll use the INDEX/MATCH combination to find the country for each judge.
Determine instances where the judge and athlete are from the same country (46:15) – a simple test of athlete_country = judge_country will give us a TRUE when this is the case
Change our TRUE/FALSE test to 1/0 (47:00) – just like question 4, we will use IF() to change our TRUE/FALSE into 1 or 0
Copy this calculation for all judges/athletes (47:35)
Find out when score exclusions align with country matches (48:35) – since our exclusion cases are 1s, our country matches are 1s, and everything else are 0s, multiplying our exclusion tests with our country match test will return 1s when both are 1s (a score exclusion and a country match) and 0s otherwise
Find the total number of cases where score exclusion aligns with country matches (50:22)
Breakdown part 1
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+
judge_exclusions_txt:{=T(IF(1,RIGHT(…)))}
See question 4 compact solution breakdown parts 1 through 6. This is almost the exact same calculation. Question 4 parts 1 through 6, get us up to the RIGHT(…) part of to the function. Without getting into detail, I’ll say that the T(IF(…)) wrapper functions force Excels order of operations to maintain the output we saw earlier. Our end result is a 2x8 array showing the judge numbers for all excluded scores (again same output from question 4 parts 1 through 6). Where the first row are the judges excluded for the first athlete in Table 1, the second row are the judges excluded for the second athlete, etc…
Breakdown part 2
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+
judge_exclusion_countries:{=INDEX(E50:E54, judge_exclusions_txt)}
E50:E54 are the judges countries (again this solution depends on the ordering of the judges in Table 3 to match the column order of Table 1). So our INDEX() looks up all the countries based on judge number. We end up with a 2x8 array showing the judge country associated with each excluded score (as in part 1, the rows of this array align with the ordering of athletes in Table 1)
Breakdown part 3
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+
country_matches_TRUE_FALSE:{=judge_exclusion_count
E38:E45 are the athlete countries (again this solution depends on the ordering of the athletes in Table 1 to match Table 2). Comparing our judge countries to the athlete countries produces a 2x8 array of TRUE/FALSE values. The TRUEs represent cases where the country of the judge with an excluded score matches the athlete country, FALSEs represent non-matches
Breakdown part 4
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*100000
Breakdown part 5
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1
Now that we have 1s for country matches and 0s for everything else, a simple sum gets us the final count of matches.
If you’re reading this, you’ve gotten through the entire walkthrough and just completed my bonus endurance challenge. Thanks for going through the post. I know there was a lot to digest here, and if you’ve never seen array formulas before then the compact solutions may have you scratching your head. Again, I recommend going back and checking out some of array function documentation as a starting point. Also use the trick I showed you in the video (select a portion of the formula in the formula bar and click F9 to see how it calculates). Alternatively, select a cell that’s giving you trouble, go the “Formula” ribbon, and click the “Evaluate Formula” button. This will help you digest the arrays and see Excel’s order of operations. These can help bridge some gaps for you.
And of course, USE THIS COMMUNITY! Sometimes nothing beats learning from someone who has been in the same boat as you.
Posts on the remaining tests will be coming as quickly as I can churn them out. Thanks!
Round 1 Post | Round 2 Post |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.