SOLVED

Any to any comparison in excel

Copper Contributor

hi Everyone

Book1.xls has 213 list of variables on the rows side and on the columns side, you have list of 15 vendors which provide those variables.

 

Client simply would like to know which vendors to buy from starting by the one that has the most variables, then the next one that covers the majority remaining then the next. Client would like to see multiple scenarios of going with every vendor and then design an overlap map 

 

So the task is to produce "any to any" comparison. Basically task is to present multiple scenarios by showing Vendor A variables (indicated by 1 if available or 0 if not. Can be changed to yes/no, or true/false) then showing next vendor maybe B that cam cover most of the remaining variables from vendor A, then vendor C showing again the remaining variables from Vendor A and B.

 

Then repeating the above with vendors A, then C, then B then D... till closing all 15 vendors

 

Then restarting the whole process with Vendor B at the center and finding the minimum list of vendors to close needed variables in an iterative manner.

 

I know this is complicated and I dont even know where to start. I am hoping brilliant minds like yours can propose a solution for that.

 

if you wish to transpose the data if this makes the process easier that's fine.


@Riny_van_Eekelen 

@peiyezhu 
@

18 Replies
I have downloaded your file.
But only the source table (I guess)。Can you upload an expected result table based on Sheet1 so that I can get what your need easily.

what did you mean :
Client would like to see multiple scenarios of going with every vendor

Do you have any other input criterias besides Sheet1 you shared?
And what is the expected result based on all your inputs (include all criterias and raw dataset)?

thank you for your kind response.
it's just the table available. Forget about column A, this is just for classification.

Target data columns is the main column containing all 213 variables which my client requires. We asked 15 vendors to see if they have the data and they came back with a yes or no answer which is now reflected on that matrix.

Now client says: It's not enough finding the top match of variables across all 15 vendors. He would like to understand multiple scenarios meaning:

- If I go with first vendor covering let's say 50% of the variables, how many more vendors I require to complete the remaining variables? 2 more? 1 more? 7 more?
- If I go with second vendor.... same question

So the client is saying: present to me a complete analysis so that he can decide not only on coverage and minimum number of vendors but also on other factors which will be added later on such as price maybe, accuracy, and other parameters'.

It could be for example that all variables can be covered by Vendor 7 and Vendor 8 but they happen to be very expensive combined. However going with vendor 3, 9 and 14 could cover all variables too but maybe less expensive and with better feature.

So again client requires multiple scenarios of assessment to make the right conclusion. Hope this helps.

One note I missed in my original post: I think there's 9 or 10 variables that are not covered by any vendors so achieving 100% match is not possible.


and again the result should be a new table looking something like the following:

- Vendor 1 covers 80 variables then combined with Vendor 4 to cover 90 variables...
- Vendor 1 covers 80 variables then combined with vendor 9 covering 50 variables then...
....
- Vendor 2 covering 120 variables with vendor 1 covering 30 variables

In the above example, clearly the function should NOT look for already covered variables of the first vendor in the next vendor and the subsequent ones. For example, if vendor 1 covers tsnuami variables, no need to ask subsequent vendors for tsunami variables, we check with the remaining required variables from the currently full list of variables covered.

@samery Just wondering, is the file you provided complete? There currently are no scenarios where all of the variables are covered. If you filter out all of the 1's, you're left with a list of 21 variables that none of the companies are covering.

greetings @djclements
yes that is correct. At the moment, we noticed that some of the vendors are unable to provide the variables that we wish to have. As such, we are going to have to exclude them for now.

But the table is complete. The list of 200+ rows are the variables on the B column that are actual variables that you can buy from data providers. On the columns side you will find vendors that provide such variables. 1 indicate that the vendor can sell u that variable while 0 means that vendor doesn't have this variable.
Hi everyone

really hope to get your help so let's change the question differently.

based on the attached excel sheet, let's see I want to create one table for every vendor separately. It goes something like this:
Vendor 1 has 50 variables => this can be done via filter function of excel.
Vendor 2 has 40 variables on top of vendor 1 => this should be done again by filter function but must filter OUT variables already included in vendor 1
Vendor 3 offers 10 variables on top of vendor 1 and 2 again by filtering out already covered variables.
etc.. till end of vendor list. ==> this would conclude assessment of vendor 1

2nd separate table for Vendor 2 starts the same process and on an on...

So the first simple question: how do we use a filter function to filter from an array of 100 rows that have values 1 and filtering out already covered rows from previous rounds?
how do we use a filter function to filter from an array of 100 rows that have values 1 and filtering out

Sorry,I can not get you by now.
If possible,can you upload an expected result workbook.based on the source dataset you have uploaded.

@peiyezhu 

You are right probably, I am complicating things.

 

Please redownload the file again and go to cell B235. The idea is to show what every vendor can provide and then the extra variables that the next vendor can provide above and beyond the variables provided by the previous providers. IS this now clear?

best response confirmed by samery (Copper Contributor)
Solution

@samery Are you using Excel for MS365? If so, here's a couple of recursive formulas you could try. However, they will only return the most optimized combination of vendors...

 

To return a summary of the optimal combination of vendors with their total items covered, try the following formula in any cell (with nothing below it):

 

=LET(
F, LAMBDA(X,head,data,val, LET(
   a, BYCOL(data, LAMBDA(col, SUM(col))),
   b, MAX(a),
   c, XMATCH(b, a),
   IF(b=0, val, X(X, head, FILTER(data, INDEX(data,, c)=0, 0),
      VSTACK(val, HSTACK(INDEX(head, c), b)))))),
F(F, C1:Q1, C2:Q231, {"Company","Total Covered"}))

 

With your current data, the results should look something like this:

 

CompanyTotal Covered
S&P138
Oxford eco0mics28
ECONDB20
Kpler11
blue water5
alphaliner2
economist intelligence2
Sinay1
Oxford Analytica1
TDM1

 

To return a detailed list of the optimal combination of vendors with all of their items covered, try the following formula in any cell (will nothing below it):

 

=LET(
F, LAMBDA(X,head,data,val, LET(
   a, BYCOL(data, LAMBDA(col, SUM(col))),
   b, MAX(a),
   c, XMATCH(b, a),
   d, INDEX(head, c),
   IF(b=0, val, X(X, head, FILTER(data, INDEX(data,, c)=0, 0),
      VSTACK(val, IFNA(HSTACK(d, FILTER(TAKE(data,, 1), INDEX(data,, c)=1)), d)))))),
F(F, B1:Q1, B2:Q231, {"Company","Items Covered"}))

 

Finally, to return a list of all items not covered by any vendor, try the following formula in any cell (with nothing below it):

 

=LET(
data, B2:Q231,
qty, BYROW(data, LAMBDA(r, SUM(r))),
VSTACK("Items Not Covered", FILTER(TAKE(data,, 1), qty=0, "none")))

 

It's a start, anyways. I'll think about it some more and try to find a way to generate other combinations of vendors.

Thanks for your response.
B235=80
I still can not find out how to calculate this 80.
variables are B234 and A235?

How these two variables came out 80 manually?
thank you so much. let me digest what you wrote.
B235 = 80 then 40 then 0... are all just dummy data just to let you know what I had in mind.
But I think you captured what I am looking for... hopefully when I understand the response.

just a question please:
in this formula:
=LET(
F, LAMBDA(X,head,data,val, LET(
a, BYCOL(data, LAMBDA(col, SUM(col))),
b, MAX(a),
c, XMATCH(b, a),
IF(b=0, val, X(X, head, FILTER(data, INDEX(data,, c)=0, 0),
VSTACK(val, HSTACK(INDEX(head, c), b)))))),
F(F, C1:Q1, C2:Q231, {"Company","Total Covered"}))

The result indicates that S&P should be the lead which is obvious then all the other vendors were added very nicely depending on the extra variables that they provide on top of the previous one. This is what the client wants which is great. But it assumes all variables have equal weighting. Client is saying what if vendor X say Factset which has the least number of variables but nonetheless carries better coverage. So client says what if we trigger another scenario by starting not with a top vendor rather any other vendor as the start then align all the other vendors cumulatively till we close as much as possible the remaining variables.

Now from a sorting standpoint, the result is the same but from a variable importance standpoint the client is insisting on having visibility on all scenarios and then he can conclude depending on other factors (such as coverage, weight, refresh rate,.... and many other stuff which are not mentioned to avoid complication).

So may I ask you the following:
How can please revise your formula to rerun the same logic on each and every vendor (clearly in different tables)? in other words, can your formula be reprogrammed to start with a vendor not as the top in terms of number of variables but as a variable in a cell then a formula would work its way towards the other vendors to add remaining vendors?

@samery Please see the attached workbook (Book2.xlsx). Sheet2 is setup for you to select the first vendor to start with, then the formulas will generate the optimal combination of additional vendors thereafter.

 

I've also completed the summary you were aiming for on Sheet1, but it's not an optimized summary, as it just goes through the list of vendors in the order that they appear (not by the number of items they cover).

 

Also worth noting, the number of possible combinations with 15 different vendors is 15! = 1,307,674,368,000. So, the prospect of getting all possible combinations is not very realistic. I think the best combination of results are demonstrated on Sheet2, although they are still weighted by the next vendor with the maximum number of items remaining. Perhaps there could be a way to weight them by other factors, but I would need to see what those factors are.

 

Hopefully the attached workbook will provide you with enough scenarios. Cheers!

@samery 

You seem to have more exchanges since I last looked. 

What I have tried is a brute force approach of selecting combinations from the vendor list.  Ultimately there are 32768 combinations and the 'best' result is simply use all vendors.  What I have done is limit the search to 3 or 4 vendors and list the combinations that give the highest variable count.

image.png

The combinations are generated by converting a sequence to binary and splitting the bits across the row.  I sort that by the count of set bits and truncate the list to give the number of options considered.

Your work is amazing, neat, clear and straight forward. It answers exactly what the client wants to see. Thank you so much
your work is very exhaustive and it blew my mind and my team mind. Some people spent 2 days crunching combinations in python and you did it in excel. Kudos to your lovely and excellent work

@djclements

hi again

 

I promise I will leave you alone after this last post so please bear with me.

you noticed I hinted earlier some other complications related to variables such as refresh rate and scope.

 

I tried to adjust your great work all by myself but I didnt know how to do it.

 

First: Please check the attached book2.xls, tab Raw data, it has the actual replica of what I have along with 2 other descriptive fields for every variable: the refresh rate and the scope. I know that this data is not available for all variables.

 

Second: now please go back to the 2nd tab you created, basically I am supposed to add couple of more fields for every data from the raw data tab so that the client would know what exactly he is buying. Obviously, I should pickup the refresh rate and the scope of the relevant vendor.

 

Such information should help my client to decide whether or not to buy a particular variable form Vendor X for example if the scope is weak. I also added another simple sum small column next to the nice table you put on the top left on coverage only so that the client will visually see how much before and after, meaning how much client should consumer form vendor vs how much vendor is proving as a simple sum. 

 

do you think you can help me with that as well?

 

many thanks in advance.

Regards

Samer. 

@samery Please see the newly attached workbook (Book3.xlsx). I moved all of the custom functions to Name Manager and added the ability for you to manually select multiple vendors (up to 5). They will be prioritized in the order you choose, then all subsequent vendors will automatically be filled based on the number of remaining items they can cover.

 

In order to get the Refresh and Scope variables to pull in, I had to modify the headers slightly on the RawData sheet... all merged headers were unmerged and repeated across their applicable columns. I also verified the Target Data labels on Sheet1 were identical to those on the RawData sheet. All headers and labels must match in order to pull in the additional information.

 

I wasn't exactly sure what you wanted the Full Coverage column to do, so I set it up to pull the values from the total row on Sheet1 for each vendor in the list. If that's not what you wanted, please try to explain it again.

hi @djclements
I think the addition of up to 5 is more then what I expected. There was few variables that have bad coverage so to speak so I wanted to show the client that from a formula standpoint, that's how variables are distributed but client to decide which coverage to consider or not. Warm thanks.

The totals I mentioned are exactly what you presented: how much client offers vs how much client needs.
I think the work is amazing and I am really and obsoletely thankful.
1 best response

Accepted Solutions
best response confirmed by samery (Copper Contributor)
Solution

@samery Are you using Excel for MS365? If so, here's a couple of recursive formulas you could try. However, they will only return the most optimized combination of vendors...

 

To return a summary of the optimal combination of vendors with their total items covered, try the following formula in any cell (with nothing below it):

 

=LET(
F, LAMBDA(X,head,data,val, LET(
   a, BYCOL(data, LAMBDA(col, SUM(col))),
   b, MAX(a),
   c, XMATCH(b, a),
   IF(b=0, val, X(X, head, FILTER(data, INDEX(data,, c)=0, 0),
      VSTACK(val, HSTACK(INDEX(head, c), b)))))),
F(F, C1:Q1, C2:Q231, {"Company","Total Covered"}))

 

With your current data, the results should look something like this:

 

CompanyTotal Covered
S&P138
Oxford eco0mics28
ECONDB20
Kpler11
blue water5
alphaliner2
economist intelligence2
Sinay1
Oxford Analytica1
TDM1

 

To return a detailed list of the optimal combination of vendors with all of their items covered, try the following formula in any cell (will nothing below it):

 

=LET(
F, LAMBDA(X,head,data,val, LET(
   a, BYCOL(data, LAMBDA(col, SUM(col))),
   b, MAX(a),
   c, XMATCH(b, a),
   d, INDEX(head, c),
   IF(b=0, val, X(X, head, FILTER(data, INDEX(data,, c)=0, 0),
      VSTACK(val, IFNA(HSTACK(d, FILTER(TAKE(data,, 1), INDEX(data,, c)=1)), d)))))),
F(F, B1:Q1, B2:Q231, {"Company","Items Covered"}))

 

Finally, to return a list of all items not covered by any vendor, try the following formula in any cell (with nothing below it):

 

=LET(
data, B2:Q231,
qty, BYROW(data, LAMBDA(r, SUM(r))),
VSTACK("Items Not Covered", FILTER(TAKE(data,, 1), qty=0, "none")))

 

It's a start, anyways. I'll think about it some more and try to find a way to generate other combinations of vendors.

View solution in original post