average of column b of all values in col d found in col a

Brass Contributor

I have 3 columns( in reality these columns have up to 1200 entries)

Col1 (named range DBNUM Col2 (named range EQUITY)  Col15(named range DBNUMRN)

1                                      22                              3

2                                       45                             4

3                                      12

4                                      9

5                                      7

 

I am trying to find the average of the values in col2 that correspond the values in the 3r column that are found in the 1st col

29 Replies

@Dichotomy66 

Assuming your DBNUM field is a strictly increasing set of values, LOOKUP may be used to return values from EQUITY column.

= AVERAGE( LOOKUP( DBNUMRN, DBNUM, EQUITY ) )

@Peter BartholomewThanks Peter yours is the only formula of about 6 I have tried that has retuned the correct value. The one issue I am running into is that for some unknown reason the total sheet has become volatile,(it may be due to the sheer number of calcs and dynamic arrays it is building) As a result at times the array DBNUMRN (and some others) will at times have a blank row in them. If I click on a few cells in the sheet and get the blank to go away your formula works fine. In this same volatile behavior the results in DBNUMRN calc have some results show up in non consecutive order. Somewhere something on the sheet is causing the volatility and leading to the calculation getting into an iterations loop. I went and set max iterations to 20 just to calm this effect but that's treating the symptom not the problem. I have been very careful to have no circular references in the sheet so I don't know why it is doing this

You might have used the volatile OFFSET in naming your ranges. Use INDEX instead. Avoid array formulas and replace them with non-array ones. Also, try replacing the volatile INDIRECT with CHOOSE.

@TwifooI used INDEX for all my named ranges. No OFFSET in the entire sheet, I do have over 18 Dynamic named ranges though. I did use INDIRECT for a small number of formulas 18 to be exact, The volatility started showing up before that. However likely I will go back and try to find a way using index instead, Not sure I will be successful as using INDIRECT(ADDRESS... for this application may be too difficult

Please specify your INDIRECT formula. Perhaps, I can suggest an alternative that avoids the volatile INDIRECT.
By the way, you didn’t mention if you also have array formulas.

The entire sheet is all array formulas

 

=IFERROR(INDEX($BM$10:$BO$10,,(INDIRECT(ADDRESS(ROW()+10,20)))) is the basic building block formula

Column 20 is a helper column built with a ton of criteria that returns the column in BM10:BO10 I want the data from. When the iferror occurs it goes to the same formula but looking at col 21.. etc etc until it has looked thru 4 columns

@Dichotomy66 

As an FYI the CountA for the sheet is currently at 13807 and only about 800 of those do not contain formulas

Please attach your sample file. I believe we can avoid the volatile INDIRECT. You also admitted the sheet is all array formulas, which another way of saying “the sheet calculates very slowly”. We will try to avoid array formulas, too.

@Twifoo 

Interesting difference in approach.  I work very hard to ensure every possible calculation is an array formula in order to simplify the logic of the model.  I have come across factors of 2 lost in speed for a financial model (more for IFS are replaced since they are very fast) but nothing to put me off array formulas as a basic building tool.  I have even gained a factor of 3 in speed on one client workbook by omitting intermediate step-by-step calculations.

 

There used to be concerns about array formulas mysteriously stopping calculating part way through without reporting an error but I think that is a thing of the past.  For the future, modern Dynamic Arrays have the potential to change everything.

@Peter BartholomewI agree and as my sheet is constantly referring to arrays in one table to an array in another table any changes would be minor

I think my real problem is with the dynamic named ranges as it is when I click on a blank cell in the same column but above the named ranges start cell that the sheet does a minor recalc that glitches though ti does still do the calcs. I am in the process of switching the data fields to structured tables and using those for references rather than the named ranges. This should improve accuracy and speed as the tables are not at the top of the worksheet

 

Though if you could take a peek at the indirect formula mentioned above I would appreciate it

As with everything else in life, array formulas are efficient if used sparingly. When used excessively, as described here, array formulas become inefficient.

@Dichotomy66 

You may not have noticed but in the tiny implementation based upon your problem definition, I converted both ranges to tables.  I then used defined names to refer to ranges within the table.  This makes the named ranges dynamic.

 

@TwifooA I am reworking some things calculation speed doesn't seem to be an issue as it does this in about 1sec. The weird glitch when you click on an empty cell that is in a dynamic named ranges column but above its start point that is having the data reshuffle  in areas across the sheet is a pain. No data results are being dropped, but occasionally because it will insert a blank this causes a problem until I click on another cell and the blank goes away. 

 

No sense in sending it to you to look at the indirects until after I have modified the sheet formulas to use structured table references instead of dynamic named ranges. This does seem to be working better on a bunch of levels While I am at it I will look for dtat columns where I used array formulas and didn't need to.

Yes I did notice that. At the time I was trying to avoid tables but alas another theory shot to hell lol. The larger problem I had with your solution is I was not just trying to sort large to small but thru comparison of what turned about to be 18 different criteria to establish the ordering,,, particularly for the middle values in the results column. I will assuredly revisit this though once I have reset my other 7 dynamic tables to use structured table references rather than dynamic named ranges

@Twifoo 

We may well finish up at a similar place.  It is just that I start with the assumption of array calculation and only decompose to single cell calculation when I have reason to do so.  That might be a use case that required incremental updates and only a small proportion of the calculation changes.  The other reason to use relative referencing is that I simply run out of array dimensions.

 

I think my use of arrays will increase with the modern Dynamic Arrays.  I really like the fact that the formula is held in a single cell and the spill region is determined without further user input (either manually determined fill ranges or the cumbersome CSE).  

@Dichotomy66 

SO the glitch I am having seems to be because EXCEL is saying I have circular references. I was careful not to have any, and the circular reference in error checking is greyed out. It keeps saying circular reference and I go to the cell it listed in the bottom left corner... refill the cell and then it sends me to another one somewhere else which doesn't exist. After 15 or so completely random cells always within a column of similar cells I gave up After looking online the only real explanation I found that made any sense was it is due to too many array formulas ,,,, BONUS points to  @Twifoo

Not sure that is actually the issue as it isn't resolved yet other than to allow iterations in calculate

@Dichotomy66 

I am with @Twifoo as far as the goal of removing

=INDIRECT(ADDRESS(ROW()+10,20)

is concerned.  It is a volatile function do not see what it achieves over and above a relative reference to the addressed cell.  Since the column picked up using INDEX value depends on the content of the cell, it is pretty arbitrary and circular references are likely to be a problem to track down.

@Peter Bartholomew 

=IFERROR(INDEX($BC$10:$BL$10,,(INDIRECT(ADDRESS(ROW(),20)))),(IFERROR(INDEX($BC$10:$BL$10,,(INDIRECT(ADDRESS((ROW()-(COUNT($T$2:$T$11)),21))…………..

 

So in the results column it travels down col 20 fine and yes I could use relative reference in that snippet. Now when I get to the next iferror I am a nonspecific number down the results column.. hence the (-COUNT...) because when I shift over to col 21 I need the result at the top. But I have no way of knowing when this event would trigger, 

The Indirect allows me to start this second iferror looking at the top of its column

 

Does that make sense?

Your use of multiple IFERROR functions is inefficient. It would be better if reduced to a minimum, perhaps only one.
Such inefficiency is further worsened by the multiple INDIRECT functions within them. INDIRECT can be completely avoided in your case. You only have to establish a definitive logic that determines which column to CHOOSE; otherwise, you need to refer to only one column. Succinctly, a change in your data layout is inevitable.