Tables are fantastic but cumulative totals are a pain

MVP

Cumulative Totals in TablesCumulative Totals in Tables

Excel Tables have too many benefits to list here, if you've never used them then check out my article here

Tables = Ctrl +T

 

One frustration I do have though is that cumulative totals in a column are not straightforward.

 

In "normal" Excel to get a running total you just add the cell above to the cell on the current row, and that's it.

 

For a table, with it's structured references and headings this proves problematic.

 

Also, ideally you want a running total that works when you insert rows into the Table or add lines of data to the bottom of it.

 

*** Update 6/12/2019***

 

My preferred approach is now shown in a post further down the responses

 

https://techcommunity.microsoft.com/gxcuf89792/board/message?board.id=ExcelGeneral&message.id=45961#...

 

and I've attached a demo file here

*******************************************************************

 

 

This was my original post...

 

So here are 2 options, and both make use of the INDEX function.

INDEX has several uses and one of them is to make a dynamic range.

This is a very odd concept when you first see it written

 

e.g. =SUM(A1:A10) can be re-written with an INDEX formula as

 

= SUM( INDEX(A:A , 1) : INDEX(A:A , 10) )

 

Essentially it's saying SUM everything between the 1st cell of A:A and the 10th cell of A:A

 

Stick with me, like I say, I know it's unfamiliar and a little weird.

 

Let's apply this to a Table that has weekly units sold for January and February

To keep things simple I've set it so that we've sold 1 unit a week.

 

So the formula we put in the Cumulative Sum column is

 

=SUM( INDEX( [Units Sold] , 1) : [@[Units Sold]])

 

So it adds up the values from row 1 of the Units Sold column as far as each row (that's where the @ [Units Sold] comes in)

 

I'll agree it's not a very attractive formula but it works and I wish there was a simpler way.

 

Importantly it works if you insert rows or add more data to the bottom of the Table. Brilliant!

 

Option 2 : A little more advanced

 

If you've stuck with me this far then it's worth reading this extra bit...

 

The SUM approach has a drawback.

 

If you filter the table for February you still get the YTD cumulative rather than just the February Cumulative

 

An alternative is the little known AGGREGATE function (you could also use SUBTOTAL).

 

=AGGREGATE( 9, 7, INDEX( [Units Sold] , 1) : [@[Units Sold]] ) )

 

The 9 means SUM

 

The 7 means ignore Hidden Rows AND Errors

 

So when the data is filtered the rows are hidden, and the Aggregate works nicely. Even if one of your values was an Error e.g. #VALUE or DIV/0 the cumulative would ignore it and the total still works.

 

This ignoring of elements can come in really useful, but just be careful that you really do want to do this! Spotting errors are normally a good thing.

 

Any suggestions of alternative approaches would be appreciated.

 

 

Hope you find this useful

 

Wyn

My other Linked In articles

 

AMAZING EXCEL SOLUTIONS
www.accessanalytic.com.au
20 Replies

Nice write-up, but it's missing one essential thing:

 

Readers, be aware that for large data sets, this technique can slow down Excel performance when the sheet is recalculated. In every row, the formula will sum from the first table data row to the current data row. If you only have a few hundred rows, that's still fast. But if the data set goes into the hundreds of thousands of rows, you will notice a performance hit. In each row, Excel has to calculate an ever growing range of cells.

 

But in each row it also adds numbers that have already been added. This is unnecessary repetition of calculation and very inefficient.

 

In a table with 500,000 rows there are 499,990 formulas. In row 3 the formula processes 2 cells, in row 4 the formula processes 3 cells, [...], in row 499,999 it processes 499,998 cells, and in row 500,000 the formula processes 499,999 cells. That's a total of 125,000,249,999 to process, and that will take time.

 

A faster approach can be realised by adding just the previous row's total with the current row's sold units. That way, each of the 499,999 formulas calculates only two cells, 1 Million in total. Or, in a scenario with a debit and credit contributing to the balance, three cells, which would be 1,5 million cells to process.

 

Unfortunately, structured referencing does not have a handle for "the previous row", so we still need to revert to A1 notation to do this. Consider the following screenshot:

 

TableRunningTotal.png

The first scenario calculates the running total with this formula starting in E3:

 

=SUM(E2,D3)

 

The Sum()function ignores text values, so in the first instance of the formula, E2 is ignored and D3 is added to the total.

 

The green table has the typical credit/debit pattern with the current balance. Here the formula in J3 is

 

=SUM(J2,H3,-I3)

 

Another way to write this formula would be

 

=N(J2)+H3-I3

 

The plus and minus operators don't tolerate text and will return an error if any of the cells contain text. The N() function converts a cell value to a number and will return 0 if it contains text, so the formula will not return an error in the first data row, despite using plus and minus operators and J2 referring to the column title, which is text.

 

One drawback with this solution is that when a row is inserted in the middle of the table, the formula in the inserted row will have a wrong reference. But Excel picks this up and sets the green warning triangle, which when clicked informs that the formula is inconsistent. A correct formula can quickly be restored by clicking the warning drop-down and selecting "Restore to calculated column formula".

 

RestoreColumn.png

In many situations that approach is faster than waiting for over a billion cells to be processed, but it can be argued that it's easy to miss the warning. 

Fair point @Ingeborg. An alternative for bigger data sets that can handle inserted / deleted rows would be this: I selected cell A2 and then defined the name "RowAbove" as = N(A1) Then my cumulative formula in the Table can be set as RowAbove + [@Units Sold]

You can also do the same thing with an old-style CSE array formula.  By defining an offset range 'PriorBalance' to do the job of your relative reference 'RowAbove', one can use formulas such 

= IF( Initialise?, CarriedForward, PriorBalance ) + Credit

for accumulation.  It is robust to any drag-and-drop that may be used to adjust the credit amounts and no one is going to insert a row through it!  Such formulas are out of spec for a modern dynamic arrays though.  Neither exist within the Table though.

 

p.s. 'Initialise?' simply holds the first row test

= PriorBalance ="Balance"

@Wyn Hopkins 

This is great stuff!

I have been struggling with references in tables to cells in previous rows. I have always used normal cell reference and that worked because in those particular cases I do not need to insert rows in the middle of the table or even sort the table. This tip of yours will be very useful in the future.

Thank you for sharing this!

Glad to help :)

@Wyn Hopkins

I guess this article is one that will just run and run despite its original date!

A solution I adopted was to define a named reference 'prior' that refers to

=Sheet1!R[-1]

in R1C1 notation (i.e. the row above).

 

The accumulation may then be expressed using Range intersection

= [@amount] + IFERROR(prior [balance], 0)

where the error traps the NULL reference that results from applying the formula to the first row.

 

To accumulate over a filtered set, one could introduce a helper column 'selected?' to indicate whether the row is filtered or not (using a form of COUNT)

=SUBTOTAL(103,[@amount])

The accumulation then becomes

= IF([selected?], [@amount], 0) + IFERROR(prior [balance], 0)

 

There remains the problem that a Table is designed to hold an unsorted list whereas accumulation is essentially an array operation (integration) and requires the terms to appear in the correct order.  To 'harden' the calculation in the presence of Sort is another challenge since 'prior' then has to pick  out the record that immediately precedes the  current record in terms of date, not position.

@Peter Bartholomew 

Thanks a lot for sharing this. It will be of great help in the future. Some of the work that I do includes calculators for financial models. I tend to use tables, not because I need to filter or sort, but mostly because formulas will make much more sense. Very often I need to refer to cells in the previous rows of the table. Next time, I will use the tricks that you both shared. ;)

@Peter Bartholomew , I didn't catch why do you need helper column. IMHO,

= SUBTOTAL(4,[@amount]) + IFERROR(prior [balance], 0)

shall work (or 9, 109, etc)

 

@Sergei Baklan 

Agreed.  It was just the way I developed the solution.  I should have reduced it to one of your suggested formulas rather just than settling on the first solution that worked. :(

Hi, @Peter Bartholomew and @Wyn Hopkins!

Could you please explain the steps in more detail?

When I try to set the name prior as referring to =Sheet1!R[-1], I get an error message saying that the formula is not valid.

I tried =Sheet1!R[-1]C but I got the same error.

If I set up the name by selecting A2 and referring to A1 (no dollar sign), the formula is accepted. But I cannot get it to work with the table field names as prior [balance]. It will only work if I use the name alone (no table field associated) to refer to the previous row on the same column.

 

Any clarification/help would be very much appreciated. 

Thanks! :)

 

 

@Celia_Alves 

 

image.png

 

 

With the R[-1] method I may be wrong but I think you have to go to Options > Formulas and set it to R1C1 method

@Celia_Alves 

 

R1C1 notation is more suitable and more natural for named relative reference. Not necessary to think about implicit intersection. However, that works fine in A1 notation as well.

In A1 notation exactly the same will be if, for example, you stay on cell A2 and add named reference RowAbove as

=Sheet1!A1

with relative references. You may stay now on any other cell, let say K5 and check in name manager what is the formula for RowAbove - now it will show Sheet1!K4.

 

Same for any other relative references. It's important to take into account where do you stay adding such named references.

 

@Celia_Alves 

I rarely, if ever, use a direct cell reference, so it makes little difference to me whether my workbook is set to the familiar A1 notation or the earlier R1C1 notation.  Why I chose to use it here is that it is less ambiguous.  If your calculation is set to R1C1,

=Sheet1!R[-1]

is a formula that references the entire row immediately above the active cell with the formula.  If you switch to A1 notation, the reference will need to be defined as

=Sheet1!9:9 if the active cell is in row 10

=Sheet1!10:10 if the active cell is in row 11

etc.

This definition (with an adjective used as a name) is made with an eye on generating readable formulas, e.g.

= prior [balance]

is the intersection of the row above the formula and the [balance] column (any two range references separated by a space will return the intersection).  Since the row prior intersects all of the columns of the table, one could also use

=(prior [date])

to give the date of the previous transactions.

 

If your workbook is sufficiently small that you can afford to use volatile functions, then you could, as an alternative, define 'prior' by the formula

=OFFSET(Table1[@],-1,0)

i.e. the table row above the current one.  Sample formulas might include the following

= IF( ISREF(prior Table1[#Headers]), "", [@Date] - (prior [Date]) )

for the interval between two transactions (could be of relevance if you are fortunate enough to accrue interest on current balances) or

= IF( ISREF(prior Table1[#Headers]), BFwd, [@Amount] + (prior [Balance]) )

for a running balance.  Here the first row will be characterised by the range 'prior' coinciding with the [#Headers] row so, instead of generating an error, the formula will reference the 'balance brought forward'.

 

Will your colleagues be able to read your formulas?  They will be able to deduce the intent but they would run a mile if you suggested they should take responsibility for the workbook!

@Peter Bartholomew 

need help.

we have inventory movement table for various stock items with stock  In and Out information.

I am try to create a pivot report, where we can new fields to table such as opening balance, running balance and closing balance for any given point in a time.

can someone help, how to write a formula for opening balance, running balance and closing balance.

see attached spreadsheet for example

@Sam_3026 

Unless @Wyn Hopkins wishes your question to be dealt with here, I suggest you open a new thread. 

I would also observe that since there are some of the most competent Excel practitioners worldwide responding to this thread it would be a mistake to narrow your potential solutions to a specific approach.

Hi @Sam_3026 ,

 

Yes if you could just post a new item to the community about this that would be great.

 

@   us in if you like

 

Cheers

 

Wyn

@Wyn Hopkins 

 

Hi, I'm fairly late on this thread and I'm relatively new to Excel tables and structured references.  I'm trying to set up a table which handles keeping track of finances for a group.  What I'm wondering from reading through this thread is whether there is a consensus position I could take away from this to use in my table.  My table has Date, Details, In, Out and Balance, so I would like to use a running total for the Balance field.  What would you think might be the best way to set this up, any suggestions?

Dave

@davxl 

As far as your tables are not at least dozens thousands of rows you may use any approach which is more familiar to you and which you understands better. After that you could start consider performance as the main criteria.

 

Practically everything in Excel could be done by several ways, abstractly no one of them is better than others, everything depends on concrete situation and goals.

@Sergei Baklan 

 

Thanks for the advice Sergei, my table will only be 1 or 2 thousand rows at the most (for at least a number of years), so I'll go with what I can understand.  Much appreciated.