SOLVED

Fill BLANKs down, up or both in column(s) - similar to Power Query

Silver Contributor

Hi

 

Looked at +/- replicating Fill Down and Fill Up functionnalities in Power Query with a LAMBDA. My attempt below:

 

Sample.png

 

 

 

 

// FillDownUp:
=LAMBDA(array,
  LET(
    Data,       DROP(array, 1),
    DataCols,   COLUMNS(Data),
    FillCol,    LAMBDA(array,
                  SCAN(
                    IF( ISBLANK( TAKE(array, 1) ),
                      XLOOKUP(FALSE, ISBLANK(array), array, ""),
                      ""
                    ),
                    array,
                    LAMBDA(seed,x, IF( ISBLANK(x), seed, x))
                  )
    ),
    FillStack,  LAMBDA(seed,col, HSTACK( seed, FillCol( CHOOSECOLS(Data, col) ) )),
    DataFilled, IF(DataCols = 1, FillCol(Data),
                  DROP( REDUCE(0, SEQUENCE(DataCols), FillStack),, 1)
    ),
    VSTACK(CHOOSEROWS(array, 1), DataFilled)
  )
)

 

Open to alternatives...

15 Replies

@L z. 

Looks nice. My variant for FillCol

FillCol = LAMBDA(array,
    LET(
    reverse, LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)),-1) ),
    down, LAMBDA(arr, SCAN(, arr, LAMBDA(a,v, IF(v="", a, v))) ),
    reverse(down(reverse(down(array))))
    )
)

More correct name will be Fill Down and Up (not fill up and down)

@L z. 

And back to this Can SCAN() and BYROW() be combined? - Microsoft Community Hub discussion one of variants which is similar to your one

FillUpDn = LAMBDA(array,
LET(
    fnFill, LAMBDA(array,
    LET(
        reverse, LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)), -1)),
        down, LAMBDA(arr, SCAN(, arr, LAMBDA(a, v, IF(v = "", a, v)))),
        reverse(down(reverse(down(array))))
    ) ),
    fillData, LAMBDA(arr,
    REDUCE(
        1,
        SEQUENCE(COLUMNS(arr)),
        LAMBDA(A, i, IF(SEQUENCE(,i) = i, fnFill(INDEX(arr,, i )), A))
    ) ),
    VSTACK(TAKE(array,1), fillData(DROP(array,1)) )
))
best response confirmed by L z. (Silver Contributor)
Solution

@L z. 

My variant:

 

Filler
=LAMBDA(table,LET(
    header, TAKE(table, 1),
    data, DROP(table, 1),
    counter, SEQUENCE(COLUMNS(data)),
    FillDown, LAMBDA(a, v, IF(v = "", a, v)),
    Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
    Stack, DROP(REDUCE("", counter, LAMBDA(a, v, HSTACK(a, Complete(INDEX(data, , v))))), , 1),
    VSTACK(header, Stack)
))

@Sergei Baklan 

More correct name will be Fill Down and Up (not fill up and down) Named it UpDown as in reality this is what my approach does. Though, this is quite confusing ==> Initial post revised

Thank you @Sergei Baklan & @Patrick2788 

 

What I had in mind (wasn't sure that was realistic) was to provide a single LAMBDA (to allow copy/paste in Name Manager) for cases like (but not only):

 

1.png

 

where, for whatever reason Power Query cannot be used. Added a couple of optional parameters:

 

Sample.png

 

I don't post the formula as it's a monster now. It's available in the attached workbook - seems to work decently... I stop there but will wait a couple of days before closing this thread in case you have feedback/recommendation (I'm gonna change the title of the discussion accordingly)

@L z. 

I like this formula. Sure that could be other variants, as for everything in Excel (and not only), but the only reason to revise the formula could be the performance, if it works fine on hundreds thousands of rows. On the other hand such is not practical for the majority of Excel cases.

It works fine and do it function, thank you for sharing.

@Sergei Baklan 

I'm usually concerned by perf. (and scalabily) and did check with a few thousands of rows, no problem from my perspective although there are probably faster approaches. TBH I'm not too concerned about this for this kind of transformation as, theoritically, this isn't somethig people have to do ten times a day

 

Didn't post it but also checked with 'large' tables. To give you a sense (on Core i7/8 GB/SSD):
Random table 300k rows * 7 cols to fill down then up

- LAMBDA 12 secs
- PQ (firewall off) 6 secs

Same random table 500k rows * 7 cols to fill down then up
- LAMBDA 22 secs
- PQ (firewall off) 9 secs

@L z. 

Interesting, thank you. Yes, good enough for most practical cases. 

@Sergei Baklan 

And timing obviously varies between tables. I took a different random one still 300k * 7 cols to fill Down then Up:

- PQ (firewall off): 6sec

- FillColumns I posted earlier: 13+sec

 

Also compared, with same table, 3 options:

// By @Patrick2788: 5sec
=LAMBDA(table,
  LET(
    header,   TAKE(table, 1),
    data,     DROP(table, 1),
    counter,  SEQUENCE(COLUMNS(data)),
    FillDown, LAMBDA(a, v, IF(v = "", a, v)),
    Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
    Stack,    DROP(REDUCE("",counter, LAMBDA(a,v, HSTACK(a,Complete(INDEX(data,,v))))),,1),
    VSTACK(header, Stack)
  )
)
// From thread 'Can SCAN() and BYROW() be combined?': 6sec
=LAMBDA(array,
LET(
    fnFill, LAMBDA(array,
      LET(
          reverse,  LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)), -1)),
          down,     LAMBDA(arr, SCAN(, arr, LAMBDA(a, v, IF(v = "", a, v)))),
          reverse(down(reverse(down(array))))
      )
    ),
    fillData, LAMBDA(arr,
      REDUCE(
          1,
          SEQUENCE(COLUMNS(arr)),
          LAMBDA(A, i, IF(SEQUENCE(,i) = i, fnFill(INDEX(arr,, i )), A))
      )
    ),
    VSTACK(TAKE(array,1), fillData(DROP(array,1)) )
  )
)
// From FillColums as a separate LAMBDA: 5sec
// with @Patrick2788 packaging
=LAMBDA(table,
  LET(
    header,     TAKE(table, 1),
    data,       DROP(table, 1),
    counter,    SEQUENCE(COLUMNS(data)),
    FirstValue, LAMBDA(arr, IF(TAKE(arr,1)="",XLOOKUP(TRUE,arr<>"",arr,""),"")),  
    Fill,       LAMBDA(seed,v, IF(v="",seed,v)),  
    FillCol,    LAMBDA(arr, SCAN(FirstValue(arr),arr,Fill)),
    Stack,      DROP(REDUCE("",counter, LAMBDA(a,v, HSTACK(a,FillCol(INDEX(data,,v))))),,1),
    VSTACK(header, Stack)
  )
)

 

@L z. Very nice display of various dynamic array techniques here. Personally, I still prefer VBA for filling down. It requires very little code, is highly efficient, and can be easily adapted to fit any scenario (ie: excluding certain rows or filling down based on values in another column). The basic fill down code for your original screenshot of Table2 would be something like this:

 

Sub FillDown()

    Dim rg As Range, arr As Variant, i As Long, j As Long
    Set rg = Sheet1.ListObjects("Table2").DataBodyRange
    arr = rg.Value

    For i = 2 To UBound(arr, 1)
        For j = 1 To UBound(arr, 2)
            If arr(i, j) = vbNullString Then arr(i, j) = arr(i - 1, j)
        Next j
    Next i

    rg.Value = arr

End Sub

 

I've never ran into a scenario where I also needed to fill up; however, that can be done by simply reversing the direction of the first loop and filling values from the row below:

 

Sub FillDownUp()

    Dim rg As Range, arr As Variant, i As Long, j As Long
    Set rg = Sheet1.ListObjects("Table2").DataBodyRange
    arr = rg.Value

'Fill down
    For i = 2 To UBound(arr, 1)
        For j = 1 To UBound(arr, 2)
            If arr(i, j) = vbNullString Then arr(i, j) = arr(i - 1, j)
        Next j
    Next i

'Fill up
    For i = UBound(arr, 1) - 1 To 1 Step -1
        For j = 1 To UBound(arr, 2)
            If arr(i, j) = vbNullString Then arr(i, j) = arr(i + 1, j)
        Next j
    Next i

    rg.Value = arr

End Sub

 

When tested with 520,000 rows x 5 columns of data, it completed in 4.25 seconds; with 1,040,000 rows of data, it took 8.5 seconds.

@djclements 

 

VBA is one more option and I'm not surprised it'as a fast one
Thanks for posting your code and taking time to measure efficiency

@L z. 

Sadly, I didn't see this question first time around!

My first thought is that the problem as posed doesn't really require the equivalent of the PQ FillUp functionality.  It is really just FillDown but with non-standard initialisation using the first non-blank value.

FILLDOWNλ
= LET(
    init, TAKE(TOCOL(column, 3), 1), 
    SCAN(init, column, LAMBDA(a, v, IF(v <> "", v, a)))
  )

Having defined a function that works with a single column one would naturally expect

= BYCOL(Table, FILLDOWNλ)

to provide the solution for the complete table.  Unfortunately, because of the Microsoft error in specifying the functionality required to handle arrays of arrays, one is stuck with implementing one's own version of BYCOL.

BYCOLλ
= LET(
    n₀, COLUMNS(array),
    n₁, QUOTIENT(n₀, 2),
    n₂, n₀ - n₁,
    A₁, TAKE(array, , n₁),
    A₂, TAKE(array, , -n₂),
    X₁, IF(n₁ > 1, BYCOLλ(A₁, FNλ), FNλ(A₁)),
    X₂, IF(n₂ > 1, BYCOLλ(A₂, FNλ), FNλ(A₂)),
    HSTACK(X₁, X₂)
  )

The double recursion is a bit painful but at least the function can be reused and it is straightforward to apply

= BYCOLλ(Table, FILLDOWNλ)

 

@L z. 

I did some timing runs using Charles Williams's 'Calc Range' routine and it returned about 3500 ms for 500,000 rows.

@Peter Bartholomew 

Still need to fully understand how this works but this seems very straightfoward :)

The core functionality of BYCOLλ  is also intended to be widely deployable.  The assumption is that some function FNλ needs to be applied to every column of an array (or range).  Rather than rewriting the function FNλ to address multiple columns, BYCOLλ splits to problem into two parts and solves each half independently.  Recursion means that eventually one reaches the point where the argument is a single column and FNλ may be applied directly to give a single result.

 

Whereas the built-in BYCOL function will only return the result if each application of the function returns a scalar value, BYCOLλ uses HSTACK to build an array of arrays.

 

A problem with the alternative approach of using REDUCE and HSTACK to build an array of array is that the early results may be copied 10,000s of times during the stacking steps.  This gets to be expensive in terms of computing resource.  The bisection approach limits the steps of copying any given result to form build bigger datasets to 14 at the most.

 

I hope that by describing the intent of the 'home-knitted' helper function, I will simplify life for anyone planning to reverse engineer it from the code.

1 best response

Accepted Solutions
best response confirmed by L z. (Silver Contributor)
Solution

@L z. 

My variant:

 

Filler
=LAMBDA(table,LET(
    header, TAKE(table, 1),
    data, DROP(table, 1),
    counter, SEQUENCE(COLUMNS(data)),
    FillDown, LAMBDA(a, v, IF(v = "", a, v)),
    Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
    Stack, DROP(REDUCE("", counter, LAMBDA(a, v, HSTACK(a, Complete(INDEX(data, , v))))), , 1),
    VSTACK(header, Stack)
))

View solution in original post