Nov 27 2023 07:05 AM - edited Nov 28 2023 07:42 AM
Hi
Looked at +/- replicating Fill Down and Fill Up functionnalities in Power Query with a LAMBDA. My attempt below:
// FillDownUp:
=LAMBDA(array,
LET(
Data, DROP(array, 1),
DataCols, COLUMNS(Data),
FillCol, LAMBDA(array,
SCAN(
IF( ISBLANK( TAKE(array, 1) ),
XLOOKUP(FALSE, ISBLANK(array), array, ""),
""
),
array,
LAMBDA(seed,x, IF( ISBLANK(x), seed, x))
)
),
FillStack, LAMBDA(seed,col, HSTACK( seed, FillCol( CHOOSECOLS(Data, col) ) )),
DataFilled, IF(DataCols = 1, FillCol(Data),
DROP( REDUCE(0, SEQUENCE(DataCols), FillStack),, 1)
),
VSTACK(CHOOSEROWS(array, 1), DataFilled)
)
)
Open to alternatives...
Nov 27 2023 07:41 AM
Looks nice. My variant for FillCol
FillCol = LAMBDA(array,
LET(
reverse, LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)),-1) ),
down, LAMBDA(arr, SCAN(, arr, LAMBDA(a,v, IF(v="", a, v))) ),
reverse(down(reverse(down(array))))
)
)
More correct name will be Fill Down and Up (not fill up and down)
Nov 27 2023 12:26 PM
And back to this Can SCAN() and BYROW() be combined? - Microsoft Community Hub discussion one of variants which is similar to your one
FillUpDn = LAMBDA(array,
LET(
fnFill, LAMBDA(array,
LET(
reverse, LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)), -1)),
down, LAMBDA(arr, SCAN(, arr, LAMBDA(a, v, IF(v = "", a, v)))),
reverse(down(reverse(down(array))))
) ),
fillData, LAMBDA(arr,
REDUCE(
1,
SEQUENCE(COLUMNS(arr)),
LAMBDA(A, i, IF(SEQUENCE(,i) = i, fnFill(INDEX(arr,, i )), A))
) ),
VSTACK(TAKE(array,1), fillData(DROP(array,1)) )
))
Nov 27 2023 03:22 PM
SolutionMy variant:
Filler
=LAMBDA(table,LET(
header, TAKE(table, 1),
data, DROP(table, 1),
counter, SEQUENCE(COLUMNS(data)),
FillDown, LAMBDA(a, v, IF(v = "", a, v)),
Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
Stack, DROP(REDUCE("", counter, LAMBDA(a, v, HSTACK(a, Complete(INDEX(data, , v))))), , 1),
VSTACK(header, Stack)
))
Nov 27 2023 09:30 PM
More correct name will be Fill Down and Up (not fill up and down) Named it UpDown as in reality this is what my approach does. Though, this is quite confusing ==> Initial post revised
Nov 28 2023 07:41 AM
Thank you @Sergei Baklan & @Patrick2788
What I had in mind (wasn't sure that was realistic) was to provide a single LAMBDA (to allow copy/paste in Name Manager) for cases like (but not only):
where, for whatever reason Power Query cannot be used. Added a couple of optional parameters:
I don't post the formula as it's a monster now. It's available in the attached workbook - seems to work decently... I stop there but will wait a couple of days before closing this thread in case you have feedback/recommendation (I'm gonna change the title of the discussion accordingly)
Nov 28 2023 08:50 AM
I like this formula. Sure that could be other variants, as for everything in Excel (and not only), but the only reason to revise the formula could be the performance, if it works fine on hundreds thousands of rows. On the other hand such is not practical for the majority of Excel cases.
It works fine and do it function, thank you for sharing.
Nov 29 2023 12:07 AM
I'm usually concerned by perf. (and scalabily) and did check with a few thousands of rows, no problem from my perspective although there are probably faster approaches. TBH I'm not too concerned about this for this kind of transformation as, theoritically, this isn't somethig people have to do ten times a day
Didn't post it but also checked with 'large' tables. To give you a sense (on Core i7/8 GB/SSD):
Random table 300k rows * 7 cols to fill down then up
- LAMBDA 12 secs
- PQ (firewall off) 6 secs
Same random table 500k rows * 7 cols to fill down then up
- LAMBDA 22 secs
- PQ (firewall off) 9 secs
Nov 29 2023 07:35 AM
Interesting, thank you. Yes, good enough for most practical cases.
Nov 29 2023 09:08 AM
And timing obviously varies between tables. I took a different random one still 300k * 7 cols to fill Down then Up:
- PQ (firewall off): 6sec
- FillColumns I posted earlier: 13+sec
Also compared, with same table, 3 options:
// By @Patrick2788: 5sec
=LAMBDA(table,
LET(
header, TAKE(table, 1),
data, DROP(table, 1),
counter, SEQUENCE(COLUMNS(data)),
FillDown, LAMBDA(a, v, IF(v = "", a, v)),
Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
Stack, DROP(REDUCE("",counter, LAMBDA(a,v, HSTACK(a,Complete(INDEX(data,,v))))),,1),
VSTACK(header, Stack)
)
)
// From thread 'Can SCAN() and BYROW() be combined?': 6sec
=LAMBDA(array,
LET(
fnFill, LAMBDA(array,
LET(
reverse, LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)), -1)),
down, LAMBDA(arr, SCAN(, arr, LAMBDA(a, v, IF(v = "", a, v)))),
reverse(down(reverse(down(array))))
)
),
fillData, LAMBDA(arr,
REDUCE(
1,
SEQUENCE(COLUMNS(arr)),
LAMBDA(A, i, IF(SEQUENCE(,i) = i, fnFill(INDEX(arr,, i )), A))
)
),
VSTACK(TAKE(array,1), fillData(DROP(array,1)) )
)
)
// From FillColums as a separate LAMBDA: 5sec
// with @Patrick2788 packaging
=LAMBDA(table,
LET(
header, TAKE(table, 1),
data, DROP(table, 1),
counter, SEQUENCE(COLUMNS(data)),
FirstValue, LAMBDA(arr, IF(TAKE(arr,1)="",XLOOKUP(TRUE,arr<>"",arr,""),"")),
Fill, LAMBDA(seed,v, IF(v="",seed,v)),
FillCol, LAMBDA(arr, SCAN(FirstValue(arr),arr,Fill)),
Stack, DROP(REDUCE("",counter, LAMBDA(a,v, HSTACK(a,FillCol(INDEX(data,,v))))),,1),
VSTACK(header, Stack)
)
)
Nov 29 2023 10:09 AM
@L z. Very nice display of various dynamic array techniques here. Personally, I still prefer VBA for filling down. It requires very little code, is highly efficient, and can be easily adapted to fit any scenario (ie: excluding certain rows or filling down based on values in another column). The basic fill down code for your original screenshot of Table2 would be something like this:
Sub FillDown()
Dim rg As Range, arr As Variant, i As Long, j As Long
Set rg = Sheet1.ListObjects("Table2").DataBodyRange
arr = rg.Value
For i = 2 To UBound(arr, 1)
For j = 1 To UBound(arr, 2)
If arr(i, j) = vbNullString Then arr(i, j) = arr(i - 1, j)
Next j
Next i
rg.Value = arr
End Sub
I've never ran into a scenario where I also needed to fill up; however, that can be done by simply reversing the direction of the first loop and filling values from the row below:
Sub FillDownUp()
Dim rg As Range, arr As Variant, i As Long, j As Long
Set rg = Sheet1.ListObjects("Table2").DataBodyRange
arr = rg.Value
'Fill down
For i = 2 To UBound(arr, 1)
For j = 1 To UBound(arr, 2)
If arr(i, j) = vbNullString Then arr(i, j) = arr(i - 1, j)
Next j
Next i
'Fill up
For i = UBound(arr, 1) - 1 To 1 Step -1
For j = 1 To UBound(arr, 2)
If arr(i, j) = vbNullString Then arr(i, j) = arr(i + 1, j)
Next j
Next i
rg.Value = arr
End Sub
When tested with 520,000 rows x 5 columns of data, it completed in 4.25 seconds; with 1,040,000 rows of data, it took 8.5 seconds.
Nov 29 2023 07:51 PM
VBA is one more option and I'm not surprised it'as a fast one
Thanks for posting your code and taking time to measure efficiency
Jan 01 2024 09:36 AM
Sadly, I didn't see this question first time around!
My first thought is that the problem as posed doesn't really require the equivalent of the PQ FillUp functionality. It is really just FillDown but with non-standard initialisation using the first non-blank value.
FILLDOWNλ
= LET(
init, TAKE(TOCOL(column, 3), 1),
SCAN(init, column, LAMBDA(a, v, IF(v <> "", v, a)))
)
Having defined a function that works with a single column one would naturally expect
= BYCOL(Table, FILLDOWNλ)
to provide the solution for the complete table. Unfortunately, because of the Microsoft error in specifying the functionality required to handle arrays of arrays, one is stuck with implementing one's own version of BYCOL.
BYCOLλ
= LET(
n₀, COLUMNS(array),
n₁, QUOTIENT(n₀, 2),
n₂, n₀ - n₁,
A₁, TAKE(array, , n₁),
A₂, TAKE(array, , -n₂),
X₁, IF(n₁ > 1, BYCOLλ(A₁, FNλ), FNλ(A₁)),
X₂, IF(n₂ > 1, BYCOLλ(A₂, FNλ), FNλ(A₂)),
HSTACK(X₁, X₂)
)
The double recursion is a bit painful but at least the function can be reused and it is straightforward to apply
= BYCOLλ(Table, FILLDOWNλ)
Jan 01 2024 10:27 AM - edited Jan 01 2024 10:28 AM
I did some timing runs using Charles Williams's 'Calc Range' routine and it returned about 3500 ms for 500,000 rows.
Jan 04 2024 07:05 AM
Still need to fully understand how this works but this seems very straightfoward :)
Jan 04 2024 09:49 AM
The core functionality of BYCOLλ is also intended to be widely deployable. The assumption is that some function FNλ needs to be applied to every column of an array (or range). Rather than rewriting the function FNλ to address multiple columns, BYCOLλ splits to problem into two parts and solves each half independently. Recursion means that eventually one reaches the point where the argument is a single column and FNλ may be applied directly to give a single result.
Whereas the built-in BYCOL function will only return the result if each application of the function returns a scalar value, BYCOLλ uses HSTACK to build an array of arrays.
A problem with the alternative approach of using REDUCE and HSTACK to build an array of array is that the early results may be copied 10,000s of times during the stacking steps. This gets to be expensive in terms of computing resource. The bisection approach limits the steps of copying any given result to form build bigger datasets to 14 at the most.
I hope that by describing the intent of the 'home-knitted' helper function, I will simplify life for anyone planning to reverse engineer it from the code.
Nov 27 2023 03:22 PM
SolutionMy variant:
Filler
=LAMBDA(table,LET(
header, TAKE(table, 1),
data, DROP(table, 1),
counter, SEQUENCE(COLUMNS(data)),
FillDown, LAMBDA(a, v, IF(v = "", a, v)),
Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
Stack, DROP(REDUCE("", counter, LAMBDA(a, v, HSTACK(a, Complete(INDEX(data, , v))))), , 1),
VSTACK(header, Stack)
))