Forum Discussion
Spilled Array based on different Time Periods and applied periods - Help Needed
James_Buist My first response used a brute-force method to compare the target periods against all dates between the earliest start date and the oldest end date, which can be quite labor intensive and may affect performance with a larger dataset/date range. It was also originally written as a generalized method to handle any start day, including the 30th and 31st, which is probably overkill for your needs.
Another approach could be to generate an array of thunks (uncalled lambda functions), which contain the required sequence of dates for each line number. Then, the array(s) can be broadcast across (and down) to fill the entire output grid for each period, and MAP can be used to loop through the resulting 3D array, calling each applicable thunk (function) as needed.
Assuming I've understood your request correctly, it could look something like this:
=LET(
periodStart, G2:R2,
periodEnd, G3:R3,
interval, B5:B12,
startDate, C5:C12,
endDate, D5:D12+(interval>1),
duration, ((YEAR(endDate)-YEAR(startDate))*12+MONTH(endDate)-MONTH(startDate)+1)/interval,
seqλ, MAP(startDate,duration,interval,LAMBDA(sd,d,i,LAMBDA(x,y,
LET(a,EDATE(sd,SEQUENCE(d,,0,i)),--IFERROR(OR((a>=x)*(a<=y)),0))))),
MAP(IF({1},periodStart,interval),IF({1},periodEnd,interval),IF({1},seqλ,periodEnd),LAMBDA(ps,pe,λ,λ(ps,pe)))
)
It's not the easiest method to understand, but the amount of data being handled is much less than in my first approach. 😉
EDIT: my first two attempts only checked if the start date landed between the period start and period ending dates, which was based on your manually entered "expected results"; however, after re-reading your instructions in the workbook where you said, "if a start or end date hits the middle of a time period... then it is fine just to apply it to the entire period", I'm thinking this might be more appropriate:
=LET(
periodStart, G2:R2,
periodEnd, G3:R3,
interval, B5:B12,
startDate, C5:C12,
endDate, D5:D12,
duration, ((YEAR(endDate)-YEAR(startDate))*12+MONTH(endDate)-MONTH(startDate)+1)/interval,
seqλ, MAP(startDate,duration,interval,LAMBDA(sd,d,i,LAMBDA(x,y,
LET(a,EDATE(sd,SEQUENCE(d,,0,i)),b,EOMONTH(a,i-1),--IFERROR(OR(((a>=x)*(a<=y))+((b>=x)*(b<=y))),0))))),
MAP(IF({1},periodStart,interval),IF({1},periodEnd,interval),IF({1},seqλ,periodEnd),LAMBDA(ps,pe,λ,λ(ps,pe)))
)
Check them out and see if either one does what you want...
Amazing. But I see this one was definitely more of a challenge than my last few queries.
Overall, Version 1 seemed to work. Again, there is a lot to test but I think it does.
V2 is definitely faster. I have rolled it out over 100 cols x around 10 rows. But will be replicating this around 8 times in the model.
It seems to work bar a small glitch. It drops off the last entry when the Applied Period is > 1. If the end date falls on the month of the final period then this final period flag does not show. Ie if End Date is Oct 22 then it doesn’t show flag in Oct 22 when that is the final due flag, but if end date is Nov 22 then it will show the Oct 22 flag. IT only seems to affect the last flag in the series whether it ends before the model end or not.
V3 has the same last flag issue but also creates and extra flag in the month before the actual flag when the applied period >1
Sample files attached
- djclementsOct 13, 2024Silver Contributor
James_Buist It looks like v2 is what you needed after all. I just botched the duration calculation. Remove the +(interval>1) from the endDate and wrap the duration formula inside ROUNDUP(...,0), and it should work as expected:
=LET( periodStart, G2:R2, periodEnd, G3:R3, interval, B5:B12, startDate, C5:C12, endDate, D5:D12, duration, ROUNDUP(((YEAR(endDate)-YEAR(startDate))*12+MONTH(endDate)-MONTH(startDate)+1)/interval,0), seqλ, MAP(startDate,duration,interval,LAMBDA(sd,d,i,LAMBDA(x,y, LET(a,EDATE(sd,SEQUENCE(d,,0,i)),--IFERROR(OR((a>=x)*(a<=y)),0))))), MAP(IF({1},periodStart,interval),IF({1},periodEnd,interval),IF({1},seqλ,periodEnd),LAMBDA(ps,pe,λ,λ(ps,pe))) )In the revised file (attached), I've also added an interactive drop-down where you can select the desired row_num and col_num, and it will show you what sequence of dates is returned for the intersecting cell in the output grid. If something doesn't look right, just let me know....
- PeterBartholomew1Oct 13, 2024Silver Contributor
I could do with a commentary to accompany the main formula! I assume the {1} in
MAP( IF({1}, periodStart, interval), IF({1}, periodEnd, interval), IF({1}, seqλ, periodEnd), LAMBDA(ps, pe, λ, λ(ps, pe)) )is controlling a recursive sequence either passing or evaluating a Lambda function. That formula could provide the basis for a masterclass worth attending!
The variation on the idea of a focus cell is cute, but maybe the idea could be pushed a bit further. The row_num is simply the expense line number, but the col_num could be changed to highlight the payment number for the particular line. A value of 1 would return the first payment and -1 the last. Zero could be all payments for the given line item, or would that be pushing things too far?
- djclementsOct 13, 2024Silver Contributor
Hi Peter, I'll try my best to explain, with a few visuals...
When the logical_test argument of the IF function receives an array object (type 64), all 3 arguments are automatically resized via broadcasting. I know you're already aware of what that means, but I'll elaborate for anyone else who may be reading this. Arguments containing a single value or vector (single row or column) will be broadcast either across or down (or both in the case of a single value) to fill the same number of rows and columns as the argument with the greatest number of rows and the argument with the greatest number of columns.
For example, with the first array, IF({1},periodStart,interval), periodStart is a 1R x 12C vector and interval is an 8R x 1C vector; hence, periodStart is broadcast 8 rows down and interval is broadcast 12 columns across, so both end up filling an 8R x 12C array each. Likewise, the logical_test {1} (single element array) is broadcast both down and across to fill an 8R x 12C array with 1's. Since the IF function treats any number other than 0 as TRUE, only the [value_if_true] argument is returned, which is the periodStart array in this example. This is how it would look:
Broadcasting Example 1: [value_if_true]
Conversely, if you were to use {0} as the logical_test, it would output the [value_if_false] argument, which is the interval array:
Broadcasting Example 2: [value_if_false]
Regarding the final MAP formula, all three array arguments use the same method to broadcast the periodStart, periodEnd and seqλ vectors down and across, thus simulating a 3D array. Note: seqλ is the same size as interval, so they are interchangeable as the [value_if_false] argument for the first two arrays, since the only purpose of the [value_if_false] argument is to resize the [value_if_true] argument. MAP then loops through all three arrays together in row-major order.
Broadcasting Example 3: MAP with 3D array
Even though seqλ is an array of thunks, it's still just a vector, so it can be broadcast in the same manner. Personally, I prefer this method over MAKEARRAY with INDEX, as it is noticeably faster over larger datasets.
Thanks for the feedback on the "focus" cell idea. It seemed like a handy way to pick out an individual cell within the output grid and see what's actually being returned. It can be difficult to visualize what's happening with an array of thunks when you don't get to see anything until the final output.
I don't quite follow what you were saying about taking it a step further, lol (sorry). However, I'm sure a person could write it in whatever manner they see fit, to extract the desired data. Cheers!