User Profile
joelb95
Brass Contributor
Joined 2 years ago
Recent Discussions
Re: Conditional Formatting for Spilled Arrays
I tried re-downloading it and opening it, but it literally opens as a blank workbook with just the name of the file. Not sure why. Any idea on what I might do differently? I am even changing the properties on the file to allow it to open with macros before opening the file for the first time, so I'm not sure why it would be stripping any content from the file after downloading but before opening.

Re: Conditional Formatting for Spilled Arrays
Hi Harun, It is a little difficult to make an example file without adding a bunch of fake data/reports. I'll try a bit of a narrative and see if that helps.

Imagine you had a customer list and you are trying to show how much in sales you had. Say you had 5 customers and 5 products, and you wanted a presentation with a header row like "Name", "Product 1 Sales", "Product 2 Sales", etc. (a total of six columns), a body where each customer gets their own row ("Customer A", "Customer B", etc.), and a totals row for each product, like "Totals", "=SUM(product 1 sales)", "=SUM(product 2 sales)", etc. Now imagine you want that report formatted in a particular way, but your data is still changing. Maybe you end up with 20 customers or 12 products; given the dynamic nature of the data, that could change between when you create the report and when you use it.

Traditionally, what you would do is hand format each of the rows/columns/cells in the report range - say A1 to F1 for the headers, A2 to A6 for the customer names, B2 to F6 for the sales amounts by customer, and B7 to F7 for the totals by product. A fairly early alternative was to use pivot tables, so the customers would generate automatically and you could add a column for each new product - the main problem being that you are stuck with whatever pivot table formatting is made available to you within the general constraints of a pivot table. More recently you can use something like GROUPBY (a dynamic array formula that spills into a range of indefinite size). So as you add products, the number of columns grows, and as you add customers, the number of rows grows.

The big problem is that Excel does not natively have a way for you to format the header row, the totals row, or the data rows of such a spilled array except by the manual traditional method. An alternative might be to use conditional formatting where you keep going back into the "applies to" range and updating the ranges to match the current size of your report's spilled array.

What my file does is sort of combine the dynamic formatting of a pivot table with the dynamism of the GROUPBY spilled array. It can't perfectly replicate the pivot table's built-in formatting, but it can approximate it without a huge amount of overhead. In effect, by creating a conditional formatting "applies to" range that you know is always going to be bigger than your dynamic array's spilled range, you ensure that your spilled range is always formatted according to your rules. For example, if you say a conditional formatting rule applies to cells A1:Z1000, you've created the possibility for your dynamic array to have about 1000 customers and 25 products and still get covered by your rule. Anything less than that and you are fine - anything more than that and your conditional formatting stops working.

The problem is, if you make your "applies to" range the entire sheet, you may very well have a million cells checking to see what conditional formatting should be applied to them - if you had a dozen rules that might apply, that means you might have twelve million conditional checks. The only way to both have your conditional formatting apply to your entire dynamic range and avoid having every cell on the sheet checked is to find a goldilocks sort of zone - one where just the right number of cells are checked - not too many, not too few.
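For concreteness, here is a minimal sketch of the kind of spilled report described above. The cross-tab layout (customers down, products across, with header and totals rows) is what PIVOTBY, GROUPBY's sibling function, produces; the table and column names (tbl_sales[Customer], tbl_sales[Product], tbl_sales[Amount]) are placeholders, and I'm omitting the optional arguments:

=PIVOTBY(tbl_sales[Customer], tbl_sales[Product], tbl_sales[Amount], SUM)

This spills a header row, one row per customer, and a totals row, and the spill resizes as customers or products are added - which is exactly why a static "applies to" range can't keep up with it.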
My system is designed to automate the calculation of the goldilocks zone and make it simple for you to set the "applies to" range for each conditional formatting rule. The system also adds the ability for you to use "declarative programming" style statements to select what portion of a dynamic array you want the rule applied to. Rather than having to figure out how to make a condition apply just to the last row of your dynamic array (within the goldilocks "applies to" range), you just write a formula like cf.in_last_row(named_dynamic_array). In this way, it is dead simple to understand what the conditional formatting rule is meant to achieve.

It is not the sort of solution the typical Excel user would employ - it is very "abstract" in the sense that it is meant to apply to any problem within the general category of conditionally formatting dynamic arrays. You need to understand that problem in the first place to have any appreciation for why you might use the tool offered. So the likely user base is someone designing automated reporting for people who are otherwise unable to use macro-enabled workbooks (VBA), Excel Scripts (effectively JavaScript), or custom add-ins. In other words, it is an entirely formulaic solution that will work on any computer with Excel 365 installed, absent the system's admin disabling Excel's Name Manager or modern Excel 365 functionality.

Conditional Formatting for Spilled Arrays
I have recently composed a bunch of functions to make using conditional formatting with spilled arrays more manageable. You can check out those formulas at the end of my post. I am curious if anyone has a different way to accomplish this goal. I will briefly describe the overall method and give a use case.

In essence, to use conditional formatting you need to 1) specify a range of cells where the rule will be conditionally applied (the "Conditional Formatting Range") and 2) specify the condition for application. In many cases, the formatting is conditioned upon the placement of the cell/column/row/etc. relative to the spilled array which generated it (the "Spilled Array Range"). The problem (so far as I know) is that Excel's conditional formatting manager does not allow dynamic range references to specify the Conditional Formatting Range, so there is a fundamental disconnect between the Conditional Formatting Range and the possible extent of the Spilled Array Range you want to format.

It occurred to me that one possible solution to this problem is to A) set the Conditional Formatting Range to a defined range that is certain to be larger than your Spilled Array Range and B) create conditions that inherently self-limit to the Spilled Array Range regardless of the size of your Conditional Formatting Range.

Mindful of the perils of using volatile functions, I wanted to create a solution that avoids turning values (string references primarily) into range references via functions like INDIRECT and OFFSET. That meant I was limited to selecting a subrange from a larger range and constructing a dynamic range by way of a pattern like index(array1, starting_row, starting_col):index(array2, ending_row, ending_col), where the first INDEX returns a single-cell range reference to the first cell of the target range and the second INDEX returns a single-cell range reference to the last cell of the target range. This idea distills down to the last few lines of the function I created:

result, IF( NOT(base_in_container), NA(), LET( start_ref, INDEX(container_array, idx_base_in_container_first_row, idx_base_in_container_first_col), end_ref, INDEX(container_array, idx_base_in_container_last_row, idx_base_in_container_last_col), start_ref:end_ref ) ),

If you name the range created in this way, you can simply enter "=named_range" into the "applies to" field in the conditional formatting interface and Excel will, at the time you apply the rule, resolve that name into whatever the then-current range reference of named_range is, e.g. "$A$7:$G$52". Assuming your spilled array is contained in that range, your conditional formatting rule will apply to the spilled array. I call this larger containing range something like "CF_OVERSIZED_ARRAY." Once CF_OVERSIZED_ARRAY is set on a rule, you never have to change the Conditional Formatting Range again unless your spilled array might exceed whatever range you initially selected. (For instance, if your oversized array is 50 rows and 12 columns, you need not change its size unless your spilled array suddenly grows from, say, 7 columns to 14.)
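As a standalone illustration of that non-volatile pattern (the range and sizes here are placeholders, not part of the module):

=LET(
    start_ref, INDEX($A:$Z, 1, 1),
    end_ref, INDEX($A:$Z, 50, 12),
    start_ref:end_ref
)

Both INDEX calls return range references rather than values, so the joined reference resolves to $A$1:$L$50 without the volatility of OFFSET or INDIRECT.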
The elegance of this method over directly hardcoding the value is that if you have many conditional formatting rules, by entering "=named_range" for the "applies to" range in each rule, you both eliminate the possibility of inconsistent application ranges and have a visual confirmation when entering each rule that the value you are setting is exactly what you intend (rather than something like "=worksheet1!$A$23:$H$79"). Furthermore, by programmatically defining the oversized array, you can make it as "small" as reasonable, thereby avoiding having conditional formatting apply to many unused cells (such as the whole worksheet).

At this point, the next computation minimization occurs - a guard clause for each condition is specified such that any cell in CF_OVERSIZED_ARRAY outside of the Spilled Array Range immediately returns FALSE and no further condition checking is performed. The general formula for checking whether a cell is within the Spilled Array Range is as follows, along with the start of a guard clause (truncated here; the full definition appears at the end of the post):

is_within_array = LAMBDA( range, LET( start_row, ROW(TAKE(range, 1, 1)), start_col, COLUMN(TAKE(range, 1, 1)), AND( ROW() >= start_row, ROW() < start_row + ROWS(range), COLUMN() >= start_col, COLUMN() < start_col + COLUMNS(range) ) ) );

is_in_row_of_array = LAMBDA( range, start_index, [end_index], IF( NOT(is_within_array(range)), FALSE, LET( ...

Now that the basic structure has been established, a number of workhorse functions are defined:

is_in_row_of_array - cell is within a specified row or range of rows in the spilled array, such as "is in row 1" or "is in rows 4 through 6"
is_in_col_of_array - cell is within a specified column or range of columns in the spilled array, such as "is in column 1" or "is in columns 4 through 6"
is_in_slice_of_array - cell is within a specified contiguous portion of the spilled array, such as "is between rows 5 through 7 and columns 2 through 12"
is_in_interval_of_array - cell is in a set of every N rows or N columns, such as "is one of every other row" or "is one of every third column"
is_in_recurring_band_of_rows - cell is in a recurring grouping of rows, such as "is the 2nd and 3rd row of every group of 4 rows"
is_in_recurring_band_of_cols - cell is in a recurring grouping of columns, such as "is the last column of every group of 7 columns"

Here is an example function:

is_in_col_of_array = LAMBDA( range, start_index, [end_index], IF( NOT(is_within_array(range)), FALSE, LET( num_cols, COLUMNS(range), current_col, COLUMN() - COLUMN(TAKE(range, 1, 1)) + 1, start_resolved, IF(start_index > 0, start_index, num_cols + start_index + 1), end_resolved, IF( ISOMITTED(end_index), start_resolved, IF( end_index > 0, end_index, num_cols + end_index + 1 ) ), final_start, MIN(start_resolved, end_resolved), final_end, MAX(start_resolved, end_resolved), AND(current_col >= final_start, current_col <= final_end) ) ) );

On top of the basic structure, convenience functions are created - the names of which should indicate how they solve common formatting needs:

alternate_cols
alternate_rows
is_in_first_row_of_recurring_band_of_rows
is_in_last_row_of_recurring_band_of_rows
is_in_first_col_of_recurring_band_of_cols
is_in_last_col_of_recurring_band_of_cols
is_in_header_of_col
is_in_last_row_of_col
is_in_first_data_row_of_col
is_between_first_and_last_row_of_col

One major benefit flowing from this design is that these conditions are composable, e.g.
alternate_cols * is_in_header_of_col would give you alternating formatting on the headers of a spilled array, such as an array with empty columns between each of the substantive columns.

While I do not promise that the following formulas are perfect, what I can say is that they presently permit me to write rules like this:

=cf.is_in_row_of_array(ins_rep.dynamic_array, 1)*cf.alternate_cols(ins_rep.dynamic_array,FALSE)
=cf.is_in_first_data_row_of_col(ins_rep.dynamic_array,9)
=cf.is_between_first_and_last_row_of_col(ins_rep.dynamic_array, 9,TRUE)
=cf.is_in_last_row_of_col(ins_rep.dynamic_array,9)

Which effectively gives me a rule for all headers (shading with underlining and bold), the ability to set the first data cell in column 9 to a dollar format, the last cell in column 9 to a dollar format with a top border, and all of the other cells in column 9 to simple integers. So something like this (imagine the first cell is shaded as described and the last cell has a top border) is what I get for column 9 of a dynamically generated and spilled array:

Fees
$175
175
175
175
175
175
175
175
175
$1,575

Please let me know if you have found any other ways to address the problem of the Conditional Formatting Range being disconnected from the Spilled Array Range. I'm happy to answer any questions about my method or formulas, so feel free to ask. I'd also appreciate any feedback/suggestions/improvements on my idea/formulas.

Here are the complete formulas (I have them saved within the Excel Labs Advanced Formula Environment in separate modules):

// _range module
create_expanded_from_subset_of_containing_range = LAMBDA( base_array, desired_height, container_array, [desired_width], LET( req_width, IF(ISOMITTED(desired_width), COLUMNS(base_array), desired_width), /* --- Resolve anchors (references, not values) --- */ base_array_first_cell, INDEX(base_array, 1, 1), base_array_first_row, ROW(base_array_first_cell), base_array_first_col, COLUMN(base_array_first_cell), container_array_first_cell, INDEX(container_array, 1, 1), container_array_first_row, ROW(container_array_first_cell), container_array_first_col, COLUMN(container_array_first_cell), container_array_rows, rows(container_array), container_array_cols, columns(container_array), idx_base_in_container_first_row, base_array_first_row - container_array_first_row + 1, idx_base_in_container_first_col, base_array_first_col - container_array_first_col + 1, idx_base_in_container_last_row, idx_base_in_container_first_row + desired_height - 1, idx_base_in_container_last_col, idx_base_in_container_first_col + req_width - 1, base_in_container, and( idx_base_in_container_first_row > 0, idx_base_in_container_first_row <= idx_base_in_container_last_row, idx_base_in_container_last_row <= container_array_rows, idx_base_in_container_first_col > 0, idx_base_in_container_first_col <= idx_base_in_container_last_col, idx_base_in_container_last_col <= container_array_cols ), result, IF( NOT(base_in_container), NA(), LET( start_ref, INDEX(container_array, idx_base_in_container_first_row, idx_base_in_container_first_col), end_ref, INDEX(container_array, idx_base_in_container_last_row, idx_base_in_container_last_col), start_ref:end_ref ) ), result ) );

// cf module
is_within_array = LAMBDA( range, LET( start_row, ROW(TAKE(range, 1, 1)), start_col, COLUMN(TAKE(range, 1, 1)), AND( ROW() >= start_row, ROW() < start_row + ROWS(range), COLUMN() >= start_col, COLUMN() < start_col + COLUMNS(range) ) ) );

is_in_row_of_array = LAMBDA( range, start_index, [end_index], IF(
NOT(is_within_array(range)), FALSE, LET( num_rows, ROWS(range), current_row, ROW() - ROW(TAKE(range, 1, 1)) + 1, start_resolved, IF(start_index > 0, start_index, num_rows + start_index + 1), end_resolved, IF(ISOMITTED(end_index), start_resolved, IF(end_index > 0, end_index, num_rows + end_index + 1)), final_start, MIN(start_resolved, end_resolved), final_end, MAX(start_resolved, end_resolved), AND(current_row >= final_start, current_row <= final_end) ) ) ); is_in_col_of_array = LAMBDA( range, start_index, [end_index], IF( NOT(is_within_array(range)), FALSE, LET( num_cols, COLUMNS(range), current_col, COLUMN() - COLUMN(TAKE(range, 1, 1)) + 1, start_resolved, IF(start_index > 0, start_index, num_cols + start_index + 1), end_resolved, IF(ISOMITTED(end_index), start_resolved, IF(end_index > 0, end_index, num_cols + end_index + 1)), final_start, MIN(start_resolved, end_resolved), final_end, MAX(start_resolved, end_resolved), AND(current_col >= final_start, current_col <= final_end) ) ) ); is_in_slice_of_array = LAMBDA( range, start_row, start_col, [end_row], [end_col], [include_slice_neg1_to_exclude], if( not(is_within_array(range)), FALSE, LET( final_end_row, IF(ISOMITTED(end_row), start_row, end_row), final_end_col, IF(ISOMITTED(end_col), start_col, end_col), row_match, is_in_row_of_array(range, start_row, final_end_row), col_match, is_in_col_of_array(range, start_col, final_end_col), selection, AND(row_match, col_match), mode, IF(ISOMITTED(include_slice_neg1_to_exclude), 1, include_slice_neg1_to_exclude), IF(mode = -1, NOT(selection), selection) ) ) ); is_in_interval_of_array = LAMBDA( range, row_interval, col_interval, [start_at_row], [start_at_col], [include_interval_neg1_to_exclude], if( not(is_within_array(range)), FALSE, LET( row_idx, ROW() - ROW(TAKE(range, 1, 1)) + 1, col_idx, COLUMN() - COLUMN(TAKE(range, 1, 1)) + 1, start_row, IF(ISOMITTED(start_at_row), 1, start_at_row), start_col, IF(ISOMITTED(start_at_col), 1, start_at_col), row_match, IF(row_interval <= 1, TRUE, MOD(row_idx - start_row, row_interval) = 0), col_match, IF(col_interval <= 1, TRUE, MOD(col_idx - start_col, col_interval) = 0), selection, AND(row_match, col_match), mode, IF(ISOMITTED(include_interval_neg1_to_exclude), 1, include_interval_neg1_to_exclude), IF(mode = -1, NOT(selection), selection) ) ) ); alternate_cols = lambda( array, [start_with_even_df_TRUE], is_in_interval_of_array(array,1,2,,1+if(isomitted(start_with_even_df_TRUE),1,start_with_even_df_TRUE)) ); alternate_rows = lambda( array, [start_with_even_df_TRUE], is_in_interval_of_array(array,2,1,1+if(isomitted(start_with_even_df_TRUE),1,start_with_even_df_TRUE)) ); is_in_recurring_band_of_rows = LAMBDA( range, rows_in_pattern, first_row_in_band, [band_thickness], [include_selected_df_TRUE], IF( NOT(is_within_array(range)), FALSE, LET( relative_row, ROW() - ROW(TAKE(range, 1, 1)), row_in_pattern, MOD(relative_row, rows_in_pattern) + 1, actual_thickness, IF(ISOMITTED(band_thickness), 1, band_thickness), is_in_band, AND( row_in_pattern >= first_row_in_band, row_in_pattern <= (first_row_in_band + actual_thickness - 1) ), include_mode, IF(ISOMITTED(include_selected_df_TRUE), TRUE, include_selected_df_TRUE), IF(include_mode, is_in_band, NOT(is_in_band)) ) ) ); is_in_first_row_of_recurring_band_of_rows = lambda( range, rows_in_pattern, [include_selected_df_TRUE], is_in_recurring_band_of_rows(range, rows_in_pattern, 1, 1, include_selected_df_TRUE) ); is_in_last_row_of_recurring_band_of_rows = lambda( range, rows_in_pattern, [include_selected_df_TRUE], 
is_in_recurring_band_of_rows(range, rows_in_pattern, rows_in_pattern, 1, include_selected_df_TRUE) );

is_in_recurring_band_of_cols = LAMBDA( range, cols_in_pattern, first_col_in_band, [band_thickness], [include_selected_df_TRUE], IF( NOT(is_within_array(range)), FALSE, LET( relative_col, COLUMN() - COLUMN(TAKE(range, 1, 1)), col_in_pattern, MOD(relative_col, cols_in_pattern) + 1, actual_thickness, IF(ISOMITTED(band_thickness), 1, band_thickness), is_in_band, AND( col_in_pattern >= first_col_in_band, col_in_pattern <= (first_col_in_band + actual_thickness - 1) ), include_mode, IF(ISOMITTED(include_selected_df_TRUE), TRUE, include_selected_df_TRUE), IF(include_mode, is_in_band, NOT(is_in_band)) ) ) );

is_in_first_col_of_recurring_band_of_cols = LAMBDA( range, cols_in_pattern, [include_selected_df_TRUE], is_in_recurring_band_of_cols(range, cols_in_pattern, 1, 1, include_selected_df_TRUE) );

is_in_last_col_of_recurring_band_of_cols = LAMBDA( range, cols_in_pattern, [include_selected_df_TRUE], is_in_recurring_band_of_cols(range, cols_in_pattern, cols_in_pattern, 1, include_selected_df_TRUE) );

is_in_header_of_col = LAMBDA( array, [column_no], IF(ISOMITTED(column_no), is_in_row_of_array(array, 1), is_in_slice_of_array(array, 1, column_no) ) );

is_in_last_row_of_col = LAMBDA( array, [column_no], IF(ISOMITTED(column_no), is_in_row_of_array(array, -1), is_in_slice_of_array(array, -1, column_no) ) );

is_in_first_data_row_of_col = LAMBDA( array, [column_no], IF(ISOMITTED(column_no), is_in_row_of_array(array, 2), is_in_slice_of_array(array, 2, column_no) ) );

is_between_first_and_last_row_of_col = lambda( array, [column_no], [exclude_first_data_row_df_FALSE], is_in_slice_of_array( array, if(isomitted(exclude_first_data_row_df_FALSE),FALSE,exclude_first_data_row_df_FALSE)+2, if(isomitted(column_no),1,column_no), -2, if(isomitted(column_no),-1,column_no) ) );
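As a usage note on composability: because every predicate starts with the is_within_array guard, a composed rule such as the following (rpt is a hypothetical named spilled array) only does real work inside the spilled range and returns FALSE everywhere else in CF_OVERSIZED_ARRAY:

=cf.alternate_rows(rpt,TRUE)*cf.is_between_first_and_last_row_of_col(rpt)

This would band alternate data rows of the array's body while leaving the header and totals rows to their own rules.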
Re: CountA, Unique & Filter Combined Formula Query

This might be a little wordy for you, but try this formula instead:

=LET(
    count_range, A1:A5,
    criteria1, (A1:A5)="Check",
    criteria2, (A1:A5)="Check2",
    filtered_results, FILTER(A1:A5, criteria1*criteria2),
    result, IF(ISERROR(filtered_results), 0, COUNTA(UNIQUE(filtered_results))),
    result
)

The basic idea is that COUNTA effectively acts as a cell counter regardless of the contents of that cell. Given that FILTER returns a #CALC! error when there are no matching results (you did not choose a default value in your FILTER function), this means that when you do not match both of your criteria, the FILTER will return the calc error, then UNIQUE will return the calc error, and then COUNTA will count the cell containing the error value as 1 - which is why the ISERROR check above returns 0 instead. If you want COUNTA itself to skip error values, I think you are out of luck; something like =SUM(IF(NOT(ISERROR(A1:A5)),1)) would work instead.
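A slightly more compact variant of the same guard - hedged, using the same placeholder range and criteria - catches the empty-FILTER error in one step:

=LET(
    rng, A1:A5,
    filtered_results, FILTER(rng, (rng="Check")*(rng="Check2")),
    IFERROR(ROWS(UNIQUE(filtered_results)), 0)
)

ROWS propagates the #CALC! error from an empty FILTER, and IFERROR converts it to 0, so the error cell is never counted as 1.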
Re: Advice Needed: Performing complicated nested if calculation

Is it possible to give a small sample of your data and expected outcomes? Rather than thinking in terms of nested "if" type statements, I strongly recommend you think in terms of "flow", i.e. "given data that looks like X, I should treat it in Y way." Two major ways in Excel to express a thought like that are the IFS formula and the SWITCH formula, depending upon your particular circumstance. Think of IFS as what you use when the data you are considering is not simply different values of the same variable name. Because you mention multiple sources/types of data that would be relevant to how you perform your calculations, it is unlikely that a SWITCH would provide any conceptual benefits or readability enhancements.

For instance, a general description of your various scenarios using a flow-type approach might be: "When these conditions are met, apply this formula. When those conditions are met, apply that formula. When a third set of conditions is met, apply a third formula, and when none of these specified scenarios are present, carry on using a fourth formula." So now you just need to figure out what your conditions are and what you want to do (which is where you started).

Question 1: Where is the data related to your conditions? If, for instance, you have a data table which includes independent variables, e.g. the date rate A changed, the date rate B changed, and no relation between A and B or the dates of change, things are different than if you have a complicated relationship between A and B, the dates of change, and the information is all over the place. In the simplest case, you might have a data table with headers like "rate_change_date", "rate_type", "new_rate" which includes all rates from the inception of your data through the end of your data. In such a case, you might answer the question "Did a rate change during the period?" with an uncomplicated date filter. You should try to focus on the question of your conditions and data structure independently of your questions about what/how to calculate, so that you have a clear logical structure in place that makes sense and is usable regardless of the calculation you end up needing to do.
For instance:

get_rate_changes_during_period = LAMBDA(period_start, period_end, FILTER( rate_change_table, (rate_change_table[date] <= period_end) * (rate_change_table[date] >= period_start), "" ) )

you might also have helper formulas like:

changed_rate_data_row = LAMBDA(rate_type, period_start, period_end, FILTER( rate_change_table, (rate_change_table[rate_type] = rate_type) * (rate_change_table[date] <= period_end) * (rate_change_table[date] >= period_start), FALSE ) )

prior_rate_data = LAMBDA(change_date, rate_type, TAKE( SORT( FILTER( rate_change_table, (rate_change_table[date] < change_date) * (rate_change_table[type] = rate_type), FALSE ) ), -1 ) )

days_in_period_pre_and_post = LAMBDA(change_date, HSTACK( DAY(change_date), EOMONTH(change_date, 0) - change_date + 1 ) )

And then you start your formula with something like:

=LET(
    rate_change_table, YOUR_TABLE_REFERENCE_HERE,
    target_rate_type, EITHER_TYPE_A_OR_TYPE_B,
    period_start_date, DATE(2021, 1, 1),
    period_end_date, EOMONTH(period_start_date, 0),
    // Define the rate change row for the target type within the period
    period_change_row, changed_rate_data_row(target_rate_type, period_start_date, period_end_date),
    target_change_date, CHOOSECOLS(period_change_row, 1),
    // Retrieve the prior rate row for the target type before the change date
    prior_rate_row, prior_rate_data(target_change_date, target_rate_type),
    // Calculate the days of prior and post-rate within the period
    days_of_prior_rate_and_post_rate, days_in_period_pre_and_post(target_change_date),
    // Select prior and changed rates
    prior_rate, CHOOSECOLS(prior_rate_row, 3),
    changed_rate, CHOOSECOLS(period_change_row, 3),
    // Continue with your subsequent logic here...
)

At this point, you have your relevant rate data:

The date of change
Number of days in the month at that rate (assuming only one change per month)
The rate at the beginning of the month and after it changed

Next, follow a similar approach to retrieve any additional data conditions you might need, such as whether sales reached a certain percentage of total revenues in the period. Once you have all your conditions, you can combine them in a structured IFS formula as follows:

IFS(
    AND(period_change_row, sales_to_revenues_percentage > 0.4), calculation_for_tier_1(period_change_row, prior_rate_row, days_of_prior_rate_and_post_rate),
    AND(period_change_row, sales_to_revenues_percentage > 0.3), calculation_for_tier_2(period_change_row, prior_rate_row, days_of_prior_rate_and_post_rate),
    NOT(period_change_row), calculation_for_no_rate_change_in_period,
    TRUE, "Invalid data"
)

Explanation of logic:

First condition: if period_change_row exists (indicating a rate change) and sales_to_revenues_percentage exceeds 40%, apply calculation_for_tier_1 with the relevant data.
Second condition: if period_change_row exists and sales_to_revenues_percentage exceeds 30% (but not 40%), apply calculation_for_tier_2.
Third condition: if there is no rate change (using NOT(period_change_row)), apply the calculation for periods with no rate change.
Fallback condition: if none of the conditions are met, return "Invalid data".

This formatting clarifies each logical tier and maintains the readability of your IFS formula, making it easier to troubleshoot and update as needed.
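For contrast, here is a hedged sketch of the narrower case where SWITCH reads better than IFS - a single variable taking known values (all names here are placeholders):

=SWITCH(target_rate_type,
    "A", opening_balance * rate_a,
    "B", opening_balance * rate_b,
    "Invalid rate type"
)

Because the scenario above branches on several independent conditions rather than on one variable's value, IFS remains the better fit there.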
By cleanly separating the following elements:

Data retrieval - collecting all relevant data points needed for calculations
Flow paths - defining the possible calculations or actions to perform
Conditions - identifying the specific data conditions that trigger each calculation

you can create a robust structure that allows for easy modification and reuse and ensures future intelligibility for both yourself and others. This approach provides a clear, modular structure where each component - data retrieval, conditions, and calculations - remains separate and straightforward. This independence simplifies both maintenance and adaptability, making each part easy to adjust or expand without disrupting the others.

Re: Complex look up with output
Hi there, I can see you're looking to cross-reference a table of training courses with a report of students' completed courses, filtering dynamically to identify which courses each student is missing for their specific grade and level. Here's a structured way to think through this in Excel:

Step 1: Understand the Data Structure
To solve this problem, we'll use three tables:
tbl_courses: contains the course names and their required grade and level.
tbl_courses_completed: contains each student's name and the courses they've completed.
tbl_current_students: contains the current student list with their grade and level.
The goal here is to filter these tables to get a list of courses for each student that they still need to complete for their grade and level.

Step 2: Define the Lookup Process
For each student, you'll want to:
1. Find their grade and level: using the student's name in tbl_current_students, we can look up their specific grade and level.
2. List required courses for the grade and level: filter tbl_courses to retrieve the courses required for that grade and level.
3. Identify courses already completed: look up the student in tbl_courses_completed to find the courses they've finished.
4. Calculate missing courses: compare the required courses with the completed courses and list any that are missing.

Step 3: Build the Formula
Here's an outline of a potential formula approach:
1. Determine grade and level: use XLOOKUP to find the student's grade and level based on their name.
2. Get required courses: use FILTER to list courses from tbl_courses that match the student's grade and level.
3. Filter out completed courses: with FILTER or MATCH, identify courses in this list that aren't found in tbl_courses_completed for that student.
4. Display missing courses: if any courses remain, output them as a list for that student.

This approach could be set up in a third sheet with dropdown menus to select the student, grade, and level dynamically, displaying a list of any missing courses for easy review. You could elect to use this formula on a case by case basis (change the name in the target cell and have the results displayed) or you could stack it together for use later (example at end).

Step 1: Define the data structure
The first table with course names and course levels is "tbl_courses" and it has two columns, "course_name" and "course_level". The second table with student names and completed courses is "tbl_courses_completed" and it has three columns, "student_name", "course_name", and "course_grade_level". The third table with current student names and grade level is "tbl_current_students" and it has two columns, "name" and "grade_level".
Step 2.1 - Find their grade and level:

current_student_grade = xlookup(current_student_name, tbl_current_students[name], tbl_current_students[grade_level])

Step 2.2 - List required courses for the grade and level:

classes_for_current_student_grade = filter(tbl_courses[course_name], tbl_courses[grade_level]=current_student_grade)

Step 2.3 - Identify courses already completed:

current_student_grade_level_classes_taken = filter(tbl_courses_completed[course_name], (tbl_courses_completed[student_name]=current_student_name)*(tbl_courses_completed[course_grade_level]=current_student_grade))

Step 2.4 - Calculate missing courses:

has_taken_class_names_in_grade_boolean_mask = byrow(classes_for_current_student_grade, lambda(class_name, isnumber(match(class_name, current_student_grade_level_classes_taken, 0))))

current_student_grade_level_classes_not_taken = filter(classes_for_current_student_grade, not(has_taken_class_names_in_grade_boolean_mask), "all_classes_taken")

Lambda form for finding missing classes for a particular student:

get_missing_classes = lambda( student_name, let( current_student_grade, xlookup( student_name, tbl_current_students[name], tbl_current_students[grade_level] ), classes_for_current_student_grade, filter( tbl_courses[course_name], tbl_courses[grade_level]=current_student_grade ), current_student_grade_level_classes_taken, filter( tbl_courses_completed[course_name], (tbl_courses_completed[student_name]=student_name)* (tbl_courses_completed[course_grade_level]=current_student_grade) ), has_taken_class_names_in_grade_boolean_mask, byrow( classes_for_current_student_grade, lambda( class_name, isnumber( match( class_name, current_student_grade_level_classes_taken, 0 ) ) ) ), current_student_grade_level_classes_not_taken, filter( classes_for_current_student_grade, not(has_taken_class_names_in_grade_boolean_mask), "all_classes_taken" ), current_student_grade_level_classes_not_taken ) );

Now let's say you want to use this formula to populate an easily referenced data set. One way to do that is to put it into a table where the student's name and grade are appended to each missing class name, thereby letting you filter by class, grade, or name, and create useful sets of data like "grade 3 students who need algebra" by filtering the grade column for 3 and the class name column for algebra.
Here is one possible implementation:

=let( list_of_students, tbl_current_students[name], number_of_students, rows(list_of_students), list_of_student_indices, sequence(number_of_students), get_student_name, lambda(idx, index(list_of_students, idx, 1)), get_student_grade, lambda( student_name, xlookup( student_name, tbl_current_students[name], tbl_current_students[grade_level] ) ), get_missing_classes_for_student, lambda( current_student_name, current_student_grade, let( classes_for_current_student_grade, filter( tbl_courses[course_name], tbl_courses[grade_level] = current_student_grade ), current_student_grade_level_classes_taken, filter( tbl_courses_completed[course_name], ( (tbl_courses_completed[student_name] = current_student_name) * (tbl_courses_completed[course_grade_level] = current_student_grade) ) ), has_taken_class_names_in_grade_boolean_mask, byrow( classes_for_current_student_grade, lambda( class_name, isnumber( match( class_name, current_student_grade_level_classes_taken, 0 ) ) ) ), current_student_grade_level_classes_not_taken, filter( classes_for_current_student_grade, not(has_taken_class_names_in_grade_boolean_mask), "all_classes_taken" ), current_student_grade_level_classes_not_taken ) ), append_grade_and_name_to_missing_class_list, lambda( student_idx, let( student_name, get_student_name(student_idx), student_grade, get_student_grade(student_name), missing_classes, get_missing_classes_for_student(student_name, student_grade), hstack( expand(student_name, rows(missing_classes), , student_name), expand(student_grade, rows(missing_classes), , student_grade), missing_classes ) ) ), get_current_student_missing_class_array, reduce( append_grade_and_name_to_missing_class_list(1), drop(list_of_student_indices, 1), lambda( acc, student_idx, vstack( acc, append_grade_and_name_to_missing_class_list(student_idx) ) ) ), headers, hstack("Student Name", "Grade Level", "Missing Class"), result, vstack( headers, get_current_student_missing_class_array ), result )

Let me know if any of this doesn't make sense or if there is an error somewhere. I didn't test the formulas, so some minor tweaks may be required.

[Edit: When I initially posted this reply, I indicated that ChatGPT wrote the intro after a bit of prompting, but it seems that people might have understood that GPT wrote the whole post or designed the solution. All ChatGPT did was make my initial narrative a little more friendly/accessible. At this stage, it is utterly incapable of designing/crafting/effectively implementing the solution I provided. I strongly caution you against taking ChatGPT's Excel solutions at face value or assuming that they are optimized/ideal by any measure.]
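And if you only need one student at a time, the get_missing_classes lambda defined earlier can be called directly from a cell (assuming it has been saved as a name; the student name is a placeholder):

=get_missing_classes("Student A")

which spills that student's missing classes, or "all_classes_taken" if there are none.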
Re: Complex look up with output

ChatGPT had no idea how to do it. It just cleaned up the introduction a bit. ChatGPT is actually pretty bad at Excel (at least it is bad at functional programming in Excel). Tell me exactly what it is you need it to do and I can probably expand the formula as necessary.

Re: Excel - sorting rows by balance due
Generally speaking, the answer to such questions depends on what your use case is - do you just need to see the open balances, or do you need to be able to interact with the rows that have the open balances? If you just need to see them, you can write a dynamic array formula that uses SORTBY, FILTER, etc. and set up your data view however makes sense. If you need to edit the data, that is more complicated, since there is not, so far as I know, a way to have data in a table automatically sort as you add/change data.

The solution for viewing/editing the data in place (outside of a VBA type solution or other more "outside Excel" tools as NikolinoDE suggested) is to manually update your sort. This can be accomplished in a couple of ways, but a relatively simple one (I believe PeterBartholomew1 uses this) is to concatenate the columns you wish to sort by into a single helper column and then sort by that column. I believe he uses a character like "-" or some such to split the text strings and takes advantage of TEXTJOIN, but I could be wrong. Despite the helper column, I tend to think that this method is probably the most conceptually efficient, as it involves a single, straightforward join formula combined with sorting as needed on the helper column. Keep in mind that Peter offered this solution before SORTBY was generally available.

What I would likely do, depending on the particular need, is write a formula to assign an integer to a helper column representing the sort order of that row and then sort by the helper column. So, for instance, I might take the table range which includes the columns I wish to sort by plus an additional "primary id" or "control number" type column, i.e. a column with relatively simple data to search and find unique values, preferably sortable. If you don't have a primary column already, just add one, number it from 1 to whatever, but make sure the values are fixed (meaning you copy and paste the integer values if you number the column using a formula). Then do something like use SORTBY for the columns of interest, and then add a similar control number to the sorted array. You then XLOOKUP the control number for the original array in the sorted array and return the sorted array's control number. After that setup, from then on you can simply sort by the column with the sorted array's control number and get your current sorted state. The only thing you have to be sure to do is to continue adding control numbers to each new row of the array.

Note that the general difference between my approach and Peter's (besides my taking advantage of the newer SORTBY formula) is that he sorts by the data directly, whereas I create an "id" that sort of hides the data behind something that looks prettier. In both cases, the impact is the same - you are adding a new column to your data array that you have to manually sort by every time you want the most up-to-date ordering of your data.
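As a sketch of that concatenated-key idea (column names are placeholders; the zero-padding is my addition to keep a text sort in numeric order):

=TEXT([@balance], "0000000000") & "-" & [@customer]

Sorting the helper column as text then orders rows by balance, with the customer name as a tiebreaker - the padding matters because unpadded numbers sort lexicographically ("9" lands after "25").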
So say you have an array named "sample_array" with these headers and values (color and amount left blank here):

orig_unsorted_id  color  amount  balance
1                                0
2                                5
3                                0
4                                25
5                                0

You add a column called "sort_order" and insert the following formula (I used a MATCH to get the column index for a header in case you aren't using Excel's table feature - if you are, you could just use, e.g., sample_array[orig_unsorted_id]):

=LET(
    target_orig_id, [@[orig_unsorted_id]],
    source_array, sample_array[#All],
    primary_id_col_name, "orig_unsorted_id",
    sort_by_col_name, "balance",
    source_array_data, DROP(source_array, 1),
    source_array_headers, TAKE(source_array, 1),
    balance_col, CHOOSECOLS(source_array_data, MATCH(sort_by_col_name, source_array_headers, 0)),
    orig_id_col, CHOOSECOLS(source_array_data, MATCH(primary_id_col_name, source_array_headers, 0)),
    sorted_orig_id_col, SORTBY(orig_id_col, balance_col, -1),
    return_array, SEQUENCE(ROWS(source_array_data)),
    XLOOKUP(target_orig_id, sorted_orig_id_col, return_array, "check_values")
)

You should end up with:

orig_unsorted_id  color  amount  balance  sort_order
1                                0        3
2                                5        2
3                                0        4
4                                25       1
5                                0        5

You then sort by the "sort_order" column. Every time you update a value or add a new row, you just have to re-sort by the sort_order column. You should get the following after the first sort:

orig_unsorted_id  color  amount  balance  sort_order
4                                25       1
2                                5        2
1                                0        3
3                                0        4
5                                0        5

You can enhance this sort method as necessary, including filtering the original array before sorting, filtering it after and resorting, etc.

Re: How to Append Tables with Different Column Order with Automatic Refresh
You can either reorder them on import, reorder them once imported, or reorder them when combining. No matter when it happens, it will involve assigning the source columns to the correct final location. Theoretically you could just sort the columns alphabetically if they had the same names/column count, but you said that they come in different column widths. So your best bet is really to find the most efficient way of doing your mapping and adapting that mapping to the method you pick for reordering. I'm sure people here would recommend Power Query for the task because it is likely the easiest choice for you, but it doesn't sound like something you have used before.

Re: How to Append Tables with Different Column Order with Automatic Refresh
Your process sounds very manual, so I am imagining that somewhere in your workbook you have the four "source" tables that you update periodically and one combined table somewhere in a worksheet. What you need is a programmatic way to combine them on a sheet. The basic idea is that you need to make the columns able to be stacked properly. I have no idea how many columns you have, but regardless, you need to form a "mapping" between what you have and what you want.

If your columns are always fixed and it is just the row data that changes, the problem is fairly simple - just directly map things like:

vstack(choosecols(table1, 3), choosecols(table2, 6), choosecols(table3, 4), choosecols(table4, 2))

In this way, you'll get a single column with each of your source tables' related data included.

A second choice is to take advantage of standardized names. For instance, you can add a row above your source tables where you write down the standard name of each column. You then use a match-based approach to put the columns into a vstack, e.g.

"purple", "green", "blue", "red" for table 1
"apple", "plum", "blueberry" for table 2

You create a mapping for table 2, say "red", "purple", "blue", and then you get the proper column using xlookup:

xlookup("red", table_2_mapping, table_2)

So now you can just stack:

vstack(
    xlookup("red", table_1_headers, table_1),
    xlookup("red", table_2_mapping, table_2),
    etc.
)

Obviously the more involved your column names, the more complicated the mapping. Well structured data and consistent naming will go a long way to making your situation easier.
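Here is a fuller hedged sketch of that standardized-name mapping, assuming each source's header row and data body are available as names (standard_headers, tbl_a_headers/tbl_a_data, tbl_b_headers/tbl_b_data are all placeholders):

=LET(
    standard_headers, {"red";"green";"blue"},
    get_col, LAMBDA(std_name, headers, data, CHOOSECOLS(data, MATCH(std_name, headers, 0))),
    stack_one, LAMBDA(std_name, VSTACK(get_col(std_name, tbl_a_headers, tbl_a_data), get_col(std_name, tbl_b_headers, tbl_b_data))),
    body, REDUCE(stack_one("red"), DROP(standard_headers, 1), LAMBDA(acc, nm, HSTACK(acc, stack_one(nm)))),
    VSTACK(TOROW(standard_headers), body)
)

Each source contributes its columns in the standard order regardless of how they are arranged locally, so new rows in any source flow into the combined array on recalculation.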
Re: Loan Amortization Schedule

And p.s. this entire sheet can be made with a single formula if that is your thing, but the way I write it would require you to use a table for the additional payments. p.p.s. here you go. The attached file has both an in-cell formula version and a named lambda version. There is room for change, certainly, but I think it should function on a base level.

=LET( total_periods, months_of_payments + years_of_payments * 12, monthly_rate, IF( OR(ISBLANK(effective_rate), ISOMITTED(effective_rate)), yearly_stated_rate / 12, (1 + effective_rate) ^ (1 / 12) - 1 ), adj_future_value, IF(OR(ISOMITTED(future_value), ISBLANK(future_value)), 0, future_value), regular_payment, PMT(monthly_rate, total_periods, present_value, adj_future_value), is_principal_reduction, present_value > future_value, adj_payment, IF(is_principal_reduction, -regular_payment, regular_payment), has_additional_payments, OR( NOT(ISBLANK(additional_payment_amount_col)), NOT(ISOMITTED(additional_payment_amount_col)) ), has_principal_increases, OR( NOT(ISBLANK(principal_increase_amount_col)), NOT(ISOMITTED(principal_increase_amount_col)) ), period_column, SEQUENCE(total_periods), date_column, EOMONTH(start_date, SEQUENCE(total_periods, , 0)), calcCurPeriod, LAMBDA(cur_end_of_month_date, opening_balance, LET( cur_additional_payments, IF( has_additional_payments, SUM( FILTER( additional_payment_amount_col, (additional_payment_date_col <= cur_end_of_month_date) * (additional_payment_date_col >= EOMONTH(cur_end_of_month_date, -1)), 0 ) ), 0 ), cur_periodic_interest, opening_balance * monthly_rate, cur_increases_to_principal, IF( has_principal_increases, SUM( FILTER( principal_increase_amount_col, (principal_increase_date_col <= cur_end_of_month_date) * (principal_increase_date_col > EOMONTH(cur_end_of_month_date, -1)), 0 ) ), 0 ), cur_payment, MIN( (opening_balance + cur_periodic_interest + cur_increases_to_principal), -regular_payment ), cur_scheduled_payment_amount_applied_to_principal, cur_payment - cur_periodic_interest, cur_total_payments, cur_payment + cur_additional_payments, cur_total_increases_to_principal, cur_periodic_interest + cur_increases_to_principal, closing_balance, ROUND( opening_balance - cur_total_payments + cur_total_increases_to_principal, 2 ), values_through_additional_payments, HSTACK( cur_end_of_month_date, opening_balance, -regular_payment ), values_through_increases_to_principal, IF( has_additional_payments, HSTACK( values_through_additional_payments, cur_additional_payments, cur_periodic_interest, cur_scheduled_payment_amount_applied_to_principal ), HSTACK( values_through_additional_payments, cur_periodic_interest, cur_scheduled_payment_amount_applied_to_principal ) ), values_through_closing_balance, IF( has_principal_increases, HSTACK( values_through_increases_to_principal, cur_increases_to_principal, closing_balance ), HSTACK(values_through_increases_to_principal, closing_balance) ), values_through_closing_balance ) ), payment_schedule_headers_prior_to_additional_payments, HSTACK( "period", "end_date", "opening_balance", "scheduled_payment_amount" ), payment_schedule_headers_prior_to_principal_increases, IF( has_additional_payments, HSTACK( payment_schedule_headers_prior_to_additional_payments, "additional_payments_applied_to_principal", "periodic_interest", "scheduled_payment_applied_to_principal" ), HSTACK( payment_schedule_headers_prior_to_additional_payments, "periodic_interest", "scheduled_payment_applied_to_principal" ) ), payment_schedule_headers_through_closing_balance, IF( has_principal_increases, HSTACK(
payment_schedule_headers_prior_to_principal_increases, "increases_to_principal", "closing_balance" ), HSTACK(payment_schedule_headers_prior_to_principal_increases, "closing_balance") ), basic_schedule, DROP( REDUCE( present_value, period_column, LAMBDA(acc, cur_period, IF( TAKE(acc, -1, -1), VSTACK( acc, HSTACK( cur_period, calcCurPeriod(EOMONTH(start_date, cur_period - 1), TAKE(acc, -1, -1)) ) ), acc ) ) ), 1 ), result, VSTACK( payment_schedule_headers_through_closing_balance, FILTER(basic_schedule, ISNUMBER(CHOOSECOLS(basic_schedule, 3)), "") ), result )

Re: Loan Amortization Schedule
Do you have a particular format in mind? If I were to do it across multiple cells, it would go something like:

principal_amount =
interest_rate =
periods =
ordinary_payment = pmt()
first_date =
date = edate(first_date, sequence(periods))
scheduled_payment = if(prior_balance>0, ordinary_payment, 0)
additional_payment = (whatever value you want)
current_period_interest = if(prior_balance>0, prior_balance*interest_rate, 0)
total_principal_payment = additional_payment + (scheduled_payment - current_period_interest)
closing_balance = prior_balance - total_principal_payment
total_scheduled_payments = sumifs(scheduled_pmts_col, date_col, "<="&date)
total_additional_payments = sumifs(additional_payments_col, date_col, "<="&date)
total_interest_paid = sumifs(interest_col, date_col, "<="&date)
total_principal_paid = total_additional_payments + (total_scheduled_payments - total_interest_paid)

Where there are blanks after the equals sign, that means you can either a) name the cell and enter a value in that cell or b) insert a reference to the value you want (like the cell next to it). I would strongly suggest using a table so that you can sort and whatnot, but you can use the formulas described in just an ordinary sheet.

If you use a table for the additional payments, rather than entering a value/reference for each additional payment, you can do:

additional_payment = sumifs(additional_payment_amt_col, other_date_col, "<="&date, other_date_col, ">"&edate(date,-1))

The only downside of this method is that if you are interested in more precise interest adjustments based on the date of the additional payment, this formula won't do it (but it could be modified if you needed it to).

Any other solution is really just a re-wrapping of this suggestion. You can always calculate the periodic payment in advance, but interest must be calculated based upon the outstanding balance from the prior period. Once you know that number, you can determine how much of the scheduled payment reduced principal. If you want to make additional payments, you just reduce the principal further.

And in case it is non-obvious, the reason for the dynamic formula for the date is that it will automatically change size based upon the criteria you use. If you just have the remaining formulas reference the date col (e.g. $A$5#) and place them next to it, you can fill down all of the formulas to match the size. You just need to delete everything after the first row of formulas if you want to repeat the fill down. All of the other cell references can be normal; the prior balance is just E4 if your current ending balance would be E5. No need for OFFSET, INDIRECT, or anything fancy. The first row would get a slightly different formula if you have to worry about headers, but that is not a big deal - just change the prior balance reference to be conditional, i.e. if(not(isnumber(prior_balance)), original_balance, prior_balance).
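To fill in the pmt() blank above, a hedged example (sign conventions vary; PMT returns a negative amount for a positive present value, hence the minus):

ordinary_payment = -PMT(interest_rate, periods, principal_amount)

where interest_rate is the periodic (e.g. monthly) rate, not the annual rate.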
Stacking row/columns of an array - some module formulas

I just wrote a few functions that should theoretically allow you to take an array, chunk it by rows or columns, and then stack the chunks in the other dimension, e.g. you have 10 rows by 3 columns and you want to stack them as a set of 5 rows by 6 columns. You should be able to use this to completely flatten an array (e.g. take each column and place them one under the other). It has optional parameters for separate fill values for rows and columns. I welcome any comments, suggestions, or alternative approaches.

By way of contrast, if you had an array like:

ctrl  date       end_date
1     1/1/2021   1/1/2021
2     1/15/2021  1/15/2021
3     2/2/2021   2/2/2021

Excel's native TOCOL function would flatten it into one column like this:

=TOCOL(B4:D6) or =TOCOL(B4:D6,,FALSE)

ctrl
date
end_date
1
44197
44197
2
44211
44211

or this:

=TOCOL(B4:D6,,TRUE)

ctrl
1
2
date
44197
44211
end_date
44197
44211

whereas mine flattens it like this:

=chunk.vStackChunksOfColumns(B4:D6,1)

ctrl
1
2
date
44197
44211
end_date
44197
44211

or this:

=chunk.vStackChunksOfColumns(B4:D6,2)

ctrl      date
1         44197
2         44211
end_date
44197
44211

or this (which is effectively no change):

=chunk.vStackChunksOfColumns(B4:D6,3)

ctrl  date   end_date
1     44197  44197
2     44211  44211

hStackChunksOfRows = lambda( target_array, rows_per_slice, [no_columns_to_adj_slice_to_df_NO_ADJUSTMENT], [fill_value_for_added_columns_df_DBLQTS], [no_rows_to_adj_array_to_df_ROWS_PER_SLICE_TIMES_NO_OF_SLICES], [fill_value_for_added_rows_df_DBLQTS], let( rows_in_array, rows(target_array), cols_in_array, columns(target_array), no_of_slices, roundup(rows_in_array/rows_per_slice,0), target_slice_width, if( if( isomitted(no_columns_to_adj_slice_to_df_NO_ADJUSTMENT), FALSE, no_columns_to_adj_slice_to_df_NO_ADJUSTMENT ), no_columns_to_adj_slice_to_df_NO_ADJUSTMENT, cols_in_array ), target_array_height, if( isomitted(no_rows_to_adj_array_to_df_ROWS_PER_SLICE_TIMES_NO_OF_SLICES), rows_per_slice * no_of_slices, no_rows_to_adj_array_to_df_ROWS_PER_SLICE_TIMES_NO_OF_SLICES ), fill_value_for_added_columns, if(isomitted(fill_value_for_added_columns_df_DBLQTS),"",fill_value_for_added_columns_df_DBLQTS), fill_value_for_added_rows, if(isomitted(fill_value_for_added_rows_df_DBLQTS),"",fill_value_for_added_rows_df_DBLQTS), adj_array, normalizeArray( target_array, target_array_height, target_slice_width, FALSE, fill_value_for_added_rows, fill_value_for_added_columns ), stack_iter, sequence(no_of_slices-1), stack_func, lambda( acc, slice_counter, hstack( acc, take(drop(adj_array,(slice_counter)*rows_per_slice),rows_per_slice) ) ), ifs( rows_in_array<=rows_per_slice, adj_array, rows_per_slice = 1, drop(reduce("",sequence(rows_in_array),lambda(acc, cur_row, hstack(acc, chooserows(adj_array,cur_row)))),,1), 1, reduce( take(adj_array,rows_per_slice), stack_iter, stack_func ) ) ) );

vStackChunksOfColumns = lambda( target_array, cols_per_slice, [no_rows_to_adj_slice_to_df_NO_ADJUSTMENT], [fill_value_for_added_rows_df_DBLQTS], [no_columns_to_adj_array_to_df_COLS_PER_SLICE_TIMES_NO_OF_SLICES], [fill_value_for_added_columns_df_DBLQTS], let( rows_in_array, rows(target_array), cols_in_array, columns(target_array), no_of_slices, roundup(cols_in_array/cols_per_slice,0), target_slice_height, if( if( isomitted(no_rows_to_adj_slice_to_df_NO_ADJUSTMENT), FALSE, no_rows_to_adj_slice_to_df_NO_ADJUSTMENT ), no_rows_to_adj_slice_to_df_NO_ADJUSTMENT, rows_in_array ), target_array_width, if( isomitted(no_columns_to_adj_array_to_df_COLS_PER_SLICE_TIMES_NO_OF_SLICES), cols_per_slice*no_of_slices,
no_columns_to_adj_array_to_df_COLS_PER_SLICE_TIMES_NO_OF_SLICES ), fill_value_for_added_rows, if(isomitted(fill_value_for_added_rows_df_DBLQTS),"",fill_value_for_added_rows_df_DBLQTS), fill_value_for_added_columns, if(isomitted(fill_value_for_added_columns_df_DBLQTS),"",fill_value_for_added_columns_df_DBLQTS), adj_array, normalizeArray( target_array, target_slice_height, target_array_width, TRUE, fill_value_for_added_rows, fill_value_for_added_columns ), stack_iter, sequence(,no_of_slices-1), stack_func, lambda( acc, slice_counter, vstack( acc, take(drop(adj_array,,(slice_counter)*cols_per_slice),,cols_per_slice) ) ), result, ifs( cols_in_array<=cols_per_slice, adj_array, cols_per_slice = 1, drop(reduce("",sequence(,cols_in_array),lambda(acc, cur_col, vstack(acc, CHOOSECOLS(adj_array,cur_col)))),1), 1, reduce( take(adj_array,,cols_per_slice), stack_iter, stack_func ) ), result ) );

normalizeArray = LAMBDA(
    target_array,
    final_height, // Desired final number of rows
    final_width, // Desired final number of columns
    fill_rows_first, // Should rows be filled first (default: TRUE)
    value_for_filling_rows, // Value to fill additional rows (default: "")
    value_for_filling_columns, // Value to fill additional columns (default: "")
    LET(
        // Get current array dimensions
        current_height, ROWS(target_array),
        current_width, COLUMNS(target_array),
        // Define row adjustment function (truncate or expand)
        row_func, IF(
            final_height <= current_height,
            LAMBDA(target_array_for_row_adjustment, TAKE(target_array_for_row_adjustment, final_height)),
            LAMBDA(target_array_for_row_adjustment, EXPAND(target_array_for_row_adjustment, final_height, , value_for_filling_rows))
        ),
        // Define column adjustment function (truncate or expand)
        col_func, IF(
            final_width <= current_width,
            LAMBDA(target_array_for_col_adjustment, TAKE(target_array_for_col_adjustment, , final_width)),
            LAMBDA(target_array_for_col_adjustment, EXPAND(target_array_for_col_adjustment, , final_width, value_for_filling_columns))
        ),
        // Apply adjustments
        adjusted_array, IF(
            fill_rows_first,
            row_func(col_func(target_array)),
            col_func(row_func(target_array))
        ),
        adjusted_array
    )
);

Re: Double Thunking Works Wonders!
Reduce merely requires a single array (of any dimension) as a return - it doesn't inherently limit the type of return (so far as I know). If you are willing to thunk thunks, your return stack can be fairly deep before you notice major slowdowns. You could, for instance, turn each row of an array into a thunk and hstack them iteratively via reduce, thereby giving you a row of rows. You can write an access function along the lines of:

lambda(
    outer_address,
    row_of_thunks,
    lambda(
        inner_address,
        index(
            index(row_of_thunks, 1, outer_address)(),
            1,
            inner_address
        )
    )
)

In this way, thunks just become 2D buckets that could be anything at all, and arrays are just the scaffolding for a basic thunk arrangement. Depending how you choose to write your functions, you can have each thunk in an array represent a different state you want to persist and pass it easily from one name scope to another.

I've also been thinking a bit about the downside of just adding everything to a single 2D array with more explicit contained-array tracking as the first few columns - that way you can set a particular array as a variable name and have it extracted from the larger array as you require. Outside of the inherent limits of accessing an array and deconstructing it using take/drop/choosecols/etc., there is no overhead to maintaining the array and no stacking of functions and dependency chains for the data.
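A minimal runnable sketch of that accessor idea, assuming a current Excel 365 build that allows lambdas as array elements and calling an INDEX result directly (otherwise bind the INDEX result to a LET name first; all names here are placeholders):

=LET(
    data, SEQUENCE(3, 4),
    row_of_thunks, BYROW(data, LAMBDA(r, LAMBDA(r))),
    pick, LAMBDA(outer_address, inner_address, INDEX(INDEX(row_of_thunks, outer_address, 1)(), 1, inner_address)),
    pick(2, 3)
)

Here BYROW wraps each row in a thunk (a zero-argument lambda), and pick unwraps one row with "()" and indexes into it - returning 7, the third value of the second row.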
Re: Double Thunking Works Wonders!
PeterBartholomew1 My chunk function (sel.chunk) is just the combination of a slice by rows (sel.sliceRows) and a slice by columns (sel.sliceCols). The reason for all of the parameters is so that you can specify the starting and ending index for the row slice and the column slice using one of three methods:

a) as absolute values, i.e. references to the start of the dimension (e.g. start at column 2 and end at column 6),
b) as values with reference to the end of the dimension (e.g. start 8 columns from the end of the array and end 3 columns from the end of the array), or
c) as values with reference to the other value (e.g. start at column 5 and end 3 columns later, or end at column 9 and start 4 columns earlier).

All of the index management is handled within the formula, so you can almost think of it as a declarative type call - "Give me the slice with these dimensions" (a few example calls are sketched below). All of the validation/error checking happens within the final formula (sel.calcSliceStartAndEnd), which is dimension agnostic - you just specify the total dimension size and how the start and end indices should be determined. The default values and ordering of the parameters are aimed at the standard use case - you tell it the starting index and you tell it either the ending index or how many additional units of the dimension you want (start at row 4 and end at row 8, or start at row 4 and include 5 rows) - but if you don't specify an end or a start, that just means "from the start, give me 8 rows" or "from row 8, give me to the end." This way more complicated dimension slicing can be specifically referenced if needed and otherwise ignored entirely.

--------- an FYI on my naming approach/syntax -------------

Syntax: functions in camel case, variables in snake case, classes (in other contexts) in pascal case, and global variables in all caps. In a more nuanced setting, I'd also use leading underscores for "helper" type objects/functions. You don't see it in the "sel" module context, but I generally try to have functions that construct other functions start with "make", functions that retrieve something start with "get", and functions that are about novel value/object creation start with something like "calc".

Excel naming approach: module names should be intuitively related to the "user" type functions in the module (like "sel" stands for "select" and it includes the slicing and chunking functions). Function names plus the module context should give you a strong intuitive sense of what the function expects/returns. Variable names should essentially self-document why they exist, and parameter names should self-document what they expect and end with "_df_" if there is a default value, followed by what that default value is, where the default value should strongly suggest what a valid argument type would be (e.g. "_df_TRUE" for when a parameter expects a boolean, or "_df_2" for when it expects an int/number). My main motivation with parameter names is that Excel provides no type hints or hover information for users, so the only way they will know what valid entries are for the function is to divine it from the parameter name, to refer to other documentation, or to go read the function definition, and I'd rather the user have minimal friction (or learning curve) when invoking a function.
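To make the three addressing methods concrete, here are a few example calls against sel.sliceCols as defined in the module below, on a 1 x 10 array. These are hedged sketches - expected results assume the parameter validation passes:

// with data defined as SEQUENCE(1, 10):
sel.sliceCols(data, 2, 6)           // a) absolute: start at column 2, end at column 6 -> {2,3,4,5,6}
sel.sliceCols(data, , , , 3, , 8)   // b) from the end: start 8 from the end (col 2), stop 3 from the end (col 7) -> {2,3,4,5,6,7}
sel.sliceCols(data, 5, , 3)         // c) relative: start at column 5 and include the next 3 columns -> {5,6,7}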
Re: Double Thunking Works Wonders!
Patrick2788 (P.S. Updated chunk function to proper version) Funnily enough, I've been working on something somewhat similar, but mine is solidly within the bucket of what I'd call chunking. The primary solution I'm currently working towards involves precalculating the spatial arrangement of the chunks and then making iterators by way of things like map, bycol, or byrow. For the most part I do fancy accumulating stuff inside of a reduce combined with an iter function, but back in the day I was using thunks of thunks as list-like objects and then converting those lists into 2D arrays by use of a layout engine (which included array dimension normalization logic and permitted the combination of arrays with different dimensions without unanticipated fill values). The only reason I haven't stuck with the thunks is because you can't meaningfully persist them, and their use case is basically limited to the exact formula you are currently constructing.

What makes thunks helpful is that they are essentially functions and can be treated as values up until a context where you want them to be evaluated (accomplished by the "()" syntax). Any Excel context which permits you to evaluate a function will give you access to the contents, so it all comes down to creatively playing the shell game of unwinding your thunks. Also, consider stacking thunks in a two-column array where the first column is some sort of identifier and you use a selection mechanism that can select the thunk you want by its identifier - it is much easier than having to iterate through each thunk to find out what it is (see the sketch below). And for what it is worth, you can very easily create a thunk, calculate some value from that thunk, and then save that value as a thunk. If you accumulate those values with your thunk, you can think of it as a caching mechanism and take advantage of key-value pairs.

And just because, here is my latest iteration of an array chunker/slicer. Using this in an iter and then aggregating each chunk as a thunk is something even a punk could do.
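Before the module itself, here is a minimal sketch of the identifier-plus-thunk selection idea mentioned above. The names store and getByKey and the sample data are illustrative only, not part of the sel module:

=LET(
    // Two-column layout: identifiers in column 1, thunks in column 2.
    store, VSTACK(
        HSTACK("sales", LAMBDA(SEQUENCE(5, 3))),
        HSTACK("names", LAMBDA({"A";"B";"C"}))
    ),
    // Select a thunk by its identifier, then evaluate it with ().
    getByKey, LAMBDA(key,
        INDEX(store, MATCH(key, INDEX(store, , 1), 0), 2)
    ),
    getByKey("names")()   // spills {"A";"B";"C"}
)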
//sel module
// for taking portions of other things.

chunk = LAMBDA(
    target_array,
    [start_at_row],
    [end_at_row],
    [start_at_col],
    [end_at_col],
    [no_of_next_rows_to_include],
    [no_of_next_cols_to_include],
    [rows_to_stop_from_end_of_array],
    [cols_to_stop_from_end_of_array],
    [start_at_rows_before_end_of_slice],
    [start_at_cols_before_end_of_slice],
    [start_at_rows_before_end_of_array],
    [start_at_cols_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_row_if_no_start_df_FALSE],
    [return_first_col_if_no_start_df_FALSE],
    [return_last_row_if_no_end_df_FALSE],
    [return_last_col_if_no_end_df_FALSE],
    LET(
        sliced_rows, sliceRows(
            target_array,
            start_at_row,
            end_at_row,
            no_of_next_rows_to_include,
            rows_to_stop_from_end_of_array,
            start_at_rows_before_end_of_slice,
            start_at_rows_before_end_of_array,
            accept_partial_df_FALSE,
            return_first_row_if_no_start_df_FALSE,
            return_last_row_if_no_end_df_FALSE
        ),
        error_in_row_slice, ISERROR(INDEX(sliced_rows, 1, 1)),
        sliced_rows_and_columns, IF(
            error_in_row_slice,
            ERROR.TYPE(3),
            sliceCols(
                sliced_rows,
                start_at_col,
                end_at_col,
                no_of_next_cols_to_include,
                cols_to_stop_from_end_of_array,
                start_at_cols_before_end_of_slice,
                start_at_cols_before_end_of_array,
                accept_partial_df_FALSE,
                return_first_col_if_no_start_df_FALSE,
                return_last_col_if_no_end_df_FALSE
            )
        ),
        sliced_rows_and_columns
    )
);

sliceRows = LAMBDA(
    target_array,
    [start_at_row],
    [end_at_row],
    [no_of_next_rows_to_include],
    [rows_to_stop_from_end_of_array],
    [start_at_rows_before_end_of_slice],
    [start_at_rows_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_if_no_start_df_FALSE],
    [return_last_if_no_end_df_FALSE],
    LET(
        row_bounds, calcSliceStartAndEnd(
            ROWS(target_array),   // row slicing is bounded by the row count
            start_at_row,
            end_at_row,
            no_of_next_rows_to_include,
            rows_to_stop_from_end_of_array,
            start_at_rows_before_end_of_slice,
            start_at_rows_before_end_of_array,
            return_first_if_no_start_df_FALSE,
            return_last_if_no_end_df_FALSE,
            accept_partial_df_FALSE
        ),
        start_row, INDEX(row_bounds, 1, 1),
        end_row, INDEX(row_bounds, 1, 2),
        return_trimmed_array, COLUMNS(row_bounds) = 3,
        trimmed_to_starting_row, IF(
            return_trimmed_array,
            DROP(target_array, start_row - 1),
            FALSE
        ),
        trimmed_to_ending_row, IF(
            return_trimmed_array,
            TAKE(trimmed_to_starting_row, end_row - start_row + 1),
            FALSE
        ),
        result, IF(return_trimmed_array, trimmed_to_ending_row, ERROR.TYPE(3)),
        result
    )
);

sliceCols = LAMBDA(
    target_array,
    [start_at_col],
    [end_at_col],
    [no_of_next_cols_to_include],
    [cols_to_stop_from_end_of_array],
    [start_at_cols_before_end_of_slice],
    [start_at_cols_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_col_if_no_start_df_FALSE],
    [return_last_col_if_no_end_df_FALSE],
    LET(
        col_bounds, calcSliceStartAndEnd(
            COLUMNS(target_array),
            start_at_col,
            end_at_col,
            no_of_next_cols_to_include,
            cols_to_stop_from_end_of_array,
            start_at_cols_before_end_of_slice,
            start_at_cols_before_end_of_array,
            return_first_col_if_no_start_df_FALSE,
            return_last_col_if_no_end_df_FALSE,
            accept_partial_df_FALSE
        ),
        start_col, INDEX(col_bounds, 1, 1),
        end_col, INDEX(col_bounds, 1, 2),
        return_trimmed_array, COLUMNS(col_bounds) = 3,
        trimmed_to_starting_col, IF(
            return_trimmed_array,
            DROP(target_array, , start_col - 1),
            FALSE
        ),
        trimmed_to_ending_col, IF(
            return_trimmed_array,
            TAKE(trimmed_to_starting_col, , end_col - start_col + 1),
            FALSE
        ),
        result, IF(return_trimmed_array, trimmed_to_ending_col, ERROR.TYPE(3)),
        result
    )
);

calcSliceStartAndEnd = LAMBDA(
    main_dimension_size,
    [start_at_idx],
    [end_at_idx],
    [end_units_from_start_of_slice],
    [end_units_from_end_of_dimension],
    [start_units_from_end_of_slice],
    [start_units_from_end_of_dimension],
    [return_first_if_no_from_df_FALSE],
    [return_last_if_no_through_df_FALSE],
    [accept_partial_df_FALSE],
    LET(
        // Parameter usage indicators
        start_idx_provided, NOT(ISOMITTED(start_at_idx)),
        calc_start_idx_from_end_of_slice, NOT(ISOMITTED(start_units_from_end_of_slice)),
        calc_start_idx_from_end_of_dimension, NOT(ISOMITTED(start_units_from_end_of_dimension)),
        start_from_first_unit_as_default, IF(ISOMITTED(return_first_if_no_from_df_FALSE), FALSE, return_first_if_no_from_df_FALSE),
        end_idx_provided, NOT(ISOMITTED(end_at_idx)),
        calc_end_idx_from_start_of_slice, NOT(ISOMITTED(end_units_from_start_of_slice)),
        calc_end_idx_from_end_of_dimension, NOT(ISOMITTED(end_units_from_end_of_dimension)),
        end_at_final_unit_as_default, IF(ISOMITTED(return_last_if_no_through_df_FALSE), FALSE, return_last_if_no_through_df_FALSE),
        at_least_one_relative_position, (calc_start_idx_from_end_of_slice + calc_end_idx_from_start_of_slice) > 0,
        accept_partial, IF(ISOMITTED(accept_partial_df_FALSE), FALSE, accept_partial_df_FALSE),
        // Valid parameter checking
        valid_start_parameter_provided,
            (start_idx_provided + calc_start_idx_from_end_of_slice + calc_start_idx_from_end_of_dimension) = 1,
        valid_end_parameter_provided,
            (end_idx_provided + calc_end_idx_from_start_of_slice + calc_end_idx_from_end_of_dimension) = 1,
        // A relative start needs an absolute (or defaulted) end to anchor to, and vice versa
        valid_relative_start,
            calc_start_idx_from_end_of_slice * (end_idx_provided + calc_end_idx_from_end_of_dimension + end_at_final_unit_as_default),
        valid_relative_end,
            calc_end_idx_from_start_of_slice * (start_idx_provided + calc_start_idx_from_end_of_dimension + start_from_first_unit_as_default),
        valid_relative_parameter, IF(
            at_least_one_relative_position,
            (valid_relative_start + valid_relative_end) = 1,
            TRUE
        ),
        valid_parameters,
            (valid_start_parameter_provided + start_from_first_unit_as_default) *
            (valid_end_parameter_provided + end_at_final_unit_as_default) *
            valid_relative_parameter,
        result, IF(
            NOT(valid_parameters),
            ERROR.TYPE(3),
            LET(
                abs_start, IFS(
                    valid_relative_start, FALSE,
                    start_idx_provided, start_at_idx,
                    calc_start_idx_from_end_of_dimension, main_dimension_size - start_units_from_end_of_dimension,
                    start_from_first_unit_as_default, 1,
                    1, ERROR.TYPE(3)
                ),
                abs_end, IFS(
                    valid_relative_end, FALSE,
                    end_idx_provided, end_at_idx,
                    calc_end_idx_from_end_of_dimension, main_dimension_size - end_units_from_end_of_dimension,
                    end_at_final_unit_as_default, main_dimension_size,
                    1, ERROR.TYPE(3)
                ),
                // Resolve a relative start/end against the absolute anchor on the other side
                start_idx, IF(
                    valid_relative_start,
                    abs_end - start_units_from_end_of_slice,
                    abs_start
                ),
                end_idx, IF(
                    valid_relative_end,
                    abs_start + end_units_from_start_of_slice - 1,
                    abs_end
                ),
                is_valid_slice, AND(
                    end_idx >= start_idx,
                    start_idx > 0,
                    (end_idx <= main_dimension_size) + accept_partial
                ),
                IF(
                    is_valid_slice,
                    HSTACK(start_idx, end_idx, is_valid_slice),
                    ERROR.TYPE(3)
                )
            )
        ),
        result
    )
);
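If you want to sanity-check the bounds logic on its own, calcSliceStartAndEnd can be called directly - it returns hstack(start_idx, end_idx, TRUE) when the parameters resolve and an error otherwise. A few illustrative calls against a dimension of 10 units (expected spills shown in the comments, assuming the module as defined above):

sel.calcSliceStartAndEnd(10, 2, 6)          // -> {2, 6, TRUE}   absolute start and end
sel.calcSliceStartAndEnd(10, 4, , 5)        // -> {4, 8, TRUE}   start at 4 and include 5 units
sel.calcSliceStartAndEnd(10, , , , 3, , 8)  // -> {2, 7, TRUE}   start 8 from the end, end 3 from the end
sel.calcSliceStartAndEnd(10, 12, 15)        // -> error          out of range unless accept_partial is TRUE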
Re: Excel Labs Array Module - What are your thoughts?
SergeiBaklan Thanks for taking a look. These functions are designed for working with dynamic arrays, so the array traversal logic is (usually) embedded in each of them. My replaceCell and replaceBlankCells, for instance, can replace any cell in an array (dynamic or otherwise) without having to separately call an array traversal function. Yes, it involves constructing a new array, but as far as I know, each "discrete" calculation creates its own array anyway - e.g. the LET steps extra_row, vstack(original_array, "") followed by extra_row2, vstack(extra_row, "") create two new arrays, while a single vstack(original_array, row_1, row_2) creates only one. The one function that I think is the biggest contributor going forward is the flatten function. It is presumably going to be the basis of all of my set theory module logic.
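For context, here is a minimal sketch of what a replaceCell-shaped function with embedded traversal could look like. This is a reconstruction for illustration only, not the actual module code:

// Hypothetical reconstruction of a replaceCell-style lambda: the traversal
// (an IF over broadcast row/column indices) is embedded in the function itself.
replaceCellSketch = LAMBDA(target_array, row_no, col_no, new_value,
    LET(
        rs, SEQUENCE(ROWS(target_array)),
        cs, SEQUENCE(, COLUMNS(target_array)),
        // Build a same-sized array, swapping in new_value at (row_no, col_no)
        IF((rs = row_no) * (cs = col_no), new_value, target_array)
    )
);

Called as =replaceCellSketch(SEQUENCE(3, 3), 2, 2, "x"), the whole thing is a single new-array construction, which is the point being made above.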