Forum Discussion
Excel - Sorting hierarchical numbers issue
I don't think AI could help if only
- the case is quite simple, and/or
- you have good enough knowledge of RegEx
First, AI answers are not reliable. It could easy recommend Word or Google Sheet settings to apply in Excel, etc.
Second, RegEx behaviour depends on flavour. What works in Perl or JavaScript or something else could not work in Excel.
funny enough, just a few days ago I just wrote a Lambda function for this. It is more generic and flexible. It does NOT currently account for alternative hierachy indicators (only uses ".") but does account for multiple levels (e.g. 12.5.6) and lets you set the "pad" character (e.g. 12.005.006 or 12.__5.__6 or 12. 5. 6) and the output uses the pad character so the hierarchy info (where the "." is) is maintained:
/**convert typical outline numbering (1.1, 1.2, … 1.9, 1.10,…,1.99, 1.100) to proper sortable numbering (1.001, 1.002,… 1.009,1.01, … 1.099, 1.1)
*/
OutlineNumbers = LAMBDA(in, [padCharacter], LET(
pad, if(ISOMITTED(padCharacter),"_",padCharacter),
levels, MAX(LEN(in)-LEN(SUBSTITUTE(in,".",""))),
prePad0, LAMBDA(list, LET(l,MAX(LEN(list)), RIGHT(REPT(pad,l)&list,l))),
REDUCE(IFERROR(TEXTBEFORE(in,".",1),in),SEQUENCE(levels),LAMBDA(p,q,
p&"."&prePad0(IFERROR(TEXTAFTER(IFERROR(TEXTBEFORE(in,".",q+1),in),".",q),pad))))
)
);
OK now I have question for all of you. My original default was to use "-" but in a sample test case this is how Excel sorts these samples with "-" mixed in:
everything makes sense until the last 2 where the 1.--9 and 1.-ab is AFTER 1.100 while I would expect sorting to be as shown in the 2nd,3rd, and 4th columns where "_", "0", and " " were used.
EDIT ... NVM I found my answer. Excel apparently completely ignores "-" during a sort operation and therefor sorted it as if it isn't there. Notice in the next image how the SORT for both with and without "-" come out the same:
- SergeiBaklanAug 01, 2024Diamond Contributor
That's not only during the sorting, dash is ignored when two values are compared, i.e. ="10" < "-9" or
="ab" < "-z" both return TRUE.
However, ="99" = "-99" returns FALSE but ="99" = CHAR(173) & "99" returns TRUE.
In general there are 2808 unichars which work the same way as soften hyphen, we may list them as
=LET( codesRaw, UNICHAR( SEQUENCE(999999,,1 ) ), codes, FILTER( codesRaw, NOT( ISNA(codesRaw) ) ), test, codes & "abc" = "abc", chars, FILTER( codes, test ), result, HSTACK( chars, UNICODE(chars) ), result )
- m_tarlerAug 01, 2024Bronze Contributor
SergeiBaklan The story is actually more complicated. You found 2808 unichars that appear to be completely ignored by comparators but the character "-" is only PARTIALLY ignored. Notice in the following sequence the values with _-173 use CHAR(173) while the _-4 values use a mix of "-" and CHAR(45) and additional numbers and "-" and variations. Notice how all the _-4x are sorted based on the number first and then all of the "-" give slight more value and bump those down (pay particular attention to 4, -4, 4-, -4- variations) but all the _-173 variations are in the same order as were found in the original list:
- SergeiBaklanAug 02, 2024Diamond Contributor
Yes, things are more complex and the logic behind is not clear, at least for me.
"10" is greater than "$10" as it shall be, at the same time against text comparing logic
"10" is greater than "€10"
"abc" is greater than "£abc"
UNICHAR(79) is greater than UNICHAR(8679)
etc...