Forum Discussion
Chris_Mercer
Jul 26, 2024Copper Contributor
Excel - Sorting hierarchical numbers issue
Hi - I'm looking for some help with something I thought would be quick and easy but is proving not to be the case! I'm trying to sort a data set of hierarchical numbers but am having issues with ...
SergeiBaklan
Aug 01, 2024MVP
It doesn't sort correctly "8" (numbers without dots).
= LET(
level1, --IFNA(TEXTBEFORE(sectionNumbers, "."), sectionNumbers),
level2, --IFNA(TEXTAFTER(sectionNumbers, "."),0),
SORTBY(contents, level1,,level2,)
)
Or, with 365 Beta
=SORTBY(
contents,
--REGEXEXTRACT(sectionNumbers, "^\d*", 1),,
--IFNA(REGEXEXTRACT(sectionNumbers, "(?<=\.)\d*", 1),0),
)
PeterBartholomew1
Aug 01, 2024Silver Contributor
Well spotted as far as the single number is concerned.
The regex is nice though I lack fluency when it comes to group constructs such as the positive lookbehind
(?<=\.)\d*
I guess bridging the gap between natural language and the cryptic regex notation is something to which AI is well suited.
- SergeiBaklanAug 01, 2024MVP
I don't think AI could help if only
- the case is quite simple, and/or
- you have good enough knowledge of RegEx
First, AI answers are not reliable. It could easy recommend Word or Google Sheet settings to apply in Excel, etc.
Second, RegEx behaviour depends on flavour. What works in Perl or JavaScript or something else could not work in Excel.
- m_tarlerAug 01, 2024Bronze Contributor
funny enough, just a few days ago I just wrote a Lambda function for this. It is more generic and flexible. It does NOT currently account for alternative hierachy indicators (only uses ".") but does account for multiple levels (e.g. 12.5.6) and lets you set the "pad" character (e.g. 12.005.006 or 12.__5.__6 or 12. 5. 6) and the output uses the pad character so the hierarchy info (where the "." is) is maintained:
/**convert typical outline numbering (1.1, 1.2, … 1.9, 1.10,…,1.99, 1.100) to proper sortable numbering (1.001, 1.002,… 1.009,1.01, … 1.099, 1.1) */ OutlineNumbers = LAMBDA(in, [padCharacter], LET( pad, if(ISOMITTED(padCharacter),"_",padCharacter), levels, MAX(LEN(in)-LEN(SUBSTITUTE(in,".",""))), prePad0, LAMBDA(list, LET(l,MAX(LEN(list)), RIGHT(REPT(pad,l)&list,l))), REDUCE(IFERROR(TEXTBEFORE(in,".",1),in),SEQUENCE(levels),LAMBDA(p,q, p&"."&prePad0(IFERROR(TEXTAFTER(IFERROR(TEXTBEFORE(in,".",q+1),in),".",q),pad)))) ) );
OK now I have question for all of you. My original default was to use "-" but in a sample test case this is how Excel sorts these samples with "-" mixed in:
everything makes sense until the last 2 where the 1.--9 and 1.-ab is AFTER 1.100 while I would expect sorting to be as shown in the 2nd,3rd, and 4th columns where "_", "0", and " " were used.
EDIT ... NVM I found my answer. Excel apparently completely ignores "-" during a sort operation and therefor sorted it as if it isn't there. Notice in the next image how the SORT for both with and without "-" come out the same:
- SergeiBaklanAug 01, 2024MVP
That's not only during the sorting, dash is ignored when two values are compared, i.e. ="10" < "-9" or
="ab" < "-z" both return TRUE.
However, ="99" = "-99" returns FALSE but ="99" = CHAR(173) & "99" returns TRUE.
In general there are 2808 unichars which work the same way as soften hyphen, we may list them as
=LET( codesRaw, UNICHAR( SEQUENCE(999999,,1 ) ), codes, FILTER( codesRaw, NOT( ISNA(codesRaw) ) ), test, codes & "abc" = "abc", chars, FILTER( codes, test ), result, HSTACK( chars, UNICODE(chars) ), result )