Excel - Sorting hierarchical numbers issue

Copper Contributor

Excel - Sorting hierarchical numbers issue

Hi - I'm looking for some help with something I thought would be quick and easy but is proving not to be the case!

I'm trying to sort a data set of hierarchical numbers but am having issues with the 8.1 & 8.10 values (etc.), whereby the sort is treating them both as 8.1 -

Copying the data from another piece of software into a table it looks like this -

After sorting (by smallest to largest) it treats both the same and returns -

I looked on line and found a post suggesting using a VBA function which treats the string as text to sort -

Unfortunately this still returned the same value for the 8.1 & 8.10 items -

Any help would be greatly appreciated.

Thanks

10 Replies

Re: Excel - Sorting hierarchical numbers issue

Let's say your hierarchical numbers are in B2 and down.

Change the formula in C2 to

=LET(n, C2, p, FIND(".", n), IFERROR(1000*LEFT(n, p-1)+MID(n, p+1, 10), 1000*n))

Fill down.

Re: Excel - Sorting hierarchical numbers issue

Hi - thanks for replying, I appreciate it. I've been on vacation so just picking it up now.

I changed the data format to text in column B and used the formula in Column C as so and it worked-

Many thanks for your help!

Re: Excel - Sorting hierarchical numbers issue

For a 365 user, LET variables provide a good alternative to helper ranges

``````= LET(
level1, TEXTBEFORE(sectionNumbers, "."),
level2, VALUE(TEXTAFTER(sectionNumbers, ".")),
SORTBY(contents, level1,1,level2,1)
)``````

Re: Excel - Sorting hierarchical numbers issue

It doesn't sort correctly "8" (numbers without dots).

``````= LET(
level1, --IFNA(TEXTBEFORE(sectionNumbers, "."), sectionNumbers),
level2, --IFNA(TEXTAFTER(sectionNumbers, "."),0),
SORTBY(contents, level1,,level2,)
)``````

Or, with 365 Beta

``````=SORTBY(
contents,
--REGEXEXTRACT(sectionNumbers, "^\d*", 1),,
--IFNA(REGEXEXTRACT(sectionNumbers, "(?<=\.)\d*", 1),0),
)``````

Re: Excel - Sorting hierarchical numbers issue

Well spotted as far as the single number is concerned.

The regex is nice though I lack fluency when it comes to group constructs such as the positive lookbehind

`(?<=\.)\d*`

I guess bridging the gap between natural language and the cryptic regex notation is something to which AI is well suited.

Re: Excel - Sorting hierarchical numbers issue

I don't think AI could help if only

- the case is quite simple, and/or

- you have good enough knowledge of RegEx

First, AI answers are not reliable. It could easy recommend Word or Google Sheet settings to apply in Excel, etc.

Second, RegEx behaviour depends on flavour. What works in Perl or JavaScript or something else could not work in Excel.

Re: Excel - Sorting hierarchical numbers issue

funny enough, just a few days ago I just wrote a Lambda function for this.  It is more generic and flexible.  It does NOT currently account for alternative hierachy indicators (only uses ".") but does account for multiple levels (e.g. 12.5.6) and lets you set the "pad" character (e.g. 12.005.006 or 12.__5.__6 or 12.  5.  6) and the output uses the pad character so the hierarchy info (where the "." is) is maintained:

``````/**convert typical outline numbering (1.1, 1.2, … 1.9, 1.10,…,1.99, 1.100) to proper sortable numbering (1.001, 1.002,… 1.009,1.01, … 1.099, 1.1)
*/
OutlineNumbers = LAMBDA(in, [padCharacter], LET(
pad, if(ISOMITTED(padCharacter),"_",padCharacter),
levels, MAX(LEN(in)-LEN(SUBSTITUTE(in,".",""))),
prePad0, LAMBDA(list, LET(l,MAX(LEN(list)), RIGHT(REPT(pad,l)&list,l))),
REDUCE(IFERROR(TEXTBEFORE(in,".",1),in),SEQUENCE(levels),LAMBDA(p,q,
p&"."&prePad0(IFERROR(TEXTAFTER(IFERROR(TEXTBEFORE(in,".",q+1),in),".",q),pad))))
)
);``````

OK now I have question for all of you.  My original default was to use "-" but in a sample test case this is how Excel sorts these samples with "-" mixed in:

everything makes sense until the last 2 where the 1.--9 and 1.-ab is AFTER 1.100 while I would expect sorting to be as shown in the 2nd,3rd, and 4th columns where "_", "0", and " " were used.

EDIT ...  NVM I found my answer.  Excel apparently completely ignores "-" during a sort operation and therefor sorted it as if it isn't there.  Notice in the next image how the SORT for both with and without "-" come out the same:

Re: Excel - Sorting hierarchical numbers issue

That's not only during the sorting, dash is ignored when two values are compared, i.e. ="10" < "-9" or

="ab" < "-z" both return TRUE.

However, ="99" = "-99" returns FALSE but ="99" = CHAR(173) & "99" returns TRUE.

In general there are 2808 unichars which work the same way as soften hyphen, we may list them as

``````=LET(
codesRaw, UNICHAR( SEQUENCE(999999,,1 ) ),
codes, FILTER( codesRaw, NOT( ISNA(codesRaw) ) ),
test, codes & "abc" = "abc",
chars, FILTER( codes, test ),
result, HSTACK( chars, UNICODE(chars) ),
result
)``````

Re: Excel - Sorting hierarchical numbers issue

@SergeiBaklan  The story is actually more complicated.  You found 2808 unichars that appear to be completely ignored by comparators but the character "-" is only PARTIALLY ignored.  Notice in the following sequence the values with _-173 use CHAR(173) while the _-4 values use a mix of "-" and CHAR(45) and additional numbers and "-" and variations.  Notice how all the _-4x are sorted based on the number first and then all of the "-" give slight more value and bump those down (pay particular attention to 4, -4, 4-, -4- variations) but all the _-173 variations are in the same order as were found in the original list:

Re: Excel - Sorting hierarchical numbers issue

Yes, things are more complex and the logic behind is not clear, at least for me.

"10" is greater than "\$10" as it shall be, at the same time against text comparing logic

"10" is greater than "€10"

"abc" is greater than "£abc"

UNICHAR(79) is greater than UNICHAR(8679)

etc...