Excel - Sorting hierarchical numbers issue

Copper Contributor

Hi - I'm looking for some help with something I thought would be quick and easy but is proving not to be the case!

 

I'm trying to sort a data set of hierarchical numbers but am having issues with the 8.1 & 8.10 values (etc.), whereby the sort is treating them both as 8.1 - 

 

Copying the data from another piece of software into a table it looks like this -

Chris_Mercer_0-1721991556971.png

 

After sorting (by smallest to largest) it treats both the same and returns -

Chris_Mercer_0-1721992363572.png

 

I looked on line and found a post suggesting using a VBA function which treats the string as text to sort - 

Chris_Mercer_1-1721992522314.png

 

Unfortunately this still returned the same value for the 8.1 & 8.10 items -

Chris_Mercer_2-1721992586464.png

 

Any help would be greatly appreciated.

 

Thanks

 

 

10 Replies

@Chris_Mercer 

Let's say your hierarchical numbers are in B2 and down.

Change the formula in C2 to

 

=LET(n, C2, p, FIND(".", n), IFERROR(1000*LEFT(n, p-1)+MID(n, p+1, 10), 1000*n))

 

Fill down.

@HansVogelaar 

 

Hi - thanks for replying, I appreciate it. I've been on vacation so just picking it up now.

 

I changed the data format to text in column B and used the formula in Column C as so and it worked-

Chris_Mercer_0-1722488857338.png

 

Many thanks for your help! 

@Chris_Mercer 

For a 365 user, LET variables provide a good alternative to helper ranges

= LET(
    level1, TEXTBEFORE(sectionNumbers, "."),
    level2, VALUE(TEXTAFTER(sectionNumbers, ".")),
    SORTBY(contents, level1,1,level2,1)
  )

  

@PeterBartholomew1 

It doesn't sort correctly "8" (numbers without dots).

= LET(
    level1, --IFNA(TEXTBEFORE(sectionNumbers, "."), sectionNumbers),
    level2, --IFNA(TEXTAFTER(sectionNumbers, "."),0),
    SORTBY(contents, level1,,level2,)
  )

Or, with 365 Beta

=SORTBY(
  contents,
  --REGEXEXTRACT(sectionNumbers, "^\d*", 1),,
  --IFNA(REGEXEXTRACT(sectionNumbers, "(?<=\.)\d*", 1),0),
)

@SergeiBaklan 

Well spotted as far as the single number is concerned.

The regex is nice though I lack fluency when it comes to group constructs such as the positive lookbehind

(?<=\.)\d*

I guess bridging the gap between natural language and the cryptic regex notation is something to which AI is well suited.

 

@PeterBartholomew1 

I don't think AI could help if only

- the case is quite simple, and/or

- you have good enough knowledge of RegEx

 

First, AI answers are not reliable. It could easy recommend Word or Google Sheet settings to apply in Excel, etc.

Second, RegEx behaviour depends on flavour. What works in Perl or JavaScript or something else could not work in Excel.

funny enough, just a few days ago I just wrote a Lambda function for this.  It is more generic and flexible.  It does NOT currently account for alternative hierachy indicators (only uses ".") but does account for multiple levels (e.g. 12.5.6) and lets you set the "pad" character (e.g. 12.005.006 or 12.__5.__6 or 12.  5.  6) and the output uses the pad character so the hierarchy info (where the "." is) is maintained:

 

 

 

/**convert typical outline numbering (1.1, 1.2, … 1.9, 1.10,…,1.99, 1.100) to proper sortable numbering (1.001, 1.002,… 1.009,1.01, … 1.099, 1.1)
*/
OutlineNumbers = LAMBDA(in, [padCharacter], LET(
    pad, if(ISOMITTED(padCharacter),"_",padCharacter),
    levels, MAX(LEN(in)-LEN(SUBSTITUTE(in,".",""))),
    prePad0, LAMBDA(list, LET(l,MAX(LEN(list)), RIGHT(REPT(pad,l)&list,l))),
    REDUCE(IFERROR(TEXTBEFORE(in,".",1),in),SEQUENCE(levels),LAMBDA(p,q,
        p&"."&prePad0(IFERROR(TEXTAFTER(IFERROR(TEXTBEFORE(in,".",q+1),in),".",q),pad))))
    )
);

 

 

 

 

OK now I have question for all of you.  My original default was to use "-" but in a sample test case this is how Excel sorts these samples with "-" mixed in:

m_tarler_5-1722530332209.png

 

everything makes sense until the last 2 where the 1.--9 and 1.-ab is AFTER 1.100 while I would expect sorting to be as shown in the 2nd,3rd, and 4th columns where "_", "0", and " " were used.

 

EDIT ...  NVM I found my answer.  Excel apparently completely ignores "-" during a sort operation and therefor sorted it as if it isn't there.  Notice in the next image how the SORT for both with and without "-" come out the same:

m_tarler_0-1722531331546.png

 

 

 

 

 

@m_tarler 

That's not only during the sorting, dash is ignored when two values are compared, i.e. ="10" < "-9" or 

="ab" < "-z" both return TRUE. 

However, ="99" = "-99" returns FALSE but ="99" = CHAR(173) & "99" returns TRUE.

In general there are 2808 unichars which work the same way as soften hyphen, we may list them as

=LET(
   codesRaw, UNICHAR( SEQUENCE(999999,,1 ) ),
   codes, FILTER( codesRaw, NOT( ISNA(codesRaw) ) ),
   test, codes & "abc" = "abc",
   chars, FILTER( codes, test ),
   result, HSTACK( chars, UNICODE(chars) ),
   result
)

@SergeiBaklan  The story is actually more complicated.  You found 2808 unichars that appear to be completely ignored by comparators but the character "-" is only PARTIALLY ignored.  Notice in the following sequence the values with _-173 use CHAR(173) while the _-4 values use a mix of "-" and CHAR(45) and additional numbers and "-" and variations.  Notice how all the _-4x are sorted based on the number first and then all of the "-" give slight more value and bump those down (pay particular attention to 4, -4, 4-, -4- variations) but all the _-173 variations are in the same order as were found in the original list:

m_tarler_2-1722550638173.png

 

 

 

 

@m_tarler 

Yes, things are more complex and the logic behind is not clear, at least for me.

"10" is greater than "$10" as it shall be, at the same time against text comparing logic

 

"10" is greater than "€10"

"abc" is greater than "£abc"

UNICHAR(79) is greater than UNICHAR(8679)

etc...