SOLVED

Amazing discovery about GUIDs and how to use them to evenly distribute automation over time

I recently re-discovered something, only this time I was equipped to take advantage of it. I'm posting this here in the Exchange forum since this is my area of most interest, especially when it comes to using this discovery.

 

GUIDs (ObjectGuid, ObjectId, ExchangeGuid, ArchiveGuid, ExternalDirectoryObjectId, etc.) are system-generated and are made up of hexadecimal characters. The amazing discovery is how astonishingly well balanced these things are alphabetically. That is, if you take a random sampling of, say, 16,000 GUIDs, you will most likely find that roughly 1,000 of them start with '0', roughly 1,000 start with '1', roughly 1,000 start with '2', and so on through '9', then roughly 1,000 start with 'a', roughly 1,000 start with 'b', and so on up to roughly 1,000 starting with 'f'.

 

It keeps getting better though! Every next character is again evenly distributed. So for example, of all the GUIDs starting with '0', roughly 1/16th of them will have '0' for the 2nd character, 1/16th will have '1' for the 2nd character, and so on up to 1/16th having 'f' for their 2nd character.

 

This continues but admittedly becomes less perfect the deeper we go. Also, this amazing phenomenon works best when all the GUIDs are coming from the same place, such as the list of Azure AD accounts in your tenant, or the list of ExchangeGuids for your EXO mailboxes.
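If you want to see the first-character balance for yourself without touching a tenant, here is a minimal sketch. It assumes that the random GUIDs produced by [guid]::NewGuid() behave like the directory-generated ones described above, which I have not confirmed. The variable names are just for illustration.

```powershell
# Generate a sample of random GUIDs and keep only the first hex character of each.
$sampleSize = 16000   # arbitrary sample size, matching the example above
$firstChars = foreach ($i in 1..$sampleSize) {
    [guid]::NewGuid().ToString().Substring(0, 1)
}

# Count how many GUIDs start with each character ('0'-'9', 'a'-'f').
$distribution = $firstChars | Group-Object | Sort-Object Name
$distribution | Select-Object Name, Count
```

Each of the 16 counts should land near 1,000, though not exactly on it, since this is a random sample.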

 

So where does this amazingness come into play, and why this post? It is for large organizations that need to perform reporting and/or automation against their high volume of users or mailboxes (or other entities that are plentiful and have GUIDs). Knowing how evenly the GUIDs are distributed alphabetically, we can schedule work around the clock and always be dealing with a consistent level of load.

 

Think back to EXO PowerShell pre-V2, when everyone's scripts would fail if their organization had more than a few hundred mailboxes. In this example, we can spread the 16 hexadecimal characters over the days of the week and each day deal with only 1/16th, or 2/16ths, etc. of the mailboxes. For example:

 

function getTodaysMailboxes {
    param (
        [Object[]]$Mailboxes
    )
    try {
        # Each hex character is listed under two different days,
        # so every mailbox gets processed twice per week.
        $_guidDay = @{
            'Monday'    = '0', '1', '2', '3', '4'
            'Tuesday'   = '5', '6', '7', '8', '9'
            'Wednesday' = 'a', 'b', 'c', 'd', 'e'
            'Thursday'  = 'f', '0', '1', '2', '3'
            'Friday'    = '4', '5', '6', '7'
            'Saturday'  = '8', '9', 'a', 'b'
            'Sunday'    = 'c', 'd', 'e', 'f'
        }
        # Look up today's characters by day name.
        $_todaysGuids = $_guidDay["$([datetime]::Today.DayOfWeek)"]
        # Keep only the mailboxes whose Guid starts with one of today's characters.
        $Script:TodaysMailboxes = @($Mailboxes | Where-Object { $_todaysGuids -contains $_.Guid.ToString().Substring(0, 1) })
        # writeLog and $writeLogParams are not defined in this snippet; they are assumed to exist in the calling script.
        writeLog @writeLogParams -Message "Identified $($Script:TodaysMailboxes.Count) mailboxes to process today ($([datetime]::Today.DayOfWeek)'s = (EXO) Guid's starting with: $($_todaysGuids -join ', '))."
    }
    catch { throw }
}

 

And that example is just for daily script execution. You could instead pick a more frequent schedule, say hourly, with the goal of covering every mailbox in 1 week. There are 168 hours in the week, and there are 256 combinations of the first 2 characters in the GUIDs. So, if you spread those 256 combos over those 168 hours, all 168 hours would process 1 combo's worth of GUIDs, and then 88 of those hours would process 1 extra combo, covering all 256 combos (168 + 88 = 256). It would end up being a tiny amount of work every hour, which should lead to fewer long-running scripts, less data being pulled in short amounts of time, and fewer failures all around.
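The hourly idea could be sketched roughly like this. It is only one possible way to assign the 256 two-character prefixes to the 168 hours (prefix value modulo 168), and the function names Get-HourOfWeek and Get-PrefixesForHour are hypothetical helpers I made up for illustration, not part of any module.

```powershell
# Build all 256 two-character hex prefixes: '00', '01', ... 'ff'.
$prefixes = foreach ($n in 0..255) { '{0:x2}' -f $n }

# Hour of the week: 0 = Sunday 00:00 through 167 = Saturday 23:00.
function Get-HourOfWeek {
    param ([datetime]$Date = (Get-Date))
    (([int]$Date.DayOfWeek) * 24) + $Date.Hour
}

# Prefix number i is assigned to hour (i mod 168), so hours 0-87 get
# two prefixes each and hours 88-167 get one prefix each.
function Get-PrefixesForHour {
    param ([int]$HourOfWeek)
    $prefixes | Where-Object { ([Convert]::ToInt32($_, 16) % 168) -eq $HourOfWeek }
}

# Prefixes to process in the current hour.
$currentPrefixes = @(Get-PrefixesForHour -HourOfWeek (Get-HourOfWeek))
```

With this assignment the doubled-up hours all fall early in the week (Sunday through mid-Wednesday). If you'd rather spread the 88 extra combos more evenly, you would need a different mapping.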

 

For now, I'm personally using the code sample above to run scripts nightly which cover 4-5/16ths of all my org's mailboxes. Every week, all of my mailboxes have been processed twice. This could be for pulling mailbox statistics, mailbox permissions, AAD sign-ins, all kinds of stuff.

 

To summarize, this is one cool and easy way to evenly distribute your scripting work over a set amount of time. I don't think you'll find another property on objects that is as well distributed as GUIDs. One more reason to love the GUID.

1 Reply
best response confirmed by Jeremy Bradshaw (Steel Contributor)
Solution

Just wanted to share an extra finding that really makes the above solution shine - it is that in Exchange Online, ExternalDirectoryObjectId is stored as a string! This means we can easily do this:

Get-Mailbox -Filter "ExternalDirectoryObjectId -like 'a*'"

and this will give us back pretty much exactly 1/16th of all of our EXO mailboxes. The same cannot be accomplished with any of the other GUID-based properties, which are stored as GUIDs. For example, this doesn't work properly:

Get-Mailbox -Filter "Guid -like 'a*'"

If anyone happens to know how to filter with -like against actual GUID properties, I'd love to see it. But for now, thanks to how they've decided to store ExternalDirectoryObjectId, we have a perfect solution for getting back a predictable set of mailboxes that represents a very predictably sized subset of all mailboxes. So we can easily spread work across multiple days and know that we'll cover all mailboxes in whatever number of days we choose to spread the hexadecimal characters across.
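To combine this with a per-day list of characters, one option is to build the filter string from the characters and let the server do the filtering. This is a minimal sketch: New-PrefixFilter is a hypothetical helper name, and the commented-out Get-Mailbox call assumes you already have an Exchange Online PowerShell session connected.

```powershell
# Build a filter such as:
#   ExternalDirectoryObjectId -like 'a*' -or ExternalDirectoryObjectId -like 'b*'
function New-PrefixFilter {
    param (
        [string[]]$Prefixes,
        [string]$Property = 'ExternalDirectoryObjectId'
    )
    # One -like clause per prefix, joined with -or.
    ($Prefixes | ForEach-Object { "$Property -like '$_*'" }) -join ' -or '
}

$filter = New-PrefixFilter -Prefixes 'a', 'b', 'c'

# Requires a connected Exchange Online session:
# Get-Mailbox -ResultSize Unlimited -Filter $filter
```

This way only the mailboxes for the chosen characters come back, instead of pulling every mailbox and filtering client-side with Where-Object.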
