
Amazing discovery about GUIDs and how to use them to evenly distribute automation over time


I recently rediscovered something, only this time I was equipped to take advantage of it.  I'm posting it here in the Exchange forum since this is the area I'm most interested in, especially when it comes to putting this discovery to use.

 

GUIDs (ObjectGuid, ObjectId, ExchangeGuid, ArchiveGuid, ExternalDirectoryObjectId, etc.) are system-generated and are made up of hexadecimal characters.  The amazing discovery is how evenly these things are balanced alphabetically.  That is, if you take a random sampling of, say, 16,000 GUIDs, you will most likely find that roughly 1,000 of them start with '0', roughly 1,000 start with '1', roughly 1,000 start with '2', and so on through '9', then roughly 1,000 also start with 'a', roughly 1,000 start with 'b', and so on up to 'f'.

 

It keeps getting better though!  Every next character is again evenly distributed.  So for example, of all the GUIDs starting with '0', roughly 1/16th will have '0' as the 2nd character, roughly 1/16th will have '1' as the 2nd character, and so on up to roughly 1/16th having 'f' as the 2nd character.

 

This continues, but admittedly becomes less perfect the deeper we go, since each additional character splits the set into ever smaller groups.  Also, this phenomenon works best when all the GUIDs come from the same place, such as the list of Azure AD accounts in your tenant, or the list of ExchangeGuids for your EXO mailboxes.
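If you want to see the distribution for yourself before pointing anything at your tenant, here is a minimal sketch.  It assumes locally generated GUIDs from [guid]::NewGuid() behave like the directory's GUIDs, which is my assumption and not something I've confirmed for every GUID property.  The variable names are just for illustration.

```powershell
# Locally generated GUIDs stand in for directory GUIDs here (an assumption).
$sample = 1..16000 | ForEach-Object { [guid]::NewGuid().ToString() }

# Count how many GUIDs start with each hexadecimal character.
$firstChar = $sample | Group-Object { $_.Substring(0, 1) } | Sort-Object Name
$firstChar | Select-Object Name, Count

# Repeat for the 2nd character, looking only at GUIDs that start with '0'.
$secondChar = $sample | Where-Object { $_ -like '0*' } |
    Group-Object { $_.Substring(1, 1) } | Sort-Object Name
$secondChar | Select-Object Name, Count
```

The first table should show 16 buckets of roughly 1,000 each; the second shows 16 much smaller buckets, which is where the "less perfect the deeper we go" effect becomes visible.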

 

So where does this come into play, and why this post?  It is for large organizations that need to perform reporting and/or automation against a high volume of users or mailboxes (or other entities that are plentiful and have GUIDs).  Knowing how evenly the GUIDs are distributed alphabetically, we can schedule work around the clock and always be dealing with a consistent level of load.

 

Think back to EXO PowerShell pre-V2, when everyone's scripts would fail if their organization had more than a few hundred mailboxes.  In this example, we can spread the 16 hexadecimal characters over the days of the week and each day deal with only 1/16th, or 2/16ths, etc.  For example:

 

function getTodaysMailboxes {
    param (
        [Object[]]$Mailboxes
    )
    try {
        # Map each weekday to the leading hexadecimal characters it handles.
        # Every character appears on two different days, so each mailbox is processed twice per week.
        $_guidDay = @{
            'Monday'    = '0', '1', '2', '3', '4'
            'Tuesday'   = '5', '6', '7', '8', '9'
            'Wednesday' = 'a', 'b', 'c', 'd', 'e'
            'Thursday'  = 'f', '0', '1', '2', '3'
            'Friday'    = '4', '5', '6', '7'
            'Saturday'  = '8', '9', 'a', 'b'
            'Sunday'    = 'c', 'd', 'e', 'f'
        }
        $_todaysGuids = $_guidDay["$([datetime]::Today.DayOfWeek)"]
        # Keep only the mailboxes whose Guid starts with one of today's characters.
        $Script:TodaysMailboxes = @($Mailboxes | Where-Object { $_todaysGuids -contains $_.Guid.ToString().SubString(0,1) })
        # writeLog and $writeLogParams are not defined in this snippet; they come from the rest of the script.
        writeLog @writeLogParams -Message "Identified $($Script:TodaysMailboxes.Count) mailboxes to process today ($([datetime]::Today.DayOfWeek)'s = (EXO) Guid's starting with: $($_todaysGuids -join ', '))."
    }
    catch { throw }
}

 

And that example is just for daily script execution.  You could instead pick a more frequent schedule, say hourly, with the goal of covering every mailbox in 1 week.  There are 168 hours in the week, and there are 256 combinations of the first 2 characters in the GUIDs.  So, if you spread those 256 combos over those 168 hours, all 168 hours would process 1 combo's worth of GUIDs, and then 88 of those hours would process 1 extra combo, covering all 256 combos.  It would end up being a tiny amount of work every hour, which should lead to fewer long-running scripts, less data being pulled in short amounts of time, and fewer failures all around.
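The hourly idea can be sketched like this.  It is only one possible way to assign the combos, and Get-HourlyGuidPrefix is a hypothetical helper name I made up for illustration: every hour of the week takes one of the first 168 two-character prefixes, and the first 88 hours each take one of the remaining 88.

```powershell
function Get-HourlyGuidPrefix {
    param (
        [datetime]$Date = (Get-Date)
    )
    # Hour of the week, 0..167 (Sunday 00:00 is hour 0).
    $hourOfWeek = ([int]$Date.DayOfWeek * 24) + $Date.Hour

    # Every hour gets one of the first 168 prefixes ('00' through 'a7').
    $indexes = @($hourOfWeek)

    # The remaining 88 prefixes ('a8' through 'ff') go to the first 88 hours.
    if ($hourOfWeek -lt 88) { $indexes += 168 + $hourOfWeek }

    # Format each index as a 2-character lowercase hex string.
    $indexes | ForEach-Object { '{0:x2}' -f $_ }
}

# Example: the prefixes to process in the current hour.
$prefixes = @(Get-HourlyGuidPrefix)
```

From there, the mailboxes for the hour would be the ones whose GUID string starts with one of $prefixes, using the same kind of Where-Object test as in the daily function above.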

 

For now, I am personally using the code sample above to run scripts nightly, each run covering 4-5/16ths of all my org's mailboxes.  Every week, all of my mailboxes have been processed twice.  This could be for pulling mailbox statistics, mailbox permissions, AAD sign-ins, all kinds of stuff.
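If you change the day-to-character map, it's worth checking that every hexadecimal character is still covered the number of times you expect.  Here is a minimal sketch of that check using the same map as the function above; the variable names are just for illustration.

```powershell
# Same weekday map as in getTodaysMailboxes.
$guidDay = @{
    'Monday'    = '0', '1', '2', '3', '4'
    'Tuesday'   = '5', '6', '7', '8', '9'
    'Wednesday' = 'a', 'b', 'c', 'd', 'e'
    'Thursday'  = 'f', '0', '1', '2', '3'
    'Friday'    = '4', '5', '6', '7'
    'Saturday'  = '8', '9', 'a', 'b'
    'Sunday'    = 'c', 'd', 'e', 'f'
}

# Flatten all the days' characters and count how often each one appears.
$coverage = $guidDay.Values | ForEach-Object { $_ } | Group-Object | Sort-Object Name
$coverage | Select-Object Name, Count
```

With this map, each of the 16 characters shows a count of 2, which is why every mailbox gets processed twice per week.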

 

To summarize, this is one cool and easy way to evenly distribute your scripting work over a set amount of time.  I don't think you'll find another property on objects that is as well distributed as GUIDs.  One more reason to love the GUID.

2 Replies
best response confirmed by JeremyTBradshaw (Steel Contributor)
Solution

Just wanted to share an extra finding that really makes the above solution shine - it is that in Exchange Online, ExternalDirectoryObjectId is stored as a string! This means we can easily do this:

Get-Mailbox -Filter "ExternalDirectoryObjectId -like 'a*'"

and this will give us back pretty much exactly 1/16th of all of our EXO mailboxes.  The same cannot be accomplished with any of the other GUID-based properties, which are stored as GUIDs.  For example, this doesn't work properly:

Get-Mailbox -Filter "Guid -like 'a*'"

If anyone happens to know how to filter with -like against actual GUID properties, I'd love to see it.  But for now, thanks to how they've decided to store ExternalDirectoryObjectId, we have a solution for getting back a predictably sized subset of all mailboxes.  So we can easily spread work across multiple days and know that we'll cover all mailboxes in whatever number of days we choose to spread the hexadecimal characters across.
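To cover several leading characters in one call, the filter string can be built from a list of characters.  This is a minimal sketch: New-GuidPrefixFilter is a hypothetical helper name, and joining the clauses with -or is my assumption about how you'd want to combine them, so test it against your own tenant first.

```powershell
function New-GuidPrefixFilter {
    param (
        [string[]]$Prefix,
        [string]$Property = 'ExternalDirectoryObjectId'
    )
    # Build one -like clause per leading character and join them with -or.
    ($Prefix | ForEach-Object { "$Property -like '$($_)*'" }) -join ' -or '
}

# Example: the filter for Friday's characters from the weekday map.
$filter = New-GuidPrefixFilter -Prefix '4', '5', '6', '7'
$filter
```

The resulting string can then be passed to Get-Mailbox -Filter $filter, so the filtering happens in the service and not in Where-Object after everything has already been pulled down.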

@JeremyTBradshaw @The_Exchange_Team would it be possible to stage a new property on mailboxes in EXO that is a string like ExternalDirectoryObjectId, but is just a copy of the Guid (ObjectGuid in underlying AD)?

 

I would like to include inactive mailboxes in my EXO mailbox reporting, which I am scheduling to happen in a lightweight fashion around the clock using this GUID distribution system.  New Gist about it - "The Power of the Guid for Even Distribution of Large Sets".  The problem with inactive mailboxes is that they don't have ExternalDirectoryObjectIds, so this solution doesn't currently work for them.

 

If the Name property were consistently set as a string GUID, like it is on some objects, that would be usable, but it's not reliable since some accounts don't have their Name updated that way.

 

I've thought about using NetID, as it looks unique, always present (I think), and likely well distributed like GUIDs (not sure about this).  However, I saw that it is intended for internal use only, and there are a few issues online about it (example).  I would just love it if, in EXO PowerShell, the "Guid" property (i.e., ObjectGuid in EXO's AD) were a string, or if we could have a "GuidString" property that matches Guid but is a string.
