Nov 10 2020 01:32 AM - edited Nov 10 2020 01:54 AM
Hi
I'm trying to find the fastest way to search a large array (40K entries) for a value. However, I'm struggling with the way in which the array is working.
I read in all AD users as shown below
$AllADUsers = Get-ADUser -Filter "*" -Properties SAMAccountName, DisplayName, UserPrincipalName, Company, Office, Department, Manager, Description, Created, LastLogonDate, EmployeeType, Info -Server $ADServer -Credential $c |
    Select-Object SAMAccountName, DisplayName, UserPrincipalName, Company, Office, Department, Manager, Description, Created, LastLogonDate, EmployeeType, Info
Now this is why I have a query
If I search the array using the .Contains method it finds the entry in around 2ms
$AllADUsers.Contains($SearchID)
If I then try to pull the data for the record using the "where" method it takes 1000ms
$AllADUsers.Where({$_.UserPrincipalName -eq "$SearchID"})
So why can it find the value in 2ms using the "contains" method, but take more than 1000ms to read the actual record using the "where" method? It's as if the "where" method is using a different search algorithm.
In fact it was faster to use Get-AdUser for each search as this only takes 900ms
I've also tried other variations to no avail, such as:
$AllADUsers | where-object {$_.UserPrincipalName -eq $SearchID}
Any help gratefully received.
Thanks
blairkei
Nov 10 2020 03:52 AM
I would create a hash table where UserPrincipalName is the key and the user properties are the value.
$AllADUsers = Get-ADUser -Filter "*" -properties SAMAccountName, DisplayName, UserPrincipalName, Company, Office,Department, Manager, Description, Created, LastLogonDate, EmployeeType,Info -Server $ADServer -Credential $c
$AllADUsersHash = [ordered]@{}
$AllADUsers | ForEach-Object { $AllADUsersHash.add($_.UserPrincipalName,$_)}
$AllADUsersHash["Joe@microsoft.com"]
# or
$AllADUsersHash.Item("Joe@microsoft.com")
Nov 10 2020 04:59 AM
@Joe_Cauffiel Thanks for the alternative solution. My next step was to switch over to a hash table, but that still does not explain why .Contains takes 2ms and .Where takes 1000ms when both use the same search criteria and scan the same array. I'm wondering if I am missing an option on the .Where method.
Nov 10 2020 06:55 AM
@blairkei After testing using the HASH array the performance of a search takes 2000ms. So this method is even slower than either reading from the ARRAY or directly reading the record from AD.
Nov 10 2020 10:06 PM
The Contains method only returns true or false, and to use it in your case I think you would need to call it on the property values:
$AllADUsers.UserPrincipalName.Contains($SearchID)
I guess the difference comes from what each element of the array holds. In your case each element is an object with multiple properties, not a simple
Key = Value
pair. The Where{} statement reads the UserPrincipalName property of each item, which is why it returns the record.
$AllADUsers.Contains("*user1@necad.ae*") is not equivalent to $AllADUsers.Where({$_.UserPrincipalName -eq "$SearchID"})
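A minimal sketch of that difference, using made-up objects instead of real AD data (the $Users array and the example.com addresses are assumptions for illustration only). Calling .Contains on the array of objects compares each whole object with the string, so it comes back False without ever reading UserPrincipalName, while the other two calls actually look at the property:

```powershell
# Hypothetical sample data standing in for the Get-ADUser output
$Users = 1..5 | ForEach-Object {
    [pscustomobject]@{
        SamAccountName    = "user$_"
        UserPrincipalName = 'user{0}@example.com' -f $_
    }
}
$SearchID = 'user3@example.com'

# Compares each object (not its properties) with the string, so nothing matches
$onObjects = $Users.Contains($SearchID)

# Member enumeration first builds an array of UPN strings, then does an exact, case-sensitive match
$onUpns = $Users.UserPrincipalName.Contains($SearchID)

# Runs the script block once per element and returns the matching record(s)
$record = $Users.Where({ $_.UserPrincipalName -eq $SearchID })

"Contains on objects: $onObjects"
"Contains on UPNs:    $onUpns"
"Where result:        $($record[0].SamAccountName)"
```

So a very fast .Contains on the object array may simply be a very fast "not found", which would be worth checking before comparing the two timings.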
Nov 11 2020 01:04 AM
@farismalaeb Hi
The contain statement only takes 2ms
The where statement takes 1000ms
The search string is the same in both cases. So changing the contains statement won't address my issue with the where method.
Nov 11 2020 08:05 AM
I had a small mistake in my code and the HASH solution works great. Thanks for your help
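For anyone comparing the two approaches later, here is a rough sketch of how the index could be built and timed. It uses generated sample objects rather than AD data; the names $Users and $ByUpn and the example.com addresses are made up for illustration. It assigns through the indexer instead of .Add(), so a duplicate UPN overwrites the earlier entry rather than throwing, and it skips entries with an empty UPN because a $null key is not allowed:

```powershell
# Hypothetical sample data; 40000 matches the array size mentioned in the question
$Users = 1..40000 | ForEach-Object {
    [pscustomobject]@{
        SamAccountName    = "user$_"
        UserPrincipalName = 'user{0}@example.com' -f $_
    }
}

# Build the index once. @{} keys are case-insensitive by default in PowerShell.
$ByUpn = @{}
foreach ($u in $Users) {
    # Skip accounts without a UPN: a $null key would raise an error
    if ($u.UserPrincipalName) { $ByUpn[$u.UserPrincipalName] = $u }
}

$SearchID = 'user39999@example.com'

# Time a full scan against a single keyed lookup
$scanMs = (Measure-Command { $Users.Where({ $_.UserPrincipalName -eq $SearchID }) }).TotalMilliseconds
$hashMs = (Measure-Command { $ByUpn[$SearchID] }).TotalMilliseconds

$hit = $ByUpn[$SearchID]
'scan: {0:N2} ms, hash lookup: {1:N2} ms, found: {2}' -f $scanMs, $hashMs, $hit.SamAccountName
```

The one-off cost of building the table only pays off when many lookups follow, so it is worth timing the build separately from the lookups.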
Nov 16 2020 09:00 AM
Joe's hash idea looks very useful. I've not yet wrapped my head around hash tables, to be honest. As to why the Contains method is so much quicker than the Where method: the latter has to visit each object in the array and read its UserPrincipalName attribute until it finds an object whose UPN matches your search string, whilst the former does a straight read until it finds that string and then retrieves that object. Probably not the most technically accurate explanation, but that's my understanding of it.
Nov 17 2020 03:53 AM
I suspect the ‘contains’ method returns a faster response because it only needs to iterate through the collection until the first match, while the “where” method needs to iterate through the whole collection before it completes. Have you measured the completion times of the ‘contains’ method when the search item is at the beginning, middle and end of the collection? Is the completion time using the 'contains' method on the last item of the collection closer to the time of the 'where' method?
This link has a good primer on Hash Tables.
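On the question of whether .Where has an option for this: the method accepts a second argument for the selection mode, and 'First' asks for only the first match. A small sketch to check how much of the collection each form visits, using generated sample objects ($Users, $counter and the example.com addresses are assumptions for illustration, not from the original script):

```powershell
# Hypothetical sample data with the match near the start of the collection
$Users = 1..1000 | ForEach-Object {
    [pscustomobject]@{ UserPrincipalName = 'user{0}@example.com' -f $_ }
}
$SearchID = 'user10@example.com'

# Count how many times each script block is invoked
$counter = @{ All = 0; First = 0 }

# Default mode: every element is tested
$all = $Users.Where({ $counter.All++; $_.UserPrincipalName -eq $SearchID })

# 'First' mode: returns only the first matching element
$first = $Users.Where({ $counter.First++; $_.UserPrincipalName -eq $SearchID }, 'First')

'default mode tested {0} items, First mode tested {1} items' -f $counter.All, $counter.First
```

Moving $SearchID to the middle or the end of the range and re-running should show how the position of the match affects each form.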
Nov 17 2020 11:09 PM
@Joe_Cauffiel That is exactly what I was trying to say; you just articulated it way better than I did.
Thanks for the hash table primer. I will take a look.