SOLVED

PS array performance >40K entries


Hi

I'm trying to find the fastest way to search a large array (40K entries) for a value. However, I'm struggling with the way in which the array is working.

 

I read in all AD users as shown below

 

$AllADUsers = Get-ADUser -Filter "*" -Properties SAMAccountName, DisplayName, UserPrincipalName, Company, Office, Department, Manager, Description, Created, LastLogonDate, EmployeeType, Info -Server $ADServer -Credential $c |
    Select-Object SAMAccountName, DisplayName, UserPrincipalName, Company, Office, Department, Manager, Description, Created, LastLogonDate, EmployeeType, Info

 

Here is where my question comes in.

 

If I search the array using the .Contains method it finds the entry in around 2ms

 

$AllADUsers.Contains($SearchID)

 

If I then try to pull the data for the record using the "where" method it takes 1000ms

 

$AllADUsers.Where({$_.UserPrincipalName -eq "$SearchID"})

 

So why can it find the value in 2ms using the "contains" method but take more than 1000ms to read the actual record using the "where" method? It's as if the "where" method is using a different search algorithm.

 

In fact it was faster to use Get-AdUser for each search as this only takes 900ms

 

I've also tried other variations to no avail, such as:

 

$AllADUsers | where-object {$_.UserPrincipalName -eq $SearchID}

 

Any help gratefully received.

 

Thanks

blairkei

9 Replies

@blairkei 

I would create a hash table where UserPrincipalName is the key and the user properties are the value. 

$AllADUsers = Get-ADUser -Filter "*" -properties SAMAccountName, DisplayName, UserPrincipalName, Company, Office,Department, Manager, Description, Created, LastLogonDate, EmployeeType,Info -Server $ADServer -Credential $c

 

$AllADUsersHash = [ordered]@{}

 

$AllADUsers | ForEach-Object { $AllADUsersHash.add($_.UserPrincipalName,$_)}

 

$AllADUsersHash["Joe@microsoft.com"]

 

 

# or
$AllADUsersHash.Item("Joe@microsoft.com")
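
A minimal sketch of the same idea on synthetic data, in case it helps to try it without a domain controller. The sample objects, the example.test addresses and the variable names are made up for illustration. It uses a plain @{} hashtable rather than [ordered]@{}, and it skips entries without a UserPrincipalName on the assumption that some accounts may have none (a hashtable key cannot be null).

```powershell
# Synthetic stand-in for the Get-ADUser output (hypothetical data, not real AD objects).
$SampleUsers = foreach ($i in 1..40000) {
    [pscustomobject]@{
        SAMAccountName    = "user$i"
        UserPrincipalName = "user${i}@example.test"
        Department        = "Dept$($i % 10)"
    }
}

# Build the lookup table once: key = UserPrincipalName, value = the whole user object.
$UsersByUpn = @{}
foreach ($User in $SampleUsers) {
    # Skip accounts with no UPN; a null key would make the assignment fail.
    if ($User.UserPrincipalName) {
        $UsersByUpn[$User.UserPrincipalName] = $User
    }
}

# Each lookup is now a key access instead of a scan of 40,000 objects.
$SearchID = 'user39999@example.test'
$Found = $UsersByUpn[$SearchID]
$Found.SAMAccountName
```

The cost of building the table is paid once, so this only helps when the same collection is searched many times.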

best response confirmed by blairkei (Copper Contributor)
Solution

@Joe_Cauffiel Thanks for the alternative solution. My next step was to switch over to using a HASH array but it still does not explain why .contains is taking 2ms and .where is taking 1000ms when they are both using the same search criteria and scanning through the array. I'm wondering if I am missing an option on the ".where" somehow.

 

@blairkei After testing with the hash table, a search takes 2000ms. So this method is even slower than reading from the array or reading the record directly from AD.

@blairkei 

The Contains method only returns true or false. To use it in your case, I think you need to call it on the UserPrincipalName property:

$AllADUsers.UserPrincipalName.Contains($SearchID)

I guess this is related to what each element of the array holds. In your case each element is an object with multiple properties, not a simple Key = Value pair. The Where{} statement goes into the properties of each item, and this is why it returns the result.

$AllADUsers.Contains("*user1@necad.ae*") is not equivalent to $AllADUsers.Where({$_.UserPrincipalName -eq "$SearchID"})
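
To make the difference concrete, here is a small sketch on made-up data (the objects and addresses are hypothetical). It appears that calling .Contains on the array itself compares the search string with each object as a whole, so it is quick but never looks at UserPrincipalName, whereas calling it on the UserPrincipalName values, or using Where, compares the property.

```powershell
# Hypothetical sample data standing in for $AllADUsers.
$SampleUsers = foreach ($i in 1..1000) {
    [pscustomobject]@{
        SAMAccountName    = "user$i"
        UserPrincipalName = "user${i}@example.test"
    }
}
$SearchID = 'user500@example.test'

# Compares the string against each object as a whole, not against a property.
$OnArray = $SampleUsers.Contains($SearchID)

# Member enumeration yields the UPN strings, so this compares string to string.
$OnProperty = $SampleUsers.UserPrincipalName.Contains($SearchID)

# Runs the script block for every element and returns the matching record(s).
$Record = $SampleUsers.Where({ $_.UserPrincipalName -eq $SearchID })

"Contains on array:    $OnArray"
"Contains on property: $OnProperty"
"Where match count:    $($Record.Count)"
```

Running this against a small sample is a quick way to check whether the fast .Contains call is actually answering the same question as the .Where call.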

 

 

@farismalaeb Hi

 

The Contains statement only takes 2ms

The where statement takes 1000ms

 

The search string is the same in both cases. So changing the contains statement won't address my issue with the where method.

@Joe_Cauffiel Hi

 

I had a small mistake in my code and the HASH solution works great. Thanks for your help

@blairkei

 

Joe's hash idea looks very useful. I've not yet wrapped my head around hash tables, to be honest. As to why the Contains method is so much quicker than the Where method: the latter has to read each object in the array and check its UserPrincipalName attribute until it finds an object where the UPN matches your search string, whilst the former does a straight read until it finds that string and then retrieves that object. Probably not the most technically accurate explanation, but that's my understanding of it.

@PeterJ_Inobits 

I suspect the 'Contains' method returns a faster response because it only needs to iterate through the collection until the first match, while the 'Where' method needs to iterate through the whole collection before it completes. Have you measured the completion times of the 'Contains' method when the search item is at the beginning, middle and end of the collection? Is the completion time of the 'Contains' method on the last item of the collection closer to the time of the 'Where' method?
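
One way to check this is sketched below on synthetic data (the sample objects and names are hypothetical). It times the default .Where call against the same call with the 'First' selection mode, which stops at the first match, for a target at the start and at the end of the array. No timings are quoted here since they depend on the machine.

```powershell
# Hypothetical sample data standing in for $AllADUsers.
$SampleUsers = foreach ($i in 1..40000) {
    [pscustomobject]@{ UserPrincipalName = "user${i}@example.test" }
}

foreach ($SearchID in 'user1@example.test', 'user40000@example.test') {
    # Default mode: evaluates every element and returns all matches.
    $All = Measure-Command {
        $SampleUsers.Where({ $_.UserPrincipalName -eq $SearchID })
    }
    # 'First' mode: stops as soon as one element matches.
    $First = Measure-Command {
        $SampleUsers.Where({ $_.UserPrincipalName -eq $SearchID }, 'First')
    }
    '{0}: default {1:N1} ms, First {2:N1} ms' -f $SearchID,
        $All.TotalMilliseconds, $First.TotalMilliseconds
}
```

If the 'First' timing for the first element is much lower than for the last one, that would support the idea that the full scan is what costs the time.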


This link has a good primer on Hash Tables.

https://docs.microsoft.com/en-us/powershell/scripting/learn/deep-dives/everything-about-hashtable?vi...

@Joe_Cauffiel That is exactly what I was trying to say; you just articulated it way better than I did.

 

Thanks for the hash table primer. I will take a look.

 

 
