Pipelines
Beginners performance tip: Use pipelines.
Hi folks,

In my role, I see a lot of poorly written PowerShell code spanning a lot of different contexts. Without fail, the most common failing I see is that the script simply won't scale, meaning performance decreases significantly when it's run against larger data sets. And within this context, one of the biggest causes is the overuse of variables and the underutilisation of the PowerShell pipeline.

If you're the investigative type, here's some official documentation and training on the pipeline:

about_Pipelines - PowerShell | Microsoft Learn
Understand the Windows PowerShell pipeline - Training | Microsoft Learn

A short explanation is that piping occurs when the output from one command is automatically sent to and used by another command.

As an example, let's say I want my first command to fetch all the files in my temporary directory (just the root in this case). I might run a command like the following:

Get-ChildItem -Path $env:TEMP -File

Which, as you'd expect, produces a list of files. Where PowerShell differs from the old command prompt (or DOS prompt, if you're old like me) is that it's not simply a bunch of text written to the screen. Instead, each of these files is an object. If you don't know what an object is, think of it as a school lunchbox for the time being, where that lunchbox contains a bunch of useful stuff (just data; nothing tasty, sadly) inside.

Because this isn't just a bunch of useless text, we can take the individual lunchboxes (objects) produced by this first command and send them to another command. As the second command sees each lunchbox, it can choose to do something with it - or even just ignore it; the possibilities are endless!

When the lunchboxes coming out of the first command travel to the second command, the pathway they travel along is the pipeline! It's what joins the two commands together.

Continuing the example above, I now want to remove those lunchboxes - I mean files - from my temporary directory, which I'll do by piping the lunchboxes from the first command into a second command that will perform the deleting.

Get-ChildItem -Path $env:TEMP -File | Remove-Item -Confirm:$false -ErrorAction:SilentlyContinue

Now, there's no output to show for this command, but it does pretty much what you'd expect: deletes the files.

Now, there's another way we could have achieved the same thing using variables and loops, which I'll demonstrate first before circling back to how this relates to performance and scalability.

# Get all the files first and assign them to a variable.
$MyTempFiles = Get-ChildItem -Path $env:TEMP -File;

# Now that we have a single variable holding all the files, send all those files (objects) to the second command to delete them.
$MyTempFiles | Remove-Item -Confirm:$false -ErrorAction:SilentlyContinue;

This isn't the only way you can go about it, and you can see I'm still using piping in the second command - but none of this is important. The important point - and this brings us back to the topic of performance - is that I've assigned all of the files to a variable instead of simply passing them over the pipeline. This means that all of these files consume memory and continue to do so until I get rid of the variable (named $MyTempFiles).

Now, imagine that instead of dealing with a few hundred files in my temp directory, I'm dealing with 400,000 user objects from an Azure Active Directory tenant, and I'm retrieving all attributes. The difference in memory usage is incomparable.
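If you'd like to see the difference for yourself on a small scale, here's a minimal sketch of my own (nothing here deletes anything, and the Show-ProcessMemory helper is just something I made up for illustration). It reads the current PowerShell process's working set before and after each approach; on a busy temp directory the variable approach will hold noticeably more memory.

# Helper to show the current PowerShell process's working set in MB.
function Show-ProcessMemory {
    '{0:N0} MB' -f ((Get-Process -Id $PID).WorkingSet64 / 1MB)
}

Show-ProcessMemory

# Pipeline approach: each file object flows straight into Measure-Object and
# can be released once it has been processed.
Get-ChildItem -Path $env:TEMP -File | Measure-Object -Property Length -Sum

Show-ProcessMemory

# Variable approach: every file object stays rooted in $MyTempFiles until the
# variable is removed or goes out of scope.
$MyTempFiles = Get-ChildItem -Path $env:TEMP -File
$MyTempFiles | Measure-Object -Property Length -Sum

Show-ProcessMemory

# Clean up the variable once you're done with it.
Remove-Variable -Name MyTempFiles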
When Windows starts feeling memory pressure, disk caching performance suffers, and before you know it, the Event Log system starts throwing performance events everywhere. It's not a good outcome. So, the more objects you hold in memory at once, the more your performance decreases. Pipeline to the rescue!

This conversation is deliberately simple and doesn't go into the internal mechanics of how many of the commands you might like to use actually work, but the only relevant part I want to focus on is something called paging.

Let's say you use Get-MgBetaUser to pull down those 400,000 users from Azure Active Directory.

Get-MgBetaUser (Microsoft.Graph.Beta.Users) | Microsoft Learn

Internally, the command won't be pulling them down all at once. Instead, it will pull down a bite-sized chunk (i.e. a page, as evidenced by the ability to specify a value for the PageSize parameter that features in the above documentation) and push that out onto the pipeline. And if you are piping from Get-MgBetaUser to a second command, then that second command can read that set of users from the pipeline to do what it needs to do. And so on through any other commands until eventually there are no more commands left. At this stage - and this is where the memory efficiency comes in - that batch of users can be released from memory as it's no longer needed by anything. (There's a sketch of both approaches at the end of this post.)

In pictures, and using a page size of 1,000, this looks like:

[Diagram: pages of 1,000 users flowing from one command to the next and being released once each page has been processed.]

Now, as anyone familiar with .NET can attest, memory isn't actually released immediately by default. The .NET engine manages memory resourcing and monitoring internally, but the key takeaway is that by using the pipeline, we're allowing the early release of memory to occur. Conversely, when we store everything in a variable, we're preventing the .NET memory manager from releasing memory early. This, in turn, leads to the above-mentioned performance issues.

In pictures, this looks like:

[Diagram: all 400,000 users held in a variable for the life of the script, so none of that memory can be released early.]

Is there real benefit?

Yes, absolutely. And it can be quite significant, too. In one case, I triaged a script that was causing system failure (Windows PowerShell has/had a default process limit of 2 GB) through storing Azure results in a variable. After some minor tweaks so that it used the pipeline, process memory fluctuated between 250 MB and 400 MB.

Working with pipelines typically doesn't require any extra effort - and in some cases can actually condense your code. However, the performance and scalability benefits can potentially be quite significant - particularly on already-busy systems.
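As promised, here's a minimal sketch of the two Graph approaches. It assumes the Microsoft.Graph.Beta.Users module is installed and that you've already run Connect-MgGraph with suitable scopes; the filtering and export steps are just illustrative placeholders for whatever your script actually does.

# Streaming (pipeline) version: users arrive a page at a time and flow straight
# into the next command, so only around a page's worth needs to be held at once.
Get-MgBetaUser -All -PageSize 1000 -Property Id, DisplayName, AccountEnabled |
    Where-Object { -not $_.AccountEnabled } |
    Select-Object -Property Id, DisplayName |
    Export-Csv -Path .\DisabledUsers.csv -NoTypeInformation

# Variable version: the same query, but every user object is kept alive in
# $AllUsers until the variable is removed - this is the pattern that pushes
# process memory up on large tenants.
$AllUsers = Get-MgBetaUser -All -PageSize 1000 -Property Id, DisplayName, AccountEnabled
$AllUsers |
    Where-Object { -not $_.AccountEnabled } |
    Select-Object -Property Id, DisplayName |
    Export-Csv -Path .\DisabledUsers.csv -NoTypeInformation

Both produce the same CSV; the only difference is how long each user object has to stay in memory.

Cheers,
Lain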