Why is a PowerShell workflow much slower than a non-workflow script at parsing XML files?

I am writing a PowerShell program to analyze the contents of 1900+ large XML configuration files (50,000+ lines, ~1.5 MB each). As a test, I copied 36 of the files to my computer (Windows 10, PowerShell 5.1, 32 GB of RAM) and wrote a quick script to check execution speed.

$TestDir = "E:\Powershell\Test" $TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml foreach ($TestXML in $TestXMLs) { [xml]$XML = Get-Content $TestXML (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid } 

This completes in 36-40 seconds; I ran several tests with Measure-Command.
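For reference, a minimal timing sketch of how the script above can be wrapped in Measure-Command (the path and file set are whatever you are benchmarking):

$TestDir = "E:\Powershell\Test"

$Elapsed = Measure-Command {
    $TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml
    foreach ($TestXML in $TestXMLs) {
        [xml]$XML = Get-Content $TestXML
        (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
    }
}

# Output produced inside the script block is discarded; only the elapsed time is returned
$Elapsed.TotalSeconds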

Then I tried a workflow with foreach -parallel, assuming that loading multiple files in parallel would speed things up.

workflow Test-WF {
    $TestDir = "E:\Powershell\Test"
    $TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

    foreach -parallel -throttle 10 ($TestXML in $TestXMLs) {
        [xml]$XML = Get-Content $TestXML
        (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
    }
}

Test-WF # execute the workflow

The workflow version takes 118 to 132 seconds.

Now I'm just wondering what could be causing the workflow to be so much slower. Could the recompilation to XAML make loading the XML files in WWF (Windows Workflow Foundation) a slower process?

1 answer

foreach -parallel is by far the slowest parallelization option you have with PowerShell, since Workflows are not designed for speed, but for long-running operations that can be safely interrupted and resumed.

Implementing these safety mechanisms introduces overhead, which is why your script runs slower as a workflow.
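To illustrate what workflows are actually built for, here is a minimal sketch of a checkpointed, resumable workflow run as a job (the workflow name, paths, and sleep interval are invented for the example):

workflow Invoke-LongRunningWork {
    param([string[]]$Paths)

    foreach ($Path in $Paths) {
        # Placeholder for long-running work on each item
        Start-Sleep -Seconds 5

        # Persist state so the job can survive interruption or a reboot and resume here
        Checkpoint-Workflow
    }
}

# Run as a job so it can be suspended with Suspend-Job and resumed with Resume-Job
Invoke-LongRunningWork -Paths 'C:\data\a','C:\data\b' -AsJob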

If you want to optimize for execution speed, use runspaces instead:

 $TestDir = "E:\Powershell\Test" $TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml # Set up runspace pool $RunspacePool = [runspacefactory]::CreateRunspacePool(1,10) $RunspacePool.Open() # Assign new jobs/runspaces to a variable $Runspaces = foreach ($TestXML in $TestXMLs) { # Create new PowerShell instance to hold the code to execute, add arguments $PSInstance = [powershell]::Create().AddScript({ param($XMLPath) [xml]$XML = Get-Content $XMLPath (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid }).AddParameter('XMLPath', $TestXML.FullName) # Assing PowerShell instance to RunspacePool $PSInstance.RunspacePool = $RunspacePool # Start executing asynchronously, keep instance + IAsyncResult objects New-Object psobject -Property @{ Instance = $PSInstance IAResult = $PSInstance.BeginInvoke() Argument = $TestXML } } # Wait for the the runspace jobs to complete while($Runspaces |Where-Object{-not $_.IAResult.IsCompleted}) { Start-Sleep -Milliseconds 500 } # Collect the results $Results = $Runspaces |ForEach-Object { $Output = $_.Instance.EndInvoke($_.IAResult) New-Object psobject -Property @{ File = $TestXML ServerID = $Output } } 

Quick XML Processing Bonus Tips:

As wOxxOm suggests, using Xml.Load() is faster than using Get-Content to read in an XML document.
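The difference is roughly the following (the sample path is an assumption); both approaches end up with an XmlDocument, but Load() reads the file directly instead of first materializing it as strings in the pipeline:

# Slower: read the whole file as strings, then cast to [xml]
[xml]$ViaGetContent = Get-Content 'E:\Powershell\Test\sample.xml'

# Faster: let XmlDocument read the file itself
$ViaLoad = New-Object System.Xml.XmlDocument
$ViaLoad.Load('E:\Powershell\Test\sample.xml')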

In addition, using dot notation ($xml.root.servers.server) and the Where({}) extension method will also be painfully slow if there are many servers or server nodes. Use the SelectNodes() method with an XPath expression to search for "Server1" instead (remember that XPath is case-sensitive):

$PSInstance = [powershell]::Create().AddScript({
    param($XMLPath)

    $XML = New-Object Xml
    $XML.Load($XMLPath)

    $Server1Node = $XML.SelectNodes('/root/servers/server[@name = "Server1"]')
    return $Server1Node.serverid
}).AddParameter('XMLPath', $TestXML.FullName)