Need help improving PowerShell text-parsing script performance

Question


I need to parse a large pipe-delimited file and count the records whose 5th column meets, and does not meet, my criteria.

    PS C:\temp> gc .\items.txt -readcount 1000 | `
      ? { $_ -notlike "HEAD" } | `
      % { foreach ($s in $_) { $s.split("|")[4] } } | `
      group -property {$_ -ge 256} -noelement | `
      ft -autosize

This command does what I want, returning the output as follows:

      Count Name
      ----- ----
    1129339 True
    2013703 False

However, for a 500 MB test file, this command takes about 5.5 minutes to execute, as measured by Measure-Command. A typical file is more than 2 GB, where waiting 20+ minutes is undesirable.

Do you see a way to improve the performance of this command?

For example, is there a way to determine the optimal value for Get-Content's -ReadCount parameter? Without it, the same file takes 8.8 minutes to process.

+6

performance powershell

neontapir Jan 17 '12 at 21:26


3 answers

Using @Gisli's tip, here's the script I ended up with:

    param($file = $(Read-Host -prompt "File"))

    # Resolve to a full path: .NET does not follow PowerShell's current location.
    $fullName = (Get-Item "$file").FullName
    $sr = New-Object System.IO.StreamReader("$fullName")
    $trueCount = 0
    $falseCount = 0

    # Read one line at a time so the whole file is never held in memory.
    while (($line = $sr.ReadLine()) -ne $null) {
        if ($line -like 'HEAD|') { continue }
        if ($line.split("|")[4] -ge 256) {
            $trueCount++
        }
        else {
            $falseCount++
        }
    }
    $sr.Dispose()

    write "True count: $trueCount"
    write "False count: $falseCount"

It gives the same results in about a minute, which meets my performance requirements.
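One thing worth double-checking, assuming column 5 is meant to be compared as a number (the question does not say): Split returns strings, and when the left operand of -ge is a string, PowerShell converts the right operand to a string and compares text. A minimal sketch of the difference follows; Test-Threshold and the sample values are hypothetical and not part of the script above.

```powershell
# Hypothetical helper: casts the field to [int] before comparing,
# so the comparison is numeric rather than textual.
# A non-numeric field would make the cast throw.
function Test-Threshold {
    param([string]$Value, [int]$Threshold = 256)
    [int]$Value -ge $Threshold
}

# Hypothetical sample values standing in for column 5.
foreach ($value in '1000', '30', '256') {
    $asText   = $value -ge 256          # 256 is converted to the string '256'
    $asNumber = Test-Threshold $value   # compared as integers
    "{0,-5} text: {1,-6} number: {2}" -f $value, $asText, $asNumber
}
```

For '1000' the text comparison is False while the numeric one is True, and for '30' it is the reverse. If the column really is numeric, the cast may change the counts.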

+4

neontapir Jan 17 '12 at 23:11


Just adding another example of using StreamReader to read through a very large IIS log file and output all the unique client IP addresses, plus some performance metrics.

    $path = 'A_245MB_IIS_Log_File.txt'
    $r = [IO.File]::OpenText($path)
    $clients = @{}
    while ($r.Peek() -ge 0) {
        $line = $r.ReadLine()
        # String processing here...
        if (-not $line.StartsWith('#')) {
            $split = $line.Split()
            $client = $split[-5]
            if (-not $clients.ContainsKey($client)){
                $clients.Add($client, $null)
            }
        }
    }
    $r.Dispose()
    $clients.Keys | Sort

A small performance comparison with Get-Content:

StreamReader: Completed in 5.5 seconds. powershell.exe: 35,328 KB of RAM.

Get-Content: Completed in 23.6 seconds. powershell.exe: 1,110,524 KB of RAM.
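For anyone who wants to reproduce this kind of comparison, here is a rough sketch of one way to take the measurements. It is not necessarily how the numbers above were obtained. The temporary file and its contents are made up so the sketch runs on its own. Because both timings run in the same session here, the working-set figure covers both; for a fair memory comparison, each approach would need its own fresh PowerShell session.

```powershell
# Throwaway sample file (hypothetical contents); in practice $path
# would point at the real log or data file.
$path = [IO.Path]::GetTempFileName()
1..1000 | ForEach-Object { "a|b|c|d|$_" } | Set-Content -Path $path

# Time a line-by-line read with a StreamReader.
$streamTime = Measure-Command {
    $reader = [IO.File]::OpenText($path)
    try {
        while ($null -ne $reader.ReadLine()) { }
    }
    finally {
        $reader.Dispose()
    }
}

# Time the same read through the Get-Content pipeline.
$contentTime = Measure-Command {
    Get-Content -Path $path | ForEach-Object { $null = $_ }
}

# Current working set of this PowerShell process, in KB.
$workingSetKB = [math]::Round((Get-Process -Id $PID).WorkingSet64 / 1KB)

"StreamReader: {0:N1} s" -f $streamTime.TotalSeconds
"Get-Content:  {0:N1} s" -f $contentTime.TotalSeconds
"Working set:  {0:N0} KB" -f $workingSetKB

Remove-Item -Path $path
```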

+2

Andy Arismendi Jan 18 '12 at 0:16


Gisli · Accepted Answer · 2012-01-17T22:52:58+0000

Have you tried StreamReader? I think Get-Content loads the entire file into memory before it does anything with it.

StreamReader class
