I need to analyze a large pipe-delimited file and count how many records have a 5th column that meets my criterion versus how many do not.
PS C:\temp> gc .\items.txt -ReadCount 1000 |
    ? { $_ -notlike "HEAD" } |
    % { foreach ($s in $_) { $s.Split("|")[4] } } |
    group -Property { $_ -ge 256 } -NoElement |
    ft -AutoSize
This command does what I want, returning the output as follows:
  Count Name
  ----- ----
1129339 True
2013703 False
However, for a 500 MB test file, this command takes about 5.5 minutes to execute, as measured by Measure-Command. A typical file is more than 2 GB, for which waiting 20+ minutes is undesirable.
Do you see a way to improve the performance of this command?
For example, is there a way to determine the optimal value for Get-Content ReadCount? Without it, it takes 8.8 minutes to complete the same file.
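One way to look for a good ReadCount value empirically is to time the same pipeline at several candidate values with Measure-Command. A minimal sketch (assuming the same items.txt; the candidate values here are arbitrary, not recommendations):

```powershell
# Time the counting pipeline at several -ReadCount values to find a sweet spot.
foreach ($rc in 100, 1000, 5000, 10000) {
    $t = Measure-Command {
        gc .\items.txt -ReadCount $rc |
            ? { $_ -notlike "HEAD" } |
            % { foreach ($s in $_) { $s.Split("|")[4] } } |
            group -Property { $_ -ge 256 } -NoElement |
            Out-Null
    }
    "ReadCount {0,6}: {1:n1} s" -f $rc, $t.TotalSeconds
}
```

The results are likely to depend on disk speed and available memory, so the best value for one machine may not carry over to another.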