Posted in Windows PowerShell | 28/02/2015 04:08
Here are some performance tips for large text operations in PowerShell.
Test file: a 200 MB Microsoft IIS log with 424390 lines
1. First of all, we have to read the file :) Let's try the alternatives:
a. Native command: Get-Content
$LogFilePath = "C:\large.log"
$Lines = Get-Content $LogFilePath
[int]$LineNumber = 0
# Read Lines
foreach ($Line in $Lines)
{
    $LineNumber++
}
With this option, the script takes 13.3727013 seconds to read the file and loop over 424390 lines.
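A timing like this can be reproduced with Measure-Command, which runs a script block and returns a TimeSpan. The following is only a minimal sketch: the temporary file and its contents are made up for illustration and stand in for the 200 MB log.

```powershell
# Hypothetical sample file standing in for C:\large.log
$LogFilePath = [System.IO.Path]::GetTempFileName()
1..1000 | ForEach-Object { "sample log line $_" } | Set-Content -Path $LogFilePath

[int]$LineNumber = 0

# Measure-Command runs the script block and returns the elapsed time as a [TimeSpan]
$Elapsed = Measure-Command {
    $Lines = Get-Content -Path $LogFilePath
    foreach ($Line in $Lines)
    {
        $LineNumber++
    }
}

"Elapsed: {0} seconds" -f $Elapsed.TotalSeconds

# Clean up the sample file
Remove-Item -Path $LogFilePath
```

The absolute numbers depend on the machine and the file, so only the relative difference between methods is meaningful.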
But how about memory usage?
Because the whole output of Get-Content is assigned to $Lines, the entire file is held in memory, so it's normal to see high memory usage.
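One rough way to observe this is to compare the working set of the PowerShell process before and after the read. This is a sketch under assumptions: the temporary file is hypothetical sample data, and the working set is only an approximation because the garbage collector decides when memory is actually released.

```powershell
# Hypothetical sample file; the original test used a 200 MB IIS log
$LogFilePath = [System.IO.Path]::GetTempFileName()
1..5000 | ForEach-Object { "sample log line $_" } | Set-Content -Path $LogFilePath

# Working set of the current PowerShell process, in bytes
$Before = (Get-Process -Id $PID).WorkingSet64

# Assigning the output to a variable keeps every line in memory at once
$Lines = Get-Content -Path $LogFilePath

$After = (Get-Process -Id $PID).WorkingSet64

"Lines held in memory: {0}" -f $Lines.Count
"Working set change: {0:N2} MB" -f (($After - $Before) / 1MB)

# Clean up the sample file
Remove-Item -Path $LogFilePath
```

With a file this small the change may be close to zero or even negative; the effect only becomes obvious with files in the hundreds of megabytes.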
b. Using .NET method: [io.file]::ReadAllLines
$LogFilePath = "C:\large.log"
$Lines = [io.file]::ReadAllLines($LogFilePath)
[int]$LineNumber = 0
# Read Lines
foreach ($Line in $Lines)
{
    $LineNumber++
}
With this option, the script takes 2.0082615 seconds to read the file and loop over 424390 lines, which is more than six times faster than Get-Content.
Memory usage is lower than with Get-Content but still high, because ReadAllLines also loads the whole file into an array. I couldn't capture it, but CPU usage peaked at 13%.
c. Using .NET method: System.IO.StreamReader
$LogFilePath = "C:\large.log"
# Open read-only and allow other processes to keep reading and writing the file
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite)
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true)
[int]$LineNumber = 0
try
{
    # Read Lines
    while (!$ReadLogFile.EndOfStream)
    {
        $LogContent = $ReadLogFile.ReadLine()
        $LineNumber++
    }
}
finally
{
    # Closing the reader also closes the underlying FileStream
    $ReadLogFile.Close()
}
With this option, the script takes 1.7062244 seconds to read the file and loop over 424390 lines. This seems to be the fastest method.
Memory usage is also very low because the file is read line by line, so PowerShell doesn't hold the whole file in memory.
CPU usage is still high in this case, though. The loop is probably saturating one of the server's cores while it runs. But that's something I can't help :)
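The 13% peak seen with ReadAllLines would fit the "one busy core" explanation: a single-threaded loop can use at most one logical processor, so its ceiling as a share of total CPU is 100 divided by the processor count. The core count of the test server is not stated, so the value 8 below is purely an assumption for illustration, and Get-SingleCoreShare is a hypothetical helper name.

```powershell
# Share of total CPU (in percent) that one fully busy logical processor represents
function Get-SingleCoreShare
{
    param([int]$LogicalProcessors)
    [math]::Round(100 / $LogicalProcessors, 1)
}

# Assumed value: the post does not say how many cores the server had
Get-SingleCoreShare -LogicalProcessors 8    # 12.5

# The same calculation for the machine running this script
Get-SingleCoreShare -LogicalProcessors ([Environment]::ProcessorCount)
```

If Task Manager shows overall CPU usage near this value while the script runs, the loop is limited by one core, not by the disk.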
Winner: System.IO.StreamReader
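To repeat the comparison on another machine, the three approaches can be run side by side against the same file. This is a rough sketch, not the script used for the numbers above: the sample file is hypothetical, and the simplified StreamReader constructor (path only, default encoding) is an assumption made to keep the example short.

```powershell
# Hypothetical sample file standing in for the 200 MB log
$LogFilePath = [System.IO.Path]::GetTempFileName()
1..2000 | ForEach-Object { "sample log line $_" } | Set-Content -Path $LogFilePath

# One script block per method; each returns the number of lines it read
$Methods = [ordered]@{
    'Get-Content'  = { @(Get-Content -Path $LogFilePath).Length }
    'ReadAllLines' = { [System.IO.File]::ReadAllLines($LogFilePath).Length }
    'StreamReader' = {
        $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $LogFilePath
        try
        {
            $Count = 0
            # ReadLine returns $null once the end of the file is reached
            while ($null -ne $Reader.ReadLine()) { $Count++ }
            $Count
        }
        finally { $Reader.Close() }
    }
}

# Time each method with a Stopwatch and collect one result object per method
$Results = foreach ($Name in $Methods.Keys)
{
    $Watch = [System.Diagnostics.Stopwatch]::StartNew()
    $LineCount = & $Methods[$Name]
    $Watch.Stop()
    New-Object -TypeName PSObject -Property @{
        Method  = $Name
        Lines   = $LineCount
        Seconds = $Watch.Elapsed.TotalSeconds
    }
}

$Results | Sort-Object -Property Seconds | Format-Table -Property Method, Lines, Seconds -AutoSize

# Clean up the sample file
Remove-Item -Path $LogFilePath
```

All three methods should report the same line count. With a small sample the ordering can differ from the results above, so point $LogFilePath at a large file for a meaningful comparison.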
In the next part, I'll show you some text manipulation tips. See you.