Posted in Windows Powershell | 4 Comments | 2,873 views | 28/02/2015 09:31
If you have read the first part, we will now continue with text manipulation in PowerShell.
Test File: 424390 lines, 200 MB Microsoft IIS Log
In the first part, the winner was “System.IO.StreamReader”, so I’ll continue with that.
1. Let’s try a Replace in our script:
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    $LogContent = $LogContent.Replace('\','\\')
}
$ReadLogFile.Close()
With the Replace call added, the script execution time is 3.2394121 seconds.
So what happens if I use the regex-based -replace operator instead of Replace?
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    # The pattern '\\' matches one backslash. Backslashes are literal in a .NET
    # replacement string, so '\\' inserts two, matching Replace('\','\\').
    $LogContent = $LogContent -replace '\\', '\\'
}
$ReadLogFile.Close()
Now the script execution time is 25.1311866 seconds, so the .NET Replace method is your best friend :)
Winner: Replace
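To try the comparison without a 200 MB log, here is a minimal sketch that times both approaches on an in-memory sample using Measure-Command. The sample line and the 10,000-line count are assumptions made for illustration, not the test file used above, so the absolute numbers will differ. It also checks that both forms produce the same output, which requires '\\' (not '\\\\') as the replacement string for -replace.

```powershell
# Hypothetical in-memory sample instead of the 200 MB log file.
$SampleLine = 'GET C:\inetpub\wwwroot\index.html 200'
$Lines = @($SampleLine) * 10000

# Both forms should turn every single backslash into two.
$ViaMethod   = $SampleLine.Replace('\', '\\')
$ViaOperator = $SampleLine -replace '\\', '\\'
$SameResult  = ($ViaMethod -ceq $ViaOperator)

# Time the literal String.Replace method.
$MethodTime = Measure-Command {
    foreach ($Line in $Lines) { $null = $Line.Replace('\', '\\') }
}

# Time the regex-based -replace operator.
$OperatorTime = Measure-Command {
    foreach ($Line in $Lines) { $null = $Line -replace '\\', '\\' }
}

"Same result:    $SameResult"
"String.Replace: $($MethodTime.TotalMilliseconds) ms"
"-replace:       $($OperatorTime.TotalMilliseconds) ms"
```

Replace does a literal substring substitution, while -replace has to run the regex engine for every line, which is a plausible reason for the gap measured above.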
2. What happens if I use -notlike in my text operation?
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
[int]$TestCount = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    $LogContent = $LogContent.Replace('\','\\')
    if ($LogContent -notlike "#*")
    {
        $TestCount++
    }
}
$ReadLogFile.Close()
The script takes 50.1493736 seconds.
But is there another way to write this check? Yes: compare the first character of the line directly. Let’s try -ne:
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
[int]$TestCount = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    $LogContent = $LogContent.Replace('\','\\')
    if ($LogContent.Substring(0,1) -ne "#")
    {
        $TestCount++
    }
}
$ReadLogFile.Close()
The script takes 25.3682308 seconds. OMG! :)
So in this test the -ne check took about half the time of the -notlike check (25.4 vs. 50.1 seconds). Prefer -eq/-ne over -like/-notlike where possible.
Winner: -eq/-ne
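One caveat with the Substring(0,1) check: it throws an ArgumentOutOfRangeException if a line is empty. A minimal sketch of an alternative, assuming the log might contain empty lines, is the .NET String.StartsWith method with an ordinal comparison, which returns $false for an empty string. The sample lines are hypothetical, and I have not timed this variant against the 200 MB log, so treat it as a correctness sketch rather than a benchmark result.

```powershell
# Hypothetical sample lines, including an empty one.
$Lines = @('#Fields: date time', '2015-02-28 09:31:00 GET /', '', '#Version: 1.0')

[int]$TestCount = 0
foreach ($Line in $Lines)
{
    # An ordinal StartsWith returns $false for an empty string,
    # whereas ''.Substring(0,1) would throw.
    if (-not $Line.StartsWith('#', [System.StringComparison]::Ordinal))
    {
        $TestCount++
    }
}

# Number of lines that do not start with '#' (the data line and the empty line).
$TestCount
```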
To be continued.. :)
Posted in Windows Powershell | 4 Comments | 8,514 views | 28/02/2015 04:08
I want to give some performance tips for large text operations in PowerShell.
Test File: 424390 lines, 200 MB Microsoft IIS Log
1. First of all, we have to read the file :) Let’s try our alternatives:
a. Native command: Get-Content
$LogFilePath = "C:\large.log"
$Lines = Get-Content $LogFilePath
[int]$LineNumber = 0;
# Read Lines
foreach ($Line in $Lines)
{
    $LineNumber++
}
With this option, the script takes 13.3727013 seconds to read and loop over 424390 lines.
But what about memory usage?
Because the whole output of Get-Content is assigned to $Lines, the entire file is held in memory, so high memory usage is expected.
b. Using .Net method: [io.file]::ReadAllLines
$LogFilePath = "C:\large.log"
$Lines = [io.file]::ReadAllLines($LogFilePath)
[int]$LineNumber = 0;
# Read Lines
foreach ($Line in $Lines)
{
    $LineNumber++
}
With this option, the script takes 2.0082615 seconds to read and loop over 424390 lines, which is extremely fast compared to Get-Content.
Memory usage is lower than with Get-Content but still high. I couldn’t capture it, but CPU usage peaks at 13%.
c. Using .Net method: System.IO.StreamReader
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
}
$ReadLogFile.Close()
With this option, the script takes 1.7062244 seconds to read and loop over 424390 lines. This seems to be the fastest method.
Memory usage is also very low because it reads the file line by line, so PowerShell doesn’t hold the whole file in memory.
But in this case, CPU usage is still high; it is probably saturating one of the server’s cores while it runs. That is something I can’t help :)
Winner: System.IO.StreamReader
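If you want to reuse the StreamReader approach, one possible way is to wrap it in a function that disposes the reader in a finally block, so the file handle is released even if the loop throws. This is only a sketch: the function name Get-LineCount and the temporary test file are hypothetical, and the loop stops when ReadLine() returns $null instead of checking EndOfStream.

```powershell
function Get-LineCount   # hypothetical helper name
{
    param([Parameter(Mandatory = $true)][string]$Path)

    # Same stream and reader setup as in the script above.
    $FileStream = New-Object -TypeName System.IO.FileStream -ArgumentList ($Path), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite)
    $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true)
    [int]$LineNumber = 0
    try
    {
        # ReadLine() returns $null once the end of the stream is reached.
        while ($null -ne ($Line = $Reader.ReadLine()))
        {
            $LineNumber++
        }
    }
    finally
    {
        # Disposing the reader also closes the underlying FileStream.
        $Reader.Dispose()
    }
    return $LineNumber
}

# Usage on a small temporary file (stand-in for the real log).
$TempFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $TempFile -Value 'line 1', 'line 2', 'line 3'
$Count = Get-LineCount -Path $TempFile
Remove-Item -Path $TempFile
$Count
```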
In the next part, I’ll show you text manipulation tips. See you.