Forum Discussion
Help with Using PowerShell to split a file by a particular string and saving as specific name
- Jul 17, 2024
Hi, Marvin.
Here's a basic working template to get you started, where I've assumed the example data you've provided is consistently formed.
If, for example, "MessageA" could in fact feature a space, such as "Message A", then some logic would have to be added to the script to allow for that.
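As a rough sketch of what that extra logic might look like: the script treats a header as four whitespace-separated fields with the name in second position, so one option is a regular expression that lets only the name field contain spaces. The function name `Get-MessageName` and the sample header lines are hypothetical, since the real input data isn't reproduced here, and the date check from the script would still need to be applied separately.

```powershell
# Hypothetical helper: returns the message name from a header line, or $null if
# the line doesn't look like a header.
# Assumption: a header is "<date> <name> <field> <field>", and only <name> may contain spaces.
function Get-MessageName {
    param([string] $Line)

    # The lazy (.+?) takes as little as possible, so the last two
    # whitespace-free tokens are always left for the trailing fields.
    if ($Line -match '^(?<date>\S+)\s+(?<name>.+?)\s+\S+\s+\S+\s*$') {
        return $Matches['name']
    }
    return $null
}

# Made-up header lines, for illustration only.
Get-MessageName -Line '2024-07-17 Message A 12:00 0042'   # Message A
Get-MessageName -Line '2024-07-17 MessageA 12:00 0042'    # MessageA
```

Any spaces in the returned name could be kept or stripped (for example with `-replace '\s+', ''`) before it's used as a file name.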
Input data file
Script
$SourceFile = "D:\Data\Temp\Forum\forum.txt";
$SourceDirectory = [System.IO.Path]::GetDirectoryName($SourceFile) + "\";
$FileOpen = $false;
$Timestamp = [datetime]::MinValue;

Get-Content -Path $SourceFile |
    ForEach-Object {
        $Line = $_;

        # Check this isn't empty space in between files. If so, skip it.
        if ((-not $FileOpen) -and [string]::IsNullOrWhiteSpace($Line))
        {
            # We do nothing here, meaning we skip the empty space between files.
        }
        # Check if we've hit a well-formed line that indicates the start of a new message-aligned file.
        elseif ((-not $FileOpen) -and
                ($Line.Length -gt 10) -and
                ([datetime]::TryParse($Line.Substring(0, 10), [ref] $Timestamp) -and
                (4 -eq ($Parts = [regex]::Split($Line, "\s+")).Count)))
        {
            $NewFileName = [string]::Concat($SourceDirectory, $Parts[1], ".txt");
            Out-File -FilePath $NewFileName -InputObject $Line -ErrorAction:Stop;
            $FileOpen = $true;
        }
        # Check if we've hit a well-formed line indicating the end of a file.
        elseif ("*End of message" -eq $Line)
        {
            # This is more of a safety check, since outside of an error condition, $FileOpen should always be $true.
            if ($FileOpen)
            {
                Out-File -FilePath $NewFileName -InputObject $Line -Append;
            }
            $FileOpen = $false;
            $NewFileName = $null;
        }
        # Otherwise, if a file is considered "open", write the line to it. (Mechanically, the file isn't really open - it's just easier to conceptualise it that way.)
        elseif ($FileOpen)
        {
            Out-File -FilePath $NewFileName -InputObject $Line -Append;
        }
    }
Output
Cheers,
Lain
Good day. Thank you again so much for your assistance. Thanks to your explanation, I was able to understand the script and make some slight changes that avoid the .NET references.
With regard to the file header, the pattern is as follows:
16/07/24-07:35:54 Printer-5592-000034 34
The middle portion has a fixed length, so I was able to match it with a regular expression. The check for an empty line is also done with native PowerShell operators. I hope that isn't an issue; it was just easier for me to read, and I wanted to keep the solution consistent.
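For anyone following along, the header match can be tried in isolation against the sample line above. This is only a sketch: the variable names are illustrative, and the `{4}` and `{6}` quantifiers are simply shorthand for repeating `[0-9]` four and six times.

```powershell
# Sample header line quoted in the post above.
$Header  = '16/07/24-07:35:54 Printer-5592-000034 34'

# "Printer-", four digits, a hyphen, then six digits.
$Pattern = 'Printer-[0-9]{4}-[0-9]{6}'

if ($Header -match $Pattern) {
    # After a successful -match, $Matches[0] holds the text that matched,
    # which is what gets used as the base of the output file name.
    $BaseName = $Matches[0]
    $BaseName    # Printer-5592-000034
}

# A blank-line test that uses only PowerShell operators. Unlike a comparison
# with "", it also treats lines made up solely of spaces or tabs as blank.
$IsBlank = ('   ' -match '^\s*$')
```

Because `-match` looks for the pattern anywhere in the line, the timestamp in front of the printer name and the number after it don't need to be stripped first.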
Here's the modified code.
$INPRINTPATH = " # this is the path where the files would be stored to be split*"
# $OUTPRINTPATH is assumed to end with a path separator, since the file name is appended to it directly.
$OUTPRINTPATH = "#this is the path where the resulting files will be saved "
# $FILEPATH holds the full path of every file in the $INPRINTPATH directory (whether that's one file or several).
$FILEPATH = Get-ChildItem -Path $INPRINTPATH | ForEach-Object { $_.FullName }
# This is the pattern used to name each output file once found (it's a mandatory line, so it WILL be in the files).
$PATTERN = 'Printer-[0-9]{4}-[0-9]{6}'
$FILEOPEN = $false;

Get-Content -Path $FILEPATH |
    ForEach-Object {
        $LINE = $_;

        # Non-.NET way of skipping an empty line between files.
        if ((-not $FILEOPEN) -and ($LINE -eq ""))
        {
            # Nothing to do: empty lines between files are skipped.
        }
        # Non-.NET way of finding the header. After a successful -match, $Matches[0] holds the matched text.
        elseif ($LINE -match $PATTERN)
        {
            $NEWFILENAME = $OUTPRINTPATH + $Matches[0] + ".txt";
            Out-File -FilePath $NEWFILENAME -InputObject $LINE -ErrorAction:Stop;
            $FILEOPEN = $true;
        }
        # -eq is case-insensitive, so this also matches "*End of message".
        elseif ($LINE -eq "*End of Message")
        {
            if ($FILEOPEN)
            {
                Out-File -FilePath $NEWFILENAME -InputObject $LINE -Append;
            }
            $FILEOPEN = $false;
            $NEWFILENAME = $null;
        }
        # Any other line belongs to the file currently being written.
        elseif ($FILEOPEN)
        {
            Out-File -FilePath $NEWFILENAME -InputObject $LINE -Append;
        }
    }