Forum Discussion

Fred_Elmendorf
Brass Contributor
Jul 27, 2023
Solved

Read one data element from each of the first two rows of a text file, output to .csv

This new task requires recursing through a folder structure of hundreds of files to pull one data element from each of the first two rows of each matching text file, then exporting the results to a single labeled .csv file. In a recent, similar task, I needed to retrieve data elements from specific columns in a text file. This task navigates the same file structure, so I'm including the existing code, a sample input file, and the desired output file. From the first line in the file, I need the Station name, which is the text following the =. From the second line, I need the Version, which is also the text following the =. I also need to include the Filename and the Created date, as shown in the existing code sample.

 

Existing code:

Get-ChildItem -Path "Q:\DFR\APP" -Filter "Status*.txt" -Recurse -Depth 2 |
    ForEach-Object {
        $File = $_;

        Get-Content -Path ($_.FullName) -TotalCount 1 |
            ForEach-Object {
                if (($Columns = $_.Split(",")).Count -ge 19)
                {
                    [PSCustomObject] @{
                        # Remember, arrays are zero-based, so the second column has an index of 1, not 2.
                        Site = $Columns[1];
                        Hz = $Columns[18];
                        Filename = $File.FullName;
                        Created = $File.CreationTime
                    } | Export-Csv -NoTypeInformation -Path "$csvlog" -Append
                }
            }
    }

 

Sample input file:

Station=Station1 Name
RECORDER=V3.1.8
DFR=ONLINE
PC_TIME=07/26/2023-15:41:25
TIME_MARK_SOURCE=IRIG-B
TIME_MARK_TIME=07/26/2023-15:41:26.000000
Clock=SYNC(lock)
IEEE_1344=Yes
DATA_DISK_SIZE=999546736640
DATA_DRIVE=289GB/999GB

...

 

Desired Output file:

Station,Version,Filename,Created
Station1 Name,V3.1.8,Q:\DFR\APP\Station1 Name\Status99.txt,7/22/2023 10:43

2 Replies

  • LainRobertson
    Silver Contributor

    Fred_Elmendorf 

     

    As with the last script, I'm going to assume this new requirement also only uses the first two lines.

     

    Based on this sample data across two separate files:

     

    File 1

    Station=Station1 Name
    RECORDER=V3.1.8
    DFR=ONLINE
    PC_TIME=07/26/2023-15:41:25
    TIME_MARK_SOURCE=IRIG-B
    TIME_MARK_TIME=07/26/2023-15:41:26.000000
    Clock=SYNC(lock)
    IEEE_1344=Yes
    DATA_DISK_SIZE=999546736640
    DATA_DRIVE=289GB/999GB

     

    File 2

    Station=Station2 Name
    RECORDER=V3.1.8
    DFR=ONLINE
    PC_TIME=07/26/2023-15:41:25
    TIME_MARK_SOURCE=IRIG-B
    TIME_MARK_TIME=07/26/2023-15:41:26.000000
    Clock=SYNC(lock)
    IEEE_1344=Yes
    DATA_DISK_SIZE=999546736640
    DATA_DRIVE=289GB/999GB

     

    We get this output:

    "Station","Version","Filename","Created"
    "Station1 Name","V3.1.8","D:\Data\Temp\Forum\forums.txt","20/03/2023 5:51:01 PM"
    "Station2 Name","V3.1.8","D:\Data\Temp\Forum\forums2.txt","27/07/2023 9:49:17 PM"

     

    From this example script:

    # Specify our CSV output file name.
    $TargetFile = "D:\Data\Temp\Forum\forum.csv";
    
    # Instantiate a new HashTable outside the loop to minimise overhead, should this have to scale to tens of thousands of files (or more).
    $HashTable = @{};
    
    Get-ChildItem -Path "D:\Data\Temp\Forum\*" -Filter "forums*.txt" -Recurse -Depth 2 |
        ForEach-Object {
            $File = $_;
            # Read just the first two lines of the file.
            Get-Content -Path ($File.FullName) -TotalCount 2 |
                ForEach-Object {
                    # Treat the first "=" only as the separator, ensuring that even if more "=" are in the rest of the string, it's not broken up into smaller substrings.
                    $Parts = $_.Split([char[]]@("="), 2);
    
                    # Some basic validation to ensure we're not reading an ineligible file.
                    if (($HashTable.Count -gt 0) -or ("station" -eq $Parts[0]))
                    {
                        # Add the key-value pairs to the HashTable.
                        switch ($Parts[0])
                        {
                            "station" {
                                $HashTable.Add("Station", $Parts[1]);   # Add the station value.
                                continue;
                            }
    
                            "recorder" {
                                $HashTable.Add("Version", $Parts[1]);   # Add the version.
                                break;
                            }
    
                            default {
                                # We're not going to do anything in this scenario, but it's good practice to include a default handler.
                                continue;
                            }
                        }
                    }
                }
    
            # Time to output something useful, as long as both the values were obtained.
            if ($HashTable.Count -eq 2)
            {
                [PSCustomObject] @{
                    Station  = [string]$HashTable["Station"];
                    Version  = [string]$HashTable["Version"];
                    Filename = $File.FullName;
                    Created  = $File.CreationTime;
                }
            }
    
            # Clear the HashTable so it's ready for the next file (or simply to clean up after ourselves once we've finished).
            $HashTable.Clear();
        } | Export-Csv -NoTypeInformation -Path $TargetFile;
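
The "first = only" behaviour of the Split call in the script above can be checked in isolation. What follows is a minimal sketch rather than part of the original script: the sample lines (including the NOTE line with an extra "=") and the variable names $Lines and $Pairs are made up for illustration.

```powershell
# Hypothetical sample lines; the second value deliberately contains an extra "=".
$Lines = @(
    "Station=Station1 Name",
    "NOTE=a=b"
)

$Pairs = foreach ($Line in $Lines)
{
    # A count of 2 returns at most two substrings, so only the first "=" acts as a separator.
    $Parts = $Line.Split([char[]]@("="), 2);

    [PSCustomObject] @{
        Key   = $Parts[0];
        Value = $Parts[1];
    }
}

# Show the resulting key/value pairs.
$Pairs | Format-Table -AutoSize
```

The second pair comes back with a Value of a=b, not just a, because everything after the first "=" stays together. The -split operator accepts the same kind of limit, so "NOTE=a=b" -split "=", 2 should give the same two parts.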

     

    Cheers,

    Lain

    • Fred_Elmendorf
      Brass Contributor
      Hi Lain,

      This solution worked perfectly on the first try. I really appreciate your concise but thorough responses and the explanatory comments. I'm fairly new to PowerShell, and I'm using it out of necessity because of the resources available. These real-world tasks are helping me learn as I go.

      Thanks!
      Fred