Forum Discussion

filigrana
Copper Contributor
Mar 31, 2024
Solved

Split a text file using a string as delimiter

Greetings to the forum,

I have a (huge) text file structured like this:

____________________________________

string blahblahblah(1)

blahblahblah(2)

 

string blahblahblah(3)

blahblahblah(4)

 

string blahblahblah(5)

blahblahblah(6)

..........................................

_________________________________________

I need to produce n files, one for each of the n occurrences of "string", like this:

___________________________

string blahblahblah(1)

blahblahblah(2)
___________________________

___________________________

string blahblahblah(3)

blahblahblah(4)
___________________________

___________________________

string blahblahblah(5)

blahblahblah(6)
___________________________

And so on...

Obviously, the various blahblahblah(x) are texts of variable length.

I know it's possible to do this with PowerShell, but unfortunately I haven't mastered it, and the resources on the net didn't help me.

Can anyone help me?

Thank you.


6 Replies

  • filigrana
    Copper Contributor
    Where can you learn all these great things about Windows PowerShell? I want to learn them too.
    • Dalbir3
      Copper Contributor

      filigrana

      YouTube
      Udemy.com
      Microsoft Learn, and Google a few things

      If you want to invest in it:
      pluralsight.com
      CBT Nuggets
      Amazon books

      Essentially, learn one to five commands and then mix and match them; there are tons of scripts on GitHub.

      Use Visual Studio Code or the PowerShell ISE.

      I would take what you have there and add more scope to it to learn more on the PowerShell side, for example by logging the output.

       
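      As a rough idea of what "logging the output" could look like, here is a minimal sketch. The function name Write-SplitLog and the log path in the usage line are made up for illustration; nothing in the thread prescribes them.

      ```powershell
      # Hypothetical helper (not part of the original script):
      # appends one timestamped line to a plain-text log file.
      function Write-SplitLog {
          param(
              [Parameter(Mandatory)] [string] $Message,  # text to record
              [Parameter(Mandatory)] [string] $LogPath   # log file; created on first use
          )
          $stamp = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'
          Add-Content -LiteralPath $LogPath -Value "$stamp  $Message"
      }
      ```

      A call such as Write-SplitLog -Message "Wrote splitFile_1.txt" -LogPath "C:\path\to\output\split.log" could then go right after each Out-File line of a split script.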

  • Dalbir3
    Copper Contributor
    The script writes the new files to the location given by $baseOutputPath, naming them with that base name followed by a sequence number and the .txt extension (e.g., splitFile_1.txt, splitFile_2.txt, etc.).

    # Define your parameters
    $filePath = "C:\path\to\your\file.txt" # Path to your huge text file
    $delimiter = "string" # Your delimiter
    $baseOutputPath = "C:\path\to\output\splitFile_" # Base path and filename for output files

    # Initialize variables
    $fileCounter = 1
    $currentContent = [System.Collections.Generic.List[string]]::new() # Cheaper to append to than an array

    # -match treats its right-hand side as a regular expression,
    # so escape the delimiter to match it as literal text
    $pattern = [regex]::Escape($delimiter)

    # Read the file line by line
    Get-Content -Path $filePath | ForEach-Object {
        if ($_ -match $pattern -and $currentContent.Count -gt 0) {
            # Output the current content to a file
            $currentContent | Out-File -FilePath ($baseOutputPath + $fileCounter + ".txt")
            # Increment the file counter and reset the current content
            $fileCounter++
            $currentContent.Clear()
        }
        $currentContent.Add($_)
    }

    # Don't forget to output the last chunk if it exists
    if ($currentContent.Count -gt 0) {
        $currentContent | Out-File -FilePath ($baseOutputPath + $fileCounter + ".txt")
    }

    Here's how to use this script:

    Replace $filePath with the full path to your text file.
    Change $delimiter to the text that marks the start of each section (in your example, "string"). Any line that contains this text starts a new file, and the comparison is not case-sensitive.
    Set $baseOutputPath to the directory and base filename where you want to save the split files. The script will append numbers to this base name to create the individual filenames.

    This script works by reading each line of the input file. Whenever it encounters the delimiter (indicating the start of a new section), it writes the accumulated lines to a new file and starts collecting lines afresh for the next file. If the input has lines before the first delimiter, they end up in a file of their own.

    Remember to adjust the file paths and delimiter according to your specific needs before running the script.
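
    For very large inputs, one possible variation is to write each line straight to the current output file instead of collecting a section in memory first. The following is only a sketch under a few assumptions: the function name Split-TextByDelimiter and its parameter names are invented for illustration, the delimiter is assumed to sit at the start of a line (as in the sample), the comparison is case-sensitive, and both paths should be full paths because the .NET file classes do not follow PowerShell's current location.

    ```powershell
    function Split-TextByDelimiter {
        param(
            [Parameter(Mandatory)] [string] $Path,            # full path of the input file
            [Parameter(Mandatory)] [string] $Delimiter,       # literal text that starts each section
            [Parameter(Mandatory)] [string] $BaseOutputPath   # e.g. C:\path\to\output\splitFile_
        )

        $fileCounter = 0
        $writer = $null
        try {
            # File.ReadLines hands back one line at a time instead of loading the whole file
            foreach ($line in [System.IO.File]::ReadLines($Path)) {
                # Open a new output file at every line that begins with the delimiter,
                # and also for any lines that come before the first delimiter
                if ($line.StartsWith($Delimiter) -or $null -eq $writer) {
                    if ($writer) { $writer.Dispose() }
                    $fileCounter++
                    $writer = [System.IO.StreamWriter]::new("$BaseOutputPath$fileCounter.txt")
                }
                $writer.WriteLine($line)
            }
        }
        finally {
            # Close the last output file even if something went wrong
            if ($writer) { $writer.Dispose() }
        }

        $fileCounter   # number of files written
    }
    ```

    It could be called like this: Split-TextByDelimiter -Path "C:\path\to\your\file.txt" -Delimiter "string" -BaseOutputPath "C:\path\to\output\splitFile_"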
    • filigrana
      Copper Contributor
      IT WORKS!!!
      You are great!!!
      Thank you!!!
    • filigrana
      Copper Contributor

      Dalbir3

      Uhm... I am an animal...
      I have to save your script as split.ps1, put it in the folder with the huge text file, open PowerShell, go to that folder, and run the command .\split.ps1. Is this correct?
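
      Those steps might look roughly like the session below. This is a sketch, not a confirmed answer from the thread: C:\path\to\folder is a placeholder for wherever split.ps1 was saved, and the Set-ExecutionPolicy line should only be needed if PowerShell refuses to run local scripts.

      ```powershell
      # Placeholder folder (assumption): replace with the folder that holds split.ps1
      $scriptFolder = 'C:\path\to\folder'

      # Allow locally created scripts for this PowerShell window only;
      # the setting is discarded when the window is closed
      Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned

      # Go to the folder and run the script; the leading .\ is required
      # when the script is in the current folder
      Set-Location -Path $scriptFolder
      .\split.ps1
      ```

      Because the script uses full paths in $filePath and $baseOutputPath, it does not strictly have to sit in the same folder as the text file.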
