Forum Discussion
Remove a comma from a CSV file for a particular column only
Hi everyone
I have a CSV file that has some bad data in a particular column called ColumnA. ColumnA is a column of names. Some of the names have a comma in them. For example, "Invesco QQQ Trust, Series 1". This extra comma is causing issues when I try to import the file because the delimiter is a comma too (i.e. CSV). The extra comma in the name needs to be removed. The file is a CSV so the delimiter commas are needed. I need a PowerShell script that removes commas for a particular column (i.e. ColumnA) and only those commas that are not at the beginning or end of the value.
For example:
",Invesco QQQ Trust, Series 1," becomes ",Invesco QQQ Trust, Series 1,"
The leading and lagging commas are still needed in order to preserve the structure of the CSV.
How can I do this in PowerShell?
Thank you in advance
cc: LainRobertson
Hi manny213 ,
You can't provide a wildcard specification such as "*.csv" to the Import-Csv commandlet. That's why you're getting that error.
Given Import-Csv can only accept a single file, you have to wrap it within an outer ForEach-Object statement.
Get-ChildItem -Path "d:\data\temp\forum\*.csv" | ForEach-Object {
    # Name of the temporary working file.
    $tempFile = [regex]::Replace($_.FullName, ".csv$", "-temp.csv");

    try {
        # Parse the original file and pipe to the temporary file.
        Import-Csv -Path $_.FullName -ErrorAction:Stop | ForEach-Object {
            [PSCustomObject] @{
                SYMBOL = $_.SYMBOL;
                INTERVAL = $_.INTERVAL;
                NAME = $_.NAME.Replace(",", "");
                DATE = $_.DATE;
                OPEN = $_.OPEN;
                HIGH = $_.HIGH;
                LOW = $_.LOW;
                CLOSE = $_.CLOSE;
                VOLUME = $_.VOLUME;
            }
        } | Export-Csv -Path $tempFile -NoTypeInformation -ErrorAction:Stop;

        # Remove original file.
        Remove-Item -Path $_.FullName -ErrorAction:Stop;

        # Rename temporary file to the original file's name.
        Rename-Item -Path $tempFile -NewName $_.Name -ErrorAction:Stop;
    }
    catch {
        Write-Error -Exception $_.Exception;
    }
}
This basic script will attempt to parse the original file before replacing it, and will throw an error if it's unable to do so.
It will repeat this process for each file within the directory, as illustrated by the outer ForEach-Object block on line 1, which is where you can specify wildcards via the Get-ChildItem commandlet.
This is the sample original file I used.
This is the resulting file after parsing.
Cheers,
Lain
8 Replies
- LainRobertson (Silver Contributor)
Hi manny213 ,
I'm confused by your example as in your statement:
",Invesco QQQ Trust, Series 1," becomes ",Invesco QQQ Trust, Series 1,"
Both sides are equal to each other.
First, it's important to understand that there can be variety in the formatting of a CSV file.
In some CSV files - and this is the default with PowerShell as well - the columns are enclosed in double-quotes:
While in others, columns are not enclosed within double-quotes:
If your file conforms to the first standard of using double-quotes then there should be no issues with importing the CSV data, as demonstrated below.
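As a minimal sketch of that behaviour, the snippet below feeds two fully quoted lines through ConvertFrom-Csv, which parses text the same way Import-Csv parses a file. The sample row (including the CLOSE value of 100) is made up for illustration and is not taken from the original file.

```powershell
# Hypothetical sample data: every field is double-quoted and one name contains a comma.
$lines = @(
    '"SYMBOL","NAME","CLOSE"',
    '"QQQ","Invesco QQQ Trust, Series 1","100"'
)

# ConvertFrom-Csv treats the first line as the header row.
$rows = @($lines | ConvertFrom-Csv)

# The comma inside the quoted field stays part of the NAME value
# instead of being treated as a delimiter.
$rows[0].NAME
```

Because the quotes mark where the field starts and ends, the row still has three columns and nothing shifts to the right.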
Conversely, if your CSV file conforms to the second standard without double-quotes, then there's no foolproof way to handle this. Instead, you would have to go back to whoever/whatever is creating the CSV files and see if they/it can include the double-quotes.
Now, if your file conforms to the first standard of using double-quotes and you still want to remove the commas from a column, then that's done easily enough.
Import-Csv -Path .\forum.csv | ForEach-Object {
    [PSCustomObject] @{
        columnA = $_.columnA;
        columnB = $_.columnB.Replace(",", " ");
        columnC = $_.columnC;
    }
}
Example
Cheers,
Lain
- manny213 (Brass Contributor)
Thank you for the prompt reply.
The data is coming from a program that outputs the CSV without any double quotes (consistently). The company is a big one and I don't know if they want to change their process for me. I'm just a small fish to them.
I am importing the data into SQL Server but this particular file failed to load correctly. SQL Server is using a comma as a delimiter to do the import. When it sees the extra comma the data goes into the wrong columns for this particular case.
Here is what the file looks like:
Here is another one with no double quotes:
I think the company is aware of the issue caused by the extra comma. Notice how they include double quotes around the problem entry (first one) and not for the second one (it has no extra comma). This difference can be used to come up with a solution. I think the algorithm is as follows:
- Look at the file and look for a starting double quote.
- In the same line look for the ending double quote.
- If you find a comma between positions 1 and 2 then remove it.
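The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name Remove-QuotedComma is hypothetical, the sample line is made up, and the pattern assumes a quoted value never contains an escaped ("") double quote.

```powershell
# Hypothetical helper (not from the thread): removes commas that fall between
# a pair of double quotes and leaves the delimiter commas untouched.
function Remove-QuotedComma {
    param([string] $Line)

    # Strips every comma from one matched quoted span.
    $stripCommas = [System.Text.RegularExpressions.MatchEvaluator] {
        param($match)
        $match.Value.Replace(',', '')
    }

    # "[^"]*" matches an opening quote, any non-quote characters, and the closing quote.
    # Assumption: quoted values never contain escaped ("") double quotes.
    [regex]::Replace($Line, '"[^"]*"', $stripCommas)
}

# Made-up sample line; only the quoted name contains an extra comma.
Remove-QuotedComma -Line 'QQQ,"Invesco QQQ Trust, Series 1",100'
```

Lines without any double quotes pass through unchanged. For a large file, the function could be applied line by line, for example by piping Get-Content through ForEach-Object into Set-Content so the data is streamed; how that performs on millions of rows has not been measured here.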
There is probably a better approach than mine.
There are close to 7 million records so performance is important.
Can this be done in PowerShell?
Thank you
- LainRobertson (Silver Contributor)
Hi manny213 ,
Mixing the use of double-quotes isn't common but it still works, as shown using the following example based on my earlier post.
Example
What are you actually trying to achieve via PowerShell? Do you want to overwrite the original file; create a new file; just output the data, etc.?
Cheers,
Lain