The existing answers work well with the sample input:
- Wiktor Stribiżew's helpful answer, which uses a regular expression to identify double-quoted fields that contain no , characters, first loads the entire input file into memory, which allows replacing the input file with the results in a single pipeline. While this is convenient (and faster than line-by-line processing), the caveat is that it may not be an option for large input files.
- markg's useful answer, which splits the lines into fields by ", characters, is an alternative for large input files, since it uses the pipeline to process the input lines one at a time. (As a result, the input file cannot be directly updated with the results in the same pipeline.)
If we generalize the OP's requirement to also cover fields with embedded " characters, a different approach is needed:
That is, the following fields should retain their enclosing double quotes:
- (optionally) double-quoted fields with embedded , characters; e.g., "1234 Main St, New York, NY"
- (optionally) double-quoted fields with embedded " characters, which must be escaped as "" per RFC 4180, i.e., doubled; e.g., "Nat ""King"" Cole" (see the sketch after this list)
Note:
- Fields that may contain embedded line breaks are not handled, since they would require a fundamentally different approach: line-by-line processing would no longer be possible.
- A tip of the hat to Wiktor Stribiżew, who came up with a regular expression that robustly matches a double-quoted field containing any number of embedded double quotes escaped as "" : "([^"]*(?:""[^"]*)*)" (demonstrated in isolation below)
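To see that regex at work in isolation, here is a minimal demonstration (the sample line is one of the test lines used below):

```powershell
# Extract the content of each double-quoted field on a sample line;
# note that embedded quotes remain doubled in the captured content.
$line = 'nat2,"Nat ""King"" Cole Lane, NY","cool singer"'
[regex]::Matches($line, '"([^"]*(?:""[^"]*)*)"') |
  ForEach-Object { $_.Groups[1].Value }
# Output:
#   Nat ""King"" Cole Lane, NY
#   cool singer
```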
```powershell
# Create a sample CSV file with double-quoted fields that contain
# just ',', just embedded double quotes ('""'), and both.
@'
bob,"1234 Main St, New York, NY","cool guy"
nat,"Nat ""King"" Cole Lane","cool singer"
nat2,"Nat ""King"" Cole Lane, NY","cool singer"
'@ | Set-Content ./test.csv

Get-Content ./test.csv | ForEach-Object {
  # Match all double-quoted fields on the line, and replace those that
  # contain neither commas nor embedded double quotes with just their content,
  # i.e., with the enclosing double quotes removed.
  ([regex] '"([^"]*(?:""[^"]*)*)"').Replace($_, {
    param($match)
    $fieldContent = $match.Groups[1]
    if ($fieldContent -match '[,"]') { $match } else { $fieldContent }
  })
}
```
This gives:
bob,"1234 Main St, New York, NY",cool guy nat,"Nat ""King"" Cole Lane",cool singer nat2,"Nat ""King"" Cole Lane, NY",cool singer
Updating the input file:
As in markg's answer, due to the line-by-line processing, you cannot directly update the input file with the output in the same pipeline.
To update the input file afterward, write the results to a temporary output file first and then replace the input file with it ( ... represents the Get-Content pipeline from above, except with $csvFile in place of ./test.csv ):
```powershell
$csvFile = 'c:\path\to\some.csv'
$tmpFile = "$env:TEMP\tmp.$PID.csv"
... | Set-Content $tmpFile
if ($?) { Move-Item -Force $tmpFile $csvFile }
```
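Put together, the complete update then looks like this (a sketch only, splicing in the quote-stripping pipeline from above; the input path is a placeholder):

```powershell
$csvFile = 'c:\path\to\some.csv'   # placeholder input path
$tmpFile = "$env:TEMP\tmp.$PID.csv"

Get-Content $csvFile | ForEach-Object {
  # Strip the enclosing double quotes from fields that need no quoting.
  ([regex] '"([^"]*(?:""[^"]*)*)"').Replace($_, {
    param($match)
    $fieldContent = $match.Groups[1]
    if ($fieldContent -match '[,"]') { $match } else { $fieldContent }
  })
} | Set-Content $tmpFile

# Replace the original file only if the pipeline succeeded.
if ($?) { Move-Item -Force $tmpFile $csvFile }
```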
Note that Set-Content uses the system's single-byte, extended-ASCII character encoding by default (even though the help topic falsely claims it is ASCII).
You can specify a different encoding with the -Encoding parameter, but note that UTF-16LE, which Out-File / > use by default, causes the CSV file not to be recognized properly by Excel, for example.
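For instance, to request UTF-8 explicitly (a minimal sketch with a made-up file name; in Windows PowerShell, -Encoding utf8 writes a UTF-8 BOM, which helps Excel recognize the encoding, whereas PowerShell (Core) v6+ defaults to BOM-less UTF-8 anyway):

```powershell
# Minimal demonstration: write a line with explicit UTF-8 encoding.
'bob,"1234 Main St, New York, NY",cool guy' |
  Set-Content -Encoding utf8 ./utf8-test.csv
```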