PowerShell script: remove double quotes from CSV fields if no comma exists inside the double quotes

3 answers

Adapting the code from "How to remove double quotes in a specific column from a CSV file using a Powershell script":

 $csv = 'C:\path\to\your.csv'
 (Get-Content $csv) -replace '(?m)"([^,]*?)"(?=,|$)', '$1' | Set-Content $csv

The regular expression (?m)"([^,]*?)"(?=,|$) matches a " + 0 or more non-commas + " that is followed by a comma or the end of a line. This is achieved with a positive lookahead and the multiline option (?m), which makes $ match at the end of each line, not just at the end of the whole string.
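
As a quick check of what this regex does, here is a minimal sketch that applies it to a single line. The sample line is an assumption, borrowed from the sample data used elsewhere on this page, and the variable names are only illustrative.

```powershell
# Assumed sample line: one unquoted field, one quoted field with commas,
# and one quoted field without commas.
$line = 'bob,"1234 Main St, New York, NY","cool guy"'

# Same regex as in the answer: remove the quotes only around fields
# that contain no comma and are followed by a comma or the end of the line.
$result = $line -replace '(?m)"([^,]*?)"(?=,|$)', '$1'

$result   # -> bob,"1234 Main St, New York, NY",cool guy
```

The field with embedded commas keeps its quotes because [^,] cannot step over a comma, so no match can start at its opening quote.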


+3

I don't know exactly what the rest of your script looks like, but try something along these lines:

 (("bob","1234 Main St, New York, NY","cool guy") -split '"' |
   ForEach-Object { if ($_ -match ",") { '"' + $_ + '"' } else { $_ } }) -join ","
+1

The existing answers work well with the sample input:

  • Wiktor Stribiżew's helpful answer, which uses a regular expression to identify double-quoted fields that do not contain a comma, loads the entire input file into memory first, which allows replacing the input file with the results in a single pipeline.
    While this is convenient - and faster than line-by-line processing - the caveat is that it may not be an option for large input files.
  • markg's helpful answer, which splits lines into fields by " chars., is an alternative for large input files, since it uses the pipeline to process the input lines one by one.
    (As a consequence, the input file cannot be updated directly with the result.)

If we generalize the OP's requirement to also handle fields with embedded " chars., we need a different approach:

The following fields must then retain their enclosing double quotes:

  • (necessarily) double-quoted fields with embedded , chars.; e.g.,
    "1234 Main St, New York, NY"
  • (necessarily) double-quoted fields with embedded " chars., which per RFC 4180 must be escaped as "", i.e., doubled; e.g.,
    "Nat ""King"" Cole"

Note:
- We are not dealing with fields that may contain embedded line breaks, since that would require a fundamentally different approach: self-contained line-by-line processing would no longer be possible.
- Tip of the hat to Wiktor Stribiżew, who came up with the regex for robustly matching a double-quoted field with any number of embedded double quotes escaped as "": "([^"]*(?:""[^"]*)*)"
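
To see what this field-matching regex captures on its own, here is a minimal sketch that lists the content of every double-quoted field on one line. The sample line and the variable names are assumptions made for illustration.

```powershell
# Assumed sample line with a field that contains both escaped quotes and a comma.
$line = 'nat2,"Nat ""King"" Cole Lane, NY","cool singer"'

# Capture group 1 holds the field content without the enclosing double quotes.
$fields = [regex]::Matches($line, '"([^"]*(?:""[^"]*)*)"') |
  ForEach-Object { $_.Groups[1].Value }

$fields
# -> Nat ""King"" Cole Lane, NY
# -> cool singer
```

Because "" is consumed as a unit inside the group, an escaped quote never ends the field early. The unquoted first field (nat2) is not matched at all.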

 # Create sample CSV file with double-quoted fields that contain
 # just ',', just embedded double quotes ('""'), and both.
 @'
 bob,"1234 Main St, New York, NY","cool guy"
 nat,"Nat ""King"" Cole Lane","cool singer"
 nat2,"Nat ""King"" Cole Lane, NY","cool singer"
 '@ | Set-Content ./test.csv

 Get-Content ./test.csv | ForEach-Object {
   # Match all double-quoted fields on the line, and replace those that
   # contain neither commas nor embedded double quotes with just their content,
   # i.e., with enclosing double quotes removed.
   ([regex] '"([^"]*(?:""[^"]*)*)"').Replace($_, { param($match)
     $fieldContent = $match.Groups[1]
     if ($fieldContent -match '[,"]') { $match } else { $fieldContent }
   })
 }

This gives:

 bob,"1234 Main St, New York, NY",cool guy
 nat,"Nat ""King"" Cole Lane",cool singer
 nat2,"Nat ""King"" Cole Lane, NY",cool singer

Updating the input file:

As with markg's answer, because of the line-by-line processing you cannot update the input file directly with the output in the same pipeline.
To update the input file afterwards, write to a temporary output file and then replace the input file with it ( ... represents the Get-Content pipeline from above, only with $csvFile instead of ./test.csv ):

 $csvFile = 'c:\path\to\some.csv'
 $tmpFile = "$env:TEMP\tmp.$PID.csv"
 ... | Set-Content $tmpFile
 if ($?) { Move-Item -Force $tmpFile $csvFile }
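
For reference, a self-contained sketch of the whole round trip might look like the following. The file locations under the system temp directory and the two sample lines are assumptions for illustration; substitute your own CSV path.

```powershell
# Hypothetical paths in the temp directory, so the sketch does not touch real data.
$csvFile = Join-Path ([IO.Path]::GetTempPath()) "sample.$PID.csv"
$tmpFile = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"

# Assumed sample input.
@(
  'bob,"1234 Main St, New York, NY","cool guy"'
  'nat,"Nat ""King"" Cole Lane","cool singer"'
) | Set-Content $csvFile

$re = [regex] '"([^"]*(?:""[^"]*)*)"'

# Process line by line and write the result to the temporary file.
Get-Content $csvFile | ForEach-Object {
  $re.Replace($_, { param($match)
    $fieldContent = $match.Groups[1].Value
    # Keep the quotes if the content has a comma or an (escaped) double quote.
    if ($fieldContent -match '[,"]') { $match.Value } else { $fieldContent }
  })
} | Set-Content $tmpFile

# Replace the input file only if the pipeline succeeded.
if ($?) { Move-Item -Force $tmpFile $csvFile }

$updated = Get-Content $csvFile
$updated
# -> bob,"1234 Main St, New York, NY",cool guy
# -> nat,"Nat ""King"" Cole Lane",cool singer
```

Writing to a separate file first means a failure in the middle of the pipeline leaves the original file untouched.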

Note that in Windows PowerShell, Set-Content by default uses the system's legacy single-byte, extended-ASCII encoding (the active ANSI code page), even though the help topic falsely states ASCII.

With the -Encoding parameter you can specify a different encoding, but note that UTF-16LE, which is the default for Out-File / >, causes the CSV file not to be recognized properly by Excel, for example.
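
As a minimal sketch of specifying the encoding explicitly, the snippet below writes two lines as UTF-8 and reads them back. The file name and sample data are hypothetical. The exact meaning of UTF8 depends on the edition: Windows PowerShell writes a BOM, while PowerShell 6 and later write UTF-8 without one (there, utf8BOM requests a BOM).

```powershell
# Hypothetical output path, used only for this illustration.
$outFile = Join-Path ([IO.Path]::GetTempPath()) "encoding-demo.$PID.csv"

# Build a non-ASCII character at runtime so the script file itself stays pure ASCII.
$name = "zo$([char]0xEB)"

# Write with an explicit encoding instead of relying on the default.
'name,note', "$name,cool guy" | Set-Content -Encoding UTF8 $outFile

# Read back with the same encoding to confirm the round trip.
$roundTrip = Get-Content -Encoding UTF8 $outFile
$roundTrip

Remove-Item $outFile
```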

0

Source: https://habr.com/ru/post/987229/

