Match unescaped quotes in quoted CSV

I looked through several Qaru posts with similar titles, but none of the accepted answers worked for me.

I have a CSV file where each “cell” of data is comma-delimited and quoted (including numbers). Each line ends with a newline character.

Some text “cells” contain quotation marks, and I want to use a regex to find them so that I can escape them.

Example line:

"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n

I want to match only the " in E 60" and AD"8, but not any other ".

What regex (preferably for Python) can I use to do this?

2 answers

EDIT: Updated with the regex from @sundance so that quotes at the beginning of the line and before the trailing newline are not matched.

You can try replacing (here, removing) only the quotes that are not adjacent to a comma, the beginning of the line, or the newline:

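import re

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
# Remove quotes that are not at the line start, not after a comma, and not before a comma or the line end
newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)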
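If the goal is to escape the inner quotes rather than remove them, the same pattern can be reused with a different replacement. The following is only a sketch: doubling the quote (the usual CSV escaping convention) and the helper name double_inner_quotes are assumptions, not part of the answer above. It then reads the repaired line back with Python's csv module to check the result.

```python
import csv
import re

# Same pattern as in the answer: a quote that is not at the start of the
# line, not preceded by a comma, and not followed by a comma or line end.
INNER_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')

def double_inner_quotes(line):
    """Hypothetical helper: replace each inner quote with two quotes."""
    return INNER_QUOTE.sub('""', line)

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
fixed = double_inner_quotes(line)

# csv.reader accepts any iterable of lines; parse the single repaired line.
row = next(csv.reader([fixed]))
print(row)
```

Like the original regex, this assumes that an inner quote is never directly followed by a comma; such a quote would be treated as a closing quote and left alone.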

Instead of a regular expression, this approach uses Python string methods to find and escape only the quotes that lie between the leftmost and rightmost quotes of each item.

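It uses .find() and .rfind() to locate the outer " of each item and replaces any " between them with \". It does not depend on what sits next to the outer quotes, so empty items (,,) and a trailing '\n' are handled as well.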

def escape_internal_quotes(item):
    left = item.find('"') + 1
    right = item.rfind('"')
    if left < right:
        # only do the substitution if two surrounding quotes are found
        item = item[:left] + item[left:right].replace('"', '\\"') + item[right:]
    return item

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
escaped = [escape_internal_quotes(item) for item in line.split(',')]
print(repr(','.join(escaped)))

Output:

'"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60\\"","AD\\"8"\n'
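To see how the function behaves at the edges, here is a small sketch that runs it on a few awkward items. The function body is repeated so the block runs on its own; escape_line is a hypothetical helper added for illustration.

```python
def escape_internal_quotes(item):
    # Same logic as the function above, repeated so this block runs on its own.
    left = item.find('"') + 1
    right = item.rfind('"')
    if left < right:
        item = item[:left] + item[left:right].replace('"', '\\"') + item[right:]
    return item

def escape_line(line):
    # Hypothetical helper: apply the function to every comma-separated item.
    return ','.join(escape_internal_quotes(item) for item in line.split(','))

# Items that must come back unchanged: empty, unquoted, a lone quote,
# and a normally quoted item with nothing to escape.
for item in ['', 'abc', '"', '"abc"']:
    assert escape_internal_quotes(item) == item

# Whitespace around the outer quotes does not matter.
print(repr(escape_internal_quotes(' "E 60"" ')))

# A short line with an empty item and a trailing newline.
print(repr(escape_line('"1",,"AD"8"\n')))
```

One caveat: line.split(',') splits on every comma, so a field that itself contains a comma is broken into several items. This sketch therefore assumes that no field contains a comma.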

Source: https://habr.com/ru/post/1675665/
