How can I remove newlines from all quoted text fragments in a file?

I exported a CSV file from a database. Some fields are long text fragments that may contain newline characters. What is the easiest way to remove only the newlines that are inside double quotes, while keeping the rest?

I don't mind whether it is a Bash one-liner or a simple script, as long as it works.

For instance,

    "Value1", "Value2", "This is a longer piece
    of text with
    newlines in it.", "Value3"
    "Value4", "Value5", "Another value", "value6"

The newlines within the longer piece of text should be removed, but not the newline separating the two records.

3 answers

In Python:

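    import csv

    # Python 3: newline="" lets the csv module handle line breaks inside fields
    with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
        w = csv.writer(dst)
        for record in csv.reader(src):
            # str has no remove() method; replace() strips the embedded newlines
            w.writerow([s.replace("\n", "") for s in record])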
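
The same idea can be tried on the sample from the question without touching any files. This is a minimal sketch, assuming Python 3. The helper name `strip_field_newlines` and its `replacement` parameter are made up for illustration, and `skipinitialspace=True` is an added assumption: the sample has a space after each comma, and without that option the reader does not treat the following quote as the start of a quoted field.

```python
import csv
import io


def strip_field_newlines(csv_text, replacement=""):
    """Return csv_text with line breaks inside fields replaced (hypothetical helper)."""
    out = io.StringIO()
    # QUOTE_ALL keeps every field quoted, as in the sample input
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # skipinitialspace=True ignores the space that follows each comma
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for record in reader:
        writer.writerow(
            [f.replace("\r\n", replacement).replace("\n", replacement) for f in record]
        )
    return out.getvalue()


sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    'of text with\n'
    'newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
print(strip_field_newlines(sample), end="")
```

Removing the newlines outright glues the neighbouring words together when the lines carry no surrounding spaces. Passing `replacement=" "` is one way to keep them apart.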

Here is the solution in Python:

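    import re

    # Read the whole file so that quoted fragments spanning several lines can match
    text = open("input.csv").read()

    # Non-greedy match from one double quote to the next; DOTALL lets "." match newlines
    pattern = re.compile(r'".*?"', re.DOTALL)
    print(pattern.sub(lambda x: x.group().replace('\n', ''), text))

See it running online: ideone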
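
To see what the pattern does and does not touch, here is a small sketch that applies it to the sample from the question. It assumes Python 3, and the names `QUOTED` and `join_quoted_lines` are invented for this example. The substitution pairs quotes up in order, so it depends on the quotes in the file being balanced. A quote escaped by doubling (`""`) keeps the count even, so such fields should still be handled.

```python
import re

# Non-greedy match from one double quote to the next; DOTALL lets "." cross newlines
QUOTED = re.compile(r'".*?"', re.DOTALL)


def join_quoted_lines(text):
    """Remove newlines that fall between a pair of double quotes (hypothetical helper)."""
    return QUOTED.sub(lambda m: m.group().replace("\n", ""), text)


sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    'of text with\n'
    'newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
# The newline between the two records sits outside any quotes, so it survives
print(join_quoted_lines(sample), end="")
```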


This is very simplistic, but may work for you:

    # cat <<\! | sed ':a;/"$/{P;D};N;s/\n//g;ba'
    > "Value1", "Value2", "This is a longer piece
    > of text with
    > newlines in it.", "Value3"
    > "Value4", "Value5", "Another value", "value6"
    > !
    "Value1", "Value2", "This is a longer piece of text with newlines in it.", "Value3"
    "Value4", "Value5", "Another value", "value6"
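
The sed script treats a line as complete only when it ends in a double quote, so it relies on every record ending with a quoted field. A related line-based idea, sketched here in Python and not a translation of the sed script, is to keep joining physical lines while the number of double quotes seen so far is odd. The function name `join_open_quotes` is made up for this example, and the sketch assumes quotes appear only as field delimiters or as doubled escapes.

```python
def join_open_quotes(lines):
    """Yield logical records, merging physical lines until the quotes balance."""
    pending = ""
    for line in lines:
        pending += line.rstrip("\n")
        # An odd number of quotes means a quoted field is still open
        if pending.count('"') % 2 == 0:
            yield pending
            pending = ""
    if pending:
        # Unbalanced quotes at end of input: emit the remainder as it is
        yield pending


sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    'of text with\n'
    'newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
for record in join_open_quotes(sample.splitlines(keepends=True)):
    print(record)
```

The same function can be fed an open file object, since iterating over a file yields one physical line at a time.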

Source: https://habr.com/ru/post/1382392/

