Extracting strings between quotation marks that span multiple lines in Python

I have a file containing several entries. Each entry has the following form:

"field1","field2","field3","field4","field5" 

As a rule, the fields do not contain quotes, but they may contain commas. The problem is that field4 can be split across multiple lines. Thus, an example file might look like this:

    "john","male US","done","Some sample text
    across multiple lines. There
    can be many lines of this","foo bar baz"
    "jane","female UK","done","fields can have , in them","abc xyz"

I want to extract the fields using Python. If no field were split across several lines, it would be simple: extract the strings between the quotes. But I cannot find an easy way to do this when fields span multiple lines.

EDIT: Actually five fields. Sorry about the confusion, if any. The question has been edited to reflect this.

+4
4 answers

I think the csv module can solve this problem. It correctly handles newline characters inside quoted fields:

    import csv

    # newline='' lets the csv module handle line breaks inside quoted fields itself
    with open('infile', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            for field in row:
                print('-- {}'.format(field))

This gives:

    -- john
    -- male US
    -- done
    -- Some sample text
    across multiple lines. There
    can be many lines of this
    -- foo bar baz
    -- jane
    -- female UK
    -- done
    -- fields can have , in them
    -- abc xyz

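The same idea as a self-contained sketch that runs without a file on disk. It uses `io.StringIO` in place of the real file, and the sample data and the helper name `parse_records` are assumptions made for illustration. The last step collapses line breaks inside a field into single spaces, which may or may not be what you want.

```python
import csv
import io

# Hypothetical sample in the question's format; field 4 of the first entry spans three lines.
SAMPLE = (
    '"john","male US","done","Some sample text\n'
    'across multiple lines. There\n'
    'can be many lines of this","foo bar baz"\n'
    '"jane","female UK","done","fields can have , in them","abc xyz"\n'
)


def parse_records(stream):
    """Return a list of rows, replacing line breaks inside fields with spaces."""
    rows = []
    for row in csv.reader(stream):
        # csv.reader keeps embedded newlines in quoted fields; flatten them here.
        rows.append([" ".join(field.splitlines()) for field in row])
    return rows


records = parse_records(io.StringIO(SAMPLE, newline=""))
for record in records:
    print(record)
```

With a real file, `parse_records(open('infile', newline=''))` should behave the same way, since `csv.reader` accepts any iterable of lines.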
+4

The answer to a related question worked for me:

    import re

    with open("test.txt") as f:
        text = f.read()

    # Capture everything between each pair of double quotes, newlines included
    string_list = re.findall(r'"([^"]*)"', text)

At this point, string_list contains your strings. Some of these strings may still have line breaks in them, but you can use

    # string_list is a list, so apply replace() to each string in it
    new_list = [s.replace("\n", " ") for s in string_list]

to clean them up.
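Note that `re.findall` returns one flat list of fields with no record boundaries. Below is a minimal sketch of regrouping them into entries, assuming every entry has exactly five fields and no field contains a double quote, as the question states. The helper name `group_fields` and the sample text are hypothetical.

```python
import re

FIELDS_PER_ENTRY = 5  # assumption taken from the question: five quoted fields per entry

# Hypothetical sample text in the question's format.
text = (
    '"john","male US","done","Some sample text\n'
    'across multiple lines. There\n'
    'can be many lines of this","foo bar baz"\n'
    '"jane","female UK","done","fields can have , in them","abc xyz"\n'
)


def group_fields(text, size=FIELDS_PER_ENTRY):
    """Extract quoted strings and chunk them into entries of `size` fields."""
    # [^"] also matches newlines, so multi-line fields are captured whole.
    fields = re.findall(r'"([^"]*)"', text)
    fields = [f.replace("\n", " ") for f in fields]
    if len(fields) % size != 0:
        raise ValueError("field count is not a multiple of the entry size")
    return [fields[i:i + size] for i in range(0, len(fields), size)]


entries = group_fields(text)
for entry in entries:
    print(entry)
```

The length check is there because a single stray quote would silently shift every later field into the wrong position.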

+1

Try:

    awk -F',' '/pattern if needed/ {print $0}' fname
0

If you control the input to this file, you should sanitize it first, replacing \n with something else ([\n]?) before putting the values into a comma-separated list.

Or, instead of saving plain strings, save them as raw strings (r-strings).

Then use the csv module to parse the file with a predefined delimiter, encoding, and quoting.
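If you do control how the file is written, one way to apply this is to let the `csv` module handle the quoting on output too, so the file round-trips cleanly. This is a sketch under assumptions: it writes to an in-memory buffer instead of a real file, uses `csv.QUOTE_ALL` to match the fully quoted format in the question, and replaces line breaks with a single space. The helper name `sanitize` and the sample rows are hypothetical.

```python
import csv
import io


def sanitize(field, replacement=" "):
    """Replace any line breaks inside a field with `replacement`."""
    return replacement.join(field.splitlines())


# Hypothetical input rows; the fourth field of the first row contains a line break.
rows = [
    ["john", "male US", "done", "Some sample text\nacross multiple lines", "foo bar baz"],
    ["jane", "female UK", "done", "fields can have , in them", "abc xyz"],
]

buffer = io.StringIO(newline="")
# QUOTE_ALL wraps every field in double quotes, as in the question's format.
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
for row in rows:
    writer.writerow([sanitize(field) for field in row])

written = buffer.getvalue()
print(written)

# Reading it back yields one record per physical line.
buffer.seek(0)
read_back = list(csv.reader(buffer))
```

After sanitizing, each entry occupies exactly one line, so even line-oriented tools can process the file.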

0

Source: https://habr.com/ru/post/1499967/

