Reading rich text using python

Question

I would like to use Python to read and write files in the following format:

    #h -F, field1 field2 field3
    a,b,c
    d,e,f
    # some comments
    g,h,i

This file is very similar to a typical CSV, except for the following:

The header line starts with #h.
The second element of the header line is a tag that specifies the delimiter (here -F, means a comma).
The remaining header elements are field names (always separated by a single space).
Comment lines always begin with # and can be scattered throughout the file.

Is it possible to use csv.DictReader() and csv.DictWriter() to read and write these files?

python csv

Dave Feb 07 '12 at 14:48

2 answers

Suppose the input file is open as input. First read the header line:

 header = input.readline()

Parse the separator and field names from it and use them to build the DictReader. Then, instead of passing input itself to the DictReader, pass the generator expression

 (ln for ln in input if ln[0] != '#')

to skip comments.
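
Put together, this approach might look like the sketch below. It is written for Python 3 and reads from an in-memory string rather than a real file; the name source and the sample data are assumptions for illustration, and the delimiter is assumed to be the last character of the -F, tag.

```python
import csv
import io

# Sample data in the question's format, held in memory for illustration.
source = io.StringIO(
    "#h -F, field1 field2 field3\n"
    "a,b,c\n"
    "d,e,f\n"
    "# some comments\n"
    "g,h,i\n"
)

# First line: "#h", the delimiter tag, then the field names.
header = source.readline().split()
delimiter = header[1][-1]   # "-F," -> ","
fields = header[2:]

# Generator expression that drops comment lines before csv sees them.
data_lines = (ln for ln in source if ln[0] != "#")

rows = list(csv.DictReader(data_lines, fieldnames=fields, delimiter=delimiter))
for row in rows:
    print(dict(row))
```

Because the generator expression is lazy, the file is still read one line at a time; nothing is loaded up front.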

Fred foo Feb 07 '12 at 14:56

unutbu · Accepted Answer · 2012-02-07T14:55:01+0000

You can parse the first line separately to find the separator and field names:

    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
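
To make the two indexing steps concrete, here is a small sketch that wraps the header parsing in a function. The name parse_header and the validation checks are assumptions added for illustration, not part of the answer; it assumes the delimiter is the single character that follows -F.

```python
def parse_header(line):
    """Split a '#h -F<delim> name1 name2 ...' line into (delimiter, fields)."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "#h":
        raise ValueError("not a header line: %r" % line)
    tag = parts[1]
    if len(tag) != 3 or not tag.startswith("-F"):
        raise ValueError("unexpected delimiter tag: %r" % tag)
    delimiter = tag[-1]   # last character of the tag, e.g. "-F," -> ","
    fields = parts[2:]    # remaining space-separated tokens are field names
    return delimiter, fields


delimiter, fields = parse_header("#h -F, field1 field2 field3\n")
print(delimiter, fields)
```

Since the line is split on whitespace, this would not work for a file whose delimiter is itself a space or a tab.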

Note that csv.DictReader accepts any iterable of lines as its first argument. So, to skip comments, you can wrap f in a generator (skip_comments) that yields only the non-comment lines:

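    # Python 2 code (print statement, binary file mode)
    import csv

    def skip_comments(iterable):
        # Yield only the lines that are not comments.
        for line in iterable:
            if not line.startswith('#'):
                yield line

    with open('data.csv', 'rb') as f:
        # The header line holds the delimiter tag and the field names.
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        for line in csv.DictReader(skip_comments(f), delimiter=delimiter, fieldnames=fields):
            print line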

With the data you posted, this yields

    {'field2': 'b', 'field3': 'c', 'field1': 'a'}
    {'field2': 'e', 'field3': 'f', 'field1': 'd'}
    {'field2': 'h', 'field3': 'i', 'field1': 'g'}

To write a file in this format, you can use a helper function, header, to build the header line:

    def header(delimiter, fields):
        return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))

    with open('data.csv', 'rb') as f:
        with open('output.csv', 'wb') as g:
            firstline = next(f).split()
            delimiter = firstline[1][-1]
            fields = firstline[2:]
            writer = csv.DictWriter(g, delimiter=delimiter, fieldnames=fields)
            g.write(header(delimiter, fields))
            for row in csv.DictReader(skip_comments(f), delimiter=delimiter, fieldnames=fields):
                writer.writerow(row)
                g.write('# comment\n')

Note that you can write to output.csv using either g.write (for header lines or comments) or writer.writerow (for CSV rows).
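
The code in this answer targets Python 2 (binary file modes and the print statement). As a rough Python 3 adaptation, the sketch below does the same round trip in memory. Here io.StringIO stands in for the real files, and the names src and dst are assumptions for illustration; with actual files one would open them in text mode with newline='', as the csv module documentation recommends. The lineterminator='\n' argument is an addition so that CSV rows and hand-written comment lines end the same way.

```python
import csv
import io


def skip_comments(iterable):
    """Yield only the lines that do not start with '#'."""
    for line in iterable:
        if not line.startswith("#"):
            yield line


def header(delimiter, fields):
    """Build the '#h -F<delim> name1 name2 ...' header line."""
    return "#h -F{d} {f}\n".format(d=delimiter, f=" ".join(fields))


src = io.StringIO(
    "#h -F, field1 field2 field3\na,b,c\nd,e,f\n# some comments\ng,h,i\n"
)
dst = io.StringIO()

firstline = next(src).split()
delimiter = firstline[1][-1]
fields = firstline[2:]

writer = csv.DictWriter(dst, fieldnames=fields, delimiter=delimiter,
                        lineterminator="\n")
dst.write(header(delimiter, fields))      # header goes through plain write()
for row in csv.DictReader(skip_comments(src), fieldnames=fields,
                          delimiter=delimiter):
    writer.writerow(row)                  # data rows go through the csv writer
    dst.write("# comment\n")              # comments again via plain write()

print(dst.getvalue())
```

Note that the original comments are dropped on the way in; only comments written explicitly on the way out appear in the result.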
