Reading and writing a CSV-like text file using Python

I would like to use Python to read and write files in the following format:

    #h -F, field1 field2 field3
    a,b,c
    d,e,f
    # some comments
    g,h,i

This file is very similar to a typical CSV, except for the following:

  • The header line starts with #h
  • The second element of the header line is a flag that specifies the delimiter (-F, in the example above means a comma).
  • The remaining header elements are field names (always separated by a single space).
  • Comment lines always begin with # and can be scattered throughout the file.

Is it possible to use csv.DictReader() and csv.DictWriter() to read and write these files?

+6
2 answers

You can parse the first line separately to find the delimiter and field names:

    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
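To make the indexing concrete, here is a small sketch that applies the same three steps to the sample header line from the question. The helper name parse_header is hypothetical, and the sketch assumes the delimiter is always the last character of the -F flag.

```python
def parse_header(line):
    """Split a '#h -F<delim> name1 name2 ...' header into (delimiter, fields)."""
    parts = line.split()      # ['#h', '-F,', 'field1', 'field2', 'field3']
    delimiter = parts[1][-1]  # the last character of '-F,' is ','
    fields = parts[2:]        # everything after the flag is a field name
    return delimiter, fields


delimiter, fields = parse_header('#h -F, field1 field2 field3\n')
print(delimiter)  # ,
print(fields)     # ['field1', 'field2', 'field3']
```

Because split() breaks on whitespace, this approach would not recover a tab or space delimiter; such files would need the flag to be parsed differently.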

Note that csv.DictReader accepts any iterable of lines as its first argument. So, to skip comments, you can wrap f in a generator (skip_comments) that yields only the non-comment lines:

    import csv

    def skip_comments(iterable):
        # Yield only the lines that are not comments.
        for line in iterable:
            if not line.startswith('#'):
                yield line

    # newline='' is the mode the csv module recommends for files in Python 3.
    with open('data.csv', newline='') as f:
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        for row in csv.DictReader(skip_comments(f), delimiter=delimiter, fieldnames=fields):
            print(row)

With the sample data from the question, this prints (on Python 3.8 or later, where each row is a plain dict):

    {'field1': 'a', 'field2': 'b', 'field3': 'c'}
    {'field1': 'd', 'field2': 'e', 'field3': 'f'}
    {'field1': 'g', 'field2': 'h', 'field3': 'i'}

To write a file in this format, you can use the header helper function:

    # Reuses "import csv" and skip_comments() from the snippet above.

    def header(delimiter, fields):
        # Build the '#h -F<delimiter> field1 field2 ...' header line.
        return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))

    with open('data.csv', newline='') as f, open('output.csv', 'w', newline='') as g:
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        # lineterminator='\n' keeps the rows consistent with the lines written via g.write.
        writer = csv.DictWriter(g, delimiter=delimiter, fieldnames=fields, lineterminator='\n')
        g.write(header(delimiter, fields))
        for row in csv.DictReader(skip_comments(f), delimiter=delimiter, fieldnames=fields):
            writer.writerow(row)
            g.write('# comment\n')

Note that you can write to output.csv using either g.write (for the header line or comment lines) or writer.writerow (for CSV rows).
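The example above copies an existing file. As a rough sketch of writing such a file from scratch, the following mixes plain write calls with csv.DictWriter in the same way. The function name write_table and its comment parameter are hypothetical, and an in-memory io.StringIO buffer stands in for a real file so the snippet is self-contained.

```python
import csv
import io


def header(delimiter, fields):
    """Build the '#h -F<delimiter> field1 field2 ...' header line."""
    return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))


def write_table(out, delimiter, fields, rows, comment=None):
    """Write the header, an optional comment line, then one CSV row per dict."""
    out.write(header(delimiter, fields))
    if comment is not None:
        out.write('# {}\n'.format(comment))
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=delimiter,
                            lineterminator='\n')
    writer.writerows(rows)


# Stand-in for a file opened with open(path, 'w', newline='').
buf = io.StringIO()
write_table(
    buf, ',', ['field1', 'field2', 'field3'],
    [{'field1': 'a', 'field2': 'b', 'field3': 'c'},
     {'field1': 'd', 'field2': 'e', 'field3': 'f'}],
    comment='some comments',
)
print(buf.getvalue())
```

Passing lineterminator='\n' is a choice made here so that the CSV rows end the same way as the manually written header and comment lines; the csv module's default is '\r\n'.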

+8

Suppose the input file is open as input. First read in the header:

 header = input.readline() 

Parse the delimiter and field names from it and use them to construct the DictReader. Then, instead of passing input itself to the reader, pass the generator expression

    (ln for ln in input if ln[0] != '#')

to skip comments.
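Put together, the approach in this answer might look like the sketch below. It assumes the header layout described in the question, uses an in-memory io.StringIO with the question's sample data in place of a real file, and names the handle infile rather than input only to avoid shadowing the built-in.

```python
import csv
import io

# In-memory stand-in for the opened file; contents are the question's sample.
infile = io.StringIO(
    '#h -F, field1 field2 field3\n'
    'a,b,c\n'
    'd,e,f\n'
    '# some comments\n'
    'g,h,i\n'
)

# Read and parse the header line first.
header = infile.readline().split()
delimiter = header[1][-1]
fields = header[2:]

# Generator expression that drops comment lines before csv sees them.
data_lines = (ln for ln in infile if not ln.startswith('#'))

reader = csv.DictReader(data_lines, fieldnames=fields, delimiter=delimiter)
rows = [dict(row) for row in reader]
print(rows)
```

The generator expression is lazy, so the file is still read line by line rather than loaded into memory at once.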

0

Source: https://habr.com/ru/post/907854/

