I get CSV data like this by making an HTTP request for a CSV file. The string is badly malformed:

response = '"Subject";"Start Date";"Start Time";"End Date";"End Time";"All day event";"Description""Play football";"16/11/2009";"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";"20/11/2009";"07:00 PM";"20/11/2009";"08:00 PM";"false";""' 

I want to convert it into a list of dictionaries:

[{"Subject": "Play football", "Start Date": "16/11/2009", "Start Time": "10:00 PM", "End Date": "16/11/2009", "End Time": "11:00 PM", "All day event": false, "Description": ""},
 {"Subject": "Watch 2012", "Start Date": "20/11/2009", "Start Time": "07:00 PM", "End Date": "20/11/2009", "End Time": "08:00 PM", "All day event": false, "Description": ""}]

I tried to solve this with Python's csv module, but it did not work:

import csv
from cStringIO import StringIO

>>> str_obj = StringIO(response)
>>> reader = csv.reader(str_obj, delimiter=';')
>>> [x for x in reader] 
    [['Subject',
      'Start Date',
      'Start Time',
      'End Date',
      'End Time',
      'All day event',
      'Description"Play football',
      '16/11/2009',
      '10:00 PM',
      '16/11/2009',
      '11:00 PM',
      'false',
      '"Watch 2012',
      '20/11/2009',
      '07:00 PM',
      '20/11/2009',
      '08:00 PM',
      'false',
      '']]

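That is the result I get: everything ends up in a single row, and wherever a line break is missing, two fields are glued together (for example 'Description"Play football').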

Any help would be greatly appreciated. Thanks in advance.


Here's a pyparsing solution:

from pyparsing import QuotedString, Group, delimitedList, OneOrMore

# a row of headings or data is a list of quoted strings, delimited by ';'s
qs = QuotedString('"')
datarow = Group(delimitedList(qs, ';'))

# an entire data set is a single data row containing the headings, followed by
# one or more data rows containing the data
dataset_parser = datarow("headings") + OneOrMore(datarow)("rows")

# parse the returned response
data = dataset_parser.parseString(response)

# create dict by zipping headings with each row data values
datadict = [dict(zip(data.headings, row)) for row in data.rows]

print datadict

This prints:

[{'End Date': '16/11/2009', 'Description': '', 'All day event': 'false', 
  'Start Time': '10:00 PM', 'End Time': '11:00 PM', 'Start Date': '16/11/2009', 
  'Subject': 'Play football'}, 
 {'End Date': '20/11/2009', 'Description': '', 'All day event': 'false', 
  'Start Time': '07:00 PM', 'End Time': '08:00 PM', 'Start Date': '20/11/2009', 
  'Subject': 'Watch 2012'}]

This also handles quoted strings that contain embedded semicolons.
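If you would rather not add a dependency, the same idea can be sketched with only the standard library's re module (Python 3). This is not part of the answer above; the names FIELD, parse_glued_rows and to_dicts are hypothetical, and the sketch assumes values never contain a double quote. A quoted field preceded by ';' belongs to the current row; a quoted field with no ';' in front of it starts a new row.

```python
import re

# One field: an optional ';' separator followed by a double-quoted value.
# Assumption: values never contain a double quote, so "" is always either an
# empty value or the seam where two rows were glued together.
FIELD = re.compile(r'(;?)"([^"]*)"')


def parse_glued_rows(text):
    """Split the glued-together response into rows (lists of strings)."""
    text = text.strip()
    rows = []
    pos = 0
    while pos < len(text):
        m = FIELD.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at position %d" % pos)
        separator, value = m.groups()
        if separator and rows:
            rows[-1].append(value)  # preceded by ';': same row
        else:
            rows.append([value])    # no ';' before it: a new row starts
        pos = m.end()
    return rows


def to_dicts(text):
    """First row = headings; zip every other row with them."""
    headings, *data = parse_glued_rows(text)
    return [dict(zip(headings, row)) for row in data]


response = ('"Subject";"Start Date";"Start Time";"End Date";"End Time";'
            '"All day event";"Description""Play football";"16/11/2009";'
            '"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";'
            '"20/11/2009";"07:00 PM";"20/11/2009";"08:00 PM";"false";""')
events = to_dicts(response)
print(events[1]["Subject"])  # Watch 2012
```

As with the grammar above, a semicolon inside quotes stays part of the value, because the pattern only treats ';' as a separator when it sits outside the quotes.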


Here is one approach.

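I notice that there is no delimiter between rows. To clean up the input, I make a few assumptions:

  • a "" is either the boundary between two rows or an empty field,
  • values contain no escaped quotes (e.g.: no ""),
  • empty fields are written as an empty quoted string (i.e.: ""),
  • every value is wrapped in double quotes (").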

I extended the sample data a little to test these assumptions:

>>> response = '"Subject";"Start Date";"Start Time";"End Date";"End Time";"All day event";"Description""Play football";"16/11/2009";"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";"20/11/2009";"07:00 PM";"";"08:00 PM";"false";"""";"17/11/2009";"9:00 AM";"17/11/2009";"10:00 AM";"false";""'    

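Compared with the original response:

  • I removed the End Date of "Watch 2012".
  • I added a third event with an empty Subject ("").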

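Treat every "" as a row separator for now: replace it with a pipe (|), which does not occur in the data, and strip the remaining quotes: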

>>> response.replace('""', '|').replace('"', '')
'Subject;Start Date;Start Time;End Date;End Time;All day event;Description|Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;|Watch 2012;20/11/2009;07:00 PM;|;08:00 PM;false;||;17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;|'

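That also turned the legitimately empty fields (e.g.: the End Date of Watch 2012) into pipes. Inside a row, an empty field now shows up as ;|;, so replace ;|; with ;; to restore it: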

>>> response.replace('""', '|').replace('"', '').replace(';|;', ';;')
'Subject;Start Date;Start Time;End Date;End Time;All day event;Description|Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;|Watch 2012;20/11/2009;07:00 PM;;08:00 PM;false;||;17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;|'

Now each row ends at a |. So why not split on |?

>>> response.replace('""', '|').replace('"', '').replace(';|;', ';;').split('|')
['Subject;Start Date;Start Time;End Date;End Time;All day event;Description',
 'Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;',
 'Watch 2012;20/11/2009;07:00 PM;;08:00 PM;false;',
 '',
 ';17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;',
 '']

Close, but there are empty strings ('') in the list. Splitting on | gives an empty string between two adjacent pipes:

>>> "a|b||c".split('|')
['a', 'b', '', 'c']

and after a trailing pipe:

>>> "a||b|c|".split('|')
['a', '', 'b', 'c', '']

So filter out the empty strings:

>>> rows = [row for row in response.replace('""', '|').replace('"', '').replace(';|;', ';;').split('|') if row]
>>> rows
['Subject;Start Date;Start Time;End Date;End Time;All day event;Description',
 'Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;',
 'Watch 2012;20/11/2009;07:00 PM;;08:00 PM;false;',
 ';17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;']

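Within each row, fields are separated by ;. The first row holds the headings, which become the dictionary keys: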

>>> dict_keys = rows[0].split(';')
>>> dict_keys
['Subject',
 'Start Date',
 'Start Time',
 'End Date',
 'End Time',
 'All day event',
 'Description']

Finally, build a dictionary for each of the remaining rows:

>>> import itertools
>>> events = []
>>> for row in rows[1:]:
...     d = {}
...     for k, v in itertools.izip(dict_keys, row.split(';')):
...         d[k] = v
...     events.append(d)
... 
>>> events
[{'All day event': 'false',
  'Description': '',
  'End Date': '16/11/2009',
  'End Time': '11:00 PM',
  'Start Date': '16/11/2009',
  'Start Time': '10:00 PM',
  'Subject': 'Play football'},
 {'All day event': 'false',
  'Description': '',
  'End Date': '',
  'End Time': '08:00 PM',
  'Start Date': '20/11/2009',
  'Start Time': '07:00 PM',
  'Subject': 'Watch 2012'},
 {'All day event': 'false',
  'Description': '',
  'End Date': '17/11/2009',
  'End Time': '10:00 AM',
  'Start Date': '17/11/2009',
  'Start Time': '9:00 AM',
  'Subject': ''}]

It works!
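The same steps can be collected into one function. This is a sketch for Python 3, where itertools.izip no longer exists, so dict(zip(...)) replaces the inner loop; the name parse_events is hypothetical, and it makes the same assumptions as the steps above (no '"', '|' or ';' inside values).

```python
def parse_events(response):
    """Turn the glued-together response into a list of dicts (sketch)."""
    cleaned = (response
               .replace('""', '|')     # row seams and empty fields -> '|'
               .replace('"', '')       # drop the remaining quotes
               .replace(';|;', ';;'))  # restore empty fields inside a row
    # Adjacent or trailing pipes leave empty strings behind; drop them.
    rows = [row for row in cleaned.split('|') if row]
    keys = rows[0].split(';')
    return [dict(zip(keys, row.split(';'))) for row in rows[1:]]


# The extended test data from above.
response = ('"Subject";"Start Date";"Start Time";"End Date";"End Time";'
            '"All day event";"Description""Play football";"16/11/2009";'
            '"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";'
            '"20/11/2009";"07:00 PM";"";"08:00 PM";"false";"""";'
            '"17/11/2009";"9:00 AM";"17/11/2009";"10:00 AM";"false";""')
events = parse_events(response)
print(events[1]["End Date"] == "")  # True
```

One limitation carried over from the steps above: str.replace does not handle overlapping matches, so two consecutive empty fields (;"";"";) become ;;|; rather than ;;; and the row gets split in the wrong place.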

Caveats:

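  • If the data contains a | or a ;, this approach breaks.
  • It also breaks if a value contains escaped quotes (e.g.: a Subject such as "Watch ""2012""").
  • "All day event" stays the string 'false'; converting it to a boolean is left to you :D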
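A quick way to see the first caveat in action, using two made-up inputs with one header row and one data row each (the helper split_rows is hypothetical and just repeats the cleaning steps from this answer):

```python
def split_rows(response):
    # The cleaning steps from the answer: "" -> '|', drop quotes, fix ';|;'.
    cleaned = response.replace('""', '|').replace('"', '').replace(';|;', ';;')
    return [row for row in cleaned.split('|') if row]


# Hypothetical inputs: one header row and one data row each.
pipe_in_value = '"Subject";"Description""Match";"home | away"'
semicolon_in_value = '"Subject";"Description""Match";"home; away"'

# A '|' inside a value is mistaken for a row boundary: three rows, not two.
print(split_rows(pipe_in_value))
# ['Subject;Description', 'Match;home ', ' away']

# A ';' inside a value adds a field; zip() then silently drops the extra one.
rows = split_rows(semicolon_in_value)
keys = rows[0].split(';')
print(dict(zip(keys, rows[1].split(';'))))
# {'Subject': 'Match', 'Description': 'home'}
```

The second case is the more dangerous one, because nothing fails: part of the value simply disappears.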


Another option is to put the missing newlines back:
response = response.split(';') # split it into words
response = [w[1:-1] for w in response] # strip off the quotes 
response = [w.replace('""','"\n"') for w in response] # add in the newlines
response = ['"%s"'%w for w in response] # add the quotes back
response = ';'.join(response) 

But that will break if the data contains a ";" character that should have been escaped. Better still, find out what happened to the missing newline characters in the first place.
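Once the newlines are back, the repaired string can be handed to the csv module. A sketch for Python 3, assuming the response from the question; csv.DictReader takes the field names from the first line, so the result is the list of dictionaries the question asks for (with 'false' still a string).

```python
import csv
import io

response = ('"Subject";"Start Date";"Start Time";"End Date";"End Time";'
            '"All day event";"Description""Play football";"16/11/2009";'
            '"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";'
            '"20/11/2009";"07:00 PM";"20/11/2009";"08:00 PM";"false";""')

# The steps from the answer: split on ';', strip each word's outer quotes,
# turn every remaining "" seam into quote-newline-quote, then re-quote.
words = [w[1:-1] for w in response.split(';')]
words = ['"%s"' % w.replace('""', '"\n"') for w in words]
repaired = ';'.join(words)

# With one record per line again, the csv module does the rest.
reader = csv.DictReader(io.StringIO(repaired), delimiter=';')
events = [dict(row) for row in reader]
print(events[1]['Subject'])  # Watch 2012
```

The same restriction applies as in the answer: the initial split on ';' does not know about quotes, so a semicolon inside a value would still corrupt the repair.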


Source: https://habr.com/ru/post/1722993/

