I have a file.csv with ~15k lines that look like this:
SAMPLE_TIME, POS, OFF, HISTOGRAM
2015-07-15 16:41:56, 0-0-0-0-3, 1, 2,0,5,59,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,
2015-07-15 16:42:55, 0-0-0-0-3, 1, 0,0,5,9,0,0,0,0,0,2,0,0,0,50,0,
2015-07-15 16:43:55, 0-0-0-0-3, 1, 0,0,5,5,0,0,0,0,0,2,0,0,0,0,4,0,0,0,
2015-07-15 16:44:56, 0-0-0-0-3, 1, 2,0,5,0,0,0,0,0,0,2,0,0,0,6,0,0,0,0
I want to import it into a pandas.DataFrame, with arbitrary names assigned to the columns that have no header, and with short rows padded with nan, something like this:
SAMPLE_TIME, POS, OFF, HISTOGRAM 1 2 3 4 5 6
2015-07-15 16:41:56, 0-0-0-0-3, 1, 2, 0, 5, 59, 4, 0, 0,
2015-07-15 16:42:55, 0-0-0-0-3, 1, 0, 0, 5, 0, 6, 0, nan
2015-07-15 16:43:55, 0-0-0-0-3, 1, 0, 0, 5, 0, 7, nan nan
2015-07-15 16:44:56, 0-0-0-0-3, 1, 2, 0, 5, 0, 0, 2, nan
I could not import it directly. I tried other approaches, for example passing an explicit header, but had no luck. The only way I could get it to work was to manually add the header to the .csv file, which defeats the purpose of automation!
Then I tried this solution. Doing this:
lines = list(csv.reader(open('file.csv')))
header, values = lines[0], lines[1:]
correctly reads the file and gives me a list of ~15k elements. Each element is a list of strings, where each string is a correctly parsed data field from the file. But when I try to do this:
data = {h: v for h, v in zip(header, zip(*values))}
df = pd.DataFrame.from_dict(data)
or that:
data2 = {h: v for h, v in zip(str(xrange(16)), zip(*values))}
df2 = pd.DataFrame.from_dict(data)
then the columns without a header disappear and the column order is completely mixed up. Any idea of a possible solution?
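A likely cause of the missing columns is that zip stops at its shortest input: zip(header, ...) yields only as many pairs as there are header names, and zip(*values) is cut to the length of the shortest row. The mixed order is consistent with plain dicts being unordered on the Python version that has xrange. The sketch below shows one way the rows could be padded and the header extended before building the frame. It is only an illustration: the SAMPLE text, the read_ragged helper, and the choice of naming extra columns 1, 2, 3, ... are assumptions, not part of the original code.

```python
import csv
import io

# Hypothetical stand-in for file.csv, shortened from the rows in the question.
SAMPLE = """SAMPLE_TIME, POS, OFF, HISTOGRAM
2015-07-15 16:41:56, 0-0-0-0-3, 1, 2,0,5,59,0,
2015-07-15 16:42:55, 0-0-0-0-3, 1, 0,0,5,
2015-07-15 16:43:55, 0-0-0-0-3, 1, 0,0,5,5,0,0
"""


def read_ragged(handle):
    """Return (columns, rows) with every row padded to the widest row."""
    # skipinitialspace drops the blank that follows each comma.
    lines = list(csv.reader(handle, skipinitialspace=True))
    header, values = lines[0], lines[1:]
    # A trailing comma produces an empty last field; drop it.
    values = [row[:-1] if row and row[-1] == "" else row for row in values]
    width = max(len(row) for row in values)
    # Name the header-less columns "1", "2", ... after the real header names.
    columns = header + [str(i) for i in range(1, width - len(header) + 1)]
    # Pad short rows with None so all rows have the same length.
    rows = [row + [None] * (width - len(row)) for row in values]
    return columns, rows


columns, rows = read_ragged(io.StringIO(SAMPLE))
```

Passing the result to pd.DataFrame(rows, columns=columns) should then keep every column in file order, with the padded cells showing up as missing values. The fields are still strings at that point, so a numeric conversion would be a separate step.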