Storing table columns in a Python dictionary

I have a table stored in an Excel file as follows:

    Species        Garden  Hedgerow  Parkland  Pasture  Woodland
    Blackbird          47        10        40        2         2
    Chaffinch          19         3         5        0         2
    Great Tit          50         0        10        7         0
    House Sparrow      46        16         8        4         0
    Robin               9         3         0        0         2
    Song Thrush         4         0         6        0         0

I am using the Python xlrd library to read this data. I have no problem reading it into a list of lists (with each row of the table stored as a list) using the following code:

    from xlrd import open_workbook

    wb = open_workbook("Sample.xls")
    headers = []
    sdata = []
    for s in wb.sheets():
        print "Sheet:", s.name
        if s.name.capitalize() == "Data":
            for row in range(s.nrows):
                values = []
                for col in range(s.ncols):
                    data = s.cell(row, col).value
                    if row == 0:
                        headers.append(data)
                    else:
                        values.append(data)
                if values:  # skip the header row, which leaves values empty
                    sdata.append(values)

As you can see, headers is a simple list that stores the column headers, and sdata contains the table data stored as a list of lists. Here is what they look like:

headers:

 [u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland'] 

sdata:

 [[u'Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0], [u'Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0], [u'Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0], [u'House Sparrow', 46.0, 16.0, 8.0, 4.0, 0.0], [u'Robin', 9.0, 3.0, 0.0, 0.0, 2.0], [u'Song Thrush', 4.0, 0.0, 6.0, 0.0, 0.0]] 

But I want to store this data in a Python dictionary, with each column header as the key to a list containing all the values in that column. For example (only part of the data is shown to save space):

    data = {
        'Species': ['Blackbird', 'Chaffinch', 'Great Tit'],
        'Garden': [47, 19, 50],
        'Hedgerow': [10, 3, 0],
        'Parkland': [40, 5, 10],
        'Pasture': [2, 0, 7],
        'Woodland': [2, 2, 0]
    }

So my question is: how can I achieve this? I know that I could read the data column by column rather than row by row, as in the code snippet above, but I could not figure out how to store the columns in the dictionary.

Thanks in advance for any help you can provide.

5 answers

Once you have the columns, this is pretty simple:

 dict(zip(headers, sdata)) 

Actually, it looks like sdata in your example is row data. Even so, it is still pretty easy, because you can transpose the table using zip:

 dict(zip(headers, zip(*sdata))) 

One of these two should be what you are asking for.
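To make the transposition concrete, here is a minimal sketch using the first three rows from the question as in-memory lists (no Excel file involved). The name `table` is just an illustrative choice. Note that `zip(*sdata)` yields tuples, so the sketch converts each column to a list to match the structure the question asks for.

```python
# Sample data copied from the question (xlrd returns numbers as floats).
headers = ["Species", "Garden", "Hedgerow", "Parkland", "Pasture", "Woodland"]
sdata = [
    ["Blackbird", 47.0, 10.0, 40.0, 2.0, 2.0],
    ["Chaffinch", 19.0, 3.0, 5.0, 0.0, 2.0],
    ["Great Tit", 50.0, 0.0, 10.0, 7.0, 0.0],
]

# zip(*sdata) transposes rows into columns; each column comes back as a tuple.
columns = zip(*sdata)

# Pair each header with its column, converting the tuples to lists.
table = {header: list(column) for header, column in zip(headers, columns)}

print(table["Species"])  # ['Blackbird', 'Chaffinch', 'Great Tit']
print(table["Garden"])   # [47.0, 19.0, 50.0]
```

If tuples are acceptable as values, the plain `dict(zip(headers, zip(*sdata)))` form is enough.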


1. XLRD

I highly recommend using defaultdict from the collections module. The value for each key starts out as the default value, which in this case is an empty list. I did not catch any specific exception in the except clause; you might want to add more targeted exception handling based on your use case.

    import xlrd
    import sys
    from collections import defaultdict

    result = defaultdict(list)
    workbook = xlrd.open_workbook("/Users/datafireball/Desktop/stackoverflow.xlsx")
    worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])
    headers = worksheet.row(0)
    for index in range(worksheet.nrows)[1:]:
        try:
            for header, col in zip(headers, worksheet.row(index)):
                result[header.value].append(col.value)
        except:
            print sys.exc_info()
    print result

Output:

 defaultdict(<type 'list'>, {u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']}) 
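As a rough sketch of the same defaultdict pattern without xlrd, the snippet below works on plain in-memory rows and replaces the bare except with one explicit check. The helper name `rows_to_columns` and the choice to raise `ValueError` on a ragged row are assumptions made for illustration, not part of the answer above.

```python
from collections import defaultdict


def rows_to_columns(headers, rows):
    """Group row values under their column header (hypothetical helper)."""
    result = defaultdict(list)  # a missing key starts out as an empty list
    for index, row in enumerate(rows, start=1):
        if len(row) != len(headers):
            # zip() would silently drop extra or missing cells,
            # so fail loudly instead of relying on a bare except.
            raise ValueError("row %d has %d cells, expected %d"
                             % (index, len(row), len(headers)))
        for header, value in zip(headers, row):
            result[header].append(value)
    return dict(result)  # plain dict, so unknown keys raise KeyError again


headers = ["Species", "Garden", "Hedgerow"]
rows = [["Blackbird", 47.0, 10.0], ["Chaffinch", 19.0, 3.0]]
columns = rows_to_columns(headers, rows)
print(columns["Garden"])  # [47.0, 19.0]
```

Returning a plain dict at the end is a design choice: it stops later lookups of a misspelled header from quietly creating a new empty column.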

2. Pandas

    import pandas as pd

    xl = pd.ExcelFile("/Users/datafireball/Desktop/stackoverflow.xlsx")
    df = xl.parse(xl.sheet_names[0])
    print df

The output is shown below, and you cannot imagine how much flexibility you gain by using a DataFrame.

             Species  Garden  Hedgerow  Parkland  Pasture  Woodland
    0      Blackbird      47        10        40        2         2
    1      Chaffinch      19         3         5        0         2
    2      Great Tit      50         0        10        7         0
    3  House Sparrow      46        16         8        4         0
    4          Robin       9         3         0        0         2
    5    Song Thrush       4         0         6        0         0

I will do my part by providing another answer to my own question!

Right after I posted my question, I discovered pyexcel, a rather small Python library that acts as a wrapper around other spreadsheet-processing packages (namely xlrd and odfpy). It has a nice to_dict method that does exactly what I want (without even having to transpose the table)!

Here is an example using the above data:

    from pyexcel import SeriesReader
    from pyexcel.utils import to_dict

    sheet = SeriesReader("Sample.xls")
    print sheet.series()  #--- just the headers, stored in a list
    data = to_dict(sheet)
    print data  #--- the full dataset, stored in a dictionary

Output:

    [u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']
    {u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']}

Hope this also helps!


If xlrd does not solve your problem, consider xlwings. One of its example videos shows how to pull data from an Excel spreadsheet and load it into a pandas DataFrame, which may be more convenient than a dictionary.

If you really need a dictionary, pandas can easily convert a DataFrame to one.
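As a small illustration of that conversion, the sketch below builds a DataFrame by hand (a stand-in for one read from Excel, with only part of the question's data) and calls `DataFrame.to_dict` with `orient="list"`, which maps each column name to a list of that column's values. It assumes pandas is installed.

```python
import pandas as pd

# Small in-memory stand-in for the spreadsheet; in practice the DataFrame
# would come from reading the Excel file.
df = pd.DataFrame({
    "Species": ["Blackbird", "Chaffinch", "Great Tit"],
    "Garden": [47, 19, 50],
    "Hedgerow": [10, 3, 0],
})

# orient="list" gives {column name: list of column values}.
columns = df.to_dict(orient="list")
print(columns["Garden"])
```

The default orientation nests the values by row index instead, so passing `orient="list"` is what produces the column-to-list shape asked for in the question.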


This script converts Excel data to a list of dictionaries, one per row:

    import xlrd

    workbook = xlrd.open_workbook('Sample.xls', on_demand=True)
    worksheet = workbook.sheet_by_index(0)

    first_row = []  # the row where we store the column names
    for col in range(worksheet.ncols):
        first_row.append(worksheet.cell_value(0, col))

    # transform the worksheet into a list of dictionaries
    data = []
    for row in range(1, worksheet.nrows):
        elm = {}
        for col in range(worksheet.ncols):
            elm[first_row[col]] = worksheet.cell_value(row, col)
        data.append(elm)
    print data
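A list with one dictionary per row is not quite the column-keyed dictionary the question asks for. A minimal sketch of regrouping such records by column follows; the `records` list is hand-written sample data in the shape the script above would produce.

```python
# One dictionary per spreadsheet row (hand-written sample data).
records = [
    {"Species": "Blackbird", "Garden": 47.0, "Hedgerow": 10.0},
    {"Species": "Chaffinch", "Garden": 19.0, "Hedgerow": 3.0},
    {"Species": "Great Tit", "Garden": 50.0, "Hedgerow": 0.0},
]

# Regroup by column: each header maps to the list of its values, in row order.
columns = {}
for record in records:
    for header, value in record.items():
        columns.setdefault(header, []).append(value)

print(columns["Garden"])  # [47.0, 19.0, 50.0]
```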

Source: https://habr.com/ru/post/976535/

