Access python submenu for import into pandas DataFrame

Question

I am trying to import fantasy basketball data from yql into a pandas DataFrame, but I am having problems with nested content.

The data from yql (results.rows) is a list (type(results.rows) returns list), and each element looks like this:

    {u'display_position': u'PF',
     u'editorial_player_key': u'nba.p.4175',
     u'editorial_team_abbr': u'Uta',
     u'editorial_team_full_name': u'Utah Jazz',
     u'editorial_team_key': u'nba.t.26',
     u'eligible_positions': {u'position': u'PF'},
     u'headshot': {u'size': u'small',
                   u'url': u'http://l.yimg.com/iu/api/res/1.2/KjAPlP83IIrP9iReWfjyjw--/YXBwaWQ9eXZpZGVvO2NoPTIxNTtjcj0xO2N3PTE2NDtkeD0xO2R5PTE7Zmk9dWxjcm9wO2g9NjA7cT0xMDA7dz00Ng--/http://l.yimg.com/a/i/us/sp/v/nba/players_l/20101116/4175.jpg'},
     u'image_url': u'http://l.yimg.com/iu/api/res/1.2/KjAPlP83IIrP9iReWfjyjw--/YXBwaWQ9eXZpZGVvO2NoPTIxNTtjcj0xO2N3PTE2NDtkeD0xO2R5PTE7Zmk9dWxjcm9wO2g9NjA7cT0xMDA7dz00Ng--/http://l.yimg.com/a/i/us/sp/v/nba/players_l/20101116/4175.jpg',
     u'is_undroppable': u'0',
     u'name': {u'ascii_first': u'Paul',
               u'ascii_last': u'Millsap',
               u'first': u'Paul',
               u'full': u'Paul Millsap',
               u'last': u'Millsap'},
     u'player_id': u'4175',
     u'player_key': u'304.p.4175',
     u'position_type': u'P',
     u'uniform_number': u'24'}

When I do

 DataFrame(results.rows)

it imports the data in order; however, the nested values (for example under the headshot and name keys) are imported as single columns that contain the whole nested dict.
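A minimal reproduction of this behaviour, assuming a trimmed-down record that keeps only a few of the keys shown above (the values are copied from the sample; the trimming is just for illustration):

```python
from pandas import DataFrame

# Trimmed-down stand-in for results.rows: a list of dicts,
# where the value under 'name' is itself a dict
rows = [
    {
        'display_position': 'PF',
        'name': {'first': 'Paul', 'last': 'Millsap', 'full': 'Paul Millsap'},
        'player_id': '4175',
    },
]

df = DataFrame(rows)

# The nested dict is stored as one object inside a single 'name' column
print(df.columns.tolist())
print(type(df.loc[0, 'name']))
```

Each top-level key becomes one column, so the nested dict ends up as a single cell value rather than being expanded.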

I can access the nested dict from IPython; however, when I try to import it into a DataFrame, I get an error:

    >>> results[0]['name']
    {u'ascii_first': u'Pau', u'ascii_last': u'Gasol', u'first': u'Pau', u'full': u'Pau Gasol', u'last': u'Gasol'}
    >>> DataFrame(results[0]['name'])
    ValueError: If use all scalar values, must pass index
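The error comes from passing a dict whose values are all scalars: pandas treats each key as a column and has no way to know how many rows to build without an index. A small sketch of two ways around it, using the name dict from the sample as an assumed input:

```python
from pandas import DataFrame

name = {'ascii_first': 'Pau', 'ascii_last': 'Gasol', 'first': 'Pau',
        'full': 'Pau Gasol', 'last': 'Gasol'}

# Passing the dict on its own raises ValueError (all scalar values, no index)
try:
    DataFrame(name)
    raised = False
except ValueError:
    raised = True

# Option 1: wrap the dict in a list so it is treated as one record (one row)
df_list = DataFrame([name])

# Option 2: keep the dict as-is but pass an explicit one-element index
df_index = DataFrame(name, index=[0])
```

Both options produce a single row with one column per key; wrapping in a list is the form that generalizes to many records.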

I want each key of the nested dicts to become its own column, rather than having one column that contains the nested dict. How can I do this?

The end result I would like is a DataFrame with the following layout:

    +------------------+-------+-------------+------------+-------+------+------+-----------+
    | display_position | (...) | ascii_first | ascii_last | first | full | last | player_id |
    +------------------+-------+-------------+------------+-------+------+------+-----------+
    | Data             |       |             |            |       |      |      |           |
    +------------------+-------+-------------+------------+-------+------+------+-----------+

python pandas

Tom mcmahon Oct 28 '12 at 0:55

1 answer

Ngure nyaga · Accepted Answer · 2012-10-29T09:49:48+0000

You need to flatten the dictionaries contained in results.rows. In your case, results[n] (where n is the zero-based index of a single record) is a dict that contains nested dicts (under the name, headshot and eligible_positions keys).

Flattening nested dicts has been discussed in detail in this question and related questions.

One possible approach:

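    # collections.abc is needed on Python 3.3+; the old
    # collections.MutableMapping alias was removed in Python 3.10
    import collections.abc

    from pandas import DataFrame

    def flatten(d, parent_key=''):
        # Recursively flatten nested dicts into a single-level dict,
        # joining parent and child keys with '_' (e.g. 'name_first')
        items = []
        for k, v in d.items():
            new_key = parent_key + '_' + k if parent_key else k
            if isinstance(v, collections.abc.MutableMapping):
                items.extend(flatten(v, new_key).items())
            else:
                items.append((new_key, v))
        return dict(items)

    # One flat dict per record, then build the DataFrame from the list
    flattened_records = [flatten(record) for record in results.rows]
    df = DataFrame(flattened_records)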

Note that with this approach, the names of the nested columns are built by joining the "parent" key and the nested key with an underscore, for example "name_first" and "name_last". You can customize the flatten function to change this.
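As one example of such a customization, here is a hypothetical variant (flatten_leaf is an illustrative name, not part of pandas) that drops the parent prefix, so the columns come out as first, full, last, matching the layout asked for in the question. It assumes leaf keys do not collide across nested dicts and raises an error if they do, rather than silently overwriting a value:

```python
import collections.abc

from pandas import DataFrame


def flatten_leaf(d):
    """Flatten nested dicts, keeping only the innermost key as the column name.

    Hypothetical helper: raises ValueError if two leaves share a key,
    because overwriting one of them would silently lose data.
    """
    items = {}
    for k, v in d.items():
        if isinstance(v, collections.abc.Mapping):
            sub = flatten_leaf(v)  # recurse into the nested dict
        else:
            sub = {k: v}           # scalar leaf keeps its own key
        for key, value in sub.items():
            if key in items:
                raise ValueError('duplicate column name: %r' % key)
            items[key] = value
    return items


# Trimmed-down sample record (assumed input)
record = {
    'display_position': 'PF',
    'name': {'ascii_first': 'Paul', 'ascii_last': 'Millsap', 'first': 'Paul',
             'full': 'Paul Millsap', 'last': 'Millsap'},
    'player_id': '4175',
}
df = DataFrame([flatten_leaf(record)])
```

Dropping the prefix gives shorter names, at the cost of having to guard against collisions; keeping the prefix, as the original flatten does, avoids that problem entirely.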

There is more than one way to do this, but the key insight is the same: you need to flatten the dictionaries contained in results.rows.
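In newer pandas versions (1.0 and later), pandas.json_normalize performs this flattening directly; its sep argument controls how parent and child keys are joined (the default is '.'). A sketch using a trimmed-down sample record as assumed input:

```python
import pandas as pd

# Trimmed-down stand-in for results.rows
rows = [
    {
        'display_position': 'PF',
        'eligible_positions': {'position': 'PF'},
        'name': {'first': 'Paul', 'full': 'Paul Millsap', 'last': 'Millsap'},
        'player_id': '4175',
    },
]

# With sep='_', nested keys become columns such as 'name_first',
# the same naming scheme as the flatten function above
df = pd.json_normalize(rows, sep='_')
```

This avoids maintaining a custom recursive helper, though the column order it produces may differ from the order of keys in the source dicts.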
