Accessing nested Python dicts for import into a pandas DataFrame

I am trying to import fantasy basketball data from YQL into a pandas DataFrame, but I am having problems with nested content.

The data from YQL (results.rows) is a list (type(results.rows) returns list), and each element looks like this:

    {u'display_position': u'PF',
     u'editorial_player_key': u'nba.p.4175',
     u'editorial_team_abbr': u'Uta',
     u'editorial_team_full_name': u'Utah Jazz',
     u'editorial_team_key': u'nba.t.26',
     u'eligible_positions': {u'position': u'PF'},
     u'headshot': {u'size': u'small',
                   u'url': u'http://l.yimg.com/iu/api/res/1.2/KjAPlP83IIrP9iReWfjyjw--/YXBwaWQ9eXZpZGVvO2NoPTIxNTtjcj0xO2N3PTE2NDtkeD0xO2R5PTE7Zmk9dWxjcm9wO2g9NjA7cT0xMDA7dz00Ng--/http://l.yimg.com/a/i/us/sp/v/nba/players_l/20101116/4175.jpg'},
     u'image_url': u'http://l.yimg.com/iu/api/res/1.2/KjAPlP83IIrP9iReWfjyjw--/YXBwaWQ9eXZpZGVvO2NoPTIxNTtjcj0xO2N3PTE2NDtkeD0xO2R5PTE7Zmk9dWxjcm9wO2g9NjA7cT0xMDA7dz00Ng--/http://l.yimg.com/a/i/us/sp/v/nba/players_l/20101116/4175.jpg',
     u'is_undroppable': u'0',
     u'name': {u'ascii_first': u'Paul',
               u'ascii_last': u'Millsap',
               u'first': u'Paul',
               u'full': u'Paul Millsap',
               u'last': u'Millsap'},
     u'player_id': u'4175',
     u'player_key': u'304.p.4175',
     u'position_type': u'P',
     u'uniform_number': u'24'}

When I run

    DataFrame(results.rows)

it imports the records correctly; however, the headshot and name fields are each imported as a single column that holds the whole nested dict.
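A minimal sketch that reproduces the problem, assuming a trimmed-down copy of the record above stands in for results.rows (the YQL query itself is not shown here). Each nested dict ends up stored whole in a single cell:

```python
import pandas as pd

# Trimmed-down copy of one record from the question; stands in for results.rows
rows = [
    {
        'display_position': 'PF',
        'editorial_team_abbr': 'Uta',
        'eligible_positions': {'position': 'PF'},
        'name': {'ascii_first': 'Paul', 'ascii_last': 'Millsap', 'first': 'Paul',
                 'full': 'Paul Millsap', 'last': 'Millsap'},
        'player_id': '4175',
    },
]

df = pd.DataFrame(rows)

# The nested dicts are not expanded: 'name' is one column holding a dict
print(df.columns.tolist())
print(type(df.loc[0, 'name']))
```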

I can access the nested dict from IPython, but when I try to load it into a DataFrame, I get an error:

    results[0]['name']
    {u'ascii_first': u'Pau', u'ascii_last': u'Gasol', u'first': u'Pau', u'full': u'Pau Gasol', u'last': u'Gasol'}

    DataFrame(results[0]['name'])
    ValueError: If use all scalar values, must pass index
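For reference, this ValueError comes from passing a single dict whose values are all scalars: pandas then cannot infer how many rows to build. A small sketch (values copied from the output above) showing the error and two ways around it, either wrapping the dict in a list or passing an explicit index. Note that current pandas words the message slightly differently from the one quoted above:

```python
import pandas as pd

# The nested "name" dict from the question's output
name = {'ascii_first': 'Pau', 'ascii_last': 'Gasol', 'first': 'Pau',
        'full': 'Pau Gasol', 'last': 'Gasol'}

# A dict of scalars with no index raises ValueError
raised = False
try:
    pd.DataFrame(name)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Wrapping the dict in a list treats it as one record, i.e. one row
one_row = pd.DataFrame([name])
print(one_row)

# Alternatively, give the scalar values an explicit one-element index
also_one_row = pd.DataFrame(name, index=[0])
```

This only turns one nested dict into one row; it does not combine it with the top-level fields of the record.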

What I want is for each key of the nested dicts to become its own column, rather than a single column containing the nested dict. How can I do this?

I would like to end up with a DataFrame with the following layout:

    +------------------+-------+-------------+------------+-------+------+------+-----------+
    | display_position | (...) | ascii_first | ascii_last | first | full | last | player_id |
    +------------------+-------+-------------+------------+-------+------+------+-----------+
    | Data             |       |             |            |       |      |      |           |
    +------------------+-------+-------------+------------+-------+------+------+-----------+
1 answer

You need to โ€œsmooth outโ€ the dictionaries contained in results.rows . In your case, results[n] (where n is a zero-based index representing a separate โ€œrecordโ€) is a dict that contains nested dicts (for the name and headshot keys).

Flattening dicts has been discussed in detail in this question and in related ones.

One possible approach:

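    from collections.abc import MutableMapping  # collections.MutableMapping on Python 2

    from pandas import DataFrame

    def flatten(d, parent_key=''):
        """Recursively flatten nested dicts, joining keys with '_'."""
        items = []
        for k, v in d.items():
            new_key = parent_key + '_' + k if parent_key else k
            if isinstance(v, MutableMapping):
                # Nested dict: recurse, prefixing its keys with the parent key
                items.extend(flatten(v, new_key).items())
            else:
                items.append((new_key, v))
        return dict(items)

    flattened_records = [flatten(record) for record in results.rows]
    df = DataFrame(flattened_records)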

Note that with this approach, the names of the nested columns are built by joining the "parent" key and the nested key with an underscore, for example "name_first" and "name_last". You can customize the flatten function to change this.
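As one possible customization, here is a sketch of a variant with two hypothetical keyword arguments that are not part of the answer above: sep picks the separator, and prefix=False drops the parent key so the nested columns come out as first, last, and so on, matching the layout asked for in the question. It checks against collections.abc.Mapping rather than MutableMapping. With prefix=False, keys that appear in more than one nested dict would silently overwrite each other.

```python
from collections.abc import Mapping

import pandas as pd


def flatten(d, parent_key='', sep='_', prefix=True):
    """Flatten nested dicts into one level.

    sep and prefix are hypothetical options added for illustration:
    sep joins parent and child keys; prefix=False keeps only the child key.
    """
    items = {}
    for k, v in d.items():
        new_key = parent_key + sep + k if (prefix and parent_key) else k
        if isinstance(v, Mapping):
            # Recurse into the nested dict; a repeated key overwrites the earlier one
            items.update(flatten(v, new_key, sep=sep, prefix=prefix))
        else:
            items[new_key] = v
    return items


# Trimmed copy of one record from the question
record = {
    'display_position': 'PF',
    'name': {'ascii_first': 'Paul', 'ascii_last': 'Millsap', 'first': 'Paul',
             'full': 'Paul Millsap', 'last': 'Millsap'},
    'player_id': '4175',
}

df = pd.DataFrame([flatten(record, prefix=False)])
print(df.columns.tolist())
```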

There is more than one way to do this, but the key insight is that you need to flatten the dicts contained in results.rows.
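One alternative, on pandas 1.0 or later, is to let pandas do the flattening with pandas.json_normalize (older versions exposed it as pandas.io.json.json_normalize). Its default separator is '.', so this sketch passes sep='_' to get names such as name_first. It assumes a trimmed copy of the record from the question in place of results.rows; the resulting column order may differ from the input key order, so the sketch does not rely on it.

```python
import pandas as pd

# Trimmed copy of one record from the question; stands in for results.rows
rows = [
    {
        'display_position': 'PF',
        'eligible_positions': {'position': 'PF'},
        'name': {'first': 'Paul', 'last': 'Millsap', 'full': 'Paul Millsap'},
        'player_id': '4175',
    },
]

# Nested dicts are expanded into separate columns, keys joined with '_'
df = pd.json_normalize(rows, sep='_')
print(sorted(df.columns))
```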


Source: https://habr.com/ru/post/1442559/

