Pandas: add a row to a DataFrame with MultiIndex columns

I have a DataFrame with a MultiIndex in its columns and would like to use dictionaries to add new rows.

Let's say that each row in the DataFrame is a city. The column levels are "distance" and "vehicle", and each cell holds the percentage of the population that chooses this vehicle for this distance.

I create an index like this:

    index_tuples = []
    for distance in ["near", "far"]:
        for vehicle in ["bike", "car"]:
            index_tuples.append([distance, vehicle])
    index = pd.MultiIndex.from_tuples(index_tuples, names=["distance", "vehicle"])

Then I create a dataframe:

 dataframe = pd.DataFrame(index=["city"], columns = index) 

The DataFrame structure looks good, although pandas seems to have added NaNs as default values.

Now I would like to create a dictionary for a new city and add it:

    my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}
    dataframe["my_home_city"] = my_home_city

But this fails:

ValueError: Length of values does not match length of index

Here is the complete error message (pastebin)
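The error most likely comes from the fact that bracket assignment (`dataframe["my_home_city"] = ...`) targets a column, not a row, so pandas tries to fit the dictionary into the existing index. A minimal sketch of the difference on a small frame with flat columns (the names `df`, `a`, `b`, `c` and `town` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame with one row and ordinary (flat) columns.
df = pd.DataFrame({"a": [1], "b": [2]}, index=["city"])

# Bracket assignment adds a COLUMN; the value must match the row count.
df["c"] = [3]

# .loc with a label that does not exist yet adds a ROW;
# the value must match the column count.
df.loc["town"] = [4, 5, 6]

print(df)
```

So adding a city means assigning through `.loc` (or concatenating a one-row frame), with values that line up with the columns.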

UPDATE:

Thanks for all the good answers. I'm afraid I simplified the problem in my example. In fact, my index is nested in 3 levels (and it can become larger).

So I accepted the general answer that converts my dictionary into a dictionary keyed by tuples. It may not be as clean as the other approaches, but it works for any MultiIndex setup.

4 answers

A MultiIndex is essentially a list of tuples, so we just need to convert your dict to use tuple keys; then we can assign the values directly.

    d = {(x, y): my_home_city[x][y] for x in my_home_city for y in my_home_city[x]}
    df.loc['my_home_city', :] = d
    df
    Out[994]:
    distance      near      far
    vehicle       bike  car bike  car
    city           NaN  NaN  NaN  NaN
    my_home_city     1    0    0    1

Additional Information

    d
    Out[995]: {('far', 'bike'): 0, ('far', 'car'): 1, ('near', 'bike'): 1, ('near', 'car'): 0}

    df.columns.values
    Out[996]:
    array([('near', 'bike'), ('near', 'car'), ('far', 'bike'), ('far', 'car')], dtype=object)
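The comprehension above is written for exactly two levels. For deeper nesting, as mentioned in the question's update, the same idea can be sketched with a small recursive helper. The function name `flatten`, the third level ("time"), the `other_city` row and all values are assumptions made up for this example, and the row is assigned as a `pd.Series` so that pandas aligns it with the columns by label:

```python
import pandas as pd

def flatten(nested, prefix=()):
    """Turn {"a": {"b": {"c": 1}}} into {("a", "b", "c"): 1}."""
    flat = {}
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # descend one level
        else:
            flat[path] = value                 # leaf: store under the full tuple
    return flat

# Hypothetical three-level column index.
columns = pd.MultiIndex.from_product(
    [["near", "far"], ["bike", "car"], ["day", "night"]],
    names=["distance", "vehicle", "time"],
)
# Hypothetical existing row so the frame is not empty.
df = pd.DataFrame([[0] * len(columns)], index=["other_city"], columns=columns)

my_home_city = {
    "near": {"bike": {"day": 1, "night": 0}, "car": {"day": 0, "night": 1}},
    "far": {"bike": {"day": 0, "night": 0}, "car": {"day": 1, "night": 1}},
}

# A Series built from tuple keys gets a MultiIndex of its own,
# which pandas aligns with the frame's columns on assignment.
df.loc["my_home_city"] = pd.Series(flatten(my_home_city))
print(df)
```

Because the assignment aligns on labels, the key order inside the dictionary does not have to match the column order.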

I don't think you even need to initialize an empty DataFrame. Taking the nested dictionary from the question as d, I can get the desired result with unstack and a transpose:

    pd.DataFrame(d).unstack().to_frame().T

       far      near
      bike car  bike car
    0    0   1     1   0

You can append the new row to your DataFrame as follows:

    my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}
    dataframe.append(pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city'))

Output:

    distance      near      far
    vehicle       bike  car bike  car
    city           NaN  NaN  NaN  NaN
    my_home_city     1    0    0    1

The trick is to build a DataFrame from the dictionary with from_dict, unstack it to get a Series whose index matches the MultiIndex columns of the original frame, rename it so that the new row gets its index label, and then append it.

Or, if you do not want to create an empty DataFrame first, you can use the same method to build a DataFrame directly from the new data.

 pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city').to_frame().T 

Output:

                  far      near
                 bike car  bike car
    my_home_city    0   1     1   0

Explanation:

    pd.DataFrame.from_dict(my_home_city)

          far  near
    bike    0     1
    car     1     0

Now let's unstack to create a MultiIndex, which reshapes this new frame into the column structure of the original frame.

    pd.DataFrame.from_dict(my_home_city).unstack()

    far   bike    0
          car     1
    near  bike    1
          car     0
    dtype: int64

We use rename to give this Series a name, which becomes the index label of the row when it is appended to the original DataFrame.

    far   bike    0
          car     1
    near  bike    1
          car     0
    Name: my_home_city, dtype: int64

Now, if you convert this Series to a frame and transpose it, it looks just like a new row. However, there is no need to do this: pandas aligns data on labels internally, so appending the Series to the DataFrame lines the values up with the columns automatically and adds the new row.

    dataframe.append(pd.DataFrame.from_dict(my_home_city).unstack().rename('my_home_city'))

    distance      near      far
    vehicle       bike  car bike  car
    city           NaN  NaN  NaN  NaN
    my_home_city     1    0    0    1
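Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls in this answer only run on older versions. A minimal sketch of the same steps using `pd.concat` instead, assuming the frame and dictionary from the question (the variable names `new_row` and `result` are made up here):

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["near", "far"], ["bike", "car"]], names=["distance", "vehicle"]
)
dataframe = pd.DataFrame(index=["city"], columns=columns)

my_home_city = {"near": {"bike": 1, "car": 0}, "far": {"bike": 0, "car": 1}}

# Same steps as described above, plus to_frame().T to get a one-row frame,
# because pd.concat stacks frames rather than appending a Series as a row.
new_row = (
    pd.DataFrame.from_dict(my_home_city)
    .unstack()
    .rename("my_home_city")
    .to_frame()
    .T
)
# Copy the level names so both frames carry identical column metadata.
new_row.columns.names = columns.names

# concat aligns on column labels, so the column order of new_row does not matter.
result = pd.concat([dataframe, new_row])
print(result)
```

Depending on the pandas version, concatenating with the all-NaN placeholder row may emit a FutureWarning about dtype inference; the resulting values are unaffected.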

Initialize your empty DataFrame with MultiIndex.from_product.

    distances = ['near', 'far']
    vehicles = ['bike', 'car']
    df = pd.DataFrame([], columns=pd.MultiIndex.from_product([distances, vehicles]),
                      index=pd.Index([], name='city'))

Your dictionary produces a square matrix (vehicle by distance), so unstack it (which yields a Series), then convert it into a one-row DataFrame by calling to_frame with the appropriate city name and transposing so that the column becomes a row.

    >>> df.append(pd.DataFrame(my_home_city).unstack().to_frame('my_home_city').T)
                  far      near
                 bike car  bike car
    city
    my_home_city    0   1     1   0

Source: https://habr.com/ru/post/1273448/