Merging a list of DataFrames on a column?

I'm having trouble merging a list of DataFrames into a single DataFrame, joined on a specific column.

I have a list of DataFrames called data, and each element data[i] looks like this:

          Rank   Name
    2400     1  name1
    2401     2  name2
    2402     3  name3
    2403     4  name4
    2404     5  name5

Each DataFrame holds the Top 5 for a given month, and the list holds one such DataFrame for each month of the year.

I would like the final merged DataFrame to look like this:

          Rank Name_month1 Name_month2 Name_month3 ...
    2400     1       name1       name1       name1 ...
    2401     2       name2       name2       name2 ...
    2402     3       name3       name3       name3 ...
    2403     4       name4       name4       name4 ...
    2404     5       name5       name5       name5 ...

where each column after the first holds the ranking for one month.

I have no problem combining two DataFrames from the list data:

    pandas.merge(data[0], data[1], on='Rank', suffixes=['_month1', '_month2'])

But when I try to use filter() or chain .merge calls, I keep running into problems.

Any thoughts? Thanks!

2 answers

The problem is that the first merge renames the colliding columns by adding the suffixes, so the second merge sees no name collision and its suffixes are never used. The solution is to rename the columns manually after each merge.

    In [2]: df
    Out[2]:
          Rank   Name
    2400     1  name1
    2401     2  name2
    2402     3  name3
    2403     4  name4
    2404     5  name5

    In [3]: df.merge(
       ...:     df, on='Rank', suffixes=['_month1', '_month2']
       ...: ).merge(df, on='Rank').rename(
       ...:     columns={'Name': 'Name_month3'}
       ...: ).merge(df, on='Rank').rename(
       ...:     columns={'Name': 'Name_month4'}
       ...: )
    Out[3]:
       Rank Name_month1 Name_month2 Name_month3 Name_month4
    0     1       name1       name1       name1       name1
    1     2       name2       name2       name2       name2
    2     3       name3       name3       name3       name3
    3     4       name4       name4       name4       name4
    4     5       name5       name5       name5       name5
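To see why the suffixes of a later merge are skipped, here is a small sketch on a three-row stand-in for the question's data (the frame contents and the `_ignored` suffix are made up for illustration). pandas only applies `suffixes` to column names that appear in both frames, and after the first merge no column is called `Name` any more:

```python
import pandas as pd

# Hypothetical stand-in for one month's ranking.
df = pd.DataFrame({"Rank": [1, 2, 3], "Name": ["name1", "name2", "name3"]})

# First merge: 'Name' exists on both sides, so both copies get a suffix.
first = df.merge(df, on="Rank", suffixes=("_month1", "_month2"))
first_columns = list(first.columns)

# Second merge: the left side has 'Name_month1' and 'Name_month2' but no
# 'Name', so nothing collides and the suffixes are silently unused.
second = first.merge(df, on="Rank", suffixes=("_ignored", "_month3"))
second_columns = list(second.columns)

print(first_columns)
print(second_columns)
```

The third `Name` column comes through unsuffixed, which is why it has to be renamed by hand.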

If you have a list of DataFrames, just do:

    In [4]: data = [df, df, df, df]
       ...: current = data[0].rename(columns={'Name': 'Name_month1'})
       ...: for i, frame in enumerate(data[1:], 2):
       ...:     current = current.merge(frame, on='Rank').rename(
       ...:         columns={'Name': 'Name_month%d' % i})
       ...: current
    Out[4]:
       Rank Name_month1 Name_month2 Name_month3 Name_month4
    0     1       name1       name1       name1       name1
    1     2       name2       name2       name2       name2
    2     3       name3       name3       name3       name3
    3     4       name4       name4       name4       name4
    4     5       name5       name5       name5       name5
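A possible variation on the loop above, in case a single expression is preferred: rename every frame first so that no names can collide, then fold the list with `functools.reduce`. This is only a sketch; the helper name `merge_months` and its `key`/`value` parameters are made up here, and it assumes every frame has exactly the `Rank` and `Name` columns shown in the question.

```python
from functools import reduce

import pandas as pd


def merge_months(frames, key="Rank", value="Name"):
    """Merge a non-empty list of frames on `key`, numbering `value` per month."""
    # Rename up front so each frame contributes a uniquely named column.
    renamed = [
        frame.rename(columns={value: f"{value}_month{i}"})
        for i, frame in enumerate(frames, 1)
    ]
    # Fold the list pairwise: ((f1 merge f2) merge f3) ...
    return reduce(lambda left, right: left.merge(right, on=key), renamed)


# Hypothetical sample data shaped like the question's frames.
df = pd.DataFrame(
    {"Rank": [1, 2, 3, 4, 5],
     "Name": ["name1", "name2", "name3", "name4", "name5"]},
    index=range(2400, 2405),
)
merged = merge_months([df, df, df])
print(merged)
```

Because the renaming happens before any merge, no suffixes are needed at all. Note that `reduce` raises a `TypeError` on an empty list, so the caller has to guard against that case.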

I created a Gist containing a function that joins a "list" of DataFrames. The list is actually a dictionary whose keys are the suffixes applied when column names collide:

Join the list (dict) of pandas dataframes

https://gist.github.com/mpschr/5db20df78c034654f030
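For readers who cannot open the link, the idea of a dictionary keyed by suffix might look roughly like the following. This is not the Gist's code: `join_frames` is a hypothetical name, and as a simplification it suffixes every non-key column rather than only the colliding ones.

```python
import pandas as pd


def join_frames(frames_by_suffix, on):
    """Merge the dict's frames on `on`, tagging columns with each dict key."""
    merged = None
    for suffix, frame in frames_by_suffix.items():
        # Simplification: suffix all non-key columns, not just collisions.
        renamed = frame.rename(
            columns={c: f"{c}_{suffix}" for c in frame.columns if c != on}
        )
        merged = renamed if merged is None else merged.merge(renamed, on=on)
    return merged


# Hypothetical sample data shaped like the question's frames.
df = pd.DataFrame({"Rank": [1, 2, 3], "Name": ["name1", "name2", "name3"]})
joined = join_frames({"month1": df, "month2": df, "month3": df}, on="Rank")
print(joined)
```

The dictionary keys decide the column names, so the caller can use labels such as `jan` or `2014_01` instead of a running counter.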


Source: https://habr.com/ru/post/953948/

