Merge multiple data frames on a column

I am trying to merge several DataFrames and still have had no luck. I found the merge method, but it only works with two DataFrames. I also found this SO answer, suggesting something like this:

df1.merge(df2,on='name').merge(df3,on='name')

Unfortunately, this will not work in my case, because I have 20+ data frames.

My next idea was to use join. According to the link, when joining several data frames I need to pass a list, and I can only join on the index. So I set the index of every data frame to the name column (this is easy to do programmatically) and ended up with something like this:

df.join([df1,df2,df3])

Unfortunately, this approach also failed, because the other column names are the same in all the data frames. I then renamed all the columns, but when I finally joined everything:

df = pd.DataFrame()
df.join([df1, df2, df3])

I got an empty DataFrame. I no longer know how to join them. Can anyone suggest anything else?
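A likely explanation for the empty result: DataFrame.join defaults to how='left', so the output keeps only the caller's index, and pd.DataFrame() has no rows. The sketch below assumes each frame is first indexed on name and its columns are tagged with a frame number via add_suffix (that suffix scheme is an assumption, picked to match the expected output further down). It then starts the join from the first frame instead of from an empty one. Renaming beforehand is needed because join does not accept lsuffix/rsuffix when given a list of frames.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'attr1': [5, 14, 4], 'attr2': [19, 16, 9]})
df2 = pd.DataFrame({'name': ['a', 'b', 'c'], 'attr1': [15, 4, 14], 'attr2': [49, 36, 9]})
frames = [df1, df2]  # the same code works for 20+ frames

# Index every frame on 'name' and tag its columns with the frame number,
# so no two frames share a column label.
indexed = [df.set_index('name').add_suffix(f'_{i}')
           for i, df in enumerate(frames, start=1)]

# A left join onto an empty DataFrame keeps its (empty) index: zero rows.
empty = pd.DataFrame().join(indexed)

# Starting from the first indexed frame keeps its rows instead.
result = indexed[0].join(indexed[1:]).reset_index()
print(result)
```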

EDIT1:

Input Example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr1', 'attr2'])
df2 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr1', 'attr2'])

df1 
  name attr1 attr2
0    a     5    19
1    b    14    16
2    c     4     9

df2
  name attr1 attr2
0    a    15    49
1    b     4    36
2    c    14     9

Expected Result:

df
  name  attr1_1  attr2_1  attr1_2  attr2_2
0    a        5       19       15       49
1    b       14       16        4       36
2    c        4        9       14        9

The rows may be in a different order in each data frame, but every name is guaranteed to exist in all of them.

3 answers

Use pd.concat:

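import pandas as pd

dflist = [df1, df2]
# one key per frame: "1", "2", ...
keys = [str(i) for i in range(1, len(dflist) + 1)]

# index every frame on 'name' and place them side by side; keys become the outer column level
merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
# swap the column levels to (attr, key) and flatten them to 'attr1_1', 'attr2_1', ...
merged.columns = merged.swaplevel(0, 1, axis=1).columns.to_series().str.join('_')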

merged


or, to get name back as a regular column:

merged.reset_index()
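For 20+ frames, the same pd.concat approach can be wrapped in a small function. This is a sketch: the name concat_on_key and its key parameter are hypothetical, and it assumes the key values are unique within each frame. It flattens the column labels with a list comprehension rather than swaplevel, which gives the same labels. Because concat aligns on the index, rows listed in a different order in each frame still line up.

```python
import pandas as pd


def concat_on_key(frames, key='name'):
    """Hypothetical helper: put N frames side by side, aligned on `key`."""
    keys = [str(i) for i in range(1, len(frames) + 1)]
    merged = pd.concat([df.set_index(key) for df in frames], axis=1, keys=keys)
    # Column labels are (frame number, attribute); flatten to 'attr1_1', 'attr2_1', ...
    merged.columns = [f'{attr}_{num}' for num, attr in merged.columns]
    return merged.reset_index()


df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'attr1': [5, 14, 4], 'attr2': [19, 16, 9]})
# Same data as df2 in the question, with the rows deliberately shuffled
df2 = pd.DataFrame({'name': ['c', 'a', 'b'], 'attr1': [14, 15, 4], 'attr2': [9, 49, 36]})

print(concat_on_key([df1, df2]))
```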



Use reduce:

from functools import reduce

def my_merge(df1, df2):
    # merge two frames on the shared 'name' column
    return df1.merge(df2, on='name')

final_df = reduce(my_merge, df_list)

where df_list is a list of your data frames.
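One caveat with this answer when, as in the question, every frame has the same non-key columns: merge falls back to its default _x/_y suffixes, and from the fourth frame on those suffixed names collide (depending on the pandas version, this either raises a MergeError or yields duplicate column labels). A sketch that avoids this by renaming each frame's columns before the reduce, assuming name is the key; tag_columns is a hypothetical helper, not part of pandas:

```python
from functools import reduce

import pandas as pd


def my_merge(df1, df2):
    return df1.merge(df2, on='name')


def tag_columns(df, i, key='name'):
    """Hypothetical helper: append _i to every column except the key."""
    return df.rename(columns={c: f'{c}_{i}' for c in df.columns if c != key})


# Four small frames with identical column names, as in the question
df_list = [
    pd.DataFrame({'name': ['a', 'b', 'c'],
                  'attr1': [i, i + 1, i + 2],
                  'attr2': [10 * i] * 3})
    for i in range(1, 5)
]

tagged = [tag_columns(df, i) for i, df in enumerate(df_list, start=1)]
final_df = reduce(my_merge, tagged)
print(final_df.columns.tolist())
```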


@piRSquared 20+ frames. Here is a script that generates 20+ of them:

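import numpy as np
import pandas as pd

N = 25
dflist = []

# build N small frames with identical columns and the same names
for d in range(N):
    df = pd.DataFrame(np.random.rand(3, 2))
    df.columns = ['attr1', 'attr2']

    df['name'] = ['a', 'b', 'c']

    dflist.append(df)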
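Since the question says rows may come in a different order in each frame, the generator above can be extended to shuffle each frame's rows, which gives a quick check that the pd.concat answer aligns on name rather than on position. A sketch; the fixed seed and the sample(frac=1) shuffle are additions not in the original script:

```python
import numpy as np
import pandas as pd

N = 25
rng = np.random.default_rng(0)  # fixed seed so the check is reproducible
dflist = []
for d in range(N):
    df = pd.DataFrame(rng.random((3, 2)), columns=['attr1', 'attr2'])
    df['name'] = ['a', 'b', 'c']
    # shuffle the rows so each frame lists the names in its own order
    dflist.append(df.sample(frac=1, random_state=d))

keys = [str(i) for i in range(1, N + 1)]
merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
merged.columns = [f'{attr}_{num}' for num, attr in merged.columns]

# every value must land in the row of its own name, whatever the input order
for i, df in enumerate(dflist, start=1):
    for _, row in df.iterrows():
        assert merged.loc[row['name'], f'attr1_{i}'] == row['attr1']
print(merged.shape)
```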

Source: https://habr.com/ru/post/1648424/

