Combining multiple DataFrames where some rows do not match

I have five DataFrames, one per game from FIFA 13 to FIFA 17, each holding the top 80 players with their name, rating and club. My ultimate goal is to combine them so that I have each player's rating for every year, with a null value for any year in which they don't appear. Naturally, some players are not in the top 80 every year, for example because they retired. Here is a snippet of the data.

FIFA18

                 Name  Overall               Club
0   Cristiano Ronaldo       94     Real Madrid CF
1            L. Messi       93       FC Barcelona
2              Neymar       92       FC Barcelona
3           L. Suárez       92       FC Barcelona
4            M. Neuer       92   FC Bayern Munich
5              De Gea       90  Manchester United
6      R. Lewandowski       90   FC Bayern Munich
7          J. Boateng       90   FC Bayern Munich
8             G. Bale       90     Real Madrid CF
9      Z. Ibrahimović       90  Manchester United
10        T. Courtois       89            Chelsea

FIFA13

                 Name  Overall                 Club
0            L. Messi       94         FC Barcelona
1   Cristiano Ronaldo       92       Real Madrid CF
2           F. Ribéry       90     FC Bayern Munich
3                Xavi       90         FC Barcelona
4             Iniesta       90         FC Barcelona
5            N. Vidić       89    Manchester United
6           W. Rooney       89    Manchester United
7            Casillas       89       Real Madrid CF
8         David Silva       88      Manchester City
9              Falcao       88      Atlético Madrid
10     Z. Ibrahimović       88  Paris Saint-Germain

One example is N. Vidić, who has since retired.

My goal table would look like this:

                 Name  FIFA17  FIFA13               Club
0   Cristiano Ronaldo      94      92     Real Madrid CF
1            L. Messi      93      94       FC Barcelona
2              Neymar      92      83       FC Barcelona
3           L. Suárez      92      86       FC Barcelona
4            M. Neuer      92      87   FC Bayern Munich
5              De Gea      90      82  Manchester United
6      R. Lewandowski      90      80   FC Bayern Munich
7          J. Boateng      90      84   FC Bayern Munich
8             G. Bale      90      86     Real Madrid CF
9      Z. Ibrahimović      90      88  Manchester United
10        T. Courtois      89      83            Chelsea
11          F. Ribéry      86      90   FC Bayern Munich
12               Xavi       0      90       FC Barcelona
13            Iniesta      88      90       FC Barcelona
14           N. Vidić       0      89  Manchester United
15          W. Rooney       0      89  Manchester United
16           Casillas       0      89     Real Madrid CF
17        David Silva      87      88    Manchester City
18             Falcao       0      88    Atlético Madrid

I am new to Python and pandas. I tried using join and merge, but they always seem to match on each table's index rather than on the unique names.
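For reference, merge can be told to match on a column instead of the index. The following is only a minimal sketch, assuming two small stand-in frames built from a few rows of the snippets above; the names df13, df18 and merged are illustrative and not taken from the question.

```python
import pandas as pd

# Tiny stand-ins for two of the yearly frames (values copied from the snippets)
df13 = pd.DataFrame({
    "Name": ["L. Messi", "Cristiano Ronaldo", "N. Vidić"],
    "Overall": [94, 92, 89],
    "Club": ["FC Barcelona", "Real Madrid CF", "Manchester United"],
})
df18 = pd.DataFrame({
    "Name": ["Cristiano Ronaldo", "L. Messi", "Neymar"],
    "Overall": [94, 93, 92],
    "Club": ["Real Madrid CF", "FC Barcelona", "FC Barcelona"],
})

# on="Name" matches rows by player name rather than by index;
# how="outer" keeps players that appear in only one of the frames,
# leaving NaN in the column for the year they are missing from.
merged = pd.merge(
    df18[["Name", "Overall"]].rename(columns={"Overall": "FIFA18"}),
    df13[["Name", "Overall"]].rename(columns={"Overall": "FIFA13"}),
    on="Name",
    how="outer",
)
print(merged)
```

Club is left out of the merge here because a player's club can differ between years; it would need to be handled separately.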


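One way is to use pd.concat followed by pivot_table.

Because a player can change clubs between years, each player is first mapped to the club from their most recent year.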

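import pandas as pd

# Map each year to its DataFrame, then stack them with a Year column
dfs = {13: df13, 18: df18}
df = pd.concat([dfs[k].assign(Year=k) for k in dfs])

# Use each player's most recent club so a transfer does not split them across rows
club_map = df.sort_values('Year', ascending=False)\
             .drop_duplicates('Name')\
             .set_index('Name')['Club']
df['Club'] = df['Name'].map(club_map)

# One row per player, one column per year; 0 where a player is absent that year
res = df.pivot_table(index=['Name', 'Club'], columns='Year',
                     values='Overall', aggfunc='sum', fill_value=0)\
        .reset_index().rename_axis(None, axis='columns')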

                 Name               Club  13  18
0            Casillas     Real Madrid CF  89   0
1   Cristiano Ronaldo     Real Madrid CF  92  94
2         David Silva    Manchester City  88   0
3              De Gea  Manchester United   0  90
4           F. Ribéry   FC Bayern Munich  90   0
5              Falcao    Atlético Madrid  88   0
6             G. Bale     Real Madrid CF   0  90
7             Iniesta       FC Barcelona  90   0
8          J. Boateng   FC Bayern Munich   0  90
9            L. Messi       FC Barcelona  94  93
10          L. Suárez       FC Barcelona   0  92
11           M. Neuer   FC Bayern Munich   0  92
12           N. Vidić  Manchester United  89   0
13             Neymar       FC Barcelona   0  92
14     R. Lewandowski   FC Bayern Munich   0  90
15        T. Courtois            Chelsea   0  89
16          W. Rooney  Manchester United  89   0
17               Xavi       FC Barcelona  90   0
18     Z. Ibrahimović  Manchester United  88  90
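The question asks for a null rather than 0 when a player is missing from a year. A minimal sketch of that variant, assuming the same Name and Overall columns and two small stand-in frames (df13 and df18 here are illustrative): leave out fill_value and cast to the nullable Int64 dtype so the ratings stay integers.

```python
import pandas as pd

# Small stand-in frames; the real ones would hold 80 rows each
df13 = pd.DataFrame({"Name": ["L. Messi", "N. Vidić"], "Overall": [94, 89]})
df18 = pd.DataFrame({"Name": ["L. Messi", "Neymar"], "Overall": [93, 92]})

dfs = {13: df13, 18: df18}
df = pd.concat([frame.assign(Year=year) for year, frame in dfs.items()])

# Without fill_value, missing (Name, Year) combinations come out as NaN;
# Int64 (capital I) is the nullable integer dtype, so they become <NA>.
res = (df.pivot_table(index="Name", columns="Year",
                      values="Overall", aggfunc="sum")
         .astype("Int64")
         .rename_axis(None, axis="columns")
         .reset_index())
print(res)
```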

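Use set_index to build a MultiIndex and combine the Series with concat, replace NaN with fillna, cast to integer, and turn the MultiIndex back into columns with reset_index: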

# df1 holds the FIFA18 data and df2 the FIFA13 data
s1 = df1.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
s2 = df2.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
df = pd.concat([s2, s1], axis=1, keys=('FIFA13','FIFA18')).fillna(0).astype(int).reset_index()
print(df)
                 Name                 Club  FIFA13  FIFA18
0            Casillas       Real Madrid CF      89       0
1   Cristiano Ronaldo       Real Madrid CF      92      94
2         David Silva      Manchester City      88       0
3              De Gea    Manchester United       0      90
4           F. Ribéry     FC Bayern Munich      90       0
5              Falcao      Atlético Madrid      88       0
6             G. Bale       Real Madrid CF       0      90
7             Iniesta         FC Barcelona      90       0
8          J. Boateng     FC Bayern Munich       0      90
9            L. Messi         FC Barcelona      94      93
10          L. Suárez         FC Barcelona       0      92
11           M. Neuer     FC Bayern Munich       0      92
12           N. Vidić    Manchester United      89       0
13             Neymar         FC Barcelona       0      92
14     R. Lewandowski     FC Bayern Munich       0      90
15        T. Courtois              Chelsea       0      89
16          W. Rooney    Manchester United      89       0
17               Xavi         FC Barcelona      90       0
18     Z. Ibrahimović    Manchester United       0      90
19     Z. Ibrahimović  Paris Saint-Germain      88       0
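Because the index is the (Name, Club) pair, a player who changed clubs ends up on two rows, as Z. Ibrahimović does above. A quick way to spot such players, sketched on a few rows copied from that output (the names combined and moved are illustrative):

```python
import pandas as pd

# A few rows copied from the combined result above
combined = pd.DataFrame({
    "Name": ["L. Messi", "Z. Ibrahimović", "Z. Ibrahimović"],
    "Club": ["FC Barcelona", "Manchester United", "Paris Saint-Germain"],
    "FIFA13": [94, 0, 88],
    "FIFA18": [93, 90, 0],
})

# keep=False marks every row whose Name occurs more than once
moved = combined[combined.duplicated("Name", keep=False)]
print(moved)
```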

To keep the original order of the Name and Club pairs, with one row per player, add drop_duplicates and reindex:

s1 = df1.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
s2 = df2.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
df = pd.concat([s2, s1], axis=1, keys=('FIFA13','FIFA18')).fillna(0).astype(int)

# (Name, Club) pairs in order of first appearance, FIFA13 (df2) first to match the output
idx = pd.concat([df2[['Name','Club']], df1[['Name','Club']]]).drop_duplicates()
# Reindex by an explicit MultiIndex, then keep each player's last (most recent) row
df = df.reindex(pd.MultiIndex.from_frame(idx)).reset_index().drop_duplicates('Name', keep='last')
print(df)
                 Name               Club  FIFA13  FIFA18
0            L. Messi       FC Barcelona      94      93
1   Cristiano Ronaldo     Real Madrid CF      92      94
2           F. Ribéry   FC Bayern Munich      90       0
3                Xavi       FC Barcelona      90       0
4             Iniesta       FC Barcelona      90       0
5            N. Vidić  Manchester United      89       0
6           W. Rooney  Manchester United      89       0
7            Casillas     Real Madrid CF      89       0
8         David Silva    Manchester City      88       0
9              Falcao    Atlético Madrid      88       0
11             Neymar       FC Barcelona       0      92
12          L. Suárez       FC Barcelona       0      92
13           M. Neuer   FC Bayern Munich       0      92
14             De Gea  Manchester United       0      90
15     R. Lewandowski   FC Bayern Munich       0      90
16         J. Boateng   FC Bayern Munich       0      90
17            G. Bale     Real Madrid CF       0      90
18     Z. Ibrahimović  Manchester United       0      90
19        T. Courtois            Chelsea       0      89

For more DataFrames, use list comprehensions:

# Frames and column names in the same order, oldest first
dfs = [df2, df1]
names = ['FIFA13','FIFA18']
s = [x.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall'] for x in dfs]
df = pd.concat(s, axis=1, keys=names).fillna(0).astype(int)
s1 = [x[['Name','Club']] for x in dfs]
idx = pd.concat(s1).drop_duplicates()
# Reindex by an explicit MultiIndex, then keep each player's last (most recent) row
df = df.reindex(pd.MultiIndex.from_frame(idx)).reset_index().drop_duplicates('Name', keep='last')

Source: https://habr.com/ru/post/1694964/

