For example, I have two data frames
df1:
0 1 2 3
0 Name Unit Attribute Date
1 a A xxy xxx
2 b B xyx xxx
3 c C xxx xxx
4 d D yxx xxx
5 e E yxy yyy
df2:
0 1 2
Name Unit Date
0 a F xxx
1 b G xxx
2 e H xxx
3 f I xxx
I want to overwrite entries in df1 with their corresponding entries in df2.
For example, replace df1.loc [5,3] with df2.loc [2,2]. That is, for rows with the same "name", rewrite df1 in the same column if it is in df2.
I'm currently doing this stupidly:
def find_column_num(key, df_name, start_row, stop_row, start_column, stop_column):
for i in range(start_row,stop_row+1):
for j in range(start_column, stop_column+1):
if df_name.loc[i,j]== key:
column_num_with_key = j
return column_num_with_key
break
for i in range(0,len(df1.index)):
for ii in range(0,len(df2.index)):
if df1.loc[i,0] == df2.loc[ii,0]:
for j in range(0,len(df1.columns)):
if df1.loc[0,j] in df2.loc[0,:]:
df1.set_value(i,j, df2.loc[ii,find_column_num(df1.loc[0,j],df2,0,0,0,len(df2.columns))]
I am not proud of it. I did a little research and came up with replacing '=' with set_value (), which helps. I sincerely hope to hear other suggestions. The size of the actual problem is 200 rows and 30 columns. Thus, going through all the for loops takes 20 seconds.