Pandas join / merge / concat two DataFrames and merge rows of the same key / index

I am trying to combine two datasets, but I cannot figure out which method is most suitable (join, merge, concat, etc.) for this application, and there are no examples in the documentation that do what I need to do.

I have two data sets structured like this:

>>> A
Time     Voltage
1.0      5.1
2.0      5.5
3.0      5.3
4.0      5.4
5.0      5.0

>>> B
Time     Current
-1.0     0.5
0.0      0.6
1.0      0.3
2.0      0.4
3.0      0.7
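To make the snippets on this page reproducible, here is a minimal sketch that builds A and B from the two tables above. It assumes both Time columns are plain floats (float64).

```python
import pandas as pd

# Sample data copied from the tables above; Time is stored as float64 in both frames
A = pd.DataFrame({
    "Time": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0],
})
B = pd.DataFrame({
    "Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
    "Current": [0.5, 0.6, 0.3, 0.4, 0.7],
})

# Check the key dtypes before trying any merge/join/concat
print(A.dtypes)
print(B.dtypes)
```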

I would like to combine the data columns and merge the two "Time" columns into one, to get the following:

>>> AB
Time     Voltage     Current
-1.0                 0.5
0.0                  0.6
1.0      5.1         0.3
2.0      5.5         0.4
3.0      5.3         0.7
4.0      5.4            
5.0      5.0            

I tried AB = merge_ordered(A, B, on='Time', how='outer'), and while it successfully combined the data, it outputs something similar to:

>>> AB
Time     Voltage     Current
-1.0                 0.5
0.0                  0.6
1.0      5.1            
1.0                  0.3
2.0      5.5            
2.0                  0.4
3.0      5.3            
3.0                  0.7
4.0      5.4            
5.0      5.0            

Notice that it did not combine the rows that share the same "Time" value.
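For comparison, a minimal sketch of the same merge_ordered call on data built from the tables above, assuming both Time columns hold identical float64 values. In that case pandas should give one row per distinct Time, which suggests that the split rows in the output above come from key values that only look equal.

```python
import pandas as pd

A = pd.DataFrame({"Time": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0]})
B = pd.DataFrame({"Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
                  "Current": [0.5, 0.6, 0.3, 0.4, 0.7]})

# Ordered outer merge: equal Time keys collapse into a single row, result sorted by Time
AB = pd.merge_ordered(A, B, on="Time", how="outer")
print(AB)
```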

I also tried a plain merge à la AB = A.merge(B, on='Time', how='outer'), but the result is combined but not sorted, for example:

>>> AB
Time     Voltage     Current
-1.0                 0.5
0.0                  0.6
1.0      5.1            
2.0      5.5            
3.0      5.3         0.7
4.0      5.4            
5.0      5.0            
1.0                  0.3
2.0                  0.4

The rows are not sorted by Time, and the "Current" values for Time 1.0 and 2.0 were not merged into the existing rows.

I also tried AB = pandas.concat([A, B], axis=1), but it does not merge the Time columns; it just places the two DataFrames side by side:

>>> AB
Time     Voltage     Time     Current
1.0      5.1         -1.0     0.5
2.0      5.5         0.0      0.6
3.0      5.3         1.0      0.3
4.0      5.4         2.0      0.4
5.0      5.0         3.0      0.7
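The side-by-side result is expected: with axis=1, concat aligns rows on the index, which here is the default 0..4 row index rather than Time. A minimal sketch, assuming A and B are built as in the tables above:

```python
import pandas as pd

A = pd.DataFrame({"Time": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0]})
B = pd.DataFrame({"Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
                  "Current": [0.5, 0.6, 0.3, 0.4, 0.7]})

# concat takes a list of frames; axis=1 lines them up by row label 0..4, not by Time,
# so each frame keeps its own Time column
AB = pd.concat([A, B], axis=1)
print(AB.columns.tolist())
```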

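I have also tried other combinations of merge and join, but nothing gives the result I want, which is in effect to "merge rows of the same key / index". Is there a method that does this? I am new to Pandas!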


merge
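merge joins on columns. We need an outer join so that Time values present in only one of the frames are kept, and we merge on the Time column.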

A.merge(B, how='outer', on='Time')

   Time  Voltage  Current
0   1.0      5.1      0.3
1   2.0      5.5      0.4
2   3.0      5.3      0.7
3   4.0      5.4      NaN
4   5.0      5.0      NaN
5  -1.0      NaN      0.5
6   0.0      NaN      0.6
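To get the exact layout from the question, the merged frame can be sorted by Time and renumbered. Whether an outer merge already comes back sorted has differed between pandas versions, so sorting explicitly keeps the order independent of that. A minimal sketch, assuming A and B as in the question:

```python
import pandas as pd

A = pd.DataFrame({"Time": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0]})
B = pd.DataFrame({"Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
                  "Current": [0.5, 0.6, 0.3, 0.4, 0.7]})

AB = (
    A.merge(B, how="outer", on="Time")  # union of Time values from both frames
     .sort_values("Time")               # order rows by time
     .reset_index(drop=True)            # renumber rows 0..n-1
)
print(AB)
```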

join
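join matches a column (or the index) of the calling frame against the index of the other frame. So we make Time the index of B and join it on A's Time column.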

A.join(B.set_index('Time'), on='Time', how='outer')

   Time  Voltage  Current
0   1.0      5.1      0.3
1   2.0      5.5      0.4
2   3.0      5.3      0.7
3   4.0      5.4      NaN
4   5.0      5.0      NaN
4  -1.0      NaN      0.5
4   0.0      NaN      0.6
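The repeated index label 4 above is a side effect of joining a column against an index. A variant, sketched here under the assumption that Time values are unique in each frame, sets Time as the index on both sides so that join aligns index to index:

```python
import pandas as pd

A = pd.DataFrame({"Time": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0]})
B = pd.DataFrame({"Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
                  "Current": [0.5, 0.6, 0.3, 0.4, 0.7]})

AB = (
    A.set_index("Time")
     .join(B.set_index("Time"), how="outer")  # index-on-index outer join
     .sort_index()                            # order by Time explicitly
     .reset_index()                           # Time back to a column, clean 0..n-1 index
)
print(AB)
```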

pd.concat
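concat aligns DataFrames on their index, so Time has to become the index first. We set the index of each DataFrame in the list [A, B] with a comprehension, for d in [A, B]. With axis=1 the frames are placed side by side and rows with the same Time line up.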

pd.concat([d.set_index('Time') for d in [A, B]], axis=1).reset_index()

   Time  Voltage  Current
0  -1.0      NaN      0.5
1   0.0      NaN      0.6
2   1.0      5.1      0.3
3   2.0      5.5      0.4
4   3.0      5.3      0.7
5   4.0      5.4      NaN
6   5.0      5.0      NaN
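Because the comprehension works on a list, the same pattern extends to more than two frames. A minimal sketch; the third frame C and its "Power" column are hypothetical, added only to show the pattern, and Time is assumed to be unique within each frame:

```python
import pandas as pd

A = pd.DataFrame({"Time": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0]})
B = pd.DataFrame({"Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
                  "Current": [0.5, 0.6, 0.3, 0.4, 0.7]})
# Hypothetical extra measurement, not from the question
C = pd.DataFrame({"Time": [0.0, 5.0], "Power": [1.2, 3.4]})

frames = [A, B, C]
ABC = (
    pd.concat([d.set_index("Time") for d in frames], axis=1)  # outer-align on Time
      .sort_index()
      .reset_index()
)
print(ABC)
```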

combine_first

A.set_index('Time').combine_first(B.set_index('Time')).reset_index()

   Time  Current  Voltage
0  -1.0      0.5      NaN
1   0.0      0.6      NaN
2   1.0      0.3      5.1
3   2.0      0.4      5.5
4   3.0      0.7      5.3
5   4.0      NaN      5.4
6   5.0      NaN      5.0
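combine_first returns the union of the columns in sorted order, which is why Current comes before Voltage above. A minimal sketch that restores the column order from the question, assuming A and B as shown there:

```python
import pandas as pd

A = pd.DataFrame({"Time": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0]})
B = pd.DataFrame({"Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
                  "Current": [0.5, 0.6, 0.3, 0.4, 0.7]})

AB = (
    A.set_index("Time")
     .combine_first(B.set_index("Time"))  # fill gaps in A from B, aligned on Time
     .loc[:, ["Voltage", "Current"]]      # reorder columns to match the question
     .reset_index()
)
print(AB)
```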

It works correctly for me; check the dtype of the Time column in both DFs:

In [192]: A.merge(B, how='outer').sort_values('Time')
Out[192]:
   Time  Voltage  Current
5  -1.0      NaN      0.5
6   0.0      NaN      0.6
0   1.0      5.1      0.3
1   2.0      5.5      0.4
2   3.0      5.3      0.7
3   4.0      5.4      NaN
4   5.0      5.0      NaN

In [193]: A.dtypes
Out[193]:
Time       float64
Voltage    float64
dtype: object

In [194]: B.dtypes
Out[194]:
Time       float64
Current    float64
dtype: object

If the Time column in one of them holds strings, I get exactly the output you described:

In [198]: A.merge(B.assign(Time=B.Time.astype(str)), how='outer').sort_values('Time')
Out[198]:
   Time  Voltage  Current
5  -1.0      NaN      0.5
6   0.0      NaN      0.6
0   1.0      5.1      NaN
7   1.0      NaN      0.3
1   2.0      5.5      NaN
8   2.0      NaN      0.4
2   3.0      5.3      NaN
9   3.0      NaN      0.7
3   4.0      5.4      NaN
4   5.0      5.0      NaN

In [199]: B.assign(Time=B.Time.astype(str)).dtypes
Out[199]:
Time        object   # <------ NOTE
Current    float64
dtype: object

Note that when printed, the two look exactly the same:

In [200]: B.assign(Time=B.Time.astype(str))
Out[200]:
   Time  Current
0  -1.0      0.5
1   0.0      0.6
2   1.0      0.3
3   2.0      0.4
4   3.0      0.7

In [201]: B
Out[201]:
   Time  Current
0  -1.0      0.5
1   0.0      0.6
2   1.0      0.3
3   2.0      0.4
4   3.0      0.7
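If the dtypes do differ, one option is to convert the key back to numbers before merging. A minimal sketch, assuming B's Time arrived as text (simulated here with astype(str), e.g. as if read from a file). Newer pandas versions may refuse to merge a float64 key with a string key at all, so converting first is the safer route either way.

```python
import pandas as pd

A = pd.DataFrame({"Time": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "Voltage": [5.1, 5.5, 5.3, 5.4, 5.0]})
B = pd.DataFrame({"Time": [-1.0, 0.0, 1.0, 2.0, 3.0],
                  "Current": [0.5, 0.6, 0.3, 0.4, 0.7]})

# Simulate a Time column stored as text (assumption for illustration)
B_text = B.assign(Time=B["Time"].astype(str))
print(A["Time"].dtype, B_text["Time"].dtype)

# Parse the text back into float64 so equal times compare equal during the merge
B_fixed = B_text.assign(Time=pd.to_numeric(B_text["Time"]))
AB = (
    A.merge(B_fixed, how="outer", on="Time")
     .sort_values("Time")
     .reset_index(drop=True)
)
print(AB)
```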

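Both "Time" columns turned out to be float64, so the dtype was not the problem; the times apparently differed by tiny floating-point errors, and rounding them fixed it: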

A = A.assign(Time=A.Time.round(4))

In my actual data the column is called "Time, (sec)", so the line I really used is:

A['Time, (sec)'] = A['Time, (sec)'].round(4)

Thanks! Is there a cleaner way to do this?
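A minimal sketch of why rounding helps, assuming the mismatch comes from floating-point representation error. The timestamps below are hypothetical: 0.1 + 0.2 is stored as 0.30000000000000004, which is not equal to 0.3, so the outer merge keeps two rows until both keys are rounded to the same number of decimals.

```python
import pandas as pd

# Hypothetical keys that print alike but differ in the last bits
A = pd.DataFrame({"Time": [0.1 + 0.2], "Voltage": [5.3]})
B = pd.DataFrame({"Time": [0.3], "Current": [0.7]})

before = A.merge(B, how="outer", on="Time")  # keys are not equal, so two rows

# Round both key columns the same way before merging
A["Time"] = A["Time"].round(4)
B["Time"] = B["Time"].round(4)
after = A.merge(B, how="outer", on="Time")   # keys now match, so one combined row

print(before)
print(after)
```

Rounding only one side, as in the comment above, works when the other side already holds "clean" values; rounding both keeps the fix symmetric.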


Source: https://habr.com/ru/post/1678295/
