Pandas DataFrame Algorithm

Say I have two data frames, df1 and df2, that I can join on df1_keys and df2_keys.

I'd like to do:

  • (A-B)
  • (A-B) U (B-A)

with A=df1 and B=df2.

From what I read in the documentation, the how argument of pd.merge supports the following options:

how : {'left', 'right', 'outer', 'inner'}, default 'inner'
        left: use only keys from left frame (SQL: left outer join)
        right: use only keys from right frame (SQL: right outer join)
        outer: use union of keys from both frames (SQL: full outer join)
        inner: use intersection of keys from both frames (SQL: inner join)
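
As a quick illustration of the four options quoted above, here is a minimal sketch. The frames and the column names key, a and b are made up for the example and are not from the question. It shows which keys survive each join type, and that none of them returns only the keys unique to one side.

```python
import pandas as pd

# Hypothetical frames; the column names "key", "a", "b" are illustrative only.
df1 = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
df2 = pd.DataFrame({"key": [3, 4, 5], "b": [300, 400, 500]})

# Collect the keys that survive each join type.
keys_by_how = {
    how: sorted(pd.merge(df1, df2, on="key", how=how)["key"].tolist())
    for how in ["left", "right", "outer", "inner"]
}

for how, keys in keys_by_how.items():
    print(how, keys)
```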

but none of them directly gives the two operations above.

For reference, below is the corresponding diagram for SQL (from this thread):

[image: SQL joins diagram]

1 answer

Although these are not supported directly, they can be achieved by adjusting the indexes before attempting the join...

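The set difference of two indexes is given by Index.difference (older pandas versions also accepted the - operator here; in current versions - on an Index is elementwise subtraction):

In [11]: ind = pd.Index([1, 2, 3])

In [12]: ind2 = pd.Index([3, 4, 5])

In [13]: ind.difference(ind2)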
Out[13]: Int64Index([1, 2], dtype='int64')

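Union and intersection are given by Index.union and Index.intersection (older pandas versions also accepted | and & for these):

In [14]: ind.union(ind2)
Out[14]: Int64Index([1, 2, 3, 4, 5], dtype='int64')

In [15]: ind.intersection(ind2)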
Out[15]: Int64Index([3], dtype='int64')
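
Operation 2 from the question, (A-B) U (B-A), is the symmetric difference of the two key sets. Here is a minimal sketch using the named set methods on pd.Index, with the same example indexes; the variable names are only illustrative.

```python
import pandas as pd

ind = pd.Index([1, 2, 3])
ind2 = pd.Index([3, 4, 5])

a_minus_b = ind.difference(ind2)   # keys only in ind   (A-B)
b_minus_a = ind2.difference(ind)   # keys only in ind2  (B-A)

# Keys present in exactly one of the two indexes: (A-B) U (B-A)
sym = ind.symmetric_difference(ind2)

print(list(a_minus_b), list(b_minus_a), list(sym))
```

The result of symmetric_difference should match the union of the two one-sided differences.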

So if you have two DataFrames, you can reindex them with these index sets, for example:

In [21]: df = pd.DataFrame(np.random.randn(3), ind, ['a'])  # ind = df.index

In [22]: df2 = pd.DataFrame(np.random.randn(3), ind2, ['b'])  # ind2 = df2.index

In [23]: df.reindex(ind.intersection(ind2))
Out[23]:
          a
3  1.368518

Then join the reindexed frames:

In [24]: df.reindex(ind.intersection(ind2)).join(df2.reindex(ind.intersection(ind2)))  # equivalent to inner
Out[24]:
          a         b
3  1.368518 -1.335534

In [25]: df.reindex(ind.difference(ind2)).join(df2.reindex(ind.difference(ind2)))  # join on A set minus B
Out[25]:
          a   b
1  1.193652 NaN
2  0.064467 NaN
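
The same reindex-then-join pattern should extend to operation 2, (A-B) U (B-A). The sketch below uses fixed, made-up values instead of random ones so the result is reproducible. It also shows an alternative that is not part of the answer above: pd.merge with indicator=True labels each row by its origin in a _merge column, which can then be filtered.

```python
import pandas as pd

# Illustrative data: column names "a" and "b" follow the answer, values are made up.
df = pd.DataFrame({"a": [0.1, 0.2, 0.3]}, index=pd.Index([1, 2, 3]))
df2 = pd.DataFrame({"b": [3.0, 4.0, 5.0]}, index=pd.Index([3, 4, 5]))

# Operation 2 via reindex + join: keys present in exactly one frame.
sym = df.index.symmetric_difference(df2.index)
sym_joined = df.reindex(sym).join(df2.reindex(sym))

# Alternative: an outer merge on the index, with each row labelled
# "left_only", "right_only" or "both" in the "_merge" column.
merged = pd.merge(df, df2, left_index=True, right_index=True,
                  how="outer", indicator=True)
a_minus_b = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
sym_merged = merged[merged["_merge"] != "both"].drop(columns="_merge")

print(sym_joined)
print(a_minus_b)
print(sym_merged)
```

With the indicator approach, a single outer merge yields both A-B and the symmetric difference, at the cost of materialising the full outer join first.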

Source: https://habr.com/ru/post/1529092/

