I believe you can use dask and its merge function.
The docs say:
What definitely works?
Cleverly parallelizable operations (also fast):
Join on index: dd.merge(df1, df2, left_index=True, right_index=True)
Or:
Operations requiring a shuffle (slow-ish, unless on index):
Set index: df.set_index(df.x)
Join not on the index: pd.merge(df1, df2, on='name')
You can also check how to create Dask DataFrames.
Example
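    import pandas as pd

    left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                         'A': ['A0', 'A1', 'A2', 'A3'],
                         'B': ['B0', 'B1', 'B2', 'B3']})
    right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})

    # plain pandas merge on the 'key' column
    result = pd.merge(left, right, on='key')
    print(result)

Output (column order may vary with the pandas version):

        A   B key   C   D
    0  A0  B0  K0  C0  D0
    1  A1  B1  K1  C1  D1
    2  A2  B2  K2  C2  D2
    3  A3  B3  K3  C3  D3

The same with dask:

    import dask.dataframe as dd

    # Assumption: left1 and right1 are Dask DataFrames built from the
    # pandas frames above (their definition is missing from the snippet)
    left1 = dd.from_pandas(left, npartitions=1)
    right1 = dd.from_pandas(right, npartitions=1)

    # first set indexes and then merge by them;
    # compute() is called once, on the merged result
    print(dd.merge(left1.set_index('key'), right1.set_index('key'),
                   left_index=True, right_index=True).compute())

Output:

          A   B   C   D
    key
    K0   A0  B0  C0  D0
    K1   A1  B1  C1  D1
    K2   A2  B2  C2  D2
    K3   A3  B3  C3  D3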