I'm working with a very large donation database that has columns for the donation identifier, the conduit identifier, and the amount. For example:
   TRANSACTION_ID  BACK_REFERENCE_TRAN_ID_NUMBER  CONTRIBUTION_AMOUNT
0     VR0P4H2SEZ1                              0                  100
1     VR0P4H3X770                              0                 2700
2     VR0P4GY6QV1                              0                  500
3     VR0P4H3X720                              0                 1700
4     VR0P4GYHHA0                   VR0P4GYHHA0E                  200
What I need to do is identify all rows where the TRANSACTION_ID value matches any BACK_REFERENCE_TRAN_ID_NUMBER. My current code, although a bit awkward, is:
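# Back references of every donation that came through a conduit ("0" means none)
is_from_conduit = df[df.BACK_REFERENCE_TRAN_ID_NUMBER != "0"].BACK_REFERENCE_TRAN_ID_NUMBER.tolist()
df['CONDUIT_FOR_OTHER_DONATION'] = 0
for row in df.index:
    # Membership in a list is a linear scan, repeated once per row
    if df['TRANSACTION_ID'][row] in is_from_conduit:
        df.loc[row, 'CONDUIT_FOR_OTHER_DONATION'] = 1
    else:
        df.loc[row, 'CONDUIT_FOR_OTHER_DONATION'] = 0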
However, on very large datasets with lots of conduit donations, this takes forever. I know there should be an easier way, but I can't figure out what it could be.
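For comparison, here is a minimal sketch of what a loop-free version of the same check might look like. It assumes the back-reference column holds strings with "0" as the "no reference" placeholder, as in my code above. The helper name `flag_conduits` is made up for illustration. It relies on `Series.isin`, which tests the whole column against the set of back references in one pass instead of scanning a list once per row.

```python
import pandas as pd

# Sample rows copied from the example table in this post.
df = pd.DataFrame({
    "TRANSACTION_ID": ["VR0P4H2SEZ1", "VR0P4H3X770", "VR0P4GY6QV1",
                       "VR0P4H3X720", "VR0P4GYHHA0"],
    "BACK_REFERENCE_TRAN_ID_NUMBER": ["0", "0", "0", "0", "VR0P4GYHHA0E"],
    "CONTRIBUTION_AMOUNT": [100, 2700, 500, 1700, 200],
})


def flag_conduits(frame):
    """Hypothetical helper: return a copy of `frame` with a 0/1 conduit flag."""
    # Distinct back references, skipping the "0" placeholder (assumed convention).
    back_refs = frame.loc[
        frame["BACK_REFERENCE_TRAN_ID_NUMBER"] != "0",
        "BACK_REFERENCE_TRAN_ID_NUMBER",
    ].unique()
    out = frame.copy()
    # isin compares every TRANSACTION_ID against back_refs without a Python loop.
    out["CONDUIT_FOR_OTHER_DONATION"] = (
        out["TRANSACTION_ID"].isin(back_refs).astype(int)
    )
    return out


df = flag_conduits(df)
```

The flag is cast to `int` only so the column matches the 0/1 values my loop produces. Leaving it as a boolean column would work just as well.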