Python ValueError: Can only compare identically-labeled Series objects

The two DataFrames I am comparing are of different sizes (though they share the same kind of index), and I suppose that is why I get the error. Could you suggest a way around this? I am looking for the rows in df2 whose user_id matches a user_id in df1. Thank you, I appreciate your response.

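import numpy as np
import pandas as pd

# Note: np.array stores every value as a string here, including user_id and label
data = np.array([['user_id','comment','label'],
            [100,'RT @Dvillain_: #oomf should text me.',0],
            [100,'Buy viagra',1],
            [101,'#nowplaying M.C. Shan - Juice Crew Law on',0],
            [101,'Buy viagra two',1]])

data2 = np.array([['user_id','comment','label'],
            [100,'First comment',0],
            [100,'Buy viagra',1],
            [102,'Buy viagra two',1]])

# First row holds the column names, the remaining rows hold the data
df1 = pd.DataFrame(data=data[1:,0:],columns = data[0,0:])
df2 = pd.DataFrame(data=data2[1:,0:],columns = data2[0,0:])

# Raises the ValueError: the two Series have different lengths (3 vs 4)
df = df2[df2['user_id'] == df1['user_id']]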
1 answer

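You are looking for isin. Comparing two Series with == requires them to have identical index labels, which fails here because df1 and df2 have different lengths. isin only checks whether each user_id in df2 appears among the user_id values of df1.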

df = df2[df2['user_id'].isin(df1['user_id'])]
df
Out[814]: 
  user_id        comment label
0     100  First comment     0
1     100     Buy viagra     1
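
To make the difference concrete, here is a minimal self-contained sketch. It assumes simplified frames built from plain dicts (with integer user_id values rather than the string values in the question), and the names mask, matched and unmatched are only illustrative. It shows that == raises for differently sized Series, while isin returns a boolean mask aligned to df2 that can also be negated.

```python
import pandas as pd

# Simplified stand-ins for the frames in the question (assumed data)
df1 = pd.DataFrame({"user_id": [100, 100, 101, 101]})
df2 = pd.DataFrame({
    "user_id": [100, 100, 102],
    "comment": ["First comment", "Buy viagra", "Buy viagra two"],
})

# Element-wise == needs identically labelled Series; 3 rows vs 4 rows fails
try:
    df2["user_id"] == df1["user_id"]
    raised = False
except ValueError:
    raised = True

# isin tests membership, so the mask always has df2's own index
mask = df2["user_id"].isin(df1["user_id"])
matched = df2[mask]      # rows of df2 whose user_id occurs in df1
unmatched = df2[~mask]   # rows of df2 whose user_id does not occur in df1
```

Negating the mask with ~ gives the complementary rows, which is handy for checking which user_id values were left out.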

Source: https://habr.com/ru/post/1693224/

