Pandas - Merging on string columns does not work (bug?)

I am trying to do a simple merge between two DataFrames. They come from two different SQL tables, where the joining keys are strings:

>>> df1.col1.dtype
dtype('O')
>>> df2.col2.dtype
dtype('O')

I am trying to combine them using this:

>>> merge_res = pd.merge(df1, df2, left_on='col1', right_on='col2')

The result of the inner join is empty, which at first suggested that there might be no entries in the intersection:

>>> merge_res.shape
(0, 19)

But when I try to match one element, I see this really strange behavior.

# Pick random element in second dataframe
>>> df2.iloc[5,:].col2
'95498208100000'

# Manually look for it in the first dataframe
>>> df1[df1.col1 == '95498208100000']
0 rows × 19 columns
# Empty, which makes sense given the above merge result

# Now look for the same value as an integer
>>> df1[df1.col1 == 95498208100000]
1 rows × 19 columns
# FINDS THE ELEMENT!?!

So the columns have dtype 'object'. Searching for the value as a string returns nothing, while searching for it as an integer returns a row, and I think that is why the merge above does not work.

Any ideas what is going on?



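The problem was that the object dtype is misleading. I assumed it meant that all items were strings, but apparently, while reading the file, pandas converted some elements to int and left the rest as strings.

The solution was to make sure that every value is a string: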

>>> df1.col1 = df1.col1.astype(str)
>>> df2.col2 = df2.col2.astype(str)

After that, the merge works as expected.
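
For what it is worth, the situation can be reproduced with a small sketch. The frames and most values below are made up for illustration (only col1, col2 and the sample key come from the question), and both key columns are forced to object dtype to mimic the mixed column:

```python
import pandas as pd

# Hypothetical data: col1 mixes a Python int with a str, yet its dtype is object.
df1 = pd.DataFrame({
    "col1": pd.Series([95498208100000, "abc"], dtype=object),
    "a": [1, 2],
})
df2 = pd.DataFrame({
    "col2": pd.Series(["95498208100000", "abc"], dtype=object),
    "b": [10, 20],
})

# Count the keys that are not real strings; the dtype alone does not reveal this.
non_str_keys = int((~df1["col1"].map(lambda v: isinstance(v, str))).sum())

# Only "abc" matches here, because the int key is not equal to the str key.
before = pd.merge(df1, df2, left_on="col1", right_on="col2")

# Cast both key columns to strings, then merge again.
df1["col1"] = df1["col1"].astype(str)
df2["col2"] = df2["col2"].astype(str)
after = pd.merge(df1, df2, left_on="col1", right_on="col2")

print(non_str_keys, len(before), len(after))
```

Checking the element types first (rather than the column dtype) is probably the quickest way to confirm whether this is what is happening in your data.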

(I wish there was a way of specifying a dtype of str...)
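
If the data happens to be loaded from a CSV file (an assumption; the question mentions SQL tables), read_csv does accept a per-column dtype, which keeps the keys as text from the start. A minimal sketch with made-up CSV content:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "col1,a\n95498208100000,1\n12345,2\n"

# Default behaviour: pandas infers an integer dtype for col1.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the key column to str keeps every value as text.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"col1": str})
first_key = as_text["col1"].iloc[0]

print(inferred["col1"].dtype, repr(first_key))
```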


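I ran into a case where df.col = df.col.astype(str) was not the solution. As it turned out, the problem was in the data itself.

In my case, the data looked like this: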

In [72]: df1['col1'][:3]
Out[73]: 
             col1
0  dustin pedroia
1  kevin youkilis
2     david ortiz

In [72]: df2['col2'][:3]
Out[73]: 
             col2
0  dustin pedroia
1  kevin youkilis
2     david ortiz

After .astype(str) the merge still did not work, so I ran the following:

df1.col1 = df1.col1.str.encode('utf-8')
df2.col2 = df2.col2.str.encode('utf-8')

and was able to see the difference:

In [95]: df1
Out[95]: 
                       col1
0  b'dustin\xc2\xa0pedroia'
1  b'kevin\xc2\xa0youkilis'
2     b'david\xc2\xa0ortiz'

In [95]: df2
Out[95]: 
                col2
0  b'dustin pedroia'
1  b'kevin youkilis'
2     b'david ortiz'

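At that point, all I had to do was run df1.col1 = df1.col1.str.replace('\xa0',' ') on the decoded df1.col1 (i.e. before running .str.encode('utf-8')), and the merge worked perfectly.

Note: regardless of what I was replacing, I always used .str.encode('utf-8') to check whether it had worked.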
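
Put together, the steps above might look like the following sketch. It reuses the names from this answer, builds the two frames by hand, and passes regex=False explicitly (my addition) so the replacement is a plain substring swap:

```python
import pandas as pd

# df1 contains non-breaking spaces (\xa0); df2 contains ordinary spaces.
df1 = pd.DataFrame({"col1": ["dustin\xa0pedroia", "kevin\xa0youkilis", "david\xa0ortiz"]})
df2 = pd.DataFrame({"col2": ["dustin pedroia", "kevin youkilis", "david ortiz"]})

# The keys look identical when printed, but nothing matches.
before = pd.merge(df1, df2, left_on="col1", right_on="col2")

# Encoding to bytes makes the hidden character visible as \xc2\xa0.
encoded = df1["col1"].str.encode("utf-8")

# Replace the non-breaking space on the original (unencoded) strings.
df1["col1"] = df1["col1"].str.replace("\xa0", " ", regex=False)
after = pd.merge(df1, df2, left_on="col1", right_on="col2")

print(len(before), encoded.iloc[0], len(after))
```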

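Alternatively, using regular expressions and the Variable Explorer in the Spyder IDE for Anaconda, I found the following difference.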

import re
#places the raw string into a list
df1.col1 = df1.col1.apply(lambda x: re.findall(x, x))  
df2.col2 = df2.col2.apply(lambda x: re.findall(x, x))

This gives the following for df1 (as shown in Spyder):

['dustin\xa0pedroia']
['kevin\xa0youkilis']
['david\xa0ortiz']
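
Outside Spyder, a similar view can probably be had with Python's built-in ascii(), which escapes every non-ASCII character. This is a swapped-in technique rather than the re.findall approach above, and the sample series is made up:

```python
import pandas as pd

# Hypothetical key column: the first value hides a non-breaking space.
col1 = pd.Series(["dustin\xa0pedroia", "kevin youkilis"])

# ascii() returns a quoted representation with non-ASCII characters escaped.
escaped = col1.map(ascii)

# Keep only the values that contain at least one non-ASCII character.
suspicious = col1[col1.map(lambda s: not s.isascii())]

print(escaped.tolist())
print(len(suspicious))
```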



Thanks, @seeiespi, the .str.encode('utf-8') helped me figure out that my strings needed to be stripped, as shown below:

20                 b'Belize '   ...     0,612
21                  b'Benin '   ...     0,546

df1.col1 = df1.col1.str.strip()
df2.col2 = df2.col2.str.strip()
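
To see which keys fail to match before cleaning anything, an outer merge with indicator=True can help. The sketch below uses made-up frames (only the 'Belize ' and 'Benin ' keys with trailing spaces echo the output above) and then applies the strip fix:

```python
import pandas as pd

# Hypothetical data: the keys in df1 carry a trailing space.
df1 = pd.DataFrame({"col1": ["Belize ", "Benin "], "a": [1, 2]})
df2 = pd.DataFrame({"col2": ["Belize", "Benin"], "b": [10, 20]})

# The _merge column reports where each row came from: left_only, right_only or both.
check = pd.merge(df1, df2, left_on="col1", right_on="col2", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# Strip surrounding whitespace from both key columns, then merge again.
df1["col1"] = df1["col1"].str.strip()
df2["col2"] = df2["col2"].str.strip()
after = pd.merge(df1, df2, left_on="col1", right_on="col2")

print(len(unmatched), len(after))
```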

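None of the above solutions worked for me, because the merge was actually done correctly but the indexing got messed up. Dropping the index solved it for me: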

df['sth'] = df.merge(df2, how='left', on=['x', 'y'])['sth'].values
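
As a rough illustration of why .values matters here: merge returns a frame with a fresh 0..n-1 index, so assigning one of its columns back to a frame with a different index aligns on labels and yields NaN. The frames below are made up, and to_numpy() is used as the modern spelling of .values:

```python
import pandas as pd

# Hypothetical left frame with a non-default index.
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]}, index=[10, 11, 12])
df2 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"], "sth": [0.1, 0.2, 0.3]})

# The merged frame is indexed 0, 1, 2, not 10, 11, 12.
merged = df.merge(df2, how="left", on=["x", "y"])

# Assigning the Series aligns on index labels, so nothing lines up.
misaligned = df.copy()
misaligned["sth"] = merged["sth"]

# Assigning the raw array ignores the index and goes by position.
fixed = df.copy()
fixed["sth"] = merged["sth"].to_numpy()

print(misaligned["sth"].isna().all(), fixed["sth"].tolist())
```

Note that positional assignment only makes sense when the left merge keeps one row per row of df, i.e. when the keys in df2 are unique.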

Source: https://habr.com/ru/post/1655129/

