How do numpy functions work internally on pandas objects?

Numpy functions such as np.mean(), np.var(), etc. take an array-like argument, such as an np.array or a list.

But passing in a pandas DataFrame also works. This means that a pandas DataFrame can effectively disguise itself as a numpy array, which I find a bit strange (even though I know that the underlying values of a df really are numpy arrays).

For an object to behave like an array, I thought it should be sliceable with integer indexing the same way a numpy array is. So, for example, df[1:3, 2:3] should work, but this raises an error.
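As a small illustration of the slicing point above, here is a sketch using a made-up 4x2 array (not the data from the question). It assumes numpy and pandas are installed. Positional slicing on a DataFrame goes through .iloc, while plain df[...] with a tuple of slices fails; the exact exception type depends on the pandas version, so the sketch catches it broadly.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not the random array from the question.
a = np.arange(8, dtype=float).reshape(4, 2)
df = pd.DataFrame(a)

# numpy-style positional slicing works directly on the array...
sub_array = a[1:3, 0:1]

# ...but on a DataFrame the same slice has to go through .iloc.
sub_frame = df.iloc[1:3, 0:1]

# Plain df[...] with a tuple of slices is not supported.
try:
    df[1:3, 0:1]
    plain_indexing_failed = False
except Exception:  # exact exception type varies by pandas version
    plain_indexing_failed = True

print(sub_array)
print(sub_frame.to_numpy())
print("df[1:3, 0:1] failed:", plain_indexing_failed)
```

So a DataFrame is not a drop-in ndarray as far as indexing goes, even though numpy functions accept it.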

So perhaps the DataFrame is converted to a numpy array when it enters the function. But if so, why does np.mean(numpy_array) produce a different result than np.mean(df)?

a = np.random.rand(4,2)
a
Out[13]: 
array([[ 0.86688862,  0.09682919],
       [ 0.49629578,  0.78263523],
       [ 0.83552411,  0.71907931],
       [ 0.95039642,  0.71795655]])

np.mean(a)
Out[14]: 0.68320065182041034

gives a different result from the following:

df = pd.DataFrame(data=a, index=range(np.shape(a)[0]), 
columns=range(np.shape(a)[1]))

df
Out[18]: 
          0         1
0  0.866889  0.096829
1  0.496296  0.782635
2  0.835524  0.719079
3  0.950396  0.717957

np.mean(df)
Out[21]: 
0    0.787276
1    0.579125
dtype: float64

The first output is a single scalar, while the second is a per-column mean. How does the numpy function know that it was given a DataFrame?

1 answer

If you step through np.mean(df) in the debugger:

--Call--
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2796)mean()
-> def mean(a, axis=None, dtype=None, out=None, keepdims=False):
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2877)mean()
-> if type(a) is not mu.ndarray:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2878)mean()
-> try:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2879)mean()
-> mean = a.mean

You can see that the type is not ndarray, so it tries to call a.mean, which in this case is df.mean():
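The branch in the trace can be mimicked with a short sketch. This is not numpy's actual source, just a simplified stand-in: mean_with_dispatch and ColumnTable are hypothetical names invented for illustration, and the real function also forwards dtype, out and other arguments.

```python
import numpy as np


def mean_with_dispatch(a, axis=None):
    """Hypothetical, simplified mirror of the branch shown in the trace."""
    if type(a) is not np.ndarray:
        try:
            method = a.mean           # does the object bring its own mean()?
        except AttributeError:
            pass                      # no: fall through to the array path
        else:
            return method(axis=axis)  # yes: the object decides the result
    return np.asarray(a).mean(axis=axis)


class ColumnTable:
    """Hypothetical container whose mean() averages each column."""

    def __init__(self, rows):
        self.rows = rows

    def mean(self, axis=None):
        return [sum(col) / len(col) for col in zip(*self.rows)]


rows = [[1.0, 10.0], [3.0, 30.0]]
array_result = mean_with_dispatch(np.array(rows))    # ndarray path
table_result = mean_with_dispatch(ColumnTable(rows)) # object's own mean()
list_result = mean_with_dispatch(rows)               # list has no .mean

print(array_result, table_result, list_result)
```

The same input numbers give one overall mean for the ndarray and the list, but per-column means for the object that defines its own mean(), which is the pattern seen with the DataFrame.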

In [6]:

df.mean()
Out[6]:
0    0.572999
1    0.468268
dtype: float64

That's why the output is different.

Code to reproduce the above:

In [3]:
a = np.random.rand(4,2)
a

Out[3]:
array([[ 0.96750329,  0.67623187],
       [ 0.44025179,  0.97312747],
       [ 0.07330062,  0.18341157],
       [ 0.81094166,  0.04030253]])

In [4]:    
np.mean(a)

Out[4]:
0.52063384885403818

In [5]:    
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]), 
columns=range(np.shape(a)[1]))

df

Out[5]:
          0         1
0  0.967503  0.676232
1  0.440252  0.973127
2  0.073301  0.183412
3  0.810942  0.040303

numpy output:

In [7]:
np.mean(df)

Out[7]:
0    0.572999
1    0.468268
dtype: float64

Pass .values to np instead:

In [8]:
np.mean(df.values)

Out[8]:
0.52063384885403818
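A compact way to get either result on purpose is to be explicit about what is passed and along which axis. The sketch below uses a small made-up array (not the random data above) and assumes numpy and pandas are installed. It spells out axis=0 rather than relying on defaults, since the default-axis behavior of DataFrame.mean may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
a = np.array([[1.0, 10.0], [3.0, 30.0]])
df = pd.DataFrame(a)

# Overall mean: hand numpy the underlying ndarray.
overall = np.mean(df.to_numpy())

# Per-column mean: ask for axis=0 explicitly, on either object.
per_column_np = np.mean(a, axis=0)
per_column_df = df.mean(axis=0)

print(overall)
print(per_column_np)
print(per_column_df.to_numpy())
```

With the axis stated, the array and the DataFrame agree on the per-column result, and the scalar comes only from reducing the raw array over all elements.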

Source: https://habr.com/ru/post/1017102/

