How do numpy functions operate on pandas objects internally?

Question


Numpy functions such as np.mean(), np.var(), etc. accept an array-like argument, such as an np.array or a list.

But passing in a pandas DataFrame also works. This means that a pandas DataFrame can really disguise itself as a numpy array, which I find a bit strange (even though I know that the underlying values of a df are indeed numpy arrays).

For an object to be array-like, I thought it should be sliceable with integer indexing the same way a numpy array is. So, for example, df[1:3, 2:3] should work, but it raises an error.

So possibly the DataFrame is converted to a numpy array when it is passed into the function. But if that is the case, why does np.mean(numpy_array) produce a different result than np.mean(df)?

a = np.random.rand(4,2)
a
Out[13]: 
array([[ 0.86688862,  0.09682919],
       [ 0.49629578,  0.78263523],
       [ 0.83552411,  0.71907931],
       [ 0.95039642,  0.71795655]])

np.mean(a)
Out[14]: 0.68320065182041034

gives a different result from the following:

df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
                  columns=range(np.shape(a)[1]))

df
Out[18]: 
          0         1
0  0.866889  0.096829
1  0.496296  0.782635
2  0.835524  0.719079
3  0.950396  0.717957

np.mean(df)
Out[21]: 
0    0.787276
1    0.579125
dtype: float64

The first output is a single number, while the second is a column-wise average. How does the numpy function know about the structure of a DataFrame?


python numpy pandas

aa May 09 '17 at 9:06


1 answer

EdChum · Answer 1 · 2017-05-09T09:13:09+0000

If you step through np.mean(df) in the debugger, you see this:

--Call--
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2796)mean()
-> def mean(a, axis=None, dtype=None, out=None, keepdims=False):
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2877)mean()
-> if type(a) is not mu.ndarray:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2878)mean()
-> try:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2879)mean()
-> mean = a.mean

You can see that the type is not ndarray, so it tries to call a.mean, which in this case will be df.mean():

In [6]:

df.mean()
Out[6]:
0    0.572999
1    0.468268
dtype: float64

That's why the output is different.
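The same delegation can be seen without pandas at all. This is only a rough sketch: the class `Tagged` is hypothetical, made up for illustration, and it assumes your numpy version still does the `type(a) is not ndarray` check and `a.mean` lookup shown in the trace above.

```python
import numpy as np


class Tagged:
    """Hypothetical array-like with its own mean method (not part of numpy or pandas)."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []  # records the axis argument numpy hands over

    def mean(self, axis=None, dtype=None, out=None, **kwargs):
        # np.mean forwards its arguments here instead of computing anything itself
        self.calls.append(axis)
        return "Tagged.mean was called"


obj = Tagged([[1.0, 2.0], [3.0, 4.0]])

# Not an ndarray, but it has a .mean attribute, so numpy delegates to it
result = np.mean(obj)

# A plain ndarray takes numpy's own code path
array_result = np.mean(obj.data)

# A list has no .mean attribute, so numpy converts it and computes the mean itself
list_result = np.mean([1.0, 2.0, 3.0])

print(result)        # Tagged.mean was called
print(array_result)  # 2.5
print(list_result)   # 2.0
```

So np.mean does not need to know anything about DataFrames: any object with a mean method gets asked to compute its own mean.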

Code to reproduce the above:

In [3]:
a = np.random.rand(4,2)
a

Out[3]:
array([[ 0.96750329,  0.67623187],
       [ 0.44025179,  0.97312747],
       [ 0.07330062,  0.18341157],
       [ 0.81094166,  0.04030253]])

In [4]:    
np.mean(a)

Out[4]:
0.52063384885403818

In [5]:    
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
                  columns=range(np.shape(a)[1]))

df

Out[5]:
          0         1
0  0.967503  0.676232
1  0.440252  0.973127
2  0.073301  0.183412
3  0.810942  0.040303

numpy output:

In [7]:
np.mean(df)

Out[7]:
0    0.572999
1    0.468268
dtype: float64

If you pass .values to np.mean, it works on the underlying ndarray and returns the scalar:

In [8]:
np.mean(df.values)

Out[8]:
0.52063384885403818
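Because np.mean(df) just forwards to df.mean with axis=None, what you get back can depend on how the installed pandas version interprets that argument. A minimal sketch of spelling out which result you want, assuming a small made-up array instead of the random data above:

```python
import numpy as np
import pandas as pd

# Small deterministic array so the results are easy to check by hand
a = np.arange(8, dtype=float).reshape(4, 2)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
df = pd.DataFrame(a)

# Column-wise means: ask pandas explicitly, result is a Series
col_means = df.mean(axis=0)

# Scalar mean over all values: convert to an ndarray first
overall = df.to_numpy().mean()

# np.asarray(df) gives the same ndarray, so numpy's own mean applies
overall_np = np.mean(np.asarray(df))

print(col_means.tolist())  # [3.0, 4.0]
print(overall)             # 3.5
print(overall_np)          # 3.5
```

df.to_numpy() plays the same role as df.values here; either way numpy sees a real ndarray and computes the mean itself.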
