Numpy functions such as np.mean(), np.var(), etc. take an array-like argument: an np.array, a list, and so on.
But passing a pandas DataFrame also works. So a DataFrame can effectively disguise itself as a numpy array, which I find a bit strange (even though I know that the underlying df.values is really a numpy array).
For an object to be array-like, I thought it should support integer slicing the way a numpy array does. So, for example, df[1:3, 2:3] should work, but it raises an error.
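A quick check of that claim (a minimal sketch with a throwaway DataFrame; as far as I know, .iloc is the positional accessor that does accept this kind of slicing):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 2))

try:
    df[1:3, 0:1]                  # numpy-style 2D slicing on a DataFrame
except Exception as e:
    print(type(e).__name__)       # fails with an indexing error

print(df[1:3])                    # a single slice does select rows
print(df.iloc[1:3, 0:1])          # positional 2D slicing works via .iloc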
So maybe the DataFrame is converted to a numpy array when it enters a function. But if so, why does np.mean(numpy_array) produce a different result than np.mean(df)?
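If a plain conversion were happening, I would expect the two calls to agree; converting explicitly does give the scalar (a sketch, assuming np.asarray(df) returns the underlying ndarray, which I believe it does):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 2))
arr = np.asarray(df)      # explicit conversion to a plain ndarray

print(type(arr))          # <class 'numpy.ndarray'>
print(np.mean(arr))       # one scalar over all elements
print(np.mean(df))        # per-column means instead

Here is the full session that made me notice this: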
a = np.random.rand(4,2)
a
Out[13]:
array([[ 0.86688862,  0.09682919],
       [ 0.49629578,  0.78263523],
       [ 0.83552411,  0.71907931],
       [ 0.95039642,  0.71795655]])
np.mean(a)
Out[14]: 0.68320065182041034
gives a different result than what follows ...
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
                  columns=range(np.shape(a)[1]))
df
Out[18]:
          0         1
0  0.866889  0.096829
1  0.496296  0.782635
2  0.835524  0.719079
3  0.950396  0.717957
np.mean(df)
Out[21]:
0    0.787276
1    0.579125
dtype: float64
The first output is a single scalar, while the second is a set of column means. How does the numpy function know what to do with a DataFrame?
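For comparison, here is a sketch of the equivalence I am seeing (my assumption being that np.mean(df) matches the column-wise df.mean() rather than the all-elements mean):

import numpy as np
import pandas as pd

a = np.random.rand(4, 2)
df = pd.DataFrame(a)

print(np.mean(a))              # scalar: mean over every element
print(np.mean(a, axis=0))      # column means of the raw array
print(np.mean(df))             # matches the axis=0 result above
print(np.allclose(np.mean(a, axis=0), np.mean(df)))  # True
print(df.mean())               # same values as np.mean(df)

So np.mean(df) behaves like df.mean() (column-wise by default) rather than like np.mean(df.values), and I would like to understand where that dispatch happens.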