How to select an element from each array in a DataFrame column?

I have the following data frame:

    pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})

I want to select the column "a", and then only a specific element of each array (e.g. the first: 1., 2., 3.).

What do I need to add to

    pa.loc[:, ['a']]

?

2 answers

pa.loc[row] selects the row labeled row.

pa.loc[row, col] selects the cells at the intersection of row and col.

pa.loc[:, col] selects all rows and the column named col. Please note: although this works, it is not the idiomatic way to refer to a DataFrame column. For that you should use pa['a'].
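To make the difference between these forms concrete, here is a minimal sketch using the data from the question. The frame is built from a plain list of lists (an assumption, to sidestep NumPy's handling of ragged input), and the variable names are only illustrative.

```python
import pandas as pd

# Object-dtype column whose cells are Python lists (same data as the question).
pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})

row = pa.loc[0]           # Series for the row labeled 0, one entry per column
cell = pa.loc[0, 'a']     # single cell: the list [1.0, 4.0]
col_loc = pa.loc[:, 'a']  # column 'a' as a Series, selected through .loc
col = pa['a']             # the same column, idiomatic form
frame = pa.loc[:, ['a']]  # a list of labels gives a one-column DataFrame
```

Note that pa.loc[:, ['a']] from the question returns a DataFrame rather than a Series, because the column label is wrapped in a list.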

Now, you have lists in the cells of your column, so you can use the vectorized string methods (the .str accessor) to access the elements of these lists.

    pa['a'].str[0]   # first value in each list
    pa['a'].str[-1]  # last value in each list
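As a runnable sketch of the same idea, assuming the column is built from plain Python lists: .str[i] behaves like .str.get(i), which also works on lists, and it yields NaN for any list that is too short to have position i.

```python
import pandas as pd

pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})

first = pa['a'].str[0]    # element 0 of each list
last = pa['a'].str[-1]    # last element of each list
second = pa['a'].str[1]   # NaN for the row whose list has only one element

print(first.tolist())     # [1.0, 2.0, 3.0]
print(last.tolist())      # [4.0, 2.0, 5.0]
```

This is convenient, but it still loops over Python objects internally, so it should not be expected to match the speed of operations on a numeric column.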

Storing lists as values in a Pandas DataFrame is usually a mistake, because it prevents you from using the fast vectorized NumPy or Pandas operations.

Therefore, you might be better off converting your DataFrame of lists of numbers into a wider DataFrame with native NumPy dtypes:

    import numpy as np
    import pandas as pd

    # dtype=object is needed on recent NumPy versions for ragged nested lists
    pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})
    df = pd.DataFrame(pa['a'].values.tolist())
    #      0    1    2
    # 0  1.0  4.0  NaN
    # 1  2.0  NaN  NaN
    # 2  3.0  4.0  5.0

Now you can select the first column as follows:

    In [36]: df.iloc[:, 0]
    Out[36]:
    0    1.0
    1    2.0
    2    3.0
    Name: 0, dtype: float64

or the first row, for example:

    In [37]: df.iloc[0, :]
    Out[37]:
    0    1.0
    1    4.0
    2    NaN
    Name: 0, dtype: float64

If you want to drop the NaN values, use .dropna():

    In [38]: df.iloc[0, :].dropna()
    Out[38]:
    0    1.0
    1    4.0
    Name: 0, dtype: float64

and .tolist() to get the values as a list:

    In [39]: df.iloc[0, :].dropna().tolist()
    Out[39]: [1.0, 4.0]

But if you want to take advantage of NumPy / Pandas for speed, you should express your calculations as vectorized operations on df itself, without converting back to Python lists.
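For instance, here is a small sketch of what such vectorized operations might look like on the wide frame. The choice of per-row sum, count, and mean is only an illustration, not something the question asked for; the frame is built from plain lists so the snippet is self-contained.

```python
import pandas as pd

pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})

# Ragged lists become rows of a float64 frame, padded with NaN.
df = pd.DataFrame(pa['a'].tolist())

row_sums = df.sum(axis=1)      # NaN is skipped by default: 5.0, 2.0, 12.0
row_counts = df.count(axis=1)  # number of non-NaN values per row: 2, 1, 3
row_means = df.mean(axis=1)    # 2.5, 2.0, 4.0
```

Because the reductions skip NaN by default, the padding does not distort per-row results, and no Python-level loop over the original lists is needed.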


Source: https://habr.com/ru/post/1203514/

