Python Pandas: Counting Occurrences of a Specific Value

I am trying to find the number of times a particular value appears in a single column.

I made a DataFrame with data = pd.DataFrame.from_csv('data/DataSet2.csv') (note that DataFrame.from_csv was removed in pandas 1.0; pd.read_csv is the current replacement)

and now I want to know how many times a given value appears in a column. How can I do that?

I thought it would be something like the code below, where I look in the education column and count the number of times ? occurs.

The code below tries to count the number of times 9th appears, but it raises an error when it runs.

Code

    missing2 = df.education.value_counts()['9th']
    print(missing2)

error

 KeyError: '9th' 
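A KeyError here means the label '9th' is not in the index of the value_counts() result, i.e. no cell holds exactly that string. The sketch below assumes, hypothetically (the question does not show the data), that the cells carry a leading space. It shows Series.get as a lookup that returns a default instead of raising, and str.strip to normalize the values before counting.

```python
import pandas as pd

# Hypothetical data: each value carries a leading space, which can happen
# when a CSV uses ", " as its separator. The real DataSet2.csv is not shown.
df = pd.DataFrame({'education': [' 9th', ' 9th', ' 8th']})

counts = df.education.value_counts()

# Indexing with a label that is not present raises KeyError, as in the question.
try:
    counts['9th']
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Series.get returns the default (0 here) instead of raising.
missing_label = counts.get('9th', 0)

# Stripping surrounding whitespace first makes the lookup succeed.
stripped_count = df.education.str.strip().value_counts().get('9th', 0)

print(raised_key_error, missing_label, stripped_count)  # True 0 2
```

If stray leading spaces are indeed the cause, pd.read_csv(..., skipinitialspace=True) drops spaces after the delimiter at load time.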

IIUC, you can create a subset of the data with your condition and then use shape or len:

    print(df)
      col1 education
    0    a       9th
    1    b       9th
    2    c       8th

    print(df.education == '9th')
    0     True
    1     True
    2    False
    Name: education, dtype: bool

    print(df[df.education == '9th'])
      col1 education
    0    a       9th
    1    b       9th

    print(df[df.education == '9th'].shape[0])
    2

    print(len(df[df['education'] == '9th']))
    2
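A self-contained version of the session above, rebuilding the three-row sample frame from the printout so it can be run as-is:

```python
import pandas as pd

# Sample frame matching the printout above.
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

# Boolean mask: True where the education value equals '9th'.
mask = df['education'] == '9th'

# Filter the rows, then count them via shape or len.
subset = df[mask]
count_shape = subset.shape[0]
count_len = len(subset)

print(count_shape, count_len)  # 2 2
```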

Performance is interesting: the fastest solution is to compare the underlying NumPy array (.values) with the value and sum the resulting boolean array:

(Graph: perfplot timings of the approaches below, plotted against len(df) on log-log axes.)

Code:

    import string

    import numpy as np
    import pandas as pd
    import perfplot

    np.random.seed(123)

    def shape(df):
        return df[df.education == 'a'].shape[0]

    def len_df(df):
        return len(df[df['education'] == 'a'])

    def query_count(df):
        return df.query('education == "a"').education.count()

    def sum_mask(df):
        return (df.education == 'a').sum()

    def sum_mask_numpy(df):
        return (df.education.values == 'a').sum()

    def make_df(n):
        L = list(string.ascii_letters)
        df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
        return df

    perfplot.show(
        setup=make_df,
        kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
        n_range=[2**k for k in range(2, 25)],
        logx=True,
        logy=True,
        equality_check=False,
        xlabel='len(df)')
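Because the benchmark passes equality_check=False, perfplot does not verify that the kernels agree. A quick sanity check on a small frame might look like the sketch below; it redefines the same five functions so it runs on its own without perfplot, and the seeded RandomState in make_df is an illustrative choice, not part of the original benchmark.

```python
import string

import numpy as np
import pandas as pd

def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

def make_df(n, seed=123):
    # Random single letters, as in the benchmark's setup function.
    rng = np.random.RandomState(seed)
    letters = list(string.ascii_letters)
    return pd.DataFrame(rng.choice(letters, size=n), columns=['education'])

df = make_df(10_000)
kernels = [shape, len_df, query_count, sum_mask, sum_mask_numpy]
results = {f.__name__: int(f(df)) for f in kernels}
print(results)

# All five approaches should report the same count.
assert len(set(results.values())) == 1
```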

A couple of ways using count or sum:

    In [338]: df
    Out[338]:
      col1 education
    0    a       9th
    1    b       9th
    2    c       8th

    In [335]: df.loc[df.education == '9th', 'education'].count()
    Out[335]: 2

    In [336]: (df.education == '9th').sum()
    Out[336]: 2

    In [337]: df.query('education == "9th"').education.count()
    Out[337]: 2
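The comparison-and-sum idiom also degrades gracefully: for a value that never occurs it returns 0 rather than raising, unlike indexing into value_counts(). A small sketch on the same three-row sample, using Series.eq as the method form of ==, plus isin to count several values at once:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

# Series.eq(x) is equivalent to (series == x); summing booleans counts True.
count_9th = df['education'].eq('9th').sum()

# A value that does not occur simply yields 0.
count_10th = df['education'].eq('10th').sum()

# Counting several values at once: isin builds the mask for a list of values.
count_either = df['education'].isin(['9th', '8th']).sum()

print(count_9th, count_10th, count_either)  # 2 0 3
```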

Try this:

    (df['education'] == '9th').sum()

An elegant way to count occurrences of '?' (or any other value) in every column is to use the DataFrame's built-in isin method.

Suppose we have loaded the 'Automobile' dataset into a DataFrame df. We don't know which columns contain missing values (marked with the '?' character), so let's do:

 df.isin(['?']).sum(axis=0) 
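The Automobile dataset itself is not reproduced here, so the sketch below uses a small hypothetical frame with '?' placeholders to show what df.isin(['?']).sum(axis=0) computes:

```python
import pandas as pd

# Hypothetical miniature of a dataset that marks missing values with '?'.
df = pd.DataFrame({
    'make': ['audi', 'bmw', 'audi'],
    'normalized-losses': ['?', '164', '?'],
    'price': ['13950', '?', '17450'],
})

# Boolean frame: True wherever a cell equals '?'.
is_question = df.isin(['?'])

# Summing down each column (axis=0) gives the per-column count of '?'.
per_column = is_question.sum(axis=0)
print(per_column)
```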

The official DataFrame.isin(values) documentation states that

it returns a boolean DataFrame showing whether each element in the DataFrame is contained in values.

Note that isin expects an iterable as input, so we pass a list containing the target character. df.isin(['?']) returns a boolean DataFrame like the following:

       symboling  normalized-losses   make  fuel-type  aspiration-ratio  ...
    0      False               True  False      False             False  ...
    1      False               True  False      False             False  ...
    2      False               True  False      False             False  ...
    3      False              False  False      False             False  ...
    4      False              False  False      False             False  ...
    5      False               True  False      False             False  ...
    ...

To count the occurrences of the target character in each column, we take the sum over all rows of the above DataFrame by specifying axis=0. The final (truncated) result shows what we expect:

    symboling             0
    normalized-losses    41
    ...
    bore                  4
    stroke                4
    compression-ratio     0
    horsepower            2
    peak-rpm              2
    city-mpg              0
    highway-mpg           0
    price                 4
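Summing along the other axis answers related questions. Continuing with a small hypothetical frame (not the Automobile data), axis=1 counts '?' per row, summing twice gives the total for the whole frame, and any(axis=1) selects the rows that contain at least one '?':

```python
import pandas as pd

# Hypothetical frame with '?' marking missing entries.
df = pd.DataFrame({
    'normalized-losses': ['?', '164', '?'],
    'price': ['13950', '?', '?'],
})

mask = df.isin(['?'])

per_row = mask.sum(axis=1)           # '?' count in each row
total = int(mask.sum(axis=0).sum())  # '?' count in the whole frame

# Rows that contain at least one '?'.
rows_with_missing = df[mask.any(axis=1)]

print(per_row.tolist(), total, len(rows_with_missing))  # [1, 1, 2] 4 3
```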

Source: https://habr.com/ru/post/1242561/

