Python Pandas: the number of times each unique value appears for multiple columns

Suppose I have a DataFrame, for example,

In [7]: source = pd.DataFrame([['amazon.com', 'correct', 'correct'], ['amazon.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'incorrect']], columns=['domain', 'price', 'product']) In [8]: source Out[8]: domain price product 0 amazon.com correct correct 1 amazon.com incorrect correct 2 walmart.com incorrect correct 3 walmart.com incorrect incorrect 

I would like to count for each domain number of times price == 'correct' and price == 'incorrect' and the same for product . In other words, I would like to see a way out

  domain key value count 0 amazon.com price correct 1 1 amazon.com price incorrect 1 2 amazon.com product correct 2 3 walmart.com price incorrect 2 4 walmart.com product correct 1 5 walmart.com product incorrect 1 

How to do it?

+4
source share
2 answers

The attached application will do this.

 In [24]: source.groupby('domain').apply(lambda x: x[['price','product']].apply(lambda y: y.value_counts())).fillna(0) Out[24]: price product domain amazon.com correct 1 2 incorrect 1 0 walmart.com correct 0 1 incorrect 2 1 
+7
source
 In [17]: %paste ( pd.melt(source, id_vars=['domain'], value_vars=['price', 'product']) .groupby(['domain', 'variable', 'value']) .size() .reset_index() .rename(columns={'variable': 'key', 0: 'count'}) ) ## -- End pasted text -- Out[17]: domain key value count 0 amazon.com price correct 1 1 amazon.com price incorrect 1 2 amazon.com product correct 2 3 walmart.com price incorrect 2 4 walmart.com product correct 1 5 walmart.com product incorrect 1 
0
source

Source: https://habr.com/ru/post/1492408/


All Articles