Python pandas error when counting with groupby

When I group by multiple columns and count, I get an error. Here is my DataFrame, along with an example that simply labels each group defined by the columns "b" and "c":

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(0,2,(4,4)), columns=['a', 'b', 'c', 'd'])
    df['gr'] = df.groupby(['b', 'c']).grouper.group_info[0]
    print df

       a  b  c  d  gr
    0  0  1  0  0   1
    1  1  1  1  0   2
    2  0  0  1  0   0
    3  1  1  1  1   2

However, when I modify the example slightly so that count() is called instead of grouper.group_info[0], I get an error:

    df = pd.DataFrame(np.random.randint(0,2,(4,4)), columns=['a', 'b', 'c', 'd'])
    df['gr'] = df.groupby(['b', 'c']).count()
    print df

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-70-a46f632214e1> in <module>()
          1 df = pd.DataFrame(np.random.randint(0,2,(4,4)),
          2                   columns=['a', 'b', 'c', 'd'])
    ----> 3 df['gr'] = df.groupby(['b', 'c']).count()
          4 print df

    C:\Python27\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
       2036         else:
       2037             # set column
    -> 2038             self._set_item(key, value)
       2039
       2040     def _setitem_slice(self, key, value):

    C:\Python27\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
       2082         ensure homogeneity.
       2083         """
    -> 2084         value = self._sanitize_column(key, value)
       2085         NDFrame._set_item(self, key, value)
       2086

    C:\Python27\lib\site-packages\pandas\core\frame.pyc in _sanitize_column(self, key, value)
       2110                 value = value.values.copy()
       2111             else:
    -> 2112                 value = value.reindex(self.index).values
       2113
       2114             if is_frame:

    C:\Python27\lib\site-packages\pandas\core\frame.pyc in reindex(self, index, columns, method, level, fill_value, limit, copy)
       2527         if index is not None:
       2528             frame = frame._reindex_index(index, method, copy, level,
    -> 2529                                          fill_value, limit)
       2530
       2531         return frame

    C:\Python27\lib\site-packages\pandas\core\frame.pyc in _reindex_index(self, new_index, method, copy, level, fill_value, limit)
       2606                        limit=None):
       2607         new_index, indexer = self.index.reindex(new_index, method, level,
    -> 2608                                                 limit=limit)
       2609         return self._reindex_with_indexers(new_index, indexer, None, None,
       2610                                            copy, fill_value)

    C:\Python27\lib\site-packages\pandas\core\index.pyc in reindex(self, target, method, level, limit)
       2181         else:
       2182             # hopefully?
    -> 2183             target = MultiIndex.from_tuples(target)
       2184
       2185         return target, indexer

    C:\Python27\lib\site-packages\pandas\core\index.pyc in from_tuples(cls, tuples, sortorder, names)
       1803                 tuples = tuples.values
       1804
    -> 1805             arrays = list(lib.tuples_to_object_array(tuples).T)
       1806         elif isinstance(tuples, list):
       1807             arrays = list(lib.to_object_array_tuples(tuples).T)

    C:\Python27\lib\site-packages\pandas\lib.pyd in pandas.lib.tuples_to_object_array (pandas\lib.c:42342)()

    ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Answer

Evaluate df.groupby(['b', 'c']).count() in an interactive session:

    In [150]: df.groupby(['b', 'c']).count()
    Out[150]:
         a  b  c  d
    b c
    0 0  1  1  1  1
    1 0  1  1  1  1
      1  2  2  2  2

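This is a whole DataFrame, and its index consists of the (b, c) group keys rather than the row index of df. It is probably not what you want to assign to a new column of df (in fact, you cannot assign a DataFrame to a single column, so an exception is to be expected, even if its message is cryptic).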
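To see why the assignment fails, it can help to compare the shape and index of the aggregated result with df itself. This is a minimal sketch assuming a recent pandas version (whose count() output columns may differ from the older session shown above) and the same seed-1 data used in the example further down. The merge at the end is just one possible way, not the only one, to broadcast per-group counts back onto the rows; the names counts, sizes and merged are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 2, (4, 4)), columns=['a', 'b', 'c', 'd'])

counts = df.groupby(['b', 'c']).count()

# The result has one row per (b, c) group, indexed by the group keys,
# not one row per row of df, so it cannot line up with df's index.
print(type(counts).__name__)     # DataFrame
print(list(counts.index.names))  # ['b', 'c']
print(len(counts), len(df))      # 3 4

# One way to get a per-row count: aggregate with size(), turn the
# group keys back into columns, then left-merge onto df by those keys.
# A left merge keeps the row order of df.
sizes = df.groupby(['b', 'c']).size().reset_index(name='gr')
merged = df.merge(sizes, on=['b', 'c'], how='left')
print(merged)
```

The transform approach shown below does the same broadcasting in a single step and keeps df's original index, which is usually more convenient.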


If you want to create a new column that counts the number of rows in each group, you can use

    df['gr'] = df.groupby(['b', 'c'])['a'].transform('count')

For instance,

    import pandas as pd
    import numpy as np
    np.random.seed(1)

    df = pd.DataFrame(np.random.randint(0, 2, (4, 4)), columns=['a', 'b', 'c', 'd'])
    print(df)
    #    a  b  c  d
    # 0  1  1  0  0
    # 1  1  1  1  1
    # 2  1  0  0  1
    # 3  0  1  1  0

    df['gr'] = df.groupby(['b', 'c'])['a'].transform('count')
    df['comp_ids'] = df.groupby(['b', 'c']).grouper.group_info[0]
    print(df)

gives

       a  b  c  d  gr  comp_ids
    0  1  1  0  0   1         1
    1  1  1  1  1   2         2
    2  1  0  0  1   1         0
    3  0  1  1  0   2         2
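Strictly speaking, transform('count') counts the non-missing values of column a within each group, so it equals the group size only when a contains no NaN. If you want the number of rows regardless of missing values, transform('size') is an alternative. A small sketch with made-up, purely illustrative data in which one value of a is missing:

```python
import numpy as np
import pandas as pd

# Illustrative data: group (0, 0) has one missing value in column 'a'.
df = pd.DataFrame({
    'a': [1.0, np.nan, 1.0, 0.0],
    'b': [0, 0, 1, 1],
    'c': [0, 0, 1, 1],
})

g = df.groupby(['b', 'c'])['a']
df['n_count'] = g.transform('count')  # non-NaN values of 'a' per group
df['n_size'] = g.transform('size')    # all rows per group, NaN included
print(df)
```

Here n_count is 1 for the first group (only one non-missing a) while n_size is 2; for the second group both are 2.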

Note that df.groupby(['b', 'c']).grouper.group_info[0] does not return the number of rows in each group. Instead, it returns an integer label identifying the group each row belongs to.
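Newer pandas versions expose these group labels directly through GroupBy.ngroup(), and the groupby .grouper attribute used above has since been deprecated. The following is a sketch assuming a pandas version that provides ngroup(), using the same seed-1 data as the example; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 2, (4, 4)), columns=['a', 'b', 'c', 'd'])

grouped = df.groupby(['b', 'c'])

# ngroup() numbers the groups 0 .. ngroups - 1 (in sorted key order by
# default) and returns that number for every row: a group id, not a count.
labels = grouped.ngroup()
# The per-group row count, for comparison.
sizes = grouped['a'].transform('count')

df['comp_ids'] = labels
df['gr'] = sizes
print(df)
```

The comp_ids column should reproduce the labels from group_info[0] in the output above, while gr holds the counts.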


Source: https://habr.com/ru/post/1489335/

