Pandas groupby returns empty groups

I have a Pandas DataFrame with three columns: target, pred and conf_bin. If I run groupby(by='conf_bin').apply(...), my apply function is called with an empty DataFrame for values that do not appear in the conf_bin column. How is this possible?


More details

The DataFrame looks something like this:

        target  pred conf_bin
0            5     6     0.50
1            4     4     0.60
2            4     4     0.50
3            4     3     0.50
4            4     5     0.50
5            5     5     0.55
6            5     5     0.55
7            5     5     0.55

As you can see, conf_bin is a numeric bin with values in the range np.arange(0, 1, 0.05). However, not all of these values are present in the data:

In [224]: grp = tp.groupby(by='conf_bin')

In [225]: grp.groups.keys()
Out[225]: dict_keys([0.5, 0.60000000000000009, 0.35000000000000003, 0.75, 0.85000000000000009, 0.65000000000000002, 0.55000000000000004, 0.80000000000000004, 0.20000000000000001, 0.45000000000000001, 0.40000000000000002, 0.30000000000000004, 0.70000000000000007, 0.25])

For example, the values 0 and 0.05 do not appear. However, when I run apply on the groups, my function is called for these values as well:

In [226]: grp.apply(lambda x: x.shape)
Out[226]:
conf_bin
0.00        (0, 3)
0.05        (0, 3)
0.10        (0, 3)
0.15        (0, 3)
0.20       (22, 3)
0.25       (75, 3)
0.30       (95, 3)
0.35      (870, 3)
0.40     (8505, 3)
0.45    (40068, 3)
0.50    (51238, 3)
0.55    (54305, 3)
0.60    (47191, 3)
0.65    (38977, 3)
0.70    (34444, 3)
0.75    (20435, 3)
0.80     (3352, 3)
0.85        (4, 3)
0.90        (0, 3)
dtype: object

Questions:

  • How can Pandas even know that the values 0.0 and 0.05 "make sense", given that they do not appear in my DataFrame?
  • Why does apply call my function with an empty DataFrame for values that do not appear in grp.groups?

One workaround is to drop the empty groups while iterating over the groupby object:

groups = df.groupby('conf_bin')
group_list = [(index, group) for index, group in groups if len(group) > 0]
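
To try the filter on its own, here is a minimal self-contained sketch. It assumes that conf_bin is a categorical column that declares bins with no rows; the question does not show the dtype, so this is only one way such empty groups can arise. The toy data and the category list are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): only the bins 0.5, 0.55 and 0.6 actually occur.
df = pd.DataFrame({
    "target": [5, 4, 4, 5],
    "pred": [6, 4, 3, 5],
    "conf_bin": [0.5, 0.6, 0.5, 0.55],
})

# Assumption: the column is categorical and declares extra, unused bins.
df["conf_bin"] = pd.Categorical(
    df["conf_bin"], categories=[0.0, 0.05, 0.5, 0.55, 0.6]
)

# observed=False asks pandas to keep unobserved categories as groups.
groups = df.groupby("conf_bin", observed=False)

# Keep only the groups that contain at least one row.
group_list = [(index, group) for index, group in groups if len(group) > 0]

non_empty_keys = [index for index, _ in group_list]
print(non_empty_keys)
```

Whatever the iteration yields for the unused bins, the filtered list only contains groups with at least one row.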

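This is arguably not the most "pandas" way to do it, but the filtered list can be used wherever you would otherwise iterate over the groupby object, for example to plot each group on its own axis: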

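import matplotlib.pyplot as plt

# squeeze=False keeps axes a 2-D array even when there is only one group
fig, axes = plt.subplots(nrows=len(group_list), ncols=1, squeeze=False)
for (index, group), ax in zip(group_list, axes.flatten()):
    group['target'].plot(ax=ax, title=index)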
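
One possible explanation for the empty groups, which the post does not confirm, is that conf_bin is a categorical column (for example, the result of pd.cut). With observed=False, groupby creates a group for every declared category, not only for those present in the data. The sketch below uses made-up data to compare that with observed=True, which restricts the result to the categories that actually occur.

```python
import pandas as pd

# Toy data (hypothetical); the categories 0.0 and 0.05 have no rows.
df = pd.DataFrame({"target": [5, 4, 4, 5], "pred": [6, 4, 3, 5]})
df["conf_bin"] = pd.Categorical(
    [0.5, 0.6, 0.5, 0.55], categories=[0.0, 0.05, 0.5, 0.55, 0.6]
)

# One entry per declared category, with size 0 for the unused ones.
sizes_all = df.groupby("conf_bin", observed=False).size()

# Only the categories that are present in the data.
sizes_observed = df.groupby("conf_bin", observed=True).size()

print(sizes_all)
print(sizes_observed)
```

The observed parameter only affects categorical groupers, so if conf_bin is a plain float column the cause lies elsewhere and the filter above remains the safer workaround.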

Source: https://habr.com/ru/post/1658981/

