Indexing a grouped_df object by column position

Attempting to select a column of an object of class grouped_df by index gives "Error: index out of bounds". For instance:

    library(dplyr)

    x <- mtcars %>%
      group_by(am, gear) %>%
      summarise_each(funs(sum), disp, hp, drat)
    class(x)
    # "grouped_df" "tbl_df" "tbl" "data.frame"

    # For some reason the first column can be selected...
    x[1]
    # Source: local data frame [4 x 1]
    # Groups: am
    # am
    # 0
    # 0
    # 1
    # 1

    # ...but any index > 1 fails
    x[2]
    # Error: index out of bounds

    # Coercing to data frame does the trick...
    as.data.frame(x)[2]
    # gear
    # 3
    # 4
    # 4
    # 5

    # ...and so does ungrouping
    all(ungroup(x)[2] == as.data.frame(x)[2])
    # TRUE

This is with R version 3.1.1 and dplyr 0.3.0.2. I am not sure whether this is a bug or intentional. Is there a good reason why it works this way? I would rather not have to remember to ungroup my data frames every time after using dplyr...

Update. Looking a little further into this, I assume the motivation for defining [.grouped_df this way is to preserve the groups when it is called as, for example, x[1:3] (which works). However, when the selected columns do not include the grouping variables, the error above is thrown. Perhaps it could be changed so that in this case it falls back to [.tbl_df and at the same time issues a warning...

Update 2. [.grouped_df was changed in the development version of dplyr (0.3.0.9000). It still raises an error, but the message is now clearer because it indicates which grouping variables were not included.

    x[2]
    # Error in `[.grouped_df`(x, 2) :
    #   cannot group, grouping variables 'am' not included

The best way I have found to keep my code from failing in this situation is to add %>% ungroup at the end of the dplyr chain.
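A minimal sketch of that workaround, assuming dplyr is attached and using the same aggregation as in the question. The explicit sum() calls are a substitution on my part: they stand in for summarise_each(funs(sum), ...), which is deprecated in later dplyr releases, and should give the same columns.

```r
library(dplyr)

# Same aggregation as in the question, written with explicit sum() calls
# (an assumption: chosen so the sketch does not depend on summarise_each()).
x <- mtcars %>%
  group_by(am, gear) %>%
  summarise(disp = sum(disp), hp = sum(hp), drat = sum(drat)) %>%
  ungroup()  # drop the remaining grouping before indexing by position

# x is no longer a grouped_df, so a non-grouping column can be selected.
x[2]
```

The trade-off is that the grouping is lost, so any later step that relies on it needs its own group_by call.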

1 answer

On a data frame produced by group_by, [ cannot subset columns unless the grouping variables are included. See the related dplyr issue for more details.


Source: https://habr.com/ru/post/1204570/

