I am trying to use map() from the purrr package to apply filter() to data stored in a nested data frame.
"Why don't you filter first and then nest?" you might ask. That works (and I show the desired result below using exactly that process), but I am looking for a way to do this with purrr. I want a single data frame with two list columns, both holding nested data frames: one full and one filtered.
I can achieve this by calling nest() twice, once on all the data and once on the filtered data:
library(tidyverse)

df <- tibble(
  a = sample(x = rep(c('x','y'),5), size = 10),
  b = sample(c(1:10)),
  c = sample(c(91:100))
)

df_full_nested <- df %>%
  group_by(a) %>%
  nest(.key = 'full')

df_filter_nested <- df %>%
  filter(c >= 95) %>%
  group_by(a) %>%
  nest(.key = 'filtered')

df_nested <- df_full_nested %>%
  left_join(df_filter_nested, by = 'a')
Objects are as follows:
> df
   a         b     c
   <chr> <int> <int>
 1 y         8    93
 2 x         9    94
 3 y        10    99
 4 x         5    97
 5 y         2   100
 6 y         3    95
 7 x         7    96
 8 y         6    92
 9 x         4    91
10 x         1    98
> df_full_nested
  a     full
  <chr> <list>
1 y     <tibble [5 x 2]>
2 x     <tibble [5 x 2]>
> df_filter_nested
  a     filtered
  <chr> <list>
1 y     <tibble [3 x 2]>
2 x     <tibble [3 x 2]>
> df_nested
  a     full             filtered
  <chr> <list>           <list>
1 y     <tibble [5 x 2]> <tibble [3 x 2]>
2 x     <tibble [5 x 2]> <tibble [3 x 2]>
What I am looking for instead is something along the lines of:
df_full_nested %>% mutate(filtered = map(full, ...))
where the ... is some call to filter().
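For illustration, here is a minimal sketch of what such a map() call might look like. It is one possible approach, not necessarily the only or best one. To keep it reproducible, df is rebuilt from the values printed above instead of using sample(), and the list column is renamed from nest()'s default name (data) to full, which is an assumption made here to avoid the .key argument.

```r
library(tidyverse)

# Fixed copy of the df printed above (assumption: used instead of sample()
# so that the result is the same on every run)
df <- tibble(
  a = c('y', 'x', 'y', 'x', 'y', 'y', 'x', 'y', 'x', 'x'),
  b = c(8L, 9L, 10L, 5L, 2L, 3L, 7L, 6L, 4L, 1L),
  c = c(93L, 94L, 99L, 97L, 100L, 95L, 96L, 92L, 91L, 98L)
)

df_nested <- df %>%
  group_by(a) %>%
  nest() %>%                # list column gets the default name "data"
  ungroup() %>%
  rename(full = data) %>%
  # apply filter() to each nested tibble; .x is one element of `full`
  mutate(filtered = map(full, ~ filter(.x, c >= 95)))

df_nested
```

Each element of filtered is the corresponding element of full restricted to rows with c >= 95, so no second nest() or join is needed.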