Use filter() (and other dplyr functions) inside nested data frames with map()

I am trying to use map() from the purrr package to apply filter() to data stored in a nested data frame.

"Why don't you filter first and then nest?", you ask. That works (and I show the desired result using that process below), but I am looking for a way to do it with purrr. I want a single data frame with two list-columns, both holding nested data frames: one full and one filtered.

I can achieve this by calling nest() twice: once on all the data and once on the filtered data:

library(tidyverse)

df <- tibble(
  a = sample(x = rep(c('x','y'),5), size = 10),
  b = sample(c(1:10)),
  c = sample(c(91:100))
)

df_full_nested <- df %>% 
  group_by(a) %>% 
  nest(.key = 'full')

df_filter_nested <- df %>%
  filter(c >= 95) %>%  ##this is the key step
  group_by(a) %>% 
  nest(.key = 'filtered')

## Desired outcome - one data frame with 2 nested list-columns: one full and one filtered.
## How to achieve this without breaking it out into 2 separate data frames?
df_nested <- df_full_nested %>% 
  left_join(df_filter_nested, by = 'a')

Objects are as follows:

> df
# A tibble: 10 x 3
       a     b     c
   <chr> <int> <int>
 1     y     8    93
 2     x     9    94
 3     y    10    99
 4     x     5    97
 5     y     2   100
 6     y     3    95
 7     x     7    96
 8     y     6    92
 9     x     4    91
10     x     1    98

> df_full_nested
# A tibble: 2 x 2
      a             full
  <chr>           <list>
1     y <tibble [5 x 2]>
2     x <tibble [5 x 2]>

> df_filter_nested
# A tibble: 2 x 2
      a         filtered
  <chr>           <list>
1     y <tibble [3 x 2]>
2     x <tibble [3 x 2]>

> df_nested
# A tibble: 2 x 3
      a             full         filtered
  <chr>           <list>           <list>
1     y <tibble [5 x 2]> <tibble [3 x 2]>
2     x <tibble [5 x 2]> <tibble [3 x 2]>
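Because df is built with sample(), rerunning the code gives different rows each time. A minimal sketch that retypes the rows of the df printed above and counts the rows per group before and after the filter (df_fixed, n_full and n_filtered are names made up for this illustration):

```r
library(dplyr)

# Rows copied by hand from the printed df above, so the result is
# reproducible (the original df comes from sample() and changes per run).
df_fixed <- tibble(
  a = c("y", "x", "y", "x", "y", "y", "x", "y", "x", "x"),
  b = c(8L, 9L, 10L, 5L, 2L, 3L, 7L, 6L, 4L, 1L),
  c = c(93L, 94L, 99L, 97L, 100L, 95L, 96L, 92L, 91L, 98L)
)

# Number of rows per group in the full data
n_full <- df_fixed %>% count(a, name = "n_full")

# Number of rows per group that survive the filter
n_filtered <- df_fixed %>% filter(c >= 95) %>% count(a, name = "n_filtered")

# Side-by-side comparison, one row per group
left_join(n_full, n_filtered, by = "a")
```

For these rows, each group has 5 rows in total and 3 rows with c >= 95, which is what the [5 x 2] and [3 x 2] sizes of the nested tibbles reflect.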

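The join above works, but I would like to avoid the two separate data frames and build the filtered column directly from the nested one, with something like: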

df_full_nested %>% mutate(filtered = map(full, ...))

The part I cannot figure out is how to use filter() inside map().


You can use map(full, ~ filter(., c >= 95)), where . stands for the individual nested tibble, so filter() can be applied to it directly:

df_nested_2 <- df_full_nested %>% mutate(filtered = map(full, ~ filter(., c >= 95)))

identical(df_nested, df_nested_2)
# [1] TRUE
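The same pattern carries over to other dplyr verbs, and the formula shorthand can be swapped for an ordinary anonymous function. A small sketch under stated assumptions: df_small and its values are invented for illustration, the column names filtered_2, sorted and n_filtered are hypothetical, and nest(full = -a) assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hand-typed data (hypothetical values) so the result is reproducible
df_small <- tibble(
  a = c("x", "x", "x", "y", "y", "y"),
  b = 1:6,
  c = c(91L, 96L, 99L, 94L, 95L, 100L)
)

# Nest every column except `a` into a list-column called `full`
nested <- df_small %>% nest(full = -a)

result <- nested %>%
  mutate(
    # formula shorthand, as in the answer: `.` is the current nested tibble
    filtered   = map(full, ~ filter(., c >= 95)),
    # the same filter written as an explicit anonymous function
    filtered_2 = map(full, function(d) filter(d, c >= 95)),
    # another dplyr verb applied per nested tibble: sort by c, descending
    sorted     = map(full, ~ arrange(.x, desc(c))),
    # map_int() gives a plain integer column instead of a list-column
    n_filtered = map_int(filtered, nrow)
  )

# Both ways of writing the filter should give the same list-column
identical(result$filtered, result$filtered_2)
result$n_filtered
```

Inside a purrr formula, . and .x both refer to the element currently being mapped over, so the choice between them is a matter of style. The typed variants such as map_int() are handy when each nested tibble reduces to a single value.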

Source: https://habr.com/ru/post/1688904/

