Recode summary: overview of levels before and after recoding

I have recoded some factors with dplyr::recode, and I'm looking for a clean way to make a LaTeX table in which the new and old categories, i.e. levels, are compared.

Here is an illustration of the problem using cyl from mtcars. First, a few packages,

    # install.packages(c("tidyverse", "stargazer", "reporttools"))
    library(tidyverse)

and the data that I intend to use,

    mcr <- mtcars %>% select(cyl) %>% as_tibble()
    mcr %>% print(n = 5)
    #> # A tibble: 32 x 1
    #>     cyl
    #> * <dbl>
    #> 1  6.00
    #> 2  6.00
    #> 3  4.00
    #> 4  6.00
    #> 5  8.00
    #> # ... with 27 more rows

Now I create two new factors: one with three categories, cyl_3col, and one with two, cyl_is_red, i.e.:

    mcr_col <- mcr %>%
      as_tibble() %>%
      mutate(cyl_3col = factor(cyl, levels = c(4, 6, 8),
                               labels = c("red", "blue", "green")),
             cyl_is_red = recode(cyl_3col, .default = 'is not red', 'red' = 'is red'))
    mcr_col %>% print(n = 5)
    #> # A tibble: 32 x 3
    #>     cyl cyl_3col cyl_is_red
    #>   <dbl> <fct>    <fct>
    #> 1  6.00 blue     is not red
    #> 2  6.00 blue     is not red
    #> 3  4.00 red      is red
    #> 4  6.00 blue     is not red
    #> 5  8.00 green    is not red
    #> # ... with 27 more rows

Now I would like to show how the categories in cyl_3col and cyl_is_red are related.

Maybe something like this:

    #> cyl_is_red   cyl_3col
    #> is red
    #>              red
    #> is not red
    #>              blue
    #>              green

Or maybe something like this, where I imagine the is not red category spanning two rows with \multirow{} or something similar:

    #>   cyl_3col cyl_is_red
    #> 1      red     is red
    #> 2     blue is not red
    #> 3    green ----------

using a LaTeX package or maybe some other TeX tool. I am very open about how best to show the recoding. I guess there is some clever way to code this, already thought up by someone who came before me?

I tried something like mcr_col %>% count(cyl_3col, cyl_is_red), but I don't think it really works.

+5
4 answers

pixiedust has a merge option.

    ---
    title: "Untitled"
    output: pdf_document
    header-includes:
    - \usepackage{amssymb}
    - \usepackage{arydshln}
    - \usepackage{caption}
    - \usepackage{graphicx}
    - \usepackage{hhline}
    - \usepackage{longtable}
    - \usepackage{multirow}
    - \usepackage[dvipsnames,table]{xcolor}
    ---

    ```{r}
    library(pixiedust)
    library(dplyr)

    mcr <- mtcars %>% select(cyl) %>% as_tibble()

    mcr_col <- mcr %>%
      as_tibble() %>%
      mutate(cyl_3col = factor(cyl, levels = c(4, 6, 8),
                               labels = c("red", "blue", "green")),
             cyl_is_red = recode(cyl_3col, .default = 'is not red', 'red' = 'is red'))

    # One row per (old level, new level) pair; merge the repeated new level
    mcr_col %>%
      count(cyl_3col, cyl_is_red) %>%
      select(-n) %>%
      dust(float = FALSE) %>%
      sprinkle(cols = "cyl_is_red", rows = 2:3, merge = TRUE) %>%
      sprinkle(sanitize = TRUE, part = "head")
    ```

+2

Perhaps a slightly different way to solve the problem would be to display the recoding as a graph rather than a table, thus bypassing the LaTeX generation altogether. You could do something like:

    library(ggplot2)

    # Here I make some data with lots of levels;
    # cat2 collapses the alphabet down to three categories
    tdf <- data.frame(cat1 = factor(letters),
                      cat2 = factor(c(rep("Low", 9), rep("Mid", 9), rep("High", 8))))

    # Reorder the collapsed levels as Low, Mid, High
    tdf$cat2 <- factor(tdf$cat2, levels(tdf$cat2)[c(2, 3, 1)])

    # Now plot it as arrows running from the first encoding to the second
    ggplot(tdf) +
      geom_segment(aes(x = .05, xend = .45, y = cat1, yend = cat2), arrow = arrow()) +
      geom_text(aes(x = 0, y = cat1, label = cat1)) +
      geom_text(aes(x = .5, y = cat2, label = cat2)) +
      facet_wrap(~cat2, nrow = 3, scales = "free_y") +
      theme_classic() +
      theme(axis.title.x = element_blank(),
            axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            axis.title.y = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks.y = element_blank(),
            axis.line = element_blank(),
            strip.background = element_blank(),
            strip.text.y = element_blank()) +
      ggtitle("Variable Recodings")

With a lot of variables, this can be easier on the reader's eyes.

+2

If HTML works for you instead of LaTeX, you can find many options in the tableHTML package.

Here is an example of what you can do with it:

    library(tableHTML)

    # mcr_col is the tibble built in the question (requires dplyr)
    connections <- mcr_col %>%
      count(cyl_3col, cyl_is_red)

    # Number of old levels that fall under each new level
    groups <- connections %>%
      group_by(cyl_is_red) %>%
      summarise(cnt = length(cyl_3col))

    tableHTML(connections %>% select(-n, -cyl_is_red),
              rownames = FALSE,
              row_groups = list(groups$cnt, groups$cyl_is_red))

+2

I'm still not sure how you want this to generalize, but assuming there is a column (such as cyl) that you want to exclude from the deduplication, how about

    > mcr_col %>% select(-cyl) %>% distinct
    # A tibble: 3 x 2
      cyl_3col cyl_is_red
      <fct>    <fct>
    1 blue     is not red
    2 red      is red
    3 green    is not red

This gives you a table of the distinct combinations, where the only column you need to specify is the one (possibly the response variable) that you want to exclude.
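
If the goal is the \multirow{} layout from the question, a table of distinct combinations like this one can be turned into LaTeX rows by hand. The following is only a rough sketch in base R, not an established recipe: recode_map is a hypothetical stand-in for the distinct() result, multirow_rows() is a made-up helper name, and the output assumes \usepackage{multirow} in the preamble.

```r
# Hypothetical stand-in for the distinct() result (old level, new level)
recode_map <- data.frame(
  cyl_3col   = c("red", "blue", "green"),
  cyl_is_red = c("is red", "is not red", "is not red"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not from any package): one LaTeX row per old level,
# with each repeated new level written once inside \multirow{n}{*}{...}
multirow_rows <- function(map, old, new) {
  new_vals <- as.character(map[[new]])
  # Make rows sharing a new level adjacent, in order of first appearance
  ord <- order(match(new_vals, unique(new_vals)))
  old_vals <- as.character(map[[old]])[ord]
  new_vals <- new_vals[ord]
  runs <- rle(new_vals)
  first <- cumsum(c(1, head(runs$lengths, -1)))  # first row of each run
  new_cell <- rep("", length(new_vals))
  new_cell[first] <- ifelse(
    runs$lengths > 1,
    sprintf("\\multirow{%d}{*}{%s}", runs$lengths, runs$values),
    runs$values
  )
  paste0(old_vals, " & ", new_cell, " \\\\")
}

rows <- multirow_rows(recode_map, old = "cyl_3col", new = "cyl_is_red")

# Wrap the rows in a minimal tabular; underscores are escaped by hand here
writeLines(c("\\begin{tabular}{ll}",
             "cyl\\_3col & cyl\\_is\\_red \\\\ \\hline",
             rows,
             "\\end{tabular}"))
```

Level names containing LaTeX special characters (such as _ or &) would need escaping before being pasted into the rows; the sketch does not handle that.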

+1

Source: https://habr.com/ru/post/1275582/

