How to create a histogram for all variables in a data set with minimal effort in R?

When exploring a new dataset, what is the easiest and fastest way to visualize many (or all) of its variables?

Ideally, the output shows the histograms next to each other with minimal clutter and maximum information. The key to this question is flexibility and stability when dealing with large and varied datasets. I use RStudio and usually work with large, messy survey data.

One example that comes out of the box with Hmisc and works quite well:

    library(ggplot2)   # provides the mpg example data
    str(mpg)
    library(Hmisc)
    hist.data.frame(mpg)

Unfortunately, elsewhere I ran into problems with data tables (Error in plot.new() : figure margins too large). It also crashed on a dataset larger than mpg, and I have not figured out how to control the binning. Moreover, I would prefer a flexible solution in ggplot2. Please note that I have only just started learning R and am used to the convenient solutions provided by commercial software.

Other questions on this topic:

R histogram - too many variables

...

1 answer

There are three approaches:

  • Commands from packages such as hist.data.frame()
  • Looping over variables or similar macro constructs
  • Stacking variables and using facets

Packages

Other available commands:

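    library(ggplot2)   # provides the mpg example data
    library(plyr)      # provides mutate(), used in the stacking example
    library(psych)
    # multi.hist(mpg)  # error: not all columns are numeric
    multi.hist(mpg[, sapply(mpg, is.numeric)])   # numeric columns only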

Another option may be multhist from plotrix, which I have not explored. Neither of them offers the flexibility I was looking for.

Loops

As an R beginner, I was advised by everyone to stay away from loops. So I did, but it might be worth a try here. Any suggestions are welcome. Perhaps you could also comment on how to combine the plots into one file.
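For what it is worth, here is a minimal sketch of what the loop approach could look like. It is only one possible way, not a tested recipe for large data: it assumes ggplot2 3.0.0 or later (for the .data pronoun), the bin count of 20 is an arbitrary choice, and the output file name is made up for illustration.

```r
library(ggplot2)

# Names of the numeric columns in mpg (the data ships with ggplot2).
num_vars <- names(mpg)[sapply(mpg, is.numeric)]

# Build one histogram per numeric variable.
# bins = 20 is an arbitrary choice; adjust it per dataset.
plots <- lapply(num_vars, function(v) {
  ggplot(mpg, aes(x = .data[[v]])) +
    geom_histogram(bins = 20) +
    labs(x = v)
})
names(plots) <- num_vars

# Hypothetical output file: a multi-page PDF, one page per variable.
pdf("mpg-histograms-loop.pdf")
for (p in plots) print(p)
dev.off()
```

Keeping the plots in a named list means a single one can still be inspected interactively, for example with plots[["hwy"]], while the pdf() device collects all of them in one file.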

Stacking

My first concern was that stacking variables would get out of hand. However, it may be the best strategy for a reasonable number of variables.

One example I came up with uses the melt function.

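    library(ggplot2)    # qplot(), facet_wrap(), ggsave() and the mpg data
    library(plyr)       # mutate()
    library(reshape2)   # melt()
    # add a row id, then stack all other columns into variable/value pairs
    mpgid <- mutate(mpg, id=as.numeric(rownames(mpg)))
    mpgstack <- melt(mpgid, id="id")
    # one panel per variable, each with its own axis scales
    pp <- qplot(value, data=mpgstack) + facet_wrap(~variable, scales="free")
    # pp + stat_bin(geom="text", aes(label=..count.., vjust=-1))
    ggsave("mpg-histograms.pdf", pp, scale=2)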

(As you can see, I tried to put value labels on the bars for greater information density, but that did not work so well. The labels on the x axis are also less than ideal.)
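A variant that might help with both the axis labels and the binning is to stack only the numeric columns, so that value stays numeric and geom_histogram() can bin it. This is a rough sketch rather than a definitive solution: the names num_cols, mpg_long and pp_num are made up for illustration, the stacking uses base R instead of melt, and bins = 15 is an arbitrary choice.

```r
library(ggplot2)

# Numeric columns only, so that `value` stays numeric after stacking.
num_cols <- names(mpg)[sapply(mpg, is.numeric)]

# Stack the numeric columns into long format with base R:
# one row per (variable, value) pair.
mpg_long <- data.frame(
  variable = rep(num_cols, each = nrow(mpg)),
  value    = unlist(lapply(num_cols, function(v) as.numeric(mpg[[v]])))
)

# One facet per variable; free scales let each panel use its own range.
# bins = 15 is an arbitrary choice and is the place to control binning.
pp_num <- ggplot(mpg_long, aes(x = value)) +
  geom_histogram(bins = 15) +
  facet_wrap(~ variable, scales = "free")
```

The plot can then be saved in the same way as above, for example with ggsave() and a file name of your choice. Categorical variables are left out here and would need a separate bar chart.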

No solution here is perfect, and there will be no one-size-fits-all command. But perhaps we can get closer to making the exploration of a new dataset easier.


Source: https://habr.com/ru/post/919133/
