Drawing a graph for sequential division into categories (R, ggplot2)

This question is about both the best way to present the data visually and how to draw the resulting graph in R / ggplot2.

I am trying to find a way to graphically represent the story told here:

“We had 2000 test cases, of which 500 had errors. After some study, we found that 400 tests were Big and 1600 were Small. Only 25 of the Big tests had errors, so we set them aside, leaving 1600 Small tests, of which 475 had errors. Then we found that 400 of the Small tests were Clockwise and 1200 were Counter-Clockwise. Only 20 of the Small Clockwise tests had errors, so we set them aside too, leaving 1200 Small Counter-Clockwise tests, of which 455 had errors.”

In other words, I use categories to partition my test cases, and I want to visualize how the error rate in each category changes as I go.

Here is some R code with the data:

tests <- data.frame(
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455),
  sep.1 = as.factor(c("all", "Big", "Small", "Small", "Small")),
  sep.2 = as.factor(c("all", "all", "all", "Clockwise", "Counter-Clockwise"))
)

With so little data, a simple numerical table might be the best choice. But suppose the story goes on, with more and more separating categories being used, so that simply listing the numbers is no longer practical.
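For reference, the numerical table could be sketched as follows. This is only an illustration: the `group` labels and the `err.rate` column are names introduced here, not part of the original data frame, and the two `stopifnot` checks simply confirm that the error counts in the story add up at each split.

```r
# Same counts as in the story; `group` is a hypothetical label column.
tests <- data.frame(
  group = c("All", "Big", "Small", "Small Clockwise", "Small Counter-Clockwise"),
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455)
)

# Error rate within each group (errors divided by group size).
tests$err.rate <- tests$n.err / tests$n.all

# Consistency checks: the errors of the two parts add up to the errors of the whole.
stopifnot(tests$n.err[2] + tests$n.err[3] == tests$n.err[1])  # Big + Small = All
stopifnot(tests$n.err[4] + tests$n.err[5] == tests$n.err[3])  # CW + CCW = Small

print(tests)
```

The error rate of the group that is kept rises at each step, which is the trend the plot is supposed to make visible.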

What would be a good way to present this data? I can imagine several possibilities:

Four possible plots: pie, bar, bar with path, horizontal bar with path

  • Pie charts showing the slice of the pie that is set aside and the breakdown of errors / non-errors in what remains
  • Bar charts showing the same
  • Bar charts with ribbons showing the "flow" of the separating categories, in the manner of Minard's map of Napoleon's march
  • The same, but with the bars showing the fractions horizontally rather than vertically

All four methods show the absolute number of test cases set aside at each step and the percentage of errors both in the category set aside and in what remains. I think I like #4, but I have an open mind.

How should this kind of data be presented, and can it be done in R / ggplot2?

1 answer

Keep in mind three things that should line up when drawing a graph: the message you want to convey, the message your audience takes away, and the message the graph itself tells. In my opinion, your option 4 is the best at keeping those messages consistent.

I also arrive at number 4 by pure elimination ;)

Vertical bars are not suitable because they combine a vertical representation with a horizontal flow. Comparing pie charts is not easy either (even within a single pie chart it is already hard to compare the slices), so they are not an option, which really leaves you with option 4 :)
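A minimal ggplot2 sketch of the bar part of option 4, under a few assumptions of my own: the `group` labels, the `status` column, and the long-format data frame `long` are introduced here for illustration, and the ribbons connecting the stages are not drawn. Each horizontal bar has a length equal to the number of test cases in the group, split into errors and non-errors.

```r
library(ggplot2)

# Counts from the question; `group` is a hypothetical label column.
tests <- data.frame(
  group = c("All", "Big", "Small", "Small Clockwise", "Small Counter-Clockwise"),
  n.all = c(2000, 400, 1600, 400, 1200),
  n.err = c(500, 25, 475, 20, 455)
)

# Long format: one row per group and status (error / no error).
long <- rbind(
  data.frame(group = tests$group, status = "error",    n = tests$n.err),
  data.frame(group = tests$group, status = "no error", n = tests$n.all - tests$n.err)
)

# Reverse the factor levels so that "All" ends up at the top after coord_flip().
long$group <- factor(long$group, levels = rev(tests$group))

p <- ggplot(long, aes(x = group, y = n, fill = status)) +
  geom_col() +      # stacked bars, one per group
  coord_flip() +    # turn the bars horizontal
  labs(x = NULL, y = "Number of test cases", fill = NULL)

print(p)
```

Drawing the flow between stages would need extra layers (for example polygons computed by hand), which this sketch leaves out.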

You can also try a Sankey diagram. The question "Sankey Charts in R?" may be helpful.
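As one possible starting point, here is a sketch using the `networkD3` package. That choice is an assumption on my part (the question does not mention it, and other packages can draw Sankey diagrams too), and the node labels are made up for illustration. `sankeyNetwork()` expects zero-based node indices in the links table.

```r
library(networkD3)

# Nodes: one per group in the story (labels are illustrative).
nodes <- data.frame(
  name = c("All (2000)", "Big (400)", "Small (1600)",
           "Small Clockwise (400)", "Small Counter-Clockwise (1200)")
)

# Links: how the test cases flow from each group into its parts.
# `source` and `target` are zero-based row indices into `nodes`.
links <- data.frame(
  source = c(0, 0, 2, 2),
  target = c(1, 2, 3, 4),
  value  = c(400, 1600, 400, 1200)
)

s <- sankeyNetwork(
  Links = links, Nodes = nodes,
  Source = "source", Target = "target",
  Value = "value", NodeID = "name",
  fontSize = 12
)

s  # renders as an HTML widget in an interactive session
```

To show the errors as well, each group could be split into an "error" and a "no error" node, at the cost of a busier diagram.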


Source: https://habr.com/ru/post/946988/

