R ggplot: label only the last N data points in a plot

I created a line chart in R with a label on every data point. Because there are so many data points, the chart becomes very cluttered with labels. I would like to label only the last N (say 4) data points. I tried subset and tail inside the geom_label_repel function, but either could not get them to work or received an error message. My dataset consists of 99 values distributed across 3 groups (KPI).

I have the following code in R:

    library(ggplot2)
    library(ggrepel)

    data.trend <- read.csv(file=....)

    plot.line <- ggplot(data = data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
      geom_line(aes(group = KPI), size = 1) +
      geom_point(size = 2.5) +
      # Labels defined here
      geom_label_repel(
        aes(Version, Value, fill = factor(KPI), label = sprintf('%0.1f%%', Value)),
        box.padding = unit(0.35, "lines"),
        point.padding = unit(0.4, "lines"),
        segment.color = 'grey50',
        show.legend = FALSE
      )

To be honest, I'm completely new to R, so maybe I missed something basic.

Thanks in advance.

1 answer

The easiest approach is to pass a data = argument to geom_label_repel that contains only the points you want to label.

Here's a reproducible example:

    library(ggplot2)
    library(ggrepel)

    set.seed(1235)
    data.trend <- data.frame(Version = rnorm(25), Value = rnorm(25),
                             group = sample(1:2, 25, TRUE), KPI = sample(1:2, 25, TRUE))

    ggplot(data = data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
      geom_line(aes(group = KPI), size = 1) +
      geom_point(size = 2.5) +
      # Only the last 4 rows of the data frame are passed to the label layer
      geom_label_repel(aes(Version, Value, fill = factor(KPI), label = sprintf('%0.1f%%', Value)),
                       data = tail(data.trend, 4),
                       box.padding = unit(0.35, "lines"),
                       point.padding = unit(0.4, "lines"),
                       segment.color = 'grey50',
                       show.legend = FALSE)

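Note that tail(data.trend, 4) takes the last four rows of the whole data frame, regardless of group. With several KPI groups, as in the question, you may instead want the last N points of each group. Here is a minimal sketch in base R, assuming Version is numeric (or otherwise sortable) and that "last" means the highest Version within each KPI. The data frame below is a made-up stand-in for the question's 99 values, and last_n_per_group() is a hypothetical helper, not part of ggplot2 or ggrepel.

```r
set.seed(42)

# Made-up stand-in for the question's data: 3 KPI groups x 33 versions = 99 rows
data.trend <- data.frame(
  Version = rep(1:33, times = 3),
  KPI     = rep(c("A", "B", "C"), each = 33),
  Value   = round(runif(99, 0, 100), 1)
)

# Hypothetical helper: keep the last n rows of each group, ordered by order_col
last_n_per_group <- function(df, group_col, order_col, n = 4) {
  # Sort by group, then by the ordering column within each group
  df <- df[order(df[[group_col]], df[[order_col]]), ]
  # Split into one data frame per group and take the tail of each
  pieces <- lapply(split(df, df[[group_col]]), tail, n = n)
  # Stack the per-group tails back into a single data frame
  do.call(rbind, pieces)
}

label.data <- last_n_per_group(data.trend, "KPI", "Version", n = 4)
```

The result can then be passed as data = label.data to geom_label_repel in place of tail(data.trend, 4).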

Unfortunately, passing a subset through data = works against the repulsion algorithm: the labels are only repelled from the points in that subset, so their placement is suboptimal with respect to the unlabeled points (in the resulting plot, some points are covered by labels).

So the better approach is to keep every label in the layer and use color and fill to make the unwanted ones invisible, by setting color and fill to NA for the labels you want to hide:

    ggplot(data = data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
      geom_line(aes(group = KPI), size = 1) +
      geom_point(size = 2.5) +
      # Rows 1-21 get NA color and fill (invisible); only the last 4 of the 25 labels are drawn
      geom_label_repel(aes(Version, Value, fill = factor(KPI), label = sprintf('%0.1f%%', Value)),
                       box.padding = unit(0.35, "lines"),
                       point.padding = unit(0.4, "lines"),
                       show.legend = FALSE,
                       color = c(rep(NA, 21), rep('grey50', 4)),
                       fill = c(rep(NA, 21), rep('lightblue', 4)))

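A related variant, which is not what the answer above does: blank out the label text instead of its colors. The ggrepel documentation describes setting a label to the empty string "" as a way to hide it while its data point still repels the remaining labels. The sketch below assumes that behaviour and computes which rows to hide from the row count rather than hardcoding 21 and 4. The names n.labels, hide and the label column are illustrative, and the plotting part only runs if both packages are installed.

```r
set.seed(1235)
data.trend <- data.frame(Version = rnorm(25), Value = rnorm(25),
                         KPI = sample(1:2, 25, TRUE))

n.labels <- 4  # illustrative: how many trailing rows should keep their label

# Build the label text for every row, then blank out all but the last n.labels rows
data.trend$label <- sprintf("%0.1f%%", data.trend$Value)
hide <- seq_len(nrow(data.trend)) <= nrow(data.trend) - n.labels
data.trend$label[hide] <- ""

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  library(ggrepel)

  plot.line <- ggplot(data.trend, aes(Version, Value, group = KPI, color = factor(KPI))) +
    geom_line() +
    geom_point(size = 2.5) +
    # All 25 rows are in the layer; rows with an empty label are assumed not to be drawn
    geom_label_repel(aes(label = label), show.legend = FALSE)
}
```

Because the rule for hiding labels lives in the data frame, the same column can be built per group if you need the last N points of each KPI.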


Source: https://habr.com/ru/post/1262460/

