Conditional row insertion based on sequential values in a column in R

I have a data frame where I need to insert a row between two rows whenever the value in the Event column changes from "A" to "B".

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

I want to insert a new row whenever event "B" immediately follows event "A". The new row goes between those two rows and has the same values as the "B" row, except that its Event is "Z".

Expected Data Frame

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
Z       321      Sell   27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
Z       320      Sell   27-01-2018 12:32
B       320      Sell   27-01-2018 12:32
3 answers

An alternative tidyverse approach:

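library(tidyverse)
df %>%
  # G increases by 1 at every "B" that directly follows an "A";
  # default = "" stops the first row from producing NA (which would poison cumsum)
  group_by(G = cumsum(Event == "B" & dplyr::lag(Event, 1, default = "") == "A")) %>%
  # prepend a copy of each group's first row, with Event renamed to "Z"
  do(rbind(mutate(head(., 1), Event = "Z"), .)) %>%
  ungroup() %>%
  # drop the spurious "Z" copy added in front of the first group (G == 0)
  slice(-1) %>%
  select(-G)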

# # A tibble: 12 x 5
#    Event Price Type  Date       Time
#    <chr> <int> <chr> <chr>      <chr>
#  1 A       100 Sell  27-01-2018 12:00
#  2 C       200 Buy   27-01-2018 12:15
#  3 C       300 Buy   27-01-2018 12:30
#  4 D       350 Sell  27-01-2018 12:31
#  5 A       320 Buy   27-01-2018 12:32
#  6 Z       321 Sell  27-01-2018 12:32
#  7 B       321 Sell  27-01-2018 12:32
#  8 B       220 Buy   27-01-2018 12:34
#  9 L       550 Buy   27-01-2018 12:35
# 10 A       320 Buy   27-01-2018 12:32
# 11 Z       320 Sell  27-01-2018 12:32
# 12 B       320 Sell  27-01-2018 12:32

Data

df <- read.table(text="Event   Price   Type    Date    Time
A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32", header=TRUE, stringsAsFactors=FALSE)
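To see what the `group_by()` key in the pipeline above computes, here is a small base-R sketch of the same counter. It assumes the ten `Event` values of the sample data, copied into a plain vector so the snippet runs on its own. The counter increments at each "B" that directly follows an "A", so every such "B" opens a new group and is that group's first row.

```r
# Event column of the sample data, copied by hand for a self-contained example
event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")

# Previous Event for each row; "" is a neutral placeholder for the first row
prev <- c("", head(event, -1))

# TRUE where the current row is "B" and the previous row is "A"
trigger <- event == "B" & prev == "A"

# Running count of triggers: each trigger row starts a new group
G <- cumsum(trigger)
print(G)
# [1] 0 0 0 0 0 1 1 1 1 2
```

Prepending a "Z" copy of each group's first row therefore places a "Z" row just before every qualifying "B". The copy prepended to group 0 has no matching "B", which is why the pipeline removes the first row with `slice(-1)`.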

Here is an approach using tidyverse:

library(tidyverse)
df %>%
  mutate(lagE = lag(Event, default = ""),   # previous Event ("" for the first row)
         splt = Event == "B" & lagE == "A", # flag a "B" that follows an "A"
         cum = cumsum(splt)) %>%            # group id to split by
  {split(., .$cum)} %>%                     # split into a list of data frames
  map(function(x) {
    # If the group starts with a flagged "B" row, duplicate that row
    # and rename the copy to "Z"; otherwise return the group unchanged
    if (x$splt[1]) {
      z <- rbind(x[1, ], x)
      z$Event <- as.character(z$Event)
      z$Event[1] <- "Z"
    } else {
      z <- x
    }
    z
  }) %>%
  bind_rows() %>%  # put the pieces back into one data frame
  select(1:5)      # remove the helper columns

#output
   Event Price Type       Date  Time
1      A   100 Sell 27-01-2018 12:00
2      C   200  Buy 27-01-2018 12:15
3      C   300  Buy 27-01-2018 12:30
4      D   350 Sell 27-01-2018 12:31
5      A   320  Buy 27-01-2018 12:32
6      Z   321 Sell 27-01-2018 12:32
7      B   321 Sell 27-01-2018 12:32
8      B   220  Buy 27-01-2018 12:34
9      L   550  Buy 27-01-2018 12:35
10     A   320  Buy 27-01-2018 12:32
11     Z   320 Sell 27-01-2018 12:32
12     B   320 Sell 27-01-2018 12:32

The problem seems simple, and I'm sure someone will provide a more concise solution.
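For comparison, here is one possible more compact base-R sketch (not taken from any of the answers): flag each "B" row that follows an "A", repeat the flagged rows twice with `rep()`, and rename the first copy to "Z". The data frame `df_demo` is a hypothetical stand-in holding the Event, Price and Type columns of the sample data, and the sketch assumes `Event` is a character column (as with `stringsAsFactors = FALSE`).

```r
# Hypothetical stand-in for the question's data (Event, Price and Type only)
df_demo <- data.frame(
  Event = c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B"),
  Price = c(100, 200, 300, 350, 320, 321, 220, 550, 320, 320),
  Type  = c("Sell", "Buy", "Buy", "Sell", "Buy", "Sell", "Buy", "Buy", "Buy", "Sell"),
  stringsAsFactors = FALSE
)

# TRUE for each "B" row whose previous row is "A"
trigger <- df_demo$Event == "B" & c("", head(df_demo$Event, -1)) == "A"

# Repeat flagged rows twice and all other rows once; row order is preserved
idx <- rep(seq_len(nrow(df_demo)), times = 1 + trigger)
res <- df_demo[idx, ]

# The first copy of each duplicated row becomes the inserted "Z" row
first_copy <- duplicated(idx, fromLast = TRUE)
res$Event[first_copy] <- "Z"
rownames(res) <- NULL
print(res)
```

Because the duplicate is taken from the "B" row itself, every other column of the "Z" row automatically matches the "B" row.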


Here is an option using base R. We create a logical vector `i1` by comparing each "Event" with the previous one and checking whether the pair is "A" followed by "B". We then subset the dataset with that index, set the "Event" of the subset to "Z", `rbind` it to the original dataset, and reorder the rows so that each "Z" row lands at its position in `i2`, immediately before the matching "B" row.

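# i1: TRUE for each "B" row whose previous row is "A"
i1 <- with(df, c(FALSE, Event[-1] == "B" & Event[-nrow(df)] == "A"))
# i2: final positions of the "Z" rows (each insertion shifts later rows down by one)
i2 <- which(i1) + seq_along(which(i1)) - 1
# n: number of rows after insertion
n <- sum(i1) + length(i1)
# append the "Z" copies, then reorder so each copy lands at its i2 position
res <- rbind(df, transform(df[i1, ], Event = "Z"))[order(c(setdiff(seq_len(n), i2), i2)), ]
row.names(res) <- NULL
res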
#   Event Price Type       Date  Time
#1      A   100 Sell 27-01-2018 12:00
#2      C   200  Buy 27-01-2018 12:15
#3      C   300  Buy 27-01-2018 12:30
#4      D   350 Sell 27-01-2018 12:31
#5      A   320  Buy 27-01-2018 12:32
#6      Z   321 Sell 27-01-2018 12:32
#7      B   321 Sell 27-01-2018 12:32
#8      B   220  Buy 27-01-2018 12:34
#9      L   550  Buy 27-01-2018 12:35
#10     A   320  Buy 27-01-2018 12:32
#11     Z   320 Sell 27-01-2018 12:32
#12     B   320 Sell 27-01-2018 12:32
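The index arithmetic in the base-R answer is the least obvious step, so here is a sketch that reproduces it on a plain logical vector. It assumes a hypothetical minimal case matching the sample data: 10 original rows, with "B"-after-"A" rows at positions 6 and 10.

```r
# Hypothetical minimal case: 10 original rows, trigger rows at 6 and 10
i1 <- c(rep(FALSE, 5), TRUE, rep(FALSE, 3), TRUE)

# Each earlier insertion shifts later rows down by one, so the k-th "Z" row
# ends up at (position of its "B" row) + (k - 1)
i2 <- which(i1) + seq_along(which(i1)) - 1
print(i2)
# [1]  6 11

# Target slot of every row of rbind(original, copies): original rows fill
# the slots not taken by i2, and the appended copies take the i2 slots
n <- length(i1) + sum(i1)
target <- c(setdiff(seq_len(n), i2), i2)
print(target)
# [1]  1  2  3  4  5  7  8  9 10 12  6 11

# order() inverts that mapping: entry k is the rbind row that goes in slot k
perm <- order(target)
print(perm)
# [1]  1  2  3  4  5 11  6  7  8  9 12 10
```

Slot 6 receives rbind row 11 (the first "Z" copy) and slot 11 receives row 12 (the second copy), each directly before its original "B" row.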

Source: https://habr.com/ru/post/1694088/

