Conditional row insertion based on sequential values in a column in R

Question

I have a dataframe where I need to insert a row between two lines whenever the value in the Event column changes from "A" to "B".

Event   Price   Type    Date    Time
A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

I want to insert a new line whenever event "B" directly follows event "A". The new line goes between those two lines and copies every value from the "B" line, except that its Event is "Z".

Expected Data Frame

Event   Price   Type    Date    Time
A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
Z       321      Sell   27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
Z       320      Sell   27-01-2018 12:32
B       320      Sell   27-01-2018 12:32


r dataframe dplyr

NinjaR Feb 24 '18 at 13:28


3 answers

Here is an approach using tidyverse:

library(tidyverse)
df %>%
  mutate(lagE = lag(Event, default = ""),   # previous row's Event ("" for the first row, so no NA)
         splt = Event == "B" & lagE == "A", # TRUE where a B directly follows an A
         cum = cumsum(splt)) %>%            # group id that increases at every A -> B boundary
  {split(., .$cum)} %>%                     # split the data frame into those groups
  map(function(x) {
    # every group after the first starts with a B that follows an A:
    # duplicate that row and rename the copy to Z; otherwise return the group unchanged
    if (x$splt[1]) {
      z <- rbind(x[1, ], x)
      z[, 1] <- as.character(z[, 1])
      z[1, 1] <- "Z"
    } else {
      z <- x
    }
    z
  }) %>%
  bind_rows() %>%                           # put the pieces back into one data frame
  select(1:5)                               # drop the helper columns

#output
   Event Price Type       Date  Time
1      A   100 Sell 27-01-2018 12:00
2      C   200  Buy 27-01-2018 12:15
3      C   300  Buy 27-01-2018 12:30
4      D   350 Sell 27-01-2018 12:31
5      A   320  Buy 27-01-2018 12:32
6      Z   321 Sell 27-01-2018 12:32
7      B   321 Sell 27-01-2018 12:32
8      B   220  Buy 27-01-2018 12:34
9      L   550  Buy 27-01-2018 12:35
10     A   320  Buy 27-01-2018 12:32
11     Z   320 Sell 27-01-2018 12:32
12     B   320 Sell 27-01-2018 12:32
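To see how the helper columns carve up the data, here is the grouping step on its own, in base R. This is an illustration added for clarity, not part of the answer: the Event vector is copied by hand from the question's data, and c("", head(Event, -1)) stands in for lag(Event, default = "").

```r
# Event column from the question's sample data
Event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")

# previous Event; "" for the first row so the comparison never yields NA
lagE <- c("", head(Event, -1))

# TRUE where a "B" directly follows an "A" (rows 6 and 10 here)
splt <- Event == "B" & lagE == "A"

# group id: increases at every A -> B boundary, so each "B" that
# needs a "Z" in front of it is the first row of a new group
cum <- cumsum(splt)
print(cum)  # 0 0 0 0 0 1 1 1 1 2
```

Groups 1 and 2 each start with a "B" row, which is exactly the row the map() step duplicates and relabels as "Z".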

The problem seems simple, and I'm sure someone will provide a more concise solution.


missuse Feb 24 '18 at 13:41

Here is an option using base R. We create a logical vector, i1, that is TRUE wherever the current "Event" is "B" and the previous "Event" is "A". We then subset the dataset with that index, change its "Event" to "Z", rbind it to the original dataset, and reorder the rows so that each "Z" row lands at its position in the index i2, directly before the matching "B" row.

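# i1: TRUE for each row whose Event is "B" and whose previous Event is "A"
i1 <- with(df, c(FALSE, Event[-1] == "B" & Event[-nrow(df)] == "A"))
# i2: final row positions of the inserted "Z" rows
# (each earlier insertion shifts the later ones down by one)
i2 <- which(i1) + seq_along(which(i1)) - 1
# n: number of rows after insertion
n <- sum(i1) + length(i1)
# append the "Z" copies after the original rows, then reorder so each lands at its i2 position
res <- rbind(df, transform(df[i1, ], Event = "Z"))[order(c(setdiff(seq_len(n), i2), i2)), ]
row.names(res) <- NULL
res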
#   Event Price Type       Date  Time
#1      A   100 Sell 27-01-2018 12:00
#2      C   200  Buy 27-01-2018 12:15
#3      C   300  Buy 27-01-2018 12:30
#4      D   350 Sell 27-01-2018 12:31
#5      A   320  Buy 27-01-2018 12:32
#6      Z   321 Sell 27-01-2018 12:32
#7      B   321 Sell 27-01-2018 12:32
#8      B   220  Buy 27-01-2018 12:34
#9      L   550  Buy 27-01-2018 12:35
#10     A   320  Buy 27-01-2018 12:32
#11     Z   320 Sell 27-01-2018 12:32
#12     B   320 Sell 27-01-2018 12:32
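The index arithmetic is the least obvious part of this answer. The sketch below (an illustration added here, using only the Event column copied by hand from the question's data) shows why i2 adds seq_along(which(i1)) - 1: every "Z" row inserted earlier pushes the later insertion points down by one.

```r
# Event column from the question's sample data
Event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")

# i1: TRUE where the current Event is "B" and the previous one is "A"
i1 <- c(FALSE, Event[-1] == "B" & Event[-length(Event)] == "A")
print(which(i1))  # 6 10: the "B" rows in the original data

# the k-th "Z" row ends up at which(i1)[k] + (k - 1) in the result
i2 <- which(i1) + seq_along(which(i1)) - 1
print(i2)  # 6 11

# the original rows fill every remaining position of the n-row result
n <- length(i1) + sum(i1)
print(setdiff(seq_len(n), i2))  # 1 2 3 4 5 7 8 9 10 12
```

Passing c(setdiff(seq_len(n), i2), i2) to order() therefore places the 10 original rows and the 2 appended "Z" rows at exactly these target positions.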


akrun Feb 24 '18 at 13:47


CPak · Accepted Answer · 2018-02-24T15:15:22+0000

An alternative tidyverse approach:

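library(tidyverse)
df %>%
  # G increases at every row where a "B" directly follows an "A"
  # (default = "" keeps the first row from producing NA in cumsum)
  group_by(G = cumsum(Event == "B" & dplyr::lag(Event, default = "") == "A")) %>%
  # prepend a copy of each group's first row, with Event set to "Z"
  do(rbind(mutate(head(., 1), Event = "Z"), .)) %>%
  ungroup() %>%
  # group 0 starts at row 1, which is never a B after an A, so drop its "Z" copy
  slice(-1) %>%
  select(-G)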

# A tibble: 12 x 5
#    Event Price Type  Date       Time 
#    <chr> <int> <chr> <chr>      <chr>
#  1 A       100 Sell  27-01-2018 12:00
#  2 C       200 Buy   27-01-2018 12:15
#  3 C       300 Buy   27-01-2018 12:30
#  4 D       350 Sell  27-01-2018 12:31
#  5 A       320 Buy   27-01-2018 12:32
#  6 Z       321 Sell  27-01-2018 12:32
#  7 B       321 Sell  27-01-2018 12:32
#  8 B       220 Buy   27-01-2018 12:34
#  9 L       550 Buy   27-01-2018 12:35
# 10 A       320 Buy   27-01-2018 12:32
# 11 Z       320 Sell  27-01-2018 12:32
# 12 B       320 Sell  27-01-2018 12:32
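do() is superseded in recent dplyr releases. A sketch of the same grouping idea with group_modify() follows; it assumes dplyr 0.8.0 or later and is a swapped-in variant, not the answer's own code. Inside group_modify(), .x holds each group's rows without the grouping column G, which dplyr adds back afterwards.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy  27-01-2018 12:15
C 300 Buy  27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy  27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy  27-01-2018 12:34
L 550 Buy  27-01-2018 12:35
A 320 Buy  27-01-2018 12:32
B 320 Sell 27-01-2018 12:32", header = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  # G increases at every row where a "B" directly follows an "A"
  group_by(G = cumsum(Event == "B" & lag(Event, default = "") == "A")) %>%
  # prepend a "Z" copy of each group's first row
  group_modify(~ bind_rows(mutate(head(.x, 1), Event = "Z"), .x)) %>%
  ungroup() %>%
  slice(-1) %>%   # drop the unwanted "Z" copy in front of row 1 (group 0)
  select(-G)
res
```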

Data

df <- read.table(text="Event   Price   Type    Date    Time
A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32", header=TRUE, stringsAsFactors=FALSE)
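The first answer expected a more concise solution to appear. One possibility, offered here as a base R sketch and not taken from any of the answers, is to repeat each flagged "B" row with rep() and relabel the first copy as "Z". The names flag, idx and first_copy are illustrative only.

```r
# Same sample data as in the Data section
df <- read.table(text = "Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy  27-01-2018 12:15
C 300 Buy  27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy  27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy  27-01-2018 12:34
L 550 Buy  27-01-2018 12:35
A 320 Buy  27-01-2018 12:32
B 320 Sell 27-01-2018 12:32", header = TRUE, stringsAsFactors = FALSE)

# TRUE for each "B" row whose previous row is an "A"
flag <- df$Event == "B" & c("", head(df$Event, -1)) == "A"

# repeat every flagged row twice and every other row once
idx <- rep(seq_len(nrow(df)), times = 1 + flag)
res <- df[idx, ]

# the first copy of each duplicated row becomes the "Z" row
first_copy <- flag[idx] & !duplicated(idx)
res$Event[first_copy] <- "Z"
row.names(res) <- NULL
res
```

Because the duplicated row is already in place, no reordering step is needed; the trade-off is that it relies on the "Z" row being an exact copy of the "B" row apart from Event.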
