I have a data frame in R that is expected to contain duplicates. However, some of those duplicates need to be removed. In particular, I only want to remove duplicates that sit in adjacent rows and keep the rest. For example, suppose I had a data frame:
df = data.frame(x = c("A", "B", "C", "A", "B", "C", "A", "B", "B", "C"), y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
This gives the following data frame:
    x  y
    A  1
    B  2
    C  3
    A  4
    B  5
    C  6
    A  7
    B  8
    B  9
    C 10
In this case, I expect the values "A, B, C, A, B, C, etc." to repeat. It is only a problem when duplicates appear in adjacent rows. In my example above, that is rows 8 and 9, where the two "B" values are next to each other.
In my dataset, every time this happens, the first instance is always a user error and the second is always the correct version. In very rare cases, the same value may occur three (or more) times in a row. In every case, I would like to keep only the last occurrence. So, following the example above, I would like the final dataset to look like this:
    x  y
    A  1
    B  2
    C  3
    A  4
    B  5
    C  6
    A  7
    B  9
    C 10
Is there an easy way to do this in R? Thank you in advance for your help!
Edit (11/19/2014, 12:14 PM EST): There was a solution submitted by Akron (spelling?), which has since been deleted. I'm not sure why, because it worked for me.
The solution was:
df = df[with(df, c(x[-1]!= x[-nrow(df)], TRUE)),]
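As far as I can tell, the one-liner compares each value of x with the value in the row after it and keeps a row only when the next value differs; the trailing TRUE means the last row is always kept. Here is a rough breakdown of that idea on the first example. The names `cur`, `nxt`, `keep`, and `res` are my own, for illustration only, and I set `stringsAsFactors = FALSE` just to keep x as plain character:

```r
# First example from above; x kept as character for a simple comparison.
df <- data.frame(x = c("A", "B", "C", "A", "B", "C", "A", "B", "B", "C"),
                 y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 stringsAsFactors = FALSE)

cur <- df$x[-nrow(df)]  # values in rows 1 .. n-1
nxt <- df$x[-1]         # values in rows 2 .. n (the "next" row for each of cur)

# TRUE where the following row holds a different value;
# the appended TRUE keeps the final row unconditionally.
keep <- c(nxt != cur, TRUE)

res <- df[keep, ]
print(res)
```

If I read it correctly, row 8 is the only one dropped here, because it is the only row whose successor has the same x.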
It seems to work for me, so why was it deleted? For example, it handles cases with more than two consecutive duplicates:
    > df = data.frame(x = c("A", "B", "B", "B", "C", "C", "C", "A", "B", "C", "A", "B", "B", "C"), y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
    > df
       x  y
    1  A  1
    2  B  2
    3  B  3
    4  B  4
    5  C  5
    6  C  6
    7  C  7
    8  A  8
    9  B  9
    10 C 10
    11 A 11
    12 B 12
    13 B 13
    14 C 14
    > df = df[with(df, c(x[-1]!= x[-nrow(df)], TRUE)),]
    > df
       x  y
    1  A  1
    4  B  4
    7  C  7
    8  A  8
    9  B  9
    10 C 10
    11 A 11
    13 B 13
    14 C 14
Does this seem to work?
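One way I could cross-check this is with `rle()`, which reports the length of each run of identical consecutive values; the cumulative sum of those lengths should be the index of the last row of each run. This is only a sketch under that assumption, and the names `runs`, `last_idx`, `a`, and `b` are mine:

```r
# Second example from above; x kept as character because rle() needs an atomic vector.
df <- data.frame(x = c("A", "B", "B", "B", "C", "C", "C", "A", "B", "C", "A", "B", "B", "C"),
                 y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
                 stringsAsFactors = FALSE)

runs     <- rle(df$x)             # lengths of runs of equal adjacent values
last_idx <- cumsum(runs$lengths)  # position of the last row in each run

a <- df[last_idx, ]                                    # run-length approach
b <- df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]    # the deleted one-liner

# Both approaches should select the same rows.
print(identical(a$y, b$y))
```

On this example both approaches pick rows 1, 4, 7, 8, 9, 10, 11, 13, and 14, which matches the output above.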