Dummy variable for the first occurrence of each new value in a series

Suppose I have a variable whose value persists over several periods, for example which iPod I owned in each year. I had the first-generation iPod from 2001 to 2004, then the iPod 2 from 2005, and so on. So my data frame looks like this:

    2001 Ipod1
    2002 Ipod1
    2003 Ipod1
    2004 Ipod1
    2005 Ipod2
    2006 Ipod2
    2007 Ipod2
    2008 Ipod2
    2009 Ipod3
    2010 Ipod3

I want to create a dummy variable that marks the first year in which each new value appears, so that I get:

    Year  Var    Dummy
    2001  Ipod1  1
    2002  Ipod1  0
    2003  Ipod1  0
    2004  Ipod1  0
    2005  Ipod2  1
    2006  Ipod2  0
    2007  Ipod2  0
    2008  Ipod2  0
    2009  Ipod3  1
    2010  Ipod3  0

So far, I have done this:

    # Example data (dput() output)
    df = structure(list(Year = 2001:2010,
                        Var = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
                                        .Label = c("Ipod1", "Ipod2", "Ipod3"), class = "factor")),
                   .Names = c("Year", "Var"), class = "data.frame", row.names = c(NA, -10L))
    # Within-group counter 1, 2, 3, ... for each level of Var
    df$number.in.group = unlist(lapply(table(df$Var), seq.int))
    # 1 where a new group starts, 0 elsewhere
    df$dummy = ifelse(df$number.in.group == 1, 1, 0)
    # Force the first element to 0
    df$dummy[1] = 0

Note that I actually want the first element of the dummy to be zero.
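To see why the code above works, here is the same computation broken into steps. This is a minimal sketch that rebuilds the example data with data.frame() and factor() instead of the dput() output; the intermediate names counts and counters are hypothetical. Because unlist() places the counters by position rather than matching them back to rows, this only gives the right answer when each level's rows are contiguous and appear in level order.

```r
# Rebuild the example data (same values and levels as the dput() output).
df <- data.frame(
  Year = 2001:2010,
  Var  = factor(rep(c("Ipod1", "Ipod2", "Ipod3"), times = c(4, 4, 2)))
)

# Step 1: table() counts the rows per level: Ipod1 = 4, Ipod2 = 4, Ipod3 = 2.
counts <- table(df$Var)

# Step 2: seq.int(n) turns each count into a counter 1..n.
counters <- lapply(counts, seq.int)

# Step 3: unlist() concatenates the counters in level order, by position.
df$number.in.group <- unlist(counters)

# 1 where the counter restarts, then the first row is forced to 0.
df$dummy <- ifelse(df$number.in.group == 1, 1, 0)
df$dummy[1] <- 0
```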

My question is: is there a simpler or more idiomatic way to do this?

Thanks.

4 answers

How about this:

    df$Dummy <- as.numeric(!duplicated(df$Var))
    # Or, if you want the first element to be 0:
    df$Dummy <- c(0, as.numeric(!duplicated(df$Var))[-1])
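For reference, here is what duplicated() contributes in the answer above, as a small self-contained sketch on the same data (first_seen and the extra Dummy0 column are hypothetical names used only for illustration). Note that !duplicated() marks the first occurrence of each value anywhere in the vector, so a value that reappeared later, after a different value, would not be flagged a second time.

```r
df <- data.frame(
  Year = 2001:2010,
  Var  = factor(rep(c("Ipod1", "Ipod2", "Ipod3"), times = c(4, 4, 2)))
)

# duplicated() is TRUE for every element already seen earlier in the vector,
# so !duplicated() is TRUE only at the first occurrence of each value.
first_seen <- !duplicated(df$Var)

# Variant 1: 1 at the first row of every group, including row 1.
df$Dummy <- as.numeric(first_seen)

# Variant 2: drop the first element and prepend a 0 in its place.
df$Dummy0 <- c(0, as.numeric(first_seen)[-1])
```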

I believe this gives the desired result:

    > df$Dummy <- c(0, diff(as.numeric(df$Var)))
    > df
       Year   Var Dummy
    1  2001 Ipod1     0
    2  2002 Ipod1     0
    3  2003 Ipod1     0
    4  2004 Ipod1     0
    5  2005 Ipod2     1
    6  2006 Ipod2     0
    7  2007 Ipod2     0
    8  2008 Ipod2     0
    9  2009 Ipod3     1
    10 2010 Ipod3     0

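This works because Var is a factor, so as.numeric() returns its integer level codes (1 for Ipod1, 2 for Ipod2, 3 for Ipod3), and diff() of those codes is 1 exactly where the next level starts. It relies on the groups appearing in level order, one level at a time.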
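A short sketch of what the diff() answer computes, using the same data (codes, steps, jumpy, raw and fixed are hypothetical names). The second part illustrates the ordering assumption: if the level codes go down, or skip a level, diff() returns something other than 1. Comparing with 0, an adjustment of my own rather than part of the answer, turns any change into a 1.

```r
df <- data.frame(
  Year = 2001:2010,
  Var  = factor(rep(c("Ipod1", "Ipod2", "Ipod3"), times = c(4, 4, 2)))
)

codes <- as.numeric(df$Var)   # level codes: 1 1 1 1 2 2 2 2 3 3
steps <- diff(codes)          # 9 differences: 0 0 0 1 0 0 0 1 0
df$Dummy <- c(0, steps)       # prepend 0 so the length matches nrow(df)

# Hypothetical counter-example: the code drops from 3 (Ipod3) to 1 (Ipod1).
jumpy <- factor(c("Ipod3", "Ipod3", "Ipod1"),
                levels = c("Ipod1", "Ipod2", "Ipod3"))
raw <- c(0, diff(as.numeric(jumpy)))   # 0 0 -2, not 0 0 1

# Any nonzero step is a change, so this gives 0 0 1.
fixed <- as.numeric(raw != 0)
```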


The rle function is very useful in situations like this: it finds runs of consecutive identical elements in a vector.

    rle_result = rle(as.character(df$Var))
    rle_result
    Run Length Encoding
      lengths: int [1:3] 4 4 2
      values : chr [1:3] "Ipod1" "Ipod2" "Ipod3"

To create a new variable:

    df$new = 0
    change_ids = 1 + cumsum(rle_result$lengths)
    df$new[change_ids[-length(change_ids)]] <- 1
    df
       Year   Var new
    1  2001 Ipod1   0
    2  2002 Ipod1   0
    3  2003 Ipod1   0
    4  2004 Ipod1   0
    5  2005 Ipod2   1
    6  2006 Ipod2   0
    7  2007 Ipod2   0
    8  2008 Ipod2   0
    9  2009 Ipod3   1
    10 2010 Ipod3   0

which is exactly what you are looking for, I think.
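The same rle() idea can also be written by computing the row where each run starts directly, rather than the row after each run ends. A minimal sketch on the same data; r and run_starts are hypothetical names. Because rle() works on consecutive runs, a value that came back after a different value would start a new run and be flagged again.

```r
df <- data.frame(
  Year = 2001:2010,
  Var  = factor(rep(c("Ipod1", "Ipod2", "Ipod3"), times = c(4, 4, 2)))
)

r <- rle(as.character(df$Var))   # lengths 4 4 2

# Each run starts 1 row after the cumulative length of all earlier runs.
run_starts <- cumsum(c(1, head(r$lengths, -1)))   # 1 5 9

# Flag every run start except row 1, matching the output above.
df$new <- 0
df$new[run_starts[-1]] <- 1
```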


(1) The question asks for the Dummy column, but the code in the question also creates a number.in.group column, so I was not sure whether number.in.group is needed; below we assume it is. Note that assigning 0 to the first Dummy element converts this column from logical to numeric:

    within(df, {
      number.in.group <- ave(Year, Var, FUN = seq_along)
      Dummy <- number.in.group == 1
      Dummy[1] <- 0
    })
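A sketch of how the ave() call builds the within-group counter; res is a hypothetical name for the returned copy. ave() splits Year by Var, applies seq_along to each piece and writes the results back in the original row positions, so unlike the table()/unlist() approach in the question, the counter does not depend on the rows being sorted by group.

```r
df <- data.frame(
  Year = 2001:2010,
  Var  = factor(rep(c("Ipod1", "Ipod2", "Ipod3"), times = c(4, 4, 2)))
)

res <- within(df, {
  # 1, 2, 3, ... within each level of Var, placed back in row order
  number.in.group <- ave(Year, Var, FUN = seq_along)
  # logical: TRUE at the first row of each group
  Dummy <- number.in.group == 1
  # assigning a number coerces the whole column to numeric
  Dummy[1] <- 0
})
```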

(2a) If number.in.group is not required and the groups in Var are contiguous, as in the example, then the duplicated solution would be preferred, except that I think it is a little clearer written like this:

    df$Dummy <- !duplicated(df$Var)
    df$Dummy[1] <- 0

although this takes an extra statement.

(2b) We might also prefer a non-destructive form, which returns a modified copy and leaves df itself unchanged:

    within(df, {
      Dummy <- !duplicated(Var)
      Dummy[1] <- 0
    })
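To check the non-destructive behaviour, here is a minimal sketch (flagged is a hypothetical name): within() evaluates the block against a copy of df and returns that copy, so the original data frame keeps only its Year and Var columns.

```r
df <- data.frame(
  Year = 2001:2010,
  Var  = factor(rep(c("Ipod1", "Ipod2", "Ipod3"), times = c(4, 4, 2)))
)

flagged <- within(df, {
  Dummy <- !duplicated(Var)   # TRUE at the first row of each value
  Dummy[1] <- 0               # first element forced to 0 (column becomes numeric)
})

names(df)       # "Year" "Var": df itself was not modified
flagged$Dummy   # 0 0 0 0 1 0 0 0 1 0
```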

Source: https://habr.com/ru/post/1394557/

