Fill in the missing year in an ordered list of dates

I collected some time series data from the Internet, and the timestamps I received look like this:

    24 Jun
    21 Mar
    20 Jan
    10 Dec
    20 Jun
    20 Jan
    10 Dec
    ...

The interesting part is that the year is missing from the data. However, all the records are ordered, so the year can be inferred from each record's position and filled in. The data after imputation should look like this:

    24 Jun 2014
    21 Mar 2014
    20 Jan 2014
    10 Dec 2013
    20 Jun 2013
    20 Jan 2013
    10 Dec 2012
    ...

Before I roll up my sleeves and start writing a for loop with nested logic: is there a simple way that works out of the box in R to impute the missing year?

Thanks so much for any suggestion!

2 answers

Here is one idea

    ## Make data easily reproducible
    df <- data.frame(day = c(24, 21, 20, 10, 20, 20, 10),
                     month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec"))

    ## Convert each month-day combo to its corresponding "julian date"
    datestring <- paste("2012", match(df[[2]], month.abb), df[[1]], sep = "-")
    date <- strptime(datestring, format = "%Y-%m-%d")
    julian <- as.integer(strftime(date, format = "%j"))

    ## Transitions between years occur wherever julian date increases between
    ## two observations
    df$year <- 2014 - cumsum(diff(c(julian[1], julian)) > 0)

    ## Check that it worked
    df
    #   day month year
    # 1  24   Jun 2014
    # 2  21   Mar 2014
    # 3  20   Jan 2014
    # 4  10   Dec 2013
    # 5  20   Jun 2013
    # 6  20   Jan 2013
    # 7  10   Dec 2012
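To see why the cumulative sum recovers the year, it can help to print the intermediate values. The following is a small walk-through of the same steps on the sample data, not a different method. The names `increased` and `offset` are introduced here only for illustration, and `tz = "UTC"` is an added assumption to keep the parsing independent of the local time zone.

```r
day   <- c(24, 21, 20, 10, 20, 20, 10)
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec")

# Day of year within 2012; 2012 is a leap year, so a 29 Feb record would also parse
datestring <- paste("2012", match(month, month.abb), day, sep = "-")
date <- strptime(datestring, format = "%Y-%m-%d", tz = "UTC")
julian <- as.integer(strftime(date, format = "%j"))
julian
# 176  81  20 345 172  20 345

# TRUE wherever the day of year goes up compared with the previous record
increased <- diff(c(julian[1], julian)) > 0
increased
# FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

# Number of year boundaries crossed so far; subtract from the starting year
offset <- cumsum(increased)
offset
# 0 0 0 1 1 1 2
```

Because the records run from newest to oldest, an upward jump in day of year can only mean the sequence has wrapped back into the previous year, provided consecutive records are less than a year apart.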

The OP requested completion of the years in descending order starting in 2014.

Here is an alternative approach that works without converting to dates or constructing fake dates. In addition, it can be adapted to fiscal years that start in a month other than January.

    # create sample dataset
    df <- data.frame(
      day = c(24L, 21L, 20L, 10L, 20L, 20L, 21L, 10L, 30L, 10L, 10L, 7L),
      month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Jan", "Dec", "Jan", "Jan", "Jan", "Jun"))

    df$year <- 2014 - cumsum(c(0L, diff(100L*as.integer(
      factor(df$month, levels = month.abb)) + df$day) > 0))
    df

       day month year
    1   24   Jun 2014
    2   21   Mar 2014
    3   20   Jan 2014
    4   10   Dec 2013
    5   20   Jun 2013
    6   20   Jan 2013
    7   21   Jan 2012
    8   10   Dec 2011
    9   30   Jan 2011
    10  10   Jan 2011
    11  10   Jan 2011
    12   7   Jun 2010
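As a quick sanity check, the sort-key approach can be compared with the day-of-year approach from the first answer on that answer's seven records. This is only a sketch: the variable names `key`, `year_key`, and `year_julian` are made up for the comparison, and `tz = "UTC"` is an added assumption.

```r
day   <- c(24L, 21L, 20L, 10L, 20L, 20L, 10L)
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec")

# Sort key: 100 * month number + day, e.g. 24 Jun -> 624, 10 Dec -> 1210
key <- 100L * as.integer(factor(month, levels = month.abb)) + day
year_key <- 2014 - cumsum(c(0L, diff(key) > 0))

# Day-of-year variant, using 2012 as the placeholder year
datestring <- paste("2012", match(month, month.abb), day, sep = "-")
julian <- as.integer(strftime(strptime(datestring, "%Y-%m-%d", tz = "UTC"), "%j"))
year_julian <- 2014 - cumsum(diff(c(julian[1], julian)) > 0)

all(year_key == year_julian)
# TRUE
```

Both variants compare with a strict `> 0`, so repeated identical dates (such as the two 10 Jan records in the larger sample) stay in the same year.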

Fiscal years

Suppose a business starts its fiscal year on February 1. Then January belongs to a different fiscal year than February or March of the same calendar year.

To cope with fiscal years, we just need to rotate the factor levels accordingly:

    df$fy <- 2014 - cumsum(c(0L, diff(100L*as.integer(
      factor(df$month, levels = month.abb[c(2:12, 1)])) + df$day) > 0))
    df

       day month year   fy
    1   24   Jun 2014 2014
    2   21   Mar 2014 2014
    3   20   Jan 2014 2013
    4   10   Dec 2013 2013
    5   20   Jun 2013 2013
    6   20   Jan 2013 2012
    7   21   Jan 2012 2011
    8   10   Dec 2011 2011
    9   30   Jan 2011 2010
    10  10   Jan 2011 2010
    11  10   Jan 2011 2010
    12   7   Jun 2010 2010
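The calendar-year and fiscal-year variants differ only in how the month levels are rotated, so they could be folded into one helper. The following is a minimal sketch under that assumption; the function name `fill_year` and its arguments `start_year` and `first_month` are hypothetical and do not come from the answers.

```r
# Hypothetical helper (name and arguments are illustrative only).
# day:         day of month
# month:       month abbreviations as in month.abb ("Jan", "Feb", ...)
# start_year:  label of the year the first (most recent) record falls in
# first_month: month number in which the year starts (1 = calendar year)
fill_year <- function(day, month, start_year, first_month = 1L) {
  # Rotate month.abb so that it begins at first_month,
  # e.g. first_month = 2 gives Feb, Mar, ..., Dec, Jan
  lv <- month.abb[(seq_len(12) + first_month - 2L) %% 12L + 1L]
  # Sort key within one year: 100 * position of month + day
  key <- 100L * as.integer(factor(month, levels = lv)) + as.integer(day)
  # Records are ordered newest first, so every increase of the key
  # marks a step back into the previous year
  start_year - cumsum(c(0L, diff(key) > 0))
}

df <- data.frame(
  day = c(24L, 21L, 20L, 10L, 20L, 20L, 21L, 10L, 30L, 10L, 10L, 7L),
  month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Jan", "Dec", "Jan", "Jan", "Jan", "Jun"))

df$year <- fill_year(df$day, df$month, 2014)
df$fy   <- fill_year(df$day, df$month, 2014, first_month = 2L)
```

Like the one-liners it wraps, this sketch assumes the records are sorted newest first and that consecutive records are less than a year apart; a gap of a full year or more leaves no increase in the key and would go undetected.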

Source: https://habr.com/ru/post/1201675/

