Change column from date of birth to age in R

I am using data.table for the first time.

I have a column with the dates of birth of about 400,000 people. I need to convert them from date of birth to age.

What is the best way to do this?

7 answers

From the comments on this blog post, I found the age_calc function in the eeptools package. It handles edge cases (leap years, etc.), checks its inputs, and looks quite robust.

 library(eeptools)
 x <- as.Date(c("2011-01-01", "1996-02-29"))
 age_calc(x) # default is age in months, measured up to Sys.Date()

[1] 46.73333 224.83118

 age_calc(x, units = "years") # but you can set it to years

[1] 3.893151 18.731507

 floor(age_calc(x, units = "years"))

[1] 3 18

For your data

 yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years")) 

Assuming you need age in whole years.

+18
source

I thought about it some more and am still unhappy with the two answers so far. I like using lubridate, as @KFB did, but I also want everything wrapped up neatly in a function, as in my answer using the eeptools package. So here is a wrapper function that uses the lubridate interval method with some convenient options:

 library(lubridate)

 #' Calculate age
 #'
 #' By default, calculates the typical "age in years", with a
 #' \code{floor} applied so that you are, e.g., 5 years old from
 #' 5th birthday through the day before your 6th birthday. Set
 #' \code{floor = FALSE} to return decimal ages, and change \code{units}
 #' for units other than years.
 #' @param dob date-of-birth, the day to start calculating age.
 #' @param age.day the date on which age is to be calculated.
 #' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}.
 #' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
 #' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
 #' @examples
 #' my.dob <- as.Date('1983-10-20')
 #' age(my.dob)
 #' age(my.dob, units = "minutes")
 #' age(my.dob, floor = FALSE)
 age <- function(dob, age.day = today(), units = "years", floor = TRUE) {
     calc.age = interval(dob, age.day) / duration(num = 1, units = units)
     if (floor) return(as.integer(floor(calc.age)))
     return(calc.age)
 }

Usage examples:

 > my.dob <- as.Date('1983-10-20')
 > age(my.dob)
 [1] 31
 > age(my.dob, floor = FALSE)
 [1] 31.15616
 > age(my.dob, units = "minutes")
 [1] 16375680
 > age(seq(my.dob, length.out = 6, by = "years"))
 [1] 31 30 29 28 27 26
+18
source

Suppose you have a data.table. You can do the following:

 library(data.table)
 library(lubridate)
 # toy data
 X = data.table(birth=seq(from=as.Date("1970-01-01"), to=as.Date("1980-12-31"), by="year"))
 Sys.Date()

Option 1: use "as.period" from the lubridate package

 X[, age := as.period(Sys.Date() - birth)][]

          birth                   age
  1: 1970-01-01  44y 0m 327d 0H 0M 0S
  2: 1971-01-01  43y 0m 327d 6H 0M 0S
  3: 1972-01-01 42y 0m 327d 12H 0M 0S
  4: 1973-01-01 41y 0m 326d 18H 0M 0S
  5: 1974-01-01  40y 0m 327d 0H 0M 0S
  6: 1975-01-01  39y 0m 327d 6H 0M 0S
  7: 1976-01-01 38y 0m 327d 12H 0M 0S
  8: 1977-01-01 37y 0m 326d 18H 0M 0S
  9: 1978-01-01  36y 0m 327d 0H 0M 0S
 10: 1979-01-01  35y 0m 327d 6H 0M 0S
 11: 1980-01-01 34y 0m 327d 12H 0M 0S

Option 2: if you do not like the format of option 1, you can do the following:

 yr = duration(num = 1, units = "years")
 # interval() replaces new_interval(), which newer lubridate versions no longer provide
 X[, age := interval(birth, Sys.Date())/yr][]
 # you get
          birth      age
  1: 1970-01-01 44.92603
  2: 1971-01-01 43.92603
  3: 1972-01-01 42.92603
  4: 1973-01-01 41.92329
  5: 1974-01-01 40.92329
  6: 1975-01-01 39.92329
  7: 1976-01-01 38.92329
  8: 1977-01-01 37.92055
  9: 1978-01-01 36.92055
 10: 1979-01-01 35.92055
 11: 1980-01-01 34.92055

In my view, option 2 is probably the more useful of the two.
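
If whole years are wanted rather than decimals, one possible variation on option 2 is lubridate's time_length(), which for intervals measured in years or months takes the differing lengths of calendar years into account. A minimal sketch, assuming a fixed reference date `ref` (my choice, so the result is reproducible) instead of Sys.Date(), and a small made-up table:

```r
library(data.table)
library(lubridate)

# Toy data; `ref` is an assumed fixed reference date, not part of the answer.
X <- data.table(birth = as.Date(c("1970-01-01", "2000-03-01", "2000-03-02")))
ref <- as.Date("2018-03-01")

# Length of each interval in calendar years, floored to completed years.
X[, age := as.integer(floor(time_length(interval(birth, ref), "years")))]
X$age
```

The second row falls exactly on a birthday and the third one day before it, which is where approximations based on a fixed year length tend to slip.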

+3
source
 (Sys.Date() - yourDate) / 365.25 
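
A caveat on the one-liner above: it returns a difftime, and dividing by 365.25 is only an approximation, so flooring the result can be off by one on or near a birthday. A minimal base R sketch of the difference, where `age_approx` and `age_exact` are hypothetical helper names rather than functions from any package:

```r
# Approximate age: elapsed days divided by the mean year length.
age_approx <- function(dob, ref) {
  floor(as.numeric(ref - dob) / 365.25)
}

# Calendar age: year difference, minus one if the birthday has not yet
# occurred in the reference year.
age_exact <- function(dob, ref) {
  d <- as.POSIXlt(dob)
  r <- as.POSIXlt(ref)
  had_birthday <- (r$mon > d$mon) | (r$mon == d$mon & r$mday >= d$mday)
  (r$year - d$year) - as.integer(!had_birthday)
}

dob <- as.Date("2000-03-01")
ref <- as.Date("2018-03-01")  # the 18th birthday

age_approx(dob, ref)  # 17, because 6574 days / 365.25 is just under 18
age_exact(dob, ref)   # 18
```

For rough summaries the approximation is usually fine; for anything that depends on the exact birthday, a calendar-based comparison is safer.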
+1
source

I was not happy with how the other answers handle leap years when calculating age in months or years, so this is my function using the lubridate package.

Basically, it slices the interval between from and to into (up to) yearly chunks, and then adjusts each chunk according to whether it falls in a leap year. The total age is the sum of the ages of the chunks.

 library(lubridate)

 #' Get Age of Date relative to Another Date
 #'
 #' @param from,to the date or dates to consider
 #' @param units the units to consider
 #' @param floor logical as to whether to floor the result
 #' @param simple logical as to whether to do a simple calculation, a simple calculation doesn't account for leap year.
 #' @author Nicholas Hamilton
 #' @export
 age <- function(from, to = today(), units = "years", floor = FALSE, simple = FALSE) {

   #Account for Leap Year if Working in Months and Years
   if(!simple && length(grep("^(month|year)",units)) > 0){
     df = data.frame(from,to)
     calc = sapply(1:nrow(df),function(r){

       #Start and Finish Points
       st = df[r,1]; fn = df[r,2]

       #If there is no difference, age is zero
       if(st == fn){ return(0) }

       #If there is a difference, age is not zero and needs to be calculated
       sign = +1 #Age Direction
       if(st > fn){ tmp = st; st = fn; fn = tmp; sign = -1 } #Swap and Change sign

       #Determine the slice-points
       mid = ceiling_date(seq(st,fn,by='year'),'year')

       #Build the sequence
       dates = unique( c(st,mid,fn) )
       dates = dates[which(dates >= st & dates <= fn)]

       #Determine the age of the chunks
       chunks = sapply(head(seq_along(dates),-1),function(ix){
         k = 365/( 365 + leap_year(dates[ix]) )
         k*interval( dates[ix], dates[ix+1] ) / duration(num = 1, units = units)
       })

       #Sum the Chunks, and account for direction
       sign*sum(chunks)
     })

   #If Simple Calculation or Not Months or Not years
   }else{
     calc = interval(from,to) / duration(num = 1, units = units)
   }

   if (floor) calc = as.integer(floor(calc))
   calc
 }
0
source

I prefer to do this with the lubridate package, borrowing syntax that I first encountered in another post.

The input dates must first be converted to R date objects, preferably using lubridate::mdy() or lubridate::ymd() or similar functions, as applicable. You can then use the interval() function to generate an interval describing the time elapsed between the two dates, and the duration() function to define how this interval should be "diced".
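
The answer standardizes dates with lubridate parsers; as a dependency-free sketch of the same step, base R as.Date() with an explicit format returns NA for strings that do not parse, so bad inputs can be caught before any age is calculated. The sample strings and the name `raw_dob` are made up for illustration:

```r
# Hypothetical raw input in month/day/year form; the last entry is not a real date.
raw_dob <- c("10/20/1983", "02/29/1996", "02/30/1996")

# With an explicit format, unparseable strings become NA instead of raising an error.
dob <- as.Date(raw_dob, format = "%m/%d/%Y")

# Flag the rows that failed to parse before computing ages.
bad <- is.na(dob)
raw_dob[bad]
```

Checking for NA right after parsing keeps silent failures from turning into missing ages further down the line.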

Below I have summarized the simplest case, calculating age from two dates, using current lubridate syntax.

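 library(lubridate)
 df$DOB <- mdy(df$DOB)
 df$EndDate <- mdy(df$EndDate)
 # argument names spelled out in full (num, units) instead of relying on partial matching
 df$Calc_Age <- interval(start = df$DOB, end = df$EndDate) / duration(num = 1, units = "years")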

Age can be rounded down to the nearest whole number using the base R floor() function, for example:

 df$Calc_AgeF <- floor(df$Calc_Age) 

Alternatively, the digits= argument of the base R round() function can be used to round to the nearest value with a specified number of decimal places, for example:

 df$Calc_Age2 <- round(df$Calc_Age, digits = 2) ## 2 decimals
 df$Calc_Age0 <- round(df$Calc_Age, digits = 0) ## nearest integer

It is worth noting that once the input dates have passed through the calculation step described above (i.e. the interval() and duration() functions), the return value is numeric and no longer a date object in R. This matters because lubridate::floor_date() works only on date-time objects.

The above syntax works regardless of whether the input dates are in a data.table or a data.frame.

0
source

I wanted an implementation that did not add any dependencies beyond data.table, which is usually my only dependency. data.table is needed here only for its date helpers: year(), month(), and mday(), which gives the day of the month.

This function follows the way I work out someone's age in my head:

 library(data.table)
 agecalc <- function(origin, current){
     y <- year(current) - year(origin) - 1
     offset <- 0
     if(month(current) > month(origin)) offset <- 1
     if(month(current) == month(origin) && mday(current) >= mday(origin)) offset <- 1
     age <- y + offset
     return(age)
 }

This is the same logic, reorganized and vectorized:

 agecalc <- function(origin, current){
     age <- year(current) - year(origin) - 1
     ii <- (month(current) > month(origin)) |
           (month(current) == month(origin) & mday(current) >= mday(origin))
     age[ii] <- age[ii] + 1
     return(age)
 }

I can imagine scenarios in which string comparison would be faster, for example if you already have the year as a number and the date of birth as a string.

 agecalc <- function(origin, current){
     # as.character() on a date gives "YYYY-MM-DD"
     origin <- as.character(origin)
     current <- as.character(current)
     age <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4)) - 1
     # compare the "MM-DD" parts; indexing instead of if() keeps this vectorized
     ii <- substr(current, 6, 10) >= substr(origin, 6, 10)
     age[ii] <- age[ii] + 1
     return(age)
 }

Some tests:

 agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-12"))
 agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-13"))
 agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-12"))
 agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-13"))
 agecalc(as.IDate("1985-08-13"), as.IDate("1986-09-12"))

 agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-28"))
 agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-29"))
 agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-28"))
 agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-29")) # note: 2001-02-29 is not a real date (2001 is not a leap year)
 agecalc(as.IDate("2000-02-29"), as.IDate("2001-03-01"))
 agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-28"))
 agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-29"))
 agecalc(as.IDate("2000-02-29"), as.IDate("2011-03-01"))

 ## Requires vectorized version:
 d <- data.table(d=as.IDate("2000-01-01") + 0:10000)
 d[ , b1 := as.IDate("2000-08-15")]
 d[ , b2 := as.IDate("2000-02-29")]
 d[ , age1_num := (d - b1) / 365]
 d[ , age2_num := (d - b2) / 365]
 d[ , age1 := agecalc(b1, d)]
 d[ , age2 := agecalc(b2, d)]
 d
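
To tie this back to the question, here is a minimal sketch of applying the same vectorized logic to a 400,000-row data.table by reference with `:=`. The table `people`, its `dob` column, the helper name `age_years`, and the fixed reference date are all assumptions made for illustration:

```r
library(data.table)

# Same birthday logic as the vectorized version, under a hypothetical name.
age_years <- function(dob, ref) {
  age <- year(ref) - year(dob) - 1L
  had_birthday <- (month(ref) > month(dob)) |
                  (month(ref) == month(dob) & mday(ref) >= mday(dob))
  age + had_birthday
}

# Simulated dates of birth; a real table would already have this column.
set.seed(1)
n <- 400000L
people <- data.table(dob = as.IDate("1950-01-01") + sample.int(20000L, n, replace = TRUE))

# Assumed fixed reference date so the result does not depend on the run date.
ref <- as.IDate("2020-06-15")

# := adds the age column in place, without copying the table.
people[, age := age_years(dob, ref)]
people[, range(age)]
```

Because every step is a vectorized comparison on whole columns, this needs no loop over rows, which matters at this table size.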
0
source

Source: https://habr.com/ru/post/1274381/
