How to extract and translate months from a string variable

I have a data frame of month names in English and Dutch:

library(stringr)
months.english <- month.name
months.dutch <- c("januari", "februari", "maart", "april", "mei", "juni","juli", "augustus", "september", "oktober", "november", "december")
months <- data.frame(months.english, months.dutch)

I also have a date variable with loosely structured dates, written in both English and Dutch:

time <- c("1 januari 2001", "12 december 2001", "December 9 2001", "2001 maart 13")
dates <- data.frame(time)
dates$months <- NA

I want to do the following: in the dates data frame, the months variable should hold the month extracted from the date string, and months written in Dutch should be translated to English, so that dates$months equals c("January", "December", "December", "March").

How can I do this efficiently, preferably without a for loop (the actual data file has more than 100,000 rows)?


You could use the stringi package.

Step 1: extract the month name from each string:

library(stringi)
m <- stri_extract_first(tolower(dates$time), 
          regex = paste(months$months.dutch, collapse = "|"))

Step 2: look up the English name with match():

dates$months <- months$months.english[match(m, months$months.dutch)]

Both steps are vectorized, so no explicit loop is needed, even for 100,000 rows or more.

Result:

dates
#              time   months
#1   1 januari 2001  January
#2 12 december 2001 December
#3  December 9 2001 December
#4    2001 maart 13    March
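
Note that the pattern above contains only the Dutch names, so a date with an English-only spelling such as "March" would give NA; the sample works because lower-cased "December" is also the Dutch spelling. A minimal base R sketch that covers both languages could look like the following. The helper name extract_month and the lookup vector are hypothetical, introduced here only for illustration.

```r
months.dutch <- c("januari", "februari", "maart", "april", "mei", "juni",
                  "juli", "augustus", "september", "oktober", "november", "december")

# Named vector: lower-cased Dutch and English names both map to the English name
lookup <- c(setNames(month.name, months.dutch),
            setNames(month.name, tolower(month.name)))
# Names spelled the same in both languages (april, december, ...) appear twice
lookup <- lookup[!duplicated(names(lookup))]

# Hypothetical helper: returns the English month name, or NA if none is found
extract_month <- function(x) {
  keys <- names(lookup)[order(-nchar(names(lookup)))]   # longest names first
  pattern <- paste0("\\b(", paste(keys, collapse = "|"), ")\\b")
  low <- tolower(x)
  pos <- regexpr(pattern, low, perl = TRUE)             # first match per string
  ok <- !is.na(pos) & pos > 0
  hit <- rep(NA_character_, length(x))
  hit[ok] <- regmatches(low, pos)                       # only matched elements
  unname(lookup[hit])
}

dates <- data.frame(time = c("1 januari 2001", "12 december 2001",
                             "December 9 2001", "2001 maart 13"))
dates$months <- extract_month(dates$time)
```

Everything here is vectorized as well, so it should behave similarly on large inputs; strings without a recognizable month come back as NA.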

A second approach builds a lookup table and joins on it:

library(stringr)
library(dplyr)
months.english <- month.name
months.dutch <- c("januari", "februari", "maart", "april", "mei", "juni","juli", "augustus", "september", "oktober", "november", "december")
months <- data.frame(months.english, months.dutch)

# Lookup table: both Dutch and English names map to the English name
mtable <- tibble(key = c(months.dutch, months.english),
                 months = rep(months.english, 2))

time <- c("1 januari 2001", "12 december 2001", "December 9 2001", "2001 maart 13")
# Take the first run of letters as the month, then join to translate it
dates <- tibble(time) %>%
  mutate(translate = str_extract(time, "[A-Za-z]+")) %>%
  left_join(mtable, by = c("translate" = "key"))

This is the tidyverse way of doing it.
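
The join above is case-sensitive, so a spelling such as "Januari" or "DECEMBER" would not find a key. A possible variation, sketched under the same assumption that the first run of letters in each string is the month, lower-cases both the keys and the extracted word. The sample strings with changed capitalization are made up for illustration.

```r
library(dplyr)
library(stringr)

months.dutch <- c("januari", "februari", "maart", "april", "mei", "juni",
                  "juli", "augustus", "september", "oktober", "november", "december")

# Lower-cased keys; distinct() drops names spelled identically in both
# languages, which would otherwise duplicate rows in the join
mtable <- tibble(key = c(months.dutch, tolower(month.name)),
                 months = rep(month.name, 2)) %>%
  distinct()

# Illustrative input with mixed capitalization (not from the original data)
dates <- tibble(time = c("1 Januari 2001", "12 december 2001",
                         "DECEMBER 9 2001", "2001 maart 13")) %>%
  mutate(key = str_extract(str_to_lower(time), "[a-z]+")) %>%
  left_join(mtable, by = "key")
```

The months column of dates then holds the English name for each row, and rows whose first word is not a known month get NA.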

Source: https://habr.com/ru/post/1681789/

