I have a function that works on a one-row data.table (data.frame), but it does not work on a full data.table. I would like to extend the function so that it handles all rows of the input table.
The gist of the problem is:
In one data.table (tryshort3), a field holding a string must have part of that string replaced with another string taken from a second data.table (mapping). An MRE follows:
    library(data.table)

    # this is the original data.table
    tryshort3 <- structure(list(
      country = c("AT", "AT", "MT", "DE", "CH", "XK"),
      name = c("ASDF AG", "ASDF GMBH", "ASDF DF", "ASDF KG", "ASDF SA", "ASDF DAF"),
      address = c("ACDSTR. 3", "ACDSTR. 4", "ACDSTR. 5", "ACDSTR. 6", "ACDSTR. 7", "ACDSTR. 8")),
      .Names = c("country", "name", "address"),
      row.names = c(NA, -6L),
      class = c("data.table", "data.frame"))

    # this is the "mapping" table
    mapping <- structure(list(
      country = c("AT", "AT", "DE", "DE", "HU"),
      short.form = c("AG", "GMBH", "GMBH", "EV", "EV"),
      long.form = c("AKTIENGESELLSCHAFT", "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG",
                    "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG", "EINGETRAGENE VEREIN",
                    "EGYENI VALLALKOZO")),
      .Names = c("country", "short.form", "long.form"),
      row.names = c(NA, -5L),
      class = c("data.table", "data.frame"),
      sorted = "country")

    # this is the function that I am using (please note that both data.tables are keyed,
    # but that currently has no effect on the output; it just avoids throwing an error):
    substituting_short_form <- function(input) {
      # supply one data.frame of 1 row; the other data.frame is external to the function
      # get country from input
      setkey(input, country)
      setkey(mapping, country)
      matched_country <- input$country
      # subset of mapping to only the country from the input
      matched_map <- mapping[country == matched_country]
      # get list of short.forms from matched
      list_of_relevant_short_forms <- matched_map[, short.form]
      # which() returns the index of any short form that matches; THIS IS A NUMBER THAT
      # WILL HAVE TO BE MATCHED TO mapping again to retrieve the correct form
      # error catching for when there is no short form found, or no country found;
      # if there is no long form it does not matter!
      indextrue <- tryCatch(
        which(unlist(lapply(list_of_relevant_short_forms,
                            function(y) grepl(y, input$name)))),
        error = function(e) return(input))
      # substitute
      pattern_to_substitute <- paste0("(\\s|^)", matched_map[indextrue, short.form], "(\\s|$)")
      pattern_to_replace <- paste0("\\1", matched_map[indextrue, long.form], "\\2")
      input$name[1] <- gsub(pattern = pattern_to_substitute,
                            replacement = pattern_to_replace,
                            input$name, perl = TRUE)
      return(input)
    }
In short, this function is meant to accept tryshort3 as input (currently it only works with a single row such as tryshort3[1,]) and replace the short form found in the name field with the corresponding long form from the mapping table, for example:
    > tryshort3[1,]
       country    name   address
    1:      AT ASDF AG ACDSTR. 3
    > substituting_short_form(tryshort3[1,])
       country                    name   address
    1:      AT ASDF AKTIENGESELLSCHAFT ACDSTR. 3
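To isolate the substitution step that the function performs, here is a minimal sketch in base R with hard-coded stand-ins for one row of tryshort3 and one row of mapping. The variable names (name_value, short_form, long_form, other_name) are made up for this illustration and do not appear in the function above.

```r
# Hard-coded stand-ins for one name and one mapping entry (illustrative only)
name_value <- "ASDF AG"
short_form <- "AG"
long_form  <- "AKTIENGESELLSCHAFT"

# Match the short form only as a whole word: preceded by start-of-string or
# whitespace, and followed by whitespace or end-of-string
pattern     <- paste0("(\\s|^)", short_form, "(\\s|$)")
replacement <- paste0("\\1", long_form, "\\2")

result <- gsub(pattern, replacement, name_value, perl = TRUE)
print(result)

# A name that merely contains the letters "AG" inside a longer word is left alone
other_name   <- "ASDF AGRO"
other_result <- gsub(pattern, replacement, other_name, perl = TRUE)
print(other_result)
```

The capture groups \\1 and \\2 put back whatever whitespace surrounded the short form, so spacing in the name is preserved.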
What I would like is to supply the full data.table as input and get output of the same form (a data.table of the same length). Here is my expected result:
       country                                       name   address
    1:      AT                    ASDF AKTIENGESELLSCHAFT ACDSTR. 3
    2:      AT ASDF GESELLSCHAFT MIT BESCHRANKTER HAFTUNG ACDSTR. 4
    3:      CH                                    ASDF SA ACDSTR. 7
    4:      DE                                    ASDF KG ACDSTR. 6
    5:      MT                                    ASDF DF ACDSTR. 5
    6:      XK                                   ASDF DAF ACDSTR. 8
The solution I have in mind would be something along the lines of apply(tryshort3, 1, function(x) substituting_short_form(x)), possibly using the indexing capabilities of both data.tables, and possibly using gapply from nlme internally?
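One direction that seems to give output of the right length is to let data.table do the iteration by grouping on country, rather than calling apply row by row. The following is only a sketch under stated assumptions, not a definitive solution: expand_names is a hypothetical helper (it is not the function above), the two tables are rebuilt with data.table() so the block is self-contained, and short forms are substituted one after another, which assumes no long form contains another short form as a whole word.

```r
library(data.table)

# Same data as in the MRE, rebuilt with data.table() for self-containment
tryshort3 <- data.table(
  country = c("AT", "AT", "MT", "DE", "CH", "XK"),
  name    = c("ASDF AG", "ASDF GMBH", "ASDF DF", "ASDF KG", "ASDF SA", "ASDF DAF"),
  address = paste0("ACDSTR. ", 3:8)
)
mapping <- data.table(
  country    = c("AT", "AT", "DE", "DE", "HU"),
  short.form = c("AG", "GMBH", "GMBH", "EV", "EV"),
  long.form  = c("AKTIENGESELLSCHAFT", "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG",
                 "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG", "EINGETRAGENE VEREIN",
                 "EGYENI VALLALKOZO")
)

# Hypothetical helper: expand every short form known for one country
# in a character vector of names belonging to that country
expand_names <- function(names, ctry) {
  m <- mapping[country == ctry]          # mapping rows for this country only
  for (i in seq_len(nrow(m))) {          # zero iterations if the country is unmapped
    pattern <- paste0("(\\s|^)", m$short.form[i], "(\\s|$)")
    names <- gsub(pattern, paste0("\\1", m$long.form[i], "\\2"), names, perl = TRUE)
  }
  names
}

# Work on a copy so the original table is not modified by reference
result <- copy(tryshort3)
result[, name := expand_names(name, country), by = country]
setkey(result, country)                  # sort by country, as in the expected result
print(result)
```

Grouping by country means mapping is subset once per country instead of once per row, and countries without any mapping entry (MT, CH, XK here) pass through unchanged.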