Extending a function that takes a data.table as an argument to use a full table (instead of a subset)

I have a function that works on a data.table (data.frame) of one row, but does not work on a full data.table. I would like to extend the function so that it handles all rows of the input table.

The gist of the problem is:

In one data.table ( tryshort3 ), a string within a field must be replaced with another string taken from a second data.table ( mapping ). An MRE follows:

    library(data.table)

    # this is the original data.table
    tryshort3 <- structure(list(
      country = c("AT", "AT", "MT", "DE", "CH", "XK"),
      name = c("ASDF AG", "ASDF GMBH", "ASDF DF", "ASDF KG", "ASDF SA", "ASDF DAF"),
      address = c("ACDSTR. 3", "ACDSTR. 4", "ACDSTR. 5", "ACDSTR. 6", "ACDSTR. 7", "ACDSTR. 8")),
      .Names = c("country", "name", "address"),
      row.names = c(NA, -6L), class = c("data.table", "data.frame"))

    # this is the "mapping"
    mapping <- structure(list(
      country = c("AT", "AT", "DE", "DE", "HU"),
      short.form = c("AG", "GMBH", "GMBH", "EV", "EV"),
      long.form = c("AKTIENGESELLSCHAFT", "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG",
                    "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG", "EINGETRAGENE VEREIN",
                    "EGYENI VALLALKOZO")),
      .Names = c("country", "short.form", "long.form"),
      row.names = c(NA, -5L), class = c("data.table", "data.frame"), sorted = "country")

    # this is the function that I am using (please note that both data.tables are keyed,
    # but that currently has no effect on the output; it just avoids throwing an error):
    substituting_short_form <- function(input) {
      # supply one data.frame of 1 row; the other data.frame is external to the function
      setkey(input, country)
      setkey(mapping, country)
      # get country from input
      matched_country <- input$country
      # subset of mapping to only the country from the input
      matched_map <- mapping[country == matched_country]
      # get list of short.forms from matched
      list_of_relevant_short_forms <- matched_map[, short.form]
      # which() gives the index of any short form that matches; THIS IS A NUMBER THAT
      # HAS TO BE MATCHED TO mapping again to retrieve the correct long form
      # error catching for when there is no short form found, or no country found
      # (if there is no long form it does not matter!)
      indextrue <- tryCatch(
        which(unlist(lapply(list_of_relevant_short_forms, function(y) grepl(y, input$name)))),
        error = function(e) return(input))
      # substitute
      pattern_to_substitute <- paste0("(\\s|^)", matched_map[indextrue, short.form], "(\\s|$)")
      pattern_to_replace <- paste0("\\1", matched_map[indextrue, long.form], "\\2")
      input$name[1] <- gsub(pattern = pattern_to_substitute, replacement = pattern_to_replace,
                            input$name, perl = TRUE)
      return(input)
    }

In short, this function accepts tryshort3 as input (currently it works only with tryshort3[1,] ) and replaces the short form found in the name field with the matching long form from the mapping table, for example:

    > tryshort3[1,]
       country    name   address
    1:      AT ASDF AG ACDSTR. 3
    > substituting_short_form(tryshort3[1,])
       country                    name   address
    1:      AT ASDF AKTIENGESELLSCHAFT ACDSTR. 3

What I would like is to pass the full data.table as input and get the same kind of output (a data.table of the same length). Here is my expected result:

       country                                       name   address
    1:      AT                    ASDF AKTIENGESELLSCHAFT ACDSTR. 3
    2:      AT ASDF GESELLSCHAFT MIT BESCHRANKTER HAFTUNG ACDSTR. 4
    3:      CH                                    ASDF SA ACDSTR. 7
    4:      DE                                    ASDF KG ACDSTR. 6
    5:      MT                                    ASDF DF ACDSTR. 5
    6:      XK                                   ASDF DAF ACDSTR. 8

The solution I would like would be something along the lines of apply(tryshort3, 1, function(x) substituting_short_form(x) ) , possibly using the indexing capabilities of both data.tables, or possibly using gapply from nlme internally?

+5
2 answers

Perhaps you can do it in a few steps:

    # create the short.form variable in tryshort3:
    # the last whitespace-delimited token of name is the candidate short form
    tryshort3[, short.form := sub(".+\\s(\\S+)$", "\\1", name)]
    # add the info from mapping
    tryshort3long <- merge(tryshort3, mapping, all.x=TRUE, by=c("country", "short.form"))
    # replace the short form by the long form in name and drop the variables you don't need
    # (thanks to @DavidArenburg for the simplification of the "replace" part!)
    tryshort3long[!is.na(long.form), name := paste(sub(" .*", "", name), long.form)
                  ][, c("long.form", "short.form") := NULL]
    tryshort3long
    #    country                                       name   address
    # 1:      AT                    ASDF AKTIENGESELLSCHAFT ACDSTR. 3
    # 2:      AT ASDF GESELLSCHAFT MIT BESCHRANKTER HAFTUNG ACDSTR. 4
    # 3:      CH                                    ASDF SA ACDSTR. 7
    # 4:      DE                                    ASDF KG ACDSTR. 6
    # 5:      MT                                    ASDF DF ACDSTR. 5
    # 6:      XK                                   ASDF DAF ACDSTR. 8

NB: sorry, I only wrote this for your example data.table, not as a function.
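
For readers who do want a function, the merge-based steps could be wrapped roughly as follows. This is only a sketch under assumptions that are not part of the answer: the function name substitute_short_forms and the helper column row_id are made up here, the short form is assumed to be the last whitespace-delimited token of name, each (country, short.form) pair is assumed to appear at most once in mapping, and the input is assumed not to already contain columns called row_id, short.form or long.form.

```r
library(data.table)

# Hypothetical wrapper around the merge-based approach (name is illustrative)
substitute_short_forms <- function(input, mapping) {
  out <- copy(input)            # work on a copy so the caller's table is not changed
  out[, row_id := .I]           # helper column: remember the original row order
  # candidate short form = everything after the last whitespace in name
  out[, short.form := sub("^.*\\s", "", name, perl = TRUE)]
  # left join: rows without a mapping entry get long.form = NA
  out <- merge(out, mapping, by = c("country", "short.form"),
               all.x = TRUE, sort = FALSE)
  # swap the trailing token for the long form where a mapping row matched
  out[!is.na(long.form),
      name := paste0(sub("\\S+$", "", name, perl = TRUE), long.form)]
  setorder(out, row_id)         # restore the original row order
  out[, c("short.form", "long.form", "row_id") := NULL]
  setcolorder(out, names(input))
  out[]
}

# Same data as in the question, rebuilt with data.table()
tryshort3 <- data.table(
  country = c("AT", "AT", "MT", "DE", "CH", "XK"),
  name    = c("ASDF AG", "ASDF GMBH", "ASDF DF", "ASDF KG", "ASDF SA", "ASDF DAF"),
  address = paste("ACDSTR.", 3:8))
mapping <- data.table(
  country    = c("AT", "AT", "DE", "DE", "HU"),
  short.form = c("AG", "GMBH", "GMBH", "EV", "EV"),
  long.form  = c("AKTIENGESELLSCHAFT", "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG",
                 "GESELLSCHAFT MIT BESCHRANKTER HAFTUNG", "EINGETRAGENE VEREIN",
                 "EGYENI VALLALKOZO"))

result <- substitute_short_forms(tryshort3, mapping)
print(result)
```

Unlike the plain merge, this version keeps the rows in their original order instead of sorting them by country, and it leaves tryshort3 itself untouched.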

+4

The problem with apply is that it coerces its argument to a matrix, so the function no longer receives a data.table. Try a simple loop instead:

    lst <- list()
    for (i in seq_len(nrow(tryshort3))) lst[[i]] <- substituting_short_form(tryshort3[i, ])
    rbindlist(lst)
    #    country                                       name   address
    # 1:      AT                    ASDF AKTIENGESELLSCHAFT ACDSTR. 3
    # 2:      AT ASDF GESELLSCHAFT MIT BESCHRANKTER HAFTUNG ACDSTR. 4
    # 3:      MT                                    ASDF DF ACDSTR. 5
    # 4:      DE                                    ASDF KG ACDSTR. 6
    # 5:      CH                                    ASDF SA ACDSTR. 7
    # 6:      XK                                   ASDF DAF ACDSTR. 8
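
The coercion done by apply can be seen in a small base-R sketch. The data frame df and its columns are invented for illustration only. apply passes each row to the function as a character vector, while indexing by row number keeps a one-row data.frame, which is the shape substituting_short_form expects.

```r
# Base R only; df is illustrative data with mixed column types
df <- data.frame(country = c("AT", "DE"), n = c(1L, 2L),
                 stringsAsFactors = FALSE)

# apply() converts df with as.matrix() first, so each row reaches the
# function as a character vector, not as a one-row table
row_classes <- apply(df, 1, function(x) class(x))
print(row_classes)

# iterating over row indices keeps every piece a one-row data.frame
rows <- lapply(seq_len(nrow(df)), function(i) df[i, ])
print(class(rows[[1]]))
```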
+3

Source: https://habr.com/ru/post/1243321/

