gsub() on multiple data frames in a loop / lapply

I have two data frames, each with a column named "Title" that contains strings. I need to clean up these strings so that I can merge the data frames on them. I want to do this as cleanly as possible in a loop, so that I only have to write the gsub() call once.

Say I have:

    df_1 <- read.table(text = "
    id Title
    1 some_average_title
    2 another:_one
    3 the_third!
    4 and_'the'_last
    ", header = TRUE, sep = "")

and

    df_2 <- read.table(text = "
    id Title
    1 some_average.title
    2 another:one
    3 the_third
    4 and_the_last
    ", header = TRUE, sep = "")

Now I run:

    df_1$Title <- gsub(" |\\.|'|:|!|\\'|_", "", df_1$Title)
    df_2$Title <- gsub(" |\\.|'|:|!|\\'|_", "", df_2$Title)

I tried the following loop:

    for (dtfrm in c("df_1", "df_2")) {
      assign(paste0(dtfrm, "$Title"),
             gsub(" |\\.|'|:|!|\\'|", "", get(paste0(dtfrm, "$Title"))))
    }

but it does not work, even though I see no error messages.

I also thought about lapply(list(df_1, df_2), function(w){ w$Title <- XXX }), but I do not know what to put for XXX, because gsub() needs the vector of strings as its third argument.

3 answers

This works:

    for (df in c("df_1", "df_2")) {
      assign(df, transform(get(df), Title = gsub(" |\\.|'|:|!|\\'|_", "", Title)))
    }

Testing:

    df_1
      id            Title
    1  1 someaveragetitle
    2  2       anotherone
    3  3         thethird
    4  4       andthelast

and

    df_2
      id            Title
    1  1 someaveragetitle
    2  2       anotherone
    3  3         thethird
    4  4       andthelast
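As a side note on why the loop in the question cannot work, here is a minimal sketch (the toy data frame and the variable name `nm` are made up for illustration). `get()` and `assign()` treat their first argument as a literal object name, so `"df_1$Title"` is looked up as one oddly named object rather than as a column of `df_1`:

```r
# Toy data frame, made up for this illustration
df_1 <- data.frame(id = 1:2,
                   Title = c("some_average_title", "the_third!"),
                   stringsAsFactors = FALSE)

nm <- paste0("df_1", "$Title")

# get() searches for an object literally called "df_1$Title" and fails
lookup <- tryCatch(get(nm), error = function(e) "not found")

# assign() creates a new object with that odd name instead of
# touching the Title column of df_1
assign(nm, gsub("_", "", df_1$Title))

untouched <- identical(df_1$Title, c("some_average_title", "the_third!"))
created   <- exists("df_1$Title")
```

This is why the loop above passes only the data frame's name to `get()` and `assign()` and does the column work inside `transform()`.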

This sits somewhere between @David's comments and @Carlos's answer, with a little added:

Use mget to collect the data.frames into a list, and list2env to overwrite the original data.frames if necessary.

mget + lapply will do the transformation...

    lapply(mget(ls(pattern = "df_\\d")), function(w)
      transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title)))
    # $df_1
    #   id            Title
    # 1  1 someaveragetitle
    # 2  2       anotherone
    # 3  3         thethird
    # 4  4       andthelast
    #
    # $df_2
    #   id            Title
    # 1  1 someaveragetitle
    # 2  2       anotherone
    # 3  3         thethird
    # 4  4       andthelast

... but the result stays in a list and does not affect the original data.frames:

    # df_1
    #   id              Title
    # 1  1 some_average_title
    # 2  2       another:_one
    # 3  3         the_third!
    # 4  4     and_'the'_last

If you want to overwrite the data.frames, try:

    list2env(
      lapply(mget(ls(pattern = "df_\\d")), function(w)
        transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title))),
      envir = .GlobalEnv)

    df_1
    #   id            Title
    # 1  1 someaveragetitle
    # 2  2       anotherone
    # 3  3         thethird
    # 4  4       andthelast
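If the data frames can live in a named list from the start, neither mget nor list2env is needed. The following is a minimal sketch under that assumption; the list name `dfs` and the helper `clean_title` are hypothetical names chosen for illustration, and the pattern is written as a character class instead of the alternation used above. It also shows one possible answer to the XXX in the question: the function passed to lapply has to return the whole modified data frame, not just the result of the assignment.

```r
# Small stand-ins for df_1 and df_2 from the question
dfs <- list(
  df_1 = data.frame(id = 1:4,
                    Title = c("some_average_title", "another:_one",
                              "the_third!", "and_'the'_last"),
                    stringsAsFactors = FALSE),
  df_2 = data.frame(id = 1:4,
                    Title = c("some_average.title", "another:one",
                              "the_third", "and_the_last"),
                    stringsAsFactors = FALSE)
)

# Hypothetical helper: strip the unwanted characters and return the
# whole data frame, so lapply() collects modified copies
clean_title <- function(w) {
  w$Title <- gsub("[ .':!_]", "", w$Title)
  w
}

dfs <- lapply(dfs, clean_title)

# Check whether both Title columns now agree
same_titles <- identical(dfs$df_1$Title, dfs$df_2$Title)
```

Keeping the data frames in one list means the cleaned versions are available as `dfs$df_1` and `dfs$df_2` without writing into the global environment.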

get() lets you grab your many data sets programmatically.
data.table makes it easy to modify the columns in each of them.

    ## CREATING A FEW MORE DATA SETS
    df_3 <- df_2
    df_4 <- df_1
    set.seed(1)
    df_3$id <- sample(20, 4)
    df_4$id <- sample(20, 4)

    library(data.table)
    dt_1 <- as.data.table(df_1)
    dt_2 <- as.data.table(df_2)
    dt_3 <- as.data.table(df_3)
    dt_4 <- as.data.table(df_4)

    ## OR programmatically:
    Numb_of_DTs <- 4
    names_of_dt_objects <- paste("dt", 1:Numb_of_DTs, sep = "_")  # dt_1, dt_2, etc
    names_of_df_objects <- paste("df", 1:Numb_of_DTs, sep = "_")  # df_1, df_2, etc
    for (i in 1:Numb_of_DTs)
      assign(names_of_dt_objects[[i]], as.data.table(get(names_of_df_objects[[i]])))

    for (dt.nm in names_of_dt_objects) {
      ## strip the unwanted characters from Title, modifying the data.table in place
      get(dt.nm)[, Title := gsub("[ .':!_]", "", Title)]
      ## set the key for merging in the next step
      setkey(get(dt.nm), Title)
      ## You might want to insert a line to clean up the column names, using
      ## setnames(get(dt.nm), OLD_NAMES, NEW_NAMES)
    }

    ## merge all the data.tables on their key, one pair at a time
    Reduce(merge, lapply(names_of_dt_objects, function(x) get(x)))
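For readers without data.table, the final Reduce(merge, ...) step can be sketched in base R. This is an illustrative variant, not the method of the answer above; the list name `cleaned`, the sample ids, and the renamed columns `id_1`, `id_2`, `id_3` are all assumptions made for the example. Renaming the id columns up front plays the role of the column-name cleanup hinted at in the comment above.

```r
# Stand-ins for already-cleaned data frames; each id column gets a
# distinct name up front so merge() has no clashing columns
titles <- c("someaveragetitle", "anotherone", "thethird", "andthelast")
cleaned <- list(
  data.frame(id_1 = 1:4,  Title = titles,      stringsAsFactors = FALSE),
  data.frame(id_2 = 5:8,  Title = titles,      stringsAsFactors = FALSE),
  data.frame(id_3 = 9:12, Title = rev(titles), stringsAsFactors = FALSE)
)

# Fold the list pairwise, always joining on the cleaned Title column
merged <- Reduce(function(x, y) merge(x, y, by = "Title"), cleaned)
```

Passing `by = "Title"` explicitly keeps the join column fixed at every step of the fold, instead of relying on whichever column names happen to be shared.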

Source: https://habr.com/ru/post/1203257/