R: combine several gsub() functions in a pipe

To clean up some messy data, I would like to start using %>% pipes, but my R code fails if gsub() is not at the beginning of the pipe and has to appear later. (Note: this question is not about proper import, but about data cleaning.)

A simple example:

df <- cbind.data.frame(A= c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C")) 

Column A contains characters (in this case numbers, but it could also be arbitrary strings) and needs to be cleaned. The steps are:

 library(stringr)
 df$D <- gsub("\\.", "", df$A)
 df$D <- str_trim(df$D)
 df$D <- as.numeric(gsub(",", ".", df$D))

I thought this could easily be piped:

 df$D <- gsub("\\.","",df$A) %>%
   str_trim() %>%
   as.numeric(gsub(",", ".")) %>%

The problem is the second gsub(), because it asks for its input x, which should be the result of the previous step.

Could someone please explain how to use functions like gsub() further down the pipeline? Thank you very much!

System: R 3.2.3, Windows

+7
4 answers

Try the following:

 library(stringr)
 df$D <- df$A %>%
   { gsub("\\.","", .) } %>%
   str_trim() %>%
   { as.numeric(gsub(",", ".", .)) }

With the pipe, your data is passed as the first argument to the next function. If you want it to go somewhere else, you need to wrap the next call in {} and use . as a placeholder for the data.
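A small sketch of when the braces actually matter, assuming magrittr's documented placeholder rules. The vector x and the names no_dots and result are made up for illustration, and trimws() from base R stands in for str_trim():

```r
library(magrittr)

x <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

# `.` appears directly as an argument of gsub(): magrittr puts the piped
# value there instead of in the first position, so braces are optional.
no_dots <- x %>% gsub("\\.", "", .)

# `.` appears only inside a nested call: without braces magrittr would also
# insert the piped value as the first argument of as.numeric(), so the
# braces are needed here.
result <- no_dots %>%
  trimws() %>%
  { as.numeric(gsub(",", ".", .)) }

print(result)
```

So in the answer above only the last step strictly requires braces; using them on every step that contains a . is simply the more uniform habit.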

+15

Pipes are usually applied to the data frame as a whole, so that the result is a cleaned data frame. The idea of functional programming is that objects are immutable and are not changed in place; instead, new objects are generated.

 library(dplyr)
 df %>%
   mutate(C = gsub("\\.", "", A)) %>%
   mutate(C = gsub(",", ".", C)) %>%
   mutate(C = as.numeric(C))
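Because mutate() returns a new data frame rather than modifying df, the result has to be assigned if you want to keep it. A minimal sketch of that behaviour, assuming dplyr is installed; the name cleaned is only illustrative, and stringsAsFactors = FALSE is an addition so that column A is a plain character vector on any R version:

```r
library(dplyr)

df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "),
                 B = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

# The pipeline builds a new data frame; df itself is left untouched.
cleaned <- df %>%
  mutate(C = gsub("\\.", "", A)) %>%
  mutate(C = gsub(",", ".", C)) %>%
  mutate(C = as.numeric(C))

names(df)       # the original still has only A and B
names(cleaned)  # the new object has A, B and C
```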

Also note that these alternatives work:

 df %>%
   mutate(C = gsub("\\.", "", A),
          C = gsub(",", ".", C),
          C = as.numeric(C))

 df %>%
   mutate(C = read.table(text = gsub("[.]", "", A), dec = ",")[[1]])

 df %>%
   mutate(C = type.convert(gsub("[.]", "", A), dec = ","))

For this particular example, type.convert seems the most appropriate, as it compactly expresses at a high level what we intend to do. By comparison, the gsub / as.numeric solutions seem too low-level and verbose, while read.table adds a conversion to a data frame that we then have to undo, which makes it too high-level.
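To see what type.convert does on its own, here is a base-R-only sketch on a bare character vector. The names x, no_thousands and result are illustrative, and as.is = TRUE is an assumption added so that newer R versions do not complain about the missing argument:

```r
x <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

# Remove the thousands separator; fixed = TRUE treats "." literally.
no_thousands <- gsub(".", "", trimws(x), fixed = TRUE)

# dec = "," tells type.convert that the comma is the decimal mark.
result <- type.convert(no_thousands, dec = ",", as.is = TRUE)

print(class(result))
print(result)
```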

+8

The problem is that the value supplied by the pipe becomes the first argument of the next call. That does not suit gsub(), since x is its third argument. A (verbose) workaround is to name the other arguments, so that the piped value falls through to x:

 library(stringr)
 df$A %>%
   gsub(pattern = "\\.", replacement = "") %>%
   str_trim() %>%
   gsub(pattern = ",", replacement = ".") %>%
   as.numeric
+2

You can use str_replace_all(string, pattern, replacement) from the stringr package as a replacement for gsub() (str_replace() replaces only the first match, like sub()). stringr functions follow a tidy convention in which the string / character vector is always the first argument.

 c("hello", "hi") %>% str_replace_all("[aeiou]", "x") 

See Introduction to stringr for more information on the sensibly named and well-defined stringr functions that can replace the default R string functions.
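Applied to the data from the question, this could look like the sketch below. It assumes stringr and magrittr are installed; fixed() marks the pattern as a literal string, so the dot needs no escaping, and the names x and result are illustrative:

```r
library(magrittr)
library(stringr)

x <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

result <- x %>%
  str_replace_all(fixed("."), "") %>%   # drop the thousands separator
  str_trim() %>%                        # remove the trailing space
  str_replace_all(fixed(","), ".") %>%  # decimal comma to decimal point
  as.numeric()

print(result)
```

Since every step takes the string as its first argument, no braces or . placeholders are needed.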

0

Source: https://habr.com/ru/post/1258078/

