Make gsub return an empty string when no match is found

I use the gsub function in R to extract occurrences of a pattern (reference numbers) from a list of strings. This works fine except when no match is found: in that case I get the whole string back instead of an empty string. Consider an example:

 data <- list("a sentence with citation (Ref. 12)", "another sentence without reference")
 sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))

This returns:

 [1] "Ref. 12" "another sentence without reference" 

But I would like to get

 [1] "Ref. 12" "" 

Thanks!

+6
5 answers

I would take a different route. sapply is not necessary here, because these functions are already vectorized:

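 fun <- function(x) {
   pattern <- ".*(Ref. (\\d+)).*"
   # Logical index rather than grep() positions: x[-ind] would blank
   # nothing when ind is empty, i.e. when no element matches at all.
   matched <- grepl(pattern, x)
   x <- gsub(pattern, "\\1", x)
   x[!matched] <- ""
   x
 }
 fun(data)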
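
An edge case worth knowing about the grep-position form x[-ind]: when no element matches, grep returns integer(0), and x[-integer(0)] selects nothing, so nothing gets blanked. A minimal sketch of the difference, with a logical index from grepl as one possible workaround. The input and the names no_refs, by_position and by_logical are made up for this illustration.

```r
pattern <- ".*(Ref. (\\d+)).*"
no_refs <- c("first sentence", "second sentence")  # hypothetical input, no match

# Position-based: grep() returns integer(0), and -integer(0) selects nothing.
ind <- grep(pattern, no_refs)
by_position <- gsub(pattern, "\\1", no_refs)
by_position[-ind] <- ""

# Logical index: !grepl() is TRUE for every non-matching element.
by_logical <- gsub(pattern, "\\1", no_refs)
by_logical[!grepl(pattern, no_refs)] <- ""

print(by_position)  # the strings come back unchanged
print(by_logical)   # both elements are ""
```

With the data from the question both forms give the same result, since at least one element matches there.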
+6

According to the documentation, this is how gsub is meant to behave: elements in which the pattern does not match are returned unchanged, so you get the entire string back.

Here, I first use grepl to get a logical vector telling whether the pattern is present in each string:

 ifelse(grepl(".*(Ref. (\\d+)).*", data), gsub(".*(Ref. (\\d+)).*", "\\1", data), "") 

Wrapping this in a function:

 mygsub <- function(x) {
   ans <- ifelse(grepl(".*(Ref. (\\d+)).*", x), gsub(".*(Ref. (\\d+)).*", "\\1", x), "")
   return(ans)
 }
 mygsub(data)
+2
 xs <- sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))
 xs[xs == data] <- ""
 xs
 #[1] "Ref. 12" ""
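
A caveat with comparing the result to the input, shown as a small sketch: a string that consists of nothing but the reference is returned unchanged by gsub even though it matched, so the comparison blanks it as well. The input vector below is hypothetical and not taken from the question.

```r
pattern <- ".*(Ref. (\\d+)).*"
# Hypothetical input: the first element is exactly the text the pattern extracts.
x <- c("Ref. 12", "no reference here")

xs <- gsub(pattern, "\\1", x)
xs[xs == x] <- ""   # also blanks the first element, although it matched

matched <- grepl(pattern, x)
print(xs)       # what the comparison approach returns
print(matched)  # which elements actually contain a reference
```

If such inputs can occur, testing for the match with grepl instead of comparing strings avoids the problem.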
+1

Try strapplyc in the gsubfn package:

 library(gsubfn)
 L <- fn$sapply(unlist(data), ~ strapplyc(x, "Ref. \\d+"))
 unlist(fn$sapply(L, ~ ifelse(length(x), x, "")))

which gives the following:

 a sentence with citation (Ref. 12) another sentence without reference
                          "Ref. 12"                                 ""

If you don't mind getting a list back, you can just use L and skip the last line of code. Note that the fn$ prefix turns formula arguments of the function it is applied to into functions, so the line that creates L can be written without fn as sapply(unlist(data), function(x) strapplyc(x, "Ref. \\d+")) .
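
For comparison, the same extract-the-match idea can be sketched without gsubfn, using regexpr and regmatches from base R. This is a swapped-in alternative, not part of the answer above. The helper name extract_ref is made up, and its pattern escapes the dot, which the patterns in this thread do not.

```r
# Hypothetical helper: first "Ref. <digits>" in each string, or "" if absent.
extract_ref <- function(x, pattern = "Ref\\. \\d+") {
  x <- unlist(x)                     # accept a list of strings, as in the question
  m <- regexpr(pattern, x)           # -1 where the pattern does not match
  out <- character(length(x))        # pre-filled with ""
  out[m != -1] <- regmatches(x, m)   # regmatches() returns matched elements only
  out
}

data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")
result <- extract_ref(data)
print(result)
```

Unlike the strapplyc version, this sketch keeps only the first reference per string and returns an unnamed character vector.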

+1

You can try filtering the result with grep(..., value = TRUE) inside the function.

 data <- list("a sentence with citation (Ref. 12)", "another sentence without reference")
 unlist(sapply(data, function(x) {
   x <- gsub(".*(Ref. (\\d+)).*", "\\1", x)
   grep("Ref\\.", x, value = TRUE)
 }))

It looks bulky, but it works. Note that it drops the non-matching second element entirely instead of returning an empty string for it.

0

Source: https://habr.com/ru/post/913561/