Remove all but selected characters from a string

I want to remove from a string all characters that are not digits, minus signs, or decimal points.

I imported data from Excel using read.xls, and it includes some weird characters. I need to convert the values to numeric. I'm not very familiar with regular expressions, so I need an easier way to do the following:

excel_coords <- c(" 19.53380ݰ", " 20.02591°", "-155.91059°", "-155.8154°")
unwanted <- unique(unlist(strsplit(gsub("[0-9]|\\.|-", "", excel_coords), "")))
clean_coords <- gsub(do.call("paste", args = c(as.list(unwanted), sep="|")), 
                     replacement = "", x = excel_coords)

> clean_coords
[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 

Bonus points if someone can tell me why these symbols appeared in some of my data (the degree signs are part of the original Excel worksheet; the other characters are not).

+3
3 answers

Short and sweet, thanks to G. Grothendieck's comment:

gsub("[^-.0-9]", "", excel_coords)

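From the documentation at http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html: a character class is a list of characters enclosed between [ and ], which matches any single character in that list; if the first character of the list is the caret ^, it instead matches any character not in the list.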
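Since the end goal in the question is numeric values, here is a minimal sketch of the full round trip. The name numeric_coords is made up for illustration, and the sketch assumes each cleaned string ends up as a well-formed number:

```r
excel_coords <- c(" 19.53380ݰ", " 20.02591°", "-155.91059°", "-155.8154°")

# Drop everything except digits, the minus sign and the decimal point.
clean_coords <- gsub("[^-.0-9]", "", excel_coords)

# Convert the cleaned strings to numeric (numeric_coords is an illustrative name).
numeric_coords <- as.numeric(clean_coords)

# A string that is still not a valid number (for example a stray "-" or a
# second ".") becomes NA with a warning, so check before using the values.
stopifnot(!anyNA(numeric_coords))
```

Note that the pattern keeps every "-" and "." wherever it occurs in the string, so the NA check is worth keeping if the input may contain other punctuation.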

+5

Another option, using strsplit, sapply and paste:

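excel_coords <- c(" 19.53380ݰ", " 20.02591°", "-155.91059°", "-155.8154°")
correct_chars <- c(0:9, "-", ".")  # coerced to a character vector
# Split each string into single characters, keep only the allowed ones, rejoin.
sapply(strsplit(excel_coords, ""),
       function(x) paste(x[x %in% correct_chars], collapse = ""))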

[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 
+2
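# Skip any leading junk, then capture an optional sign and the decimal number.
# A greedy leading (.+) would swallow the sign and all but one integer digit.
sub("^[^-0-9]*(-?[[:digit:]]+\\.[[:digit:]]+).*$", "\\1", excel_coords)
[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 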
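Extracting the number rather than deleting everything around it can also be done with regmatches and regexpr from base R. This is only a sketch: the pattern assumes each string contains exactly one number with an optional sign and an optional fractional part, and the name extracted is made up for illustration.

```r
excel_coords <- c(" 19.53380ݰ", " 20.02591°", "-155.91059°", "-155.8154°")

# regexpr() locates the first match in each string; regmatches() pulls it out.
# Pattern: optional minus sign, digits, optional "." followed by more digits.
extracted <- regmatches(excel_coords,
                        regexpr("-?[0-9]+(\\.[0-9]+)?", excel_coords))
```

One caveat: regmatches drops elements that have no match at all, so the result can be shorter than the input if some strings contain no number.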
+1

Source: https://habr.com/ru/post/1790590/