Strange regex behavior in R

I have a simple web scraper that behaves strangely:
- in the desktop version of RStudio (version 3.3.3 on Windows) it behaves as expected and creates a numeric vector
- in the server version of RStudio (version 1.4.1 on Linux) gsub() fails (and therefore so does the subsequent numeric conversion), and the code creates a vector of NAs.

Do you have any idea what might cause the difference?

library(rvest)

url <- "http://benzin.impuls.cz/benzin.aspx?strana=3"
impuls <- read_html(url, encoding = "windows-1250")

asdf <- impuls %>%
  html_table()

Benzin <- asdf[[1]]$X7

chrBenzin <- gsub("\\sKč","",Benzin)  # something is wrong here...

numBenzin <- as.double(chrBenzin)
numBenzin
1 answer

The whitespace before Kč is a non-breaking space, U+00A0. You can see it by inspecting the value of Benzin (for example, by copying the output to ideone.com).
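One way to confirm that a string contains a non-breaking space is to look at its code points. This is a minimal sketch, assuming a hypothetical sample value written with \u escapes in place of the scraped column:

```r
# Hypothetical sample value standing in for one element of the scraped column.
# "\u00a0" is the non-breaking space, "\u010d" is the letter č.
x <- "29.60\u00a0K\u010d"

# Code points of each character; the non-breaking space shows up as 160
# (hex A0), whereas an ordinary space would be 32.
codes <- utf8ToInt(x)
codes

# TRUE if the string contains at least one non-breaking space.
has_nbsp <- any(codes == 160L)
has_nbsp
```

Running the same check on an element of Benzin should show whether the separator is 160 or 32.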

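\s and [[:space:]] in a TRE regex (the default engine in base R) do not treat this character the same way on every platform, which would explain why the same pattern works on Windows and fails on Linux. In a PCRE regex, the (*UCP) verb makes \s Unicode-aware, so that it matches any Unicode whitespace.

So, on Linux, use a PCRE regex (perl=TRUE) with the (*UCP) PCRE verb (the verb is not supported by TRE):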

gsub("(*UCP)\\s+Kč","",Benzin, perl=TRUE)

A quick check in R on Linux:

Benzin <- "29.60 Kč"
gsub("(*UCP)\\s+Kč","",Benzin, perl=TRUE)
## => [1] "29.60"
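Combined with the numeric conversion from the question, the same pattern could be wrapped as follows. This is only a sketch: parse_price is a hypothetical helper name, and the sample vector is made up, with the non-breaking space written explicitly as \u00a0 so that it survives copy and paste.

```r
# Hypothetical helper: removes "Kč" together with any whitespace before it
# (including U+00A0) and converts what is left to a number.
parse_price <- function(x) {
  x <- enc2utf8(x)  # make sure the input is UTF-8 before it reaches PCRE
  cleaned <- gsub("(*UCP)\\s+K\u010d", "", x, perl = TRUE)
  as.numeric(cleaned)
}

# Made-up sample: the first value uses a non-breaking space,
# the second an ordinary space.
prices <- c("29.60\u00a0K\u010d", "30.10 K\u010d")
parse_price(prices)
```

Writing č as \u010d in the pattern keeps the script independent of the encoding of the source file; the literal Kč works as well when the file is saved as UTF-8.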

Source: https://habr.com/ru/post/1684576/

