I am trying to extract numbers from strings in R with the stringr package. Sometimes a number is missing and a hyphen appears in its place. Here are some sample lines:
str <- c(
"cash dividends per share $ - $ - $ - $ 0.08 $ 0.16 cash",
"cash dividends per share $ 0.01 $ 12.10 $ 0.01 $ 0.08 $ 0.16 hello",
"cash dividends per share $ - $ - $ 0.91 $ - $ 0.16 world",
"cash dividends per share - - 0.12 - 0.16 hsac",
"cash dividends per share $ - $ - $ - $ - $ 0.16 afterwards",
"cash dividends per share $0.12 $ - $0.1 $ - $ - comes",
"cash dividends per share 0.12 - 0.12 - - text",
"cash dividends per share... 0.12 - 0.12 - - random",
"cash dividends per share...0.123 0.321 - - 0.12 blu",
"cash dividends per share ..... $ 0.12 $ - $ 0.12 $ - $ - foo",
"cash dividends per share ..... $0.42 $0.42 $- $- $- bar")
I have constructed the following regular expression, which I think should match all of these cases, but it does not. I have also tried several variants, but I cannot work out the correct one (I do not even see what is wrong with the one I ended up with):
library("stringr")
rgxp <- "cash dividends [declared]* per share[ \\.]+[\\$]?[ ]?([-0.9\\.]+)[ ]?[\\$]?[ ]?([-0.9\\.]+)[ ]?[\\$]?[ ]?([-0.9\\.]+)[ ]?[\\$]?[ ]?([-0.9\\.]+)[ ]?[\\$]?[ ]?([-0.9\\.]+).*"
str_match_all(str, rgxp)
Can you see what causes the regular expression above to fail?
Edit: I should have said that my desired result is, for each string, a vector with five elements, each either a number or a hyphen where there is no number. Thanks!
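To make the desired output concrete, here is a minimal sketch of one way it might be produced. It does not use the regular expression above. It assumes that the label is always exactly "cash dividends per share", that every number contains a decimal point, and that the trailing word never contains a digit or a hyphen. The names `x`, `rest` and `vals` are only illustrative.

```r
library(stringr)

# Three of the sample lines from above, covering the "$ -", "..." and "$-" layouts
x <- c(
  "cash dividends per share $ - $ - $ - $ 0.08 $ 0.16 cash",
  "cash dividends per share...0.123 0.321 - - 0.12 blu",
  "cash dividends per share ..... $0.42 $0.42 $- $- $- bar"
)

# Drop the label and any spaces or dot leaders that follow it
# (assumption: the label text is always the same)
rest <- str_remove(x, "^cash dividends per share[ .]*")

# Pull out every decimal number or lone hyphen, in order of appearance
vals <- str_extract_all(rest, "\\d+\\.\\d+|-")

vals[[1]]
```

`vals` is a list with one character vector per input string. With the full sample data each vector should have five elements, but a line that breaks the assumptions above would silently give a different length, so checking `lengths(vals)` is probably worthwhile.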