Extract numbers from sentences

I need to extract some numbers from the following text:

x <- "Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae;" 

The numbers to extract are 325 and 232: each is enclosed in brackets or parentheses and sits at the end of a sentence. All other numbers should be excluded. I tried strsplit(x, "[A-Za-z]+"), but it didn't give me what I need.

4 answers

Here is a stringi approach:

    x <- "Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae; Claudii libidini, qui tum erat summo ne imperio, dederetur"

    library(stringi)

    # Digits preceded by "[" or "(" and followed by "]" or ")" plus ".", "?" or "!"
    stri_extract_all_regex(x, "(?<=[\\[(])\\d+(?=[\\])][.?!])")
    ## [[1]]
    ## [1] "325" "232"

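Another option, in two steps:

    # Match the bracketed number only when a period follows it
    r <- gregexpr("[[(]\\d+[])](?=\\.)", x, perl = TRUE)
    (m <- regmatches(x, r)[[1]])
    # [1] "(325)" "[232]"

    # Strip the brackets and convert to integers
    as.integer(gsub("\\D", "", m))
    # [1] 325 232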

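Here is a solution using strsplit:

    x <- 'Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae;'

    # Split on runs of non-digits; the first piece is an empty string,
    # so elements 3 and 4 are the second and third numbers in this text.
    # Note that this selects by position, not by the surrounding brackets.
    strsplit(x, '[^0-9]+')[[1]][3:4]
    ## [1] "325" "232"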

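Or using base R to extract these values:

    # \K drops the opening bracket from the match; the lookahead requires a
    # closing bracket that is not followed by a comma
    regmatches(x, gregexpr('[[(]\\K\\d+(?=[])](?!,))', x, perl = TRUE))[[1]]
    ## [1] "325" "232"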

With Python's re module:

    import re

    string = "Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae;"
    print(string)

    # Digits preceded by "[" or "(" and followed by "]" or ")" plus a period
    pattern = re.compile(r'(?<=[\[(])\d+(?=[\])]\.)')
    result = pattern.findall(string)
    print(result)
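As a possible variation on the Python answer, the sketch below uses a capture group instead of a lookbehind, accepts ".", "?" or "!" as the sentence end (as the stringi answer does), and returns integers. The function name is made up for illustration, and the pattern does not check that the opening and closing brackets are of the same kind, so something like "[12)." would also match.

```python
import re

# Hypothetical helper name; not taken from any of the answers above.
def bracketed_sentence_final_numbers(text):
    """Return integers wrapped in [] or () that are directly
    followed by sentence-ending punctuation (., ? or !)."""
    # The capture group keeps only the digits; the lookahead checks the
    # punctuation without consuming it.
    pattern = re.compile(r"[\[(](\d+)[\])](?=[.?!])")
    return [int(n) for n in pattern.findall(text)]

sample = (
    "Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). "
    "Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? "
    "Sequitur disserendi ratio cognitioque 295. naturae;"
)

print(bracketed_sentence_final_numbers(sample))  # [325, 232]
```

Here [245] is skipped because a comma follows it, and 295 is skipped because it is not bracketed.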

Source: https://habr.com/ru/post/1201040/

