R regex: match whitespace at the beginning of a string

I am using R and have the following string:

 s <- "\t\t\t   \t\t\thello world ! \t\t\thello"

I want to count the whitespace characters at the beginning of the string, and nowhere else. Gaps between the content should be ignored; only the leading run should be considered. The expected result here is 9.

I tried the following, but it only returns 1:

 sapply(regmatches(s, gregexpr('^(\\s)', s)), length) 

I am not very good at regular expressions, so any help is appreciated.

+6
3 answers

To match only the first occurrence, regexpr() is more appropriate than gregexpr(). With this switch, sapply() is no longer needed, because regexpr() returns an atomic vector, whereas gregexpr() returns a list.

Use the following regular expression and read the match.length attribute of the regexpr() result.

 attr(regexpr("^\\s+", s), "match.length")
 # [1] 9

Regular expression explanation:

  • ^ anchors the match at the beginning of the string.
  • \\s matches whitespace characters: tab, newline, vertical tab, form feed, carriage return, and space.
  • + matches the preceding item one or more times.
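
As a rough sketch of how the pieces above fit together, the helper below wraps the regexpr() call and maps the no-match case to 0, since match.length is -1 when the string has no leading whitespace. The name count_leading_ws and the sample strings are assumptions made for illustration; they are not part of the answer.

```r
# Hypothetical helper: count whitespace characters at the start of each string.
count_leading_ws <- function(x) {
  # regexpr() returns one match per element of x; match.length is -1 on no match
  len <- attr(regexpr("^\\s+", x), "match.length")
  # Report "no leading whitespace" as 0 instead of -1
  pmax(len, 0L)
}

# Illustrative input: 2 tabs + 3 spaces + 1 tab, then text with inner gaps
x <- c("\t\t   \thello \t world", "no leading space", "  two")
count_leading_ws(x)
# [1] 6 0 2
```

Because regexpr() is vectorized over its text argument, the same call works on a whole character vector without sapply().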

Link: http://en.wikibooks.org/wiki/R_Programming/Text_Processing

+3

Another way to solve this is to anchor with \G. The \G anchor matches at one of two positions: the beginning of the string, or the position where the previous match ended.

 sapply(gregexpr("\\G\\s", s, perl = TRUE), length)
 # [1] 9
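
One caveat that may be worth guarding against with this approach: when nothing matches, gregexpr() returns a single -1 for that string, so length() reports 1 rather than 0. The following is a minimal sketch of such a guard; the helper name count_leading_ws_g and the sample strings are hypothetical.

```r
# Hypothetical helper: count leading whitespace via \G, guarding the no-match case.
count_leading_ws_g <- function(x) {
  m <- gregexpr("\\G\\s", x, perl = TRUE)
  # Each list element holds the match start positions, or a single -1 when
  # nothing matched, so count positive positions instead of taking length()
  vapply(m, function(pos) sum(pos > 0L), integer(1))
}

# Illustrative input: tab + space + tab before the text, and a string with none
x <- c("\t \thello \t world", "hello")
count_leading_ws_g(x)
```

For the second string, plain length() would give 1, while the guarded version gives 0.
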
+2

You can also try this.

 > sapply(gregexpr("[^\\h].*(*SKIP)(*F)|\\h", s, perl = TRUE), length)
 [1] 9
 > sapply(gregexpr("\\S.*(*SKIP)(*F)|\\h", s, perl = TRUE), length)
 [1] 9

\\h matches horizontal whitespace. \S matches a non-whitespace character, and the following .* matches everything after that character up to the end of the line. (*SKIP)(*F) then forces that match to fail. The part after |, that is \h, matches the remaining horizontal whitespace (i.e., the whitespace at the beginning).
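
To see which positions this pattern actually matches, the sketch below lists the match starts for a made-up string (s2 is illustrative, not taken from the question). Only the leading tab, space, and tab are reported; the whitespace after the first non-whitespace character is skipped.

```r
# Illustrative input: tab + space + tab, then text containing more whitespace
s2 <- "\t \thello \t world"

# Everything from the first non-whitespace character onward is consumed and
# discarded by (*SKIP)(*F); only the leading \h characters remain as matches
m <- gregexpr("\\S.*(*SKIP)(*F)|\\h", s2, perl = TRUE)[[1]]

# Start positions of the matches (1-based)
as.integer(m)
# [1] 1 2 3
```

Note that \h covers horizontal whitespace only, so a leading newline or carriage return would not be counted here, unlike with \s.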

0

Source: https://habr.com/ru/post/980909/

