How to split a string without getting an empty string inserted into the array

I am having trouble splitting a character from a string using a regex if there is a match.

I want to separate either the character "m" or "f" from the first part of the string, assuming that the next character is one or more numbers, followed by optional whitespace characters, followed by a string from the array that I have.

I tried:

2.4.0 :006 > MY_SEPARATOR_TOKENS = ["-", " to "]
 => ["-", " to "]
2.4.0 :008 > str = "M14-19"
 => "M14-19"
2.4.0 :011 > str.split(/^(m|f)\d+[[:space:]]*#{Regexp.union(MY_SEPARATOR_TOKENS)}/i)
 => ["", "M", "19"]

Notice the extraneous "" element at the beginning of my array, and also note that the last element is "19", while I want everything else in the string ("14-19").

How do I set up my regex so that only parts of the expression that get split get into the array?

4 answers

An empty element will always be there if you get a match, because the match occurs at the start of the string, and the substring between the start of the string and the match is added to the resulting array whether it is empty or not. Either shift / drop the first element once you get a match, or just remove all empty array elements with .reject { |c| c.empty? } (see How to remove empty elements from an array?).
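A minimal sketch of the behaviour described above, using the pattern from the question. The variable names are only illustrative and not part of the original code:

```ruby
# When the match starts at index 0, the (empty) text before it is kept
# as the first element; captured groups are included in the result too.
separators = ["-", " to "]
pattern = /^(m|f)\d+[[:space:]]*#{Regexp.union(separators)}/i

parts = "M14-19".split(pattern)
p parts                      # the array begins with an empty string

# Two ways to discard the leading empty element:
without_first = parts.drop(1)            # drops the first element only
without_empty = parts.reject(&:empty?)   # drops every empty string
p without_first
p without_empty
```

Note that drop(1) removes the first element unconditionally, so it is only safe once you know the pattern matched; reject(&:empty?) removes empty strings wherever they occur.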

Next, 14- is eaten (consumed) by the \d+[[:space:]]*... part of the pattern. Put that part inside a (?=...) lookahead, which only checks that the pattern matches but does not consume any characters.

Use something like

 MY_SEPARATOR_TOKENS = ["-", " to "]
 s = "M14-19"
 # p (rather than puts) prints the array in the form shown in the comment below
 p s.split(/^(m|f)(?=\d+[[:space:]]*#{Regexp.union(MY_SEPARATOR_TOKENS)})/i).drop(1)
 #=> ["M", "14-19"]

See Ruby demo


I find match a little more elegant when extracting parts of a string with regular expressions in Ruby:

 string = "M14-19"
 # [MF] instead of [M|F]: inside a character class, | is a literal pipe
 string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)[1, 2]
 => ["M", "14-19"]

 # the named groups can also be extracted from the match
 extract_string = string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)
 [extract_string[:m], extract_string[:digits]]
 => ["M", "14-19"]

 string = 'M14 to 14'
 extract_string = string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)[1, 2]
 => ["M", "14 to 14"]
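One thing to keep in mind with this approach: String#match returns nil when the pattern does not match, so calling [1, 2] on the result would raise. A small sketch of guarding against that follows. The helper name split_prefix and the constant PATTERN are hypothetical and not part of the answer:

```ruby
# Hypothetical helper: returns [letter, rest] on a match, nil otherwise.
PATTERN = /\A(?<m>[MF])(?<digits>\d{2}(?:-| to )\d{2})/

def split_prefix(string)
  match = string.match(PATTERN)
  return nil unless match          # String#match returns nil on no match
  [match[:m], match[:digits]]
end

p split_prefix("M14-19")     # => ["M", "14-19"]
p split_prefix("M14 to 14")  # => ["M", "14 to 14"]
p split_prefix("X14-19")     # => nil
```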

You have a bug waiting to happen in your code. Don't get into the habit of doing this:

 #{Regexp.union(MY_SEPARATOR_TOKENS)} 

You are setting yourself up for a problem that is very hard to debug.

Here's what happens:

 regex = Regexp.union(%w(a b)) # => /a|b/
 /#{regex}/ # => /(?-mix:a|b)/
 /#{regex.source}/ # => /a|b/

/(?-mix:a|b)/ is an embedded sub-pattern with its own set of regex flags m, i and x, which are independent of the surrounding pattern's settings.

Consider this situation:

 'CAT'[/#{regex}/i] # => nil 

We expect the i flag to make the match succeed because it makes the pattern case-insensitive, but the embedded sub-expression still only allows lowercase letters, so the match fails.

Using a bare (a|b) or adding source succeeds because the inner expression then inherits the i flag of the outer expression:

 'CAT'[/(a|b)/i] # => "A"
 'CAT'[/#{regex.source}/i] # => "A"

See "How to include regular expressions in other regular expressions in Ruby" for additional discussion of this subject.
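Applied to the separator tokens from the question, the difference can be sketched like this. The variable names are illustrative. Also note that once you interpolate source instead of the Regexp object, you lose the implicit (?-mix:...) group, so the alternation should be wrapped in (?:...) by hand:

```ruby
tokens = ["-", " to "]
union = Regexp.union(tokens)

# Interpolating the Regexp object embeds its own flags, so the outer /i
# does not reach the tokens.
embedded = /\d+[[:space:]]*#{union}/i

# Interpolating .source lets the tokens inherit /i; the explicit group
# keeps the alternation from splitting the whole pattern in two.
inherited = /\d+[[:space:]]*(?:#{union.source})/i

p "14 TO 19".match?(embedded)   # => false
p "14 TO 19".match?(inherited)  # => true
p "14 to 19".match?(embedded)   # => true
```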

 TOKENS = ["-", " to "]

 r = /
     (?<=\A[mMfF])  # match the beginning of the string and then one
                    # of the 4 characters in a positive lookbehind
     (?=            # begin positive lookahead
       \d+          # match one or more digits
       [[:space:]]* # match zero or more spaces
       (?:#{TOKENS.join('|')}) # match one of the tokens
     )              # close the positive lookahead
     /x             # free-spacing regex definition mode

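(?:#{TOKENS.join('|')}) is replaced by (?:-| to ). Note that in free-spacing mode the whitespace in " to " is ignored, so that token effectively becomes to; the examples below still match because [[:space:]]* absorbs the space before it.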

This, of course, can be written in the usual way.

 r = /(?<=\A[mMfF])(?=\d+[[:space:]]*(?:#{TOKENS.join('|')}))/ 

When you split on r, you split between two characters (between a positive lookbehind and a positive lookahead), so no characters are consumed.

 "M14-19".split r #=> ["M", "14-19"]
 "M14 to 19".split r #=> ["M", "14 to 19"]
 "M14 To 19".split r #=> ["M14 To 19"]

If you want ["M", "14 To 19"] to be returned in the last example, change [mMfF] to [mf] and /x to /xi.
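Because the split point is zero-width, joining the pieces back together should give the original string. A quick sketch to check that, using the one-line form of the regex; the lowercase variable names are illustrative:

```ruby
tokens = ["-", " to "]
r = /(?<=\A[mMfF])(?=\d+[[:space:]]*(?:#{tokens.join('|')}))/

input = "F7 to 12"
parts = input.split(r)
p parts                  # => ["F", "7 to 12"]
p parts.join == input    # => true, nothing was consumed by the split

# A string that does not start with one of the four letters is left whole.
unmatched = "X7 to 12".split(r)
p unmatched              # => ["X7 to 12"]
```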


Source: https://habr.com/ru/post/1265982/

