How to split a string without getting an empty string inserted into the array

Question

I am having trouble splitting a character from a string using a regex if there is a match.

I want to separate either the character "m" or "f" from the first part of the string, assuming that the next character is one or more numbers, followed by optional whitespace characters, followed by a string from the array that I have.

I tried:

    2.4.0 :006 > MY_SEPARATOR_TOKENS = ["-", " to "]
     => ["-", " to "]
    2.4.0 :008 > str = "M14-19"
     => "M14-19"
    2.4.0 :011 > str.split(/^(m|f)\d+[[:space:]]*#{Regexp.union(MY_SEPARATOR_TOKENS)}/i)
     => ["", "M", "19"]

Notice the extraneous "" element at the beginning of my array, and also note that the last element is "19", whereas I want everything else in the string ("14-19").

How do I set up my regex so that only parts of the expression that get split get into the array?

+1

split ruby regex

Dave Mar 10 '17 at 10:56

4 answers

I find match a little more elegant when extracting parts of a string with a regular expression in Ruby:

    string = "M14-19"
    string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)[1, 2]
    # => ["M", "14-19"]

    # the named captures can also be read from the match by name
    extract_string = string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)
    [extract_string[:m], extract_string[:digits]]
    # => ["M", "14-19"]

    string = 'M14 to 14'
    extract_string = string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)[1, 2]
    # => ["M", "14 to 14"]
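One thing to watch with the match approach: match returns nil when the string does not fit the pattern, so calling [1, 2] on the result raises NoMethodError. A minimal sketch of a nil-safe wrapper follows. The names split_prefix and PREFIX_PATTERN are hypothetical, and the pattern is loosened to \d+ and optional whitespace to follow the question's description rather than this answer's \d{2}.

```ruby
# Hypothetical helper, not part of the answer above.
# Returns [prefix, rest] on a match, or the untouched string in a
# one-element array when the string does not fit the pattern.
PREFIX_PATTERN = /\A(?<prefix>[mf])(?<rest>\d+[[:space:]]*(?:-| to ).*)\z/i

def split_prefix(str)
  m = str.match(PREFIX_PATTERN)
  m ? [m[:prefix], m[:rest]] : [str]
end

p split_prefix("M14-19")   # => ["M", "14-19"]
p split_prefix("f7 to 12") # => ["f", "7 to 12"]
p split_prefix("X14-19")   # => ["X14-19"]
```

Returning the original string in an array mirrors what split does when the pattern does not match, so callers can treat both cases the same way.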

+4

David gross Mar 10 '17 at 23:32

You have a bug waiting to happen in your code. Don't get into the habit of doing this:

 #{Regexp.union(MY_SEPARATOR_TOKENS)}

You are setting yourself up for a problem that is very hard to debug.

Here's what happens:

    regex = Regexp.union(%w(a b)) # => /a|b/
    /#{regex}/                    # => /(?-mix:a|b)/
    /#{regex.source}/             # => /a|b/

/(?-mix:a|b)/ is an embedded sub-expression with its own set of regex flags m, i and x, which are independent of the surrounding pattern's settings.

Consider this situation:

 'CAT'[/#{regex}/i] # => nil

We expect the pattern to match because the i flag makes it case-insensitive, but the embedded sub-expression still allows only lowercase letters, so the match fails.

Using a bare (a|b), or interpolating regex.source, succeeds because the inner expression then inherits the i flag of the main expression:

    'CAT'[/(a|b)/i]            # => "A"
    'CAT'[/#{regex.source}/i]  # => "A"

See "How to include regular expressions in other regular expressions in Ruby" for additional discussion of this subject.
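Applied to the question's tokens, the difference can be sketched as follows. The constant names are made up for illustration. Note that source returns the bare alternation without a group around it, so this sketch wraps it in (?:...) to keep the alternation from spilling into the rest of the pattern.

```ruby
SEPARATORS = ["-", " to "]          # tokens from the question
UNION = Regexp.union(SEPARATORS)

# Interpolating the Regexp itself embeds (?-mix:...), which ignores the outer /i.
EMBEDDED = /\A[mf]\d+[[:space:]]*#{UNION}/i

# Interpolating .source lets the outer /i apply; the (?:...) group is added
# by hand because .source does not include one.
WITH_SOURCE = /\A[mf]\d+[[:space:]]*(?:#{UNION.source})/i

p EMBEDDED.match?("M14 to 19")     # => true  (separator is lowercase)
p EMBEDDED.match?("m14 TO 19")     # => false (embedded part is case-sensitive)
p WITH_SOURCE.match?("m14 TO 19")  # => true
```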

+3

the tin man Mar 11 '17 at 0:36

    TOKENS = ["-", " to "]

    r = /
        (?<=\A[mMfF])             # match the beginning of the string and then one
                                  # of the 4 characters in a positive lookbehind
        (?=                       # begin positive lookahead
          \d+                     # match one or more digits
          [[:space:]]*            # match zero or more spaces
          (?:#{TOKENS.join('|')}) # match one of the tokens
        )                         # close the positive lookahead
        /x                        # free-spacing regex definition mode

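(?:#{TOKENS.join('|')}) is replaced by (?:-| to ). Note that in free-spacing mode the literal spaces around "to" are ignored, so that alternative effectively matches "to"; the examples still work because the preceding [[:space:]]* absorbs the space.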

This can, of course, be written in the usual way:

 r = /(?<=\A[mMfF])(?=\d+[[:space:]]*(?:#{TOKENS.join('|')}))/

When you split on r, you split between two characters (between a positive lookbehind and a positive lookahead), so no characters are consumed.

    "M14-19".split r    #=> ["M", "14-19"]
    "M14 to 19".split r #=> ["M", "14 to 19"]
    "M14 To 19".split r #=> ["M14 To 19"]

If you want ["M", "14 To 19"] to be returned in the last example, change [mMfF] to [mf] and /x to /xi.
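A minimal sketch of that case-insensitive variant, written in the one-line form rather than free-spacing mode. The constant name R_CI is made up for illustration.

```ruby
TOKENS = ["-", " to "]

# Zero-width split point: after a leading m or f (any case) and before
# digits, optional whitespace and one of the tokens.
R_CI = /(?<=\A[mf])(?=\d+[[:space:]]*(?:#{TOKENS.join('|')}))/i

p "M14 To 19".split(R_CI) # => ["M", "14 To 19"]
p "f14-19".split(R_CI)    # => ["f", "14-19"]
p "X14-19".split(R_CI)    # => ["X14-19"]
```

Because the match is zero-width, nothing is consumed and no empty string appears at the front of the result.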

+3

Cary swoveland Mar 11 '17 at 1:30

Wiktor stribiżew · Accepted Answer · 2017-03-10T23:06:54+0000

An empty element will always be there if you get a match, because the match is at the beginning of the string, and the text between the beginning of the string and the match is added to the resulting array whether it is empty or not. Either shift / drop the first element once you get a match, or just remove all empty array elements with .reject { |c| c.empty? } (see How to remove empty elements from an array?).
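The leading empty element, and the difference between the two clean-up options, can be seen with a simplified pattern (only the prefix group, which is an assumption made to keep the sketch short; the constant names are made up):

```ruby
# Split on a captured prefix at the start of the string.
PARTS = "M14-19".split(/^(m|f)/i)
p PARTS                   # => ["", "M", "14-19"]
p PARTS.drop(1)           # => ["M", "14-19"]
p PARTS.reject(&:empty?)  # => ["M", "14-19"]

# When nothing matches, split returns the whole string as one element.
NO_MATCH = "X14-19".split(/^(m|f)/i)
p NO_MATCH                   # => ["X14-19"]
p NO_MATCH.drop(1)           # => [] (drop(1) throws away real data here)
p NO_MATCH.reject(&:empty?)  # => ["X14-19"]
```

This is why drop is only safe once you know there was a match, while reject works in both cases.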

Next, 14- is eaten (consumed) by the \d+[[:space:]]... part of the pattern. Put that part into a (?=...) lookahead, which only checks whether the pattern matches but does not consume any characters.

Use something like

    MY_SEPARATOR_TOKENS = ["-", " to "]
    s = "M14-19"
    # p (rather than puts) prints the array in inspect form
    p s.split(/^(m|f)(?=\d+[[:space:]]*#{Regexp.union(MY_SEPARATOR_TOKENS)})/i).drop(1)
    #=> ["M", "14-19"]

See Ruby demo
