Regex to split a string into key/value pairs when the number of pairs varies?

I am using Ruby 1.9, and I am wondering whether there is an easy way to do this with a regex.

I have many lines that look like this, with some variation:

str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

I would like to break this line down into its functional components:

  • Allocation: Random
  • Control: Active Control
  • Endpoint Classification: Safety Study
  • Intervention Model: Parallel Assignment
  • Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor)
  • Primary Purpose: Treatment

The "syntax" of the string is that there is a "key" consisting of one or more words or other characters (for example, Intervention Model) followed by a colon (:). Each key has a corresponding "value" (for example, Parallel Assignment) that immediately follows the colon. The "value" consists of words, commas (anything, really), but the end of the "value" is signaled by a comma.

The number of key/value pairs is variable. I am also assuming that colons (:) are not allowed in the "value" and commas (,) are not allowed in the "key".

Is there a "regexy" way to do this? Here is my attempt so far:

irb(main):138:0> regex = /(([^,]+?): ([^:]+?,))+?/
=> /(([^,]+?): ([^:]+?,))+?/
irb(main):139:0> str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"
=> "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"
irb(main):140:0> str.match regex
=> #<MatchData "Allocation:  Random," 1:"Allocation:  Random," 2:"Allocation" 3:" Random,">
irb(main):141:0> $1
=> "Allocation:  Random,"
irb(main):142:0> $2
=> "Allocation"
irb(main):143:0> $3
=> " Random,"
irb(main):144:0> $4
=> nil

2 answers

irb(main):003:0> pp Hash[ *str.split(/\s*([^,]+:)\s+/)[1..-1] ]
{"Allocation:"=>"Random,",
 "Control:"=>"Active Control,",
 "Endpoint Classification:"=>"Safety Study,",
 "Intervention Model:"=>"Parallel Assignment,",
 "Masking:"=>
  "Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor),",
 "Primary Purpose:"=>"Treatment"}

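The hash printed above still carries the trailing colon on each key and the trailing comma on most values. A minimal sketch of one way to trim them, assuming the same split; the names `parts` and `clean` are only illustrative:

```ruby
str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

# Same split as in the answer above; [1..-1] drops the leading empty string.
parts = str.split(/\s*([^,]+:)\s+/)[1..-1]

# Pair each key with the value that follows it, then trim the trailing ':' and ','.
clean = Hash[parts.each_slice(2).map { |key, value| [key.chomp(':'), value.chomp(',')] }]
```

With this, `clean["Masking"]` should be the full parenthesized value with its inner commas intact, since only a single trailing comma is removed.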


Alternatively, if you want to use split, this works:

str.split(/((?:[^,]+?): (?:[^:]+?,(?![^\(]+?\))))+?/).delete_if(&:empty?).map{|s| s.strip.chomp(',')}

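The negative lookahead keeps commas inside parentheses from ending a value. delete_if and map then clean up the resulting pieces by dropping the empty strings and trimming whitespace and the trailing comma.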
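The split above returns an array of "Key:  Value" strings rather than separate keys and values. A minimal sketch of turning that array into a hash, assuming each piece is divided at its first colon; `pair_regex`, `pieces`, and `pairs` are illustrative names:

```ruby
str = "Allocation:  Random, Control:  Active Control, Endpoint Classification:  Safety Study, Intervention Model:  Parallel Assignment, Masking:  Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose:  Treatment"

# Same pattern as above: a comma ends a value only if it is not inside parentheses.
pair_regex = /((?:[^,]+?): (?:[^:]+?,(?![^\(]+?\))))+?/

# Each surviving piece looks like "Key:  Value".
pieces = str.split(pair_regex).delete_if(&:empty?).map { |s| s.strip.chomp(',') }

# Split each piece at its first colon only (limit 2), swallowing the spaces after it.
pairs = Hash[pieces.map { |piece| piece.split(/:\s*/, 2) }]
```

The limit of 2 is a defensive choice: it keeps the value in one piece even if a colon were to appear in it, although the question assumes that does not happen.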


Source: https://habr.com/ru/post/1784322/

