Regex pop quiz of the day :D

If I have a line like this:

 There                   is             a lot           of           white space.

And I want to remove all the unwanted whitespace with a Ruby regex. How do you match the extra whitespace and delete it so that there is still at least one space between all the words?

So far, I have:

gsub(/\s{2,}/, '')

But, as you can see, it runs some of the words together.

+3
2 answers

You're close. After stripping the leading and trailing whitespace,

str.strip.gsub(/\s{2,}/, ' ')

replaces every run of two or more whitespace characters with a single space. This, of course, assumes that you are dealing only with actual spaces.
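A minimal sketch of the difference between the two replacement strings, using a shortened sample line (the variable names are only illustrative). Replacing each run with '' deletes the separator altogether, while replacing it with ' ' leaves exactly one space:

```ruby
# Sample input with leading, trailing, and repeated inner spaces (illustrative).
line = "  There     is    a lot    of     white space.  "

# Empty replacement: every run of 2+ whitespace characters disappears,
# so the words on either side are glued together.
joined = line.gsub(/\s{2,}/, '')

# Single-space replacement after strip: one separator survives per run.
cleaned = line.strip.gsub(/\s{2,}/, ' ')

puts joined   # => Thereisa lotofwhite space.
puts cleaned  # => There is a lot of white space.
```

Note that single spaces, such as the one in "a lot", are never matched by \s{2,}, so they pass through unchanged in both cases.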

+11

If you are dealing only with space characters, String#squeeze(' ') is an alternative to a regex. A benchmark:

#!/usr/bin/env ruby

require 'benchmark'

asdf = 'There                   is             a lot           of           white space.'

asdf.squeeze(' ') # => "There is a lot of white space."
asdf.gsub(/  +/, ' ') # => "There is a lot of white space."
asdf.gsub(/ {2,}/, ' ') # => "There is a lot of white space."
asdf.gsub(/\s\s+/, ' ') # => "There is a lot of white space."
asdf.gsub(/\s{2,}/, ' ') # => "There is a lot of white space."

n = 500000
Benchmark.bm(8) do |x|
  x.report('squeeze:') { n.times{ asdf.squeeze(' ') } }
  x.report('gsub1:') { n.times{ asdf.gsub(/  +/, ' ') } }
  x.report('gsub2:') { n.times{ asdf.gsub(/ {2,}/, ' ') } }
  x.report('gsub3:') { n.times{ asdf.gsub(/\s\s+/, ' ') } }
  x.report('gsub4:') { n.times{ asdf.gsub(/\s{2,}/, ' ') } }
end

puts
puts "long strings"
n     = 1000
str_x = 1000
Benchmark.bm(8) do |x|
  x.report('squeeze:') { n.times{(asdf * str_x).squeeze(' ') }}
  x.report('gsub1:') { n.times{(asdf * str_x).gsub(/  +/, ' ') }}
  x.report('gsub2:') { n.times{(asdf * str_x).gsub(/ {2,}/, ' ') }}
  x.report('gsub3:') { n.times{(asdf * str_x).gsub(/\s\s+/, ' ') }}
  x.report('gsub4:') { n.times{(asdf * str_x).gsub(/\s{2,}/, ' ') }}
end
# >>               user     system      total        real
# >> squeeze:  1.050000   0.000000   1.050000 (  1.055833)
# >> gsub1:    3.700000   0.020000   3.720000 (  3.731957)
# >> gsub2:    3.960000   0.010000   3.970000 (  3.980328)
# >> gsub3:    4.520000   0.020000   4.540000 (  4.549919)
# >> gsub4:    4.840000   0.010000   4.850000 (  4.860474)
# >> 
# >> long strings
# >>               user     system      total        real
# >> squeeze:  0.310000   0.180000   0.490000 (  0.485224)
# >> gsub1:    3.420000   0.130000   3.550000 (  3.554505)
# >> gsub2:    3.850000   0.120000   3.970000 (  3.974213)
# >> gsub3:    4.880000   0.130000   5.010000 (  5.015750)
# >> gsub4:    5.310000   0.150000   5.460000 (  5.461797)

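In these runs, squeeze(' ') is about 3.5 to 4.6 times faster than the gsub() variants on the short string, and about 7 to 11 times faster on the long strings. Among the gsub() patterns, the ones that match a literal space are faster than the ones that use \s.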
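One caveat when choosing between the two: squeeze(' ') only collapses repeated space characters, whereas \s also matches tabs and newlines. A small sketch with a made-up input that mixes both:

```ruby
# Illustrative input: three spaces, two tabs, then two spaces.
mixed = "There   is\t\ta  lot"

# squeeze(' ') touches only runs of the space character; the tabs stay.
squeezed = mixed.squeeze(' ')

# \s{2,} matches any run of 2+ whitespace characters, tabs included.
collapsed = mixed.gsub(/\s{2,}/, ' ')

p squeezed   # => "There is\t\ta lot"
p collapsed  # => "There is a lot"
```

So the faster squeeze(' ') is a drop-in replacement only when the input is known to contain plain spaces.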

+2

Source: https://habr.com/ru/post/1753734/

