Select as many rows as possible to ensure element density for each column.

Suppose we have a 0-1 matrix, for example:

0, 1, 1, 1, 1 
1, 1, 1, 1, 1 
1, 0, 1, 0, 1 
1, 1, 1, 1, 1 
0, 1, 1, 0, 1 
1, 1, 0, 1, 0  

The goal is to select as many rows from this matrix as possible to form a new matrix, and to ensure that each column in the new matrix contains at least 80% of 1.

For example, in the above matrix, the result would be:

0, 1, 1, 1, 1 
1, 1, 1, 1, 1 
1, 0, 1, 0, 1 
1, 1, 1, 1, 1 
1, 1, 0, 1, 0  

The greedy algorithm does not work for this problem, obviously, and normal DP, as I see.

In the real task, the matrix will contain about 7000 rows and 100 columns. And since there would be several whole 1 lines, therefore at least one solution will always exist.

Can someone help me inspire? Thank you

+4
source share
1

:

(, "" - , , , , , 80% .

code.rb:

#!/usr/bin/env ruby
data = [
[0, 1, 1, 1, 1 ],
[1, 1, 1, 1, 1 ],
[1, 0, 1, 0, 1 ],
[1, 1, 1, 1, 1 ],
[0, 1, 1, 0, 1 ],
[1, 1, 0, 1, 0 ]
]

# array with blocks of different average densities
data = 990.times.collect do
         limit = rand(1000)
         100.times.collect do
          rand(1000) <= limit ? 1 : 0
        end
      end + 10.times.collect { 100.times.collect { 1 } }

#puts "data = #{data.inspect}"

def sum(list)
  list.inject(0){|res,v| res + v}
end

def column_percent(array)
  multiplier = 100.0 / array.count
  array.transpose.collect{|column| sum(column) * multiplier}
end

sorted_data = data.sort{|a,b| sum(b) <=> sum(a)}

#puts sorted_data.inspect
puts "Data percentages: #{column_percent(data).inspect}"
puts "Average over data: #{column_percent(data).min.inspect}"


solution = [ ]
consider = sorted_data
discarded = [ ]
loops = 0
done_something = true
achieved = [ ]
while (done_something)
  loops += 1
  done_something = false
  while (!consider.empty?)
    row = consider.shift
    #puts "Considerring: #{row.inspect}"
    if column_percent(solution + [ row ]).min >= 80.0
      done_something = true
      solution.push row
    else
      discarded.push row
    end
  end
  achieved << solution.count
  consider = discarded
  discarded = [ ]
end

puts "solution: #{solution.inspect}" if solution.count < 10
puts "solution.percents = #{column_percent(solution).inspect}"
puts "min solution.percents = #{column_percent(solution).min.inspect}"
puts "solution has #{solution.count.inspect} rows"
puts "consider has #{consider.count.inspect} rows"
puts "went through #{loops} loops, achievment at end of each loop: #{achieved.inspect} rows"

exit

END

:

Data percentages: [66.66666666666667, 83.33333333333334, 83.33333333333334, 66.66666666666667, 83.33333333333334]
Average over data: 66.66666666666667
solution: [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
solution.percents = [100.0, 100.0, 100.0, 100.0, 100.0]
min solution.percents = 100.0
solution has 2 rows
consider has 4 rows
went through 2 loops, achievment at end of each loop: [2, 2] rows


$ time ruby code.rb # 1000 rows

Data percentages: [50.900000000000006, 48.900000000000006, 50.2, 49.7, 47.5, 50.800000000000004, 50.2, 48.900000000000006, 51.300000000000004, 49.1, 49.900000000000006, 48.7, 49.7, 48.900000000000006, 50.300000000000004, 52.400000000000006, 51.0, 49.900000000000006, 50.800000000000004, 49.6, 49.0, 50.1, 49.1, 48.7, 50.800000000000004, 49.0, 49.2, 49.900000000000006, 48.800000000000004, 50.1, 50.2, 49.6, 49.900000000000006, 50.2, 50.900000000000006, 49.2, 51.7, 49.300000000000004, 48.400000000000006, 49.400000000000006, 49.5, 49.6, 47.7, 50.0, 46.900000000000006, 51.0, 50.0, 51.5, 50.5, 49.300000000000004, 49.1, 50.400000000000006, 47.800000000000004, 51.800000000000004, 50.2, 49.400000000000006, 49.400000000000006, 49.0, 51.5, 48.0, 53.7, 49.1, 51.300000000000004, 50.400000000000006, 50.800000000000004, 48.900000000000006, 50.6, 47.0, 50.300000000000004, 49.400000000000006, 50.800000000000004, 51.300000000000004, 52.900000000000006, 50.0, 51.300000000000004, 47.800000000000004, 51.300000000000004, 47.6, 49.900000000000006, 54.5, 49.5, 51.800000000000004, 50.800000000000004, 50.400000000000006, 51.0, 50.1, 47.7, 49.6, 53.300000000000004, 50.2, 49.7, 51.5, 47.900000000000006, 49.7, 48.0, 48.6, 49.6, 48.900000000000006, 50.1, 50.7]
Average over data: 46.900000000000006
solution.percents = [84.02366863905326, 83.13609467455622, 85.50295857988166, 81.65680473372781, 80.17751479289942, 82.54437869822486, 83.13609467455622, 81.95266272189349, 82.54437869822486, 80.4733727810651, 87.27810650887574, 82.84023668639054, 83.4319526627219, 82.54437869822486, 80.76923076923077, 84.31952662721893, 81.36094674556213, 85.79881656804734, 82.24852071005917, 83.72781065088758, 81.65680473372781, 82.24852071005917, 80.76923076923077, 82.54437869822486, 85.20710059171599, 83.72781065088758, 80.17751479289942, 83.72781065088758, 82.84023668639054, 81.95266272189349, 84.61538461538461, 80.17751479289942, 81.95266272189349, 81.36094674556213, 84.02366863905326, 84.61538461538461, 84.02366863905326, 83.72781065088758, 82.24852071005917, 84.31952662721893, 84.02366863905326, 84.02366863905326, 80.17751479289942, 82.84023668639054, 80.4733727810651, 82.84023668639054, 83.13609467455622, 82.84023668639054, 80.17751479289942, 80.17751479289942, 82.84023668639054, 83.72781065088758, 80.17751479289942, 81.95266272189349, 81.95266272189349, 82.84023668639054, 80.76923076923077, 81.95266272189349, 81.95266272189349, 82.84023668639054, 85.20710059171599, 83.4319526627219, 83.72781065088758, 80.17751479289942, 84.31952662721893, 82.54437869822486, 86.09467455621302, 81.95266272189349, 82.54437869822486, 81.95266272189349, 81.95266272189349, 83.72781065088758, 83.4319526627219, 84.61538461538461, 86.68639053254438, 81.06508875739645, 83.4319526627219, 80.76923076923077, 80.76923076923077, 85.79881656804734, 82.84023668639054, 85.79881656804734, 84.31952662721893, 82.24852071005917, 84.02366863905326, 80.76923076923077, 80.17751479289942, 84.9112426035503, 83.72781065088758, 84.61538461538461, 83.13609467455622, 84.61538461538461, 84.61538461538461, 82.54437869822486, 80.76923076923077, 82.84023668639054, 80.4733727810651, 80.17751479289942, 82.84023668639054, 80.17751479289942]
min solution.percents = 80.17751479289942
solution has 338 rows
consider has 662 rows
went through 3 loops, achievment at end of each loop: [337, 338, 338] rows

real    0m7.588s
user    0m7.435s
sys 0m0.142s

7000 :

: 2742 : [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1] : 2743 : 2743 solution.percents = [81.18847976667881, 80.42289464090412, 81.00619759387531, 80.93328472475392, 80.67808968282901, 80.38643820634341, 81.22493620123952, 81.55304411228582, 81.22493620123952, 80.20415603353992, 80.45935107546481, 81.22493620123952, 80.56872037914691, 80.02187386073642, 81.15202333211812, 82.39154210718192, 80.02187386073642, 80.24061246810062, 81.26139263580022, 80.38643820634341, 80.02187386073642, 80.27706890266131, 80.16769959897921, 81,00619759387531, +80,49580751002551, +81,37076193948232, 81,69886985052861, +80,24061246810062, 81,00619759387531, +80,34998177178271, +80,20415603353992, +81,69886985052861, +81,51658767772511, 80,64163324826832, +80,02187386073642, +80,02187386073642, +80,02187386073642, 80,02187386073642, +80,34998177178271, +80,27706890266131, +80,02187386073642, 80,16769959897921, +80,82391542107182, +81,29784907036091, 81,77178271965002, +80,75100255195042, +81,84469558877142, +80,53226394458622, 80.02187386073642, 80.86037185563251, 80.09478672985782, 81.1884797 6667881, +81,15202333211812, 80,31352533722202, 82,28217280349982, 82,02697776157491, +81,48013124316441, +80,64163324826832, +80,89682829019321, +81,11556689755741, 81,26139263580022, +80,64163324826832, +80,64163324826832, +80,45935107546481, 80,86037185563251, +80,31352533722202, 80,05833029529711, +81,40721837404301, 81,00619759387531, +81,77178271965002, +80,96974115931461, +81,22493620123952, 81,37076193948232, +80,49580751002551, +80,05833029529711, +80,89682829019321, 81,44367480860372, 80,02187386073642, +80,02187386073642, 81,55304411228582, 80,67808968282901, 80,49580751002551, 81,26139263580022, 80,02187386073642, 80,27706890266131, 80,42289464090412, 80,45935107546481, 81,55304411228582, 81,77178271965002, 80,45935107546481, 81,73532628508931, 80,75100255195042, +83,04775792927451, 80,45935107546481, 80,02187386073642, 80,02187386073642, 81,04265402843602, +81,51658767772511, +80,89682829019321, 81,58950054684651] . : 80.02187386073642 2743 4267 , : [2742, 2743, 2743]

real 5m57.637s 5m57.446s sys 0m0.335s

+1

Source: https://habr.com/ru/post/1531255/


All Articles