How to quickly compare variable length bit strings in C ++?

I am comparing objects based on the binary presence or absence of a feature set. These functions can be represented by a bit string, for example:

10011

This bit string has the first, fourth and fifth functions.

I am trying to calculate the similarity of a pair of bit strings as the number of functions that are shared. For a given set of bit strings, I know that all of them will have the same length, but at compile time I do not know what this length will be.

For example, these two lines have two things in common, so I would like the similarity function to return 2:

s(10011,10010) = 2

How to efficiently represent and compare bit strings in C ++?

+3
source share
4 answers

std::bitset STL.

AND 1:

#include <string>
#include <bitset>

int main()
{
  std::bitset<5> option1(std::string("10011")), option2(std::string("10010"));
  std::bitset<5> and_bit = option1 & option2; //bitset will have 1s only on common options
  size_t s = and_bit.count ();                //return the number of 1 in the bitfield
  return 0;
}

, boost::dynamic_bitset<>:

boost::dynamic_bitset<> option(bit_string);

, boost::dynamic_bitset<> std::bitset.

+10

:

int similarity(unsigned int a, unsigned int b)
{
   unsigned int r = a & b;
   r = ( r & 0x55555555 ) + ((r >> 1) & 0x55555555 );
   r = ( r & 0x33333333 ) + ((r >> 2) & 0x33333333 );
   r = ( r & 0x0f0f0f0f ) + ((r >> 4) & 0x0f0f0f0f );
   r = ( r & 0x00ff00ff ) + ((r >> 8) & 0x00ff00ff );
   r = ( r & 0x0000ffff ) + ((r >>16) & 0x0000ffff );
   return r;
}

int main() {
        unsigned int a = 19 ;//10011
        unsigned int b = 18 ;//10010
        cout << similarity(a,b) << endl; 
        return 0;
}

:

2

: http://www.ideone.com/bE4qb

+3

, boost::dynamic_bitset std::bitset.

operator& ( &=) boost::dynamic_bitset::count().

. , , , , , . @Nawaz - Bit Twiddling Hacks / sse/popcount/etc.

, , , llvm, gcc icc , / .

+2

Use std::bitsetif your function set is less than the number of bits in the long (I think it's long), you can get an unsigned long representation of the bits, then two values, and use the twidling bit tricks here to count.


If you want to continue using strings to represent your bit pattern, you can do something like the following using zip_iteratorfrom boost.

#include <iostream>
#include <string>
#include <algorithm>

#include <boost/tuple/tuple.hpp>
#include <boost/iterator/zip_iterator.hpp>

struct check_is_set :
  public std::unary_function<const boost::tuple<char const&, char const&>&, bool>
{
  bool operator()(const boost::tuple<char const&, char const&>& t) const
  {
    const char& cv1 = boost::get<0>(t);
    const char& cv2 = boost::get<1>(t);
    return cv1 == char('1') && cv1 == cv2;
  }
};

size_t count_same(std::string const& opt1, std::string const& opt2)
{
  std::string::const_iterator beg1 = opt1.begin();
  std::string::const_iterator beg2 = opt2.begin();

  // need the same number of items for end (this really is daft, you get a runtime
  // error if the sizes are different otherwise!! I think it a bug in the
  // zip_iterator implementation...)
  size_t end_s = std::min(opt1.size(), opt2.size());
  std::string::const_iterator end1 = opt1.begin() + end_s;
  std::string::const_iterator end2 = opt2.begin() + end_s;

  return std::count_if(
  boost::make_zip_iterator(
    boost::make_tuple(beg1, beg2)
    ),
  boost::make_zip_iterator(
    boost::make_tuple(end1, end2)
    ),
    check_is_set()
  );
}

int main(void)
{
  std::string opt1("1010111");
  std::string opt2("001101");

  std::cout << "same: " << count_same(opt1, opt2) << std::endl;

  return 0;
}
+1
source

Source: https://habr.com/ru/post/1787919/


All Articles