C# BitArray: index of positive bits

I have a C# BitArray which is quite large (500,000 bits), and I'm trying to get the index of every positive (set) bit in the array. I'm currently doing this:

    public int[] GetIndexesForPositives()
    {
        var idIndexes = new int[GetPositiveCount + 1];
        var idx = 0;
        for (var i = 0; i < Length; i++)
        {
            if (Get(i))
            {
                idIndexes[idx++] = i;
            }
        }
        return idIndexes;
    }

I create an empty array sized from the known number of positive bits, then I loop over the BitArray and add each index value to the returned array.

This means I have to do 500,000 iterations over the array, which is not very fast (it takes about 15 ms).

I know that BitArray uses an integer array under the covers (I made use of that when writing the GetPositiveCount function), so I wonder if there is a faster bit-level algorithm for this?

+6
4 answers

If you can get the int array underlying BitArray, this should provide much better performance:

Assuming you don't know the number of bits that are set:

    public static int[] GetIndexesForPositives()
    {
        var idIndexes = new List<int>();
        // "data" is the BitArray; "m_array" is its private int[] backing store.
        System.Reflection.FieldInfo field = data.GetType().GetField("m_array",
            System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
        int[] values = field.GetValue(data) as int[];
        for (var i = 0; i < values.Length; i++)
        {
            int _i = values[i];
            if (_i != 0)
            {
                for (var j = 0; j < 32; j++)
                {
                    if ((_i & (1 << j)) != 0)
                    {
                        idIndexes.Add(i * 32 + j);
                    }
                }
            }
        }
        return idIndexes.ToArray();
    }

If you know the number of bits that are set, you can do this instead:

    public static int[] GetIndexesForPositives(int length)
    {
        var idIndexes = new int[length];
        var idx = 0;
        // "data" is the BitArray; "m_array" is its private int[] backing store.
        System.Reflection.FieldInfo field = data.GetType().GetField("m_array",
            System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
        int[] values = field.GetValue(data) as int[];
        for (var i = 0; i < values.Length; i++)
        {
            int _i = values[i];
            if (_i != 0)
            {
                for (var j = 0; j < 32; j++)
                {
                    if ((_i & (1 << j)) != 0)
                    {
                        idIndexes[idx++] = i * 32 + j;
                    }
                }
            }
        }
        return idIndexes;
    }

In my tests, both of these are faster than your method, even the one that does not know in advance how large the returned array will be.

My results were tested using a random BitArray of 50 million entries:

    1) 25001063 records found in 50000000, took 1415.5752ms
    2) 25001063 records found in 50000000, took 1099.67ms
    3) 25001063 records found in 50000000, took 1045.6862ms
    4) 25001063 records found in 50000000, took 745.7762ms

1) is your code, but using a list instead of `GetPositiveCount` to get the output length.
2) is your code.
3) is my (revised) first example.
4) is my (revised) second example.

Edit: it's also worth noting that this is a problem that can really benefit from multithreading. Divide the BitArray into 4 parts and you have 4 threads that can each scan their part of the data at the same time; a sketch of this is below.
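A minimal sketch of that idea, assuming the underlying int[] has already been extracted (via reflection as above, or via BitArray.CopyTo) and passed in as values; the partition count of 4 and the class name are illustrative:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    public static class ParallelScan
    {
        // Each partition collects into its own list, so no locking is needed
        // until the final merge; partitions are concatenated in index order,
        // which keeps the result sorted.
        public static int[] GetIndexesForPositives(int[] values)
        {
            const int partitions = 4;
            var results = new List<int>[partitions];
            int chunk = (values.Length + partitions - 1) / partitions;

            Parallel.For(0, partitions, p =>
            {
                var local = new List<int>();
                int start = p * chunk;
                int end = Math.Min(start + chunk, values.Length);
                for (int i = start; i < end; i++)
                {
                    int word = values[i];
                    if (word == 0) continue; // skip 32 zero bits at once
                    for (int j = 0; j < 32; j++)
                        if ((word & (1 << j)) != 0)
                            local.Add(i * 32 + j);
                }
                results[p] = local;
            });

            return results.SelectMany(r => r).ToArray();
        }
    }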

Edit: I know this has already been accepted, but here is one more thing you can do to improve performance if you know that most of the time your list will be very sparse:

    for (var j = 0; j < 32; j++)
    {
        if (_i == 0) break;
        if ((_i & 1) != 0)
        {
            idIndexes.Add(i * 32 + j);
        }
        _i = _i >> 1;
    }

It's a little slower if 40% or more of the bits are set, but if you know the list will always be around 10% ones and 90% zeros, then this will work even faster for you.

+5

If you can drop the BCL's BitArray in favor of rolling your own, you can do better than that. Here are a few things you can do:

  • Skip 64-bit chunks that have no bits set.
  • For 64-bit chunks that do have set bits, enumerate only the 1 bits rather than all bits, using x & (x - 1) to clear the lowest set bit and your favorite fast integer log2 (bit scan) to find its index; the naive 64-step method will not give any speedup. (A sketch follows at the end of this answer.)
  • Store an additional bit array that records, for each 64-bit chunk, whether it is nonzero. Apply the method from bullet 2 to that bit array to skip entire ranges of zeros at a time.
  • Apply bullet 3 recursively for giant bit arrays.

All four of these help only if the bit array is expected to be sparse; O(n) remains the worst case if it is not. If bullet 3 is applied repeatedly until the top level fits in a single ulong, it can also determine in O(1) whether the entire bit array is empty or not.
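Here is a minimal sketch of bullets 1 and 2, assuming a hand-rolled bit set stored as a ulong[] of 64-bit chunks. It uses BitOperations.TrailingZeroCount (available since .NET Core 3.0) as the "fast 2log"; on older runtimes a De Bruijn lookup table would serve the same purpose:

    using System.Collections.Generic;
    using System.Numerics;

    public static class SparseBitScan
    {
        public static IEnumerable<int> SetBitIndexes(ulong[] chunks)
        {
            for (int i = 0; i < chunks.Length; i++)
            {
                ulong x = chunks[i];
                // Bullet 1: a chunk that is all zero costs one comparison.
                while (x != 0)
                {
                    // Index of the lowest set bit (log2 of x & -x).
                    int j = BitOperations.TrailingZeroCount(x);
                    yield return i * 64 + j;
                    // Bullet 2: clear the lowest set bit, so this loop runs
                    // once per 1 bit instead of 64 times per chunk.
                    x &= x - 1;
                }
            }
        }
    }

The inner loop's cost is proportional to the number of set bits in the chunk rather than to 64, which is exactly why this only pays off when the data is sparse.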

+6

I would do something like this:

    public int[] GetIndexesForPositives()
    {
        var idIndexes = new LinkedList<int>();
        for (var i = 0; i < Length; i++)
        {
            if (Get(i))
            {
                idIndexes.AddLast(i);
            }
        }
        return idIndexes.ToArray(); // ToArray() comes from System.Linq
    }

If that is still not acceptable (because you iterate over the indexes again when executing ToArray), just make your result array the same size as the source and return the count of indexes found:

    public int GetIndexesForPositives(out int[] indizes)
    {
        indizes = new int[Length];
        var idI = 0;
        for (var i = 0; i < Length; i++)
        {
            if (Get(i))
            {
                indizes[idI++] = i;
            }
        }
        return idI;
    }

Depending on whether you really need all the indexes or just some of them, you might even consider something like this (but it will be less efficient if you do need every one of them; please do some profiling):

    public IEnumerable<int> GetIndexesForPositives()
    {
        for (var i = 0; i < Length; i++)
        {
            if (Get(i))
            {
                yield return i;
            }
        }
    }

This assumes that your Get(i) does the actual work and that your array is immutable while you enumerate.
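As a usage sketch (the call site and variable name here are hypothetical), the lazy version pays off when you only need some of the indexes, because the scan stops as soon as you stop enumerating:

    // Requires System.Linq; only scans until ten set bits have been yielded.
    var firstTenSetBits = GetIndexesForPositives().Take(10).ToArray();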

+2

You are pretty much stuck with iterating 500,000 times. Intuitively: if every bit is set, you need to produce an array with 500,000 elements anyway. You can reduce the number of outer iterations by a factor of 8 or 32 by accessing the underlying byte or int, but you still have to iterate over the bits within each one. Even a 256-entry lookup table covering every possible byte value does not remove that inner work, since you still need to add the bit offset to each index.

However, if you have a lot of zero bits (hopefully you do), a simple optimization is to advance the loop counter by 8 or 32 whenever the underlying byte/int is 0, since you know Get() would return false for all of those positions. Also, use a List<int> to avoid the GetPositiveCount overhead.
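A sketch of that combination, assuming bits is the BitArray in question; BitArray.CopyTo into an int[] exposes the packed 32-bit words without reflection, and the class name here is illustrative:

    using System.Collections;
    using System.Collections.Generic;

    public static class BitArraySkip
    {
        public static List<int> GetIndexesForPositives(BitArray bits)
        {
            var words = new int[(bits.Length + 31) / 32];
            bits.CopyTo(words, 0); // copy out the underlying 32-bit words

            var idIndexes = new List<int>();
            for (int i = 0; i < words.Length; i++)
            {
                int word = words[i];
                if (word == 0) continue; // 32 zero bits skipped in one step
                for (int j = 0; j < 32; j++)
                {
                    int index = i * 32 + j;
                    if (index >= bits.Length) break; // guard the final partial word
                    if ((word & (1 << j)) != 0)
                    {
                        idIndexes.Add(index);
                    }
                }
            }
            return idIndexes;
        }
    }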

Completely eliminating the need for this array and lazily getting the next bit set to 1 would certainly be the best approach.

+1

Source: https://habr.com/ru/post/897306/

