Currently, we have a scenario in which one table has several (10 to 15) Boolean flags (non-nullable bit fields). Unfortunately, it is not really possible to simplify this at the logical level, because any combination of the Boolean values is permissible.
The table in question is a transactional table that may contain tens of millions of rows, and both insert and select performance are fairly critical. Although we are not completely sure about the distribution of the data at this time, the combination of all the flags should provide relatively good cardinality, that is, make it at least theoretically worthwhile for SQL Server to use an index.
Typical select queries may pick records based on only 3 or 4 of the flags, for example WHERE FLAG3=1 AND FLAG7=0 AND FLAG9=1. It would be impractical to create separate indexes for every flag combination used by these select queries, since there will be many of them.
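To make the scenario concrete, here is a minimal sketch of such a table and a typical query. The table name, the key column, and the choice of exactly ten flags are assumptions for illustration only; the real table has other columns as well.

```sql
-- Hypothetical schema: dbo.Transactions and TransactionId are illustrative names.
CREATE TABLE dbo.Transactions (
    TransactionId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FLAG1  BIT NOT NULL,
    FLAG2  BIT NOT NULL,
    FLAG3  BIT NOT NULL,
    FLAG4  BIT NOT NULL,
    FLAG5  BIT NOT NULL,
    FLAG6  BIT NOT NULL,
    FLAG7  BIT NOT NULL,
    FLAG8  BIT NOT NULL,
    FLAG9  BIT NOT NULL,
    FLAG10 BIT NOT NULL
);

-- A typical query filters on only a few of the flags.
SELECT TransactionId
FROM dbo.Transactions
WHERE FLAG3 = 1 AND FLAG7 = 0 AND FLAG9 = 1;
```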
Given this situation, what would be the recommended approach for effectively indexing these fields? The table is new, so there is no data to worry about yet, and we have sufficient flexibility in the actual implementation of the table.
We are currently considering two main options:
- Create a single index that includes all the bit fields (it would probably also include the 1 or 2 other int fields that are always used). My concern is that, since typical queries use only a few of the included fields, the optimizer would skip the index and resort to table scans. Let's call this Option A. (After reading some of the answers, it seems this approach will not work well, since the order of the fields in the index matters, which makes it impossible to index ALL the fields effectively.)
- Effectively do what I believe SQL Server does internally and encode the bit fields into a single field using bitwise operators (AND-ing and OR-ing numbers together: 1, 2, 4, 8, etc.). My concern is that we would need to perform some kind of calculation on this encoded field in the query, which would again cause the index to be skipped. The maintenance and complexity of this solution are also a concern. Let's call this Option B. Additional information: the argument for this approach is that we could have a relatively simple and short index that includes one or two other fields from the table plus this encoded field. The other fields would narrow down the number of records that need to be evaluated, and since the encoded field contains all of our bit fields, SQL Server could perform the calculation using data read directly from the index (i.e. an index scan) rather than from the table (i.e. a table scan).
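For reference, a rough sketch of what Option B might look like. All names here (dbo.TransactionsEncoded, AccountId, Flags, the index name, the literal 42) are hypothetical, and the bit assignment FLAGn = 2^(n-1) is just one possible convention.

```sql
-- Hypothetical Option B schema: every name below is an assumption.
CREATE TABLE dbo.TransactionsEncoded (
    TransactionId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AccountId     INT NOT NULL,  -- stands in for the "other int fields" always used
    Flags         INT NOT NULL   -- FLAG1 = 1, FLAG2 = 2, FLAG3 = 4, FLAG4 = 8, and so on
);

-- Short index: the narrowing column first, then the encoded flags.
CREATE NONCLUSTERED INDEX IX_TransactionsEncoded_AccountId_Flags
    ON dbo.TransactionsEncoded (AccountId, Flags);

-- Equivalent of: WHERE FLAG3 = 1 AND FLAG7 = 0 AND FLAG9 = 1
--   mask  = bits for FLAG3, FLAG7, FLAG9 = 4 + 64 + 256 = 324
--   value = bits that must be set (FLAG3, FLAG9) = 4 + 256 = 260
SELECT TransactionId
FROM dbo.TransactionsEncoded
WHERE AccountId = 42
  AND (Flags & 324) = 260;
```

Because Flags is wrapped in an expression, that predicate cannot be used as a seek key; the hope with this layout is that the seek happens on AccountId and the flag test is then evaluated against the narrow index rows rather than the base table.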
At the moment, we are strongly inclined towards Option B. For completeness, this will be running on SQL Server 2008.
Any advice is appreciated.
Edit: spelling, clarity, sample query, additional information on Option B.