Multiple Bit Field Index in SQL Server

Currently, we have a scenario in which one table has several (10 to 15) Boolean flags (non-nullable bit fields). Unfortunately, it is not really possible to simplify this at the logical level, because any combination of the Boolean values is permissible.

The table in question is a transactional table that can contain tens of millions of rows, and both insert and select performance are quite critical. Although we are not completely sure about the distribution of the data at this time, the combination of all the flags should provide relatively good cardinality, that is, make it a “worthwhile” index for SQL Server to use.

A typical select scenario might filter records on only 3 or 4 of the flags, for example WHERE FLAG3=1 AND FLAG7=0 AND FLAG9=1. It would be impractical to create separate indexes for all the flag combinations used by these select queries, since there will be many of them.

Given this situation, what would be the recommended approach for effectively indexing these fields? The table is new, so there is no data to worry about yet, and we have sufficient flexibility in the actual implementation of the table.

We are currently considering two main options:

  • Create a single index that includes all the bit fields (this will probably also include 1 or 2 other int fields that are always used). My concern is that, since a typical query uses only a few of the included fields, SQL Server will skip the index and resort to table scans. Let me call this Option A. (After reading some of the answers, it seems that this approach will not work well: the order of the fields in the index matters, which makes it impossible to index ALL the fields efficiently.)
  • Effectively do what I believe SQL Server does internally and encode the bit fields into a single field using binary operators (AND-ing and OR-ing numbers together: 1, 2, 4, 8, etc.). My concern is that the query will need to do some kind of calculation on this encoded field, which will again cause the index to be skipped. The maintenance and complexity of this solution are also a concern. Let me call this Option B. Additional information: the argument for this approach is that we could have a relatively simple and short index that includes one or two other fields from the table plus this field. The other fields would narrow down the number of records that need to be evaluated, and since the encoded field contains all of our bit fields, SQL Server would be able to perform the calculation using data obtained directly from the index (i.e., an index scan) as opposed to the table (i.e., a table scan).
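As a rough sketch of Option B, assume a hypothetical table dbo.Txn with two always-filtered int columns (AccountId, TypeId) and the flags packed so that FLAG1 = 1, FLAG2 = 2, FLAG3 = 4, and so on. All table, column, and index names here are made up for illustration. The bitwise predicate is not sargable, so it cannot drive an index seek by itself; the idea is that the seek happens on the leading int columns and the flag test is then evaluated against the narrow index rows rather than the base table.

```sql
-- Hypothetical table; names and column choices are assumptions for illustration.
CREATE TABLE dbo.Txn
(
    TxnId     bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AccountId int NOT NULL,   -- one of the "always used" int fields
    TypeId    int NOT NULL,   -- another "always used" int field
    Flags     int NOT NULL    -- FLAG1 = 1, FLAG2 = 2, FLAG3 = 4, ... FLAG15 = 16384
);

-- Short index: seek on the int columns, flag test evaluated on the index rows.
CREATE NONCLUSTERED INDEX IX_Txn_Account_Type_Flags
    ON dbo.Txn (AccountId, TypeId, Flags);

-- Equivalent of: WHERE FLAG3 = 1 AND FLAG7 = 0 AND FLAG9 = 1
--   bits of interest: FLAG3 (4) + FLAG7 (64) + FLAG9 (256) = 324
--   bits required on: FLAG3 (4) + FLAG9 (256)              = 260
SELECT TxnId
FROM dbo.Txn
WHERE AccountId = 42
  AND TypeId = 7
  AND (Flags & 324) = 260;
```

Whether the optimizer actually picks this index is something to confirm in the execution plan against realistic data volumes.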

At the moment, we are strongly leaning towards Option B. For completeness, this will run on SQL Server 2008.

Any advice is appreciated.

Edit: spelling, clarity, sample query, additional information on Option B.

+6
3 answers

Although there are probably ways to solve your indexing problem against the existing table schema, I would treat this as a normalization problem:

For example, I would highly recommend creating a couple of new tables:

  • A lookup table for the names of these bit flags, e.g. CREATE TABLE Flags (id int IDENTITY(1,1), Name varchar(256)) (you do not need to make id an IDENTITY column if you want to manage the ids manually, for example 2, 4, 8, 16, 32, 64, 128 as binary flags).
  • A new link table containing the id of the original data table and the id of the new lookup table, e.g. CREATE TABLE DataFlags_Link (id int IDENTITY(1,1), MyFlagId int, DataId int)

Then you can create an index on the DataFlags_Link table and write queries such as:

 SELECT Data.* FROM Data INNER JOIN DataFlags_Link ON Data.id = DataFlags_Link.DataId WHERE DataFlags_Link.MyFlagId IN (4,7,2,8) 
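Note that the IN query above matches data rows that have any of the listed flags, and returns a data row once per matching flag. A possible sketch for the question's "FLAG3 and FLAG9 set, FLAG7 not set" case follows. It assumes that a row in DataFlags_Link means "this flag is set", and that 3, 7 and 9 are hypothetical ids from the Flags table; the index name is also made up.

```sql
-- Assumed supporting index; the name is hypothetical.
CREATE NONCLUSTERED INDEX IX_DataFlags_Link_Flag_Data
    ON DataFlags_Link (MyFlagId, DataId);

SELECT d.*
FROM Data AS d
WHERE d.id IN
      (
          -- Data rows that have BOTH required flags (3 and 9)
          SELECT l.DataId
          FROM DataFlags_Link AS l
          WHERE l.MyFlagId IN (3, 9)
          GROUP BY l.DataId
          HAVING COUNT(DISTINCT l.MyFlagId) = 2
      )
  AND NOT EXISTS
      (
          -- ...and that do NOT have the excluded flag (7)
          SELECT 1
          FROM DataFlags_Link AS x
          WHERE x.DataId = d.id
            AND x.MyFlagId = 7
      );
```

The HAVING count has to equal the number of required flags, so it changes with each flag combination being queried.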

As for performance, that is where good DBA maintenance is required. You need to set the fill factor and index padding on your tables appropriately, and either run regular index defragmentation or rebuild your indexes on a schedule.

Performance and maintenance go hand in hand with databases. You cannot have one without the other.

+3

A single BIT column is usually not selective enough to even be considered for use in an index. An index on a single BIT column therefore does not really make sense: on average, you would still have to look through about half the records in the table (50% selectivity), so the SQL Server query optimizer will use a table scan instead.

If you create one index on all 15 BIT columns, then you do not have this problem: since you have 15 yes/no options, your index will become quite selective.

The problem is that the order of the BIT columns in the index matters. Your index will only ever be considered if your SQL statement uses at least the leftmost 1 to n BIT columns.

So if your index is on

 Col1,Col2,Col3,....,Col14,Col15 

then it can be used for a query that uses

  • Col1
  • Col1 and Col2
  • Col1 and Col2 and Col3 ....

etc. But it cannot be used to seek for a query that specifies only Col6, Col9 and Col14.
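The leftmost-prefix behaviour described above can be sketched as follows. The table dbo.T, its Id key, and the index name are hypothetical; only the Col1 to Col15 naming comes from the answer.

```sql
-- Hypothetical table with 15 BIT columns; Id is assumed to be the clustered key.
CREATE TABLE dbo.T
(
    Id int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Col1  bit NOT NULL, Col2  bit NOT NULL, Col3  bit NOT NULL,
    Col4  bit NOT NULL, Col5  bit NOT NULL, Col6  bit NOT NULL,
    Col7  bit NOT NULL, Col8  bit NOT NULL, Col9  bit NOT NULL,
    Col10 bit NOT NULL, Col11 bit NOT NULL, Col12 bit NOT NULL,
    Col13 bit NOT NULL, Col14 bit NOT NULL, Col15 bit NOT NULL
);

CREATE NONCLUSTERED INDEX IX_T_AllFlags
    ON dbo.T (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8,
              Col9, Col10, Col11, Col12, Col13, Col14, Col15);

-- Can seek: the predicate covers the leading columns Col1 and Col2.
SELECT Id FROM dbo.T WHERE Col1 = 1 AND Col2 = 0;

-- Cannot seek: Col1 is not in the predicate, so at best the whole index is scanned.
SELECT Id FROM dbo.T WHERE Col6 = 1 AND Col9 = 0 AND Col14 = 1;
```

Comparing the two execution plans on a populated table is a simple way to see the difference for yourself.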

Because of this, I really don't think an index on your collection of BIT columns makes much sense.

Are these 15 BIT columns the only columns you use for querying? If not, I would try to combine the BIT columns that you use most for selection with other columns, e.g. have an index on Name and Col7 or something (then your BIT columns can add some extra selectivity to another index).
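A minimal sketch of that suggestion, assuming a hypothetical table dbo.Person in which Name is a reasonably selective column that queries already filter on (the table, its columns other than Name and Col7, and the index name are all made up):

```sql
-- Hypothetical table: one selective column plus a BIT flag.
CREATE TABLE dbo.Person
(
    Id   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    Col7 bit NOT NULL
);

-- Selective column first, BIT column second to narrow the range further.
CREATE NONCLUSTERED INDEX IX_Person_Name_Col7
    ON dbo.Person (Name, Col7);

-- Both predicates can be resolved by a seek on the index above.
SELECT Id, Name
FROM dbo.Person
WHERE Name = 'Smith'
  AND Col7 = 1;
```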

+6

While I think Neil Fenwick’s answer is probably right, I think the real answer is to try different options and see which one is fast enough.

Option 1 is probably the simplest solution, so it is probably the most maintainable, and it may well be fast enough.

I would build a prototype database using the "option 1" schema and use something like http://www.red-gate.com/products/sql-development/sql-data-generator/ or http://sourceforge.net/projects/dbmonster/ to create twice as much data as you expect, and then run the queries you need. Decide on an acceptable response time, and only consider a “faster” schema if you exceed that response time (and you cannot throw hardware at the problem).
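If a data generator tool is not at hand, a rough T-SQL sketch like the following can fill a prototype table. The table dbo.ProtoTxn and its three flags are hypothetical, and the flags are generated as uniformly random 50/50 values, which is an assumption that may not match the real data distribution.

```sql
-- Hypothetical prototype table (only 3 of the flags shown).
CREATE TABLE dbo.ProtoTxn
(
    Id    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Flag1 bit NOT NULL,
    Flag2 bit NOT NULL,
    Flag3 bit NOT NULL
);

-- Insert 1,000,000 rows of random flags. NEWID() is evaluated per row,
-- so CHECKSUM(NEWID()) & 1 yields a pseudo-random 0 or 1 for each value.
-- The cross join of a system view is only used as a convenient row source.
INSERT INTO dbo.ProtoTxn (Flag1, Flag2, Flag3)
SELECT TOP (1000000)
       CAST(CHECKSUM(NEWID()) & 1 AS bit),
       CAST(CHECKSUM(NEWID()) & 1 AS bit),
       CAST(CHECKSUM(NEWID()) & 1 AS bit)
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Report I/O and timing for the test queries that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*)
FROM dbo.ProtoTxn
WHERE Flag1 = 1
  AND Flag3 = 0;
```

Running the same test queries before and after adding each candidate index gives comparable numbers for the options being weighed.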

Neil's solution is probably as clear and maintainable as “option 1”, and should be easy to index. However, I would still test it by creating a prototype schema and generating a lot of test data ...

+1

Source: https://habr.com/ru/post/895394/

