Database Optimization: Hashing All Values

Typically, a table is laid out as follows so that an object can be of several types.

Object Name | Type | Additional Information

In a banking database, for example, the object name can be an account number, and the type can be savings, current, and so on.

Basically, the type is some kind of string. There may also be additional information that depends on the type of the object.

Typical queries look like this: "Find the account numbers of a given type" or "Find the type X account numbers with a balance of over 1 million".

To answer these queries, the query optimizer uses an index if one exists on the column in question. Otherwise, it performs a full scan of all rows.
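The difference between an index lookup and a full scan can be observed directly. The following is a minimal sketch using SQLite through Python's `sqlite3` module; the table name `accounts`, the index name `idx_accounts_type`, and the sample rows are assumptions made for illustration and are not from the original question. Other database engines report their plans differently.

```python
import sqlite3

# In-memory database with a hypothetical accounts table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, type TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("ACC-001", "savings", 2_500_000),
        ("ACC-002", "current", 40_000),
        ("ACC-003", "savings", 900_000),
        ("ACC-004", "fixed", 1_200_000),
    ],
)

query = "SELECT name FROM accounts WHERE type = 'savings' AND balance > 1000000"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(row[-1] for row in rows)


# Without an index on type, SQLite has to scan the whole table.
plan_without_index = plan(query)

# With an index on type, SQLite can search the index instead.
conn.execute("CREATE INDEX idx_accounts_type ON accounts (type)")
plan_with_index = plan(query)

matches = [row[0] for row in conn.execute(query)]

print(plan_without_index)
print(plan_with_index)
print(matches)
```

The exact wording of the plan text varies between SQLite versions, so the sketch prints it rather than relying on a fixed string.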

I am thinking about the following optimization. Why don't we store an order-preserving hash or integer value for each column in the actual table, so that values are easy to compare?

This has the following advantages:

1. The table will be much smaller, because we store only a small value for each column.
2. We can build a clustered B+ tree index on the hash values of each column to retrieve the rows that match, are greater than, or are less than some value.
3. The original values can be retrieved easily by keeping the B+ tree index in main memory and looking them up there.
4. Rarely used values will never need to be retrieved.
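One way to read "a hash that preserves ordering" is as dictionary encoding: each distinct value is replaced by its rank in sorted order. The sketch below assumes that interpretation and a fixed set of values; the function names `build_codes` and `encode_column` are hypothetical and chosen only for this illustration.

```python
def build_codes(values):
    """Map each distinct value to its rank in sorted order.

    Because ranks follow the sort order, a < b implies code(a) < code(b).
    """
    return {value: rank for rank, value in enumerate(sorted(set(values)))}


def encode_column(values, codes):
    """Replace every value in a column with its small integer code."""
    return [codes[value] for value in values]


# Hypothetical "type" column from a banking table.
account_types = ["savings", "current", "fixed", "savings", "current"]

type_codes = build_codes(account_types)
encoded = encode_column(account_types, type_codes)

# Comparisons can now be done on the integer codes instead of the strings.
rows_at_or_after_fixed = [
    i for i, code in enumerate(encoded) if code >= type_codes["fixed"]
]

print(type_codes)
print(encoded)
print(rows_at_or_after_fixed)
```

Note that this only stays order-preserving while the set of values is fixed. A new value that sorts between two existing ones would force the codes to be reassigned, which is an assumption this sketch does not handle.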

I have more optimizations in mind. I will post them based on the feedback on this one.


Update:


Row1 - OrderedHash(Column1), OrderedHash(Column2), OrderedHash(Column3)
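The row layout above can be sketched as follows, assuming that `OrderedHash` means a per-column rank code (the original post does not define it). The sample rows, the helper `build_codes`, and the variable names are all hypothetical. The example answers the query "type X accounts with a balance of over 1 million" using only the integer codes.

```python
from bisect import bisect_right

# Hypothetical rows: (account number, type, balance).
rows = [
    ("ACC-001", "savings", 2_500_000),
    ("ACC-002", "current", 40_000),
    ("ACC-003", "savings", 900_000),
    ("ACC-004", "fixed", 1_200_000),
]


def build_codes(column):
    """Assumed stand-in for OrderedHash: rank of each distinct value."""
    return {value: rank for rank, value in enumerate(sorted(set(column)))}


# One dictionary per column, then every row becomes a tuple of codes.
columns = list(zip(*rows))
dictionaries = [build_codes(column) for column in columns]
encoded_rows = [
    tuple(codes[value] for codes, value in zip(dictionaries, row))
    for row in rows
]

# Translate the query constants into codes.
name_codes, type_codes, balance_codes = dictionaries
savings_code = type_codes["savings"]
# 1,000,000 is not a stored value, so find the first code above it.
sorted_balances = sorted(balance_codes)
min_balance_code = bisect_right(sorted_balances, 1_000_000)

# Filter on integer codes only, then decode the matching account numbers.
code_to_name = {code: name for name, code in name_codes.items()}
result = [
    code_to_name[name_code]
    for name_code, type_code, balance_code in encoded_rows
    if type_code == savings_code and balance_code >= min_balance_code
]

print(encoded_rows)
print(result)
```

The dictionaries still have to be stored somewhere to decode results, so the space saving depends on how often values repeat within a column.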


Google "-". , SQL Server CHECKSUM.


Source: https://habr.com/ru/post/1729505/

