Fastest way to calculate the hash of an entire table

We need to be able to compute the hash of a table in the external environment and compare it with the previously computed hash from the internal environment. This is to ensure that the data in the external environment has not been tampered with by a rogue database administrator. Users insist on this feature.

Currently, we do this by calculating an individual hash for each column value, bit-xor'ing the column hashes together to get a row hash, and then bit-xor'ing all the row hashes together to get the table hash. Pseudocode below:

 cursor hash_cur is
   select /*+ PARALLEL(4) */
          dbms_crypto.mac(column1_in_raw_type, dbms_crypto.HMAC_SH512, string_to_raw('COLUMN1_NAME')) as COLUMN1_NAME,
          ...
     from TABLE_NAME;

 open hash_cur;
 fetch hash_cur bulk collect into hashes;
 close hash_cur;

 for i in 1 .. hashes.count loop
   rec := hashes(i);
   record_xor := rec.COLUMN1;
   record_xor := bit_xor(record_xor, rec.COLUMN2);
   ...
   record_xor := bit_xor(record_xor, rec.COLUMNN);
   table_xor := bit_xor(table_xor, record_xor);
 end loop;

The pseudocode above is run in parallel via dbms_job.
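Roughly, we submit one job per table, along these lines (hash_one_table here is a placeholder name for the procedure that wraps the logic above):

 declare
   l_job binary_integer;
 begin
   -- one job per table; the job starts once the submitting session commits
   dbms_job.submit(l_job, 'hash_one_table(''TABLE_NAME'');');
   commit;
 end;
 /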

The problem is that we have terabytes of data in certain tables, and the current performance falls short of what we want to achieve. Hashing has to be done on the fly, whenever users want to perform a hash check.

  • Do you guys have a better way to hash an entire table, or, more generally, to compare tables across environments that are connected by a network with low latency and relatively low bandwidth?

It seems to me that the operation is CPU-bound rather than I/O-bound. I am instead going to serialize the table data into a BLOB, ordered first by record and then by column, and then hash that single output. This should make the operation almost entirely I/O-bound.
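A minimal sketch of that idea, assuming a hypothetical two-column table MY_TABLE (real code would need explicit NULL handling, datatype formatting, and a delimiter scheme that cannot be gamed):

 declare
   l_blob BLOB;
   l_buf  RAW(32767);
   l_hash RAW(64);
 begin
   dbms_lob.createtemporary(l_blob, TRUE, dbms_lob.call);
   -- serialize the rows in a deterministic order: by record, then by column
   for rec in (select col1, col2 from my_table order by col1) loop
     l_buf := utl_raw.cast_to_raw(rec.col1 || '|' || rec.col2);
     dbms_lob.writeappend(l_blob, utl_raw.length(l_buf), l_buf);
   end loop;
   -- one hash over the whole serialized image (hash_sh512 needs 12c; use hash_sh1 on older releases)
   l_hash := dbms_crypto.hash(l_blob, dbms_crypto.hash_sh512);
   dbms_output.put_line(rawtohex(l_hash));
   dbms_lob.freetemporary(l_blob);
 end;
 /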

  1. What is the fastest way to do this? Is there any way to do it in the select clause of a query, to remove any PL/SQL-to-SQL engine context switch? (See the sketch after this list.)
    • I was thinking of using a global temporary BLOB for this.
    • I would also like to remove the I/O overhead of bulk-collecting the results.
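For the select-clause idea, one possibility on 12c and later is the built-in STANDARD_HASH function, which keeps row hashing entirely inside the SQL engine; my_table and its columns here are placeholders:

 select standard_hash(col1 || '|' || col2 || '|' || col3, 'SHA256') as row_hash
   from my_table;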

Any suggestions that might lead me to a more efficient script would be greatly appreciated. Thanks.

2 answers

First of all, I think the rogue-admin problem is better addressed by a combination of Oracle audit trail and Database Vault features.

With that said, here is what I would try:

1) Create a custom ODCI aggregate function to compute the hash of multiple rows as an aggregate.

2) Create a VIRTUAL NOT NULL column on the table that is the SHA hash of all the columns in the row, or at least the ones you need to protect. You would maintain this all the time; basically you are trading away some insert/update/delete performance in exchange for being able to compute hashes faster.

3) Create a plain index on that virtual column.

4) SELECT my_aggregate_hash_function(virtual_hash_column) FROM my_table to get the results.

Here is the code:

Create an aggregate function to compute the SHA hash over a group of rows

 CREATE OR REPLACE TYPE matt_hash_aggregate_impl AS OBJECT (
   hash_value RAW(32000),

   CONSTRUCTOR FUNCTION matt_hash_aggregate_impl (SELF IN OUT NOCOPY matt_hash_aggregate_impl)
     RETURN SELF AS RESULT,

   -- Called to initialize a new aggregation context.
   -- For analytic functions, the aggregation context of the *previous* window is passed in,
   -- so we only need to adjust as needed instead of creating the new aggregation context from scratch.
   STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_hash_aggregate_impl) RETURN NUMBER,

   -- Called when a new data point is added to an aggregation context
   MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_hash_aggregate_impl, value IN raw) RETURN NUMBER,

   -- Called to return the computed aggregate from an aggregation context
   MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_hash_aggregate_impl, returnValue OUT raw, flags IN NUMBER) RETURN NUMBER,

   -- Called to merge two aggregation contexts into one (e.g., merging results of parallel slaves)
   MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_hash_aggregate_impl, ctx2 IN matt_hash_aggregate_impl) RETURN NUMBER,

   -- ODCIAggregateDelete
   MEMBER FUNCTION ODCIAggregateDelete (self IN OUT matt_hash_aggregate_impl, value raw) RETURN NUMBER
 );
 /

 CREATE OR REPLACE TYPE BODY matt_hash_aggregate_impl IS

   CONSTRUCTOR FUNCTION matt_hash_aggregate_impl (SELF IN OUT NOCOPY matt_hash_aggregate_impl)
     RETURN SELF AS RESULT IS
   BEGIN
     SELF.hash_value := null;
     RETURN;
   END;

   STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_hash_aggregate_impl) RETURN NUMBER IS
   BEGIN
     sctx := matt_hash_aggregate_impl();
     RETURN ODCIConst.Success;
   END;

   MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_hash_aggregate_impl, value IN raw) RETURN NUMBER IS
   BEGIN
     IF self.hash_value IS NULL THEN
       self.hash_value := dbms_crypto.hash(value, dbms_crypto.hash_sh1);
     ELSE
       self.hash_value := dbms_crypto.hash(self.hash_value || value, dbms_crypto.hash_sh1);
     END IF;
     RETURN ODCIConst.Success;
   END;

   MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_hash_aggregate_impl, returnValue OUT raw, flags IN NUMBER) RETURN NUMBER IS
   BEGIN
     returnValue := dbms_crypto.hash(self.hash_value, dbms_crypto.hash_sh1);
     RETURN ODCIConst.Success;
   END;

   MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_hash_aggregate_impl, ctx2 IN matt_hash_aggregate_impl) RETURN NUMBER IS
   BEGIN
     self.hash_value := dbms_crypto.hash(self.hash_value || ctx2.hash_value, dbms_crypto.hash_sh1);
     RETURN ODCIConst.Success;
   END;

   -- ODCIAggregateDelete
   MEMBER FUNCTION ODCIAggregateDelete (self IN OUT matt_hash_aggregate_impl, value raw) RETURN NUMBER IS
   BEGIN
     raise_application_error(-20001, 'Invalid operation -- hash aggregate function does not support windowing!');
   END;

 END;
 /

 CREATE OR REPLACE FUNCTION matt_hash_aggregate (input raw) RETURN raw
   PARALLEL_ENABLE AGGREGATE USING matt_hash_aggregate_impl;
 /

Create a test table to work with (you would skip this since you have a real table)

 create table mattmsi as select * from mtl_system_items where rownum <= 200000; 

Create a virtual column that is the hash of each row's data. Make sure it is NOT NULL

 alter table mattmsi add compliance_hash
   generated always as (
     dbms_crypto.hash(
       to_clob(inventory_item_id || segment1 || last_update_date || created_by || description),
       3 /* dbms_crypto.hash_sh1 */)
   ) VIRTUAL not null;

Create an index on the virtual column; this way you can compute your hash with a fast full scan of a narrow index instead of a full scan of the fat table

 create index msi_compliance_hash_n1 on mattmsi (compliance_hash); 

Put it all together to calculate the hash

 SELECT matt_hash_aggregate(compliance_hash) from (select compliance_hash from mattmsi order by compliance_hash); 
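If the two environments can see each other, the comparison itself could then be a single query over a database link, along these lines (remote_env is an assumed link name, with the same virtual column created on the remote table):

 SELECT (SELECT matt_hash_aggregate(compliance_hash)
           FROM (SELECT compliance_hash FROM mattmsi ORDER BY compliance_hash)) AS local_hash,
        (SELECT matt_hash_aggregate(compliance_hash)
           FROM (SELECT compliance_hash FROM mattmsi@remote_env ORDER BY compliance_hash)) AS remote_hash
   FROM dual;

Only the narrow hash column travels over the link, which suits the low-bandwidth network mentioned in the question.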

A few comments:

  • I think it is important to use a hash to aggregate the row-level hashes (and not just do a SUM() over them), because an attacker could easily fake the correct sum.
  • I do not think you can (easily?) use a parallel query, because it is important that the rows be fed to the aggregate function in a consistent order, or else the hash value will change.

You can use ORA_HASH and pass multiple columns as an expression:

 select sum(ORA_HASH(col1||col2||col3)) as hash from my_table 

But there is a similar discussion on AskTom about why this is not a very good approach: Creating a unique HASH value for table contents

