Testing SQL Queries on Multiple Database Systems

I am involved in a migration project from Oracle to PostgreSQL, and I am looking for a way to automate testing a large number of queries converted from Oracle syntax to PostgreSQL. Assume the data has already been transferred correctly, so it does not need to be verified. I could hack together a solution from scratch in Perl or Python, but there may be simpler ways. I looked at database testing frameworks such as Test::DBUnit and pgTAP, but they assume that the user supplies the expected results, whereas in my case the expected results come from the database we are migrating from. Is there an existing tool or testing framework that can run a query against both the old (Oracle) and the new (PostgreSQL) database, fetch the results, and compare them, highlighting the differences and any errors that occur along the way?

2 answers

How about creating a JUnit project that runs each query against the two schemas (one Oracle, the other PostgreSQL)?

Alternatively, you can create two simple Maven projects (one for each database vendor), each using an SQL plugin to run your queries (list them in the same order in each pom.xml). You can automate these tests later with a continuous integration server that supports Maven (Hudson?) and set up a scheduled run.

Good luck


I ended up writing a custom tool in Python that runs the queries on both databases and collects the results, using psycopg2 and cx_Oracle. Comparing them is a matter of computing a hash for each row and checking that the hash of every Oracle row is present among the PostgreSQL row hashes. A few pitfalls:

  • Floating-point numbers may lose precision when converted from Oracle or PostgreSQL types to Python. Use the type-conversion hooks that the drivers provide (see their documentation) to make sure such values arrive as Decimal, not float.

  • It is deceptively simple to read one row at a time from both databases, compare the values, and move on. However, this does not work unless the SQL result is explicitly ordered (with ORDER BY). Unfortunately, reading the whole result set at once instead means that you need a lot of memory for queries that produce many rows.

  • You need to distinguish between queries that give equal, non-empty results and those that produce 0 rows in both databases. The latter should be flagged, and if the queries contain parameters, their values should be reviewed.
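
The row-hash comparison and the three pitfalls can be sketched roughly as follows. This is not the tool described in the answer, only a minimal illustration under a few assumptions: rows arrive as plain tuples (for example from the cursors of psycopg2 and cx_Oracle), and the helper names normalize, row_hash and compare are hypothetical. It also counts duplicate rows with a Counter, which is slightly stricter than the simple presence check mentioned above. Only the hashes are kept in memory, and the comparison does not depend on row order.

```python
import hashlib
from collections import Counter
from decimal import Decimal


def normalize(value):
    """Render one column value as text in a driver-independent way."""
    if value is None:
        return "\\N"  # assumed NULL marker, chosen for this sketch
    if isinstance(value, float):
        # Fallback only: ideally the driver already delivered a Decimal.
        value = Decimal(repr(value))
    if isinstance(value, Decimal):
        # Strip trailing zeros so that 1.50 and 1.5 hash identically.
        return format(value.normalize(), "f")
    return str(value)


def row_hash(row):
    """Hash a whole row; \x1f separates the normalized column values."""
    joined = "\x1f".join(normalize(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare(oracle_rows, postgres_rows):
    """Return (status, rows_only_in_oracle, rows_only_in_postgres)."""
    ora = Counter(row_hash(r) for r in oracle_rows)
    pg = Counter(row_hash(r) for r in postgres_rows)
    if not ora and not pg:
        # Zero rows on both sides is reported separately for review.
        return ("both_empty", 0, 0)
    only_ora = sum((ora - pg).values())
    only_pg = sum((pg - ora).values())
    status = "equal" if only_ora == 0 and only_pg == 0 else "different"
    return (status, only_ora, only_pg)
```

Because compare accepts any iterable of rows, a cursor can be passed in directly and consumed row by row, so the full result set never has to be held in memory, with or without ORDER BY.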


Source: https://habr.com/ru/post/1380167/

