How do you verify that the correct data is in a data mart?

I am working on a data warehouse, and I am trying to figure out how best to verify that the data from our (normalized) data cleansing database makes it into our data marts correctly. I have done some searching, but the results so far are mostly about things like enforcing constraints and performing data validation during the ETL process (for example, checking that dates are valid). The dimensions were fairly easy, since I could either use the primary key or write a very simple, verified query to get the data. Fact tables are more complex.

Any thoughts? We are trying to make it very easy to run a couple of queries, see some data from both the data cleansing database and the data marts, and visually compare the two to make sure they are correct.

+3
2 answers

Test your fact table loads by implementing a simplified, pared-down subset of the same data manipulation elsewhere and comparing the results.

You are essentially calculating the same totals, counts, or other figures at least twice: once from the fact table itself, after it has finished loading, and once from another source.

Example check: count of x by (y, z).
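A "count of x by (y, z)" comparison between the source and the loaded fact table could be sketched roughly as follows. This is only an illustration under stated assumptions: the table names `cleansed_sales` and `fact_sales`, the columns `region`, `product`, and `qty`, and the helpers `grouped_totals` and `reconcile` are all hypothetical, and a single in-memory SQLite database stands in for what would normally be two separate databases.

```python
import sqlite3


def grouped_totals(conn, sql):
    """Run an aggregate query whose last column is the measure.

    Returns a dict mapping the grouping columns (as a tuple) to the measure.
    """
    return {tuple(row[:-1]): row[-1] for row in conn.execute(sql)}


def reconcile(conn, source_sql, fact_sql):
    """Return {group_key: (source_value, fact_value)} for groups that disagree."""
    source = grouped_totals(conn, source_sql)
    fact = grouped_totals(conn, fact_sql)
    return {
        key: (source.get(key), fact.get(key))
        for key in sorted(source.keys() | fact.keys())
        if source.get(key) != fact.get(key)
    }


# Hypothetical demo data: a cleansed source table and the fact table loaded from it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cleansed_sales (region TEXT, product TEXT, qty INTEGER);
    CREATE TABLE fact_sales (region TEXT, product TEXT, qty INTEGER);
    INSERT INTO cleansed_sales VALUES ('EU','A',1), ('EU','A',2), ('US','B',5);
    INSERT INTO fact_sales VALUES ('EU','A',1), ('US','B',5);  -- one row lost in the load
""")

# The same "count by (region, product)" computed independently on each side.
mismatches = reconcile(
    conn,
    "SELECT region, product, COUNT(*) FROM cleansed_sales GROUP BY region, product",
    "SELECT region, product, COUNT(*) FROM fact_sales GROUP BY region, product",
)
print(mismatches)  # {('EU', 'A'): (2, 1)}
```

An empty result means the two sides agree on every group. Swapping `COUNT(*)` for `SUM(qty)` gives the equivalent check on totals, and in a real warehouse the two queries would likely run over separate connections to the cleansing database and the data mart.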

. ConcernedOfTunbridgeWells .

+4


ETL , , DW, , , unit test .

0

Source: https://habr.com/ru/post/1736305/

