Insert into a star schema

I've read a lot about star schemas, fact and dimension tables, and SELECT statements for reporting data quickly, but the matter of entering data into a star schema remains unclear to me. How does one "theoretically" enter data into a star schema database while maintaining the fact table? Is a series of INSERT INTO statements inside a giant stored procedure with 20 parameters my only option, and how do I populate the fact table? Many thanks.

+4
2 answers

Start with the dimensions first, one at a time. Use the ECCD approach (extract, clean, conform, deliver).

Make sure each dimension has a BusinessKey that uniquely identifies the "object" that a dimension row describes, for example an email address for a person.
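As a rough illustration of the point about business keys, here is a minimal sketch using Python's built-in sqlite3 module. The table `dim_person`, its columns, and the helper `load_person_dimension` are hypothetical names chosen for this example; the idea is only that the surrogate primary key and the unique BusinessKey (email) sit side by side, so reloading the same person does not create a duplicate row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_person (
        person_key INTEGER PRIMARY KEY,   -- surrogate PrimaryKey
        email      TEXT NOT NULL UNIQUE,  -- BusinessKey (assumed to be email)
        full_name  TEXT
    )
""")

def load_person_dimension(conn, rows):
    """Insert each (email, full_name) pair once; repeated emails are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO dim_person (email, full_name) VALUES (?, ?)",
        rows,
    )
    conn.commit()

# Two loads with an overlapping person: the second "ann" is ignored.
load_person_dimension(conn, [("ann@example.com", "Ann"), ("bob@example.com", "Bob")])
load_person_dimension(conn, [("ann@example.com", "Ann"), ("cid@example.com", "Cid")])
person_count = conn.execute("SELECT COUNT(*) FROM dim_person").fetchone()[0]
```

This sketch ignores changed attributes entirely; a real dimension load would also have to decide how to handle updates to existing rows.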

With the dimensions loaded, prepare the key-lookup pipeline. In general, for each dimension table you can prepare a key-lookup table (BusinessKey, PrimaryKey). Some designers prefer to look up the dimension table directly, but a key-lookup table can often be cached in memory easily, which makes fact loading faster.
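One possible way to cache a key lookup in memory is a plain dictionary built from the dimension table. The sketch below assumes a hypothetical `dim_product` table whose BusinessKey is a `sku` column; `build_key_lookup` is an illustrative helper, not part of any ETL tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT NOT NULL UNIQUE)"
)
conn.executemany("INSERT INTO dim_product (sku) VALUES (?)", [("SKU-1",), ("SKU-2",)])

def build_key_lookup(conn, table, business_key, primary_key):
    """Return an in-memory {BusinessKey: PrimaryKey} mapping for one dimension."""
    # Identifiers are interpolated into the SQL text, so pass only trusted,
    # hard-coded table and column names here.
    query = f"SELECT {business_key}, {primary_key} FROM {table}"
    return dict(conn.execute(query).fetchall())

product_lookup = build_key_lookup(conn, "dim_product", "sku", "product_key")
```

Each fact row can then resolve its key with a dictionary access instead of a query against the dimension table.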

Use ECCD for the fact data too. The ECC part happens in the staging area; you can use (helper) tables or flat files for each ECC step, as you prefer.

When delivering fact tables, replace each BusinessKey in the fact row with the matching PrimaryKey from the key-lookup table. Once all BusinessKeys have been replaced with their PrimaryKeys, insert the row into the fact table.
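The delivery step could be sketched roughly as follows. Everything here is an assumption made for illustration: the `fact_sales` table, the two lookup dictionaries with made-up key values, and the choice to set aside rows whose business keys cannot be resolved rather than fail the whole load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (person_key INTEGER, product_key INTEGER, amount REAL)"
)

# Key lookups as they might come out of the dimension load (hypothetical values).
person_lookup = {"ann@example.com": 1, "bob@example.com": 2}
product_lookup = {"SKU-1": 10, "SKU-2": 20}

staged_facts = [
    {"email": "ann@example.com", "sku": "SKU-2", "amount": 19.5},
    {"email": "bob@example.com", "sku": "SKU-1", "amount": 5.0},
    {"email": "eve@example.com", "sku": "SKU-1", "amount": 7.0},  # unknown person
]

def deliver_facts(conn, staged, person_lookup, product_lookup):
    """Swap business keys for primary keys, insert the resolved rows,
    and return the rows whose keys could not be resolved."""
    resolved, rejected = [], []
    for row in staged:
        person_key = person_lookup.get(row["email"])
        product_key = product_lookup.get(row["sku"])
        if person_key is None or product_key is None:
            rejected.append(row)  # keep for later inspection instead of inserting
            continue
        resolved.append((person_key, product_key, row["amount"]))
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", resolved)
    conn.commit()
    return rejected

rejected_rows = deliver_facts(conn, staged_facts, person_lookup, product_lookup)
loaded_rows = conn.execute(
    "SELECT person_key, product_key, amount FROM fact_sales ORDER BY person_key"
).fetchall()
```

Note that the fact insert takes no dimension attributes at all, only keys and measures, which is why no stored procedure with 20 parameters is needed.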

Do not waste your time: use an ETL tool. You can download Pentaho Kettle (Community Edition) for free; it has everything you need to achieve this.

+6

You generally do not insert data into a star schema the same way you might into a normalized schema, i.e. with a stored procedure that inserts or updates all the relevant tables in a single transaction. Remember that a star schema is typically a read-only, denormalized data model: it is rarely treated transactionally and is usually loaded from data that is already denormalized and flat, typically one flat file per star.

As Damir points out, you typically load all the dimensions first (handling slowly changing dimensions, etc.), then load the facts, joining to the appropriate current dimension rows to find the dimension IDs (using the business keys).
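A set-based version of that fact load might look like the sketch below, again using sqlite3. The tables `dim_customer`, `stage_orders`, and `fact_orders`, and the `is_current` flag used to mark the current version of a slowly changing dimension row, are assumptions for this example; real warehouses often use effective-date ranges instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- dimension ID (surrogate key)
        email        TEXT NOT NULL,        -- business key
        city         TEXT,
        is_current   INTEGER NOT NULL      -- 1 marks the current version
    );
    CREATE TABLE stage_orders (email TEXT, amount REAL);
    CREATE TABLE fact_orders (customer_key INTEGER, amount REAL);

    INSERT INTO dim_customer (email, city, is_current) VALUES
        ('ann@example.com', 'Oslo',   0),
        ('ann@example.com', 'Bergen', 1),
        ('bob@example.com', 'Oslo',   1);
    INSERT INTO stage_orders VALUES
        ('ann@example.com', 40.0),
        ('bob@example.com', 15.0);
""")

# Load facts by joining staged rows to the current dimension rows
# on the business key, picking up the dimension ID along the way.
conn.execute("""
    INSERT INTO fact_orders (customer_key, amount)
    SELECT d.customer_key, s.amount
    FROM stage_orders AS s
    JOIN dim_customer AS d
      ON d.email = s.email AND d.is_current = 1
""")
conn.commit()

fact_rows = conn.execute(
    "SELECT customer_key, amount FROM fact_orders ORDER BY customer_key"
).fetchall()
```

Ann's order is attached to her current row (Bergen) rather than the superseded one, which is the effect of joining only to current dimension rows.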

+3

Source: https://habr.com/ru/post/1304823/

