Data Aggregation - Daily SQL Script vs Data Warehouse

Apologies if this has already been asked (I know very little about data warehousing / BI and have not yet learned the right keywords to search for).

I have a table that grows by more than 100,000 rows per day. Each row has a timestamp and several attributes of the item (size, weight, color, etc.). The detailed data may be useful for about a month; after that period, we are only interested in aggregates. I have dedicated software that lets me inspect individual rows in more detail, and I mainly use PowerPivot for my reporting needs.

I could write a SQL query that populates a new table daily, in which I would have one row for each hour / point / batch and would summarize the information (sum / average / stddev / etc.).
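To make the idea concrete, here is a minimal sketch of such a daily rollup, written in Python against an in-memory SQLite database so that it is self-contained. The table and column names (`measurements`, `hourly_summary`, `ts`, `batch`, `weight`) are hypothetical and not from any real schema. It stores the count, sum, and sum of squares rather than the average and standard deviation directly, on the assumption that additive values are easier to re-aggregate later (for example, from hours to days).

```python
import math
import sqlite3


def build_hourly_summary(conn, day):
    """Roll up one day of raw rows into one summary row per (hour, batch)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS hourly_summary (
            hour_start    TEXT    NOT NULL,
            batch         TEXT    NOT NULL,
            n_rows        INTEGER NOT NULL,
            sum_weight    REAL    NOT NULL,
            sum_weight_sq REAL    NOT NULL,
            PRIMARY KEY (hour_start, batch)
        )
        """
    )
    # INSERT OR REPLACE keyed on (hour_start, batch) means that rerunning
    # the script for the same day overwrites rows instead of duplicating them.
    conn.execute(
        """
        INSERT OR REPLACE INTO hourly_summary
        SELECT strftime('%Y-%m-%d %H:00:00', ts),
               batch,
               COUNT(*),
               SUM(weight),
               SUM(weight * weight)
        FROM measurements
        WHERE date(ts) = ?
        GROUP BY strftime('%Y-%m-%d %H:00:00', ts), batch
        """,
        (day,),
    )
    conn.commit()


def mean_and_stddev(n, total, total_sq):
    """Derive the mean and population standard deviation from stored sums."""
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # guard against rounding
    return mean, math.sqrt(variance)


# Hypothetical raw data: three rows on the day of interest, one on the next day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (ts TEXT, batch TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("2024-01-01 08:05:00", "A", 2.0),
        ("2024-01-01 08:40:00", "A", 4.0),
        ("2024-01-01 09:10:00", "A", 6.0),
        ("2024-01-02 08:00:00", "A", 100.0),
    ],
)

build_hourly_summary(conn, "2024-01-01")
rows = conn.execute(
    "SELECT hour_start, batch, n_rows, sum_weight, sum_weight_sq "
    "FROM hourly_summary ORDER BY hour_start"
).fetchall()
```

In a real database the same `INSERT ... SELECT ... GROUP BY` statement could be scheduled as a daily job; the Python wrapper is only there to make the example runnable.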

My script would run every day, and I could then use PowerPivot against this new table. All this while staying where I feel comfortable: plain old SQL.

From the little I have read about data warehousing and BI, what I am planning to do sounds just like creating dimensions and facts. So my question is: is it worth investigating further in this direction (BI), or, since my problem is relatively simple, should I just stay with a plain relational database?

NB: The reports that are produced are usually linked to another database for additional detail, a task that PowerPivot handles very well.

+6
3 answers

Data warehouses are usually implemented in relational databases, so your existing skills will still be useful.

Given that you have shown interest in the dimension / fact approach to data, the books usually considered canonical for this approach are:

  • The Data Warehouse Toolkit (Kimball, Ross)
  • The Data Warehouse Lifecycle Toolkit (Kimball, Ross, Thornthwaite, Mundy, Becker)

(The former has a more technical focus, while the latter approaches the subject from the broader perspective of lifecycle management.)

Implementing a DWH can take a lot of time, so it may be worth continuing with your existing approach in the meantime, even if you do decide to build a DWH.

+3

Good news: it looks like you already have a data warehouse. "Data warehouse" is a very general term without a real formal definition; it means pretty much whatever you want it to mean.

Common features:

  • Data warehouses are kept separate from operational databases.
  • Data warehouse schemas are optimized for querying, not for conformance to normal forms.
  • Data warehouses are populated by Extract, Transform, Load (ETL) processes.

It looks like you are already doing all of this. If there are no business requirements for change, I would leave it as it is. If your business users start asking to build their own queries at different levels of aggregation, filtering, or granularity, a star schema may be the way to go.
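For illustration, a star schema for this kind of data might look roughly like the sketch below: one fact table holding the additive measures, joined to small dimension tables that users can group and filter by. All table and column names (`dim_date`, `dim_item`, `fact_measurement`, and so on) and the sample values are hypothetical, and SQLite via Python is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes to group and filter by.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, color TEXT, size TEXT);

    -- Fact table: one row per (date, item) with additive measures only.
    CREATE TABLE fact_measurement (
        date_key   INTEGER REFERENCES dim_date(date_key),
        item_key   INTEGER REFERENCES dim_item(item_key),
        n_rows     INTEGER,
        sum_weight REAL
    );

    INSERT INTO dim_date VALUES
        (20240101, '2024-01-01', '2024-01'),
        (20240102, '2024-01-02', '2024-01'),
        (20240201, '2024-02-01', '2024-02');
    INSERT INTO dim_item VALUES (1, 'red', 'small'), (2, 'blue', 'small');
    INSERT INTO fact_measurement VALUES
        (20240101, 1, 10, 25.0),
        (20240101, 2,  5, 10.0),
        (20240102, 1, 20, 40.0),
        (20240201, 2,  8, 16.0);
    """
)

# Dimension attributes that users are allowed to aggregate by.
ALLOWED_COLUMNS = {"day", "month", "color", "size"}


def total_weight_by(conn, column):
    """Sum the weight measure grouped by any one dimension attribute."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported grouping column: {column}")
    # The column name is checked against a whitelist before interpolation.
    sql = f"""
        SELECT {column}, SUM(f.sum_weight)
        FROM fact_measurement AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_item AS i ON i.item_key = f.item_key
        GROUP BY {column}
        ORDER BY {column}
    """
    return conn.execute(sql).fetchall()


by_month = total_weight_by(conn, "month")
by_color = total_weight_by(conn, "color")
```

The point of the layout is that the same fact table answers questions at several granularities (per day, per month, per color) just by changing the grouping column, which is roughly what a tool like PowerPivot does when users slice by a dimension.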

+2

The most effective solutions are those that are simple, adequate to meet existing needs, and remain within the available skill sets.

I agree that this approach is well suited to your situation. If it provides the reports and information that you need, then it is worth going down this path. If you later need more complex functionality, you can move on to more complex BI.

+1

Source: https://habr.com/ru/post/911499/

