Storage issues: when and why?

A bit of background here:

I know what a data warehouse is , more or less. I read dozens of data warehouse guides, I played with SSAS, I know what a star chart, a measurement table and a fact table are, I know what ETL is and how to do it. This is not a “how” question or a textbook request.

My problem is that all the material that I read on the data warehouse seems to be silent about the rationale for creating the data warehouse. All of them are figuratively, or in some cases literally begin with the phrase “so you decided to build a data warehouse ...” In addition, I have not yet made this decision.

So, I hope SO members can point me out or help come up with some kind of semi-objective test. Something that I can adapt to a specific system and, as a result, “yes, we need a data warehouse” or “no, today the gain will be too small.” I think the specific questions that I have to answer are as follows:

  • At what point is the data warehouse being built, is it worth considering the issue? In other words, what control features, metrics or other criteria should be sought, this may indicate that the standard transactional environment is no longer enough?

  • What are the alternatives to a full-featured data warehouse? Denormalization in a transactional database and a replicated "report server" are two that come to mind; are there any others that i should learn before committing a DW?

  • Why is a data warehouse better than the alternatives mentioned? If the answer "depends on", then what does it depend on?

  • When shouldn't I try to create a data warehouse? I am skeptical of anything declared as "best practice," regardless of context. Of course, there should be some scenarios where DW is the wrong choice - what are they?

  • Are there any practical examples that I could consider regarding systems that have been improved by implementing a data warehouse? Something that would explain to me, cross-cutting, what solutions or analysis did they need in the warehouse, how did they decide what to put in it, and how did the warehouse end up in a larger environment? I don’t want the far-fetched “letting you make a cube from the AdventureWorks database” - the implementation is not related to me, I'm interested in the specifications and designs and the general thinking process that were involved.

Usually I try not to ask the multicomponent, but I think that they are all very closely connected. I am ready to accept any answer that affects at least the first 4 questions, although the latter would really help crystallize this in my mind. Links are good if someone has already written about this, if they are concise and specific enough (link to Ralph Kimball's home page = not useful).

I hope that I have set the question clearly - in advance for your answers!

+43
database database-design data-warehouse
Jan 02 '09 at 19:44
source share
7 answers

I will see if I can do my best to answer your questions succinctly.

1. At what point is the data warehouse being built, is it worth considering the issue? In other words, what are the control signs, metrics, or other criteria, should I indicate that the standard transactional environment is no longer enough?

but. If you find that reporting and monitoring are degrading the performance of your production system and / or offline data warehouse.

b. If you find that getting answers to your business questions requires creating a lot of complex SQL every time.

from. If you find that every time you make changes to your transactional scheme, you need to go back and process all your requests for reporting.

e. If you want to combine data from several sources.

2. What are the alternatives to a full-featured data warehouse? Denormalization in the transactional database and the standard swamp replicated "report server" - two that come to mind; are there any others i should investigate moving to dw?

3. Why is the data warehouse better than the alternatives mentioned? If the answer is "it depends", then what does it depend on?

I will answer them together. I would not think of a data warehouse as a particular business. It’s just a short phrase, which means “storing your data so that you can more easily and quickly answer business questions.”

Transactional databases are designed for effective interaction with applications. Data warehouses, data marts, operational data warehouses, and reporting tables are designed to work effectively with people, if that makes sense.

4. When should I not try to build a data warehouse? I am skeptical of everything that is declared “best practice,” regardless of context. Of course, there should be some scenarios where DW is the wrong choice - what are they?

Good question. If your transaction system provides you with a good idea of ​​your business, you probably do not need to store.

If you have only one data source and performance, this is not a problem, you can get an idea of ​​creating simple report tables.

5. Are there any practical examples that I could consider in systems that have been improved by entering warehouse data? Something that explain to me, from end to end, what types of decisions or analysis they need in the warehouse, how they decided what to put in it, and how the warehouse got into a big environment? I don’t want the far-fetched ", let the cube from the AdventureWorks database" - the implementation does not matter to me. I am interested in the specifications and design and the general idea that were involved.

This is a big question that will take up much more space than I am here.

In this case, I can point you to a few places that could provide the insight that you seek.

  • Implementing a Data Warehouse: The Methodology That Worked by Bruce Ulry is a book that describes a one-way journey to creating a data warehouse. He is not very polished, which gives him more realism. It reads like a magazine with a lot of models and other visual effects that well illustrate its efforts.
  • Business Intelligence Roadmap by Larisa Moss. Standard price. Guides you through the process of building a high-level BI practice.
  • "The Impact of Business Intelligence Profits" Steve Williams provides a series of case studies that show the value of creating data warehouses.
+35
Jan 02 '09 at 21:33
source share
— -
  • The main goal of DW is to speed up (simplify) reporting and analytics. It allows you to cut and slice data in any way that a business user can think of.

  • For the first step of DW, you can simply implement the Kimball star schema and run SQL queries against it. If this turns out to be too slow, start thinking about pre-calculated aggregates (cubes).

  • Cutting and processing information against DW is easier than against a normalized database. The replicated report server will improve performance but not simplify slicing and processing. Also keep in mind that DW belongs to business users, so at any time they can come up with different ideas for pieces / cubes - IT people just need to provide an environment in which something like this is possible.

  • If you just run several reports from time to time on your operating system and are satisfied with the performance, there is no need for a DW.

  • All my experience has been with systems in which business users endlessly complain about slow reports and the inability to write “complex queries,” while production people complain that the database is bogged down due to reporting. In all cases, a fairly good Kimball star and report server with cache and snapshots.

+4
Jan 02 '09 at 21:12
source share
  • You should consider creating a data warehouse when two of the following criteria are met:

    • Huge amount of data
    • Many large complex selects (possibly compared to a few inserts, updates, and deletes) that simply take a lot of time to complete (and fill out for writing)
    • Data from different systems must be combined.
  • It really is a question of what you consider a data warehouse. In many cases, you can gradually move from OLTP systems with some reports to a full-blown data warehouse, if you can stick to a relational database management system. First, you could create the first fact table and continue to use normalized tables for measurement. Then add more facts, more fact tables or highlighted dimension tables to the game. First, in one database (or in one of the databases of the systems involved), perhaps later we will move to a separate database.

  • A complete set of data (a separate database, a star chart) offers the best options for setting up operators of choice, from the transition to a specialized system. It is also completely separate from the oLTP system (s). Think of a circuit, but also of resources such as a processor, I / O and memory, and organizational functions such as scheduling new releases. Of course, this is a big job that you do not need.

  • In the answers above: just because you have several complex queries does not mean you have to create DWH, the same is true for other criteria if they are isolated.

    / li>
  • You can’t offer much here, but a tip: go fast. DWH requirements are dependent on user capabilities. There, the requirements are likely to change. Automating tests with databases is a pain, but losing weight in a production system without proper tests is worse.

+3
Jan 02 '09 at 20:38
source share

At what point is the data warehouse being built, is it worth considering the issue? In other words, what control features, metrics or other criteria should be sought, this may indicate that the standard transactional environment is no longer enough?

I would recommend a data warehouse when you notice that running reports and analytic operations in a transactional data warehouse is bad for both.

What are the alternatives to a full-featured data warehouse? Denormalization in a transactional database and a replicated "report server" are two that come to mind; are there any others that I should research before committing a DW?

I have nothing to offer here. I would say that storing transactional and reporting databases seems reasonable to me, regardless of whether you call it a warehouse or not. Data mining can be very intense in working with the processor.

Why is a data warehouse better than the alternatives mentioned? If the answer is "dependent", then what does it depend on?

I have nothing to offer here.

When should you not try to create a data warehouse? I am skeptical of anything declared as "best practice," regardless of context. Of course, there should be some scenarios where DW is the wrong choice - what are they?

I would say that if you do not need to keep a long history, you do not conduct intensive data analysis, and your reporting needs are limited by a special request from time to time, and then, perhaps, a data warehouse is not required.

Are there any practical examples that I could consider regarding systems that have been improved by implementing a data warehouse? Something that would explain to me, cross-cutting, what solutions or analysis did they need in the warehouse, how did they decide what to put in it, and how did the warehouse end up in a larger environment? I don’t want the far-fetched “letting you make a cube from the AdventureWorks database” - the implementation is not related to me, I'm interested in the specifications and projects and the general thinking process that were involved.

My employers all used data warehouses many years before my arrival, so I can’t talk about what happened before I arrived.

+2
Jan 02 '09 at 19:54
source share

In my experience, the first sign that you started thinking about a data warehouse is that you have (or are developing) a transactional database, and users begin to add a lot of reports and data history requirements. It is almost always. It is always easier to have a separate data warehouse or report database than to try to develop a transactional system that processes the reporting needs that end users always have. Storing history (for business objects) in a transaction system adds complexity and inflates the database, which should be as flexible as possible.

On the other hand, I was in large companies where many groups created data warehouses, because the data of interest was distributed in many systems, so it was difficult to request. The problem was that each group created its own data warehouse, because all existing warehouses in the company did not have the necessary subset of information or had a data model that was considered suboptimal or incorrect. This worsened the situation by creating even more disparate data systems that are difficult to compare.

+2
Jan 02 '09 at 20:04
source share

DW might be considered if you are using a Transactional System from a long period. Later, they realize that they need to do some data mining to define different data models for the business. And finally, with the help of certain data templates, a person wants to help top management make further decisions in the interests of the company.

The following steps must be taken to create a data warehouse:

  • For the database, you must define the platform and ETL database.
  • For visualization, you need to select a reporting tool such as SSRS, Tableau, etc.
  • You can use the Data Analytical data language, such as R, for future reference.
  • Finally, all of this will help in developing a data warehouse and reporting tool.
0
Jul 20 '15 at 5:04
source share

"I think why some projects fail?"

There are five main reasons:

  • lack of partnership between the IT department and business users;
  • Incorrect data warehouse architecture
  • insufficiently experienced people;
  • improper planning, for example, refusing to use a proven methodology and a plan to ensure that no details are omitted;
  • and depending on overflow technology.
-one
Aug 08 '13 at 15:55
source share



All Articles