We have data from two different sources: some comes from the client, some comes from various suppliers. Currently, we physically “combine” this data into one massive table with almost a hundred columns and tens of thousands of rows, with no formal separation of the dimensions. As a result, we cannot actually use this table.
I am going to rework this mess into a proper, but small, star schema.
Two dimensions are obvious. One of them, for example, is time.
The data provided by the client contains a number of fact values. Each supplier may (or may not) provide additional fact values that correspond to the same dimensions.
All of this fact data has the same granularity. It can be called "sparse" because we do not often receive information from all suppliers.
Here is my dilemma.
Is this one fact table, with some nulls, populated from different sources?
Or is it n + 1 fact tables: one populated from the client, and one for each supplier?
Each design has pros and cons. I would like some opinions on choosing between merging the data and loading it separately.
The client provides the revenue, cost, quantity, weight and other things they know about the completed transaction.
The supplier provides additional information about some transactions: weights, costs, durations. Other transactions will have no values from the supplier.
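To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the grain is one row per transaction. The names `transaction_id`, `dim_date`, `fact_client`, `fact_supplier_a` and `fact_combined` are hypothetical and only for illustration. The separate tables are loaded independently, and the single sparse table is expressed as a view over them, so the missing supplier values show up as nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, iso_date TEXT NOT NULL);

-- Client fact: one row per transaction (assumed grain).
CREATE TABLE fact_client (
    transaction_id INTEGER PRIMARY KEY,
    date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    revenue REAL, cost REAL, quantity INTEGER, weight REAL
);

-- One of the n supplier facts: same grain, rows only where the supplier reported.
CREATE TABLE fact_supplier_a (
    transaction_id INTEGER PRIMARY KEY,
    date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    weight REAL, cost REAL, duration REAL
);

-- The single sparse fact table, expressed as a view over the separate tables.
CREATE VIEW fact_combined AS
SELECT c.transaction_id, c.date_key,
       c.revenue, c.cost, c.quantity, c.weight,
       s.weight   AS supplier_a_weight,
       s.cost     AS supplier_a_cost,
       s.duration AS supplier_a_duration
FROM fact_client AS c
LEFT JOIN fact_supplier_a AS s ON s.transaction_id = c.transaction_id;
""")

# Made-up sample rows: two client transactions, supplier data for only one.
conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
conn.executemany(
    "INSERT INTO fact_client VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 20240101, 100.0, 60.0, 5, 12.5),
     (2, 20240101, 40.0, 25.0, 2, 3.0)],
)
conn.execute("INSERT INTO fact_supplier_a VALUES (1, 20240101, 12.7, 8.0, 3.5)")

rows = conn.execute(
    "SELECT transaction_id, revenue, supplier_a_cost "
    "FROM fact_combined ORDER BY transaction_id"
).fetchall()
print(rows)  # [(1, 100.0, 8.0), (2, 40.0, None)]
```

In this sketch the two designs differ mainly in where the nulls live: the separate tables simply have no row for an unreported transaction, while the combined view (or a physically merged table with the same columns) carries a null in every supplier column.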