Data Warehouse Solution?

I am developing a system that removes information from other systems. For example, there is a Customer database, and a customer can be deleted if their most recent order is six or more years old.

I can imagine two approaches to this:

  • Create a class for each system, e.g. Customer (for the customer database), Order (for the order database), etc. Each class has a "Delete" method for polymorphism, i.e. the client can call "Delete" on each class, which deletes the necessary entries.
  • Copy all the information needed to make the decision, e.g. copy the Order database using SSIS, and write one large query that finds everything that needs to be deleted. This is a data warehouse type of approach.

I can’t decide which option is better.

3 answers

I would take issue with your first approach. I'm not sure it makes sense for all of the individual business classes to share the same interface for the Delete method. The parameters required to delete a customer may not match the parameters required to delete an order, so it seems odd to force them into the same interface. Doing so may unnecessarily limit the flexibility of your code in the future. It also seems strange that the business logic for deleting a customer would consist of blindly looping through other objects, in some particular order, and calling Delete on each of them. I would think you want more control over the order in which they are deleted, and over what happens when one of the Delete calls fails.

I would suggest first thinking about it at a higher level. What business methods do you really need? Based on the scenario you described, I see two business methods:

  • Get a list of all customers who have placed no orders in the last six years.
  • Delete a customer (together with all of their orders).

Since both of these methods are related to customers, it would be advisable to combine them into one business class for customers. For instance:

    Public Interface ICustomerBusiness
        Function GetStaleCustomers(timeSinceLastOrder As TimeSpan) As IList(Of CustomerDto)
        Sub DeleteCustomer(customerId As Integer)
    End Interface

Once you encapsulate the business logic, it doesn’t matter how the data access layer is implemented. The client calls these simple business-level methods and does not care how the logic works behind the scenes. With the business logic encapsulated in its own layer, you are free to rewrite it in different ways without rewriting any of your client code.
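To make the encapsulation point concrete, here is a minimal sketch of client code that depends only on the interface. The shape of CustomerDto, the InMemoryCustomerBusiness class, the PurgeJob module and the currentTime parameter are all assumptions made for illustration; a real implementation would sit on top of a data access layer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical DTO: the interface above names CustomerDto but does not define it.
Public Class CustomerDto
    Public Property Id As Integer
    Public Property LastOrderDate As Date
End Class

Public Interface ICustomerBusiness
    Function GetStaleCustomers(timeSinceLastOrder As TimeSpan) As IList(Of CustomerDto)
    Sub DeleteCustomer(customerId As Integer)
End Interface

' Assumed in-memory stand-in for an implementation backed by a database.
Public Class InMemoryCustomerBusiness
    Implements ICustomerBusiness

    Private ReadOnly _customers As List(Of CustomerDto)
    Private ReadOnly _currentTime As Date

    Public Sub New(customers As IEnumerable(Of CustomerDto), currentTime As Date)
        _customers = customers.ToList()
        _currentTime = currentTime
    End Sub

    Public Function GetStaleCustomers(timeSinceLastOrder As TimeSpan) As IList(Of CustomerDto) _
        Implements ICustomerBusiness.GetStaleCustomers
        Dim cutoff As Date = _currentTime - timeSinceLastOrder
        ' ToList returns a snapshot, so callers can delete while iterating it.
        Return _customers.Where(Function(c) c.LastOrderDate <= cutoff).ToList()
    End Function

    Public Sub DeleteCustomer(customerId As Integer) Implements ICustomerBusiness.DeleteCustomer
        _customers.RemoveAll(Function(c) c.Id = customerId)
    End Sub
End Class

' Client code: it only sees ICustomerBusiness, never the implementation.
Public Module PurgeJob
    Public Function PurgeStaleCustomers(business As ICustomerBusiness, maxAge As TimeSpan) As Integer
        Dim stale As IList(Of CustomerDto) = business.GetStaleCustomers(maxAge)
        For Each customer As CustomerDto In stale
            business.DeleteCustomer(customer.Id)
        Next
        Return stale.Count
    End Function
End Module
```

Swapping InMemoryCustomerBusiness for a class that issues SQL would leave PurgeJob untouched, which is the benefit described above.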

So, what would the logic look like inside this business class? It could either make a single call to a data access method that does all the work in one massive SQL command, or it could make a separate call for each step. That is really up to you. Obviously, the former will be more efficient, but the latter will be more flexible. It all depends on your needs. Here is a simple example of how each might look:

    Public Sub DeleteCustomer(id As Integer)
        _customerDataAccess.DeleteCustomerAndOrders(id)
    End Sub

    ' or...

    Public Sub DeleteCustomer(id As Integer)
        For Each i As OrderDto In _orderBusiness.GetOrdersByCustomer(id)
            _orderBusiness.DeleteOrder(i.Id)
        Next
        _customerDataAccess.DeleteCustomer(id)
    End Sub

The second option will be more flexible for several reasons. For instance:

  • You will have finer control over what happens and when. This will allow you to provide detailed status updates, if necessary, during the process. It will also allow you to provide a detailed trace log and more accurate error messages when something fails.
  • The business logic for deleting orders will be split out into a separate, reusable business class. If you need to delete just an order somewhere else in the code, you can do so using the same shared code.
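The finer control mentioned in the first point could look roughly like the sketch below. It is only an illustration: DeleteResult and DeleteCustomerWithTrace are hypothetical names, and the two delegates stand in for _orderBusiness.DeleteOrder and _customerDataAccess.DeleteCustomer so that the example stays self-contained.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical result type: whether the deletion completed, plus a trace log.
Public Class DeleteResult
    Public Property Succeeded As Boolean
    Public Property Log As New List(Of String)
End Class

Public Module CustomerDeletion
    ' deleteOrder and deleteCustomer are assumed stand-ins for the business
    ' and data access calls used in the second DeleteCustomer variant.
    Public Function DeleteCustomerWithTrace(customerId As Integer,
                                            orderIds As IEnumerable(Of Integer),
                                            deleteOrder As Action(Of Integer),
                                            deleteCustomer As Action(Of Integer)) As DeleteResult
        Dim result As New DeleteResult()

        For Each orderId As Integer In orderIds
            Try
                deleteOrder(orderId)
                result.Log.Add("Deleted order " & orderId.ToString())
            Catch ex As Exception
                ' Stop at the first failure and report exactly which step failed.
                result.Log.Add("Failed to delete order " & orderId.ToString() & ": " & ex.Message)
                Return result
            End Try
        Next

        ' Only reached when every order was deleted.
        deleteCustomer(customerId)
        result.Log.Add("Deleted customer " & customerId.ToString())
        result.Succeeded = True
        Return result
    End Function
End Module
```

Stopping at the first failed order is one possible policy; retrying or continuing with the remaining orders would be equally easy to express at this level, which is not the case when everything happens inside a single SQL command.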

Both have their advantages and disadvantages. Personally, I would go the second route, especially if you are dealing with very large volumes of records, so as not to tie up the database with a continuous stream of calls to delete individual records.
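As a rough sketch of that second route, the decision can be made in one set-based pass over the copied order data, and the resulting IDs grouped into batches so that each batch becomes a single delete call. OrderRow, FindStaleCustomerIds and ToBatches are hypothetical names, and the in-memory LINQ query only stands in for the "one large query" that would normally run against the copied database.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical row shape for the copied order data.
Public Class OrderRow
    Public Property CustomerId As Integer
    Public Property OrderDate As Date
End Class

Public Module StaleCustomerQuery
    ' One pass over all orders: a customer is stale when every one of their
    ' orders is on or before the cutoff, i.e. the most recent order is too.
    Public Function FindStaleCustomerIds(orders As IEnumerable(Of OrderRow), cutoff As Date) As List(Of Integer)
        Return orders.GroupBy(Function(o) o.CustomerId).
                      Where(Function(g) g.All(Function(o) o.OrderDate <= cutoff)).
                      Select(Function(g) g.Key).
                      OrderBy(Function(id) id).
                      ToList()
    End Function

    ' Splits the IDs into batches so that each batch can be sent to the
    ' database as one delete statement instead of one call per record.
    Public Function ToBatches(ids As IList(Of Integer), batchSize As Integer) As List(Of List(Of Integer))
        If batchSize < 1 Then Throw New ArgumentOutOfRangeException("batchSize")

        Dim batches As New List(Of List(Of Integer))
        For start As Integer = 0 To ids.Count - 1 Step batchSize
            batches.Add(ids.Skip(start).Take(batchSize).ToList())
        Next
        Return batches
    End Function
End Module
```

Note that customers with no orders at all never appear in the order data, so they would need to be handled separately if they should also count as stale.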


A third approach is to use a multi-agent messaging system. This approach is ideal for very complex scenarios.

Here's the scenario:

The user runs a command to delete an object (an order, a customer, etc.). The tool the user is working with puts a message on a work queue that represents the user's intent (for example, "Delete Customer 123").

The message is processed by one or more agents. Each agent is responsible for one part of the larger operation and listens only for the messages relevant to it. All agents work within a single distributed transaction. This means that each agent has a very narrow, specific scope, but any agent can veto the overall operation. If an agent needs other sub-tasks to be performed, it can queue additional messages for those operations (such as deleting each order owned by the customer).

This approach works very well, especially for very complex interactions. It avoids having any one system that needs to know about all the others. Each agent knows which messages it handles and performs a very specific task associated with each message.
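The flow described above can be sketched in a single process as follows. This is a toy illustration, not a real messaging system: WorkMessage, IAgent, Dispatcher, CustomerAgent and OrderAgent are hypothetical names, the queue is an in-memory Queue, and the distributed transaction is imitated by collecting pending actions and discarding them all if any agent vetoes.

```vbnet
Imports System
Imports System.Collections.Generic

Public Class WorkMessage
    Public Property Kind As String        ' e.g. "DeleteCustomer"
    Public Property TargetId As Integer
End Class

' An agent handles one kind of message. It may queue follow-up messages,
' and it returns False to veto the whole operation.
Public Interface IAgent
    ReadOnly Property MessageKind As String
    Function Handle(msg As WorkMessage, workQueue As Queue(Of WorkMessage),
                    pending As List(Of String)) As Boolean
End Interface

Public Class Dispatcher
    Private ReadOnly _agents As New List(Of IAgent)

    Public Sub Register(agent As IAgent)
        _agents.Add(agent)
    End Sub

    ' Returns the actions to commit, or an empty list if any agent vetoed.
    Public Function Run(first As WorkMessage) As List(Of String)
        Dim workQueue As New Queue(Of WorkMessage)
        Dim pending As New List(Of String)
        workQueue.Enqueue(first)
        While workQueue.Count > 0
            Dim msg As WorkMessage = workQueue.Dequeue()
            For Each agent As IAgent In _agents
                If agent.MessageKind <> msg.Kind Then Continue For
                If Not agent.Handle(msg, workQueue, pending) Then
                    Return New List(Of String)   ' veto: nothing is committed
                End If
            Next
        End While
        Return pending
    End Function
End Class

Public Class CustomerAgent
    Implements IAgent

    Private ReadOnly _ordersByCustomer As Dictionary(Of Integer, List(Of Integer))

    Public Sub New(ordersByCustomer As Dictionary(Of Integer, List(Of Integer)))
        _ordersByCustomer = ordersByCustomer
    End Sub

    Public ReadOnly Property MessageKind As String Implements IAgent.MessageKind
        Get
            Return "DeleteCustomer"
        End Get
    End Property

    Public Function Handle(msg As WorkMessage, workQueue As Queue(Of WorkMessage),
                           pending As List(Of String)) As Boolean Implements IAgent.Handle
        ' Queue one sub-task per order instead of deleting orders directly.
        Dim orderIds As List(Of Integer) = Nothing
        If _ordersByCustomer.TryGetValue(msg.TargetId, orderIds) Then
            For Each orderId As Integer In orderIds
                workQueue.Enqueue(New WorkMessage With {.Kind = "DeleteOrder", .TargetId = orderId})
            Next
        End If
        pending.Add("delete customer " & msg.TargetId.ToString())
        Return True
    End Function
End Class

Public Class OrderAgent
    Implements IAgent

    Private ReadOnly _lockedOrders As HashSet(Of Integer)

    Public Sub New(lockedOrders As HashSet(Of Integer))
        _lockedOrders = lockedOrders
    End Sub

    Public ReadOnly Property MessageKind As String Implements IAgent.MessageKind
        Get
            Return "DeleteOrder"
        End Get
    End Property

    Public Function Handle(msg As WorkMessage, workQueue As Queue(Of WorkMessage),
                           pending As List(Of String)) As Boolean Implements IAgent.Handle
        ' A locked order vetoes the whole operation.
        If _lockedOrders.Contains(msg.TargetId) Then Return False
        pending.Add("delete order " & msg.TargetId.ToString())
        Return True
    End Function
End Class
```

Neither agent knows about the other: CustomerAgent only queues "DeleteOrder" messages, and a new agent for another system could be registered without changing either class.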

It takes some effort to set up at first, but it is very extensible (you can add new agents, messages, etc., without affecting the existing ones).

If you decide to take this approach, look at MassTransit as the framework (there are others). If you work in .NET, it is a very good system: powerful, yet approachable. Its sagas are especially good for coordinating complex interactions between several agents.


Source: https://habr.com/ru/post/1499014/

