What is the best way to manage multi-tenant storage in Azure?

We are building an application with multiple tenants, where each tenant's data must be kept separate. Each tenant will save various documents, each of which can fall into several different document categories. We plan to use Azure Blob Storage for these documents. However, given our user base and the number and size of the documents, we are not sure how best to manage the storage accounts in our current Azure subscription.

Here are some numbers to consider. With 5,000 users, each saving 27,000 documents of 8 MB per year, that works out to 5,000 × 27,000 × 8 MB = 1,080 TB per year. A storage account is capped at 500 TB.

So my question is: what would be the most efficient and economical way to store this data while staying within Azure?

Here are a few options we considered:

  • Create a storage account for each tenant. This does not work because a subscription can have only 100 storage accounts (otherwise this would be the ideal solution).

  • Create a blob container for each tenant. A storage account can hold up to 500 TB, so this could potentially work, except that eventually we would have to spill over into additional accounts. I am not sure how that would work if a tenant ends up with data in two accounts. It could get messy.

Perhaps we are missing something fundamentally simple here.

UPDATE: At the moment we are thinking of using Azure Table storage, with a table for each type of document. In each table, the partition key will be the tenant identifier and the row key will be the document identifier. Each row will also hold metadata for the document, as well as a URI (or similar) that points to the blob itself.
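A minimal sketch of that layout with the azure-data-tables Python SDK; the table name, metadata fields, and identifiers are illustrative assumptions, not part of the design above:

```python
from azure.data.tables import TableServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"
tenant_id = "9f1c2d3e4b5a46789abcdef012345678"  # hypothetical tenant GUID
document_id = "doc-000123"                      # hypothetical document id

service = TableServiceClient.from_connection_string(CONNECTION_STRING)
# One table per document type, e.g. invoices (the name is illustrative).
table = service.create_table_if_not_exists("InvoiceDocuments")

# PartitionKey = tenant id, RowKey = document id; remaining columns are metadata.
table.create_entity({
    "PartitionKey": tenant_id,
    "RowKey": document_id,
    "FileName": "q3-report.pdf",
    "Category": "financial",
    "BlobUri": "https://<account>.blob.core.windows.net/<container>/doc-000123",
})

# All documents of this type for one tenant come back as a single partition scan.
for doc in table.query_entities(f"PartitionKey eq '{tenant_id}'"):
    print(doc["RowKey"], doc["FileName"])
```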

+8
2 answers

Not quite an answer, but think of it as "food for thought" :). Basically, each storage account has scalability targets, and your architecture should be designed so that you do not exceed them, to ensure the storage stays highly available to your application.

Some recommendations:

  • Start by creating multiple storage accounts (say 10 for starters). Call them pods.
  • Each tenant is assigned one of these pods. You can pick the storage account at random or use some predefined logic; either way, store the pod assignment along with the rest of the tenant's information.
  • From the description, it seems that at the moment you keep all file information in a single table. That concentrates the load on one table / storage account, which is not a scalable design IMHO. Instead, when a tenant is provisioned, assign a pod to the tenant and create a table for that tenant which will store its file information. This has the following advantages: 1) each tenant's data is well isolated; 2) read requests are load-balanced across accounts, which helps you stay within the scalability targets; and 3) since each tenant's data is in its own table, the PartitionKey is freed up and you can assign it a different value if needed. (A sketch of this provisioning step follows the list.)
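A minimal sketch of that provisioning step in Python, assuming ten pre-created pod storage accounts; the account names, connection strings, table-name prefix, and the persistence call are all hypothetical:

```python
import random
from azure.data.tables import TableServiceClient

# Hypothetical pod pool: ten pre-created storage accounts and their connection strings.
PODS = {f"contosopod{i:02d}": f"<connection-string-for-pod-{i:02d}>" for i in range(10)}

def provision_tenant(tenant_id: str) -> str:
    """Assign a pod to a new tenant and create its per-tenant file table."""
    pod = random.choice(list(PODS))  # or any predefined assignment logic

    # Table names must be alphanumeric and start with a letter,
    # so use the tenant GUID without hyphens behind a fixed prefix.
    service = TableServiceClient.from_connection_string(PODS[pod])
    service.create_table_if_not_exists(f"Files{tenant_id.replace('-', '')}")

    # Persist (tenant_id -> pod) in the tenant directory so later lookups
    # know which account to hit.
    # save_tenant_pod(tenant_id, pod)  # hypothetical persistence call
    return pod
```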

Now for storing the files themselves:

  • Again, you can go with the pod concept, so that the files for each tenant live in that tenant's pod storage account.
  • If you see problems with that approach, you can instead pick a pod storage account at random, put the file there, and save the blob URL in the Files table.
  • You can either go with a single blob container (say tenant-files) or with separate containers for each tenant.
  • With just one blob container for all tenants, the management overhead is lower, since you only need to create this container when a new pod is introduced. The disadvantage is that you cannot logically separate the tenants' files, so if you want to provide direct access to files (using a shared access signature), that becomes problematic.
  • With separate blob containers for each tenant, the management overhead is higher, but you get good logical isolation. In this case, when a tenant comes on board, you need to create a container for that tenant in each storage account; similarly, when a new pod is provisioned, you must make sure a container is created in it for every tenant in the system. (See the onboarding sketch after this list.)
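A minimal sketch of that onboarding step, reusing the hypothetical pod pool from the previous sketch (the container-naming rule is an assumption):

```python
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

def onboard_tenant_containers(tenant_id: str, pods: dict) -> None:
    """Create this tenant's blob container in every pod storage account."""
    # Container names must be lowercase (3-63 chars); the hyphen-less GUID fits.
    container_name = tenant_id.replace("-", "").lower()
    for conn_str in pods.values():
        service = BlobServiceClient.from_connection_string(conn_str)
        try:
            service.create_container(container_name)
        except ResourceExistsError:
            pass  # container already exists in this pod; nothing to do
```

Running the same loop over all tenants whenever a new pod is provisioned covers the second case the answer mentions.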

Hope this gives you some insight into how you can design your solution. We use some of these concepts in our own solution (which uses Azure Storage extensively as the data store). It would be really interesting to see what architecture you come up with.

+9

I am going to describe my own take on the topic, and it overlaps somewhat with Gaurav Mantri's answer. It is based on a design I came up with while building something very similar at my current job.

Azure Blob Storage

  • Randomly select a pod from the pod pool when a tenant is created, and save its namespace along with the tenant information.

  • Provide an API for creating containers, where container names are composed from the tenant identifier: Guid::ToString("N") + <resourcename>. You do not need to present these to your users as containers; they can be folders, work sets, or a file area, whatever name you settle on. (The naming scheme is sketched below, after this list.)

  • Provide an API for storing documents in these containers.

This means you can simply grow the pod pool if you get more tenants, and stop assigning new tenants to pods that are filling up.
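A minimal sketch of that naming scheme in Python; the example resource name and the length check are assumptions:

```python
import uuid

def container_name(tenant_id: uuid.UUID, resource_name: str) -> str:
    """Compose a container name: tenant GUID ("N" format) + resource name.

    uuid.hex matches Guid::ToString("N"): 32 lowercase hex digits, no hyphens.
    Container names must be lowercase and at most 63 characters, so the
    resource part can be at most 31 characters here.
    """
    name = tenant_id.hex + resource_name.lower()
    if len(name) > 63:
        raise ValueError("resource name too long for a container name")
    return name

# e.g. "9f1c2d3e4b5a46789abcdef012345678invoices"
print(container_name(uuid.uuid4(), "invoices"))
```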

The advantage of this is that you do not need to maintain two systems for your data, using both Table storage and Blob storage. Blob storage already has a way to present data as a directory / file hierarchy.

Extension points

Blob Storage API broker

In addition to the design above, I built OWIN middleware that sits between clients and Blob storage, basically just relaying requests from clients to Blob storage. This layer is optional, since you could instead hand out ordinary SAS tokens and let clients talk to Blob storage directly. But it makes it easy to hook in whenever actions are performed on files. Each tenant gets its own file endpoint, files/tenantid/<resourcename>/.

Such an API also lets you plug into whatever token-based authentication system you already use to authenticate and authorize incoming requests, and then sign the outgoing requests inside this API.
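For the direct-access alternative mentioned above, a minimal sketch of handing out a short-lived, read-only SAS URL with the Python SDK (account name, key, and lifetime are assumptions):

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

ACCOUNT_NAME = "contosopod00"  # hypothetical pod account
ACCOUNT_KEY = "<account-key>"

def read_only_url(container: str, blob_name: str) -> str:
    """Issue a one-hour, read-only SAS URL for a single blob."""
    sas = generate_blob_sas(
        account_name=ACCOUNT_NAME,
        container_name=container,
        blob_name=blob_name,
        account_key=ACCOUNT_KEY,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=1),
    )
    return (f"https://{ACCOUNT_NAME}.blob.core.windows.net/"
            f"{container}/{blob_name}?{sas}")
```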

Blob Storage Metadata

Using the API broker extension above, combined with blob metadata, you can go one step further: modify incoming requests to always include metadata, and filter the XML that Blob storage returns before sending it on to clients, in order to filter containers or blobs. For example, when a user deletes a blob, set x-ms-meta-status:deleted on it and filter such blobs out of listings; various routines can then actually delete the data behind the scenes.
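A minimal sketch of that soft-delete convention with the Python SDK; the status values come from the answer, everything else (connection string, container, helper names) is an assumption:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<pod-connection-string>")
container = service.get_container_client("<tenant-container>")

def soft_delete(blob_name: str) -> None:
    """Mark a blob as deleted instead of removing it; a background job purges later."""
    # Note: set_blob_metadata replaces all metadata on the blob.
    container.get_blob_client(blob_name).set_blob_metadata({"status": "deleted"})

def list_visible_blobs():
    """List blobs, filtering out anything marked deleted (or hidden)."""
    for blob in container.list_blobs(include=["metadata"]):
        if (blob.metadata or {}).get("status") not in ("deleted", "hidden"):
            yield blob
```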

You need to be careful here: you do not want too much logic in this layer, since it adds a penalty to every request, but done smartly it can work very nicely.

These extensions also let your users create "empty" subfolders inside a container: plant a zero-byte file with status: hidden, which is likewise filtered out. (Remember that Blob storage only shows virtual folders if there is something in them.) The same could also be achieved with Table storage. (A sketch follows.)
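A sketch of the placeholder trick, under the same assumptions as the previous snippet (the marker file name is made up):

```python
def create_empty_folder(folder_path: str) -> None:
    """Make a virtual folder visible by planting a hidden zero-byte placeholder."""
    container.upload_blob(
        name=f"{folder_path.rstrip('/')}/.placeholder",  # hypothetical marker name
        data=b"",
        metadata={"status": "hidden"},  # filtered out by list_visible_blobs() above
    )
```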

Azure Search

Another important extension point is that each blob can be indexed in Azure Search to make its content findable, and this is probably my favorite. I do not see any good way, using only Blob storage or Table storage, to get good search functionality or, to some extent, even good filtering. Azure Search gives users a truly rich experience for finding their content again.
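A minimal sketch of pushing a document into an Azure Search index with the Python SDK; it assumes an index with these fields already exists (service name, index name, fields, and keys are all illustrative):

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search = SearchClient(
    endpoint="https://<service>.search.windows.net",
    index_name="tenant-documents",  # hypothetical index
    credential=AzureKeyCredential("<admin-key>"),
)

# Index the document's text and metadata alongside the blob upload.
search.upload_documents([{
    "id": "doc-000123",  # key field of the index
    "tenantId": "9f1c2d3e4b5a46789abcdef012345678",
    "fileName": "q3-report.pdf",
    "content": "extracted text of the document...",
}])

# Later: full-text search scoped to a single tenant via a filter.
results = search.search("quarterly revenue",
                        filter="tenantId eq '9f1c2d3e4b5a46789abcdef012345678'")
```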

Snapshots

Another extension is taking a snapshot automatically every time a file changes. This becomes even easier with the API broker; otherwise, monitoring the storage logs is an option.
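The snapshot itself is a single SDK call; a sketch of the hook the broker could run after each successful write, reusing the container client from the metadata sketch above (the hook name is hypothetical):

```python
def on_file_changed(blob_name: str) -> None:
    """Hook the broker could invoke after each successful write."""
    blob = container.get_blob_client(blob_name)
    snapshot = blob.create_snapshot()  # read-only, point-in-time copy of the blob
    print("snapshot taken at", snapshot["snapshot"])
```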

These ideas come from a project I started and wanted to share, but since I am busy at work in the coming months, I do not see myself releasing it before the summer holidays give me time to finish. The project's motivation is to provide a NuGet package that lets other developers quickly set up the API broker mentioned above and configure a multi-tenant storage solution.

Please vote this answer up if you read this and believe such a project could save you time in your current development process. That way I can tell whether it is worth putting more time into the project or not.

I think Gaurav Mantri's answer is a better fit for the question above, but I just wanted to share my ideas on the topic.

+5
