Partition key for Azure Table Storage

Two somewhat related questions.

1) Is there any way to get the identifier of the server on which a table entity lives?
2) Will using a GUID give me a better partition key distribution? If not, what will happen?

We have been struggling with Table Storage performance for a long time. In short, it is very bad. Early on we realized that a randomized partition key would spread entities across many servers, which is exactly what we want when we are trying to reach 8000 reads per second. Apparently our partition key was not random enough, so for testing purposes I decided to use a GUID. First impression: waaaaaay faster.

Really bad performance here means < 1000 per second. The partition key is Guid.NewGuid() and the row key is the constant "UserInfo". Gets are performed with a TableOperation using pk and rk, nothing more: TableOperation retrieveOperation = TableOperation.Retrieve(pk, rk); return cloudTable.ExecuteAsync(retrieveOperation); We always use indexed reads and never scan tables. Also, the VMs are Medium or Large, never smaller. Parallel no, async yes.

3 answers

As other users have noted, Azure tables are managed entirely by the runtime, so you cannot control or inspect which specific storage nodes process your requests. In addition, any given partition is served by a single server; that is, entities belonging to the same partition cannot be split across several storage nodes (see HERE):

In a Windows Azure table, the PartitionKey property is used as the partition key. All entities with the same PartitionKey value are clustered together and served from a single server node. This allows the user to control entity locality by setting PartitionKey values, and to perform Entity Group Transactions over entities in the same partition.

You mention that you are targeting 8000 queries per second. If so, you may be reaching a threshold that requires very good table/partition design. See "Azure Storage Abstractions and Scalability Targets".

The following excerpt applies to your situation:

This gives the following scalability targets for a single storage account created after June 7, 2012:

  • Capacity: up to 200 TB
  • Transactions: up to 20,000 entities/messages/blobs per second

As other users have pointed out, if your PartitionKey values follow an incremental pattern, the Azure runtime recognizes this and groups some of your partitions onto a single storage node.
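To make the incremental-key effect concrete, here is a toy Python model. It assumes, as a simplification and not as Azure's actual placement algorithm, that the key space is cut into contiguous lexical ranges, each served by one node; the split points in `RANGE_BOUNDARIES` are made up for illustration. Sequential keys pile into the same range, while keys with a hash prefix scatter across all of them.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

# Toy model (assumption): contiguous lexical key ranges, one node per range.
# These split points are hypothetical and give four ranges / nodes.
RANGE_BOUNDARIES = ["4", "8", "c"]

def node_for(partition_key: str) -> int:
    """Index of the range (node) whose key span contains partition_key."""
    return bisect_right(RANGE_BOUNDARIES, partition_key)

def sequential_keys(n: int) -> list:
    # Incremental keys: "000000", "000001", ... sort right next to each other.
    return [f"{i:06d}" for i in range(n)]

def hashed_keys(n: int) -> list:
    # Same ids with a 2-hex-digit hash prefix, so neighbouring ids scatter.
    keys = []
    for i in range(n):
        prefix = hashlib.md5(str(i).encode("utf-8")).hexdigest()[:2]
        keys.append(f"{prefix}-{i:06d}")
    return keys

print(Counter(node_for(k) for k in sequential_keys(1000)))  # Counter({0: 1000})
print(Counter(node_for(k) for k in hashed_keys(1000)))
```

In the real service the range boundaries may be rebalanced as load changes; the model only shows why keys that sort next to each other tend to end up on the same server.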

Also, if I understood your question correctly, you are currently assigning partition keys using a GUID? If so, every PartitionKey in your table is unique, so each partition holds no more than one entity. According to the articles above, an Azure table scales by grouping entities by partition key onto independent storage nodes. If your partition keys are unique and each contains no more than one entity, the Azure table would scale only one entity at a time! Now, we know Azure is not that dumb: it does group partition keys when it detects a pattern in the way they are created. So if you hit this trigger and Azure groups your partitions, your scalability is limited by how clever that grouping algorithm is.

According to the 2012 scalability targets outlined above, each partition should be able to give you 2,000 transactions per second. Theoretically, then, you would need at least four partition keys (assuming the workload is distributed evenly among the four).
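The arithmetic behind "at least four partitions" can be written down as a small helper. The 2,000 and 20,000 figures are the 2012 targets quoted in this answer; they are treated as inputs here, not as current service limits, and the function names are hypothetical.

```python
import math

# Targets quoted in the answer (2012); inputs, not guaranteed current limits.
PER_PARTITION_TPS = 2_000
PER_ACCOUNT_TPS = 20_000

def min_partitions(target_tps: int, per_partition_tps: int = PER_PARTITION_TPS) -> int:
    """Smallest number of evenly loaded partitions that can absorb target_tps."""
    return math.ceil(target_tps / per_partition_tps)

def fits_in_one_account(target_tps: int, per_account_tps: int = PER_ACCOUNT_TPS) -> bool:
    """Whether the whole workload stays under the per-account target."""
    return target_tps <= per_account_tps

print(min_partitions(8_000))       # 4
print(fits_in_one_account(8_000))  # True
```

The even-load assumption matters: if one partition receives more than its share, it saturates first and you need more partitions than this lower bound.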

I would suggest designing your own partition keys so that entities are grouped in a way that no partition receives more than 2,000 entities per second, and moving away from GUIDs as partition keys. This lets you make better use of features such as Entity Group Transactions, reduces the complexity of your table design, and should get you the performance you're looking for.
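One common way to design such keys is to hash a natural identifier into a fixed number of buckets and use the bucket as the PartitionKey. This is a hypothetical sketch, not the answer author's scheme: the bucket count (16) and the use of a user id are assumptions. Because the PartitionKey is derived from the user id, a reader can still compute both keys and issue a direct point read.

```python
import hashlib
from collections import Counter

# Hypothetical bucket count: choose it so expected load per bucket stays
# below the per-partition target.
BUCKETS = 16

def partition_key_for(user_id: str, buckets: int = BUCKETS) -> str:
    # Stable hash (Python's built-in hash() is randomized per process for
    # str), so the same user always maps to the same partition.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}"

def row_key_for(user_id: str) -> str:
    # Within a bucket, the user id itself identifies the entity.
    return user_id

pk, rk = partition_key_for("alice"), row_key_for("alice")
counts = Counter(partition_key_for(f"user{i}") for i in range(10_000))
print(len(counts))  # 16
```

Entity Group Transactions remain possible, but only among users that share a bucket; more buckets spread load further at the cost of smaller transactional groups.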


Answer #1: there is no concept of a server on which a specific table entity lives. Nothing is pinned to a particular server, since Table Storage is a massive multi-tenant storage system. So... there is no way to get a server ID for a given table entity.

Answer #2: choose a partition key that makes sense for your application. Just remember that it is the partition key plus the row key that addresses a given entity. If you use both, you get fast, direct reads. If you try to scan a table or partition, your performance will certainly take a hit.
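The difference between a point read and a scan can be shown with a toy in-memory table keyed by (PartitionKey, RowKey). This is only a model of the access pattern, not the Azure client API; the table contents and function names are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy stand-in for a table, keyed by (PartitionKey, RowKey).
# It models access patterns only; it is not the Azure client API.
Table = Dict[Tuple[str, str], dict]

def point_read(table: Table, pk: str, rk: str) -> Optional[dict]:
    # PartitionKey + RowKey known: one direct lookup.
    return table.get((pk, rk))

def scan(table: Table, predicate: Callable[[dict], bool]) -> Tuple[List[dict], int]:
    # Only a property filter known: every entity has to be examined.
    examined = 0
    hits = []
    for entity in table.values():
        examined += 1
        if predicate(entity):
            hits.append(entity)
    return hits, examined

table: Table = {
    (f"{i % 16:02d}", f"user{i}"): {
        "PartitionKey": f"{i % 16:02d}", "RowKey": f"user{i}", "Name": f"name{i}"}
    for i in range(1000)
}

print(point_read(table, "10", "user42")["Name"])  # name42
hits, examined = scan(table, lambda e: e["Name"] == "name42")
print(len(hits), examined)                        # 1 1000
```

Both calls find the same entity, but the scan touched all 1,000 entities to do it; against a remote service that work turns into latency and billed transactions.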


See http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx for more on choosing a key (note that the numbers are 3 years old, but the guidance is still good).

This talk may also be useful from a best-practices point of view: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/WAD-B406#fbid=lCN9J5QiTDF .

In general, a single partition can support up to 2,000 tps, so distributing data across partitions will help you reach higher numbers. One thing to keep in mind is that atomic batch transactions apply only to entities that share the same partition key. In addition, for small requests you may want to disable Nagle's algorithm, since small requests can otherwise be delayed at the client.
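Because atomic batches only cover entities with the same partition key, a writer typically groups its entities by PartitionKey first and then cuts each group into batch-sized chunks. This is a sketch of that grouping step only; the 100-entity cap reflects the Entity Group Transaction limit, the 4 MB payload limit is not checked here, and actually submitting each chunk (for example as a `TableBatchOperation` in the .NET client) is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Entity Group Transactions accept at most 100 entities, all sharing one
# PartitionKey. Payload size (4 MB cap) is not checked in this sketch.
MAX_BATCH = 100

def build_batches(entities: Iterable[dict], max_batch: int = MAX_BATCH) -> List[List[dict]]:
    """Group entities by PartitionKey, then cut each group into chunks."""
    by_partition: Dict[str, List[dict]] = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)
    batches: List[List[dict]] = []
    for group in by_partition.values():
        for start in range(0, len(group), max_batch):
            batches.append(group[start:start + max_batch])
    return batches

entities = [{"PartitionKey": "a", "RowKey": str(i)} for i in range(250)]
entities += [{"PartitionKey": "b", "RowKey": str(i)} for i in range(30)]
print([len(b) for b in build_batches(entities)])  # [100, 100, 50, 30]
```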

On the client side, I would recommend using the latest client libraries (2.1) and the Async methods, since you are issuing literally thousands of requests per second (the talk has several slides on client-side best practices).
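The idea behind using async methods at this request rate is to keep many reads in flight without dedicating a thread to each one. The sketch below shows that pattern with Python's `asyncio` rather than the .NET client the answer refers to; `fake_point_read` is a hypothetical stand-in that only sleeps, and the in-flight cap of 200 is an arbitrary assumption.

```python
import asyncio
import random

async def fake_point_read(pk: str, rk: str) -> dict:
    # Hypothetical stand-in for a network call such as a table point read.
    await asyncio.sleep(random.uniform(0.001, 0.005))
    return {"PartitionKey": pk, "RowKey": rk}

async def read_all(keys, max_in_flight: int = 200) -> list:
    # The semaphore caps how many requests are outstanding at once.
    sem = asyncio.Semaphore(max_in_flight)

    async def one(pk: str, rk: str) -> dict:
        async with sem:
            return await fake_point_read(pk, rk)

    # gather returns results in the same order as the input keys.
    return await asyncio.gather(*(one(pk, rk) for pk, rk in keys))

if __name__ == "__main__":
    keys = [(f"{i % 16:02d}", f"user{i}") for i in range(1000)]
    results = asyncio.run(read_all(keys))
    print(len(results))  # 1000
```

Bounding concurrency is a design choice: unbounded fan-out can exhaust client connections or trigger server throttling, while a cap keeps throughput high and predictable.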

Finally, the next storage service release will support JSON, including JSON without metadata, which will significantly reduce the response body size for the same entities and, in turn, the CPU cycles needed to parse them. If you use the latest client libraries, your application will be able to take advantage of this without code changes.


Source: https://habr.com/ru/post/1501986/

