Azure Table Partitioning Strategy

I'm trying to come up with a DateTime-based partition key strategy that doesn't lead to the append-only write bottleneck often described in best practice guidance.

Basically, if you partition on something like YYYY-MM-DD, all writes for a given day will end up in the same partition, which will degrade write performance.

Ideally, a partition key should distribute writes evenly across as many partitions as possible.

To achieve this while still basing the key on a DateTime value, I need a way to assign DateTime values to buckets, where the number of buckets is a predefined number per time interval, say 50 per day. The assignment of a given row to a bucket should be as close to random as possible, but always the same for a given value, because I need to be able to find the correct partition again given the original DateTime value. In other words, something like a hash.

Finally, and critically, I need the partition keys to be consistent at some aggregate level. So while the DateTime values for a given interval, say one day, will be randomly distributed across X partition keys, all the partition keys for that day must fall within a single queryable range. That would let me query all the rows for my aggregate interval and then sort them by DateTime to get the correct order.

Thoughts? This seems like a fairly well-known problem that must already have been solved.

+2
2 answers

How about using the millisecond component of the timestamp, mod 50? That will give you a random distribution throughout the day, the value itself will be deterministic, and you can easily calculate the PartitionKey later, given the original timestamp.
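A minimal sketch of what this could look like in C# (the "yyyyMMdd-NN" key format and the helper name are illustrative assumptions, not part of the answer):

    using System;

    static class PartitionKeys
    {
        // Illustrative: derive a deterministic PartitionKey from a DateTime.
        // The day prefix keeps all of a day's keys in one contiguous range,
        // while the millisecond component mod 50 spreads writes across 50
        // buckets within that day.
        public static string GetPartitionKey(DateTime value)
        {
            int bucket = value.Millisecond % 50;     // always 0..49, same for a given value
            return $"{value:yyyyMMdd}-{bucket:D2}";  // e.g. "20240115-37"
        }
    }

Because the bucket is recomputed from the original DateTime, the same value always maps to the same partition, and all 50 keys for a day sort between "yyyyMMdd-00" and "yyyyMMdd-49", which satisfies the range-query requirement.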

+3

To add to Eoin's answer, below is the code I used to simulate his solution:

    using System;
    using System.Collections.Generic;
    using System.Threading;

    var buckets = new SortedDictionary<int, List<DateTime>>();
    var rand = new Random();

    for (int i = 0; i < 1000; i++)
    {
        var dateTime = DateTime.Now;

        // Hash the timestamp into one of 53 buckets (a prime modulus
        // helps spread the Ticks values evenly).
        var bucket = (int)(dateTime.Ticks % 53);

        if (!buckets.ContainsKey(bucket))
            buckets.Add(bucket, new List<DateTime>());
        buckets[bucket].Add(dateTime);

        // Simulate requests arriving 0-20 ms apart.
        Thread.Sleep(rand.Next(0, 20));
    }

This should simulate roughly 1000 requests arriving anywhere from 0 to 20 milliseconds apart.

This produced a pretty good, even distribution across the 53 "buckets". As expected, it also avoided the append-only (or prepend-only) anti-pattern.
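For completeness, a sketch of how the aggregate-level range query could look with the Azure.Data.Tables SDK (the table name, connection string, and "EventTime" property are hypothetical placeholders):

    using System;
    using System.Linq;
    using Azure.Data.Tables;

    var tableClient = new TableClient(connectionString, "Events");  // hypothetical table

    // Assuming keys of the form "yyyyMMdd-NN": all PartitionKeys for
    // 2024-01-15 share the "20240115-" prefix, so a lexicographic range
    // filter covers every bucket for that day in a single query.
    string filter = "PartitionKey ge '20240115-' and PartitionKey lt '20240116-'";

    var rows = tableClient.Query<TableEntity>(filter)
        .OrderBy(e => e.GetDateTime("EventTime"))  // restore chronological order client-side
        .ToList();

Note that the sort has to happen client-side, since Table storage only returns rows ordered by PartitionKey and RowKey.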

0

Source: https://habr.com/ru/post/1501989/
