Thursday, July 30, 2020

Azure Cosmos DB - Partition keys


General
Just come out of a gig where Cosmos DB was the persistence engine of choice, using the SQL API. If you don't know much about Cosmos, see here.

Partition keys
One of the architectural decisions made by Microsoft confuses me. They have the notion of a logical partition, which can have a maximum size of 10GB. Any non-trivial Cosmos DB usage is therefore expected to arrange for objects/documents to be partitioned across multiple logical partitions.

Therein lies the rub. Cosmos DB won't work out any of this partitioning for you; it is entirely up to you to arrive at a satisfactory scheme. That means your implementation has to generate a partition key, one that reflects, as far as possible, the guidelines Microsoft share.

In the domain I was working in, a number of external organisations submitted many documents to the client. These submissions were spread over a number of years and would easily exceed the 10GB logical partition limit. One of the key guidelines from Microsoft is to avoid 'hot partitions', that is, a partition that gets used heavily to the exclusion of almost any other. This has serious performance implications: provisioned throughput is spread across partitions, so a hot partition can be throttled while the others sit largely idle.

So, given we don't want hot partitions, a partition key based on the year of each submission is ruled out, because there is strong locality of reference in play: the external organisations tend to focus on the most recent activity, so Cosmos activity would concentrate on a single partition for a whole year!
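To make the hot-partition concern concrete, here is a toy simulation in Python. It is a sketch under invented assumptions: the workload (100 hypothetical organisations, each making 9 submissions in the current year and 1 in an earlier year) and the helper names `partition_load` and `hottest_share` are illustrative only, not taken from the real system. It counts how many submissions would land on each logical partition key when the key is the year.

```python
from collections import Counter
from typing import Callable, NamedTuple


class Submission(NamedTuple):
    org_number: int
    year: int


def partition_load(submissions: list[Submission],
                   key_fn: Callable[[Submission], str]) -> Counter:
    """Count how many submissions map to each logical partition key."""
    return Counter(key_fn(s) for s in submissions)


def hottest_share(load: Counter) -> float:
    """Fraction of all traffic that lands on the single busiest key."""
    total = sum(load.values())
    return max(load.values()) / total if total else 0.0


# Hypothetical workload: 100 organisations, each making 9 submissions in the
# current year (2020) and 1 submission in one of the years 2011-2019.
submissions = [
    Submission(org, 2020) for org in range(1, 101) for _ in range(9)
] + [
    Submission(org, 2011 + org % 9) for org in range(1, 101)
]

# Year-based partition key: every current-year submission shares one key.
by_year = partition_load(submissions, lambda s: str(s.year))
print(by_year.most_common(1))
print(f"hottest share: {hottest_share(by_year):.0%}")
```

Under this made-up workload the "2020" key receives 900 of the 1,000 submissions, so 90% of the traffic hits one logical partition while the other nine keys share the remaining 10%.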

In the end, since each external organisation had a unique 'organisation number', we implemented an effective partitioning approach using a sequence/modulo scheme. Its operation is simple and works as follows:

  • An external organisation submits a JSON document via a REST API
  • On receipt, a document stored in Cosmos is found, or created, for that organisation number
  • This document has a sequence number and a modulo. We calculate sequence mod modulo.
  • We increment the sequence and save the organisation-specific document
  • We now have a pro forma partition key; for organisation 7633, we might have 7633-1, 7633-2 and so on
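The steps above can be sketched in Python as follows. This is a minimal sketch, not the client's implementation: an in-memory dictionary stands in for the Cosmos container holding the per-organisation documents, and the names `OrgCounter`, `InMemoryCounterStore` and `next_partition_key`, the modulo of 4, and starting a new sequence at 1 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OrgCounter:
    """Hypothetical per-organisation document holding the sequence and modulo."""
    org_number: int
    sequence: int
    modulo: int


class InMemoryCounterStore:
    """Stand-in for the Cosmos container that holds OrgCounter documents."""

    def __init__(self, default_modulo: int) -> None:
        self._docs: dict[int, OrgCounter] = {}
        self._default_modulo = default_modulo

    def find_or_create(self, org_number: int) -> OrgCounter:
        # Assumption: a new counter starts at sequence 1, so the first keys
        # read 7633-1, 7633-2, ... before wrapping round to 7633-0.
        doc = self._docs.get(org_number)
        if doc is None:
            doc = OrgCounter(org_number, sequence=1, modulo=self._default_modulo)
            self._docs[org_number] = doc
        return doc

    def save(self, doc: OrgCounter) -> None:
        self._docs[doc.org_number] = doc


def next_partition_key(store: InMemoryCounterStore, org_number: int) -> str:
    """Derive the partition key for an incoming submission from this organisation."""
    doc = store.find_or_create(org_number)   # find or create by organisation number
    bucket = doc.sequence % doc.modulo       # sequence mod modulo
    doc.sequence += 1                        # advance for the next submission
    store.save(doc)                          # persist the organisation-specific document
    return f"{doc.org_number}-{bucket}"


store = InMemoryCounterStore(default_modulo=4)
keys = [next_partition_key(store, 7633) for _ in range(6)]
print(keys)
```

With a modulo of 4 this yields 7633-1, 7633-2, 7633-3, 7633-0 and then cycles, so the organisation's submissions are dealt round-robin across four logical partitions. Against a real Cosmos container, the read-increment-save step would need something like ETag-based optimistic concurrency so that concurrent submissions do not lose increments.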

What this provides is partition counts that are bounded, yet not meaningfully limited: each organisation spreads over at most modulo logical partitions. By judicious selection of the modulo (in the case of my client, this was simply an integer), scalability is "assured".
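One way to reason about "judicious selection" is a little sizing arithmetic. The Python sketch below assumes that documents are of broadly similar size, so that the round-robin sequence spreads an organisation's data evenly across its buckets, and it only plans to fill each logical partition to a chosen fraction of the 10GB limit. The helper names, the 50% headroom, the 42GB organisation volume and the 500-organisation count are hypothetical inputs, not figures from the client.

```python
import math

LOGICAL_PARTITION_LIMIT_GB = 10  # per-logical-partition limit discussed above


def min_modulo(expected_org_volume_gb: float, headroom: float = 0.5) -> int:
    """Smallest modulo keeping every bucket under the limit.

    Assumes an even spread across buckets and plans to fill each logical
    partition to at most `headroom` of the limit.
    """
    usable_gb = LOGICAL_PARTITION_LIMIT_GB * headroom
    return max(1, math.ceil(expected_org_volume_gb / usable_gb))


def max_partition_count(org_count: int, modulo: int) -> int:
    """Upper bound on distinct logical partition keys: one per (organisation, bucket)."""
    return org_count * modulo


m = min_modulo(expected_org_volume_gb=42)  # hypothetical largest organisation
print(m, max_partition_count(org_count=500, modulo=m))
```

Here 42GB spread over buckets capped at 5GB needs a modulo of 9, and 500 organisations would then use at most 4,500 distinct partition keys: a fixed upper bound, but one far from any practical constraint.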
