In this article, I explain how global distribution works and how outages are handled in Azure Cosmos DB. My previous article discussed the basics of Cosmos DB.
How global distribution works:
Azure Cosmos DB is classified as a foundational service in Azure, which means it is available in all new regions by default. So we can distribute our database to upcoming regions as well.
Azure Cosmos DB separates databases into two types: the primary database (also called the write database) and the secondary database (also called the read database). We can select the primary and secondary regions based on our business needs. We should be careful while selecting regions, because every additional region adds to the cost.
What is a Primary Database:
The primary database is your main database; all inserts, updates, and deletes are done in the primary database. A primary database is mandatory and exists by default, which is why the location of the service acts as the write region.
The primary database is chosen automatically when you select a location while creating a new Cosmos DB service. Place your primary database where most of your clients are.
What is a Secondary Database:
The secondary database is a replica of the primary database. It is used to speed up data retrieval. It is not mandatory; we can add it based on our needs. We can select any number of read regions, each with a single click. But be cautious: every selected region adds cost. Data replication across regions around the globe is fast and durable, completing within a minute. That is the beauty of Cosmos DB.
Failover means recovering from a failure. Natural disasters are unpredictable and we cannot escape them, but our precautionary plans and steps help us recover from them without any loss. In the picture above, I selected South India as my primary (write) region. Suppose, for example, that this region is hit by a natural disaster and my data center has an outage. The application will be down, because its data lives in that same region. To avoid this type of problem, we must set up the failover feature in Cosmos DB.
To handle this situation, we use Replicate data globally. Once again, I must mention that a data center outage is a rare event.
How Failover works:
As I explained, we can have as many read data centers as we need. If the write data center fails, one of the read data centers takes over as the write data center, according to the priorities we assign to the read data centers. Priorities work from top to bottom (priority 1, 2, up to n). If the data center with the highest priority is not working, the next one in priority order is used.
We can accomplish failover in two ways:
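The top-to-bottom priority rule can be sketched in a few lines of Python. This is only a simulation of the idea, not the way Cosmos DB implements it internally; the function name `pick_write_region`, the region list, and the health flags are all assumptions made up for illustration.

```python
from typing import Dict, List, Optional


def pick_write_region(priorities: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first region, in priority order, that is reported healthy."""
    for region in priorities:
        # A region missing from the health map is treated as unavailable.
        if healthy.get(region, False):
            return region
    return None  # every region in the list is down


# Hypothetical priority list: the first entry is the current write region.
priorities = ["South India", "West India", "Southeast Asia"]

# Simulated outage in the primary region only.
health = {"South India": False, "West India": True, "Southeast Asia": True}

new_write_region = pick_write_region(priorities, health)
print(new_write_region)
```

With the primary region marked as down, the function walks down the list and returns the next region that is still healthy.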
1. Manual Failover
2. Automatic Failover
Let me explain how to implement automatic failover step by step.
1. Create a Cosmos DB service.
2. Set one write region and at least one read region.
3. Click Replicate data globally.
4. Then click Automatic Failover.
5. Switch Enable Automatic Failover to ON.
6. Drag and drop the regions to set our priorities and click OK.
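The same settings can also be scripted. The sketch below only builds the Azure CLI commands as argument lists and prints them; it does not run anything. It assumes the `az cosmosdb update` and `az cosmosdb failover-priority-change` commands of the Azure CLI, and the account name, resource group, and helper name `automatic_failover_commands` are hypothetical. Note that the CLI numbers priorities from 0, where 0 is the write region, so putting a different region first would trigger a manual failover rather than just reorder the read regions.

```python
import shlex
from typing import List


def automatic_failover_commands(account: str, resource_group: str,
                                regions_by_priority: List[str]) -> List[List[str]]:
    """Build (but do not run) the CLI commands for the portal steps above."""
    # Step 5: switch automatic failover on for the account.
    enable = [
        "az", "cosmosdb", "update",
        "--name", account,
        "--resource-group", resource_group,
        "--enable-automatic-failover", "true",
    ]
    # Step 6: one "region=priority" pair per region; priority 0 is the write region.
    policies = [f"{region}={priority}"
                for priority, region in enumerate(regions_by_priority)]
    reorder = [
        "az", "cosmosdb", "failover-priority-change",
        "--name", account,
        "--resource-group", resource_group,
        "--failover-policies", *policies,
    ]
    return [enable, reorder]


# Hypothetical account and resource group names.
commands = automatic_failover_commands(
    "my-cosmos-account", "my-resource-group",
    ["South India", "West India", "Southeast Asia"],
)
for command in commands:
    print(shlex.join(command))
```

Keeping the commands as lists makes it easy to review them first and pass them to `subprocess.run` later, once the names match a real account.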
Manual failover is available in addition to automatic failover. We can change the write region manually for a specific account, either in the Azure portal or programmatically.
Why we need Failover:
In enterprise applications, we need to provide compliance certification for Business Continuity and Disaster Recovery (BCDR) and High Availability and Disaster Recovery (HADR).
We can test the BCDR readiness of our applications that use Cosmos DB for storage by triggering a manual failover of our Cosmos DB account and/or adding and removing a region dynamically.
Predictable clock model:
If our application has predictable traffic patterns based on the time of day, we can periodically move the write region to the most active geographic region for that time of day.
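As a rough sketch of this idea, the function below maps the current UTC hour to the region that should hold the write role. The schedule, the hour boundaries, and the name `write_region_for` are assumptions chosen for illustration; a real schedule would come from your own traffic data, and the actual switch would still be done through a manual failover.

```python
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical schedule: (start hour, end hour, region) in UTC, end hour exclusive.
SCHEDULE: List[Tuple[int, int, str]] = [
    (0, 8, "Southeast Asia"),
    (8, 16, "West Europe"),
    (16, 24, "East US"),
]


def write_region_for(moment: datetime,
                     schedule: List[Tuple[int, int, str]] = SCHEDULE) -> str:
    """Return the region that should be the write region at the given time."""
    # Expects a timezone-aware datetime; it is converted to UTC first.
    hour = moment.astimezone(timezone.utc).hour
    for start, end, region in schedule:
        if start <= hour < end:
            return region
    raise ValueError(f"no region scheduled for hour {hour}")


# Region that would hold the write role right now under this schedule.
print(write_region_for(datetime.now(timezone.utc)))
```

A scheduled job could call this function periodically and trigger a manual failover only when the returned region differs from the current write region.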
In this article, we have seen how global distribution works and how it handles failover, along with the types of failover and the steps to implement them. In my next article, I plan to cover Cosmos DB consistency levels, how to set up multiple write data centers, and how to handle data duplication with multiple writes.