Cosmos DB Replicas With Private Endpoint

This post explains how to make Cosmos DB replicas available using Private Endpoint.

The Problem

A lot of (most) Azure documentation and community content assumes that PaaS resources will be deployed using public endpoints. Some customers have the common sense not to use public endpoints – who wants to be a zero-day target for well-armed attackers?!

Cosmos DB is a commonly considered database for born-in-the-cloud workloads. One of the cool things about Cosmos DB is the ability to use any number of globally dispersed read-only or write replicas with pretty low replication latency.

But there is a question – what happens if you use Private Endpoint? The Cosmos DB account is created in a “primary” region. The Private Endpoint connects to a virtual network in the primary region. If the primary region goes offline (it does happen!) then how will clients redirect to another replica? Or if you are building a workload that will exist in many regions, how will a remote footprint connect to its local Cosmos DB replica?

I googled and landed on a Microsoft forum post that asked such a question. The answer was (in short) “The database will be available, how you connect to it is your and Azure Network’s problem”. Very helpful!

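Logically, what we want is for the clients in each region to reach their local Cosmos DB replica over a local Private Endpoint, and to be able to redirect to another replica if a region goes offline.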

What I Figured Out

I’ve deployed some Cosmos DB accounts using Private Endpoint as code (Terraform) in the recent past. I noticed that the DNS configuration was a little more complex than you usually find – I needed to create a Private DNS Zone for:

  • The Cosmos DB service type
  • Each Azure region that the replica exists in for that service type

I fired up a lab to simulate the scenario. I created a Cosmos DB account in North Europe. I replicated the Cosmos DB account to East US. I created a VNet in North Europe and connected the account to the VNet using a Private Endpoint.

Here’s what the VNet’s connected devices look like:

As you can see, the clients in/peered with the North Europe VNet can access their local replica and the East US replica via the local Private Endpoint.

I created a second VNet in East US. Now the important bit: I connected the same Cosmos DB account to the VNet in East US using a second Private Endpoint. When you check out the connected devices in the East US VNet, you can see that clients in/peered with the East US VNet can connect to the local and remote replicas via the local Private Endpoint:

DNS

Let’s have a look at the DNS configurations in Private Endpoints. Here is the one in North Europe:

If we enable the DNS zone configuration feature to auto-register the Private Endpoint in Azure Private DNS, then each of the above FQDNs will be registered and they will resolve to the North Europe NIC. Sounds OK.

Here is the one in East US:

If we enable the DNS zone configuration feature to auto-register the Private Endpoint in Azure Private DNS, then each of the above FQDNs will be registered and they will resolve to the East US NIC. Hmm.

If each region has its own Private DNS Zones then all is fine. If you use Private DNS zones per workload or per region then you can stop reading now.

But what if you have more than just this workload and you want to enable full name resolution across workloads and across regions? In that case, you probably (like me) run central Private DNS Zones that all Private Endpoints register with no matter what region they are deployed into. What happens now?

Here I have set up a DNS zone configuration for the North Europe Private Endpoint:

Now we will attempt to add the East US Private Endpoint:

Uh-oh! The records are already registered and cannot be registered again.
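
To show why the second registration fails, here is a toy Python model of a shared zone. It is only a sketch, not how Azure implements anything: the account name contoso, the endpoint names, and the IP addresses 10.1.0.6, 10.2.0.4 and 10.2.0.5 are made up for illustration, and the FQDN format is an assumption about how the regional names look. The point is that both Private Endpoints offer the same set of names, and a single zone can hold only one record per name.

```python
class SharedPrivateDnsZone:
    """Toy model of one central Private DNS Zone: one A record per name."""

    def __init__(self):
        self.records = {}  # FQDN -> IP address

    def auto_register(self, endpoint_name, fqdn_to_ip):
        """Register every FQDN an endpoint offers; refuse if any name is taken."""
        clashes = sorted(name for name in fqdn_to_ip if name in self.records)
        if clashes:
            raise ValueError(
                f"{endpoint_name}: already registered: {', '.join(clashes)}"
            )
        self.records.update(fqdn_to_ip)


# Hypothetical account name; assumed FQDN format for the global and regional names.
NAMES = [
    "contoso.documents.azure.com",
    "contoso-northeurope.documents.azure.com",
    "contoso-eastus.documents.azure.com",
]

# Each Private Endpoint offers the SAME names, mapped to its own local NIC IPs.
north_europe_pe = dict(zip(NAMES, ["10.1.0.4", "10.1.0.5", "10.1.0.6"]))
east_us_pe = dict(zip(NAMES, ["10.2.0.4", "10.2.0.5", "10.2.0.6"]))

zone = SharedPrivateDnsZone()
zone.auto_register("pe-northeurope", north_europe_pe)  # succeeds

try:
    zone.auto_register("pe-eastus", east_us_pe)  # every name is already taken
except ValueError as err:
    print(err)
```

In this model the zone still holds only the North Europe addresses after the failed attempt, which matches what the second DNS zone configuration runs into.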

WARNING: I am not a Cosmos DB expert!

It seems to me that using the DNS Zone configuration feature will not work for you in the globally shared Private DNS Zone scenario. You are going to have to configure DNS as follows:

  • The global account FQDN will resolve to your primary region.
  • The North Europe FQDN will resolve to the North Europe Private Endpoint. Clients in North Europe will use the North Europe FQDN.
  • The East US FQDN will resolve to the East US Private Endpoint. Clients in East US will use the East US FQDN.

This means that you must manage the DNS record registrations, either manually or as code:

  1. Register the account record with the “primary” resource/Private Endpoint IP address: 10.1.0.4.
  2. Register the North Europe record with the North Europe Private Endpoint IP: 10.1.0.5.
  3. Register the East US record with the East US Private Endpoint IP: 10.2.0.6.

This will mean that clients in one region that try to access another region (via failover) will require global VNet peering and NSG/firewall access to the remote Private Endpoint.
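
To make the three manual registrations concrete, here is a minimal Python sketch that builds the record set for one shared zone and picks the FQDN a client should use. The account name contoso, the region short names, the record naming (account, then account-region) and the documents.azure.com suffix are my assumptions for illustration. The helper functions are hypothetical and are not part of any Azure SDK. The IP addresses are the ones from the list above.

```python
def manual_records(account, primary_ip, regional_ips):
    """Return {record name: IP} for one shared zone, one record per name."""
    records = {account: primary_ip}  # global account record -> primary region
    for region, ip in regional_ips.items():
        # Regional record -> that region's own Private Endpoint IP.
        records[f"{account}-{region}"] = ip
    return records


def fqdn_for_client(account, client_region, regional_ips, suffix="documents.azure.com"):
    """Pick the regional FQDN when a local replica exists, else the global one."""
    if client_region in regional_ips:
        name = f"{account}-{client_region}"
    else:
        name = account
    return f"{name}.{suffix}"


# Assumed region short names; IPs taken from the numbered list above.
regional = {"northeurope": "10.1.0.5", "eastus": "10.2.0.6"}
records = manual_records("contoso", "10.1.0.4", regional)

for name, ip in records.items():
    print(f"{name:<22} A  {ip}")

print(fqdn_for_client("contoso", "eastus", regional))
```

The same mapping could be fed into whatever you use to manage DNS as code. The only design decision is that each name appears exactly once in the shared zone, so there is nothing left to collide.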