I recently deployed a new domain for a customer. Initially it had only one domain controller because we had to rush the network into production. We had a second machine on order, which I built a few days later and promoted last Friday. I was doing some remote work on Saturday when I got some alerts about AD replication. I’d already had some of those after the initial DCPromo, but I expect them after that first reboot, while the directory and SYSVOL replicate and FRS allows the server to become a DC for the first time.
I logged into the DCs and could find nothing wrong in the event logs. Absolutely nothing! AD replication, on the face of it, appeared fine. I then logged into DC2 and tried to force replication between DC1 and DC2, and that’s when I got an error: "Naming context is in the process of being removed or is not replicated from the specified server." Uh-oh!
How did SCOM notice this problem when the event logs didn’t? Installing a SCOM agent creates some containers and objects in the directory. The SCOM agents update those objects and measure how long it takes for the changes to replicate between the DCs that have agents on them. If that exceeds a pre-determined time then replication isn’t working correctly. It’s a perfect test.
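The idea behind that check can be sketched in a few lines of Python. This is only an illustration of the principle, not SCOM's actual implementation: the function name `replication_healthy`, its parameters, and the 15 minute threshold are all assumptions made up for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def replication_healthy(written_at: datetime,
                        seen_at: Optional[datetime],
                        now: datetime,
                        max_latency: timedelta) -> bool:
    """Judge one replication probe (hypothetical helper, not a SCOM API).

    written_at  -- when the test object was updated on the source DC
    seen_at     -- when the update was observed on the other DC,
                   or None if it has not arrived yet
    now         -- the time at which the check is evaluated
    max_latency -- the allowed replication delay (assumed value)
    """
    if seen_at is None:
        # The change has not arrived: still fine only while we are
        # inside the allowed window.
        return now - written_at <= max_latency
    # The change arrived: healthy if it arrived inside the window.
    return seen_at - written_at <= max_latency


if __name__ == "__main__":
    written = datetime(2009, 1, 10, 12, 0, 0)
    limit = timedelta(minutes=15)  # assumed threshold for illustration
    # Update never showed up on the other DC and an hour has passed.
    print(replication_healthy(written, None,
                              written + timedelta(hours=1), limit))
```

A probe like this catches the failure described here because it tests the end result (did the change arrive?) rather than relying on an error being logged.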
I tried to delete the KCC-generated connection object that was failing and replace it with a manual one via the GUI. That failed. So I resorted to using the Support Tools (SUPPORT.MSI on the Server 2003 CD) and REPADMIN. The complete guide is available to read.
The high level steps were as follows:
- DC1 was OK. DC2 was failing to replicate from DC1. I logged into DC2.
- I installed the support tools on DC2.
- I opened the support tools CMD.
- I ran "repadmin /showreps DC2" on DC2 to retrieve the GUIDs of DC1 and DC2.
- I used AD Sites and Services to manually delete any connection objects to replicate from DC1 to DC2 (found under DC2).
- I ran "repadmin /add "cn=configuration,dc=mydomain,dc=internal" <DC1 GUID>._msdcs.mydomain.internal <DC2 GUID>._msdcs.mydomain.internal".
- I forced a full replication using "repadmin /sync cn=configuration,dc=mydomain,dc=internal DC1 <DC2 GUID> /force /full".
- I refreshed AD Sites and Services under DC2 and forced a full replication there; everything was OK.
- I monitored SCOM for any more events. Everything was good.
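For reference, the three REPADMIN command lines from the steps above can be assembled with a small Python helper. This is a sketch that only builds and prints the commands; it does not run them. The helper name `build_repadmin_commands` is made up for the example, the GUIDs are placeholders to be replaced with the values reported by /showreps, and the argument order simply mirrors the commands as quoted in the steps.

```python
from typing import List


def build_repadmin_commands(naming_context: str,
                            dns_suffix: str,
                            dc1_guid: str,
                            dc2_guid: str) -> List[List[str]]:
    """Return the repadmin invocations from the steps, as argument lists.

    Hypothetical helper: nothing is executed, so the output can be
    reviewed before anything is typed into the support tools prompt.
    """
    # Step 1: list replication partners to find the DC GUIDs.
    show = ["repadmin", "/showreps", "DC2"]
    # Step 2: re-create the replication link for the naming context,
    # addressing each DC by its GUID-based _msdcs DNS name.
    add = ["repadmin", "/add", naming_context,
           f"{dc1_guid}._msdcs.{dns_suffix}",
           f"{dc2_guid}._msdcs.{dns_suffix}"]
    # Step 3: force a full sync of that naming context.
    sync = ["repadmin", "/sync", naming_context, "DC1", dc2_guid,
            "/force", "/full"]
    return [show, add, sync]


if __name__ == "__main__":
    commands = build_repadmin_commands(
        "cn=configuration,dc=mydomain,dc=internal",
        "mydomain.internal",
        "<DC1 GUID>",   # placeholder, take the real value from /showreps
        "<DC2 GUID>",   # placeholder, take the real value from /showreps
    )
    for cmd in commands:
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check the naming context and GUIDs before making any change to the replication topology.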
SCOM to the rescue!
Credit: Gary Olsen.