After running a failover, it is common to find that your domain controller (DC) is not operating properly. Because these servers control the overall domain and DNS functionality, this can cause the rest of your protected services or applications to fail. You may not even be able to log in to your other servers, as they cannot authenticate or do not have a trust relationship with your domain controller.
Unfortunately, there can be various causes for issues that impact your domain on failover. It is recommended to view and export any relevant event viewer logs and open a case with Microsoft. However, before doing so, you may want to run through the following steps.
The first, and easiest, thing to check once you are logged in to a failed-over DC is the network profile. You can do this by hovering over the network icon in the system tray in the bottom right corner. Alternatively, you can open the Network and Sharing Center (pictured below) and view your active networks. In your production environment, you will most likely see that the network is named something like “domain.local” and is using the domain network profile.
After a failover, you may see that the DC has switched to the public network profile. This is a good indicator that the domain cannot be reached and that there may be other problems. Moreover, the Windows firewall may be enabled on the public network profile, which can block remote access to this server as well as communication with the rest of your failed-over environment.
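If the desktop is slow to load, the active profile can also be read from an elevated command prompt. The following is a minimal sketch that assumes the built-in netsh tool is available; the heading of the output names the profile currently in effect (Domain, Private, or Public), and the State line shows whether the firewall is on for that profile.

```bat
rem Show the firewall profile that is currently applied to this server.
netsh advfirewall show currentprofile
```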
You may be able to resolve this rather easily by restarting the Network Location Awareness (NLA) service. This will also automatically restart the Network List Service. Restarting these services allows Windows to try to re-identify the network it is attached to and, hopefully, switch to the domain network profile. Ensuring that your DC is on the domain network profile will also help the rest of the environment boot up with the domain network profile.
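The same restart can be done from an elevated command prompt. This sketch assumes the usual short service names, NlaSvc for Network Location Awareness and netprofm for the Network List Service; it is worth confirming them on your server with sc query before relying on them.

```bat
rem Stop NLA; /y also stops services that depend on it without prompting.
net stop NlaSvc /y

rem Start NLA again, then the Network List Service.
net start NlaSvc
net start netprofm

rem Re-check which profile is now active.
netsh advfirewall show currentprofile
```

If the second net start reports that the service has already been started, that message can be ignored.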
Before actually running a failover, it is recommended to change the NLA service startup type to “Automatic (Delayed Start)” (pictured below).
This gives the DNS and AD services extra time to start before Windows decides which network profile to assign to the server. Many times, the public network profile is assigned because DNS and AD services have not started yet. So, by setting a delayed start, you have a better chance of booting up to the correct domain network profile after a failover. Another recommendation is to make sure that the DC has itself configured as the primary DNS server. This may be the first server booting up in the failover environment, and it is likely that you would not have connectivity to other DNS or DC servers at this point. So, pointing to itself for DNS is the easiest way to enable hostname resolution for this server.
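Both preparation steps can be scripted on the production DC ahead of time. The sketch below is illustrative only: the adapter name "Ethernet" and the address 192.0.2.10 are placeholders for your own adapter name and the DC's IP address, and NlaSvc is assumed to be the short name of the NLA service.

```bat
rem Set NLA to Automatic (Delayed Start). The space after "start=" is required.
sc config NlaSvc start= delayed-auto

rem Point the DC at its own address for DNS.
rem "Ethernet" and 192.0.2.10 are placeholders, not values from this article.
netsh interface ipv4 set dnsservers name="Ethernet" source=static address=192.0.2.10 validate=no

rem Confirm both changes.
sc qc NlaSvc
netsh interface ipv4 show dnsservers
```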
In order for your domain controller to work properly, you will need to make sure that DNS is functioning as expected. As mentioned above, you may want to change the primary DNS server of the DC to its own IP address. This ensures the server is not trying to use a DNS server it cannot reach after a failover. Once you have confirmed that this domain controller has itself configured as the primary DNS server, you will want to confirm that DNS is actually working. You can run an nslookup command against your domain name to see whether it resolves correctly.
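As a rough example, the lookups might look like the following. Here domain.local stands in for your domain name and 192.0.2.10 for the DC's own IP address; both are placeholders. The second query is an extra check, not a required step: it asks for the SRV records that clients use to locate a domain controller.

```bat
rem Resolve the domain name, asking the DC itself for the answer.
nslookup domain.local 192.0.2.10

rem Optional: check that the domain controller locator records are being served.
nslookup -type=SRV _ldap._tcp.dc._msdcs.domain.local 192.0.2.10
```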
If this fails, you can open the DNS tool through Server Manager and ensure that it connects and starts up as expected. If not, you may need to review the DNS server configuration in your production environment. Stale DNS or forwarder entries that are no longer in use may be causing issues during the failover. While failed over, however, you can try restarting the DNS service to see if that resolves the issue. If DNS continues to fail, you can check the Event Viewer logs under “Applications and Services Logs > DNS Server”.
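The restart and the log check can also be done from an elevated command prompt. This sketch assumes the DNS Server role uses its default service name, DNS, and its default event log name, "DNS Server".

```bat
rem Check the state of the DNS Server service, then restart it.
sc query DNS
net stop DNS
net start DNS

rem Print the 20 most recent entries from the DNS Server event log, newest first.
wevtutil qe "DNS Server" /c:20 /rd:true /f:text
```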
Assuming you have confirmed that you are on the domain network profile and that DNS is working as expected, you can check that the SYSVOL folder has populated. The SYSVOL directory is the DC’s copy of the domain public files and contains the user login scripts, NETLOGON information, group policy data and more. If this folder has not populated, the domain will not have full functionality. You can verify that this folder has populated by opening a file browser and typing in: \\<DomainControllerIP/Hostname>\SYSVOL. This should show a directory with your domain name. If you are not able to reach this folder, or if nothing is populated, your domain controller may not be in authoritative mode.
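The same check works from a command prompt, which can be quicker over a slow remote session. In this sketch DC01 is a placeholder for the DC's hostname or IP address.

```bat
rem List the shares on this server; a healthy DC shares both SYSVOL and NETLOGON.
net share

rem Browse the SYSVOL share; a folder named after your domain should be listed.
dir \\DC01\SYSVOL
```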
To resolve this, you can make a quick registry change to manually force this DC into authoritative mode. Before you continue, please use caution when making changes to the registry. Also, this change is only necessary on the failed-over domain controller; you will not need to run through this process on the production domain controller. First, open a command prompt as an administrator and run the following command: Net Stop NTFRS. Once that completes, open regedit and browse to the following location: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup. Right-click on the BurFlags value and choose “Modify.” Set the value data to D4 (hexadecimal) and then click “OK.”
Back at your command prompt, type in the following command: Net Start NTFRS.
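For reference, the whole procedure can be run from one elevated command prompt instead of regedit. Treat this as a sketch rather than a definitive script: it assumes SYSVOL is replicated by the File Replication Service (as the NTFRS commands above imply), and the KEY variable is only a convenience introduced here to keep the lines short. Run it on the failed-over DC only.

```bat
rem Convenience variable for the long registry path (not part of any tool).
set "KEY=HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup"

rem Stop the File Replication Service.
net stop NTFRS

rem Show the current value, then set BurFlags to hexadecimal D4.
reg query "%KEY%" /v BurFlags
reg add "%KEY%" /v BurFlags /t REG_DWORD /d 0xD4 /f

rem Start the service again so that it rebuilds SYSVOL.
net start NTFRS
```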
Once you have completed these steps, check the SYSVOL directory to confirm that the domain folder has populated. If it has not, you may want to try rebooting the domain controller and then review the Event Viewer logs for more information. The bottom line here is that if the SYSVOL folder is not populating, you will not have full domain functionality. You will want to ensure that this folder is populated before failing over the rest of the environment.
While the above topics are very common issues that can impact the domain on failover, there are other factors that may cause disruption or errors. If you have multiple domain controllers in your production environment, you may need to replicate and include all of them in your failover plan. If you are unable or do not want to include all DCs, you will want to ensure that you are at least failing over the DC that holds the FSMO roles. If you are unsure which DC holds which roles, you can run the following command: netdom query fsmo. This will display the server that holds each FSMO role. This server needs to be replicated and failed over first to allow domain functionality.
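Run from a command prompt on any production DC, the check might look like this. The nltest line is an optional extra that lists every DC in the domain, which can help when deciding what to include in the failover plan; domain.local is a placeholder for your domain name.

```bat
rem List the holder of each of the five FSMO roles.
netdom query fsmo

rem Optional: list all domain controllers in the domain.
nltest /dclist:domain.local
```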
You may also want to review the DNS servers that are configured on the other servers in your production environment. While we have already gone over the importance of your domain controller pointing to itself for DNS, you also want to ensure that the rest of your servers are pointing to your DC. After you fail over the DC, you may then fail over an Exchange or file server. However, if that server uses a different DNS server that has not been failed over, it may not be able to reach the domain in the failover environment. Before failing over, it is best to confirm that the DC/DNS server you are failing over is the same DNS server used by your other protected servers.
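One way to audit this ahead of time is to run the following on each protected server. It is only a sketch, with domain.local standing in for your domain name; the DNS server reported by both commands should be the DC you plan to fail over.

```bat
rem List the DNS servers configured on each network interface.
netsh interface ipv4 show dnsservers

rem Resolve the domain name using whichever DNS server is currently configured.
nslookup domain.local
```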
As mentioned at the beginning of this article, there are numerous issues or misconfigurations that can impact a domain during a failover event. While these checks and considerations can be helpful in many situations, they may not resolve all of your issues. A reboot is always a good idea, but the logs in Event Viewer will be very helpful in determining the root cause and steering you to a resolution.
Below is a diagram to recap all of the steps we discussed: