Exchange 2013 DAG quorum lost

Today some maintenance had to be done on an Exchange 2013 mailbox server that was part of a 2-node cluster using a file server share as witness.

The Exchange server in question was disabled on our load balancer to drain connections. Next, the StartDagServerMaintenance.ps1 script was used to prevent new sessions and to fail over the mailbox databases to the other Exchange server.

These actions completed without errors and the server was ready to be shut down for maintenance. After it was shut down, the mailbox databases on the second Exchange server were dismounted and could not be mounted again. Uh-oh..

The mailbox databases could not be mounted because quorum was lost. I saw the following error when opening the Microsoft Failover Cluster Manager:

The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

The strange thing was that the file server hosting the witness share was fine and reachable.
Because the offline Exchange server could not be brought back online within a few minutes, I had to override the quorum safety and start the Cluster Service again using the ForceQuorum switch:

net start clussvc /fq

I got this command from the following Microsoft TechNet article: http://technet.microsoft.com/en-us/library/cc770620(v=ws.10).aspx

After running the command, the cluster was back online and the mailbox databases could be mounted again. Just before maintenance on the first Exchange server was completed, and before booting it up again, I disabled the Cluster Service on the secondary server because that server was running in ForceQuorum state. This was done to prevent data loss or corruption.

When the server was booted up again, I started the Cluster Service on both servers and everything returned to normal.

The lost quorum was probably caused by the cluster being configured with “Node Majority”, which isn’t a setting you want with 2 nodes =) In that mode only the nodes vote, so with one of the two nodes shut down the remaining node holds 1 of 2 votes, which is not a majority, no matter how healthy the witness share is.
Tomorrow we will investigate whether “Node and File Share Majority” is the better choice, which it probably is, since we are using a file server share as witness.
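
To make the vote counting concrete, here is a small Python sketch of the majority rule. It is only an illustration of the arithmetic, not how the Cluster Service is implemented; the function name has_quorum and the vote totals are my own assumptions for this 2-node setup.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Quorum requires a strict majority of all configured votes."""
    return votes_online > total_votes // 2


# Node Majority with 2 nodes: only the nodes vote, the witness share is ignored.
node_majority_total = 2
# Node and File Share Majority with 2 nodes: the witness adds a third vote.
node_and_share_total = 3

# One node is shut down for maintenance; the file share witness is reachable.
print(has_quorum(1, node_majority_total))   # surviving node only -> False
print(has_quorum(2, node_and_share_total))  # surviving node + witness -> True
```

With a witness vote counted, the surviving node plus the file share hold 2 of 3 votes, so shutting down one node for maintenance should not take the cluster down.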