Exchange 2013 DAG quorum lost

Today some maintenance had to be done on an Exchange 2013 mailbox server that was part of a two-node DAG (database availability group) cluster using a file server share as witness.

First, the Exchange server was disabled on our load balancer to drain connections. Next, the StartDagServerMaintenance.ps1 script was used to block new sessions and to fail over the mailbox databases to the other Exchange server.

Both steps completed without problems and the server was ready to be shut down for maintenance. After the shutdown, however, the mailbox databases on the second Exchange server were dismounted and could not be mounted again. Uh-oh...

The mailbox databases could not be mounted because the cluster had lost quorum. When I opened Microsoft Failover Cluster Manager, I saw the following error:

The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

The strange thing was that the file server hosting the witness share was up and reachable.
Because the offline Exchange server could not be brought back online within minutes, I had to override the quorum safeguard and start the Cluster Service with the ForceQuorum switch:

net start clussvc /fq

I got this command from the following Microsoft TechNet article: http://technet.microsoft.com/en-us/library/cc770620(v=ws.10).aspx

After running the command, the cluster was back online and the mailbox databases could be mounted again. Just before the maintenance on the Exchange server was completed, and before booting it up again, I disabled the Cluster Service on the secondary server because that server was running in ForceQuorum state. This was done to prevent data loss or corruption.

When the server had booted up again, I started the Cluster Service on both servers and everything returned to normal.
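You can also script the disable and re-enable steps. The sketch below wraps them in two helper functions. The names `Disable-ClusSvcForMaintenance` and `Enable-ClusSvcAfterMaintenance` are made up for this example. It assumes an elevated session on the node being changed and uses only the built-in `Stop-Service`, `Set-Service` and `Start-Service` cmdlets against the `ClusSvc` service.

```powershell
# Hypothetical helpers; run in an elevated session on the node you want to change.
function Disable-ClusSvcForMaintenance {
    # Stop the Cluster Service on this node and keep it from starting on its own
    Stop-Service -Name ClusSvc -Force
    Set-Service -Name ClusSvc -StartupType Disabled
}

function Enable-ClusSvcAfterMaintenance {
    # Restore the default startup type and start the service normally (no /fq),
    # so the node joins the cluster under the regular quorum rules again
    Set-Service -Name ClusSvc -StartupType Automatic
    Start-Service -Name ClusSvc
}
```

Follow the order described above:

1. Run `Disable-ClusSvcForMaintenance` on the node that is in ForceQuorum state, before the other node boots.
2. Once the maintained server is back up, run `Enable-ClusSvcAfterMaintenance` on both nodes.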

The quorum was probably lost because the cluster is configured with “Node Majority”, which isn’t a setting you want with two nodes =)
Tomorrow we will investigate whether “Node and File Share Majority” is the better choice, which seems likely since we are using a file server share as witness.
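The vote arithmetic shows why. A failover cluster keeps quorum only while a strict majority of its configured votes is online. The figures below assume one vote per node and one for the file share witness. They ignore the dynamic quorum feature introduced in Windows Server 2012.

```latex
% Quorum: a strict majority of all n configured votes must be online
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1

% Node Majority: 2 node votes, one node shut down
n = 2,\quad q(2) = 2,\quad \text{votes online} = 1 < 2 \;\Rightarrow\; \text{quorum lost}

% Node and File Share Majority: 2 node votes + 1 witness vote, one node shut down
n = 3,\quad q(3) = 2,\quad \text{votes online} = 1 + 1 = 2 \ge 2 \;\Rightarrow\; \text{quorum kept}
```

With Node Majority on two nodes, shutting down either node drops the cluster below quorum. That is consistent with what we saw.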

Merging a 140GB Hyper-V 2008 R2 snapshot

Last week I was notified that one of the production LUNs of a customer running Hyper-V 2008 R2 was filling up. The cause was a ‘deleted’ snapshot of a production system.

Deleting a snapshot in Hyper-V 2008 R2 requires shutting down the VM to completely remove the snapshot (AVHD file) from your storage system, because the AVHD is only merged back into its parent VHD while the VM is off. Just removing the snapshot/checkpoint in Virtual Machine Manager is not sufficient: the AVHD file still exists and keeps growing until you shut the VM down. According to Microsoft, this is a feature.

The file had been growing for a few weeks and had reached 140GB. We roughly estimated that the storage system would sustain at least 15 MB/s, so merging that amount of data would take 2 to 3 hours. That meant 2 to 3 hours of downtime for this particular VM.
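Here is the estimate worked out, assuming 1 GB = 1024 MB and a constant throughput. The second line treats the 90 minutes reported below as the pure merge time. That gives the average throughput actually achieved.

```latex
% Estimated merge time at the assumed minimum throughput
t_{\text{est}} = \frac{140 \times 1024\ \text{MB}}{15\ \text{MB/s}} \approx 9557\ \text{s} \approx 159\ \text{min} \approx 2.7\ \text{h}

% Average throughput implied by a 90-minute merge
r_{\text{obs}} = \frac{140 \times 1024\ \text{MB}}{90 \times 60\ \text{s}} \approx 26.5\ \text{MB/s}
```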

Some people online claim that extra free space is required on the Cluster Shared Volume to merge the snapshot. This is not true.

Just to be sure, I backed up the VM right before shutting it down to start the merge. After office hours I shut down the VM and kept an eye on the merge progress using the following PowerShell command:

Get-WmiObject -Namespace "root\virtualization" -Query "select * from Msvm_ConcreteJob" | Where {$_.ElementName -eq 'Merge in Progress'}
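Instead of rerunning that query by hand, a small polling loop can report the progress at a fixed interval. This sketch makes the following assumptions:

- It runs in Windows PowerShell 2.0 or later on the Hyper-V 2008 R2 host.
- It queries the same `root\virtualization` namespace and job name as the command above.
- It reads the `PercentComplete` and `JobState` properties that `Msvm_ConcreteJob` inherits from `CIM_ConcreteJob`, using the standard CIM JobState values.

`Watch-HyperVMerge` is a hypothetical helper name.

```powershell
# Hypothetical helper: poll Hyper-V 2008 R2 (WMI v1 namespace) for a snapshot merge.
function Watch-HyperVMerge {
    param([int]$IntervalSeconds = 60)

    $seenMerge = $false
    while ($true) {
        # JobState 2..6 (New..Shutting Down) means the job is still active;
        # 7 and higher (Completed, Terminated, ...) means it has finished.
        $jobs = @(Get-WmiObject -Namespace 'root\virtualization' -Class Msvm_ConcreteJob |
            Where-Object { $_.ElementName -eq 'Merge in Progress' -and $_.JobState -le 6 })

        if ($jobs.Count -gt 0) {
            $seenMerge = $true
            foreach ($job in $jobs) {
                # PercentComplete is inherited from CIM_ConcreteJob
                Write-Host ('{0:HH:mm:ss}  merge at {1}%' -f (Get-Date), $job.PercentComplete)
            }
        }
        elseif ($seenMerge) {
            # A merge was active earlier and is no longer: assume it has finished
            Write-Host 'No active merge job left; the merge has most likely completed.'
            break
        }
        else {
            # The merge may take a few minutes to start after the VM is shut down
            Write-Host 'No merge job yet, waiting...'
        }
        Start-Sleep -Seconds $IntervalSeconds
    }
}
```

Start it with `Watch-HyperVMerge -IntervalSeconds 120` right after shutting the VM down. It waits for the merge job to appear, prints its progress, and exits once no active merge job is listed anymore.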

The merge started within 5 minutes of shutting the VM down, and after 15 minutes it had reached about 5 percent. In just 90 minutes the merge was completed and the VM was booted back up to restore functionality!

So, snapshots in Hyper-V 2008 R2 are still shit: they still require downtime, just not as much as we calculated. This ‘feature’ is gone in Hyper-V 2012, which merges deleted snapshots automatically while the VM keeps running 🙂

Buggy DNS resolution using Microsoft Forefront TMG 2010

I was experiencing very weird DNS issues with a Windows Server 2008 R2 machine.
When resolving external domain names, it would sometimes come back with a response and sometimes with a timeout.

I tested this with nslookup, using the server command to point it at the Google public DNS server, and tried to resolve www.microsoft.com:

nslookup
server 8.8.8.8
www.microsoft.com

> www.microsoft.com
Server: google-public-dns-a.google.com
Address: 8.8.8.8

DNS request timed out.
timeout was 2 seconds.
*** Request to google-public-dns-a.google.com timed-out
> www.microsoft.com
Server: google-public-dns-a.google.com
Address: 8.8.8.8

DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to google-public-dns-a.google.com timed-out
> www.microsoft.com
Server: google-public-dns-a.google.com
Address: 8.8.8.8

DNS request timed out.
timeout was 2 seconds.
Non-authoritative answer:
Name: lb1.www.ms.akadns.net
Address: 65.55.57.27
Aliases: www.microsoft.com
toggle.www.ms.akadns.net
g.www.ms.akadns.net

As you can see, only one of the three lookups returned an answer, and even that one needed a retry. Something was interfering with my DNS queries.
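The failure rate is easier to judge when each nslookup run is classified automatically. This sketch makes the following assumptions:

- nslookup prints the English messages shown above: `DNS request timed out` for each unanswered attempt, and a `Name:` line in an answer.
- It runs in Windows PowerShell 2.0 or later.

`Get-NslookupResult` is a hypothetical helper, not a built-in cmdlet.

```powershell
# Hypothetical helper: summarise one nslookup run from its text output.
function Get-NslookupResult {
    param([string]$Output)
    # nslookup prints this line once for every attempt that went unanswered
    $timeouts = ([regex]::Matches($Output, 'DNS request timed out')).Count
    # A successful lookup contains an answer with a "Name:" line
    $answered = $Output -match '(?m)^Name:\s'
    New-Object PSObject -Property @{
        Answered = $answered
        Timeouts = $timeouts
    }
}

# Example: run ten lookups against Google public DNS and count the answered ones
# (needs nslookup.exe and network access, so it is left commented out here):
# $runs = 1..10 | ForEach-Object { Get-NslookupResult ((nslookup www.microsoft.com 8.8.8.8 2>&1) | Out-String) }
# @($runs | Where-Object { $_.Answered }).Count
```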

In this scenario, Microsoft Forefront Threat Management Gateway 2010 (TMG 2010) was used.
The client, in this case a DNS server, was placed in the internal network and was NAT’d through the external interface of the TMG, which had public IP addresses.

Somehow, the queries were not reaching the external DNS server.
Running the same queries directly from the TMG server showed no issues at all.

So the problem had to do with the internal-to-external NAT and was specific to DNS traffic, because HTTP/HTTPS traffic was working without any trouble.

After some investigation, it turned out that NIS (Network Inspection System, part of TMG’s Intrusion Prevention System) was interfering with the queries; in our case, NIS was dropping them.
We added our DNS server to the NIS exclusion list and the resolution issue was gone!

Since we are already preparing to implement an alternative to TMG, we didn’t see the need to research this issue further.

Hopefully this will help some people resolve DNS issues with their clients behind TMG.

We will add NIS exclusions for all of our internal DNS servers to prevent DNS issues from arising in the future.

Windows Server 2012 DFS not replicating all files

In my test lab, some files were not replicating between two Windows Server 2012 file servers with the DFS Namespaces and DFS Replication roles installed.

This was caused by files with the temporary attribute (0x100) set. Some applications set it, for example when you download files from the internet. DFS Replication skips files that have this attribute.
You can check whether this is the case with the PowerShell command below; replace the path with your own.

# List files and folders that have the Temporary attribute (0x100) set
Get-ChildItem "C:\FolderX" -Recurse | Where-Object { ($_.Attributes -band 0x100) -eq 0x100 }

The command will show you all files with the attribute, per folder.

Next, to remove the attribute from these files, use the command below.

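# Clear only the Temporary attribute (0x100) and keep all other attributes as they are
Get-ChildItem "C:\FolderX" -Recurse | Where-Object { ($_.Attributes -band 0x100) -eq 0x100 } | ForEach-Object { $_.Attributes = [System.IO.FileAttributes]([int]$_.Attributes -band (-bnot 0x100)) }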
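Here is the same fix as a reusable sketch with a dry-run option. `Remove-TemporaryFlag` and `Clear-TemporaryAttribute` are hypothetical helper names. The bit arithmetic uses the .NET `System.IO.FileAttributes` enum, and `SupportsShouldProcess` gives the function the standard `-WhatIf` and `-Confirm` switches. It clears only the Temporary bit (0x100) and leaves every other attribute as it was. `-Force` is used so hidden files are checked too.

```powershell
# Hypothetical helper: return the attributes with only the Temporary bit cleared.
function Remove-TemporaryFlag {
    param([System.IO.FileAttributes]$Attributes)
    $temporary = [int][System.IO.FileAttributes]::Temporary   # 0x100
    [System.IO.FileAttributes]([int]$Attributes -band (-bnot $temporary))
}

# Hypothetical helper: find files under $Path with the Temporary attribute and clear it.
function Clear-TemporaryAttribute {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param([Parameter(Mandatory = $true)][string]$Path)
    $temporary = [int][System.IO.FileAttributes]::Temporary
    Get-ChildItem -Path $Path -Recurse -Force |
        Where-Object { ([int]$_.Attributes -band $temporary) -ne 0 } |
        ForEach-Object {
            # With -WhatIf this only reports the file; otherwise the bit is cleared
            if ($PSCmdlet.ShouldProcess($_.FullName, 'Clear Temporary attribute')) {
                $_.Attributes = Remove-TemporaryFlag $_.Attributes
            }
        }
}
```

First run `Clear-TemporaryAttribute -Path 'C:\FolderX' -WhatIf` to see which files would change. Then run it again without `-WhatIf`.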

After modifying my files, replication kicked in within seconds and all files were replicated.

Exchange 2013 Management Shell from Windows 8

I’m testing the new features of Exchange 2013 in my test lab to get familiar with the product, as we are probably going to use it in production.

While testing, I was wondering if it would be possible to manage Exchange 2013 remotely from my Windows 8 client.
Of course you can use the ECP (Exchange Control Panel), but managing your environment with PowerShell is more in line with how Microsoft sees Exchange management (and it’s cooler).

Installing the Exchange Management Shell on Windows 8 is not going to work (unless you are in the same AD domain as the Exchange server, correct me if I’m wrong).

So here I am, wanting to manage Exchange 2013 remotely from a Windows 8 client in a workgroup.

Back in the office, my Exchange-guru colleague saw me stumble and mumble and quietly sent me an email with a script. He wrote it on the fly, hoping my cranky face would turn into a happy one =)
He succeeded! And with only 11 lines of code... What a boss!

With all credit going to my colleague Jens Giessler, I am posting his script on my blog, hoping other people’s faces will turn into happy ones too.
Replace the placeholders (domain\user and exchangeserverfqdn) with your own credentials and Exchange server FQDN, and save the script as a .ps1 file to run it with PowerShell.

Before running the script, you also need to enable Basic authentication on the PowerShell virtual directory using the ECP (Servers menu, Virtual Directories tab).

Oh, and as with all my previous and future script postings: use them at your own risk.

#Functions
Function Query-Credentials
{
# Prompt for the password of this account; stored globally so Connect-Exchange can use it
$Global:Cred = Get-Credential -Credential domain\user
}

Function Connect-Exchange
{
# Open a remote session to the Exchange PowerShell endpoint over HTTPS using Basic authentication
$Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://exchangeserverfqdn/powershell/ -Credential $Cred -Authentication Basic -AllowRedirection
# Import the Exchange cmdlets from the remote session into the local console
Import-PSSession $Session
}

#Establish connection
Query-Credentials
Connect-Exchange
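Two optional additions may help; neither is part of Jens’ script, and both are sketches under stated assumptions:

- **Certificate checks (lab only).** In a lab, Exchange 2013 often still uses its self-signed certificate, which a workgroup client does not trust, so the HTTPS connection fails with a certificate error. `New-PSSessionOption` can skip those checks. This disables server authentication, so use it in a lab only.
- **Cleanup.** `Remove-PSSession` closes the remote session when you are done.

```powershell
# Lab only: skip certificate validation for Exchange's self-signed certificate.
# To use it, add -SessionOption $SessionOption to the New-PSSession line in Connect-Exchange.
$SessionOption = New-PSSessionOption -SkipCACheck -SkipCNCheck -SkipRevocationCheck

# When finished, close any remote Exchange sessions opened in this console
Get-PSSession | Where-Object { $_.ConfigurationName -eq 'Microsoft.Exchange' } | Remove-PSSession
```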

Corrupt Forefront TMG disk cache

While examining the event logs of one of our Forefront TMG servers, I noticed an error stating that the disk cache failed to initialize.

Event ID: 14176
Type: Error
Source: Microsoft Web Proxy
Description:
Disk cache Drive:\urlcacheDir1.cdat failed to initialize. Some errors were encountered when ISA Server restored specific data cache files. ISA Server will now attempt to recover these files. These errors may have occurred because there was not enough time to complete all necessary shutdown operations, when ISA Server was previously shut down. To avoid this in the future, you can increase the value of the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\WaitToKillServiceTimeout registry key. Identify the reason for cache failure by examining previous recorded events, or the error code. The error code in the Data area of the event properties indicates the cause of the failure (internal code: 503.6333.3.0.1200.166).

No functionality was lost, but the error caught my attention and I found a Microsoft KB that described this error:

http://support.microsoft.com/?scid=kb;en-us;887311

In my case, McAfee antivirus was running, and as the KB describes, you should exclude the disk cache directory in your virus scanner. I already had an exclusion for the on-access scanner, but none yet for the on-demand scan. The error occurred about 30 minutes after the on-demand scan had run.

I have now added the exclusion for the on-demand scan as well, which will hopefully prevent the error from appearing again.
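To see whether the error returns after the next on-demand scan, you can query the event log for event 14176. This sketch assumes the event lands in the Application log under the source name shown above (`Microsoft Web Proxy`). It should work with Windows PowerShell 2.0 on the TMG server. `Get-TmgCacheInitError` is a hypothetical helper name.

```powershell
# Hypothetical helper: list recent "disk cache failed to initialize" events (ID 14176).
function Get-TmgCacheInitError {
    param([int]$Days = 7)
    # SilentlyContinue hides the "No events were found" error when the log is clean
    Get-WinEvent -ErrorAction SilentlyContinue -FilterHashtable @{
        LogName      = 'Application'
        ProviderName = 'Microsoft Web Proxy'
        Id           = 14176
        StartTime    = (Get-Date).AddDays(-$Days)
    } | Select-Object TimeCreated, Id, Message
}
```

For example, run `Get-TmgCacheInitError -Days 2` after the next scheduled on-demand scan; no output means no new occurrences. `-ErrorAction SilentlyContinue` also hides other errors, such as a mistyped log name.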

Windows Server 2012 template for VMware

Some months ago, I created a Windows Server 2012 template for testing purposes and logged all of my actions to a notepad file.

I think this file may help some people create a clean Windows Server 2012 template on VMware. Most of the steps can be used in a Hyper-V environment as well. If you have any comments, I’m happy to hear them so I can improve the template. Have fun using it!

VM Configuration
– VMXNET3 Network Adapter
– Installing using en_windows_server_2012_x64_dvd_915478.iso
– 4 vCPU (2 sockets, 2 cores)
– 2GB RAM
– 60GB HDD (Thin)
– EFI BIOS
– Enable VMware Tools Time Synchronization
– Advanced Configuration Parameter 1: Isolation.tools.copy.disable false
– Advanced Configuration Parameter 2: Isolation.tools.paste.disable false
– Remove unused hardware (Floppy, USB etc)

BIOS Configuration
– Disable Serial Ports
– Change Boot order 1. CD 2. HDD, leave the rest default
– Disable floppy drive

OS Configuration
– Windows Server Core 2012 (No GUI)
– Language: English, Time and currency format: Dutch (Netherlands), Keyboard or input method: United States-International
– Enter license key

Install GUI for temporary configuration (Using ISO)
– Get-WindowsImage -ImagePath D:\Sources\install.wim
– Mkdir C:\MountDir
– Dism /mount-wim /WimFile:D:\Sources\install.wim /Index:2 /MountDir:C:\MountDir /readonly
– Install-WindowsFeature Server-Gui-Mgmt-Infra,Server-Gui-Shell -Restart -Source C:\MountDir\Windows\WinSxS

Install VMware Tools
– Use Guest>Install VMware Tools function and perform setup inside guest (Typical setup)
– Replace the video driver with the VMware wddm_video driver
– Shutdown the VM and enable Accelerate 3D graphics
– Start the VM and verify video performance inside the vSphere Console

Windows Specific Settings
– Disable automatic updates
– Disable Windows Firewall (All profiles)
– Set Power Plan to High Performance
– Disable IE ESC for Administrators
– Verify Date/Time and Timezone
– Check for Windows Updates
– Enable Remote Desktop
– Disable VMware Update Notifications
– Enable Windows SmartScreen (Admin approval)
– Small Memory Dump
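Several of these settings can also be applied with PowerShell on Windows Server 2012. This is a sketch, not the procedure I used for the template. `Set-TemplateDefaults` is a hypothetical name, and the function must run elevated. The registry values and the High Performance power plan GUID are the commonly documented ones, but verify them on your build before baking them into an image. The IE ESC change applies to new logons.

```powershell
# Hypothetical helper for Windows Server 2012; run in an elevated session.
function Set-TemplateDefaults {
    # Disable Windows Firewall for all profiles
    Set-NetFirewallProfile -Profile Domain, Private, Public -Enabled False

    # Activate the built-in High Performance power plan
    powercfg.exe /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

    # Disable IE Enhanced Security Configuration for Administrators only
    Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073}' -Name IsInstalled -Value 0

    # Enable Remote Desktop (allow incoming RDP connections)
    Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server' -Name fDenyTSConnections -Value 0

    # Write a small memory dump on a crash (CrashDumpEnabled = 3)
    Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl' -Name CrashDumpEnabled -Value 3
}
```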

Customization Profile
– Yet to configure

Steps to perform after deploying the template
– Configure IP Settings
– Check for Windows Updates
– Configure hostname and description
– Install Roles, Features and Applications
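The IP settings and hostname steps could be scripted with the Windows Server 2012 networking cmdlets as well. `Initialize-DeployedServer` is a hypothetical helper. The default interface alias `Ethernet` is an assumption; check yours with `Get-NetAdapter`.

```powershell
# Hypothetical helper; all parameter values are placeholders for your own network.
function Initialize-DeployedServer {
    param(
        [string]$InterfaceAlias = 'Ethernet',
        [Parameter(Mandatory = $true)][string]$IPAddress,
        [byte]$PrefixLength = 24,
        [Parameter(Mandatory = $true)][string]$Gateway,
        [Parameter(Mandatory = $true)][string[]]$DnsServers,
        [Parameter(Mandatory = $true)][string]$NewName
    )
    # Static IP address and default gateway
    New-NetIPAddress -InterfaceAlias $InterfaceAlias -IPAddress $IPAddress -PrefixLength $PrefixLength -DefaultGateway $Gateway
    # DNS servers for the same interface
    Set-DnsClientServerAddress -InterfaceAlias $InterfaceAlias -ServerAddresses $DnsServers
    # New hostname; takes effect after the restart
    Rename-Computer -NewName $NewName -Restart
}
```

For example: `Initialize-DeployedServer -IPAddress 192.168.1.20 -Gateway 192.168.1.1 -DnsServers 192.168.1.10 -NewName SRV01`. All values here are placeholders.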