Speaker: Elden Christensen, Microsoft – owner of the failover clustering and network load balancing features.
One of the primary reasons that DR plan invocations fail is dependence on people. This was the finding of a study done after Hurricane Katrina in New Orleans: in the event of a disaster, people focused on their personal priorities, not on their DR plan actions.
Network Options To Stretch Cluster:
- Stretch the VLAN: Windows Server 2003 did this.
- Dissimilar subnets: Windows Server 2008 introduced this support.
Longer distance means more latency. Windows Server 2008 allows you to tune the heartbeat timeout. Out of the box, anything under 500 ms is fine, but you can tune this, and it can be tuned differently for nodes on the same subnet and for nodes on different subnets within the one cluster.
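As a rough illustration of that tuning – a minimal sketch assuming the FailoverClusters PowerShell module on Windows Server 2008 R2, with an illustrative cluster name and example values rather than recommendations:

```powershell
Import-Module FailoverClusters

$cluster = Get-Cluster -Name "MultiSiteCluster"   # illustrative name

# Heartbeat interval (ms) and missed-heartbeat threshold for nodes in the same subnet...
$cluster.SameSubnetDelay      = 1000
$cluster.SameSubnetThreshold  = 5

# ...and more relaxed values for nodes in different subnets (the cross-WAN case).
$cluster.CrossSubnetDelay     = 2000
$cluster.CrossSubnetThreshold = 10
```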
Windows Server 2008 inter-node communication can be encrypted for cross-WAN clusters.
Client reconnection considerations:
- If the nodes are in different subnets then there are DNS timing issues to consider. A records get cached on the DNS servers and on the client. If there’s a failover then what happens? The client/DNS servers have cached the old record and clients fail to connect until they purge or time out the cached entry. Also consider DNS AD replication between sites.
- Configure a smaller TTL for the record, but you need to find the right balance between too-frequent and too-infrequent lookups.
- RegisterAllProvidersIP and/or HostRecordTTL on the network name resource (sketched below).
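A hedged sketch of setting those two private properties on the clustered network name resource – assuming the FailoverClusters PowerShell module; the resource name and values are illustrative:

```powershell
Import-Module FailoverClusters

# Register the IP from every provider/subnet against the network name's DNS record,
# so clients in either subnet can resolve it after a cross-subnet failover.
Get-ClusterResource "FileServer Network Name" |
    Set-ClusterParameter -Name RegisterAllProvidersIP -Value 1

# Shorten the DNS record TTL (in seconds) so clients re-query sooner after a failover.
Get-ClusterResource "FileServer Network Name" |
    Set-ClusterParameter -Name HostRecordTTL -Value 300

# The network name resource needs to be taken offline and back online for the
# new values to be reflected in the registered DNS record.
```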
Alternative 1: Advanced Planning:
Have local failover capacity in Site A and in Site B. Configure the cluster to fail over to a local node first, e.g. for a local hardware issue rather than a site failure. If there is a site failure then fail over to Site B. This is OK if the DR plan allows for non-instant failover.
Alternative 2: Stretch the VLAN.
The IP of the clustered resource never changes.
Alternative 3: Abstraction Device
For example, Cisco has a device to abstract an IP address and reroute it as required to the correct server in the correct site.
Storage
You need to have two copies of the data. A single-site cluster allows for single-copy storage, but that’s not going to fly for DR: you need to replicate the data between sites A and B. MS relies on the vendors/partners, e.g. HP LeftHand, HP EVA Controller, HP XP Controller, Compellent, DoubleTake, SteelEye. There is also application-level replication such as Exchange CCR.
Synchronous or asynchronous replication can both be used; it depends on your application. Synchronous replication commits data to both sets of storage and then responds to the application to confirm the write. Asynchronous replication writes to one set of storage and then replicates it to the other site. The latter is obviously good in limited-bandwidth scenarios, stretches over great distances and has no impact on application performance, but there is a potential for data loss.
The former guarantees no data loss but requires more bandwidth between sites. Latency is an issue, so the stretch is a short distance (<100 km) and there is an impact on application performance as latency grows.
The storage partner writes resource DLLs that integrate into clustering, ensuring consistency of storage ownership during a failover of the clustered resource.
The cluster validation tool is not written for these replicated storage solutions and its storage tests will fail. This is acknowledged by MS and is documented online.
HP StorageWorks Representative
The speaker is talking about the HP story, CLX for Windows. CLX = Cluster Extension Resource. This is for the EVA and XP SANs. There is support now for Hyper-V Live Migration in the new release, which adds Windows Server 2008 R2 and Hyper-V Server 2008 R2 support. This Live Migration support indicates the speed of failover. EVA support is due in a month, XP next year. Apparently this does not support CSV at the moment due to the controllers’ role in the replication process. This costs around €3,000 per cluster node, so you had better be serious about DR – and this doesn’t include SAN replication licensing.
We get a video of this demo based on W2008 R2 Hyper-V Live Migration on a pair of replicated EVA 4000 SANs. We saw 3 failed pings on the grainy video, but the HP guy claims they were retransmits, not dropped packets. I’m not convinced that HP have real Live Migration between sites, but 2-3 missed pings between sites for DR is pretty good. You have duplicate copies of data in 2 sites in case of a disaster.
Quorum Overview
It’s all about getting a vote majority to decide which side of the cluster stays up and owns the resources.
- Disk only: the quorum disk (whoever owns it) decides; even number of nodes.
- Node and Disk Majority: the witness disk breaks a tied vote.
- Node Majority: no witness disk.
- File Share Witness: a file share acts as the witness instead of a disk.
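To check which of the modes above a cluster is actually using – a minimal sketch assuming the FailoverClusters PowerShell module; the cluster name is illustrative:

```powershell
Import-Module FailoverClusters

# Shows the quorum model in use and which resource (witness disk or file share) backs it.
Get-ClusterQuorum -Cluster "MultiSiteCluster" | Format-List Cluster, QuorumType, QuorumResource
```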
Replicated Witness Disk in DR:
Not to be used unless recommended by the storage vendor. It is normally not used in replicated-storage clustering because it is really 2 disks, one in each site. MS is not a fan of it.
Which to use?
- Node Majority: odd number of nodes. The majority of nodes will be in the primary data centre, e.g. 3 nodes in site A and 2 nodes in site B. If there’s a break in comms between sites A and B then the nodes will vote. Node 1 in site A can talk to itself (1 vote) and the other 2 local nodes (3 votes in total). In site B, node 4 can talk to itself and its neighbour (2 votes). Therefore the resource stays in site A. But in DR, if site A burns down, you need to manually override because site B cannot win a vote. This is called Forcing Quorum (see the sketch after this list).
- Multi-Site with File Share Witness: this is normally the best one to use. Place a file (SMB) share in a third (witness) site. There’s nothing special about the share other than a single text file. This allows even node numbers where the file share is the vote breaker. If site A fails then site B can still see the file share in the witness site, and site B initiates a failover automatically. But what if site A and site B can both see the witness site but not each other? There’s a solution with the file share, but the speaker doesn’t explain it here; it comes up in the Q&A: if the node that owns the file on the file share is healthy then it becomes the vote breaker.
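A hedged sketch of the two operations described in the bullets above – configuring a file share witness in a third site, and forcing quorum after losing the primary site – assuming the FailoverClusters PowerShell module; the names and paths are illustrative:

```powershell
Import-Module FailoverClusters

# Switch the cluster to Node and File Share Majority, with the witness share hosted in a third site.
Set-ClusterQuorum -Cluster "MultiSiteCluster" -NodeAndFileShareMajority "\\WitnessServer\ClusterWitness"

# DR case from the Node Majority bullet: site A is gone and site B cannot win a vote,
# so quorum is forced manually on a surviving node to bring the cluster online.
Start-ClusterNode -Name "SiteB-Node1" -FixQuorum
```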
Workloads
Hyper-V: If you use DHCP then you can use different VLANs. If your VMs use static IPs then stretch the VLAN(s). Live Migration really requires stretched VLANs because otherwise the IP must change in the VM, and that means a TCP outage.
CSV: Requires a single VLAN between nodes. CSV assumes all nodes can concurrently access the LUN, whereas SAN replication assumes that only one array has the replicated LUN active at a time. CSV is not a requirement for Live Migration. MS says you should talk to your storage vendor for support statements; the whole scenario depends on how the storage is replicated by the vendor.
SQL: Missed this because it was very quick.
Exchange 2007: It has CCR so you don’t need storage-level replication. Change the TTL to 5 minutes. The file share witness should be on the hub transport server in the primary site. Exchange 2010 is probably very different because of the possibility of using a DAG.
Q&A
DFS-R: Can you use this for multi-site clustering? Yes and no. DFS-R is supported on 2008 R2 clusters, but you cannot use it as the replication mechanism because it only replicates at the file level and only when a file is closed.
Does the HP CLX support CSV? Not in this release; they are working with MS to get this working. HP LeftHand will do this. Compellent does this too – I think Lakeland Dairies (Irish company) are using their solution for inter-building DR for Hyper-V on their “campus”. I believe there’s a whitepaper on it somewhere on the MS site. I did find this video.