I’ve been doing a little research on this topic lately. Sure, it was from the Hyper-V perspective, but watching a video on VMware’s site about SRM confirmed that the facts are similar with ESX. If you want to create a multi-site cluster for DR/business continuity then you have two big and expensive considerations:
Storage
The replication will probably be performed by your storage system. These SANs do not come cheap. The one that looks most interesting to me is the HP LeftHand. I can’t see any features on the Dell site. Dell does suck when it comes to educating the market about their products – they provide private briefings only. Strangely, higher-end systems like EVA/EMC don’t have CSV (and probably VMFS) support yet in this scenario because the LUN can only be active in one site at a time (preventing granular VM migration across the WAN). NetApp appear to use a snapshot feature for replication and this could complicate a multi-site cluster design IMO. Admittedly, I have failed to follow up on opportunities to hook up with them to learn more about their system – apologies if I have things wrong there.
The WAN
The good news is that you will get a very nice lunch/dinner/weekend away from your WAN service provider because of the need to have huge bandwidth. From the Hyper-V perspective, you need 2 x 1 Gbps lines between the primary and secondary site for Live Migration between the sites. You may also need less than 2 ms latency on the line for the synchronous storage replication that can support this. You can do Quick Migration (it’s still there, luckily) for DR invocation across 100 Mbps lines. Quick Migration is fine for that emergency scenario. It’s not ideal but this is a bandwidth thing – VM memory needs to transfer quickly.
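To put rough numbers on the "bandwidth thing", here is a back-of-envelope sketch in Python. The 4 GB VM size and the 80% link efficiency are my own assumptions for illustration, not measured figures, and the function name is made up. It only estimates the time to push a VM's memory across the wire once; a real Live Migration also has to re-copy pages that change during the transfer, so treat the result as a best case.

```python
def transfer_seconds(memory_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Rough time to copy a VM's memory across a WAN link once.

    memory_gb  -- VM memory in decimal gigabytes (assumed value)
    link_mbps  -- nominal line speed in megabits per second
    efficiency -- fraction of the line usable for the copy (assumed value)
    """
    bits_to_move = memory_gb * 8 * 1000**3             # gigabytes -> bits
    usable_bits_per_sec = link_mbps * 1_000_000 * efficiency
    return bits_to_move / usable_bits_per_sec


if __name__ == "__main__":
    # Compare the 100 Mbps and 1 Gbps lines mentioned above for a 4 GB VM.
    for link in (100, 1000):
        secs = transfer_seconds(4, link)
        print(f"{link} Mbps: about {secs:.0f} s for a 4 GB VM")
```

Under those assumptions the 100 Mbps line takes minutes per VM where the 1 Gbps line takes well under a minute, which is why the slower line is only really bearable for an emergency invocation.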
Software Solutions
A number of them are out there. Some are point solutions for creating failover clusters (using the Windows feature) between two hosts in different sites (SteelEye). Some simulate the processes and controls of a Windows Failover Cluster without using the Windows feature (Double-Take). I’ve seen one solution (can’t remember the product name) that installs a service on servers with disk and creates an iSCSI SAN with features similar to those of an HP LeftHand.
Advice
Get your accountants ready to sign some big cheques. No matter what you do, you’re going to need to put in some big bandwidth and that’s going to be a big recurring cost. The benefit is simple: a single fault tolerant solution for disaster recovery that will work when the company is under the stress of a disaster.
The specifics of your design will be totally dependent on the hardware and software you use. Make sure you work with a vendor who really knows this stuff. Look for references. Don’t just use Honest Bob’s PC Sales because the IT manager is having it off with Bob (I’ve seen that one happen and it ended badly).
The StarWind software provides an active-active high availability iSCSI target for Windows.