Managing SharePoint 2010 using System Center

I’ve tuned into a webcast aimed at the System Center Influencers and I’m going to try to blog it live.  Microsoft’s line is that System Center is the way to manage SharePoint because Microsoft understands the requirements.

SharePoint often starts as an ad-hoc solution but grows from there to become mission critical, holding urgent business data.  Administration is complex, involving users, file server admins, web admins, database admins and web developers.

System Center Improves Availability:

  • DPM backs it up the way it should be.
  • Operations Manager monitors health and performance.
  • Virtualisation (VMM managed) can allow for rapid deployment with minimal footprint.

Administration

  • Configuration Manager automates management
  • Service Desk will add more benefits

Centralised Management

Centralised management with delegation is the norm for System Center; it is how System Center works.  For example, a SharePoint administrator could deploy a front end server in minutes using the VMM 2008 R2 self service portal.  A quota will control sprawl and the network administrators don’t need to be as involved.

OpsMgr Management Pack

  • There is a new monitoring architecture.  There are physical and logical components where the physical entity rolls up to a logical entity.
  • Monitoring is integrated into SharePoint so the SharePoint admins can see the health in SharePoint
  • There will be a unified management pack instead of the current 2007 split management packs.  The discovery process will identify the roles installed on an agent machine and only utilise the required components.

We’re shown an OpsMgr diagram that shows the architecture of a SharePoint deployment.  If you haven’t seen these, they are hierarchical diagrams that give you a visualisation of some system, e.g. HP Blade farm, Hyper-V cluster, SharePoint farm.

The 2010 management pack allows you to monitor a particular web application in SharePoint 2010.  The management pack is more aware of what components are deployed where and the interdependencies – sorry I’m not a SharePoint guru so I’m missing some of the terminology here.

Rules administration has been simplified.  There is a view in the Monitoring pane to view the health of all rules for the SharePoint 2010 management pack.  I like this.  I’ve not seen it in any other management pack.  The SQL guys should have coffee with the SharePoint folks 🙂

There are 300% more discoveries, 1293% more classes and 300% more monitors than in 2007.  That is a huge increase in automated knowledge being built into OpsMgr to look after SharePoint 2010.  There are also 45% fewer rules.  This is a good thing: duplicated effort with the IIS and SQL management packs has been reduced to cut noise.  Microsoft assumes you’ll install those other management packs.  Approximately 150 TechNet articles are linked in the pack to guide you to fixing certain detected issues.

Data Protection Manager 2010

DPM 2010 is due out around April 2010.  It is important to Hyper-V admins because it adds support for CSV.  DPM allows you to back up to disk and then optionally stream to tape.  You can also replicate one DPM server to another for disaster recovery.

SharePoint 2003 and WSS 2.0 are backed up basically as SQL databases.  You need the native SharePoint tool to complete the backup.

SharePoint 2007 and WSS 3.0 are backed up using a SharePoint VSS writer.  Every server (web/content/config/index) gets an agent.  DPM reaches out to “the farm” and can back up everything required.

DPM is designed to know what to back up.  3rd party solutions are generic and don’t have that.  For example, a new server in the farm will be detected.  The DPM administrator needs to authorise this addition.

DPM 2010 does something similar with SharePoint 2010.  However, the process is completely automated, allowing your delegated VMM administrators or Configuration Manager administrators (i.e. SharePoint administrators) to deploy VM’s or physical machines without the DPM administrator having to authorise each addition.

One of the cool things about DPM is that it doesn’t have specialised agents.  It uses VSS writers.  That means there is one agent for all types of protected servers.

We get a demo now and we see that the DPM administrator can just select “the farm” and back that up.  There’s no selecting of components or roles.  The speaker only sets up his destination and retention policies.

DPM 2007 is noisy, e.g. with data consistency checks.  I’ve seen this when I did some lab work.  The job wizard allows you to perform a heal/check either when a problem is found, on a scheduled basis, or not at all.  This is a self-healing feature.

Recoveries can be done at the farm level or at the level of an individual content (SQL) database.  SharePoint 2007 can restore a site collection, a site or a document, but this requires a recovery farm, i.e. a server, consuming resources and increasing costs.  SharePoint 2010 with DPM 2010 does not require a recovery farm.  You can directly recover an item into the production farm.  Trust me, that’s huge.

The release candidate for DPM 2010 comes out next week.

Virtualisation

  • Web role, Render Content: Virtualisation ideal
  • Query role, Process Search, Queries: Virtualisation Ideal
  • Application Role, Excel Forms Services: Virtualisation ideal
  • Index role, Crawl Index: Consider virtualisation – a small amount of crawling, and drive space used to store the index (VHD = maximum 2TB, although you can go to pass-through disks for more).
  • Database role: Consider virtualisation – OK for smaller farms.

My Take

My advice on top of this: monitor everything using VMM and Operations Manager.  You’ll soon see if something is a candidate for virtualisation or if a VM needs to be migrated back to physical.

If you run everything on a Hyper-V 2008 R2 cluster then enable PRO in VMM.  Performance issues can then trigger an automatic Live Migration (if you allow it) to avoid performance bottlenecks.

If you are going physical for the production environment then consider virtual for the DR site if reduced capacity is OK.  For example, your production site is backed up with DPM.  You keep a Hyper-V farm in the DR site.  Your DPM server replicates to a DR site DPM server.  During a DR you can do a restoration.  Will it work?  Who knows :)  It’s something you can test pretty cheaply with Hyper-V Server 2008 R2.  Money is tight everywhere and this might be an option.

System Center Influencers Blog Feed

I’m lucky enough to be a part of Microsoft’s System Center Influencers group.  VMM is one of my core things.  I’m a user of OpsMgr and I’ll blog about what I learn and do with that too.  I used to be a ConfigMgr MVP but I’ve fallen out of step with it because of changes in work, though I like to stay in touch.  We don’t use DPM as a core product but Hyper-V keeps it interesting for me.  And that’s just the start of System Center!

The folks behind System Center Influencers have a blog feed gathering content from the members.  You can see our blog posts in one central point.  Check it out.

Rough Guide To Setting Up A Hyper-V Cluster

EDIT: 18 months after I wrote it, this post continues to be one of my most popular.  Here is some extra reading for you if this topic is of interest:

A lot of people who will be doing this have never set up a cluster before.  They know of clusters from stories dating back to the NT4 Wolfpack, Windows Server 2000 and Windows Server 2003 days, when consultants made a fortune from clustering things like Exchange and SQL on five-days-per-cluster projects.

Hyper-V is getting more and more widespread.  And that means setting up highly available virtual machines (HAVM) on a Hyper-V cluster will become more and more common.  This is like Active Directory.  Yes, it can be a simple process.  But you have to get it right from the very start or you have to rebuild from scratch.

So what I want to do here is walk through what you need to do in a basic deployment for a Windows Server 2008 R2 Hyper-V cluster running a single Cluster Shared Volume (CSV) and Live Migration.  There won’t be screenshots – I have a single laptop I can run Hyper-V on and I don’t think work would be too happy with me rebuilding a production cluster for the sake of blog post screenshots 🙂  This will be rough and ready but it should help.

Microsoft’s official step by step guide is here.  It covers a lot more detail but it misses out on some things, like “how many NIC’s do I need for a Hyper-V cluster?”, “how do I set up networking in a Hyper-V cluster?”, etc.  Have a read of it as well to make sure you have covered everything.

P2V Project Planning

Are you planning to convert physical machines to virtual machines using Virtual Machine Manager 2008 R2?  If so and you are using VMM 2008 R2 and Operations Manager 2007 (R2), deploy them now (yes, before the Hyper-V cluster!) and start collecting information about your server network.  There are reports in there to help you identify what can be converted and what your host requirements will be.   You can also use the free MAP toolkit for Hyper-V to do this.  If your physical machine uses 50% of a quad core Xeon then the same VM will use 50% of the same quad core Xeon in a Hyper-V host (actually, probably a tiny bit more to be safe).
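The sizing arithmetic above can be sketched in a few lines.  This is only an illustration – the 10% safety margin and the sample workloads are my own assumptions, not measured figures:

```python
# Rough P2V capacity sketch: sum the measured CPU use of the physical
# candidates and add a small safety margin, as suggested above.

def host_cores_needed(physical_loads, cores_per_host=4, margin=0.10):
    """physical_loads: list of (cores, utilisation 0..1) per machine."""
    used_cores = sum(cores * util for cores, util in physical_loads)
    used_cores *= (1 + margin)                # "a tiny bit more to be safe"
    hosts = -(-used_cores // cores_per_host)  # ceiling division
    return used_cores, int(hosts)

# A machine using 50% of a quad-core Xeon needs ~2 cores plus margin.
used, hosts = host_cores_needed([(4, 0.5), (4, 0.5), (2, 0.8)])
print(round(used, 2), hosts)  # 6.16 cores -> 2 quad-core hosts
```

The reports in OpsMgr or the MAP toolkit give you the real utilisation numbers to feed into this kind of estimate.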

Buy The Hardware

This is the most critical part.  The requirements for Hyper-V are simple:

  • Size your RAM.  Remember that a VM has a RAM overhead of up to 32MB for the first GB of RAM and up to 8MB for each additional GB of RAM in that VM.
  • Size the host machine’s “internal” disk for the parent partition or host operating system.  See the Windows Server 2008 R2 requirements for that.
  • The CPU(s) should be x64 and feature assisted virtualisation.  All of the CPU’s in the cluster should be from the same manufacturer.  Ideally they should all be the same spec but things happen over time as new hardware becomes available and you’re expanding a cluster.  There’s a tick box for disabling advanced features in a virtual machine’s CPU to take care of that during a VM migration.
  • It should be possible to enable Data Execution Prevention (DEP) in the BIOS and it should work.  Make that a condition of sale for the hardware.  DEP is required to prevent breakout attacks against the hypervisor.  Microsoft took security very, very seriously when it came to Hyper-V.
  • The servers should be certified for Windows Server 2008 R2.
  • You should have shared storage that you will connect to the servers using iSCSI or Fibre Channel.  Make sure the vendor certifies it for Windows Server 2008 R2.  It is on this shared storage (a SAN of some kind) that you will store your virtual machines.  Size it according to your VM’s storage requirements.  If a VM has 2GB of RAM and 100GB of disk then size the SAN to be 102GB (disk plus RAM, which accounts for the VM’s saved state) plus some space for ISO images (up to 5GB) and some free space for a healthy volume.
  • The servers will be clustered.  That means you should have a private network for the cluster heartbeat.  A second NIC is required in the servers for that.
  • The servers will need to connect to the shared storage.  That means either a fibre channel HBA or a NIC suitable for iSCSI.  The faster the better.  You may go with 2 instead of 1 to allow MPIO in the parent partition.  That allows storage path failover for each physical server.
  • Microsoft recommends a 4th NIC to create another private physical network between the hosts.  It would be used for Live Migration.  See my next page link for more information.  I personally don’t have this in our cluster and have not had any problems.  This is supported AFAIK.
  • Your servers will have virtual machines that require network access.  That requires at least a third NIC in the physical servers.  A virtual switch will be created in Hyper-V and that connects the virtual machines to the physical network.  You may add a 4th NIC for NIC teaming.  You may add many NIC’s here to deal with network traffic.  I’ve talked a good bit about this, including this post.  Just search my blog for more.
  • Try to get the servers to be identical.  And make sure everything has Windows Server 2008 R2 support and support for failover clustering.
  • You can have up to 16 servers in your cluster.  Allow for either N+1 or N+2.  The latter is ideal, i.e. there will be capacity for two hosts to be offline and everything is still running.  Why 2?  (a) stuff happens in large clusters and Murphy is never far away.  (b) if a Windows 8 migration is similar to a Windows Server 2008 R2 migration then you’ll thank me later – it involved taking a host from the old cluster and rebuilding it to be a host in a new cluster with the new OS.  N+1 clusters lost their capacity for failover during the migration unless new hardware was purchased.
  • Remember that a Hyper-V host can scale out to 64 logical processors (cores in the host) and 1TB RAM.
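To make the RAM overhead and SAN sizing rules above concrete, here is a minimal worked sketch.  The 32MB/8MB overheads and the sample VM figures come from the bullets; the ISO and free-space allowances are illustrative assumptions:

```python
# Worked sizing numbers for the rules in the bullet list above.

def vm_ram_overhead_mb(vm_ram_gb):
    """Up to 32MB for the first GB plus up to 8MB per additional GB."""
    return 32 + 8 * (vm_ram_gb - 1)

def san_size_gb(vm_disk_gb, vm_ram_gb, iso_gb=5, slack_gb=10):
    """Disk plus RAM, plus ISO space, plus free space for a healthy volume."""
    return vm_disk_gb + vm_ram_gb + iso_gb + slack_gb

print(vm_ram_overhead_mb(4))  # 56 (MB of host RAM overhead for a 4GB VM)
print(san_size_gb(100, 2))    # 117 (the 102GB example plus ISOs and slack)
```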

The Operating System

This one will be quick.  Remember that the Web and Standard editions don’t support failover clustering.

  • Hyper-V Server 2008 R2 is free, is based on the Core installation type and adds Failover Clustering for the first time in the free edition.  It also has support for CSV and Live Migration.  It does not give you any free licensing for VM’s.  I’d only use it for VDI, Linux VM’s or for very small deployments.
  • Windows Server 2008 R2 Enterprise Edition supports 8 CPU sockets and 2TB RAM.  What’s really cool is that you get 4 free Windows Server licenses to run on VM’s on the licensed host.  A host with 1 Enterprise license effectively gets 4 free VM’s.  You can over license a host too: 2 Enterprise licenses = 8 free VM’s.  These licenses are not transferable to other hosts, i.e. you cannot license 1 host and run the VM’s on another host.
  • Windows Server 2008 R2 DataCenter Edition allows you to reach the maximum scalability of Hyper-V, i.e. 64 logical processors (cores in the host) and 1TB RAM.  DataCenter edition as a normal OS has greater capacities than this; don’t be fooled into thinking Hyper-V can reach those.  It cannot do that despite what some people are claiming is supported.
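The Enterprise edition licensing arithmetic above is simple enough to sketch (a hypothetical helper, assuming the 4-VM grant per license as described):

```python
# Each Enterprise license assigned to a host grants 4 Windows Server VM
# licenses on that same host; they do not transfer to other hosts.

def free_vm_licenses(enterprise_licenses_on_host):
    return 4 * enterprise_licenses_on_host

print(free_vm_licenses(1))  # 4
print(free_vm_licenses(2))  # 8 -- an "over licensed" host
```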

All hosts in the cluster should be running the same operating system and the same installation type.  That means all hosts will be either Server Core or full installations.  I’ve talked about Core before.  Microsoft recommends it because of the smaller footprint and less patching.  I recommend a full installation because the savings are a few MB of RAM and a few GB of disk.  You may have fewer patches with Core but they are probably still every month.  You’ll also find it’s harder to repair a Core installation and 3rd party hardware management doesn’t have support for it.

Install The Hardware

First things first: get the hardware installed.  If you’re unsure of anything then get the vendor to install it.  You should be buying from a vetted vendor with cluster experience.  Ideally they’ll also be a reputable seller of enterprise hardware, not just honest Bob who has a shop over the butchers.  Hardware for this stuff can be fiddly.  Firmwares across the entire hardware set all have to be matching and compatible.  Having someone who knows this stuff rather than searches the Net for it makes a big difference.  You’d be amazed by the odd things that can happen if this isn’t right.

As the network stuff is being done, get the network admins to check switch ports for trouble.  Ideally you’ll use cable testers to test any network cables being used.  Yes, I am being fussy but little things cause big problems.

Install The Operating Systems

Make sure they are all identical.  An installation done using an answer file helps there.  Now you should identify which physical NIC maps to which Local Area Connection in Windows.  Take care of any vendor specific NIC teaming – find out exactly what your vendor prescribes for Hyper-V.  Microsoft has no guidance on this because teaming is a function of the hardware vendor.  Rename each Local Area Connection to its role, e.g.

  • Parent
  • Cluster
  • Virtual 1

What you’ll have will depend on how many NIC’s you have and what roles you assigned to them.  Disable everything except for the first NIC.  That’s the one you’ll use for the parent partition.  Don’t disable the iSCSI ones.

Patch the hosts for security fixes.  Configure the TCP/IP  for the parent partition NIC.  Join the machines to the domain.  I strongly recommend setting up the constrained delegation for ISO file sharing over the network.

Do whatever antivirus you need to.  Remember you’ll need to disable scanning of any files related to Hyper-V.  I personally advise against putting AV on a Hyper-V host because of the risks associated with this.  Search my blog for more.  Be very sure that the AV vendor supports scanning files on a CSV.  And even if they do, there’s no need to be scanning that CSV.  Disable it.

Enable the Cluster NIC for the private heartbeat network.  This will either be a crossover cable between 2 hosts in a 2-host cluster or a private VLAN on the switch dedicated just to these servers and this task.  Configure TCP/IP on this NIC on all servers with an IP range that is not routed on your production network.  For example, if your network is 172.16.0.0/16 then use 192.168.1.0/24 for the heartbeat network.  Ping test everything to make sure every server can see every other server.
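If you want to sanity-check the address planning, Python’s `ipaddress` module can confirm that a heartbeat range is private and doesn’t overlap the production network.  The ranges below are illustrative:

```python
# Quick check that the heartbeat subnet is separate from production,
# as advised above. Both ranges here are examples, not prescriptions.
import ipaddress

production = ipaddress.ip_network("172.16.0.0/16")
heartbeat = ipaddress.ip_network("192.168.1.0/24")

print(heartbeat.overlaps(production))  # False - safe to use
print(heartbeat.is_private)            # True - an RFC 1918 range
```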

If you have a Live Migration NIC (labelled badly as CSV in my example diagrams) then set it up similarly to the Cluster NIC.  It will have its own VLAN and its own IP range, e.g. 192.168.2.0/24.

Enable the Virtual NIC.  Unbind every protocol you can from it (e.g. if using NIC teaming, you won’t unbind the teaming protocol).  This NIC will not have a TCP configuration, so IPv4 and IPv6 must be unbound.  You’re also doing this for security and simplicity reasons.

Here’s what we have now:

[diagram]

Once you have reached here with all the hosts we’re ready for the next step.

Install Failover Clustering

You’ll need to figure out how your cluster will gain a quorum, i.e. be able to make decisions about failover and whether it is operational or not.  This is to do with host failure and how the remaining hosts vote.  There are actually 4 ways to do it, but it breaks down to 2 ways for most companies and installations:

  1. Node majority: This is used when there are an odd number of hosts in the cluster, e.g. 5 hosts, not 4.  The hosts can vote and there will always be a majority winner, e.g. 3 to 2.
  2. Node majority + Disk: This is used when there are an even number of hosts, e.g. 16.  It’s possible there would be an 8 to 8 vote with no majority winner.  The disk acts as a tie breaker.

Depending on who you talk to or what GUI in Windows you see, this disk is referred to either as a Witness Disk or a Quorum Disk.  I recommend creating it in a cluster no matter what.  Your cluster may grow or shrink to an uneven number of hosts and may need it.  You can quickly change the quorum configuration based on the advice in the Failover Clustering administration MMC console.
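The voting logic above can be sketched as follows (a simplified model that ignores the two stretch-cluster quorum modes):

```python
# Odd node counts can always reach a majority on their own; even counts
# need the witness disk as a tie-breaker vote.

def quorum_mode(node_count):
    return "Node Majority" if node_count % 2 == 1 else "Node Majority + Disk"

def votes_needed(node_count, witness=False):
    """Votes required for a majority, counting the witness disk as 1 vote."""
    total = node_count + (1 if witness else 0)
    return total // 2 + 1

print(quorum_mode(5))                 # Node Majority
print(quorum_mode(16))                # Node Majority + Disk
print(votes_needed(4, witness=True))  # 3 of 5 votes - no 8-to-8 style tie
```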

The disk only needs to be 500MB in size.  Create it on the SAN and connect the disk to all of your hosts.  Log into a host and format the disk with NTFS.  Label it with a good name like Witness Disk.

I’m ignoring the other 2 methods because they’ll only be relevant in stretch clusters that span a WAN link and I am not talking about that here.

Use Server Manager to install the role on all hosts.  Now you can set up the cluster.  The wizard is easy enough.  You’ll need a computer name/DNS name for your cluster and an IP address for it.  This is on the same VLAN as the Parent NIC in the hosts.  You’ll add in all of the hosts.  Part of this process does a check on your hardware, operating system and configuration.  If this passes then you have a supported cluster.  Save the results as a web archive file (.MHT).  The cluster creation will include the quorum configuration.  If you have an even number of hosts then go with the + Disk option and select the witness disk you just created.  Once it’s done your cluster is built.  It is not hard and only takes about 5 to 10 minutes.  Use the Failover Clustering MMC to check the health of everything.  Pay attention to the networks.  Stray networks may appear if you didn’t unbind IPv4 or IPv6 from the virtual network NIC in the hosts.

If you went with Node Majority then here’s my tip.  Go ahead and launch the Failover Clustering MMC.  Add in the storage for the witness disk.  Label it with the same name you used for the NTFS volume.  Now leave it there should you ever need to change the quorum configuration.  A change is no more than 2 or 3 mouse clicks away.

Now you have:

[diagram]

Install Hyper-V

Enable the Hyper-V role on each of your hosts, one at a time.  Make sure the logs are clean after the reboot.  Don’t go experimenting yet, please!

Cluster Shared Volume

CSV is seriously cool.  Most installations will have most, if not all, VM’s stored on a CSV.  CSV is only supported for Hyper-V and not for anything else as you will be warned by Microsoft.

Set up your LUN on the physical storage for storing your VM’s.  This will be your CSV.  Connect the LUN to your hosts.  Initialise the disk as GPT so it can grow beyond 2TB, then format the LUN with NTFS.  Label it with a good name, e.g. CSV1.  You can have more than 1 CSV in a cluster.  In fact, a VM can have its VHD files on more than one CSV.  Some are doing this to attempt to maximise performance.  I’m not sold that will improve performance but you can test it for yourself and do what you want here.

DO NOT BE TEMPTED TO DEPLOY A VM ON THIS DISK YET.  You’ll lose it after the next step.

Use the Failover Clustering MMC to add the disk in.  Label it in Failover Clustering using the same name you used when you formatted the NTFS volume.  Now configure the CSV.  When you’re done you’ll find the disk has no drive letter.  In fact, it’ll be “gone” from the Windows hosts.  It’ll actually be mounted as a folder on the C: drive of all of your hosts in the cluster, e.g. C:\ClusterStorage\Volume1.  This can be confusing at first.  It’s enough to know that all hosts will have access to this volume and that your VM’s are not really on your C: drive.  They are really on the SAN.  C:\ClusterStorage\Volume1 is just a mount point to a letterless drive.

Now we have this:

[diagram]

Virtual Networking

Hopefully you have read the previously linked blog post about networking in Hyper-V.  You should be fully educated about what’s going on here.

Here’s the critical things to know:

  • You really shouldn’t put private or internal virtual networks on a Hyper-V cluster when using more than one VM on those virtual networks.  Why?  A private or internal virtual network on host A cannot talk with a private or internal network on host B.  If you set up VM1 and VM2 on such a virtual network on host A what happens when one of those VM’s is moved to another host?  It will not be able to talk to the other VM.
  • If you create a virtual network on one host then you need to create it on all hosts.  You also must use identical names across all hosts.  So, if I create External Network 1 on host 1 then I must create it on host 2.

Create your virtual network(s) and bind them to your NIC’s.  In my case, I’m binding External Network 1 to the NIC we called Virtual 1.  That gives me this:

[diagram]

All of my VM’s will connect to External Network 1.  An identically named external virtual network exists on all hosts.  The physical Virtual 1 NIC is switched identically on all servers on the physical network.  That means if VM1 moves from host 1 to host 2 it will be able to reconnect to the virtual network (because of the identical name) and be able to reach the same places on the physical network.  What I said for virtual network names also applies to tags and VLAN ID’s if you use them.
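The identical-naming rule can be checked mechanically.  A hypothetical sketch, with made-up host and network names:

```python
# Every virtual network must exist with an identical name on every host,
# or a migrated VM cannot reconnect. This reports any host that is
# missing a network defined elsewhere in the cluster.

def missing_networks(host_networks):
    """host_networks: dict of host name -> set of virtual network names."""
    required = set().union(*host_networks.values())
    return {host: required - nets
            for host, nets in host_networks.items()
            if required - nets}

hosts = {
    "host1": {"External Network 1"},
    "host2": {"External Network 1", "External Network 2"},
}
print(missing_networks(hosts))  # {'host1': {'External Network 2'}}
```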

Get Busy!

Believe it or not, you have just built a Hyper-V cluster.  Go ahead and build your VM’s.  Use the Failover Clustering MMC as much as possible.  You’ll see it has Hyper-V features in there.  Test live migration of the VM between hosts.  Do continuous pings to/from the VM during a migration.  Do file copies during a migration (pre-Vista OS on the VM is perfect for this test).  Make sure the VM’s have the integration components/integration services/enlightenments (or additions for you VMware people) installed.  You should notice no downtime at all.

Remember that for Linux VM’s you need to set the MAC in the VM properties to be static or they’ll lose the binding between their IP configuration and the virtual machine NIC after a migration between hosts.

Administration of VM’s

I don’t know why some people can’t see or understand this.  You can enable remote desktop in your VM’s operating system to do administration on them.  You do not need to use the Connect feature in Hyper-V Manager to open the Virtual Machine Connection.  Think of that tool as your virtual KVM.  Do you always use a KVM to manage your physical servers?  You do?  Oh, poor, poor you!  You know there’s about 5 of you out there.

Linux admins always seem to understand that they can use SSH or VNC.

Virtual Machine Manager 2008 R2

VMM 2008 R2 will allow you to manage a Hyper-V cluster(s) as well as VMware and Virtual Server 2005 R2 SP1.  There’s a workgroup edition for smaller clusters.  It’s pretty damned powerful and simplifies many tasks we have to do in Hyper-V.  Learn to love the library because that’s a time saver for creating templates, sharing ISO’s (see constrained delegation above during the OS installation), administration delegation, self service portal, etc.

You can install VMM 2008 R2 as a VM on the cluster but I don’t recommend it.  If you do, then use the Failover Clustering and Hyper-V consoles to manage the VMM virtual machine.  I prefer that VMM be a physical box.  I hate the idea of chicken and egg scenarios.  Can I think of one now?  No, but I’m careful.

To deploy the VMM agent you just need to add the Hyper-V cluster.  All the hosts will be imported and the agent will be deployed.  Now you can do all of your Hyper-V management via PowerShell, the VMM console and the Self Service console.

You also can use VMM to do a P2V conversion as mentioned earlier.  VSS capable physical machines that don’t run transactional databases can be converted using a live or online conversion.  Those other physical machines can be converted using an offline migration that uses Windows PE (pre-installation environment).  Additional network drivers may need to be added to WinPE.
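The online-versus-offline conversion decision described above boils down to a simple rule.  A sketch, with hypothetical flag names:

```python
# VSS-capable machines without transactional databases can be converted
# online (live); everything else goes through an offline WinPE conversion.

def p2v_method(vss_capable, runs_transactional_db):
    if vss_capable and not runs_transactional_db:
        return "online"
    return "offline (WinPE)"

print(p2v_method(True, False))   # online
print(p2v_method(True, True))    # offline (WinPE)
print(p2v_method(False, False))  # offline (WinPE)
```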

You can enable PRO in your host group(s) to allow VMM to live migrate VM’s around the cluster based on performance requirements and bottlenecks.  I have set it to fully automatic on our cluster.  Windows 2008 quick migration clusters were different: automatic moves meant a VM could be offline for a small amount of time.  Live Migration in Windows Server 2008 R2 solves that one.

Figure out your administration model and set up your delegation model using roles.  Delegated administrators can use the VMM console to manage VM’s on hosts.  Self service users can use the portal.

Populate your library with hardware templates, VHD’s and machine templates.  Add in ISO images for software and operating systems.  An ISO create and mounting tool will prove very useful.

Operations Manager 2008 R2

My advice is “YES, use it if you can!”.  Using System Center is what makes Hyper-V so much better.  OpsMgr will give you all sorts of useful information on performance and health.  Import the management packs for Windows Server, clustering, your hardware (HP and Dell do a very nice job on this; IBM don’t do so well at all – big surprise!), etc.  Use the VMM integration to let OpsMgr and VMM work together.  VMM will use performance information from OpsMgr for intelligent placement of VM’s and for PRO.

I leave the OpsMgr agent installation as a last step on the Hyper-V cluster.  I want to know that all my tweaking is done … or hopefully done.  Otherwise there’s lots of needless alerts during the engineering phase.

Backup

Deploy your backup solution.  I’ve talked about this before so check out that blog post.  You will also want to back up VMM.  Remember that DPM 2007 cannot back up VM’s on a CSV.  You will need DPM 2010 for that.  Check with your vendor if you are using backup tools from another company.

Pilot

Don’t go running into production.  Test the heck out of the cluster.  Deploy lots of VM’s using your templates.  Spike the CPU in some of them (maybe a floating point calculator or a free performance tool) to test OpsMgr and VMM PRO.  Run live migrations.  Test P2V.  Test the CSV coordinator failover.  Test CSV path failover by disconnecting a running host from the SAN – the storage path should switch to using the Ethernet and route via another host.  Get people involved and have some fun with this stage.  You can go nuts while you’re not yet in production.

Go Into Production

Kick up your feet, relax, and soak in the plaudits for a job well done.

EDIT #1:

I found this post by a Microsoft Failover Clustering program manager that goes through some of this if you want some more advice.

My diagrams do show 4 NIC’s, including the badly named CSV (Live Migration dedicated).  But as I said in the OS installation section, you only need 3 for a reliable system: (1) parent, (2) heartbeat/live migration, and (3) virtual switch.

EDIT #2

There are some useful troubleshooting tips on this page.  Two things should be noted.  Many security experts advise that you disable NTLM in group policy across the domain.  You require NTLM for this solution.  There are quotes out there about Windows Server 2008 failover clusters not needing a heartbeat network. But “If CSV is configured, all cluster nodes must reside on the same non-routable network. CSV (specifically for re-directed I/O) is not supported if cluster nodes reside on separate, routed networks”.

Hyper-V Backup Strategies

Because you are dealing with virtual machines you have more options available to you than you did when backing up traditional tin servers.  What approach you take depends on whether you need to recover files, databases or just an entire server, what your budget is and how you configure the storage of your VM’s.  Oh yes, and your budget.

In-VM or On-Host Backup?

What does that mean?  There are two places you can do your backup from. 

On-Host Backup

This allows you to capture selected VM’s on the host as they are running.  There are some catches to that which I’ll come back to later.  The benefit of this approach is that it’s a simple hammer that can hit everything.  If you need to recover all of your VM’s then you can do it.  But you have no knowledge of the VM’s contents nor the ability to recover single files from within the VM.  To do that with this approach you have to recover the entire VM to an isolated network, log into the VM, and then grab the files you need.

This on-host backup really needs to be Hyper-V aware, i.e. use the Hyper-V VSS (volume shadow copy service) writer.  When your backup software tries to back up the VM, the VM will be quickly brought to a “quiescent” state.  This is accomplished at two levels.  The parent partition uses VSS to access the VHD files.  The integration components feature a backup integration.  This allows the VSS writers in the VM to bring file services, Exchange, SQL (and any other VSS aware services) into a brief restful state too.  A snapshot of the VM can then be taken using VSS and the backup software gets the VM’s running state.  Note that this quiescent state is not noticeable.  Odds are you are already using this VSS technology to back up Windows file servers, Exchange and SQL and haven’t noticed a thing.

You probably noticed a catch here.  The backup causes no noticeable downtime to the VM if (a) VSS is available in the VM operating system and (b) the backup integration component service is running in the VM.  That means you must be running Windows Server 2003 SP2 or later in the VM and you have installed the IC’s and left the backup integration service enabled.  All of the volumes in the VM must also have VSS enabled.

If you have VMs that don’t meet all of those requirements then they must be stopped (saved state) before a backup can commence.  This will include, for example, VMs that meet any of these conditions:

  • VMs that do not have the VSS service, e.g. Linux or Windows 2000
  • VMs that do not have the ICs installed and the backup integration service enabled
  • VMs that do not have VSS enabled on all of their volumes.

Some types of storage cannot be backed up in this way.  Pass-through storage is not a file like a VHD, so it is excluded from this approach.  You also need to be aware of remote storage that is directly connected to the VM, e.g. an iSCSI LUN mounted inside the guest.  It is not connected to the parent partition, so it cannot be backed up with this approach either.

You should also be aware that virtual network configurations are reportedly not backed up with this approach.
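To pull those requirements together, here is a minimal sketch of the decision an on-host backup effectively makes for each VM.  This is purely illustrative Python, not a real Hyper-V API; the VM attributes are my own invented names for the conditions described above.

```python
# Illustrative only: decides how an on-host backup would treat a VM,
# based on the Hyper-V VSS/IC requirements described above.
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    os_has_vss: bool                  # False for e.g. Linux or Windows 2000
    ics_installed: bool               # Integration Components present
    backup_ic_enabled: bool           # backup integration service running
    vss_on_all_volumes: bool          # VSS enabled on every volume in the VM
    uses_passthrough_disks: bool      # pass-through storage is not a VHD file
    uses_direct_remote_storage: bool  # e.g. iSCSI LUN attached inside the VM

def on_host_backup_plan(vm: VM) -> str:
    """Return how an on-host backup would treat this VM."""
    if vm.uses_passthrough_disks or vm.uses_direct_remote_storage:
        return "partial: pass-through/direct-attached storage is excluded"
    if (vm.os_has_vss and vm.ics_installed
            and vm.backup_ic_enabled and vm.vss_on_all_volumes):
        return "online: VSS snapshot, no noticeable downtime"
    return "offline: VM placed in saved state during backup"

# A Linux VM fails the VSS/IC requirements, so it gets a saved-state backup:
linux_vm = VM("web01", os_has_vss=False, ics_installed=False,
              backup_ic_enabled=False, vss_on_all_volumes=False,
              uses_passthrough_disks=False, uses_direct_remote_storage=False)
print(on_host_backup_plan(linux_vm))  # offline: VM placed in saved state during backup
```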

However the two big benefits are:

  • You can do an “iron”-level backup of a VM.  If you lose the VM then you can quickly restore it to a known state with no need to build a new VM, install software, patches, etc.
  • As I’ve mentioned before, you should not use snapshots in production.  Using a VSS backup on the host, you effectively get snapshot-style functionality safely.

In-VM Backup

The second approach is to do an in-VM backup.  This is pretty much doing what you’ve always done with your physical servers.  You log into the VM and do the backup from there.  Here are the benefits:

  • You can use whatever backup tool you want inside the VM; it does not need to be Hyper-V VSS aware.  You should still take steps to ensure you can back up open files and back up databases (mail, Oracle, MySQL, etc.) consistently.
  • You can back up remote storage that is not connected to the host, e.g. where a VM directly connects to iSCSI storage.
  • You can use this approach for Linux, Windows 2000, etc., and for VMs where you do not (or cannot) install the Integration Components, enable VSS on all volumes, or enable the backup integration service.
  • Best of all, this approach allows you to selectively back up files and to selectively recover files or databases.  This is because the backup runs in the VM and is therefore aware of the data in the VM.

Recovering a lost VM with just this approach will be time-consuming.  You would have to:

  • Build a new VM and set up the operating system to be identical to the previous version including service pack.
  • Do a complete restoration of the backup data.
  • Test like crazy to ensure everything is OK.

Best of Both Worlds

The best solution is to do both types of backup.  Do an on-host backup of all compliant VMs maybe once a day, once a week or once a month, depending on how much those VMs change.  Then identify the VMs that you need to back up and recover at a granular level, e.g. shared SQL servers, Exchange, file servers, etc., and configure in-VM backups for those machines.  Of course, there are those VMs that don’t meet the requirements for on-host backup.  Exclude them from the on-host backup set and set up in-VM backups for them.  It might still make sense to do an on-host backup of these VMs once in a while.  This will require a scheduled maintenance window where you put the VMs into a saved state to run the backup, but it will allow quicker recoveries for these VMs in a major disaster.
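That strategy boils down to a couple of simple rules per VM.  A hedged sketch in Python (illustrative logic only, nothing to do with any product’s actual scheduling):

```python
# Illustrative sketch of the "best of both worlds" schedule described above.
# "compliant" means the VM meets the Hyper-V VSS/IC requirements for online
# on-host backup; "granular" means you need file/database-level recovery.
def backup_jobs(compliant: bool, granular: bool) -> list[str]:
    jobs = []
    if compliant:
        jobs.append("on-host VSS backup (daily/weekly/monthly, per rate of change)")
    else:
        jobs.append("occasional on-host backup in a maintenance window (saved state)")
    if granular or not compliant:
        jobs.append("in-VM backup for file/database-level recovery")
    return jobs

# A shared SQL server that supports online on-host backup gets both job types:
print(backup_jobs(compliant=True, granular=True))
```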

Here’s how you can handle various recoveries now:

  • VM destroyed: recover the last on-host backup of the VM.  Then restore any data that has changed since that backup, e.g. SQL databases, from the in-VM backup.  This brings the VM back up to date.
  • Data lost from a VM, e.g. a SQL database, files, etc.: recover the data from the in-VM backup.
  • Host destroyed/office destroyed: recover the complete on-host backups to another host, possibly in another office.  Remember to configure the virtual networks.

Backup Tools

If you are operating on a shoestring then the solution for you is Windows Server Backup.  You can use this to back up your host and VMs.  It’s not the prettiest solution but it works.  VMs that are backed up at the host level but are not compliant with all the requirements will need to be put into a saved state, either manually or via a (PowerShell) script.  In-VM backup is complicated because you need to provide storage inside the VM for the backups.  That means using either iSCSI or VHDs, and that adds complexity to your storage solution.
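Note that Windows Server Backup will not see the Hyper-V VSS writer until you register it in the registry (the registry change I mention again in the watch-outs below).  Going from memory, the Microsoft instructions (KB958662; verify the details against the MS article before applying anything) amount to a key like this:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\WindowsServerBackup\Application Support\{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}]
"Application Identifier"="Hyper-V"
```

After that, a host-level backup can be run with the wbadmin command-line tool, e.g. wbadmin start backup -backupTarget:E: -include:D: -quiet, where the drive letters are placeholders for your backup target and the volume holding your VHDs.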

The ideal solution in a Microsoft-centric network is Data Protection Manager.  DPM 2007 SP1 can back up Windows Server 2008 hosts and clusters.  It can also back up Windows Server 2008 R2 hosts and clusters; however, the caveat for Windows Server 2008 R2 clusters is that it cannot back up VMs that are stored on Cluster Shared Volumes (CSV) and it is not Live Migration aware.  DPM 2010 (in beta now and expected to RTM in Q2 2010) will resolve that.

DPM installs agents on the host and in the VMs.  Licensing costs are reduced with System Center Enterprise (host plus 4 VMs on the host) and Datacenter (host plus all VMs on the host) CALs/SALs.  You can configure protection groups with schedules of your choice, and your hosts/VMs/data will be backed up to the disk storage set(s) on the DPM server.  For those VMs that are not compliant with the Hyper-V VSS/IC requirements, DPM will automatically put them into a saved state and do the backup.  A nice touch with DPM is that it allows replication of the backed-up data to another DPM server.  This could be in a remote location, e.g. a hosting company, with a tape drive attached to stream data from disk to tape for archival purposes.  DPM is quite clever with backups: it backs up at the block level, it only backs up differences rather than entire files, and it can compress data on the wire.

What if you’ve made an investment in other backup technologies and want to keep it simple or you have lots of non-Microsoft technology?  You have a few options:

  • If your backup vendor has Hyper-V VSS support then do what I’ve talked about above, picking and choosing between in-VM and on-host backups.  Windows Server 2008 R2 CSV is still pretty new, so verify that the vendor also supports it if you are deploying an R2 Hyper-V cluster.
  • If your backup vendor does not have Hyper-V VSS support then you can only do in-VM backups.  It’s not ideal, but it’s what you’ve been doing up to now with your physical servers, so nothing has changed.  You’re just not able to take advantage of snapshot-style functionality at the host level for your VMs.
  • Maybe add DPM into the mix for host-level backups only and do daily/weekly/monthly backups.  That way you get an “iron”-level backup of the VM for those dreaded scenarios when you have to do a complete recovery.

Things To Watch Out For

  • Patches.  No matter what your backup solution is, get all of the latest patches.  DPM 2007 SP1 requires a hotfix for W2008 Hyper-V support; install the June 2009 rollup.  DPM 2010 requires a hotfix on W2008 R2 Hyper-V RTM clusters too.
  • DPM 2007 SP1 isn’t the complete solution for W2008 R2 clusters due to the lack of support for CSV and the lack of Live Migration awareness.  If you are deploying DPM 2007 SP1 on W2008 R2 clusters then have your licensing set up to upgrade to DPM 2010 next year.
  • The Windows Server Backup approach requires a registry change on the host.  Complete instructions are on the MS site.
  • Even if you only do in-VM backups, ensure your vendor will support it.  Backing up inside a VM should be pretty much identical to backing up a physical box, but that doesn’t mean the vendor will actually support a VM backup.
  • Test the crap out of this stuff once you have a lab or a pilot set up.

DPM 2010 Storage Sizing Beta

Microsoft has released a beta tool for sizing storage requirements for System Center Data Protection Manager 2010.  It’s 3 Excel spreadsheets.

“These DRAFT storage calculators are for use with those planning DPM 2010 (beta) deployments – with specific calculators for Hyper-V, SharePoint and Exchange environments”


Bare Metal Recovery of Windows Server 2008 with DPM SP1

Microsoft has released guidance on how to perform a bare metal or iron level recovery of W2008 using System Center Data Protection Manager Service Pack 1.

“This technical article outlines the steps of using DPM 2007 SP1 alongside the Windows Server Backup (WSB) utility to provide a supported bare metal recovery of Windows Server 2008.

System Center Data Protection Manager (DPM) 2007 is a key member of the Microsoft System Center family of management products designed to help IT professionals manage their Windows Server environments. DPM is the new standard for Windows Server backup and recovery – delivering continuous data protection for Microsoft applications, virtualization, file servers, and desktops using seamlessly integrated disk and tape media, as well as cloud repositories. DPM enables better backups with rapid and reliable recoveries for both the IT professional and the end-user. DPM helps significantly reduce the costs and complexities associated with data protection through advanced technology for enterprises of all sizes. Using complimentary technologies in addition to DPM’s actual software, DPM 2007 SP1 can perform a bare metal recovery (BMR) to restore an entire server without an operating system”.
