Why I Dislike Dynamic VHD in Production

With this post, I’m going to try to explain why I recommend against using Dynamic VHD in production.

What is Dynamic VHD?

There are two types of VHD you may use in production:

  • Fixed: This is where all of the allocated storage is consumed at once.  For example, if you want 60 GB of virtual machine storage, a VHD file of around 60 GB is created, consuming all of that storage at once.
  • Dynamic: This is where the VHD will only consume as much as is required, plus a little buffer space.  If you allocate 60 GB of storage, a tiny VHD is created.  It will grow by small chunks to accommodate new data, always leaving a small amount of free space.  It kind of works like a SQL Server database/log file.  Eventually the VHD will reach 60 GB and you’ll run out of space in the virtual disk.
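
To put rough numbers on the difference between the two types, here is a minimal Python sketch.  It is an illustration only, not anything that ships with Hyper-V.  It assumes a Dynamic VHD grows in whole chunks and never beyond its allocation.  The function names and the chunk_mb parameter are hypothetical, and the 64 MB chunk is a placeholder rather than a real Hyper-V value.

```python
def fixed_vhd_size_mb(allocated_mb):
    """A Fixed VHD consumes its whole allocation as soon as it is created."""
    return allocated_mb


def dynamic_vhd_size_mb(allocated_mb, data_mb, chunk_mb):
    """Estimate the file size of a Dynamic VHD that holds data_mb of data.

    Assumption: the file grows in whole chunks of chunk_mb (a hypothetical
    parameter) and can never exceed the allocated size.
    """
    if chunk_mb <= 0:
        raise ValueError("chunk_mb must be positive")
    if data_mb > allocated_mb:
        raise ValueError("the virtual disk is out of space")
    chunks = -(-data_mb // chunk_mb)  # ceiling division
    return min(allocated_mb, chunks * chunk_mb)


allocated = 60 * 1024  # the 60 GB example from the list above, in MB
print("Fixed VHD file size (MB):  ", fixed_vhd_size_mb(allocated))
print("Dynamic VHD file size (MB):", dynamic_vhd_size_mb(allocated, 10000, 64))
```

The Fixed VHD is the full 60 GB from day one, while the Dynamic VHD only catches up with it once the guest has filled the disk.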

With Windows Server 2008 we knew that Dynamic VHD was just too slow for production.  The VHD would grow in very small amounts, and often lots of growth was required at once, creating storage write latency.

Windows Server 2008 R2

We were told that was all fixed when Windows Server 2008 R2 was announced.  Trustworthy names stood in front of large crowds and told us how Dynamic VHD would nearly match Fixed VHD in performance.  The solution was to increase the size of the chunks that were added to the Dynamic VHD.  After RTM there were performance reports that showed us how good Dynamic VHD was.  And sure enough, this was all true … in the perfect, clean, short-lived lab.
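
Why bigger chunks help can be shown with a little arithmetic.  The sketch below counts how many times a Dynamic VHD file has to be extended to absorb a burst of new data, assuming the VHD has no free space left when the burst starts.  The function name and the chunk sizes are illustrative assumptions, not the actual values used by either version of Hyper-V.

```python
def growth_operations(burst_mb, chunk_mb):
    """Count the file extensions needed to absorb burst_mb of new data.

    Assumption: the VHD has no spare space when the burst starts, and
    every extension adds exactly chunk_mb (a hypothetical parameter).
    """
    if chunk_mb <= 0:
        raise ValueError("chunk_mb must be positive")
    return -(-burst_mb // chunk_mb)  # ceiling division


# Compare a few made-up chunk sizes for the same 1 GB burst of writes.
for chunk_mb in (1, 2, 32):
    print(chunk_mb, "MB chunks:", growth_operations(1024, chunk_mb), "extensions")
```

Fewer extensions means fewer pauses while the file grows, which is the improvement that was being promised.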

For now, let’s assume that the W2008 R2 Dynamic VHD can grow fast enough to meet write activity demand, and focus on the other performance negatives.

Fragmentation

Let’s imagine a CSV with 2 Dynamic VHDs on it.  Both start out as small files:

image

Over time, both VHDs will grow.  Notice that the growth is fragmenting the VHDs.  That’s going to impact reads and overwrites.

image

And over the long term, it doesn’t get any better.

image

Now imagine that with dozens of VMs, all with one or more Dynamic VHDs, all getting fragmented.
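
The interleaving described above can be reproduced with a toy model.  This Python sketch treats the volume as a simple list of extents handed out in order to whichever VHD grows next.  That is a deliberate simplification and not how NTFS really allocates space, and all of the names are hypothetical.

```python
def simulate_dynamic_growth(rounds, files=("VHD1", "VHD2")):
    """Lay out extents as each file grows by one extent per round."""
    layout = []
    for _ in range(rounds):
        for name in files:
            layout.append(name)  # the next free extent goes to the growing file
    return layout


def simulate_fixed_creation(extents_per_file, files=("VHD1", "VHD2")):
    """Lay out extents when each file is fully allocated at creation time."""
    layout = []
    for name in files:
        layout.extend([name] * extents_per_file)
    return layout


def count_fragments(layout, name):
    """Count the contiguous runs of extents that belong to one file."""
    fragments = 0
    previous = None
    for owner in layout:
        if owner == name and previous != name:
            fragments += 1
        previous = owner
    return fragments


dynamic_layout = simulate_dynamic_growth(rounds=4)
fixed_layout = simulate_fixed_creation(extents_per_file=4)
print("Dynamic:", dynamic_layout, "fragments of VHD1:", count_fragments(dynamic_layout, "VHD1"))
print("Fixed:  ", fixed_layout, "fragments of VHD1:", count_fragments(fixed_layout, "VHD1"))
```

In this model every round of growth adds another fragment to each Dynamic VHD, while the Fixed VHDs stay in one piece.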

The only thing you can do to combat this is to run a defrag operation on the CSV volume.  Realistically, you’d have to run the defrag at least once per day. Defrag is an example of an operation that’s going to kick in Redirected Mode (or Redirected Access).  And unlike backup, it cannot make use of a Hardware VSS Provider to limit the impact of that operation.  Big and busy CSVs will take quite a while to defrag, and you’re going to impact the performance of production systems.  And you really need to be aware of what that impact would be on multi-site clusters, especially those that are active(site)-active(site).

Odds are you probably should be doing the occasional CSV defrag even if you use Fixed VHD.  Stuff gets messed up over time on any file system.

Storage Controllers

I am not a storage expert.  But I talked with some Hyper-V engineers yesterday who are.  They told me that they’re seeing SAN storage controllers that really aren’t dealing well with Dynamic VHD, especially if LUN thin provisioning is enabled.  Storage operations are being queued up, leading to latency issues.  Sure, Dynamic VHD and thin provisioning may reduce the amount of disk you need, but at what cost to the performance/stability of your LOB applications, operations, and processes?

CSV and Dynamic VHD

I became aware of this one a while back thanks to my fellow Hyper-V MVPs.  It never occurred to me at all – but it does make sense.

In scenario 1 (below) the CSV1 coordinator role is on Host1.  A VM is running on Host1, and it has Dynamic VHDs on CSV1.  When that Dynamic VHD needs to expand, Host1 can take care of it without any fuss.

image

In scenario 2 (below) things are a little different.  The CSV1 coordinator role is still on Host1, but the VM is now on Host3.  Now when the Dynamic VHD needs to expand, we see something different happen.

image

Redirected Mode/Access kicks in so the CSV coordinator (Host1) for CSV1 can expand the Dynamic VHD of the VM running on Host3.  That means all storage operations for that CSV, on Hosts2-3 must traverse the CSV network (maybe 1 Gbps) to Host1, and then go through its iSCSI or fibre channel link.  This may be a very brief operation, but it’s still something that has a cumulative effect on latency, with potential storage I/O bottlenecks in the CSV network, Host1, Host1 HBA, or Host1 SAN connection.

image
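
The rule behind the two scenarios is simple enough to write down.  This Python sketch only encodes the behaviour as described in this post: a Dynamic VHD expansion is handled locally only when the VM runs on the host that holds the CSV coordinator role.  It does not call any Hyper-V or Failover Clustering API, and the data structures and VM names are hypothetical.

```python
def expansion_needs_redirection(vm_host, csv_coordinator):
    """True when a Dynamic VHD expansion must go through the CSV coordinator."""
    return vm_host != csv_coordinator


def vms_causing_redirection(vms, coordinators):
    """List the VMs whose Dynamic VHD growth would trigger Redirected Mode.

    vms:          {vm_name: (host, csv)}   (hypothetical inventory)
    coordinators: {csv: coordinator_host}  (hypothetical inventory)
    """
    return sorted(
        name
        for name, (host, csv) in vms.items()
        if expansion_needs_redirection(host, coordinators[csv])
    )


coordinators = {"CSV1": "Host1"}
vms = {
    "VM-A": ("Host1", "CSV1"),  # scenario 1: the VM is on the coordinator
    "VM-B": ("Host3", "CSV1"),  # scenario 2: the VM is on another host
}
print(vms_causing_redirection(vms, coordinators))
```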

Now take a moment to think bigger:

  • Imagine lots of VMs, all with Dynamic VHDs, all growing at once.  Will the CSV ever not be in Redirected Mode? 
  • Now imagine there are lots of CSVs with lots of Dynamic VHDs on each.
  • When you’re done with that, now imagine that this is a multi-site cluster with a WAN connection adding bandwidth and latency limitations for Redirected Mode/Access storage I/O traffic from the cluster nodes to the CSV coordinator.
  • And then imagine that you’re using something like a HP P4000/LeftHand where each host must write to each node in the storage cluster, and that redirected storage traffic is going back across that WAN link!

Is your mind boggled yet?  OK, now add in the usual backup operations, and defrag operations (to handle Dynamic VHD fragmentation) into that thought!

You could try to keep the VMs on CSV1 running on Host1.  That’ll eliminate the need for Redirected Mode.  But things like PRO, and Dynamic Optimization of SCVMM 2012 will play havoc with that, moving VMs all over the place if they are enabled – and I’d argue that they should be enabled because they increase service uptime, reliability, and performance.

We need an alternative!

Sometimes Mentioned Solution

I’ve seen some say that they use Fixed VHD for data drives where there will be the most impact.  That’s a good start, but I’d argue that you need to think about those System VHDs (the ones with the OS).  Those VMs will get patched. Odds are that will happen at the same time and you could have a sustained level of Redirected Mode while Dynamic VHDs expand to handle the new files.  And think of the fragmentation!  Applications will be installed/upgraded, often during production hours.  And what about Dynamic Memory?  The VM’s paging file will increase, thus expanding the size of the VHD: more Redirected I/O and fragmentation.  Fixed VHD seems to be the way to go for me.

My Experience

Not long after the release of Windows Server 2008 R2, a friend of mine deployed a Hyper-V cluster for a business here in Ireland.  They had a LOB application based on SQL Server.  The performance of that application went through the floor.  After some analysis, it was found that the W2008 R2 Dynamic VHDs were to blame.  They were converted to Fixed VHD and the problem went away.

I also went through a similar thing in a hosting environment.  A customer complained about poor performance of a SQL VM.  This was for read activity – fragmentation would cause the disk heads to bounce and increase latency.  I converted the VHDs to fixed and the run time for reports was immediately improved by 25%.

SCVMM Doesn’t Help

I love the role of the library in SCVMM. It makes life so much easier when it comes to deploying VMs, and SCVMM 2012 expands that exponentially with the deployment of a service.

If you are running a larger environment, or a public/private cloud, with SCVMM then you will need to maintain a large number of VM templates (VHDs in MSFT lingo but the rest of the world has been calling them templates for quite a long time). You may have Windows Server 2008 R2 with SP1 Datacenter, Enterprise, and Standard. You may have Windows Server 2008 R2 Datacenter, Enterprise, and Standard. You may have W2008 with SP1 x64 Datacenter, Enterprise, and Standard. You may have W2008 with SP1 x86 Datacenter, Enterprise, and Standard. You get the idea. Lots of VHDs.

Now you get that I prefer Fixed VHDs.  If I build a VM with Fixed VHD and then create a template from it, then I’m going to eat up disk space in the library.  Now it appears that some believe that disk is cheap.  Yes, I can get 1TB of a disk for €80.  But that’s a dumb, slow, USB 2.0 drive.  That’s not exactly the sort of thing I’d use for my SCVMM library, let alone put in a server or a datacenter.  Server/SAN storage is expensive, and it’s hard to justify 40 GB + for each template that I’ll store in the library.
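
The library cost is easy to estimate.  This Python sketch multiplies out the template list from earlier in this post (four OS builds, three editions each) by the 40 GB figure.  The function name is hypothetical and 40 GB is treated as a flat size per template, which is a simplification of the “40 GB +” above.

```python
def library_storage_gb(template_count, fixed_vhd_gb):
    """Disk consumed in the library when every template is a Fixed VHD."""
    return template_count * fixed_vhd_gb


os_builds = [
    "W2008 R2 with SP1",
    "W2008 R2",
    "W2008 with SP1 x64",
    "W2008 with SP1 x86",
]
editions = ["Datacenter", "Enterprise", "Standard"]

template_count = len(os_builds) * len(editions)
print(template_count, "templates need", library_storage_gb(template_count, 40), "GB")
```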

The alternative is to store Dynamic VHDs in the library.  But SCVMM does not convert them to Fixed VHD on deployment.  That’s a manual process – and that’s one that is not suitable for the self-service nature of a cloud.  The same applies to storing a VM in the library; it seems pointless to store Fixed VHDs for an offline VM, but there’s a manual conversion process to convert the stored VMs to Dynamic VHD.

It seems to me that:

  • If you’re running a cloud then you realistically have to use Fixed VHDs for your library templates (library VHDs in Microsoft lingo)
  • If you’re a traditional IT-centric deploy/manage environment, then store Dynamic VHD templates, deploy the VM, and then convert from Dynamic VHD to Fixed VHD before you power up the VM.

What Do The Microsoft Product Groups Say?

Exchange: “Virtual disks that dynamically expand are not supported by Exchange”.

Dynamics CRM: “Create separate fixed-size virtual disks for Microsoft Dynamics CRM databases and log files”.

SQL Server: “Dynamic VHDs are not recommended for performance reasons”.

That seems to cover most of the foundations for LOB applications in a MSFT centric network.

Recommendation

Don’t use Dynamic VHD in production environments.  Use Fixed VHD instead (and passthrough in those rare occasions where required).  Yes, you will use more disk for Fixed VHD for all that white space, but you’ll get the best possible performance while using flexible and more manageable virtual disks. 

If you have implemented Dynamic VHD:

  • Convert to Fixed VHD (requires VM shut down) if you can. Defrag, and set up a less frequent defrag job.
  • If you cannot convert, then figure out when you can run frequent defrag jobs.  Try to control VM placement relative to CSV coordinator roles to minimize impact.  A script to do that will need to figure out the CSV coordinator for the relevant CSV (because it can fail over), and Live Migrate VMs on that CSV to the CSV coordinator, assuming that there is sufficient resource and performance capacity on that host.  Yes, the Fixed VHD option looks much more attractive!
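
The placement logic in that last point could be sketched as follows.  This is only the decision logic in Python, working on plain dictionaries.  It does not query a cluster or perform a Live Migration, and a real script would have to gather the inventory and do the moves with your own tooling.  The free_slots capacity measure and all of the names are hypothetical stand-ins for “sufficient resource and performance capacity”.

```python
def plan_migrations(vms, coordinators, free_slots):
    """Plan moves that put each VM on the coordinator of its CSV.

    vms:          {vm_name: (current_host, csv)}
    coordinators: {csv: coordinator_host}
    free_slots:   {host: how many more VMs that host can take}
                  (a hypothetical, very crude capacity measure)

    Returns a list of (vm_name, from_host, to_host) tuples.  VMs that are
    already on the coordinator, or that do not fit, are left where they are.
    """
    remaining = dict(free_slots)  # work on a copy, leave the input untouched
    plan = []
    for vm_name in sorted(vms):
        current_host, csv = vms[vm_name]
        target = coordinators[csv]
        if current_host == target:
            continue  # already on the coordinator, nothing to do
        if remaining.get(target, 0) > 0:
            remaining[target] -= 1
            plan.append((vm_name, current_host, target))
    return plan


coordinators = {"CSV1": "Host1"}
vms = {
    "VM1": ("Host1", "CSV1"),
    "VM2": ("Host3", "CSV1"),
    "VM3": ("Host2", "CSV1"),
}
for move in plan_migrations(vms, coordinators, free_slots={"Host1": 1}):
    print("Live Migrate %s from %s to %s" % move)
```

Because the coordinator role can fail over, a plan like this would have to be recalculated every time it moves.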

So … I Prefer No Antivirus on Hyper-V Hosts?

Waiver: What you do following reading this post is up to you. 

After my earlier post on “Top Hyper-V Implementation Issues” I had some feedback on my preference to keep antivirus (AV) off of the Hyper-V hosts.

The configuration that you should have is in KB961804.  That article also says what can happen if you do install AV on your hosts, don’t follow that guidance, and scan everything.  One day you’ll end up with nasty errors such as 0x800704C8, 0x80070037 or 0x800703E3 and find lots of VMs (with their business apps and data) have:

  • Disappeared from your Hyper-V console
  • Disappeared from your VMM console
  • Are not running

The files are still there but, damn, the VMs will not start up or appear in a management tool.  That’s because AV has gotten in the way and screwed things up.  I blogged about this back during the W2008 Hyper-V beta (can’t find the post now) in early 2008.  It happened to me.  I was unlucky; I set the required exclusions and restarted the host in question (a lab machine).  My VM configuration files were corrupted.  The solution was to recreate the VMs and point them at the existing VHDs containing the safe OS, programs, and data.  Time consuming – and how many people document/remember their VM configurations?  And come to think of it, how many businesses would be OK with their LOB applications being offline for half a day or more while admins do this?

I learned something in 2004.  There is a balancing act between security and business.  Sometimes it has to swing one way, sometimes another.  This is one of those cases.

I do not trust any antivirus product completely.  They are stupid assassins.  They are given rules of engagement, get a target list, and they attack.  But all too often, program updates, definition file updates, or dumb human operator error cause mistakes.  It is not unknown for one of these to reset the exception list.  Yes; it has happened – and even happened recently.  Do you really want one of these things to undo the necessary configurations of your Hyper-V cluster – a thing that is effectively a mainframe running many/most/all of your LOB applications, and putting them at risk?

So I say: do not install AV on the parent partition or host OS.  Sure, go ahead and install it in the VMs.  If you can, choose an AV product that is aware of things like virtualisation and minimises redundant scanning.  On the host, make sure you apply security fixes.  Keep the service pack up to date.  And keep the Windows Firewall running.  Finally, restrict who has logon rights to the hosts.  If you can, prevent the hosts from having proxy/web access.  People should never browse from a server but I just don’t trust human nature.  All that should secure the parent pretty well.

Now let’s get back to why you’re installing AV on the parent partition.  Odds are there is a security officer who has a list of things that [booming voice] “must be done to all Windows computers” [/booming voice].  And if you do not do these things you will be fired!   One of them is: “you must install anti virus and scan everything because Windows is a threat to life itself”.  Hmm, someone’s been reading the SANS website again!  I hate checklist security experts.

Here’s my response to that person:

  • I’d point them to KB961804.  In fact, you might even want to show them the Microsoft required exceptions list.  It says “recommended” in the title but try having that argument with a MSFT support engineer when your SYSVOL is corrupted!
  • If they insist, then say you’ll comply but you have one requirement.  Never say “no” because that’s career suicide.  Give them a waiver form.  This form will clearly state that you the operator/administrator/engineer/consultant will not be held responsible for any corruption or loss of virtual machines because of the mandate to scan all things on the Hyper-V hosts.  All responsibility will lie with the undersigned security officer – and demand their signature.  Then keep a copy for yourself, give one to your boss, and one to the CIO.  At least then you know who will get fired when incorrectly configured AV causes your VMs to disappear.

It’s funny; security officers are usually career politicians.  And politicians do not like being nailed down to something like that.  Taking responsibility is not in a politician’s nature.  I bet you get your way after that.

Maybe as a compromise, you might offer to take a host offline once in a while to perform a complete system scan of the C: drive.

Anyway, that’s my opinion on the matter.

Going to BUILD

Assuming the USA lets me in, I’ll be going to the BUILD conference in September.  This is where Microsoft will be opening the taps on Windows 8 information.  It’s mainly aimed at developers and hardware manufacturers but I’m pretty sure there’ll be lots more information.  With no TechEd Europe this Autumn/Winter, I guess this’ll be our only event full of info this side of the new year.

I’ll try to live blog the good stuff, where possible, like I did at TechEd 2008 in Barcelona.  We were given a monstrous amount of info about Windows 7 & Server 2008 R2 back then.

Carbonite on my Windows Home Server

When I set up my Windows Home Server I configured the normal Windows Server Backup task to back up the server folders to a USB disk.  That’s nice for normal backup/recovery.  But that doesn’t protect my data (documents, books, whitepapers, and thousands of photos) against fire and theft.  Sure, I could probably swap disks and store them offsite.  But I know how poor my discipline with that has been in the past.  I need something automated for off-site backup.

So I decided to try Carbonite.  It’s one of the few online personal backup solutions that will work on WHS.  There’s a 15 day free trial so I signed up for that, and I added the offer code from the TWiT Security Now podcast – that gives you an extra 2 months free in addition to your 12 month subscription (unlimited storage for less than $60/year!!!!).

The install was easy.  The configuration wizard walks you through the few steps.  You’re warned that files like video will not be backed up.  I’m OK with that – I have no personal/holiday videos because I’m a still photo man.  Targeting a folder is easy – use Windows Explorer, right-click, and select the add to backup option.  I had two schedule choices: constantly back up changes or schedule.  I went for the first option.

OK, the flaw: I have a 20 GB per month limit and I’m on ADSL.  It’s going to take a very long time to get all of my photo collection backed up to the cloud.  I’ve been incrementally adding folders, starting with My Documents, and then I added some of my older photo folders to test.  All worked well.  I’ll continue testing, and then decide next week if I’ll pay for the service.
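
How long is “a very long time”?  Here is a quick Python estimate.  It assumes the whole monthly allowance is spent on the initial upload and nothing else.  The 150 GB collection size is a made-up figure for illustration, and the function name is hypothetical.

```python
import math


def months_to_upload(collection_gb, monthly_cap_gb=20):
    """Months needed to upload collection_gb under a monthly data cap.

    Assumption: the entire cap is available for the backup every month.
    """
    if monthly_cap_gb <= 0:
        raise ValueError("monthly_cap_gb must be positive")
    return math.ceil(collection_gb / monthly_cap_gb)


# 150 GB is a hypothetical collection size, not a figure from this post.
print(months_to_upload(150), "months")
```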

Recommended Updates or Hotfixes for W2008 R2 SP1 Hyper-V

It used to be that we had an official page on TechNet for updates for Windows Server 2008 R2 Hyper-V.  It has since been decided to move the Windows Server 2008 R2 Service Pack 1 Hyper-V recommended updates list over to the TechNet wiki where it is community driven.

Presentation – Top Issues I’ve Encountered with Hyper-V Implementations

I gave a presentation earlier today on the subject of issues I’ve encountered, been asked about, or read about with Hyper-V implementations.  Just about all of them are related to operators or consultants not knowing any better.  Sometimes that’s caused by lack of education and sometimes it’s lack of documentation.  And sometimes … I am left exasperated!

Book Review – Daemon by Daniel Suarez

The story of Daemon is that a games development genius dies, but that doesn’t stop him from wreaking havoc on the world.  Before he dies, he uses the AI from his games to create a distributed network to enact his will.

This book has what Zero Day didn’t: a hook, something to keep you turning the pages.  In fact, I found it quite addictive.  I was reading it before work, at lunch, and going to bed early to read more.  I finished it this morning and immediately ordered/downloaded the sequel, Freedom.

Whereas Zero Day featured an extremely believable scenario, Daemon goes a little bit more into the sci-fi end of things to add an element of danger.  However, it is still rooted in the believable.  I can’t watch a movie or read a book that features “go hack now” scenarios.  But this book was based on things like trojans, in-game AI, RSS feeds, GPS, and so on.  It just stretched what we know about a little to enable the plot, but kept this within an acceptable limit for me.

Over and over, in this book, you’ll see how hacks take advantage of poor patch control.  Spotting a trend?

I reckon that if you work in IT, or find computers interesting, then there’s a really good chance that you’ll like Daemon.  This book can be ordered on Amazon.com.

New Book: Windows Sysinternals Administrator’s Reference

Here’s a new book by Mark Russinovich and Aaron Margosis that you can order on Amazon.com.  If you’re a Windows admin, and find yourself needing to troubleshoot difficult issues, then this is essential reading.

“Get in-depth guidance—and inside insights—for using the Windows Sysinternals tools available from Microsoft TechNet. Guided by Sysinternals creator Mark Russinovich and Windows expert Aaron Margosis, you’ll drill into the features and functions of dozens of free file, disk, process, security, and Windows management tools. And you’ll learn how to apply the book’s best practices to help resolve your own technical issues the way the experts do.

Diagnose. Troubleshoot. Optimize.

  • Analyze CPU spikes, memory leaks, and other system problems
  • Get a comprehensive view of file, disk, registry, process/thread, and network activity
  • Diagnose and troubleshoot issues with Active Directory®
  • Easily scan, disable, and remove autostart applications and components
  • Monitor application debug output
  • Generate trigger-based memory dumps for application troubleshooting
  • Audit and analyze file digital signatures, permissions, and other security information
  • Execute Sysinternals management tools on one or more remote computers
  • Master Process Explorer, Process Monitor, and Autoruns”

Software Benefits as a Microsoft Partner

This is another common question that is popping up in my day job, so I reckon it’s another subject that I need to blog about.

Microsoft partners are consumers of the technology too.  They face all the same challenges as their customers: money is tight and software can be expensive.  Good news: you can get it either cheap or even free.  What you get, and how much you get all depends on what type of partner you are and what grade and type of competency you have as a Microsoft partner company.

Piracy

A lot of Microsoft partners are using Microsoft software illegally.  That is a fact, and I suspect that it is quite common in the smaller/medium sized partner companies.  They can get a certain allocation of software, but often it is not enough. 

What is it that they are doing that is illegal?  They get their MSDN or TechNet subscription for a handful of users and start using it to deploy production desktops, applications, and servers all over the shop.  MSDN and TechNet have explicit usage rights, and they do not include widespread production usage, e.g. your domain controller, file server, everyone’s PC/Office, etc.  The directors may not know this is happening, they may turn a blind eye to it (sticking fingers in ears and repeatedly shouting LAH-LAH-LAH-LAH when the sys-admin tells them the truth – been there), or they may even instruct it to happen (been there too, many years ago).

So how can you, as a Microsoft partner company, get a chunk of software legally for next to nothing?

Microsoft Partner Action Pack

This is an excellent bundle for small companies that are even at the most basic level in the Microsoft Partner Network: a registered partner.  In fact, you cannot have a silver or gold competency and subscribe to this pack!  The eligibility requirements are online.  The Irish rate (per year) is €289 and that includes a big list of software, really for that partner with up to 10 users.  Highlights include:

  • Office Professional Plus (10) + Project (5) + Visio Professional (10)
  • Exchange Standard: 1 server + 10 CALs
  • SQL Enterprise: 1 server + 10 CALs
  • Windows Server: Enterprise (1), CALs (10), Storage Server Essentials (1), SBS Standard (1), SBS CALs (10)
  • Windows 7: Pro (10), Ultimate (1)

A handful of Office on OVS will cost more than all that!

Silver and Gold Competency Holders

These folks tend to be bigger companies and are not suitable for the Partner Action Pack, nor are they eligible.  But don’t worry if you’re here, you get a much bigger allocation of software.  If you qualify for a competency, then you get an allocation of software that you are free to download and use.  What you get will depend on:

  • The competency: developers will get more relevant stuff for them, and systems management people will get more relevant stuff for them.
  • The grade: The gold competency rewards you with more software than the silver one.

Microsoft could have published a nasty matrix.  Instead, there is a simple graphical calculator that allows you to punch in the competencies that your company has, as well as the grades, and it tells you what you are eligible to download and use.

For example, a company with Silver Systems Management and Silver Virtualisation competencies gets stuff including:

  • 2 Exchange Enterprise + 25 CALs + 25 Forefront for Exchange (and SharePoint)
  • 25 Windows 7 Enterprise + 25 AD RMS + 25 Office Professional Plus
  • 2 Windows Server Datacenter + 4 Windows Server Standard/Enterprise
  • 15 Visio Professional + 5 Project Professional
  • All the System Center stuff
  • And LOTS more

Go Gold with those competencies and you get 100 copies of Office Pro Plus and Windows 7 Enterprise.  There is work to become a partner but you can see there is money to be saved.
