A Week Off

I have a week off between jobs so I’m making the most of it.  Obviously, there is going to be a lot of work done on the book.  But I spent this past weekend up in the Cairngorm Mountains in Scotland.  I drove up via ferry.  That’s a long trip but it allowed me to bring a lot of photo gear, and damn, I used it.  Over 1.5 days, I spent 14.5 hours sitting in hides (or blinds, as they are called over the pond) to do some photography.

Osprey were my target:

Rothiemurchus 223

I also got some Red Squirrels:

Rothiemurchus 192 

I’m back home now.  I’ll have a slow start this morning and then dive straight back into a Hyper-V architecture chapter for the book.

I’m Leaving the Hosting/Cloud Business

Today was my last day in the hosting business.  It’s been 3 years, and to be honest, I’m happy to be moving into other areas.  I’ve had the opportunity to do a lot of stuff in those 3 years but I thought it was time to move on.  Hosting/Cloud is too much like helpdesk for my liking.

I have a week off, giving me time to work on the book and do a little bit of photography, hopefully getting up close with some Osprey in Scotland.

After that, I return to the consultancy business as a team leader with a company in Dublin.  I’ve probably spent half of my career in the consulting world so it’ll be a return to familiar territory.  I’ll be working with all the technologies that I find interesting and get the opportunity to stretch a little.  Of course, Windows, Hyper-V and System Center will all be in the mix, especially considering that those are hot button subjects around here these days.

Powered Down Virtual Machines on a Hyper-V Cluster

From time to time, I’ll be asked to power down virtual machines in our production environment.  I also run a test virtual machine on the cluster to test things like Live Migration after doing upgrade work.  Normally, I’d like to keep it powered down, just to save 512MB of RAM and the occasional CPU cycle.  But it seems to me that Microsoft does not like us to keep powered down virtual machines on the cluster.

My first clue was in VMM.  VMM tries to protect the cluster reserve in a Hyper-V cluster.  In other words, VMM will change the status of a cluster object to a warning if you overcommit the resources.  For example, if you have 58GB of RAM for VM’s across your N+1 3 node cluster, then it’ll complain when you deploy 58GB+ of VM RAM.  One would assume that VMM would only count the running VM’s.  However, I can confirm that it includes the RAM assignments of powered down VM’s as well.  I can understand this conservative approach … it’s the sort of thing a banker would do if they didn’t want to bankrupt their bank’s loan book ;-)  You have to allow for a scenario where the VM will be powered up.  Who’s to say that there isn’t a tester or developer at the other end of a Self-Service Portal, consuming their quota points, and eager to power up the VM’s at any moment?
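To picture that reserve calculation, here is a quick hypothetical sketch.  This is not VMM’s actual algorithm, just the gist of an N+1 check that counts powered down VM’s too:

```python
# Hypothetical sketch of an N+1 cluster reserve check, in the spirit of
# what VMM appears to do.  This is NOT VMM's real algorithm.

def reserve_warning(node_ram_gb, nodes, vm_ram_gb):
    """Return True if assigned VM RAM (running AND powered down VMs)
    exceeds the capacity left after reserving one node for failover."""
    usable_gb = node_ram_gb * (nodes - 1)  # N+1: hold one node in reserve
    return sum(vm_ram_gb) > usable_gb

# A 3 node cluster with 29GB usable per node leaves 58GB for VMs.
vms = [8, 8, 16, 16, 4, 4, 4]  # includes powered down VMs' assignments
print(reserve_warning(29, 3, vms))  # 60GB assigned > 58GB usable -> True
```

The point is simply that the powered down VM’s 4GB entries count the same as the running ones.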

The next clue is in OpsMgr.  I’ve imported the Microsoft Windows Cluster management pack.  A highly available virtual machine is a resource from clustering’s point of view.  Surely you deployed it on a cluster (as a highly available virtual machine) for a reason?  Shouldn’t it be running?  That’s how the management pack sees it.  An object is created in OpsMgr for every monitored cluster resource, i.e. every virtual machine, and its status will go to critical if the resource is stopped, i.e. the virtual machine is powered down.  You’ll get an alert and notifications will go out.  If you are running SLA reporting then you’ll get a nice red mark all over your SLA.  Whoops!

So what should you do with those powered down VM’s?  If a VM is going to be down for a long time then you should move it to the VMM library.  There you have cheaper storage, and hopefully lots of it.  Importantly, the VMM cluster reserve will be OK.  OpsMgr will stop complaining about the failed cluster resource after a little while.

What if this power down is a short term thing?  You should obviously add resources to the cluster to resolve the VMM cluster reserve warning; otherwise you won’t have an N+1 (or greater) cluster with enough resources to handle a failed host (or hosts).  You can use the Health Explorer in OpsMgr to put the critical resource (the powered down VM) in the cluster into maintenance mode, thus eliminating alerts.  You should do that before powering down the VM.

Long term, if lots of VM’s will be powered down and up, you might want to create a dedicated, lower priority, cluster for this.  You can customize the monitoring not to care about cluster resources being up or down.  You can probably safely ignore warnings about VMM cluster reserve being exceeded too.

Are VMware Getting Desperate?

I just received this spam email from a VMware sales rep.  It was sent to a mail list <undisclosed recipients>:

“I tried to call you this morning in relation to your interest in our virtualization technology. I just wanted to know if you have any virtualization projects at present, or planned for the future. In Vmware we can help in terms of technical assistance, commercial advice, hardware setup etc. if required.

Please drop me a line when you have a moment and don’t hesitate to ask any questions that may be helpful to your objectives”.

So, VMware is cold calling people and then emailing them to drum up business.  Hmmm ….

Run WebSphere in a Virtual Machine?

I know nothing about WebSphere other than it exists; as you may have gathered, I try to avoid IBM products.  I was talking to an engineer yesterday and he mentioned that WebSphere was a product he was never able to run successfully as a virtual machine on ESX 3.X.  The memory was constantly being paged and the hypervisor couldn’t keep up.

That got me thinking: would W2008 R2 Hyper-V’s SLAT (Second Level Address Translation) feature help with this?  I suspect it probably would make virtualising this application more feasible.  SLAT leverages AMD RVI and Intel EPT to remove the hypervisor from the role of mapping physical memory to virtual machine memory.  The CPU is able to do this mapping more efficiently than software can.

It’s something that might be worth testing in a lab if a WebSphere server has appeared on your radar as a candidate for conversion.  Just make sure you are using SLAT capable CPU’s in your host servers.

ESX 4.0 has something similar to SLAT so I guess it is probably worth trying there too if that is your hardware virtualisation platform.

As usual, check with the vendor for virtualisation support and recommendations.  Then balance the risks and decide for yourself.

W2008 R2 SP1 Dynamic Memory Explained

Dynamic Memory can be a little confusing at first so I thought I’d give explaining it a go.  It’s a new feature in Windows Server 2008 R2 Service Pack 1, the beta of which will appear some time in July 2010.  Dynamic Memory will support virtual machines running:

  • Windows Server 2003, 2003 R2, 2008, and 2008 R2 Web, Standard, Enterprise, and Datacenter editions.
  • Windows Vista/7 Ultimate and Enterprise editions.

The process is a mixture of memory hot-add and ballooning.

First, each VM will have a bunch of settings:

  • Do you want statically defined or dynamic memory for the VM?
  • What will be the minimum amount of memory in the VM?
  • What will be the maximum amount of memory in the VM?
  • How much free memory/buffer should the VM have?
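Conceptually, those per-VM settings boil down to something like the sketch below.  The names are purely illustrative, not the real Hyper-V property names:

```python
# Hypothetical sketch of the per-VM Dynamic Memory settings; the names
# are illustrative and are NOT the real Hyper-V property names.

def validate_dm_settings(dynamic, minimum_mb, maximum_mb, buffer_pct):
    """Sanity check a VM's memory configuration."""
    if not dynamic:
        # Statically defined memory: one fixed assignment.
        return minimum_mb == maximum_mb
    # Dynamic: minimum cannot exceed maximum, buffer is a percentage.
    return 0 < minimum_mb <= maximum_mb and 0 <= buffer_pct <= 100

print(validate_dm_settings(True, 1024, 4096, 20))  # True
print(validate_dm_settings(True, 4096, 1024, 20))  # False: min > max
```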

OK.  Now the tricky bit.  How does it work?

The VM will boot up with the minimum amount of RAM.  Let’s say this is 1GB.  As the VM’s requirements grow, a VSC (Virtualization Service Client) driver running in kernel mode in the VM will pull in memory from the host and hot-add it to the VM.  This consumes more physical memory from the host.

It is important to note that memory doesn’t magically appear from thin air.  The host must have the memory that is required by the VM – otherwise we get into a nasty performance situation.

What happens on the way down when the memory requirements of the VM reduce?

As the VM no longer needs memory, the ballooning process kicks in.  Memory cannot be physically removed from the VM.  Windows wouldn’t like that!  Instead, the VSC “tricks” the guest OS.  This driver simultaneously:

  • Reports to the guest OS that it (the driver) is consuming the RAM that is being freed up.  It isn’t really.  But this prevents Windows in the VM from trying to allocate those blocks of memory that won’t really be there.  This is the balloon.
  • Returns the free/ballooned memory to the host so that it can be made available for other memory hungry VM’s.

As you can see in some of the demo’s that I’ve linked to recently, it’s a pretty simple, rapid and easy thing to use.
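The up-and-down cycle can be sketched as a toy model.  This is purely illustrative; the real work happens inside the guest’s VSC driver and the hypervisor:

```python
# Toy model of Dynamic Memory's hot-add (grow) and ballooning (shrink).
# Purely illustrative; this is not how the VSC driver is implemented.

class DynamicMemoryVM:
    def __init__(self, minimum_mb, maximum_mb):
        self.minimum_mb = minimum_mb
        self.maximum_mb = maximum_mb
        self.visible_mb = minimum_mb   # RAM the guest OS can see
        self.balloon_mb = 0            # RAM "held" by the balloon

    def grow(self, host_free_mb, needed_mb):
        """Deflate the balloon first, then hot-add host memory."""
        deflate = min(self.balloon_mb, needed_mb)
        self.balloon_mb -= deflate
        needed_mb -= deflate
        grant = min(needed_mb, host_free_mb,
                    self.maximum_mb - self.visible_mb)
        self.visible_mb += grant
        return grant  # memory actually taken from the host

    def shrink(self, unneeded_mb):
        """Balloon: the guest still 'sees' the RAM, but the driver
        claims it so the host can hand it to other VMs."""
        self.balloon_mb += unneeded_mb
        # visible_mb is unchanged: memory can't be hot-removed
        return unneeded_mb  # memory returned to the host

vm = DynamicMemoryVM(minimum_mb=1024, maximum_mb=4096)
vm.grow(host_free_mb=8192, needed_mb=1024)  # guest now sees 2048MB
vm.shrink(512)                              # 512MB ballooned to the host
```

Note how `visible_mb` never goes down: the guest always believes it still has the memory, which is the whole trick.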

The trick here is not to abuse what Dynamic Memory can do.  There’s no point in over committing host servers.  If you know that you have an average of 75% memory utilisation across your VM’s then don’t try to get twice as much out of the host.

I think the sizing of the settings will be tricky.  I think OpsMgr reporting will prove very useful in figuring out what is best to do and how to configure the settings.

Where will Dynamic Memory be useful?

  • VDI: No doubt!  The big cost here is the GB’s of RAM, and that’s usually the bottleneck in a host, not the CPU.
  • VM Sizing: Just like in the physical world, it’s hard to accurately size memory for VM’s.  Software vendors can be very conservative with requirements and you can end up with too much RAM in a VM.  Now you can set a range and let the VM consume what it really needs.
  • Labs: We usually have limited budgets so squeezing that little bit more for a couple extra VM’s will be very nice.  I wish I could do this right now for my book lab!!!

As the Service Pack 1 beta and RC releases develop, I’m sure MS will release more information about Dynamic Memory and engineering recommendations.

VMM Integration Setup with OpsMgr Fails

This deals with a scenario where you run the Configure Operations Manager setup routine for VMM 2008 R2.  The setup fails at the Administrator Console stage with the following error:

<Problem>

"Setup was not able to retrieve the service account from the specified Virtual Machine Manager server.

Specify a different VMM server and then try the operation again.

ID: 10207"

I had this issue while setting up my lab environment for the book.  I’d never seen it before.  No matter what I did, it kept repeating and stuffed the schedule for a chapter, forcing me to move to the next chapter (after wasting 4 or 5 days).

I searched all over and found no help at all, but plenty of people who had seen this problem.  Some solutions included setting up the prerequisites.  I was 100% sure that I had done that: management packs, console installs, OpsMgr admin rights to the VMM service account, etc.  AD and accounts were all healthy.  I uninstalled and reinstalled VMM (retaining and reusing the database) and even created a new VMM server from scratch.

Eventually, I opened a call with MS PSS.  Within a couple of hours an engineer named Ganesh called me up, well ahead of the 8-hour SLA (nice!).  We fired up an Easy Assist session and went through all the steps.  We were both sure the problem was on the VMM side; OpsMgr was behaving perfectly.

Note: A handy troubleshooting step is to install a VMM Admin console on another machine and try that.  That can help identify the cause of this issue; there appear to be many, including dodgy DNS records and missing prereqs.  The setup log in the hidden (by default) ProgramData folder doesn’t give very much detail on the integration setup failure so you have to go through everything.

Ganesh wanted to look at the service account in ADSIEdit.  We browsed to it.  The first suspect was the SPN records.  SPNSETUP -L led us to believe everything was OK there.  We opened up the properties of the VMM service account and confirmed that.  However, when we expanded the service account we did notice something.  The SCP (it appears like a sub-folder in ADSIEdit under the VMM service account and is called CN=MSVMM) was missing.  This should be created by the VMM setup on the VMM server.

We reinstalled VMM on the VMM server once again.  Still no sign of the SCP.  This was a very unusual one.  Ganesh needed some time to do some research and to contact Redmond.

<FIX>

A day later I got an email from Ganesh.  There was a way to create the SCP.  Pop the VMM 2008 R2 media into the VMM server.  Browse to the <architecture>\Setup folder using a command prompt.  Run CONFIGURESCPTOOL.EXE -INSTALL from that folder.  This will create the SCP; I confirmed this in ADSIEdit.  I reran the integration setup and it completed perfectly.  After a while all of the VMM content started appearing in OpsMgr.

Note: If you do uninstall/reinstall VMM then make sure you patch it to the level it was at before.  Up-to-date VMM agents cannot communicate with out-of-date VMM servers.

Thanks to Ganesh over in PSS for your help!

Clustered Hyper-V Host Virtual Machine Capacity Increases

Last week at TechEd, Microsoft announced an increase in the number of supported virtual machines in a Hyper-V Cluster.  You may know that a Hyper-V host supports up to 384 running VM’s.  But up to now, only 64 VM’s were supported on a clustered Hyper-V host.

That changes now.  Microsoft supports up to 1000 VM’s in a cluster, regardless of how many Hyper-V hosts are in the cluster.  With one exception, of course: a 2 node cluster is limited to 384 VM’s because, with one host reserved for failover (N+1), the Hyper-V limit of 384 running VM’s per host applies.

  • 16 nodes: ~62 VM’s/node
  • 8 nodes: 125 VM’s/node
  • 4 nodes: 250 VM’s/node
  • 3 nodes: 333 VM’s/node
  • 2 nodes: 384 VM’s/node
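Those per-node numbers fall out of a simple calculation: divide the 1000 VM cluster limit across the nodes, capped at the 384 running VM’s per host limit.  A quick sketch:

```python
# Supported VMs per node: the 1000-per-cluster limit spread across the
# nodes, capped by the 384 running VMs per host limit.

CLUSTER_LIMIT = 1000
HOST_LIMIT = 384

def vms_per_node(nodes):
    return min(CLUSTER_LIMIT // nodes, HOST_LIMIT)

for n in (16, 8, 4, 3, 2):
    print(n, vms_per_node(n))
# 16 -> 62, 8 -> 125, 4 -> 250, 3 -> 333, 2 -> 384
```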

This is nicely timed with Dynamic Memory and new CPU’s allowing for greater numbers of VM’s per host.  Now we can host them in a supported manner.

Book Progress

Work continues on Mastering Hyper-V Deployment.  We’ve hit a few snags along the way.  We had a bit of a project shakeup over the last 2 weeks.  I hit a technical issue (VMM and OpsMgr integration) last week that has delayed one chapter while I engage support from MS.

But work continues.  One chapter might be pushed out but others come forward.  It’s all hands on deck right now as I enter the most intense part of my schedule.  Sleep is at a premium: I have a day job and the book is worked on at night and at the weekends.

This blog is proving to be critical.  One of its purposes is to act as my notebook: I collect bits of information from other sources and I record things that I’ve learned along the way.  As part of my day job, I often refer back to it.  For the book, I find myself coming back here to pull out bits of information.  The book allows me to tie them together, order them, experiment a bit, and expand on the information.

The current chapter I’m working on is a perfect example of this.  I’m pulling in information that is anywhere from 2 weeks to 2 years old.  A lot of it is plain text, discussing architecture and configurations.  The second half of it is pure step-by-step of some very new stuff. 

Another key source of information is the MVP community.  I’ve had help and continue to get help from countless people.  Sometimes it’s stuff I’ve picked up in general conversation.  Sometimes they’ve been kind enough to answer questions.  When the book comes out, you’ll see how big a role MVP’s have played in its writing.  I’ll have to say a BIG “thank you” to a big bunch of people who’ve helped shape it.

In a way, the timing of the book has worked out well.  Sure, we are well into the life of W2008 R2 Hyper-V.  Typically a book tries to be released in the first few months of a product’s life in order to maximise its lifecycle.  We’ll be about 1 year into the life of W2008 R2 when we hit the shelves.  But this book is about enterprise deployments, and a lot of the accompanying products in that sort of deployment have just hit the shelves.  Heck, some of our content isn’t even available in an RTM form yet!