Today I was working with a customer who needed to grow their hosted presence with us due to performance and scaling requirements. OpsMgr ProTips alerts made us aware of certain things that got the customer and us working. A VMM library template machine was quickly deployed to meet the sudden requirements. That got me thinking about how OpsMgr and VMM could be used in a large virtualised (and even physical) application environment to scale out and in as required. All of this is just ideas. I’m sure it’s possible, I just haven’t taken things to this extreme.
Let’s take the above crude example. There are a number of web servers. They’re all set up as dumb appliances with no content. All the content and web configurations are on a pair of fault-tolerant content servers. The web servers are load balanced, maybe using appliances or maybe by reverse proxies. It’s possible to quickly deploy these web servers from VM templates. That’s because the deployed machines all have DHCP addresses and they store no content or website configuration data.
The next tier in the application is typically the application server. This design is also built to be able to scale out or in. There is a transaction queuing server. It receives a job and then dispatches that job to some processing servers. These transaction servers are all pretty dumb. They have an application and know to receive workloads from the queuing server. Again, they’re built from an image and have DHCP addresses.
All VM templates are stored in the VMM library.
All of this is monitored using Operations Manager. Custom management packs have been written and distributed application monitoring is configured. For example, average CPU and memory utilisation is monitored across the web farm. An alert will be triggered if this gets too high. A low water mark is also configured to detect when demand is low.
The web site is monitored using a captured web/user perspective transaction. Response times are monitored and this causes alerts if they exceed pre-agreed thresholds.
The queuing server’s queue is also monitored. It should never exceed a certain level, i.e. the point where there is more work than there are transaction servers to process it. A low water mark is also configured, e.g. the point where there is less work than there are transaction servers to keep busy.
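Here is a rough sketch of that high/low water mark check, just to make the idea concrete. It is plain Python rather than a management pack, and the names and numbers (decide, high_per_worker, low_per_worker, min_workers) are made up for illustration. In a real deployment the thresholds would live in the OpsMgr monitors.

```python
from enum import Enum


class Action(Enum):
    SCALE_OUT = "scale out"
    SCALE_IN = "scale in"
    HOLD = "hold"


def decide(queue_depth: int, workers: int,
           high_per_worker: int = 10, low_per_worker: int = 2,
           min_workers: int = 1) -> Action:
    """Compare the queue depth with the high and low water marks.

    All thresholds are illustrative assumptions, not values from a
    real management pack.
    """
    # High water mark: more queued work than the current servers can absorb.
    if queue_depth > workers * high_per_worker:
        return Action.SCALE_OUT
    # Low water mark: servers are idle, but never drop below the minimum.
    if queue_depth < workers * low_per_worker and workers > min_workers:
        return Action.SCALE_IN
    return Action.HOLD
```

The gap between the two marks matters. If they sit too close together, the farm would flap between adding and removing servers every time the load wobbles.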
So now OpsMgr knows when we have more work than resources, and when we have more resources than we have work for. This means we only need a mechanism to add VMs when required and to remove them when they are no longer needed. And don’t forget those hosts! You’ll need to be able to deploy hosts. I’ll come back to that one later.
Deploying VMs can be automated. We know that we can save a PowerShell job into the library when we create a VM, etc. Do that and you have your VM. You can even use the GUIRunOnce option to append customisation scripts, e.g. naming of servers, installation of updates/software, etc. Now you just need a trigger. We have one.
When OpsMgr fires an alert it is possible to associate a recovery task with the alert. For example, the average CPU/Memory across the web farm is too high. Or maybe the response time across the farm is too slow. Simple – the associated response is to run a PowerShell script to deploy a new web server. 10 minutes later and the web server is operational. We already know it’s set to use DHCP so that’s networking sorted. The configuration and the web content are stored off of the web server so that’s that sorted. The load balancing needs to be updated – I’d guess some amendment to the end of the PowerShell script could take care of that.
The same goes for the queuing server. Once the workload exceeds the processing power, a new VM can be deployed within a few minutes and start taking on tasks. They’re just dumb VMs. Again, the script would need to authorise the VM with the queuing process.
That’s the high water mark. We know every business has highs and lows. Do we want to waste Hyper-V host resources on idle VMs? Nope! So when those low water marks are hit we need to remove VMs. That one’s a little more complex. The PowerShell script here will probably need to be aware of the right VM to remove. I’d think about this idea: the deployed VMs would update a file or a database table somewhere. Think of it like a first-in, first-out buffer. The oldest VM should then be the first one removed. Why? Because we Windows admins prefer newly built machines – they tend to be less faulty than ones that have been around a while.
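To show what I mean by that buffer, here is a minimal sketch in Python. It keeps the list in memory, where the real thing would be the file or database table mentioned above. The class and method names (VmRegistry, record_deployment, next_to_remove) and the VM names are all invented for the example.

```python
from collections import deque
from typing import Optional


class VmRegistry:
    """First-in, first-out record of deployed VMs (in-memory stand-in
    for a file or database table)."""

    def __init__(self, min_vms: int = 1) -> None:
        self._vms = deque()      # oldest VM sits at the left end
        self._min_vms = min_vms  # never shrink the farm below this size

    def record_deployment(self, name: str) -> None:
        # Called at the end of the deployment script.
        self._vms.append(name)

    def next_to_remove(self) -> Optional[str]:
        # Hand back the oldest VM, or None if the farm is at its minimum.
        if len(self._vms) <= self._min_vms:
            return None
        return self._vms.popleft()


registry = VmRegistry(min_vms=1)
for vm in ("WEB01", "WEB02", "WEB03"):
    registry.record_deployment(vm)
print(registry.next_to_remove())  # WEB01, the oldest
```

The removal script would ask the registry which VM to retire, take it out of the load balancer or the queuing server's list, and only then delete it.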
With all that in place you can deploy VMs to meet demand and remove VMs when they are redundant, freeing up physical resources for other applications.
What about when you run out of Hyper-V server resources? The most basic thing you need to do here is know that you need to buy hardware. Few of us have it sitting around and we run on budgets and on JIT (just in time) principles. Again, you’d need to do some clever management pack authoring (way beyond me to be honest) to detect how full your Hyper-V cluster was. When you get to a trigger point, e.g. starting to work on your second-to-last host, you get an alert. The resolution is to buy a server and rack it. You can then use whatever build mechanism you want to deploy the host. The next bit might be an option if you do have servers sitting around and can trigger it using Wake-On-LAN.
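The trigger point itself is simple enough to sketch. This is Python again, not a management pack. It assumes you can already get a count of VMs per host from somewhere, and it treats a host with no VMs as spare. The function name, parameter names and host names are hypothetical.

```python
from typing import Dict


def needs_more_hardware(vms_per_host: Dict[str, int],
                        spare_hosts_wanted: int = 1) -> bool:
    """Return True once the cluster is down to its last spare host(s).

    A host running zero VMs counts as spare. That is a simplifying
    assumption; a real check would look at memory and CPU headroom.
    """
    empty_hosts = sum(1 for count in vms_per_host.values() if count == 0)
    return empty_hosts <= spare_hosts_wanted


# Two hosts still empty: no alert yet.
print(needs_more_hardware({"HV01": 8, "HV02": 6, "HV03": 0, "HV04": 0}))
# The second-to-last host has started taking VMs: time to buy a server.
print(needs_more_hardware({"HV01": 8, "HV02": 6, "HV03": 1, "HV04": 0}))
```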
ConfigMgr will run a job to deploy an operating system to the idle server. It’s just a plain Windows Server installation image. Thanks to task sequences and some basic Server Manager PowerShell cmdlets, you can install the Hyper-V role and the Failover Clustering feature after the image deployment. A few reboots happen. You can then add it to the Hyper-V cluster. You can approach this one from other angles, e.g. add the host into VMM which triggers a Hyper-V installation.
Now that is optimisation and dynamic IT! All that’s left is for the robots to rise – there’s barely a human to be seen in the process once it’s all implemented. I guess your role would be to work on the next generation of betas and release candidates so you can upgrade all of this when the time comes.
I’ve not read much about Opalis (recently acquired by Microsoft) but I reckon it could play a big role in this sort of deployment. Microsoft customers who are using Server Management Suite CALs (SMSE/SMSD) will be able to use Opalis. Integration packs for the other System Center products are on the way in Q3.