Counting SALs in a Multi-Tenant Hosting Environment

Right now, 99% of people are going “Don’t you mean CALs?”.  Nope.  In the hosting world licensing via SPLA is very different – it makes every other kind of licensing from every vendor look easy.  I reckon SPLA is managed in MSFT by these guys.  SALs are a way of licensing some products in hosting on a per user basis.  Some products have a choice of per-proc or per-user, and some, like RDS, only support the per-user SAL.

I worked in hosting in the past for several years but never had to count SALs … they weren’t popular because:

  • They’re near impossible to count in a multi-tenant environment: firewalled, no-trust, maybe no domains, and a significant chance that the hoster has no access to the VMs.
  • You can’t rely on the customer to count the users correctly … hell, they’ll probably ask why they should be counting anything or claim that they don’t know what a user account is.

I was asked if I had a solution to this earlier this week.  I didn’t at the time, but a conversation on Windows Weekly got me thinking.

Step 1: Use Hyper-V

Of course!  There is a reason as you will see.

Step 2: Contractually Require VM Access

Anyone signing a contract that uses SALs must allow the hoster local admin rights in the VMs.  That’s probably there by default, but the tenant might delete those rights.  Maybe you have an Orchestrator task to verify rights.  If that fails, it sends emails to the tenant and puts the VMs into a saved state after X days.  Naughty!

Step 3: Audit via Scheduled PowerShell Script

Each VM will have a scheduled task to scan for user accounts, usage, and numbers.  Maybe this is a script that is scheduled via the VM template.  That’s a bit static and very messy to change – Forget ConfigMgr in this environment.  Maybe you use your local admin rights via Orchestrator to do this scan?

I wonder if you can kick off runbooks with dynamically queried VM names (pulled by querying a customer’s currently deployed VMs)?
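Here is a rough sketch of what the in-guest scan could look like.  It assumes the SAL count can be approximated by the number of enabled local accounts; whether that matches the SPLA definition of a user is a licensing question that this script does not answer.  The function name Get-SalUserSummary is made up for illustration, and it expects objects shaped like Win32_UserAccount instances (Name, SID, Disabled).

```powershell
# Hypothetical helper: summarises the account objects piped into it.
# Input is assumed to look like Win32_UserAccount (Name, SID, Disabled).
function Get-SalUserSummary {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Account
    )
    begin {
        $enabled = @()
        $disabledCount = 0
    }
    process {
        # Skip the built-in Administrator (RID 500) and Guest (RID 501) accounts.
        if ($Account.SID -match '-50[01]$') { return }
        if ($Account.Disabled) { $disabledCount++ } else { $enabled += $Account.Name }
    }
    end {
        New-Object PSObject -Property @{
            ComputerName  = [Environment]::MachineName
            EnabledCount  = $enabled.Count
            DisabledCount = $disabledCount
            EnabledUsers  = ($enabled | Sort-Object) -join ','
            ScannedOn     = (Get-Date).ToString('s')
        }
    }
}

# Usage inside a VM (local accounts only):
#   Get-WmiObject -Class Win32_UserAccount -Filter "LocalAccount=True" | Get-SalUserSummary
```

Keeping the counting logic separate from the WMI query means the same function could be fed from a scheduled task in the template or from a remote query run with the hoster’s local admin rights.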

Step 4: Get the Results

Honestly, I don’t know Orchestrator beyond the marketing – so maybe it can pull out results from the VM.  Maybe not.  If not, or if you do use an internally scheduled script, you could write the results to pre-defined keys in the VM’s registry.  You could then query those results from the host via KVP, using the integration components.
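The registry-plus-KVP idea could be sketched like this.  It relies on the Data Exchange integration service surfacing string values under HKLM\SOFTWARE\Microsoft\Virtual Machine\Guest to the host as guest exchange items.  The helper names, the value name SalUserCount, and the VM name in the usage comment are all made up for illustration, and I have not run this against a production cluster.

```powershell
# Guest registry key that the Hyper-V Data Exchange (KVP) service exposes to the host.
$GuestKvpKey = 'HKLM:\SOFTWARE\Microsoft\Virtual Machine\Guest'

# Runs INSIDE the VM (hypothetical helper): publish the count as a REG_SZ value.
function Publish-SalCount {
    param([int]$Count, [string]$Name = 'SalUserCount')
    New-ItemProperty -Path $GuestKvpKey -Name $Name -Value "$Count" `
        -PropertyType String -Force | Out-Null
}

# Runs ON THE HOST (hypothetical helper): convert one KVP item (an XML string)
# into an object with Name and Data properties.
function ConvertFrom-KvpItem {
    param([string]$ItemXml)
    $xml = [xml]$ItemXml
    New-Object PSObject -Property @{
        Name = $xml.SelectSingleNode("/INSTANCE/PROPERTY[@NAME='Name']/VALUE").InnerText
        Data = $xml.SelectSingleNode("/INSTANCE/PROPERTY[@NAME='Data']/VALUE").InnerText
    }
}

# Host-side usage on WS2012, assuming a VM called 'Tenant1-VM01' (made-up name):
#   $vm  = Get-WmiObject -Namespace root\virtualization\v2 -Class Msvm_ComputerSystem `
#              -Filter "ElementName='Tenant1-VM01'"
#   $kvp = $vm.GetRelated('Msvm_KvpExchangeComponent')
#   $kvp.GuestExchangeItems | ForEach-Object { ConvertFrom-KvpItem $_ } |
#       Where-Object { $_.Name -eq 'SalUserCount' }
```

Bear in mind that anything written inside the VM can be edited by the tenant, so a value read this way is only as trustworthy as the contract behind it.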

Anyway … that’s one way.  I’m sure there are lots of others.

Forcefully Removing a VM From VMM 2012

I was on a customer site and was asked to help remove a virtual machine from VMM that had failed to V2V correctly from vSphere.  VMM locks the VM in a V2V state and won’t let you repair/undo/anything to the VM.  Not very helpful!

Any attempt to delete the VM was met with the following failure in Jobs:

Error (2604)
Database operation failed.

Recommended Action
Ensure that the SQL Server is running and configured correctly, and try the operation again.

First, I verified that I was targeting the correct VM … don’t want to accidentally delete the source VM from vSphere before a correct V2V:

Get-SCVirtualMachine | where { $_.Name -EQ "Bad-V2V-VM"} | fl name, status

Name   : Bad-V2V-VM
Status : V2VCreationFailed

Next up I took the above code, replaced fl (the alias for Format-List) with Remove-SCVirtualMachine, and forced the removal to complete:

Get-SCVirtualMachine | where { $_.Name -EQ "Bad-V2V-VM"} | Remove-SCVirtualMachine -Force
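If you want the verification and the removal in one guarded step, something like the following sketch could work.  Remove-FailedV2VVM is a made-up wrapper, not a VMM cmdlet; it only uses Get-SCVirtualMachine and Remove-SCVirtualMachine -Force as shown above, and it assumes the Status value renders as the string V2VCreationFailed.

```powershell
# Hypothetical wrapper: remove a VM from VMM only if exactly one VM matches
# the name and it is stuck in the failed V2V state.
function Remove-FailedV2VVM {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name
    )

    # Force an array so .Count is reliable for zero, one, or many matches.
    $vms = @(Get-SCVirtualMachine | Where-Object { $_.Name -eq $Name })

    if ($vms.Count -ne 1) {
        throw "Expected exactly one VM named '$Name' but found $($vms.Count)."
    }
    if ("$($vms[0].Status)" -ne 'V2VCreationFailed') {
        throw "VM '$Name' has status '$($vms[0].Status)'; refusing to remove it."
    }

    $vms[0] | Remove-SCVirtualMachine -Force
}

# Usage:
#   Remove-FailedV2VVM -Name "Bad-V2V-VM"
```

The point of the two checks is to make it hard to delete a healthy VM that happens to share the name.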

HTC One After One Week

It’s now a week after I bought a HTC One to replace my iPhone 4.  I can summarize the experience: I’m getting my photos off of my iPhone 4 now so I can wipe it to sell it on.

What I Like

It didn’t take me too long to figure out Android.  My only experiences with Android had been in airport electronics stores and that never went well … come to think of it … no “hands on demo” seems to work well outside of a Microsoft or an Apple store.

All the apps I wanted are there plus more.  As you might know, podcasts are important to me, and I was sorted in minutes, managing my subscriptions with no arbitrary regional, fictitious licensing, or technical issues. 

I connected up to Office 365 and all my mail, calendar, and contact items were on the phone in minutes.  I switched from the default browser to Chrome and was good.  I’ve not synced that with my PCs yet – where I’ve been switching to Chrome from the crash-happy IE10. 

Live tiles?  I have a page with some widgets pinned for Audible, my alarms, sleep mode, airplane mode, and the podcast player.  Anything else, an old fashioned icon does just nicely.

32 GB of storage is nice to have.  No worrying about filling up 16 GB … my iPhone 4 is a 16 GB model.  I’d hate to be buying that sort of phone now … partly why I did rule out the otherwise nice looking Nokia Lumia 925, and absolutely why I ruled out the S4 (Google variant won’t be released outside the USA due to h/w & network requirements).

The handset is good.  Reception is good, it works perfectly well with my Parrot bluetooth, and although it’s around 4.8” (no comic book store guy-style corrections, thank you) in size, it’s a slim phone and it fits nicely in my pocket.  The only issue was that it was hard to distinguish from my HTC 8x WP8 handset for work.

What I Don’t Like

The battery life could be better.  It does drain pretty damned quickly when being used.  It also doesn’t correctly predict the Euro Millions lottery numbers.  I should hate the phone for that, really.

So far, so good.

I’m Blogging On 3 Sites Now

OK, so obviously this site is my main destination for content.  I am blogging additional and different stuff for the Petri IT Knowledgebase, giving me the chance to do some KB style things while I’ve covered more architectural and concept stuff here more recently.  And now …

I’m also blogging on the MicroWarehouse (Irish distributor and Microsoft Value Added Distributor) site.  My first article has gone live: An Office 365 Opportunity – Public Website For An SME.  Content on that site will be for Microsoft partners, etc, and will more than likely be boring (more than usual for me) for the normal techie.

*Exhausted*

VMM 2012 R2 Release Notes

Microsoft has published the release notes for System Center 2012 R2 – Virtual Machine Manager (VMM).  There are some important notes there, but I thought I’d highlight a few that stick out:

  • For file server clusters that were not created by VMM, deploying the VMM agent to file server nodes will enable Multipath I/O (MPIO) and claim devices. This operation will cause the server to restart. Deploying the VMM agent to all nodes in a Scale-out File Server cluster will cause all nodes to restart.
  • Generation 2 virtual machines are not supported
  • If System Center 2012 R2 VMM is installed on Windows Server 2012, you cannot manage Spaces storage devices that are attached to a Scale-out File server. Spaces storage requires an updated SMAPI that is included with Windows Server 2012 R2 release version.
  • The Physical-to-Virtual (P2V) feature will be removed from the System Center 2012 R2 release.
  • Windows Server supports storage tiering with Storage Spaces. However, VMM does not manage tiering policy.
  • Windows Server supports specifying write-back cache amount with Storage Spaces. However, VMM does not manage this.
  • Performing a Hyper-V Replica failover followed by a cluster migration causes the VMRefresher service to update the wrong virtual machine, putting the virtual machines into an inconsistent state.
  • VMM does not provide centralized management of World Wide Name (WWN) pools.
  • Failing over and migrating a replicated virtual machine on a cluster node might result in an unstable configuration

And there’s more.  Some have workarounds (see the original article).  Some do not, e.g. removal of P2V from VMM 2012 R2 or lack of support for G2 VMs.  In those cases:

  • Use 3rd party tools or DISK2VHD (no DISK2VHDX tool) for P2V
  • Continue to use G1 VMs if using VMM.  Remember that there is no conversion between G1 and G2

Still No News On Windows Server & System Center 2012 R2 Licensing

Microsoft has since released the information and you can find my summary on the core editions here.

As you may know, customers with Windows 8 will be getting a free update to Windows 8.1 via the Windows Store.  That’s a nice deal.  There is no news on what will happen with Windows Server 2012 R2 or System Center 2012 R2 (WSSC 2012 R2).  Note that WS2012 news came out very late in the day.  I heard nothing was presented at WPC on the topic.  So we don’t know:

  • Will WS2012 R2 be a new server OS license, where upgrade rights are only granted to Software Assurance (SA) customers?
  • Will WS2012 CALs be good for WS2012 R2, as I am told W2008 CALs were for W2008 R2?
  • Will WS2012 customers get a free upgrade?
  • Anything about the licensing of WSSC 2012 R2.

Microsoft has to be careful here.  There has been a certain amount of customer/partner alienation over the past year or two.  If they said “no upgrade rights” then Microsoft would piss off a lot of customers.  It would also fragment the install base significantly.  This isn’t just an accounting issue; how many of you working on current OSs feel as comfortable with an OS that’s 2-3 generations old?  I know I lose touch very quickly.

On another point, their field staff are probably fretting right now.  Pretty much everyone in the Microsoft field has a sales target of some kind.  How many customers have read about the wonders of WSSC 2012 R2, and are deciding to hold off on buying for 6 months because a purchase now might not give them upgrade rights?  While that doesn’t impact the EA customer (who has upgrade rights through SA), not all large enterprises have SA, and a huge number of businesses are too small to qualify for SA.  All sales are measured!

Microsoft could score big by providing free upgrade rights to all WS2012 customers.  It would be a PR coup and prospective customers could be talked into buying now (the important thing for MSFT), start deploying the core pieces, and upgrade where/when appropriate.

If I could make a recommendation I would say:

  • Allow customers free upgrades to Server/CAL within a major release boundary.  For example, any customer who buys within the 2012 generation (2012, 2012 R2, 2012 R3 if there is one) should get free upgrades within that generation.  If we are getting annual or near annual releases then it makes no financial difference to MSFT, and they’re still getting the same amount of revenue for the same amount of development (note that pre- and release costs have probably increased by a factor of 2 or 3).
  • Do the same for CALs.  A 2012 CAL should be good for 2012 R2 and 2012 R3 – likewise a 2012 R2 CAL should be good for 2012 R3 (if there will be one).
  • In summary, treat R2, R3, etc as if they were service packs, license-wise.  Nice and simple.
  • Keep it simple: Any effort to implement “Dungeons and Dragons” rules (like with Office 365) creates chaos, discontent, ill-will, and accidental software pirates (those who think they are legit but are not – only because of Microsoft-created complexity).

FYI, W2008 R2 customers without SA will have to buy new licenses/CALs, no matter what.  Those with current SA at the time of WSSC 2012 R2 release will be entitled to upgrades, just as they are now to WS2012 and SysCtr 2012.

And before you ask, no, I don’t expect any changes with System Center licensing, and yes, System Center Essentials is a dead-end product.

EDIT:

I was told via Twitter by Ryan Boud (@HmmConfused) that an announcement was made on the Microsoft Virtualization Academy Jump Start regarding CALs.  Microsoft will follow the W2008 R2 precedent and enable WS2012 CALs to be used with WS2012 R2.

KB2870270 – A Hotfix Bundle For WS2012 Hyper-V Failover Clusters

Although not referred to as an update rollup, this latest hotfix is a bundle of fixes.  As before, don’t rush out to deploy it unless it looks like it’s going to fix a problem you are having.  Otherwise, wait a few weeks, test if you can, check the news, and then deploy it to prevent those problems.

This bundle is an update that improves cloud service provider resiliency in Windows Server 2012.  That title/description sounds like someone didn’t know how to describe it and fell back to marketing jargon.  Please don’t let the title confuse you – this bundle contains important fixes for all.  This new KB2870270 replaces the recent KB2848344 (Update that improves cloud service provider resiliency in Windows Server 2012).

The bundle contains:

  • KB2796995: Offloaded Data Transfers fail on a computer that is running Windows 8 or Windows Server 2012
  • KB2799728: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2801054: VSS_E_SNAPSHOT_SET_IN_PROGRESS error when you try to back up a virtual machine in Windows Server 2012
  • KB2813630: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2848727: "Move-SmbWitnessClient" PowerShell command fails in Windows Server 2012

KB2869923 – VM Crash Caused By Physical Disk Resource Move During WS2012 CSV Backup

An “interesting” week for Hyper-V/clustering hotfixes, and they didn’t stop.  Some more came out yesterday.  Test (if you can), wait a few weeks, and then deploy.  This one is for when a Physical Disk resource move during the backup of a Cluster Shared Volume (CSV) may cause resource outage.

Symptoms

Consider the following scenario:

  • You configure a Windows Server 2012-based Hyper-V failover cluster.
  • The VHD or VHDX files reside on a Cluster Shared Volume (CSV).
  • Backups of the CSV are performed using software snapshots.
  • Physical Disk resource for the CSV is moved to another node in the cluster.

In this scenario, the Physical Disk resource may fail to come online if the backup of the CSV is in progress. As a result, virtual machines that rely on the CSV may crash.

Cause

During a move of the Physical Disk resource, when the Physical Disk resource comes online on the new node it queries Volume Snapshot Service (VSS) to discover the software snapshots associated with that volume. If the move takes place while software snapshot is in progress, VSS may fail to respond or have a long delay to respond. Ultimately, this may cause the Physical Disk resource to either fail to come online or take a long time to come online on the new node. As a result, VMs that have VHD files on the CSV may crash.

A supported hotfix is available from Microsoft.

KB2855336 – July Update Rollup 2013 For WS2012 Has Been Updated With A Fix (KB2866029)

I just saw that KB2855336 (the update rollup) was updated with additional text:

If you have a Windows Server 2012-based Hyper-V cluster that uses NIC Teaming with virtual LAN (VLAN), and you install this update rollup before July 12, 2013, you may receive an "0x000000D1" Stop error when you perform a live migration of a virtual machine. For more information about this issue, click the following article number to view the article in the Microsoft Knowledge Base: KB2866029 "0x000000D1" Stop error when you perform a live migration of a virtual machine on a Windows Server 2012-based cluster.

Over on KB2866029 we can see:

Symptoms

Consider the following scenario:

  • You have a Windows Server 2012-based Hyper-V cluster that uses NIC Teaming with virtual LAN (VLAN). 
  • You perform a live migration of a virtual machine.
  • You perform a live migration of the virtual machine back to the original host.

In this scenario, the host computer crashes. Additionally, you receive the following Stop error message:

STOP: 0x000000D1 (parameter1, parameter2, parameter3, parameter4)

Notes

  • This issue occurs after you install update rollup KB2855336.
  • This Stop error describes an IRQL_NOT_LESS_OR_EQUAL issue.

The fix is to install the reoffered update rollup KB2855336, available through Windows Update.

My advice:

  • If your hosts are broken now because you installed the July update rollup then install the reoffered update rollup.
  • If you have not deployed the July update rollup then deny it, give it a couple of weeks, and then plan an install if there are no more reported issues.
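For a quick triage of a host, a sketch like this could tell you which camp you are in.  It assumes that the rollup shows up in Get-HotFix with a usable InstalledOn date, and that installing the reoffered rollup refreshes that date; I have not confirmed either, so treat the result as a hint and not proof.  Get-JulyRollupState is a made-up name, and the cut-off date comes from the KB text quoted above.

```powershell
# Cut-off taken from the KB2855336 text: installs before this date are suspect.
$Cutoff = [datetime]'2013-07-12'

# Hypothetical helper: reports whether KB2855336 is installed on the local
# machine and, if so, whether it was installed before the cut-off.
function Get-JulyRollupState {
    $fix = Get-HotFix -Id 'KB2855336' -ErrorAction SilentlyContinue

    if (-not $fix) { return 'NotInstalled' }
    # InstalledOn can be empty on some systems, so report that separately.
    if (-not $fix.InstalledOn) { return 'InstalledDateUnknown' }
    if ($fix.InstalledOn -lt $Cutoff) { return 'InstalledBeforeFix' }
    return 'InstalledAfterFix'
}

# Usage on a Hyper-V host:
#   Get-JulyRollupState
```

A result of InstalledBeforeFix on a host that uses NIC teaming with VLANs is the case that the KB warns about.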

This is a mess.

Something Has Gone Very Wrong With Microsoft Patch Testing

I suspect I’m going to get some “unwanted attention” for this post but what I’m going to say has to be said publicly …

Something has gone wrong with the testing process for Microsoft hotfixes since the release of WS2012.  There have been a number of really bad releases in those 10 or so months.  The latest is KB2855336, aka the July 2013 update rollup, which causes hosts to bug check as Hans Vredevoort and some of you reported.  There is also a thread on the Hyper-V TechNet forum.

People like me and Microsoft are trying to encourage people to:

  • Install security hotfixes with minimal delay
  • Embrace a process of updating their Hyper-V hosts with fixes for Hyper-V and Failover Clustering to prevent issues

This string of updates that break hosts (and this is exponentially worse than breaking an occasional physical server here or there) is embarrassing and dangerous.  The latest failure is in an update rollup that is issued via Windows Update.  This just feeds the argument that patching is ba-ad and shouldn’t be done … and creates a security mess, not just for those companies, but for everyone in the community.

I’d love to say I have a fix.  I’d love to say, hey use the automatic approval process in System Center Configuration Manager where we can:

  • Delay approval of updates for X days – letting others find the bugs and Microsoft issue a superseding update
  • Force the deployment

That will work for non-clustered hosts but:

  • Folks with clusters will want to use Cluster Aware Updating.  ConfigMgr does not have a plug-in for CAU integration.  Someone in MSFT will respond with VMM baselines.  Tell ‘em to go take a long walk off of a short pier; no one should have to do that amount of clicking every month.
  • Most businesses are SMEs and SMEs cannot afford System Center anymore.

So what’s left?  Manual approval and patching.  And as I’ve said before: that means patching just does not happen … at all.  I’m not being cynical; I’m being pragmatic and basing this on experience in the real world.

Let me tell you a story …

I used to work for a consulting company that specialised in Computer Associates software.  I was certified and consulted in CA Unicenter, their huge enterprise monitoring system.  I also dabbled in a few x-IT management products.  CA were shite when it came to product quality and patch management.  The process for installing a new product version was:

  • Install from the media
  • Test
  • Find the broken basic functionality
  • Log into Support and download lots of patches and install them for this new release
  • Find the broken functionality that had been patched/fixed 2 months before in the previous release
  • Open up support calls to get them to update the previous release’s fixes for the new release
  • Try cover your ass with the angry customer

I once had a CA tester over in the office to introduce us to a new beta version of Unicenter.  I asked about the huge number of patches that would appear within a week of release because basic features didn’t work.  He explained that CA couldn’t possibly test more than 75% of features before release.  That’s why I’ve flat-out refused to work with CA software since 2001.

Let’s get back on track here.  The problem with KB2855336 is that it breaks hosts that:

  • Connect a virtual switch to a NIC team
  • Have VMs on different VLANs

Hmm, seems like one of the most basic configurations for Hyper-V if you ask me.  How the hell was this not tested?

This litany of mistakes cannot continue.  We (the community) cannot continue to recommend fixes if they break stuff in basic or default configurations.  Microsoft, you want to be a cloud company; learn from how hosting companies have been very public with explanations and apologies.  This actually reassures the customers of those hosting companies – I once worked for a company that blacked out 1/3 of the hosted Irish internet for over a day, and that openness saved the day.  Needless to say, I was amazed.  Something must change, Microsoft, and you must be very public with the apology and the explanation of the process changes – and don’t just hide this in a forum response.

In the meantime, Microsoft should:

  • Remove this update rollup from the catalog to prevent further failures.  Hans reported just now (09:43 GMT) that this was done.
  • Instruct employees to modify blog posts and retract recommendations to deploy this update rollup

I hope any now-angry persons in Microsoft understand that I am writing this in support of Microsoft.  A friend is honest with criticism and wants change for improvement.  I’m not writing this to score points.  I’m writing this because I care.

EDIT:

A fix was released (allegedly) in an updated version of the July 2013 update rollup.

EDIT (27/7/2013):

It looks like UR3 for DPM 2012 SP1 joins the ranks of bad updates.  Almost immediately people reported that they could not upgrade their agents after upgrading DPM servers.  The update was withdrawn several days later, as noted in these comments.