Some More Cross Platform Updates For OpsMgr 2007 R2

Microsoft released two more updates for Operations Manager 2007 R2 cross platform extensions.

First is the management pack to take advantage of cross platform Audit Collection Services (ACS).

Second is a new management pack for monitoring cross platform agents, i.e. UNIX and Linux.

Audit Collection Services Adds Cross-Platform Support

I first heard about Audit Collection Services (ACS) at TechEd in 2004.  It was going to be a free download like WSUS.  The idea is that it would be an intelligent alternative to SYSLOG for Microsoft platforms/applications, gathering security logs into a central database.  Instead of gathering everything, it would gather the important alerts/events only.

Time went by and no beta appeared.  Then ACS appeared as a feature in System Center Operations Manager 2007.  OpsMgr 2007 evolved into OpsMgr 2007 R2, which added cross platform support, i.e. Microsoft-written native agents and management packs for Linux and UNIX.

Microsoft has now added an extension to this cross platform support to offer ACS to Linux and UNIX:

“System Center Operations Manager 2007 R2 Cross Platform Audit Collection Services enables the collection and audit of events from UNIX and Linux Servers. Using Cross Platform ACS, events are collected from the desired Unix/Linux servers and stored in the Audit Collections Services Database. Audit reports for UNIX/Linux Server collected events are included.

Feature Summary

Collection of Audit events from UNIX/Linux server, including:

  • AIX 5.3 (Power), 6.1 (Power)
  • HP-UX 11iv2 (IA64/PA-RISC), 11iv3 (IA64/PA-RISC)
  • Red Hat Enterprise Server 4 (x86/x64), 5 (x86/x64)
  • Solaris 8 (SPARC), 9 (SPARC), 10 (SPARC/x86)
  • SUSE Linux Enterprise Server 9 (x86), 10 (x86/x64), 11 (x86/x64)

Built in Audit Reports including:

  • Access violations – unsuccessful logon attempts
  • Account creation/deletion/password change
  • Administrator activity – su, sudo
  • Forensic – all events for a computer/event ID
  • User logons”

VMM Reporting

System Center Virtual Machine Manager can be integrated into System Center Operations Manager.  Using the information gathered by OpsMgr agents on hosts, virtual and physical machines you can gather information that is relevant to VMM:

You can see the reports above that are available when you are using Virtual Machine Manager 2008 R2.

Host utilization is a report you will run to see what the current resource usage is on that host.  Host utilization growth is similar.  What you will do with that report is specify two time frames.  The utilisation of the host in the second time frame will be compared with that of the first.
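To make the two time frame comparison concrete, here is a minimal sketch of the idea.  It is not how VMM or OpsMgr calculates the report; the function name and the sample figures are made up for illustration.  It just averages the samples in each time frame and expresses the change as a percentage of the first.

```python
from statistics import mean


def utilisation_growth(first_period, second_period):
    """Return the percentage change in average utilisation between two periods."""
    first_avg = mean(first_period)
    second_avg = mean(second_period)
    if first_avg == 0:
        # Growth relative to zero is undefined, so fail loudly.
        raise ValueError("first period average is zero; growth is undefined")
    return (second_avg - first_avg) / first_avg * 100.0


# Hypothetical CPU utilisation samples (percent) for one host.
october = [40.0, 42.0, 38.0, 40.0]
november = [50.0, 52.0, 48.0, 50.0]

growth = utilisation_growth(october, november)
print(f"Host CPU utilisation growth: {growth:.1f}%")
```

A positive number means the host was busier in the second time frame than in the first.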

Virtual Machine Allocation is a summary report of the total resources used by virtual machines on your managed hosts.  Virtual Machine utilisation gives you more detail.

You can see in this screenshot the utilisation of resources by specific virtual machines.  Note that I have blacked out the VM names and the host names.  Some of the VMs also do not have OpsMgr agents and therefore are not producing performance stats that can be used in this report.

Finally you have the report that’s going to be popular with most virtualisation implementations.  If I were doing a traditional internal deployment of Hyper-V I would first deploy OpsMgr 2007 R2 and its agents to gather Windows Server performance information.  Next I would deploy VMM 2008 R2.  I would let them stew on information for a week before sizing the hosts.  Then I would run the last of the reports: Virtualization Candidates.

With this report you specify a time frame and a set of criteria.  I’ve blacked out the names of the physical machines in this report.  You’ll use these performance criteria to dictate what is acceptable for a virtual machine candidate:

  • Number of processors (Hyper-V supports a max of 4 virtual CPUs in a VM)
  • Processor speed
  • Maximum CPU usage
  • Average CPU usage
  • Total RAM
  • Average RAM usage

From this report you can ID your P2V candidates and then use VMM to convert those physical machines to virtual machines.
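The filtering that the report does can be sketched roughly like this.  This is only an illustration of the idea, not the report's actual logic.  Apart from the limit of 4 processors mentioned above, the thresholds, the machine names and the figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    processors: int
    cpu_mhz: int
    max_cpu_pct: float
    avg_cpu_pct: float
    total_ram_mb: int
    avg_ram_mb: float


@dataclass
class Criteria:
    max_processors: int = 4          # Hyper-V limit of 4 virtual CPUs per VM
    max_cpu_mhz: int = 3000          # hypothetical threshold
    max_peak_cpu_pct: float = 80.0   # hypothetical threshold
    max_avg_cpu_pct: float = 50.0    # hypothetical threshold
    max_total_ram_mb: int = 8192     # hypothetical threshold
    max_avg_ram_mb: float = 4096.0   # hypothetical threshold


def is_candidate(machine, criteria):
    """A machine qualifies only if it is within every one of the criteria."""
    return (
        machine.processors <= criteria.max_processors
        and machine.cpu_mhz <= criteria.max_cpu_mhz
        and machine.max_cpu_pct <= criteria.max_peak_cpu_pct
        and machine.avg_cpu_pct <= criteria.max_avg_cpu_pct
        and machine.total_ram_mb <= criteria.max_total_ram_mb
        and machine.avg_ram_mb <= criteria.max_avg_ram_mb
    )


def find_candidates(machines, criteria):
    """Return the names of the machines that look suitable for P2V."""
    return [m.name for m in machines if is_candidate(m, criteria)]


# Hypothetical performance data for three physical servers.
servers = [
    Machine("web01", 2, 2400, 60.0, 20.0, 4096, 2000.0),
    Machine("sql01", 8, 2800, 70.0, 35.0, 8192, 3500.0),   # too many processors
    Machine("app01", 4, 2800, 95.0, 40.0, 8192, 3000.0),   # CPU peaks too high
]

candidates = find_candidates(servers, Criteria())
print(candidates)
```

Tightening or loosening any one threshold changes the list, which is why it pays to let the agents gather data for a while before you run the report.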

PRO Tips In Action

We have a VM where the load has been slowly growing over time.  Its peak season is right around now and we started getting alerts from Operations Manager on Friday.  The contents of the alert were:

Alert Monitor:  PRO CPU Utilization

Alert Description
Source:  MTGWSVR001  CPU utilization in the virtual machine has reached critical levels. The threshold monitor for this virtual machine has detected that the average of %Processor Time has been exceeded.

Summary
This monitor tracks the average CPU utilization for the virtual machine. The average Processor Time has exceeded the threshold. (The default threshold is 90 percent.)

Causes
The virtual machine is consuming too many CPU resources for its configuration.

Resolutions
Update the virtual machine configuration to allocate additional virtual CPU resources. For information about configuring the CPU requirements for a virtual machine, see Virtual Machine Manager 2008 R2 Help.

The monitor in question is the interesting bit.  We have Virtual Machine Manager (2008 or later) running and it is integrated with Operations Manager (2007 SP1 or later).  We have a Windows Server 2008 R2 Hyper-V cluster which is being managed by VMM.  PRO (Performance and Resource Optimization) tips is enabled on the master host group (the top level host group, containing child host groups).  This allows OpsMgr to feed virtualisation performance alerts to VMM and VMM will act on them.

When the VM started getting increased resource demands it needed to use more CPU.  Eventually it got to the point where the CPU was being maxed out.  The PRO tips monitor in question runs every 60 seconds.  It measures the CPU utilisation of the VM.  If 3 sequential samples are greater than 90% CPU utilisation the monitor will create an alert.  That alert will auto resolve when things quieten down – it is a monitor which is a state engine, i.e. aware of good and bad scenarios unlike a basic rule.
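The behaviour described above can be sketched in a few lines.  This is not the management pack's implementation, just an illustration of a monitor as a state engine.  The class name is made up, and resetting to healthy on the first sample at or below the threshold is my assumption about how the auto resolve works.

```python
class ConsecutiveThresholdMonitor:
    """Goes critical after N consecutive samples above a threshold."""

    def __init__(self, threshold=90.0, required_samples=3):
        self.threshold = threshold
        self.required_samples = required_samples
        self.breaches = 0
        self.state = "healthy"

    def add_sample(self, cpu_percent):
        """Record one sample (taken every 60 seconds) and return the state."""
        if cpu_percent > self.threshold:
            self.breaches += 1
            if self.breaches >= self.required_samples:
                self.state = "critical"   # this is where the alert is raised
        else:
            # Assumption: one quiet sample is enough to auto resolve the alert.
            self.breaches = 0
            self.state = "healthy"
        return self.state


# Hypothetical CPU samples: two short spikes, then a sustained one.
monitor = ConsecutiveThresholdMonitor()
samples = [95, 97, 85, 92, 96, 99, 70]
states = [monitor.add_sample(s) for s in samples]
print(states)
```

A basic rule would fire on every sample over 90%.  The monitor only changes state on the third sequential breach, and it returns to healthy by itself when the load drops.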

Because PRO tips was enabled VMM was able to move the VM from its current host to another host.  That move was done using Live Migration so there was no downtime associated with the move of the VM.  This means that other VMs on the original host weren’t being deprived of resources.  Moving the VM to another, less utilised host, gave it more CPU resources that it could use.  Which host was best?  That was decided by VMM using Intelligent Placement, which I blogged about last week.

What I’ve just described was dynamic IT.  A problem was automatically detected and resolved using two System Center products working closely together.  I was alerted to the issue.  I didn’t need to do anything right there and then because the alert auto resolved immediately after the PRO tips live migrated the VM.  I talked to the customer of the VM and found out that this is peak season for them and CPU demands would be high.  We scheduled a maintenance window for early this morning.  The VM was powered down, an extra virtual CPU was added and the VM was powered back up again.  Less than 5 minutes and now the VM has all the CPU it needs.

Looking Into Other Ways To Automate Maintenance Mode

I’m going to be looking at alternative ways to put computers and other monitored resources (e.g. Web and port monitors) into maintenance mode in Operations Manager 2007 R2 this week.  We pushed out patches this weekend.  We warned customers that they might get one or two nuisance alerts.  Sure, each of them just got a couple of alerts but we got a LOT because we get all of them.  I’ve tried a few batch script and task scheduler approaches and each of them has sucked.

I’m going to have to do this in PowerShell I think.  I’ll see how this week goes.  Any non-customer engineering is frozen until the new year.  I don’t want to make changes that may cause unwanted faults over the holidays.  That gives me some time to do some work; I hope!  Pre-sales is still busy and I’m even going out on-site with some hosting customers to do some work with them.

Post a comment to let me know how you get around scheduling maintenance mode in OpsMgr.

That Unmerged Snapshot Did More Than I Expected

Last month I blogged about how a Hyper-V snapshot had caused some difficulties.  I hadn’t realised how much effect that unmerged snapshot had.

We run OpsMgr and use it not only for fault monitoring but also for performance monitoring.  I noticed that sometime after we upgraded to OpsMgr 2007 R2, two of our agents stopped gathering performance stats.  I couldn’t see live performance information in the OpsMgr console nor in the reports (from before a certain date).  PerfMon on the servers worked perfectly.

I repaired the agents and then re-installed them by hand.  Reboots were done.  The agents still refused to gather performance statistics. This was probably back in August/September.

I opened a PSS call under our support program to get some help when I ran out of ideas.  The problem made no sense to the PSS engineers because fault monitoring was working fine.  The machines in question were healthy.  I gathered countless logs and did countless tests.  The call ended up getting escalated not just once, but twice.  A few weeks ago I did some SQL queries on behalf of a PSS engineer.  We could see that performance data stopped being stored in the OpsMgr reporting database some time after the upgrade.
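The kind of check we were doing with those queries can be sketched as follows.  This is not the SQL we ran and it doesn't touch the OpsMgr databases.  It assumes you have already pulled the sample timestamps for one agent into a list; the function name, the 5 minute interval and the dates are made up for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=2.0):
    """Return (last_seen, next_seen) pairs where samples are missing.

    A gap is reported when two neighbouring samples are further apart
    than expected_interval multiplied by tolerance.
    """
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval * tolerance:
            gaps.append((previous, current))
    return gaps


# Hypothetical data: samples every 5 minutes, then nothing for almost 2 hours.
start = datetime(2009, 9, 1, 0, 0)
before = [start + timedelta(minutes=5 * i) for i in range(3)]
after = [start + timedelta(hours=2, minutes=5 * i) for i in range(3)]

gaps = find_gaps(before + after, expected_interval=timedelta(minutes=5))
for last_seen, next_seen in gaps:
    print(f"No samples between {last_seen} and {next_seen}")
```

Running something like this against a working agent and a non-working agent makes it easy to see where the data stops.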

Other agents were fine.  We started focusing on comparing working agents with the 2 non-working agents.  Everything checked out so now we started getting particularly paranoid about things like service packs and regional settings.  I really didn’t like that because we hadn’t had any problems with these machines until maybe a month after we upgraded to OpsMgr 2007 R2.

I was getting ready to give up yesterday afternoon.

I don’t know why I did it, but I went into the OpsMgr console to have a peek at some performance stats for another agent.  One of the non-working agents was still selected from previous tests a while ago.  Wait … I could see a graph for CPU utilisation.  The agent was working.  I checked more stats for disk and memory.  They worked.  I checked the other non-working agent.  It was working.  Huh! 

I fired up the reporting console and did reports on the non working machines for the last year.  I had a complete graph with no data gaps.  That’s strange.  I did a report on when I “knew” that data wasn’t being gathered.  I had complete graphs with correct looking numbers of data samples.

So it appears that data was being gathered but it wasn’t being processed correctly.  Even when I couldn’t see the data in reports, graphs or SQL queries, the data was there somewhere in a pre-processing stage, waiting to be added into the relevant tables.

OK, what had changed in the last month or so since I had tried one of these reports?  We had migrated from Windows Server 2008 Hyper-V to Windows Server 2008 R2 Hyper-V.  Could there be a change in the way that performance data was gathered in a VM?  Definitely not.  Had we any changes at the VM level?  That’s when I remembered the issue in that blog post.

When I moved the OpsMgr VM, Hyper-V had to merge a snapshot that we had deleted some time beforehand.  It had been running with the AVHD (snapshot/checkpoint differential disk) for 4 months.  It started to affect performance of the VM so badly that TCP was having timeouts.  There were performance issues that were virtual storage related.  Could it be that this affected database operations of the VM?  Of course it could, if things had reached the point of messing up TCP.

NOTE: I have only ever used snapshots in production in VM internals upgrade scenarios.  I usually delete the snapshot after success is checked and allow a merge to take place.  That means there should be no impact on performance as long as you do things in a timely manner.  Somehow I must have forgotten to do that this time.

So here’s what I suspect happened.  The OpsMgr agents actually worked perfectly.  They gathered the performance stats the entire time and sent them to OpsMgr.  I am guessing that OpsMgr caches the data for processing.  Due to the unmerged AVHD/snapshot performance issues, the data stopped being processed correctly and sat in that cache.  We know it didn’t make it to the point of being reportable because a direct SQL query showed a data gap.  The problem reared its ugly head around a month after the snapshot was taken.  The AVHD/snapshot was merged back in early November and that resolved the performance issue for this VM.  It also sorted out whatever hitch there was in performance processing for these agents.  The data that was cached somewhere made its way into the reporting database and live graphs suddenly appeared for the two machines in the OpsMgr console.  That’s the funny bit; it only affected these two agents.

MS PSS are still curious.  The engineer seems to accept the explanation I’ve given him but he’s still curious to dig around and confirm everything, maybe try to see if he can get details on what happened internally.  I’ve got to credit him for that; most support staff would just close the call and move on.

So once again:

Hyper-V Snapshots or VMM Checkpoints, i.e. AVHD differential disks, should not be used in production.  They are a form of differential VHD that doesn’t perform well at all.  They really do affect performance and I’ve seen the proof of that.  In fact they affect functionality in the most unpredictable of ways due to their performance impact.  Use something like DPM instead for state captures via backup at the host level.  That’s an issue right now with the lack of CSV support in DPM.  If you really need that right now then have a look at 3rd party providers, or wait until DPM 2010 is released (approx April 2010) before you deploy CSV.

Cannot Delete Cluster Object From Operations Manager 2007

I recently decommissioned a Windows Server 2008 Hyper-V cluster.  It was monitored by OpsMgr 2007 R2.  When we shut down the last cluster node I tried to remove both its agent object and the agentless managed cluster object from OpsMgr administration.  I couldn’t.  The cluster just refused to disappear.  The server agent wouldn’t delete because there was a remaining dependency – the cluster object which relied on it as a proxy.

It had a red state (ruining my otherwise all green status view) and, more annoyingly, many of the migrated resources (VMs) still seemed to be linked to the old cluster despite being moved to the new cluster.

I searched and found lots of similar queries.  The official line from MS is that there is no supported way to do this deletion.  There is a hack but the instructions didn’t work for me – I couldn’t find the key piece of info – plus it is unsupported.

So I uninstalled the agent manually.  No joy.  I waited.  No joy.  I rebuilt the server and added it to our Windows Server 2008 R2 Hyper-V cluster.  No joy.  I installed the OpsMgr agent and enabled the proxy setting.

That was yesterday.  This morning I logged in and the old cluster object is gone.  Vamoose!  I guess OpsMgr figured out that the server was now in a new cluster and everything was good.

Finished Our W2008 R2 Hyper-V Cluster Migration

Last night we finished migrating the last of the virtual machines from our Windows Server 2008 Hyper-V cluster to the new Windows Server 2008 R2 Hyper-V cluster.  As before, all the work was done using System Center Virtual Machine Manager (VMM) 2008 R2.  The remaining host has been rebuilt and is half way to being a new member of the R2 Hyper-V cluster.

I also learned something new today.  There’s no supported way to remove a cluster from OpsMgr 2007.  Yuk!

Lots Of Operations Manager Updates

Microsoft released lots of updates for Operations Manager over the last couple of weeks.  There are lots of updates to management packs, too many for me to go posting them at this time of night.  Have a look on the catalogue and you’ll see them.  Or check your console if you’re using OpsMgr 2007 R2.

Most important is KB971541, Update Rollup for Operations Manager 2007 Service Pack 1.

“The Update Rollup for Operations Manager 2007 Service Pack 1 (SP1) combines previous hotfix releases for SP1 with additional fixes and support of SP1 roles on Windows 7 and Windows Server 2008 R2. This update also provides database role and SQL Server Reporting Services upgrade support from SQL Server 2005 to SQL Server 2008.

The Update Rollup includes updates for the following Operations Manager Roles:

  • Root Management Server, Management Server, Gateway Server
  • Operations Console
  • Operations Management Web Console Server
  • Agent
  • Audit Collection Server (ACS Server)
  • Reporting Server

The following tools and updates are provided within this update which may be specific to a scenario:

  • Support Tools folder – Contains SRSUpgradeTool.exe and SRSUpgradeHelper.msi (Enables upgrade of a SQL Server 2005 Reporting Server used by Operations Manager Reporting to SQL Server 2008 Reporting Server)
  • Gateway folder – Contains a MSI transform and script to update MOMGateway.MSI for successful installation on Windows Server 2008 R2
  • ManagementPacks folder – Contains an updated Microsoft.SystemCenter.DataWarehouse.mp which requires manual import

For a list of fixes and tools addressed by this update rollup, see KB971541.

This update is supported for application on System Center Operations Manager 2007 Service Pack 1 only.

Feature Summary

The System Center Operations Manager 2007 SP1 Rollup 1 contains:

  • All binary hotfixes released since Service Pack 1 release
  • Support for Windows 7 and Windows Server 2008 R2
  • Operational and DataWarehouse database support on Windows Server 2008 R2
  • Additional stability hotfixes”

Requirements

  • Supported Operating Systems: Windows 7; Windows Server 2003; Windows Server 2008; Windows Server 2008 R2; Windows Vista; Windows XP
  • System Center Operations Manager 2007 Service Pack 1

Instructions

This update must be applied to each computer that meets the following criteria:

  • Hosts a Microsoft Operations Manager Root Management Server
  • Hosts a Microsoft Operations Manager Management Server
  • Hosts a Microsoft Operations Manager Operations Console
  • Hosts a Microsoft Operations Manager Web Console Server
  • Hosts a Microsoft Operations Manager Reporting Server
  • Hosts a Microsoft Operations Manager Manually installed Agent
  • Hosts a Microsoft Operations Manager ACS Server

Before applying this update it is strongly recommended that Operations Manager databases, Management Server, Report Server and Web Console roles be backed up.

To extract the files contained in this update and installation of the update on the Operations Manager roles above:

  1. Copy the file – SystemCenterOperationsManager2007-SP1-KB971541-X86-X64-IA64-locale.MSI – To either a local folder or accessible network shared folder.
  2. Run the file – SystemCenterOperationsManager2007-SP1-KB971541-X86-X64-IA64-locale.MSI – locally on each applicable computer that meets the predefined criteria.
    You can run SystemCenterOperationsManager2007-SP1-KB971541-X86-X64-IA64-locale.MSI from either Windows Explorer or from a command prompt.
  3. Select the appropriate role to update from the Operations Manager 2007 Software Update dialog.

NOTE: To run this file on Windows Server 2008 you must run this file from a command prompt which was executed with the Run as Administrator option. Failure to execute this Windows installer file under an elevated command prompt will not allow display of the System Center Operations Manager 2007 Software Update dialog to allow installation of the hotfix”.