OCS 2007 R2 Management Pack for Operations Manager 2007

Microsoft has released a new management pack for OpsMgr 2007 monitoring of Office Communications Server 2007 R2.  Here’s what they have to say about it:

“The Office Communications Server 2007 R2 Management Pack monitors Standard and Enterprise Edition of Office Communications Server 2007 R2. This release also incorporates the Quality of Experience (QoE) MP which was previously a separate Management Pack. Monitored types are event log entries, performance counters, as well as stateful monitoring of QoE. Note that this version of the Management Pack only monitors Office Communications Server 2007 R2, and cannot be used to monitor Office Communications Server 2007”

Issues With OpsMgr 2007 W2008 Cluster Management Pack

Over the last two weeks I’ve said a few things:

  1. The management pack for Windows Server 2008 Failover Clustering was available.
  2. You really need to install the pre-requisite updates before installing the management packs.

MS’s System Center teams have been reminding people about the need to apply updates.  I guess PSS must’ve had a few phone calls on the issues the updates resolve.

There’s also an issue with the clustering management pack on clusters that have large numbers of resources, e.g. 60+.  That’s not your typical Exchange or SQL cluster … but it could easily be a Hyper-V cluster.  The issue is that a “Resource State Monitor” monitor checks resource availability every 60 seconds and clobbers the CPU.  You can override this by setting its interval to something larger.  I’m probably going with every 10 or 15 minutes.  My VMM integration will instantly let me know if a VM goes offline, so that covers the disappearing resource issue.  MS is internally updating the management pack to fix this monitor.
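To get a rough feel for why the interval matters, here is a back-of-the-envelope sketch that counts monitor runs per hour. It assumes that the monitor runs once per cluster resource per interval. That is my assumption, not something stated in the management pack documentation. The 60-resource figure is the example from above.

```python
def checks_per_hour(resources: int, interval_seconds: int) -> float:
    """Monitor runs per hour, ASSUMING one run per cluster resource per interval."""
    return resources * 3600 / interval_seconds


# 60 resources, as in the example above; intervals of 1, 10 and 15 minutes.
for interval in (60, 600, 900):
    runs = checks_per_hour(60, interval)
    print(f"{interval // 60:>2} min interval: {runs:,.0f} runs/hour")
```

Under that assumption, going from 60 seconds to 15 minutes cuts the work by a factor of 15 (3,600 runs an hour down to 240).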

People know I’m an OpsMgr junkie.  Honestly, I’m not impressed.  Wouldn’t the dog food tests in MS find this on their large Hyper-V farms?  Also, there’s no sign of a “Resource State Monitor”.  There is a “Resource Group State Monitor”.  Is this what they are referring to?  Who knows!  If anyone does then please share.

EDIT #1:

The mystery deepens!  I did some research and found this.  It would seem to say that the "Resource State Monitor" is in a "Cluster Resource" target in "Windows Cluster Management Monitoring Version 6.0.6277.1".  Huh!?  First, my management packs were brand new and downloaded from the catalog after the RTM announcement.  The version of that library on my system is 6.0.6505.0.  No previous edition of the management pack was installed.  Next, the "Cluster Resource" target does not exist on my RMS.  I’ve double checked and all components were installed.  Something smells very fishy about this management pack release.

PLEASE Read The OpsMgr Management Pack Documentation

If you are downloading and running these Operations Manager 2007 management packs then it’s critical that you read the included documentation.  There’s not as much tweaking as in the MOM 2005 days.  But if you run Windows Server 2008 agents then you will have some patching to do, either before you deploy the agent or after you deploy it.  Some of these updates prevent memory leaks.  Some of them prevent false alerts.

If you have ConfigMgr then you can do this automatically.  Create collections that match the criteria, e.g. “All Windows Server 2008 servers with IIS7 and the OpsMgr Health Service”, and target each collection with one or more packages containing the appropriate updates.

I’ve read through the documentation for the clustering and IIS7 management packs that include W2008 support.  There are a number of updates that I’ve had to download and that will need to be deployed before or after agent installation.

Just Deployed Windows Server Cluster Management Pack for Operations Manager 2007

I just downloaded and deployed the management pack for clustering.  It covers Windows Server 2003 and Windows Server 2008.  It was pretty painless.  Our W2008 cluster is a clean configuration and tightly controlled.  The only pre-requisite for me was to enable proxying on the agent – and to make sure all the W2008 patch pre-requisites were installed (which they were).

You Know, I’m Not Perfect!

I wasn’t always an IT pro.  When I was in college I pretty much wanted to be on the development side of things.  I coded in COBOL, Pascal, C and C++ on Unix, VMS and Windows.  Very early on I learned an important lesson.  When we’re doing complex work we can get so used to what we’re doing that we stop seeing our own mistakes.  I would write a long piece of code, go to compile and get some unhelpful syntax or logic error that I just couldn’t find.  A great example was when I left college and started work.  Our new team leader wanted to test us so he gave us a network coding challenge.  I got done early so I decided to get fancy with my code.  I had this mad Boolean calculation to determine if I should spin off a new listener or not … I was trying to put an entire function’s worth of logic into the IF statement!  I got one tiny little thing wrong but didn’t notice.  The thing compiled and linked, and I ran it.  Everyone’s session onto the Solaris box died.  It came back a few minutes later and I ran it again.  Boom!  10 minutes later a very angry sys admin came storming down wanting to know who was consuming every available TCP port on his server. 

I went through my code and couldn’t find what was wrong.  I called in some of the others and they immediately saw it.  Sometimes a fresh pair of eyes can identify the simplest of issues when you’re looking for a complex one.

I had an issue with a pair of new OpsMgr agents.  The machines run Windows Server 2008 x64 and are in a firewalled workgroup.  This means I need to use certificates.  I installed the certs and used MOMCERTIMPORT /SUBJECTNAME <CERTSUBJECTNAME> to get the cert loaded.  The cert was in the computer’s personal store and the CA cert was in the trusted root certification authorities store.  The certification path was fine.  I checked that MOMCERTIMPORT had copied the cert into the Operations Manager store, and it had.  I restarted the agent and it couldn’t find a cert to load.  The agent did not appear in Pending Management in the Operations Manager 2007 console.  The following events appeared in the agent computer’s Operations Manager event log:

"SOURCE: OpsMgr Connector
EVENT ID: 21022
No certificate was specified.  This Health Service will not be able to communicate with other health services unless those health services are in a domain that has a trust relationship with this domain.  If this health service needs to communicate with health services in untrusted domains, please configure a certificate."

This is quickly followed by:

"SOURCE: HealthService
EVENT ID: 7006
The Health Service has published the public key [1E 48 38 90 8A 46 11 B2 43 17 DC 64 0D 5A F4 A5 ] used to send it secure messages to management group CInfinity.   This message only indicates that the key is scheduled for delivery, not that delivery has been confirmed."

"SOURCE: OpsMgr Connector
EVENT ID: 21007
The OpsMgr Connector cannot create a mutually authenticated connection to cinwsvr003.cinfinity.ie because it is not in a trusted domain."

"SOURCE: OpsMgr Connector
EVENT ID: 21016
OpsMgr was unable to set up a communications channel to cinwsvr003.cinfinity.ie and there are no failover hosts.  Communication will resume when cinwsvr003.cinfinity.ie is both available and allows communication from this computer."

I did everything I could think of to fix this.  Eventually I gave up and called in Microsoft PSS.  One quick call with a very helpful engineer from India and we identified the issue.  I think he’d already figured it out before he called me because he zeroed in on the diagnostic immediately … it helps if you give them EVERYTHING you can think of when logging the call.  My ticket was probably two A4 pages long.

We fired up REGEDIT and went to HKLM\Software\Microsoft\Microsoft Operations Manager\3.0\Machine Settings.  There should be a value in there called ChannelCertificateSerialNumber containing the serial number of the agent certificate.  I didn’t have one there at all, so the agent didn’t know to load the cert from the Operations Manager store.  I’d used UAC to elevate my CMD prompt to administrator, so it wasn’t a permissions thing.

We then looked at MOMCERTIMPORT.EXE.  Mine was around 44KB in size.  The engineer knew what was wrong.  I keep the installation media on my servers for troubleshooting reasons.  I opened the OpsMgr SP1 media and in Support Tools\AMD64 I could see that MOMCERTIMPORT.EXE was 51KB.  I had used the 32-bit version of MOMCERTIMPORT on a 64-bit installation.  It fails to create the registry value and does not log an error.
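File size is only a rough hint as to which build of a tool you have.  A more direct check is to read the Machine field from the executable’s PE header.  The PE/COFF format records 0x014C for x86 and 0x8664 for x64.  Below is a minimal Python sketch of that check.  It is my own illustration, not a Microsoft tool, and not part of the fix PSS gave me.

```python
import struct

# Machine field values defined by the PE/COFF specification.
MACHINE_TYPES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0x0200: "IA-64 (Itanium)",
}


def pe_machine(data: bytes) -> str:
    """Return the CPU architecture recorded in the header of a Windows PE image."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ/PE executable")
    # e_lfanew: little-endian 32-bit offset of the "PE\0\0" signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0" or len(data) < pe_offset + 6:
        raise ValueError("PE signature not found")
    # The COFF header's 2-byte Machine field immediately follows the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, f"unknown (0x{machine:04X})")


def pe_machine_of_file(path: str) -> str:
    """Read the start of an executable and report its architecture."""
    with open(path, "rb") as f:
        # The PE header normally sits well within the first 4 KB.
        return pe_machine(f.read(4096))
```

For example, pe_machine_of_file(r"C:\Temp\MOMCERTIMPORT.EXE"), where the path is hypothetical, should report x64 for the copy from Support Tools\AMD64 and x86 for the 32-bit one.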

I copied the correct version onto the agent server, re-ran the command and restarted the OpsMgr Health service.  Almost immediately the agent appeared in Pending Management on the OpsMgr console and I approved it.  10 minutes later it was being monitored.

So the lesson is relearned.  If you are bashing your head against the wall with a problem, get a fresh “pair of eyes” to analyse the issue and you might get lucky and they’ll spot the simple cause straight away.  Oh … and make sure you only use the 64-bit version of MOMCERTIMPORT.EXE on 64-bit Windows 🙂