Update #2: Brightstor 11.5 Disk Staging Option

Last night’s backup time was cut once again by running 15 concurrent streams from agent to disk.  The differential backup took only 2.7 hours instead of the 3.5 the night before (10 streams).  Disk to tape took 1.4 hours instead of 1 hour, but that’s not a problem because it’s an "offline" process.  It’s tough to fairly compare two consecutive differentials because the second will always have more data, especially 24 hours later during the week.  So I compared the 2.7+1.4 hours against a Tuesday differential from 2 weeks ago.  That earlier backup took 13 hours!  This Disk Staging Option really works.
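To put numbers on that comparison, here is a minimal Python sketch of the arithmetic.  The figures are the ones quoted above; the function and variable names are just illustrative, and the "production-facing" figure assumes that only the agent-to-disk phase touches the production servers.

```python
def reduction_pct(before_hours: float, after_hours: float) -> float:
    """Percentage reduction going from before_hours to after_hours."""
    return (before_hours - after_hours) / before_hours * 100


agent_to_disk = 2.7    # hours, last night's differential with 15 streams
disk_to_tape = 1.4     # hours, the "offline" copy from disk to tape
previous_total = 13.0  # hours, Tuesday differential from 2 weeks ago

total = agent_to_disk + disk_to_tape
print(f"Total job time: {total:.1f} hours")
print(f"Reduction in total time: {reduction_pct(previous_total, total):.0f}%")
print(f"Reduction in production-facing time: "
      f"{reduction_pct(previous_total, agent_to_disk):.0f}%")
```

That works out to a total of 4.1 hours, which is roughly a 68% cut in total job time and roughly a 79% cut in the time the production servers spend being backed up.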

There was a downside to going with 15 streams.  We appeared to have some network congestion.  Two agents had communication timeouts.  I’ve reduced the number of streams to 8.  This will give us a good opportunity for comparison and analysis.

Windows Sysinternals

Anyone who’s done any decent amount of Windows administration or consulting will be familiar with the name Mark Russinovich and his old company, Sysinternals.  A while back, they were bought out by Microsoft and Mark became one of Microsoft’s "fellows", a big brain responsible for coming up with new ideas and research.

The Sysinternals site has now been migrated into Microsoft.  To coincide with this, a new tool called Process Monitor has been released.  How does one describe this new tool … it’s a completely new tool that is like Filemon and Regmon together and on steroids.  Yep … that does it.  This utility is a valuable addition to your toolkit: use it to track what a program is doing and what it’s touching, or to find out what is touching a particular file or registry value.

Here’s the blurb from the MS site:

Process Monitor’s user interface and options are similar to those of Filemon and Regmon, but it was written from the ground up and includes numerous significant enhancements, such as:

  • Monitoring of process and thread startup and exit, including exit status codes
  • Monitoring of image (DLL and kernel-mode device driver) loads
  • More data captured for operation input and output parameters 
  • Non-destructive filters allow you to set filters without losing data
  • Capture of thread stacks for each operation make it possible in many cases to identify the root cause of an operation
  • Reliable capture of process details, including image path, command line, user and session ID
  • Configurable and moveable columns for any event property
  • Filters can be set for any data field, including fields not configured as columns
  • Advanced logging architecture scales to tens of millions of captured events and gigabytes of log data
  • Process tree tool shows relationship of all processes referenced in a trace
  • Native log format preserves all data for loading in a different Process Monitor instance
  • Process tooltip for easy viewing of process image information
  • Detail tooltip allows convenient access to formatted data that doesn’t fit in the column

The best way to become familiar with Process Monitor’s features is to read through the help file and then visit each of its menu items and options on a live system.

The "installation" process will be familiar to anyone who has used Sysinternals tools.  All you have to do is download the tool and run it.
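If you want to answer the "what is touching this file?" question outside of the GUI, a trace can be saved as CSV and summarised with a few lines of script.  The following is only a rough Python sketch: it assumes the export uses the column headings "Process Name" and "Path", and the sample rows, process names and paths are entirely made up for illustration.

```python
import csv
from collections import Counter


def processes_touching(csv_lines, path_fragment):
    """Count events per process whose Path contains path_fragment.

    csv_lines is any iterable of text lines (an open file works too).
    The column names "Process Name" and "Path" are assumptions about
    how the trace was exported.
    """
    wanted = path_fragment.lower()
    counts = Counter()
    for row in csv.DictReader(csv_lines):
        if wanted in (row.get("Path") or "").lower():
            counts[row.get("Process Name") or "?"] += 1
    return counts


# Hypothetical sample data standing in for a saved trace.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"09:00:01","app.exe","1234","RegOpenKey","HKLM\Software\Example","SUCCESS",""
"09:00:02","app.exe","1234","CreateFile","C:\Data\report.doc","SUCCESS",""
"09:00:03","scanner.exe","2345","ReadFile","C:\Data\report.doc","SUCCESS",""
"09:00:04","app.exe","1234","ReadFile","C:\Data\report.doc","SUCCESS",""
'''

for name, hits in processes_touching(SAMPLE.splitlines(), "report.doc").most_common():
    print(name, hits)
```

For a real export you would pass an open file instead of the sample text, e.g. `open("trace.csv", newline="", encoding="utf-8-sig")`, where the file name is hypothetical and the encoding choice simply tolerates a byte order mark if one is present.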

Windows Vista RTM

It’s been much talked about.  It’s late.  It’s here.  Windows Vista has just gone RTM.  Jim Allchin made one simple blog entry to announce the fact on the Vista Blog.  Another entry goes into a little more detail.

I’m hoping to find out when TechNet and Volume Customers will be able to download the DVD ISO image.  It will be sometime later this month.  Retail outlets and sales channels will be able to start shipping on January 30th.

You can find out what all the fuss is about on the Microsoft website.

SQL 2005 Service Pack 2 CTP

A community technology preview (CTP) release of SP2 for SQL 2005 has been released for general consumption.  The list of changes is way too long to reproduce here but you can find them on the Microsoft website.  Key additions include:
 
  • Support for Office 2007 business intelligence functionality.
  • Integration of SQL Reporting Services with Windows SharePoint Services 3.0 and Office SharePoint Server 2007.
  • The ability to build reports from Oracle 9.2.0.3 servers.

The CTP release is available for download.  As usual, this sort of release should not be installed on production or valued systems.

Web Server Downtime

Apologies to anyone who tried to download any of my documents from my site today.  My web server (a VM) failed to autostart after its host rebooted according to a schedule last night.  That was quickly sorted out when I got home.  All of my documents are stored on this web server and any download links I have here redirect to it.  Everything should be back working now.

Update: Brightstor 11.5 Disk Staging Option

After the first backup (a differential) we found that the backup server’s CPU (max 85%) and RAM (around 60%) coped with 10 simultaneous streams.  The 1Gb NIC appeared to be 50% utilised.  What was interesting was that the MSA1500 being used for disk staging was hammered.  Current Disk Queue Length was spiking quite frequently at 27 … well above the recommended maximum of 2!  Seeing as the system isn’t interactive we’re not too worried, but the disk (RAID 5 … slow, I know) or the controller (shared with other systems that are heavily used at night) could be our bottleneck.  This new bottleneck was not considered in our estimations.  I never thought disk would be a bottleneck before the NIC.

Compared against a similar job, the old agent-to-tape backup took 8.5 hours (total 8.5), while the new job took 3.5 hours agent-to-disk and 1 hour disk-to-tape (total 4.5 hours).  That’s nearly a 50% reduction in total backup time, and the production systems are only involved for the 3.5 hour agent-to-disk phase.

We want to find our optimum setting for concurrent streams.  The only true test is to suck it and see.  Tonight, we will run 15 streams at once.  We’ll then compare against a previous Tuesday differential job from another week and see what the time savings were.  After that, we’ll try 5 streams and see what the trends are.
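Spotting those Current Disk Queue Length spikes doesn’t have to mean staring at graphs.  Here is a minimal Python sketch that scans a perfmon log saved as CSV and lists every sample above the recommended maximum of 2.  It assumes the first column holds the timestamp and that the counter’s column heading ends with the counter name; the server name and sample values are made up for illustration.

```python
import csv

QUEUE_COUNTER = "Current Disk Queue Length"
RECOMMENDED_MAX = 2  # the threshold quoted in this post


def queue_spikes(csv_lines, threshold=RECOMMENDED_MAX):
    """Return (timestamp, value) pairs where the disk queue exceeds threshold."""
    reader = csv.reader(csv_lines)
    header = next(reader)
    # Pick every column whose heading ends with the counter name.
    columns = [i for i, name in enumerate(header) if name.endswith(QUEUE_COUNTER)]
    spikes = []
    for row in reader:
        for i in columns:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or missing samples
            if value > threshold:
                spikes.append((row[0], value))
    return spikes


# Hypothetical log excerpt; BACKUP01 and the readings are invented.
SAMPLE_LOG = [
    r'"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\BACKUP01\PhysicalDisk(_Total)\Current Disk Queue Length"',
    '"01/01/2000 22:00:00.000","1"',
    '"01/01/2000 22:00:15.000","27"',
    '"01/01/2000 22:00:30.000"," "',
    '"01/01/2000 22:00:45.000","4"',
]

for timestamp, value in queue_spikes(SAMPLE_LOG):
    print(timestamp, value)
```

Running the same check against the logs from the 10, 15 and 5 stream nights would give a simple count of spikes per setting to set alongside the backup times.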

I mentioned that we had to call in tapes for file recovery.  The file server is Windows 2003 and I would normally utilise Volume Shadow Copy and make use of the Previous Versions Client for operational file recoveries.  However, the file server is struggling with disk space so this is not an option.

Microsoft VHD Test Drive Program

Microsoft has long made their products available for evaluation.  But consider this.  You’re a busy administrator.  You’ve been asked by your boss to have a look at SQL 2005 or ISA 2006.  You might not know these products so installing and configuring them is going to be time consuming and difficult.  In the end, you will have to invest a lot of time in configuring the software to fairly evaluate all of the features.  This is time wasted because if you like the product then you’re going to have to do this all over again when you buy it.
 
Yesterday (at VMWorld, I think) Microsoft announced the launch of their VHD Test Drive Program.  This will accelerate any evaluation of a Microsoft product by leveraging virtualisation technology.  VHD is the disk format that Microsoft uses in their virtualisation products to provide a virtual hard disk.  Microsoft has released 4 pre-installed and configured virtual machines so that you can evaluate the installed products.  This list (Windows 2003 R2, SQL 2005, ISA 2006 and Exchange 2007) will be expanded to include other products.

Now the evaluation process gets a lot easier.  Download the free Virtual Server 2005 R2 and install it on a decent machine.  Download and install the appropriate evaluation VHD.  Then fire up the virtual machine(s) and try out the pre-installed product.  The software will naturally have an expiration, e.g. ISA 2006 expires after 30 days.

These VHDs might be handy for anyone looking to do some testing or self-paced training.

Office 2007 RTM

Office 2007 has been released to manufacturing.  Anyone with a volume license agreement and software assurance will soon be entitled to and be able to download the product.  It’s expected the product will be available in retail outlets in January.
 
Exchange 2007 (now considered an Office server product) will RTM soon.  Windows Vista is expected to RTM very soon.  There was a rumour it would be released yesterday.  Two serious last minute bugs had been found but are thought to be resolved.
 
Product launches are kicking off next month.  Larger Irish Microsoft customers are already receiving invites to a party that is being held in Dublin next month.

Brightstor 11.5 Disk Staging Option

I’ve been trying to sort out backups at my client’s site over the last while.  Part of the process has been to upgrade Arcserve 2000 (7.0) server installations and agents to Brightstor 11.5 SP2.  We’ve had serious problems with the time it can take to run a full backup that starts on Saturday afternoon.  For example, this last weekend’s backup finished after I arrived in this morning.  We’ve also got the horrible issue of having to call in tapes to recover anything for a user … which is all too common.

The client had previously ordered a HP MSA 1500 and one of the guys attached it to the main Brightstor server last week.  Today I installed the Disk Staging Option for Brightstor 11.5.  I started mucking around to get to know it.  There was zero documentation on the CA website to be found and the help file was less than helpful.

I quickly figured things out:

  • You can create devices which are pretty much just folders where backups are stored on disk.  I created one for full backups and one for differentials.
  • I created a disk device group for full backups and again, one for differentials.
  • I enabled staging on each of the disk device groups. 
  • I also made sure to increase the number of streams on the two disk device groups.  Having X streams means that a Brightstor server can back up X agents at once.  This is where the performance gain over tape backup comes from.  We’d noticed that agent-disk was no faster than agent-tape and we found no bottlenecks other than maybe the agent or agent machine itself.
  • I enabled staging in both the full and differential backup jobs.
  • I configured each job to use 10 streams, i.e. 10 servers will back up simultaneously to disk.
  • I configured a policy in each of the jobs.  A copy (migration) of data from disk to tape would commence 1 minute after the backup.  Full backup data on disk will be automatically purged when it is 4 weeks old (the client runs a 4 week cycle plus monthly archives).  Differential data is purged when it is 1 week old.
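
The staging policy in the last step can be modelled in a few lines of Python to see when a given backup will be copied to tape and when its disk copy will be purged.  This is only an illustration of the settings described above and not anything Brightstor exposes; the names are hypothetical, and it assumes both timers run from the moment the backup finishes.

```python
from datetime import datetime, timedelta

# Settings taken from the policy described above.
MIGRATE_DELAY = timedelta(minutes=1)
PURGE_AFTER = {
    "full": timedelta(weeks=4),
    "differential": timedelta(weeks=1),
}


def staging_schedule(job_type, backup_finished):
    """Return when staged data is copied to tape and when it is purged.

    Assumes both timers start when the backup finishes.
    """
    return {
        "migrate_at": backup_finished + MIGRATE_DELAY,
        "purge_at": backup_finished + PURGE_AFTER[job_type],
    }


# Hypothetical finish time, purely for illustration.
finished = datetime(2000, 1, 1, 2, 0)
for job_type in ("full", "differential"):
    schedule = staging_schedule(job_type, finished)
    print(job_type, schedule["migrate_at"], schedule["purge_at"])
```

Because every run leaves its own file on disk until the purge date, a sketch like this also helps when estimating how many differentials and fulls are sitting on the staging disk at any one time.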

I tested this on two servers and streaming really made a difference.  I could back up two servers in the same time it took to back up one.  Having the data stream to tape afterwards took no time.  In fact, disk-tape ran 10 times faster (LTO3) than agent-disk!

The first differential backup with disk staging will run tonight.  I’m also running a perfmon job to capture statistics on the Brightstor server to see if 10 streams is too many or if maybe we can increase that without affecting server performance or introducing server-agent timeouts.  We’re hoping that with 10 streams we can get backup times down to 20% of what they were.

What did I think of Disk Staging Option?  I am no fan of CA (I’ve way too much experience with their "enterprise management" products and a constant need to call support and patch systems).  I was pleasantly surprised to see how easy it was to configure.  It also appears to offer some serious performance gains.  However, it does appear to be very inefficient with disk.  Each volume (or agent, e.g. SQL) creates a new backup file on the Brightstor server’s disk.  This isn’t appended intelligently on the next backup.  Instead, another file is just created.  This does not compare well with Commvault’s usage of their "synthetic full backup" in Galaxy Backup.  But, it is simple to set up and it appears to offer some serious reductions in backup times.

VMware Backup Tested Under Fire

I previously blogged a bit about how I am backing up my VMware network at home.  Today I came home to find my physical host was acting up.  The WiFi NIC had "disappeared" and the machine would not shut down cleanly.  Only a hard reset worked.  I made sure to pause my VMs before hitting the button.  I reseated the WiFi card and that resolved that problem.

I powered up the host and started launching the VMs after reattaching their storage.  Two of the VMs fired right up but the web server complained about corruption.  Sure enough, it thought the machine had no disks and only 32MB of RAM.  Good luck with running a web server with that!  I recovered the web server from a backup that automatically runs every Sunday morning.  I opened the machine in VMware Server and awoke it from its suspended state.  It was running perfectly.  All that remained was to resynchronise the updated content that I had added on Sunday night.