Event Notes | Aidan Finn, IT Pro

Service Manager 2012 “Service Ticketing”

Import Management Packs

Service Manager CMDB can become aware of your environment from OpsMgr if:
You import the MP in OpsMgr
AND import the MP in Service Manager
ConfigMgr data is pulled in, including primary devices for users
AD
Orchestrator runbooks are also importable: LOB and 3rd party management tools

Other options:

Import files
Write/buy 3rd party connectors

Some sets of data can come from multiple sources. All of that data is mapped into one object in the CMDB.
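Here is a minimal Python sketch of several sources feeding a single CMDB object. It makes two assumptions for illustration: records are matched on a computer name compared case-insensitively, and the first source to supply a property wins. Neither rule is Service Manager's documented reconciliation behaviour.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_sources(sources: List[List[Record]], key: str = "name") -> Dict[str, Record]:
    """Merge records from several sources into one object per key.

    Sources are processed in order. A property already set by an earlier
    source is kept, so list order acts as a simple precedence rule
    (an assumption for this sketch).
    """
    merged: Dict[str, Record] = {}
    for records in sources:
        for record in records:
            object_id = record[key].lower()  # hypothetical matching rule
            target = merged.setdefault(object_id, {})
            for prop, value in record.items():
                target.setdefault(prop, value)
    return merged


# Illustrative data from three connectors describing the same server.
opsmgr = [{"name": "SRV01", "os": "Windows Server 2008 R2"}]
configmgr = [{"name": "srv01", "primary_user": "jsmith", "os": "Win2008R2"}]
ad = [{"name": "SRV01", "ou": "Servers"}]

cmdb = merge_sources([opsmgr, configmgr, ad])
print(cmdb["srv01"])
```

Reordering the sources list changes which connector wins a conflict. That is a precedence decision worth making deliberately.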

Self Service Portal Features

Service Catalog, Silverlight web part hosted in SharePoint:

Role based access
Users fill forms to create service requests
Dynamic forms

Help Articles and more

Supported Configurations:

SharePoint site and WCS (web content server) co-located with SM management server
SharePoint site and/or WCS remote from SM management server

Can use SharePoint Foundation 2010 or Enterprise. Can reuse existing SP farms.

Demo

A user wants access to an app and fills out a form requesting it and gives a business case. A ticket is created, and awaits an approval/rejection. The helpdesk admin can see the ticket with available actions in the portal. Click approve and the automated activity does the work, in this case adding the requestor to a security group in AD.

He browses the now accessible web app. But it crashes. So now he opens an incident ticket.

SLA Capabilities

Features calendars, business hours, holidays. SLA metrics in the box.
Service level objectives are supported for all work items. Specify target and warning thresholds.
Notifications when you are about to or have breached SLAs.
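The session didn't show how an SLO target becomes a "time left" countdown. Here is a rough Python sketch of how a calendar, holidays, a target and a warning threshold could combine. It assumes a Monday to Friday, 09:00 to 17:00 calendar and counts business time minute by minute. Both choices, and all function names, are mine rather than Service Manager's.

```python
from datetime import date, datetime, time, timedelta
from typing import Set, Tuple

# Hypothetical calendar: Monday to Friday, 09:00 to 17:00, plus listed holidays.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def is_business_minute(moment: datetime, holidays: Set[date]) -> bool:
    """True if the minute starting at `moment` is inside business hours."""
    if moment.weekday() >= 5 or moment.date() in holidays:  # 5 = Saturday, 6 = Sunday
        return False
    return BUSINESS_START <= moment.time() < BUSINESS_END


def add_business_minutes(start: datetime, minutes: int, holidays: Set[date]) -> datetime:
    """Walk forward minute by minute, counting only business minutes."""
    current = start.replace(second=0, microsecond=0)
    remaining = minutes
    while remaining > 0:
        if is_business_minute(current, holidays):
            remaining -= 1
        current += timedelta(minutes=1)
    return current


def slo_deadlines(created: datetime, target_minutes: int, warning_minutes: int,
                  holidays: Set[date]) -> Tuple[datetime, datetime]:
    """Return (warning time, breach time) for a work item created at `created`."""
    return (add_business_minutes(created, warning_minutes, holidays),
            add_business_minutes(created, target_minutes, holidays))


# An incident logged at 16:00 on Friday 20 April 2012 with a 4-hour target and
# a 3-hour warning threshold; Monday 23 April is treated as a holiday.
warning, breach = slo_deadlines(datetime(2012, 4, 20, 16, 0), 240, 180, {date(2012, 4, 23)})
print(warning, breach)  # 2012-04-24 11:00:00 2012-04-24 12:00:00
```

In this model, the countdown on an incident form is roughly the gap between now and the breach time.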

Demo

He opens the previous incident. We can see there is an SLO (service level objective) in the form of time left until SLA is breached. This is defined in Administration, Service Level Management, Service Level Objectives.


Visio Management Pack Designer (VMPD)

Speakers: Brian Wren and Baelson Duque, MSFT.

This is a new way to author management packs for System Center 2012 Operations Manager.

Challenges

Creating MPs takes too long
Difficult to maintain best practices
Difficult to create a model to manage an app

The old R2 Authoring Console was a dog IMO.

Features

Create custom monitoring with minimal effort
Solution for offline management pack creation
Visual design tool

What the VMPD is Not For

Editing existing management packs
Deeply advanced customer scenarios

VMPD Shape Types

MP Modelling: Represent components of your app
MP Rollup: Connect components and monitors
MP Monitoring: Monitors and rules

Patterns:

MP Modelling single server pattern: application components on a single type of server
MP Modelling distributed pattern: multiple types of server

Demo

Prereq: It requires Visio 2010 Premium edition.

You start off with a blank diagram with a management pack shape. A shape data sheet gives you properties of the shape – visible when you click on the shape. Here we can specify what versions of Windows the MP will support. This is a discovery.

In MP Modelling we have things like the server component shape (e.g. SQL Server Reporting Services). Its data sheet allows us to do discovery using “how to find”: registry key, registry value, Windows Server role, and WMI query. The Affect Computer Health setting allows you to roll the health of this server component up to the computer, e.g. the server role is red, therefore the computer is red. RunOn allows you to optionally schedule when the discovery runs.

Under a server role, you place one or more server components. You can use lines/arrows to dictate health rollup, e.g. “worst of this component”.
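A "worst of" rollup is easy to reason about as code. Below is a hypothetical Python sketch of a component's state propagating up to the server role and then to the computer, as with Affect Computer Health enabled. The three health states, the tree shape and all names are illustrative assumptions, not what VMPD generates.

```python
from enum import IntEnum
from typing import Dict, List


class Health(IntEnum):
    # Ordered so that max() picks the worst state.
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def worst_of(states: List[Health]) -> Health:
    """Worst-of rollup: the parent takes the worst state of its inputs."""
    return max(states, default=Health.HEALTHY)


def rollup(node: str, children: Dict[str, List[str]], own_state: Dict[str, Health]) -> Health:
    """Recursively roll health up a tree (component -> role -> computer)."""
    child_states = [rollup(child, children, own_state) for child in children.get(node, [])]
    return worst_of([own_state.get(node, Health.HEALTHY)] + child_states)


tree = {
    "SRV01": ["Reporting Role"],
    "Reporting Role": ["ReportServer service", "Report DB connectivity"],
}
component_states = {"ReportServer service": Health.CRITICAL, "Report DB connectivity": Health.HEALTHY}
print(rollup("SRV01", tree, component_states).name)  # CRITICAL
```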

A Windows Performance Counter Monitor is added. You specify the object and counter as well as the instances of that counter. You can alert, or you can alert and collect data. You can create a performance view for the console. You can optionally save your data to the data warehouse. And you can create a linked report! This is nice. Me want now. You can even set the monitor to only run on a schedule, e.g. why monitor LOB app performance during down hours? You can copy/paste the monitors to quickly expand the MP.
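The threshold-plus-schedule idea can be sketched in a few lines of Python. The single threshold, the daily window and the class name are my assumptions. VMPD builds the real monitor for you, and its schedule options may work differently.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Tuple

Sample = Tuple[datetime, float]


@dataclass
class PerfThresholdMonitor:
    """Hypothetical threshold monitor for one counter instance, evaluated only
    inside a daily schedule window."""
    threshold: float
    window_start: time
    window_end: time

    def in_schedule(self, moment: datetime) -> bool:
        return self.window_start <= moment.time() < self.window_end

    def breaches(self, samples: List[Sample]) -> List[Sample]:
        """Samples that would raise an alert: inside the window and over the threshold."""
        return [(when, value) for when, value in samples
                if self.in_schedule(when) and value > self.threshold]


# e.g. % Processor Time on a LOB app server, monitored 08:00-18:00 only
monitor = PerfThresholdMonitor(threshold=80.0, window_start=time(8, 0), window_end=time(18, 0))
samples = [
    (datetime(2012, 4, 17, 10, 0), 95.0),  # business hours, over threshold
    (datetime(2012, 4, 17, 11, 0), 40.0),  # business hours, fine
    (datetime(2012, 4, 17, 22, 0), 99.0),  # overnight batch job, outside the schedule
]
print(monitor.breaches(samples))
```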

An event monitor is created for an event ID and source. You can set it to trigger after X occurrences in Y seconds.
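"X occurrences in Y seconds" is a sliding-window count. Here is a minimal Python sketch. The deque-based window, resetting after an alert, and the event ID/source used in the example are all hypothetical, not the logic VMPD generates.

```python
from collections import deque
from typing import Deque


class RepeatedEventMonitor:
    """Fires when `count` matching events arrive within `window_seconds`."""

    def __init__(self, event_id: int, source: str, count: int, window_seconds: float):
        self.event_id = event_id
        self.source = source
        self.count = count
        self.window_seconds = window_seconds
        self._times: Deque[float] = deque()

    def observe(self, event_id: int, source: str, timestamp: float) -> bool:
        """Record an event; return True if the threshold has just been reached."""
        if event_id != self.event_id or source != self.source:
            return False
        self._times.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self._times and timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.count:
            self._times.clear()  # reset so the next alert needs a fresh burst
            return True
        return False


monitor = RepeatedEventMonitor(event_id=1000, source="MyLobApp", count=3, window_seconds=60)
fired = [monitor.observe(1000, "MyLobApp", t) for t in (0, 10, 100, 120, 130)]
print(fired)  # [False, False, False, False, True]
```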

You can use patterns to create a composite shape: a set of shapes that you frequently reuse. You can add your own via a stencil.

You can then generate an MP, and that does all the XML in the background for you.

Schedule

CTP very soon.

MMS Day 2 Keynote

I am live blogging from the keynote, which is titled something like “a world of connected devices”. I’m expecting Intune V3, ConfigMgr, etc. to be the focus. It would be nice if they briefed us on how Windows RT (aka Windows on ARM) will be manageable (am thinking some Intune upgrade).

Work/life blur is a theme, as are application delivery, continuous service, people centric, control and governance. Out comes Brad Anderson.

IDC: the past was one desktop = one user. In 2011, users had between 5 and 7 Internet-connected devices. They want to use the right device for the job … have a choice. MSFT want to say “yes, bring your device”. 916 million smart connected devices shipped in 2011. That will double in 2016. 34% of corporations are currently enabling users to access corp apps. 69% of their users are already doing it!!!! Most corps aren’t aware of this usage. Where there’s a will, there’s a way.

New concepts: corporate controlled devices (traditional) and user controlled devices (BYOD – bring your own device). In recent past, that was all PC based. In the near future, we see this changing with lots of smart phones and tablets, all being bought and controlled by the user. Corp has no control over these with traditional methods. Ownership not that relevant … control is the important factor, e.g. end user having admin rights over their laptop.

The past has been agent-focused control. That doesn’t work on iOS; no app can control another app because of sandboxing. The user will never accept an agent that controls their device. We want to enable the user to be productive on their devices, but we need to control how corporate assets are accessed (governance).

The spoiler is out. MSFT marketing has issued a press release with the content of the keynote.

Control and governance are two important concepts in enabling BYOD.

Infrastructure considerations

Intelligent app infrastructure
Security and access
Control and governance across all devices
User centric

Your opportunity

Broaden your impact – don’t be just another guy, another admin, another consultant
Enable users to work how/when and where they want. Good luck with the HR department and the old school managers. See Lync
Differentiate your organization, e.g. why do you rent office space? Are you a property company? Why can’t an office worker work from home and do the same job?

Celebrating ConfigMgr 2012 and Endpoint Protection 2012

All about the user
Unified device management infrastructure
Much simplified administration

175,000 registered downloads of the beta. 500,000 production devices. 307,000 Endpoint Protection deployments in TAP. 280,000 devices managed by MSIT.

Intelligent App Infrastructure

The user is at the centre. They have lots of devices, and we have lots of apps for those devices, with the user in the middle. VDI is being pushed here. They are announcing deep integration with iOS and Android. Hmm, it’s been referred to as light management up to now. How are they getting over the app store locks on consumer devices? Is there a side-load, aka jailbreak? Ah! They are integrating with the Apple App Store and the Microsoft Store by linking apps. Is this an SP1 feature? They are going to side-load apps onto iOS, Windows, and Android without using the app store!!!!! THIS IS NEW. Users can roam across different devices and find their apps on those devices. They’ll have a consistent app experience. And this is done with a single solution: no point solutions for the device types.

Demo

ConfigMgr app deployment to Windows by Bill Anderson (System Center). He’s got 5 deployment types for Adobe Reader in his demo in ConfigMgr. He wants to build intelligence and predictability into this. We can simulate a deployment. Each deployment type has rules like prereqs, etc. The simulation is a real test against client devices: it evaluates the rules on the clients, not in the database. You get real results. We’re shown the results of the simulation. We see the successes and, more importantly, the machines that already have it installed and where there were failures. We can then use that data to clean up the actual deployment. It’s like testing the flight without ever leaving the ground.
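Here is a small Python sketch of what a deployment simulation reports. In ConfigMgr the rules are evaluated on the clients themselves. In this sketch each client's state is represented by a dictionary of facts so the example is self-contained. The rule names, facts and result categories are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Facts = Dict[str, object]
Rule = Callable[[Facts], bool]  # returns True if the requirement is met


@dataclass
class DeploymentType:
    name: str
    requirements: Dict[str, Rule] = field(default_factory=dict)


def simulate(deployment_type: DeploymentType, clients: Dict[str, Facts]) -> Dict[str, List[str]]:
    """Evaluate requirement rules against each client without installing anything."""
    results: Dict[str, List[str]] = {"already_installed": [], "would_succeed": [], "would_fail": []}
    for client, facts in clients.items():
        if facts.get("installed"):
            results["already_installed"].append(client)
            continue
        unmet = [name for name, rule in deployment_type.requirements.items() if not rule(facts)]
        if unmet:
            results["would_fail"].append(f"{client} ({', '.join(unmet)})")
        else:
            results["would_succeed"].append(client)
    return results


reader = DeploymentType("Adobe Reader (MSI)", {
    "64-bit OS": lambda f: f.get("arch") == "x64",
    "2 GB free disk": lambda f: f.get("free_gb", 0) >= 2,
})
clients = {
    "PC01": {"arch": "x64", "free_gb": 10},
    "PC02": {"arch": "x86", "free_gb": 1},
    "PC03": {"installed": True},
}
print(simulate(reader, clients))
```

Listing the unmet rule next to each failure is what makes the results useful for cleaning up the real deployment.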

Deliver Applications To Employee Controlled Devices

This is possible with the new V3 version of Windows Intune. The non-domain joined devices, e.g. Windows RT, are managed via SSL.

Demo

Self-service management of user-controlled consumer devices by Bill Anderson. ConfigMgr 2012 SP1 will add support for deploying Metro style apps. They can be built in-house and delivered by ConfigMgr, or delivered from the Windows Store via a link. In the latter case, a link is used instead of a distribution point. For the former, you can distribute that Metro style app to the DP and deploy from there as you normally would. In the demo, he makes it available via the ConfigMgr app catalog, so a user can request it via the portal.

Now we go into Windows Intune. We see support for iOS. Android is supported too. We get the option to make an app available for install rather than push it. Now Brad comes out with an iPhone. Demo gods kill the projector connection. Instead we get a Windows 8 device. There is a self-service app for ConfigMgr vNext and Intune. It’s an alternative to the MSFT Store. We can push out MSFT Store linked apps (jumps into a Store deployment). We can also side-load bespoke apps and bypass the MSFT Store. I haven’t seen any of the competition do this on iOS, etc., though it may exist without my having seen it. In this Center app, you can see your devices and their health status. We see the Windows Phone location on a Bing Map in the Center. They can’t get the iPhone on the projector. We get a similar experience on the iPhone via Intune, apparently. These devices can’t join a domain, but they are “domain trusted”.

VDI

Going to explode because of BYOD. App-V 5.0 is live. Now App-V apps can interoperate with each other for the first time. App-V packages can be streamed to a VDI without being committed to disk. Can have a single cache on a VDI host to save space.

UE-V is user state virtualisation, abstracting the user state from the machine. Their settings/data move around freely. The user gets a single working environment across VDI devices.

Windows Server 2012 Reduces VDI Costs

App-V 5.0 reduces cost by using less disk.

Demo

Fast and easy VDI. Bill is back. UE-V configured by GPO. He specifies a server share with a user variable. He specifies templates for app settings. In the user side of the policy, he can specify which parts of the state should roam.

MMS 2013 will be happening: Brad opens a MMS 2013 planning PDF file.

Brad logs into a machine and changes some Adobe Reader settings. He logs out of his domain-joined machine. Bill is going to set up Windows Server 2012 VDI as part of the demo because it’s quick, simple, and easy. He times it and starts up Server Manager. He’s done in a minute; then the system does the rest of the work in the background. Brad logs into a VDI VM, and his Adobe settings have followed him thanks to UE-V.

A camera man comes up so we can get the iPhone demo working. There we see the Intune center which is an app. Bill browses available apps and installs one. And now it installs on the iPhone, and it appears like a normal app install.

MMS 2013

It will be in New Orleans in June 2013. Hmm, what about TechEd NA?

Operations Manager 2012: Network Monitoring

Speaker: Vishnu Nath, PM for Network Monitoring feature in OpsMgr 2012.

Discovery, monitoring, visualisation and reporting. Key takeaway: OpsMgr will help IT Operations gain visibility into the network layer of service to reduce mean time to resolution. All the required MPs, dashboards, and reports are built in-box. Server to network dependency discovery with support for over 80 vendors and 2000+ devices certified. It supports SNMP v1, v2c and v3. There is support for IPv4 and IPv6 endpoints.

Supported devices:

Bridges
Firewalls
Load balancers
Switches
Routers

Discovery

Process of identifying network devices to be monitored. Designed to be simple, without the need to call in network admins.

Demo

You can run the normal discovery wizard to discover network devices. There is also a Discovery Rule that you can configure in Administration/Network Management. This can run on a regular schedule. You can pick a management or gateway server to run the rule, and you set the server resource pool for the monitoring. Note that the design guide prefers that you have a dedicated network monitoring resource pool (min 2 Mgmt servers) if doing this at scale.

There are two discovery types, which reflect the two types of customer MSFT has encountered. You list the IPs of devices and do explicit discovery. Alternatively, you can do a recursive discovery, which crawls the network via router ARP and IP tables. That’s useful if you don’t know the network architecture.
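Recursive discovery is essentially a graph crawl. Here is a minimal breadth-first sketch in Python that also honours a device cap like the one described further down. The neighbour dictionary stands in for the ARP/IP tables OpsMgr reads from routers. The crawl order and cap handling are my assumptions, not OpsMgr's documented algorithm.

```python
from collections import deque
from typing import Dict, List, Set


def recursive_discovery(seeds: List[str], neighbours: Dict[str, List[str]],
                        max_devices: int) -> List[str]:
    """Breadth-first crawl from the seed devices, following each device's
    neighbour list until nothing new is found or the device cap is reached."""
    discovered: List[str] = []
    seen: Set[str] = set(seeds)
    queue = deque(seeds)
    while queue and len(discovered) < max_devices:
        device = queue.popleft()
        discovered.append(device)
        for neighbour in neighbours.get(device, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return discovered


# Hypothetical topology: a core router with two downstream routers.
topology = {
    "10.0.0.1": ["10.0.1.1", "10.0.2.1"],
    "10.0.1.1": ["10.0.0.1", "10.0.1.10"],
    "10.0.2.1": ["10.0.0.1"],
}
print(recursive_discovery(["10.0.0.1"], topology, max_devices=10))
# ['10.0.0.1', '10.0.1.1', '10.0.2.1', '10.0.1.10']
```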

You’ll need Run As accounts for the community strings … read-only passwords to the MIBs and SNMP tables in the network devices. It does not need read-write private strings. Using a Run As account secures the password/community string. You can have a number of them for complex environments.

You can import a text file of device IP addresses for an explicit discovery. You can use ICMP and/or SNMP access mode to monitor the device. ICMP gives you ping up/down probe monitoring. SNMP gives you more depth. An ISP won’t give you SNMP access. A secure environment might not allow ICMP into a DMZ. You can set the SNMP version, and the runas account for each device. During discovery, OpsMgr will try each community string you’ve entered. It will remember which one works. In some environments, devices can send trap alerts if they have failed logins and that can create a storm of alerts … SO BEWARE. You can avoid this by selecting the right runas account per device.
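Here is a Python sketch of "try each community string, remember the one that works". The probe function is a hypothetical stand-in for a real SNMP request, so the example runs anywhere. Counting attempts shows why pinning the right Run As account per device matters: every miss is a potential authentication-failure trap.

```python
from typing import Callable, Dict, List, Optional

# probe(device_ip, community) -> True if the device answered. In OpsMgr this
# would be a real SNMP request; here it is a caller-supplied stand-in.
Probe = Callable[[str, str], bool]


def find_community(device: str, candidates: List[str], probe: Probe,
                   cache: Dict[str, str]) -> Optional[str]:
    """Return a working community string for `device`, trying the one that
    worked last time first, then each candidate in order."""
    remembered = cache.get(device)
    if remembered is not None and probe(device, remembered):
        return remembered
    for community in candidates:
        if community == remembered:
            continue  # already tried above
        if probe(device, community):
            cache[device] = community  # remember what worked
            return community
    return None


# Hypothetical lab: only "netops-ro" is valid on 10.0.0.1.
valid = {"10.0.0.1": "netops-ro"}
attempts: List[tuple] = []


def fake_probe(device: str, community: str) -> bool:
    attempts.append((device, community))  # each miss could trigger an auth-failure trap
    return valid.get(device) == community


cache: Dict[str, str] = {}
print(find_community("10.0.0.1", ["public", "netops-ro"], fake_probe, cache))  # netops-ro
print(len(attempts))  # 2
print(find_community("10.0.0.1", ["public", "netops-ro"], fake_probe, cache))  # netops-ro
print(len(attempts))  # 3
```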

There are retry attempts, ICMP timeout, SNMP timeout. You also can set a max device number discovery cap. This is to avoid discovering more than you need to in a corporate environment.

You can limit the discovery to Name, OID, or IP range. And you can exclude devices.
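Here is a Python sketch of how name, OID and IP range limits plus an exclusion list might combine. It assumes that every criterion that is set must match. The class names, wildcard syntax and sample devices are illustrative, not OpsMgr's settings.

```python
import fnmatch
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    ip: str
    name: str
    sys_object_id: str  # SNMP sysObjectID reported by the device


@dataclass
class DiscoveryFilter:
    """Hypothetical filter: every criterion that is set must match."""
    ip_range: Optional[str] = None        # CIDR block, e.g. "10.0.0.0/16"
    name_pattern: Optional[str] = None    # wildcard, e.g. "core-*"
    oid_prefix: Optional[str] = None      # trailing dot avoids matching ".99"
    excluded_ips: List[str] = field(default_factory=list)

    def accepts(self, device: Device) -> bool:
        if device.ip in self.excluded_ips:
            return False
        if self.ip_range and ipaddress.ip_address(device.ip) not in ipaddress.ip_network(self.ip_range):
            return False
        if self.name_pattern and not fnmatch.fnmatchcase(device.name, self.name_pattern):
            return False
        if self.oid_prefix and not device.sys_object_id.startswith(self.oid_prefix):
            return False
        return True


# Illustrative devices; the sysObjectID values are made up.
devices = [
    Device("10.0.1.1", "core-sw1", "1.3.6.1.4.1.9.1.1"),
    Device("10.0.5.5", "core-sw2", "1.3.6.1.4.1.9.1.2"),     # explicitly excluded
    Device("192.168.1.1", "core-sw3", "1.3.6.1.4.1.9.1.3"),  # outside the range
    Device("10.0.2.2", "edge-fw1", "1.3.6.1.4.1.2636.1.1"),  # name does not match
]
rule = DiscoveryFilter(ip_range="10.0.0.0/16", name_pattern="core-*", excluded_ips=["10.0.5.5"])
print([d.name for d in devices if rule.accepts(d)])  # ['core-sw1']
```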

You can also do the discovery on a regular basis using a schedule. Not important in a static environment. Maybe do it once a week in larger or more fluid environments. You can run the discovery rule manually. When you save the rule, you have the choice to run the rule right then.

What’s Discovered

Connectivity of devices and dependencies, servers to network and network to network
VLAN membership
HSRP for Cisco
Stitching of switch ports to server NICs
Key components of devices: ports/interfaces/processor and memory, I think

The process:

Probing (if a device is not supported, it’s popped into pending management for you to look at; if OpsMgr knows it, then they have built-in MIBs to deal with it) –> Processing –> Post Processing (what VLANs, what devices are connected, NIC stitching mapping).

Works only on Gateway/management server
Single rule per gateway/management server
Discovery runs on a scheduled basis or on demand
Limited discoveries can be triggered by device traps – enabled on some devices. Some devices detect a NIC swap, and the device traps, and OpsMgr knows that it needs to rediscover this device. Seamless and clever.

Port/Interface Monitoring

Up/down
Volumes of inbound/outbound traffic
% utilization
Discards, drops, Errors

Processor % utilization

Memory counters (Cisco) and free memory

Connection Health on both ends of the connection

VLAN health based on state of switches (rollup) in the VLAN

HSRP Group Health is a rollup as well

Network Monitoring

Supports resource pools for HA monitoring
Only certain ports monitored by default: ports connecting two network devices together or ports that the management server is connected to
User can override and monitor other ports if required
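The default port-selection rule above is simple enough to express directly. Here is a minimal Python sketch. The data shapes are my assumptions, and "ports that the management server is connected to" is interpreted as ports whose peer is a management server.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Port:
    device: str   # the network device that owns the port
    name: str     # e.g. "Gi0/1"
    peer: str     # what the port is connected to ("" if unknown)


def ports_to_monitor(ports: List[Port], network_devices: Set[str],
                     management_servers: Set[str],
                     overrides: Set[Tuple[str, str]]) -> List[Port]:
    """Default selection as described in the session, plus user overrides."""
    selected = []
    for port in ports:
        device_to_device = port.device in network_devices and port.peer in network_devices
        feeds_mgmt_server = port.peer in management_servers
        overridden = (port.device, port.name) in overrides
        if device_to_device or feeds_mgmt_server or overridden:
            selected.append(port)
    return selected


network = {"sw1", "sw2"}
mgmt = {"OM01"}
ports = [
    Port("sw1", "Gi0/1", "sw2"),    # switch-to-switch uplink
    Port("sw1", "Gi0/2", "OM01"),   # management server's port
    Port("sw1", "Gi0/3", "PC17"),   # ordinary desktop port
    Port("sw1", "Gi0/4", "SQL01"),  # server port the user chose to monitor
]
print([p.name for p in ports_to_monitor(ports, network, mgmt, {("sw1", "Gi0/4")})])
# ['Gi0/1', 'Gi0/2', 'Gi0/4']
```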

Visualisation

4 dashboards:

Network summary: This is the high level view, i.e. top 10 nodes list
Network node: Take any device and drill down into it.
Network interface: Drill into a specific interface to see traffic activity
Vicinity: neighbours view and connection health.

Reporting

5 reports:

Memory utilisation
CPU utilisation
Port traffic volume
Port error analysis
Port packet analysis

Demo

Behind the scenes they normalise data, e.g. memory free from vendor A and memory used from vendor B, so you have one consistent view. You can run a task to enable port monitoring for (by default) un-monitored discovered ports (see above).
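Here is a minimal Python sketch of the normalisation idea: two vendors report memory differently and end up in one consistent shape. The field names are hypothetical. The session didn't say which common unit OpsMgr uses internally.

```python
from typing import Dict


def normalise_memory(report: Dict[str, int]) -> Dict[str, float]:
    """Map a vendor's memory report onto one common shape (free bytes and free %)."""
    total = report["total_bytes"]
    if "free_bytes" in report:            # vendor reports free memory
        free = report["free_bytes"]
    elif "used_bytes" in report:          # vendor reports used memory
        free = total - report["used_bytes"]
    else:
        raise ValueError("report has neither free nor used memory")
    return {"free_bytes": float(free), "free_percent": 100.0 * free / total}


vendor_a = {"total_bytes": 1000, "free_bytes": 250}
vendor_b = {"total_bytes": 1000, "used_bytes": 750}
print(normalise_memory(vendor_a) == normalise_memory(vendor_b))  # True
```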

End

You can author custom management packs with your own SNMP rules. They used 2 industry-standard MIBs, and that has worked on 90-95% of the devices they’ve encountered so far. That means there’s a good chance it will work on future devices.


Why We Fail–Or How An Architect Learned To Stop Worrying And Love The Cloud

Alex Juch, Architect, NetApp

Everyone wants cloud. No one knows what cloud is.

Gartner: 78% of IT shops will deploy a private cloud computing strategy by 2014.
CIO.COM: “62% of all IT projects fail”

You will fail if you approach this project as a technology project. The architect needs to sell this as a business solution. Architecture is the intersection between technology and business.

Reduce your risk:

Business risk: people and process, managed portfolios, IT/business alignment.
Technical risk: use reference architecture, platform bundles.

The customer must want to do this; you cannot coax/tease them into it. There’s too much change in mindset and established process.

Abandon hope all ye who enter here. I gave up listening, VPNed into the lab, and continued building a lab for work, happily finding that my ConfigMgr clients were pushed out, updates were downloading, Endpoint Protection was deployed and updated, and I built a few collections and deployed some AV policy.


MMS Keynote Day 1: Are You Ready For The Future, Now?

It opens with a movie trailer about the IT Pro, and up jumps Brad Anderson.

Continuous services and connected devices. For every 600 phones, 1 server is stood up to support them. It’s 100/1 for desktops.

This year, the number of virtual OS instances will be double the number of physical instances. The industry needs to get better at managing these rapidly deploying virtual instances. This is a shift beyond virtualisation to cloud computing.

Their cloud definition is:

Pooled resources
Self-service
Elastic
Usage-based

Similar to NIST definition. Cloud is not defined by location, e.g. there is public, private, and hybrid cloud. See chapter 1 of Microsoft Private Cloud Computing for more. If there is 1 tenant, it is private. If there is >1 tenant, then it is a public cloud … not strictly true on NIST definition, but close.

Drivers of cloud:

Economy
Flexibility
Scalability

No substitute for experience. MSFT is the only company operating both public and private cloud services for their customers.

The 4 common techs are:

Identity
Virtualisation
Management
Development

Rest of session is focusing on Private Cloud = Windows Server and System Center. We get the announcement of GA for System Center …. 2 weeks after the actual GA. Simplification was a big focus, from licensing, to deployment, to administration.

100,000 servers were managed by the release candidate of System Center 2012.

Fast Track

Private Cloud configurations that are certified by MSFT, provided as out of the box solutions by the likes of HP.

Agile Resource Management

Vijay Tewari comes out to demo. vSphere 4.0 and XenServer are managed by VMM 2012. Multi-platform clouds. He goes through the process of doing a bare metal Hyper-V deployment on some HP DL servers via iLO. Funny video of Vijay going to Blue Man Group and swimming while his hosts build – automation takes care of the time consuming repetitive work.

Agile Service Level Delivery

Ryan O’Hara is on stage. He does some smooth demos with Service Manager reaching into the rest of System Center to deploy a service, and then OpsMgr detecting a breach of SLA so it can scale out the service automatically via a VMM service template.

Back to Brad. System Center understands the environment thanks to partner extensions. Application monitoring gives deep insight into J2EE and .Net apps to avoid the admin vs. dev finger-pointing when there is a problem.

Ryan demos an app breaching SLA in OpsMgr. Then he goes into App Monitoring to diagnose where in the code the problem is.

Certification

The MCSE is back. Ugh! Private cloud certification.

Windows Server 2012

Here comes the announcement. Want to learn more?

Jeff Woolsey comes out. He’s the head PM for Hyper-V. This is a cloud platform release. Lots of stuff that I previously blogged. We see shared nothing live migration in VMM 2012 SP1. There’s a problem in the demo … the memory LM takes waaay too long for a 2 GB RAM VM. No one seems to notice.

Now we see network virtualisation where 2 VMs have the same IP on the same cloud, but are still routing.

App Controller

A new SP1 feature where you can integrate with any hoster that offers the service. You can integrate your private cloud with their cloud and deploy services in their public cloud.

The Microsoft Private Cloud

All about the app
Cross platform from the metal up
Foundation for the future
Cloud on your terms

Winners lead, don’t follow.

Top 10 Production Experiences With Service Manager and Orchestrator

Speaker: Nathan Lasnoski, MVP

Focus is on Service Manager and Orchestrator.

You can transform a business in a way that other technology projects cannot. These two products are transformative technologies. They lead to process definition, cut through sacred cows, improve efficiency, and enable users to do what they are really interested in.

Results:

Processes are more clear
Common tasks are automated
People do tasks that use their skills
Time and resource spend is transparent

1) How to get started

Include the right people. This is not just an IT project. Examples of people to include: service desk manager, system center tech lead, IT leadership …. need a champion in the business with some influence. Including the right people = success. Not just a tech project and not just an ITIL project.

2) Choose processes strategically

Look for the processes that have the biggest payoff. They are the quick and influential wins.

Incident management
Service request management
Change management
Risk and compliance

3) Plan to transform process

A great tool doesn’t make a bad process better. This is an opportunity to improve processes.

4) Plan requests first

Plan first, build later.

What are the questions you need to ask in the forms? What data do you need to automate a process? Organize the components and responsibilities.

5) Create a service catalogue

Everything IT does should end up in the service catalogue. Use it to service both IT and end user requests. Use service manager roles to constrain access.

SharePoint choices: Enterprise edition gives PerformancePoint, but Foundation doesn’t.

6) Don’t forget about BI and reports

This might be the only view that the decision maker has of the system. Ask the business decision maker (BDM) what it is they want to know.

7) Size your environment correctly

This requires big iron. Minimum deployment is 4 servers: Service Manager management server, data warehouse + SQL, web server, and Orchestrator. Can have additional management servers and web portals. Could cluster the data warehouse.

8) Have a development environment

Build and test in here. Check performance! Version control your management packs.

9) Don’t Forget Training

Get buy-in by including people early in the planning. Show ROI and why this system is good for them. Train on what is relevant to them in the system. ITIL/MOF important for implementers.

10) Use a phased approach

Don’t try to do the whole thing at once. Succeed end-to-end on each process. Something always comes up; plan for that. Check your can-do attitude – new requests can be done later. Watch out for “tangents”. Small chunks of measured and planned work are the key to success.

Build Windows 2011 Review

It’s Friday afternoon in Anaheim, USA. I’m sitting in my hotel room (jetlagged), and it’s that time of the week when I have to look back on what’s happened. First things first: how well did Build Windows go?

The Event

I thought the event was excellent. There was a little bit of drama by some people about the lack of an agenda before the event. We knew that “Build Windows” was going to be about Windows 8. And we also knew that Microsoft wanted to keep as many of the Windows 8 announcements as possible for this week to maximise their impact on the media. I think that worked … kind of. I didn’t see the 6 o’clock news, where I’d expect a certain California-based appliance company to be mentioned on one of their launch days. But just about everyone I met among the 5,000 delegates seemed pretty excited.

I thought the venue was good, the event was well organised, food/drink was good, the crew managed to get 5,000 people fed without much in the way of queuing, and I can’t complain about getting a UK£999 device with Windows 8 developer preview preloaded on it.

The Sessions

The two keynotes were well thought out. Everyone I talked to thought that Steven Sinofsky did a good job. To be honest, I didn’t notice the time go by.

Most of the speakers knew what they were presenting on. Was the content level 400? No – but I wouldn’t expect that here; this was a place to kick things off. At ask the experts, two of the presenters gave me a good bit of time to answer my questions.

I had a small bit of concern that there wouldn’t be much in the way of content for an IT Pro like myself. As it turned out, sessions for IT pros were in the minority (as expected for this event) but there were more sessions than I could attend. I’ll be downloading some to watch on my slate PC on the way home.

What Stood Out?

I don’t believe that Microsoft mentioned that Windows Server 8 is optimised for the cloud. They should have, because it is ;)

I have said over and over that the Hyper-V group listen to feedback like no other, and we got further proof of that this week:

NIC teaming by Windows is a reality
Snapshot merge is done while the VM is online
Hyper-V on the client, with support for wifi and host power settings

And let’s not forget the innovation:

Hyper-V replica
Hyper-V extensible switch
Network virtualisation
Live migration without Failover Clustering
Virtual fibre channel HBA
All sorts of offloading
VHDX for up to 16 TB of virtual hard disk with metadata
A new VDI story on preventing the disk storm
Using file shares for VM storage
… and on, and on, and on.

If day 2 seemed to be the private cloud/Hyper-V day, then day 3 was the storage and failover clustering day. It is no secret that I hate Redirected IO and what it does to the backup and CSV design story in Hyper-V. That has been changed because we now have direct IO during CSV backup. That’s all I needed to hear to make me happy. But there was more: we found out that storage would never be the same again with a new feature called Storage Pools, in which we could create highly available and scalable Storage Spaces. Combined with 10 GbE, NIC teaming, offloading, RDMA, and SMB 2.2, we get very fast storage on file shares!!! It’s simple, it’s cheap, it makes clustering possible for the small business, and it makes storage more flexible for the large enterprise. Believe it or not, the thing I most want to try out now is to create one of these active/active clustered file shares on a Storage Spaces located CSV. That’s a mouthful :)

What’s Next?

The developer preview release is an early pre-beta release aimed at the software developers and hardware manufacturers. It gives them a chance to start getting their products ready in time for RTM, if not earlier – it would be best to test on RC so a final product is ready on RTM day. But that isn’t stopping us IT pros from starting to learn.

We can expect Microsoft to start revealing more information. We IT pros actually learned very little of the new OS this week. We heard nothing of Active Directory, security and identity, “better together”, OS deployment, and so on. There isn’t a TechEd Europe this November/December, so I guess most of the announcements will be made either online or at some other event (that I don’t know of).

My money is on some kind of event/announcement in January/February 2012 where the complete feature set is detailed.

ARM devices were on display behind secure Perspex cases at Build. There is no public ARM build of Windows 8 so that will have to wait.

Until then, we finally have something to install, dig into, and learn about, and isn’t that what the Build event was all about in the end?


Building Continuously Available File Server NAS Appliances

Speakers are Gene Chellis and Cristian Teodorescu

A file server NAS could be a fine appliance for SQL Server or Hyper-V file storage. This is the last of the sessions in the storage track.

Why is NAS Relevant?

Customers like them, according to sales figures. NAS sales have been rising steeply for the last 2 years and are expected to keep rising, whereas file server sales are growing slowly now and are expected to continue doing so (after 2 years of a big drop)
Simple deployment (appliance)
Supports virtualisation and private cloud
Storage optimized hardware

Requirements of NAS

Support heterogeneous environments: Windows/Unix and File/Block
Support multiple workloads: client and server
Designed for end-to-end storage performance
Designed for continuous availability
Integrated software/hardware/packaging (appliance)
Simplified setup, configuration, and management (appliance experience)

New for Unified Storage on Windows Server 8

iSCSI target continuous availability
NFS v3 server continuous availability
NFS 4.1 server

End-to-End Storage Performance

Requirements vary by workload. Some OEMs have not considered that and sometimes have a bottleneck that prevents high end-to-end performance.

Long demo of a virtualised pre-packaged NAS/cluster appliance with lots of wizards to set it up.


Building Continuously Available Systems with Hyper-V

The speaker is Brian Dewey of Microsoft.

If you came to this post because it is about Hyper-V, then I really urge you to read the other “Building Continuously …” session notes that I have taken. They all build to this session (it was a track of sessions).

Continuously available: software and hardware are designed to support transparent failover without service or data loss.

Continuous Availability Improvements

Live Migration: Move a VM with zero downtime. Now we can LM VMs inside of clusters as well as between clusters.

Live Storage Migration: Move the VM storage with no downtime between hosts. First the VHDs and config files are copied from one location to another (while the VM is running). IO is mirrored to both locations – stays in that state while LM of the VM state happens. Once we’re in sync, the VM starts running on the destination location with just I/O running over there. If there’s a failure in the workflow, nothing is lost and the VM resumes on the source location. More flexibility for maintenance, host migrations, etc.
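Here is a toy Python model of the copy-then-mirror workflow described above. It is meant only to show why mirroring writes during the copy means nothing is lost, whether the move completes or aborts. Blocks stand in for VHD contents. Interleaving one guest write per copied block is just a way to simulate concurrent IO, not how Hyper-V schedules it.

```python
from typing import Dict, List, Optional, Tuple

Disk = Dict[int, bytes]  # block number -> contents


def live_storage_migrate(source: Disk, guest_writes: List[Tuple[int, bytes]],
                         fail_at_block: Optional[int] = None) -> Tuple[Disk, str]:
    """Toy model of a live storage move.

    Blocks are copied to a new location while the VM keeps writing. Every
    guest write issued after the copy starts is mirrored to both copies, so
    the destination cannot fall behind. If the copy fails, the VM carries on
    using the still-current source. Returns (disk in use, location).
    """
    destination: Disk = {}
    pending = list(guest_writes)
    for block in sorted(source):
        if block == fail_at_block:
            return source, "source"          # abort: nothing lost, VM stays put
        destination[block] = source[block]   # background copy of one block
        if pending:                          # simulate a concurrent guest write
            target, data = pending.pop(0)
            source[target] = data            # mirrored write: source ...
            destination[target] = data       # ... and destination
    for target, data in pending:             # writes arriving after the last block
        source[target] = data
        destination[target] = data
    return destination, "destination"        # in sync: switch IO to the destination


disk, location = live_storage_migrate({0: b"a", 1: b"b", 2: b"c"}, [(0, b"A"), (2, b"C")])
print(location, disk)  # destination {0: b'A', 1: b'b', 2: b'C'}
```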

Guest clustering: now high-end storage customers can use virtual fibre channel HBAs to create failover clusters using VMs. This allows a legacy service running in the VM to become highly available, so a VM’s OS can be maintained or fail with no service downtime.

Hyper-V Replica: Maintain a warm standby disaster recovery site with asynchronous replication. At a high level, you configure any running VM to replicate to a remote host. An initial replication of all content is performed. Once that’s done, Hyper-V tracks changes to the VM. The changes are shipped on a scheduled basis to the remote location to update it. This is optimised for high latency WAN links and DR. The initial replication can be huge, so it can be done out of band using a USB drive. Source and destination are loosely coupled: certificates can be used to replicate to a Hyper-V host in a different AD forest or company, e.g. a hosting company. It’s “warm standby” because the administrator initiates the failover; this might be a job for PowerShell (PSH) or System Center Orchestrator to bring up lots of VMs in a specific order.
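The replication cycle described above (full initial copy, change tracking, scheduled shipping of deltas, manual failover) can be sketched as a toy Python model. `ReplicatedVM` and its methods are hypothetical names; this is not how Hyper-V Replica is implemented internally, just an illustration of the asynchronous, change-tracked pattern.

```python
class ReplicatedVM:
    """Hypothetical model of asynchronous, change-tracked replication."""

    def __init__(self, blocks):
        self.primary = list(blocks)
        self.replica = None
        self.dirty = set()          # blocks changed since the last cycle

    def initial_replication(self):
        # Full copy of all content (could be done out of band, e.g. by USB drive).
        self.replica = list(self.primary)
        self.dirty.clear()

    def write(self, block, value):
        # The running VM writes to the primary; Hyper-V-style change tracking.
        self.primary[block] = value
        self.dirty.add(block)

    def replicate_changes(self):
        # Run on a schedule: ship only the changed blocks to the remote site.
        delta = {b: self.primary[b] for b in sorted(self.dirty)}
        for b, v in delta.items():
            self.replica[b] = v
        self.dirty.clear()
        return delta

    def failover(self):
        # Administrator-initiated ("warm standby"): the replica becomes the
        # running copy. Changes made since the last cycle are not on it.
        return list(self.replica)
```

The last comment is the trade-off of asynchronous replication: anything written after the last shipped cycle is not on the replica at failover time.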

Consolidation Magnifies the Effect of Failure

Virtualisation puts more eggs in one basket: fewer servers and fewer storage systems.

How to Build the Right Solution with Hyper-V?

Continuously available Hyper-V systems require shared storage. Windows Server 2008 R2 requires a SAN. Windows 8 now adds remote file servers, Storage Spaces, and clustered PCI RAID to the mix.

VHDX

Supports up to 16 TB, which all but eliminates the need to use inflexible passthrough disks for scalability
Aligns to megabyte boundaries for large sector disk performance
Customers can embed metadata in VHDX files; server applications are likely to do this.
VHDX will be the default format going forward. It is not supported on anything earlier than the Windows 8 Developer Preview release.
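Why 1 MB alignment helps on large sector (4 KB) disks is just arithmetic: every multiple of 1 MB is also a multiple of 4 KB, so anything placed on a 1 MB boundary never straddles a physical sector. This is an illustration of that arithmetic only, not VHDX’s actual layout code; the function names are hypothetical, and the classic 63-sector partition offset is shown for contrast.

```python
MB = 1024 * 1024


def align_up(offset, alignment=MB):
    """Round an offset up to the next multiple of alignment (default 1 MB)."""
    return (offset + alignment - 1) // alignment * alignment


def is_aligned(offset, sector_size=4096):
    """True if offset starts exactly on a physical sector boundary."""
    return offset % sector_size == 0


legacy_offset = 63 * 512                     # classic 63-sector partition start
print(is_aligned(legacy_offset))             # False: straddles 4 KB sectors
print(is_aligned(align_up(legacy_offset)))   # True: 1 MB is a multiple of 4 KB
```

A misaligned write on a 4 KB sector disk can force the drive into a read-modify-write of two physical sectors, which is the performance problem the alignment avoids.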

Offloads

ODX: offloaded data transfer, where the SAN does the copy work directly instead of involving the slower server. See my previous notes on the ODX token. Note that ODX also makes creation of a VHDX happen more quickly, so ODX is more than just data transfer.
Trim: space freed up in a disk can be returned to the storage system (thin provisioning).
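The token idea behind ODX and the effect of Trim can be shown with a toy storage array in Python. Everything here (`ToyArray`, `offload_read`, `offload_write`, `host_copy`) is a hypothetical model, not the real ODX interface: the point is that only a small token passes through the host, while the data itself is copied inside the array. Trim is modelled as marking blocks unallocated so a thin-provisioned pool can reclaim them.

```python
import secrets


class ToyArray:
    """Hypothetical storage array model; not the real ODX interface."""

    def __init__(self):
        self.luns = {}                 # LUN name -> list of blocks (None = unallocated)
        self.tokens = {}               # token -> data it represents
        self.blocks_through_host = 0   # data the server had to move itself

    def offload_read(self, lun, start, count):
        # The array snapshots the range and hands the host a small token.
        token = secrets.token_hex(8)
        self.tokens[token] = self.luns[lun][start:start + count]
        return token

    def offload_write(self, token, lun, start):
        # The host passes the token back; the copy happens inside the array.
        data = self.tokens.pop(token)
        self.luns[lun][start:start + len(data)] = data

    def host_copy(self, src, dst, start, count):
        # Traditional copy for contrast: every block passes through the server.
        data = self.luns[src][start:start + count]
        self.blocks_through_host += len(data)
        self.luns[dst][start:start + count] = data

    def trim(self, lun, start, count):
        # Freed blocks go back to the pool (thin provisioning).
        for i in range(start, start + count):
            self.luns[lun][i] = None

    def allocated(self, lun):
        return sum(b is not None for b in self.luns[lun])
```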

Demo:

He creates a large VHDX, and it is created in a few seconds. It is not dynamic: it is a fully allocated, zeroed-out disk. ODX makes this possible.

Hyper-V and SMB

We now know that storing VMs on file shares is supported. You get Live Migration and planned/unplanned failover. The file server can be clustered for HA and scalability. Live migration between clusters requires remote file shares, even if only transiently. Requirements:

SMB 2.2
Remote VSS for host based backup

Storage Spaces

See previous notes. It provides thin provisioning and resiliency. Mirror and parity spaces deliver resilience to physical storage failures.
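A mirror space simply keeps extra copies of each block; how a parity space survives a disk failure is less obvious. This Python sketch shows generic single (XOR) parity, under the assumption that Storage Spaces’ actual on-disk layout and parity scheme may differ; the function names are hypothetical. Any one lost block in a stripe can be rebuilt by XORing the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_parity_stripe(data_blocks):
    """Return the stripe as stored: the data blocks plus one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover the block on a failed disk from the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

The trade-off against mirroring: one parity block protects a whole stripe (more usable capacity), but every write has to update the parity too.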

PCI RAID

Resiliency to node failure as LUN is switched to the failover node. Resiliency to disk failure through RAID.

Continuously Available Networking

NIC teaming is in the box for network path fault tolerance. NIC teaming works in the root and in the guest VM (two vNICs connected to two virtual switches, each bound to a different physical NIC).
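The fault tolerance that teaming provides can be sketched as a toy Python model: traffic flows are spread across healthy team members, and a failed path simply drops out of the rotation. `NicTeam` and its methods are hypothetical names; this is not the Windows teaming API, and real teams use more sophisticated load distribution than a modulo over flow IDs.

```python
class NicTeam:
    """Hypothetical model of a failover team; not the Windows teaming API."""

    def __init__(self, members):
        self.members = list(members)
        self.link_up = {m: True for m in self.members}

    def set_link(self, member, up):
        # Simulate a cable pull, switch failure, or recovery.
        self.link_up[member] = up

    def active_members(self):
        return [m for m in self.members if self.link_up[m]]

    def send(self, flow_id):
        """Pick the physical NIC that carries a given traffic flow."""
        active = self.active_members()
        if not active:
            raise ConnectionError("all team members are down")
        # Spread flows over healthy members; a failed path just drops out.
        return active[flow_id % len(active)]
```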

Scalable Networking

10 GbE allows concurrent live migrations. Hyper-V can use RDMA in the parent partition for efficient file access. Hyper-V hosts can use network offloads. Hyper-V can utilise SR-IOV on capable NICs to optimise VM networking.

Note: SR-IOV bypasses the virtual switch, so any extensions or configurations you’d have on a virtual switch are no longer applicable.

Note: I’m sure Cisco’s extension offers an SR-IOV option.

Modern Server Hardware

Going from up to 64 logical processors to up to 160 LPs.
Physical NUMA topology is projected into the guest. This is a big issue for guests with more than a few vCPUs on multi-CPU hosts.
Fault containment: hardware memory errors are confined to the affected virtual machine. This relies on a feature of some modern processors: if an error happens in physical RAM that is used only by one VM, then only that VM needs to shut down.

Jose Barreto comes up to do a demo. There are two Hyper-V hosts, each with one Ethernet and one InfiniBand NIC. One switch of each type connects them to two file servers, each of which also has one Ethernet and one InfiniBand NIC on the front end. Each file server has two SAS HBAs meshed to two JBODs.

The Hyper-V hosts use \\<cluster-name> to access VM files on the file share, not \\<server-name>. The file servers are using storage pools. Instead of IQNs or WWNs, permissions on the file shares are granted to the Hyper-V hosts’ computer accounts. The Hyper-V cluster has no cluster storage: it is all file shares. In the HA VM properties, you can see the VHDX is stored in \\<cluster-name>\VMFolder. That share is on a volume that is in a Storage Space. He’s pumping 2.6 Gbps of data throughput to the VHDX from within the VM, using high-speed NICs and RDMA with multiple connections.

Next up: a demo of a transparent failover of the file share on the clustered file servers, while huge throughput is happening. IO drops briefly because it is being cached. The witness service on the file server cluster tells the client to redirect to the surviving node after the failover, so there is no timeout; the cache is flushed and IO continues as normal with no loss.
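The sequence in that demo (IO held during the outage, a witness notification, an immediate redirect and flush) can be sketched as a toy Python model. This is a simplification under stated assumptions: `FileServerCluster` and `SmbClient` are hypothetical names, and real SMB transparent failover also relies on persistent handles and the witness protocol, which are not modelled here.

```python
class FileServerCluster:
    """Hypothetical clustered file server holding one continuously available share."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.owner = self.nodes[0]
        self.online = True
        self.data = []                 # writes that reached the share
        self.witness_clients = []

    def register_witness(self, client):
        self.witness_clients.append(client)

    def fail_owner(self):
        # The owning node dies; the share is briefly unavailable.
        self.online = False

    def complete_failover(self):
        # The share comes up on a surviving node, and the witness
        # notifies registered clients where to reconnect.
        self.owner = next(n for n in self.nodes if n != self.owner)
        self.online = True
        for client in self.witness_clients:
            client.on_witness_notification(self.owner)


class SmbClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.node = cluster.owner
        self.pending = []              # IO held on the client during the outage
        cluster.register_witness(self)

    def write(self, item):
        self.pending.append(item)
        if self.cluster.online:
            self._flush()

    def _flush(self):
        self.cluster.data.extend(self.pending)
        self.pending.clear()

    def on_witness_notification(self, new_node):
        # Redirect immediately instead of waiting for a timeout, then flush.
        self.node = new_node
        self._flush()
```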