Windows Server 2012 Cluster-In-A-Box, RDMA, And More

Notes taken from TechEd NA 2012 session WSV310:

Volume Platform for Availability

Huge amount of requests/feedback from customers.  MSFT spent a year focusing on customer research (US, Germany, and Japan) with many customers of different sizes.  Came up with Continuous Availability with zero data loss transparent failover to succeed High Availability.

Targeted Scenarios

  • Business in a box Hyper-V appliance
  • Branch in a box Hyper-V appliance
  • Cloud/Datacenter high performance storage server

What’s Inside A Cluster In A Box?

It will be somewhat flexible.  MSFT giving guidance on the essential components so expect variations.  MSFT noticed people getting cluster networking wrong so this is hardwired in the box.  Expansion for additional JBOD trays will be included.  Office level power and acoustics will expand this solution into the SME/retail/etc.

Lots of partners can be announced and some cannot yet:

  • HP
  • Fujitsu
  • Intel
  • LSI
  • Xio
  • And more

More announcements to come in this “wave”.

Demo Equipment

They show some sample equipment from two Original Design Manufacturers (ODMs design hardware and sell it to OEMs for rebranding).  One with SSD and InfiniBand is shown.  A more modest one is shown too.

The more modest unit is a 3U cluster in a box with 2 servers and 24 SFF SAS drives.  It appears to have additional PCI expansion slots in a compute blade.  We see it in a demo later and it appears to have JBOD (mirrored Storage Spaces) and 3 cluster networks.

RDMA aka SMB Direct

Been around for quite a while but mostly restricted to the HPC space.  WS2012 will bring it into wider usage in data centres.  I wouldn’t expect to see RDMA outside of the data centre too much in the coming year or two.

RDMA enabled NICs also known as R-NICs.  RDMA offloads SMB CPU processing in large bandwidth transfers to dedicated functions in the NIC.  That minimises CPU utilisation for huge transfers.  Reduces the “cost per byte” of data transfer through the networking stack in a server by bypassing most layers of software and communicating directly with the hardware.  Requires R-NICs:

  • iWARP: TCP/IP based.  Works with any 10 GbE switch.  RDMA traffic routable.  Currently (WS2012 RC) limited to 10 Gbps per NIC port.
  • RoCE (RDMA over Converged Ethernet): Works with high-end 10/40 GbE switches.  Offers up to 40 Gbps per NIC port (WS2012 RC).  RDMA not routable via existing IP infrastructure.  Requires DCB switch with Priority Flow Control (PFC).
  • InfiniBand: Offers up to 54 Gbps per NIC port (WS2012 RC). Switches typically less expensive per port than 10 GbE.  Switches offer 10/40 GbE uplinks. Not Ethernet based.  Not routable currently.  Requires InfiniBand switches.  Requires a subnet manager on the switch or on the host.

RDMA can also be combined with SMB Multichannel for LBFO.

Applications (Hyper-V or SQL Server) do not need to change to use RDMA and make the decision to use SMB Direct at run time.

Partners & RDMA NICs

  • Mellanox ConnectX-3 Dual Port Adapter with VPI InfiniBand
  • Intel 10 GbE iWARP Adapter For Server Clusters NE020
  • Chelsio T3 line of 10 GbE Adapters (iWARP), have 2 and 4 port solutions

We then see a live demo of 10 Gigabytes (not Gigabits) per second over Mellanox InfiniBand.  They pull 1 of the 2 cables and throughput drops to about 6 Gigabytes per second.  Pop the cable back in and flow returns to normal.  CPU utilisation stays below 5%.
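To put the gigabytes-versus-gigabits distinction in numbers, here is a minimal Python sketch.  It assumes the demo used two ports at the 54 Gbps InfiniBand figure quoted above (the session notes only say two cables were used) and it ignores protocol overhead, so treat the result as a rough ceiling rather than a measurement.

```python
def gbps_to_gigabytes_per_sec(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


# Assumption: two ports, each at the 54 Gbps per-port figure quoted for InfiniBand.
ports = 2
per_port_gbps = 54

ceiling = ports * gbps_to_gigabytes_per_sec(per_port_gbps)
print(f"{ports} ports x {per_port_gbps} Gbps = {ceiling:.2f} GB/s theoretical ceiling")
```

Under those assumptions, a single port tops out at 6.75 GB/s, which is in the same region as the throughput seen when one cable was pulled.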

Configurations and Building Blocks

  • Start with single Cluster in a Box, and scale up with more JBODs and maybe add RDMA to add throughput and reduce CPU utilisation.
  • Scale horizontally by adding more storage clusters.  Live Migrate workloads, spread workloads between clusters (e.g. fault tolerant VMs are physically isolated for top-bottom fault tolerance).
  • DR is possible via Hyper-V Replica because it is storage independent.
  • Cluster-in-a-box could also be the Hyper-V cluster.

This is a flexible solution.  Manufacturers will offer new refined and varied options.  You might find a simple low cost SME solution and a more expensive high end solution for data centres.

Hyper-V Appliance

This is a cluster in a box that is both Scale-Out File Server and Hyper-V cluster.  The previous 2 node Quanta solution is set up this way.  It’s a value solution using Storage Spaces on the 24 SFF SAS drives.  The spaces are mirrored for fault tolerance.  This is DAS for the 2 servers in the chassis.

What Does All This Mean?

SAN is no longer your only choice, whether you are SME or in the data centre space.  SMB Direct (RDMA) enables massive throughput.  Cluster-in-a-Box enables Hyper-V appliances and Scale-Out File Servers in ready made kits, that are continuously available and scalable (up and out).

More VMware Compete Wins For Hyper-V

VMware made a cute video to defend themselves against Windows Server 2012 Hyper-V.  But MSFT continues to hand out a GTA IV style baseball beat down at TechEd.

This post would have been impossible without the tweeted pictures by David Davis at http://www.vmwarevideos.com

General Feature Comparison

Does your business have an IT infrastructure so you can play, or to run applications?  What features have you got to improve those services?

Capability                      | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
Incremental backups             | No           | Yes              | Yes
Inbox VM replication            | No           | No               | Yes
NIC teaming                     | Yes          | Yes              | Yes
Integrated High Availability    | No           | Yes              | Yes
Guest OS Application Monitoring | N/A          | No               | Yes
Failover Prioritization         | N/A          | Yes              | Yes
Affinity & Anti-Affinity Rules  | N/A          | Yes              | Yes
Cluster-Aware Updating          | N/A          | Yes              | Yes

So Hyper-V has more application integrations.

Live Migration

Capability                          | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
VM Live Migration                   | No           | Yes              | Yes
1 GbE Simultaneous Live Migrations  | N/A          | 4                | Unlimited
10 GbE Simultaneous Live Migrations | N/A          | 8                | Unlimited
Live Storage Migration              | No           | Yes              | Yes
Shared Nothing Live Migration       | No           | No               | Yes
Network Virtualisation              | No           | Partner          | Yes

Shared-Nothing Live Migration is actually a big deal.  We know that 33% of businesses don’t cluster their hosts, and another 33% have a mix of clustered and non-clustered hosts.  Shared-Nothing Live Migration enables mobility across these platforms.  Flexibility is the #2 reason why people virtualise (see Network Virtualisation later on).

Clustering

Can you cluster hosts, and if so, how many?  How many VMs can you put on a host cluster?  Apps require uptime too, because VMs need to be patched, rebooted, and occasionally crash.

Capability                                   | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
Nodes/Cluster                                | N/A          | 32               | 64
VMs/Cluster                                  | N/A          | 3000             | 4000
Max Size iSCSI Guest Cluster                 | N/A          | 0                | 64 Nodes
Max Size Fibre Channel Guest Cluster         | 2 Nodes      | 2 Nodes          | 64 Nodes
Max Size File Based Guest Cluster            | 0            | 0                | 64 Nodes
Guest Clustering with Live Migration Support | N/A          | No               | Yes
Guest Clustering with Dynamic Memory Support | No           | No               | Yes

Based on this data, WS2012 Hyper-V is the superior platform for scalability and fault tolerance.

Virtual Switches

In a cloud, the virtual switch plays a huge role.  How do they stack up against each other?

Capability                   | vSphere Free   | vSphere 5.0 Ent+ | WS2012 Hyper-V
Extensible Switch            | No             | Replaceable      | Yes
Confirmed partner extensions | No             | 2                | 4
PVLAN                        | No             | Yes              | Yes
ARP/ND Spoofing Protection   | No             | vShield/Partner  | Yes
DHCP Snooping Protection     | No             | vShield/Partner  | Yes
Virtual Port ACLs            | No             | vShield/Partner  | Yes
Trunk Mode to VMs            | No             | No               | Yes
Port Monitoring              | Per Port Group | Yes              | Yes
Port Mirroring               | Per Port Group | Yes              | No

Another win for WS 2012 Hyper-V.  Note that vShield is an additional purchase on top of vSphere.  Hyper-V is the clear feature winner in cloud networking.

Network Optimisations

Capability                           | vSphere Free   | vSphere 5.0 Ent+ | WS2012 Hyper-V
Dynamic Virtual Machine Queue (DVMQ) | NetQueue       | NetQueue         | Yes
IPsec Task Offload                   | No             | No               | Yes
SR-IOV                               | DirectPath I/O | DirectPath I/O   | Yes
Storage Encryption (CSV vs VMFS)     | No             | No               | Yes

  • NetQueue supports a subset of the VMware HCL
  • Apparently DirectPath I/O VMs cannot vMotion (Live Migrate) without certain Cisco UCS (blade server centres) configurations
  • No physical security for VMFS SANs in the data centre or colocated hosting

Hyper-V wins on the optimisation side of things for denser and higher throughput network loads.

VMware Fault Tolerance

FT feature: Run a hot standby VM on another host, taking over if the primary VM’s host should fail.

Required sacrifices:

  • 4 FT VMs per host with no memory overcommit: expensive because of low host density
  • 1 vCPU per FT VM: Surely VMs that require FT would require more than one logical processor (physical thread of execution)?
  • EPT/RVI (SLAT) disabled: No offloaded memory management.  This boosts VM performance by around 20% so I guess this FT VM doesn’t require performance.
  • Hot-plug disabled: no hot adding devices such as disks
  • No snapshots: not such a big deal for a production VM in my opinion
  • No VCB (VSS) backups: This is a big deal, because now you have to do a traditional “iron” backup of the VM, requiring custom backup policy, discarding the benefits of storage level backup for VMs

If cost reduction is the #1 reason for implementing virtualisation, then VMware FT seems like a complete oxymoron to me.  VMware FT is a chocolate kettle.  It sounds good, but don’t try to boil water with it.

VMware Autodeploy

Centrally deploy a Hypervisor from a central console.

We have System Center 2012 Virtual Machine Manager for bare metal deployment.  Yes, it’s a bit more complex to set up.  B-u-t … with converged fabrics in WS2012, Hyper-V networking is actually getting much easier.

And even with System Center 2012 Datacenter, the MSFT solution is way cheaper than the vSphere alternative, and provides a complete cloud in the package, whereas vSphere is only the start of your vTaxation for disparate point solutions that contradict desires for a deeply integrated, automated, connected, self-service infrastructure.

More Stuff

I didn’t see anything on SRM versus Hyper-V Replica but I guess it was probably discussed.  SRM is allegedly $250-$400 per VM.  Hyper-V Replica is free and even baked into the free Hyper-V Server.  And Hyper-V Replica works with cloud vendors as well as internal sites.  Orchestration of failover can be done manually, by very simple PowerShell scripts, or with System Center 2012 Orchestrator (demonstrated in day 1 keynote).

I don’t know anything about vSphere support for Infiniband and RDMA, both supported by WS2012.  In fact, today it was reported that WS2012 RC Hyper-V benchmarked at 10.36 GigaBYTES/second (not Gbps) with 4.6% CPU overhead.

I also don’t know if VMware supports network abstraction, as in Hyper-V Network Virtualisation, essential for mobility between different networks and cloud consolidation/migration.

Take some time to review the new features in WS2012 Hyper-V.

TechEd North America 2012 Day 2 Keynote

Antoine Leblond, Corporate Vice President is speaking, and the topic is Windows 8.

Over 600,000,000 copies of Windows 7 have been sold.  The enterprise features of Windows 8 are based on, but evolved from Windows 7.  We have moved on from the desktop-centric world when Windows 7 was launched.  Over 75% of consumer machines being bought in USA this year are laptops.  Next year it is projected that tablets will outsell PCs.  More machines will run off of the battery than AC power.  Every microwatt of power saved extends the battery life of the machine.  Tablets = touch UI.  If projections are right, then touch becomes the primary UI.

Connectivity is ubiquitous.  We have moved from a world of local content to a world of multi-cloud stored data: flickr, facebook, Skydrive, Office365, and many others.

The hard split between how I use a machine at home and how I use a machine at work has been blurred or completely dissolved.  Users have reimagined how they use PCs, and Microsoft has reimagined Windows.

Demo Business Apps

We see a bunch of bespoke apps with live tiles.  Info is flashed up so the user can see current status.  The dev has used semantic zoom … a conceptual zoom rather than a graphic zoom.

A CRM app uses GPS sensor to find out where the sales person is, and then shows the location of customers in a map.  Clever.

Linda Averett

Demo on Samsung Ultrabook with mouse/keyboard and a “modern touchpad”.  The Windows 8 gestures are recognised by the touchpad.  Kind of Mac-like I guess, handy if you don’t have touch screen – or are one of those OCD people who hates fingerprints on their screen.

A NewEgg app is shown, with search, filter and contracts being shown off.

Antoine Leblond

Now we see a sales pipeline automation app that is a beta/test app by SAP.  Looks very sexy … and it’s by SAP!  What an oxymoron!  Using touch, the user can explore the data that is graphically presented, changing variables and seeing the results.  Don’t think columns and rows of numbers.  It was all imagery that was designed for exploring and touch.

Linda Averett – Business Features

She has a Lenovo laptop, but it has a touch screen.  Windows 7 is running in a Hyper-V VM on Windows 8.  As you should know by reading here, Hyper-V is in Windows 8 Pro and Enterprise.  It seems to get biggest cheer of anything in the keynotes so far (audience has been very quiet these 2 days).  Cut The Rope is running in IE 9 in the Win7 VM. 

BitLocker (AES256 full disk encryption) is shown off – it and BitLocker-To-Go now are in Windows 8 Pro, not just in the Enterprise edition.  Great for customers – not great for those of us trying to sell Software Assurance.

Then lots of dev stuff and then the end of the keynote.

TechEd North America 2012 Day 1 Keynote

Satya Nadella, President of the Server and Tools Business, opens things up.  This will be a BIG event for his group: Windows Server 2012 and System Center 2012.

With each generation of computing (client/server onwards) there has been an increase in agility/flexibility and value.  That’s what this keynote is all about.  This will manifest in a new era, a new operating system.

Primary roles of the OS

  • Resource management: hardware abstraction of compute, storage, and network for a modern data centre.
  • Run continuously available applications.

Moore’s law still applies, even if it is different than expected: more cores rather than faster threads.  Storage needs to keep up with compute speed.  IOPS is the key: faster disk (SSD) and more spindles (cheap scalable disk).

99% of tier 1 SQL workloads can be virtualised with the new VM maximums of Hyper-V: 1 TB RAM and 64 vCPUs.

Hyper-V, Storage, and Networking

Jeff Woolsey, Principal PM of Hyper-V, comes on stage to demo.  I’ve met Jeff.  He’s an enthusiastic guy and it shows on stage.  He’s talking agility and flexibility in the cloud OS, Windows Server 2012.  That’s the number 2 reason people virtualise.  A demo VM with 64 logical processors and 100 GB RAM.

985,000 IOPS from a single VM.  This was a hardware limitation.

Now an ODX demo.  A 10 GB file is copied from one machine to another: 30-40% CPU utilisation and fully saturated Ethernet.  Then the same file copy with ODX enabled: 1 GigaBYTE/second.  It takes 10 seconds to copy this file, with offloaded data transfer on an EMC SAN.  No network utilisation.
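A quick sanity check on the copy time, as a small Python sketch.  The only inputs are the 1 gigabyte per second rate and the 10 second duration from the demo; the function name is my own, and decimal units (1 TB = 1,000 GB) are an assumption.

```python
def copy_seconds(size_gb: float, rate_gb_per_sec: float) -> float:
    """Return how long a copy takes at a steady transfer rate."""
    if rate_gb_per_sec <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_sec


print(copy_seconds(10, 1.0))       # a 10 GB file at 1 GB/s
print(copy_seconds(10_000, 1.0))   # a 10 TB file at 1 GB/s, for comparison
```

At 1 GB/s, a 10 second copy corresponds to roughly 10 GB of data; a file a thousand times larger would take the better part of three hours at the same rate.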

Hyper-V Extensible Switch, focusing on the extensibility, demoing the Cisco 1000v to deploy QoS profiles.

On to automation for reducing costs and increasing quality.  Over 2400 PowerShell cmdlets.  Demo will use POSH to enable Hyper-V Replica.  He’s using Orchestrator to power up the VMs in the other site, as I’ve been talking about at events here in Ireland.  All the dependencies can be modelled with VMs powering up in order.

Back to Satya

A video: “It’s in the box; it’s there”.  That’s the Windows Server way.  No $250-$400 per VM for SRM.  Hyper-V is built-in, free, and there for you to use.

Mark Russinovich

“We wanted it so easy to deploy virtual machines that even your boss could do it” – on Azure Virtual Machine portal.

System Center 2012 App Controller (likely SP1 or vNext) will be used to upload a VM from the private cloud to Azure.  A VM’s VHDs have been stored in a library.  The migrate action can upload the VM to Azure.  You can configure the instance during this wizard.  In this example, the VM has 2 VHDs, and both are migrated up.

Using a tool called CloudXplorer, he downloads the VHDs of a VM to a private cloud Hyper-V host.  A new VM can be built from these VHDs.

Aflac (insurance company with annoying sponsorship placements in American Football games) join Mark.  Full N-Tier app based on SQL cluster and SharePoint running.  Load balancing in place.  Domain controllers on Azure as well.  And System Center Operations Manager running as an Azure VM instance.

On to the dev stuff … I’ll skip that.

Meet Windows Azure Event Notes

I am live blogging so refresh for updates.  I’m not interested in the coder stuff so I’m only recording what’s of interest to me as an IT Pro.

VMs

He creates a persistent Windows VM where you can install anything you want that runs on Windows.  Then he creates an Ubuntu VM from a Mac, choosing the distro from a library.  The web console looks quite attractive and simple.

He can RDP into the Windows VM and SSH into the Linux VM.  You can mix PaaS and IaaS on Azure to create a service.

You can integrate with existing systems in your own data centre or another service provider via the new VPN capability.  When you create a network … you specify your own address space and it doesn’t clash with other tenants’ address spaces.  THIS IS NETWORK VIRTUALISATION from WS2012.  Creating the VPN looks easy … specify your local VPN and it’ll produce a script for you to run on your local endpoint.  Nice!  Give the person who thought of that a nice bonus.

VM Portability: VMs are using VHD.  You can upload a VM from your data centre to Azure without export/import.  You can also download a VHD to local without export/import.  This means you don’t have lock-in.  You can move to local private cloud or to other service providers.  Big plus over PaaS.

The VM persistent storage is triple replicated.  There are always two backup copies that can auto startup/connect if you get a bad disk.  Replication to another data centre (e.g. Dublin to Amsterdam) is available for geo fault tolerance.

Websites Hosting 

You can build and deploy websites using things like FTP or TFS.  It’s a shared multi tenant environment that can scale out to dedicated instances.  A web site is quickly created.  A web site connections profile is saved, allowing easy connection of Visual Studio.  Publish the project and a new website is uploaded, using the same type of persistent Azure storage as VMs.  A republish just uploads the changed files.  There are near-real-time metrics of the site via monitoring.  You can customise this monitoring.  That was .NET.  Then he switches over to a Mac with a different run time platform, NodeJS or something.

Without writing code, he creates a site from templates: WordPress, etc are in there.  MySQL is supported on the backend.  Free MySQL instance with every Azure instance.  The template does all the setup/deployment work for you – you just have the final wizard to configure/secure it.

If the blog scales?  By default it’s in a multitenant instance.  You can fire up more processes in this instance.  You can also scale out to get reserved instances – basically dedicated VMs under the Azure hood.  Azure does all the load balancing stuff for you.  Nice way to transition from ultra basic to BIG.

I’ve just checked out the Web hosting plan.  Yes, you get 10 free web sites.  But that does not cover SQL Server space or network bandwidth – additional cost.  When I plugged in some numbers, my current 10 site hosting plan by a local company with excellent support is 1/3 cheaper.  I guess Azure will be good if you’re planning on scaling out your website.

And it went all dev after that.  That’s all folks.

MMS 2012 – What Happens In Vegas, Stays in Vegas … Unless It Itches

As you might have noticed by the glut of MMS 2012 blog posts, I’ve spent the last 7 days in Las Vegas at the Microsoft Management Summit 2012 conference.  It was a good week.  I mostly hung out with the small group of Irish delegates but it was good to meet many folks from around the world that I regularly communicate with, as well as some of you readers.

The content of the week was interesting.  The majority of it was level 100 or introductory show-and-tell.  For me and the role I do in technical sales, I valued the sessions that gave real world examples.  The best of those was the one on Thursday evening that was delivered by the Inframon guys, looking at real world examples of where they’ve deployed integrated System Center 2012 solutions with automated remediation.

Another interesting session was the one on the Visio Management Pack Designer (VMPD).  The MP authoring tool is dreadfully documented in my opinion and hard to get into, so a visual tool that’s easy to pick up and create custom MPs from is greatly appreciated.

The keynotes were interesting, as long as you hadn’t read the spoiler press releases by MSFT marketing.  MSFT marketing does something good from time to time, such as Tad, but most of the time they … well … you know those 200 people that were let go from MSFT marketing recently?  Maybe they let the wrong people go.

Keynotes are usually aimed at people who don’t keep up with events, and those of us who do are usually bored silly.  But we all got something this week.  In day one we got the new name of Windows Server 2012 and a funny video with Vijay Tewari making the most of his free time thanks to automation.  In the day 2 keynote we got a real surprise.  The day before I was talking about the deep versus light management of mobile devices in ConfigMgr 2012 and joking how one was better off with a Windows Mobile 6.5 phone if deep management was their goal.  But damn, did they come through with the vNext (aka 2012 SP1) news.  Side loading of apps onto Android and iOS is a BIG deal.  And to be able to do that with both ConfigMgr 2012 SP1 and Intune vNext is very cool.  The demo was a little ropey thanks to a projector cable malfunction but the keynote team adapted and overcame the problem on the fly – well done!

You may not have read between the lines: Windows Phone 7.x cannot be side loaded with apps like Android and iOS because of its security model.  I was told this at the Intune booth in the Expo hall.

Overall, we had a blast this week.  But I am glad to be leaving the 90F temperatures, the perfumed air conditioning, and the constant ding-ding-ding of the slot machines behind.  Now if only I was allowed to bring this Heckler & Koch G36 home …

MMS 2012: Deep Dive Into ConfigMgr 2012 DRS and SEDO

Speaker: Saud Al-Mishari, MSFT PFE – think he’s based in the UK

The session is on the new replication model: RCM, DRS, and SEDO.

Key Concepts

  • SQL replication in ConfigMgr 2012 has nothing to do with SQL Server transactional replication
  • Data Replication Service (DRS)

Terminology

  • Stored procedure: sproc
  • SSB: SQL Service Broker
  • Change Tracking: SQL Server Change Tracking

More:

  • RCM: Replication Configuration Management/Monitoring
  • Replication Pattern: a set of rules on what will replicate
  • Replication group: a set of tables that are monitored and replicated together
  • Replication Link: a replication connection between two SQL servers for a particular RG
  • Backlog: Unable to write data to the SQL Server DB after being received in the SSB Queue (usually SQL Server write performance)

New Replication Model

  • Global data is anything an admin creates and is replicated everywhere, e.g. collection rules
  • Site data is stuff like status, collection membership results, replicated up to parent site.

Client generates XML file and copies it to the management point.  MP copies MIF to the site server.  Site server processes it.  DRS replicates the changed data to the parent.  CAS contains the discovery data.

SQL Server Change Tracking

  • Change tracking allows an application to keep a record of rows in a table that have been changed: insert/update/delete
  • Does not track changed data – obtained directly each sync
  • Added in SQL Server 2008 … not to be confused with Change Data Capture
  • Is enabled at the DB level and at the table level.

DO NOT ALTER THIS SETTING ON A SITE DATABASE
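The idea that change tracking records which rows changed, not what they changed to, can be sketched in a few lines of Python.  This is a toy model of the concept only, not how SQL Server implements it: the class and method names are hypothetical, and the toy reports the last operation per row, which may differ from the net-change semantics of the real feature.

```python
class TrackedTable:
    """Toy table that records WHICH rows changed, not what they changed to."""

    def __init__(self):
        self.rows = {}      # primary key -> current row data
        self.changes = {}   # primary key -> (version, last operation)
        self.version = 0    # increases with every tracked change

    def _track(self, key, operation):
        self.version += 1
        self.changes[key] = (self.version, operation)

    def insert(self, key, data):
        self.rows[key] = data
        self._track(key, "insert")

    def update(self, key, data):
        self.rows[key] = data
        self._track(key, "update")

    def delete(self, key):
        del self.rows[key]
        self._track(key, "delete")

    def changes_since(self, last_sync_version):
        """Return (key, operation, current data) for rows changed after a version.

        The data is read from the live table at sync time; deleted rows yield None.
        """
        ordered = sorted(self.changes.items(), key=lambda item: item[1][0])
        return [
            (key, operation, self.rows.get(key))
            for key, (version, operation) in ordered
            if version > last_sync_version
        ]


table = TrackedTable()
table.insert(1, "collection A")
table.update(1, "collection A v2")
table.insert(2, "temp")
table.delete(2)
print(table.changes_since(0))
```

The sync only ever sees the current state of row 1, never the intermediate value, which is the trade-off that keeps the tracking data small.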

SQL Service Broker

Messaging service:

  • Asynchronous queue based service
  • Guaranteed delivery (not infrastructural guarantee – developer guarantee)
  • Allows messages to be grouped into a conversation … messages processed in order, allows for multiple threads to process queue

Elasticity:

  • Allows scalability
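As a rough illustration of why grouping messages into conversations matters, here is a minimal Python sketch of a queue that keeps per-conversation ordering.  It is an in-memory analogy with hypothetical names, not the Service Broker API, and it leaves out delivery guarantees entirely.

```python
from collections import defaultdict, deque


class ConversationQueue:
    """In-memory queue that preserves message order within each conversation."""

    def __init__(self):
        self._conversations = defaultdict(deque)

    def send(self, conversation_id, body):
        self._conversations[conversation_id].append(body)

    def receive(self, conversation_id):
        """Pop the oldest message of one conversation, or None if it is empty."""
        queue = self._conversations[conversation_id]
        return queue.popleft() if queue else None

    def pending(self):
        """Return the number of waiting messages per non-empty conversation."""
        return {cid: len(q) for cid, q in self._conversations.items() if q}


broker = ConversationQueue()
broker.send("global", "change 1")
broker.send("site", "status 1")
broker.send("global", "change 2")
print(broker.pending())
print(broker.receive("global"), broker.receive("global"))
```

Because ordering only has to hold inside a conversation, separate workers could each drain a different conversation in parallel without reordering anything.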

Replication Patterns

  • Global data flows in both directions.  CAS and primaries all have the same data, e.g. collections and package meta data.
  • Site data flows up.
  • Global-proxy is admin and control data for secondary sites.  A primary and its secondary sites all have the same data.  Subset of global data that secondary sites need.  Leverages SQL 2008 R2 Express at the secondary site with 10 GB limit.

Select * from vReplicationData to find all RGs and their sync schedules

ID is the key field in here.

Provider Access

SMS_ReplicationGroup is a new WMI class that supports replication.  1 instance per RG.  The Status property allows you to determine the status of the RG.

What’s in an RG?

Select * from vArticleData where ReplicationID = XX  …. using ID from above query

How big is the RG?

EXEC spDiagGetSpaceUsed

If a site goes down for a week or two, how much data must you send across?  Use the above query to figure out how much data must be replicated by the RG.

Demo

In the SQL Management Studio.  Select * from vReplicationData. Can see all the patterns for global, site and global-proxy.  SyncInterval is the number of minutes between replications.  DRS runs every 5 minutes .. no control over that. 

Select * from vArticleData where ReplicationID = 7.  Looks like Endpoint Protection data being replicated here.

Runs spDiagGetSpaceUsed .. takes a while.  Returns the size of the tables.  Replication Pattern shows the amount of data to replicate if you lose a site for the 3 patterns (global, site, global_proxy).

DRS Architecture

  • RCM handles replication link setup, maintenance and monitoring – command and control.  It’s a thread of SMSEXEC.
  • SSB is the transmission engine of replication
  • The Sender still lives and is used for bulk copy for initialization and re-init.
  • 5 day limit on DRS for outages – Due to the need to retain changes.  It retains 5 days of data.  Try to expand this for a 30 day outage and ConfigMgr needs to maintain 30 days of data.  It’s 5 days to handle a long weekend apparently – site breaks at start of holiday, come back 4 days later and fix it. 

Initialisation:

  1. BCP: to extract table data
  2. Sender: SMSEXEC sender thread
  3. SMB/CIFS: copy data to the destination

On-going replication

  1. SQL Server Change Tracking
  2. DRS sprocs and SQLCLR
  3. SQL Server Service Broker
  4. XML

Demo – Break replication

SQL DBA has a bad day and disables dbo.ConfigMgrDRSQueue.  CMTrace is started from DVD.  Opens rcmctrl log on site server.  See that the queue not running causes an error.  We can see that ConfigMgr actually reached out into SQL and re-enabled the queue.

In the CM console, we have a second demo.  The link is degraded in one direction but not the other under Database Replication.  Looks like a TCP 1433 connectivity issue.

Site Initialisation

  1. Setup start
  2. Setup asks CAS for site number.  If you have more than 50,000 clients, then you need SQL Enterprise Edition to chunk up data in the DB and partition it.
  3. Setup finished and waits for replication to initialise.
  4. The replication configuration data is requested.  This group tells RCM as the primary how replication should be setup
  5. CAS receives the request, BCPs out the data, and sends it via the sender back to the primary
  6. Primary now requests the remaining Global Replication Groups.  CAS creates the BCP packages and sends them back to the primary.  Primary then applies the new data from the CAS.
  7. Primary site receives the BCP files and inserts all the data from the CAS.  The primary can now switch to normal replication.

DRS Message Replication

  • Provider executes query that modifies table
  • SQL Server writes entries into change tracking table
  • On DRS sync: changes are packaged up and inserted into the SQL Server message queue using a stored proc.
  • Message Broker transmits the message to the receiving site.
  • RCM monitors the queue launching activation stored procs to process
  • And more on receiving side to insert modifications on receiving side

WARNING: When A CAS Goes Offline

When the CAS goes offline for more than 5 days, don’t make changes on the Primary as a substitute for the CAS.  The CAS will re-initialise the primaries after more than 5 days outage, thus wiping the Primary’s changes.
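The rule of thumb from this warning can be written down as a tiny Python sketch.  The 5 day retention figure comes from the session; the function name and the strict "more than" comparison are my own reading of it, so check the behaviour in your own environment before relying on it.

```python
RETENTION_DAYS = 5  # DRS change retention window quoted in the session


def needs_reinit(outage_days: float, retention_days: float = RETENTION_DAYS) -> bool:
    """Return True when an outage has outlasted the retained change history."""
    return outage_days > retention_days


for days in (4, 6):
    action = "expect re-initialisation" if needs_reinit(days) else "normal replication resumes"
    print(f"{days} day outage: {action}")
```

The long weekend example from earlier (site breaks, fixed 4 days later) lands on the safe side of the window; a week-long outage does not.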

DRS Troubleshooting

  • The Replication Link Analyzer (RLA) should be your first stop.  It’s predictable and can do some fairly complex remediation
  • RCM Log should be the follow up.  But this is just a summary of what has happened.
  • For transmissions layer errors, the SSB queue is sometimes the most immediate source for error messages (of this type)

Views for Detailed Info

  • The main logging view: vLogs.  They log into the DB.  Select top 1000 * from vLogs order by LogTime desc.  Limit that number.  Do not select everything.  It will hammer the prod environment and compound the issue.
  • SMS_Replication_Configuration_Monitor registry key to configure logging
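One way to make the "limit that number" advice hard to forget is to build the vLogs query through a helper that refuses unbounded reads.  This Python sketch only generates the T-SQL text; the view and column names come from the session, while the function name and the upper cap of 10,000 rows are assumptions of mine.

```python
MAX_ROWS = 10_000  # assumed safety cap, not a ConfigMgr limit


def vlogs_query(limit: int = 1000) -> str:
    """Build a bounded query against the vLogs view, newest entries first."""
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("limit must be an integer")
    if not 1 <= limit <= MAX_ROWS:
        raise ValueError(f"limit must be between 1 and {MAX_ROWS}")
    return f"SELECT TOP {limit} * FROM vLogs ORDER BY LogTime DESC"


print(vlogs_query())
print(vlogs_query(50))
```

Because the limit is validated as an integer before it is formatted into the string, the helper cannot be used to smuggle arbitrary text into the query.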

DRS Troubleshooting

  • Ensure that TCP 1433 exception is there for SQL Service and 4022 for SQL Broker.
  • SSB keys transmitted through setup – monitoring with Hman.
  • spDiagDRS will give you an overview of the state of DRS replication at the site.  SiteStatus (coded), Replication Group Initialization Status, DRSQueueStates, QueueLengths (ideally 0 and 0 or you have a backlog), Replication Group Status details the last time messages were sent

Demo: View Queues

Click on the queues in SQL under service broker under CM database.

Procedural troubleshooting of DRS DEMO

Turns off SQL Broker. Makes a change to Client Policy.

  1. Run spDiagDRS: EXEC spDiagDRS in SQL MS.  We see messages jammed in the outbound queue.
  2. SSB transmission_queue: 
  3. Service broker queues: We see connection failed errors.  Telnet to the port and we see it fails.
  4. vLogs: select * from vLogs ORDER BY LogTime DESC (beware * in real world … too much data)
  5. RCM_ReplicationLinkStatus

The Database Replication link in CM console will flip to degraded and then flip to fail after about 25 minutes.  Can run Replication Link Analyzer (RLA).  In the demo it shows that there’s a network connectivity issue.

Invoke-WmiMethod -Namespace root\sms\site_CAS -Path SMS_ReplicationGroup -Name InitializeData -ArgumentList "20", "CAS", "PR1" to reinitialize a RG.  RLA should do this for you if required.

SEDO – Why do we need a way of controlling changes?

  • As global data is replicated everywhere, a user on a primary site could change an object at the same time as a user on the CAS or another primary.
  • This is an unavoidable consequence of multi-master replicated data model – ask AD.
  • SEDO is the solution to this.

What is SEDO?

  • SEDO = Serialized Editing of Data/Distributed Objects
  • Provides a way to enforce single-user editing of an object at any one time.
  • A lock request round trip can take less than 200ms from Primary to CAS to Primary
  • Default Timeout is 5 minutes.
  • Only SEDO enabled objects require users to get a lock
  • Supports explicit and implicit lock handling.
  • This is all transparent to admins.  Important for devs building extensions to CM.
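For anyone building extensions, the general shape of serialized editing can be sketched as a lock table with a timeout.  This Python version is an illustration of the concept only: the session did not describe how SEDO stores or expires locks, so the class, its methods, and the "lock expires 5 minutes after it was taken" rule are all assumptions.

```python
import time

LOCK_TIMEOUT_SECONDS = 5 * 60  # default timeout quoted in the session


class LockTable:
    """Grants one editor per object; a stale lock can be taken over after the timeout."""

    def __init__(self, timeout=LOCK_TIMEOUT_SECONDS, clock=time.monotonic):
        self._locks = {}        # object id -> (owner, time acquired)
        self._timeout = timeout
        self._clock = clock     # injectable so the behaviour can be tested

    def acquire(self, object_id, user):
        """Return True if the user now holds the lock on the object."""
        now = self._clock()
        holder = self._locks.get(object_id)
        if holder is not None:
            owner, acquired_at = holder
            if owner != user and now - acquired_at < self._timeout:
                return False    # someone else is still editing
        self._locks[object_id] = (user, now)
        return True

    def release(self, object_id, user):
        """Release the lock if the user holds it; return whether anything changed."""
        holder = self._locks.get(object_id)
        if holder is not None and holder[0] == user:
            del self._locks[object_id]
            return True
        return False


locks = LockTable()
print(locks.acquire("collection-1", "admin-at-primary"))
print(locks.acquire("collection-1", "admin-at-cas"))
```

In this sketch the second admin is simply refused; a real console would more likely open the object read-only or offer to retry.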

MMS 2012: Automating Data Protection And Recovery With DPM and System Center 2012

Speakers: Orin Thomas and Mike Ressler

Replication is not the same as backup.  Lose it in site A = lose it in site B.  Backup is still required.  And backup provisioning in the private cloud is a challenge cos admins don’t know what’s being deployed.

DPM is a part of System Center, a part of a holistic integrated solution.  Makes it perfect for provisioning in the private cloud.

How Will The Agent Get Deployed?

  • Make it part of image
  • GPO for an OU
  • Scripting or manually
  • Use Configuration Manager
  • And probably lots more options, e.g. a runbook fired off from Service Manager

Their solution is that the user goes to Service Manager, creates a request, and Orchestrator runs a runbook.  There is a DPM Integration Pack.  It’s a confusing IP apparently.

  1. Initialize Data: Add parameters – ServerName, DatabaseName, and Type (3 types of protection group in DPM such as gold, silver, and bronze for recovery points, retention, etc).
  2. Get Data Source (renamed as Get Protection Group): Data Source Location set as protection group and select Type
  3. Get Data Source (get server ID) – choose protection server and select ServerName
  4. Get Data Source (renamed as Get Data Source ID) – DPM, Get protection server name and filter to DatabaseName to protect a single DB, could have said type = SQL to protect all DBs.
  5. Protect Data Source: Protection Group = Get Protection Group
  6. Create Recovery – Something.

Yup, it’s confusing.  Go look at the videos when the guys tweet the link.

Keep the self-service simple.  If there’s more than a few questions, the user won’t do it and they’ll blame you when data isn’t protected and it’s lost.

There’s a bunch of Service Manager stuff after this.

MMS2012 – I’ve Deployed OpsMgr 2012 Application Performance Monitoring (APM); Now What?

Speakers: Pete Zerger and a Dude Who Was With Avicode

APM was Avicode, and allows .NET and J2EE application monitoring from the inside.  Help IT isolate the issue.  Provide the app team with the info they need to fix the app.

Teams you might have involved in app troubleshooting:

  • Operations: Runs the infrastructure on a day-to-day basis
  • Support and development: writes it and fixes bugs
  • QA/Testing: tests it
  • DevOps: owns the production code

Processes

  • Troubleshooting
  • Daily/weekly app health analysis
  • Fixing top issues
  • Next application release scope
  • Improve monitoring configuration

Reports

Start with Top reports

Figure out how often to send reports, who to send them to, and what apps to include.

Problems distribution analysis is a good high level report of all apps.  Application status gives you a week-by-week report on app performance/health.  Run it weekly and send it to an active/involved supervisor.  Application CPU utilization should be run weekly/monthly.

Make a note of http://dinnernow.codeplex.com/ for testing/demo.

Rules

Filter out noise, e.g. non-actionable alerts, things that may be fixed in the next release, etc.  Use rules every day.  Start with top level problems, and create rules for exception events.

Using Regex Sensitive Data Filters

You can use expressions to find and mask sensitive data that you don’t want out in the wild, e.g. social security number, credit card number, etc.
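As a rough idea of what such a filter does, here is a sketch using Python's re module.  The two patterns are my own simplified examples (a US social security number written as ddd-dd-dddd, and a 16 digit card number in four groups), not the expressions shown in the session or shipped with APM, and real data would need stricter validation.

```python
import re

# Illustrative patterns only: SSN as ddd-dd-dddd, card numbers as 16 digits
# in four groups optionally separated by a space or a hyphen.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def mask_sensitive(text: str) -> str:
    """Replace anything matching the patterns above with a fixed placeholder."""
    text = SSN_PATTERN.sub("***-**-****", text)
    text = CARD_PATTERN.sub("****-****-****-****", text)
    return text
```

Masking with a fixed placeholder keeps the surrounding event text readable for the app team while the sensitive value itself never leaves the server.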

There’s a lot more demo after this.  Best you watch the video when it’s made available in a few days.

MMS2012 – SC 2012 VMM: PowerShell Is Your Friend, And Here’s Why

Speakers: Hector Linares, Senior Program Manager and Susan Hill, Senior Technical Writer, MSFT

Went from 162 cmdlets in VMM 2008 R2 to 438 in VMM 2012.  They maintained backwards compatibility through aliases.  The cmdlets got renamed so they don’t conflict with the new Windows Server 2012 Hyper-V cmdlets.

POSH is the driving force for the UI.  Cmdlets are executed as jobs in VMM so there’s an audit trail.  Other partners, e.g. TFS or XenDesktop, integrate with VMM cmdlets for deployment.

Overview of VMM 2012 system

  • Infrastructure: HA VMM Server, PowerShell, Upgrade, Custom Properties
  • Fabric: Server lifecycle management, multiple hypervisors, network management, storage management, dynamic optimisation.
  • Clouds: An abstraction of fabrics.  Application owner usage, capacity and capability, delegation and quota.
  • Services: Service templates, application deployment, custom command execution, image-based servicing.

Cmdlet groups: 46 nouns

  • Get-Command -Module VirtualMachineManager -CommandType Cmdlet
  • Get-SCVirtualMachine
  • Now you run Read-SCVirtualMachine to do a refresh.
  • Repair-SCVirtualMachine will do the repair action.
  • Stop-SCVirtualMachine takes more parameters, e.g. stop (cold), save state, or clean shutdown.
  • Register-SCVMHost to register a bare metal host.
  • Restart-SCVMHost to reboot a host.
  • Test-SCVMHostCluster to run a cluster validation.

Domain Join for VM

You can use –DomainJoinOrganizationalUnit “ou=, dc=” to set where a new VM joins in a domain.

-AutoLogonCredential to set the autologon account and -AutoLogonCount to say how many times that will run.

These must be set at the same time.  You can clean up with disableautologon.

UnattendSettings

Looks like we can use this to customise an unattend.xml for the Specialize (3) and OOBE (6) passes.  Use Add(key, value) to add settings.

  • $unattend.add
  • $unattend.remove

Your settings will override settings in GuestOSProfile or VMTemplate.  You have to commit the settings with Set-SCVMTemplate (I think – quick slides) to use them.

Demo

In the demo, he wants to override a template.  He gets the template.  Now he creates a new temporary template.  He sets the OU for it to join to.  He creates a Run As account as the account he’ll use for building the VM.  He uses that for autologon.  He gets the unattend object.  Now he adds a bunch of overrides to the template using $unattend.Add().  Set-SCVMTemplate -VMTemplate $template -UnattendSettings $unattend | Out-Null commits the overrides.  They create a $vmconfig using New-SCVMConfiguration -VMTemplate $template -Name ($vmNamePrefix + “_config”), piped to fl Name.

VMM still doesn’t have the ability to create differencing disks so you have to use WMI to do it instead.  Apparently this has been blogged. 

He sets the disk name and location.  This can be done on a per disk basis.  In this cmdlet he’s told it to use an existing VHD he just created using WMI. 

Virtual Machine Configuration

You can create a VM config so you can deploy very specific VM configs, different from the defaults.  Set $VHD with Get-SCVirtualHardDisk from the library.  Then set the $storageclass variable with Get-SCStorageClassification.  Now $ComputerTier with Get-SCComputerTier.  Then $VMconfig with New-SCVMConfiguration and the $ComputerTier variable.  Then $vhdconfig with Get-SCVirtualHardDiskConfiguration and $VMconfig.  Then Set-SCVirtualHardDiskConfiguration with $vhdconfig, $VHD, and $storageclass.

Now $virtualnetworkadapterconfig = Get-SCVirtualNetworkAdapterConfiguration.  Then Set-SCVirtualNetworkAdapterConfiguration with $virtualnetworkadapterconfig.  And then more stuff.  Download the slide deck when it comes out in a few days.

Basically you build up a VM config and then you create a VM from that config.

There is a script on the net that will automatically sign the scripts in your VMM library.  It was written for 2008 R2.

We’re shown a demo where a script checks for expired (by date) VMs and stores them in the VMM library.

Hyper-V Data Exchange

Can read and set the KVPs in the VM.  Can read data from a VM without using the network via Read.  Can pass string values in to a VM regardless of power state with Set.  A Key is a registry VALUE created to store DATA.  The value is the DATA.  And a KVPMap is a hash table of one or more VALUEs and their DATA.

Cool demo where Hector writes to the registry of the VM in different power states (on, off, paused, save state).

VDI

Jobs submitted to VMM using –RunAsynchronously from one or more runspaces.  Hundreds of parallel jobs.  Typically used in the morning bootstorm in VDI.

VMM 2012 has a concept of threadpools.  By default it handles 25 threads per core in the VMM server with a max of 150 (requires a monster VMM server).  High number of context switches can slow performance of the VMM server.  The WCF timeout is configurable (default of 120 seconds).  Monitor the performance of jobs if you increase threadpools.

If you run asynchronously then query the job object for status.  For higher throughput, use multiple threads with multiple runspaces.
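The submit-then-poll pattern looks roughly like this.  It's a Python analogue I put together, not VMM code: worker threads stand in for runspaces, a Future stands in for the VMM job object, and start_vm is a hypothetical placeholder for whatever long-running job you submit.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


def start_vm(name: str) -> str:
    """Hypothetical stand-in for a long-running job."""
    time.sleep(0.01)
    return f"{name}: completed"


def run_jobs(names: List[str], workers: int = 4) -> Dict[str, str]:
    """Submit every job without blocking, then poll each job object for status."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs: Dict[str, Future] = {n: pool.submit(start_vm, n) for n in names}
        pending = set(jobs)
        while pending:
            # done() returns immediately, much like querying a job's status.
            pending = {n for n in pending if not jobs[n].done()}
            time.sleep(0.005)
        return {n: jobs[n].result() for n in names}
```

Submitting everything first and polling afterwards is what gives the throughput: no caller sits blocked waiting for one VM before asking for the next.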

Make sure you tune the VMM refreshers in VDI, and also in very large static environments.  4000 VMs doing a light refresh every 2 minutes and a full refresh every 30 minutes will hammer the VMM server.
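Some quick arithmetic on that example shows why.  This is a back-of-envelope sketch that assumes the refreshes are spread evenly across each interval, which is my simplification and not something stated in the session.

```python
def refreshes_per_second(vm_count: int, interval_minutes: float) -> float:
    """Average refreshes per second if every VM refreshes once per interval."""
    return vm_count / (interval_minutes * 60)


light = refreshes_per_second(4000, 2)   # 4000 VMs / 120 seconds
full = refreshes_per_second(4000, 30)   # 4000 VMs / 1800 seconds
print(f"light: {light:.1f}/s, full: {full:.1f}/s")
```

That works out to roughly 33 light refreshes and 2 full refreshes every second, all day, before VMM does any real work.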