Notes: Continuously Available File Server – Under The Hood

Here are my notes from TechEd NA session WSV410, by Claus Joergensen.  A really good deep session – the sort I love to watch (very slowly, replaying bits over).  It took me 2 hours to watch the first 50 or so minutes 🙂

image

For Server Applications

The Scale-Out File Server (SOFS) is not for direct sharing of user data.  MSFT intend it for:

  • Hyper-V: store the VMs via SMB 3.0
  • SQL Server database and log files
  • IIS content and configuration files

This required a lot of work by MSFT: changing old things and creating new things.

Benefits of SOFS

  • Share management instead of LUNs and Zoning (software rather than hardware)
  • Flexibility: Dynamically reallocate servers in the data centre without reconfiguring network/storage fabrics (SAN fabric, DAS cables, etc)
  • Leverage existing investments: you can reuse what you have
  • Lower CapEx and OpEx than traditional storage

Key Capabilities Unique to SOFS

  • Dynamic scale with active/active file servers
  • Fast failure recovery
  • Cluster Shared Volume cache
  • CHKDSK with zero downtime
  • Simpler management

Requirements

Client and server must be WS2012:

  • SMB 3.0
  • It is for application workloads, not user workloads.

Setup

I’ve done this a few times.  It’s easy enough:

  1. Install the File Server and Failover Clustering features on all nodes in the new SOFS
  2. Create the cluster
  3. Create the CSV(s)
  4. Create the File Server role – a clustered role that has its own CAP (including associated computer object in AD) and IP address.
  5. Create file shares in Failover Clustering Management.  You can manage them in Server Manager.

Simple!

Personally speaking: I like the idea of having just 1 share per CSV.  Keeps the logistics much simpler.  Not a hard rule from MSFT AFAIK.

And here’s the PowerShell for it:

image
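
Roughly, and with made-up node/cluster/share names, the equivalent PowerShell looks something like this:

# Run on each node (or use -ComputerName): add the file server and clustering features
Install-WindowsFeature FS-FileServer, Failover-Clustering -IncludeManagementTools

# Build the cluster, add a CSV, then create the SOFS role and a share
New-Cluster -Name SOFS-Clus1 -Node FS1, FS2 -StaticAddress 172.16.1.50
Add-ClusterSharedVolume -Name "Cluster Disk 1"
Add-ClusterScaleOutFileServerRole -Name SOFS1

New-Item -Path C:\ClusterStorage\Volume1\VMs -ItemType Directory
New-SmbShare -Name VMs -Path C:\ClusterStorage\Volume1\VMs -FullAccess "DEMO\Hyper-V-Hosts"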

CSV

  • Fundamental and required.  It’s a cluster file system that is active/active.
  • Supports most of the NTFS features.
  • Direct I/O support for file data access: whatever node you come in via has direct access to the back-end storage.
  • Caching of CSVFS file data (controlled by oplocks)
  • Leverages SMB 3.0 Direct and Multichannel for internode communication

Redirected IO:

  • Metadata operations – hence not for end user data direct access
  • For data operations when a file is being accessed simultaneously by multiple CSVFS instances.

CSV Caching

  • Windows Cache Manager integration: Buffered read/write I/O is cached the same way as NTFS
  • CSV Block Caching – read only cache using RAM from nodes.  Turned on per CSV.  Distributed cache guaranteed to be consistent across the cluster.  Huge boost for pooled VDI deployments – esp. during boot storm.

CHKDSK

Seamless with CSV.  Scanning is online and separated from repair.  CSV repair is online.

  • Cluster checks once/minute to see if chkdsk spotfix is required
  • Cluster enumerates NTFS $corrupt (contains listing of fixes required) to identify affected files
  • Cluster pauses the affected CSVFS to pend I/O
  • Underlying NTFS is dismounted
  • CHKDSK spotfix is run against the affected files for a maximum of 15 seconds (usually much quicker)  to ensure the application is not affected
  • The underlying NTFS volume is mounted and the CSV namespace is unpaused

The only time an application is affected is if it had a corrupted file.

If it could not complete the spotfix of all the $corrupt records in one go:

  • Cluster will wait 3 minutes before continuing
  • Enables a large set of corrupt files to be processed over time with no app downtime – assuming the apps’ files aren’t corrupted – where obviously they would have had downtime anyway

Distributed Network Name

  • A CAP (client access point) is created for an SOFS.  It’s a DNS name for the SOFS on the network.
  • Security: creates and manages AD computer object for the SOFS.  Registers credentials with LSA on each node

The actual cluster nodes are used in SOFS for client access.  All of their IP addresses are registered with the CAP.

DNN & DNS:

  • The DNN registers the node IPs for all nodes.  A virtual IP is not used for the SOFS (unlike previous versions)
  • The DNN updates DNS when: the resource comes online (and every 24 hours); a node is added to/removed from the cluster; a cluster network is enabled/disabled as a client network; a node’s IP address changes.  Use dynamic DNS … static DNS means a lot of manual work.
  • DNS will round robin the lookups: the response is a sorted list of addresses for the SOFS CAP with IPv6 first and IPv4 second.  Each iteration rotates the addresses within the IPv6 and IPv4 blocks, but IPv6 is always before IPv4.  Crude load balancing.
  • When a client does a lookup it gets the list of addresses, and it will try each address in turn until one responds (see the lookup example after this list).
  • A client will connect to just one cluster node per SOFS.  Can connect to multiple cluster nodes if there are multiple SOFS roles on the cluster.
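
A quick way to see what a client actually gets back from that round robin (assuming the CAP is called SOFS1 in a zone called demo.internal):

# Run it a few times to watch the address order rotate
Resolve-DnsName -Name SOFS1.demo.internal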

SOFS

Responsible for:

  • Online shares on each node
  • Listen to share creations, deletions and changes
  • Replicate changes to other nodes
  • Ensure consistency across all nodes for the SOFS

It can take the cluster a couple of seconds to converge changes across the cluster.

SOFS implemented using cluster clone resources:

  • All nodes run an SOFS clone
  • The clones are started and stopped by the SOFS leader – why am I picturing Homer Simpson in a hammock while Homer Simpson mows the lawn?!?!?
  • The SOFS leader runs on the node where the SOFS resource is actually online – this is just the orchestrator.  All nodes run independently – a move or crash doesn’t affect the shares’ availability.

Admin can constrain what nodes the SOFS role is on – possible owners for the DNN and SOFS resource.  Maybe you want to reserve other nodes for other roles – e.g. asymmetric Hyper-V cluster.

Client Redirection

SMB clients are distributed at connect time by DNS round robin.  No dynamic redistribution.

SMB clients can be redirected manually to use a different cluster node:

image
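
The manual move is done with Move-SmbWitnessClient – for example (client and node names are made up):

# Ask the Witness service to move HOST1's SMB connections to cluster node FS2
Move-SmbWitnessClient -ClientName HOST1 -DestinationNode FS2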

Cluster Network Planning

  • Client Access: clients use the cluster nodes’ client-access-enabled public networks

CSV traffic – IO redirection occurs when:

  • Metadata updates – infrequent
  • CSV is built using mirrored storage spaces
  • A host loses direct storage connectivity

Redirected IO:

  • Prefers cluster networks not enabled for client access
  • Leverages SMB Multichannel and SMB Direct
  • iSCSI Networks should automatically be disabled for cluster use – ensure this is so to reduce latency.

Performance and Scalability

image

image

SMB Transparent Failover

Zero downtime with small IO delay.  Supports planned and unplanned failovers.  Resilient for both file and directory operations.  Requires WS2012 on client and server with SMB 3.0.

image

Client operation replay – if a failover occurs, the SMB client reissues those operations.  Done with certain operations.  Others, like a delete, are not replayed because they are not safe.  The server maintains persistence of file handles.  All write-throughs happen straight away – doesn’t affect Hyper-V.

image

The Resume Key Filter fences off file handle state after failover to prevent other clients grabbing files that the original clients expect to still have access to when they are failed over by the witness process.  Protects against namespace inconsistency – e.g. a file rename in flight.  Basically it deals with handles for activity that might be lost/replayed during failover.

Interesting: when a CSV comes online initially or after failover, the Resume Key Filter locks the volume for a few seconds (less than 3 seconds) for a database (state info stored in a system volume folder) to be loaded from a store.  Namespace protection then blocks all rename and create operations for up to 60 seconds to allow local file handles to be established.  Create is blocked for up to 60 seconds as well to allow remote handles to be resumed.  After all this (up to a total of 60 seconds) all unclaimed handles are released.  Typically, the entire process is around 3-4 seconds.  The 60 seconds is a per-volume configurable timeout.

Witness Protocol (do not confuse with Failover Cluster File Share Witness):

  • Faster client failover.  Normal SMB time out could be 40-45 seconds (TCP-based).  That’s a long timeout without IO.  The cluster informs the client to redirect when the cluster detects a failure.
  • Witness does redirection at client end.  For example – dynamic reallocation of load with SOFS.

Client SMB Witness Registration

  1. Client SMB connects to share on Node A
  2. Witness on client obtains list of cluster members from Witness on Node A
  3. Witness client removes Node A as the witness and selects Node B as the witness
  4. Witness registers with Node B for notification of events for the share that it connected to
  5. The Node B Witness registers with the cluster for event notifications for the share

Notification:

  1. Normal operation … client connects to Node A
  2. Unplanned failure on Node A
  3. Cluster informs Witness on Node B (thanks to registration) that there is a problem with the share
  4. The Witness on Node B notifies the client Witness that Node A went offline (no SMB timeout)
  5. Witness on client informs SMB client to redirect
  6. SMB on client drops the connection to Node A and starts connecting to another node in the SOFS, e.g. Node B
  7. Witness starts all over again to select a new Witness in the SOFS. Will keep trying every minute to get one in case Node A was the only possibility
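
You can check the registrations from any node of the SOFS cluster:

# Shows each connected client and the node acting as its witness
Get-SmbWitnessClient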

Event Logs

All under Applications and Services Logs – Microsoft – Windows:

  • SMBClient
  • SMBServer
  • ResumeKeyFilter
  • SMBWitnessClient
  • SMBWitnessService
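
A quick way to trawl these from PowerShell without remembering the exact channel names (the wildcards below are an assumption – check what -ListLog returns on your build):

# Find the SMB/ResumeKeyFilter channels and read the latest entries from each
Get-WinEvent -ListLog *SMB*, *ResumeKeyFilter* |
    Where-Object { $_.RecordCount -gt 0 } |
    ForEach-Object { Get-WinEvent -LogName $_.LogName -MaxEvents 10 }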

Notes: Microsoft Virtual Machine Converter Solution Accelerator

These are my notes from the TechEd NA recording of WCL321 with Mikael Nystrom.

Virtual Machine Converter (VMC)

VMC is a free-to-download Solution Accelerator that is currently in beta.  Solution Accelerators are glue between 2 MSFT products to provide a combined solution.  MAP, MDT are other examples.  They are supported products by MSFT.

The purpose of the tool is to convert VMware VMs into Hyper-V VMs.  It can be run as standalone or it can be integrated into System Center, e.g. Orchestrator Runbooks.

It offers a GUI and command line interface (CLI).  Nice quick way for VMware customers to evaluate Hyper-V – convert a couple of known workloads and compare performance and scalability.  It is a low risk solution; the original VM is left untouched.

It will uninstall the VMware tools and install the MSFT Integration components.

The solution also fixes drive geometries to sort out possible storage performance issues – basic conversion tools don’t do this.

VMware Support

It supports:

  • vSphere 4.1 and 5.0
  • vCenter 4.1 and 5.0
  • ESX/ESXi

Disk types from VMware supported include:

  • VMFS Flat and Sparse
  • Stream optimised
  • VMDK flat and sparse
  • Single/multi-extent

Microsoft Support

Beta supports Windows VMs:

  • Server 2003 SP2 x64/x86
  • 7 x64/x86
  • Server 2008 R2 x64
  • Server 2008 x64 (RC)
  • Vista x86 (RC)

Correct; no Linux guests can be converted with this tool.

In the beta the Hyper-V support is:

  • Windows Server 2008 R2 SP1 Hyper-V
  • VHD Fixed and Dynamic

In the RC they are adding:

  • Windows Server 2012 and Windows 8 Hyper-V
  • VHDX (support to be added in RTM)

Types of Conversion

  • Hot migration: no downtime to the original VM.  Not what VMC does.  But check the original session recording to see how Mikael uses scripts and other MSFT tools to get one.
  • Warm: start with running VM.  Create a second instance but with service interruption.  This is what VMC does.
  • Cold: Start with offline VM and convert it.

VMC supports Warm and Cold.  But there are ways to use other MSFT tools to do a Hot conversion.

Simplicity

MSFT deliberately made it simple and independent of other tools.  This is a nice strategy.  Many VMware folks want Hyper-V to fail.  Learning something different/new = “complexity”, “Microsoft do it wrong” or “It doesn’t work”.  Keeping it simple defends against this attitude from the stereotypical chronic denier. 

Usage

Run it from a machine.  Connect to ESXi or vCenter machine (username/password).  Pick your VM(s).  Define the destination host/location.  Hit start and monitor.

  1. The VM is snapshotted. 
  2. The VMware Tools are removed. 
  3. The VM is turned off. 
  4. The VMDK is transferred to the VMC machine
  5. The VMDK is converted.  You will need at least twice the size of the VMDK file … plus some space (VHD will be slightly larger).  Remember that Fixed VHD is full size in advance.
  6. The VHD is copied to the Hyper-V host. 
  7. The new Hyper-V VM is built using the VM configuration on the VMware host.
  8. The drive is added to the VM configuration.
  9. The VM is started. 
  10. The Hyper-V integration components are installed.

The conversion will create a Hyper-V VM without a NIC.  This is supposed to prevent a split-brain situation where the source and target VMs are both online at the same time.  I’d rather have a tick box.

If a snapshot is being used … then you will want any services on that VM offline …. file shares, databases, etc.  But offline doesn’t mean powering down the VM …. we need it online for the VMware tools removal.

The Wizard

A VM must have an FQDN to be converted.  Install the VMware tools and that makes the VM convertible.  This is required to make it possible to … uninstall the VMware tools 🙂

It will ask for your credentials to log into the guest OS for the VMware tools uninstall. 

Maybe convert the VM on an SSD to speed things up.

TechEd Europe 2012 Day 1 Keynote Notes #TEE12

Great that TechEd is back in Amsterdam.  I wish I was there.  Berlin is a nice city, but the Messe is a hole.

Brad Anderson

Mentions the Yammer acquisition, Windows Phone 8, and the new Surface tablets.  He’s talking about change.  Is it chaos or is it opportunity?  Pitching the positive spin of innovation in change.

Think of storage, compute, and network as one entity, and manage it as such.  In other words: Windows Server 2012, System Center 2012, and Azure are integrated into a single solution – you pick and choose the ingredients that you want in the meal.

Patrick Lownds has tweeted a great word: convergence.  This is beyond hybrid cloud; this is converged clouds.

Design with the knowledge that failures happen.  That’s how you get uptime and continuous availability of the service.  Automation of process allows scalability.

Hyper-V: “no workload that you cannot virtualise and run on Hyper-V”.  We’re allegedly going to see the largest ever publicly demonstrated virtual machine.

Jeff Woolsey

The energetic principal PM for Windows Server virtualisation.  “Extend to the cloud on your terms”.  Targeted workloads that were not virtualisable: dozens of cores, hundreds of GB of RAM, massive IOPS requirements.  This demo (40 SSDs) is the same as 10 full sized fully populated racks of traditional SAN disk.  MSFT are using SSD in this demo.  VMware: up to 300,000 IOPS.  Hyper-V now beats what it did at TechEd USA: over 1,000,000 (1 million) IOPS from a Hyper-V VM.

Iometer

Now we see the Cisco Nexus 1000v Hyper-V Switch extension (not a switch replacement like in VMware).  Shows off easy QoS policy deployment.

PowerShell:  Over 2400 cmdlets in WS2012.  Now we’re going to see Hyper-V Replica management via System Center 2012 Orchestrator.  A Site Migration runbook.  It verifies source/destination, and then it brings up the VMs in the target location in the order defined by the runbook.  And we see lots of VMs power up.

Once again, we see System Center 2012 App Controller integrating with a “hosting company” and enabling additional VM hosting capacity beyond the private cloud.

I’m wrapping up here … looks like the keynote is mostly the same as the USA one (fine for 99% of the audience who aren’t hooked to their Twitter/RSS like myself) and I have to head to work.

This keynote recording will be available on Channel 9, and the USA one is already there.  Enjoy!

Windows Server 2012 NIC Teaming and Multichannel

Notes from TechEd NA 2012 WSV314:

image

Terminology

  • It is a Team, not NIC bonding, etc.
  • A team is made of Team Members
  • Team Interfaces are the virtual NICs that can connect to a team and have IP stacks, etc.  You can call them tNICs to differentiate them from vNICs in the Hyper-V world.

image

Team Connection Modes

Most people don’t know the teaming mode they select when using OEM products.  MSFT are clear about what teaming does under the covers.  Connection mode = how do you connect to the switch?

  • Switch Independent can be used where the switch doesn’t need to know anything about the team.
  • Switch dependent teaming is when the switch does need to know something about the team. The switch decides where to send the inbound traffic.

There are 2 switch dependent modes:

  • LACP (Link Aggregation Control Protocol) is where the host and switch agree on who the team members are. IEEE 802.1ax
  • Static Teaming is where you configure it on the switch.

image

Load Distribution Modes

You also need to know how you will spread traffic across the team members in the team.

1) Address Hash comes in 3 flavours:

  • 4-tuple (the default): Uses RSS on the TCP/UDP ports. 
  • 2-tuple: If the ports aren’t available (encrypted traffic such as IPsec) then it’ll go to 2-tuple where it uses the IP address.
  • MAC address hash: If not IP traffic, then MAC addresses are hashed.

2) We also have Hyper-V Port, where it hashes the port number on the Hyper-V switch that the traffic is coming from.  Normally this equates to per-VM traffic.  No distribution of traffic.  It maps a VM to a single NIC.  If a VM needs more pipe than a single NIC can handle then this won’t be able to do it.  Shouldn’t be a problem because we are consolidating after all.

Maybe create a team in the VM?  Make sure the vNICs are on different Hyper-V Switches. 

SR-IOV

Remember that SR-IOV bypasses the host stack and therefore can’t be teamed at the host level.  The VM bypasses it.  You can team two SR-IOV enabled vNICs in the guest OS for LBFO.

Switch Independent – Address Hash

Outbound traffic in Address Hashing will spread across NICs. All inbound traffic is targeted at a single inbound MAC address for routing purposes, and therefore only uses 1 NIC.  Best used when:

  • Switch diversity is a concern
  • Active/Standby mode
  • Heavy outbound but light inbound workloads

Switch Independent – Hyper-V Port

All traffic from each VM is sent out on that VM’s physical NIC or team member.  Inbound traffic also comes in on the same team member.  So we can maximise NIC bandwidth.  It also allows for maximum use of VMQs for better virtual networking performance.

Best for:

  • Number of VMs well exceeds number of team members
  • You’re OK with VM being restricted to bandwidth of a single team member

Switch Dependent Address Hash

Sends on all active members by using one of the hashing methods.  Receives on all ports – the switch distributes inbound traffic.  No association between inbound and outbound team members.  Best used for:

  • Native teaming for maximum performance and switch diversity is not required.
  • Teaming under the Hyper-V switch when a VM needs to exceed the bandwidth limits of a single team member.  Not as efficient with VMQ because we can’t predict the traffic.

Best performance for both inbound and outbound.

Switch Dependent – Hyper-V Port

Sends on all active members using the hashed port – 1 team member per VM.  Inbound traffic is distributed by the switch on all ports, so there is no correlation between inbound and outbound.  Best used when:

  • When number of VMs on the switch well exceeds the number of team members AND
  • You have a policy that says you must use switch dependent teaming.

When using Hyper-V you will normally want to use Switch Independent & Hyper-V Port mode. 

When using native physical servers you’ll likely want to use Switch Independent & Address Hash.  Unless you have a policy that can’t tolerate a switch failure.

Team Interfaces

There are different ways of interfacing with the team:

  • Default mode: all traffic from all VLANs is passed through the team
  • VLAN mode: Any traffic that matches a VLAN ID/tag is passed through.  Everything else is dropped.

Inbound traffic only passes through to one team interface.

image

The only supported configuration for Hyper-V is shown above: Default mode passing through all traffic to the Hyper-V Switch.  Do all the VLAN tagging and filtering on the Hyper-V Switch.  You cannot mix other interfaces with this team – the team must be dedicated to the Hyper-V Switch.  REPEAT: This is the only supported configuration for Hyper-V.

A new team has one team interface by default. 

Any team interfaces created after the initial team creation must be VLAN mode team interfaces (bound to a VLAN ID).  You can delete these team interfaces.

Get-NetAdapter: Get the properties of a team interface

Rename-NetAdapter: rename a team interface

Team Members

  • Any physical ETHERNET adapter with a Windows Logo (for stability reasons and promiscuous mode for VLAN trunking) can be a team member.
  • Teaming of InfiniBand, Wifi, WWAN not supported.
  • Teams made up of teams not supported.

You can have team members in active or standby mode.

Virtual Teams

Supported if:

  • No more than 2 team members in the guest OS team

Notes:

  • Intended for SR-IOV NICs but will work without it.
  • Both vNICs in the team should be connected to different virtual switches on different physical NICs

If you try to team a vNIC that is not on an External switch, it will show up fine and OK until you try to team it.  Teaming will shut down the vNIC at that point. 

You also have to allow teaming in a vNIC in Advanced Properties – Allow NIC teaming.  Do this for each of the VM’s vNICs.  Without this, failover will not succeed. 
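
The same setting can be made from the host with PowerShell – a one-liner, assuming a VM called VM01:

# Enables the "Allow teaming" advanced property on every vNIC of VM01
Set-VMNetworkAdapter -VMName VM01 -AllowTeaming On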

PowerShell CMDLETs for Teaming

The UI is actually using POSH under the hood.  You can use the NIC Teaming UI to remotely manage/configure a server using RSAT for Windows 8.  WARNING: Your remote access will need to run over a NIC that you aren’t altering because you would lose connectivity.

image
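
For reference, creating and inspecting a team from PowerShell looks roughly like this (NIC and team names are examples; the mode/algorithm combination is the Switch Independent / Hyper-V Port one recommended above for Hyper-V):

# Create a two-member team for a Hyper-V switch
New-NetLbfoTeam -Name HostTeam -TeamMembers "NIC1", "NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# View the team and its default team interface (tNIC)
Get-NetLbfoTeam
Get-NetLbfoTeamNic -Team HostTeam

# For a native (non-Hyper-V) team you could add an extra VLAN-bound tNIC
Add-NetLbfoTeamNic -Team HostTeam -VlanID 101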

Supported Networking Features

NIC teaming works with almost everything:

image

TCP Chimney Offload, RDMA and SR-IOV bypass the stack so obviously they cannot be teamed in the host.

Limits

  • 32 NICs in a team
  • 32 teams
  • 32 team interfaces in a team

That’s a lot of quad port NICs.  Good luck with that! 😉

SMB Multichannel

An alternative to a team in an SMB 3.0 scenario.  Can use multiple NICs with same connectivity, and use multiple cores via NIC RSS to have simultaneous streams over a single NIC (RSS) or many NICs (teamed, not teamed, and also with RSS if available).  Basically, leverage more bandwidth to get faster SMB 3.0 throughput.

Without it, a 10 GbE NIC would only be partly used by SMB – a single CPU core trying to transmit.  RSS makes it multi-threaded/multi-core, and therefore many connections are used for the data transfer.

Remember – you cannot team RDMA.  So another case for getting an LBFO effect is to use SMB Multichannel … or I should say “use” … SMB 3.0 turns it on automatically if multiple paths are available between client and server.

SMB 3.0 is NUMA aware.

Multichannel will only use NICs of same speed/type.  Won’t see traffic spread over a 10 GbE and a 1 GbE NIC, for example, or over RDMA-enabled and non-RDMA NICs. 

In tests, the throughput on RSS enabled 10 GbE NICs (1, 2, 3, and 4 NICs), seemed to grow in a predictable near-linear rate.

SMB 3.0 uses a shortest queue first algorithm for load balancing – basic but efficient.

SMB Multichannel and Teaming

Teaming allows for faster failover.  MSFT recommend teaming where applicable.  Address hash port mode with Multichannel can be a nice solution.  Multichannel will detect a team and create multiple connections over the team.

RDMA

If RDMA is possible on both client and server then SMB 3.0 switches over to SMB Direct.  Net monitoring will see negotiation, and then … “silence” for the data transmission.  Multichannel is supported across single or multiple NICs – no NIC teaming, remember!

Won’t Work With Multichannel

  • Single non-RSS capable NIC
  • Different type/speed NICs, e.g. 10 GbE RDMA favoured over 10 GbE non-RDMA NIC
  • Wireless can be failed from but won’t be used in multi-channel

Supported Configurations

Note that Multichannel over a team of NICs is favoured over multichannel over the same NICs that are not in a team.  Added benefits of teaming (types, and fast failover detection).  This applies, whether the NICs are RSS capable or not.  And the team also benefits non-SMB 3.0 traffic.

image

Troubleshooting SMB Multichannel

image
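
The in-box SMB cmdlets are the place to start when checking whether Multichannel is actually kicking in:

Get-SmbClientNetworkInterface    # what the SMB client thinks of each NIC (RSS/RDMA capable, speed)
Get-SmbServerNetworkInterface    # the same view from the file server side
Get-SmbMultichannelConnection    # the channels actually established per server
Get-SmbConnection                # dialect (should be 3.00) and share for each connection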

Plenty to think about there, folks!  Where does it apply in Hyper-V?

  • NIC teaming obviously applies.
  • Multichannel applies in the cluster: redirected IO over the cluster communications network
  • Storing VMs on SMB 3.0 file shares

Windows Server 2012 High-Performance, Highly-Available Storage Using SMB

Notes from TechEd NA 2012 session WSV303:

image

One of the traits of the Scale-Out File Server is Transparent Failover for server-server apps such as SQL Server or Hyper-V.  During a host power/crash/network failure, the IO is paused briefly and flipped over to an alternative node in the SOFS.

image

Transparent Failover

The Witness Service and state persistence enable Transparent Failover in SMB 3.0 SOFS.  The Witness plays a role in unplanned failover.  Instead of waiting for a TCP timeout (40 seconds, causing application issues), it speeds up the process.  It tells the client that the server it was connected to has failed and that it should switch to a different server in the SOFS.

image

NTFS Online Scan and Repair

  • CHKDSK can take hours/days on large volumes.
  • Scan done online
  • Repair is only done when the volume is offline
  • Zero downtime with CSV with transparent repair

Clustered Hardware RAID

Designed for when using JBOD, probably with Storage Spaces.

image

Resilient File System (ReFS)

A new file system as an alternative to NTFS (which is very old now).  CHKDSK is not needed at all.  This will become the standard file system for Windows over the course of the next few releases.

image

Comparing the Performance of SMB 3.0

Wow! SMB 3.0 over 1 Gbps network connection achieved 98% of DAS performance using SQL in transactional processing.

image

If there are multiple 1 Gbps NICs then you can use SMB Multichannel which gives aggregated bandwidth and LBFO.  And go extreme with SMB Direct (RDMA) to save CPU.

VSS and SMB 3.0 File Shares

You need a way to support remote VSS snapshots for SMB 3.0 file shares if supporting Hyper-V.  We can do app consistent snapshots of VMs stored on a WS2012 file server.  Backup just works as normal – backing up VMs on the host.

image

  1. Backup talks to backup agent on host. 
  2. Hyper-V VSS Writer reaches into all the VMs and ensures everything is consistent. 
  3. VSS engine is then asked to do the snapshot.  In this case, the request is relayed to the file server where the VSS snapshot is done. 
  4. The path to the snapshot is returned to the Hyper-V host and that path is handed back to the backup server. 
  5. The backup server can then choose to either grab the snapshot from the share or from the Hyper-V host.

Data Deduplication

Dedup is built into Windows Server 2012.  It is turned on per-volume.  You can exclude folders/file types.  By default files not modified in 5 days are deduped – SO IT DOES NOT APPLY TO RUNNING VMs.  It identifies redundant data, compresses the chunks, and stores them.  Files are deduped automatically and reconstituted on the fly.

image

REPEAT: Deduplication is not intended for running virtual machines.
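
Where it does apply (file shares, libraries, software repositories), turning it on is only a few lines of PowerShell – a sketch, assuming the data sits on an E: volume:

Install-WindowsFeature FS-Data-Deduplication

Enable-DedupVolume -Volume E:
Set-DedupVolume -Volume E: -MinimumFileAgeDays 5     # the 5-day policy mentioned above
Start-DedupJob -Volume E: -Type Optimization         # kick off an optimisation run now
Get-DedupStatus                                      # check the savings per volume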

Unified Storage

The iSCSI target is now built into WS2012 and can provide block storage for Hyper-V before WS2012. ?!?!?!  I’m confused.  Can be used to boot Hyper-V hosts – probably requiring iSCSI NICs with boot functionality.

image
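
The target is manageable with PowerShell too – a rough sketch with made-up names (treat the exact parameters as a starting point and check Get-Help on your build):

# Feature + a target restricted to one initiator IQN
Install-WindowsFeature FS-iSCSITarget-Server
New-IscsiServerTarget -TargetName HyperVBoot -InitiatorIds "IQN:iqn.1991-05.com.microsoft:host1.demo.internal"

# Create a VHD-backed LUN and map it to the target
New-IscsiVirtualDisk -Path C:\iSCSIVirtualDisks\Host1Boot.vhd -Size 40GB
Add-IscsiVirtualDiskTargetMapping -TargetName HyperVBoot -Path C:\iSCSIVirtualDisks\Host1Boot.vhd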

Building a Highly Available Failover Cluster Solution With WS2012 From The Ground Up

Some notes taken from TechEd NA 2012 WSV324:

image

I won’t blog too much from this session.  I’ve more than covered a lot of it in the recent months.

Cluster Validation Improvements

  • Faster storage validation
  • Includes Hyper-V cluster validation tests
  • Granular control to validate a specific LUN
  • Verification of CSV requirements
  • Awareness of replicated hardware for multi-site clusters

CSV Improvements

  • No external authentication dependencies for improved performance and resiliency
  • Multi-subnet support (multi-site clusters)

Asymmetric Cluster

image

BitLocker on CSV

This will get the BitLocker status of the CSV:

manage-bde -status C:\ClusterStorage\Volume1

This will enable BitLocker on a CSV:

manage-bde -on C:\ClusterStorage\Volume1 -RecoveryPassword

You get a warning if you try to run this with the CSV online.  You need the volume to be offline (Turn On Maintenance Mode under More Actions when you right-click the CSV) … so plan this in advance.  Otherwise be ready to do lots of Storage Live Migration or have VM downtime. 

NOTE! A recovery password is created for you.  Make sure you record this safely in a place independent from the cluster that is secure and reliable.

Get the status again to check the progress.

It’s critically important that you add the security descriptor for the cluster so that the cluster can use the now encrypted CSV.  Get that by:

get-cluster

Say that returns the name HV-Cluster1.

Now run the following, and note the $ at the end of the security descriptor (indicating computer account for the cluster):

manage-bde -protectors -add C:\ClusterStorage\Volume1 -sid HV-Cluster1$

That can be done while the CSV is encrypting.  Once encrypted, you can take it out of maintenance mode.

AD Integration

  • You now can intelligently place Cluster Name Objects (CNO) and Virtual Computer Objects (VCO) in desired OUs. 
  • AD-less Cluster Bootstrapping allows you to run/start a cluster with no physical domain controllers.  This gets justifiable applause 🙂 It’s great news for branch offices and SMEs.
  • Repair action to automatically recreate VCOs
  • Improved logging and diagnostics
  • RODC support for DMZ and branch office deployments

Node Vote Weight

  • In a stretch or multi-site cluster, you can configure which nodes have votes in determining quorum.
  • Configurable with 1 or 0 votes.  All nodes have a vote by default.  Does not apply in Disk Only quorum model.
  • In the multi-site cluster model, this allows the primary site to have the majority of votes.

Dynamic Quorum

  • It is now the default quorum choice in WS2012 Failover Clustering
  • Works in all quorum models except Disk Only Quorum.
  • Quorum changes dynamically based on nodes in active membership
  • Numbers of votes required for quorum changes as nodes go inactive
  • Allows the cluster to stay operational with >50% node count failure

Thoughts:

  • I guess it is probably useful for extremely condensed cluster dynamic power optimisation (VMM 2012)
  • Also should enable cluster to reconfigure itself when there are node failures

Configuration:

EnableDynamicQuorum – a cluster common property, edit it to enable dynamic quorum

DynamicWeight – a node private property, view it to see a node’s current vote weight
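
From PowerShell the same settings show up as cluster and node properties – a quick look (the property names here are as I understand them, so verify on your build):

(Get-Cluster).DynamicQuorum                          # 1 = dynamic quorum enabled (the WS2012 default)
Get-ClusterNode | Format-Table Name, NodeWeight, DynamicWeight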

Cluster Scheduled Tasks

3 types:

  • Cluster wide: On all nodes
  • Any node: On a random node
  • Resource specific: On the node that owns the resource

PowerShell:

  • Register-ClusteredScheduledTask
  • Unregister-ClusteredScheduledTask
  • Set-ClusteredScheduledTask
  • Get-ClusteredScheduledTask
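
A minimal example of registering a cluster-wide task (the task name, script path, and schedule are all made up):

$action  = New-ScheduledTaskAction -Execute "C:\Scripts\Cleanup.cmd"
$trigger = New-ScheduledTaskTrigger -Daily -At 3am

# ClusterWide runs on all nodes; AnyNode and ResourceSpecific are the other TaskTypes
Register-ClusteredScheduledTask -TaskName NightlyCleanup -TaskType ClusterWide -Action $action -Trigger $trigger

Get-ClusteredScheduledTask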

Windows Server 2012 Cluster-In-A-Box, RDMA, And More

Notes taken from TechEd NA 2012 session WSV310:

image

Volume Platform for Availability

Huge amount of requests/feedback from customers.  MSFT spent a year focusing on customer research (US, Germany, and Japan) with many customers of different sizes.  They came up with Continuous Availability – zero data loss and transparent failover – to succeed High Availability.

Targeted Scenarios

  • Business in a box Hyper-V appliance
  • Branch in a box Hyper-V appliance
  • Cloud/Datacenter high performance storage server

What’s Inside A Cluster In A Box?

It will be somewhat flexible.  MSFT giving guidance on the essential components so expect variations.  MSFT noticed people getting cluster networking wrong so this is hardwired in the box.  Expansion for additional JBOD trays will be included.  Office level power and acoustics will expand this solution into the SME/retail/etc.

image

Lots of partners can be announced and some cannot yet:

  • HP
  • Fujitsu
  • Intel
  • LSI
  • Xio
  • And more

More announcements to come in this “wave”.

Demo Equipment

They show some sample equipment from two Original Design Manufacturers (they design and sell into OEMs for rebranding).  One with SSD and InfiniBand is shown.  A more modest one is shown too:

image

That bottom unit is a 3U cluster in a box with 2 servers and 24 SFF SAS drives.  It appears to have additional PCI expansion slots in a compute blade.  We see it in a demo later and it appears to have JBOD (mirrored Storage Spaces) and 3 cluster networks.

RDMA aka SMB Direct

Been around for quite a while but mostly restricted to the HPC space.  WS2012 will bring it into wider usage in data centres.  I wouldn’t expect to see RDMA outside of the data centre too much in the coming year or two.

RDMA enabled NICs also known as R-NICs.  RDMA offloads SMB CPU processing in large bandwidth transfers to dedicated functions in the NIC.  That minimises CPU utilisation for huge transfers.  Reduces the “cost per byte” of data transfer through the networking stack in a server by bypassing most layers of software and communicating directly with the hardware.  Requires R-NICs:

  • iWARP: TCP/IP based.  Works with any 10 GbE switch.  RDMA traffic routable.  Currently (WS2012 RC) limited to 10 Gbps per NIC port.
  • RoCE (RDMA over Converged Ethernet): Works with high-end 10/40 GbE switches.  Offers up to 40 Gbps per NIC port (WS2012 RC).  RDMA not routable via existing IP infrastructure.  Requires DCB switch with Priority Flow Control (PFC).
  • InfiniBand: Offers up to 54 Gbps per NIC port (WS2012 RC). Switches typically less expensive per port than 10 GbE.  Switches offer 10/40 GbE uplinks. Not Ethernet based.  Not routable currently.  Requires InfiniBand switches.  Requires a subnet manager on the switch or on the host.

RDMA can also be combined with SMB Multichannel for LBFO.

image

Applications (Hyper-V or SQL Server) do not need to change to use RDMA; the decision to use SMB Direct is made at run time.

Partners & RDMA NICs

  • Mellanox ConnectX-3 Dual Port Adapter with VPI InfiniBand
  • Intel 10 GbE iWARP Adapter For Server Clusters NE020
  • Chelsio T3 line of 10 GbE Adapters (iWARP), have 2 and 4 port solutions

We then see a live demo of 10 Gigabytes (not Gigabits) per second over Mellanox InfiniBand.  They pull 1 of the 2 cables and throughput drops to around 6,000 Megabytes per second.  Pop the cable back in and flow returns to normal.  CPU utilisation stays below 5%.

Configurations and Building Blocks

  • Start with single Cluster in a Box, and scale up with more JBODs and maybe add RDMA to add throughput and reduce CPU utilisation.
  • Scale horizontally by adding more storage clusters.  Live Migrate workloads, spread workloads between clusters (e.g. fault tolerant VMs are physically isolated for top-bottom fault tolerance).
  • DR is possible via Hyper-V Replica because it is storage independent.
  • Cluster-in-a-box could also be the Hyper-V cluster.

This is a flexible solution.  Manufacturers will offer new refined and varied options.  You might find a simple low cost SME solution and a more expensive high end solution for data centres.

Hyper-V Appliance

This is a cluster in a box that is both Scale-Out-File Server and Hyper-V cluster.  The previous 2 node Quanta solution is set up this way.  It’s a value solution using Storage Spaces on the 24 SFF SAS drives.  The space are mirrored for fault tolerance.  This is DAS for the 2 servers in the chassis.

What Does All This Mean?

SAN is no longer your only choice, whether you are SME or in the data centre space.  SMB Direct (RDMA) enables massive throughput.  Cluster-in-a-Box enables Hyper-V appliances and Scale-Out File Servers in ready made kits, that are continuously available and scalable (up and out).

Cluster Shared Volumes Reborn in WS2012: Deep Dive

Notes from TechEd North America 2012 session WSV430:

image

New in Windows Server 2012

  • File services is supported on CSV for application workloads.  Can leverage SMB 3.0 and be used for transparent failover Scale-Out File Server (SOFS)
  • Improved backup/restore
  • Improved performance with block level I/O redirection
  • Direct I/O during backup
  • CSV can be built on top of Storage Spaces

New Architecture

  • Antivirus and backup filter drivers are now compatible with CSV.  Many are already compatible.
  • There is a new distributed application consistent backup infrastructure.
  • ODX and spot fixing are supported
  • BitLocker is supported on CSV
  • AD is no longer a dependency (!?) for improved performance and resiliency.

Metadata Operations

Lightweight and rapid.  Relatively infrequent with VM workloads.  Require redirected I/O.  Includes:

  • VM creation/deletion
  • VM power on/off
  • VM mobility (live migration or storage live migration)
  • Snapshot creation
  • Extending a dynamic VHD
  • Renaming a VHD

Parallel metadata operations are non-disruptive.

Flow of I/O

  • For non-metadata IO: Data sent to the CSV Proxy File System.  It then routes to the disk via CSV VolumeMgr via direct IO.
  • For metadata redirected IO (see above): We get SMB redirected IO on non-orchestrator (not the CSV coordinator/owner for the CSV in question) nodes.  Data is routed via SMB redirected IO by the CSV Proxy File System to the orchestrator via the cluster communications network so the orchestrator can handle the activity.

image

Interesting Note

You can actually rename C:\ClusterStorage\Volume1 to something like C:\ClusterStorage\CSV1.  That’s supported by CSV.  I wonder if things like System Center support this?
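
It really is just a rename of the mount point folder, e.g.:

Rename-Item -Path C:\ClusterStorage\Volume1 -NewName CSV1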

Mount Points

  • Used custom reparse points in W2008 R2.  That meant backup needed to understand these.
  • Switched to standard Mount Points in WS2012.

Improved interoperability with:

  • Performance counters
  • OpsMgr (never had free space monitoring before)
  • Free space monitoring (speak of the devil!)
  • Backup software can understand mount points.

CSV Proxy File System

Appears as CSVFS instead of NTFS in Disk Management.  NTFS under the hood.  Enables applications and admins to be CSV aware.

Setup

No opt-in any more.  CSV enabled by default.  Appears in normal storage node in FCM.  Just right click on available storage to convert to CSV.

Resiliency

CSV enables fault-tolerant file handles.  Storage path fault tolerance, e.g. HBA failure.  When a VM opens a VHD, it gets a virtual file handle that is provided by CSVFS (metadata operation).  The real file handle is opened under the covers by CSV.  If the HBA that the host is using to connect the VM to the VHD fails, then the real file handle needs to be recreated.  This new handle is mapped to the existing virtual file handle, and therefore the application (the VM) is unaware of the outage.  We get transparent storage path fault tolerance.  The fault tolerant SAN connectivity (remember that the direct connection via HBA has failed and would otherwise have failed the VM’s VHD connection) is re-routed by Redirected IO via the Orchestrator (CSV coordinator), which “proxies” the storage IO to the SAN.

image

If the Coordinator node fails, IO is queued briefly and the orchestration role fails over to another node.  No downtime in this brief window.

If the private cluster network fails, the next available network is used … remember you should have at least 2 private networks in a CSV cluster … the second private network would be used in this case.

Spot-Fix

  • Scanning is separated from disk repair.  Scanning is done online.
  • Spot-fixing requires the volume to be offline only for the repair.  It is based on the number of errors to fix rather than the size of the volume … could be 3 seconds.
  • This offline does not cause the CSV to go “offline” for applications (VMs) using that CSV being repaired.  CSV proxy file system virtual file handles appear to be maintained.

This should allow for much bigger CSVs without chkdsk concerns.

CSV Block Cache

This is a distributed write-through cache.  Un-buffered IO is targeted.  This is excluded by the Windows Cache Manager (buffered IO only).  The CSV block cache is consistent across the cluster.

This has a very high value for pooled VDI VM scenario.  Read-only (differencing) parent VHD or read-write differencing VHDs.

You configure the memory for the block cache on a cluster level.  512 MB per host appears to be the sweet spot.  Then you enable CSV block cache on a per CSV basis … focus on the read-performance-important CSVs.
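
The configuration is done with cluster properties – a sketch based on the property names as I noted them (verify them with Get-ClusterParameter on your build):

# Cluster-wide cache size in MB (512 MB per node is the suggested sweet spot)
(Get-Cluster).SharedVolumeBlockCacheSizeInMB = 512

# Then enable the cache per CSV, e.g. the CSV that holds the pooled VDI parent disks
Get-ClusterSharedVolume "Cluster Disk 1" | Set-ClusterParameter CsvEnableBlockCache 1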

Less Redirected IO

  • New algorithm for detecting type of redirected IO required
  • Uses oplocks as a distributed locking mechanism to determine if IO can go via the direct path

Comparing speeds:

  • Direct IO: Block level IO performance parity
  • Redirected IO: Remote file system (SMB 3.0)  performance parity … can leverage multichannel and RDMA

Block Level Redirection

This is new in WS2012 and provides a much faster redirected IO during storage path failure and redirection.  It is still using SMB.  Block level redirection goes directly to the storage subsystem and provides 2x disk performance.  It bypasses the CSV subsystem on the coordinator node – SMB redirected IO (metadata) must go through this.

image

You can speed up redirected IO using SMB 3.0 features such as Multichannel (many NICs and RSS on single NICs) and RDMA.  With all the things turned on, you should get 98% of the performance of direct IO via SMB 3.0 redirected IO – I guess he’s talking about Block Level Redirected IO.

VM Density per CSV

  • Orchestration is done on a cluster node (parallelized) which is more scalable than file system orchestration.
  • Therefore there are no limits placed on this by CSV, unlike in VMFS.
  • How many IOPS can your storage handle, versus how many IOPS do your VMs need?
  • Direct IO during backup also simplifies CSV design.

If your array can handle it, you could (and probably won’t) have 4,000 VMs on a 64 node cluster with a single CSV.

CSV Backup and Restore Enhancements

  • Distributed snapshots: VSS based application consistency.  Created across the cluster.  Backup applications query the CSV to do an application consistent backup.
  • Parallel backups can be done across a cluster: Can have one or more concurrent backups on a CSV.  Can have one or more concurrent CSV backups on a single node.
  • CSV ownership does not change.  There is no longer a need for redirected IO during backup.
  • Direct IO mode for software snapshots of the CSV – when there is no hardware VSS provider.
  • Backup no longer needs to be CSV aware.

Summary: We get a single application consistent backup snapshot of multiple VMs across many hosts using a single VSS snapshot of the CSV.  The VSS provider is called on the “backup node” … any node in the cluster.  This is where the snapshot is created.  Will result in less data being transmitted, fewer snapshots, quicker backups.

How a CSV Backup Works in WS2012

  1. Backup application talks to the VSS Service on the backup node
  2. The Hyper-V writer identifies the local VMs on the backup node
  3. Backup node CSV writer contacts the Hyper-V writer on the other hosts in cluster to gather metadata of files being used by VMs on that CSV
  4. CSV Provider on the backup node contacts the Hyper-V Writer to quiesce the VMs
  5. Hyper-V Writer on the backup node also quiesces its own VMs
  6. VSS snapshot of the entire CSV is created
  7. The backup tool can then backup the CSV via the VSS snapshot

image

More VMware Compete Wins For Hyper-V

VMware made a cute video to defend themselves against Windows Server 2012 Hyper-V.  But MSFT continues to hand out a GTA IV style baseball beat down at TechEd.

This post would have been impossible without the tweeted pictures by David Davis at http://www.vmwarevideos.com

General Feature Comparison

Does your business have an IT infrastructure so you can play, or to run applications?  What features have you got to improve those services?

Capability | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
Incremental backups | No | Yes | Yes
Inbox VM replication | No | No | Yes
NIC teaming | Yes | Yes | Yes
Integrated High Availability | No | Yes | Yes
Guest OS Application Monitoring | N/A | No | Yes
Failover Prioritization | N/A | Yes | Yes
Affinity & Anti-Affinity Rules | N/A | Yes | Yes
Cluster-Aware Updating | N/A | Yes | Yes

So Hyper-V has more application integrations.

Live Migration

Capability | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
VM Live Migration | No | Yes | Yes
1 GbE Simultaneous Live Migrations | N/A | 4 | Unlimited
10 GbE Simultaneous Live Migrations | N/A | 8 | Unlimited
Live Storage Migration | No | Yes | Yes
Shared Nothing Live Migration | No | No | Yes
Network Virtualisation | No | Partner | Yes

Shared-nothing Live Migration is actually a big deal.  We know that 33% of businesses don’t cluster their hosts, and another 33% have a mix of clustered and non-clustered hosts.  Shared-Nothing Live Migration enables mobility across these platforms.  Flexibility is the #2 reason why people virtualise (see Network Virtualisation later on).

Clustering

Can you cluster hosts, and if so, how many?  How many VMs can you put on a host cluster?  Apps require uptime too, because VMs need to be patched, rebooted, and occasionally crash.

Capability | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
Nodes/Cluster | N/A | 32 | 64
VMs/Cluster | N/A | 3000 | 4000
Max Size iSCSI Guest Cluster | N/A | 0 | 64 Nodes
Max Size Fibre Channel Guest Cluster | 2 Nodes | 2 Nodes | 64 Nodes
Max Size File Based Guest Cluster | 0 | 0 | 64 Nodes
Guest Clustering with Live Migration Support | N/A | No | Yes
Guest Clustering with Dynamic Memory Support | No | No | Yes

Based on this data, WS2012 Hyper-V is the superior platform for scalability and fault tolerance.

Virtual Switches

In a cloud, the virtual switch plays a huge role.  How do they stack up against each other?

Capability | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
Extensible Switch | No | Replaceable | Yes
Confirmed partner extensions | No | 2 | 4
PVLAN | No | Yes | Yes
ARP/ND Spoofing Protection | No | vShield/Partner | Yes
DHCP Snooping Protection | No | vShield/Partner | Yes
Virtual Port ACLs | No | vShield/Partner | Yes
Trunk Mode to VMs | No | No | Yes
Port Monitoring | Per Port Group | Yes | Yes
Port Mirroring | Per Port Group | Yes | No

Another win for WS 2012 Hyper-V.  Note that vShield is an additional purchase on top of vSphere.  Hyper-V is the clear feature winner in cloud networking.

Network Optimisations

Capability | vSphere Free | vSphere 5.0 Ent+ | WS2012 Hyper-V
Dynamic Virtual Machine Queue (DVMQ) | NetQueue | NetQueue | Yes
IPsec Task Offload | No | No | Yes
SR-IOV | DirectPath I/O | DirectPath I/O | Yes
Storage Encryption (CSV vs VMFS) | No | No | Yes
  • NetQueue supports a subset of the VMware HCL
  • Apparently DirectPath I/O VMs cannot vMotion (Live Migrate) without certain Cisco UCS (blade server centres) configurations
  • No physical security for VMFS SANs in the data center or colocated hosting

Hyper-V wins on the optimisation side of things for denser and higher throughput network loads.

VMware Fault Tolerance

FT feature: run a hot standby VM on another host, which takes over if the primary VM’s host should fail.

Required sacrifices:

  • 4 FT VMs per host with no memory overcommit: expensive because of low host density
  • 1 vCPU per FT VM: Surely VMs that require FT would require more than one logical processor (physical thread of execution)?
  • EPT/RVI (SLAT) disabled: No offloaded memory management.  This boosts VM performance by around 20% so I guess this FT VM doesn’t require performance.
  • Hot-plug disabled: no hot adding devices such as disks
  • No snapshots: not such a big deal for a production VM in my opinion
  • No VCB (VSS) backups: This is a big deal, because now you have to do a traditional “iron” backup of the VM, requiring custom backup policy, discarding the benefits of storage level backup for VMs

If cost reduction is the #1 reason for implementing virtualisation, then VMware FT seems like a complete oxymoron to me.  VMware FT is a chocolate kettle.  It sounds good, but don’t try to boil water with it.

VMware Autodeploy

Centrally deploy a Hypervisor from a central console.

We have System Center 2012 Virtual Machine Manager for bare metal deployment.  Yes, it’s a bit more complex to set up.  B-u-t … with converged fabrics in WS2012, Hyper-V networking is actually getting much easier.

And even with System Center 2012 Datacenter, the MSFT solution is way cheaper than the vSphere alternative, and provides a complete cloud in the package, whereas vSphere is only the start of your vTaxation for disparate point solutions that contradict desires for a deeply integrated, automated, connected, self-service infrastructure.

More Stuff

I didn’t see anything on SRM versus Hyper-V Replica but I guess it was probably discussed.  SRM is allegedly $250-$400 per VM.  Hyper-V Replica is free and even baked into the free Hyper-V Server.  And Hyper-V Replica works with cloud vendors as well as internal sites.  Orchestration of failover can be done manually, by very simple PowerShell scripts, or with System Center 2012 Orchestrator (demonstrated in day 1 keynote).

I don’t know anything about vSphere support for Infiniband and RDMA, both supported by WS2012.  In fact, today it was reported that WS2012 RC Hyper-V benchmarked at 10.36 GigaBYTES/second (not Gbps) with 4.6% CPU overhead.

I also don’t know if VMware supports network abstraction, as in Hyper-V Network Virtualisation, essential for mobility between different networks and cloud consolidation/migration.

Take some time to review the new features in WS2012 Hyper-V.

TechEd North America 2012 Day 2 Keynote

Antoine Leblond, Corporate Vice President is speaking, and the topic is Windows 8.

Over 600,000,000 copies of Windows 7 have been sold.  The enterprise features of Windows 8 are based on, but evolved from, Windows 7.  We have moved on from the desktop-centric world when Windows 7 was launched.  Over 75% of consumer machines being bought in the USA this year are laptops.  Next year it is projected that tablets will outsell PCs.  More machines will run off the battery than off AC power.  Every microwatt of power saved extends the battery life of the machine.  Tablets = touch UI.  If the projections are right, then touch becomes the primary UI.

Connectivity is ubiquitous.  We have moved from a world of local content to a world of multi-cloud stored data: flickr, facebook, Skydrive, Office365, and many others.

The hard split between how I use a machine at home and how I use a machine at work has been blurred or completely dissolved.  Users have reimagined how they use PCs, and Microsoft has reimagined Windows.

Demo Business Apps

We see a bunch of bespoke apps with live tiles.  Info is flashed up so the user can see current status.  The dev has used semantic zoom … a conceptual zoom rather than a graphic zoom.

A CRM app uses GPS sensor to find out where the sales person is, and then shows the location of customers in a map.  Clever.

Linda Averett

Demo on Samsung Ultrabook with mouse/keyboard and a “modern touchpad”.  The Windows 8 gestures are recognised by the touchpad.  Kind of Mac-like I guess, handy if you don’t have touch screen – or are one of those OCD people who hates fingerprints on their screen.

A NewEgg app is shown, with search, filter and contracts being shown off.

Antoine Leblond

Now we see a sales pipeline automation app that is a beta/test app by SAP.  Looks very sexy … and it’s by SAP!  What an oxymoron!  Using touch, the user can explore the data that is graphically presented, changing variables and seeing the results.  Don’t think columns and rows of numbers.  It was all imagery that was designed for exploring and touch.

Linda Averett – Business Features

She has a Lenovo laptop, but it has a touch screen.  Windows 7 is running in a Hyper-V VM on Windows 8.  As you should know by reading here, Hyper-V is in Windows 8 Pro and Enterprise.  It seems to get the biggest cheer of anything in the keynotes so far (the audience has been very quiet these 2 days).  Cut The Rope is running in IE 9 in the Win7 VM.

BitLocker (AES256 full disk encryption) is shown off – it and BitLocker-To-Go are now in Windows 8 Pro, not just in the Enterprise edition.  Great for customers – not great for those of us trying to sell Software Assurance 🙂

Then lots of dev stuff and then the end of the keynote.
