Cisco & The Windows 8 Hyper-V Extensible Switch

In Windows Server 2008/R2 Hyper-V, a virtual network was the term used to describe the switch that connected a physical network card to the port of a VM’s virtual network adapter.  That has changed in Windows Server 8; it is now referred to as a virtual switch or, to be more precise, the extensible virtual switch.

Why extensible?  Microsoft has made it possible for 3rd party software developers to plug into the switch and add more functionality.  One such example is Cisco, who have developed a switch extension.  To put it simply, using extensions, you can extend your Cisco network into Hyper-V networking.  I heard about it on Twitter, and then I heard that Cisco had a booth at Build Windows, so I went to talk to them and got a demo.

Wait a moment: I have been asked the next question twice when working with senior Cisco network engineers.  I asked Cisco the question, and their eyes rolled; they’d heard it non-stop since opening the booth :)  How will virtual switches – to be precise, Cisco virtual switches – deal with spanning tree?  The answer was that “they will break the loop”, so there should be no problem.

The core advantage for customers that do this is that they can use a single management solution and skill set to manage all of their networking.  In the demo, I was shown how everything about the virtual switch in the Cisco command line console was very similar, if not identical, to managing a physical switch.

Additionally you get the power and configurability of Cisco networking.  For example, in a GUI, you could create Port policies to dictate:

  • What a port could talk to
  • What protocol it could use
  • Etc

You assigned a policy to the port and suddenly it was filtering – but this was all done using Cisco tools that network admins already know.  Another integration was VLAN support for ports.
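For illustration, a port policy like that looks roughly like the following in Cisco’s CLI.  This is a hedged sketch based on the Nexus 1000V port-profile syntax – the profile name, VLAN number, and ACL name are invented for the example:

```
port-profile type vethernet WebTier
  switchport mode access
  switchport access vlan 100          ! VLAN assignment for ports using this profile
  ip port access-group WebTier-ACL in ! filter what the port can talk to
  no shutdown
  state enabled                       ! publish the profile so VM ports can use it
```

A profile like this would then be assigned to VM ports, giving the filtering behaviour described above using tools the network admins already know.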

Pretty powerful stuff!

How Many NICs for Clustered Windows Server 8 Hyper-V?

If you asked me, any Hyper-V expert, or Microsoft that question about Windows Server 2008 R2, then it was easy: either 4 or 6 (or 8 or 10 with NIC teaming), depending on whether you used iSCSI (2 NICs with MPIO) or not.  Ask that question about Windows Server 8 and the answer is … it depends.

You do have several roles that need to be serviced with network connections:

  • Parent
  • Cluster/Storage
  • Live Migration
  • Hyper-V Extensible Switch (note that what we called a virtual network is now a virtual switch – a virtual network is an abstraction or virtualisation now).  This is probably serviced by 2 NICs with NIC teaming (done by Windows)

How these connections are physically presented to the network really does depend on the hardware in your server, whether you need physical fabric isolation or not (the trend is towards fabric convergence to reduce physical fabric complexity and cost), and whether you want to enable NIC teaming or not.

Here’s a converged example from yesterday’s Build Windows sessions that uses fault tolerant 10 GbE NICs (teamed by Windows Server 8). 

image

All of the networking functions have port connections into the Hyper-V Extensible Switch.  The switch is bound to two 10 GbE network adapters in the host server.  NIC teaming provides network path fault tolerance (in my experience, a switch is more likely to die than a NIC now).  QoS ensures that each connection gets the necessary bandwidth – I reckon the minimum bandwidth option is probably best here because it provides a service guarantee and allows bursting when capacity is available.  Port ACLs can be used to control what a connection can connect to, providing network isolation.
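A hedged sketch of how I’d expect this converged design to be built in PSH – the cmdlet names are based on what was shown in the developer preview and may change before release, and the team, switch, and weight values are my own examples:

```powershell
# Team the two 10 GbE NICs using the new native (Windows) NIC teaming
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1","NIC2"

# Bind the Hyper-V Extensible Switch to the team, with weight-based minimum bandwidth QoS
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "ConvergedTeam" -MinimumBandwidthMode Weight

# Create a virtual NIC in the parent for each role (parent, cluster, Live Migration, etc.)
# and reserve a minimum share of the pipe for it
Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "ConvergedSwitch"
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 20
```

The same pattern would repeat for the other connections, with the weights dividing up the teamed 10 GbE pipe.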

The reason that MSFT highlighted this example is that it is a common hardware configuration now.  If you buy HP blades, you can do some of this now with their Flex-10 solution.  Microsoft are recommending 10 GbE for future proofing, and you can use 2 NICs and 2 physical switch ports with NIC teaming and network fault tolerance, instead of the 10 NICs and 10 switch ports of the 1 GbE alternative!

A lot of examples were shown.  This one goes down a more traditional route with physical isolation:

image

Most servers come with 4 * 1 GbE NICs by default.  You could take the above example and use just 1 * 1 GbE NIC for the Hyper-V Extensible Switch if budget was an issue, but you’d lose NIC teaming.  You could add NIC teaming to that example by adding another 1 GbE NIC (giving a total of 5 * 1 GbE NICs).

The answer to the “how many NICs” question is, fortunately and unfortunately, a consultant’s answer: it depends.

Looking Back on Day 3 at Build Windows … Plus More!

Today was storage day at Build for me.  I attended 1.5 Hyper-V networking sessions and filled out the rest of the day with clustering and storage (which are pretty much one and the same now).  The highlights:

  • CSV backup in Windows Server 8 does not use Redirected I/O
  • The storage vendors were warned to increase the size of their SCSI-3 persistent reservation tables (much bigger cluster support now from Microsoft, and more opportunity to use the SAN)
  • Storage Pool and File Share Clustering … well let me dig deeper ….

image

Investing in a virtualisation cluster is a pricey deal for anyone because of the cost of SAS/iSCSI/FC SANs.  Even a starter kit with just a few TB of disk will be the biggest investment in IT that most small/medium businesses will ever make.  And it requires a bunch of new skills, management systems, and procedures.   The operations of LUN deployment can slow down a cloud’s ability to respond to business demands.

Microsoft obviously recognised this several years ago and started working on Storage Pools and Spaces.  The idea here is that you can take a JBOD (just a bunch of disks, which can be internal or DAS) or disks on an existing SAN, and create a storage pool.  That is an aggregation of disks.  You can have many of these for isolation of storage class, administrative delegation, and so on.  From the pool, you create Storage Spaces.  These are VHDX files AFAIK on the disk, and they can be mounted as volumes by servers.

In this new style of Hyper-V cluster design, you can create a highly available file server cluster with transparent failover.  That means failover is instant, thanks to a Witness (which informs a server connecting to the cluster if a node fails and tells it to connect to an alternative).  For something like Hyper-V, you can set your cluster up with active-active clustering of the file shares, and this uses CSV (CSV is no longer just for storing Hyper-V VMs).  The connecting clients (which are servers) can be load balanced using PowerShell scripting (which could be a scheduled task).

Note: active/passive file share clustering (not using CSV) is recommended when there are lots of little files, when implementing end user file shares, and when there is a lot of file metadata activity.

Now you can create a Hyper-V cluster which uses the UNC paths of the file share cluster to store VMs.
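In other words, something like this hedged example should become possible – the share path and VM name are invented, and the cmdlet syntax is from the developer preview, so it may change:

```powershell
# Store a new VM's configuration and virtual disk on an SMB share
# hosted by the file server cluster, instead of on a LUN
New-VM -Name "SQL01" -MemoryStartupBytes 4GB -Path "\\FSCluster\VMs" `
    -NewVHDPath "\\FSCluster\VMs\SQL01\Disk0.vhdx" -NewVHDSizeBytes 60GB
```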

This is all made possible by native NIC teaming, SMB 2.2, RDMA, and offloading technologies.

The result is actually a much cheaper storage solution than you could get with a starter kit SAN, and it would probably include much more storage space.  It is more flexible and more economic.  One of the examples we were shown had the file server cluster also hosting other shares for SQL Server files and end user file shares.

Brian Ehlert (@BrianEh) said it best: file servers are now cool.

Asymmetric Hyper-V Cluster

Elden Christensen briefly mentioned this one in his talk and I asked him about it at Ask The Experts.  The idea is that you take the above design, but only a single Windows cluster is used.  It is used to cluster the VMs and to cluster the file share(s).  This flattens the infrastructure, reduces the number of servers, and thus reduces the cost.  This one would be of great interest to small and medium businesses, as well as corporate branch offices.

Self Healing CSV

Didier van Hoye (@workinghardinit) and I once had a chat about sizing of CSVs.  He brought up the point that no one wanted to take a CSV offline for a weekend to chkdsk a multi-terabyte CSV volume.  True!

Microsoft have now implemented a solution to this in Windows Server 8:

  • Every 60 seconds, the health of the CSV volume is assessed.
  • If a fault is found, Windows will target that fault for a fix.
  • Windows will dismount the volume, and start caching VM write activity.
  • With the CSV offline, Windows will start fixing the fault.  It has an 8 second window.
  • If the fault is fixed the volume is brought back online and the storage activity cache is pushed out.
  • If the fault is not fixed, the volume is brought back online, and Windows will take another 8 second window later to continue fixing the fault.  Eventually the fault is fixed over one or more cumulative 8 second attempts.

VDI Changes

It seems like the VDI management/broker architecture will be getting much simpler.  We’re also getting some performance boosts to deal with the 9am disk storm.  Pooled VMs will be based on a single VHD.  Each created pooled VM will actually be a differencing disk.  When a pooled VM is booted up on a host, a differencing disk is created and cached on the host.  The disk is stored on an SSD in the host.  Because it’s a differencing disk, it should be tiny, holding probably no more than the user’s state.  Using local high IOPS SSD massively improves performance over accessing AVHDs on the SAN, and takes care of the 9am storage storm.

Designing Systems for Continuous Availability – Multi-Node with Remote File Storage

The speakers are Jim Pinkerton and Claus Jorgensen.

The topic is using SMB for remote storage of application files. Servers access their files on UNC paths. Examples: VM VHDs, SQL Server database and log files. It is easier to provision and manage shares than LUNs. More flexible with dynamic server relocation. No need for specialised hardware/network knowledge or infrastructure. Lower cost.

Basic idea of the architecture: some shared storage (e.g. Storage Spaces), a file server cluster with shares, and Hyper-V cluster hosts, SQL Server, or other servers storing files on those shares.

Transparent Failover
In W2008 R2 a failover is not transparent. There is brief downtime to take down, move over, and bring up the clustered service or role. 99% uptime at best.

Failover in W8 is transparent to the server application. Planned and unplanned failovers are supported, e.g. maintenance, failures, and load balancing. Requires Windows Failover Clustering, and both server and client must be running Windows Server 8. All operations, not just IO, must be continuous and transparent – transparent for file and directory operations.

This means we can have an application cluster that places data on a back end file server cluster. Both can scale independently.

Changes to Windows Server 8 to make transparent failover possible:
– New protocol: SMB 2.2
– SMB 2.2 Client (redirector): client operation replay, end-to-end for replay of idempotent and non-idempotent operations
– SMB 2.2 Server: support for network state persistence, a single share spans multiple nodes (active/active shares – wonder if this is made possible by CSV?), files are always opened write-through.
– Resume Key: used on failover to resume handle state after a planned or unplanned failover, fence handle state information, and mask some NTFS issues. This fences file locks.
– Witness protocol: enables faster unplanned failover because clients do not wait for timeouts, enables dynamic reallocation of load (nice!). Witness tells the client that a node is offline and tells it to redirect.

SMB2 Transparent Failover Semantics:
Server side: state persists until the client reconnects. Example: deleting a file. The file is opened, a flag is set to delete on close, and you close the file -> it’s deleted. Now you try to delete a file on a clustered file share and a planned failover happens. The node closes the file and it is deleted, but after reconnecting the client tries to close the file to delete it – and it’s gone. This sort of circumstance is handled.

In the Hyper-V world, we have “surprise failover”, where a faulty VM can be failed over. The files are locked on the file share by the original node with the fence. A new API takes care of this.

SMB2 Scale Out
In W2008 R2 we have active-passive clustered file shares. That means a share is only ever active on 1 node, so it’s not scalable. Windows Server 8 has scale out via active-active shares. The share can be active on all nodes. This is targeted at server/server applications like SQL Server and Hyper-V, not at client/server applications like Office. We also get fewer IP addresses and DNS names. We only need one logical file server with a single file system namespace (no drive letter limitations), and no cluster disk resources to manage.

We now have a new file server type called File Server For Scale-Out Application Data. That’s the active/active type. It does not support NFS and certain role services such as FSRM or DFS Replication. The File Server for General Use is the active/passive one for client/server workloads, but it also supports transparent failover.

VSS for Windows Server 8 File Shares
Application consistent shadow copy of server application data that is stored on Windows Server 8 file shares. A backup agent on the application server triggers the backup. VSS on the app server works with the File Share Shadow Copy Provider. It hits the File Share Shadow Copy Agent on the file server via RPC, and that then triggers VSS on the file server to create the shadow copy. The backup server can read the snapshot directly from the file server, saving on needless data transfer.

Performance for Server Applications
SMB 2.2 makes big changes. They have gone from 25% to 97% of DAS performance. MSFT used the same DAS storage in the local and file share configurations with SQL Server to get these numbers. NIC teaming, TCP offloads, and RDMA improved performance.

Perfmon counters are added to help admins troubleshoot and tune: IO size, IO latency, IO queue length, etc. You can separately tune SQL data files and log files.

Demo:
The demo features a scale-out file server: 4 clients accessing 2 files, balanced across 2 nodes in the scale-out file server cluster. A node in the cluster is killed. The witness service sees this, knows which clients were using it, and tells them to reconnect – no timeouts, etc. The clients do come back online on the remaining node.

Platforms
– Networking: 2+ interfaces … 1 GbE, 10 GbE optionally with RDMA, or InfiniBand with RDMA
– Server: 2+ servers … “cluster in a box” (a self contained cluster appliance) or 2+ single node servers.
– Storage: Storage Spaces, Clustered PCI RAID (both on Shared JBOD SAS), FC/iSCSI/SAS fabric (on arrays)

Sample Configurations
– Lowest cost: a cluster in a box with shared JBOD SAS using 1 GbE and SAS HBAs. Or use the same with Clustered PCI RAID instead of the SAS HBA for better performance. An external port allows you to add external storage to scale out. Beyond that, look at 10 GbE.
– Discrete servers: 1/10 GbE with SAS HBAs to shared JBOD SAS. Or use advanced SANs.

Note: This new storage solution could radically shake up how we do HA for VMs or server applications in the small/mid enterprise. It’s going to be cheaper and more flexible. Even the corporations might look at this for low/mid tier services. MSFT did a lot of work on this and it shows IMO; I am impressed.

Designing Systems for Continuous Availability – Multi-Node with Block Storage

Speakers: Elden Christensen and Mallikarjun Chadalapaka

This session will focus on block based storage. It’s a clustering session. It seems like failover clustering is not optimised for the cloud. *joking*

Sneak Peek at Failover Clustering
– scale up to 4,000 VMs in a cluster
– scale out to 63 nodes in a cluster
– 4 x more than W2008 R2

Note: more SCSI-3 persistent reservations to the SAN!

Multi-Machine Management with Server Manager, Featuring Cluster Integration
– Remote server management
– Server groups to manage sets of machines – single click to affect all nodes at once (nice!)
– Simplified management
– Launch clustering management from Server Manager

New Placement Policies
– Virtual Machine Priority: start the most important VMs first (start the back end first, then the mid tier, then the front tier). Ensure the most important VMs are running – shut down low priority VMs to allow high priority VMs to get access to constrained resources
– Enhanced Failover Placement: each VM is placed on the node with the best available memory resources. Memory requirements are determined on a per-VM basis – it finds the best node based on how Dynamic Memory is configured. NUMA aware.

VM Mobility
– Live Migration Queuing
– Storage Live Migration
– Concurrent Live Migrations – multiple simultaneous LMs for a given source or target
– Hyper-V Replica is integrated with clustering

Cluster Management
Demo: the demo cluster has 4,001 VMs and 63 nodes (RDP into Redmond). In the FCM, it is smooth and fast. You can see the priority of each VM. You can search for VMs with basic and complex queries. The thumbnail of each VM is shown in the FCM.

Guest Clustering – Increased Storage Support
– Most common scenario is SQL Server
– This could only be done with iSCSI. Now we have a virtual fibre channel HBA

VM Monitoring
– Application level recovery: Service Control Manager or event triggered
– Guest Level HA Recovery – FC reboots the VM
– Host level HA recovery – FC fails over VM to another node
– Generic health monitoring for any application: Service Control Manager and generation of specific event IDs

VM Monitoring VS Guest Clustering
– VM Monitoring: application monitoring, simplified configuration and event monitoring – good for tier 2 apps
– Guest clustering: application health monitoring, application mobility (for scheduled maintenance) – still for tier 1 apps

Automated Node Draining
Like VMM maintenance mode. Click a node to drain it of hosted roles (VMs).

Cluster Aware Updating
CAU updates all cluster nodes in an automated fashion without impacting service availability. It is an end to end orchestration of updates, built on top of WUA. Patching does not impact cluster quorum. Workflow:

– Scan nodes to ID appropriate updates
– ID the node with the fewest workloads
– Place node into maintenance mode to drain
– WSUS update
– Rinse and repeat

The workloads return to their original node at the end of the process.
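The whole workflow above should be drivable from a single cmdlet. A hedged sketch – Invoke-CauRun is the cmdlet name I noted, but the parameters here are my guess from the preview bits and may change:

```powershell
# Run one Cluster Aware Updating pass: each node is drained of its
# workloads, patched via Windows Update, rebooted if needed, and resumed
Invoke-CauRun -ClusterName "HVCluster1" -CauPluginName "Microsoft.WindowsUpdatePlugin" `
    -MaxFailedNodes 1 -RequireAllNodesOnline
```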

Note: The machine managing this is called the orchestrator. That might be a little confusing because SC Orchestrator can do this stuff too.
Note: I wonder how well this will play with updates in VMM 2012?

There is extensibility to include firmware, BIOS, etc, in updates via 3rd party plugins.

Demo: streaming video from a HA VM. The cluster is updated, the workflow runs, and the videos stay running. The wizard gives you the PSH, which you can save and schedule. No dedicated WSUS is needed by the looks of it.

Cluster Shared Volume
Redirected I/O is b-a-d.

Windows Server 8 improves backup/restore of CSV. CSV is expanded to include more roles and scales out to 63 nodes. It enables zero downtime for planned and unplanned failures of SMB workloads, provides interoperability with file system mini-filter drivers (a/v and backup), and lots more.

CSV no longer needs to be enabled. Just right click on a disk to make it a CSV. The file system now appears as CSVFS; it is NTFS under the covers. This enables applications to know they are on CSV and ensure their compatibility.

AV, Continuous data protection, backup and replication all use filter drivers to insert themselves in the CSV pseudo-file system stack.

High speed CSV I/O redirection will have negligible impact. CSV is integrated with SMB multi-channel, allowing streaming of CSV traffic across multiple networks. This delivers improved performance when in redirected mode. CSV also takes advantage of SMB 2 Direct and RDMA.

BitLocker is now supported on traditional shared nothing disks and CSV. The Cluster Name Object (CNO) ID is used.

Cluster Storage Requirements Are:
– FC
– SAS JBOD
– Storage Spaces
– RAID HBA/SAS JBOD
– SMB
– iSCSI
– FCoE

Data Replication storage requirements:
– Hardware
– Software replication
– Application replication (Exchange, SQL “Denali” AlwaysOn)

SCSI Command requirements: storage must support SCSI-3 SPC-3 compliant SCSI Commands.

Cost Effective & Scale Out with Storage Spaces. Integrated and supported by clustering and CSV.

Redirected I/O is normally file level. There is now a block level variant – not covered in this talk.

What if your Storage Spaces servers were in the same cluster as the Hyper-V hosts? High speed block level redirected IO. Simplified management. A single CSV namespace accessible on all nodes. A unified security model. A single cluster to manage. VMs can run anywhere.

Note: Wow!

Called an asymmetric configuration.

CSV Backup
Support for parallel backups on the same or different CSV volumes, and on the same or different cluster nodes. Improved I/O performance. Direct IO mode for snapshot and backup operations (!!!). Software snapshots will stay in direct IO mode (!!!!). CSV volume ownership does not change during backup. Improved filter driver support for incremental backups. Backup applications do not need to be CSV aware. Fully compatible with W2008 R2 “requestors”.

Distributed App Consistent VM Shadow Copies:
Say you have a LUN with VMs scattered across lots of hosts. You can now snap the entire LUN using an orchestrated snapshot.

Comparing Backup With W2008 R2
– Backup app: W2008 R2 requires a CSV aware backup app; W8 does not
– IO performance: no redirected IO for backup
– Locality of CSV volume: the snapshot can be created on any node
– Complexity: the cluster coordinates the backup process

Note: I’m still trying to get over that we stay in direct IO during a system VSS provider backup of a CSV.

Cluster.exe is deprecated. It is not there by default, but you can install it in Server Manager. Use PSH instead.

SCSI Inquiry Data (page 83h) is now changed from recommended to required.

Designing Systems for Continuous Availability and Scalability.

An extra session that I ran to in this slot after the previous one ended very early.  This one is on storage pools and spaces.  The speaker has a Dell 1U server with a bunch of internal unallocated disks.  He uses PSH to:

  1. New-StoragePool (with Get-StorageSubsystem and Get-PhysicalDisk).  The command pools all un-pooled disks.  The disks disappear from Disk Manager because they are pooled.
  2. A space (which is a virtual disk) is created: New-VirtualDisk
  3. Initialize-Disk is run to initialise it.
  4. New-Partition creates a partition, which is then formatted; the disk is visible in Disk Manager and can be explored.  Note that it has a drive letter.
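Putting the steps above together, a hedged sketch of the provisioning script – the pool, space, and size values are my own, and the parameters are based on the preview demo, so the final syntax may differ:

```powershell
# 1. Pool all un-pooled physical disks into a new storage pool
New-StoragePool -FriendlyName "Pool1" -StorageSubSystemFriendlyName "Storage Spaces*" `
    -PhysicalDisks (Get-PhysicalDisk -CanPool $true)

# 2. Create a space (a virtual disk) from the pool
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "Space1" `
    -ResiliencySettingName Mirror -Size 500GB

# 3. Initialise the new disk, then
# 4. create a partition and format it so it appears in Disk Manager
Get-VirtualDisk -FriendlyName "Space1" | Get-Disk | Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize | Format-Volume -FileSystem NTFS
```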

Optimized Space Utilisation

  • On-demand provisioning with trim support (a h/w command that gives space back to the pool when files are deleted) – for NTFS, Hyper-V, and apps like SQL.
  • Elastic capacity expansion by just adding more disks.  You’ll get alerts when nearly full.
  • Defrag optimized to work with Storage Pools

Resiliency:

  • Mirrored spaces and Parity Spaces with integrated journaling supported.
  • Per-pool hot spare disk supported
  • Application driven intelligent error correction: SQL and Exchange should be able to take advantage of this.

Not very well explained – sorry. 

Demo: he plays a video that is stored on a resilient space and pulls a disk from it.  The video is uninterrupted. 

Spaces have granular access control.  This could be good for multi-tenant deployment – though I’m hesitant about that because it means giving visibility of the back end system to untrusted customers (rule #1 is users are stupid).

You can base SLAs on the type of disks in your JBOD, e.g. SSD, 15K, or SATA.  Your JBOD could be connected to a bunch of servers, and they can create spaces for themselves.  E.g. a file server could have spaces, and use the disk space to store clustered VMs.

Questions to sfsquestions@microsoft.com

Enabling Multi-Tenancy and Converged Fabric for the Cloud Using QoS

Speakers: Charley Wen and Richard Wurdock

A pretty demo intensive session.  We start off with a demo of “fair sharing of bandwidth” where PSH is used with the minimum bandwidth setting to provide equal weight to a set of VMs.  One VM needs to get more bandwidth but can’t get it.  A new policy is deployed by script and it gets a higher weight; it can then access more of the pipe.  Maximum bandwidth would have capped the VM so it couldn’t access idle b/w.

Minimum Bandwidth Policy

  • Enforce bandwidth allocation –> get performance predictability
  • Redistribute unused bandwidth –> get high link utilisation

The effect is that VMs get an SLA.  They always get the minimum if they require it.  They consume nothing if they don’t use it, and that b/w is available to others to exceed their minimum.

Min BW % = Weight / Sum of Weights

Example of 1 Gbps pipe:

  • VM 1: weight 1 = 100 Mbps
  • VM 2: weight 2 = 200 Mbps
  • VM 3: weight 5 = 500 Mbps

(The numbers appear to assume a total weight of 10, with the remaining share presumably reserved for other connections.)

If you have NIC teaming, there is no way to guarantee minimum b/w of total potential pipe. 
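A hedged sketch of setting those weights – Set-VMNetworkAdapter with -MinimumBandwidthWeight is my understanding of the cmdlet from the preview bits, and the VM names are invented:

```powershell
# Assign relative minimum-bandwidth weights; the guaranteed share is
# each weight divided by the sum of all weights on the switch
Set-VMNetworkAdapter -VMName "VM1" -MinimumBandwidthWeight 1
Set-VMNetworkAdapter -VMName "VM2" -MinimumBandwidthWeight 2
Set-VMNetworkAdapter -VMName "VM3" -MinimumBandwidthWeight 5
```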

Maximum Bandwidth

Example: you have an expensive WAN link.  You can cap a customer’s ability to use the pipe based on what they pay.
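Capping a customer would then be a one-liner, something like this hedged guess at the syntax (the VM name and 100 Mbps figure are just examples):

```powershell
# Hard cap the tenant VM so it cannot saturate the WAN link;
# the value appears to be in bits per second, so this is ~100 Mbps
Set-VMNetworkAdapter -VMName "Tenant1-VM" -MaximumBandwidth 100000000
```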

How it Works Under the Covers

A bunch of VMs are trying to use a pNIC.  The pNIC reports its speed, and it reports when it sends a packet.  This is recorded in a capacity meter.  That feeds into the traffic meter, which determines the classification of a packet.  Using that, it figures out if traffic exceeds the capacity of the NIC.  The peak bandwidth meter is fed by the latter, and it stops traffic (the draining process).

The reserved bandwidth meter guarantees bandwidth.

All of this is software, and it is h/w vendor independent. 

With all this you can do multi-tenancy without over-provisioning.

Converged Fabric

A simple image: two fabrics, network I/O and storage I/O, across iSCSI, SMB, NFS, and Fibre Channel.

This is expensive, so we’re trying to converge onto one fabric.  QoS can be used to guarantee service for the various functions of the converged fabric, e.g. run all network connections through a single Hyper-V Extensible Switch via a 10 Gbps NIC team.

Windows Server 8 takes advantage of hardware where available to offload QoS.

We get a demo where a Live Migration cannot complete because a converged fabric is saturated (no QoS).  In the demo a traffic class QoS policy is created and deployed.  Now the LM works as expected … the required b/w is allocated to the LM job.  The NIC in the demo supports h/w QoS so it does the work.
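From what was shown, the policy looks something like this in PSH.  A hedged sketch – New-NetQosPolicy and New-NetQosTrafficClass are the cmdlet names I believe were used, and the priority and percentage values are my own examples:

```powershell
# Tag Live Migration traffic with an 802.1p priority value
New-NetQosPolicy -Name "LiveMigration" -LiveMigration -PriorityValue8021Action 5

# Give that traffic class a guaranteed share of the converged 10 GbE pipe;
# DCB-capable NICs can enforce this in hardware
New-NetQosTrafficClass -Name "LiveMigration" -Priority 5 -Algorithm ETS -BandwidthPercentage 30
```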

Business benefit: reduced capital costs by using fewer switches, etc.

Traffic Classification:

  • You can have up to 8 traffic classes – 1 of them is storage, by default by the sound of it.
  • It appears that DCB is involved: the LAN miniport and the iSCSI miniport get QoS with traffic classification.  My head hurts.

Hmm, they finished after using only half of their time allocation.

Platform Storage Evolved

“Windows 8 is the most cost effective HA storage solution”

  • Storage Spaces: virtualised storage
  • Offloaded data transfer (ODX)
  • Data deduplication

File System Availability

Confidently deploy 64 TB NTFS volumes with Windows 8 with Online scan and repair:

  • Online repair
  • Online scan and corruption logging
  • Scheduled repair
  • Downtime proportional only to number of logged corruptions: scans don’t mean downtime now
  • Failover clustering & CSV integration
  • Better manageability via Action Center, PowerShell and Server Manager

Note: this means bigger volumes aren’t the big maintenance downtime problem they might have been for Hyper-V clusters. 

Operational Simplicity

Extensible storage management API:

  • WMI programmatic interfaces
  • PSH for remote access and scripting – easy E2E provisioning
  • All new in-box applications use one new API
  • Foundational infrastructure for reducing operations expenditure

Multi-vendor interoperability – common interface for IHVs

  • SMI-S standards conformant: a proxy service enables broad interoperability with existing SMI-S storage h/w – a standards based approach … wonder if the storage manufacturers know that :)
  • Storage Management Provider interface enables host-based extensibility

Basically everything uses one storage management interface to access vendor arrays, SMI-S compliant arrays, and Storage Spaces compatible JBOD.  The Windows 8 admin tools use this single API via WMI and PowerShell.

We are shown a 6 line PSH script to create a disk pool, create a virtual disk, configure the virtual disk, mount it on the server, and format it with NTFS.

Storage Spaces

A new category of cost effective, scalable, available storage, with operational simplicity for all customer segments.  Powerful new platform abstractions:

  • Storage pools: units of aggregation (of disks), administration and isolation
  • Storage spaces (virtual disks): resiliency, provisioning, and performance

Target design point:

  • Industry standard interconnects: SATA or (shared) SAS
  • Industry standard storage: JBODs

You take a bunch of disks and connect them to the server with (shared or direct) SAS (best) or direct SATA (acceptable).  The disks are aggregated into pools.  Pools are split into spaces.  You can do CSV, NFS, or Windows Storage Management.  Supports Hyper-V.

Shared SAS allows a single JBOD to be attached to multiple servers to make a highly available and scalable storage fabric.

Capabilities:

  • Optimized storage utilisation
  • Resiliency and application driven error correction
  • HA and scale out with Failover Clustering and CSV
  • Operational simplicity

Demo:

Iometer is running to simulate storage workloads: 40 x Intel X25-M 160 GB SSDs connected to a Dell T710 (48 GB RAM, dual Intel CPUs) server with 5 * LSI HBAs.  It gets 880,580 read IOPS with this developer preview pre-beta release.

He changes the demo to a workload that needs high bandwidth rather than IOPS.  This time he gets 3,311 MB per second throughput.

The next demo is a JBOD with a pool (CSV).  A pair of spaces is created in the pool, each assigned to virtual machines.  Both VMs have VHDs, and the VHDs are stored in the spaces.  The VMs are running on different Hyper-V nodes, and both nodes access the space via CSV.  In the demo, we see that both nodes can see both pools.  The spaces appear in Explorer with drive letters (Note: I do not like that – does it indicate a return to 2008 days?).  For some reason he used Quick Migration – why?!?!  A space is only visible in Explorer on a host if the VM is running on that host – they follow when VMs are migrated between nodes.

Offloaded Data Transfer (ODX)

Co-developed with partners, e.g. Dell EqualLogic.  If we copy large files on the SAN between servers, the source server normally has had to do the work (data in, CPU and SAN utilisation), send the data over a latent LAN, and then the destination server has to write it to the SAN again (CPU and data out).  ODX offloads the work to a compatible SAN, which can do it more quickly, and we don’t get the needless cross-LAN data transfer or CPU utilisation.  E.g. Host A wants to send data to Host B.  A token is passed between the hosts.  Host A sends the job to the SAN with the token.  The SAN uses this token to sync with Host B, and Host B reads directly from the SAN, instead of getting the data from Host A across the LAN.  This will be a magic multi-site cluster data transfer solution.

In a demo, he copies a file from SAN A in Redmond to SAN B in Redmond from his laptop in Anaheim.  With ODX, it runs at 250 Mbps with zero data transfer on his laptop and takes a few minutes.  With no ODX, it would copy data to Anaheim from SAN A and then copy data from Anaheim to SAN B, and would take over 17 hours.

Thin Provisioning Notifications

It can identify thinly provisioned virtual disks.

Data Deduplication 

Transparent to the primary server workload.  It can save over 80% of storage for a VHD library, and around 50% for a general file share.  The deduplication scope is the volume.  It is cluster aware, and it is integrated with BranchCache for optimised data transfer over the WAN.
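Presumably this will be enabled per volume with something like the following – a hedged sketch; Enable-DedupVolume and Start-DedupJob are my guesses at the cmdlet names, and the drive letter is an example:

```powershell
# Turn on deduplication for the VHD library volume, then kick off an
# optimisation job rather than waiting for the background schedule
Enable-DedupVolume -Volume "E:"
Start-DedupJob -Volume "E:" -Type Optimization
```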

The speakers ran out of time.  A confusing presentation: I think the topics covered need much more time.

Designing the Building Blocks for a Windows Server 8 Cloud

Speakers: Yigal Edery and Ross Ortega from Microsoft.

Windows Server 8 apparently is cloud optimized.  That rings a bell … I expect some repetition so I’ll blog the unique stuff.

There is no one right cloud architecture.  The architecture depends on the environment and the requirements.  Don’t take from this that there are no wrong cloud architectures Winking smile  “Building an optimized cloud requires difficult decisions and trade-offs among an alphabet soup of options”.  This session will try to provide some best practices.

Requirements

  • Cost
  • Scalability
  • Reliability
  • Security
  • Performance
  • High availability

Balance these and you get your architecture: workloads, networking, storage and service levels.

Which workloads will run in my cloud?

You need to understand your mission.

  • Cloud aware apps or legacy/stateful apps? Are you IaaS or PaaS or SaaS?
  • Are workloads trusted?  This is an important one for public clouds or multi-tenant clouds.  You cannot trust the tenants and they cannot trust each other.   This leads to some network security design decisions.
  • Compute-bound or Storage-bound?  This will dictate server and storage design … e.g. big hosts or smaller hosts, big FC SAN or lower end storage solution.
  • Workload size?  And how many per server?  Are you running small apps or big, heavy apps?  This influences server sizing too.  Huge servers are a big investment, and will cost a lot of money to operate while they are waiting to be filled with workloads.

Networking

  • Are you isolating hoster traffic from guest traffic?  Do you want them on the same cable/switches?  Think about north/south (in/out datacenter) traffic and east/west (between servers in datacenter) traffic.  In MSFT datacenters, 70% is east/west traffic.
  • Will you leverage existing infrastructure?  Are you doing green field or not?  Green field gives you more opportunity to get new h/w that can use all Windows Server 8 features.  But trade-off is throwing out existing investment if there is one.
  • Will you have traffic management?

InfiniBand vs 10 GbE vs 1 GbE

10 GbE:

  • Great performance
  • RDMA optional for SMB 2.2
  • Offers QoS (DCB) and flexible bandwidth allocation
  • New offloads
  • But physical switch ports are more expensive
  • New tech appears on 10 GbE NICs rather than on 1 GbE

InfiniBand (32 Gb and 56 Gb):

  • Very high performance and low latency
  • RDMA included for SMB 2.2 file access
  • But network management different than Ethernet.  Can be expensive and requires a different skillset.  Can be hard to find staff, requires specific training.  Not many installations out there.

1 GbE:

  • Adequate for many workloads
  • If investing in new equipment for long life, then invest in 10 GbE to safeguard your investment

Price of WAN traffic is not reducing.  It is stable/stuck.  Datacenter north/south WAN links can be a fraction of the bandwidth of east/west LAN links.

How many NICs should be in the server? 

We are shown a few examples:

Physical Isolation with 4 NICs:

  • Live Migration – 1
  • Cluster/Storage – 1
  • Management – 1
  • Hyper-V Extensible Switch – 2 bound together by Windows 8 NIC teaming, use Port ACLs for the VMs

Many people chose 10 GbE to avoid managing many NICs.  Windows Server 8 resolves this with NIC teaming, so now you can use the bandwidth for throughput.
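A sketch of the 4-NIC design with inbox teaming and port ACLs, assuming the Windows Server "8" preview cmdlets (names could change before release; the adapter, switch, and VM names here are hypothetical):

```powershell
# Team the two VM-facing NICs with the inbox Windows NIC teaming
New-NetLbfoTeam -Name "VMTeam" -TeamMembers "NIC3","NIC4" -TeamingMode SwitchIndependent

# Bind the Hyper-V extensible switch to the team; no parent OS access on this switch
New-VMSwitch -Name "VMSwitch" -NetAdapterName "VMTeam" -AllowManagementOS $false

# Restrict a VM's switch port to its own subnet with a port ACL
Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress "192.168.1.0/24" `
    -Direction Both -Action Allow
```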

2 NICs with Management and guest isolation:

  • Live Migration, Cluster/Storage, Management (all on different subnets) – 1
  • Hyper-V Extensible Switch – 1 NIC, use Port ACLs for the VMs

1 * 10 GbE NIC:

  • Live Migration, Cluster/Storage, Management all plug into the Hyper-V Extensible Switch.
  • VMs plug into the Hyper-V Extensible Switch
  • 1 * 10 GbE NIC for the Hyper-V Extensible Switch
  • Use QoS to manage bandwidth
  • Use Port ACLs for all ports on the Hyper-V Extensible Switch to isolate traffic
  • This is all done with PowerShell
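The converged 1 * 10 GbE design above might look something like this in PowerShell – a sketch against the preview cmdlets, with hypothetical NIC/switch names and example bandwidth weights:

```powershell
# Create the switch on the single 10 GbE NIC, with weight-based minimum bandwidth QoS
New-VMSwitch -Name "Converged" -NetAdapterName "10GbE-1" `
    -MinimumBandwidthMode Weight -AllowManagementOS $false

# Parent (management OS) virtual NICs plug into the same switch as the VMs
Add-VMNetworkAdapter -ManagementOS -Name "Management"    -SwitchName "Converged"
Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "Converged"
Add-VMNetworkAdapter -ManagementOS -Name "Cluster"       -SwitchName "Converged"

# Guarantee each traffic class a minimum share of the 10 GbE pipe
Set-VMNetworkAdapter -ManagementOS -Name "Management"    -MinimumBandwidthWeight 10
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 30
Set-VMNetworkAdapter -ManagementOS -Name "Cluster"       -MinimumBandwidthWeight 20
```

Weight-based QoS only kicks in under contention, so idle bandwidth is still available to whoever needs it.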

Windows Server 8 NIC Scalability and Performance Features

  • Data Center Bridging (DCB)
  • Receive Segment Coalescing (RSC)
  • Receive Side Scaling (RSS)
  • Remote Direct Memory Access (RDMA)
  • Single Root I/O Virtualisation (SR-IOV)
  • Virtual Machine Queue (VMQ)
  • IPsec Offload (IPsecTO)

Note: no mention of failover or Hyper-V cluster support of the features.  E.g. We don’t recommend TOE in W2008 R2 … not supported.

Using Network Offloads for Increased Scale

  • NIC with RSS for native (parent) traffic: Live Migration, Cluster/Storage, Management
  • NIC with VMQ for virtualisation traffic: Hyper-V Extensible Switch

Note: RSS and VMQ cannot be enabled on the same NIC.  RSS not supported on the Hyper-V switch.

  • Raw performance: RDMA and SR-IOV
  • Flexibility and scalability: Hyper-V extensible switch, network virtualisation, NIC teaming, RSS, VMQ, IPsecTO

Notes:

  • SR-IOV and RSS work together.
  • Offloads require driver and possibly BIOS support.
  • When you are working with 1 or restricted number of NICs, you need to pick and choose which features you use because of support statements.
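The RSS-on-parent / VMQ-on-switch split can be expressed with the NetAdapter cmdlets – a sketch assuming the preview cmdlet names, with hypothetical adapter names:

```powershell
# RSS on the NIC carrying parent traffic (Live Migration, cluster/storage, management)
Enable-NetAdapterRss -Name "ParentNIC"

# VMQ on the NIC bound to the Hyper-V extensible switch
Enable-NetAdapterVmq -Name "SwitchNIC"

# Offloads need driver (and possibly BIOS) support, so check capabilities first
Get-NetAdapterAdvancedProperty -Name "ParentNIC"
```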

Storage

HBAs vs NICs.  An HBA (FC, iSCSI, or SAS) bypasses the networking stack and has less CPU utilisation.

Storage Architectures

2 possible basic solutions:

  • Internal/DAS disk: cheap with disk bound VMs
  • External disk: expensive but mobile VMs, can grow compute and storage capacity on 2 different axes, compute bound VMs, storage offloading

The Great Big Hyper-V Survey of 2011 findings are that the breakdown in the market is roughly a third using internal/DAS disk, a third using external disk, and a third using both.

Service Levels

  • What performance guarantees do you give to the customers?  More guarantees = more spending
  • How important is performance isolation?
  • What are the resiliency promises?  This is the challenging one: in-datacenter or inter-datacenter. 

More on the latter:

  • Some failure is acceptable.  You can offer cheaper services with storage/compute bound VMs.  Often done by hosters.  Windows Server 8 is trying to offer mobility here with non-HA Live Migration.
  • Failure is not acceptable: failover clustering, and make everything as HA as possible.  Dual power, dual network paths, N fault tolerant hosts, etc.  Maybe extend this to another data center.  Often done in private clouds and for legacy apps, rarely done by hosters because of the additional cost.  Windows Server 8 is trying to reduce this cost with lower cost storage options.

Representative Configurations by Microsoft

Tested in MS Engineering Excellence Center (EEC).  Optimized for different cloud types.  Guidance and PowerShell script samples.  These will be released between now and beta.

Start with:

The traditional design with 4 NICs (switch, live migration, cluster, and parent) + HBA: physically isolated networks, HBA, and W2008 R2 guidance.

Enable Support for Demanding Workloads:

  • Put Hyper-V switch on 10 GbE. 
  • Enable SR-IOV for better scale and lower latency

Enable 10 GbE for Storage:

  • Enable RSS
  • Fast storage
  • Ethernet so you have single skill set and management solution

Converge 10 GbE if you have that network type:

  • Use the NIC for Live Migration, Cluster/Storage, and Management.  Enable QoS with DCB and RSS.  MSFT say they rarely see 10 GbE being fully used.
  • Switches must support DCB
  • QoS and DCB traffic classes ensure traffic bandwidth allocations
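Classifying and reserving bandwidth with DCB might be sketched like this, using the preview NetQos cmdlets (cmdlet names and the percentages are illustrative, and the physical switches must support DCB):

```powershell
# Tag Live Migration traffic with 802.1p priority 5 using the built-in filter
New-NetQosPolicy -Name "LiveMigration" -LiveMigration -PriorityValue8021Action 5

# Tag SMB storage traffic with priority 3 using the built-in filter
New-NetQosPolicy -Name "SMB" -SMB -PriorityValue8021Action 3

# Reserve bandwidth for each priority via DCB/ETS traffic classes
New-NetQosTrafficClass -Name "LiveMigration" -Priority 5 -BandwidthPercentage 30 -Algorithm ETS
New-NetQosTrafficClass -Name "SMB"           -Priority 3 -BandwidthPercentage 40 -Algorithm ETS
```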

Use File Servers:

  • Share your VM storage using a file server instead of a SAN controller.  Use JBOD instead of expensive SAN.
  • Enable RDMA on file server NIC and converged 10 GbE NIC on host
  • RDMA is high speed, low latency, reduced CPU overhead solution.
  • “Better VM mobility”: don’t know how yet
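Enabling RDMA on the file server side is likely to be a one-liner – a sketch assuming the preview cmdlets and an RDMA-capable adapter, with a hypothetical NIC name:

```powershell
# Turn on RDMA on the file server's 10 GbE NIC (needs RDMA-capable hardware/driver)
Enable-NetAdapterRdma -Name "FS-10GbE"

# Verify that the SMB server sees an RDMA-capable interface
Get-SmbServerNetworkInterface | Where-Object { $_.RdmaCapable }
```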

High Availability and Performance with 3 * 10 GbE NICs

  • 2 teamed NICs for Live Migration, cluster/storage, and parent traffic with DCB and RSS (no RDMA)
  • File server has 10 GbE
  • Hyper-V Switch on 10 GbE

Sample Documented Configuration:

  • 10 GbE NIC * 2 teamed for Live Migration, Cluster/Storage, and parent with DCB, RSS, and QoS.
  • 1 * 1 GbE with teaming for Hyper-V switch.
  • File server with 2 by 10 GbE teamed NICs with RSS, DCB, and QoS.
  • File server has FC HBA connected to back end SAN – still have SAN benefits but with fewer FC ports required and simpler configuration (handy if doing auto host deployment)

Damn, this subject could make for a nice 2 day topic.

Do You Dislike Paying vTaxes?

Then you seriously need to look at Hyper-V.  Even now, if you strip vSphere down to its most economic deployment with the Standard edition, you can save quite a bit by going with Windows Server 2008 R2 Hyper-V (with Software Assurance or through a scheme with upgrade rights like OVS) and the System Center Management Suite (for managing the entire application/infrastructure stack AKA cloud).  And because Windows Server Hyper-V is not vTaxed cripple-ware, you get access to all of the features.

I mentioned upgrade rights for Windows Server because you will want Windows 8 Server Hyper-V.  If Windows Server 2008 R2 Hyper-V has more features than vSphere Standard (which it does), then Windows 8 Server Hyper-V will leave VMware and their overpaying customers in the dust.

If you’re a VMware customer then you need to look now.  Get a lab machine or two and try it out – do some prep because they are different products.  System Center Virtual Machine Manager will allow you to migrate from vSphere, and you’ll get to focus systems management on what the business cares about: the service.

If you’re a Microsoft partner that’s focused on VMware then go looking for Symon Perriman’s content on Hyper-V training for VMware engineers.  Work with your local Microsoft PTA to get trained up.  In Ireland, MicroWarehouse customers can work with me – I will be running a number of virtualisation and System Center training classes for partners in MSFT Dublin, and I am available to call out to prepare sales staff and account managers.

Windows Server 2008 R2 Hyper-V made an impact.  Windows 8 Server Hyper-V is a game changer.  Ignore it at your peril!