Ignite 2015–Exploring Storage Replica in Windows Server 2016

Speaker: Ned Pyle.

What is a Disaster?

Answer: McDonald’s running out of food at Ignite. But I digress … it’s when you lose your entire server room or data centre.

Hurricane Sandy wiped out Manhattan. Lots of big hosting facilities went offline. Some stayed partially online. And a handful stayed online.

Storage Replica Overview

Synchronous replication between cities. Asynchronous replication between countries. Not just about disaster recovery but also disaster avoidance.

It is volume based and uses SMB 3.1.1. It works with any Windows data volume on any fixed disk storage: Storage Spaces, local disk, or any storage fabric (iSCSI, FCoE, SAS, etc.). SR does not require a cluster. You manage it using Failover Cluster Manager (FCM), PowerShell, WMI, and, in the future, Azure Site Recovery (ASR).

This is a feature of WS2016 and there is no additional licensing cost.

Demo

A repeat of an earlier demo using a 2-node cluster: a file is changed in a VM in site A, the change replicates, and it shows up after failover.

Scenarios in the new Technical Preview

  • Stretch Cluster
  • Server to Server
  • Cluster to Cluster, e.g. S2D to S2D
  • Server to self

Stretch Cluster

  • Single cluster
  • Automatic failover
  • Synchronous

Cluster to Cluster

  • Two separate clusters
  • Manual failover
  • Sync or async replication

Server to Server

  • Two separate servers, even with local storage
  • Manual failover
  • Sync or async replication

Server to Self

Replicate one volume to another on the same server. Then move these disks to another server and use them as a seed for replication.

Blocks, not Files

Block-based replication. It is not DFS-R. Replication is done way down low. It is unaware of the concept of files, so it doesn’t know whether they are in use. It only cares about write IO. Works with CSVFS, NTFS and ReFS.

2 years of work by 10 people to create a disk filter driver that sits between the Volume Manager and the Partition Manager.

Synch Workflow

A log is kept of each write on the primary server, and the log is written through to disk. The same log is kept on the secondary site. Each write is sent to the log on both sites in parallel, and it is acknowledged only when the log write has completed on both sites.

Asynch Workflow

The write goes to the log on site A and is acknowledged. Continuous replication then sends the write to the log on the secondary site. It is not interval based.
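The following minimal Python sketch shows the difference between the two acknowledgement models. It is not how SR is implemented, because SR is a kernel filter driver writing CLFS logs. The `Log`, `sync_write` and `AsyncReplicator` names are hypothetical, and the "logs" are plain in-memory lists.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class Log:
    """In-memory stand-in for one site's replication log (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.entries = []
        self._lock = threading.Lock()

    def append(self, block_id, data):
        with self._lock:
            self.entries.append((block_id, data))
        return True


def sync_write(primary, secondary, block_id, data, pool):
    """Synchronous mode: write both logs in parallel, ack only when both succeed."""
    futures = [pool.submit(log.append, block_id, data) for log in (primary, secondary)]
    return all(f.result() for f in futures)


class AsyncReplicator:
    """Asynchronous mode: ack after the local log write, ship to the secondary continuously."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = queue.Queue()
        threading.Thread(target=self._ship, daemon=True).start()

    def write(self, block_id, data):
        self.primary.append(block_id, data)
        self.pending.put((block_id, data))  # queued immediately, not batched on an interval
        return True  # acknowledged before the secondary has the write

    def _ship(self):
        while True:
            block_id, data = self.pending.get()
            self.secondary.append(block_id, data)
            self.pending.task_done()

    def drain(self):
        """Block until every queued write has reached the secondary log."""
        self.pending.join()


# Usage
site_a, site_b = Log("site-a"), Log("site-b")
with ThreadPoolExecutor(max_workers=2) as pool:
    sync_ok = sync_write(site_a, site_b, 1, b"payload", pool)

async_a, async_b = Log("site-a"), Log("site-b")
replicator = AsyncReplicator(async_a, async_b)
async_ok = replicator.write(2, b"payload")
replicator.drain()
```

In sync mode the caller waits on the slower of the two log writes. That wait is why link latency matters so much for synchronous replication.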

SMB 3.1.1.

RDMA/SMB Direct can be used over long distances: Mellanox InfiniBand Metro-X and Chelsio iWARP can both do long range. Microsoft has tested 10 km, 25 km, and 40 km networks. Round-trip latencies are in the hundreds of microseconds for 40 km one-way (very low latency). SMB 3.1.1 has optimized built-in encryption. They are still working on this, and the aim is that you will want encryption on all the time.
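Those latency figures can be sanity-checked with a quick calculation. It assumes light travels through fibre at roughly 200,000 km/s, which is an approximation and not a figure from the session. It counts propagation delay only, so switches, NICs and storage add more on top.

```python
SPEED_IN_FIBRE_KM_PER_S = 200_000  # assumption: roughly two thirds of the speed of light


def propagation_rtt_us(one_way_km):
    """Round-trip propagation delay in microseconds (fibre only, no equipment or storage)."""
    return 2 * one_way_km / SPEED_IN_FIBRE_KM_PER_S * 1_000_000


for km in (10, 25, 40):
    print(f"{km} km one-way: ~{propagation_rtt_us(km):.0f} us round trip")
```

At about 400 microseconds for 40 km, the fibre itself uses only a small fraction of the 5 ms round-trip budget recommended for synchronous replication.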

Questions

  • How Many Nodes? 1 cluster with 64 nodes or 2 clusters with 64 nodes each.
  • Is the log based on Jet? No; the log is based on CLFS (Common Log File System).

Requirements

  • Windows Server Datacenter edition only – yes I know.
  • Active Directory is required, but there are no schema updates, etc. The servers need access to Kerberos.
  • Disks must be GPT. MBR is not supported.
  • Same disk geometry (between logs, between data) and matching partitions for data.
  • No removable drives.
  • Free space for logs on a Windows NTFS/ReFS volume (logs are fixed size and manually resized)
  • No replication of %Systemroot%, the page file, the hibernation file, or DMP files.

Firewall: SMB and WS-MAN

Synch Replication Recommendations

  • <5 ms round-trip latency. Typically 30-50 km in the real world.
  • >1 Gbps bandwidth end-to-end between the servers is a starting point. It depends on a lot of factors.
  • Log volume: Flash (SSD, NVME, etc). Larger logs allow faster recovery from larger outages and less rollover, but cost space.

Asynchronous Replication

Latency not an issue. Log volume recommendations are the same as above.

Can we make this Easy?

Test-SRTopology cmdlet. It checks requirements and recommendations for bandwidth, log sizes, IOPS, etc. It runs for a specified duration to analyse a potential source server for sizing replication. Run it before configuring replication, against a proposed source volume and proposed destination.

Philosophy

Crash consistency versus application consistency (async): SR guarantees a mountable volume; the application must guarantee a usable file.

Can replicate VSS snapshots.

Management Rules in SR V1

You cannot use the replica volume. In this release they only do 1:1 replication, e.g. 1 node to 1 node, 1 cluster to 1 cluster, and 1 half cluster to another half cluster. You cannot do multi-leg replication.

You can do Hyper-V Replica from A to B and SR from B to C.

Resizing replicated volumes interrupts replication. This might change based on feedback.

Management Notes

Latest drivers. Most problems are related to drivers, not SR. Filter drivers can be dodgy too.

Understand your performance requirements. Understand storage latency impact on your services. Understand network capacity and latency. PerfMon and DiskSpd are your friends. Test workloads before and after SR.

Where can I run SR?

In a VM. Requires WS2016 Datacenter edition. Works on any hypervisor. It works in Azure, but there is no support statement yet.

Hyper-V Replica

HVR understands your Hyper-V workload. It works with HTTPS and certificates. It is also in Standard edition.

SR offers synchronous replication. Can create stretched guest clusters. Can work in VMs that are not in Hyper-V.

SQL Availability Groups

Lots of reasons to use SQL AGs. SR doesn’t require SQL Ent. Can replicate VMs at host volume level. SR might be easier than SQL AGs. You must use write ordering/consistency if you use any external replication of SQL VMs – includes HVR/ASR.

Questions

  • Is there a test failover: No
  • Is 5 ms a hard rule for sync replication? Not in the code, but over 5 ms will be too slow and will degrade performance.
  • Overhead? Initial sync can be heavy due to check-summing. There is a built-in throttle to prevent using too much RAM. You cannot control that throttle in TP2 but you will later.

What SR is Not

  • It is not shared-nothing clustering. That is Storage Spaces Direct (S2D).
  • However, you can use it to create a shared-nothing 2 node cluster.
  • It is not a backup – it will replicate deletions of data very very well.
  • It is not DFS-R, it is not multi-endpoint, and it is not low bandwidth (it is built to hammer networks).
  • Not a great branch office solution

It is a DR solution for sites with lots of bandwidth between them.

Stretch Clusters

  • Synchronous only
  • Asymmetric storage, e.g. JBOD in one site and SAN in another site.
  • Manage with FCM
  • Increase cluster DR capabilities.
  • Main use cases are Hyper-V and general use file server.

Not for stretch-cluster SOFS – you’d do cluster-to-cluster replication for that.

Cluster-Cluster or Server-Server

  • Synch or asynch
  • Supports S2D

PowerShell

  • New-SRPartnership
  • Set-SRPartnership
  • Test-SRTopology

DiskSpd Demo on Synch Replication

Runs DiskSpd on a volume on the source machine.

  • Before replication: 63,000 IOPS on source volume
  • After replication: In TPv2 it takes around 15% hit. In latest builds, it’s under 10%.

In this demo, the 2 machines were 25 km apart with an iWARP link. They replaced this with fibre and did 60,000 IOPS.

Azure Site Recovery

Requires SCVMM. You get end-to-end orchestration. Groups VMs to replicate together. Support for Azure Automation runbooks. Support for planned/unplanned failover. Preview in July/August.

Questions:

  • Tiered storage spaces: It supports tiering, but the geometry must be identical on both sides.
  • Does IO size affect performance? Yes.

The Replication Log

Hidden volume.

Known Issues in TP2

  • PowerShell remoting for server-server does not work
  • Performance is not there yet
  • There are bugs

A guide was published on Monday on TechNet.

Questions to srfeed <at> microsoft.com

Ignite 2015–Stretching Failover Clusters and Using Storage Replica in Windows Server 2016

Speakers: Elden Christensen & Ned Pyle, Microsoft

A pretty full room to talk fundamentals.

Stretching clusters has been possible since Windows 2000, making use of partners. WS2016 makes it possible to do this without those partners, and it’s more than just HA, but also a DR solution. There is built-in volume replication so you don’t need to use SAN or 3rd-party replication technologies, and you can use different storage systems between sites.

Assuming: You know about clusters already – not enough time to cover this.

Goal: To use clusters for DR, not just HA.

RTO & RPO

  • RTO: Accepted amount of time that services are offline
  • RPO: Accepted amount of data loss, measured in time.
  • Automated failover: manual invocation, but automated process
  • Automatic failover: a heartbeat failure automatically triggers a failover
  • Stretch clusters can achieve low RPO and RTO
  • Can offer disaster avoidance (new term) ahead of a predicted disaster. Use clustering and Hyper-V features to move workloads.

Terminology

  • Stretch cluster: what used to be called a multi-site cluster, metro cluster or geo cluster.

Stretch Cluster Network Considerations

Clusters are very aggressive out of the box: a heartbeat once per second, and 5 missed heartbeats = failover. To relax this in PowerShell: (Get-Cluster).SameSubnetThreshold = 10 and (Get-Cluster).CrossSubnetThreshold = 20
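The arithmetic behind those settings is simple. The time a node can miss heartbeats before it is removed is roughly the heartbeat interval multiplied by the threshold. This sketch assumes a 1-second interval on both same-subnet and cross-subnet links. `tolerance_seconds` is a hypothetical helper, not a cluster API.

```python
def tolerance_seconds(delay_ms, threshold):
    """Approximate seconds of silence before a node is treated as down."""
    return delay_ms * threshold / 1000


# (heartbeat interval in ms, missed-heartbeat threshold); the 1000 ms interval is assumed
settings = {
    "Out of the box (as described)": (1000, 5),
    "SameSubnetThreshold = 10": (1000, 10),
    "CrossSubnetThreshold = 20": (1000, 20),
}

for name, (delay_ms, threshold) in settings.items():
    print(f"{name}: ~{tolerance_seconds(delay_ms, threshold):.0f} s before failover")
```

Raising the thresholds trades slower failover for fewer false failovers caused by brief WAN hiccups.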

Different data centers = different subnets. They are using Network Name Resources (NNRs) for things like file shares, which are registered in DNS depending on which site the resource is active in. The NNR has IP address A and IP address B. Note that DNS registrations need to be replicated and the TTL has to expire, so if you fail over something like a file share there will be some RTO caused by DNS.

If you are stretching Hyper-V clusters then you can use HNV to abstract the IPs of the VMs after failover.

Another strategy is that you prefer local failover. HA scenario is to failover locally. DR scenario is to failover remotely.

You can stretch VLANs across sites, but your network admins will stop sending you Xmas cards.

There are network abstraction devices from the likes of Cisco, which offer the same kind of IP abstraction that HNV offers.

(Get-Cluster).SecurityLevel = 2 will encrypt cluster traffic on untrusted networks.

Quorum Considerations

When nodes cannot talk to each other then they need a way to reconcile who stays up and who “shuts down” (cluster activities). Votes are assigned to each node and a witness. When a site fails then a large block of votes disappears simultaneously. Plan for this to ensure that quorum is still possible.

In a stretch cluster you ideally want a witness in site C via independent network connection from Site A – Site B comms. The witness is available even if one site goes offline or site A-B link goes down. This witness is a file share witness. Objections: “we don’t have a 3rd site”.
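The following is a simplified model of the vote counting. It assumes one vote per node and a witness in site C. It deliberately ignores dynamic quorum and the WS2012 R2 tie-breaker behaviour covered under Controlling Failover, so treat it as a planning aid only. The function names are hypothetical.

```python
def has_quorum(surviving_votes, total_votes):
    """A partition keeps running only if it holds a strict majority of all votes."""
    return surviving_votes > total_votes // 2


def site_loss_outcome(site_a_nodes, site_b_nodes, third_site_witness):
    """Static vote count: one vote per node, plus one for a witness in a third site."""
    witness = 1 if third_site_witness else 0
    total = site_a_nodes + site_b_nodes + witness
    return {
        "site A lost": has_quorum(site_b_nodes + witness, total),
        "site B lost": has_quorum(site_a_nodes + witness, total),
    }


print(site_loss_outcome(2, 2, third_site_witness=False))  # neither side has a majority
print(site_loss_outcome(2, 2, third_site_witness=True))   # the surviving site plus witness wins
```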

In WS2016, you can use a cloud witness in Azure. It’s a blob over HTTP in Azure.

Demo: He creates a storage account in Azure and gets the key. A container holds a sequence number, just like a file share witness. He configures cluster quorum as usual, chooses Select a Witness, and selects Configure a Cloud Witness. He enters the storage account name and pastes in the key. Now the cluster uses Azure as the 3rd site witness. It’s a very affordable solution using a teeny bit of Azure storage. The cluster manages the permissions of the blob file. The blob stores only a sequence number, so there is no sensitive private information. For an SME, a single Azure credit ($100) might last a VERY long time. In testing, they haven’t been able to get a charge of even $0.01 per cluster!!!!

Controlling Failover

Clustering in WS2012 R2 can survive a 50% loss of votes at once. One site is automatically elected to win. It’s random by default but you can configure it. You can configure manual failover between sites. You do this by manually toggling the votes in the DR site: remove the votes from the DR site nodes. You can set preferred owners for resources too.

Storage Considerations

Elden hands over to Ned. Ned will cover Storage Replica. I have to leave at this point … but Ned is covering this topic in full length later on today.

Ignite 2015 – Spaces-Based, Software-Defined Storage–Design and Configuration Best Practices

Speakers: Joshua Adams and Jason Gerend, Microsoft.

Designing a Storage Spaces Solution

  1. Size your disks for capacity and performance
  2. Size your storage enclosures
  3. Choose how to handle disk failures
  4. Pick the number of cluster nodes
  5. Select a hardware solution
  6. Design your storage pools
  7. Design your virtual disks

Size your disks – for capacity (HDDs)

  1. Identify your workloads and resiliency type: Parity for backups and mirror for everything else.
  2. Estimate how much raw capacity you need: current capacity x (1 + % data growth) x data copies (if you’re using mirrors). Add 12% initially for automatic virtual disk repairs and metadata overhead. Example: 135 TB x 1.1 x 3 data copies + 12% = 499 TB raw capacity
  3. Size your HDDs: Pick big 7200 RPM NL SAS HDDs. Fast HDDs are not required if you are using an SSD tier.
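Here is the capacity formula above as a small Python function that reproduces the session's 499 TB example. The function name is hypothetical. It only covers mirrors, because parity spaces use a different efficiency calculation that is not modelled here.

```python
def raw_capacity_tb(current_tb, growth, copies, overhead=0.12):
    """Raw HDD capacity for mirrored spaces.

    current_tb: data you have today
    growth:     expected growth as a fraction (0.10 = 10%)
    copies:     mirror data copies (2 or 3)
    overhead:   extra fraction for automatic repairs and metadata (12% per the session)
    """
    return current_tb * (1 + growth) * copies * (1 + overhead)


# The worked example from the session: 135 TB, 10% growth, 3-way mirror
print(f"{raw_capacity_tb(135, 0.10, 3):.0f} TB raw")
```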

Software Defined Storage Calculator allows you to size and design a deployment and it generates the PowerShell. Works with WS2012 R2 and WS2016, disaggregated and hyperconverged deployments.

Size your disks – for performance (SSDs)

  1. How many SSDs to use. Sweet spot is 1 SSD for every 2-4 HDDs. Typically 4-5 SSDs per enclosure per pool. More SSDs = more absolute performance
  2. Determine the SSD size. 800 GB SSDs are typical. Larger SSD capacity = can handle larger amounts of active data. Anticipate around 10% of SSD capacity for automatically repairing after an SSD failure.

Example: 36 x 800 GB SSDs.

Size your Enclosures

  1. Pick the enclosure size (12, 24, 60, etc. disks)
  2. Pick the number of enclosures. If you have 3 or 4 then you have enclosure awareness/fault tolerance, depending on type of mirroring.
  3. Each enclosure should have an identical number of disks.

Example: 3 x 60-bay JBODs, each with 48 HDDs and 12 SSDs
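The SSD and enclosure guidance fit together as follows. At the 1:4 end of the ratio, 144 HDDs need 36 SSDs, which is 12 per enclosure and matches the examples above. The helper names are hypothetical, and the 10% reserve is the repair allowance mentioned earlier.

```python
import math


def ssds_for(hdd_count, hdds_per_ssd):
    """SSDs needed to hit a target HDD:SSD ratio (session guidance: 2-4 HDDs per SSD)."""
    return math.ceil(hdd_count / hdds_per_ssd)


def usable_ssd_gb(ssd_count, ssd_gb, repair_reserve=0.10):
    """SSD capacity left once ~10% is set aside for automatic repairs."""
    return ssd_count * ssd_gb * (1 - repair_reserve)


enclosures, hdds_per_enclosure = 3, 48
ssd_total = ssds_for(enclosures * hdds_per_enclosure, hdds_per_ssd=4)
print(ssd_total, ssd_total // enclosures, usable_ssd_gb(ssd_total, 800))
```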

The column count is the same for both tiers, so the smaller tier (SSD) limits the column count. 3-4 columns is a sweet spot.

Expanding pools has an overhead. Not trivial but it works. Recommend that you fill JBODs.

Choose how to Handle Disk Failures

  1. Simultaneous disk failures to tolerate. Use 2 data copies for small deployments and disks, and/or less important data. Use 3 data copies for larger deployments and disks, and for more important data.
  2. Plan to automatically repair disks. Instead of hot spares, set aside pool capacity to automatically replace failed disks. This also affects column count … more later.

Example: 3-way mirrors.

Pick the number of Cluster Nodes

Start with 1 node per enclosure and scale up/down depending on the amount of compute required. This isn’t about performance; it’s about how much compute you can afford to lose and still retain HA.

Example: 3 x 3 = 3 SOFS nodes + 3 JBODs.

Select a hardware vendor

  1. DataON
  2. Dell
  3. HP
  4. RAID Inc
  5. Microsoft/Dell CPS

Design your Storage Pools

  1. Management domains: put your raw disks in the pool and manage them as a group. Some disk settings are applied at the pool level.
  2. More pools = more to manage. Pools = fault domains. More pools = less risk, with increased resiliency and resiliency overhead.

Start with 84 disks per pool.

Divide disks evenly between pools.

Design your Virtual Disks

  • Where storage tiers, write-back cache and enclosure awareness are set.
  • More VDs = more uniform load balancing, but more to manage.
  • This is where column count comes in. More columns = more throughput, but more latency. 3-4 columns is best.
  • Load balancing is dependent on identical virtual disks.
  • To automatically repair after a disk failure, need at least one more disk per tier than columns for the smallest tier, which is usually the SSD tier.
  1. Set aside 10% of SSD and HDD capacity for repairs.
  2. Start with 2 virtual disks per node.
  3. Add more to keep virtual disk size to 10 TB or less. Divide SSD and HDD capacity evenly between virtual disks. Use 3-4 columns if possible.
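The virtual disk rules of thumb can be sketched as follows: start at 2 per node, add more until each is 10 TB or less, and split capacity evenly. `plan_virtual_disks` is a hypothetical helper, and the 75 TB of usable capacity is an illustrative figure, not from the session. The per-tier split of SSD and HDD capacity is not modelled.

```python
import math


def plan_virtual_disks(nodes, usable_tb, max_vd_tb=10, vds_per_node=2):
    """Return (virtual disk count, size of each VD in TB) following the rules of thumb."""
    # Start with vds_per_node per node, then add VDs until none exceeds max_vd_tb
    count = max(nodes * vds_per_node, math.ceil(usable_tb / max_vd_tb))
    # Identical VDs keep load balancing even
    return count, usable_tb / count


count, size_tb = plan_virtual_disks(nodes=3, usable_tb=75)
print(f"{count} virtual disks of {size_tb:.2f} TB each")
```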

Best Practices for WS2012 R2

  • Scale by adding fully populated clusters. Get used to the concept of storage/compute/networking stamps.
  • Monitor your existing workloads for performance. The more you know about the traits of your unique workloads, the better future deployments will be.
  • Do a PoC deployment. Use DiskSpd and fault injection to stress the solution. Monitor the storage tiers’ performance to determine how much SSD capacity you need to fit a given scale of your workloads into SSD tiers.

WORK WITH A TRUSTED SOLUTION VENDOR. Not all hardware is good, even if it is on the HCL. Some are better than others, and some suck. In my opinion Intel and Quanta suck. DataON is excellent. Dell appears to have gone through hell during CPS development to be OK. And some disks, e.g. SanDISK, are the spawn of Satan, in my experience. Note that Dell use SanDISK and Toshiba, so demand Toshiba-only SSDs from Dell. HGST SSDs are excellent.

Deployment Best Practices

  • Disable TRIM on SSDs. Some drives degrade performance with TRIM enabled.
  • Disable all disk-based caches; if enabled, they degrade performance when write-through is used (Hyper-V).
  • Use LB (least blocks) for MPIO policy. For max performance, set individual SSDs to Round Robin. This must be done on each SOFS node.
  • Optimize Storage Spaces repair settings on SOFS. Use Fast Rebuild. Change it from Auto to Always on the pool. This means that 5 minutes after a write failure, a rebuild will automatically start. Pulling a disk does not trigger an automatic rebuild – an expensive process.
  • Install the latest updates. Example: repair process got huge improvement in November 2014 update.
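This toy path selector shows why Least Blocks behaves differently from Round Robin. It is not the Microsoft DSM. It only models the idea that Least Blocks sends each I/O down the path with the fewest bytes outstanding, while Round Robin simply alternates. The class names and path names are hypothetical.

```python
import itertools


class LeastBlocks:
    """Pick the path with the fewest bytes currently outstanding (toy model)."""

    def __init__(self, paths):
        self.outstanding = {path: 0 for path in paths}

    def submit(self, size_bytes):
        path = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[path] += size_bytes
        return path

    def complete(self, path, size_bytes):
        self.outstanding[path] -= size_bytes


class RoundRobin:
    """Alternate between paths regardless of how busy they are."""

    def __init__(self, paths):
        self._paths = itertools.cycle(paths)

    def submit(self, size_bytes):
        return next(self._paths)


io_sizes = [65536, 4096, 4096, 4096]  # one large write followed by small ones
lb = LeastBlocks(["path-1", "path-2"])
rr = RoundRobin(["path-1", "path-2"])
lb_choices = [lb.submit(size) for size in io_sizes]
rr_choices = [rr.submit(size) for size in io_sizes]
print(lb_choices)
print(rr_choices)
```

With Least Blocks the small I/Os steer around the path that is still busy with the large write. Round Robin sends every other I/O to that path anyway.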

Deployment & Management Best Practices

  • Deploy using VMM or PowerShell. FCM is OK for small deployments.
  • VMM is great for some stuff, but in 2012 R2 it doesn’t do tiering etc. It can create the cluster well and manage shares, but for disk creation, use PowerShell.
  • Monitor it using SCOM with the new Storage Spaces management pack.
  • Also use Test-StorageHealth.PS1 to do some checks occasionally. It needs tweaking to size it for your configuration.

Design Closing Thoughts

  • Storage Spaces solutions offer: 2-4 cluster nodes and 1-4 JBODs. Store 100 to as many as 2000 VMs.
  • Storage Pool design: HDDs provide most of the capacity. SSDs offer performance. Up to 84 disks per pool.
  • Virtual Disk design: Set aside 10% of SSD and HDD capacity for repairs. Start with 2 VDs per node. Max 10 TB/virtual disk. 3-4 columns for balanced performance.

Coming in May

  • Storage Spaces Design Considerations Guide (basis of this presentation)
  • Storage Spaces Design Calculator (spreadsheet used in this presentation)

Ignite 2015–Nano Server: The Future of Windows Server

Speaker: Jeffrey Snover

Reasons for Nano Server, the GUI-less installation of Windows Server


  • It’s a cloud play. For example, minimize patching. Note that Azure does not have Live Migration so patching is a big deal.
  • CPS can have up to 16 TB of RAM moving around when you patch hosts – no service interruption but there is an impact on performance.
  • They need a server optimized for the cloud. MSFT needs one, and they think cloud operators need one too.

Details:

  • Headless, there is no local interface and no RDP. You cannot do anything locally on it.
  • It is a deep re-factoring of Windows Server. You cannot switch from Nano to/from Core/Full UI
  • The roles they are focused on are Hyper-V, SOFS and clustering.
  • They also are focusing on born-in-the-cloud applications.
  • There is a zero-footprint model. No roles or features are installed by default. It’s a functionless server by default.
  • 64-bit only
  • No special hardware or drivers required.
  • Anti-malware is built in (Defender) and on by default.
  • They are working on moving over the System Center and app insights agents
  • They are talking to partners to get agent support for 3rd party management.
  • The Nano installer is on the TP2 preview ISO in a special folder. Instructions here.

Demo

  • They are using 3 x NUC-style PCs as their Nano Server cluster demo lab. The switch is bigger than the cluster, and takes longer to boot than Nano Server. One machine is a GUI management machine and 2 nodes are a cluster. They use remote management only, because that’s all Nano Server supports.
  • They just do some demos, like Live Migration and PowerShell
  • When you connect to a VM, there is a black window.
  • They take out a 4th NUC that has Nano Server installed already, connect it up, boot it, and add it to the cluster.

Notes: this demo goes wrong. It might have been easier to troubleshoot with a GUI on the machine.

Management

  • “removing the need” to sit in front of a server
  • Configuration via “Core PoSH” and DSC
  • Remote management/automation via Core PowerShell and WMI: Limited set of cmdlets initially. 628 cmdlets so far (since January).
  • Integrate it into DevOps tool chains

They want to “remove the drama and heroism from IT”. Server dies, you kill it and start over. Oh, such a dream. To be honest, I hardly ever have this issue with hosts, and I could never recommend this for actual application/data VMs.

They do a query for processes with memory more than 10 MB. There are 5.

Management Tools

Some things didn’t work well remotely: Device Manager and remote event logging. Microsoft is investing in these tools to improve them and make remote management first class.

There will be a set of web-based tools:

  • Task manager
  • Registry editor
  • Event viewer
  • Device manager
  • sconfig
  • Control panel
  • File Explorer
  • Performance monitor
  • Disk management
  • Users/groups Manager

These tools can also be used with Core, MinShell, and Full UI installations.

We see a demo of web-based management, which appears to be the Azure Stack portal. This includes registry editor and task manager in a browser. And yes, they run PoSH console on the Nano server running in the browser too. Azure Stack could be a big deal.

Cloud Application Platform:

  • Hyper-V hosts
  • SOFS nodes
  • In VMs for cloud apps
  • Hyper-V containers

Stuff like PoSH management coming in later releases.

Terminology

  • At the base there is Nano Server
  • Then there is Server …. what used to be Server Core
  • Anything with a GUI is now called Client, what used to be called Full UI

Client is what MSFT reckons should only be used for RDS and Windows Server Essentials. As has happened since W2008, customers and partners will completely ignore this 70% of the time, if not more.

The Client experience will never be available in containers.

The presentation goes on to talk about development and Chef automation. I leave here.

Platform Vision & Strategy–Storage Overview

Speakers: Siddhartha Roy and Jose Barreto

This will be a very interesting session for people.

What is Software Defined Storage?

Customers are asking for the cost and scale of Azure in their own data centers, and this is what Microsoft has done. Most stuff came down from Azure, and some bits went from Server into Azure.

Traits:

  • Cloud-inspired infrastructure and design. Using industry standard h/w, integrating cloud design points in s/w. Driving cloud cost efficiencies.
  • Evolving technologies: Flash is transforming storage. Network delivering extreme performance. Maturity in s/w based solutions. VMs and containers. Expect 100 Gbps to make an impact, according to MSFT. Mellanox thinks the sweet spot will be 25 Gbps.
  • Data explosion: device proliferation, modern apps, unstructured data analytics
  • Scale out with simplicity: integrated solutions, rapid time to solution, policy-based management

Customer Choice

The usual 3 clouds story. Then some new terms:

  • Private cloud with traditional storage: SAN/NAS
  • Microsoft Azure Stack Storage is private cloud with Microsoft SDS.
  • Hybrid Cloud Storage: StorSimple
  • Azure storage: public cloud

The WS2012 R2 Story

The model of shared JBOD + Windows Server = Scale-Out File Server is discussed. Microsoft has proven that it scales and performs quite cost effectively.

Storage Spaces is the storage system that replaces RAID to aggregate disks into resilient pools in the Microsoft on-premises cloud.

In terms of management, SCVMM allows bare metal deployment of an SOFS, and then do the storage provisioning, sharing and permissions from the console. There is high performance with tiered storage with SSD and HDD.

Microsoft talks about CPS – ick! – I’ll never see one of these overpriced and old h/w solutions, but the benefit of Microsoft investing in this old Dell h/w is that the software solution has been HAMMERED by Microsoft and we get the fixes via Windows Update.

Windows Server 2016

Goals:

  • Reliability: Cross-site replication, improved tolerance to transient failures.
  • Scalability: Manage noisy neighbours and demand surges of VMs
  • Manageability: Easier migration to the new OS version. Improved monitoring and reduced incident costs.
  • Reduced cost: again. More cost-effective by using volume h/w. Use SATA and NVMe in addition to SAS.

Distributed Storage QoS

Define min and max policies on the SOFS. A rate limiter (hosts) and IO scheduler communicate and coordinate to enforce your rules to apply fair distribution and price banding of IOPS.

SCVMM and OpsMgr management with PowerShell support. Do rules per VHD, VM, service or tenant.
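The session did not describe Distributed Storage QoS's actual algorithm. The following is therefore only a simplified illustration of min/max policies, in the form of a water-filling allocator. It first grants every flow its minimum, then shares the remaining IOPS evenly without exceeding any flow's maximum. The flow names, numbers and `allocate_iops` function are hypothetical, and it assumes the minimums fit within capacity.

```python
def allocate_iops(capacity, policies):
    """Split capacity across flows honouring (min_iops, max_iops) per flow.

    Assumes the sum of minimums does not exceed capacity.
    """
    alloc = {name: lo for name, (lo, hi) in policies.items()}
    remaining = capacity - sum(alloc.values())
    active = [name for name, (lo, hi) in policies.items() if lo < hi]
    while remaining > 1e-9 and active:
        share = remaining / len(active)
        for name in list(active):
            headroom = policies[name][1] - alloc[name]
            grant = min(share, headroom)
            alloc[name] += grant
            remaining -= grant
            if grant == headroom:
                active.remove(name)  # this flow has reached its maximum
    return alloc


policies = {"gold": (2000, 8000), "silver": (1000, 3000), "bronze": (500, 1000)}
result = allocate_iops(10_000, policies)
print({name: round(iops) for name, iops in result.items()})
```

Silver and bronze hit their caps, so the IOPS they cannot use flow to gold. This is the "fair distribution" idea in miniature.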

Rolling Upgrades

Check my vNext features list for more. The goal is much easier “upgrades” of a cluster so you can adopt a newer OS more rapidly and easily. Avoid disruption of service.

VM Storage Resiliency

When you lose all paths to a VM’s physical storage, even redirected IO, then there needs to be a smooth process to deal with this, especially if we’re using more affordable standardized hardware. In WS2016:

  • The VM stack is notified.
  • The VM moves into a PausedCritical state and will wait for storage to recover
  • The VM can smoothly resume when storage recovers

Storage Replica

Built-in synchronous and asynchronous replication. Can be used to replicate different storage systems, e.g. SAN to SAN. It is volume replication. Can be used to create synch (stretch) or asynch (different) clusters across 2 sites.

Ned Pyle does a live demo of a synchronously replicated CSV that stores a VM. He makes a change in the VM. He then fails the cluster node in site 1, and the CSV/VM fail over to site 2.

Storage Spaces Direct (S2D)

No shared JBODs or SAS network. The cluster uses disks like SAS, SATA (SSD and/or HDD) or NVMe and stretches Storage Spaces across the physical nodes. NVMe offers massive performance. SATA offers really low pricing. The system is simple: 4+ servers in a cluster, with Storage Spaces aggregating all the disks. If a node fails, high-speed networking will recover the data to fault tolerant nodes.

Use cases:

  • Hyper-V IaaS
  • Storage for backup
  • Hyper-converged
  • Converged

There are two deployment models:

  • Converged (storage cluster + Hyper-V cluster) with SMB 3.0 networking between the tiers.
  • Hyper-Converged: Hyper-V + storage on 1 tier of servers

Customers have the choice:

  • Storage Spaces with shared JBOD
  • CiB
  • S2D hyper-converged
  • S2D converged

There is a reference profile for hardware vendors to comply with for this solution, e.g. Dell PowerEdge R730xd, HP Apollo 2000, Cisco UCS C3160, Lenovo x3650 M5, and a couple more.

In the demo:

4 NVMe + a bunch of SATA disks in each of 5 nodes. S2D aggregates the disks into a single pool. A number of virtual disks are created from the pool. They have a share per vDisk, and VMs are stored in the shares.

There’s a demo of stress test of IOPS. He’s added a node (5th added to 4 node cluster). IOPS on just the old nodes. Starts a live rebalancing of Storage Spaces (where the high speed RDMA networking is required). Now we see IOPS spike as blocks are rebalanced to consume an equal amount of space across all 5 nodes. This mechanism is how you expand a S2D cluster. It takes a few minutes to complete. Compare that to your SAN!!!
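Here is a toy model of what the rebalance is doing. Storage extents are moved from the fullest node to the emptiest until every node holds about the same amount. Real S2D placement also has to keep resiliency copies on separate nodes, which this hypothetical `rebalance` function ignores. The extent counts are made up.

```python
def rebalance(extents_per_node):
    """Move one extent at a time from the fullest to the emptiest node until counts differ by at most 1."""
    nodes = dict(extents_per_node)
    moves = 0
    while True:
        fullest = max(nodes, key=nodes.get)
        emptiest = min(nodes, key=nodes.get)
        if nodes[fullest] - nodes[emptiest] <= 1:
            return nodes, moves
        nodes[fullest] -= 1
        nodes[emptiest] += 1
        moves += 1


before = {"node1": 1000, "node2": 1000, "node3": 1000, "node4": 1000, "node5": 0}
after, moves = rebalance(before)
print(after, moves)
```

Every move is network traffic between nodes, which is why the session stresses high-speed RDMA networking for this step.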

In summary: great networking + ordinary servers + cheap SATA disk gives you great volume at low cost, combined with SATA SSD or NVMe for peak performance for hot blocks.

Storage Health Monitoring

Finally! A consolidated subsystem for monitoring health events of all storage components (spindle up). Simplified problem identification and alerting.

Azure-Consistent Storage

This is coming in a future release. Coming to SDS. Delivers Azure blobs, tables and account management services for private and hosted clouds. Deployed on SOFS and Storage Spaces. Deployed as Microsoft Azure Stack cloud services. Uses Azure cmdlets with no changes. Can be used for PaaS and IaaS.

More stuff:

  • SMB Security
  • Deduplication scalability
  • ReFS performance: Create/extend fixed VHDX and merge checkpoints with ODX-like (promised) speed without any hardware dependencies.

Jose runs a test: S2D running DiskSpd against local disk: 8.3 GigaBYTES per second with 0.003 seconds latency. He does the same from a Hyper-V VM and gets the same performance (over a 100 Gbps ConnectX-4 card from Mellanox).

Now he adds 3 NVMe cards from Micron. Latency is down to 0.001 ms with throughput of 11 GigaBYTES per second. Can they do it remotely? Yup: over a single ConnectX-4 NIC they get the same rate of throughput. Incredible!

Less than 15% CPU utilization.

Ignite 2015 – Platform Vision & Strategy Network Overview

Speakers: Yousef Khaladi, Rajeev Nagar, Bala Rajagopalan

I could not get into the full session on server virtualization strategy – meanwhile larger rooms were 20% occupied. I guess having the largest business in Microsoft doesn’t get you a decent room. There are lots of complaints about room organization here. We could also do with a few signs and some food.

Yousef Khaladi – Azure Networking

He’s going to talk about the backbone. Features:

  • Hyper-scale
  • Enterprise grade
  • Hybrid

There are 19 regions, which is more than AWS and Google combined. There are 85 IXPs (Internet exchange points) and 4400+ connections to 1695 networks. There are 1.4 million miles of fiber in Azure. The NA fiber can wrap around the world 4 times. Microsoft has 15 billion dollars in cloud investment. Note: in Ireland, the Azure connection comes in through Derry.

Azure has automated provisioning with integrated process with L3 at all layers. It has automated monitoring and remediation with low human involvement.

They have moved intelligence from locked in switch vendors to the SDN stack. They use software load balancers in the fabric.

Layered support:

  1. DDOS
  2. ACLs
  3. Virtual network isolation
  4. NSG
  5. VM firewall

Network security groups (NSGs):

  • Network ACLs that can be assigned to subnets or VMs
  • 5-tuple rules
  • Enables DMZ subnets
  • Updated independent of VMs

Build an n-tier application in a single virtual network and isolate the public front end using NSGs.
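Here is a minimal sketch of 5-tuple rule evaluation for that n-tier pattern. It assumes Azure-style semantics, where rules are checked in priority order (lowest number first) and the first match wins. The `Rule` fields, subnets and ports are hypothetical. Real NSGs also include default rules, which this sketch collapses into a single default deny.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int   # lower number = evaluated first
    protocol: str   # "tcp", "udp" or "*"
    src: str        # source CIDR
    src_port: str   # port number or "*"
    dst: str        # destination CIDR
    dst_port: str   # port number or "*"
    action: str     # "allow" or "deny"


def _port_matches(rule_port, port):
    return rule_port == "*" or int(rule_port) == port


def evaluate(rules, protocol, src_ip, src_port, dst_ip, dst_port, default="deny"):
    """Return the action of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (rule.protocol in ("*", protocol)
                and ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.src)
                and _port_matches(rule.src_port, src_port)
                and ipaddress.ip_address(dst_ip) in ipaddress.ip_network(rule.dst)
                and _port_matches(rule.dst_port, dst_port)):
            return rule.action
    return default


FRONT_END, BACK_END = "10.0.1.0/24", "10.0.2.0/24"  # hypothetical subnets
rules = [
    Rule(100, "tcp", "0.0.0.0/0", "*", FRONT_END, "443", "allow"),  # public HTTPS to web tier
    Rule(200, "tcp", FRONT_END, "*", BACK_END, "1433", "allow"),    # web tier to SQL
    Rule(300, "*", "0.0.0.0/0", "*", BACK_END, "*", "deny"),        # nothing else reaches back end
]

print(evaluate(rules, "tcp", "203.0.113.7", 50000, "10.0.1.10", 443))   # internet -> web
print(evaluate(rules, "tcp", "203.0.113.7", 50000, "10.0.2.10", 1433))  # internet -> SQL
print(evaluate(rules, "tcp", "10.0.1.10", 50000, "10.0.2.10", 1433))    # web -> SQL
```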

ExpressRoute:

  • Now supports Office 365 and Skype for Business
  • The Premium Add-on adds virtual network global connectivity, up to 10,000 routes (instead of 4000) and up to 100 connected virtual networks

Cloud Inspired Infrastructure

It takes time to deploy a service on your own infrastructure. The processes are there as a caution against breaking already complicated infrastructure. You can change this with SDN.

Today’s solution first: Lots of concepts and pretty pictures. Not much to report.

New Stuff

VXLAN is coming to Microsoft SDN. They are taking convergence a step further. RDMA storage NICs can be converged and also used for tenant traffic. There will be a software load balancer. There will be a control layer in WS2016 called a network controller. This is taken from Azure. There is a distributed load balancer and software load balancer in the fabric.

IPAM can handle multiple AD forests, and adds DNS management across them.

Back to RDMA – if you’re using RDMA on WS2012 R2 then you cannot converge those NICs. That means you have to deploy extra NICs for VMs. In WS2016, you can enable RDMA on management OS vNICs, so you can converge the same NICs for VM and host traffic.

TrafficDirect moves interrupt handling from the parent partition to the virtual switch, where it can be handled more efficiently. In a stress test, he doubles the traffic into a VM, to over 3 million packets per second.

Summary

The networking of Azure is coming to on-premises in WS2016 and the Azure Stack. This SDN frees you from the inflexibility of legacy systems. We get additional functionality that will increase security and HA, while reducing costs.

Ignite 2015–Keynote


The room is huge. The screens feature a sports-style “before the event” set of reports and interviews to entertain the audience – necessary because the wifi is frakked. Brad Anderson goes through lots of stats, including that there are 24,000 IT pros attending Ignite – which is actually small for this venue according to a taxi driver we talked to over the weekend.

Here goes the start of Ignite – Spark the Future

Rapper Common opens the show, walking through the crowd and evangelizing us to spark the future and drive change.


Satya Nadella

The Microsoft CEO comes out to launch Ignite. He tells us that this is an important time to bring more IT pros together, hence the conference merger that created Ignite. It makes sense – Microsoft products are not vertical solutions; they’re integrated. This is more than technology; it’s how we do business, partner, and meet the real-world needs of our customers.


What is mobile first, cloud first? Mobile first is not about the mobility of a single device. What matters is the mobility of our experiences across all devices. Cloud first is the back-end enabler that adds intelligence. We are not there yet – there are years of evolution ahead.

There will be more devices than people on the planet – see IoT. Cloud will be required to support them.

There is a tension in managing this changing IT landscape. IT wants to give users friction-free, mobile computing on the devices they choose, while maintaining security and privacy. The business needs to choose the SaaS apps it wants, but with control and efficiency. We will have to manage the data deluge. Big data does not equal big insights, but big insights are what Microsoft is chasing. There are questions about which clouds you will work with – the diversity of workloads will drive decisions about hybrid computing. Here lies the opportunity for IT.


And here is where Microsoft wants to win with 3 interlocking ambitions:

1) The era of more personal computing
2) Reinventing the process of how we work
3) Building the intelligent backend for these applications

Creating more personal computing.  What matters most is the mobility of experience. Human interaction should be natural. They are investing in mouse, keyboard, hologram, ink and touch.

We will see Surface Hub and HoloLens, samples of new types of devices that will change how we work. Innovation of silicon hardware and software together enables this. The most profound change is the new generation of Windows, Windows 10 delivered as a service.

Announcing a new capability in Windows 10. Windows Update today has great reach in the consumer world. A new business capability will drive improvement for business users. Details not shared.

He wants to reinvent productivity and business process. New tools like Cortana (5 countries only), Sway, Delve, PowerBI and sales productivity (CRM online). Microsoft is building a control plane to enforce compliance.

Announcing Office 2016 public preview. Skype for Business Broadcasting. Office Delve organizational something.

Building the intelligent cloud:

Having data alone is useless. You need the tooling to get insights. The data center must be transformed to enable choice between public and private cloud, and to enable tiering across the two.

There are some server announcements: Server & System Center 2016 preview, and SQL Server 2016 preview too. The Operations Management Suite is one IT control plane for all virtual machines and servers, irrespective of which data center they are in: server health, availability, backup, orchestration. Advanced Threat Analytics is a new security solution.

The CEO of Real Madrid is brought out to share their story. FC Barcelona fans storm out in a huff.

Real Madrid is a members’ club with 90,000 owners, which requires great social media. They are using big data analysis for partner operations, player analytics, and fan engagement. They used Microsoft technology to transform their business.

Satya wants to close out by talking about the rest of the keynote.

Joe Belfiore

His mission: your end users are going to love and desire Windows 10.


They need to make it easy for XP/Windows 7 users to get a familiar experience, balancing things being where users expect them with new features that come with help.

There are new ways to interact (Edge, pen, Cortana), and the security improvements make Windows 10 more human-friendly.

Straight into the demos, starting at the beginning for Windows 7 users. The Start menu must balance familiarity and new features. They think they are near the final design now. Jump lists are back in for Windows 7 users. Live Tiles are there in the menu for Windows 8 users – this is a more natural approach on the PC for Windows 7 users IMO.

The taskbar has a button for ALT-TAB to switch between apps; only 5-10% of users use ALT-TAB. Universal Apps work just like programs from the user’s perspective. Now some new stuff: CTRL + Windows + arrow flips between desktops. You can drag and drop apps to another desktop now (applause).

Cortana. Boo! Only the 5 countries it will support care. I fall asleep. Cortana, via PowerBI and Azure AD, can tell Joe how many people were registered for Ignite as of a week ago. A very useful demo of real business usage: simple questions asked of the PC, and useful answers pulled from big data.

Next on to Edge, the new browser, a universal app with protections and high performance. Joe talks about extension support. He has a BBC Mundo (Spanish) page. He goes into reading mode and a translator extension automatically translates the page into his native language.

He has a phone and PC. Outlook mail is open on both. Both are similar looking – it is literally the same code. Same with Word. Adaptive UI capabilities in Windows re-lay out an app for the screen size and input methods. Everywhere from HoloLens and phone to massive Surface Hub: 1 app.

Continuum transforms your device for mobile scenarios without compromise. He opens apps in desktop mode on a Surface Pro 3 and takes off the keyboard. A popup asks if he wants to go into tablet mode, and the primary app now goes full screen. He can still swipe from the left to switch apps. The Action Center is there. And the Start menu is full screen.

Continuum also goes the other way. There’s an 8” Lenovo tablet where the default usage is in tablet mode. The start screen scrolls vertically like on the phone. Same task switching and menu buttons. There is a system-wide back button for app navigation, like in Android. The tablet can run Win32 apps. He docks it, and the machine goes into PC mode with a nice big desktop on the monitor. The apps now run in windows on the desktop. Users get a natural UI for the way the device is currently being used.

Now: Windows Phone docks via MiraCast and Bluetooth (simulation now due to lack of phone hardware) but you get a desktop on a monitor and can run apps on the phone via the mouse and keyboard. It’s the same programs as on a tablet or PC: Universal Apps. The Start Menu is the start screen of the phone. This will revolutionise mobile computing IMO. The phone is the dominant form factor and Microsoft is the first to offer this package.

Joe promises that users will love Windows. Security will “smile and wink at them” while keeping your business secure.

A combined Windows Hello and Microsoft Passport demo. Passport replaces passwords. It enables 2-factor authentication (e.g. phone and PC). Hello uses biometrics with special hardware. He has covered a camera with a black cloth. He pulls the cloth away, and is logged in instantly.

BitLocker is up next. You get more control over how data moves between apps. He has a “secret” doc in Word. The default save action is to save and encrypt the document. Some docs are green (encrypted) and others black (not encrypted). He selects some old docs in File Explorer and encrypts them. Files can be shared via USB key but the docs remain encrypted and usable by authorised users. IT gets tools to set the right policies to make actions default and/or natural. This supports third-party apps.

Gurdeep Singh Pall

Another corporate VP, wanting to talk to us about reinventing productivity. 20 years ago he worked on making TCP/IP work on slow, pre-Internet networks.

“If the rate of change on the outside exceeds the rate of change on the inside, the end is near” – Jack Welch. Once, companies stayed 75 years on the Fortune 500. Now they stay there for 15 years. Millennials are the carriers of that change. By 2020, the majority of workers will be post-Internet millennials. They work differently, they talk differently, they use technology differently.

  • Work is what you do, not where you go.
  • Individual productivity is important, but teams of people accomplish things.

Serial workflow of the past will not succeed in the future (is that now?). Microsoft considers themselves the custodians of productivity:

  • Teams: very dynamic and needs to be simple to create/disband teams via self-service. This is empowered by O365 via groups.
  • Work from anywhere: Focused on mobile experience. The phone is not mobile – it cannot move without you. The experience is important: across all devices. Skype had 500 million downloads on Google Play on Valentine’s Day.
  • Meetings: Most meetings have remote attendees. The remote person is usually the special guest so they cannot be second class. Video is a huge bet for Microsoft: Skype for Business. Half of all Skype calls are video – bringing that experience to work. 55% of communications is body language. Average meeting takes 30 minutes to get going. They want to eliminate that with Surface Hub and partners. “Stop using Webex and other tech from the last decade and use your money on better things” – after showing HoloLens video.
  • Content co-creation: Office 2016. Innate collaboration built-in.
  • Intelligence: 4.4 zettabytes in 2013. 44 zettabytes in 2020. That data will inundate you! Requires intelligence. This is where things like Delve and PowerBI become important.
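Those two data points imply the data volume growing by roughly 39% a year. A worked check, assuming steady compound growth over the 7 years from 2013 to 2020:

```latex
r = \left(\frac{44}{4.4}\right)^{1/7} - 1 = 10^{1/7} - 1 \approx 0.39
```

In other words, a tenfold increase in 7 years works out to almost 1.4x every year.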

Julia White

The fast-talking Julia White comes out to demo the previous 5 concepts with Microsoft tech. Delve is first, pulling in information from Office 365 and SalesForce (possible via API). Office 365 is the YouTube of the enterprise. Office 365 groups is a dynamic team with content gathered in one place via Delve. Skype for Business has federation and consumer connectivity.

Sneak peek of a feature coming to O365 later this year. It’s a dashboard of social interaction that breaks down how time is being used, connections made, interactions, etc. She can see that lots of time is spent in meetings and dives into that to get analysis. Outlook supports O365 groups for meeting invitations. When she clicks attach, the most recently used docs are in a nice jump list – TIME SAVER!

Does a Skype for Business video meeting. Opens Word and shares a doc so everyone can co-create. Nothing new about the concept but … now this works with the desktop app via Office 2016.  Now we’re cooking with diesel.

Sway is coming to O365 business and education plans in June. Julia has the Surface Hub out on stage. Feature list: all that’s missing is a coffee maker. She does some whiteboard notes on the device, adds in a person mid-meeting, and does stuff with PowerBI to visualise data. Hard to keep up because … well … she talks fast.

Skype Broadcast is shown. There’s a sentiment chart for what’s happening at the moment. The producer view enables you to switch between feeds, etc. It’s a part of Skype and O365.

Lots of productivity solutions made from integrated solutions.

Back to Gurdeep.

Once again, we get the "IT is at the interaction" slide gluing together the 3 concepts from Satya’s keynote. We are Q in James Bond. Take the products to our users and make them productive.

Brad Anderson


Essential features in cloud and computing:

  • Trustworthy
  • Flexible
  • Integrated
  • Intelligent

The nature of threats has changed. Damage and theft are caused primarily by compromised identity. 

Security starts with the device. Your ID management, Azure AD, stretches ID to the cloud so you can control ID policy. EMS provides a way to manage devices and applications that empowers users but keeps security and protection under IT control. This is a modern architecture for what the user and business both want: mobile first and cloud first.

MSFT offers defense in depth. Protect:

  • device
  • apps
  • files
  • identity (the glue)

Windows 10 was designed for enterprise defence against modern attack methods. There are a variety of use cases: factory automation, IoT, personal phones, etc.

Brad opens an email on Windows 7. It looks legit. Some code will execute: the firewall and antimalware services are turned off. Same attack on Windows 10. Device Guard prevents this unauthorised code from running.

There is a full set of MDM features for ConfigMgr and Intune.

Application management is about separating personal from corporate apps and data. Feedback was that users wanted Outlook support. He has an iPad running Outlook. The demo gods descend and prevent Outlook from starting – it freezes and crashes. Intune can now enforce policies in Outlook. He copies text from an email and attempts to paste it in. Paste works fine. Now he opens Twitter and the paste option is missing. Policy prevents data from moving from corporate to personal apps.
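The policy in that demo boils down to one rule: corporate data may flow into managed apps but not into personal ones. A minimal sketch of that rule follows. The `App` class, the `managed` flag and `can_paste()` are hypothetical names, not Intune's API, and using Word as the managed paste target is an assumption since the demo didn't say where the text went.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    managed: bool  # hypothetical flag: True if the app is under corporate policy


def can_paste(source: App, target: App, restrict_corporate_data: bool = True) -> bool:
    """Return True if content copied in `source` may be pasted into `target`."""
    if not restrict_corporate_data:
        return True
    # Corporate data may only flow into another managed app; personal data is unrestricted.
    return (not source.managed) or target.managed


outlook = App("Outlook", managed=True)
word = App("Word", managed=True)
twitter = App("Twitter", managed=False)
```

With the policy on, copying from Outlook into Word is allowed while Outlook into Twitter is blocked, matching the missing paste option in the demo.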

Feedback: users want to use apps for personal and professional stuff. IT can allow this now.

Data Leakage Protection (DLP) is in Windows 10 too. The message for this allows users to override the block, but this is logged for later auditing. This works in programs natively: no wrapping required.

They want people to distribute apps via the Windows Store. RDS is also available. Now he’s showing off Azure RemoteApp. We get a demo on the iPad with a Windows touch app.

Files can be self-protecting: Azure Rights Management. Telemetry is sent to a central management site so IT/security/auditors can track file usage and transport. In a demo we see a person tried to open a file unsuccessfully a number of times. A world map shows good/failed opens with names and a timeline.  The business can track the usage of their sensitive information.

On to identity. AD is the traditional system we have used on premises. Cloud App Discovery finds the SaaS apps that people are using, and therefore where they are using unmanaged identities. You can bring these apps under the control of Azure AD for single/shared sign-in, giving IT control over shadow IT. If a person leaves the company, they lose access to SalesForce.

Advanced Threat Analytics allows IT to track log-ins. For example, a user logs into machines in 2 countries at the same time. MSFT are searching for IDs that are for sale on the dark web to alert you. This works with Azure AD and on-premises AD (via acquisition). There’s an on-premises demo. A developer is trying to access LOB apps that are outside his scope of work. All of this is audited and presented in a dashboard. His device tried to run a couple of attacks against a DC. There was a brute force attack on his account that succeeded. All of this is shown in a timeline in the dashboard. (applause).
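The "2 countries at the same time" example is easy to picture in code. This is a minimal sketch, assuming a simple login feed and an arbitrary 30-minute window. `Login` and `concurrent_country_logins()` are hypothetical, and this is not how ATA actually detects anomalies, since that wasn't described.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Login:
    user: str
    country: str
    at: datetime


def concurrent_country_logins(logins, window=timedelta(minutes=30)):
    """Flag consecutive logins by the same user from different countries within `window`."""
    alerts = []
    last_seen = {}  # user -> that user's previous login
    for entry in sorted(logins, key=lambda e: e.at):
        prev = last_seen.get(entry.user)
        if prev is not None and prev.country != entry.country and entry.at - prev.at <= window:
            alerts.append((prev, entry))
        last_seen[entry.user] = entry
    return alerts


logins = [
    Login("alice", "IE", datetime(2015, 5, 4, 9, 0)),
    Login("bob", "US", datetime(2015, 5, 4, 9, 0)),
    Login("alice", "US", datetime(2015, 5, 4, 9, 10)),
    Login("bob", "US", datetime(2015, 5, 4, 9, 5)),
    Login("alice", "US", datetime(2015, 5, 4, 18, 0)),
]
```

Only alice's Ireland-then-US pair, 10 minutes apart, is flagged; bob's two US logins and alice's later US login are not.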

Here comes Terry Myerson to talk about how Windows 10 adds more.

Terry Myerson

He’s talking about a new update mechanism for Windows 10. The room is starting to empty. This keynote is too long. @cloud_girl_mwh tells me that the floor at the back is full of people sitting on the floor – not enough space.

858 million diverse Windows devices will be updated by Windows Update. On Android, Google takes no responsibility for updating devices; that is up to the telephone companies, who rarely issue updates.

Windows 10 is introducing long-term servicing branches. Only security updates will be in the long-term branches, keeping mission critical devices secure. Consumers will get Windows as a Service, continually getting innovation. They’ll get security updates from Windows Update and feature updates. They’ll also spread this out over more than just 1 day per month. There will be distribution rings. Some want it fast, and others are more cautious. Windows Update for Consumers will offer this, as in the preview.

They want to address issues with updates for business customers doing selective patching. This can leave security holes and configuration fragmentation. The process is thankless and tiresome. Today they are announcing Windows Update for Business: best of both worlds. IT control over the automated process of delivering innovation and security updates. Free for Windows 10 Pro and Enterprise.

  • Distribution rings
  • Maintenance windows
  • peer-peer delivery: better usage of bandwidth for remote sites
  • Integrates with existing tools (Intune and SCCM) for single pane of glass management
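A rough model of how distribution rings and maintenance windows could interact. The `Ring` fields, the deferral-in-days idea and `may_install()` are assumptions for illustration, not the Windows Update for Business configuration surface, and the sketch assumes a maintenance window that doesn't cross midnight.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass(frozen=True)
class Ring:
    name: str
    deferral_days: int   # hypothetical: days after release before this ring is offered the update
    window_start: time   # start of the local maintenance window
    window_end: time     # end of the window (assumed to be later on the same day)


def may_install(ring: Ring, released: date, now: datetime) -> bool:
    """True if the ring's deferral has elapsed and `now` is inside its maintenance window."""
    offered_from = released + timedelta(days=ring.deferral_days)
    in_window = ring.window_start <= now.time() < ring.window_end
    return now.date() >= offered_from and in_window


pilot = Ring("Pilot", 0, time(1, 0), time(4, 0))
broad = Ring("Broad", 14, time(1, 0), time(4, 0))
released = date(2015, 8, 11)
```

The fast pilot ring can install the night after release, while the cautious broad ring waits two weeks, and neither installs outside its window.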

At this point 1/5 of the room has emptied. There’s a huge queue of people at the back trying to exit the room.

System Center 2012 SCCM will offer support for Windows 10.

40% of IT spend is in shadow IT on SaaS apps, outside of the control of IT. Microsoft is offering a solution to bring these under IT control.

Stretch database. Stretch SQL Server from your data center into Azure. You can stretch a part of a table (cold data) and place it in the cloud where storage is cheap.
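The idea is a split of one table into hot and cold rows while queries still see the whole thing. This is a minimal sketch, assuming a date cutoff decides what is cold. `Order`, `split_hot_cold()` and `query()` are hypothetical, and this says nothing about how SQL Server actually implements Stretch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    order_id: int
    placed: date


def split_hot_cold(rows, cutoff: date):
    """Partition rows into hot (stay on-premises) and cold (candidates to stretch to the cloud)."""
    hot = [r for r in rows if r.placed >= cutoff]
    cold = [r for r in rows if r.placed < cutoff]
    return hot, cold


def query(hot, cold, predicate):
    """A query scans both halves, so the split is invisible to the caller."""
    return [r for r in hot + cold if predicate(r)]


rows = [Order(1, date(2012, 1, 5)), Order(2, date(2015, 4, 1)), Order(3, date(2014, 12, 31))]
hot, cold = split_hot_cold(rows, date(2015, 1, 1))
```

Here only order 2 stays hot; orders 1 and 3 would be the cheap-storage candidates.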

Onto Windows Server and System Center and Azure. The Azure Pack is evolving. People want all of Azure in the data center. The Azure Stack provides the entire IaaS and PaaS environment in private or hosted deployments. Customers can have their own cloud-inspired infrastructure. This includes service load balancing.

Windows Server 2016 technical preview 2 is out today. System Center 2016 preview is out next week. Micro applications are possible on Windows Server in 2016. This is Docker on Windows. There is desired configuration management on Linux.

Here comes Jeff Woolsey

Jeff Woolsey

He’s doing an Azure Stack demo. He shows RBAC in Azure. Now he shows the on-premises Azure Stack. This has the same blade-based Ibiza UI as Azure. The UI looks identical. RBAC, blob storage, etc, all there. Software defined networking from Azure fabric comes to Windows Server 2016. We see JSON based IaaS template deployment.

Back to Brad.

Microsoft Operations Management Suite (OMS) gives you any location/cloud/OS/application management: DR, Hyper-V, VMware, backup, etc. This is EMS for data centers. Here’s Jeff to demo.

OMS will be available for free this week. It appears to be a re-labelled Operational Insights. You can link things like SCOM or Azure Storage Accounts. He can import custom logs. Marketing has definitely made over Operational Insights. This is probably still not a SCOM replacement – SCOM is probably still needed to aggregate health/performance stuff (guess). Data analytics is done by Hadoop in the back.

Back to Brad.

And that was it. In my opinion:

  • I would have liked to have seen more Windows Server & System Center
  • The types of demo were perfect: solutions from integrated products.
  • There were lots of announcements
  • It was 1 hour too long.

Altaro Webinar Recording and Slides – What’s New in Hyper-V vNext

I recently co-presented a webinar by Altaro with Rick Claus (Microsoft) and Andrew Syrewicze (MVP) on what’s coming in the next version of Windows Server Hyper-V. Altaro has a recording of the webinar online. That page will be updated soon with a written Q&A from the session; we had A LOT of questions and Altaro asked me to write out responses, which I did last Friday night. You can also download a PDF copy of the slides from the session.

Thank you to everyone that joined us. We had a great number of people tuned in – I was stunned when the folks at Altaro broke down the numbers. Hopefully, I’ll see some of you tomorrow night in the webinar I am co-presenting for StarWind on using ODX or VAAI to enhance storage performance for Hyper-V or vSphere respectively.

My TechEd Europe 2014 Session Is On Channel 9 Website

Microsoft has published my session from TEE14 (From Demo to Reality: Best Practices Learned from Deploying Windows Server 2012 R2 Hyper-V) onto the event site on Channel 9. In this session I cover the value of Windows Server 2012 R2 Hyper-V:

  • How Microsoft backs up big keynote claims about WS2012 R2 Hyper-V
  • How they enable big demos, like 2,000,000 IOPS from a VM
  • The lesser known features of Hyper-V that can solve real world issues

The deck was 84 slides and 10 demos … in 74 minutes. The final feature I talk about is what makes all that possible.

 

TEE14–Azure Migration Accelerator and ASR Using InMage Scout

Speaker: Murali KK

Business Continuity Challenges

Too many roadblocks out there:

  • Too many complications, problems and mistakes.
  • Too much data with insufficient protection
  • Not enough data retention
  • Time-intensive media management
  • Untested DR & decreasing recovery confidence
  • Increasing costs

Businesses need simpler and standardized DR. Costs are too high in terms of OPEX, CAPEX, time, and risk.

Bypassing Obstacles

  • Automate, automate, automate
  • Tighter integration between system availability and data protection
  • Increase breadth and depth of continuity protection
  • Eliminate the tape problem. Object? You still using punch cards?
  • Implement simple failover and testing
  • Get predictable and lower costs and operations availability

Moving into Microsoft Solutions …

There is not one solution. There are multiple solutions in the MSFT portfolio.

  • HA is built into clustering for on-premises infrastructure availability
  • Guest OS HA can be achieved with NLB, clustering, SQL, and Exchange
  • Simple backup protection with Windows Server Backup (for small biz)
  • DPM for scalable backup
  • Integrate backup (WSB or DPM) into Azure to automate off-site backup to affordable tapeless and hugely scalable backup vaults
  • Orchestrated physical, Hyper-V, and VMware replication & DR using Azure Site Recovery. Options include on-premises to on-premises orchestration, or on-premises to Azure orchestration and failover.


Heterogeneous DR

Covering physical servers and VMware virtual machines. This is a future scenario based on InMage Scout.

A process server is a physical or virtual appliance deployed in the customer site. An InMage Scout data channel allows replication into the customer’s virtual network/storage account. A configuration server (central management of Scout) and a master target (repository and retention) run in Azure. A multi-tenant RX server runs in Azure to manage the InMage service.

How VMware to VMware Replication Works Now

This is on-premises to on-premises replication/orchestration.

Demo

There are two vSphere environments. He is going to replicate from one to another. CS and RX VMs are running as VMs in the secondary site.

There is application consistency leveraging VSS. A bookmarking process (application tags) in VMs enables failover consistency of a group of servers, e.g. a SharePoint farm.
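One way to think about bookmarks across a group such as a SharePoint farm: fail over to the newest bookmark that every VM in the group has received, so the servers come up at the same application-consistent point. This is a minimal sketch, assuming bookmarks are numbered sequentially across the group. The data shape and `latest_consistent_bookmark()` are hypothetical, not Scout's internals.

```python
from typing import Dict, Optional, Set


def latest_consistent_bookmark(received: Dict[str, Set[int]]) -> Optional[int]:
    """Return the newest bookmark number that every VM in the group has replicated, or None."""
    if not received:
        return None
    # Only bookmarks present on every VM give a consistent point for the whole group.
    common = set.intersection(*received.values())
    return max(common) if common else None


farm = {
    "sp-web": {1, 2, 3},
    "sp-app": {1, 2, 3},
    "sp-sql": {1, 2},  # bookmark 3 has not reached the DR site for the SQL VM yet
}
```

For this farm the answer is bookmark 2: bookmark 3 would bring the web and app tiers up ahead of the database.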

In Scout vContinuum he enters the source vSphere details and credentials. A search brings up the available VMs. Selecting a VM shows the details and allows you to select virtual disks (exclude temp/paging file disks to save bandwidth). Then he enters the target vSphere farm details. A master target (a source Windows VM) that is responsible for receiving the data is selected. The replication policy is configured. You can pick a data store. You can opt to use Raw Device Mapping for larger performance requirements. You can configure retention – the ability to move back to an older copy of the VM in the DR site (playback). This can be defined by hours, days, or a quota of storage space. Application consistency can be enabled via VSS (flushes buffers to get committed changes).
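The retention options (a time span or a storage quota) can be sketched as a pruning rule over recovery points. This assumes recovery points with a timestamp and a size, kept newest-first until either limit is hit. `RecoveryPoint` and `prune()` are hypothetical names, not part of Scout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    taken: datetime
    size_gb: float


def prune(points: List[RecoveryPoint], now: datetime,
          max_age: Optional[timedelta] = None,
          quota_gb: Optional[float] = None) -> List[RecoveryPoint]:
    """Keep the newest points that fit both the age limit and the storage quota (newest first)."""
    kept: List[RecoveryPoint] = []
    used = 0.0
    for point in sorted(points, key=lambda pt: pt.taken, reverse=True):
        if max_age is not None and now - point.taken > max_age:
            break  # every remaining point is older still
        if quota_gb is not None and used + point.size_gb > quota_gb:
            break  # keeping this point would exceed the quota
        kept.append(point)
        used += point.size_gb
    return kept


now = datetime(2014, 10, 30, 12, 0)
points = [RecoveryPoint(now - timedelta(hours=h), 10.0) for h in (1, 5, 30, 80)]
```

A 2-day limit keeps the three points taken within 48 hours; a 25 GB quota keeps only the two newest 10 GB points.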

MA Offers

  • Support to migrate heterogeneous workloads to Azure: physical (Windows), virtual, and AWS workloads
  • Multi-tenant migration portal.
  • And more; I can’t type fast enough!

You require a site-to-site VPN or a NAT IP for the cloud gateway. You need to run the two InMage VMs (CS and MT) in your subscription.

There was a little bit more, but not much. Seems like a simple enough solution.