Speaker: Ned Pyle.
What is a Disaster?
Answer: McDonald's running out of food at Ignite. But I digress … a disaster is when you lose your entire server room or data centre.
Hurricane Sandy wiped out Manhattan. Lots of big hosting facilities went offline. Some stayed partially online. And a handful stayed online.
Storage Replica Overview
Synchronous replication between cities. Asynchronous replication between countries. Not just about disaster recovery but also disaster avoidance.
It is volume based and uses SMB 3.1.1. Works with any Windows data volume on any fixed disk storage: iSCSI, Storage Spaces, local disk, or any storage fabric (iSCSI, FCoE, SAS, etc). You manage it using Failover Cluster Manager (FCM – SR itself does not require a cluster), PowerShell, WMI, and in the future: Azure Site Recovery (ASR).
This is a feature of WS2016 and there is no additional licensing cost.
A demo from a previous session: a 2-node stretch cluster, file changes made in a VM in site A replicate, and the change shows up after failover.
Scenarios in the new Technical Preview
- Stretch Cluster
- Server to Server
- Cluster to Cluster, e.g. S2D to S2D
- Server to self

Stretch Cluster
- Single cluster
- Automatic failover
Cluster to Cluster
- Two separate clusters
- Manual failover
- Sync or async replication
Server to Server
- Two separate servers, even with local storage
- Manual failover
- Sync or async replication
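A hedged sketch of what server-to-server replication with manual failover might look like with the SR PowerShell cmdlets. Server names, replication group names, and drive letters here are hypothetical, and exact syntax may differ in the Technical Preview.

```powershell
# Create a sync server-to-server partnership (all names are examples).
New-SRPartnership -SourceComputerName "SR-SRV01" -SourceRGName "rg01" `
    -SourceVolumeName "D:" -SourceLogVolumeName "E:" `
    -DestinationComputerName "SR-SRV02" -DestinationRGName "rg02" `
    -DestinationVolumeName "D:" -DestinationLogVolumeName "E:" `
    -ReplicationMode Synchronous -LogSizeInBytes 8GB

# Manual failover: reverse the direction of replication.
Set-SRPartnership -NewSourceComputerName "SR-SRV02" -SourceRGName "rg02" `
    -DestinationComputerName "SR-SRV01" -DestinationRGName "rg01"
```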
Server to Self
Replicate one volume to another on the same server, then move those disks to another server and use them as a seed for replication.
Blocks, not Files
Block-based replication – it is not DFS-R. Replication is done way down low in the stack; it is unaware of the concept of files, so it doesn't know (or care) whether they are in use. It only cares about write IO. Works with CSVFS, NTFS, and ReFS.
2 years of work by 10 people to create a disk filter driver that sits between the Volume Manager and the Partition Manager.
A log is kept of each write on the primary server; the log is written through to disk. The same log is kept on the secondary site. In synchronous mode, the write is sent to the log in both sites in parallel, and the write is only acknowledged once it has been committed to the log in both sites.
In asynchronous mode, the write goes to the log in site A and is acknowledged immediately; continuous replication then sends the write to the log in the secondary site. It is not interval based.
RDMA/SMB Direct can be used long range: Mellanox InfiniBand Metro-X and Chelsio iWARP can do long distances. Microsoft tested this on 10 km, 25 km, and 40 km networks. Round-trip latencies were hundreds of microseconds at 40 km (very low latency). SMB 3.1.1 has optimized built-in encryption; they are still working on it, and the goal is that you'll want encryption on all the time.
- How Many Nodes? 1 cluster with 64 nodes or 2 clusters with 64 nodes each.
- Is the log based on Jet? No; the log is based on CLFS.
- Windows Server Datacenter edition only – yes, I know.
- AD is required … no schema updates, etc. They just need access to Kerberos.
- Disks must be GPT. MBR is not supported.
- Same disk geometry (between logs, between data) and same partition size for data.
- No removable drives.
- Free space for logs on a Windows NTFS/ReFS volume (logs are fixed size and manually resized)
- No %Systemroot%, page file, hibernation file, or memory dump file replication.
Firewall: SMB and WS-MAN
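One hedged way to open the required traffic is via the built-in firewall rule groups (group display names can vary by OS language; this is a sketch, not the session's exact steps):

```powershell
# Allow SMB (TCP 445) and WS-MAN/WinRM (TCP 5985) between replication partners.
Enable-NetFirewallRule -DisplayGroup "File and Printer Sharing"
Enable-NetFirewallRule -DisplayGroup "Windows Remote Management"
```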
Sync Replication Recommendations
- <5 ms round-trip latency. Typically 30-50 km in the real world.
- >1 Gbps end-to-end bandwidth between the servers as a starting point; the real requirement depends on a lot.
- Log volume: flash (SSD, NVMe, etc). Larger logs allow faster recovery from larger outages and less rollover, but cost space.
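Since the logs are fixed size and manually resized, resizing might look something like the following sketch (the group name is hypothetical, and the exact cmdlet behaviour may differ in the preview):

```powershell
# Grow the fixed-size replication log for an existing replication group.
Set-SRGroup -Name "rg01" -LogSizeInBytes 16GB
```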
Async Replication Recommendations
Latency is not an issue. Log volume recommendations are the same as above.
Can we make this Easy?
Test-SRTopology cmdlet: checks requirements and recommendations for bandwidth, log sizes, IOPS, etc. It runs for a specified duration to analyse a potential source server for sizing replication. Run it before configuring replication, against a proposed source volume and proposed destination.
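A sketch of running it against a proposed pair; computer names, volumes, and the output path are all hypothetical examples:

```powershell
# Analyse a proposed source/destination pair for 30 minutes and
# write an HTML report (bandwidth, IOPS, recommended log size).
Test-SRTopology -SourceComputerName "SR-SRV01" -SourceVolumeName "D:" `
    -SourceLogVolumeName "E:" `
    -DestinationComputerName "SR-SRV02" -DestinationVolumeName "D:" `
    -DestinationLogVolumeName "E:" `
    -DurationInMinutes 30 -ResultPath "C:\Temp"
```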
Async offers crash consistency, not application consistency: SR guarantees a mountable volume; the app must guarantee a usable file.
Can replicate VSS snapshots.
Management Rules in SR V1
You cannot use the replica volume. In this release they only do 1:1 replication, e.g. 1 node to 1 node, 1 cluster to 1 cluster, and 1 half-cluster to another half-cluster. You cannot do chained legs of replication with SR alone.
You can do Hyper-V Replica from A to B and SR from B to C.
Resizing replicated volumes interrupts replication. This might change – send feedback.
Use the latest drivers. Most problems are related to drivers, not SR. Filter drivers can be dodgy too.
Understand your performance requirements. Understand storage latency impact on your services. Understand network capacity and latency. PerfMon and DiskSpd are your friends. Test workloads before and after SR.
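As one hedged example of baselining a volume with DiskSpd before enabling SR (the test file path is hypothetical; flags shown: 8 KB blocks, 60 s run, 32 outstanding IOs across 4 threads, random access, 30% writes, latency stats, 1 GiB test file):

```powershell
# Measure IOPS and latency on the proposed source volume.
.\diskspd.exe -b8K -d60 -o32 -t4 -r -w30 -L -c1G D:\sr-test.dat
```

Run the same test again after enabling replication to see the overhead on your own hardware.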
Where can I run SR?
In a VM: requires WS2016 Datacenter edition; works on any hypervisor. It works in Azure, but there is no support statement yet.
Hyper-V Replica (HVR) understands your Hyper-V workload. It works with HTTPS and certificates, and is also available in Standard edition.
SR offers synchronous replication. Can create stretched guest clusters. Can work in VMs that are not in Hyper-V.
SQL Availability Groups
Lots of reasons to use SQL AGs, but SR doesn't require SQL Enterprise and can replicate VMs at the host volume level, so SR might be easier than SQL AGs. You must use write ordering/consistency if you use any external replication of SQL VMs – this includes HVR/ASR.
- Is there a test failover? No.
- Is 5 ms a hard rule for sync replication? It is not enforced in the code, but over 5 ms it will be too slow and degrade performance.
- Overhead? Initial sync can be heavy due to check-summing. There is a built-in throttle to prevent using too much RAM. You cannot control that throttle in TP2 but you will later.
What SR is Not
- It is not shared-nothing clustering. That is Storage Spaces Direct (S2D).
- However, you can use it to create a shared-nothing 2 node cluster.
- It is not a backup – it will replicate deletions of data very very well.
- It is not DFS-R: not multi-endpoint, and not built for low bandwidth (it is built to hammer networks).
- Not a great branch office solution.
It is a DR solution for sites with lots of bandwidth between them.

Stretch Cluster
- Synchronous only
- Asymmetric storage, e.g. JBOD in one site and SAN in another site.
- Manage with FCM
- Increase cluster DR capabilities.
- Main use cases are Hyper-V and general use file server.
Not for stretch-cluster SOFS – you’d do cluster-to-cluster replication for that.
Cluster-Cluster or Server-Server
- Sync or async
- Supports S2D
DiskSpd Demo on Synch Replication
Runs DiskSpd on volume on source machine.
- Before replication: 63,000 IOPS on source volume
- After replication: in TPv2 it takes around a 15% hit; in the latest builds, it's under 10%.
In this demo, the two machines were 25 km apart over an iWARP link. Replacing this with fibre gave 60,000 IOPS.
Azure Site Recovery
Requires SCVMM. You get end-to-end orchestration: it groups VMs to replicate together, with support for Azure Automation runbooks and for planned/unplanned failover. Preview in July/August.
- Tiered Storage Spaces: it supports tiering, but the geometry must be identical on both sides.
- Does IO size affect performance? Yes.
The Replication Log
Known Issues in TP2
- PowerShell remoting for server-server does not work
- Performance is not there yet
- There are bugs
A guide was published on Monday on TechNet.
Questions to srfeed <at> microsoft.com