{"id":18128,"date":"2015-05-07T21:49:17","date_gmt":"2015-05-07T21:49:17","guid":{"rendered":"https:\/\/aidanfinn.com\/?p=18128"},"modified":"2015-05-07T23:12:39","modified_gmt":"2015-05-07T23:12:39","slug":"ignite-2015exploring-storage-replica-in-windows-server-2016","status":"publish","type":"post","link":"https:\/\/aidanfinn.com\/?p=18128","title":{"rendered":"Ignite 2015&ndash;Exploring Storage Replica in Windows Server 2016"},"content":{"rendered":"<p>Speaker: Ned Pyle.<\/p>\n<h2>What is a Disaster?<\/h2>\n<p>Answer: McDonalds running out of food at Ignite. But I digress \u2026 you lose your entire server room or data centre.<\/p>\n<p>Hurricane Sandy wiped out Manhattan. Lots of big hosting facilities went offline. Some stayed partially online. And a handful stayed online.<\/p>\n<h2>Storage Replica Overview<\/h2>\n<p>Synchronous replication between cities. Asynchronous replication between countries. Not just about disaster recovery but also disaster avoidance.<\/p>\n<p>It is volume based. Uses SMB 3.1.1. Works with any Windows data volume. Any fixed disk storage: iSCSI, Spaces, local disk or any storage fabric (iSCSI, FCoE, SAS, etc). You manage it using FCM (does not require a cluster), PowerShell, WMI, and in the future: Azure Site Recovery (ASR).<\/p>\n<p>This is a feature of WS2016 and there is no additional licensing cost.<\/p>\n<h2>Demo<\/h2>\n<p>A repeat of an earlier demo using a 2 node cluster: a file changes in a VM in site A, the change replicates, and it shows up after failover.<\/p>\n<h2>Scenarios in the new Technical Preview<\/h2>\n<ul>\n<li>Stretch Cluster<\/li>\n<li>Server to Server<\/li>\n<li>Cluster to Cluster, e.g. 
S2D to S2D<\/li>\n<li>Server to self<\/li>\n<\/ul>\n<h2>Stretch Cluster<\/h2>\n<ul>\n<li>Single cluster<\/li>\n<li>Automatic failover<\/li>\n<li>Synchronous<\/li>\n<\/ul>\n<h2>Cluster to Cluster<\/h2>\n<ul>\n<li>Two separate clusters<\/li>\n<li>Manual failover<\/li>\n<li>Sync or async replication<\/li>\n<\/ul>\n<h2>Server to Server<\/h2>\n<ul>\n<li>Two separate servers, even with local storage<\/li>\n<li>Manual failover<\/li>\n<li>Sync or async replication<\/li>\n<\/ul>\n<h2>Server to Self<\/h2>\n<p>Replicate one volume to another on the same server. Then move these disks to another server and use them as a seed for replication.<\/p>\n<h2>Blocks, not Files<\/h2>\n<p>Block based replication. It is not DFS-R. Replication is done way down low. It is unaware of the concept of files so it doesn\u2019t know whether they are in use. It only cares about write IO. Works with CSVFS, NTFS and ReFS.<\/p>\n<p>2 years of work by 10 people to create a disk filter driver that sits between the Volume Manager and the Partition Manager.<\/p>\n<h2>Synch Workflow<\/h2>\n<p>A log is kept of each write on the primary server. The log is written through to the disk. The same log is kept on the secondary site. The write is sent to the log in parallel on both sites. Only when the write has been written to the log in both sites is it acknowledged.<\/p>\n<h2>Asynch Workflow<\/h2>\n<p>The write goes to the log on site A and is acknowledged. Continuous replication sends the write to the log in the secondary site. Not interval based.<\/p>\n<h2>SMB 3.1.1<\/h2>\n<p>RDMA\/SMB Direct can be used over long distances with Mellanox InfiniBand Metro-X and Chelsio iWARP. MSFT have tested this on 10 KM, 25 KM, and 40 KM networks. Round trip latencies are hundreds of microseconds for 40 KM one-way (very low latency). SMB 3.1.1 has optimized built-in encryption. 
They are still working on this and you should get to the point where you want encryption on all the time.<\/p>\n<h2>Questions<\/h2>\n<ul>\n<li>How Many Nodes? 1 cluster with 64 nodes or 2 clusters with 64 nodes each.<\/li>\n<li>Is the log based on Jet? No. The log is based on CLFS.<\/li>\n<\/ul>\n<h2>Requirements<\/h2>\n<ul>\n<li>Windows Server Datacenter edition <strong>only \u2013 <\/strong>yes I know.<\/li>\n<li>AD is required \u2026 no schema updates, etc. They need access to Kerberos.<\/li>\n<li>Disks must be GPT. MBR is not supported.<\/li>\n<li>Same disk geometry (between logs, between data) and partition for data.<\/li>\n<li>No removable drives.<\/li>\n<li>Free space for logs on a Windows NTFS\/ReFS volume (logs are fixed size and manually resized)<\/li>\n<li>No %Systemroot%, page file, hibernation file or DMP file replication.<\/li>\n<\/ul>\n<p>Firewall: SMB and WS-MAN<\/p>\n<h2>Synch Replication Recommendations<\/h2>\n<ul>\n<li>&lt;5 ms round trip latency. Typically 30-50 KM in the real world.<\/li>\n<li>&gt; 1 Gbps bandwidth end-end between the servers is a starting point. Depends on a lot.<\/li>\n<li>Log volume: Flash (SSD, NVME, etc). Larger logs allow faster recovery from larger outages and less rollover, but cost space.<\/li>\n<\/ul>\n<h2>Asynchronous Replication<\/h2>\n<p>Latency is not an issue. Log volume recommendations are the same as above.<\/p>\n<h2>Can we make this Easy?<\/h2>\n<p>Test-SRTopology cmdlet. Checks requirements and recommendations for bandwidth, log sizes, IOPS, etc. Runs for a specified duration to analyse a potential source server for sizing replication. Run it before configuring replication against a proposed source volume and proposed destination.<\/p>\n<h2>Philosophy<\/h2>\n<p>Async crash consistency versus application consistency. Guarantee mountable volume. App must guarantee a usable file.<\/p>\n<p>Can replicate VSS snapshots.<\/p>\n<h2>Management Rules in SR V1<\/h2>\n<p>You cannot use the replica volume. 
In this release they only do 1:1 replication, e.g. 1 node to 1 node, 1 cluster to 1 cluster, and 1 half cluster to another half cluster. You cannot do legs of replication.<\/p>\n<p>You can do Hyper-V Replica from A to B and SR from B to C.<\/p>\n<p>Resizing replicated volumes interrupts replication. This might change \u2013 feedback.<\/p>\n<h2>Management Notes<\/h2>\n<p>Latest drivers. Most problems are related to drivers, not SR. Filter drivers can be dodgy too.<\/p>\n<p>Understand your performance requirements. Understand storage latency impact on your services. Understand network capacity and latency. PerfMon and DiskSpd are your friends. Test workloads before and after SR.<\/p>\n<h2>Where can I run SR?<\/h2>\n<p>In a VM. Requires WS2016 DC edition. Works on any hypervisor. It works in Azure, but no support statement <em>yet<\/em>.<\/p>\n<h2>Hyper-V Replica<\/h2>\n<p>HVR understands your Hyper-V workload. It works with HTTPS and certificates. Also in Std edition.<\/p>\n<p>SR offers synchronous replication. Can create stretched guest clusters. Can work in VMs that are not in Hyper-V.<\/p>\n<h2>SQL Availability Groups<\/h2>\n<p>Lots of reasons to use SQL AGs. SR doesn\u2019t require SQL Ent. Can replicate VMs at host volume level. SR might be easier than SQL AGs. You must use write ordering\/consistency if you use any external replication of SQL VMs \u2013 includes HVR\/ASR.<\/p>\n<h2>Questions<\/h2>\n<ul>\n<li>Is there a test failover? No.<\/li>\n<li>Is 5 ms a hard rule for sync replication? Not in the code, but over 5 ms will be too slow and degrade performance.<\/li>\n<li>Overhead? Initial sync can be heavy due to check-summing. There is a built-in throttle to prevent using too much RAM. You cannot control that throttle in TP2 but you will later.<\/li>\n<\/ul>\n<h2>What SR is Not<\/h2>\n<ul>\n<li>It is not shared-nothing clustering. 
That is Storage Spaces Direct (S2D).<\/li>\n<li>However, you can use it to create a shared-nothing 2 node cluster.<\/li>\n<li>It is not a backup \u2013 it will replicate deletions of data very very well.<\/li>\n<li>It is not DFS-R: not multi-endpoint and not low bandwidth (built to hammer networks).<\/li>\n<li>Not a great branch office solution<\/li>\n<\/ul>\n<p>It is a DR solution for sites with lots of bandwidth between them.<\/p>\n<h2>Stretch Clusters<\/h2>\n<ul>\n<li>Synchronous only<\/li>\n<li>Asymmetric storage, e.g. JBOD in one site and SAN in another site.<\/li>\n<li>Manage with FCM<\/li>\n<li>Increase cluster DR capabilities.<\/li>\n<li>Main use cases are Hyper-V and general use file server.<\/li>\n<\/ul>\n<p>Not for stretch-cluster SOFS \u2013 you\u2019d do cluster-to-cluster replication for that.<\/p>\n<h2>Cluster-Cluster or Server-Server<\/h2>\n<ul>\n<li>Synch or asynch<\/li>\n<li>Supports S2D<\/li>\n<\/ul>\n<h2>PowerShell<\/h2>\n<ul>\n<li>New-SrPartnership<\/li>\n<li>Set-SRPartnership<\/li>\n<li>Test-SrTopology<\/li>\n<\/ul>\n<h2>DiskSpd Demo on Synch Replication<\/h2>\n<p>Runs DiskSpd on volume on source machine.<\/p>\n<ul>\n<li>Before replication: 63,000 IOPS on source volume<\/li>\n<li>After replication: In TPv2 it takes around 15% hit. In latest builds, it\u2019s under 10%.<\/li>\n<\/ul>\n<p>In this demo, the 2 machines were 25 KM apart with an iWarp link. Replaced this with fibre and did 60,000 IOPS.<\/p>\n<h2>Azure Site Recovery<\/h2>\n<p>Requires SCVMM. You get end-end orchestration. Groups VMs to replicate together. Support for Azure Automation runbooks. Support for planned\/unplanned failover. Preview in July\/August.<\/p>\n<h2>Questions:<\/h2>\n<ul>\n<li>Tiered storage spaces: It supports tiering, but the geometry must be identical on both sides.<\/li>\n<li>Does IO size affect performance? Yes.<\/li>\n<\/ul>\n<h2>The Replication Log<\/h2>\n<p>Hidden volume. 
<\/p>\n<h2>Known Issues in TP2<\/h2>\n<ul>\n<li>PowerShell remoting for server-server does not work<\/li>\n<li>Performance is not there yet<\/li>\n<li>There are bugs<\/li>\n<\/ul>\n<p>A <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/mt126104.aspx\" target=\"_blank\">guide<\/a> was published on Monday on TechNet.<\/p>\n<p>Questions to srfeed &lt;at&gt; microsoft.com<\/p>\n<p><div id=\"scid:0767317B-992E-4b12-91E0-4F059A8CECA8:0c34cab3-c165-4ca6-be44-32d8fdd9cbde\" class=\"wlWriterEditableSmartContent\" style=\"float: none; padding-bottom: 0px; padding-top: 0px; padding-left: 0px; margin: 0px; display: inline; padding-right: 0px\">Technorati Tags: <a href=\"http:\/\/technorati.com\/tags\/Event+Notes\" rel=\"tag\">Event Notes<\/a>,<a href=\"http:\/\/technorati.com\/tags\/Windows+Server+2016\" rel=\"tag\">Windows Server 2016<\/a>,<a href=\"http:\/\/technorati.com\/tags\/Storage\" rel=\"tag\">Storage<\/a>,<a href=\"http:\/\/technorati.com\/tags\/Failover+Clustering\" rel=\"tag\">Failover Clustering<\/a>,<a href=\"http:\/\/technorati.com\/tags\/Hyper-V\" rel=\"tag\">Hyper-V<\/a><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Speaker: Net Pyle. What is a Disaster? Answer: McDonalds running out of food at Ignite. But I digress \u2026 you lose your entire server room or data centre. Hurricane Sandy wiped out Manhattan. Lots of big hosting facilities went offline. Some stayed partially online. And a handful stayed online. 
Storage Replica Overview Synchronous replication between &hellip; <a href=\"https:\/\/aidanfinn.com\/?p=18128\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Ignite 2015&ndash;Exploring Storage Replica in Windows Server 2016&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[14],"tags":[176,63,181,99,137],"class_list":["post-18128","post","type-post","status-publish","format-standard","hentry","category-eventnotes","tag-eventnotes","tag-failover-clustering","tag-hyper-v","tag-storage","tag-windows-server-2016"],"aioseo_notices":[],"jetpack_featured_media_url":"","amp_enabled":true,"_links":{"self":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/18128","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=18128"}],"version-history":[{"count":16,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/18128\/revisions"}],"predecessor-version":[{"id":18144,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/18128\/revisions\/18144"}],"wp:attachment":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=18128"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aidanfinn.c
om\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=18128"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=18128"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}