IaaS | Aidan Finn, IT Pro

Video – What is Microsoft Azure?

I’ve posted a short video to help people understand what Microsoft Azure is, how it can impact a business, where it is, how Microsoft has made Azure compliant with lots of regulations and standards, and what Azure can do.

Was This Video Useful?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Amsterdam on April 19-20, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

New Virtual Machines Series in Azure Dublin / North Europe

I was helping troubleshoot something for a customer today when I noticed that some of the newer VM series have finally arrived in Azure’s Dublin / North Europe region:

D_v3: The successor to the D_v2 machines (including the “S” Premium Storage variants), which are designed for disk/database workloads. The machine is 28% cheaper than the RRP of the D_v2, but that’s because it runs VMs on hosts with Hyperthreading enabled, which reduces per-vCPU performance by 28%. Common workloads care more about affordable core counts than GHz, which is what the D_v3 offers.
E_v3: The memory-optimized versions (more memory) of the D_v2 are also here, with the same 28% price/GHz reduction.
NV: These are machines with direct (not virtualized) access to NVIDIA M60 chipsets on their hosts, specialized for desktop virtualization.
NC: You can run virtual machines that are designed for computational workloads (simulations, etc) with these machines, using non-virtualized access to NVIDIA Tesla K80 GPUs.

I’ve just upgraded this server (shut down, resize, restart) from a DS2_v2 to a D2s_v3.

FYI, if you are still using the D_v2 promo offer in North Europe, you had better start planning an upgrade to the D_v3 soon if you want to keep that low price. It’s only a matter of time until Microsoft announces the end of the pre-D_v3 promotion on D_v2 machines, and the price of the D_v2 returns to normal (28% higher than the promo).
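A detail worth checking with any promo is that percentage changes are not symmetric. A minimal sketch, assuming (hypothetically) a promo price that is 28% below the RRP: going back from that promo price to the RRP is then a rise of about 39%, not 28%. The prices are arbitrary units, not real Azure rates.

```python
def markup_after_discount(discount: float) -> float:
    """Percentage rise needed to get from a discounted price back to the original price.

    If promo = rrp * (1 - discount), then rrp / promo - 1 = 1 / (1 - discount) - 1.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return 1 / (1 - discount) - 1


rrp = 100.0                   # hypothetical RRP in arbitrary units, not a real Azure price
discount = 0.28               # hypothetical promo discount of 28%
promo = rrp * (1 - discount)  # 72 in the same units
rise = markup_after_discount(discount)

print(f"promo price: {promo:.1f}")
print(f"RRP is {rise:.1%} above the promo price")  # RRP is 38.9% above the promo price
```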


Azure IaaS Design & Performance Considerations–Best Practices & Learnings From The Field

Speaker: Daniel Neumann, TSP – Azure Infrastructure, Microsoft (ex-MVP).

Selecting the Best VM Size

Performance of each Azure VM vCPU/core is rated using the Azure Compute Unit (ACU), with the Standard A-Series as the baseline of 100. For example, D_v2 offers 210-250 per vCPU, and H offers 290-300. Note that the D_v3 has lower per-vCPU performance than the D_v2 because it uses Hyperthreading on the host; MS matched this by reducing costs accordingly. Probably not a big deal: DB workloads, which are common on the D-family, care more about thread count than GHz.
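ACU scores are per vCPU, so a rough way to compare the aggregate compute of two machines is to multiply the vCPU count by the ACU range. A minimal sketch using only the ranges quoted above (A-Series baseline 100, D_v2 210-250, H 290-300); the 8-vCPU count is a hypothetical example, and ACU is a relative benchmark figure, not a guarantee of application performance.

```python
from typing import Tuple

# ACU ranges per vCPU as quoted in the session notes (baseline Standard A-Series = 100).
ACU_PER_VCPU = {
    "A": (100, 100),
    "D_v2": (210, 250),
    "H": (290, 300),
}


def aggregate_acu(series: str, vcpus: int) -> Tuple[int, int]:
    """Return the (low, high) aggregate ACU for a VM of the given series and vCPU count."""
    low, high = ACU_PER_VCPU[series]
    return low * vcpus, high * vcpus


# Hypothetical comparison of two 8-vCPU machines.
for series in ("D_v2", "H"):
    low, high = aggregate_acu(series, 8)
    # Dividing by 100 expresses the result as "equivalent baseline vCPUs".
    print(f"{series} x 8 vCPUs: {low}-{high} ACU (~{low / 100:.1f}-{high / 100:.1f} baseline vCPUs)")
```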

Network Performance

Documentation has been improved to show actual Gbps instead of low/medium/high. Higher-end machines can be created with Accelerated Networking (SR-IOV), which can offer very high speeds. Announced this week: the M128s VM can hit 30 Gbps.

RSS

Receive Side Scaling is not always enabled by default in Windows VMs. It is enabled on larger VMs, and on all Linux machines. It can greatly improve inbound data transfer performance for multi-core VMs.
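The idea behind RSS is that the NIC hashes each inbound flow's addresses and ports and uses the result to pick a receive queue, and therefore a CPU core. Different connections are then processed on different cores, while packets of one connection stay together. A minimal sketch of that distribution; real NICs use a Toeplitz hash and an indirection table, so the SHA-256 hash and simple modulo below are stand-ins chosen for brevity, and the flows are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Tuple

Flow = Tuple[str, int, str, int]  # (source IP, source port, destination IP, destination port)


def rss_core(flow: Flow, num_cores: int) -> int:
    """Map a flow to a core. Stand-in for the NIC's Toeplitz hash + indirection table."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_cores


def spread(flows: Iterable[Flow], num_cores: int) -> Counter:
    """Count how many flows land on each core."""
    return Counter(rss_core(f, num_cores) for f in flows)


# Hypothetical inbound connections from 200 clients to one VM on port 443.
flows = [(f"10.0.0.{i}", 50000 + i, "10.1.0.4", 443) for i in range(1, 201)]

without_rss = spread(flows, 1)  # RSS off: all receive processing on a single core
with_rss = spread(flows, 4)     # RSS on, 4-core VM: flows are spread across cores
print("without RSS:", dict(without_rss))
print("with RSS:   ", dict(sorted(with_rss.items())))
```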

Storage Throughput

Storage throughput is listed in the VM sizes documentation. It varies between series, and increases as you go up through the sizes. Watch out when using Premium Storage: lower-end machines might not be able to deliver the potential of larger disks or storage pools of disks, so you might need a larger VM size to achieve the performance potential of the disks/pool.
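The practical consequence is that the throughput you actually get is the lower of the VM size's cap and the combined caps of its disks. A minimal sketch; the MB/s figures are hypothetical placeholders, so check the published limits for your VM size and disk type.

```python
from typing import Sequence


def effective_throughput(vm_cap_mb_s: float, disk_caps_mb_s: Sequence[float]) -> float:
    """Achievable storage throughput in MB/s: capped by the VM size as well as by the disks."""
    return min(vm_cap_mb_s, sum(disk_caps_mb_s))


# Hypothetical numbers, for illustration only.
pool = [200.0] * 4    # 4 premium disks at 200 MB/s each = 800 MB/s potential
small_vm_cap = 96.0   # a small VM size's storage cap
large_vm_cap = 768.0  # a larger VM size's storage cap

print(effective_throughput(small_vm_cap, pool))  # 96.0: the VM is the bottleneck
print(effective_throughput(large_vm_cap, pool))  # 768.0: much closer to the pool's potential
```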

Daniel uses a tool called PerfInsights from MS Downloads to demo storage throughput.

Why Use Managed Disks

Storage accounts have been limited to 50,000 IOPS since 20/9/2017. That limits the number of disks that you can have in a single storage account: if you put too many disks in a single storage account, you cannot get the performance potential of each disk.
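The limit on disks per account is simple division. A minimal sketch, assuming the 50,000 IOPS account limit above and a per-disk maximum of 500 IOPS for a standard (HDD) disk; substitute the figures for your own disk type.

```python
def max_full_speed_disks(account_iops_limit: int, iops_per_disk: int) -> int:
    """How many disks can run at their full IOPS before the storage account limit is hit."""
    return account_iops_limit // iops_per_disk


def per_disk_iops_when_saturated(account_iops_limit: int, iops_per_disk: int, disks: int) -> float:
    """IOPS each disk can get if all disks are busy at once (assumes an even share)."""
    return min(iops_per_disk, account_iops_limit / disks)


limit = 50_000       # storage account limit quoted above
standard_disk = 500  # assumed per-disk maximum for a standard HDD disk

print(max_full_speed_disks(limit, standard_disk))               # 100
print(per_disk_iops_when_saturated(limit, standard_disk, 200))  # 250.0
```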

Lots of reasons to use managed disks. In short:

No more storage accounts
Lots more management features
FYI: no support yet for Azure-to-Azure Site Recovery (replication to other regions)

If you use un-managed disks with availability sets, the storage accounts (including all 3 copies of their data) can end up in the same fault domain. With managed disks, disk placement mirrors the availability set's fault domain alignment.

Storage Spaces

Do not use disk mirroring. Use simple virtual disks/LUNs.

For performance, make sure that the column count equals the number of disks.
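Column count matters because a simple space stripes data across one disk per column: with fewer columns than disks, a large I/O only touches a subset of the disks. A minimal sketch of the offset-to-column mapping for a simple (striped) layout; the 256 KB interleave and the 2 MB I/O are assumptions, and real Storage Spaces adds slab allocation details that this ignores.

```python
from typing import Set

KB = 1024


def columns_touched(offset: int, length: int, columns: int, interleave: int = 256 * KB) -> Set[int]:
    """Return the set of columns (disks) a single I/O hits in a simple striped layout.

    Stripe unit n (each `interleave` bytes long) lands on column n % columns.
    """
    first_unit = offset // interleave
    last_unit = (offset + length - 1) // interleave
    return {unit % columns for unit in range(first_unit, last_unit + 1)}


io_size = 2 * 1024 * KB  # a hypothetical 2 MB sequential I/O

# 8 disks in the pool, but the virtual disk was created with only 2 columns.
print(sorted(columns_touched(0, io_size, columns=2)))  # [0, 1]
# Column count = number of disks: every disk shares the work.
print(sorted(columns_touched(0, io_size, columns=8)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```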

Daniel says to format the volume with a 64KB allocation unit size. That’s true for almost everything except SQL Server data warehouses. For normal transactional databases, stick with a 64KB allocation unit size. For SQL Server data warehouses, go with a 256KB allocation unit size (advice from the SQL Tiger team this week).

Networking

Daniel doesn’t appear to be a fan of micro-segmentation of a subnet using an NVA (network virtual appliance). Perhaps the DPDK preview feature for NVA performance will change that.

He shows the NSG Security Group View in Network Watcher, which lets you understand how L4 firewall rules are being applied by NSGs. For a VM, you also have the effective routes and effective security rules views.
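What these views resolve is the NSG evaluation order: rules are processed by priority, lowest number first, and the first rule that matches decides whether the traffic is allowed or denied. A minimal sketch of that logic for inbound traffic; the custom rules are hypothetical, real NSGs also match on source/destination prefixes, service tags, and port ranges, and the built-in default rules are reduced here to a single catch-all deny.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    priority: int        # lower number = evaluated first
    protocol: str        # "TCP", "UDP" or "*"
    port: Optional[int]  # destination port, None = any
    access: str          # "Allow" or "Deny"


def evaluate(rules: List[Rule], protocol: str, port: int) -> Tuple[str, str]:
    """Return (access, rule name) for the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.protocol in ("*", protocol) and (rule.port is None or rule.port == port):
            return rule.access, rule.name
    return "Deny", "no match"


rules = [
    Rule("DenyAllInBound", 65500, "*", None, "Deny"),  # simplified stand-in for the default rules
    Rule("Allow-HTTPS", 100, "TCP", 443, "Allow"),
    Rule("Deny-RDP", 200, "TCP", 3389, "Deny"),
    Rule("Allow-RDP", 300, "TCP", 3389, "Allow"),     # never wins: priority 200 matches first
]

print(evaluate(rules, "TCP", 443))   # ('Allow', 'Allow-HTTPS')
print(evaluate(rules, "TCP", 3389))  # ('Deny', 'Deny-RDP')
print(evaluate(rules, "UDP", 53))    # ('Deny', 'DenyAllInBound')
```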

Encryption Best Practices

Azure Disk Encryption requires that your key vault and VMs reside in the same Azure region and subscription.

Use the latest version of Azure PowerShell to configure Azure Disk Encryption.

You need an Azure AD Service Principal – the VM cannot talk directly to the key vault, so it goes via the service principal. Best practice is to have 1 service principal for each key vault.

Storage Service Encryption (managed disks) is easier. There is no BYOK (bring your own key) at the moment, so there’s no key vault function. The keys are managed by Azure and not visible to the customer.

The Test Tools Used In This Session

Comparing Performance with Encryption

There are lots of charts in this section, so it’s best to watch the video on Channel 9/Ignite/YouTube.

In short, ADE causes some throughput performance hits, depending on disk tier, disk size, and the block size of the data. It adds 3% CPU utilization, with no IOPS performance hit. SSE has no performance impact.

Azure Backup Best Practices

You need a recovery services vault in the same region/subscription as the VM you want to back up.

VMs encrypted with ADE must have a Key Encryption Key (KEK).

Best case performance of Azure Backup backups:

Initial backup: 20 Mbps.
Incremental backup: 80 Mbps.
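These best-case figures make it easy to estimate how long a backup will take. A minimal sketch, assuming "Mbps" means megabits per second (decimal units) and that throughput holds at the best-case rate for the whole job; the 500 GB and 20 GB data sizes are hypothetical.

```python
def backup_hours(data_gb: float, rate_mbps: float) -> float:
    """Hours to transfer data_gb gigabytes at rate_mbps megabits per second (decimal units)."""
    megabits = data_gb * 1000 * 8
    return megabits / rate_mbps / 3600


initial = backup_hours(500, 20)     # hypothetical 500 GB VM, initial backup at 20 Mbps
incremental = backup_hours(20, 80)  # hypothetical 20 GB of changes, incremental at 80 Mbps
print(f"initial: {initial:.1f} h, incremental: {incremental:.2f} h")  # initial: 55.6 h, incremental: 0.56 h
```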

Best practices:

Do not schedule more than 40 VMs to back up at the same time.
Make sure you have Python 2.7 in Linux VMs that you are backing up.
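One way to respect the 40-VM guidance is to split VMs into backup windows of at most 40 each, for example by assigning each group to a backup policy with a different start time. A minimal sketch; the VM names, the start date, and the one-hour spacing between windows are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_CONCURRENT = 40  # guidance from the session: no more than 40 VMs at the same time


def backup_windows(vms: List[str], first_start: datetime,
                   spacing: timedelta = timedelta(hours=1)) -> Dict[datetime, List[str]]:
    """Assign VMs to start times so that no window has more than MAX_CONCURRENT VMs."""
    windows: Dict[datetime, List[str]] = {}
    for i in range(0, len(vms), MAX_CONCURRENT):
        start = first_start + spacing * (i // MAX_CONCURRENT)
        windows[start] = vms[i:i + MAX_CONCURRENT]
    return windows


vms = [f"vm{n:03d}" for n in range(1, 101)]  # 100 hypothetical VMs
for start, group in backup_windows(vms, datetime(2017, 10, 1, 22, 0)).items():
    print(start.strftime("%H:%M"), len(group), "VMs")
```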

Application-Aware Disaster Recovery For VMware, Hyper-V, and Azure IaaS VMs with Azure Site Recovery

Speaker: Abhishek Hemrajani, Principal Lead Program Manager, Azure Site Recovery, Microsoft

There’s a session title!

The Impact of an Outage

The aviation industry has suffered massive outages over the last couple of years, costing millions to billions. Big sites like GitHub have gone down. Only 18% of DR investors feel prepared (Forrester, July 2017, The State of Business Technology Resiliency). Much of this is due to immature core planning and very limited testing.

Causes of Significant Disasters

Forrester says 56% of declared disasters are caused by h/w or s/w.
38% are because of power failures.
Only 31% are caused by natural disasters.
19% are because of cyber attacks.

Sourced from the above Forrester research. The figures add up to more than 100%, so respondents evidently cited more than one cause.

Challenges to Business Continuity

Cost
Complexity
Compliance

How Can Azure Help?

The hyper-scale of Azure can help.

Reduced cost: OpEx utility computing and the benefits of a hyper-scale cloud.
Reduced complexity: a service-based solution with the weight of MS development behind it to simplify it.
Increased compliance: more certifications than anyone.

DR for Azure VMs

Something that AWS doesn’t have. Some mistakenly think that you don’t need DR in Azure, but a region can go offline, and people can still make mistakes. MS does not replicate your VMs unless you enable (and pay for) ASR for selected VMs. ASR is highly certified for compliance, including PCI, EU Data Protection, ISO 27001, and many, many more.

Ensure compliance: No-impact DR testing. Test every quarter or, at least, every 6 months.
Meet RPO and RTO goals: Backup cannot do this.
Centralized monitoring and alerting

Cost effective:

“Infrastructure-less” DR sites.
Pay for what you consume.

Simple:

One-click replication
One-click application recovery (multiple VMs)

Demo: Typical SharePoint Application in Azure

3 tiers in availability sets:

SQL cluster – replicated to a SQL VM in a target region or DR site (async)
App – replicated by ASR – nothing running in DR site
Web – replicated by ASR – nothing running in DR site
Availability sets – built for you by ASR
Load balancers – built for you by ASR
Public IP & DNS – abstract DNS using Traffic Manager

One-Click Replication is new and was announced this week. Disaster Recovery (Preview) is an option in the VM settings. All the prerequisites of the VM are presented in a GUI. You click Enable Replication, all the bits are built, and the VM is replicated. You can pick any region in a “geo-cluster”, rather than being restricted to the paired region.

For more than one VM, you can enable replication in the recovery services vault (RSV) and multi-select the VMs for configuration. The replication policy includes recovery point retention and app-consistent snapshots.

New: multi-VM consistency groups. In preview now with up to 8 VMs, and 16 at GA. VMs in a group take their application-consistent snapshots at the same time. No other public cloud offers this.

Recovery Plans

Recovery plans orchestrate failover. VMs can be grouped, and groups are failed over in order. You can also require manual tasks to be done, and execute Azure Automation runbooks to do other things, like creating load balancer NAT rules, re-configuring DNS abstraction in Traffic Manager, etc. You run the recovery plan to fail over, and to do test failovers.
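The orchestration model is an ordered list of groups, with optional actions (manual steps or Automation runbooks) before and after each group. A minimal sketch of that ordering; the classes, the `failover_vm` placeholder, and the logged runbook steps are hypothetical and are not the ASR API, they only show the sequencing a recovery plan enforces for a tiered app like the SharePoint demo.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Action = Callable[[], None]  # stands in for a manual step or an Azure Automation runbook


@dataclass
class Group:
    name: str
    vms: List[str]
    pre_actions: List[Action] = field(default_factory=list)
    post_actions: List[Action] = field(default_factory=list)


def failover_vm(vm: str, log: List[str]) -> None:
    """Hypothetical placeholder for failing over one VM."""
    log.append(f"failover {vm}")


def run_recovery_plan(groups: List[Group], log: List[str]) -> None:
    """Fail over groups strictly in order; each group's actions run around its VMs."""
    for group in groups:
        for action in group.pre_actions:
            action()
        for vm in group.vms:  # sequential here for clarity
            failover_vm(vm, log)
        for action in group.post_actions:
            action()


log: List[str] = []
plan = [
    Group("App", ["app1", "app2"],
          pre_actions=[lambda: log.append("runbook: fail over SQL")]),
    Group("Web", ["web1", "web2"],
          post_actions=[lambda: log.append("runbook: update Traffic Manager")]),
]
run_recovery_plan(plan, log)
print("\n".join(log))
```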

DR for Hyper-V

You install the Microsoft Azure Recovery Services (MARS) agent on each host. That connects the host to the Azure RSV, and you can replicate any VM on that host. No on-prem infrastructure is required, and no connection broker is required.

DR for VMware

You must deploy the ASR management appliance in the data centre. MS learned that the setup experience for this was complex: there were a lot of pre-reqs and configurations to install it in a Windows VM. From now on, MS will deliver this appliance as an OVF template (a familiar format for VMware admins), and the appliance is configured from the Azure Portal. After that, you replicate Linux and Windows VMs to Azure, as with Hyper-V.

Demo: OVF-Based ASR Management Appliance for VMware

A web portal is used to onboard the downloaded appliance:

Verify the connection to Azure.
Select a NIC for outbound replication.
Choose a recovery services vault from your subscription.
Install any required third-party software, e.g. PowerCLI or MySQL.
Validate the configuration.
Configure vCenter/ESXi credentials. These are never sent to Azure; they stay local. The name of the credential that you choose might appear in the Azure portal.
Then you enter credentials for your Windows/Linux guest OS. These are required to install a mobility service in each VMware VM, because VMware doesn’t use VHD/X; it uses VMDK. Again, they are not sent to MS, but the name of the credential will appear in the Azure Portal when enabling VM replication, so that you can select the right credentials.
Finalize configuration.

This will start rolling out next month in all regions.

Comprehensive DR for VMware

Replication from Hyper-V supports all Linux distros supported by Azure. On VMware, coverage is close to all. They’ve added Windows Server 2016, Ubuntu 14.04 and 16.04, Debian 7/8, managed disks, and 4 TB disk support.

Achieve Near-Zero Application Data Loss

Tips:

Periodic DR testing of recovery plans – leverage Azure Automation.
Invoke BCP before a disaster if you know one is coming, e.g. a hurricane.
Take the app offline before the event if it’s a planned failover, to minimize risks.
Fail over to Azure.
Resume the app and validate.

Achieve 5x Improvement in Downtime

Minimize downtime: https://aka.ms/asr_RTO

He shows a slide. One VM took 11 minutes to fail over. Others took around or less than 2 minutes using the above guidance.
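The headline "5x" follows from those slide numbers. A quick check of the arithmetic, using the 11-minute failover and the roughly 2-minute failovers quoted above; since the 2-minute figure is approximate, so is the ratio.

```python
def downtime_improvement(before_minutes: float, after_minutes: float) -> float:
    """Ratio of old to new failover time: how many times faster recovery became."""
    return before_minutes / after_minutes


ratio = downtime_improvement(11, 2)
print(f"{ratio:.1f}x faster")  # 5.5x faster
```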

Demo: Broad OS Coverage, Azure Features, UEFI Support

He shows Ubuntu, CentOS, Windows Server, and Debian replicating from VMware to Azure. You can fail over from VMware to Azure with UEFI VMs now, but you CANNOT fail back. The process converts the VM to BIOS in Azure (Generation 1 VMs). That’s OK if there’s no intention to fail back, e.g. a migration to Azure.

Customer Success Story – Accenture

They deployed ASR. Increased availability. 53% reduction in infrastructure cost. 3x improvement in RPO. Savings in work and personal time. Simpler solution and they developed new cloud skills.

They get a lot of alerts at the weekend whenever there are network glitches. It could be 500 email alerts.

Demo: New Dashboard & Comprehensive Monitoring

Brand new RSV experience for ASR. Lots more graphical info:

Replication health
Failover test success
Configuration issues
Recovery plans
Error summary
Graphical view of the infrastructure: Azure, VMware, Hyper-V. This shows the various pieces of the solution, and a line goes red when a connection has a failure.
Jobs summary

All of this is on one screen.

He clicks on an error and sees the hosts that are affected. He clicks on “Needs Attention” in one of the errors. A blade opens with much more information.

We can see replication charts for a VM and its disks. These are useful to see if the VM’s rate of change is too much for the bandwidth or the target storage (Standard vs Premium). The disk-level view might help you identify churn-heavy storage, like a page file, that can be excluded from replication.
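The underlying check is whether the churn you replicate fits the bandwidth you have. A minimal sketch comparing per-disk churn (MB/s) against a link (Mbps), with an option to exclude a churn-heavy disk such as one that only holds the page file; all disk names and numbers are hypothetical, and real sizing should come from the ASR planner tools rather than this arithmetic.

```python
from typing import Dict, Iterable


def required_mbps(churn_mb_per_s: Dict[str, float], excluded: Iterable[str] = ()) -> float:
    """Bandwidth (megabits/s) needed to keep up with the churn of the replicated disks."""
    skip = set(excluded)
    total_mb_per_s = sum(rate for disk, rate in churn_mb_per_s.items() if disk not in skip)
    return total_mb_per_s * 8  # megabytes -> megabits


# Hypothetical average churn per disk, in MB/s.
churn = {"os": 0.5, "data": 2.0, "pagefile": 4.0}
link_mbps = 40.0  # hypothetical replication bandwidth

for excluded in ([], ["pagefile"]):
    need = required_mbps(churn, excluded)
    verdict = "fits" if need <= link_mbps else "exceeds the link"
    print(f"excluding {excluded or 'nothing'}: {need:.0f} Mbps needed, {verdict}")
```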

A message digest will be sent out at the end of the day. This data can be fed into OMS.

Some guest speakers come up from Rackspace and CDW. I won’t be blogging this.

Questions

When are things out? News will be on the ASR blog in October.
The Hyper-V Planner is out this week, and new cost planners for Hyper-V and VMware are out this week.
Failback of managed disks is there for VMware and will be out by end of year for Hyper-V.

Upcoming Webinar: 4 Important Azure IaaS features for building your Hybrid Cloud

I’m going to be presenting in a webinar by Altaro on July 18th. Azure is an interesting topic, because lots of IT pros are wondering if and how they’ll use it, and how they should get started. As an IT pro, your first ventures into The Cloud will probably be infrastructure, so I’ll talk about a few topics that will hopefully get you better prepared.

Altaro has a big audience around the world, so the webinar will be run twice:

Time for EU attendees: 2pm CEST
Time for US attendees: 10am PDT / 1pm EDT

Understand Azure’s New VM Naming Standards

This post will explain how you can quickly understand the new naming standards for Azure VM sizes. My role has given me the opportunity to see how people struggle with picking a series or size of a VM in Azure. Faced with so many options, many people freeze, and never get beyond talking about using Azure.

Starting with the F-Series, Microsoft has introduced a structure for naming the sizes of virtual machines. This is welcome, because the naming of the sizes within the A-Series, D-Series, etc., was random at best.

The name of a size in the F-Series, the H-Series, and the soon-to-be-released Av2 series is quite structured. The key is the number in the size of the machine; this designates the number of vCPUs in the machine.

Let’s start with the new Av2 series. The name of a size tells you a lot about that machine spec. For example, the A4v2 (note this is an A4 version 2), paying attention to the “4”:

4 vCPUs
8 GB RAM (4 x 2)
Can support up to 8 data disks (4 x 2)
Can have up to 4 vNICs

Let’s look at an F2 VM, paying attention to the “2”:

2 vCPUs
4 GB RAM (2 x 2)
Can support up to 4 data disks (2 x 2)
Can have up to 2 vNICs

You can see from above that there is a “multiplier”, which was 2 in the above 2 examples. The H-Series is a set of large-RAM VMs for HPC workloads, and 8 GB RAM would be pretty useless for these tasks! So the H-Series multiplies things differently, which you can see with the H8, the smallest machine in this series:

8 vCPUs
56 GB RAM (8 x 7)
Can support up to 16 data disks (8 x 2)
Can have up to 2 vNICs

The RAM multiplier changed, but as you can see, the name still tells us about the processor and disk configuration.

Some sizes of virtual machine are specialized. These specializations are designated by a letter. Here are some of those codes:

S (is for SSD) = The machine can support Premium Storage, as well as Standard Storage
R (is for RDMA) = The machine has an additional InfiniBand NIC (a non-Ethernet interconnect that supports RDMA) for high-bandwidth, low-latency data transfer
M (is for memory) = The machine has a larger multiplier for RAM than is normal for this series.

Let’s look at the A4mv2, noting the 4 (CPUs) and the M code:

4 vCPUs, as expected
Can support up to 8 data disks (4 x 2), as expected
Can have up to 4 vNICs, as expected
But it has 32 GB RAM (4 x 8) instead of 8 GB RAM (4 x 2) – the memory multiplier was increased.

We know that the F2s VM has 2 vCPUs and 4 GB RAM, and can have up to 4 data disks and 2 NICs, but it differs slightly from the F2 VM. The S tells us that we can place the OS and data disks on a mixture of Standard Storage (HDD) and Premium Storage (SSD).

Let’s mix it up a little by returning to the HPC world. The H16mr VM does quite a bit:

It has 16 vCPUs, as expected.
It has a lot of RAM: 224 GB. The M designates that the expected x7 multiplier (which would give 112 GB RAM) was doubled to x14 (16 x 14 = 224).
It can support 32 data disks, as expected (16 x 2).
It can support up to 4 vNICs.
And the VM will have an additional InfiniBand/RDMA NIC for high-bandwidth, low-latency data transfers (the R code).
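The naming rules above are regular enough to decode mechanically. A minimal sketch covering only the Av2, F, and H examples in this post: the number gives the vCPUs, data disks are vCPUs x 2, RAM is vCPUs x the series multiplier (2 for Av2 and F, 7 for H), and the m code raises the multiplier (to 8 for Av2 and 14 for H). The multiplier table and the regex are my own assumptions drawn from these examples, not an official Microsoft rule; vNIC counts don't follow a simple multiplier, so they are left out, and other series are rejected.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (normal RAM multiplier, multiplier with the "m" code), inferred from this post's examples.
RAM_MULTIPLIER: Dict[str, Tuple[int, Optional[int]]] = {"Av2": (2, 8), "F": (2, None), "H": (7, 14)}
SIZE_PATTERN = re.compile(r"^(?P<family>[AFH])(?P<vcpus>\d+)(?P<codes>[mrs]*)(?P<version>v2)?$")


@dataclass
class VmSpec:
    name: str
    vcpus: int
    ram_gb: int
    max_data_disks: int
    premium_storage: bool  # "s" code
    rdma_nic: bool         # "r" code


def decode(name: str) -> VmSpec:
    """Decode an Av2/F/H size name such as 'A4mv2' or 'H16mr' using the multiplier rules."""
    match = SIZE_PATTERN.match(name)
    # Only A sizes carry the v2 suffix in this sketch; anything else is unsupported.
    if not match or (match["family"] == "A") != (match["version"] is not None):
        raise ValueError(f"unsupported size name: {name}")
    series = "Av2" if match["family"] == "A" else match["family"]
    codes = match["codes"]
    vcpus = int(match["vcpus"])
    normal, with_m = RAM_MULTIPLIER[series]
    if "m" in codes and with_m is None:
        raise ValueError(f"no memory-optimized multiplier known for {series}")
    multiplier = with_m if "m" in codes else normal
    return VmSpec(
        name=name,
        vcpus=vcpus,
        ram_gb=vcpus * multiplier,
        max_data_disks=vcpus * 2,
        premium_storage="s" in codes,
        rdma_nic="r" in codes,
    )


for size in ("A4v2", "A4mv2", "F2", "F2s", "H8", "H16mr"):
    print(decode(size))
```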

Technorati Tags: Azure,IaaS,Virtual Machines