I find containers are easy to create, and it’s pretty simple to build a library of container images. At least, that’s what I found when I first got to play with containers on a pre-built Windows Server 2016 (WS2016) Technical Preview 3 (TPv3) lab. In the last few days I started working with containers in my own lab and I had some issues; the one thing I had never done was create a host by myself, a VM host to be precise (a Hyper-V VM that will host many containers). In this post I’ll explain how, by default, my containers were not networked and how I fixed it. This post was written for the TPv3 release, and Microsoft might fix things in later releases, but you might find some useful troubleshooting info here.
I guess that most people will deploy Windows Server containers in virtual machines. If you work in the Hyper-V world then you’ll use Hyper-V VMs. At the time of writing, the documented process for creating a VM host is to download and run a script called New-ContainerHost.ps1. You can get that by running:
wget -uri https://aka.ms/newcontainerhost -OutFile New-ContainerHost.ps1
Once the script is downloaded, Microsoft, and every blog that copied and pasted without testing, tells you to run:
.\New-ContainerHost.ps1 -VmName <NewContainerHostVMName> -Password <NewContainerHostVMPassword>
What happens then?
- A bunch of stuff is downloaded in a compressed file, including a 12 GB VHD called WindowsServer_en-us_TP3_Container_VHD.vhd.
- The VHD is mounted and some files are dropped into it, including Install-ContainerHost.ps1
- A new VM is created. The C: drive is a differencing VHD that uses the downloaded VHD as the parent
- The VM is booted.
- When the VM is running, Install-ContainerHost is run, and the environment is created in the VM.
- Part of this is the creation of a virtual switch inside the VM. Here’s where things can go wrong by default.
- The script completes and it’s time to get going.
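I haven’t traced every line of the script, but the differencing disk and VM creation steps could be sketched roughly like this. The paths and memory size are hypothetical, and the real script may well do it differently; New-VHD, New-VM and Start-VM are standard Hyper-V cmdlets.

```powershell
# Hypothetical paths; adjust to wherever the downloaded VHD actually lives.
$parent = "D:\ContainerBase\WindowsServer_en-us_TP3_Container_VHD.vhd"
$child  = "D:\VMs\NewContainerVM\NewContainerVM.vhd"

# Create a differencing disk whose parent is the downloaded VHD.
# Writes go to the child; the parent stays untouched and can be shared.
New-VHD -Path $child -ParentPath $parent -Differencing

# Create a VM that boots from the child disk (.vhd files need Generation 1),
# then start it. The memory size is an assumption.
New-VM -Name "NewContainerVM" -MemoryStartupBytes 2GB -Generation 1 -VHDPath $child
Start-VM -Name "NewContainerVM"
```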
What’s the VM switch inside a VM all about? It’s not just your regular old VM switch. It’s a NATing switch. The idea here is that containers that will run inside of the VM will operate on a private address space. The containers connect to the VM switch which provides the NATing functionality. The VM switch is connected to the vNIC in the VM. The guest OS of the VM is connected to the network via a regular old switch sharing process (a management OS vNIC in the guest OS).
What Goes Wrong?
Let’s assume that you have read some blogs that were published very quickly on the topic of containers and you’ve copied and pasted the setup of a new VM host. I tried that. Let’s see what happened … there were two issues that left me with network-disconnected containers:
Disconnected VM NIC
Almost every example I saw of New-ContainerHost fails to include one necessary step: specifying the name of a virtual switch on the host to connect the VM to. You can do this after the fact, but I prefer to connect the VM straight away. The command below adds the -SwitchName flag to specify which virtual switch on the host to connect the VM to. I’ve also added the -SkipDocker flag to skip the installation of Docker.
.\New-ContainerHost.ps1 -VmName <newVMName> -Password <NewVMPassword> -SkipDocker -SwitchName <PhysicalHostSwitch>
This issue is easy enough to diagnose – your VM’s guest OS can’t get a DHCP address so you connect the VM’s vNIC to the host’s virtual switch.
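If you have already built the VM without a switch, connecting it after the fact might look something like this on the Hyper-V host. The VM and switch names are just the example names I use later in this post; substitute your own.

```powershell
# Run on the Hyper-V host, in an elevated PowerShell session.
# List the host's virtual switches to find the one to use.
Get-VMSwitch | Select-Object Name, SwitchType

# Connect the container host VM's vNIC to that switch.
# "NewContainerVM" and "SetSwitch" are example names.
Connect-VMNetworkAdapter -VMName "NewContainerVM" -SwitchName "SetSwitch"

# Confirm that the vNIC is now attached to the switch.
Get-VMNetworkAdapter -VMName "NewContainerVM" | Select-Object VMName, SwitchName, Status
```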
The second issue is the sticky one because it deals with new stuff: the New-NetNat cmdlet. New-NetNat will create:
… a Network Address Translation (NAT) object that translates an internal network address to an external network address. NAT modifies IP address and port information in packet headers.
Fab! Except it kept failing in my lab with this error:
New-NetNat : No Matching interface was found for prefix (null).
This wrecked my head. I was about to give up on Containers when it hit me. I’d already tried building my own VM, and I had downloaded and run a script called Install-ContainerHost in a VM to enable Containers. I logged into my VM and there I found Install-ContainerHost on the root of C:. I copied it from the VM (running Server Core) to another machine with a UI and I edited it using ISE. I searched for 172.16.0.0 and found a bunch of stuff for parameters. A variable called $NATSubnetPrefix was set to “172.16.0.0/12”.
There was the issue. My lab’s network address is 172.16.0.0/16, which sits inside 172.16.0.0/12 (172.16.0.0 to 172.31.255.255); this wasn’t going to work. I needed a different range to use behind the NATing virtual switch in the container VM host. I edited the variable to define a network address for NATing of “192.168.250.0/24”.
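If you want to check your own LAN against the default NAT prefix before you build a host, a small helper along these lines would do it. This is just a sketch of mine, not something from Microsoft’s script; Get-PrefixRange and Test-PrefixOverlap are made-up names, and it only handles IPv4.

```powershell
# Return the first and last address of an IPv4 prefix as 64-bit integers.
function Get-PrefixRange {
    param([string]$Prefix)
    $address, $length = $Prefix.Split('/')
    # Fold the four octets into one number, e.g. 172.16.0.0 -> 2886729728.
    $base = [long]0
    foreach ($octet in $address.Split('.')) {
        $base = $base * 256 + [long]$octet
    }
    # Number of addresses in the prefix, e.g. /24 -> 256.
    $size  = [long][math]::Pow(2, 32 - [int]$length)
    # Clear the host bits so the start is the true network address.
    $start = $base - ($base % $size)
    [pscustomobject]@{ Start = $start; End = $start + $size - 1 }
}

# Two prefixes overlap when each one starts before the other one ends.
function Test-PrefixOverlap {
    param([string]$First, [string]$Second)
    $a = Get-PrefixRange $First
    $b = Get-PrefixRange $Second
    ($a.Start -le $b.End) -and ($b.Start -le $a.End)
}

Test-PrefixOverlap "172.16.0.0/12" "172.16.0.0/16"      # True: the default clashes with my lab
Test-PrefixOverlap "192.168.250.0/24" "172.16.0.0/16"   # False: the new range is safe
```

The ranges are held as 64-bit integers so that the arithmetic never goes negative for addresses above 127.255.255.255.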
I removed the VM switch and then re-ran Install-ContainerHost in the VM. The script ran perfectly. Let’s say the VM had an address of 172.16.250.40. I logged in and created a new container (the container OS image is on C:). I used Enter-PSSession to log into the container and I saw the container had an IP of 192.168.250.2. This was NATed via the virtual switch in the VM, which in turn is connected to the top-of-rack switch via the physical host’s virtual switch.
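For reference, the clean-up and re-run inside the VM could look roughly like this. The switch name is an assumption on my part, so use whatever Get-VMSwitch reports, and your copy of the script might need parameters that I don’t show here.

```powershell
# Run inside the container host VM, in an elevated PowerShell session.
# Find the virtual switch that the failed run left behind.
Get-VMSwitch | Select-Object Name, SwitchType

# Remove it. "Virtual Switch" is an assumed name; use the name reported above.
Remove-VMSwitch -Name "Virtual Switch" -Force

# Re-run the edited copy of the script (mine was on the root of C:).
C:\Install-ContainerHost.ps1
```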
Sorted. At least, that was the fix for a broken new container VM host. How do I solve this long term?
I can tell you that mounting the downloaded WindowsServer_en-us_TP3_Container_VHD.vhd and editing Install-ContainerHost there won’t work. Microsoft appears to download it every time into the differencing disk.
The solution is to get a copy of Install-ContainerHost.ps1 (from the VHD) and save it onto your host or an accessible network location. Then you run New-ContainerHost with the -ScriptPath flag to specify your own copy of Install-ContainerHost. Here’s an example where I saved my edited copy of Install-ContainerHost (with the new NAT network address) on CSV1:
.\New-ContainerHost.ps1 -VmName NewContainerVM -Password P@ssw0rd -SkipDocker -SwitchName SetSwitch -ScriptPath "C:\ClusterStorage\CSV1\Install-ContainerHost.ps1"
That runs perfectly, and no hacks are required to get containers to talk on the network. I then successfully deployed IIS in a container, enabled a NATing rule, and verified that the new site was accessible on the LAN.
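The NATing rule for a web server in a container might be sketched like this, run inside the container VM host. I’m assuming the host has a single NAT object, that the container kept the 192.168.250.2 address from earlier, and that you want plain HTTP on port 80; adjust to suit.

```powershell
# Run inside the container host VM, in an elevated PowerShell session.
# Look up the NAT object rather than guessing its name (assumes only one exists).
$nat = Get-NetNat | Select-Object -First 1

# Forward TCP 80 on the VM host's external address to port 80 in the container.
Add-NetNatStaticMapping -NatName $nat.Name -Protocol TCP `
    -ExternalIPAddress 0.0.0.0 -ExternalPort 80 `
    -InternalIPAddress 192.168.250.2 -InternalPort 80

# Allow the inbound traffic through the VM host's firewall.
New-NetFirewallRule -DisplayName "HTTP on TCP/80" -Direction Inbound `
    -Protocol TCP -LocalPort 80 -Action Allow
```

With that in place, browsing to the VM host’s LAN address (172.16.250.40 in my example) should reach the site in the container.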