{"id":24725,"date":"2026-03-09T16:59:57","date_gmt":"2026-03-09T16:59:57","guid":{"rendered":"https:\/\/aidanfinn.com\/?p=24725"},"modified":"2026-03-09T17:00:02","modified_gmt":"2026-03-09T17:00:02","slug":"its-not-always-azure","status":"publish","type":"post","link":"https:\/\/aidanfinn.com\/?p=24725","title":{"rendered":"It&#8217;s Not Always Azure"},"content":{"rendered":"\n<p>It&#8217;s easy to blame Azure when something goes wrong. But sometimes, Azure isn&#8217;t at fault. Sometimes, the problem is old-school. The trick in solving the problem is knowing how to diagnose and fix it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Background<\/h2>\n\n\n\n<p>I helped an Irish Microsoft partner with some Azure VM-based work about a month ago. The partner needed some Azure experience and extra capacity. It was a small job &#8211; I&#8217;m happy doing everything from an hour for a small-medium business partner to a full-blown Cloud Adoption Framework for a large enterprise (both are on the <a href=\"https:\/\/www.cloudmechanix.com\" target=\"_blank\" rel=\"noopener\" title=\"\">Cloud Mechanix<\/a> books). <\/p>\n\n\n\n<p>The partner pinged me last Friday to say that he couldn&#8217;t log into the new VM anymore. I had some free time on Friday afternoon, so I had a quick look.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Diagnostics Process in Azure<\/h2>\n\n\n\n<p>I verified the problem:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The partner could not RDP directly.<\/li>\n\n\n\n<li>The partner could not RDP via Bastion.<\/li>\n<\/ul>\n\n\n\n<p>An Azure deployment for a smaller business is a different beast. You do not get the privilege of firewalls, Flow Logs, etc. Those resources provide logs that allow me to trace packets from A to B inside the Azure network. I had to visualise and test. You also find the use of Public IP addresses with NSG inbound rules controlling RDP. 
I have suggested the switch to Bastion, which the partner is considering.<\/p>\n\n\n\n<p>My first port of call was to double-check NSGs. The NIC has an NSG. I made sure that the subnet did not have an NSG as well &#8211; I&#8217;ve seen people create a rule in a NIC NSG and not in a subnet NSG. The subnet NSG is processed first for inbound traffic, so it could deny traffic that the NIC NSG allows. This was not the case here &#8211; no subnet NSG.<\/p>\n\n\n\n<p>The inbound rules on the NIC NSG allowed RDP from the partner and the customer. I started with a Connection Troubleshoot using the IP address for the developer SKU of Bastion (168.63.129.16). That appeared OK.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" height=\"546\" src=\"https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image.png\" alt=\"\" class=\"wp-image-24732\" srcset=\"https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image.png 450w, https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image-247x300.png 247w\" sizes=\"auto, (max-width: 450px) 85vw, 450px\" \/><\/a><\/figure>\n\n\n\n<p>I then double-checked with NSG Diagnostics &#8211; Bastion is a supported source. That failed &#8211; looking back on it, this should have triggered a different resolution path.<\/p>\n\n\n\n<p>I got the partner to run a password reset in the guest OS using Help > Reset Password. Note that this process also does some RDP reset work inside the guest OS. The process succeeded but did not fix the issue.<\/p>\n\n\n\n<p>I&#8217;ve seen RDP issues with VMs where the problem is within the platform. Azure provides us with a poorly named feature called Redeploy. The name implies that in a deployment\/developer-centric environment, a new VM will be deployed. 
In fact, the action re-hosts the VM, doing something similar to a <em>quick migration<\/em> from the Hyper-V world:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Shut down the VM<\/li>\n\n\n\n<li>Move the VM to another host<\/li>\n\n\n\n<li>Reinitiate Azure management of the VM &#8211; this is the key piece<\/li>\n\n\n\n<li>Restart the VM<\/li>\n<\/ul>\n\n\n\n<p>Downtime is required. I&#8217;ve used this feature a handful of times over the years to solve similar issues: everything seems fine networking-wise with the VM but you cannot log in. Running the action resets Azure&#8217;s RDP connection to the VM. The partner ran this action over the weekend but the issue was not fixed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Diagnostics Process in The VM<\/h2>\n\n\n\n<p>Monday came along and the partner updated me with the bad news. Now I suspected something was wrong inside the guest OS. How was I going to fix the guest OS if I couldn&#8217;t log in?<\/p>\n\n\n\n<p>There are two <em>secure<\/em> back doors into a guest OS in Azure. If you need an interactive prompt then you have <a href=\"https:\/\/petri.com\/serial-console-access-azure-virtual-machines\/\" target=\"_blank\" rel=\"noopener\" title=\"\">serial console access<\/a>.<\/p>\n\n\n\n<p>I wanted to run a couple of PowerShell commands, one at a time. 
So I opted for Run Command, which allows you to run scripts or single commands in the guest OS via a VM extension (a secure channel, based on your Azure rights).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"752\" src=\"https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image-1-1024x752.png\" alt=\"\" class=\"wp-image-24739\" srcset=\"https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image-1-1024x752.png 1024w, https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image-1-300x220.png 300w, https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image-1-768x564.png 768w, https:\/\/aidanfinn.com\/wp-content\/uploads\/2026\/03\/image-1.png 1056w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/figure>\n\n\n\n<p>The first command I ran was ResetRDPCert. The partner mentioned something about RDP certs and I was worried that some PKI damage was done. That command didn&#8217;t fix the issue.<\/p>\n\n\n\n<p>RDP was working. No NSG rules were blocking the traffic. Networking was fine. But I could not RDP into the VM. The connections were IP-based and I was using a local administrator account so DNS (&#8220;it&#8217;s always &#8230;&#8221;) was not the culprit (this time!). There was no custom routing or firewall (small business scenario) so they were not the cause. I knew it was the guest OS, so that left &#8230;<\/p>\n\n\n\n<p>Next I used Run Command to <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/security\/operating-system-security\/network-security\/windows-firewall\/configure-with-command-line?tabs=powershell#disable-windows-firewall\" target=\"_blank\" rel=\"noopener\" title=\"\">disable the Windows Firewall with a single PowerShell command<\/a>. 
I ran the command, waited for the success result, and tried to log in &#8230; and it worked!<\/p>\n\n\n\n<p>I informed the partner who was delighted.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Later That Day &#8230;<\/h2>\n\n\n\n<p>The partner messaged me to let me know that he could not log in. I knew Windows Firewall was at fault, so I reckoned that the firewall was back online. There is a Windows domain, so a GPO might have re-enabled the firewall; that&#8217;s a good thing, not a bad thing. The long-term fix was to accept that a guest OS firewall should be on and add rules to allow the UDP &amp; TCP 3389 traffic.<\/p>\n\n\n\n<p>I added 2 custom rules with pretty obvious names in Windows Firewall. I wanted to be sure that the firewall would not break things after a GPO refresh so I ran gpupdate \/force a few times (veteran domain admins know that run 1 is based on cache, 2 runs the latest version from a DC, and 3 deals with edge cases where 2 downloads but doesn&#8217;t deploy).  I checked the firewall &#8230; and it was <em>still not running!?!?!<\/em> Group Policy was <em>not<\/em> managing the firewall. <\/p>\n\n\n\n<p>What the heck was updating the firewall? What has changed in the last few weeks?<\/p>\n\n\n\n<p>Windows admins are used to another thing (other than DNS) breaking our networks: security software. I quickly checked the system tray and saw a product name that screamed security. I messaged the partner on Teams and got a quick response &#8220;yes, it&#8217;s a security product and it recently got an update&#8221;. A quick check online and I found that this product does activate Windows Firewall. Ah &#8211; finally we found the root cause, not just the effect.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Lesson<\/h2>\n\n\n\n<p>Azure gives us tools. Copilot can be super cool at debugging confusing errors. But what do you do when 1 + 1 = 4096? 
There is nothing like a techie who has learned how the fundamentals work, including the <strong>old <\/strong>fundamentals, has been burned in the past, and has learned how to troubleshoot, even when the assumed basics (monitoring and guest OS access) are not there. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s easy to blame Azure when something goes wrong. But sometimes, Azure isn&#8217;t at fault. Sometimes, the problem is old-school. The trick in solving the problem is knowing how to diagnose and fix it. Background I helped an Irish Microsoft partner with some Azure VM-based work about a month ago. The partner needed some Azure &hellip; <a href=\"https:\/\/aidanfinn.com\/?p=24725\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;It&#8217;s Not Always Azure&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":18757,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[5],"tags":[425,352,359,197],"class_list":["post-24725","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure","tag-network-watcher","tag-rdp","tag-virtual-machine","tag-windows-server"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/aidanfinn.com\/wp-content\/uploads\/2015\/08\/windows-server-blue-a517bed8722d2e781.jpg","amp_enabled":true,"_links":{"self":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/24725","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embed
dable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24725"}],"version-history":[{"count":21,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/24725\/revisions"}],"predecessor-version":[{"id":24748,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/24725\/revisions\/24748"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/media\/18757"}],"wp:attachment":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24725"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24725"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24725"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}