Monitoring & Alerting for Windows Defender in Azure VMs

In this post, I will explain how one can monitor Windows Defender and create incidents for it with Azure VMs.

Background

Windows Defender is built into Windows Server 2016 and Windows Server 2019. It’s free and pretty decent. But it surprises me how many of my customers (all) choose Defender over third-parties for their Azure VMs … with no coaching/encouragement from me or my colleagues. There is an integration with the control plane using the antimalwareagent extension. But the level of management is poor-none. There is a Log Analytics solution, but solutions are deprecated and, last time I checked, it required the workspace to be in per-node pricing mode. So I needed something different to operationalise Windows Defender with Azure VMs.

Data

At work, we always deploy the Log Analytics extension with all VMs – along with the antimalware extension and a bunch of others. We also enable data collection in Azure Security Center. We use a single Log Analytics workspace to enable the correlation of data and easy reporting/management.

I recently found out that a table in Log Analytics called ProtectionStatus contains a “heartbeat” record for Windows Defender. Approximately every hour, a record is stored in this table for every VM running Windows Defender. In there, you’ll find some columns such as:

  • DeviceName: The computer name
  • ThreatStatusRank: A code indicating the health of the device according to defender:
    • 150: Health
    • 470: Unknown (no extension/Defender)
    • 350: Quarantined malware
    • 550: Active malware
  • ThreatStatus: A description for the above code
  • ThreatStatusDetails: A longer description
  • And more …

So you can see that you can search this table for malware infection records. First thing, though, is to filter out the machines/records reporting that there is no Defender (Linux machines, for example):

let all_windows_vms =
Heartbeat
| where TimeGenerated > now(-7d)
| where OSType == 'Windows'
| summarize makeset(Resource);
ProtectionStatus
| where Resource in (all_windows_vms)
| sort by TimeGenerated desc

The above will find all active Windows VMs that have been reporting to Log Analytics via the extension heartbeat. Then we’ll store that data in a set, and search that set. Now we can extend that search, for example finding all machines with a non-healthy state (150):

let all_windows_vms = Heartbeat
| where TimeGenerated > now(-7d)
| where OSType == 'Windows'
| summarize makeset(Resource);
ProtectionStatus
| where Resource in (all_windows_vms)
| where ThreatStatusRank <> 150
| sort by TimeGenerated desc

Testing

All the tech content here will be useless without data. So you’ll need some data! Search for the Eicar test string/file and start “infecting” machines – be sure to let people know if there are people monitoring the environment first.

Security Center

Security Center will record incidents for you:

You will get email alerts if you have configured notifications in the subscription’s Security Center settings. Make sure the threshold is set to LOW.

If you want an alternative form of alert then you can use a Log Analytics alert (Scheduled Query Alert resource type) based on the below basic query:

SecurityAlert 
| where TimeGenerated > now(-5m)
| where VendorName == 'Microsoft Antimalware'

The above query will search for Windows Defender alerts stored in Log Analytics (by Security Center) in the last 5 minutes. If the threshold is freater than 0 then you can trigger an Azure Monitor Action Group to tell whomever or start whatever task you want.

Workbooks

Armed with the ability to query the ProtectionStatus table, you can create your own visualisations for easy reporting on Windows Defender across many machines.

 

The pie chart is made using this query:

let all_windows_vms =
Heartbeat
| where TimeGenerated > now(-7d)
| where OSType == 'Windows'
| summarize makeset(Resource);
ProtectionStatus
| where TimeGenerated > now(-7d)
| where Resource in (all_windows_vms)
| where ThreatStatusRank <> '150'
| summarize count(Threat) by Threat

With some reading and practice, you can make a really fancy workbook.

Azure Sentinel

I have enabled the Entity Behavior preview.

Azure Sentinel is supposed to be the central place to monitor all security events, hunt for issues, and where to start investigations – that latter thanks to the new Entity Behavior feature. Azure Sentinel is powered by Log Analytics – if you have data in there then you can query that data, correlate it, and do some clever things.

We have a query that can search for malware incidents reported by Windows Defender. What we will do is create a new Analytic Rule that will run every 5 minutes using 5 minutes of data. If the results exceed 0 (threshold greater than 0) then we will create an incident.

let all_windows_vms =
Heartbeat
| where TimeGenerated > now(-7d)
| where OSType == 'Windows'
| summarize makeset(Resource);
ProtectionStatus
| where TimeGenerated > now(-5m)
| where Resource in (all_windows_vms)
| where ThreatStatus <> 'No threats detected' or ThreatStatusRank <> '150' or Threat <> ''
| sort by Resource asc
| extend HostCustomEntity = Computer

The last line is used to identity an entity. Optionally, we can associate a logic app for an automated response. Once that first malware detection is found:

You can do the usual operational stuff with these incidents. Note that this data is recorded and your effectiveness as a security organisation is visible in the Security Efficiency Workbook in Azure Sentinel – even the watchers are watched! If you open an incident you can click investigate which opens a new Investigation screen that leverages the Entity Behavior data. In my case, the computer is the entity.

The break-out dialogs allow me to query Log Analytics to learn more about the machine and its state at the time and the state of Windows Defender. For example, I can see who was logged into the machine at that time and what processes were running. Pretty nice, eh?

 

Enabling NSG Traffic Analytics Fails

This post will deal with a scenario where you get this error when attempting to enable NSG Traffic Analytics with a Log Analytics Workspace:

Failed to save flow log settings
Failed to update flow logs settings for ‘NSG-NAME’. Error: An error occurred..

NSG Traffic Analytics

I work mostly in Azure networking these days. My customers are typically larger enterprises that are focused on governance and security. When you build Azure network architecture for these kinds of organisations, the networks have many pieces to make micro-segmented security a reality. And that means you need to be able to troubleshoot NSG rules and routing. I find the troubleshooting tools in Network Watcher to be useless. Instead, I use:

  • My own understanding to make up a mental map of the effective routes for the subnet – because this is missing in Azure unless you have an allocated VM NIC in that subnet (often the case)
  • Azure Firewall’s logs
  • NSG Traffic Analytics logs in a Log Analytics Workspace

In my architecture, there is a single, central Log Analytics Workspace that is in a different subscription to the virtual networks/NSGs. And this is where the problem is rooted.

Symptoms

When you attempt to enable Traffic Analytics you get the above error. Interestingly, if you only attempt to enable NSG Flow Logs (data logged to storage account) there is no problem. So the issue is related to getting the Workspace configured as a part of the solution (NSG Traffic Analytics).

The Problem & Fix

The problem is that the Microsoft.Network resource provider must be enabled in the subscription that the Workspace is located in. In my case, as I said, I have a dedicated management subscription so there are no network resources to require/enable that resource provider automatically.

If you go to Subscriptions > Resource Providers in the Azure Portal, you can enable the provider there. Wait (no more than 15 minutes) and things should be OK then.

Thanks to Dalan in Azure Networking for helping fix this one!