I’ve been asked about the resource requirements for the dedupe optimization job before, but until now I didn’t have a good answer.
The CPU side is … less clear. The dedupe subsystem schedules one single-threaded optimization job per volume. That means a machine with 8 logical processors is only 1/8th utilized during optimization if there is a single data volume. Microsoft says:
To achieve optimal throughput, consider configuring multiple deduplication volumes, up to the number of CPU cores on the file server.
That seems pretty dumb to me. “Go ahead and complicate volume management to optimize the dedupe processing”. Uhhhhh, no thanks.
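If you want to see how this plays out on a given server, here’s a quick PowerShell sketch that compares the number of dedup-enabled volumes against the logical processor count. It only uses Get-DedupVolume and an environment variable; the “one single-threaded job per volume” interpretation is from the behaviour described above.

```powershell
# Compare logical processors vs. dedup-enabled volumes to see how much
# of the CPU a round of optimization jobs can actually use.
$cores   = [int]$env:NUMBER_OF_PROCESSORS
$volumes = @(Get-DedupVolume)   # volumes with dedupe enabled

"{0} logical processors, {1} dedup volume(s)" -f $cores, $volumes.Count
if ($volumes.Count -lt $cores) {
    # One single-threaded optimization job per volume, so the rest of the cores sit idle.
    "Optimization can use at most {0} of {1} cores concurrently." -f $volumes.Count, $cores
}
```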
Microsoft tells us that 1-2 GB RAM is used per 1 TB of data per volume. They clarify this with an example:
| Volume | Volume size | Memory used |
|---|---|---|
| Volume 1 | 1 TB | 1-2 GB |
| Volume 2 | 1 TB | 1-2 GB |
| Volume 3 | 2 TB | 2-4 GB |
| Total for all volumes | 4 TB | 4-8 GB RAM |
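To put that rule of thumb in script form, here’s a hedged sketch that estimates the optimization jobs’ RAM appetite from each dedup volume’s capacity, using the 1-2 GB per TB figure above. It assumes the Capacity property on Get-DedupVolume output (in bytes) and uses total capacity rather than the actual data on the volume, so treat it as an upper bound.

```powershell
# Rough RAM estimate: 1-2 GB per TB of data, per dedup volume.
$low = 0; $high = 0
foreach ($v in Get-DedupVolume) {
    $tb    = $v.Capacity / 1TB          # 1TB is PowerShell's built-in byte multiplier
    $low  += [math]::Ceiling($tb * 1)   # low end: 1 GB per TB
    $high += [math]::Ceiling($tb * 2)   # high end: 2 GB per TB
    "{0}: {1:N1} TB -> {2}-{3} GB" -f $v.Volume, $tb, [math]::Ceiling($tb), [math]::Ceiling($tb * 2)
}
"Estimated total for optimization jobs: $low-$high GB RAM"
```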
By default, a server limits the RAM used by the optimization job to 50% of the total RAM in the server. So if the above server had just 4 GB of RAM, only 2 GB would be available to the optimization job. You can manually override this:
Start-DedupJob -Volume <volume> -Type Optimization -Memory <50 to 80>
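For example, to kick off an optimization run on a (hypothetical) D: volume and let it use up to 70% of the server’s RAM:

```powershell
Start-DedupJob -Volume "D:" -Type Optimization -Memory 70
```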
There is an additional note from Microsoft:
Machines where a very large amount of data change between optimization jobs is expected may require up to 3 GB of RAM per 1 TB of disk space.
So RAM might become a bottleneck, or memory pressure might increase (in a VM with Dynamic Memory), if the optimization job hasn’t run in a while or if lots of data is dumped onto a deduped volume. Example: you have deployed lots of new personal (dedicated) VMs for new users on a deduped volume.
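If you suspect that’s happening, a quick check like the following can show how long it’s been since the last optimization pass and roughly how much data is waiting to be optimized. This is a sketch: it assumes the Volume, LastOptimizationTime, InPolicyFilesSize and OptimizedFilesSize properties on Get-DedupStatus output, and treats “in-policy minus optimized” as a rough proxy for the backlog.

```powershell
# How stale is the last optimization pass, and how much in-policy data is not yet optimized?
Get-DedupStatus | ForEach-Object {
    $pendingGB = ($_.InPolicyFilesSize - $_.OptimizedFilesSize) / 1GB
    "{0}: last optimized {1}, roughly {2:N1} GB in-policy but not yet optimized" -f $_.Volume, $_.LastOptimizationTime, $pendingGB
}
```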