
DR Testing

A backup that has never been restored is an assumption, not a guarantee. DR testing automates the process of verifying that your backups are actually restorable — not just that the backup job completed successfully.

Why Regular DR Testing Matters

Backup jobs can report success while producing data that cannot be restored. Common reasons include:

  • Storage corruption at the block level (silent bit rot)
  • Encryption key mismatch (backup written with key version N, key rotated, restore fails)
  • Incomplete incremental chain (a missed incremental makes the chain unrestorable)
  • Agent version incompatibility (backup written by agent v1.x, restore attempted on v2.x)
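The incomplete-chain case is easiest to see in code. The sketch below is a minimal illustration, not the platform's catalog logic. It assumes a hypothetical catalog in which every incremental snapshot records the ID of the snapshot it builds on. Restoring any point means walking back to a full backup, and a single missing link makes every later incremental unrestorable, even if each of those jobs reported success.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    snapshot_id: str
    kind: str                 # "full" or "incremental" (hypothetical field)
    parent_id: Optional[str]  # snapshot this one builds on; None for a full


def restore_chain(catalog: Dict[str, Snapshot], target_id: str) -> List[str]:
    """Return snapshot IDs in restore order (full first), or raise if the chain is broken."""
    chain: List[str] = []
    seen = set()
    current = catalog.get(target_id)
    if current is None:
        raise LookupError(f"snapshot {target_id} is not in the catalog")
    while True:
        if current.snapshot_id in seen:  # guard against a corrupt, cyclic catalog
            raise LookupError(f"cycle detected at {current.snapshot_id}")
        seen.add(current.snapshot_id)
        chain.append(current.snapshot_id)
        if current.kind == "full":
            return list(reversed(chain))  # the full backup is restored first
        parent = catalog.get(current.parent_id) if current.parent_id else None
        if parent is None:
            raise LookupError(
                f"chain broken: {current.snapshot_id} depends on missing {current.parent_id}"
            )
        current = parent


# Example: i2 was never written, so i3 cannot be restored.
catalog = {
    "f1": Snapshot("f1", "full", None),
    "i1": Snapshot("i1", "incremental", "f1"),
    "i3": Snapshot("i3", "incremental", "i2"),
}
print(restore_chain(catalog, "i1"))  # ['f1', 'i1']
```

Calling `restore_chain(catalog, "i3")` raises `LookupError`, which is exactly the kind of failure a job-completion check never surfaces.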

A backup health score of 100% means jobs completed — it does not mean restores will succeed. DR testing closes this gap by actually executing a restore and confirming the data is accessible.

Scheduled DR Test Runs

Configure automated monthly DR tests for any device or group of devices:

  1. Go to Backups console → DR Testing → New Test Schedule
  2. Select the devices to test (individual devices, a company, or all devices)
  3. Set the test frequency (default: monthly, first Sunday of the month)
  4. Choose the test type:
    • File-level restore test — Restores a sample of files to an isolated environment and verifies SHA-256 hashes
    • Full image test — Restores the complete disk image to an isolated Azure VM and boots it
  5. Configure the notification recipients for the DR test report
  6. Save the schedule
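To reason about when the default schedule (first Sunday of the month) will next fire, the date can be computed directly. This is a sketch for planning purposes only. The function names are hypothetical and it does not reflect how the platform's scheduler is implemented, including time zones.

```python
import calendar
import datetime as dt


def first_sunday(year: int, month: int) -> dt.date:
    """First Sunday of the given month (the documented default test day)."""
    first = dt.date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6, matching calendar.SUNDAY
    offset = (calendar.SUNDAY - first.weekday()) % 7
    return first + dt.timedelta(days=offset)


def next_test_date(today: dt.date) -> dt.date:
    """Next first-Sunday run on or after `today`."""
    candidate = first_sunday(today.year, today.month)
    if candidate >= today:
        return candidate
    if today.month == 12:
        return first_sunday(today.year + 1, 1)
    return first_sunday(today.year, today.month + 1)


print(next_test_date(dt.date(2025, 5, 10)))  # 2025-06-01
```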

On the scheduled day, the system:

  1. Selects the most recent full backup snapshot for each device
  2. Initiates a restore to an isolated Azure environment
  3. Verifies the restored data (file hash check or boot confirmation)
  4. Records the result per device
  5. Sends the DR test report to the configured recipients
  6. Tears down the isolated environment
ℹ️ The isolated Azure VM environment used for full image tests is provisioned and torn down per test run. There is no persistent test infrastructure to maintain. VM compute costs during the test window are included in the platform fee.
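The scheduled run above can be sketched as a loop over devices in which teardown always happens. This is an illustrative sketch, not the platform's orchestrator: every hook (`latest_full_snapshot`, `restore_isolated`, `verify`, `teardown`, `send_report`) is a hypothetical stand-in passed in as a function. It also assumes "Error" means the test itself could not complete, while "Fail" means verification ran and did not pass; the document does not define that distinction.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class DrTestResult:
    device: str
    snapshot: str
    result: str                  # "Pass", "Fail" or "Error"
    detail: Optional[str] = None


def run_dr_test(
    device: str,
    latest_full_snapshot: Callable[[str], str],  # step 1 (hypothetical hook)
    restore_isolated: Callable[[str], Any],      # step 2: returns an environment handle
    verify: Callable[[Any], bool],               # step 3: hash check or boot check
    teardown: Callable[[Any], None],             # step 6
) -> DrTestResult:
    snapshot = latest_full_snapshot(device)
    env = None
    try:
        env = restore_isolated(snapshot)
        if verify(env):
            return DrTestResult(device, snapshot, "Pass")
        return DrTestResult(device, snapshot, "Fail", "verification did not pass")
    except Exception as exc:  # assumed: restore or check could not complete
        return DrTestResult(device, snapshot, "Error", str(exc))
    finally:
        if env is not None:
            teardown(env)  # runs on every path where an environment was created


def run_schedule(
    devices: Iterable[str],
    send_report: Callable[[List[DrTestResult]], None],  # step 5
    **hooks: Any,
) -> List[DrTestResult]:
    results = [run_dr_test(d, **hooks) for d in devices]  # step 4: one result per device
    send_report(results)
    return results
```

Putting teardown in `finally` is the design point: an exception during restore or verification must not leave an isolated VM or container running.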

DR Test Report

The DR test report shows results for every device that was tested:

| Column | Description |
|---|---|
| Device | Hostname and device ID |
| Backup Date | The snapshot used for the test |
| Test Type | File-level or full image |
| Result | Pass / Fail / Error |
| Files Verified | Number of files successfully restored and hash-verified |
| Failure Detail | Error message if the test failed |
| Test Duration | Time from restore start to completion |

A passing DR test means the backup was successfully restored and the data integrity check passed. A failing test means there is a real problem with the backup — investigate immediately.
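If you export the report for your own tracking, the columns map naturally onto a record type. The sketch below assumes a hypothetical export with one row per device; it counts results and pulls out every row that did not pass, since each of those is a device that may not be recoverable today.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class ReportRow:
    device: str                    # hostname and device ID
    backup_date: date              # snapshot used for the test
    test_type: str                 # "file-level" or "full image"
    result: str                    # "Pass", "Fail" or "Error"
    files_verified: int            # files restored and hash-verified
    failure_detail: Optional[str]  # error message if the test failed
    duration: timedelta            # restore start to completion


def summarize(rows: List[ReportRow]) -> Dict[str, int]:
    counts = Counter(r.result for r in rows)
    return {k: counts[k] for k in ("Pass", "Fail", "Error")}


def needs_investigation(rows: List[ReportRow]) -> List[ReportRow]:
    """Every row that did not pass, in report order."""
    return [r for r in rows if r.result != "Pass"]
```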

Manual DR Test

Run a DR test on demand for any device:

  1. Go to Backups console → Catalog → select the device
  2. Click Test Restore
  3. Select the snapshot to test
  4. Choose test type (file-level or full image)
  5. Confirm — the test job is queued

Results appear in the DR Testing tab when the job completes.

DR Test Environment

Full image tests restore to an isolated Azure VM:

  • The VM is created in a network-isolated environment with no internet access and no connectivity to your production network
  • The restored OS boots and the platform confirms the boot succeeded by checking for an agent heartbeat
  • The VM is terminated within 30 minutes of the test completing

File-level tests restore to an isolated Azure Blob container:

  • Files are restored from the backup catalog to a temporary container
  • SHA-256 hashes are verified against the catalog entries
  • The container is deleted after verification
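The hash check in a file-level test compares each restored file against the digest recorded in the catalog. The sketch below uses a local directory in place of the Azure Blob container and assumes a hypothetical catalog shaped as a mapping from relative path to expected SHA-256 hex digest. It reports a "files verified" count plus a reason for every file that is missing or altered.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large restores do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restored(restore_root: Path, catalog: Dict[str, str]) -> Tuple[int, List[str]]:
    """Return (files verified, failure reasons) for a restored sample."""
    verified, failures = 0, []
    for rel_path, expected in catalog.items():
        restored = restore_root / rel_path
        if not restored.is_file():
            failures.append(f"{rel_path}: missing after restore")
        elif sha256_file(restored) != expected.lower():
            failures.append(f"{rel_path}: hash mismatch")
        else:
            verified += 1
    return verified, failures
```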

Neither test environment has access to your production systems. There is no risk of the restored environment interfering with live devices.

DR Runbook Template

Use the following template as a starting point for documenting client-specific recovery procedures. Customize it for each client and store it in your PSA documentation.

CLIENT DR RUNBOOK — [Client Name]
Last Tested: [Date]
Next Test: [Date]

CRITICAL SYSTEMS
[ ] Server: [hostname] — Backup policy: [policy name] — RTO: [hours]
[ ] Server: [hostname] — Backup policy: [policy name] — RTO: [hours]

RECOVERY PRIORITY ORDER
1. [System] — Business impact if down: [description]
2. [System] — Business impact if down: [description]

RECOVERY PROCEDURES
--- File-Level Recovery ---
1. Open Backups console → Catalog → [device]
2. Select snapshot from [date/time]
3. Restore to [target path] on [target device]
4. Verify with [application team / end user]

--- Server Recovery (Bare Metal) ---
1. Provision replacement hardware or create an Azure VM
2. Boot from Backups rescue media (ISO from [location])
3. Connect to backup catalog for [device]
4. Restore full image from [snapshot date]
5. Boot and verify with [system owner]

--- SaaS Recovery (M365) ---
1. Open Backups console → SaaS → [connection name]
2. Browse → [user]
3. Restore All from [snapshot date]
4. Verify with [user]

CONTACTS
MSP escalation: [name] [phone]
Client IT contact: [name] [phone]
Vendor support: [vendor] [number]

LAST DR TEST RESULTS
Date: [date]
Devices tested: [N]
Pass: [N] | Fail: [N]
Notes: [any failures or observations]
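If you keep runbooks in your PSA as text, the LAST DR TEST RESULTS block can be regenerated from each test run instead of edited by hand. This is a hypothetical helper, not a platform feature. It assumes results arrive as a mapping of hostname to "Pass", "Fail" or "Error", and because the template has only Pass and Fail counters, it counts Error results as Fail.

```python
from datetime import date
from typing import Dict


def last_results_section(test_date: date, results: Dict[str, str], notes: str = "") -> str:
    """Render the runbook's LAST DR TEST RESULTS block; Error is counted as Fail."""
    passed = sum(1 for r in results.values() if r == "Pass")
    failed = len(results) - passed
    if not notes:
        failing = sorted(dev for dev, r in results.items() if r != "Pass")
        notes = ("Not passed: " + ", ".join(failing)) if failing else "No failures"
    return "\n".join([
        "LAST DR TEST RESULTS",
        f"Date: {test_date.isoformat()}",
        f"Devices tested: {len(results)}",
        f"Pass: {passed} | Fail: {failed}",
        f"Notes: {notes}",
    ])
```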

Failure Response

If a DR test fails:

  1. Do not wait for the next scheduled test — investigate immediately
  2. Check the failure detail in the test report for the specific error
  3. Run a manual restore of the affected file or system to confirm the failure
  4. Review the common causes listed in Troubleshooting
  5. After resolving the root cause, run another manual DR test to confirm the fix
  6. Update the DR test report status

A failed DR test means that if a real disaster occurred today, that device could not be recovered from backup. Treat this as a critical incident.