On-Call Reporting

On-Call provides operational metrics that help service managers understand incident volume, response performance, escalation patterns, and on-call burden distribution.

Dashboard Metrics

The Dashboard (/) shows live metrics:

| Metric | Description |
|---|---|
| Active Incidents | Count of incidents currently in the triggered or acknowledged state |
| Triggered | Incidents not yet acknowledged and still escalating |
| Acknowledged | Incidents being actively worked |
| MTTA | Mean Time To Acknowledge: average minutes from incident creation to first acknowledgment (rolling window) |
| MTTR | Mean Time To Resolution: average minutes from incident creation to resolution (rolling window) |
| On-Call Now | Current on-call users across all active schedules, with shift end times |
| Recent Incidents | Last 10 incidents with status, severity, and service |

The dashboard updates automatically; no page reload is needed.
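To make the MTTA and MTTR definitions concrete, here is a minimal sketch of the arithmetic in Python. The records and the field names `created_at`, `acknowledged_at`, and `resolved_at` are assumptions made for illustration; check an actual export for the real field names. Incidents that have not yet been acknowledged or resolved are skipped rather than counted as zero.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records. The timestamp field names are assumptions,
# not confirmed fields of the On-Call export.
incidents = [
    {"created_at": "2024-05-01T10:00:00",
     "acknowledged_at": "2024-05-01T10:04:00",
     "resolved_at": "2024-05-01T10:30:00"},
    {"created_at": "2024-05-01T12:00:00",
     "acknowledged_at": "2024-05-01T12:10:00",
     "resolved_at": "2024-05-01T13:00:00"},
    # Still triggered: no acknowledgment or resolution yet.
    {"created_at": "2024-05-02T09:00:00",
     "acknowledged_at": None,
     "resolved_at": None},
]


def mean_minutes(records, start_field, end_field):
    """Average minutes between two timestamps, skipping records missing either one."""
    durations = []
    for record in records:
        if record.get(start_field) and record.get(end_field):
            start = datetime.fromisoformat(record[start_field])
            end = datetime.fromisoformat(record[end_field])
            durations.append((end - start).total_seconds() / 60)
    return mean(durations) if durations else None


mtta = mean_minutes(incidents, "created_at", "acknowledged_at")  # (4 + 10) / 2
mttr = mean_minutes(incidents, "created_at", "resolved_at")      # (30 + 60) / 2
```

With the sample data, `mtta` is 7.0 minutes and `mttr` is 45.0 minutes.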

Reports Page

The Reports page (/reports) shows 30-day aggregate analytics:

Summary Metrics

| Metric | Description |
|---|---|
| Total Incidents | All incidents created in the last 30 days |
| Resolution Rate | Percentage of incidents that reached resolved status |
| MTTA | 30-day mean time to acknowledge (minutes) |
| MTTR | 30-day mean time to resolution (minutes) |

Incidents by Severity

Bar chart showing incident count per severity level over the 30-day window:

  • Critical (red)
  • High (orange)
  • Medium (yellow)
  • Low (blue)
  • Info (gray)

Use this to understand your alert mix and whether your escalation policies are calibrated appropriately for the volume at each severity level.

Incidents by Service

Summary table showing incident count per service. It identifies which services generate the most pages, which is useful for routing changes, alert threshold tuning, or staffing adjustments.

Reports API

For custom reporting and data export, use the Reports API:

Summary Report

GET /api/reports?report=summary&days=30

Response includes:

{
  "total_incidents": 47,
  "resolved_count": 42,
  "resolution_rate": 89.36,
  "mtta_minutes": 8.3,
  "mttr_minutes": 34.7,
  "by_severity": {
    "critical": 5,
    "high": 18,
    "medium": 20,
    "low": 4
  },
  "by_service": {
    "Tier 1 Alerts": 28,
    "Defend Critical": 12,
    "Backup Failures": 7
  }
}
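As a rough sketch, the summary report could be requested and sanity-checked from Python as shown here. The base URL is a placeholder, and authentication is not covered because this page does not describe it; add whatever session or token handling your deployment requires. The helper names `report_url`, `fetch_summary`, and `expected_resolution_rate` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def report_url(base_url, report, days=30):
    """Build a Reports API URL such as .../api/reports?report=summary&days=30."""
    query = urlencode({"report": report, "days": days})
    return f"{base_url.rstrip('/')}/api/reports?{query}"


def fetch_summary(base_url, days=30):
    """Fetch and decode the summary report (assumes no extra auth is needed)."""
    with urlopen(report_url(base_url, "summary", days)) as resp:
        return json.load(resp)


def expected_resolution_rate(summary):
    """Recompute the resolution rate from the raw counts in a summary response."""
    total = summary["total_incidents"]
    if total == 0:
        return 0.0
    return round(100 * summary["resolved_count"] / total, 2)
```

For the sample response above, `expected_resolution_rate` gives 89.36 (42 of 47 incidents), which matches the reported `resolution_rate`.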

Timeline Report

GET /api/reports?report=timeline&days=30

Returns daily incident counts for the requested window — useful for building trend charts in external dashboards.

Incident List (Default)

GET /api/reports

Returns the full incident list for the last 30 days with all fields, suitable for export to CSV or BI tools.
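For the CSV export mentioned above, a minimal Python sketch might look like the following. It assumes the endpoint returns a JSON array of flat incident objects, which this page does not confirm; nested fields would need flattening first. The function name `incidents_to_csv` and the sample records are illustrative.

```python
import csv
import io


def incidents_to_csv(incidents):
    """Serialize a list of incident dicts to CSV text.

    Columns are the union of all keys, in first-seen order, so records
    with missing fields still line up under the right headers.
    """
    fieldnames = []
    for incident in incidents:
        for key in incident:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(incidents)
    return buffer.getvalue()


# Hypothetical records standing in for a decoded GET /api/reports response.
sample = [
    {"id": 1, "severity": "high"},
    {"id": 2, "service": "Backup Failures"},
]
csv_text = incidents_to_csv(sample)
```

The resulting text can be written to a file or handed to a BI tool's CSV importer.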

Escalation Rate

To calculate your escalation rate (the share of incidents that escalated past the first escalation step):

  1. Export the incident list via GET /api/reports
  2. Filter for incidents where current_escalation_step > 0 at resolution time
  3. Divide that count by the total number of incidents

A high escalation rate (above 20%) suggests that your on-call technicians are not responding within your defined delay windows. Consider shortening delays, adding backup notification methods, or reviewing coverage.
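The calculation can be sketched in Python as follows. It assumes each exported incident is a dict carrying the `current_escalation_step` field named in the steps, and that an incident without the field never escalated. The sample records are invented.

```python
def escalation_rate(incidents):
    """Percentage of incidents whose current_escalation_step is greater than 0."""
    if not incidents:
        return 0.0
    escalated = sum(
        1 for incident in incidents
        if incident.get("current_escalation_step", 0) > 0
    )
    return 100 * escalated / len(incidents)


# Hypothetical export: two of five incidents escalated past the first step.
sample = [
    {"id": 1, "current_escalation_step": 0},
    {"id": 2, "current_escalation_step": 0},
    {"id": 3, "current_escalation_step": 1},
    {"id": 4, "current_escalation_step": 2},
    {"id": 5, "current_escalation_step": 0},
]
rate = escalation_rate(sample)
```

Here `rate` is 40.0, well above the 20% guideline.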

On-Call Burden Analysis

To understand how on-call load is distributed across your team:

  1. Export the incident list for a reporting period
  2. Group by acknowledged_by to see incident counts per technician
  3. Compare to on-call hours covered by each technician (from the Schedules API)

Uneven burden distribution (for example, one technician handling 70% of incidents) is a signal to rebalance the rotation or add team members.
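A minimal Python sketch of this analysis is shown here. It groups incidents by the `acknowledged_by` field from the steps and, as an optional second view, divides by on-call hours. The `oncall_hours` mapping is a stand-in for data you would assemble yourself from the Schedules API, whose response format is not described on this page.

```python
from collections import Counter


def burden_share(incidents):
    """Percentage of acknowledged incidents handled by each technician."""
    counts = Counter(
        incident["acknowledged_by"]
        for incident in incidents
        if incident.get("acknowledged_by")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tech: round(100 * n / total, 1) for tech, n in counts.most_common()}


def incidents_per_oncall_hour(incidents, oncall_hours):
    """Incidents acknowledged per hour of on-call coverage, per technician."""
    counts = Counter(
        incident["acknowledged_by"]
        for incident in incidents
        if incident.get("acknowledged_by")
    )
    return {
        tech: counts.get(tech, 0) / hours
        for tech, hours in oncall_hours.items()
        if hours > 0
    }


# Hypothetical data: 10 acknowledged incidents and one unacknowledged.
sample = (
    [{"acknowledged_by": "alice"}] * 7
    + [{"acknowledged_by": "bob"}] * 2
    + [{"acknowledged_by": "carol"}]
    + [{"acknowledged_by": None}]
)
shares = burden_share(sample)
per_hour = incidents_per_oncall_hour(sample, {"alice": 70, "bob": 40, "carol": 50})
```

Normalizing by on-call hours helps separate a technician who is overloaded from one who simply covers more shifts.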

Bridge API for External Reporting

For integration with external analytics tools (Power BI, Grafana, Tableau):

GET /api/bridge/incidents
X-Integration-Key: <key>
X-Tenant-Id: <tenant_id>

Returns incidents in a normalized format suitable for BI ingestion.
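As an illustration, the request above could be built in Python with the standard library as follows. The base URL, key, and tenant values are placeholders, and the sketch assumes the response body is JSON, which this page implies but does not state.

```python
import json
from urllib.request import Request, urlopen


def bridge_request(base_url, integration_key, tenant_id):
    """Build the GET request for the Bridge API with its two required headers."""
    return Request(
        f"{base_url.rstrip('/')}/api/bridge/incidents",
        headers={
            "X-Integration-Key": integration_key,
            "X-Tenant-Id": tenant_id,
        },
        method="GET",
    )


def fetch_bridge_incidents(base_url, integration_key, tenant_id):
    """Send the request and decode the body (assumed to be JSON)."""
    request = bridge_request(base_url, integration_key, tenant_id)
    with urlopen(request) as resp:
        return json.load(resp)


# Placeholder values; substitute your own host, key, and tenant ID.
request = bridge_request("https://oncall.example.com", "<key>", "<tenant_id>")
```

Keeping request construction separate from the network call makes it easy to inspect the URL and headers before pointing a scheduled BI refresh at the endpoint.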

Note: Advanced reporting dashboards (on-call burden per technician, escalation rate over time, alert source volume trends) are on the roadmap for a future release. The Bridge API provides raw data for building these views externally today.

Interpreting MTTA and MTTR

MTTA (Mean Time To Acknowledge)

MTTA measures how quickly your on-call team responds to pages. Industry benchmarks for MSPs:

| MTTA | Assessment |
|---|---|
| < 5 minutes | Excellent |
| 5–15 minutes | Good |
| 15–30 minutes | Needs improvement |
| > 30 minutes | Critical gap |

High MTTA often indicates: technicians not receiving notifications (notification config issue), escalation delays set too long, or understaffed coverage windows.

MTTR (Mean Time To Resolution)

MTTR measures how long it takes to fully resolve incidents after they are created. This includes both response time and fix time.

| MTTR | Assessment |
|---|---|
| < 30 minutes | Excellent for most alert types |
| 30–60 minutes | Good |
| 1–4 hours | Acceptable for complex issues |
| > 4 hours | Review resolution workflows |

Very low MTTR may indicate issues are being resolved without full investigation (premature resolution). Very high MTTR may indicate incidents are being acknowledged but not actively worked.
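For automated reporting, the MTTA and MTTR bands above can be encoded as simple threshold checks. This is a sketch, not part of the product: the function names are invented, and because the bands share their boundary values (15 minutes appears in two MTTA rows, for instance), the sketch assumes a value exactly on a boundary falls into the better band.

```python
def assess_mtta(minutes):
    """Map an MTTA value in minutes to the assessment bands listed above."""
    if minutes < 5:
        return "Excellent"
    if minutes <= 15:
        return "Good"
    if minutes <= 30:
        return "Needs improvement"
    return "Critical gap"


def assess_mttr(minutes):
    """Map an MTTR value in minutes to the assessment bands listed above."""
    if minutes < 30:
        return "Excellent for most alert types"
    if minutes <= 60:
        return "Good"
    if minutes <= 240:  # 4 hours
        return "Acceptable for complex issues"
    return "Review resolution workflows"


# Values taken from the sample summary response on this page.
mtta_band = assess_mtta(8.3)
mttr_band = assess_mttr(34.7)
```

Both sample values land in the "Good" band. Treat the output as a prompt for review rather than a verdict, since a very low MTTR can also point to premature resolution.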