On-Call Reporting
On-Call provides operational metrics that help service managers understand incident volume, response performance, escalation patterns, and on-call burden distribution.
Dashboard Metrics
The Dashboard (/) shows real-time metrics:
| Metric | Description |
|---|---|
| Active Incidents | Count of incidents in triggered or acknowledged state right now |
| Triggered | Incidents not yet acknowledged — actively escalating |
| Acknowledged | Incidents being actively worked |
| MTTA | Mean Time To Acknowledge — average minutes from incident creation to first acknowledgment (rolling window) |
| MTTR | Mean Time To Resolution — average minutes from incident creation to resolution (rolling window) |
| On-Call Now | Current on-call users across all active schedules with shift end times |
| Recent Incidents | Last 10 incidents with status, severity, and service |
The dashboard refreshes live — no page reload needed.
Reports Page
The Reports page (/reports) shows 30-day aggregate analytics:
Summary Metrics
| Metric | Description |
|---|---|
| Total Incidents | All incidents created in the last 30 days |
| Resolution Rate | Percentage of incidents that reached resolved status |
| MTTA | 30-day mean time to acknowledge (minutes) |
| MTTR | 30-day mean time to resolution (minutes) |
Incidents by Severity
Bar chart showing incident count per severity level over the 30-day window:
- Critical (red)
- High (orange)
- Medium (yellow)
- Low (blue)
- Info (gray)
Use this to understand your alert mix and whether your escalation policies are calibrated appropriately for the volume at each severity level.
Incidents by Service
Summary table showing incident count per service. Identifies which services are generating the most pages — useful for routing changes, alert threshold tuning, or staffing adjustments.
Reports API
For custom reporting and data export, use the Reports API:
Summary Report
GET /api/reports?report=summary&days=30
Response includes:
{
  "total_incidents": 47,
  "resolved_count": 42,
  "resolution_rate": 89.36,
  "mtta_minutes": 8.3,
  "mttr_minutes": 34.7,
  "by_severity": {
    "critical": 5,
    "high": 18,
    "medium": 20,
    "low": 4
  },
  "by_service": {
    "Tier 1 Alerts": 28,
    "Defend Critical": 12,
    "Backup Failures": 7
  }
}
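As a quick sanity check on a summary response, the resolution rate can be recomputed from the raw counts. The following is a minimal Python sketch that uses the field names from the sample response; the function name is illustrative, and rounding to two decimals is an assumption based on the sample value.

```python
import json

def resolution_rate(summary: dict) -> float:
    """Recompute the resolution rate (percent) from the raw counts."""
    total = summary["total_incidents"]
    if total == 0:
        return 0.0  # avoid division by zero on an empty window
    # Assumption: the API rounds to two decimals, as the sample value suggests.
    return round(100 * summary["resolved_count"] / total, 2)

# Trimmed copy of the sample response shown above.
sample = json.loads(
    '{"total_incidents": 47, "resolved_count": 42, "resolution_rate": 89.36}'
)
recomputed = resolution_rate(sample)
print(recomputed, recomputed == sample["resolution_rate"])
```

A mismatch between the recomputed and reported values would suggest the counts and the rate were taken from different windows.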
Timeline Report
GET /api/reports?report=timeline&days=30
Returns daily incident counts for the requested window — useful for building trend charts in external dashboards.
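For trend charts, daily counts are often smoothed before plotting. The sketch below computes a trailing moving average in Python. The response shape is not documented here, so the payload (a list of objects with `date` and `count` keys) is purely an assumption, as are the function name and window size.

```python
def moving_average(counts, window=7):
    """Trailing moving average; early entries average over the days available so far."""
    averages = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)
        chunk = counts[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Hypothetical timeline payload: the real response shape may differ.
timeline = [
    {"date": "2024-01-01", "count": 2},
    {"date": "2024-01-02", "count": 4},
    {"date": "2024-01-03", "count": 0},
    {"date": "2024-01-04", "count": 6},
]
smoothed = moving_average([day["count"] for day in timeline], window=3)
```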
Incident List (Default)
GET /api/reports
Returns the full incident list for the last 30 days with all fields, suitable for export to CSV or BI tools.
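One way to turn the incident list into CSV is sketched below in Python, using only the standard library. It assumes the response has already been parsed into a list of dictionaries; the `id` and `severity` keys in the sample data are hypothetical placeholders, not confirmed field names.

```python
import csv
import io

def incidents_to_csv(incidents: list[dict]) -> str:
    """Serialize incident dicts to CSV text, using the union of keys as columns."""
    columns = []
    for incident in incidents:
        for key in incident:
            if key not in columns:
                columns.append(key)  # preserve first-seen column order
    buffer = io.StringIO()
    # restval fills in an empty cell when an incident lacks a column.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(incidents)
    return buffer.getvalue()

# Hypothetical records; real incidents carry whatever fields the API returns.
sample_incidents = [
    {"id": 101, "severity": "high", "service": "Tier 1 Alerts"},
    {"id": 102, "severity": "low", "service": "Backup Failures", "acknowledged_by": "alice"},
]
csv_text = incidents_to_csv(sample_incidents)
```

Building the column list from the union of keys keeps the export stable when some incidents omit optional fields.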
Escalation Rate
To calculate your escalation rate (incidents that required more than one escalation step):
- Export the incident list via GET /api/reports
- Filter for incidents where current_escalation_step > 0 at resolution time
- Divide by total incidents
A high escalation rate (>20%) indicates your on-call technicians are not responding within your defined delay windows — consider shortening delays, adding backup notification methods, or reviewing coverage.
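The calculation above can be sketched in Python as follows. The `current_escalation_step` field comes from the steps above; the function name, the `id` key, and the sample records are illustrative assumptions.

```python
def escalation_rate(incidents):
    """Percentage of incidents whose current_escalation_step is above 0."""
    if not incidents:
        return 0.0
    escalated = sum(
        1 for incident in incidents
        if incident.get("current_escalation_step", 0) > 0
    )
    return 100 * escalated / len(incidents)

# Hypothetical exported incidents (only the relevant field is shown).
incidents = [
    {"id": 1, "current_escalation_step": 0},
    {"id": 2, "current_escalation_step": 2},
    {"id": 3, "current_escalation_step": 0},
    {"id": 4, "current_escalation_step": 1},
    {"id": 5, "current_escalation_step": 0},
]
rate = escalation_rate(incidents)
# 20% is the threshold suggested in the text above.
print(f"{rate:.1f}%", "review delays and coverage" if rate > 20 else "ok")
```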
On-Call Burden Analysis
To understand how on-call load is distributed across your team:
- Export the incident list for a reporting period
- Group by acknowledged_by to see incident counts per technician
- Compare to on-call hours covered by each technician (from the Schedules API)
Uneven burden distribution — one technician handling 70% of incidents — is a signal to rebalance the rotation or add team members.
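A rough Python sketch of this analysis is shown below. It groups incidents by `acknowledged_by` and normalizes by on-call hours. The shape of the hours mapping is an assumption (the Schedules API response is not described here), and the function name, technician names, and figures are made up for illustration.

```python
from collections import Counter

def burden_share(incidents, on_call_hours):
    """Per-technician incident count, share of total, and incidents per 100 on-call hours."""
    counts = Counter(
        incident["acknowledged_by"]
        for incident in incidents
        if incident.get("acknowledged_by")  # skip unacknowledged incidents
    )
    total = sum(counts.values())
    report = {}
    for tech, hours in on_call_hours.items():
        n = counts.get(tech, 0)
        report[tech] = {
            "incidents": n,
            "share_pct": round(100 * n / total, 1) if total else 0.0,
            "per_100_hours": round(100 * n / hours, 2) if hours else None,
        }
    return report

# Hypothetical data: 10 acknowledged incidents and hours covered per technician.
incidents = (
    [{"acknowledged_by": "alice"}] * 7
    + [{"acknowledged_by": "bob"}] * 2
    + [{"acknowledged_by": "carol"}]
)
hours = {"alice": 80, "bob": 80, "carol": 40}  # assumed to be derived from schedule data
report = burden_share(incidents, hours)
```

Normalizing by hours matters because a technician who covers more shifts is expected to acknowledge more incidents; the per-hour figure separates heavy shifts from heavy scheduling.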
Bridge API for External Reporting
For integration with external analytics tools (Power BI, Grafana, Tableau):
GET /api/bridge/incidents
X-Integration-Key: <key>
X-Tenant-Id: <tenant_id>
Returns incidents in a normalized format suitable for BI ingestion.
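For scripts that feed a BI pipeline, the request can be built with Python's standard library as sketched below. The endpoint path and header names come from the request above; the base URL, key, and tenant values are placeholders, and the helper name is hypothetical.

```python
import urllib.request

def build_bridge_request(base_url, integration_key, tenant_id):
    """Build (but do not send) a GET request for the Bridge incidents endpoint."""
    url = f"{base_url.rstrip('/')}/api/bridge/incidents"
    request = urllib.request.Request(url, method="GET")
    # urllib normalizes header-name capitalization; HTTP header names are case-insensitive.
    request.add_header("X-Integration-Key", integration_key)
    request.add_header("X-Tenant-Id", tenant_id)
    return request

# Placeholder values: substitute your own host, key, and tenant ID.
req = build_bridge_request("https://oncall.example.com", "demo-key", "tenant-123")

# To send it: urllib.request.urlopen(req), then parse the body with json.load().
```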
Interpreting MTTA and MTTR
MTTA (Mean Time To Acknowledge)
MTTA measures how quickly your on-call team responds to pages. Industry benchmarks for MSPs:
| MTTA | Assessment |
|---|---|
| < 5 minutes | Excellent |
| 5–15 minutes | Good |
| 15–30 minutes | Needs improvement |
| > 30 minutes | Critical gap |
High MTTA often indicates: technicians not receiving notifications (notification config issue), escalation delays set too long, or understaffed coverage windows.
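To reproduce MTTA from exported incidents and map it onto the table above, something like the following Python sketch could be used. The timestamp field names (`created_at`, `acknowledged_at`) and their ISO 8601 format are assumptions, as is the treatment of exact boundary values (15 and 30 minutes fall into the lower band here).

```python
from datetime import datetime

def mean_minutes(incidents, start_field, end_field):
    """Average minutes between two timestamps, skipping incidents missing either one."""
    durations = []
    for incident in incidents:
        start, end = incident.get(start_field), incident.get(end_field)
        if start and end:
            delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
            durations.append(delta.total_seconds() / 60)
    return sum(durations) / len(durations) if durations else None

def assess_mtta(minutes):
    """Map an MTTA value in minutes onto the assessment bands in the table above."""
    if minutes < 5:
        return "Excellent"
    if minutes <= 15:
        return "Good"
    if minutes <= 30:
        return "Needs improvement"
    return "Critical gap"

# Hypothetical incidents; field names are assumed, not confirmed by the API.
incidents = [
    {"created_at": "2024-01-01T09:00:00", "acknowledged_at": "2024-01-01T09:04:00"},
    {"created_at": "2024-01-01T10:00:00", "acknowledged_at": "2024-01-01T10:12:00"},
    {"created_at": "2024-01-01T11:00:00", "acknowledged_at": None},  # never acknowledged
]
mtta = mean_minutes(incidents, "created_at", "acknowledged_at")
assessment = assess_mtta(mtta)
```

Note that skipping unacknowledged incidents lowers the average; whether the built-in MTTA metric does the same is not stated here.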
MTTR (Mean Time To Resolution)
MTTR measures how long it takes to fully resolve incidents after they are created. This includes both response time and fix time.
| MTTR | Assessment |
|---|---|
| < 30 minutes | Excellent for most alert types |
| 30–60 minutes | Good |
| 1–4 hours | Acceptable for complex issues |
| > 4 hours | Review resolution workflows |
Very low MTTR may indicate issues are being resolved without full investigation (premature resolution). Very high MTTR may indicate incidents are being acknowledged but not actively worked.