Incident Severity Matrix Template
Feb 26, 2026 | by openstatus | [template]
Use these templates when classifying and communicating production incidents. Based on real-world patterns from GitHub, Stripe, and Vercel status pages. Use the interactive builder to classify incidents and customize thresholds for your team.
When to Use This Guide
- An incident has just been detected and you need to classify it fast
- You're writing a status page update and want consistent, professional language
- You're setting up your team's incident response process for the first time
- A postmortem revealed your classification or communication was inconsistent
Severity Matrix
Copy these tables into your runbook, wiki, or Notion page.
Each row maps a severity level to its user impact threshold, required response time, and communication protocol. Classification should be deterministic — given the same inputs, every engineer on your team should reach the same row.
| Severity | Users Affected | Security | Response Time | Status Page Label | Communication | Postmortem |
|----------|---------------|----------|---------------|-------------------|---------------|------------|
| 🔴 SEV0 – Critical | ≥80% OR security incident | Yes | 15 minutes | Major Outage | Immediate public update + all-hands | Required |
| 🟠 SEV1 – High | ≥50% | No | 30 minutes | Partial Outage | Public update within 15 min | Required |
| 🟡 SEV2 – Medium | ≥10% | No | 2 hours | Degraded Performance | Status page update + ticket | Required (team) |
| 🟢 SEV3 – Low | <10% | No | 1 business day | Minor Issue | Internal ticket only | Optional |
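Because classification should be deterministic, the matrix can also live as code in your tooling. A minimal TypeScript sketch of the thresholds above (the type and field names are illustrative, not from any particular library — swap in your own values):

```typescript
type Severity = "SEV0" | "SEV1" | "SEV2" | "SEV3";

interface Incident {
  percentUsersAffected: number; // 0–100
  securityIncident: boolean;
}

// Thresholds (80/50/10) mirror the matrix above.
function classify(incident: Incident): Severity {
  // Security incidents override user-impact thresholds entirely.
  if (incident.securityIncident) return "SEV0";
  if (incident.percentUsersAffected >= 80) return "SEV0";
  if (incident.percentUsersAffected >= 50) return "SEV1";
  if (incident.percentUsersAffected >= 10) return "SEV2";
  return "SEV3";
}
```

Encoding the thresholds once, in one place, is what makes "every engineer reaches the same row" enforceable rather than aspirational.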
Once the severity is set, this table tells every engineer exactly who owns it and when to escalate. If an incident isn't resolved within the auto-escalate window, re-classify upward immediately — don't wait.
| Severity | First Response | Update Cadence | Escalation Path | Auto-Escalate If |
|----------|---------------|----------------|-----------------|-----------------|
| SEV0 | 15 min | Every 15 min | VP Engineering + on-call | — |
| SEV1 | 30 min | Every 30 min | Engineering lead | Not resolved in 2h → SEV0 review |
| SEV2 | 2 hours | Every 2 hours | Team lead | Not resolved in 4h → SEV1 |
| SEV3 | 1 business day | Daily | Assigned engineer | Spreads to more systems → re-classify |
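The auto-escalate windows in the last column can be checked mechanically, for example by a bot in your incident channel. A hedged sketch, assuming you track minutes since detection (names are hypothetical):

```typescript
type Severity = "SEV0" | "SEV1" | "SEV2" | "SEV3";

// Windows match the table above; SEV0 and SEV3 have no timed
// auto-escalate target (SEV3 re-classifies on spread, not duration).
const AUTO_ESCALATE_MINUTES: Partial<Record<Severity, number>> = {
  SEV1: 120, // not resolved in 2h → SEV0 review
  SEV2: 240, // not resolved in 4h → SEV1
};

function shouldEscalate(severity: Severity, minutesOpen: number): boolean {
  const window = AUTO_ESCALATE_MINUTES[severity];
  return window !== undefined && minutesOpen >= window;
}
```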
Status Page Message Templates
Never go more than one hour without a public update on any active SEV0 or SEV1. Even if there's no new information, a "we're still investigating" update is better than silence.
SEV0 – Critical
Investigating
We are investigating a critical incident affecting [description of impact]. Our on-call team has been paged and we are actively working to identify the root cause. Next update in 15 minutes.
Identified
We have identified the root cause as [brief description]. A fix is being deployed. We will continue to provide updates every 15 minutes until service is fully restored.
Monitoring
A fix has been deployed. We are monitoring for full recovery and are seeing signs of improvement. Next update in 15 minutes.
Resolved
This incident has been resolved. All systems are operating normally. We apologize for the disruption. A postmortem will be published within 48 hours.
SEV1 – High
Investigating
We are investigating degraded performance affecting [description of impact]. Our team is actively working on this. Next update in 30 minutes.
Identified
We have identified the cause of the degradation. A fix is in progress. Next update in 30 minutes.
Monitoring
A fix has been deployed and we are monitoring for full recovery. Next update in 30 minutes.
Resolved
Service has been fully restored. We apologize for the disruption. Our engineering team will publish a postmortem with root cause and action items.
SEV2 – Medium
Investigating
We are investigating reports of degraded performance affecting a subset of users. Core functionality remains available. We will provide an update within 2 hours.
Identified
We have identified the root cause. A fix is being prepared and we expect resolution within [timeframe].
Monitoring
A fix has been deployed. We are monitoring for full resolution.
Resolved
This issue has been resolved. All systems are operating normally.
SEV3 – Low
SEV3 incidents typically do not require a public status page update. If you choose to communicate externally, use these templates.
Investigating
We are aware of a minor issue affecting a small number of users. Core functionality is not impacted. No immediate action is required on your end.
Identified
We have identified the cause. A fix will be deployed in the normal course of work.
Monitoring
A fix has been deployed and we are monitoring for full resolution.
Resolved
This minor issue has been resolved.
Postmortem Requirements
Closing the incident loop with a postmortem prevents recurrence and builds team knowledge.
| Severity | Postmortem | Who attends | Timeline |
|---|---|---|---|
| SEV0 | Required | Full engineering leadership | Within 48 hours |
| SEV1 | Required | Engineering leads + on-call | Within 72 hours |
| SEV2 | Required (team) | Relevant engineering team | Within 1 week |
| SEV3 | Optional | Assigned engineer | As needed |
When you post the "Resolved" update for a SEV0 or SEV1, commit to the postmortem publicly: "A postmortem will be published at [link] within 48 hours." This sets expectations and holds the team accountable.
Severity vs Priority
Severity and priority are often conflated, and that confusion tends to surface during live incidents at exactly the wrong time.
Severity is objective — it measures blast radius: how many users are affected and how badly. It does not change based on who's asking.
Priority is contextual — it reflects how urgently the team should act given business context. The same severity level can warrant different priorities.
Priority levels run P0 (drop everything) through P3 (low urgency), independent of severity:
| Scenario | Severity | Priority | Why |
|---|---|---|---|
| API down for 90% of users | SEV0 | P0 | Total outage + business impact |
| Button misaligned on pricing page | SEV3 | P1 | Low impact but costs conversions |
| Slow dashboard for 5% of users | SEV2 | P2 | Limited impact, no SLA risk |
| Auth bug during enterprise demo | SEV2 | P0 | Low blast radius, high business risk |
Classify severity based on impact. Decide priority separately during triage. A higher severity level also grants broader authority to take riskier recovery actions — a SEV0 may justify taking a service down entirely to restore stability.
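One way to keep the two from collapsing into each other is to model them as independent fields on the triage record, so neither is ever derived from the other. An illustrative TypeScript sketch (field names are assumptions, not a prescribed schema):

```typescript
type Severity = "SEV0" | "SEV1" | "SEV2" | "SEV3";
type Priority = "P0" | "P1" | "P2" | "P3";

interface TriageRecord {
  summary: string;
  severity: Severity; // objective: blast radius
  priority: Priority; // contextual: set separately during triage
}

// Same severity, different priorities — mirroring rows from the table above.
const slowDashboard: TriageRecord = {
  summary: "Slow dashboard for 5% of users",
  severity: "SEV2",
  priority: "P2",
};

const demoAuthBug: TriageRecord = {
  summary: "Auth bug during enterprise demo",
  severity: "SEV2",
  priority: "P0",
};
```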
Roles During a Severity Incident
Assign an Incident Commander at the start of every SEV0 or SEV1. One person owns:
- The current severity classification
- All public status page updates
- The escalation decision
This prevents contradictory messages on your status page. If two engineers post updates independently, users see conflicting information at the worst possible time. The IC is the single source of truth for external communication until the incident is resolved.
For SEV2 and below, the assigned engineer handles communication directly without a dedicated IC.
Real-World Examples
Database cluster failover
Scenario: Primary database fails over to replica. 60% of users experience 2 minutes of downtime. No data loss. Classification: 🟠 SEV1 – High (60% users affected)
Full communication arc:
Investigating
We are investigating degraded performance affecting the majority of users. Our team is actively working to restore service. Next update in 30 minutes.
Identified
We have identified the cause as a database failover. Service is being restored and we are monitoring recovery. Next update in 30 minutes.
Monitoring
The failover has completed and service is recovering. We are monitoring to confirm full stability. Next update in 30 minutes.
Resolved
Service has been fully restored. We apologize for the disruption. Our engineering team will publish a postmortem with root cause and action items.
API authentication breach
Scenario: Unauthorized access detected on API keys. Only 5% of users affected, but security is compromised. Classification: 🔴 SEV0 – Critical (security override)
We are investigating a security incident affecting API authentication. Impacted API keys have been revoked as a precaution. Our security team is actively investigating. Next update in 15 minutes.
CDN edge node degradation
Scenario: One CDN region serving stale assets. 15% of users see outdated content. Workaround: hard refresh. Classification: 🟡 SEV2 – Medium (15% users affected)
We are investigating reports of stale content being served to users in [region]. A workaround is available: clear your browser cache or perform a hard refresh. We will provide an update within 2 hours.
Payment processor timeout
Scenario: Stripe webhook failures causing 90% checkout failures. SLA breach triggered. Classification: 🔴 SEV0 – Critical (90% users affected)
We are investigating a critical issue affecting checkout. The majority of payment attempts are currently failing. Our team is working urgently with our payment provider to restore service. Next update in 15 minutes.
CSS regression on settings page
Scenario: Button misaligned on settings page. 3% of users affected. Functional workaround exists. Classification: 🟢 SEV3 – Low (3% users affected) Status page message: No public update needed. Internal ticket created and assigned.
Tips for Using the Severity Matrix
- During a live incident, err high. Over-escalating one incident costs less than under-escalating and letting user impact compound. Calibrate downward in the postmortem if you overshot.
- Watch for severity inflation. If your team regularly classifies 30%+ of incidents as SEV1, you've lost the signal. Measurable thresholds prevent this from drifting over time.
- Auto-escalate on duration. If a SEV2 is not resolved within 4 hours, trigger a SEV1 review. Unresolved incidents spreading to new systems always warrant re-classification.
- Use UTC in all public updates. For distributed teams and global users, UTC timestamps remove timezone ambiguity and prevent confusion during long-running incidents.
- Pin the matrix where it will be found. In your incident Slack channel, your runbook, and your on-call documentation. A severity matrix only works if engineers can find it in the first 30 seconds of an incident.
- Review thresholds quarterly. Or after any incident where the classification was debated. If your team is consistently arguing about SEV1 vs SEV2, adjust the threshold.
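The severity-inflation check above can be run over your incident history rather than eyeballed. A minimal sketch, assuming a flat list of past classifications:

```typescript
type Severity = "SEV0" | "SEV1" | "SEV2" | "SEV3";

// Share of incidents classified SEV0 or SEV1. If this creeps past
// roughly 0.3, the thresholds have likely drifted and the signal is lost.
function highSeverityShare(history: Severity[]): number {
  if (history.length === 0) return 0;
  const high = history.filter((s) => s === "SEV0" || s === "SEV1").length;
  return high / history.length;
}
```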
Use the Incident Severity Matrix Builder to classify incidents interactively, test your thresholds against real scenarios, and customize the matrix for your team.