Severity levels aren't just bureaucratic labels. They are the trigger mechanism for your entire incident response process: who gets paged, how fast they need to respond, whether to wake up executives, and how you communicate with customers. A team that consistently misclassifies incidents either over-responds to minor blips (burning out engineers) or under-responds to real outages (burning trust with customers).
The SEV0–SEV5 Framework at a Glance
| Level | Name | Customer Impact | Response Time | Who Responds |
|---|---|---|---|---|
| SEV0 | Critical / P0 | Complete outage, all users affected | Immediate (<5 min) | All hands + leadership |
| SEV1 | Major | Core feature down, significant % affected | <15 min | On-call + team lead |
| SEV2 | Moderate | Feature degraded, workaround available | <30 min | On-call engineer |
| SEV3 | Minor | Non-core feature, most users unaffected | <2 hrs | On-call engineer |
| SEV4 | Low | Cosmetic bug, no service disruption | Next business day | Standard ticket |
| SEV5 | Informational | No user impact, monitoring anomaly | Scheduled review | Engineering team |
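One way to keep this table from drifting out of sync with your paging rules is to encode it as data. The following is a minimal sketch, not a reference implementation: the names `Severity`, `ResponsePolicy`, `POLICIES`, and `requires_page` are hypothetical, and treating SEV4 and SEV5 as "no paging deadline" is an assumption drawn from their "next business day" and "scheduled review" entries.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum
from typing import Dict, Optional


class Severity(IntEnum):
    """Lower number means more severe, matching the table above."""
    SEV0 = 0
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


@dataclass(frozen=True)
class ResponsePolicy:
    name: str
    # None means there is no paging deadline (assumption for SEV4/SEV5).
    respond_within: Optional[timedelta]
    responders: str


POLICIES: Dict[Severity, ResponsePolicy] = {
    Severity.SEV0: ResponsePolicy("Critical", timedelta(minutes=5), "All hands + leadership"),
    Severity.SEV1: ResponsePolicy("Major", timedelta(minutes=15), "On-call + team lead"),
    Severity.SEV2: ResponsePolicy("Moderate", timedelta(minutes=30), "On-call engineer"),
    Severity.SEV3: ResponsePolicy("Minor", timedelta(hours=2), "On-call engineer"),
    Severity.SEV4: ResponsePolicy("Low", None, "Standard ticket"),
    Severity.SEV5: ResponsePolicy("Informational", None, "Engineering team"),
}


def requires_page(severity: Severity) -> bool:
    """A severity pages someone only if it carries a response deadline."""
    return POLICIES[severity].respond_within is not None
```

Because `Severity` is an `IntEnum`, comparisons such as `Severity.SEV0 < Severity.SEV1` work directly, which is convenient when checking whether a reclassification is an upgrade or a downgrade.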
SEV0 — The "All Hands" Situation
SEV0 is reserved for complete service failures that affect all or nearly all users. Think total payment processing outage, authentication service down (nobody can log in), or data loss events. These are the incidents that make it to the status page, the press, and the CEO's inbox.
Real examples:
- Your primary database region goes down and failover fails to trigger automatically
- A botched deployment takes the entire application offline across all regions
- A DDoS attack saturates your network and all endpoints return 503
- A security breach requiring immediate service shutdown
What SEV0 triggers: Immediate wake-up page to all on-call engineers. Engineering leadership notified within 5 minutes. Customer communication drafted within 15 minutes. War room opened. No new deployments until the incident is resolved and a post-mortem is scheduled.
SEV0 is the one level where you shouldn't be making judgement calls in real time. If you're unsure whether something is a SEV0, it almost certainly isn't; true SEV0 incidents are usually obvious.
SEV1 — Core Feature Down
SEV1 is the most common "middle-of-the-night" scenario. A core feature is broken or severely degraded. Not everything is down, but something important enough to cause real, measurable customer pain isn't working.
Real examples:
- Checkout flow is failing for 40% of users due to a payment gateway timeout
- Email notifications have stopped sending
- Search is returning no results due to an Elasticsearch indexing failure
- API response times have spiked to 8+ seconds (SLO breach) for a core endpoint
What differentiates SEV1 from SEV0: Users can still use your service, but something they rely on is broken. With SEV0, the product is essentially unusable.
Response expectations: Acknowledge within 15 minutes. First status update within 30 minutes. Resolution or rollback attempted within 1 hour.
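The SEV1 targets above translate naturally into absolute deadlines once an incident has a declaration time. This is a small sketch under that assumption; `SEV1_TARGETS`, `sev1_deadlines`, and `missed_targets` are hypothetical names, and measuring every target from the moment of declaration (rather than from the first alert) is a choice you may want to change.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

# Offsets from the moment the SEV1 is declared, taken from the text above.
SEV1_TARGETS: Dict[str, timedelta] = {
    "acknowledge": timedelta(minutes=15),
    "first_status_update": timedelta(minutes=30),
    "resolution_or_rollback_attempt": timedelta(hours=1),
}


def sev1_deadlines(declared_at: datetime) -> Dict[str, datetime]:
    """Turn the relative targets into wall-clock deadlines."""
    return {step: declared_at + offset for step, offset in SEV1_TARGETS.items()}


def missed_targets(declared_at: datetime, now: datetime, completed: Set[str]) -> List[str]:
    """List the steps whose deadline has passed without being completed."""
    return [
        step
        for step, deadline in sev1_deadlines(declared_at).items()
        if step not in completed and now > deadline
    ]
```

For example, an incident declared at 03:00 that has been acknowledged but has no status update by 03:40 would report `first_status_update` as missed.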
SEV2 — Degraded Service
SEV2 incidents are real problems that affect a subset of users or reduce the quality of service without breaking core flows entirely. A workaround usually exists.
Real examples:
- Image uploads are failing but the rest of the product works fine
- Report generation is taking 5× longer than usual due to a slow database query
- A non-critical API endpoint is returning incorrect data for ~10% of requests
- A third-party integration (Slack, Zapier, etc.) is broken but native functionality is unaffected
SEV2s often don't require a middle-of-the-night response unless they're worsening rapidly. The on-call engineer should assess and decide whether to escalate or schedule a fix for the next business day.
SEV3–SEV5 — Minor to Informational
SEV3 is for bugs or degradations that cause inconvenience but no material impact on core user flows. These are handled during business hours: UI bugs in non-critical screens, slow admin dashboards, deprecated endpoints still being called by a small number of legacy clients.
SEV4 encompasses cosmetic bugs, minor UX issues, and small feature regressions that users may not even notice. These go into the standard backlog.
SEV5 (used by some teams) is for monitoring anomalies and informational alerts that don't indicate any current user impact — things like disk usage approaching a threshold, or an unusual but non-critical error rate spike that resolved itself.
The Most Common Severity Classification Mistakes
- Defaulting to SEV1 "to be safe." When engineers aren't sure, they often call SEV1. Over time this inflates on-call load and causes alert fatigue. If you find your team declaring multiple SEV1s a week, your classification criteria probably need tightening.
- Downgrading severity to avoid escalation. An engineer doesn't want to wake up the team lead, so they call SEV2 when the situation is genuinely SEV1. This delays resolution and makes the eventual escalation harder.
- Using revenue impact as the only criterion. A security incident that exposes a significant vulnerability might have zero immediate revenue impact but still warrants SEV1 treatment. Build your matrix around both customer impact and business risk.
- Not updating severity mid-incident. Incidents evolve. A SEV2 that turns out to be a database corruption issue should be upgraded to SEV1 or SEV0 as that becomes clear. Your incident process should explicitly allow severity upgrades and downgrades.
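To make the last point concrete, an incident record can allow severity changes in both directions while keeping an audit trail for the post-mortem. The sketch below is illustrative only: the `Incident` class, its `reclassify` method, and the rule that every change needs a written reason are assumptions, not part of any particular incident tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Incident:
    title: str
    severity: int  # 0 = SEV0 ... 5 = SEV5
    # Each entry: (timestamp, old severity, new severity, reason)
    history: List[Tuple[datetime, int, int, str]] = field(default_factory=list)

    def reclassify(self, new_severity: int, reason: str) -> None:
        """Upgrade or downgrade severity, recording why it changed."""
        if not 0 <= new_severity <= 5:
            raise ValueError("severity must be between 0 and 5")
        if not reason.strip():
            raise ValueError("a reason is required for every severity change")
        if new_severity == self.severity:
            return  # nothing to record
        self.history.append(
            (datetime.now(timezone.utc), self.severity, new_severity, reason)
        )
        self.severity = new_severity


incident = Incident("Slow report generation", severity=2)
incident.reclassify(1, "Root cause is database corruption")
```

Requiring a reason makes quiet downgrades harder, since someone has to write down why the incident is less serious than first thought.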
Building Your Own Severity Matrix
Every organisation's criteria will be slightly different. Use this structure as a starting point, then tailor the thresholds:
- What percentage of users are affected? (100%, 50%, 10%, <1%)
- Is core revenue-generating functionality broken?
- Is there a workaround available?
- Is data integrity or security at risk?
- Is the situation stable or worsening?
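The five questions above can be combined into a simple decision function. The following is one possible sketch, not a recommended standard: the function name `recommend_severity` and every threshold in it (90% for SEV0, 10% and 1% for the lower tiers, and escalating SEV2 or SEV3 by one level when the situation is worsening) are assumptions chosen for illustration, and are exactly the values you would tailor.

```python
def recommend_severity(
    pct_users_affected: float,
    core_revenue_broken: bool,
    workaround_available: bool,
    data_or_security_risk: bool,
    worsening: bool,
) -> str:
    """Map the five matrix questions to a suggested severity label."""
    if not 0 <= pct_users_affected <= 100:
        raise ValueError("pct_users_affected must be between 0 and 100")

    if pct_users_affected >= 90:
        level = 0  # all or nearly all users affected
    elif core_revenue_broken or data_or_security_risk:
        level = 1  # core flow broken, or business risk regardless of revenue
    elif pct_users_affected >= 10:
        level = 2
    elif pct_users_affected >= 1:
        # A small affected group is still SEV2 if they have no workaround.
        level = 3 if workaround_available else 2
    else:
        level = 4

    # Assumption: a worsening SEV2 or SEV3 is treated one level more severely.
    if worsening and level in (2, 3):
        level -= 1

    return f"SEV{level}"


# Checkout failing for 40% of users, no workaround, stable:
print(recommend_severity(40, True, False, False, False))
```

Note that this sketch never returns SEV5, since the questions describe user-facing impact and SEV5 covers anomalies with none.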
DevOpsArsenal Incident Severity Matrix Calculator
Answer a few questions about the incident's impact and get a recommended severity level instantly. Useful for onboarding new on-call engineers or for quickly double-checking a classification under pressure.
Communicating During an Incident
Severity levels should drive your external communication cadence. Under-communicating is almost always worse than over-communicating — customers who know what's happening are far more forgiving than customers left in the dark.
| Severity | Status Page | Customer Email | Executive Update |
|---|---|---|---|
| SEV0 | Within 10 min | Within 30 min | Within 15 min |
| SEV1 | Within 20 min | If duration > 1 hr | If duration > 2 hr |
| SEV2 | If duration > 2 hr | Usually not | No |
| SEV3 | No | No | No |
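The cadence table mixes two kinds of rule: "within N minutes" (always required, with a deadline) and "if duration exceeds N" (required only once the incident runs long). A rough sketch of how that could be encoded follows. The names `COMMS_RULES` and `required_channels` are hypothetical, "Usually not" and "No" are modelled as no rule at all, and severities missing from the table (SEV4, SEV5) are assumed to need no external communication.

```python
from typing import Dict, List, Tuple

# ("within", N): always required, due N minutes after declaration.
# ("after", N): required only once the incident has lasted more than N minutes.
COMMS_RULES: Dict[str, Dict[str, Tuple[str, int]]] = {
    "SEV0": {
        "status_page": ("within", 10),
        "customer_email": ("within", 30),
        "executive_update": ("within", 15),
    },
    "SEV1": {
        "status_page": ("within", 20),
        "customer_email": ("after", 60),
        "executive_update": ("after", 120),
    },
    "SEV2": {"status_page": ("after", 120)},
    "SEV3": {},
}


def required_channels(severity: str, elapsed_minutes: float) -> List[str]:
    """Channels that apply to this incident given how long it has run."""
    required = []
    for channel, (kind, minutes) in COMMS_RULES.get(severity, {}).items():
        if kind == "within" or elapsed_minutes > minutes:
            required.append(channel)
    return required


# A SEV1 that has been open for 61 minutes:
print(required_channels("SEV1", 61))
```

Keeping the rules as data rather than branching logic means changing a threshold is a one-line edit that is easy to review.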