Severity levels aren't just bureaucratic labels. They are the trigger mechanism for your entire incident response process: who gets paged, how fast they need to respond, whether to wake up executives, and how you communicate with customers. A team that consistently misclassifies incidents either over-responds to minor blips (burning out engineers) or under-responds to real outages (burning trust with customers).

The SEV0–SEV5 Framework at a Glance

| Level | Name | Customer Impact | Response | Who Responds |
| --- | --- | --- | --- | --- |
| SEV0 | Critical / P0 | Complete outage, all users affected | Immediate (<5 min) | All hands + leadership |
| SEV1 | Major | Core feature down, significant % affected | <15 min | On-call + team lead |
| SEV2 | Moderate | Feature degraded, workaround available | <30 min | On-call engineer |
| SEV3 | Minor | Non-core feature, most users unimpacted | <2 hrs | On-call engineer |
| SEV4 | Low | Cosmetic bug, no service disruption | Next business day | Standard ticket |
| SEV5 | Informational | No user impact, monitoring anomaly | Scheduled review | Engineering team |
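
One way to keep the framework table from drifting out of sync with your tooling is to store it as data. The sketch below is only an illustration of that idea: the names `Severity`, `ResponsePolicy`, `ack_within`, and `has_timed_response` are hypothetical, and "next business day" and "scheduled review" are modeled as having no timed response target, which is an assumption rather than something the table states.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    SEV0 = 0
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


@dataclass(frozen=True)
class ResponsePolicy:
    name: str
    ack_within: Optional[timedelta]  # None = no timed target (assumption)
    responders: str


# Values copied from the framework table; tailor them to your own team.
POLICIES = {
    Severity.SEV0: ResponsePolicy("Critical", timedelta(minutes=5), "All hands + leadership"),
    Severity.SEV1: ResponsePolicy("Major", timedelta(minutes=15), "On-call + team lead"),
    Severity.SEV2: ResponsePolicy("Moderate", timedelta(minutes=30), "On-call engineer"),
    Severity.SEV3: ResponsePolicy("Minor", timedelta(hours=2), "On-call engineer"),
    Severity.SEV4: ResponsePolicy("Low", None, "Standard ticket"),
    Severity.SEV5: ResponsePolicy("Informational", None, "Engineering team"),
}


def has_timed_response(severity: Severity) -> bool:
    """True when the level carries a response-time target."""
    return POLICIES[severity].ack_within is not None
```

Using an `IntEnum` keeps comparisons natural: a lower number means a more severe incident, so `Severity.SEV0 < Severity.SEV2` holds.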

SEV0 — The "All Hands" Situation

SEV0 is reserved for complete service failures that affect all or nearly all users. Think total payment processing outage, authentication service down (nobody can log in), or data loss events. These are the incidents that make it to the status page, the press, and the CEO's inbox.

Real examples:

  • Your primary database region goes down and failover fails to trigger automatically
  • A botched deployment takes the entire application offline across all regions
  • A DDoS attack saturates your network and all endpoints return 503
  • A security breach requiring immediate service shutdown

What SEV0 triggers: Immediate wake-up page to all on-call engineers. Engineering leadership notified within 5 minutes. Customer communication drafted within 15 minutes. War room opened. No new deployments until the incident is resolved and a post-mortem is scheduled.

SEV0 is the one level where you shouldn't be making judgement calls in real time. If you're unsure whether something is a SEV0, it almost certainly isn't: true SEV0 incidents are usually obvious.

SEV1 — Core Feature Down

SEV1 is the most common "middle-of-the-night" scenario. A core feature is broken or severely degraded. Not everything is down, but something important enough to cause real, measurable customer pain isn't working.

Real examples:

  • Checkout flow is failing for 40% of users due to a payment gateway timeout
  • Email notifications have stopped sending
  • Search is returning no results due to an Elasticsearch indexing failure
  • API response times have spiked to 8+ seconds (SLO breach) for a core endpoint

What differentiates SEV1 from SEV0: Users can still use your service, but something they rely on is broken. With SEV0, the product is essentially unusable.

Response expectations: Acknowledge within 15 minutes. First status update within 30 minutes. Resolution or rollback attempted within 1 hour.
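
These expectations are easier to hold people to when they are turned into concrete timestamps the moment the incident is declared. The following is a minimal sketch, not a prescribed implementation; the step names and the `sev1_deadlines` and `overdue_steps` helpers are hypothetical.

```python
from datetime import datetime, timedelta

# Targets taken from the SEV1 response expectations above.
SEV1_TARGETS = {
    "acknowledge": timedelta(minutes=15),
    "first_status_update": timedelta(minutes=30),
    "resolution_or_rollback_attempt": timedelta(hours=1),
}


def sev1_deadlines(declared_at: datetime) -> dict:
    """Map each step to the wall-clock time by which it should happen."""
    return {step: declared_at + offset for step, offset in SEV1_TARGETS.items()}


def overdue_steps(declared_at: datetime, now: datetime, completed: set) -> list:
    """List the steps that are past their deadline and not yet done."""
    return [
        step
        for step, due in sev1_deadlines(declared_at).items()
        if step not in completed and now > due
    ]
```

For an incident declared at 02:00, calling `overdue_steps` at 02:40 with only the acknowledgement completed would flag the first status update as overdue.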

SEV2 — Degraded Service

SEV2 incidents are real problems that affect a subset of users or reduce the quality of service without breaking core flows entirely. A workaround usually exists.

Real examples:

  • Image uploads are failing but the rest of the product works fine
  • Report generation is taking 5× longer than usual due to a slow database query
  • A non-critical API endpoint is returning incorrect data for ~10% of requests
  • A third-party integration (Slack, Zapier, etc.) is broken but native functionality is unaffected

SEV2s often don't require a middle-of-the-night response unless they're worsening rapidly. The on-call engineer should assess and decide whether to escalate or schedule a fix for the next business day.
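
"Worsening rapidly" is a judgement call, but giving the on-call engineer a rough rule of thumb can help. The sketch below assumes you can pull a short series of error-rate samples for the affected feature; the doubling threshold (`growth_factor=2.0`) and the function names are illustrative assumptions, not recommended values.

```python
def is_worsening_rapidly(error_rates: list, growth_factor: float = 2.0) -> bool:
    """Compare the newest sample with the oldest in a recent window.

    error_rates: oldest-first samples, e.g. fraction of failed requests.
    growth_factor: hypothetical threshold; tune it to your own traffic.
    """
    if len(error_rates) < 2:
        return False  # not enough data to judge a trend
    first, last = error_rates[0], error_rates[-1]
    if first <= 0:
        return last > 0  # went from no errors to some errors
    return last >= first * growth_factor


def sev2_next_step(error_rates: list) -> str:
    """Suggest whether to escalate or schedule the fix."""
    return "escalate" if is_worsening_rapidly(error_rates) else "schedule_fix"
```

A check like this is a prompt for the engineer, not a replacement for their assessment.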

SEV3–SEV5 — Minor to Informational

SEV3 is for bugs or degradations that cause inconvenience but no material impact on core user flows. These are handled during business hours: UI bugs in non-critical screens, slow admin dashboards, deprecated endpoints still being called by a small number of legacy clients.

SEV4 encompasses cosmetic bugs, minor UX issues, and small feature regressions that users may not even notice. These go into the standard backlog.

SEV5 (used by some teams) is for monitoring anomalies and informational alerts that don't indicate any current user impact — things like disk usage approaching a threshold, or an unusual but non-critical error rate spike that resolved itself.

The Most Common Severity Classification Mistakes

  1. Defaulting to SEV1 "to be safe." When engineers aren't sure, they often call SEV1. Over time this inflates on-call load and causes alert fatigue. If you find your team declaring multiple SEV1s a week, your classification criteria probably need tightening.
  2. Downgrading severity to avoid escalation. An engineer doesn't want to wake up the team lead, so they call SEV2 when the situation is genuinely SEV1. This delays resolution and makes the eventual escalation harder.
  3. Using revenue impact as the only criterion. A security incident that exposes a significant vulnerability might have zero immediate revenue impact but still warrants SEV1 treatment. Build your matrix around both customer impact and business risk.
  4. Not updating severity mid-incident. Incidents evolve. A SEV2 that turns out to be a database corruption issue should be upgraded to SEV1 or SEV0 as that becomes clear. Your incident process should explicitly allow severity upgrades and downgrades.
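
To make the fourth point concrete, an incident record can treat reclassification as a first-class operation and keep an audit trail of every change. This is a sketch under assumptions: the `Incident` and `SeverityChange` classes and the `reclassify` method are hypothetical names, and severities are plain integers from 0 (SEV0) to 5 (SEV5).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SeverityChange:
    at: datetime
    old: int
    new: int
    reason: str


@dataclass
class Incident:
    title: str
    severity: int  # 0 = SEV0 ... 5 = SEV5
    history: list = field(default_factory=list)

    def reclassify(self, new_severity: int, reason: str) -> None:
        """Upgrade or downgrade severity and record why it changed."""
        if not 0 <= new_severity <= 5:
            raise ValueError("severity must be between 0 and 5")
        if not reason.strip():
            raise ValueError("a reason is required for every change")
        if new_severity == self.severity:
            return  # nothing to record
        self.history.append(
            SeverityChange(datetime.now(timezone.utc), self.severity, new_severity, reason)
        )
        self.severity = new_severity
```

Requiring a reason also gives the post-mortem a timeline of when the team's understanding of the incident changed.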

Building Your Own Severity Matrix

Every organisation's criteria will be slightly different. Use this structure as a starting point, then tailor the thresholds:

  • What percentage of users are affected? (100%, 50%, 10%, <1%)
  • Is core revenue-generating functionality broken?
  • Is there a workaround available?
  • Is data integrity or security at risk?
  • Is the situation stable or worsening?
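
The five questions above can be sketched as a small decision function. Every threshold here (90%, 10%, 1%, and the one-level bump for a worsening situation) is an illustrative assumption to be replaced with your own criteria, and the `Impact` and `recommend_severity` names are hypothetical. This is not the logic of any particular calculator.

```python
from dataclasses import dataclass


@dataclass
class Impact:
    percent_users_affected: float  # 0 to 100
    core_functionality_broken: bool
    workaround_available: bool
    data_or_security_at_risk: bool
    worsening: bool


def recommend_severity(impact: Impact) -> int:
    """Return a suggested level from 0 (SEV0) to 5 (SEV5)."""
    pct = impact.percent_users_affected
    if pct <= 0 and not impact.data_or_security_at_risk:
        return 5  # no user impact: informational
    if impact.core_functionality_broken and pct >= 90:
        level = 0
    elif impact.core_functionality_broken or impact.data_or_security_at_risk:
        level = 1  # business risk counts even with no visible user impact
    elif pct >= 10 or not impact.workaround_available:
        level = 2
    elif pct >= 1:
        level = 3
    else:
        level = 4
    if impact.worsening and level > 0:
        level -= 1  # treat a deteriorating situation one level more seriously
    return level
```

Note that a security or data-integrity risk maps to at least SEV1 even when no users are visibly affected, which mirrors the warning against using revenue impact as the only criterion.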

DevOpsArsenal Incident Severity Matrix Calculator

Answer a few questions about the incident's impact and get a recommended severity level instantly. Useful for onboarding new on-call engineers or for quickly double-checking a classification under pressure.


Communicating During an Incident

Severity levels should drive your external communication cadence. Under-communicating is almost always worse than over-communicating — customers who know what's happening are far more forgiving than customers left in the dark.

| Severity | Status Page | Customer Email | Executive Update |
| --- | --- | --- | --- |
| SEV0 | Within 10 min | Within 30 min | Within 15 min |
| SEV1 | Within 20 min | If duration > 1 hr | If duration > 2 hr |
| SEV2 | If duration > 2 hr | Usually not | No |
| SEV3 | No | No | No |
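
The cadence table mixes two kinds of rule: communications that are always sent by a deadline ("within") and ones that only become necessary once the incident has run long enough ("if duration >"). A lookup like the one below can make that distinction explicit. It is a sketch under assumptions: the `Rule` type, the channel keys, and `required_comms` are hypothetical names, and "Usually not" is modeled as "not required".

```python
from datetime import timedelta
from typing import NamedTuple


class Rule(NamedTuple):
    kind: str         # "within": always send, by this deadline
    limit: timedelta  # "after": send only once the incident outlasts this


COMMS = {
    0: {
        "status_page": Rule("within", timedelta(minutes=10)),
        "customer_email": Rule("within", timedelta(minutes=30)),
        "executive_update": Rule("within", timedelta(minutes=15)),
    },
    1: {
        "status_page": Rule("within", timedelta(minutes=20)),
        "customer_email": Rule("after", timedelta(hours=1)),
        "executive_update": Rule("after", timedelta(hours=2)),
    },
    2: {"status_page": Rule("after", timedelta(hours=2))},
    3: {},
}


def required_comms(severity: int, elapsed: timedelta) -> list:
    """Channels that should be in use for this severity at this point."""
    rules = COMMS.get(severity, {})  # unlisted levels need no external comms
    return sorted(
        channel
        for channel, rule in rules.items()
        if rule.kind == "within" or elapsed > rule.limit
    )
```

For example, a SEV1 that has been open for 90 minutes needs a status page entry and a customer email, but not yet an executive update.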

Frequently Asked Questions

What is the difference between SEV1 and P1?

SEV (severity) and P (priority) are sometimes used interchangeably but technically measure different things. Severity describes the impact on users. Priority describes how urgently the team should respond. Most teams align them directly (SEV1 equals P1) to avoid confusion, but larger organisations sometimes distinguish between them.

How many severity levels should we have?

Most teams find 4 to 5 levels to be the practical sweet spot. Fewer than 4 and you lose the granularity needed for consistent classification. More than 6 and engineers have to think too hard about distinctions mid-incident. Start with 4 or 5 and add levels only when you consistently find yourself debating where an incident sits.

Should every alert trigger an incident?

No. Alerts notify engineers of potential issues; incidents are the structured response to confirmed user impact. Not every alert will escalate to an incident, and that is by design. A noisy alerting setup that generates more incidents than the team can meaningfully respond to is itself an SRE problem worth addressing.

When should we write a post-mortem?

Most teams write blameless post-mortems for all SEV0 and SEV1 incidents, and optionally for SEV2s if the incident was novel or had an interesting failure mode. The goal is to learn, not to assign blame: your post-mortem process should make it psychologically safe for engineers to be honest about what went wrong.

A well-defined severity matrix is one of the highest-leverage investments a growing engineering team can make. It removes ambiguity at the worst possible time: 2 AM, under stress, with limited information. When everyone knows what a SEV1 looks like and exactly what it triggers, the team can move from "is this bad?" to "let's fix this" in seconds instead of minutes. Start with this framework, customise it for your product and team, and use the Incident Severity Matrix Calculator to help new team members classify incidents consistently.