What Are Latency Percentiles?
A latency percentile tells you the response time below which a given percentage of requests fall.
- P50 (median): 50% of requests complete in this time or faster. This is the "typical" experience.
- P75: 75% of requests are at or below this value. This is getting into the slower end of normal.
- P90: 90% of requests are at or below this. One in ten requests is slower than this.
- P95: 19 out of 20 requests are at or below this. Used for SLAs in many SaaS products.
- P99: 99% of requests are at or below this. Only your slowest 1% are worse. This is where you find your most frustrated users.
- P999 (P99.9): 99.9% of requests are at or below this. Used in high-throughput systems like payment processors or trading platforms where even the rarest outliers matter.
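The math is straightforward: sort all your response times for a given period, then find the value at the relevant position in that sorted list. With the simple nearest-rank method, the P99 of 1,000 requests is the 990th entry in the sorted list. Some tools interpolate between neighboring entries instead, so small differences between implementations are normal.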
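The sort-and-pick procedure can be sketched in a few lines of Python. This is a minimal illustration of the nearest-rank method, not the algorithm any particular monitoring tool uses; the function name `nearest_rank_percentile` and the sample data are made up for the example.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank: the smallest position that covers at least p% of the data.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical data: 1,000 response times of 1ms, 2ms, ..., 1000ms.
samples = list(range(1, 1001))
p99 = nearest_rank_percentile(samples, 99)
print(f"P99: {p99}ms")  # the 990th entry of the sorted list
```

Because the hypothetical samples are simply 1 through 1,000, the 990th entry is 990ms, which matches the position described above.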
Why the Average Is a Terrible Metric
Here's a concrete example. Imagine an API with these 10 response times (in milliseconds):
12, 14, 11, 13, 15, 12, 14, 13, 11, 890
Average: 100.5ms | P50: 13ms | P99: ~890ms
The average is dragged up by that single 890ms outlier, making it look like performance is mediocre across the board. In reality, 9 out of 10 users are getting responses in 15ms or less. The average obscures both the good (most users are happy) and the bad (some users are waiting almost a second).
Now scale the scenario up. What if 990 out of 1,000 requests complete in 50ms, but the other 10 time out at 30,000ms? The average comes out at about 350ms, which looks alarming on the surface, but the percentiles tell the real story: the P50 is still 50ms and only the far tail (P99.9) sits at 30,000ms, so those 10 requests are catastrophically broken while everything else is fine. Only percentiles give you the information to tell these two situations apart.
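To make the two scenarios concrete, here is a small sketch that computes the mean alongside nearest-rank percentiles for both data sets. The helper `nearest_rank` is a hypothetical name, and the second data set assumes exactly 990 requests at 50ms plus 10 timeouts. With exactly 1% of requests failing, the timeouts sit right at the P99 boundary, so the sketch also prints P99.9 to show the tail clearly.

```python
import math
import statistics

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at 1-based rank ceil(p% of n)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

# Scenario 1: the ten response times from the example (milliseconds).
small = [12, 14, 11, 13, 15, 12, 14, 13, 11, 890]
# Scenario 2 (assumed split): 990 requests at 50ms, 10 timeouts at 30,000ms.
large = [50] * 990 + [30_000] * 10

for name, data in (("10 requests", small), ("1,000 requests", large)):
    print(
        f"{name}: mean={statistics.mean(data)}ms "
        f"P50={nearest_rank(data, 50)}ms "
        f"P99={nearest_rank(data, 99)}ms "
        f"P99.9={nearest_rank(data, 99.9)}ms"
    )
```

In both cases the mean lands far from what any individual request experienced, while the P50 and the tail percentiles together show where the fast majority and the slow outliers actually are.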
P99 Is Where User Frustration Lives
There's a useful mental model here: your P50 is what your happy users experience. Your P99 is what your unhappy users experience. Both matter, but your P99 is the one most likely to generate support tickets, churn, and negative reviews.
Research from Google and Amazon has consistently shown that users begin to notice latency around 100 to 200ms and start abandoning pages around 1 to 3 seconds. If your P99 is 2.5 seconds and you're only watching the P50, you're blind to the experience of 1 in every 100 requests, and in a system handling 1 million requests per day, that's 10,000 slow requests every single day.
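The arithmetic behind that figure is easy to reproduce for your own traffic. The following is a rough sketch; `requests_slower_than` is a hypothetical helper, and the daily volume is the 1 million requests assumed above.

```python
def requests_slower_than(percentile, total_requests):
    """Approximate number of requests slower than the given percentile."""
    return round(total_requests * (100 - percentile) / 100)

requests_per_day = 1_000_000  # assumed daily volume from the example above

for percentile in (50, 90, 95, 99, 99.9):
    slower = requests_slower_than(percentile, requests_per_day)
    print(f"Slower than P{percentile}: about {slower:,} requests per day")
```

At this volume even the P99.9 tail still represents roughly a thousand requests a day, which is why high-throughput systems track it.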
How to Set Latency SLOs Using Percentiles
Service Level Objectives (SLOs) should be tied to percentiles, not averages. Here's a common way for SRE teams to structure them:
| SLO | Metric | Typical Target |
|---|---|---|
| Availability | % of successful requests | ≥ 99.9% |
| Latency (typical) | P50 response time | ≤ 100ms |
| Latency (acceptable) | P90 response time | ≤ 300ms |
| Latency (ceiling) | P99 response time | ≤ 1,000ms |
The specific numbers will vary based on your product, but the structure is widely used. You define what "good enough" looks like at the median, at the 90th percentile, and at the 99th percentile, then alert when any of them exceeds its target.
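As a rough sketch of how such a check might look in code, the snippet below compares nearest-rank percentiles against the latency targets from the table. The names `LATENCY_SLOS_MS`, `nearest_rank`, and `check_latency_slos` are hypothetical, and the measurement window is invented for illustration; a real alerting setup would normally live in your monitoring system rather than in a script.

```python
import math

# Latency targets from the table above: percentile -> maximum milliseconds.
LATENCY_SLOS_MS = {50: 100, 90: 300, 99: 1000}

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at 1-based rank ceil(p% of n)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

def check_latency_slos(latencies_ms, slos=LATENCY_SLOS_MS):
    """Return (percentile, observed_ms, target_ms) for every breached target."""
    breaches = []
    for percentile, target_ms in slos.items():
        observed_ms = nearest_rank(latencies_ms, percentile)
        if observed_ms > target_ms:
            breaches.append((percentile, observed_ms, target_ms))
    return breaches

# Hypothetical window: mostly fast requests with a slow tail.
window = [40] * 90 + [250] * 8 + [1500] * 2

for percentile, observed_ms, target_ms in check_latency_slos(window):
    print(f"P{percentile} breached: {observed_ms}ms > {target_ms}ms")
```

In this made-up window the P50 and P90 are both 40ms and within target, while the P99 of 1,500ms exceeds the 1,000ms ceiling, so only the P99 is reported.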
Calculating Percentiles from Your Own Data
If you have a raw list of API response times (from a load test, APM export, or server logs), you can calculate percentiles in a few ways.
In Python:
import numpy as np

# Response times in milliseconds
latencies = [12, 14, 11, 13, 15, 12, 14, 13, 11, 890, 22, 18, 16, 45, 88]

# np.percentile interpolates linearly between neighboring values by default,
# so results on small samples can differ from the nearest-rank method.
for p in (50, 90, 95, 99):
    print(f"P{p}: {np.percentile(latencies, p):.1f}ms")
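In the shell (sort + awk), using the nearest-rank method:
sort -n latencies.txt | awk '
# Return the value at 1-based position ceil(p * NR) in the sorted input.
function pct(p,    i) { i = int(p * NR); if (i < p * NR) i++; return data[i] }
{ data[NR] = $1 }
END {
  if (NR == 0) exit
  print "P50:", pct(0.50) "ms"
  print "P90:", pct(0.90) "ms"
  print "P99:", pct(0.99) "ms"
}
'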
DevOpsArsenal Latency Percentile Calculator
Paste your raw latency values and get a full P50/P90/P95/P99 breakdown instantly: no Python, no spreadsheets, no code. Useful during incident investigations or after a load test.
What to Do When Your P99 Is High
Finding a bad P99 is only step one. Here's how to dig into it:
- Correlate with infrastructure events: Is the P99 spike aligned with garbage collection pauses, database connection pool exhaustion, or autoscaling events?
- Check for slow query patterns: In database-backed APIs, P99 spikes often point to a missing index or a query that occasionally does a full table scan.
- Look at resource contention: CPU throttling in containers (CPU limits set too low) is a common, non-obvious cause of tail latency spikes.
- Examine retry storms: If your P99 latency triggers client retries, those retries can create a feedback loop that makes the P99 worse.
- Isolate by endpoint: An aggregate P99 across all endpoints can be misleading. Break it down, because often one or two slow endpoints drag up the entire service's tail latency.
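The last step, isolating by endpoint, is straightforward to sketch if you can export (endpoint, latency) pairs from your logs or APM. The example below assumes that record shape; the function names and the request data are hypothetical and only illustrate the grouping idea.

```python
import math
from collections import defaultdict

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at 1-based rank ceil(p% of n)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

def p99_by_endpoint(records):
    """Group (endpoint, latency_ms) pairs and return {endpoint: P99 in ms}."""
    grouped = defaultdict(list)
    for endpoint, latency_ms in records:
        grouped[endpoint].append(latency_ms)
    return {endpoint: nearest_rank(values, 99) for endpoint, values in grouped.items()}

# Hypothetical request log: one healthy endpoint and one with a slow tail.
records = (
    [("/users", 20)] * 95 + [("/users", 60)] * 5
    + [("/reports", 30)] * 95 + [("/reports", 2400)] * 5
)

aggregate_p99 = nearest_rank([latency_ms for _, latency_ms in records], 99)
per_endpoint = p99_by_endpoint(records)

print(f"Aggregate P99: {aggregate_p99}ms")
# List the slowest endpoints first.
for endpoint, p99 in sorted(per_endpoint.items(), key=lambda item: item[1], reverse=True):
    print(f"{endpoint}: P99 = {p99}ms")
```

With this invented data the aggregate P99 is 2,400ms, yet the breakdown shows that `/users` has a P99 of only 60ms and the whole tail comes from `/reports`.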