What Are Latency Percentiles?

A latency percentile tells you the response time below which a given percentage of requests fall.

  • P50 (median): 50% of requests complete in this time or faster. This is the "typical" experience.
  • P75: 75% of requests are at or below this value. This is the slower end of normal.
  • P90: 90% of requests are at or below this. One in ten requests is slower than this.
  • P95: 19 out of 20 requests are at or below this. Used for SLAs in many SaaS products.
  • P99: 99% of requests are at or below this. Only your slowest 1% are worse. This is where you find your most frustrated users.
  • P99.9 (also written P999): 99.9% of requests are at or below this. Used in high-throughput systems like payment processors or trading platforms where even the rarest outliers matter.

The math is straightforward: sort all your response times for a given period, then find the value at the relevant position in that sorted list. For 1,000 requests, the P99 is the response time of the 990th entry.
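As a minimal sketch of that sort-and-index step, here is the nearest-rank method in plain Python. The function name `percentile_nearest_rank` and the synthetic sample data are illustrative assumptions, not part of any library, and other tools may interpolate between entries instead of picking one.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based position of the entry at or below which p% of the values fall.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Synthetic data (an assumption for illustration): 1 ms, 2 ms, ..., 1000 ms.
samples = list(range(1, 1001))
p99 = percentile_nearest_rank(samples, 99)  # picks the 990th sorted entry
print(f"P99: {p99}ms")
```

With 1,000 samples the rank works out to 990, matching the position described above.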

Why the Average Is a Terrible Metric

Here's a concrete example. Imagine an API with these 10 response times (in milliseconds):

12, 14, 11, 13, 15, 12, 14, 13, 11, 890

Average: 100.5ms  |  P50: 13ms  |  P99: ~890ms

The average is dragged up by that single 890ms outlier, making it look like performance is mediocre across the board. In reality, 9 out of 10 requests complete in 15ms or less. The average obscures both the good (most users are happy) and the bad (some users are waiting almost a second).

Now invert the scenario. What if 990 of 1,000 requests complete in 50ms, but 10 requests time out at 30,000ms? The average comes out at roughly 350ms, which looks alarming on the surface, but the percentiles tell a more precise story: the P50 is still 50ms while the far tail (P99.9 and the maximum) sits at 30,000ms. A small slice of requests is catastrophically broken while everything else is fine. Only percentiles give you the information to tell these two situations apart.
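To make the contrast concrete, the sketch below summarizes both data sets side by side. It assumes nearest-rank percentiles and an exact split of 990 requests at 50ms plus 10 timeouts for the second scenario; `nearest_rank` and `summarize` are illustrative helper names.

```python
import math
import statistics

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at 1-based position ceil(p/100 * N)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

def summarize(latencies_ms):
    """Collect the mean and a few percentiles for one set of latencies."""
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": nearest_rank(latencies_ms, 50),
        "p99": nearest_rank(latencies_ms, 99),
        "p99.9": nearest_rank(latencies_ms, 99.9),
        "max": max(latencies_ms),
    }

# First scenario: the ten response times from the example above.
one_outlier = [12, 14, 11, 13, 15, 12, 14, 13, 11, 890]
# Second scenario (assumed split): 990 requests at 50 ms, 10 timeouts at 30,000 ms.
ten_timeouts = [50] * 990 + [30_000] * 10

for name, data in [("one_outlier", one_outlier), ("ten_timeouts", ten_timeouts)]:
    print(name, summarize(data))
```

Because the timeouts make up exactly 1% of traffic in the second data set, the nearest-rank P99 still lands on 50ms and only P99.9 and the maximum expose the failures. That is one reason to track a deeper percentile or the maximum alongside P99.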

P99 Is Where User Frustration Lives

There's a useful mental model here: your P50 is what your happy users experience. Your P99 is what your unhappy users experience. Both matter, but your P99 is the one most likely to generate support tickets, churn, and negative reviews.

Research from Google and Amazon has repeatedly suggested that users begin to notice latency around 100–200ms and start abandoning pages around 1–3 seconds. If your P99 is 2.5 seconds and you're only watching the P50, you're blind to the experience of 1 in every 100 requests. In a system handling 1 million requests per day, that's 10,000 bad experiences every single day.

How to Set Latency SLOs Using Percentiles

Service Level Objectives (SLOs) should be tied to percentiles, not averages. Here's how most SRE teams structure them:

SLO                   |  Metric                    |  Typical Target
Availability          |  % of successful requests  |  ≥ 99.9%
Latency (typical)     |  P50 response time         |  ≤ 100ms
Latency (acceptable)  |  P90 response time         |  ≤ 300ms
Latency (ceiling)     |  P99 response time         |  ≤ 1,000ms

The specific numbers will vary based on your product, but the structure is almost universal. You define what "good enough" looks like at the median, at the 90th percentile, and at the 99th percentile, then alert when any of them breach their budget.
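A rough sketch of what checking those latency budgets could look like for one window of measurements. The thresholds are the example targets from the table, the availability SLO is left out, and `LATENCY_SLOS_MS`, `slo_breaches`, and the sample window are hypothetical names and data chosen for illustration. A real setup would normally express this in the alerting rules of your monitoring system.

```python
import math

# Percentile -> budget in milliseconds, taken from the example targets above.
LATENCY_SLOS_MS = {50: 100, 90: 300, 99: 1000}

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at 1-based position ceil(p/100 * N)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

def slo_breaches(latencies_ms, slos=LATENCY_SLOS_MS):
    """Return {percentile: (observed_ms, target_ms)} for every breached target."""
    breaches = {}
    for p, target_ms in slos.items():
        observed_ms = nearest_rank(latencies_ms, p)
        if observed_ms > target_ms:
            breaches[p] = (observed_ms, target_ms)
    return breaches

# Hypothetical window of 100 requests: mostly fast, with two very slow ones.
window = [80] * 90 + [250] * 8 + [1500] * 2
print(slo_breaches(window))
```

In this made-up window the P50 and P90 stay within budget and only the P99 target is breached, which is the case an average-based alert would be most likely to miss.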

Calculating Percentiles from Your Own Data

If you have a raw list of API response times (from a load test, APM export, or server logs), you can calculate percentiles in a few ways.

In Python:

import numpy as np

latencies = [12, 14, 11, 13, 15, 12, 14, 13, 11, 890, 22, 18, 16, 45, 88]

print(f"P50:  {np.percentile(latencies, 50):.1f}ms")
print(f"P90:  {np.percentile(latencies, 90):.1f}ms")
print(f"P95:  {np.percentile(latencies, 95):.1f}ms")
print(f"P99:  {np.percentile(latencies, 99):.1f}ms")
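If NumPy is not installed, the standard library can produce comparable numbers. The sketch below assumes Python 3.8 or newer and uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between neighbouring samples and should line up with NumPy's default behaviour.

```python
import statistics

latencies = [12, 14, 11, 13, 15, 12, 14, 13, 11, 890, 22, 18, 16, 45, 88]

# quantiles(n=100) returns the 99 cut points P1..P99, so P50 is at index 49.
cuts = statistics.quantiles(latencies, n=100, method="inclusive")
p50, p90, p95, p99 = (cuts[p - 1] for p in (50, 90, 95, 99))

print(f"P50:  {p50:.1f}ms")
print(f"P90:  {p90:.1f}ms")
print(f"P95:  {p95:.1f}ms")
print(f"P99:  {p99:.1f}ms")
```

Interpolated percentiles can differ from the sort-and-pick method on small samples: here the interpolated P99 falls between the two largest values rather than on the 890ms entry itself. Whichever method you choose, use the same one everywhere you compare numbers.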

In a shell one-liner (sort + awk):

sort -n latencies.txt | awk '
  # Nearest-rank percentile: the value at 1-based position ceil(p/100 * N).
  function pct(p) { return data[int((p * NR + 99) / 100)] }
  { data[NR] = $1 }
  END {
    print "P50:", pct(50) "ms"
    print "P90:", pct(90) "ms"
    print "P99:", pct(99) "ms"
  }
'

DevOpsArsenal Latency Percentile Calculator

Paste your raw latency values and get a full P50/P90/P95/P99 breakdown instantly: no Python, no spreadsheets, no code. Useful during incident investigations or after a load test.


What to Do When Your P99 Is High

Finding a bad P99 is only step one. Here's how to dig into it:

  • Correlate with infrastructure events: Is the P99 spike aligned with garbage collection pauses, database connection pool exhaustion, or autoscaling events?
  • Check for slow query patterns: In database-backed APIs, P99 spikes often point to a missing index or a query that occasionally does a full table scan.
  • Look at resource contention: CPU throttling in containers (CPU limits set too low) is a common, non-obvious cause of tail latency spikes.
  • Examine retry storms: If your P99 latency triggers client retries, those retries can create a feedback loop that makes the P99 worse.
  • Isolate by endpoint: An aggregate P99 across all endpoints can be misleading. Break it down, because often one or two slow endpoints drag up the entire service's tail latency.
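The per-endpoint breakdown from the last point can be sketched in a few lines. This assumes your logs or APM export can be reduced to `(endpoint, latency_ms)` pairs; the endpoint names, the sample numbers, and the helper `p99_by_endpoint` are hypothetical.

```python
import math
from collections import defaultdict

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at 1-based position ceil(p/100 * N)."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

def p99_by_endpoint(records):
    """Group (endpoint, latency_ms) pairs and return the P99 for each endpoint."""
    grouped = defaultdict(list)
    for endpoint, latency_ms in records:
        grouped[endpoint].append(latency_ms)
    return {endpoint: nearest_rank(values, 99) for endpoint, values in grouped.items()}

# Hypothetical sample: one healthy endpoint and one with a slow tail.
records = (
    [("/users", 20)] * 95 + [("/users", 40)] * 5
    + [("/reports", 30)] * 90 + [("/reports", 2_000)] * 10
)

per_endpoint = p99_by_endpoint(records)
overall = nearest_rank([latency_ms for _, latency_ms in records], 99)
print("overall P99:", overall, "ms")
print("per endpoint:", per_endpoint)
```

In this invented data set the aggregate P99 is set entirely by the slow endpoint, while the other endpoint's P99 stays low. The aggregate number alone would not tell you which one to fix.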

Frequently Asked Questions

What is P99 latency in simple terms?
P99 latency is the response time that 99% of your requests complete within. If your P99 is 500ms, 99 out of every 100 requests finish in half a second or less, and 1 in 100 takes longer.
Is P99 more important than average latency?
For user-facing systems, yes. The average blends typical requests and outliers into a single number that can hide a slow tail. P99 captures the experience of your slowest users, which is where dissatisfaction and churn originate.
What is a good P99 latency for an API?
It depends on the use case. For interactive web APIs, under 500ms P99 is generally acceptable, under 200ms is good, and under 100ms is excellent. For background processing or batch jobs, 2 to 5 seconds P99 may be perfectly fine.
How is P99 different from P99.9?
P99 means 1 in 100 requests is slower. P99.9 (also written P999) means 1 in 1,000 requests is slower. P99.9 is used in very high-throughput systems (databases, payment processors, real-time bidding) where even rare outliers have business impact.

Average response time is a comfortable metric: easy to display on a dashboard and easy to celebrate. But it's a poor proxy for user experience. P99 is uncomfortable because it forces you to look at your worst cases. Start by adding P50, P90, and P99 to your existing dashboards. Set SLO thresholds. Alert on P99 breaches. Your average might look fine. Your P99 will tell you the truth.