
API Latency Percentile Calculator

Enter response times (in ms, one per line) to calculate the P50, P90, P95, P99, and P99.9 percentiles and view a latency histogram. Essential for SLA planning and performance monitoring.

📖 How to Use
1. Paste response times in milliseconds, one per line.
2. The P50, P90, P95, P99, and P99.9 percentiles are calculated automatically.
3. View the histogram to see the shape of the latency distribution.
📝 Examples
Healthy API
12, 15, 18, 22, 25, 14, 16, 45, 120, 8
P50: 17ms, P95: 86.25ms, P99: 113.25ms (linear interpolation over the 10 sorted values)

What is a Latency Percentile Calculator?

A latency percentile calculator takes a set of response-time measurements and computes their distribution, specifically the percentile values that SRE and performance engineering teams use to measure and govern API behaviour. Unlike average latency, which can be misleading when outliers are present, percentile metrics describe what fraction of requests completed within a given time. P99 latency, for example, is the response time that 99 out of every 100 requests meet or beat, which gives a far more accurate picture of tail (near-worst-case) user experience than the arithmetic mean.

Percentile-based targets are the standard way to define and measure API performance commitments: SLOs and SLAs are commonly written as P95 or P99 latency thresholds. Load balancer metrics and monitoring/APM tools such as Datadog, New Relic, and AWS CloudWatch expose P50, P90, P95, and P99 as first-class metrics. This tool lets you compute those values from raw response-time data and visualize the distribution as a histogram, so you can see whether latency is tightly clustered or driven by occasional extreme outliers.


How It Works

The calculator sorts the input values in ascending order, then uses linear interpolation to compute each percentile. For a percentile P over N sorted values indexed from 0, the position is (P/100) * (N-1). If the position falls exactly on an integer index, that value is returned directly; if it falls between two indices, the result is interpolated linearly between the two surrounding values. This is the same convention as NumPy's percentile() with its default linear method, and it is widely used by other statistics libraries.

The histogram divides the value range into up to 25 buckets and counts how many values fall in each. Its shape shows whether the distribution is roughly normal, right-skewed (common in API latency), or bimodal (often a sign of two distinct populations of requests).
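The Python sketch below mirrors the two steps described above: a linear-interpolation percentile at zero-based position (P/100) * (N-1), and a bucketed histogram. It is an illustration, not the tool's actual source. The function names are hypothetical, and the histogram assumes equal-width buckets with a bucket count of min(25, N); the page only says "up to 25 buckets".

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile at zero-based position (p/100) * (n-1).

    Same convention as NumPy's default np.percentile(values, p).
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    pos = (p / 100) * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    if lower == upper:
        # Position lands exactly on an index: return that value directly.
        return float(ordered[lower])
    # Otherwise interpolate between the two surrounding values.
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def histogram(values: Sequence[float], max_buckets: int = 25) -> List[Tuple[float, float, int]]:
    """Count values in equal-width buckets over [min, max].

    Returns (bucket_start, bucket_end, count) tuples.
    Assumption: uses min(max_buckets, len(values)) buckets.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # Every value is identical: a single bucket holds them all.
        return [(lo, hi, len(values))]
    n_buckets = min(max_buckets, len(values))
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # Clamp so the maximum value lands in the last bucket, not past it.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]
```

For example, `percentile([10, 20, 30, 40, 50], 90)` uses position 0.9 * 4 = 3.6, so it interpolates 60% of the way from 40 to 50 and gives 46.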

Frequently Asked Questions

What is P99 latency?

P99 (99th percentile) latency means that 99% of requests complete at or below this value; equivalently, only about 1 in 100 requests takes longer. If your P99 is 500ms, then roughly the slowest 1% of requests take 500ms or more. P99 is the most widely used metric for API performance SLAs because it captures tail latency: the near-worst-case experience that your slowest users encounter. Using the average instead of P99 can dramatically underestimate how slow the experience feels for real users during traffic spikes or contention events.
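To see why the average hides the tail, here is a minimal sketch on a hypothetical sample of 100 requests in which 98 are fast and 2 are slow. It uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates at the same (P/100) * (N-1) position this calculator uses; note that it needs at least two data points on older Python versions.

```python
import statistics

# Hypothetical sample: 98 fast requests and 2 slow ones (e.g. hit by a GC pause).
latencies_ms = [20.0] * 98 + [2000.0] * 2

mean_ms = statistics.fmean(latencies_ms)
# n=100 yields 99 cut points; index 98 is the 99th percentile.
p99_ms = statistics.quantiles(latencies_ms, n=100, method="inclusive")[98]

print(f"mean = {mean_ms:.1f} ms, P99 = {p99_ms:.1f} ms")
# mean = 59.6 ms, P99 = 2000.0 ms
```

The mean suggests a comfortably fast API, while P99 exposes the 2-second requests that real users actually hit.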

What is the difference between P50 and P99?

P50 (the median) represents the typical user experience: half of all requests are faster and half are slower. P99 represents the experience of the slowest 1% of requests. A healthy, well-optimized API has a small P99-to-P50 ratio; a ratio of 2x to 5x is common for database-backed APIs. A very large gap, for example P50=20ms and P99=2000ms, almost always indicates an intermittent problem: database connection pool exhaustion, garbage collection pauses, lock contention, or cold cache misses that hit only occasionally but very hard when they do.
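The ratio check described above is easy to automate. This is a minimal sketch, assuming a hypothetical 5x warning threshold taken from the upper end of the 2x to 5x range; the function names and messages are illustrative, not part of this tool.

```python
import statistics
from typing import Sequence

# Hypothetical threshold: ratios above the common 2x-5x band get flagged.
RATIO_WARN = 5.0


def tail_ratio(latencies_ms: Sequence[float]) -> float:
    """Return P99 / P50 using linear-interpolation percentiles."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    p50, p99 = cuts[49], cuts[98]
    if p50 <= 0:
        raise ValueError("P50 must be positive to form a ratio")
    return p99 / p50


def describe(latencies_ms: Sequence[float]) -> str:
    ratio = tail_ratio(latencies_ms)
    if ratio <= RATIO_WARN:
        return f"P99/P50 = {ratio:.1f}x: tail looks proportionate"
    # Large gap: suspect pool exhaustion, GC pauses, lock contention, cache misses.
    return f"P99/P50 = {ratio:.1f}x: look for an intermittent cause"


print(describe(list(range(20, 120))))           # steadily spread latencies
print(describe([20.0] * 98 + [2000.0] * 2))     # P50=20ms, P99=2000ms
```

The second sample reproduces the P50=20ms, P99=2000ms case from the text, which produces a 100x ratio and is flagged.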

What is a healthy P99 latency for a REST API?

A healthy P99 depends heavily on the type of operation. For read-only endpoints backed by an in-memory cache or Redis, P99 under 50ms is achievable and expected. For database-backed read queries on tables of reasonable size, P99 under 200ms is a common target. For write operations that involve database transactions, email sending, or external API calls, P99 under 500ms is generally acceptable. Any synchronous user-facing endpoint with P99 above 1 second warrants immediate investigation. The P50-to-P99 ratio is often more informative than the absolute number: a large ratio signals that something is slowing down intermittently.
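The targets above can be encoded as a simple budget table. This is a minimal sketch: the category names are hypothetical labels, and the thresholds are the rough guidance from this answer, not a standard.

```python
import statistics
from typing import Dict, Sequence

# Hypothetical endpoint categories mapped to the P99 targets above (ms).
P99_BUDGET_MS: Dict[str, float] = {
    "cached_read": 50.0,   # in-memory cache / Redis
    "db_read": 200.0,      # database-backed reads
    "write": 500.0,        # transactions, email, external API calls
}
# Any synchronous user-facing endpoint above this warrants immediate investigation.
HARD_CEILING_MS = 1000.0


def p99(latencies_ms: Sequence[float]) -> float:
    """99th percentile with linear interpolation (needs at least two values)."""
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[98]


def check_endpoint(category: str, latencies_ms: Sequence[float]) -> str:
    value = p99(latencies_ms)
    budget = P99_BUDGET_MS[category]
    if value > HARD_CEILING_MS:
        return f"P99 {value:.0f} ms exceeds 1 s: investigate now"
    if value > budget:
        return f"P99 {value:.0f} ms is over the {budget:.0f} ms target for {category}"
    return f"P99 {value:.0f} ms is within the {budget:.0f} ms target for {category}"


print(check_endpoint("db_read", list(range(20, 120))))
```

Checking the hard ceiling first keeps the 1-second rule in force regardless of which category an endpoint is assigned to.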