

API Latency Percentile Calculator

Enter response times (one per line, in ms) to calculate P50, P90, P95, P99, P99.9 percentiles with a latency histogram. Essential for SLA planning and performance monitoring.

📖 How to Use


Paste response times in milliseconds — one per line

The P50, P90, P95, P99, and P99.9 percentiles are calculated automatically

View the histogram to see the latency distribution shape

📝 Examples

Healthy API

12, 15, 18, 22, 25, 14, 16, 45, 120, 8

P50: 17ms, P95: 86.25ms, P99: 113.25ms


The Dashboard That Said Everything Was Fine

A believable version of a story most SRE teams have lived through: a checkout API's average response time dashboard sits steady at 180ms all week, well inside its target, while support tickets about "the site being slow sometimes" keep trickling in. Nobody can reproduce it on demand, and the average keeps insisting nothing is wrong. The gap gets explained the moment someone pulls the raw request-duration samples and computes percentiles instead of an average. P50 sits at 90ms, exactly as expected, but P99 is 4.2 seconds. One request in every hundred takes at least 4.2 seconds, more than 45 times the median, and at real traffic volumes that can mean dozens of genuinely bad experiences every minute. Each one is invisible to a metric that averages the fast majority against the slow minority and reports a number that describes neither. The average wasn't lying, exactly; it just wasn't answering the question anyone actually cared about.

Percentiles, Averages, and the Number Nobody Should Trust

An average collapses an entire distribution into one number by weighting every sample equally, which is the wrong tool for latency data. A handful of very slow outliers get diluted by the mass of fast requests around them, so the average systematically understates how bad the worst experiences actually are. The maximum has the opposite problem: a single garbage-collection pause or one cold-start request can produce a max that's ten times worse than anything a typical user will ever see, making it too noisy and unstable to build an SLA around. Percentiles sit between these two failure modes by design. P50 describes the median experience, while P95 and P99 describe what your worst-treated users actually encounter. Because percentiles are read from the sorted distribution rather than summed and divided, a single extreme outlier can't distort the P50 or P90 the way it distorts an average.
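The effect of a single outlier on each statistic can be sketched in a few lines of Python. The data here is made up for illustration (99 requests at 100 ms, then the same set plus one 10-second request), and the summarize helper is a hypothetical name, not part of this tool.

```python
import statistics

# Hypothetical data: 99 requests at 100 ms, then the same plus one 10 s outlier.
fast = [100.0] * 99
with_outlier = fast + [10_000.0]

def summarize(samples):
    # method="inclusive" interpolates linearly between the sorted samples.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "p50": cuts[49],   # 50th percentile
        "p90": cuts[89],   # 90th percentile
        "max": max(samples),
    }

before = summarize(fast)
after = summarize(with_outlier)

for name in ("mean", "p50", "p90", "max"):
    print(f"{name}: {before[name]:.0f} ms -> {after[name]:.0f} ms")
```

In this toy dataset the mean roughly doubles and the max jumps a hundredfold, while P50 and P90 do not move at all.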

Handling Real Latency Data With Care

Response-time samples are usually safe to paste as raw numbers, since a duration in milliseconds carries no identifying information on its own — but the surrounding context engineers paste alongside them sometimes does. A CSV exported straight from an APM tool can carry a request-ID, user-ID, or full URL column next to the duration column, and pasting that whole export into any browser-based tool — this one included — puts those identifiers into browser history and any screen-share happening at the time. This tool only needs the bare numeric values to compute percentiles correctly, so the safer habit is to strip everything but the duration column before pasting, the same discipline you'd apply before attaching a log excerpt to an external support ticket.
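One way to strip an export down to the duration column before pasting is sketched below. The column name duration_ms and the sample rows are assumptions for illustration; a real APM export will use its own header names.

```python
import csv
import io

def durations_only(csv_text, column="duration_ms"):
    """Return only the values of one column, one per line.

    The default column name is hypothetical; pass the header your export uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or empty.
    return "\n".join(row[column].strip() for row in reader if row.get(column))

# Made-up export with identifiers sitting next to the durations.
export = """request_id,user_id,url,duration_ms
a1,u42,/checkout?cart=9,90
a2,u17,/checkout?cart=3,4200
"""

print(durations_only(export))
```

The output contains the bare numbers only, so the request IDs, user IDs, and URLs never leave the original file.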

Getting the Same Numbers From a Terminal

The calculation this tool performs (sort the samples, then linearly interpolate at position (P/100) × (N-1) in the zero-indexed sorted array) matches the default linear method of NumPy's percentile() function. A Python one-liner like python3 -c "import sys, numpy as np; print(np.percentile(np.loadtxt(sys.stdin), 99))" < latencies.txt, where latencies.txt is a placeholder name for a file holding one duration per line, should therefore reproduce this tool's P99 for the same dataset, up to floating-point rounding. On the command line without Python available, awk or sort -n piped through a small script can approximate the same result for a quick check. At the infrastructure layer, Prometheus's histogram_quantile() function computes percentiles from pre-bucketed histogram data rather than raw samples, which scales to very high request rates but trades exact precision for an estimate interpolated between bucket boundaries. This tool is for the exact, small-to-medium-sample calculation you'd do with a CSV of raw request durations pulled from a load test or an incident window, not for querying a live time-series database.
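The sort-then-interpolate rule described above can be written out without NumPy. This is a minimal sketch of that rule, not this tool's actual source; the function name percentile is chosen here for illustration, and the data is the Healthy API example from earlier on the page.

```python
import math

def percentile(samples, p):
    """Linear-interpolation percentile at position (p/100) * (N-1)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    pos = (p / 100) * (len(ordered) - 1)   # fractional index into the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)     # clamp so p=100 stays in range
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

data = [12, 15, 18, 22, 25, 14, 16, 45, 120, 8]
for p in (50, 90, 95, 99):
    print(f"P{p}: {percentile(data, p):.2f} ms")
```

With only ten samples, P95 and P99 both fall between the two largest values, so they are driven almost entirely by the single 120 ms request. Tail percentiles only become meaningful with far more samples than that.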

Frequently Asked Questions

What is P99 latency?

P99 (99th percentile) latency is the value that 99% of requests complete at or below; equivalently, only 1 in 100 requests takes longer. If your P99 is 500ms, then the slowest 1% of requests each take more than 500ms. P99 is one of the most widely used metrics for API performance SLAs because it captures tail latency: the experience of your slowest requests. Using the average instead of P99 can dramatically underestimate how slow the experience feels for real users during traffic spikes or contention events.

What is the difference between P50 and P99?

P50 (the median) represents the typical user experience: about half of all requests are faster and half are slower. P99 marks the threshold that the slowest 1% of requests exceed, so it is the best case within that slow tail, not the worst. A healthy, well-optimized API has a small ratio between P50 and P99; a ratio of 2x to 5x is common for database-backed APIs. A very large gap, for example P50=20ms and P99=2000ms, almost always indicates an intermittent problem: database connection pool exhaustion, garbage collection pauses, lock contention, or cold cache misses that hit only occasionally but very hard when they do.
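A quick way to check the P50-to-P99 ratio on a set of samples is sketched below. The tail_ratio helper and the sample data (98 requests at 20 ms, 2 at 2000 ms) are hypothetical, chosen to mirror the P50=20ms, P99=2000ms case above.

```python
import statistics

def tail_ratio(samples):
    """Return (p50, p99, p99 / p50) using linear interpolation."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    p50, p99 = cuts[49], cuts[98]
    if p50 == 0:
        raise ValueError("P50 is zero; the ratio is undefined")
    return p50, p99, p99 / p50

# Hypothetical data: mostly fast requests with an intermittent slow path.
samples = [20.0] * 98 + [2000.0] * 2
p50, p99, ratio = tail_ratio(samples)
print(f"P50={p50:.0f}ms P99={p99:.0f}ms ratio={ratio:.0f}x")
```

A ratio far above the 2x to 5x range is a cue to look for an intermittent cause. It is not a diagnosis on its own.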

What is a healthy P99 latency for a REST API?

A healthy P99 depends heavily on the type of operation. For read-only endpoints backed by an in-memory cache or Redis, P99 under 50ms is achievable and expected. For database-backed read queries on tables of reasonable size, P99 under 200ms is a common target. For write operations that involve database transactions, email sending, or external API calls, P99 under 500ms is generally acceptable. Any synchronous user-facing endpoint with P99 above 1 second warrants immediate investigation. The P50-to-P99 ratio is often more informative than the absolute number: a large ratio signals that something is failing intermittently.