
API Rate Limit Calculator

Input your rate limit policy and expected traffic pattern to calculate burst headroom, requests-per-second capacity, safe concurrency and retry-after timing. Great for API design and client implementation planning.

⚙️ Rate Limit Configuration

Request Limit

Time Window

Your Expected Requests per Minute

📊 Analysis

Usage (your traffic vs limit)

🕐 Traffic vs Limit (per-minute view, 10 windows, from now to +10 min)

📋 Standard Rate Limit Response Headers

💡 Client-Side Strategies

Sliding vs Fixed Window: Many APIs use fixed windows (the counter resets at exact intervals). Others use sliding windows (a rolling count over the trailing interval). With fixed windows, a burst at the end of one window plus another at the start of the next can briefly double your effective rate, so pace requests evenly and implement backoff even when you are under the limit.

📖 How to Use This Tool


Enter limit and time window

Enter your expected request rate

View burst headroom and safe RPS

Try presets for GitHub, Stripe, OpenAI

📝 Examples

GitHub

Input: 5000/hr, 40/min

Output: Safe RPS: 1.11, 48% used (40 of roughly 83 requests per minute)
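
The arithmetic behind the GitHub example can be sketched in Python. This is a minimal illustration, not the tool's actual code: the function name analyze_rate_limit is hypothetical, and the 80% safety factor is an assumption inferred from the Safe RPS figure above.

```python
def analyze_rate_limit(limit, window_seconds, expected_per_minute, safety_factor=0.8):
    """Return capacity figures for a rate limit policy.

    safety_factor is an assumed budget (80% of the published limit).
    """
    max_rps = limit / window_seconds          # hard ceiling in requests per second
    limit_per_minute = max_rps * 60           # the same ceiling, per minute
    return {
        "max_rps": max_rps,
        "safe_rps": max_rps * safety_factor,  # rate to budget for
        "usage_pct": 100 * expected_per_minute / limit_per_minute,
        "headroom_per_minute": limit_per_minute - expected_per_minute,
    }


# 5000 requests per hour, 40 expected requests per minute
result = analyze_rate_limit(limit=5000, window_seconds=3600, expected_per_minute=40)
print(f"Safe RPS: {result['safe_rps']:.2f}")  # prints "Safe RPS: 1.11"
```

The same function works for any preset once the window is converted to seconds.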

Rules of Thumb Worth Baking Into Your Gateway

Budget to roughly 80% of the published limit, never 100%: Vendor-published ceilings are often enforced with clock skew or a slightly different rolling window than documented, so a client that runs flat against the stated number will see intermittent 429s that are maddening to reproduce in a test environment.
Retry with jittered exponential backoff, never a fixed delay: If every client waits exactly the Retry-After value and then retries, you recreate the identical spike one window later. Randomizing the wait by roughly ±20% spreads retries out instead of synchronizing them into a second wave.
Pick the limiting algorithm to match your traffic shape, not the easiest one to implement: A fixed window counter is simplest but allows a 2x burst across a window boundary; if your clients send legitimately bursty traffic, a token bucket with a properly sized burst allowance usually serves users better than simply lowering the average rate.
Return rate-limit headers on every response, not just on 429s: Well-behaved clients use X-RateLimit-Remaining and X-RateLimit-Reset to self-throttle before they ever hit the wall — omitting them on successful responses means clients only find out they're close to the limit after they've already crossed it.
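
The backoff rule above could look roughly like this in a client. It is a sketch under stated assumptions: the function name backoff_delay, the 1-second base, and the 60-second cap are all illustrative choices, and the ±20% jitter follows the figure suggested above. When the server sends a Retry-After hint, the sketch never waits less than that hint and spreads retries only upward from it.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=0.2, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The wait doubles on each attempt up to `cap` (before jitter), then is
    randomized by +/- `jitter` so that clients do not retry in lockstep.
    """
    delay = min(cap, base * 2 ** attempt)
    jittered = delay * rng.uniform(1 - jitter, 1 + jitter)
    if retry_after is not None:
        # Never retry earlier than the server asked; spread upward from its hint.
        floor = retry_after * rng.uniform(1.0, 1 + jitter)
        jittered = max(jittered, floor)
    return jittered


for attempt in range(4):
    print(f"attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
```

Passing a seeded random.Random instance as rng makes the delays reproducible in tests.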

The Retry Storm That Took Down a Checkout Flow

An illustrative scenario: an e-commerce platform's checkout service calls a third-party payment API with a documented limit of 100 requests per minute. During a flash sale, traffic briefly exceeds that limit and a batch of requests gets 429 responses. Every client instance is configured to retry failed payment calls after a fixed 2-second delay, a reasonable-looking default that nobody has stress-tested against a real spike. All the throttled requests retry at almost exactly the same moment, recreate the same burst against the payment API two seconds later, get throttled again, and retry again two seconds after that. The payment provider's abuse detection sees a sustained flood from one client ID and temporarily blocks it entirely, turning a brief legitimate traffic spike into a 20-minute checkout outage. The fix that ends the incident is not raising the rate limit. It is replacing the fixed 2-second retry with jittered exponential backoff, so retries spread out across several seconds instead of hitting the same instant.
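
A toy simulation makes the difference visible. Everything here is assumed for illustration: 150 throttled clients, an 8-second jitter range, and the helper names. It only counts how many retries land in the same one-second bucket and does not model the payment API itself.

```python
import random
from collections import Counter


def peak_retries_per_second(num_clients, delay_fn, seed=0):
    """Largest number of retries that land in any one-second bucket."""
    rng = random.Random(seed)
    buckets = Counter(int(delay_fn(rng)) for _ in range(num_clients))
    return max(buckets.values())


def fixed_delay(rng):
    return 2.0  # every client retries exactly 2 seconds after its 429


def jittered_delay(rng):
    return rng.uniform(0.0, 8.0)  # assumed spread: anywhere within 8 seconds


fixed_peak = peak_retries_per_second(150, fixed_delay)
jitter_peak = peak_retries_per_second(150, jittered_delay)
print(f"fixed delay peak: {fixed_peak} retries in one second")
print(f"jittered peak:    {jitter_peak} retries in one second")
```

With the fixed delay, all 150 retries arrive in the same second, which is already above the 100-per-minute limit on its own. With jitter, the busiest second carries only a fraction of them.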

Where the 429 Status and Rate-Limit Headers Come From

The HTTP 429 Too Many Requests status code is formally defined in RFC 6585, published in 2012, which added several status codes to the HTTP registry, 429 among them, giving servers a standard way to signal rate limiting. Before then, APIs had to overload other codes such as 403 or 503 for the same purpose, which made client-side handling ambiguous. The Retry-After header it is typically paired with comes from the base HTTP specification and predates 429 entirely; it was originally designed for 503 Service Unavailable responses. The X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers this calculator displays are not part of any ratified RFC. They emerged as a de facto convention popularized by early API providers like Twitter and GitHub, and the IETF's draft RateLimit header fields specification now aims to standardize equivalent fields without the X- prefix; several major API gateways already implement it natively.

How Rate Limits Behave Once Traffic Actually Spikes

A rate limit that looks generous against average traffic can still fail under real-world burstiness, because average request rate and peak request rate are different numbers entirely. A service averaging 200 requests per minute might legitimately spike to 600 requests in a 10-second window when a batch job runs or a cache expires across many clients simultaneously. A fixed window counter sized only for the average will reject most of that burst (everything beyond the 200 it allows per window), while a token bucket with burst capacity sized for the actual peak lets it through without raising the sustained average limit at all. This is why the calculator's burst headroom figure matters more at scale than the raw requests-per-second number: as traffic grows, the gap between average and peak tends to widen rather than shrink, and a limit configuration that worked fine at low volume can start rejecting legitimate traffic the moment usage patterns become bursty rather than smooth.
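
The burst scenario above can be sketched with a minimal token bucket. The class name TokenBucket and the injectable clock are illustrative assumptions; the refill rate (200 per minute) and capacity (600) come from the numbers in the paragraph, and the bucket is assumed to start full.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec       # sustained refill rate
        self.capacity = capacity       # largest burst the bucket can absorb
        self.clock = clock
        self.tokens = float(capacity)  # assumption: the bucket starts full
        self.last = clock()

    def allow(self, cost=1):
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Fake clock so the example is deterministic: 600 requests arrive at t=0.
now = [0.0]
bucket = TokenBucket(rate_per_sec=200 / 60, capacity=600, clock=lambda: now[0])
burst_allowed = sum(bucket.allow() for _ in range(600))
print(burst_allowed)  # prints 600
```

The sustained rate stays at 200 per minute: once the burst drains the bucket, further requests pass only as fast as tokens refill.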

Frequently Asked Questions

What is API rate limiting?

API rate limiting is a traffic control mechanism that restricts how many requests a client or user can send to an API within a defined time window. It protects backend services from being overwhelmed by excessive traffic — whether from a malfunctioning client, a denial-of-service attack, or legitimate but unexpectedly high demand. Rate limits are also used to implement fair-use quotas in multi-tenant SaaS products, ensuring that one customer's traffic spike cannot degrade the experience for other customers. Common implementations include fixed window counters, sliding window logs, token bucket algorithms, and leaky bucket algorithms, each with different characteristics around burst tolerance and fairness.

What does a 429 status code mean?

HTTP status code 429 Too Many Requests indicates that the client has sent more requests than allowed by the rate limit policy within the current time window. The server should include a Retry-After header indicating how many seconds the client should wait before retrying, and optionally X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers providing additional context about the limit. Clients receiving a 429 should implement exponential backoff with jitter: wait for the Retry-After period, then double the wait on each subsequent failure (up to a maximum), and add a small random jitter to prevent all clients from retrying simultaneously when the window resets — which would recreate the original spike.
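
A client first has to turn the Retry-After value into a number of seconds. The sketch below assumes a hypothetical helper named retry_after_seconds; it handles both forms the HTTP specification allows for this header, a delay in seconds or an HTTP-date, and returns None when the value cannot be parsed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None):
    """Convert a Retry-After header value to a non-negative wait in seconds."""
    if header_value is None:
        return None
    value = header_value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds("120"))  # prints 120.0
```

The result can serve as the starting wait for exponential backoff, with jitter added on top.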

What is the difference between token bucket and fixed window rate limiting?

A fixed window counter tracks the number of requests within a hard time boundary (e.g. from 14:00:00 to 14:01:00) and resets to zero at the window boundary. This is simple to implement but has a boundary burst vulnerability: a client can send the full limit at 13:59:59 and again immediately at 14:00:00, effectively sending 2x the limit in a two-second span. A token bucket maintains a pool of tokens that refills at a steady rate — each request consumes a token, and burst capacity is naturally limited by the bucket size. This allows short legitimate bursts while smoothing sustained traffic. A sliding window log tracks the exact timestamp of each request and enforces the limit over a true rolling window, eliminating boundary bursts at the cost of higher memory usage per client. Most production API gateways use token bucket or sliding window counters because they provide fairer behavior under real traffic patterns.
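
The boundary burst is easy to reproduce. The two classes below are minimal single-client sketches with illustrative names, not any gateway's implementation. Each takes the current time as an argument so the example stays deterministic, and both are given the same traffic: 100 requests in the last second of one window and 100 more in the first second of the next.

```python
from collections import deque


class FixedWindowCounter:
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window_id = int(now // self.window)
        if window_id != self.current_window:
            self.current_window, self.count = window_id, 0  # reset at the boundary
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.timestamps = deque()  # one entry per accepted request

    def allow(self, now):
        # Drop requests that have aged out of the rolling window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


def count_allowed(limiter):
    times = [59.0] * 100 + [60.0] * 100  # straddles the 60-second boundary
    return sum(limiter.allow(t) for t in times)


print(count_allowed(FixedWindowCounter(limit=100, window=60)))  # prints 200
print(count_allowed(SlidingWindowLog(limit=100, window=60)))    # prints 100
```

The sliding log stores one timestamp per accepted request, which is the per-client memory cost mentioned above.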