Rate Limit Intelligence

Apidepth tracks 429 responses and captures rate limit quota headers to give you live burn-down forecasts, so you can see a quota running low before it runs out.

How it works

Rate Limit Intelligence is split into two layers that work together:

429 frequency tracking — the collector aggregates status = 429 responses hourly per vendor and endpoint. No SDK changes required; this works as soon as the SDK is installed.
Burn-down prediction — the SDK extracts rate limit quota headers from every response and sends them to the collector. The collector maintains a live quota snapshot per vendor and projects how long your remaining quota will last at the current burn rate.
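The first layer is essentially a group-by on (vendor, endpoint, hour). The following is a minimal Python sketch, not the collector's implementation. It assumes a hypothetical `Event` record with epoch-millisecond timestamps, and it assumes hours are bucketed in UTC. The real event schema and storage are not described on this page.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the collector's real schema may differ.
    vendor: str
    endpoint: str
    status: int
    timestamp_ms: int  # epoch milliseconds


def hour_bucket(timestamp_ms: int) -> datetime:
    """Truncate an epoch-ms timestamp to the start of its UTC hour."""
    dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0)


def count_429s(events) -> Counter:
    """Count status-429 responses per (vendor, endpoint, hour)."""
    counts: Counter = Counter()
    for e in events:
        if e.status == 429:
            counts[(e.vendor, e.endpoint, hour_bucket(e.timestamp_ms))] += 1
    return counts
```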

Header extraction

The SDK reads rate limit headers inline from the response to each outgoing request. Three canonical fields are extracted:

Field	Meaning
rl_remaining	Requests left in the current rate limit window
rl_limit	Total quota for the window
rl_reset_at	When the window resets (epoch milliseconds, normalised from the vendor's native format)

Vendor header coverage

Headers are checked in priority order; for each field, the first matching header wins.

Vendor	Remaining	Limit	Reset
OpenAI	x-ratelimit-remaining-requests	x-ratelimit-limit-requests	x-ratelimit-reset-requests
Anthropic	x-ratelimit-remaining-requests	x-ratelimit-limit-requests	x-ratelimit-reset-requests
GitHub	x-ratelimit-remaining	x-ratelimit-limit	x-ratelimit-reset
HubSpot / Fastly	ratelimit-remaining	ratelimit-limit	ratelimit-reset
Stripe (429 only)	—	—	retry-after
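The "first match wins" rule can be sketched as a priority list per canonical field. This is a minimal Python illustration, assuming that the priority follows the row order of the vendor table. The SDK's actual order is not documented here, and `HEADER_PRIORITY`, `extract_rate_limit_headers` and the `rl_reset_raw` key are hypothetical names. The reset value is returned raw because it still has to be normalised.

```python
from typing import Dict, List, Mapping

# Candidate headers per canonical field. ASSUMPTION: priority follows the
# row order of the vendor table; the SDK's actual order may differ.
HEADER_PRIORITY: Dict[str, List[str]] = {
    "rl_remaining": [
        "x-ratelimit-remaining-requests",  # OpenAI, Anthropic
        "x-ratelimit-remaining",           # GitHub
        "ratelimit-remaining",             # HubSpot / Fastly
    ],
    "rl_limit": [
        "x-ratelimit-limit-requests",
        "x-ratelimit-limit",
        "ratelimit-limit",
    ],
    # Raw reset value; it still needs normalising to epoch milliseconds.
    "rl_reset_raw": [
        "x-ratelimit-reset-requests",
        "x-ratelimit-reset",
        "ratelimit-reset",
        "retry-after",                     # Stripe (429 only)
    ],
}


def extract_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return the first matching header value for each canonical field."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    found: Dict[str, str] = {}
    for field, candidates in HEADER_PRIORITY.items():
        for name in candidates:
            if name in lowered:
                found[field] = lowered[name]
                break  # first match wins for this field
    return found
```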

Reset format normalisation

rl_reset_at is always stored as epoch milliseconds, regardless of the vendor's original format. The SDK normalises three formats:

Format	Example	Vendors
Unix timestamp (seconds)	1716000000	GitHub, most IETF-draft APIs
Seconds from now	30	Stripe Retry-After, generic 429 responses
OpenAI duration string	1m30s, 20ms, 2h	OpenAI, Anthropic
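This page does not say how the SDK tells a Unix timestamp apart from a "seconds from now" value, since both are bare numbers. The Python sketch below therefore assumes a simple magnitude heuristic: values at or above 1,000,000,000 are treated as timestamps. It also takes `now_ms` as a parameter so the relative formats are deterministic. All function names are hypothetical.

```python
import math
import re
import time
from typing import Optional

# OpenAI-style duration parts such as "1m30s", "20ms", "2h".
# "ms" is listed before "m" so "20ms" is not read as 20 minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}

# ASSUMPTION: bare numbers at or above this value are Unix timestamps in
# seconds; smaller values are "seconds from now".
_UNIX_SECONDS_THRESHOLD = 1_000_000_000


def parse_duration_ms(value: str) -> Optional[float]:
    """Parse a duration string like "1m30s" into milliseconds, or return None."""
    parts = _DURATION_PART.findall(value)
    # Reject input with leftover characters, e.g. "5x" or "1m30".
    if not parts or "".join(num + unit for num, unit in parts) != value:
        return None
    return sum(float(num) * _UNIT_MS[unit] for num, unit in parts)


def normalise_reset_ms(raw: str, now_ms: Optional[int] = None) -> Optional[int]:
    """Convert a raw reset header value to epoch milliseconds, or return None."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    raw = raw.strip()
    try:
        number = float(raw)
    except ValueError:
        # Not a bare number: try the duration-string format.
        duration = parse_duration_ms(raw)
        return None if duration is None else now_ms + int(duration)
    if not math.isfinite(number):
        return None
    if number >= _UNIX_SECONDS_THRESHOLD:
        return int(number * 1000)  # Unix timestamp (seconds)
    return now_ms + int(number * 1000)  # seconds from now
```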

Burn-down forecasts

The collector stores one quota snapshot per (customer, vendor, environment) tuple. On every event batch, the most recent rate limit data for each vendor is upserted into this snapshot; older data never overwrites newer data.
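The "never overwrite newer data" rule is a conditional upsert keyed on the tuple. Here is a minimal Python sketch, with an in-memory dict standing in for the collector's store. The `observed_at` field, meaning when the source response was seen, is a hypothetical addition used to decide which data is newer. In a SQL store the same guard would typically be a `WHERE` clause on the upsert's update branch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

SnapshotKey = Tuple[str, str, str]  # (customer, vendor, environment)


@dataclass
class QuotaSnapshot:
    rl_remaining: int
    rl_limit: int
    rl_reset_at: int   # epoch milliseconds
    observed_at: int   # hypothetical: epoch ms when the response was seen


def upsert_snapshot(store: Dict[SnapshotKey, QuotaSnapshot],
                    key: SnapshotKey,
                    incoming: QuotaSnapshot) -> bool:
    """Write incoming unless the stored snapshot is newer; return True if written."""
    current = store.get(key)
    if current is not None and current.observed_at > incoming.observed_at:
        return False  # stored data is newer, so keep it
    store[key] = incoming
    return True
```

Because the check compares timestamps instead of arrival order, batches that arrive out of order cannot roll the snapshot back.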

Burn rate is calculated over the last 5 minutes of event traffic:

burn_rate = events_last_5min / 5.0  # requests per minute
minutes_to_throttle = rl_remaining / burn_rate
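The two formulas can be wrapped in a single function. The following is a minimal Python sketch. The original formula would divide by zero when there has been no traffic in the last 5 minutes, so the sketch assumes that case means "no forecast" and returns `None`. How the collector actually handles it is not stated here.

```python
from typing import Optional


def minutes_to_throttle(rl_remaining: int, events_last_5min: int) -> Optional[float]:
    """Project minutes until the quota is exhausted at the current burn rate."""
    burn_rate = events_last_5min / 5.0  # requests per minute
    if burn_rate == 0:
        # ASSUMPTION: no recent traffic means no meaningful forecast.
        return None
    return rl_remaining / burn_rate
```

With the figures from the alert email example, a burn rate of 42.0 requests/min corresponds to 210 events in 5 minutes. At that rate, 340 remaining requests last 340 / 42.0 ≈ 8.1 minutes.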

The dashboard shows a risk badge based on projected time to throttle:

High — under 10 minutes
Medium — 10–30 minutes
Low — over 30 minutes
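The badge thresholds map directly to a small classifier. In this Python sketch, exactly 10 and exactly 30 minutes are assumed to fall in Medium, because the list describes Medium as "10–30 minutes". `risk_badge` is a hypothetical name.

```python
def risk_badge(minutes: float) -> str:
    """Map projected minutes to throttle onto a dashboard risk badge."""
    if minutes < 10:
        return "High"
    if minutes <= 30:  # ASSUMPTION: both 10 and 30 count as Medium
        return "Medium"
    return "Low"
```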

Burn-down alerts

When minutes_to_throttle drops below 20 minutes, Apidepth sends an email alert to your account address. Alerts have a 30-minute cooldown per vendor, so a sustained high burn rate does not flood your inbox.
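The threshold and per-vendor cooldown combine as shown in this minimal Python sketch. It keeps the last-alert times in a plain dict keyed by vendor, and `should_alert` is a hypothetical name. The real collector presumably also scopes the cooldown per customer, which the sketch leaves out.

```python
from typing import Dict, Optional

ALERT_THRESHOLD_MINUTES = 20
COOLDOWN_MS = 30 * 60 * 1000  # 30-minute cooldown per vendor


def should_alert(vendor: str,
                 minutes: Optional[float],
                 now_ms: int,
                 last_alert_ms: Dict[str, int]) -> bool:
    """Return True (and record the send time) if an alert should go out now."""
    if minutes is None or minutes >= ALERT_THRESHOLD_MINUTES:
        return False
    last = last_alert_ms.get(vendor)
    if last is not None and now_ms - last < COOLDOWN_MS:
        return False  # still inside this vendor's cooldown
    last_alert_ms[vendor] = now_ms
    return True
```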

Alert email example:

[Apidepth] openai quota nearly exhausted — ~8 min remaining

Your openai API quota will be exhausted in approximately 8 minutes at the current burn rate.

Quota: 340 of 10,000 remaining (window resets at 03:15 UTC)
Current burn rate: 42.0 requests/min (measured over last 5 min)