Rate Limit Intelligence
Apidepth tracks 429 responses and captures rate limit quota headers to give you live burn-down forecasts — before your quota runs out.
How it works
Rate Limit Intelligence is split into two layers that work together:
- 429 frequency tracking: the collector aggregates status = 429 responses hourly per vendor and endpoint. No SDK changes required; this works as soon as the SDK is installed.
- Burn-down prediction: the SDK extracts rate limit quota headers from every response and sends them to the collector. The collector maintains a live quota snapshot per vendor and projects how long your remaining quota will last at the current burn rate.
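As a rough illustration of the first layer, the hourly aggregation could look like the sketch below. The event shape (`ts_ms`, `status`, `vendor`, `endpoint`) and the function names are assumptions made for this example, not the collector's actual schema.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts_ms: int) -> str:
    """Truncate an epoch-millisecond timestamp to its UTC hour."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:00Z")


def count_429s(events) -> Counter:
    """Count 429 responses per (hour, vendor, endpoint).

    `events` is an iterable of dicts with hypothetical keys
    ts_ms, status, vendor and endpoint.
    """
    counts = Counter()
    for event in events:
        if event["status"] == 429:
            key = (hour_bucket(event["ts_ms"]), event["vendor"], event["endpoint"])
            counts[key] += 1
    return counts
```

Keying on the truncated hour keeps one counter per vendor and endpoint for each hour, which is the granularity described above.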
Header extraction
The SDK reads rate limit headers from the response to each outgoing request, inline with the call. Three canonical fields are extracted:
| Field | Meaning |
|---|---|
| rl_remaining | Requests left in the current rate limit window |
| rl_limit | Total quota for the window |
| rl_reset_at | When the window resets (epoch milliseconds, normalised from any vendor format) |
Vendor header coverage
Headers are checked in priority order — the first matching header wins per field.
| Vendor | Remaining | Limit | Reset |
|---|---|---|---|
| OpenAI | x-ratelimit-remaining-requests | x-ratelimit-limit-requests | x-ratelimit-reset-requests |
| Anthropic | x-ratelimit-remaining-requests | x-ratelimit-limit-requests | x-ratelimit-reset-requests |
| GitHub | x-ratelimit-remaining | x-ratelimit-limit | x-ratelimit-reset |
| HubSpot / Fastly | ratelimit-remaining | ratelimit-limit | ratelimit-reset |
| Stripe (429 only) | — | — | retry-after |
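The "first matching header wins" rule can be sketched as a lookup over an ordered candidate list per field. The ordering below simply follows the table rows and is an assumption, as are the names `HEADER_PRIORITY`, `extract_rate_limit_headers` and `rl_reset_raw`; the SDK's real priority list may differ.

```python
# Candidate headers per field, checked in order (assumed ordering).
HEADER_PRIORITY = {
    "rl_remaining": [
        "x-ratelimit-remaining-requests",
        "x-ratelimit-remaining",
        "ratelimit-remaining",
    ],
    "rl_limit": [
        "x-ratelimit-limit-requests",
        "x-ratelimit-limit",
        "ratelimit-limit",
    ],
    # Raw reset value; it still needs normalising to epoch milliseconds.
    "rl_reset_raw": [
        "x-ratelimit-reset-requests",
        "x-ratelimit-reset",
        "ratelimit-reset",
        "retry-after",
    ],
}


def extract_rate_limit_headers(headers: dict, status: int) -> dict:
    """Return the first matching header value for each canonical field."""
    lowered = {name.lower(): value for name, value in headers.items()}
    found = {}
    for field, candidates in HEADER_PRIORITY.items():
        for name in candidates:
            # retry-after is only consulted on 429 responses, per the table.
            if name == "retry-after" and status != 429:
                continue
            if name in lowered:
                found[field] = lowered[name]
                break
    return found
```

Header names are lower-cased before lookup because HTTP header names are case-insensitive.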
Reset format normalisation
rl_reset_at is always stored as epoch milliseconds, regardless of the vendor's original format. The SDK normalises three formats:
| Format | Example | Vendors |
|---|---|---|
| Unix timestamp (seconds) | 1716000000 | GitHub, most IETF-draft APIs |
| Seconds from now | 30 | Stripe Retry-After, generic 429 responses |
| OpenAI duration string | 1m30s, 20ms, 2h | OpenAI, Anthropic |
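A minimal sketch of normalising the three formats follows. How the SDK tells an absolute Unix timestamp apart from "seconds from now" is not stated here, so the cutoff `ABSOLUTE_THRESHOLD_S` is an assumption of this example, as is the function name `normalise_reset`.

```python
import re
import time

_NUMERIC = re.compile(r"\d+(?:\.\d+)?")
# "ms" is listed before "m" so that "20ms" is not read as 20 minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}

# Assumed heuristic: numbers at or above this are absolute Unix seconds,
# smaller numbers are an offset in seconds from now.
ABSOLUTE_THRESHOLD_S = 1_000_000_000


def normalise_reset(raw: str, now_ms=None):
    """Convert a vendor reset value to epoch milliseconds, or None if unparseable."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    raw = raw.strip()

    if _NUMERIC.fullmatch(raw):
        seconds = float(raw)
        if seconds >= ABSOLUTE_THRESHOLD_S:
            return int(seconds * 1000)       # Unix timestamp (seconds)
        return now_ms + int(seconds * 1000)  # seconds from now

    # Duration string such as "1m30s", "20ms" or "2h".
    parts = _DURATION_PART.findall(raw)
    if not parts or "".join(num + unit for num, unit in parts) != raw:
        return None
    total_ms = sum(float(num) * _UNIT_MS[unit] for num, unit in parts)
    return now_ms + int(total_ms)
```

Passing `now_ms` explicitly makes the relative formats deterministic, which is convenient in tests: `normalise_reset("1m30s", now_ms=0)` gives `90000`.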
Burn-down forecasts
The collector stores one quota snapshot per (customer, vendor, environment) tuple. On every event batch, the most recent rate limit data for each vendor is upserted into this snapshot (older data is never written over newer).
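The "newer wins" upsert can be sketched with an in-memory dictionary keyed by the tuple. The `QuotaSnapshot` type, its `observed_at_ms` field, and `upsert_snapshot` are hypothetical names for this illustration; the collector's storage and schema are not described here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

SnapshotKey = Tuple[str, str, str]  # (customer, vendor, environment)


@dataclass
class QuotaSnapshot:
    rl_remaining: Optional[int]
    rl_limit: Optional[int]
    rl_reset_at: Optional[int]  # epoch milliseconds
    observed_at_ms: int         # hypothetical: when the headers were seen


def upsert_snapshot(
    store: Dict[SnapshotKey, QuotaSnapshot],
    key: SnapshotKey,
    incoming: QuotaSnapshot,
) -> bool:
    """Write `incoming` unless the stored snapshot is at least as recent.

    Returns True if the store was updated.
    """
    current = store.get(key)
    if current is not None and current.observed_at_ms >= incoming.observed_at_ms:
        return False  # older or same-age data never overwrites newer data
    store[key] = incoming
    return True
```

Comparing observation times rather than arrival order means a delayed batch cannot roll the snapshot back.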
Burn rate is calculated over the last 5 minutes of event traffic:
    burn_rate = events_last_5min / 5.0  # requests per minute
    minutes_to_throttle = rl_remaining / burn_rate

The dashboard shows a risk badge based on projected time to throttle:
- High — under 10 minutes
- Medium — 10–30 minutes
- Low — over 30 minutes
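The forecast and badge thresholds can be put together in a short sketch. Two details are assumptions of this example rather than documented behaviour: a burn rate of zero is treated as "never throttles", and the boundary values 10 and 30 minutes are both counted as Medium.

```python
import math

WINDOW_MINUTES = 5.0


def minutes_to_throttle(rl_remaining: int, events_last_5min: int) -> float:
    """Project how many minutes the remaining quota lasts at the current burn rate."""
    burn_rate = events_last_5min / WINDOW_MINUTES  # requests per minute
    if burn_rate == 0:
        return math.inf  # assumption: no traffic means no projected throttle
    return rl_remaining / burn_rate


def risk_badge(minutes: float) -> str:
    """Map projected time to throttle onto the dashboard badge."""
    if minutes < 10:
        return "High"
    if minutes <= 30:
        return "Medium"
    return "Low"
```

For example, 340 requests remaining with 210 events in the last 5 minutes is a burn rate of 42.0 requests per minute, or roughly 8 minutes to throttle, which maps to a High badge.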
Burn-down alerts
When minutes_to_throttle drops below 20 minutes, Apidepth sends an email alert to your account address. Alerts have a 30-minute cooldown per vendor to prevent flooding.
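The threshold and per-vendor cooldown can be sketched as a small gate in front of the email sender. The function name `should_alert` and the `last_alert_ms` dictionary are hypothetical; how Apidepth actually tracks the cooldown is not described here.

```python
ALERT_THRESHOLD_MIN = 20
COOLDOWN_MS = 30 * 60 * 1000  # 30 minutes


def should_alert(vendor: str, minutes: float, now_ms: int, last_alert_ms: dict) -> bool:
    """Decide whether to send a burn-down alert for `vendor`.

    `last_alert_ms` maps vendor -> epoch ms of the last alert sent and is
    updated in place when an alert is allowed.
    """
    if minutes >= ALERT_THRESHOLD_MIN:
        return False  # quota is not close enough to exhaustion
    last = last_alert_ms.get(vendor)
    if last is not None and now_ms - last < COOLDOWN_MS:
        return False  # still inside this vendor's cooldown window
    last_alert_ms[vendor] = now_ms
    return True
```

Because the cooldown is keyed by vendor, an alert for one vendor does not suppress an alert for another.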
Alert email example:
[Apidepth] openai quota nearly exhausted — ~8 min remaining
Your openai API quota will be exhausted in approximately ~8 minutes at the current burn rate.
- Quota: 340 of 10,000 remaining (window resets at 03:15 UTC)
- Current burn rate: 42.0 requests/min (measured over last 5 min)