Rate Limit Intelligence

Apidepth tracks 429 responses and captures rate limit quota headers to give you live burn-down forecasts — before your quota runs out.

How it works

Rate Limit Intelligence is split into two layers that work together:

  1. 429 frequency tracking — the collector aggregates status = 429 responses hourly per vendor and endpoint. No SDK changes required; this works as soon as the SDK is installed.
  2. Burn-down prediction — the SDK extracts rate limit quota headers from every response and sends them to the collector. The collector maintains a live quota snapshot per vendor and projects how long your remaining quota will last at the current burn rate.
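The first layer can be pictured as a simple grouped count. The sketch below is illustrative only and is not the collector's actual code; the event keys (`ts_ms`, `vendor`, `endpoint`, `status`) and the function name are assumptions made for the example.

```python
from collections import Counter

HOUR_MS = 3_600_000

def count_429s_hourly(events):
    """Count 429 responses per (hour bucket, vendor, endpoint).

    `events` is an iterable of dicts with hypothetical keys:
    ts_ms (epoch milliseconds), vendor, endpoint, status.
    """
    counts = Counter()
    for ev in events:
        if ev["status"] != 429:
            continue
        # Truncate the timestamp to the start of its hour (epoch ms).
        hour_start_ms = ev["ts_ms"] - ev["ts_ms"] % HOUR_MS
        counts[(hour_start_ms, ev["vendor"], ev["endpoint"])] += 1
    return counts
```

Because the count depends only on the response status, it needs no rate limit headers at all, which is why this layer works for every vendor.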

Header extraction

The SDK reads rate limit headers from the response to each outgoing request. Three canonical fields are extracted:

Field          Meaning
rl_remaining   Requests left in the current rate limit window
rl_limit       Total quota for the window
rl_reset_at    When the window resets (epoch milliseconds, normalised from any vendor format)

Vendor header coverage

Headers are checked in priority order — the first matching header wins per field.

Vendor              Remaining                        Limit                        Reset
OpenAI              x-ratelimit-remaining-requests   x-ratelimit-limit-requests   x-ratelimit-reset-requests
Anthropic           x-ratelimit-remaining-requests   x-ratelimit-limit-requests   x-ratelimit-reset-requests
GitHub              x-ratelimit-remaining            x-ratelimit-limit            x-ratelimit-reset
HubSpot / Fastly    ratelimit-remaining              ratelimit-limit              ratelimit-reset
Stripe (429 only)   -                                -                            retry-after
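A first-match-wins lookup over the vendor headers listed above might look like the sketch below. It is an illustration, not the SDK's source: the ordering of candidates across vendors, the `rl_reset_raw` key, and the function name are assumptions. Only the header names and the "429 only" rule for `retry-after` come from this page.

```python
# Candidate header names per field, in an ASSUMED priority order
# (more specific names first). The real SDK's order is not documented here.
HEADER_CANDIDATES = {
    "rl_remaining": [
        "x-ratelimit-remaining-requests",
        "x-ratelimit-remaining",
        "ratelimit-remaining",
    ],
    "rl_limit": [
        "x-ratelimit-limit-requests",
        "x-ratelimit-limit",
        "ratelimit-limit",
    ],
    # Raw reset value, before normalisation to epoch milliseconds.
    "rl_reset_raw": [
        "x-ratelimit-reset-requests",
        "x-ratelimit-reset",
        "ratelimit-reset",
        "retry-after",
    ],
}

def extract_rate_limit_fields(headers, status):
    """Return the first matching header value for each field."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    fields = {}
    for field, candidates in HEADER_CANDIDATES.items():
        for name in candidates:
            # retry-after is only considered on 429 responses.
            if name == "retry-after" and status != 429:
                continue
            if name in lowered:
                fields[field] = lowered[name]
                break  # first match wins for this field
    return fields
```

For a GitHub-style response this returns all three fields; for a 429 that carries only `Retry-After`, it returns just the raw reset value.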

Reset format normalisation

rl_reset_at is always stored as epoch milliseconds, regardless of the vendor's original format. The SDK normalises three formats:

Format                     Example            Vendors
Unix timestamp (seconds)   1716000000         GitHub, most IETF-draft APIs
Seconds from now           30                 Stripe Retry-After, generic 429 responses
OpenAI duration string     1m30s, 20ms, 2h    OpenAI, Anthropic
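One way to normalise these three formats is sketched below. This is a guess at the approach, not the SDK's implementation. In particular, this page does not say how a bare number is classified as a timestamp or an offset, so the `EPOCH_THRESHOLD_S` cutoff is purely an assumption, as is the function name.

```python
import re

_PLAIN_NUMBER = re.compile(r"\d+(?:\.\d+)?")
# "ms" is listed before "m" so that "20ms" is not read as 20 minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}

# ASSUMPTION: plain numbers at or above this value are absolute Unix
# timestamps in seconds; smaller numbers are "seconds from now".
EPOCH_THRESHOLD_S = 1_000_000_000

def normalise_reset(raw, now_ms):
    """Convert a raw reset header value to epoch milliseconds, or None."""
    raw = raw.strip()
    if _PLAIN_NUMBER.fullmatch(raw):
        value = float(raw)
        if value >= EPOCH_THRESHOLD_S:
            return round(value * 1000)        # Unix timestamp (seconds)
        return now_ms + round(value * 1000)   # seconds from now
    parts = _DURATION_PART.findall(raw)
    # Reject strings that are not made up entirely of duration parts.
    if not parts or "".join(n + u for n, u in parts) != raw:
        return None
    return now_ms + round(sum(float(n) * _UNIT_MS[u] for n, u in parts))
```

Passing `now_ms` in explicitly, rather than reading the clock inside the function, keeps the conversion deterministic and easy to test.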

Burn-down forecasts

The collector stores one quota snapshot per (customer, vendor, environment) tuple. On every event batch, the most recent rate limit data for each vendor is upserted into this snapshot (older data is never written over newer).
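The "newer wins" upsert can be sketched with an in-memory dict standing in for the collector's storage. The field names, the `observed_at_ms` timestamp, and the tie-breaking rule (an equal timestamp overwrites) are assumptions for illustration only.

```python
def upsert_snapshot(snapshots, key, observed_at_ms, rl_remaining, rl_limit, rl_reset_at):
    """Store the snapshot for `key` unless a strictly newer one already exists.

    `key` is a (customer, vendor, environment) tuple.
    Returns True if the snapshot was written, False if it was skipped.
    """
    current = snapshots.get(key)
    if current is not None and current["observed_at_ms"] > observed_at_ms:
        return False  # existing data is newer, so keep it
    snapshots[key] = {
        "observed_at_ms": observed_at_ms,
        "rl_remaining": rl_remaining,
        "rl_limit": rl_limit,
        "rl_reset_at": rl_reset_at,
    }
    return True
```

The timestamp check matters because event batches can arrive out of order; without it, a delayed batch could replace a fresh quota reading with a stale one.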

Burn rate is calculated over the last 5 minutes of event traffic:

burn_rate = events_last_5min / 5.0 # requests per minute
minutes_to_throttle = rl_remaining / burn_rate
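As a runnable sketch of the same arithmetic, the function below adds a guard for the case where no events arrived in the window, which would otherwise divide by zero. Returning `None` in that case is an assumption; this page does not say how the collector handles it.

```python
def minutes_to_throttle(rl_remaining, events_last_5min):
    """Project minutes until the quota is exhausted at the current burn rate."""
    burn_rate = events_last_5min / 5.0  # requests per minute
    if burn_rate <= 0:
        return None  # no recent traffic, so no meaningful projection
    return rl_remaining / burn_rate

# 210 events in 5 minutes is 42.0 requests/min; with 340 requests
# remaining, the projection is 340 / 42.0, a little over 8 minutes.
print(minutes_to_throttle(340, 210))
```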

The dashboard shows a risk badge based on projected time to throttle:

  • High — under 10 minutes
  • Medium — 10–30 minutes
  • Low — over 30 minutes
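The thresholds above map to a small classification function. The sketch treats exactly 10 and exactly 30 minutes as Medium, which is one reading of the ranges listed; the function name is hypothetical.

```python
def risk_badge(minutes):
    """Map projected minutes to throttle onto a dashboard risk badge."""
    if minutes < 10:
        return "High"
    if minutes <= 30:
        return "Medium"
    return "Low"
```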

Burn-down alerts

When minutes_to_throttle drops below 20 minutes, Apidepth sends an email alert to your account address. Alerts have a 30-minute cooldown per vendor to prevent flooding.
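The threshold and per-vendor cooldown can be combined as in the sketch below. It is an illustration under assumptions: the `last_alert_ms` dict stands in for whatever state Apidepth keeps, and the function name is hypothetical. Only the 20-minute threshold and the 30-minute cooldown come from this page.

```python
ALERT_THRESHOLD_MIN = 20
COOLDOWN_MS = 30 * 60 * 1000

def should_alert(vendor, minutes_to_throttle, now_ms, last_alert_ms):
    """Decide whether to send a burn-down alert for `vendor`.

    `last_alert_ms` maps vendor -> epoch ms of the last alert sent and
    is updated in place when this function returns True.
    """
    if minutes_to_throttle >= ALERT_THRESHOLD_MIN:
        return False  # forecast is not below the alert threshold
    last = last_alert_ms.get(vendor)
    if last is not None and now_ms - last < COOLDOWN_MS:
        return False  # still inside this vendor's cooldown
    last_alert_ms[vendor] = now_ms
    return True
```

Because the cooldown is tracked per vendor, an alert for one vendor does not suppress an alert for another that starts burning down at the same time.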

Alert email example:

[Apidepth] openai quota nearly exhausted — ~8 min remaining

Your openai API quota will be exhausted in approximately ~8 minutes at the current burn rate.

  • Quota: 340 of 10,000 remaining (window resets at 03:15 UTC)
  • Current burn rate: 42.0 requests/min (measured over last 5 min)