Rate Limits

Control request rates, token budgets, and key expiry.

Overview

Rate limits protect your LLM spend and availability. All limits are enforced at the gateway before requests reach the provider.

Key Features

  • Per-key rate limits - Requests per minute, hour, or day
  • Token limits - Input and output token budgets per request or over time
  • API key expiration - Configurable epoch time for automatic key expiry
  • Response timeout - Maximum wait time to prevent hung requests from consuming resources
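As a rough illustration of per-key request limits over minute, hour, and day windows, the sketch below uses fixed-window counters. The class name FixedWindowLimiter, the allow method, and the shape of the limits dictionary are all hypothetical; the gateway's actual algorithm is not described here and may differ (for example, it could use sliding windows or token buckets).

```python
import time
from collections import defaultdict

# Window lengths in seconds for the three granularities listed above.
WINDOWS = {"minute": 60, "hour": 3600, "day": 86400}


class FixedWindowLimiter:
    """Hypothetical per-key limiter; not the gateway's real implementation."""

    def __init__(self, limits, clock=time.time):
        # limits maps a window name to the maximum requests allowed in it,
        # e.g. {"minute": 60, "day": 10_000}. Windows left out are unlimited.
        self.limits = limits
        self.clock = clock
        # (api_key, window name, window index) -> requests counted so far
        self.counts = defaultdict(int)

    def allow(self, api_key):
        now = self.clock()
        buckets = [
            (api_key, name, int(now // WINDOWS[name])) for name in self.limits
        ]
        # Reject if any configured window is already full.
        for bucket in buckets:
            if self.counts[bucket] >= self.limits[bucket[1]]:
                return False
        # Count the request against every window only once it is accepted.
        for bucket in buckets:
            self.counts[bucket] += 1
        return True
```

A caller would construct one limiter, for instance FixedWindowLimiter({"minute": 60, "day": 10000}), and call allow(api_key) before forwarding each request. This sketch never evicts old counters, so a long-running process would need to prune expired windows.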

Configuration

Setting              Description
Requests per minute  Maximum API calls per minute per key
Requests per hour    Maximum API calls per hour per key
Requests per day     Maximum API calls per day per key
Max input tokens     Maximum input tokens per request
Max output tokens    Maximum output tokens per request
Token budget         Total token allowance over a time window
Key expiration       Epoch timestamp after which the key is rejected
Response timeout     Seconds before a request is terminated
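To make the settings listed above concrete, here is a minimal sketch of how a gateway might hold them per key and run the checks that do not depend on request counting. The names KeyLimits and rejection_reason, the field names, and the choice of epoch seconds for the expiration timestamp are assumptions made for illustration, not the product's actual schema or API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyLimits:
    # Field names are illustrative; each mirrors one setting listed above.
    # None means the setting is not configured for this key.
    requests_per_minute: Optional[int] = None
    requests_per_hour: Optional[int] = None
    requests_per_day: Optional[int] = None
    max_input_tokens: Optional[int] = None
    max_output_tokens: Optional[int] = None
    token_budget: Optional[int] = None        # total tokens per time window
    key_expiration: Optional[int] = None      # epoch seconds (assumed unit)
    response_timeout: Optional[float] = None  # seconds


def rejection_reason(limits, input_tokens, output_tokens, tokens_used=0, now=None):
    """Return why a request would be rejected, or None if it passes.

    tokens_used is the number of tokens already spent in the current
    budget window; tracking that total is left to the caller.
    """
    now = time.time() if now is None else now
    if limits.key_expiration is not None and now > limits.key_expiration:
        return "key expired"
    if limits.max_input_tokens is not None and input_tokens > limits.max_input_tokens:
        return "input tokens over per-request limit"
    if limits.max_output_tokens is not None and output_tokens > limits.max_output_tokens:
        return "output tokens over per-request limit"
    if limits.token_budget is not None:
        if tokens_used + input_tokens + output_tokens > limits.token_budget:
            return "token budget exhausted"
    return None
```

Running these checks before the request is forwarded matches the idea that limits are enforced at the gateway rather than at the provider. The request-count fields and response_timeout are stored but not enforced in this sketch.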