Rate Limits
API rate limits, response headers, and retry strategies
The Finetuning.ai API enforces rate limits to ensure fair usage and service stability.
Limits
- 60 requests per minute per user
- 5 active API keys per account
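One way to stay under the per-minute limit is to pace requests on the client. The following is a minimal sketch, assuming the 60 requests per minute are counted over a sliding 60-second window (this page does not say how the window is measured). `createRateLimiter`, `reserve`, and `schedule` are hypothetical names, not part of the API.

```javascript
// Hypothetical helper: a sliding-window limiter that tracks recent request times.
// `now` is injectable so the logic can be exercised without real waiting.
function createRateLimiter(maxRequests = 60, windowMs = 60000, now = Date.now) {
  const timestamps = []; // send times (actual or scheduled) inside the window

  // Records one request and returns how many milliseconds the caller
  // should wait before sending it (0 means it can go immediately).
  function reserve() {
    const t = now();
    // Drop timestamps that have left the window.
    while (timestamps.length > 0 && timestamps[0] <= t - windowMs) {
      timestamps.shift();
    }
    if (timestamps.length < maxRequests) {
      timestamps.push(t);
      return 0;
    }
    // Window is full: the next slot opens when the oldest request expires.
    const waitMs = timestamps[0] + windowMs - t;
    timestamps.shift();
    timestamps.push(t + waitMs);
    return waitMs;
  }

  // Waits if necessary, then runs the given request function.
  async function schedule(requestFn) {
    const waitMs = reserve();
    if (waitMs > 0) await new Promise((resolve) => setTimeout(resolve, waitMs));
    return requestFn();
  }

  return { reserve, schedule };
}
```

A caller would wrap each request, for example `limiter.schedule(() => fetch(url, options))`. Because the limit is stated per user, a single limiter instance would need to be shared by all code paths that make requests for that user.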
Rate limit headers
Every API response includes headers showing your current rate limit status:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| Retry-After | Seconds to wait before retrying (only on 429 responses) |
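These headers can be read from any response before deciding whether to send the next request. A minimal sketch, assuming the header values are plain integers sent as strings; `readRateLimit` is a hypothetical helper name.

```javascript
// Hypothetical helper: extracts rate limit information from response headers.
// `headers` is anything with a get(name) method, such as the Headers object
// on a fetch response.
function readRateLimit(headers) {
  // Converts a header value to a number, or null if it is missing or not numeric.
  const toNumber = (value) => {
    if (value === null || value === undefined || value === '') return null;
    const n = Number(value);
    return Number.isFinite(n) ? n : null;
  };
  return {
    limit: toNumber(headers.get('X-RateLimit-Limit')),
    remaining: toNumber(headers.get('X-RateLimit-Remaining')),
    retryAfterSeconds: toNumber(headers.get('Retry-After')),
  };
}
```

For example, after `const response = await fetch(url, options)`, calling `readRateLimit(response.headers)` returns an object whose `remaining` field can be checked before sending more requests.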
Handling rate limits
When you exceed the rate limit, the API returns 429 Too Many Requests:
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests — wait and retry"
  }
}
Retry strategy
Honor the Retry-After header when the response includes one; otherwise use exponential backoff with jitter:
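// Retries on 429 responses, honoring Retry-After when the server provides it.
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) return response;
    if (attempt === maxRetries - 1) break; // last attempt failed; skip the pointless wait
    // Retry-After is in seconds; fall back to backoff if it is missing or not a number.
    const retryAfter = Number(response.headers.get('Retry-After'));
    const delay = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      // Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter, capped at 30s.
      : Math.min(1000 * 2 ** attempt + Math.random() * 1000, 30000);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error('Max retries exceeded');
}
Best practices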
- Cache responses: don't re-fetch data that hasn't changed
- Use polling intervals: when checking generation status, poll every 2–5 seconds, not continuously
- Batch requests: if listing generations, use larger `limit` values instead of many small requests
- Handle 429s gracefully: always implement retry logic with backoff
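The polling advice above can be sketched as a small loop with a fixed interval. This is an illustration under assumptions: `pollUntilDone`, `checkStatus`, `isDone`, and `maxAttempts` are hypothetical names, and the status endpoint and the shape of its response are not specified on this page, so the caller supplies both.

```javascript
// Hypothetical helper: calls `checkStatus` at a fixed interval until `isDone`
// returns true for its result. `sleep` is injectable so the loop can be
// tested without real delays.
async function pollUntilDone(checkStatus, isDone, {
  intervalMs = 3000, // within the recommended 2-5 second range
  maxAttempts = 40,  // assumed cap so the loop cannot run forever
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await checkStatus();
    if (isDone(result)) return result;
    // Wait before the next check, but not after the final one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Still not done after ${maxAttempts} checks`);
}
```

Each status check still counts against the rate limit, so `checkStatus` would typically go through the same retry logic as any other request.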