Rate Limits
API rate limits, response headers, and retry strategies
The Finetuning.ai API enforces rate limits to ensure fair usage and service stability.
Limits
| Endpoint | Limit |
|---|---|
| POST /v1/generations | 10 requests per minute per user |
| All other endpoints | 60 requests per minute per user |
| Active API keys per account | 5 |
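Throttling on the client side keeps you under these limits and avoids most 429s. The following is a minimal sketch of a sliding-window throttle. `SlidingWindowLimiter` and `acquire()` are hypothetical names, not part of a Finetuning.ai SDK. It assumes a single process. Because the limits are per user, requests from other processes or machines on the same account count against the same budget, but this local limiter can't see them.

```javascript
// Hypothetical client-side throttle (not an official SDK class): allows at
// most `limit` calls in any rolling `windowMs` period and delays extra calls
// until a slot frees up. Assumes a single process.
class SlidingWindowLimiter {
  constructor(limit, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = []; // times (ms) at which recent calls were allowed, oldest first
  }

  // Returns how many ms the caller must wait before the next call is allowed.
  delayUntilNextSlot(now = Date.now()) {
    // Drop calls that have aged out of the window.
    while (this.timestamps.length && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) return 0;
    return this.timestamps[0] + this.windowMs - now;
  }

  // Waits (if needed) for a free slot, then records the call. The check and
  // the push run without an await in between, so concurrent callers in one
  // event loop can't both claim the last slot.
  async acquire() {
    let wait;
    while ((wait = this.delayUntilNextSlot()) > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
    this.timestamps.push(Date.now());
  }
}

// One limiter per limit in the table above.
const generationLimiter = new SlidingWindowLimiter(10); // POST /v1/generations
const defaultLimiter = new SlidingWindowLimiter(60); // all other endpoints
```

Call `await generationLimiter.acquire()` before each `POST /v1/generations` request and `await defaultLimiter.acquire()` before calls to other endpoints. A rolling window is at least as strict as a fixed one-minute window. It should therefore stay under the limit whether the server counts in fixed or rolling windows, although network delays can still shift when requests arrive.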
Rate limit headers
Every API response includes headers showing your current rate limit status:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| Retry-After | Seconds to wait before retrying (sent only on 429 responses) |
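A small helper can turn these headers into numbers your client can act on. This is a sketch, not an official SDK function. `readRateLimitHeaders` is a hypothetical name. It accepts a Fetch API `Headers` object (or anything with a compatible `get(name)` method) and returns `null` for headers that are absent or not numeric. It treats `Retry-After` as a number of seconds, as described in the table.

```javascript
// Hypothetical helper: reads the documented rate-limit headers from a fetch
// Response's `headers` (or any object with a compatible `get(name)` method).
// Missing or non-numeric headers come back as null.
function readRateLimitHeaders(headers) {
  const toNumber = (name) => {
    const raw = headers.get(name); // Headers.get returns null when absent
    if (raw === null || raw === undefined || raw.trim() === '') return null;
    const value = Number(raw);
    return Number.isFinite(value) ? value : null;
  };
  return {
    limit: toNumber('X-RateLimit-Limit'),
    remaining: toNumber('X-RateLimit-Remaining'),
    retryAfterSeconds: toNumber('Retry-After'), // only present on 429 responses
  };
}
```

For example, after each response you could check `remaining` and pause before the next call when it reaches 0, rather than sending a request that is likely to be rejected.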
Handling rate limits
When you exceed a rate limit, the API returns `429 Too Many Requests`. The `error.code` field in the response body tells you which limit you hit. Read endpoints return `RATE_LIMITED`, and `POST /v1/generations` returns `GENERATION_RATE_LIMITED`.
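Because `POST /v1/generations` has its own limit, a client that receives a 429 may want to back off only the limit it actually exceeded. Here is a minimal sketch that reads `error.code` from a parsed 429 body. The function name `rateLimitScope` and its return labels are hypothetical.

```javascript
// Hypothetical helper: maps the error code in a parsed 429 response body to
// the limit that was exceeded. Returns null for anything else.
function rateLimitScope(body) {
  const code = body && body.error && body.error.code;
  switch (code) {
    case 'GENERATION_RATE_LIMITED':
      return 'generations'; // POST /v1/generations (10 requests/minute)
    case 'RATE_LIMITED':
      return 'default'; // read endpoints (60 requests/minute)
    default:
      return null;
  }
}
```

Call it on `await response.json()` only when `response.status === 429`, then pause only the matching category of requests.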
```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests on a read endpoint — wait and retry"
  }
}
```
```json
{
  "error": {
    "code": "GENERATION_RATE_LIMITED",
    "message": "Too many POST /v1/generations calls — wait and retry"
  }
}
```
Retry strategy
Honor the `Retry-After` header when it is present. Otherwise, use exponential backoff with jitter:
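```javascript
// Retries a request that fails with 429 Too Many Requests. Waits for the
// Retry-After value when the server sends one; otherwise uses exponential
// backoff with jitter, capped at 30 seconds.
async function fetchWithRetry(url, options, maxRetries = 3) {
  // One initial attempt plus up to `maxRetries` retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) return response;
    if (attempt === maxRetries) break; // out of retries: don't sleep for nothing

    // Retry-After is a number of seconds on 429 responses.
    const retryAfter = Number(response.headers.get('Retry-After'));
    const delay =
      Number.isFinite(retryAfter) && retryAfter > 0
        ? retryAfter * 1000
        : Math.min(1000 * 2 ** attempt + Math.random() * 1000, 30000);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error('Max retries exceeded');
}
```
Best practices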
- Prefer webhooks over polling: set a `webhookURL` on `POST /v1/generations` and we'll deliver the result to your endpoint, so you don't have to poll for it. See Webhooks.
- Cache responses: don't re-fetch data that hasn't changed.
- Use sensible polling intervals: if you do poll, check generation status every 2–5 seconds, not continuously.
- Batch requests: when listing generations, use larger `limit` values instead of many small requests.
- Use the bulk endpoints: bulk delete and the playlist endpoints accept up to 100 IDs per request, and a bulk call counts as one request against the limit.
- Handle 429s gracefully: always implement retry logic with backoff.
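Because the bulk endpoints accept at most 100 IDs, a large set of IDs has to be split across several calls. The paths and request bodies for bulk delete and the playlist endpoints aren't shown on this page, so this sketch only does the batching. `chunkIds` is a hypothetical helper name.

```javascript
// Hypothetical helper: splits a list of IDs into batches of at most `size`
// (100 by default, the documented per-request maximum for bulk endpoints).
function chunkIds(ids, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}
```

For 250 IDs, `chunkIds(ids)` yields batches of 100, 100, and 50, so the whole operation costs three requests against the limit. Each batch can be sent with `fetchWithRetry` so that any 429 is retried with backoff.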