Rate Limits

The Finetuning.ai API enforces rate limits to ensure fair usage and service stability.

Limits

| Scope | Limit |
| --- | --- |
| `POST /v1/generations` | 10 requests per minute per user |
| All other endpoints | 60 requests per minute per user |
| Active API keys per account | 5 |
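
To stay under these limits without waiting for a `429`, a client can throttle itself. The sketch below is one possible approach, a sliding-window limiter. The names `createLimiter` and `tryAcquire` are hypothetical and not part of the API. This page does not say whether the server counts requests in a fixed or a sliding window, so treat the one-minute sliding window here as an assumption.

```javascript
// Client-side sliding-window limiter (a sketch, not part of any official SDK).
// It remembers the timestamps of recent calls and refuses a new one once
// `maxRequests` calls have been made within the last `windowMs` milliseconds.
function createLimiter(maxRequests, windowMs) {
  let timestamps = [];

  return {
    // Returns 0 if a request may be sent now (and records it); otherwise
    // returns how many milliseconds to wait before trying again.
    tryAcquire(now = Date.now()) {
      // Drop calls that have aged out of the window.
      timestamps = timestamps.filter(t => now - t < windowMs);
      if (timestamps.length < maxRequests) {
        timestamps.push(now);
        return 0;
      }
      // The oldest recorded call determines when the next slot frees up.
      return timestamps[0] + windowMs - now;
    },
  };
}

// Limits taken from the table above; the window length is an assumption.
const generationLimiter = createLimiter(10, 60_000);
const defaultLimiter = createLimiter(60, 60_000);
```

Call `tryAcquire()` before each request and sleep for the returned number of milliseconds when it is non-zero. A client-side limiter reduces `429` responses but cannot rule them out, for example when several processes share one user, so retry handling is still needed.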

Rate limit headers

Every API response includes headers showing your current rate limit status:

| Header | Description |
| --- | --- |
| `X-RateLimit-Limit` | Max requests allowed per window |
| `X-RateLimit-Remaining` | Requests remaining in current window |
| `Retry-After` | Seconds to wait (only on `429` responses) |
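
As an illustration, the headers can be read from any `fetch` response. This is a minimal sketch: `readRateLimit` is a hypothetical helper name, and it assumes the header values are plain integers, as the descriptions above suggest.

```javascript
// Reads the rate limit headers from a response's headers object.
// `headers` only needs a get(name) method that returns a string or null,
// which is what the Fetch API's Headers class provides.
function readRateLimit(headers) {
  // Converts a header value to an integer, or null if missing or non-numeric.
  const toInt = value => {
    if (value === null || value === undefined) return null;
    const n = Number.parseInt(value, 10);
    return Number.isNaN(n) ? null : n;
  };

  return {
    limit: toInt(headers.get('X-RateLimit-Limit')),
    remaining: toInt(headers.get('X-RateLimit-Remaining')),
    // Sent only on 429 responses, so this is null on most responses.
    retryAfterSeconds: toInt(headers.get('Retry-After')),
  };
}
```

For example, `readRateLimit(response.headers).remaining` lets a client slow down as the value approaches zero instead of running into a `429`.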

Handling rate limits

When you exceed a rate limit, the API returns `429 Too Many Requests`. The error code in the response body tells you which limit you hit: read endpoints return `RATE_LIMITED`, and `POST /v1/generations` returns `GENERATION_RATE_LIMITED`.

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests on a read endpoint — wait and retry"
  }
}

{
  "error": {
    "code": "GENERATION_RATE_LIMITED",
    "message": "Too many POST /v1/generations calls — wait and retry"
  }
}
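
A client can branch on the error code to decide which kind of call to slow down. The following is a small sketch based on the two response bodies above. `rateLimitKind` and its return values are hypothetical names chosen for illustration.

```javascript
// Maps a response status and parsed JSON body to the limit that was hit.
// Returns null when the response is not a rate limit error.
function rateLimitKind(status, body) {
  if (status !== 429) return null;

  const code = body && body.error ? body.error.code : undefined;
  switch (code) {
    case 'GENERATION_RATE_LIMITED':
      return 'generation'; // POST /v1/generations limit
    case 'RATE_LIMITED':
      return 'read'; // read endpoint limit
    default:
      return 'unknown'; // 429 with a missing or unrecognized code
  }
}
```

Treating an unrecognized code as `'unknown'` rather than throwing keeps the client retrying sensibly even if the body is missing or cannot be parsed.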

Retry strategy

Wait for the number of seconds given in `Retry-After` when the header is present; otherwise use exponential backoff with jitter:

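async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    // Anything other than 429 (success or a different error) goes back to the caller.
    if (response.status !== 429) return response;

    // Out of attempts: give up immediately instead of sleeping first.
    if (attempt === maxRetries - 1) break;

    // Use Retry-After (in seconds) when it parses as a number; otherwise fall back
    // to exponential backoff with up to 1s of random jitter, capped at 30s.
    const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
    const delay = Number.isFinite(retryAfter)
      ? retryAfter * 1000
      : Math.min(1000 * 2 ** attempt + Math.random() * 1000, 30000);

    await new Promise(resolve => setTimeout(resolve, delay));
  }

  throw new Error('Max retries exceeded');
}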

Best practices

Prefer webhooks over polling — Set a webhook URL on POST /v1/generations and we'll deliver the result to your endpoint instead of you polling for it. See Webhooks.
Cache responses — Don't re-fetch data that hasn't changed
Use sensible polling intervals — If you do poll, check generation status every 2–5 seconds, not continuously
Batch requests — If listing generations, use larger limit values instead of many small requests
Use the bulk endpoints — Bulk delete and the playlist endpoints accept up to 100 IDs per request, and a bulk call counts as one request against the limit
Handle 429s gracefully — Always implement retry logic with backoff
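
To illustrate the bulk endpoint advice, a long list of IDs can be split into groups of at most 100 so that each group fits in one bulk call. This is a sketch only: `chunkIds` is a hypothetical helper, and the request that sends each group is left out because its path and body format are not described on this page.

```javascript
// Splits a list of IDs into arrays of at most `size` items (100 by default,
// matching the per-request maximum for bulk endpoints mentioned above).
function chunkIds(ids, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < ids.length; i += size) {
    chunks.push(ids.slice(i, i + size));
  }
  return chunks;
}

// 250 IDs become 3 bulk calls instead of 250 single-item calls.
const exampleIds = Array.from({ length: 250 }, (_, i) => `id_${i}`);
const batches = chunkIds(exampleIds);
```

Because a bulk call counts as one request against the limit, the three batches here use three requests of the per-minute allowance.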
