Finetuning.ai

Rate Limits

API rate limits, response headers, and retry strategies

The Finetuning.ai API enforces rate limits to ensure fair usage and service stability.

Limits

Endpoint                     Limit
POST /v1/generations         10 requests per minute per user
All other endpoints          60 requests per minute per user
Active API keys per account  5
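To stay under these limits proactively, a client can throttle itself before sending. The following is a minimal sketch of a sliding-window limiter; createLimiter, tryAcquire, and the two limiter instances are hypothetical names for illustration, not part of the Finetuning.ai API. The documentation does not say whether the server counts requests in fixed or sliding windows, so the sliding window here is an assumption chosen because it is the more conservative option.

```javascript
// Sliding-window limiter: allows at most `limit` calls per `windowMs`.
// `now` is injectable so the logic can be exercised without real timers.
function createLimiter(limit, windowMs, now = Date.now) {
  const timestamps = []; // send times of calls still inside the window

  return {
    // Returns 0 if a request may be sent now (and records it),
    // otherwise the number of milliseconds to wait before trying again.
    tryAcquire() {
      const t = now();
      // Drop calls that have aged out of the window.
      while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
        timestamps.shift();
      }
      if (timestamps.length < limit) {
        timestamps.push(t);
        return 0;
      }
      return timestamps[0] + windowMs - t;
    },
  };
}

// Limits taken from the table above; the window behavior is an assumption.
const generationLimiter = createLimiter(10, 60_000);
const defaultLimiter = createLimiter(60, 60_000);
```

Before each call, check tryAcquire() on the matching limiter and sleep for the returned number of milliseconds when it is non-zero. Because the limits are per user, several processes sharing one account would need a shared limiter; this sketch only covers a single process.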

Rate limit headers

Every API response includes headers showing your current rate limit status:

Header                 Description
X-RateLimit-Limit      Max requests allowed per window
X-RateLimit-Remaining  Requests remaining in current window
Retry-After            Seconds to wait (only on 429 responses)
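As an illustration, the headers listed above can be read from any fetch() response. This is a minimal sketch: readRateLimit is a hypothetical helper name, and it assumes the header values are plain integers, as the descriptions suggest.

```javascript
// Extracts rate limit information from a fetch() Response.
// Missing or non-numeric headers are reported as null.
function readRateLimit(response) {
  const toInt = (name) => {
    const value = Number.parseInt(response.headers.get(name), 10);
    return Number.isNaN(value) ? null : value;
  };
  return {
    limit: toInt('X-RateLimit-Limit'),
    remaining: toInt('X-RateLimit-Remaining'),
    // Only expected on 429 responses, so usually null.
    retryAfterSeconds: toInt('Retry-After'),
  };
}
```

A client could, for example, slow down its own request rate when remaining approaches 0 instead of waiting for a 429.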

Handling rate limits

When you exceed the rate limit, the API returns 429 Too Many Requests. The error code in the response body identifies which limit was hit: read endpoints return RATE_LIMITED, and POST /v1/generations returns GENERATION_RATE_LIMITED.

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests on a read endpoint — wait and retry"
  }
}

{
  "error": {
    "code": "GENERATION_RATE_LIMITED",
    "message": "Too many POST /v1/generations calls — wait and retry"
  }
}
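Since the two error codes map to limits of different sizes, a client may want to tell them apart. The sketch below assumes the response body has the shape shown above; classifyRateLimitError and its return labels ('generation', 'read', 'unknown') are hypothetical names chosen for this example.

```javascript
// Reports which limit was hit, based on the error code in a 429 body.
// Returns null for any status other than 429.
function classifyRateLimitError(status, body) {
  if (status !== 429) return null;
  const code = body?.error?.code;
  if (code === 'GENERATION_RATE_LIMITED') return 'generation';
  if (code === 'RATE_LIMITED') return 'read';
  return 'unknown'; // a 429 with a code this sketch does not recognize
}
```

A caller might pause only its generation queue on 'generation' while letting read requests continue, since the two limits are listed separately.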

Retry strategy

Honor the Retry-After header when it is present; otherwise use exponential backoff with jitter:

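// Retries on 429 only; maxRetries is the total number of attempts.
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    // Any status other than 429 goes straight back to the caller.
    if (response.status !== 429) return response;

    // Last attempt failed: give up now instead of sleeping first.
    if (attempt === maxRetries - 1) break;

    // Prefer the server's Retry-After (seconds). If it is missing or not
    // a number, back off exponentially with up to 1 s of jitter, capped at 30 s.
    const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
    const delay = Number.isNaN(retryAfter)
      ? Math.min(1000 * 2 ** attempt + Math.random() * 1000, 30000)
      : Math.max(retryAfter, 0) * 1000;

    await new Promise(resolve => setTimeout(resolve, delay));
  }

  throw new Error('Max retries exceeded');
}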
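To make the fallback formula concrete, the sketch below computes the range of delays it can produce for each attempt when no Retry-After header is available. backoffBounds is a hypothetical helper written for this illustration; its defaults mirror the constants in the code above (1000 ms base, up to 1000 ms of jitter, 30000 ms cap).

```javascript
// Bounds of the fallback delay min(base * 2^attempt + jitter, cap),
// where jitter is drawn from [0, jitterMs).
function backoffBounds(attempt, baseMs = 1000, jitterMs = 1000, capMs = 30000) {
  const exponential = baseMs * 2 ** attempt;
  return {
    minMs: Math.min(exponential, capMs),            // jitter of 0
    maxMs: Math.min(exponential + jitterMs, capMs), // upper bound of the jitter
  };
}

for (let attempt = 0; attempt < 6; attempt++) {
  const { minMs, maxMs } = backoffBounds(attempt);
  console.log(`attempt ${attempt}: ${minMs}-${maxMs} ms`);
}
// attempt 0: 1000-2000 ms
// attempt 1: 2000-3000 ms
// attempt 2: 4000-5000 ms
// attempt 3: 8000-9000 ms
// attempt 4: 16000-17000 ms
// attempt 5: 30000-30000 ms
```

The jitter spreads out clients that were rate limited at the same moment, so they do not all retry together, and the cap keeps a long run of failures from producing unbounded waits.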

Best practices

  • Prefer webhooks over polling: Set a webhook URL on POST /v1/generations and we'll deliver the result to your endpoint, so you don't have to poll for it. See Webhooks.
  • Cache responses: Don't re-fetch data that hasn't changed.
  • Use sensible polling intervals: If you do poll, check generation status every 2–5 seconds, not continuously.
  • Batch requests: When listing generations, use larger limit values instead of many small requests.
  • Use the bulk endpoints: Bulk delete and the playlist endpoints accept up to 100 IDs per request, and a bulk call counts as one request against the limit.
  • Handle 429s gracefully: Always implement retry logic with backoff.
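As a sketch of the bulk-endpoint advice above, a long list of IDs can be split into groups of at most 100 before calling a bulk endpoint. chunkIds is a hypothetical helper; the bulk endpoint paths and request bodies are not shown on this page, so the sketch stops at producing the groups.

```javascript
// Splits a list of IDs into groups of at most `size`
// (100 is the per-request maximum stated for the bulk endpoints).
function chunkIds(ids, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < ids.length; i += size) {
    chunks.push(ids.slice(i, i + size));
  }
  return chunks;
}
```

With this grouping, 250 IDs are covered by 3 bulk calls (100, 100, and 50 IDs) rather than 250 individual requests, which matters when each call counts once against a 60-per-minute limit.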
