Rate Limits

LatentKit applies rate limits to protect the platform and ensure fair use across workspaces.

What to expect

Limits may be applied at the edge, before a request reaches the origin.
Rate-limited responses use standard HTTP error semantics, with JSON bodies where applicable.
Retry with backoff and jitter after a 429 or a retryable 5xx response.
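
The retry guidance above can be sketched roughly as follows. This is a minimal illustration, not LatentKit client code: the set of retryable 5xx codes, the base delay, the cap, and the attempt count are all assumptions chosen for the example, and `send` is a hypothetical callable that performs one request and returns a `(status, body)` pair.

```python
import random
import time

# Assumption: this page does not list which 5xx codes are retryable.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Return a full-jitter delay in seconds for a zero-based retry attempt."""
    # Exponential ceiling, bounded by cap so delays do not grow without limit.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point between 0 and the ceiling.
    return rng() * ceiling


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until the status is not retryable or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        # Sleep only if another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body
```

Randomizing the whole delay, rather than adding a small random offset, spreads out clients that were rate limited at the same moment so they do not all retry together. Non-retryable statuses such as 413 are returned to the caller on the first attempt.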

Request size limits

Alongside rate limits, LatentKit enforces payload size limits and rejects oversized input early:

Oversized file uploads return 413 with a structured JSON error. The limit is enforced while the upload streams, so a missing or dishonest Content-Length header does not bypass it.
Queue requests have a separate payload cap; see Queue.
Remote audio and image URL fetches have modality-specific byte caps and require a Content-Length header; see Audio and STT, and Vision.

A 413 is never retryable with the same payload; reduce the input size instead.
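
Because a 413 cannot be fixed by retrying, it can help to check input size on the client before sending. The sketch below counts bytes as it reads, the same idea as enforcing a limit while a stream is in flight, and stops as soon as the cap is exceeded. The `max_bytes` value is supplied by the caller; the actual LatentKit limits are not stated on this page, and `PayloadTooLarge` and `read_capped` are hypothetical names used only for illustration.

```python
import io


class PayloadTooLarge(Exception):
    """Raised locally when input exceeds the caller-supplied cap."""


def read_capped(stream, max_bytes, chunk_size=64 * 1024):
    """Read a binary stream fully, failing as soon as max_bytes is exceeded."""
    total = 0
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Check after every chunk so an oversized input is rejected early
        # instead of being buffered in full first.
        if total > max_bytes:
            raise PayloadTooLarge(f"input exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# Example with an in-memory stream and an arbitrary illustrative cap.
data = read_capped(io.BytesIO(b"small payload"), max_bytes=1024)
```

Counting bytes actually read, rather than trusting a declared length, keeps the check accurate when the size is unknown up front.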

Best practices

Run LatentKit calls server-side so you can centralize retries and logging.
Record X-LK-Request-ID in your logs so individual requests can be traced.
Use queue endpoints for long-running batch work when appropriate.
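
One way to carry the request ID into logs is sketched below. It assumes X-LK-Request-ID arrives as a response header, which this page does not state explicitly, and it takes headers as a plain mapping so it does not depend on a particular HTTP client. The logger name and the helper names are hypothetical.

```python
import logging

logger = logging.getLogger("latentkit.client")  # hypothetical logger name

REQUEST_ID_HEADER = "X-LK-Request-ID"


def extract_request_id(headers):
    """Look up the request ID in a header mapping, ignoring header case."""
    wanted = REQUEST_ID_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None


def log_response(status, headers):
    """Log the status with the request ID (None if absent) and return the ID."""
    request_id = extract_request_id(headers)
    logger.info("LatentKit response status=%s request_id=%s", status, request_id)
    return request_id
```

HTTP header names are case-insensitive, so the lookup compares lowercased names instead of relying on exact spelling. Calling `log_response` from the one server-side place that issues LatentKit requests keeps the ID attached to every logged outcome, including 429 and 413 responses.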

Plan limits

Workspace plans may enforce additional fair-use or billing limits beyond HTTP rate limits. Budget and plan errors return typed JSON; see Error handling.
