Rate Limits
Fair-use limits and per-token rate limiting on the LatentKit API edge.
LatentKit applies rate limits to protect the platform and ensure fair use across workspaces.
What to expect
- Limits may apply at the edge before requests reach origin
- Rate-limited responses use standard HTTP error semantics with JSON bodies where applicable
- Include backoff and jitter when retrying after 429 or retryable 5xx responses
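The retry guidance above can be sketched as exponential backoff with full jitter. This is a minimal illustration, not LatentKit client code: the helper names (backoff_delay, call_with_retries), the set of 5xx codes treated as retryable, and the base, cap, and attempt defaults are all assumptions chosen for the example.

```python
import random
import time

# Assumption: which statuses count as retryable. The docs only say
# "429 or retryable 5xx"; adjust this set to match your error handling.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Return a full-jitter delay in seconds for a 0-indexed retry attempt.

    The ceiling doubles each attempt (base, 2*base, 4*base, ...) up to cap,
    and the actual delay is drawn uniformly from [0, ceiling).
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    The last response is returned even if it is still retryable.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body
```

Jitter spreads retries from many clients over time so they do not all hit the edge again at the same instant. Passing sleep and rng as parameters keeps the helper easy to test without real delays.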
IDE access tokens
Short-lived IDE access tokens (lkia_ prefix) follow separate per-token rate limits. Normal application API keys use the standard app key path documented in Authentication.
Best practices
- Run LatentKit calls server-side so you can centralize retries and logging
- Propagate X-LK-Request-ID in your logs
- Use queue endpoints for long-running batch work when appropriate
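As one way to carry X-LK-Request-ID into server-side logs, the sketch below reads the header from a response header mapping and attaches it to a log record. It assumes the header is present on responses; the logger name, the lk_request_id field, and the helper names are hypothetical choices for this example.

```python
import logging

logger = logging.getLogger("latentkit.client")  # hypothetical logger name

REQUEST_ID_HEADER = "X-LK-Request-ID"


def request_id_from(headers):
    """Look up the request ID in a header mapping, ignoring header case."""
    wanted = REQUEST_ID_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None


def log_response(method, path, status, headers):
    """Log one API response with its request ID and return the ID.

    429 and 5xx responses are logged at WARNING so rate limiting stands
    out; everything else is logged at INFO.
    """
    request_id = request_id_from(headers)
    level = logging.WARNING if status == 429 or status >= 500 else logging.INFO
    logger.log(
        level,
        "latentkit %s %s -> %d",
        method,
        path,
        status,
        extra={"lk_request_id": request_id},  # available to log formatters
    )
    return request_id
```

Calling log_response from the single place where your server issues LatentKit requests keeps the request ID next to every retry and failure in your logs.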
Plan limits
Workspace plans may enforce additional fair-use or billing limits beyond HTTP rate limits. Budget and plan errors return typed JSON; see Error handling.