LatentKit

Rate Limits

Fair-use limits and per-token rate limiting on the LatentKit API edge.

LatentKit applies rate limits to protect the platform and ensure fair use across workspaces.

What to expect

  • Limits may be enforced at the API edge, before a request reaches the origin
  • Rate-limited responses use standard HTTP status codes (such as 429), with a JSON body where applicable
  • Use backoff with jitter when retrying after a 429 or a retryable 5xx response
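
The retry guidance above can be sketched as follows. This is a minimal illustration of exponential backoff with full jitter, not LatentKit's own client code. The set of retryable 5xx codes, the base and cap values, and the `send` callable (anything that performs one request and returns a status and body) are all assumptions made for the example.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Assumption: this page does not list which 5xx codes are retryable;
# 502/503/504 are a common choice, not a documented LatentKit rule.
RETRYABLE_STATUSES = {429, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable that performs one request and
    returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        # Stop on success, on a non-retryable error, or when out of attempts.
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return status, body
        sleep(backoff_delay(attempt - 1))
```

Randomizing the whole delay, rather than adding a small random offset, spreads retries from many clients apart so they do not hit the edge again at the same moment.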

IDE access tokens

Short-lived IDE access tokens (lkia_ prefix) follow separate per-token rate limits. Normal application API keys use the standard app key path documented in Authentication.
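
Because IDE access tokens are limited per token, a client that handles both credential types may want to track throttling state separately for each one. The sketch below is one hypothetical way to derive a tracking key from a credential. Only the `lkia_` prefix comes from this page. The function names, the single shared `"app"` bucket for application keys, and the use of a hash are assumptions for illustration.

```python
import hashlib

IDE_TOKEN_PREFIX = "lkia_"  # prefix stated in this page


def is_ide_token(credential: str) -> bool:
    """True when the credential looks like a short-lived IDE access token."""
    return credential.startswith(IDE_TOKEN_PREFIX)


def throttle_bucket(credential: str) -> str:
    """Return a client-side key for grouping rate-limit state.

    Assumption: IDE tokens get one bucket each (per-token limits), while
    application keys share a single bucket. The token is hashed so the
    raw secret never ends up in logs or metrics labels.
    """
    if is_ide_token(credential):
        digest = hashlib.sha256(credential.encode("utf-8")).hexdigest()[:12]
        return f"ide:{digest}"
    return "app"
```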

Best practices

  • Run LatentKit calls server-side so you can centralize retries and logging
  • Propagate X-LK-Request-ID in your logs
  • Use queue endpoints for long-running batch work when appropriate
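
As an illustration of the logging practice above, the following sketch pulls the request ID out of a response's headers and writes it to a log line. It assumes `X-LK-Request-ID` arrives as a response header and that headers are available as a plain mapping. The logger name and log line format are hypothetical.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("latentkit.client")  # hypothetical logger name

REQUEST_ID_HEADER = "x-lk-request-id"


def request_id_from(headers: Mapping[str, str]) -> Optional[str]:
    """Look up X-LK-Request-ID case-insensitively; None when absent."""
    for name, value in headers.items():
        if name.lower() == REQUEST_ID_HEADER:
            return value
    return None


def log_response(status: int, headers: Mapping[str, str]) -> str:
    """Log one line per response, tagged with the request ID when present."""
    request_id = request_id_from(headers) or "-"
    line = f"latentkit status={status} request_id={request_id}"
    # Error responses (including 429) are logged at WARNING so they stand out.
    logger.log(logging.WARNING if status >= 400 else logging.INFO, line)
    return line
```

Keeping the ID in every log line makes it possible to correlate a throttled or failed call with a specific request when investigating it later.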

Plan limits

Workspace plans may enforce additional fair-use or billing limits on top of HTTP rate limits. Budget and plan errors return typed JSON; see Error handling.
