Troubleshooting
Diagnose and fix common errors returned by the LLMAI API.
HTTP 401 — Authentication Failed
What it means: The server couldn't validate your API key. This happens when the key is absent, malformed, or has been deleted.
How to fix it:
- Confirm the Authorization header is present and spelled correctly
- Make sure there is no extra whitespace around the key value
- If the key may have been deleted or rotated, generate a new one at console.llmai.dev/keys
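Stray whitespace is often a newline copied along with the key from the dashboard. A minimal sketch of defensive key loading (the `LLMAI_API_KEY` variable name and `load_api_key` helper are illustrative, not part of any SDK):

```python
import os

def load_api_key(env_var="LLMAI_API_KEY"):
    """Load the API key and strip stray whitespace/newlines."""
    key = os.environ.get(env_var, "")
    key = key.strip()  # removes the invisible "\n" that breaks the header
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key
```
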
# The header must look exactly like this
-H "Authorization: Bearer sk-llmaai-your-key-here"

HTTP 402 — Account Balance Depleted
What it means: Your prepaid credit has run out. The API will block all requests until you reload.
How to fix it: Add credit at console.llmai.dev/billing. Requests resume working immediately — there is no waiting period after a top-up.
HTTP 404 — Model Not Recognized
What it means: The value in your model field doesn't match any model the API knows about.
How to fix it: Copy the slug exactly from the Models reference. Slugs are case-sensitive — lowercase only.
// Wrong — uppercase
{ "model": "GPT-5.4" }
// Correct — lowercase slug from the docs
{ "model": "gpt-5.4" }

HTTP 400 — Invalid Request
What it means: The request body is malformed, missing a required field, or contains an out-of-range value.
Common causes:
- messages array is absent or empty
- temperature is set outside the 0–2 range
- A message object is missing role or content
- JSON is syntactically invalid (unclosed braces, trailing commas)
How to fix it: Validate your payload against the API Reference.
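The common causes above can be caught before sending the request. A minimal pre-flight check (the field names messages, temperature, role, and content come from this page; the validate_payload function itself is an illustrative sketch, not an official helper):

```python
def validate_payload(payload):
    """Return a list of problems with a chat-completion request body."""
    errors = []
    messages = payload.get("messages")
    if not messages:
        errors.append("messages array is absent or empty")
    else:
        for i, msg in enumerate(messages):
            for field in ("role", "content"):
                if field not in msg:
                    errors.append(f"messages[{i}] is missing {field!r}")
    temp = payload.get("temperature")
    if temp is not None and not 0 <= temp <= 2:
        errors.append("temperature is outside the 0-2 range")
    return errors
```

For example, `validate_payload({"model": "gpt-5.4"})` returns `["messages array is absent or empty"]`.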
HTTP 429 — Rate Limit Hit
What it means: You are sending requests faster than the server allows.
How to fix it: Slow down your request rate and implement exponential backoff:
import time

def call_with_backoff(client, **kwargs):
    delay = 1
    for attempt in range(6):
        try:
            return client.chat.completions.create(**kwargs)
        except Exception as exc:
            # Re-raise anything that isn't a rate-limit error
            if "429" not in str(exc):
                raise
            time.sleep(delay)
            delay *= 2  # 1s, 2s, 4s, 8s, 16s, 32s
    raise RuntimeError("Gave up after repeated rate limit errors")

Streaming Responses Not Arriving
What it means: You set stream: true but your client isn't reading the SSE stream correctly.
How to fix it: Use a streaming-aware HTTP client or the OpenAI SDK's streaming helper:
stream = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    text = chunk.choices[0].delta.content
    if text:
        print(text, end="", flush=True)

Output Gets Cut Off
What it means: The model stopped generating because it hit the max_tokens limit you set (or the model's built-in limit).
How to fix it: Raise max_tokens or remove the parameter to let the model use its full default limit.
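Whether a cut-off was caused by the token limit can be confirmed from the response's finish_reason field, which OpenAI-compatible APIs set to "length" when max_tokens was hit and "stop" when the model finished naturally. A minimal sketch against the raw JSON response (assuming the OpenAI-compatible response schema):

```python
def was_truncated(response: dict) -> bool:
    """True if generation stopped at the token limit rather than naturally."""
    # "length" -> hit max_tokens; "stop" -> model finished on its own
    return response["choices"][0]["finish_reason"] == "length"
```

If this returns True, raise max_tokens and retry.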
Still Stuck?
If none of the above resolves your problem, reach out via the community or the console:
- Telegram community: Join us on Telegram — fastest way to get help from the team and other users
- Discord community: Join us on Discord — chat with the team and other developers
- Console: console.llmai.dev
- Contact form: llmai.dev/contact
When writing in, please include the complete error response body, the model slug you were targeting, and the approximate time of the failed request. This speeds up diagnosis significantly.