LLMAI Docs

Troubleshooting

Diagnose and fix common errors returned by the LLMAI API.

HTTP 401 — Authentication Failed

What it means: The server couldn't validate your API key. This happens when the key is absent, malformed, or has been deleted.

How to fix it:

  • Confirm the Authorization header is present and spelled correctly
  • Make sure there is no extra whitespace around the key value
  • If the key may have been deleted or rotated, generate a new one at console.llmai.dev/keys

# The header must look exactly like this
-H "Authorization: Bearer sk-llmai-your-key-here"
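If you build requests in code, a small helper can guard against the whitespace problem above. This is an illustrative sketch (auth_headers is a hypothetical helper, not part of any SDK):

```python
def auth_headers(api_key: str) -> dict:
    # Strip stray whitespace and newlines that often sneak in
    # when a key is copied from the console
    key = api_key.strip()
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dict as the headers of your HTTP request so the Bearer prefix is always formatted correctly.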

HTTP 402 — Account Balance Depleted

What it means: Your prepaid credit has run out. The API will block all requests until you reload.

How to fix it: Add credit at console.llmai.dev/billing. Requests resume working immediately — there is no waiting period after a top-up.


HTTP 404 — Model Not Recognized

What it means: The value in your model field doesn't match any model the API knows about.

How to fix it: Copy the slug exactly from the Models reference. Slugs are case-sensitive — lowercase only.

// Wrong — uppercase
{ "model": "GPT-5.4" }

// Correct — lowercase slug from the docs
{ "model": "gpt-5.4" }
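A defensive option is to normalize and check the slug before sending the request. The KNOWN_MODELS set below is a hypothetical subset for illustration; the authoritative list is the Models reference:

```python
# Hypothetical subset of valid slugs; consult the Models reference for the real list
KNOWN_MODELS = {"gpt-5.4", "gemini-3-flash-preview"}

def check_model_slug(slug: str) -> str:
    # Slugs are lowercase, so normalize before comparing
    normalized = slug.strip().lower()
    if normalized not in KNOWN_MODELS:
        raise ValueError(f"unknown model slug: {slug!r}")
    return normalized
```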

HTTP 400 — Invalid Request

What it means: The request body is malformed, missing a required field, or contains an out-of-range value.

Common causes:

  • messages array is absent or empty
  • temperature is set outside the 0–2 range
  • A message object is missing role or content
  • JSON is syntactically invalid (unclosed braces, trailing commas)

How to fix it: Validate your payload against the API Reference.
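The common causes listed above can be caught before the request leaves your machine with a quick pre-flight check. validate_payload is an illustrative helper, not part of any SDK:

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a chat request body (empty if none)."""
    errors = []
    messages = payload.get("messages")
    if not messages:
        errors.append("messages array is absent or empty")
    else:
        for i, msg in enumerate(messages):
            # Every message object needs both a role and a content field
            if "role" not in msg or "content" not in msg:
                errors.append(f"message {i} is missing role or content")
    temp = payload.get("temperature")
    if temp is not None and not 0 <= temp <= 2:
        errors.append("temperature must be within the 0-2 range")
    return errors
```

Run it before sending and surface any returned errors instead of waiting for a 400 from the server.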


HTTP 429 — Rate Limit Hit

What it means: You are sending requests faster than the server allows.

How to fix it: Slow down your request rate and implement exponential backoff:

import time

def call_with_backoff(client, **kwargs):
    """Retry a chat completion with exponential backoff on rate limit errors."""
    delay = 1  # seconds to wait before the first retry
    for attempt in range(6):
        try:
            return client.chat.completions.create(**kwargs)
        except Exception as exc:
            # Re-raise anything that isn't a rate limit error
            if "429" not in str(exc):
                raise
            time.sleep(delay)
            delay *= 2  # double the wait each time: 1, 2, 4, 8, 16, 32 seconds
    raise RuntimeError("Gave up after repeated rate limit errors")

Streaming Responses Not Arriving

What it means: You set stream: true but your client isn't reading the SSE stream correctly.

How to fix it: Use a streaming-aware HTTP client or the OpenAI SDK's streaming helper:

stream = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    text = chunk.choices[0].delta.content
    if text:
        print(text, end="", flush=True)

Output Gets Cut Off

What it means: The model stopped generating because it hit the max_tokens limit you set (or the model's built-in limit).

How to fix it: Raise max_tokens or remove the parameter to let the model use its full default limit.


Still Stuck?

If none of the above resolves your problem, reach out via the community or the console.

When writing in, please include the complete error response body, the model slug you were targeting, and the approximate time of the failed request. This speeds up diagnosis significantly.
