LLMAI Docs

API Reference

Complete reference for the LLMAI API — endpoint, authentication, request format, and response structure.

Endpoint

Every request goes to:

https://api.llmai.dev/v1

LLMAI implements the OpenAI Chat Completions interface, so any library or tooling built for that format works out of the box — just swap the base URL and API key.
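As a minimal sketch using only the Python standard library (the key and model slug are placeholders), a complete request needs nothing beyond the base URL, your key, and an OpenAI-format body:

```python
import json
import urllib.request

BASE_URL = "https://api.llmai.dev/v1"  # the only URL an OpenAI-format client must change

def chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request against LLMAI."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("YOUR_API_KEY", "gpt-5.4",
                   [{"role": "user", "content": "Hello"}])
# Sending it requires a valid key:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
```

Any OpenAI SDK or tool that lets you override the base URL follows the same pattern: point it at https://api.llmai.dev/v1 and supply an LLMAI key.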

Authentication

Attach your LLMAI key to every request using the Authorization header:

Authorization: Bearer YOUR_API_KEY

Keys are generated in the developer console. A missing, invalid, or expired key returns 401 Unauthorized.

Compatibility Summary

| Parameter | Value |
| --- | --- |
| Base URL | https://api.llmai.dev/v1 |
| Key format | sk-llmai-... |
| Chat endpoint | /v1/chat/completions |
| Model list endpoint | /v1/models |
| Wire protocol | OpenAI Chat Completions |

Sending a Chat Request

Required Fields

{
  "model": "gpt-5.4",
  "messages": [
    { "role": "system", "content": "You are a precise technical writer." },
    { "role": "user",   "content": "Describe API rate limits in two sentences." }
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| model | string | Exact model slug — see the Models page |
| messages | array | List of message objects, each with role and content |

Optional Fields

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| temperature | float | 1.0 | Controls output randomness; valid range 0–2 |
| max_tokens | int | model max | Hard cap on tokens generated in this response |
| stream | bool | false | Set true to receive incremental SSE chunks |
| top_p | float | 1.0 | Nucleus sampling — alternative to temperature |
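A sketch of a request body with optional fields layered onto the required ones; the values here are illustrative choices within the documented ranges, not recommendations:

```python
# Illustrative request body combining required and optional fields.
payload = {
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Summarize HTTP caching."}],
    "temperature": 0.3,   # 0–2; lower means more deterministic output
    "max_tokens": 200,    # hard cap on tokens generated in this response
    "top_p": 1.0,         # nucleus sampling; typically tune this OR temperature
    "stream": False,      # True switches the response to SSE chunks
}
```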

Response Structure

{
  "id": "chatcmpl-xyz789",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-5.4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Rate limits cap the number of requests per unit time..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 18,
    "total_tokens": 48
  }
}

The usage block reflects the exact token counts billed for this call.
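The fields most clients read from this structure, shown against the sample response above (content abbreviated):

```python
# Reading the commonly used fields from a completion response.
response = {
    "id": "chatcmpl-xyz789",
    "object": "chat.completion",
    "model": "gpt-5.4",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant",
                    "content": "Rate limits cap the number of requests..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 30, "completion_tokens": 18, "total_tokens": 48},
}

text = response["choices"][0]["message"]["content"]    # the assistant reply
finish = response["choices"][0]["finish_reason"]       # e.g. "stop"
billed = response["usage"]["total_tokens"]             # what this call cost

# total_tokens is the sum of prompt and completion counts
assert billed == (response["usage"]["prompt_tokens"]
                  + response["usage"]["completion_tokens"])
```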

Streaming

Pass "stream": true to receive a stream of server-sent events (SSE). The connection stays open and each chunk arrives as a data: line:

curl https://api.llmai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "List five programming languages."}],
    "stream": true
  }'

The stream terminates with a final data: [DONE] line. Handle it in your client to know when the response is complete.
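A sketch of client-side handling, assuming each chunk follows the OpenAI streaming shape (incremental text under choices[0].delta.content); the synthetic chunks below are for illustration only:

```python
import json

def accumulate_sse(lines):
    """Collect assistant text from OpenAI-style SSE 'data:' lines.

    Stops at the 'data: [DONE]' sentinel that terminates the stream.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-response marker
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Synthetic chunks in the assumed delta format:
stream = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Python, "}}]}',
    'data: {"choices":[{"delta":{"content":"Go"}}]}',
    "data: [DONE]",
]
print(accumulate_sse(stream))  # prints "Python, Go"
```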

Error Reference

| Status | Meaning | Action |
| --- | --- | --- |
| 400 | Malformed request body | Check JSON syntax and required fields |
| 401 | Invalid or missing API key | Verify the Authorization header |
| 402 | Insufficient balance | Add credit at console.llmai.dev/billing |
| 404 | Unknown model slug | Cross-check slug spelling in the Models reference |
| 429 | Request rate exceeded | Reduce request frequency or add retry backoff |
| 500 | Upstream error | Retry; open a support ticket if it persists |
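The retry advice for 429 and 500 can be sketched as exponential backoff; the function names and attempt counts below are illustrative, not part of the API:

```python
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 500}  # per the table: rate limit and upstream errors

def backoff_delays(attempts=4, base_delay=1.0):
    """Delay schedule between attempts: 1s, 2s, 4s for four attempts."""
    return [base_delay * (2 ** i) for i in range(attempts - 1)]

def send_with_backoff(req, attempts=4, base_delay=1.0):
    """Send a request, retrying only the retryable statuses (sketch)."""
    delays = backoff_delays(attempts, base_delay)
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise  # 400/401/402/404 need a client-side fix, not a retry
            time.sleep(delays[attempt])
```

Statuses like 400 or 404 are re-raised immediately, since repeating the same malformed request cannot succeed.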
