LLM Resayil now supports the Anthropic Messages API format alongside the existing OpenAI-compatible endpoint.
Use POST /v1/messages with the same API keys, same credits, and same backend models.
If you already use the Anthropic SDK, point it at our base URL and start making requests immediately.
The Anthropic Messages API is an alternative request/response format for interacting with LLM Resayil models.
While our OpenAI-compatible /v1/chat/completions endpoint remains fully supported, the Messages API
provides a familiar interface for developers already using the Anthropic ecosystem.
Streaming is supported via typed server-sent events (message_start, content_block_delta, etc.).
Note: You do not need separate API keys for the Messages API. Your existing LLM Resayil API keys
work with both /v1/chat/completions and /v1/messages.
The Messages API accepts the same authentication methods as all other LLM Resayil endpoints.
You can use either the standard Authorization: Bearer header or the Anthropic-style x-api-key header.
Authorization: Bearer YOUR_API_KEY
x-api-key: YOUR_API_KEY
Both methods are equivalent. The x-api-key header is supported for compatibility with the official
Anthropic SDK, which sends credentials this way by default.
Anthropic SDK users: The SDK sends x-api-key automatically. You only need to
configure the base_url and provide your LLM Resayil API key as the api_key parameter.
Send a POST request to /v1/messages with a JSON body
containing your model selection and messages.
POST https://llmapi.resayil.io/v1/messages
curl -X POST https://llmapi.resayil.io/v1/messages \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "mistral",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "The capital of France is Paris."
}
],
"model": "mistral",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 14,
"output_tokens": 9
}
}
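Note that content is an array of blocks rather than a single string, so extract text by filtering for text blocks. A minimal sketch in Python, where resp stands in for the parsed JSON response above:

```python
# `resp` stands in for the parsed JSON response shown above.
resp = {
    "content": [
        {"type": "text", "text": "The capital of France is Paris."}
    ],
    "stop_reason": "end_turn",
}

# Concatenate all text blocks; tool_use blocks carry no text.
text = "".join(
    block["text"] for block in resp["content"] if block["type"] == "text"
)
print(text)
```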
The full request body specification for POST /v1/messages.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model to use (e.g., "mistral", "llama3.1"). See Available Models. |
| messages | array | Yes | Array of message objects with role and content fields. |
| max_tokens | integer | Yes | Maximum number of tokens to generate. Must be greater than 0. |
| system | string | No | System prompt. In the Anthropic format, the system message is a top-level field, not part of the messages array. |
| temperature | float | No | Sampling temperature between 0.0 and 1.0. Default: model-dependent. |
| top_p | float | No | Nucleus sampling parameter. Default: model-dependent. |
| top_k | integer | No | Top-K sampling parameter. Only sample from the top K most likely tokens. |
| stop_sequences | array | No | Array of strings that will cause the model to stop generating when encountered. |
| stream | boolean | No | Set to true for streaming responses via SSE. Default: false. |
| tools | array | No | Array of tool definitions for function calling. See Tool Calling. |
| tool_choice | object | No | Controls tool selection: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}. |
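Because max_tokens is required here (unlike the OpenAI format), a small preflight check can catch the most common 400 error before sending. A hypothetical helper, not part of the API:

```python
def validate_messages_request(body: dict) -> list[str]:
    """Return a list of problems with a /v1/messages request body."""
    problems = []
    # model, messages, and max_tokens are the required fields.
    for field in ("model", "messages", "max_tokens"):
        if field not in body:
            problems.append(f"{field}: field required")
    if isinstance(body.get("max_tokens"), int) and body["max_tokens"] <= 0:
        problems.append("max_tokens: must be greater than 0")
    return problems

# An OpenAI-style request that forgot max_tokens:
errors = validate_messages_request({
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hi"}],
})
print(errors)
```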
Unlike the OpenAI format where the system prompt is a message with "role": "system",
the Anthropic format uses a top-level system field:
{
"model": "mistral",
"max_tokens": 1024,
"system": "You are a helpful assistant who speaks formally.",
"messages": [
{"role": "user", "content": "Greet me."}
]
}
{
"model": "llama3.1",
"max_tokens": 2048,
"messages": [
{"role": "user", "content": "What is machine learning?"},
{"role": "assistant", "content": "Machine learning is a subset of AI..."},
{"role": "user", "content": "Can you give me a practical example?"}
]
}
Non-streaming responses return a complete message object.
| Field | Type | Description |
|---|---|---|
| id | string | Unique message identifier (prefixed with msg_). |
| type | string | Always "message". |
| role | string | Always "assistant". |
| content | array | Array of content blocks. Each block has a type field ("text" or "tool_use"). |
| model | string | The model that generated the response. |
| stop_reason | string | Why generation stopped: "end_turn", "max_tokens", "stop_sequence", or "tool_use". |
| stop_sequence | string or null | The stop sequence that triggered stopping, if applicable. |
| usage | object | Token usage: input_tokens and output_tokens. |
{
"id": "msg_01A2B3C4D5E6F7G8H9I0J1K2",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Here is a Python function that calculates factorial:\n\n```python\ndef factorial(n):\n if n <= 1:\n return 1\n return n * factorial(n - 1)\n```"
}
],
"model": "mistral",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 22,
"output_tokens": 48
}
}
Set "stream": true to receive the response as a stream of server-sent events (SSE).
Each event has an event type and a data payload.
curl -X POST https://llmapi.resayil.io/v1/messages \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "mistral",
"max_tokens": 1024,
"stream": true,
"messages": [
{"role": "user", "content": "Write a haiku about coding."}
]
}'
| Event | Description |
|---|---|
| message_start | Initial event containing the message object with metadata (id, model, role, usage). |
| content_block_start | Indicates a new content block is starting. Contains index and content_block with its type. |
| content_block_delta | Incremental text content. Contains delta.text with the new text fragment. |
| content_block_stop | Indicates the current content block has finished. |
| message_delta | Final message metadata update. Contains stop_reason and updated usage. |
| message_stop | Signals the end of the streamed message. |
| ping | Keep-alive event. Can be safely ignored. |
event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","type":"message","role":"assistant","content":[],"model":"mistral","stop_reason":null,"usage":{"input_tokens":15,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: ping
data: {"type":"ping"}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Lines"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" of"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" code"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}
event: message_stop
data: {"type":"message_stop"}
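The deltas above can be reassembled client-side by accumulating delta.text from each content_block_delta event. A minimal parser sketch, operating on a captured stream for illustration (real code would read lines from the HTTP response):

```python
import json

def assemble_text(sse_body: str) -> str:
    """Accumulate text deltas from a raw Messages API SSE stream."""
    parts = []
    for line in sse_body.splitlines():
        if not line.startswith("data: "):
            continue  # skip event: lines and blank separators
        payload = json.loads(line[len("data: "):])
        if payload.get("type") == "content_block_delta":
            parts.append(payload["delta"]["text"])
    return "".join(parts)

stream = """\
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Lines"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" of"}}

event: message_stop
data: {"type":"message_stop"}
"""
print(assemble_text(stream))
```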
The Messages API supports tool (function) calling using the Anthropic tool-use format.
Define tools in your request, and the model may respond with a tool_use content block.
{
"model": "mistral",
"max_tokens": 1024,
"tools": [
{
"name": "get_weather",
"description": "Get the current weather for a given location.",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country, e.g. London, UK"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
],
"messages": [
{"role": "user", "content": "What is the weather in Kuwait City?"}
]
}
When the model decides to use a tool, the response includes a tool_use content block:
{
"id": "msg_01XYZ",
"type": "message",
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "toolu_01ABC",
"name": "get_weather",
"input": {
"location": "Kuwait City, Kuwait",
"unit": "celsius"
}
}
],
"stop_reason": "tool_use",
"usage": {"input_tokens": 85, "output_tokens": 42}
}
Execute the tool on your end and send the result back:
{
"model": "mistral",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "What is the weather in Kuwait City?"},
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "toolu_01ABC",
"name": "get_weather",
"input": {"location": "Kuwait City, Kuwait", "unit": "celsius"}
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01ABC",
"content": "Currently 38C and sunny in Kuwait City."
}
]
}
]
}
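The round trip above can be automated: scan the assistant response for tool_use blocks, run each tool, and append a user message of tool_result blocks whose tool_use_id matches. A hypothetical sketch (run_tool stands in for your own dispatcher):

```python
def build_tool_result_message(response: dict, run_tool) -> dict:
    """Turn tool_use blocks from a /v1/messages response into the
    follow-up user message expected by the API."""
    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue
        output = run_tool(block["name"], block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the tool_use id
            "content": str(output),
        })
    return {"role": "user", "content": results}

# Example with a stub weather tool:
response = {
    "content": [{
        "type": "tool_use",
        "id": "toolu_01ABC",
        "name": "get_weather",
        "input": {"location": "Kuwait City, Kuwait", "unit": "celsius"},
    }],
    "stop_reason": "tool_use",
}
msg = build_tool_result_message(
    response,
    run_tool=lambda name, args: "Currently 38C and sunny in Kuwait City.",
)
```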
Errors from the Messages API follow the Anthropic error format:
{
"type": "error",
"error": {
"type": "invalid_request_error",
"message": "max_tokens: field required"
}
}
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body, missing required fields, or invalid parameter values. |
| 401 | authentication_error | Missing or invalid API key. |
| 403 | permission_error | Your API key does not have permission for the requested action. |
| 404 | not_found_error | The requested model does not exist or is not available. |
| 429 | rate_limit_error | Too many requests. Wait and retry with exponential backoff. |
| 500 | api_error | Internal server error. Retry the request or contact support. |
| 529 | overloaded_error | The API is temporarily overloaded. Retry after a brief wait. |
Retry strategy: For 429 and 529 errors, implement exponential backoff
starting at 1 second. For 500 errors, retry up to 3 times before failing.
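That strategy can be sketched as a small wrapper. This is illustrative only; send_request is a placeholder for your own HTTP call, and the delays follow the guidance above:

```python
import time
import random

RETRYABLE = {429, 500, 529}

def with_backoff(send_request, max_attempts=4, base_delay=1.0):
    """Retry on 429/500/529, doubling the delay each attempt with jitter."""
    for attempt in range(max_attempts):
        status, body = send_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return status, body

# Example with a fake transport that fails twice, then succeeds.
calls = iter([(529, None), (429, None), (200, {"type": "message"})])
status, body = with_backoff(lambda: next(calls), base_delay=0.01)
```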
Both endpoints coexist on the same backend. Here is a side-by-side comparison:
| Feature | OpenAI Format | Anthropic Format |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/messages |
| HTTP Method | POST | POST |
| Auth Header | Authorization: Bearer | x-api-key or Authorization: Bearer |
| System Message | In messages array: {"role": "system", ...} | Top-level "system" field |
| max_tokens | Optional | Required |
| Response Content | choices[0].message.content (string) | content[0].text (array of blocks) |
| Stop Reason | finish_reason: "stop", "length" | stop_reason: "end_turn", "max_tokens" |
| Usage Tokens | prompt_tokens / completion_tokens | input_tokens / output_tokens |
| Streaming Format | SSE with data: {} lines | SSE with typed event: + data: |
| Tool Calling | functions / tools parameter | tools with input_schema |
| Credit System | Same credits | Same credits |
| API Keys | Same keys | Same keys |
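The mapping in the table can be expressed as a small converter from an OpenAI-style request body to the Anthropic format. A sketch covering only the fields listed above (the default max_tokens value is an arbitrary choice, since the OpenAI format allows omitting it):

```python
def openai_to_anthropic(body: dict, default_max_tokens: int = 1024) -> dict:
    """Convert an OpenAI chat.completions body to a /v1/messages body."""
    system_parts = [m["content"] for m in body["messages"]
                    if m["role"] == "system"]
    out = {
        "model": body["model"],
        # max_tokens is optional in the OpenAI format but required here.
        "max_tokens": body.get("max_tokens", default_max_tokens),
        "messages": [m for m in body["messages"] if m["role"] != "system"],
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)  # system becomes top-level
    return out

converted = openai_to_anthropic({
    "model": "mistral",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
})
```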
Install the official Anthropic SDK and point it at LLM Resayil:
pip install anthropic
import anthropic
client = anthropic.Anthropic(
api_key="YOUR_API_KEY",
base_url="https://llmapi.resayil.io/v1"
)
message = client.messages.create(
model="mistral",
max_tokens=1024,
messages=[
{"role": "user", "content": "Explain quantum computing in simple terms."}
]
)
print(message.content[0].text)
import anthropic
client = anthropic.Anthropic(
api_key="YOUR_API_KEY",
base_url="https://llmapi.resayil.io/v1"
)
with client.messages.stream(
model="mistral",
max_tokens=1024,
messages=[
{"role": "user", "content": "Write a short story about AI."}
]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
apiKey: 'YOUR_API_KEY',
baseURL: 'https://llmapi.resayil.io/v1'
});
const message = await client.messages.create({
model: 'mistral',
max_tokens: 1024,
messages: [
{ role: 'user', content: 'What are the benefits of TypeScript?' }
]
});
console.log(message.content[0].text);
curl -X POST https://llmapi.resayil.io/v1/messages \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "llama3.1",
"max_tokens": 2048,
"system": "You are a senior software architect. Answer concisely.",
"messages": [
{"role": "user", "content": "When should I use microservices vs monolith?"}
]
}'
What is and is not currently supported in our Messages API implementation:
| Feature | Status | Notes |
|---|---|---|
| Text messages | Supported | Full support for text input/output. |
| System prompt | Supported | Top-level system field. |
| Multi-turn conversations | Supported | Alternating user/assistant messages. |
| Streaming (SSE) | Supported | All standard event types. |
| Tool calling | Supported | Tool definitions, tool_use responses, tool_result messages. |
| Temperature / top_p / top_k | Supported | Standard sampling parameters. |
| Stop sequences | Supported | Custom stop strings. |
| Image input (vision) | Not Supported | Image content blocks are not accepted. Use text-only messages. |
| PDF input | Not Supported | Document content blocks are not accepted. |
| Prompt caching | Not Supported | The cache_control parameter is ignored. |
| Batch API | Not Supported | Use individual requests. |
| Extended thinking | Not Supported | The thinking parameter is not available. |
If you already use the Anthropic SDK, switching to LLM Resayil requires minimal changes.
Set the following environment variables for your application:
ANTHROPIC_BASE_URL=https://llmapi.resayil.io/v1
ANTHROPIC_API_KEY=sk-your-llm-resayil-api-key
Credits are deducted identically whether you use the OpenAI or Anthropic endpoint.
Token usage reported in the response (input_tokens and output_tokens)
maps directly to credit consumption based on the model tier.
Check your credit balance on the dashboard or via the Credits documentation.
If you are migrating from the official Anthropic API:
1. Change base_url to https://llmapi.resayil.io/v1.
2. Change model to one of our available models (e.g., "mistral", "llama3.1").
3. Remove unsupported parameters (e.g., cache_control, image blocks).
4. For the development environment, use https://llmdev.resayil.io/v1.
That's it. Your existing code, error handling, and streaming logic all work unchanged. The response format is identical to what the official Anthropic API returns.
| Problem | Cause | Solution |
|---|---|---|
| "max_tokens: field required" | Missing max_tokens in request body | Add "max_tokens": 1024 (required for the Messages API, unlike OpenAI). |
| 401 authentication_error | Invalid or missing API key | Check that x-api-key or Authorization: Bearer is set correctly. |
| "model not found" | Using an Anthropic model name like "claude-3-sonnet" | Use LLM Resayil model names. See Available Models. |
| Empty response content | max_tokens set too low | Increase max_tokens to give the model room to respond. |
| Streaming not working | Missing "stream": true or incorrect SSE parsing | Ensure stream is a boolean (not a string). Parse event: and data: lines. |
| Tool result not accepted | Mismatched tool_use_id | The tool_use_id in your tool_result must match the id from the tool_use response. |
Yes. The OpenAI /v1/chat/completions and Anthropic /v1/messages endpoints
coexist independently. You can call either one using the same API key. Credits are shared across both.
No. Your existing LLM Resayil API key works for both endpoints. No additional configuration is needed.
All models available on LLM Resayil work with both endpoints. The model names are the same
(e.g., "mistral", "llama3.1"). Note that Anthropic-specific model names
like "claude-3-sonnet" are not available.
No. Credits are calculated identically regardless of which endpoint you use. The same model with the same token count costs the same credits.
Not currently. Only text content blocks are supported. Image and PDF input will be rejected with an error.
It is recommended but not strictly required. If omitted, the API defaults to the latest supported version.
For production use, pin to 2023-06-01 for consistent behavior.