
n8n Integration Guide

Connect LLM Resayil to your n8n workflows for automated AI processing. This guide covers setup, streaming configuration, timeout tuning for large models, error handling patterns, and production best practices.

Quick Start

LLM Resayil is OpenAI-compatible — use the HTTP Request node in n8n to call it directly.

Step 1: Add an HTTP Request node

Configure the node with these settings:

HTTP Request Node
Method:         POST
URL:            https://llmapi.resayil.io/v1/chat/completions
Authentication: Header Auth
  Header Name:  Authorization
  Header Value: Bearer YOUR_API_KEY
Content-Type:   application/json
Step 2: Set the JSON body

Switch body mode to JSON and use this payload:

Request Body
{
  "model": "qwen3:14b",
  "messages": [
    {
      "role": "user",
      "content": "Summarize: {{ $json.text }}"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.7
}
Step 3: Extract the response

In the next node, read the AI response from {{ $json.choices[0].message.content }}.

Tip: Get your API key from the Dashboard under API Keys. Each key has its own usage tracking.
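Outside n8n, the same extraction and validation can be sketched in Python. The response below is an illustrative sample in the standard OpenAI-compatible chat-completions shape; the helper fails loudly if the expected fields are missing or empty, which is the same check recommended for workflows.

```python
def extract_content(response: dict) -> str:
    """Pull the assistant text from an OpenAI-compatible response,
    raising if choices[0].message.content is missing or empty."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    content = (choices[0].get("message") or {}).get("content")
    if not content:
        raise ValueError("model returned empty content")
    return content

# Illustrative response, not real API output
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Here is the summary."}}
    ]
}
print(extract_content(sample))  # → Here is the summary.
```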

Streaming Configuration

Streaming sends tokens as they're generated, reducing perceived latency. It's essential for large models that take a long time to produce the first token.

Enable Streaming

Add "stream": true to your request body:

Streaming Request
{
  "model": "deepseek-v3.2",
  "messages": [
    {"role": "user", "content": "Write a detailed analysis..."}
  ],
  "stream": true,
  "max_tokens": 4000
}
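Under the hood, a streamed response arrives as server-sent events, with each `data:` line carrying a JSON chunk of token deltas in the standard OpenAI-compatible format. A minimal sketch of joining those deltas (the sample lines are illustrative, not captured API output):

```python
import json

def accumulate_sse(lines):
    """Join token deltas from an OpenAI-compatible SSE stream.
    Each event looks like 'data: {...}'; 'data: [DONE]' ends the stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and keep-alive comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(accumulate_sse(stream))  # → Hello
```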

n8n Node Settings for Streaming

  • In HTTP Request node Options, set Response Format to Stream
  • Set Timeout to at least 300000 ms (5 min) for standard models
  • For 200B+ models, set timeout to 600000 ms (10 min)

Important: n8n's default HTTP timeout may be shorter than what large models need. Always set the timeout explicitly in the node Options before assuming the model is too slow.

Timeout Recommendations

Larger models need more time for first-token generation. Our API supports up to 10 minutes for the largest models. Set your n8n timeout accordingly.

Model Size | Examples | n8n Timeout (ms) | Notes
Small (3B–14B) | llama3.2:3b, qwen3:14b, mistral:7b | 120,000 | Fast; default timeout is fine
Medium (24B–32B) | mistral-small3.2:24b, qwen3-vl:32b | 180,000 | Slight delay on first token
Large (70B–120B) | llama3.1:70b, gpt-oss:120b, devstral-2:123b | 300,000 | Streaming recommended
XL (200B+) | llama3.1:405b, deepseek-v3.1:671b, qwen3.5:397b | 600,000 | Always use streaming; 30–60 s first-token wait

Rule of thumb: Use streaming + 600s timeout for any model over 100B. The first-token wait can be 30–60 seconds, but tokens flow fast once generation starts.
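The table above can be rolled into one helper for dynamically configured workflows. The breakpoints come from the table; the function itself is an illustrative assumption, not part of the API:

```python
def n8n_timeout_ms(model_params_b: float) -> int:
    """Suggested HTTP Request node timeout by model size
    (billions of parameters), following the table above."""
    if model_params_b <= 14:
        return 120_000   # Small: default-ish timeout is fine
    if model_params_b <= 32:
        return 180_000   # Medium: slight first-token delay
    if model_params_b <= 120:
        return 300_000   # Large: streaming recommended
    return 600_000       # XL: always stream, expect 30-60 s first token

print(n8n_timeout_ms(14))   # → 120000
print(n8n_timeout_ms(405))  # → 600000
```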

Rate Limits

LLM Resayil enforces per-minute and per-burst limits. Exceeding either returns 429 Too Many Requests.

Tier | Per Minute | Burst (per 10 s)
Basic | 10 requests | 5 requests
Pro | 30 requests | 12 requests
Enterprise | 60 requests | 25 requests

Burst protection: Even with 30 req/min on Pro, sending 15 requests in 2 seconds triggers the burst limit. Space requests with a Wait node (2–3s delay) between calls.

See the full Rate Limits documentation for response headers and backoff strategies.
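One way to pick a safe Wait-node delay is to take the stricter of the two windows. The limits are from the table above; the helper itself is illustrative:

```python
def safe_delay_seconds(per_minute: int, burst_per_10s: int) -> float:
    """Smallest fixed delay between sequential requests that respects
    both the per-minute limit and the 10-second burst limit."""
    return max(60 / per_minute, 10 / burst_per_10s)

print(safe_delay_seconds(30, 12))  # Pro: → 2.0 (matches the 2-3 s guidance)
print(safe_delay_seconds(10, 5))   # Basic: → 6.0
```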

Error Handling

Build resilient workflows by handling common error codes. Use n8n's Retry on Fail option or add explicit error handling nodes.

Status | Meaning | n8n Action
200 | Success | Process response normally
401 | Invalid API key | Check Authorization header value
429 | Rate limit exceeded | Add Wait node (10–30 s), then retry
503 | Service unavailable | Retry with backoff (30 s, 60 s, 120 s)
Timeout | Response too slow | Increase timeout or use a smaller model

Retry Configuration

In the HTTP Request node settings:

  • Retry on Fail: Enabled
  • Max Tries: 3
  • Wait Between Tries: 10,000 ms
  • Continue on Fail: Enable to handle errors gracefully downstream
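The 30 s, 60 s, 120 s backoff suggested for 503 errors is a doubling schedule; when implementing retries yourself (e.g. in a Code node or external script) it can be sketched as follows. The function and its jitter parameter are illustrative, not an n8n setting:

```python
import random

def backoff_schedule(base: float = 30.0, tries: int = 3, jitter: float = 0.0):
    """Doubling backoff: 30, 60, 120 s for three retries.
    Optional jitter spreads simultaneous retries apart."""
    delays = []
    for attempt in range(tries):
        delays.append(base * (2 ** attempt) + random.uniform(0, jitter))
    return delays

print(backoff_schedule())  # → [30.0, 60.0, 120.0]
```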

Best Practices

Choose the Right Model

  • Simple tasks (summarization, classification): qwen3:14b or llama3.2:3b — fast, cheap (0.5x credits)
  • Complex reasoning (analysis, multi-step): deepseek-v3.2 or llama3.1:70b
  • Code generation: qwen2.5-coder:14b or devstral-2:123b

Optimize Costs

  • Set max_tokens to limit response length — saves credits on long outputs
  • Small models (3B–14B) use 0.5x–1x credits; XL models use 3.5x
  • Process batch items sequentially with delays, not in parallel
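If credits scale linearly with tokens and the model multiplier (an assumption about the billing model; the per-model multipliers below are examples consistent with the ranges quoted above, not authoritative pricing), a rough pre-flight estimate looks like:

```python
# Illustrative multipliers; check the Dashboard for your actual rates.
MULTIPLIERS = {"llama3.2:3b": 0.5, "qwen3:14b": 1.0, "deepseek-v3.1:671b": 3.5}

def estimate_credits(model: str, tokens: int, credits_per_1k: float = 1.0) -> float:
    """Rough cost assuming credits scale with tokens x model multiplier."""
    return tokens / 1000 * credits_per_1k * MULTIPLIERS[model]

print(estimate_credits("qwen3:14b", 2000))           # → 2.0
print(estimate_credits("deepseek-v3.1:671b", 2000))  # → 7.0
```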

Workflow Design

  • Always set a timeout matching the model size (see table above)
  • Use streaming for responses longer than a few sentences
  • Don't fire parallel requests beyond your burst limit — use Split In Batches with delays
  • Validate responses — check that choices[0].message.content exists and is non-empty
  • Log errors to a Google Sheet or database for debugging

Example: Text Summarizer

Complete HTTP Request node configuration for a summarization workflow:

Full Node Config (JSON)
{
  "model": "qwen3:14b",
  "messages": [
    {
      "role": "system",
      "content": "You are a professional summarizer. Output only the summary."
    },
    {
      "role": "user",
      "content": "Summarize in 3 bullet points:\n\n{{ $json.input_text }}"
    }
  ],
  "max_tokens": 500,
  "temperature": 0.3
}

Set the timeout in HTTP Request node Options to 120000 ms for this 14B model. For larger models, increase proportionally.
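Outside n8n, the same request body can be built programmatically. This sketch only constructs the payload, with a plain function argument standing in for the {{ $json.input_text }} expression; sending it would use the URL and Authorization header from the Quick Start:

```python
import json

def summarizer_payload(input_text: str) -> dict:
    """Build the chat-completions body from the example above."""
    return {
        "model": "qwen3:14b",
        "messages": [
            {"role": "system",
             "content": "You are a professional summarizer. Output only the summary."},
            {"role": "user",
             "content": "Summarize in 3 bullet points:\n\n" + input_text},
        ],
        "max_tokens": 500,
        "temperature": 0.3,
    }

# Serialize for an HTTP client; the input text here is a placeholder.
body = json.dumps(summarizer_payload("Quarterly report text goes here..."))
```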

Troubleshooting

Problem | Cause | Solution
Timeout error | n8n timeout too short | Increase timeout (see table)
429 Too Many Requests | Burst or per-minute limit hit | Add Wait node (10–30 s) between calls
Empty response | Model returned no content | Check max_tokens is set; verify the model name
401 Unauthorized | Invalid or missing API key | Verify Authorization: Bearer YOUR_API_KEY
Streaming not working | Wrong n8n response format | Set Response Format to Stream in Options
Response cut off | Connection dropped mid-stream | Increase timeout; use a stable network
High credit usage | Large model for a simple task | Use 3B–14B models (0.5x multiplier)

Need help? See Error Codes or contact us via WhatsApp.