
n8n Integration Guide

Connect LLM Resayil to your n8n workflows for automated AI processing. This guide covers setup, streaming configuration, timeout tuning for large models, error handling patterns, and production best practices.

Quick Start

LLM Resayil is OpenAI-compatible — use the HTTP Request node in n8n to call it directly.

Step 1: Add an HTTP Request node

Configure the node with these settings:

HTTP Request Node
Method:         POST
URL:            https://llmapi.resayil.io/v1/chat/completions
Authentication: Header Auth
  Header Name:  Authorization
  Header Value: Bearer YOUR_API_KEY
Content-Type:   application/json
Step 2: Set the JSON body

Switch body mode to JSON and use this payload:

Request Body
{
  "model": "qwen3:14b",
  "messages": [
    {
      "role": "user",
      "content": "Summarize: {{ $json.text }}"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.7
}
Step 3: Extract the response

In the next node, read the AI response from {{ $json.choices[0].message.content }}.

Tip: Get your API key from the Dashboard under API Keys. Each key has its own usage tracking.
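Outside n8n, the same extraction and validation can be sketched in Python. The response below is an illustrative sample in the standard OpenAI-compatible chat-completions shape; the helper fails loudly if the expected fields are missing or empty, which is the same check recommended for workflows.

```python
def extract_content(response: dict) -> str:
    """Pull the assistant text from an OpenAI-compatible response,
    raising if choices[0].message.content is missing or empty."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    content = (choices[0].get("message") or {}).get("content")
    if not content:
        raise ValueError("model returned empty content")
    return content

# Illustrative response, not real API output
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Here is the summary."}}
    ]
}
print(extract_content(sample))  # → Here is the summary.
```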

Streaming Configuration

Streaming sends tokens as they're generated, reducing perceived latency. It's essential for large models that take a long time to produce the first token.

Enable Streaming

Add "stream": true to your request body:

Streaming Request
{
  "model": "deepseek-v3.2",
  "messages": [
    {"role": "user", "content": "Write a detailed analysis..."}
  ],
  "stream": true,
  "max_tokens": 4000
}
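Under the hood, a streamed response arrives as server-sent events, with each `data:` line carrying a JSON chunk of token deltas in the standard OpenAI-compatible format. A minimal sketch of joining those deltas (the sample lines are illustrative, not captured API output):

```python
import json

def accumulate_sse(lines):
    """Join token deltas from an OpenAI-compatible SSE stream.
    Each event looks like 'data: {...}'; 'data: [DONE]' ends the stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and keep-alive comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(accumulate_sse(stream))  # → Hello
```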

n8n Node Settings for Streaming

  • In HTTP Request node Options, set Response Format to Stream
  • Set Timeout to at least 300000 ms (5 min) for standard models
  • For 200B+ models, set timeout to 600000 ms (10 min)

Important: n8n's default HTTP timeout may be shorter than what large models need. Always set the timeout explicitly in the node Options before assuming the model is too slow.

Timeout Recommendations

Larger models need more time for first-token generation. Our API supports up to 10 minutes for the largest models. Set your n8n timeout accordingly.

Model Size | Examples | n8n Timeout (ms) | Notes
Small (3B–14B) | llama3.2:3b, qwen3:14b, mistral:7b | 120,000 | Fast; default timeout is fine
Medium (24B–32B) | mistral-small3.2:24b, qwen3-vl:32b | 180,000 | Slight delay on first token
Large (70B–120B) | llama3.1:70b, gpt-oss:120b, devstral-2:123b | 300,000 | Streaming recommended
XL (200B+) | llama3.1:405b, deepseek-v3.1:671b, qwen3.5:397b | 600,000 | Always use streaming; 30–60 s first-token wait

Rule of thumb: Use streaming + 600s timeout for any model over 100B. The first-token wait can be 30–60 seconds, but tokens flow fast once generation starts.
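The table above can be rolled into one helper for dynamically configured workflows. The breakpoints come from the table; the function itself is an illustrative assumption, not part of the API:

```python
def n8n_timeout_ms(model_params_b: float) -> int:
    """Suggested HTTP Request node timeout by model size
    (billions of parameters), following the table above."""
    if model_params_b <= 14:
        return 120_000   # Small: default-ish timeout is fine
    if model_params_b <= 32:
        return 180_000   # Medium: slight first-token delay
    if model_params_b <= 120:
        return 300_000   # Large: streaming recommended
    return 600_000       # XL: always stream, expect 30-60 s first token

print(n8n_timeout_ms(14))   # → 120000
print(n8n_timeout_ms(405))  # → 600000
```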

Rate Limits

LLM Resayil enforces per-minute and per-burst limits. Exceeding either returns 429 Too Many Requests.

Tier | Per Minute | Burst (per 10 s)
Basic | 10 requests | 5 requests
Pro | 30 requests | 12 requests
Enterprise | 60 requests | 25 requests

Burst protection: Even with 30 req/min on Pro, sending 15 requests in 2 seconds triggers the burst limit. Space requests with a Wait node (2–3s delay) between calls.

See the full Rate Limits documentation for response headers and backoff strategies.
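One way to pick a safe Wait-node delay is to take the stricter of the two windows. The limits are from the table above; the helper itself is illustrative:

```python
def safe_delay_seconds(per_minute: int, burst_per_10s: int) -> float:
    """Smallest fixed delay between sequential requests that respects
    both the per-minute limit and the 10-second burst limit."""
    return max(60 / per_minute, 10 / burst_per_10s)

print(safe_delay_seconds(30, 12))  # Pro: → 2.0 (matches the 2-3 s guidance)
print(safe_delay_seconds(10, 5))   # Basic: → 6.0
```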

Error Handling

Build resilient workflows by handling common error codes. Use n8n's Retry on Fail option or add explicit error handling nodes.

Status | Meaning | n8n Action
200 | Success | Process response normally
401 | Invalid API key | Check Authorization header value
429 | Rate limit exceeded | Add Wait node (10–30 s), then retry
503 | Service unavailable | Retry with backoff (30 s, 60 s, 120 s)
Timeout | Response too slow | Increase timeout or use a smaller model

Retry Configuration

In the HTTP Request node settings:

  • Retry on Fail: Enabled
  • Max Tries: 3
  • Wait Between Tries: 10,000 ms
  • Continue on Fail: Enable to handle errors gracefully downstream
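The 30 s, 60 s, 120 s backoff suggested for 503 errors is a doubling schedule; when implementing retries yourself (e.g. in a Code node or external script) it can be sketched as follows. The function and its jitter parameter are illustrative, not an n8n setting:

```python
import random

def backoff_schedule(base: float = 30.0, tries: int = 3, jitter: float = 0.0):
    """Doubling backoff: 30, 60, 120 s for three retries.
    Optional jitter spreads simultaneous retries apart."""
    delays = []
    for attempt in range(tries):
        delays.append(base * (2 ** attempt) + random.uniform(0, jitter))
    return delays

print(backoff_schedule())  # → [30.0, 60.0, 120.0]
```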

Best Practices

Choose the Right Model

  • Simple tasks (summarization, classification): qwen3:14b or llama3.2:3b — fast, cheap (0.5x credits)
  • Complex reasoning (analysis, multi-step): deepseek-v3.2 or llama3.1:70b
  • Code generation: qwen2.5-coder:14b or devstral-2:123b

Optimize Costs

  • Set max_tokens to limit response length — saves credits on long outputs
  • Small models (3B–14B) use 0.5x–1x credits; XL models use 3.5x
  • Process batch items sequentially with delays, not in parallel
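If credits scale linearly with tokens and the model multiplier (an assumption about the billing model; the per-model multipliers below are examples consistent with the ranges quoted above, not authoritative pricing), a rough pre-flight estimate looks like:

```python
# Illustrative multipliers; check the Dashboard for your actual rates.
MULTIPLIERS = {"llama3.2:3b": 0.5, "qwen3:14b": 1.0, "deepseek-v3.1:671b": 3.5}

def estimate_credits(model: str, tokens: int, credits_per_1k: float = 1.0) -> float:
    """Rough cost assuming credits scale with tokens x model multiplier."""
    return tokens / 1000 * credits_per_1k * MULTIPLIERS[model]

print(estimate_credits("qwen3:14b", 2000))           # → 2.0
print(estimate_credits("deepseek-v3.1:671b", 2000))  # → 7.0
```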

Workflow Design

  • Always set a timeout matching the model size (see table above)
  • Use streaming for responses longer than a few sentences
  • Don't fire parallel requests beyond your burst limit — use Split In Batches with delays
  • Validate responses — check that choices[0].message.content exists and is non-empty
  • Log errors to a Google Sheet or database for debugging

Example: Text Summarizer

Complete HTTP Request node configuration for a summarization workflow:

Full Node Config (JSON)
{
  "model": "qwen3:14b",
  "messages": [
    {
      "role": "system",
      "content": "You are a professional summarizer. Output only the summary."
    },
    {
      "role": "user",
      "content": "Summarize in 3 bullet points:\n\n{{ $json.input_text }}"
    }
  ],
  "max_tokens": 500,
  "temperature": 0.3
}

Set the timeout in HTTP Request node Options to 120000 ms for this 14B model. For larger models, increase proportionally.
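Outside n8n, the same request body can be built programmatically. This sketch only constructs the payload, with a plain function argument standing in for the {{ $json.input_text }} expression; sending it would use the URL and Authorization header from the Quick Start:

```python
import json

def summarizer_payload(input_text: str) -> dict:
    """Build the chat-completions body from the example above."""
    return {
        "model": "qwen3:14b",
        "messages": [
            {"role": "system",
             "content": "You are a professional summarizer. Output only the summary."},
            {"role": "user",
             "content": "Summarize in 3 bullet points:\n\n" + input_text},
        ],
        "max_tokens": 500,
        "temperature": 0.3,
    }

# Serialize for an HTTP client; the input text here is a placeholder.
body = json.dumps(summarizer_payload("Quarterly report text goes here..."))
```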

Troubleshooting

Problem | Cause | Solution
Timeout error | n8n timeout too short | Increase timeout (see table)
429 Too Many Requests | Burst or per-minute limit hit | Add Wait node (10–30 s) between calls
Empty response | Model returned no content | Check max_tokens is set; verify the model name
401 Unauthorized | Invalid or missing API key | Verify Authorization: Bearer YOUR_API_KEY
Streaming not working | Wrong n8n response format | Set Response Format to Stream in Options
Response cut off | Connection dropped mid-stream | Increase timeout; use a stable network
High credit usage | Large model for a simple task | Use 3B–14B models (0.5x multiplier)

Need help? See Error Codes or contact us via WhatsApp.