Connect LLM Resayil to your n8n workflows for automated AI processing. This guide covers setup, streaming configuration, timeout tuning for large models, error handling patterns, and production best practices.
LLM Resayil is OpenAI-compatible — use the HTTP Request node in n8n to call it directly.
Configure the node with these settings:

- Method: POST
- URL: https://llmapi.resayil.io/v1/chat/completions
- Authentication: Header Auth
- Header Name: Authorization
- Header Value: Bearer YOUR_API_KEY
- Content-Type: application/json
Switch body mode to JSON and use this payload:
```json
{
  "model": "qwen3:14b",
  "messages": [
    {
      "role": "user",
      "content": "Summarize: {{ $json.text }}"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.7
}
```
The AI response is at {{ $json.choices[0].message.content }} in the next node.
Tip: Get your API key from the Dashboard under API Keys. Each key has its own usage tracking.
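To see the moving parts outside n8n, here is a minimal sketch in plain JavaScript of the same request the node sends. The helper names are illustrative, not part of any API; the URL, headers, and payload fields come from the settings above, and YOUR_API_KEY stays a placeholder.

```javascript
// Build the same HTTP request the n8n HTTP Request node sends.
// buildChatRequest is an illustrative helper, not an n8n or API function.
function buildChatRequest(apiKey, model, userContent) {
  return {
    url: "https://llmapi.resayil.io/v1/chat/completions",
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`, // Header Auth from the node config
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: userContent }],
      max_tokens: 1000,
      temperature: 0.7,
    }),
  };
}
```

The returned object maps one-to-one onto the node's Method, URL, Header Auth, and JSON body fields.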
Streaming sends tokens as they're generated, reducing perceived latency. Essential for large models that take time to produce the first token.
Add "stream": true to your request body:
```json
{
  "model": "deepseek-v3.2",
  "messages": [
    {"role": "user", "content": "Write a detailed analysis..."}
  ],
  "stream": true,
  "max_tokens": 4000
}
```
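With streaming on, tokens arrive as Server-Sent Events. A minimal sketch of pulling the token text out of one SSE line, assuming the standard OpenAI-compatible `data: {...}` / `data: [DONE]` framing (the helper name is illustrative):

```javascript
// Parse one SSE line from an OpenAI-compatible stream into a token string.
// Returns null for non-token lines: comments, keep-alives, and "[DONE]".
// parseStreamLine is an illustrative helper, not an n8n built-in.
function parseStreamLine(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice(6).trim();
  if (payload === "[DONE]") return null; // end-of-stream marker
  const delta = JSON.parse(payload).choices?.[0]?.delta;
  return delta?.content ?? null;
}
```

Concatenating the non-null results reproduces the full completion text.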
In the HTTP Request node Options, set Response Format to Stream and set the timeout explicitly: 300000 ms (5 min) for standard models, 600000 ms (10 min) for the largest models. Important: n8n's default HTTP timeout may be shorter than what large models need. Always set the timeout explicitly in the node Options before assuming the model is too slow.
Larger models need more time for first-token generation. Our API supports up to 10 minutes for the largest models. Set your n8n timeout accordingly.
| Model Size | Examples | n8n Timeout (ms) | Notes |
|---|---|---|---|
| Small (3B–14B) | llama3.2:3b, qwen3:14b, mistral:7b | 120,000 | Fast — default timeout is fine |
| Medium (24B–32B) | mistral-small3.2:24b, qwen3-vl:32b | 180,000 | Slight delay on first token |
| Large (70B–120B) | llama3.1:70b, gpt-oss:120b, devstral-2:123b | 300,000 | Streaming recommended |
| XL (200B+) | llama3.1:405b, deepseek-v3.1:671b, qwen3.5:397b | 600,000 | Always use streaming, 30–60s first-token wait |
Rule of thumb: Use streaming + 600s timeout for any model over 100B. The first-token wait can be 30–60 seconds, but tokens flow fast once generation starts.
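The timeout table can be captured as a tiny lookup helper, useful in an n8n Code node or an expression. The function and its size buckets are an illustrative sketch of the table above, not an official API:

```javascript
// Suggested n8n timeout (ms) by model parameter count, per the table above.
// sizeB is the parameter count in billions, e.g. 14 for qwen3:14b.
function suggestedTimeoutMs(sizeB) {
  if (sizeB <= 14) return 120_000;   // Small: default timeout is fine
  if (sizeB <= 32) return 180_000;   // Medium: slight first-token delay
  if (sizeB <= 120) return 300_000;  // Large: streaming recommended
  return 600_000;                    // XL: always stream, long first-token wait
}
```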
LLM Resayil enforces per-minute and per-burst limits. Exceeding either returns 429 Too Many Requests.
| Tier | Per Minute | Burst (per 10s) |
|---|---|---|
| Basic | 10 requests | 5 requests |
| Pro | 30 requests | 12 requests |
| Enterprise | 60 requests | 25 requests |
Burst protection: Even with 30 req/min on Pro, sending 15 requests in 2 seconds triggers the burst limit. Space requests with a Wait node (2–3s delay) between calls.
See the full Rate Limits documentation for response headers and backoff strategies.
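For sequential workflows, a safe Wait-node delay is whichever is stricter: the per-minute pacing or the 10-second burst pacing. A sketch of that calculation, using the tier numbers from the table above (the helper is illustrative):

```javascript
// Minimum delay (ms) between sequential requests that satisfies both the
// per-minute limit and the 10-second burst limit for a tier.
function minDelayMs(perMinute, burstPer10s) {
  const perMinuteGap = 60_000 / perMinute; // evenly spread over a minute
  const burstGap = 10_000 / burstPer10s;   // evenly spread over 10 seconds
  return Math.ceil(Math.max(perMinuteGap, burstGap));
}
```

For the Pro tier (30/min, 12 per 10s) this gives 2000 ms, matching the 2–3 s Wait-node guidance above.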
Build resilient workflows by handling common error codes. Use n8n's Retry on Fail option or add explicit error handling nodes.
| Status | Meaning | n8n Action |
|---|---|---|
| 200 | Success | Process response normally |
| 401 | Invalid API key | Check Authorization header value |
| 429 | Rate limit exceeded | Add Wait node (10–30s), then retry |
| 503 | Service unavailable | Retry with backoff (30s, 60s, 120s) |
| Timeout | Response too slow | Increase timeout or use smaller model |
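When n8n's built-in Retry on Fail is not flexible enough, the backoff pattern from the table can be written explicitly in a Code node. A sketch assuming the wrapped call throws an `Error` with a `.status` property (the helper and that convention are illustrative):

```javascript
// Retry an async call on 429/503 using the backoff schedule from the
// table above (30s, 60s, 120s); rethrow anything else immediately.
async function withBackoff(fn, delays = [30_000, 60_000, 120_000]) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retriable = err.status === 429 || err.status === 503;
      if (!retriable || attempt >= delays.length) throw err;
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}
```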
In the HTTP Request node settings, enable Retry on Fail so transient 429 and 503 responses are retried automatically.

Best practices for production workflows:

- For simple tasks, use qwen3:14b or llama3.2:3b — fast, cheap (0.5x credits)
- For complex reasoning, use deepseek-v3.2 or llama3.1:70b
- For coding tasks, use qwen2.5-coder:14b or devstral-2:123b
- Set max_tokens to limit response length — saves credits on long outputs
- Verify choices[0].message.content exists and is non-empty before using it downstream

Complete HTTP Request node configuration for a summarization workflow:
```json
{
  "model": "qwen3:14b",
  "messages": [
    {
      "role": "system",
      "content": "You are a professional summarizer. Output only the summary."
    },
    {
      "role": "user",
      "content": "Summarize in 3 bullet points:\n\n{{ $json.input_text }}"
    }
  ],
  "max_tokens": 500,
  "temperature": 0.3
}
```
Set the timeout in HTTP Request node Options to 120000 ms for this 14B model. For larger models, increase proportionally.
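In the node after the HTTP Request, it is worth guarding against empty replies before passing the text downstream, as the best practices above recommend. A minimal sketch for an n8n Code node (the helper name and error message are illustrative):

```javascript
// Fail fast on an empty LLM reply instead of silently forwarding blank text.
// getSummaryOrThrow is an illustrative helper for an n8n Code node.
function getSummaryOrThrow(response) {
  const content = response?.choices?.[0]?.message?.content;
  if (typeof content !== "string" || content.trim() === "") {
    throw new Error("LLM returned no content; check the model name and max_tokens");
  }
  return content;
}
```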
| Problem | Cause | Solution |
|---|---|---|
| Timeout error | n8n timeout too short | Increase timeout (see table) |
| 429 Too Many Requests | Burst or per-minute limit hit | Add Wait node (10–30s) between calls |
| Empty response | Model returned no content | Check max_tokens is set, verify model name |
| 401 Unauthorized | Invalid or missing API key | Verify Authorization: Bearer KEY |
| Streaming not working | n8n response format wrong | Set Response Format to Stream in Options |
| Response cut off | Connection dropped mid-stream | Increase timeout, use stable network |
| High credit usage | Large model for simple task | Use 3B–14B models (0.5x multiplier) |
Need help? See Error Codes or contact us via WhatsApp.