Introduction to Streaming Anthropic Responses with LLM Resayil
Streaming is a technique that delivers generated tokens from a large language model (LLM) to a client as soon as they are produced, instead of waiting for the entire completion to finish. In real‑time chat applications this means the user sees the assistant’s answer appear word by word, creating a far more interactive experience. For developers building a Next.js chat UI, streaming eliminates the noticeable lag that can make a conversation feel robotic.
LLM Resayil (https://llm.resayil.io) acts as an Anthropic‑compatible gateway that sits on top of a catalog of 39 models. Because it implements the same /v1/messages contract that the Anthropic API uses, you can swap the endpoint URL and keep using the official Anthropic SDK without code changes. The gateway also supports streaming, function calling, and tool use, all billed on a simple pay‑per‑use credit model. Hosting is in the USA, and payments are accepted via Stripe or PayPal in USD. This article walks you through everything you need to stream Anthropic‑style responses from LLM Resayil in a Next.js project.
Quick Comparison
| Feature | LLM Resayil | Direct Anthropic API |
|---|---|---|
| OpenAI compatibility | ✅ | ❌ |
| Anthropic compatibility | ✅ | ✅ |
| Streaming responses | ✅ | ✅ |
| Function calling | ✅ | ✅ |
| Tool use | ✅ | ✅ |
| Pay‑per‑use credits | ✅ (USD) | ✅ (USD) |
| Integrated SDKs | OpenAI SDK, Anthropic SDK, Python, JavaScript, cURL, LangChain, LiteLLM, n8n | Anthropic SDK, OpenAI SDK |
| Hosting location | USA | US regions (varies) |
| Payment methods | Stripe, PayPal | Stripe |
What LLM Resayil Offers
LLM Resayil delivers an Anthropic‑compatible API that includes all the modern features developers expect: streaming token delivery, function calling, and tool use. Because the service is also OpenAI compatible, you can reuse existing OpenAI SDK code when you need to switch providers. The platform supports multi‑language generation, with a special emphasis on Arabic language support, making it a versatile choice for global applications.
All models are accessed through a single endpoint suite (/v1/chat/completions, /v1/messages, etc.), and the pay‑per‑use pricing model means you only pay for the tokens you actually generate. Billing is handled in USD via Stripe or PayPal, and the service is hosted in the USA for low latency to North American users. Integrations with popular developer tools like LangChain, LiteLLM, n8n, and the standard SDKs make it easy to embed LLM Resayil into any stack.
Setting Up LLM Resayil in a Next.js Project
- Create a Next.js app (if you don't have one already):
npx create-next-app@latest my-chat-app
cd my-chat-app
- Install the Anthropic SDK (the SDK works because LLM Resayil follows the same contract). You can also use the generic OpenAI SDK, but the Anthropic SDK gives you built-in streaming helpers.
npm install @anthropic-ai/sdk
- Add your LLM Resayil API key to a .env.local file. The Anthropic SDK reads the key from the ANTHROPIC_API_KEY environment variable and sends it in the x-api-key header.
ANTHROPIC_API_KEY=sk-your-resayil-key-here
RESAYIL_BASE_URL=https://llm.resayil.io
- Create a thin wrapper that points the SDK at the Resayil base URL. The Anthropic SDK allows you to override the base URL via the baseURL option.
// lib/resayilClient.js
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: process.env.RESAYIL_BASE_URL, // points to LLM Resayil
});

export default client;
- Test the connection with a simple request in a Next.js API route.
// pages/api/health.js
export default async function handler(req, res) {
  try {
    const health = await fetch(`${process.env.RESAYIL_BASE_URL}/v1/health`);
    const data = await health.json();
    res.status(200).json(data);
  } catch (e) {
    res.status(500).json({ error: e.message });
  }
}
Visiting /api/health should return a JSON payload confirming the service is up.
With these steps you have a fully authenticated, Anthropic‑compatible client ready to stream responses.
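A missing environment variable tends to surface only as a confusing 401 or a malformed URL at request time. The sketch below shows one way to fail fast instead. The helper name readResayilConfig is hypothetical and not part of any SDK; it only assumes the two variables defined in .env.local above.

```javascript
// Hypothetical helper (not part of the Anthropic SDK): validate gateway settings up front.
function readResayilConfig(env) {
  const apiKey = env.ANTHROPIC_API_KEY;
  const baseURL = env.RESAYIL_BASE_URL;
  if (!apiKey) throw new Error("ANTHROPIC_API_KEY is not set");
  if (!baseURL) throw new Error("RESAYIL_BASE_URL is not set");
  // Strip trailing slashes so `${baseURL}/v1/messages` never contains "//"
  return { apiKey, baseURL: baseURL.replace(/\/+$/, "") };
}

// Usage with example values; in the app you would pass process.env
const config = readResayilConfig({
  ANTHROPIC_API_KEY: "sk-example",
  RESAYIL_BASE_URL: "https://llm.resayil.io/",
});
console.log(config.baseURL); // prints "https://llm.resayil.io"
```

Calling this once when the server starts, or at the top of lib/resayilClient.js, turns a silent misconfiguration into an immediate, readable error.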
Using the /v1/messages Endpoint for Streaming
The Anthropic‑compatible /v1/messages endpoint is the preferred entry point for chat‑style interactions. To receive a live token stream you set the stream flag to true in the request body. Below is a complete example using the Anthropic SDK’s messages.create method.
// lib/streamChat.js
import client from "./resayilClient";

export async function streamChat(messages, onToken) {
  const response = await client.messages.create({
    model: "deepseek-v4-flash", // any slug from the catalog works
    max_tokens: 1024,
    stream: true,
    messages,
  });
  // The SDK returns an async iterable when stream: true
  for await (const chunk of response) {
    // Each content_block_delta chunk carries a partial piece of the assistant's output
    if (chunk.type === "content_block_delta") {
      const token = chunk.delta?.text ?? "";
      onToken(token);
    }
  }
}
If you prefer a lower‑level fetch call, the same request can be made with the Web Streams API:
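// lib/streamChat.js (placed below streamChat so the API route can import it)
export async function fetchStream(messages, onToken) {
  const resp = await fetch(`${process.env.RESAYIL_BASE_URL}/v1/messages`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": process.env.ANTHROPIC_API_KEY,
      // Version header required by the Anthropic API; assumed to be accepted by the gateway as well
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: "deepseek-v4-flash",
      max_tokens: 1024,
      stream: true,
      messages,
    }),
  });
  if (!resp.ok || !resp.body) {
    throw new Error(`Stream request failed with status ${resp.status}`);
  }
  const reader = resp.body.getReader();
  const decoder = new TextDecoder("utf-8");
  let buffer = ""; // holds a trailing partial line between network chunks
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // the last element may be an incomplete line
    // Anthropic streams Server-Sent Events (SSE); payload lines start with "data:"
    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      const json = JSON.parse(line.replace(/^data:\s*/, ""));
      if (json.type === "content_block_delta") {
        onToken(json.delta?.text ?? "");
      }
    }
  }
}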
Both approaches rely on the gateway's streaming support and hand each token to the caller as soon as it is generated.
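One detail matters with the lower-level approach: a network chunk can end in the middle of an SSE line, so the JSON payload of one event may be split across two reads. The sketch below shows an incremental parser that keeps the unfinished tail between calls. The name createSseParser is hypothetical, and the parser assumes each event's data fits on a single data: line, which is how Anthropic-style streams are emitted.

```javascript
// Hypothetical helper: incremental SSE parser that tolerates events split across chunks.
function createSseParser(onEvent) {
  let buffer = "";
  return {
    feed(text) {
      buffer += text;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // keep the unfinished tail for the next feed()
      for (const raw of lines) {
        const line = raw.trimEnd(); // drops a trailing "\r" from CRLF streams
        if (!line.startsWith("data:")) continue; // skip "event:" lines, comments, blanks
        onEvent(JSON.parse(line.slice(5).trim()));
      }
    },
  };
}

// Usage: the JSON payload is cut in the middle of the delta event
const tokens = [];
const parser = createSseParser((event) => {
  if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
    tokens.push(event.delta.text);
  }
});
parser.feed('event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"del');
parser.feed('ta":{"type":"text_delta","text":"Hi"}}\n\n');
console.log(tokens.join("")); // prints "Hi"
```

Parsing each chunk on its own would throw on the first feed() call here, because the chunk ends before the JSON object is complete.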
Handling Streaming Responses in Next.js Server Components and API Routes
Option 1 – Proxy the Stream via an API Route
Create a route that forwards the LLM Resayil stream directly to the browser. This keeps the API key on the server and lets the client treat the response as a regular ReadableStream.
// pages/api/chat/stream.js
import { fetchStream } from "../../../lib/streamChat"; // using the fetch version above

export const config = {
  runtime: "edge", // enables streaming on Vercel Edge Functions
};

export default async function handler(req) {
  const { messages } = await req.json();
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  // Start streaming from Resayil and forward each token to the client
  fetchStream(messages, (token) => {
    // A rejected write means the client disconnected; swallow it to avoid an unhandled rejection
    writer.write(encoder.encode(token)).catch(() => {});
  })
    .then(() => writer.close())
    // On upstream failure, abort so the client sees an errored stream instead of a clean end
    .catch((err) => writer.abort(err).catch(() => {}));

  return new Response(readable, {
    headers: {
      // The body is raw token text, not SSE frames, so plain text is the accurate type
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache",
    },
  });
}
The client can then use the native fetch API with ReadableStream to consume the tokens.
Option 2 – Stream Directly in a Server Component (Next.js 13+)
A Server Component cannot push individual tokens to the browser: it renders once the data it awaits has resolved. What it can do is await the whole completion on the server and let React stream the finished HTML into the page when the component is wrapped in <Suspense>. Below is a minimal example; use Option 1 when you need a token-by-token display.
// app/chat/stream.tsx
import { fetchStream } from "@/lib/streamChat";

type ChatMessage = { role: "user" | "assistant"; content: string };

export default async function ChatStream({ messages }: { messages: ChatMessage[] }) {
  // Collect every token on the server; the component resolves when the completion ends
  let text = "";
  await fetchStream(messages, (token) => {
    text += token;
  });
  return (
    <div>
      <pre>{text}</pre>
    </div>
  );
}
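Error handling: both patterns should catch network errors and return a JSON error payload with an appropriate HTTP status when the failure happens before the first byte is sent. Once the stream has started, the status line is already on the wire, so abort the stream instead. Back-pressure: a TransformStream slows the producer only if the producer waits on writer.ready or on the promise returned by writer.write(). A callback that ignores those promises keeps queuing chunks in memory when the client reads slowly.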
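To make the back-pressure point concrete, here is a sketch of a pull-based alternative built on the standard ReadableStream constructor. The helper names tokensToStream, demoTokens, and readAll are hypothetical, and the generator stands in for the model's token stream. The idea is that the stream asks for the next token only when its consumer has room, and an upstream failure is reported to the reader rather than lost.

```javascript
// Hypothetical helper: wraps any async iterable of strings in a byte stream.
function tokensToStream(tokens) {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();
  return new ReadableStream({
    // pull() is invoked only when the internal queue has room, which provides back-pressure
    async pull(controller) {
      try {
        const { value, done } = await iterator.next();
        if (done) {
          controller.close();
        } else {
          controller.enqueue(encoder.encode(value));
        }
      } catch (err) {
        controller.error(err); // surfaces upstream failures to the reader
      }
    },
    async cancel() {
      // The consumer went away: let the generator run its cleanup
      await iterator.return?.();
    },
  });
}

// Demo generator standing in for the model's token stream
async function* demoTokens() {
  yield "Hello";
  yield ", ";
  yield "world";
}

// Reads a byte stream to completion and returns the decoded text
async function readAll(stream) {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let text = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text;
}
```

In a route, the returned stream could be passed straight to new Response(...). Awaiting readAll(tokensToStream(demoTokens())) resolves to "Hello, world".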
Integrating Streaming into a Next.js Chat UI
On the front end, you typically maintain a message list in React state and append new tokens as they arrive. Here is a simple hook that consumes the streaming API route created above.
// hooks/useChatStream.ts
import { useState, useCallback } from "react";

type ChatMessage = { role: string; content: string };

export function useChatStream() {
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [isLoading, setIsLoading] = useState(false);

  const sendMessage = useCallback(async (userText: string) => {
    setIsLoading(true);
    const userMsg = { role: "user", content: userText };
    setMessages((prev) => [...prev, userMsg]);
    try {
      const response = await fetch("/api/chat/stream", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ messages: [...messages, userMsg] }),
      });
      if (!response.ok || !response.body) {
        throw new Error(`Chat request failed with status ${response.status}`);
      }
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let assistantContent = "";
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters intact across chunks
        assistantContent += decoder.decode(value, { stream: true });
        const partial = assistantContent; // snapshot for the state updater
        // Update UI with the partial content
        setMessages((prev) => {
          const last = prev[prev.length - 1];
          if (last?.role === "assistant") {
            // replace the last assistant message with updated content
            return [...prev.slice(0, -1), { role: "assistant", content: partial }];
          }
          return [...prev, { role: "assistant", content: partial }];
        });
      }
    } finally {
      // Reset the flag even when the request or the stream fails
      setIsLoading(false);
    }
  }, [messages]);

  return { messages, sendMessage, isLoading };
}
In a component you can render the messages list and call sendMessage on form submit. Because the UI updates after each token, users see a typing effect that feels natural.
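The "append tokens as they arrive" step is easier to reason about, and to unit test, when it is pulled out into a pure function. The sketch below is one way to do that; appendAssistantToken is a hypothetical name, and it extends the last message with each new token rather than replacing it with an accumulated string.

```javascript
// Hypothetical pure helper mirroring the state update a chat hook performs per token.
function appendAssistantToken(messages, token) {
  const last = messages[messages.length - 1];
  if (last?.role === "assistant") {
    // Extend the in-progress assistant message without mutating the old array
    return [...messages.slice(0, -1), { ...last, content: last.content + token }];
  }
  // First token of a reply: start a new assistant message
  return [...messages, { role: "assistant", content: token }];
}

// Usage: three tokens collapse into a single assistant message
let state = [{ role: "user", content: "Hi" }];
for (const token of ["Hel", "lo", "!"]) {
  state = appendAssistantToken(state, token);
}
console.log(state.length, state[1].content); // prints: 2 Hello!
```

Inside a React hook this could be used as setMessages((prev) => appendAssistantToken(prev, token)), which avoids depending on a variable captured from an earlier render.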
Advanced Scenarios – Function Calling & Tool Use
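LLM Resayil supports function calling and tool use while streaming. When you include a tools array in the request payload, the model can emit tool calls in the same stream. In Anthropic's streaming format a tool call arrives as a content_block_start event whose content_block.type is "tool_use", followed by content_block_delta events of type input_json_delta that carry the arguments as partial JSON, and a closing content_block_stop. Your client logic can watch for these events, invoke the corresponding server-side function once the arguments are complete, and feed the result back into the conversation as a tool_result block. This enables powerful patterns such as real-time data lookup, code execution, or image generation without breaking the streaming flow.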
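As a rough illustration of handling tool calls in a stream, the sketch below collects them from a list of already-parsed events. It assumes the gateway mirrors Anthropic's event sequence (content_block_start with a tool_use block, input_json_delta fragments, then content_block_stop). The function name collectToolCalls, the tool name get_weather, and the id are made up for the example.

```javascript
// Sketch: collect streamed tool calls from Anthropic-style events.
function collectToolCalls(events) {
  const open = new Map(); // content block index -> { id, name, json }
  const calls = [];
  for (const event of events) {
    if (event.type === "content_block_start" && event.content_block?.type === "tool_use") {
      const { id, name } = event.content_block;
      open.set(event.index, { id, name, json: "" });
    } else if (event.type === "content_block_delta" && event.delta?.type === "input_json_delta") {
      const block = open.get(event.index);
      if (block) block.json += event.delta.partial_json;
    } else if (event.type === "content_block_stop" && open.has(event.index)) {
      const { id, name, json } = open.get(event.index);
      open.delete(event.index);
      // The arguments are only guaranteed to be valid JSON once the block has closed
      calls.push({ id, name, input: json ? JSON.parse(json) : {} });
    }
  }
  return calls;
}

// Hypothetical tool name and id, for illustration only
const calls = collectToolCalls([
  { type: "content_block_start", index: 1, content_block: { type: "tool_use", id: "toolu_01", name: "get_weather", input: {} } },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: '{"city":' } },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: ' "Kuwait"}' } },
  { type: "content_block_stop", index: 1 },
]);
console.log(calls[0].name, calls[0].input.city); // prints: get_weather Kuwait
```

Waiting for content_block_stop before parsing is the key design choice: the argument fragments are not valid JSON on their own.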
Pricing and Billing Considerations for Streaming Usage
LLM Resayil follows a pay‑per‑use credits model billed in USD. You purchase credits via Stripe or PayPal; there are no hidden subscription fees. Streaming does not add extra cost – you are charged only for the tokens that are generated, whether they are sent in a single response or streamed token‑by‑token.
To see the current rates, call the /v1/pricing endpoint or visit the Pricing page. The endpoint returns a JSON object with per‑1k‑token pricing for each model. Because the same token count is used for streaming and non‑streaming calls, you can predict costs accurately.
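The cost arithmetic itself is simple: tokens divided by 1,000, multiplied by the model's per-1k rate. The sketch below shows it with a made-up rate table. The exact shape of the /v1/pricing response is not documented here, so the flat model-to-rate map and the 0.5 figure are assumptions for illustration only.

```javascript
// Hypothetical rate table; real values come from /v1/pricing or the Pricing page.
const exampleRates = { "deepseek-v4-flash": 0.5 }; // made-up price per 1k tokens

// Cost = (tokens / 1000) * per-1k rate; identical for streamed and non-streamed calls
function estimateCost(model, tokenCount, rates) {
  const perThousand = rates[model];
  if (perThousand === undefined) throw new Error(`No rate for model: ${model}`);
  return (tokenCount / 1000) * perThousand;
}

console.log(estimateCost("deepseek-v4-flash", 2500, exampleRates)); // prints 1.25
```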
Troubleshooting Common Streaming Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Stream stops after a few tokens | Network timeout or server‑side limit | Increase the timeout on the client (fetch signal) and ensure the model’s max_tokens is high enough. |
| No tokens arrive, only a 401 error | Invalid or missing API key | Verify ANTHROPIC_API_KEY in .env.local and that the key belongs to your Resayil account. |
| Empty response body | Service outage | Call /v1/health to confirm the gateway is up. If it returns unhealthy, wait or contact support. |
| Partial JSON parsing errors | Improper handling of SSE lines | Buffer incomplete lines across chunks, split on \n, and keep only lines that start with data: before calling JSON.parse. |
For persistent problems, check the Health endpoint and consult the Resayil docs. Implement exponential back‑off retries for transient network glitches.
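A minimal sketch of the exponential back-off mentioned above follows. The names backoffDelay and withRetry are hypothetical, the 250 ms base and 8 s cap are arbitrary starting points, and jitter is left out for brevity.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs
function backoffDelay(attempt, baseMs = 250, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retries `fn` while `isTransient(err)` returns true
async function withRetry(fn, { retries = 3, isTransient = () => true, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Give up once the retry budget is spent or the error is not worth retrying
      if (attempt >= retries || !isTransient(err)) throw err;
      await wait(backoffDelay(attempt));
    }
  }
}

console.log([0, 1, 2, 3, 4, 5, 6].map((n) => backoffDelay(n)).join(", "));
// prints "250, 500, 1000, 2000, 4000, 8000, 8000"
```

With streaming, a retry is only safe before any token has been forwarded to the user; retrying after partial output would show the beginning of the answer twice.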
FAQ
Q: How do I enable streaming when calling the Anthropic‑compatible /v1/messages endpoint on LLM Resayil?
A: Set the stream parameter to true in the request body. The response will be a Server‑Sent Events (SSE) stream that delivers token deltas as they are generated.
Q: Can I use the official Anthropic SDK with LLM Resayil for streaming?
A: Yes. LLM Resayil is Anthropic compatible, so you only need to change the SDK’s base URL to https://llm.resayil.io. All streaming methods provided by the Anthropic SDK work unchanged.
Q: Does LLM Resayil support streaming with function calling or tool use?
A: Yes. Both function calling and tool use are supported features, and they work with streaming responses. Tool calls arrive in the same SSE stream as tool_use content blocks, with their arguments streamed as input_json_delta events.
Q: How do I handle streaming responses in Next.js API routes?
A: Use a route handler that creates a TransformStream, pipes the token chunks from LLM Resayil into the writable side, and returns the readable side as the HTTP response. This proxies the live stream to the client while keeping your API key secure.
Q: What payment methods are accepted for streaming usage on LLM Resayil?
A: Payments are processed via Stripe or PayPal. Billing is in USD and follows a simple pay‑per‑use credit model.
Why LLM Resayil Wins for Real‑Time Next.js Chats
When building a real‑time chat app, you need three things: low‑latency streaming, a familiar SDK, and predictable pricing. LLM Resayil gives you all of these while also adding OpenAI compatibility, Arabic language support, and a wide catalog of 39 models. Because the gateway is hosted in the USA and integrates with Stripe and PayPal, developers in North America experience fast network routes and straightforward billing. The ability to use the official Anthropic SDK means you can adopt streaming with minimal code changes, and the built‑in support for function calling and tool use lets you grow the chat’s capabilities over time.
What You Get by Using LLM Resayil
- Anthropic‑compatible streaming – identical request shape to the official API.
- Pay‑per‑use credits in USD, billed via Stripe or PayPal.
- Full SDK support – Anthropic, OpenAI, Python, JavaScript, cURL, plus integrations like LangChain and LiteLLM.
- Multi‑language generation with special emphasis on Arabic.
All of this is delivered from a USA‑hosted service that guarantees reliability and low latency.
Code Example: Streaming a Chat with deepseek-v4-flash
import client from "./lib/resayilClient"; // Anthropic SDK configured for Resayil

async function chatStream() {
  const messages = [{ role: "user", content: "Explain quantum entanglement in simple terms." }];
  const response = await client.messages.create({
    model: "deepseek-v4-flash", // catalog slug
    max_tokens: 1024,
    stream: true,
    messages,
  });
  // Print each text delta as soon as it arrives
  for await (const chunk of response) {
    if (chunk.type === "content_block_delta") {
      process.stdout.write(chunk.delta?.text ?? "");
    }
  }
}

chatStream();
Running this script prints the assistant’s answer token‑by‑token, demonstrating the live streaming capability.
Call to Action
Ready to add real‑time LLM responses to your Next.js app? Register for an API key, check the Pricing page for credit rates, and dive into the Docs for full integration guides. Start building smarter, faster, and more interactive chat experiences today!