Introduction to Streaming Anthropic Responses with LLM Resayil

Streaming is a technique that delivers generated tokens from a large language model (LLM) to a client as soon as they are produced, instead of waiting for the entire completion to finish. In real‑time chat applications this means the user sees the assistant’s answer appear word by word, creating a far more interactive experience. For developers building a Next.js chat UI, streaming eliminates the noticeable lag that can make a conversation feel robotic.

LLM Resayil (https://llm.resayil.io) is an Anthropic‑compatible gateway in front of a catalog of 39 models. Because it implements the same /v1/messages contract as the Anthropic API, you can point the official Anthropic SDK at a different base URL and leave the rest of your code unchanged. The gateway also supports streaming, function calling, and tool use, all billed on a pay‑per‑use credit model. Hosting is in the USA, and payments are accepted via Stripe or PayPal in USD. This article walks through everything you need to stream Anthropic‑style responses from LLM Resayil in a Next.js project.


Quick Comparison

| Feature | LLM Resayil | Direct Anthropic API |
|---|---|---|
| OpenAI compatibility | ✅ | ❌ |
| Anthropic compatibility | ✅ | ✅ |
| Streaming responses | ✅ | ✅ |
| Function calling | ✅ | ✅ |
| Tool use | ✅ | ✅ |
| Pay‑per‑use credits | ✅ (USD) | ✅ (USD) |
| Integrated SDKs | OpenAI SDK, Anthropic SDK, Python, JavaScript, cURL, LangChain, LiteLLM, n8n | Anthropic SDK, OpenAI SDK |
| Hosting location | USA | US regions (varies) |
| Payment methods | Stripe, PayPal | Stripe |


What LLM Resayil Offers

LLM Resayil delivers an Anthropic‑compatible API that includes all the modern features developers expect: streaming token delivery, function calling, and tool use. Because the service is also OpenAI compatible, you can reuse existing OpenAI SDK code when you need to switch providers. The platform supports multi‑language generation, with a special emphasis on Arabic language support, making it a versatile choice for global applications.

All models are accessed through a single endpoint suite (/v1/chat/completions, /v1/messages, etc.), and the pay‑per‑use pricing model means you only pay for the tokens you actually generate. Billing is handled in USD via Stripe or PayPal, and the service is hosted in the USA for low latency to North American users. Integrations with popular developer tools like LangChain, LiteLLM, n8n, and the standard SDKs make it easy to embed LLM Resayil into any stack.


Setting Up LLM Resayil in a Next.js Project

  1. Create a Next.js app (if you don’t have one already):
    npx create-next-app@latest my-chat-app
    cd my-chat-app
    
  2. Install the Anthropic SDK (the SDK works because LLM Resayil follows the same contract). You can also use the generic OpenAI SDK, but the Anthropic SDK gives you built‑in streaming helpers.
    npm install @anthropic-ai/sdk
    
  3. Add your LLM Resayil API key to a .env.local file. ANTHROPIC_API_KEY is the environment variable the Anthropic SDK reads by default; the SDK sends its value in the x-api-key request header.
    ANTHROPIC_API_KEY=sk-your-resayil-key-here
    RESAYIL_BASE_URL=https://llm.resayil.io
    
  4. Create a thin wrapper that points the SDK at the Resayil base URL. The Anthropic SDK allows you to override the base URL via the baseURL option.
    // lib/resayilClient.js
    import Anthropic from "@anthropic-ai/sdk";
    
    const client = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY,
      baseURL: process.env.RESAYIL_BASE_URL, // points to LLM Resayil
    });
    
    export default client;
    
  5. Test the connection with a simple request in a Next.js API route.
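    // pages/api/health.js
    // Plain fetch is enough here; the SDK client is not needed for a health check.
    export default async function handler(req, res) {
      try {
        const health = await fetch(`${process.env.RESAYIL_BASE_URL}/v1/health`);
        if (!health.ok) {
          // Surface upstream failures instead of trying to parse an error page
          return res.status(502).json({ error: `Upstream returned ${health.status}` });
        }
        const data = await health.json();
        res.status(200).json(data);
      } catch (e) {
        res.status(500).json({ error: e.message });
      }
    }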
    
    Visiting /api/health should return a JSON payload confirming the service is up.

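With these steps you have an Anthropic‑compatible client configured and ready to stream responses. The health check confirms that the gateway is reachable; the API key itself is first exercised by the /v1/messages call in the next section.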
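A missing key or a malformed base URL tends to surface later as a confusing 401 or a request to the wrong host, so it can help to validate the configuration once at startup. The sketch below is one possible way to do that. readResayilConfig is a hypothetical helper, not part of the Anthropic SDK or of LLM Resayil, and it assumes only the two variable names from step 3.

```javascript
// Hypothetical helper: validates the two variables defined in .env.local.
function readResayilConfig(env) {
  const apiKey = (env.ANTHROPIC_API_KEY ?? "").trim();
  const rawBaseURL = (env.RESAYIL_BASE_URL ?? "").trim();

  if (!apiKey) {
    throw new Error("ANTHROPIC_API_KEY is not set");
  }
  if (!rawBaseURL) {
    throw new Error("RESAYIL_BASE_URL is not set");
  }

  // new URL() throws a TypeError on input without a scheme, e.g. "llm.resayil.io"
  const url = new URL(rawBaseURL);
  if (url.protocol !== "https:") {
    throw new Error("RESAYIL_BASE_URL must use https");
  }

  // Strip trailing slashes so `${baseURL}/v1/messages` never contains "//v1"
  const baseURL = rawBaseURL.replace(/\/+$/, "");
  return { apiKey, baseURL };
}
```

In lib/resayilClient.js the result could be passed straight to the constructor, for example new Anthropic(readResayilConfig(process.env)), so a bad deployment fails at boot rather than on the first chat request.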


Using the /v1/messages Endpoint for Streaming

The Anthropic‑compatible /v1/messages endpoint is the preferred entry point for chat‑style interactions. To receive a live token stream you set the stream flag to true in the request body. Below is a complete example using the Anthropic SDK’s messages.create method.

// lib/streamChat.js
import client from "./resayilClient";

export async function streamChat(messages, onToken) {
  const response = await client.messages.create({
    model: "deepseek-v4-flash", // any slug from the catalog works
    max_tokens: 1024,
    stream: true,
    messages,
  });

  // With stream: true the SDK returns an async iterable of events
  for await (const chunk of response) {
    // Text arrives in content_block_delta events whose delta.type is "text_delta"
    if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
      onToken(chunk.delta.text);
    }
  }
}

If you prefer a lower‑level fetch call, the same request can be made with the Web Streams API:

export async function fetchStream(messages, onToken) {
  const resp = await fetch(`${process.env.RESAYIL_BASE_URL}/v1/messages`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": process.env.ANTHROPIC_API_KEY,
    },
    body: JSON.stringify({
      model: "deepseek-v4-flash",
      max_tokens: 1024,
      stream: true,
      messages,
    }),
  });

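  if (!resp.ok || !resp.body) {
    throw new Error(`Request failed with status ${resp.status}`);
  }

  const reader = resp.body.getReader();
  const decoder = new TextDecoder("utf-8");
  let buffer = ""; // holds a trailing partial line between reads
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    // Anthropic streams Server-Sent Events (SSE); payload lines start with "data:"
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // the last element may be an incomplete line
    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      const json = JSON.parse(line.replace(/^data:\s*/, ""));
      if (json.type === "content_block_delta" && json.delta?.type === "text_delta") {
        onToken(json.delta.text);
      }
    }
  }
}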

Both approaches call onToken as soon as the gateway emits each text delta, so the caller can forward text to the user without waiting for the full completion.
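A network read can end in the middle of an SSE line, so any hand-rolled parser has to hold on to the unfinished tail until the next read arrives. The sketch below isolates that logic in a small incremental parser. createSseParser is a hypothetical helper name, and the code assumes the gateway emits Anthropic-style events with one data: line per event.

```javascript
// Hypothetical helper: incremental parser for "data:" lines in an SSE stream.
function createSseParser(onEvent) {
  let buffer = ""; // text received so far that does not yet end in a newline

  return {
    // Feed decoded text in whatever pieces the network delivers.
    feed(text) {
      buffer += text;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // keep the (possibly incomplete) last line for later
      for (const rawLine of lines) {
        const line = rawLine.replace(/\r$/, "");
        if (!line.startsWith("data:")) continue; // skip "event:" lines and blanks
        const payload = line.slice(5).trim();
        // "[DONE]" is an OpenAI-style sentinel; skipped in case the gateway sends it
        if (payload === "" || payload === "[DONE]") continue;
        onEvent(JSON.parse(payload));
      }
    },
  };
}

// Usage: the second event is split across two reads and still parses.
let received = "";
const parser = createSseParser((event) => {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    received += event.delta.text;
  }
});
parser.feed('event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}\n\n');
parser.feed('event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"del');
parser.feed('ta":{"type":"text_delta","text":"lo"}}\n\n');
console.log(received); // "Hello"
```

Keeping the parser separate from the fetch loop also makes it easy to unit test with hand-written strings, without a live connection.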


Handling Streaming Responses in Next.js Server Components and API Routes

Option 1 – Proxy the Stream via an API Route

Create a route that forwards the LLM Resayil stream directly to the browser. This keeps the API key on the server and lets the client treat the response as a regular ReadableStream.

// pages/api/chat/stream.js
import { fetchStream } from "../../../lib/streamChat"; // using the fetch version above

export const config = {
  runtime: "edge", // run on the Edge runtime, which supports streamed responses
};

export default async function handler(req) {
  const { messages } = await req.json();

  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  // Start streaming from Resayil and forward each token to the client.
  // Not awaited: the Response must be returned while tokens are still arriving.
  fetchStream(messages, (token) => {
    // A failed write (e.g. client disconnected) is handled by the abort path below
    writer.write(encoder.encode(token)).catch(() => {});
  })
    .then(() => writer.close())
    .catch((err) => writer.abort(err).catch(() => {}));

  return new Response(readable, {
    headers: {
      // The body is raw text, not SSE frames, so text/plain is the accurate type
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache",
    },
  });
}

The client can then use the native fetch API with ReadableStream to consume the tokens.
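Consuming that response on the client comes down to reading byte chunks and decoding them incrementally. The following is a minimal sketch using only the standard Web Streams and TextDecoder APIs; readTextStream is a hypothetical helper name. The stream: true option matters for Arabic and other multi-byte text, because a chunk boundary can fall inside a single character.

```javascript
// Hypothetical helper: reads a byte stream and reports decoded text as it arrives.
async function readTextStream(stream, onText) {
  const reader = stream.getReader();
  const decoder = new TextDecoder("utf-8");
  let full = "";
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      // stream: true buffers an incomplete multi-byte sequence until the next chunk
      const piece = decoder.decode(value, { stream: true });
      if (piece) {
        full += piece;
        onText(piece);
      }
    }
    // Flush anything the decoder is still holding
    const tail = decoder.decode();
    if (tail) {
      full += tail;
      onText(tail);
    }
  } finally {
    reader.releaseLock();
  }
  return full;
}
```

In the browser this would be called with the body of the fetch response from /api/chat/stream, with onText appending each piece to the message being displayed.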

Option 2 – Render the Completion in a Server Component (Next.js 13+)

A server component cannot hand a ReadableStream to the browser, and awaiting a single reader.read() call yields only the first chunk. What a server component can do is wait for the whole completion and render it. Wrapped in <Suspense>, the rest of the page streams to the browser first and the answer appears once it is complete. For token‑by‑token display, use Option 1 with a client component. Below is a minimal example.

// app/chat/stream.tsx
import { fetchStream } from "@/lib/streamChat";

export default async function ChatStream({ messages }: { messages: any[] }) {
  let text = "";

  // Collect every token; the component resolves once the stream has ended
  await fetchStream(messages, (token) => {
    text += token;
  });

  return (
    <div>
      <pre>{text}</pre>
    </div>
  );
}

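Error handling – both patterns should catch network errors. In the API route, return a JSON error payload with an appropriate HTTP status if the failure happens before the first byte is sent; once streaming has begun the status line is already on the wire, so the only option is to abort the stream. Back‑pressure is not automatic when you write through a manual writer: writer.write() returns a promise, and the producer only slows down if it awaits that promise or writer.ready. Without that, unsent chunks queue in memory.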
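One way to make a producer respect back‑pressure explicitly is to await writer.ready before each write and to abort the stream when the source fails. The sketch below assumes the tokens are available as an async iterable (for example, an adapter around the SDK stream); pipeTokens is a hypothetical helper name.

```javascript
// Hypothetical helper: copies tokens from an async iterable into a WritableStream,
// waiting for the consumer whenever the stream cannot accept more data.
async function pipeTokens(tokens, writable) {
  const writer = writable.getWriter();
  const encoder = new TextEncoder();
  try {
    for await (const token of tokens) {
      await writer.ready; // resolves once the stream can accept more data
      await writer.write(encoder.encode(token));
    }
    await writer.close();
  } catch (err) {
    // Abort so the reading side sees an error instead of waiting forever
    await writer.abort(err).catch(() => {});
    throw err;
  }
}
```

With a TransformStream, the call has to run concurrently with a reader on the readable side: start pipeTokens without awaiting it, return the readable in the Response, and attach a catch handler to the returned promise for logging.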


Integrating Streaming into a Next.js Chat UI

On the front end, you typically maintain a message list in React state and append new tokens as they arrive. Here is a simple hook that consumes the streaming API route created above.

// hooks/useChatStream.ts
import { useState, useCallback } from "react";

export function useChatStream() {
  const [messages, setMessages] = useState<Array<{ role: string; content: string }>>([]);
  const [isLoading, setIsLoading] = useState(false);

  const sendMessage = useCallback(async (userText: string) => {
    setIsLoading(true);
    const userMsg = { role: "user", content: userText };
    setMessages((prev) => [...prev, userMsg]);

    try {
      const response = await fetch("/api/chat/stream", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ messages: [...messages, userMsg] }),
      });
      if (!response.ok || !response.body) {
        throw new Error(`Chat request failed with status ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let assistantContent = "";
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters (e.g. Arabic) intact across chunks
        assistantContent += decoder.decode(value, { stream: true });
        // Update UI with the partial content
        setMessages((prev) => {
          const last = prev[prev.length - 1];
          if (last?.role === "assistant") {
            // replace the last assistant message with updated content
            return [...prev.slice(0, -1), { role: "assistant", content: assistantContent }];
          }
          return [...prev, { role: "assistant", content: assistantContent }];
        });
      }
    } finally {
      // Reset the flag even if the request or the stream fails
      setIsLoading(false);
    }
  }, [messages]);

  return { messages, sendMessage, isLoading };
}

In a component you can render the messages list and call sendMessage on form submit. Because the UI updates as each chunk arrives, users see a typing effect that feels natural.

Advanced Scenarios – Function Calling & Tool Use

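LLM Resayil supports function calling and tool use while streaming. When you include a tools array in the request payload, the model can emit tool calls in the same stream. In the Anthropic streaming format a tool call starts with a content_block_start event whose content_block.type is "tool_use"; its JSON arguments then arrive as content_block_delta events with delta.type "input_json_delta", and the block ends with content_block_stop. Your server can watch for these events, invoke the corresponding function once the arguments are complete, and feed the result back into the conversation as a tool_result block. This enables patterns such as real‑time data lookup, code execution, or image generation without breaking the streaming flow.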
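Because tool arguments arrive as fragments of a JSON string, they have to be concatenated per content block and parsed only when the block closes. The sketch below shows one way to do that for Anthropic-style stream events. createToolCallCollector and the get_weather tool are hypothetical, and the code assumes the gateway passes these event shapes through unchanged.

```javascript
// Hypothetical helper: rebuilds complete tool calls from streamed events.
function createToolCallCollector(onToolCall) {
  const open = new Map(); // content block index -> { id, name, json }

  return function handleEvent(event) {
    if (event.type === "content_block_start" && event.content_block.type === "tool_use") {
      const { id, name } = event.content_block;
      open.set(event.index, { id, name, json: "" });
    } else if (event.type === "content_block_delta" && event.delta.type === "input_json_delta") {
      const call = open.get(event.index);
      if (call) call.json += event.delta.partial_json;
    } else if (event.type === "content_block_stop" && open.has(event.index)) {
      const { id, name, json } = open.get(event.index);
      open.delete(event.index);
      // A tool with no arguments streams an empty string, so fall back to {}
      onToolCall({ id, name, input: json ? JSON.parse(json) : {} });
    }
  };
}

// Usage with a hand-written event sequence (get_weather is a made-up tool)
const calls = [];
const handle = createToolCallCollector((call) => calls.push(call));
[
  { type: "content_block_start", index: 1, content_block: { type: "tool_use", id: "toolu_01", name: "get_weather", input: {} } },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: '{"city": "Ku' } },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: 'wait"}' } },
  { type: "content_block_stop", index: 1 },
].forEach(handle);
console.log(calls[0].input.city); // "Kuwait"
```

Text deltas pass through the collector untouched, so the same event loop can keep forwarding text to the user while tool calls are assembled in the background.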


Pricing and Billing Considerations for Streaming Usage

LLM Resayil follows a pay‑per‑use credits model billed in USD. You purchase credits via Stripe or PayPal; there are no hidden subscription fees. Streaming does not add extra cost – you are charged only for the tokens that are generated, whether they are sent in a single response or streamed token‑by‑token.

To see the current rates, call the /v1/pricing endpoint or visit the Pricing page. The endpoint returns a JSON object with per‑1k‑token pricing for each model. Because the same token count is used for streaming and non‑streaming calls, you can predict costs accurately.
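The cost arithmetic for per‑1k‑token pricing can be sketched as follows. The shape of the /v1/pricing response is not shown in this article, so the field names inputPer1k and outputPer1k are hypothetical, the split into input and output rates is an assumption, and the numbers are placeholders rather than real LLM Resayil prices.

```javascript
// Hypothetical rate shape; the real /v1/pricing payload may use different fields.
function estimateCostUsd(usage, rate) {
  const inputCost = (usage.inputTokens / 1000) * rate.inputPer1k;
  const outputCost = (usage.outputTokens / 1000) * rate.outputPer1k;
  return inputCost + outputCost;
}

// Placeholder numbers for illustration only, not actual LLM Resayil prices
const exampleRate = { inputPer1k: 0.5, outputPer1k: 1.5 };
const cost = estimateCostUsd({ inputTokens: 2000, outputTokens: 1000 }, exampleRate);
console.log(cost); // 2.5
```

Setting max_tokens in the request caps outputTokens, which in turn bounds the worst-case cost of a single streamed reply.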


Troubleshooting Common Streaming Issues

| Symptom | Likely Cause | Fix |
|---|---|---|
| Stream stops after a few tokens | Network timeout or server‑side limit | Increase the timeout on the client (fetch signal) and ensure the model’s max_tokens is high enough. |
| No tokens arrive, only a 401 error | Invalid or missing API key | Verify ANTHROPIC_API_KEY in .env.local and that the key belongs to your Resayil account. |
| Empty response body | Service outage | Call /v1/health to confirm the gateway is up. If it returns unhealthy, wait or contact support. |
| Partial JSON parsing errors | Improper handling of SSE lines | Buffer incoming text, split on \n, keep the trailing partial line for the next read, and pass only lines that start with data: to JSON.parse. |

For persistent problems, check the Health endpoint and consult the Resayil docs. Implement exponential back‑off retries for transient network glitches.
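Exponential back‑off means doubling the wait after each failed attempt. A minimal sketch, with withRetry as a hypothetical helper name and the default delays chosen arbitrarily for illustration:

```javascript
// Hypothetical helper: retries an async operation with exponential back-off.
async function withRetry(operation, { retries = 3, baseDelayMs = 250 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts, rethrow below
      // 250 ms, 500 ms, 1000 ms with the defaults
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Retries are safest around the step that opens the connection. Once tokens have been forwarded to the user, repeating the request would produce duplicated output, and errors such as a 401 are unlikely to succeed on a second attempt, so in practice the operation would rethrow only for timeouts, 429, and 5xx responses.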


FAQ

Q: How do I enable streaming when calling the Anthropic‑compatible /v1/messages endpoint on LLM Resayil?

A: Set the stream parameter to true in the request body. The response will be a Server‑Sent Events (SSE) stream that delivers token deltas as they are generated.

Q: Can I use the official Anthropic SDK with LLM Resayil for streaming?

A: Yes. LLM Resayil is Anthropic compatible, so you only need to change the SDK’s base URL to https://llm.resayil.io. All streaming methods provided by the Anthropic SDK work unchanged.

Q: Does LLM Resayil support streaming with function calling or tool use?

A: Yes. Both function calling and tool use are supported features, and they work with streaming responses. Tool calls arrive as tool_use content blocks in the same SSE stream, with their arguments delivered as JSON fragments.

Q: How do I handle streaming responses in Next.js API routes?

A: Use a route handler that creates a TransformStream, pipes the token chunks from LLM Resayil into the writable side, and returns the readable side as the HTTP response. This proxies the live stream to the client while keeping your API key secure.

Q: What payment methods are accepted for streaming usage on LLM Resayil?

A: Payments are processed via Stripe or PayPal. Billing is in USD and follows a simple pay‑per‑use credit model.


Why LLM Resayil Wins for Real‑Time Next.js Chats

When building a real‑time chat app, you need three things: low‑latency streaming, a familiar SDK, and predictable pricing. LLM Resayil gives you all of these while also adding OpenAI compatibility, Arabic language support, and a wide catalog of 39 models. Because the gateway is hosted in the USA and integrates with Stripe and PayPal, developers in North America experience fast network routes and straightforward billing. The ability to use the official Anthropic SDK means you can adopt streaming with minimal code changes, and the built‑in support for function calling and tool use lets you grow the chat’s capabilities over time.


What You Get by Using LLM Resayil

  • Anthropic‑compatible streaming – identical request shape to the official API.
  • Pay‑per‑use credits in USD, billed via Stripe or PayPal.
  • Full SDK support – Anthropic, OpenAI, Python, JavaScript, cURL, plus integrations like LangChain and LiteLLM.
  • Multi‑language generation with special emphasis on Arabic.

All of this is delivered from a USA‑hosted service that guarantees reliability and low latency.

Code Example: Streaming a Chat with deepseek-v4-flash

import client from "./lib/resayilClient"; // Anthropic SDK configured for Resayil

async function chatStream() {
  const messages = [{ role: "user", content: "Explain quantum entanglement in simple terms." }];
  const response = await client.messages.create({
    model: "deepseek-v4-flash", // catalog slug
    max_tokens: 1024,
    stream: true,
    messages,
  });

  for await (const chunk of response) {
    if (chunk.type === "content_block_delta") {
      process.stdout.write(chunk.delta?.text ?? "");
    }
  }
}

chatStream().catch(console.error); // report failures instead of an unhandled rejection

Running this script prints the assistant’s answer token‑by‑token, demonstrating the live streaming capability.


Call to Action

Ready to add real‑time LLM responses to your Next.js app? Register for an API key, check the Pricing page for credit rates, and dive into the Docs for full integration guides. Start building smarter, faster, and more interactive chat experiences today!