Streaming Claude responses over a 1 KB Node proxy

You want Linda’s agent loop in the browser. You don’t want your Anthropic API key in the browser. Most “proxy” solutions for LLM-streaming are 200 KB of middleware with rate limiting, request inspection, multi-tenant support, observability hooks, and a SaaS up-sell.

That’s overkill. We ship @linda/server — a 1 KB Node middleware that streams SSE from your LLM provider to your browser. No business logic. No state. No SaaS. You add the rate limiting and observability you actually need, in your own infrastructure.

Here’s how it works, why it’s that small, and when it’s actually the right shape.

The whole proxy

// @linda/server — what we actually ship.
import type { Request, Response } from "express";

interface LindaProxyOptions {
  provider: "anthropic" | "openai" | "groq" | "openrouter" | "ollama";
  apiKey: string;
  model?: string;
  baseUrl?: string;
}

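export function lindaProxy(opts: LindaProxyOptions) {
  // Assumes a JSON body parser (e.g. express.json()) has already populated req.body.
  return async (req: Request, res: Response) => {
    // resolveTarget and providerHeaders are helpers not shown in this excerpt:
    // they map opts to the provider's endpoint URL and auth headers.
    const target = resolveTarget(opts);
    const headers = providerHeaders(opts);

    const upstream = await fetch(target.url, {
      method: "POST",
      headers,
      body: JSON.stringify({ ...req.body, model: req.body.model ?? opts.model }),
    });

    // Pass provider errors through instead of masking them as a 200 event stream.
    if (!upstream.ok || !upstream.body) {
      res.status(upstream.status).type("application/json").send(await upstream.text());
      return;
    }

    res.setHeader("Content-Type", "text/event-stream");
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("Connection", "keep-alive");

    // Forward raw bytes as they arrive; no parsing, no buffering.
    const reader = upstream.body.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      res.write(value);
    }
    res.end();
  };
}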
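
The snippet above calls resolveTarget and providerHeaders without showing them. Here is a minimal sketch of what such helpers might look like. The function bodies are assumptions for illustration, not the package’s actual source, and this sketch treats baseUrl as a full endpoint URL. The endpoints and header names are the providers’ publicly documented defaults.

```typescript
// Hypothetical helpers: illustrative only, not the real @linda/server internals.
type Provider = "anthropic" | "openai" | "groq" | "openrouter" | "ollama";

interface ProxyOptions {
  provider: Provider;
  apiKey: string;
  baseUrl?: string; // assumed here to be a full endpoint URL
}

// Default chat endpoints per provider; baseUrl overrides them.
const DEFAULT_URLS: Record<Provider, string> = {
  anthropic: "https://api.anthropic.com/v1/messages",
  openai: "https://api.openai.com/v1/chat/completions",
  groq: "https://api.groq.com/openai/v1/chat/completions",
  openrouter: "https://openrouter.ai/api/v1/chat/completions",
  ollama: "http://localhost:11434/v1/chat/completions",
};

function resolveTarget(opts: ProxyOptions): { url: string } {
  return { url: opts.baseUrl ?? DEFAULT_URLS[opts.provider] };
}

function providerHeaders(opts: ProxyOptions): Record<string, string> {
  const base: Record<string, string> = { "content-type": "application/json" };
  if (opts.provider === "anthropic") {
    // Anthropic authenticates with x-api-key and requires a version header.
    return { ...base, "x-api-key": opts.apiKey, "anthropic-version": "2023-06-01" };
  }
  if (opts.provider === "ollama") {
    return base; // a local Ollama server needs no key by default
  }
  // OpenAI-compatible APIs (OpenAI, Groq, OpenRouter) use a bearer token.
  return { ...base, authorization: `Bearer ${opts.apiKey}` };
}
```

The key only ever appears in headers built on the server, which is the whole point of the proxy.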

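That’s the whole thing. Plug it into your existing app. The handler shown is typed against Express’s Request and Response; Hono and Fastify pass different handler arguments, so expect a thin adapter there: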

app.post("/api/linda", lindaProxy({
  provider: "anthropic",
  apiKey: process.env.ANTHROPIC_API_KEY!,
  model: "claude-sonnet-4-6",
}));

In the browser:

new Linda({ transport: { mode: "proxy", url: "/api/linda" } });

Why it’s that small

We made a deliberate choice: @linda/server is not a multi-tenant API gateway. It’s a pipe. The pipe terminates one TLS connection (the user’s browser → your server), opens another (your server → the LLM provider), and forwards bytes.

What we don’t ship:

Rate limiting. You have this. Use Express middleware, Cloudflare, or your API gateway.
Auth. You have this. Use your existing session/JWT/cookie auth.
Observability. You have this. Use your existing logger / Sentry / OpenTelemetry.
Per-user quotas. Implement in 20 lines on top of lindaProxy if you need them.
Request inspection. Hook into your existing middleware. The request body is JSON; do what you want.
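
As a rough illustration of the “20 lines on top of lindaProxy” claim, here is one way a per-user quota could look. Everything in it is an assumption: perUserQuota is a hypothetical name, the fixed-window counter is just one possible policy, and the small Req/Res types stand in for Express’s so the sketch is self-contained. It also assumes an earlier auth middleware has set req.user.

```typescript
// Minimal structural types so the sketch stands alone; in a real app these
// would be Express's Request, Response, and NextFunction.
interface Req { user?: { id: string } }
interface Res { status(code: number): Res; json(body: unknown): unknown }
type Next = () => void;
type Handler = (req: Req, res: Res, next: Next) => void;

// Hypothetical fixed-window quota: at most `limit` calls per user per window.
// State lives in process memory, so it resets on restart and is not shared
// across instances; swap the Map for a shared store if that matters.
function perUserQuota(limit: number, windowMs: number, now: () => number = Date.now): Handler {
  const usage = new Map<string, { windowStart: number; count: number }>();
  return (req, res, next) => {
    const userId = req.user?.id;
    if (!userId) {
      res.status(401).json({ error: "unauthenticated" });
      return;
    }
    const t = now();
    const entry = usage.get(userId);
    if (!entry || t - entry.windowStart >= windowMs) {
      // First call, or the previous window has expired: start a new window.
      usage.set(userId, { windowStart: t, count: 1 });
      next();
      return;
    }
    if (entry.count >= limit) {
      res.status(429).json({ error: "quota exceeded" });
      return;
    }
    entry.count += 1;
    next();
  };
}

// Usage (assuming the auth() and lindaProxy() from this post):
// app.post("/api/linda", auth(), perUserQuota(100, 60 * 60 * 1000), lindaProxy({ /* ... */ }));
```

The quota sits in front of the proxy as ordinary middleware, so the proxy itself stays unchanged.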

This is the “boring kernel” philosophy: ship the smallest correct primitive, let users compose.

SSE in three layers

The reason streaming LLM responses needs special care is that SSE has three layers:

HTTP transport. Keep the connection open. Don’t buffer. Headers right.
Event framing. data: <json>\n\n per event, optionally preceded by an event: <name> line. The provider does this.
Application semantics. “Text chunk”, “tool use start”, “tool use delta”, “message stop” — provider-specific.

The 1 KB proxy owns layer 1 and passes layer 2 through untouched (the bytes flow through unmodified). Linda’s browser-side SSE parser decodes those frames and handles layer 3: the provider-specific event names. That separation lets the proxy stay small.

If you bend the protocol — buffer responses, transform events, add custom metadata — the proxy grows. Don’t bend the protocol. Pass through.
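
To make the layer split concrete, here is a small sketch of what the browser side has to do once the proxy has passed the bytes through. It is not Linda’s actual parser. SseParser, parseFrame, and textDelta are hypothetical names, and the layer 3 function assumes Anthropic-style content_block_delta events carrying a text_delta.

```typescript
interface SseEvent { event: string; data: string }

// Layer 2: parse one complete frame (the text between blank lines).
function parseFrame(frame: string): SseEvent | null {
  let event = "message"; // SSE default event name
  const data: string[] = [];
  for (const line of frame.split("\n")) {
    if (line.startsWith(":")) continue; // comment / keep-alive line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return data.length > 0 ? { event, data: data.join("\n") } : null;
}

// Layer 2, incremental: network chunks can split a frame anywhere, so
// unfinished text is buffered until its blank-line terminator arrives.
class SseParser {
  private buffer = "";

  push(chunk: string): SseEvent[] {
    // Normalize CRLF on the whole buffer so a split "\r" + "\n" is still handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: SseEvent[] = [];
    let end: number;
    while ((end = this.buffer.indexOf("\n\n")) !== -1) {
      const parsed = parseFrame(this.buffer.slice(0, end));
      this.buffer = this.buffer.slice(end + 2);
      if (parsed) events.push(parsed);
    }
    return events;
  }
}

// Layer 3: provider-specific meaning (assumed Anthropic-style payload shape).
function textDelta(ev: SseEvent): string {
  if (ev.event !== "content_block_delta") return "";
  const payload = JSON.parse(ev.data) as { delta?: { type?: string; text?: string } };
  return payload.delta?.type === "text_delta" ? payload.delta.text ?? "" : "";
}
```

Nothing here needs to run on the server, which is why the proxy can stay a byte pipe.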

Why a proxy at all

If you’ve never built an LLM app, you might wonder why this matters. Here’s the constraint:

API keys in the browser are visible to anyone with DevTools.
“Visible” means “copyable”, which means “rate-limit-bypassable”, which means “abuse-vector.”
Even with strict per-key rate limits, key extraction is a footgun.

So you proxy. The browser hits your server, which has the key. The browser never sees the key. The proxy adds nothing functional — its only job is to hide one credential.

Provider-side restrictions (scoped, short-lived, or spend-capped keys, where a provider offers them) help, but they don’t fully solve it. A proxy is the durable answer.

When BYOK is fine

Browser-side keys, including bring-your-own-key (BYOK) setups where the user supplies the key, are fine when:

The app is a developer tool and the user provides their own key.
The key is a single-user dev key with no production access.
The cost-per-misuse is genuinely small.

If you’re shipping a marketing site demo, BYOK + a low-budget key + a clear “this uses your key” UX is reasonable. If you’re shipping anything customers use, proxy.

Composing observability

Want to log every LLM call? Wrap the proxy:

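app.post("/api/linda", auth(), rateLimit(), async (req, res) => {
  const t = Date.now();
  // "close" fires for completed and aborted streams alike;
  // "finish" can be skipped when the client disconnects mid-stream.
  res.on("close", () => {
    log.info("linda_call", {
      userId: req.user.id,
      durationMs: Date.now() - t,
      model: req.body.model,
      completed: res.writableFinished, // false if the stream was cut short
    });
  });
  return lindaProxy({ /* ... */ })(req, res);
});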

The pipe stays a pipe. You add what you need around it.

When you’d skip @linda/server

You’d use a different transport when:

You’re already on the Vercel AI SDK. Their proxy works fine; @linda/server is just a smaller alternative.
You’re on a non-Node backend. Write the roughly 30-line equivalent in Python, Go, or Rust. The wire protocol is text/event-stream either way.
You need queued processing. Long-running agentic flows (10+ minute jobs) want a queue, not a streamed proxy. Linda also supports a webhook-callback mode for those.
You don’t need to hide the key. BYOK works.

The point of @linda/server is: when you do want a proxy, it’s there in 1 KB. When you don’t, leave it out.

What this means

Linda’s whole runtime story, like most of the projects in this neighborhood, assumes an SSE-streaming protocol. If you’ve been hand-rolling streaming-LLM-response code, this pattern (browser → fetch → ReadableStream → SSE parser) is the right abstraction, with or without Linda. Steal it.
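
For readers who want to steal the pattern, here is a minimal sketch of the fetch → ReadableStream half of that pipeline. The names textChunks and streamFromProxy are hypothetical, and onText is a placeholder for whatever SSE parser you feed the text into. It assumes a runtime with global fetch, ReadableStream, and TextDecoder (modern browsers, Node 18+).

```typescript
// Decode a byte stream into text chunks. { stream: true } keeps multi-byte
// UTF-8 characters intact when a network chunk splits them in half.
async function* textChunks(body: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const text = decoder.decode(value, { stream: true });
      if (text) yield text;
    }
    const tail = decoder.decode(); // flush any bytes still buffered
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}

// Hypothetical caller: POST to the proxy route and hand each decoded chunk
// to an SSE parser via the onText callback.
async function streamFromProxy(
  url: string,
  payload: unknown,
  onText: (text: string) => void,
): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok || !res.body) throw new Error(`proxy responded ${res.status}`);
  for await (const text of textChunks(res.body)) onText(text);
}
```

Keeping the decoder separate from the fetch call means the same loop works against any byte stream, which also makes it easy to test without a network.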

For the full Linda story: /install. For the proxy package: @linda/server on npm.

FAQ

Do I need this for production?

Yes — unless your app is so low-stakes that exposing the API key in the browser is acceptable. For anything where the key has real budget or access, use a proxy.

Can I use Vercel AI SDK as the proxy instead?

Yes — Vercel AI SDK works fine as a Linda transport target. @linda/server is just a smaller alternative when you don't want the full Vercel AI SDK surface.
