Building a typed wrapper for Gemini (or similar LLM APIs) in TypeScript
Build a reusable TypeScript client for Gemini-style LLMs: typed requests/responses, streaming, retries, and typed prompt templates for production apps.
Struggling with brittle LLM responses, messy JSON parsing, or untyped streaming code that breaks in production? If you build apps on Gemini-style generative APIs, you need a reusable TypeScript client that gives you safe request/response types, predictable streaming behavior, robust retries, and composable prompt templates — without sacrificing performance or ergonomics.
Executive summary — what you'll get
This article shows how to design and implement a typed, reusable TypeScript client for Gemini-like LLM APIs (the kind powering assistant products and multi-modal services in late 2025–2026). You'll get:
- Type-safe request & response generics
- Streaming support (browser + Node) with a small, robust parser
- Retry/backoff with idempotency and Retry-After handling
- Typed prompt templates that enforce variable shapes at compile- and runtime
- Tips for packaging an SDK-friendly library (tree-shaking, opt-in runtime validation)
Why typed LLM clients matter in 2026
LLM APIs evolved quickly in 2024–2026. Gemini-style APIs now power assistants and multi-modal features across major platforms (for example, Apple reportedly tapping Google's Gemini for advanced assistant experiences). That growth made two things obvious for engineering teams:
- Type problems (unexpected shapes, partial streaming events, or API schema drift) cause costly runtime bugs.
- Streaming + retries + prompt management are cross-cutting concerns — and they must be consistent across services.
Building a small, well-typed client reduces runtime surprises, improves DX, and scales across teams.
Design goals
- Minimal, predictable surface — one Client class with typed methods.
- Pluggable validation — optional runtime validation via zod/io-ts to balance safety & bundle-size. See guidance on hardening local JavaScript tooling for teams.
- Streaming-friendly — unified streaming model for browser and Node.
- Robust retries — exponential backoff, jitter, and respect for Retry-After headers. Keep retry policies auditable and small to avoid runaway costs (see stack audits at Strip the Fat).
- Typed prompt templates — compile-time checking of template variables.
Core TypeScript patterns
Start by modelling requests and responses with generics. Keep the client agnostic to a specific model schema so the same client works with simple text outputs or structured JSON.
Typed request & response shape
// core-types.ts
export type ModelName = string;

// Generic request payload — TParams are model-specific options
export interface LLMRequest<TParams = Record<string, unknown>> {
  model: ModelName;
  prompt: string;
  maxTokens?: number;
  temperature?: number;
  params?: TParams;
}

// Generic response — TOutput is the typed content (string or structured)
export interface LLMResponse<TOutput = string> {
  id: string;
  model: string;
  output: TOutput;
  usage?: { promptTokens?: number; completionTokens?: number };
}
Streaming events
Many Gemini-style APIs stream token deltas or JSONL events. Represent those events with a discriminated union.
// stream-types.ts
export type StreamEvent<TChunk = string> =
  | { type: 'delta'; delta: TChunk }
  | { type: 'done' }
  | { type: 'error'; message: string };
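Because the union is discriminated on the type field, TypeScript narrows each branch automatically. A small consumer sketch:
import type { StreamEvent } from './stream-types';

let text = '';

function handleEvent(ev: StreamEvent<string>) {
  switch (ev.type) {
    case 'delta':
      text += ev.delta; // narrowed: delta is a string in this branch
      break;
    case 'error':
      console.error('stream error:', ev.message); // narrowed: message exists here
      break;
    case 'done':
      console.log('final text:', text);
      break;
  }
}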
Implementing the client (fetch-based)
Use fetch for portability (Node has stable fetch in modern LTS releases in 2024–2026). Keep AbortController timeouts and a simple retry wrapper.
Low-level fetch wrapper with retries
// retry.ts
async function sleep(ms: number) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

export async function withRetries<T>(fn: () => Promise<T>, retries = 3, baseMs = 200): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err: any) {
      attempt++;
      if (attempt >= retries) throw err;
      // Respect a Retry-After hint if the error carries one (e.g. set from the HTTP header)
      const retryAfterMs = typeof err?.retryAfterMs === 'number' ? err.retryAfterMs : undefined;
      const backoff = retryAfterMs ?? baseMs * Math.pow(2, attempt - 1);
      const jitter = Math.random() * 100;
      await sleep(backoff + jitter);
    }
  }
}
Client skeleton
// client.ts
import type { LLMRequest, LLMResponse } from './core-types';
import type { StreamEvent } from './stream-types';
import { withRetries } from './retry';

export interface ClientOptions {
  apiKey: string;
  baseUrl?: string; // e.g. https://api.gemini.example.com/v1
  timeoutMs?: number;
}

export class LLMClient {
  constructor(private opts: ClientOptions) {}

  private async fetchJson<T>(url: string, init: RequestInit): Promise<T> {
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), this.opts.timeoutMs ?? 60_000);
    try {
      const res = await fetch(url, { ...init, signal: controller.signal });
      if (!res.ok) {
        const text = await res.text();
        const err: any = new Error(`HTTP ${res.status}: ${text}`);
        // Surface Retry-After (in seconds) so withRetries can honor it
        const retryAfter = res.headers.get('retry-after');
        if (retryAfter && !Number.isNaN(Number(retryAfter))) err.retryAfterMs = Number(retryAfter) * 1000;
        throw err;
      }
      return (await res.json()) as T;
    } finally {
      clearTimeout(timeout);
    }
  }

  async generate<TOutput, TParams = Record<string, unknown>>(req: LLMRequest<TParams>): Promise<LLMResponse<TOutput>> {
    const url = `${this.opts.baseUrl ?? 'https://api.example.com'}/generate`;
    return withRetries(() => this.fetchJson<LLMResponse<TOutput>>(url, {
      method: 'POST',
      headers: { 'authorization': `Bearer ${this.opts.apiKey}`, 'content-type': 'application/json' },
      body: JSON.stringify(req),
    }));
  }
  // Streaming entry point
  async stream<TChunk = string>(req: LLMRequest, onEvent: (ev: StreamEvent<TChunk>) => void) {
    const url = `${this.opts.baseUrl ?? 'https://api.example.com'}/stream`;
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'authorization': `Bearer ${this.opts.apiKey}`, 'content-type': 'application/json' },
      body: JSON.stringify({ ...req, stream: true }),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    // Parse newline-delimited JSON events (JSONL) from the response body
    const reader = res.body?.getReader();
    if (!reader) throw new Error('Streaming not supported in this environment');
    const decoder = new TextDecoder();
    let buffer = '';
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      let idx: number;
      while ((idx = buffer.indexOf('\n')) !== -1) {
        const line = buffer.slice(0, idx).trim();
        buffer = buffer.slice(idx + 1);
        if (!line) continue;
        try {
          // e.g. server sends JSON lines: { "type": "delta", "delta": "hello" }
          const ev = JSON.parse(line) as StreamEvent<TChunk>;
          onEvent(ev);
        } catch {
          // skip malformed or non-JSON lines; partial data stays in the buffer until its newline arrives
        }
      }
    }
    // flush remainder
    if (buffer.trim()) {
      try { onEvent(JSON.parse(buffer) as StreamEvent<TChunk>); } catch { /* ignore trailing fragment */ }
    }
    onEvent({ type: 'done' });
  }
}
Typed prompt templates
Prompt templates let you keep prompts DRY while enforcing that callers supply the correct variables.
Compile-time checked templates
// template.ts
type TemplateVars<T> = T;

export function makeTemplate<T extends Record<string, string>>(tpl: string) {
  // At runtime we simply return a function, but the generic parameter T lets TS enforce keys
  return {
    render(vars: TemplateVars<T>) {
      return tpl.replace(/\{\{(\w+)\}\}/g, (_, k) => {
        if (!(k in vars)) throw new Error(`Missing template variable: ${k}`);
        return String((vars as any)[k]);
      });
    },
  } as const;
}

// Usage
const userSummary = makeTemplate<{ name: string; bio: string }>(
  `Summarize {{name}} with these notes:\n{{bio}}`,
);
const prompt = userSummary.render({ name: 'Alice', bio: 'Full-stack dev' });
For stronger runtime guarantees, pair template variable types with zod schemas for validation.
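One way to do that pairing, sketched as a hypothetical makeValidatedTemplate helper layered on the makeTemplate above:
import { z } from 'zod';
import { makeTemplate } from './template';

// Hypothetical helper: validate variables at runtime, then render
export function makeValidatedTemplate<T extends Record<string, string>>(schema: z.ZodType<T>, tpl: string) {
  const inner = makeTemplate<T>(tpl);
  return {
    render(vars: unknown): string {
      const parsed = schema.parse(vars); // throws a descriptive ZodError on invalid input
      return inner.render(parsed);
    },
  };
}

// Usage
const SummaryVars = z.object({ name: z.string(), bio: z.string() });
const validatedSummary = makeValidatedTemplate(SummaryVars, `Summarize {{name}}:\n{{bio}}`);
const text = validatedSummary.render({ name: 'Alice', bio: 'Full-stack dev' });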
Structured outputs and validation
LLMs often return JSON that you want to parse into typed shapes. Use runtime validators and, when possible, instruct the model to emit JSON only. For secure handling and retention of structured outputs, consult zero-trust storage guidance.
// using zod (optional runtime dep)
import { z } from 'zod';

const PersonSchema = z.object({ name: z.string(), age: z.number() });
type Person = z.infer<typeof PersonSchema>;

// After receiving text output from the model, parse with zod
function parsePerson(output: string): Person | null {
  try {
    const json = JSON.parse(output);
    const res = PersonSchema.safeParse(json);
    return res.success ? res.data : null;
  } catch {
    return null;
  }
}
Streaming + structured parsing
When you stream JSON tokens, you can incrementally build the JSON string and attempt to parse when you encounter flush markers (or the stream signals done). Avoid calling JSON.parse on fragments — either the server must send discrete JSON objects per event, or you buffer until complete.
Example: accumulate token deltas into JSON fields
// caller.ts
const client = new LLMClient({ apiKey: process.env.API_KEY! });

let acc = '';
await client.stream({ model: 'gpt-xyz', prompt: prompt }, ev => {
  if (ev.type === 'delta') acc += ev.delta;
  if (ev.type === 'done') {
    const parsed = parsePerson(acc);
    if (!parsed) console.error('Invalid JSON from model');
    else console.log('Parsed person:', parsed);
  }
});
Retries, idempotency and long-running requests
Retries are essential, but naive retries can double-charge you or duplicate actions if the request triggers side-effects. In 2026, industry guidance favors:
- Idempotency tokens for non-read-only operations
- Respecting Retry-After headers and server signals
- Fail-fast policies for large downstream cost operations
Include idempotency in your request type and send it as a header or body field.
interface LLMRequestWithId<TParams = Record<string, unknown>> extends LLMRequest<TParams> {
  idempotencyKey?: string;
}
// server receives idempotencyKey and avoids repeated billing or side-effects
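A sketch of forwarding that key over HTTP, written as a standalone helper rather than a client method; the x-idempotency-key header name is an assumption, since providers differ:
// Sketch: send the idempotency key as a header; header name varies by provider
import type { LLMResponse } from './core-types';

export async function generateIdempotent<TOutput, TParams = Record<string, unknown>>(
  opts: { apiKey: string; baseUrl: string },
  req: LLMRequestWithId<TParams>,
): Promise<LLMResponse<TOutput>> {
  const { idempotencyKey, ...payload } = req;
  const headers: Record<string, string> = {
    'authorization': `Bearer ${opts.apiKey}`,
    'content-type': 'application/json',
  };
  if (idempotencyKey) headers['x-idempotency-key'] = idempotencyKey; // assumed header name
  const res = await fetch(`${opts.baseUrl}/generate`, {
    method: 'POST',
    headers,
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return (await res.json()) as LLMResponse<TOutput>;
}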
Packaging an SDK-friendly library
Make runtime validation optional and provide both promise and stream-first APIs. Key packaging tips:
- Ship ESM builds + types only. Keep runtime deps optional and small (zod as peerDep or optional). See practical toolchain hardening tips at Hardening Local JavaScript Tooling.
- Provide a lightweight core (fetch, parsing, streaming) and a separate validation plugin (see the sketch after this list).
- Document cost considerations and streaming guarantees clearly.
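One way to keep that split concrete is a tiny structural Validator interface in the core, with a zod adapter shipped from a separate entry point; the file and function names here are illustrative:
// validation.ts (core): no runtime dependency on zod, just a structural interface
export interface Validator<T> {
  parse(input: unknown): T; // expected to throw on invalid input
}

// zod-plugin.ts (separate entry point): adapts a zod schema to the core interface
import type { ZodType } from 'zod';
import type { Validator } from './validation';

export function zodValidator<T>(schema: ZodType<T>): Validator<T> {
  return { parse: (input) => schema.parse(input) };
}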
Observability & monitoring
Track metrics for:
- API latency, token usage, and streaming throughput
- Retry rates and error types
- Template rendering failures and validation rejects
Emit structured logs and traces; include model name, prompt hash, idempotency key, and final response size for debugging without leaking PII. For playbooks on observability and cost control, see Observability & Cost Control for Content Platforms and related resources on secure storage.
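A minimal sketch of that kind of structured record, assuming a Node environment and console-based logging; the event and field names are illustrative, and only a hash of the prompt is recorded:
import { createHash } from 'node:crypto';

function promptHash(prompt: string): string {
  return createHash('sha256').update(prompt).digest('hex').slice(0, 16);
}

// Emit one structured record per completed call
function logGeneration(entry: {
  model: string;
  prompt: string;
  idempotencyKey?: string;
  latencyMs: number;
  responseBytes: number;
}) {
  console.log(JSON.stringify({
    event: 'llm.generate',
    model: entry.model,
    promptHash: promptHash(entry.prompt), // hash instead of the raw prompt to avoid leaking PII
    idempotencyKey: entry.idempotencyKey,
    latencyMs: entry.latencyMs,
    responseBytes: entry.responseBytes,
  }));
}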
Note: As Gemini-style APIs expand multi-modal and tool-use capabilities (late 2025–early 2026), designing the client to accept generic params and pluggable tools will future-proof integrations.
Advanced strategies & production hardening
1) Rate-limiting & circuit breakers
Guard against cascading failures by implementing token buckets per-tenant and a circuit breaker to fail fast during provider outages. Operational patterns for quota and token-bucket approaches are discussed in broader infrastructure pieces like how to run a validator node (for background on economics and rate control at scale).
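A minimal in-memory sketch of the per-tenant token-bucket idea; capacity and refill rate are placeholders, and a production setup would typically back this with shared storage:
// Minimal in-memory token bucket per tenant; limits are placeholders
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity = 10, private refillPerSec = 2) {
    this.tokens = capacity;
  }

  tryConsume(): boolean {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

function allowRequest(tenantId: string): boolean {
  let bucket = buckets.get(tenantId);
  if (!bucket) {
    bucket = new TokenBucket();
    buckets.set(tenantId, bucket);
  }
  return bucket.tryConsume();
}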
2) Cost-control features
Provide caller-side budgets (max tokens, max spend) and preflight token-estimation heuristics before generating large outputs. Pair these with a short stack audit to remove unused runtime deps (Strip the Fat).
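A rough preflight sketch using the common characters-divided-by-four heuristic (not a real tokenizer):
// Rough heuristic: ~4 characters per token for English-like text
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function checkBudget(prompt: string, maxOutputTokens: number, budgetTokens: number): void {
  const estimated = estimateTokens(prompt) + maxOutputTokens;
  if (estimated > budgetTokens) {
    throw new Error(`Estimated ${estimated} tokens exceeds budget of ${budgetTokens}`);
  }
}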
3) Semantic schema evolution
Model outputs evolve. Keep validators versioned and provide graceful fallback when a newer model returns new fields — do not break the app on unknown fields. Local-first sync and validation strategies can help here; see the local-first sync appliances review for ideas about local validation and offline-first workflows.
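A sketch of one way to tolerate unknown fields and keep validators versioned side by side, using zod's passthrough:
import { z } from 'zod';

// v1 validator: unknown extra fields are passed through rather than rejected
const PersonV1 = z.object({ name: z.string(), age: z.number() }).passthrough();

// v2 adds an optional field; both validators stay registered
const PersonV2 = PersonV1.extend({ pronouns: z.string().optional() });

const validators = { v1: PersonV1, v2: PersonV2 } as const;

function parseWithFallback(raw: unknown) {
  // Prefer the newest schema, fall back to the older one instead of failing hard
  for (const schema of [validators.v2, validators.v1]) {
    const res = schema.safeParse(raw);
    if (res.success) return res.data;
  }
  return null;
}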
Actionable takeaways
- Use generics everywhere for request/response shapes to keep the surface flexible.
- Make runtime validation opt-in (zod/io-ts) so you can balance safety vs bundle size. Tooling advice is available in hardening and audit playbooks (local JS tooling, stack audits).
- Stream safely — expect partial chunks and use JSONL or discrete events from the server when possible. Edge-first design patterns are useful here (edge-first layouts).
- Respect idempotency & Retry-After to avoid duplicated side-effects or overcharging.
- Version validators and provide graceful degradation for schema changes.
Real-world example: typed SDK API
// public-api.ts
import type { LLMRequest, LLMResponse } from './core-types';
import type { StreamEvent } from './stream-types';
import { makeTemplate } from './template';

export type GenerateFn = <TOutput, TParams = Record<string, unknown>>(req: LLMRequest<TParams>) => Promise<LLMResponse<TOutput>>;
export type StreamFn = <TChunk = string>(req: LLMRequest, onEvent: (ev: StreamEvent<TChunk>) => void) => Promise<void>;

export interface SDK {
  generate: GenerateFn;
  stream: StreamFn;
  makeTemplate: typeof makeTemplate;
}
This surface keeps the SDK small and easy to test while giving type guarantees to consumers.
Future predictions (2026+)
Expect further standardization of LLM API features: richer function/tool calling, reliable resumable streams, and better server-sent typed events. SDKs that offer strong compile-time typing, optional runtime validation, and explicit streaming semantics will become the default for production apps integrating generative AI. See broader predictions on AI + observability in commerce and ops at AI & Observability predictions.
Wrap-up
Building a typed wrapper for Gemini-style APIs is more than adding TypeScript types — it's about creating a predictable, maintainable integration surface that handles real-world concerns: streaming, retries, prompt composition, and schema drift. The patterns here help you ship safer LLM features and scale them across teams.
Quick starter checklist
- Create generic LLMRequest/LLMResponse types.
- Implement streaming with a robust parser for JSONL or SSE events.
- Add withRetries with exponential backoff + jitter and respect Retry-After.
- Offer typed prompt templates and optional zod validation for outputs.
- Expose a minimal SDK surface and keep runtime deps optional. Consider a short audit to remove underused tools (stack audit).
Ready to build a production-grade SDK? Start by implementing the small client skeleton above, add zod for the first validated endpoint, and iterate on streaming ergonomics based on your provider's event format.
If you'd like a curated starter repo or checklist tailored to your existing monorepo and bundler, ping us — we can help scaffold a TypeScript-first SDK and migration path for your services.
Related Reading
- Advanced Strategy: Hardening Local JavaScript Tooling for Teams in 2026
- Observability & Cost Control for Content Platforms: A 2026 Playbook
- Edge-First Layouts in 2026: Shipping Pixel‑Accurate Experiences with Less Bandwidth
- The Zero‑Trust Storage Playbook for 2026: Homomorphic Encryption, Provenance & Access Governance
- Strip the Fat: A One-Page Stack Audit to Kill Underused Tools and Cut Costs
- Launch Now or Wait? Timing Celebrity Podcasts — Lessons from Ant & Dec and the Bigger Media Trend
- Incident Response for Domains: What to Do When an External Provider Breaks Your Site
- What Streamers and Tournaments Should Do When the Cloud Drops: Quick Triage for Live Events
- Locker Rooms and Dignity: What the Tribunal Ruling on Changing-Room Policy Means for Gyms and Teams
- What the BBC-YouTube Deal Means for Licensing and Rights — A Creator Checklist