Migration guide: swapping an LLM provider in your TypeScript stack (e.g., to Gemini)
Practical TypeScript checklist for swapping or adding an LLM provider—adapters, rate limits, prompt tests, canary rollouts, and fallbacks.
Facing a painful LLM swap in a TypeScript codebase? Start here.
Swapping or adding an LLM provider in a production TypeScript stack can break prompts, blow past rate limits, change runtime behavior, and surprise users. This guide gives a practical, actionable migration checklist and compatibility strategies so you can replace — or add — providers (for example, moving some calls to Google Gemini) with minimal disruption.
Executive summary: the migration checklist (do these first)
- Inventory all LLM usage: models, endpoints, prompt templates, embeddings, streaming, and rate/throughput patterns.
- Define contracts (TypeScript interfaces) for the provider surface your app consumes.
- Implement adapters for each provider to the common contract.
- Create tests: unit, contract, prompt regression, and integration smoke tests.
- Plan throttling & backoff — token-based and request-based limits differ by provider.
- Canary & metrics: run a partial rollout with observability, cost telemetry and fallbacks.
- Rollback and fallback paths: circuit breaker + provider priority list.
Why this matters in 2026 — context and trends
In 2026, the LLM ecosystem is more diverse and specialized than ever. Large players (OpenAI, Google Gemini, Anthropic) publish frequent model updates and new SDKs. Tech partnerships (for example, Apple's 2025 move to use Gemini technology in its assistant stack) mean apps may need to support multiple providers for performance, compliance, or commercial reasons.
At the same time, providers vary in these key ways:
- API shapes: chat vs completion vs function-calling vs streaming protocols.
- Rate limiting: requests-per-minute, tokens-per-minute, concurrency limits, and burst allowances.
- Safety and moderation: on-request vs post-processing filtering.
- Response behavior: temperature handling, sampling defaults, hallucination tendencies.
- Cost accounting: tokens vs call pricing vs multimodal charges (images/audio).
Step 1 — Inventory and map feature parity
Start by cataloging every place your app touches an LLM. This is more than “where we call OpenAI.” Be systematic.
What to capture
- API type: chat/completion/embedding/moderation/streaming.
- Prompt templates: runtime interpolation, few-shot examples, system messages.
- Model config: temperature, top_p, max_tokens, stop sequences, sampling seeds.
- Runtime shape: sync vs streaming, chunking behavior, function-call hooks.
- Operational data: requests per minute, peak concurrency, token usage, error rates.
- Security & compliance: PII in prompts, residency, retention policies.
Do this as a living document (spreadsheet or doc). It becomes the migration spec for mapping features to a new provider.
Step 2 — Define a strong TypeScript contract
Instead of spreading provider SDK types everywhere, define a narrow, explicit contract your application uses. This simplifies adapters, tests, and future provider swaps.
Example: minimal LLM provider interface
export type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

export interface LLMResponseChunk {
  text?: string;                      // for streaming
  done?: boolean;                     // true on the final chunk
  metadata?: Record<string, unknown>; // provider-specific chunk metadata
}

export interface LLMResponse {
  text: string;
  tokens?: number; // measured tokens
  model?: string;
  raw?: unknown;   // provider-specific raw response
}

export interface LLMProvider {
  name: string;
  generate(chat: ChatMessage[], options?: { maxTokens?: number; temperature?: number }): Promise<LLMResponse>;
  streamGenerate?(chat: ChatMessage[], onChunk: (c: LLMResponseChunk) => void, options?: { maxTokens?: number; temperature?: number }): Promise<void>;
  embed?(input: string | string[]): Promise<number[][]>;
  // metadata to help routing or rate limiting
  capacity?: { rps?: number; tokensPerMinute?: number };
}
Why: This single source of truth avoids leaking provider-specific types and makes it easy to swap implementations without touching business code.
Step 3 — Implement adapters for each provider
Adapters map SDKs to your LLMProvider contract. Keep them thin and well-tested. Use environment-based configuration to select providers at runtime.
Adapter example: provider-adapter.ts (pseudo)
import OpenAI from 'openai';
import type { LLMProvider } from './llm-contract'; // adjust the path to wherever your shared contract lives

export function makeOpenAIAdapter(apiKey: string): LLMProvider {
  const client = new OpenAI({ apiKey });
  return {
    name: 'openai',
    async generate(chat, options) {
      const resp = await client.chat.completions.create({
        model: 'gpt-4o',
        messages: chat,
        max_tokens: options?.maxTokens,
        temperature: options?.temperature,
      });
      // Join all choices and normalize null content to an empty string.
      return { text: resp.choices.map(c => c.message.content ?? '').join(''), tokens: resp.usage?.total_tokens, raw: resp };
    },
  };
}
For Gemini, implement a Gemini adapter that reconciles function-calling, multimodality, or streaming differences into the same shape.
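Below is a minimal sketch of what that reconciliation can look like, assuming the @google/generative-ai SDK; the model id, the import paths, and the response-field access are assumptions to verify against the SDK version you actually install.

```typescript
// Sketch only: assumes the @google/generative-ai SDK; verify names against current docs.
import { GoogleGenerativeAI } from '@google/generative-ai';
import type { LLMProvider } from './llm-contract'; // hypothetical shared contract path

export function makeGeminiAdapter(apiKey: string): LLMProvider {
  const genAI = new GoogleGenerativeAI(apiKey);
  return {
    name: 'gemini',
    async generate(chat, options) {
      // Gemini separates system instructions from the chat history and uses
      // 'model' rather than 'assistant' as the non-user role.
      const system = chat.filter(m => m.role === 'system').map(m => m.content).join('\n');
      const contents = chat
        .filter(m => m.role !== 'system')
        .map(m => ({ role: m.role === 'assistant' ? 'model' : 'user', parts: [{ text: m.content }] }));

      const model = genAI.getGenerativeModel({
        model: 'gemini-1.5-pro', // placeholder model id
        systemInstruction: system || undefined,
      });
      const result = await model.generateContent({
        contents,
        generationConfig: { maxOutputTokens: options?.maxTokens, temperature: options?.temperature },
      });
      return {
        text: result.response.text(),
        tokens: result.response.usageMetadata?.totalTokenCount, // normalize token accounting here
        raw: result,
      };
    },
  };
}
```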
Step 4 — Handle rate limits and throttling
Providers differ: some count tokens, some count requests, some cap concurrency. Build a pluggable throttling layer that enforces the strictest constraints you want to honor.
Patterns to use
- Token bucket for token-based quotas — measure tokens and throttle when budget low.
- Leaky bucket / request queue for request-per-second controls.
- Concurrency semaphores for providers with connection limits or streaming per-connection caps.
- Adaptive backoff driven by HTTP 429/503 responses: exponential backoff with jitter and a circuit-breaker to stop cascading failures.
TypeScript rate limiter example (conceptual)
export interface RateLimitConfig { requestsPerMinute?: number; tokensPerMinute?: number; }

export function wrapWithRateLimit(provider: LLMProvider, cfg: RateLimitConfig): LLMProvider {
  // implement a token-bucket and queue behind the scenes
  return {
    ...provider,
    async generate(chat, options) {
      // await checkAndConsume(cfg, estimatedTokens(chat, options));
      return provider.generate(chat, options);
    },
  };
}
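The commented-out checkAndConsume above could sit on top of a simple token bucket like the sketch below; the refill math, polling delay, and the rough four-characters-per-token estimate are illustrative assumptions, and multi-instance services would need shared state (e.g., Redis) instead of in-process counters.

```typescript
import type { ChatMessage } from './llm-contract'; // hypothetical shared contract path

// Minimal in-process token bucket (sketch).
export class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerMs: number) {
    this.tokens = capacity;
  }

  private refill() {
    const now = Date.now();
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.lastRefill) * this.refillPerMs);
    this.lastRefill = now;
  }

  /** Resolves once `cost` units are available, polling with a small delay. */
  async consume(cost: number): Promise<void> {
    for (;;) {
      this.refill();
      if (this.tokens >= cost) {
        this.tokens -= cost;
        return;
      }
      await new Promise(r => setTimeout(r, 50)); // wait briefly before re-checking the budget
    }
  }
}

// Very rough token estimate (~4 characters per token), used only for budgeting.
export function estimatedTokens(chat: ChatMessage[]): number {
  return Math.ceil(chat.reduce((sum, m) => sum + m.content.length, 0) / 4);
}
```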
Step 5 — Streaming & runtime behavior compatibility
Streaming APIs differ in chunk semantics, control sequences, and event shapes. Normalize streams into your LLMResponseChunk shape so downstream UIs behave consistently.
Key considerations:
- Chunk boundaries: some providers send partial tokens; others send sentence-level chunks.
- End-of-stream indicators: make sure your adapter signals a final chunk with done=true.
- Error handling mid-stream: design a strategy for both recoverable resumes and terminal failures.
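As a sketch of that normalization, the helper below consumes a hypothetical provider stream (an async iterable with a `delta` field — an assumed shape, not any specific SDK) and forwards uniform LLMResponseChunk values:

```typescript
import type { LLMResponseChunk } from './llm-contract'; // hypothetical shared contract path

// `providerStream` stands in for whatever async iterable your SDK returns;
// the `delta` field name is an assumption for illustration.
export async function normalizeStream(
  providerStream: AsyncIterable<{ delta?: string }>,
  onChunk: (c: LLMResponseChunk) => void,
): Promise<void> {
  try {
    for await (const piece of providerStream) {
      if (piece.delta) onChunk({ text: piece.delta, done: false });
    }
    onChunk({ done: true }); // always signal a final chunk so the UI can settle
  } catch (err) {
    onChunk({ done: true, metadata: { error: String(err) } }); // terminal failures still close the stream
    throw err;
  }
}
```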
Step 6 — Prompt compatibility and regression testing
Prompts are code. Treat them as first-class artifacts with tests and versioning.
Practical steps
- Snapshot tests for prompts: freeze outputs for a canonical model and detect regressions. Use golden files for deterministic parts (embeddings or rule-based prompts).
- Prompt linting: ensure templates always include required system messages or few-shot content.
- A/B tests between providers with identical prompts to catch semantic differences (e.g., Gemini vs. the previous provider).
- Semantic diffing: for structured outputs (JSON), validate schemas using zod/io-ts and compare parsed results.
Example: testing structured function output
// Use zod to validate structured responses
import { z } from 'zod';
const TodoSchema = z.object({ id: z.string(), text: z.string(), priority: z.number().int().min(1).max(5) });
// In tests: call provider, parse JSON and assert TodoSchema.parse(result)
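A minimal regression test built on that idea might look like the sketch below (shown with vitest; the prompt text, provider wiring, and import path are placeholders):

```typescript
import { describe, it, expect } from 'vitest';
import { z } from 'zod';
import type { LLMProvider } from './llm-contract'; // hypothetical shared contract path

declare const provider: LLMProvider; // in a real test, import the adapter under test

const TodoSchema = z.object({ id: z.string(), text: z.string(), priority: z.number().int().min(1).max(5) });

describe('todo extraction prompt', () => {
  it('returns JSON that matches the schema', async () => {
    const resp = await provider.generate([
      { role: 'system', content: 'Reply with a single JSON todo object.' }, // placeholder prompt
      { role: 'user', content: 'Remind me to rotate the API keys, high priority.' },
    ]);
    const parsed = TodoSchema.parse(JSON.parse(resp.text)); // throws if the shape or types drift
    expect(parsed.priority).toBeGreaterThanOrEqual(1);
  });
});
```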
Step 7 — Contract & integration tests (CI)
Write two layers of tests:
- Contract tests against a mocked provider interface to assert your adapter implements the contract. These are fast and run on every PR.
- Integration/smoke tests that call the actual provider(s) in a gated environment (staging) to validate end-to-end behavior, rate limits, and costs.
For contract testing patterns, consider tooling like Pact for HTTP contracts or custom fixtures using MSW/nock. Keep integration tests budgeted to avoid unexpected bills.
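One lightweight pattern for the first layer is a reusable suite that every adapter must pass; the sketch below assumes vitest and a hypothetical shared-contract import path:

```typescript
import { describe, it, expect } from 'vitest';
import type { LLMProvider } from './llm-contract'; // hypothetical shared contract path

// Reusable contract suite: run it against mocked adapters on every PR,
// and against real adapters in gated staging runs.
export function runProviderContractTests(name: string, makeProvider: () => LLMProvider) {
  describe(`${name} satisfies LLMProvider`, () => {
    it('returns non-empty text from generate()', async () => {
      const provider = makeProvider();
      const resp = await provider.generate([{ role: 'user', content: 'Say "ok".' }], { maxTokens: 8 });
      expect(typeof resp.text).toBe('string');
      expect(resp.text.length).toBeGreaterThan(0);
    });
  });
}
```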
Step 8 — Observability, metrics and cost telemetry
When you run a canary, you must measure. Instrument these metrics for each provider:
- Latency (p50/p95/p99), tokens per request, error rate
- 429/503 counts and retry volume
- Cost per request and cost per feature (map features to costs)
- Quality metrics: exact match on structured outputs, human-labeled quality or RM (reward model) scores
Push metrics to your observability stack (Prometheus/Grafana, Datadog) and attach traces so you can correlate an LLM provider change with user-facing regressions.
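A thin instrumentation wrapper over the LLMProvider contract keeps these metrics per provider without touching call sites; the MetricsSink interface below is a deliberately generic stand-in for your Prometheus or Datadog client, not a real library API:

```typescript
import type { LLMProvider } from './llm-contract'; // hypothetical shared contract path

// Generic sink so this sketch is not tied to any particular metrics client.
export interface MetricsSink {
  observeLatencyMs(provider: string, ms: number): void;
  addTokens(provider: string, tokens: number): void;
  countError(provider: string): void;
}

export function withMetrics(provider: LLMProvider, sink: MetricsSink): LLMProvider {
  return {
    ...provider,
    async generate(chat, options) {
      const start = Date.now();
      try {
        const resp = await provider.generate(chat, options);
        sink.addTokens(provider.name, resp.tokens ?? 0);
        return resp;
      } catch (err) {
        sink.countError(provider.name);
        throw err;
      } finally {
        sink.observeLatencyMs(provider.name, Date.now() - start); // covers success and failure paths
      }
    },
  };
}
```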
Step 9 — Deployment strategy: canary, blue/green, or shadowing
Don’t flip the switch globally. Use these rollout strategies:
- Shadowing: send requests to new provider in parallel (no user impact) and record diffs. Consider automating diffs with prompt-chain tooling for shadowing analysis.
- Canary: route a small percentage (1–5%) of real traffic to the new provider and compare KPIs.
- Blue/green: switch non-user-critical traffic first (internal tools, low-risk features).
Shadowing is particularly effective in 2026 because providers now expose richer server-side telemetry; comparing returned metadata helps detect subtle behavior changes.
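A small router over the same contract can combine a canary percentage with fire-and-forget shadowing; the 3% default split and the recordDiff helper below are placeholder choices for illustration:

```typescript
import type { LLMProvider } from './llm-contract'; // hypothetical shared contract path

export function makeCanaryRouter(primary: LLMProvider, candidate: LLMProvider, canaryShare = 0.03): LLMProvider {
  return {
    name: `canary(${primary.name}->${candidate.name})`,
    async generate(chat, options) {
      if (Math.random() < canaryShare) {
        return candidate.generate(chat, options); // canary slice: real traffic, watch KPIs closely
      }
      const primaryResp = await primary.generate(chat, options);
      // Shadowing: fire-and-forget to the candidate, record the diff, never block the user.
      void candidate
        .generate(chat, options)
        .then(shadow => recordDiff(primaryResp.text, shadow.text))
        .catch(() => { /* shadow failures must never affect users */ });
      return primaryResp;
    },
  };
}

// Placeholder: persist diffs wherever your shadowing analysis runs.
function recordDiff(primaryText: string, shadowText: string): void {
  if (primaryText !== shadowText) console.info('shadow diff recorded');
}
```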
Step 10 — Fallbacks, circuit breakers, and multi-provider strategies
Assume a provider can exhibit transient failures or unexpected semantics. Design a fallback strategy:
- Priority list: primary provider > fallback provider.
- Graceful degrade: return cached responses, simpler rules, or a best-effort response.
- Circuit breaker: open when error rate or latency exceeds thresholds for N minutes.
Example: fallback wrapper (simplified)
async function generateWithFallback(providers: LLMProvider[], chat: ChatMessage[]) {
  for (const prov of providers) {
    try {
      return await prov.generate(chat);
    } catch (err) {
      // log and try next
      console.warn(`provider ${prov.name} failed`, err);
    }
  }
  throw new Error('All providers failed');
}
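The circuit breaker from the list above can be layered onto the same contract; the five-failure threshold and 60-second cooldown in this sketch are arbitrary illustrative values:

```typescript
import type { LLMProvider } from './llm-contract'; // hypothetical shared contract path

// Open after `maxFailures` consecutive errors; refuse calls until `cooldownMs` has passed.
export function withCircuitBreaker(provider: LLMProvider, maxFailures = 5, cooldownMs = 60_000): LLMProvider {
  let failures = 0;
  let openedAt = 0;

  return {
    ...provider,
    async generate(chat, options) {
      if (failures >= maxFailures && Date.now() - openedAt < cooldownMs) {
        throw new Error(`circuit open for ${provider.name}`); // fail fast so a fallback list moves on
      }
      try {
        const resp = await provider.generate(chat, options);
        failures = 0; // any success closes the circuit
        return resp;
      } catch (err) {
        failures += 1;
        if (failures >= maxFailures) openedAt = Date.now();
        throw err;
      }
    },
  };
}
```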
Security, privacy & compliance checklist
- Verify data residency and retention policies for the provider (especially important for healthcare or financial data).
- Redact or tokenize PII in prompts where possible (a naive redaction sketch follows this list); consider on-prem or private models if required.
- Confirm provider’s safety filters and moderation APIs match your risk profile.
- Ensure API keys and secrets are stored in a secure vault and rotated routinely.
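Where redaction is feasible, a small pre-processing pass over prompts helps; the regex patterns below are simplistic illustrations and are no substitute for a proper PII detection service:

```typescript
import type { ChatMessage } from './llm-contract'; // hypothetical shared contract path

// Naive redaction pass (sketch): masks email addresses and long digit runs.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const LONG_DIGITS = /\b\d{9,}\b/g; // account numbers, phone numbers, etc.

export function redactPrompt(chat: ChatMessage[]): ChatMessage[] {
  return chat.map(m => ({
    ...m,
    content: m.content.replace(EMAIL, '[REDACTED_EMAIL]').replace(LONG_DIGITS, '[REDACTED_NUMBER]'),
  }));
}
```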
Cost modeling and billing controls
Different billing models require different guardrails. Build an internal cost model by feature, not by raw usage. Track token usage per feature and set soft and hard limits.
Automated policies to prevent runaway costs:
- Per-feature budget caps
- Alerts on daily spend anomalies
- Automatic throttling when projected monthly spend exceeds budget
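A minimal per-feature guard along those lines is sketched below; the in-memory counters and soft/hard thresholds are illustrative assumptions (production systems typically persist spend in a shared store):

```typescript
// In-memory per-feature spend tracker (sketch). Costs are in your own accounting
// units, e.g. estimated USD derived from token counts and your provider's price sheet.
const spendByFeature = new Map<string, number>();

export function recordSpend(feature: string, cost: number, softCap: number, hardCap: number): 'ok' | 'warn' | 'block' {
  const total = (spendByFeature.get(feature) ?? 0) + cost;
  spendByFeature.set(feature, total);
  if (total >= hardCap) return 'block'; // trigger automatic throttling
  if (total >= softCap) return 'warn';  // alert that the feature is approaching its budget
  return 'ok';
}
```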
Real-world example: migrating a customer support assistant to Gemini (hypothetical)
We migrated a customer support assistant from Provider A to Google Gemini in late 2025. Key lessons:
- Prompt changes: responses were less verbose by default; we adjusted system prompts to preserve greeting style.
- Token reporting: Gemini reported tokens differently — we added adapter code to normalize token counts to our accounting system.
- Streaming: Gemini’s streaming chunks favored sentence boundaries; the UI reflow logic needed small changes to preserve cursor position.
- Rollout: a two-week shadowing period exposed three prompt regressions that automated snapshot tests didn’t catch; human-in-the-loop reviews fixed these quickly.
"Treat prompts like APIs: version them, test them, and monitor how changes affect users." — internal migration playbook
Testing matrix and CI checklist
Include these tests in CI pipelines:
- Unit tests for adapters (mock provider responses)
- Prompt snapshot tests (run in CI and update intentionally)
- Schema validation for structured outputs
- Contract tests that verify the adapter implements LLMProvider interface
- Staged integration tests (nightly) that call the real provider with limited quotas
Common migration pitfalls and how to avoid them
- Leaking SDK types across app layers — avoid this by using an adapter and a shared contract.
- Ignoring token count differences — normalize token accounting in adapters.
- Assuming streaming parity — test streaming paths end-to-end in staging.
- No cost governance — run small integration tests, budgeted usage, and telemetry first.
- No human review phase — include product and QA reviews during the canary.
Actionable takeaways
- Make a contract: one TypeScript interface your app depends on.
- Write adapters for each provider that map behavior and metadata.
- Test prompts with snapshots, schema checks and human validation.
- Implement rate limiting using token-bucket + request queue to protect providers and budgets.
- Roll out gradually using shadowing and canaries; monitor quality and cost metrics closely.
Final checklist before you flip the switch
- Inventory complete and mapped to feature parity
- TypeScript contract defined and used across the codebase
- Adapters implemented and unit-tested
- Rate-limiting & fallback wrappers in place
- Prompt regression tests in CI
- Shadowing/canary deployment plan ready and monitored
- Cost & compliance checks passed
Looking forward: what to expect in 2026+
Expect more heterogeneity: specialty models, on-device inference, and unified multimodal APIs. The best strategy is to keep your app’s LLM dependency surface narrow, make behavior explicit with contracts, and invest in observability and prompt testing. Partnerships and consolidations (like Apple’s adoption of Gemini in 2025–2026) will change provider landscapes — but a robust adapter + contract approach keeps your codebase resilient.
Resources & next steps
- Start a migration doc (inventory + mapping)
- Build the LLMProvider TypeScript contract in a shared package or monorepo
- Implement adapters for your top two providers
- Instrument metrics and set up nightly shadowing diffs
Call to action
If you’re planning a migration this quarter, fork the TypeScript contract examples above, run a quick shadowing experiment, and instrument tokens & latency today. Need a review of your adapter or migration plan? Share a snippet of your LLMProvider contract or prompt templates and I’ll walk through compatibility and testing suggestions tailored to your codebase.