From Chrome Extension to Local AI Extension: A Migration Playbook in TypeScript
A practical, incremental playbook to migrate Chrome extensions to privacy-first local-AI WebExtensions using TypeScript and modern web APIs.
Why your Chrome extension should become a privacy-preserving local-AI extension — and why now
If you're maintaining a Chrome extension that processes user text, stores preferences, or automates workflows, you probably face three recurring headaches: privacy concerns, third-party API costs, and the growing expectation that AI features run locally. In 2025–2026 the shift toward on-device LLMs and improved browser capabilities (WebAssembly threads, WebGPU, SharedArrayBuffer support behind COOP/COEP) made local-AI extensions realistic for many use cases. This playbook shows a practical, incremental migration path from a traditional Chrome extension to a TypeScript-based, privacy-first local-AI WebExtension.
What you’ll get from this guide
- Concrete migration phases (audit → TS toolchain → MV3 service worker → local model runtime)
- TypeScript examples for messaging, model management, and safe I/O
- Architectural patterns for running inference inside a browser extension
- Privacy and UX considerations so the extension never leaks user text
Context & 2026 trends
By late 2025 and into 2026 browser vendors expanded support for capabilities that make local AI feasible inside extensions: stable WebGPU and better WebAssembly threading, broader SharedArrayBuffer availability under proper COOP/COEP headers, and a growing ecosystem of WASM-based model runtimes (small LLMs compiled to WASM/gguf). Mobile-first browsers and privacy-focused projects — some shipping on-device LLM support — proved the model. For extension authors, this means you can now build feature-rich, offline-first AI experiences that never leave the user's machine.
Migration overview — incremental, safe, reversible
The safest migrations are incremental. Below is a three-phase plan you can follow in parallel branches or small sprints. Each phase produces a usable artifact and keeps the extension functional for existing users.
Phase 0 — Audit & goals
- Inventory all places where user data leaves the extension (analytics, API calls, update checks).
- Identify feature candidates for local inference (summaries, completions, classifiers).
- Set success criteria: e.g., “Local inference latency & memory fit within the browser for target devices.”
- Decide model strategy: bundled small model, user-downloaded model, or optional connect-to-cloud.
Phase 1 — TypeScript + Modern Tooling (low risk)
Convert code to TypeScript gradually. The goal is to add static safety and improve DX without changing runtime behavior.
- Initialize toolchain: tsconfig, bundler (esbuild/Vite/Rollup), and type libs.
- Use allowJs and checkJs to migrate files one-by-one.
- Install WebExtension types and the promise-based polyfill: @types/chrome and webextension-polyfill (with TypeScript types).
- Keep manifest.json in sync — you’ll migrate to MV3 in Phase 2.
Phase 2 — Manifest V3 & Service Worker conversion
Manifest V3 (MV3) uses a background service worker instead of a persistent background page. The worker is ephemeral, so design for lifecycle events and move long-running work to an offscreen document or content context that supports WebGPU/WebAssembly as needed.
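Because the worker can be torn down between events, any in-memory queue is lost on restart. Below is a minimal sketch of persisting pending work in extension storage; the `pendingJobs` key and `PendingJob` shape are illustrative, not part of any existing API:

```typescript
// Sketch: survive MV3 service worker restarts by keeping job state in
// extension storage rather than in module-level variables.
// `browser` is assumed to be provided by webextension-polyfill at runtime.
declare const browser: {
  storage: {
    local: {
      get(key: string): Promise<Record<string, unknown>>;
      set(items: Record<string, unknown>): Promise<void>;
    };
  };
};

export interface PendingJob {
  id: string;
  prompt: string;
  enqueuedAt: number;
}

// Pure helper: trivially unit-testable without any extension APIs.
export function appendJob(existing: PendingJob[], job: PendingJob): PendingJob[] {
  return [...existing, job];
}

export async function enqueueJob(job: PendingJob): Promise<void> {
  const stored = await browser.storage.local.get('pendingJobs');
  const jobs = (stored.pendingJobs as PendingJob[] | undefined) ?? [];
  await browser.storage.local.set({ pendingJobs: appendJob(jobs, job) });
}
```

On the next `onMessage` or `onAlarm` wake-up, the worker reads `pendingJobs` back and resumes, so a restart mid-task costs a retry rather than lost work.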
Phase 3 — Local-AI runtime integration
Add a model manager that downloads/installs models into IndexedDB or Cache Storage, performs inference via a WASM runtime using WebGPU/WebGL/WebNN, and exposes a typed messaging API for UI and content scripts.
Concrete setup: TypeScript + bundler + typings
Start by adding TypeScript with a conservative config so you can convert files progressively.
// tsconfig.json (tailored for extensions)
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "Node",
    "lib": ["DOM", "ES2020", "WebWorker"],
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "resolveJsonModule": true,
    "rootDir": "src",
    "outDir": "dist",
    "allowJs": true,
    "checkJs": false
  },
  "include": ["src/**/*"]
}
Add essential dependencies in package.json:
- webextension-polyfill — unify chrome/browser APIs with Promise-friendly calls
- @types/chrome — helpful for legacy chrome.* types
- Bundler: esbuild or vite for fast builds
Manifest v3 basics (example)
Convert your manifest to MV3. Note: a service worker can't access DOM APIs, so move DOM work into content scripts or an offscreen document.
// manifest.json (MV3)
{
  "manifest_version": 3,
  "name": "My Local-AI Extension",
  "version": "1.0.0",
  "permissions": ["storage", "scripting", "offscreen"],
  "background": {
    "service_worker": "dist/background.js"
  },
  "action": {
    "default_popup": "popup.html"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["dist/content.js"],
      "run_at": "document_idle"
    }
  ]
}
Service worker patterns in TypeScript
Service workers in MV3 are event-driven. Use typed message handlers and keep state in IndexedDB/Cache to survive worker restarts.
// src/background.ts
import browser from 'webextension-polyfill';

type InferenceRequest = { type: 'INFER'; id: string; prompt: string };
type InferenceResult = { id: string; text: string };

// Lightweight typed message dispatcher. Return a Promise only for messages you
// actually handle, so other listeners can still respond to the rest.
browser.runtime.onMessage.addListener((msg: Partial<InferenceRequest>, _sender) => {
  if (msg?.type === 'INFER') {
    const req = msg as InferenceRequest;
    // Dispatch to the model manager (keeps model lifecycle outside worker memory if needed)
    return ModelManager.infer(req.prompt).then(
      (text): InferenceResult => ({ id: req.id, text })
    );
  }
});

// Basic model manager skeleton
class ModelManager {
  static async infer(prompt: string): Promise<string> {
    // Ensure the model is loaded (may lazy-load WASM into an offscreen document)
    // For demo, return a dummy response
    return `LocalAI response for: ${prompt}`;
  }
}
Where should inference run?
Choose where to run heavy inference carefully:
- Content Script — Pros: access to DOM and WebGPU in some cases. Cons: instantiates per-tab and may duplicate memory across tabs.
- Offscreen Document — Pros: long-running, can host WebGL/WebGPU context and DOM APIs; intentionally designed for MV3 long work. Cons: involves more plumbing and permissions.
- Service Worker — Pros: lightweight event handling, ideal for orchestration. Cons: no DOM/WebGPU access; ephemeral lifetime.
Best practice: keep the worker as the orchestrator and perform actual WASM/WebGPU inference inside an offscreen document or a privileged extension page. Use typed messages to control it.
Example: Creating an offscreen document to run a WASM runtime
Chrome provides an offscreen API to create a hidden document for long-running tasks that need a DOM or GPU. Use this to instantiate a WebAssembly LLM runtime that expects WebGL/WebGPU and threads.
// background.ts (continuation)
// Note: the offscreen API is Chrome-only and is not wrapped by
// webextension-polyfill, so call it through the chrome namespace.
async function ensureOffscreen() {
  if (!(await chrome.offscreen.hasDocument())) {
    await chrome.offscreen.createDocument({
      url: 'offscreen.html',
      // Pick the reason that matches your workload; WORKERS fits WASM inference.
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: 'Run on-device WASM model inference off the worker thread'
    });
  }
}
browser.runtime.onMessage.addListener(async (msg) => {
  if (msg?.type === 'START_INFER') {
    await ensureOffscreen();
    // Forward to the offscreen document to start model load/inference
    await browser.runtime.sendMessage({ type: 'OFFSCREEN_INFER', prompt: msg.prompt });
  }
});
offscreen.html: the inference host
The offscreen document can import a bundled JS module that initializes the WASM runtime using WebGPU or WebGL. Keep this file small and tightly typed.
// src/offscreen.ts
import browser from 'webextension-polyfill';

browser.runtime.onMessage.addListener(async (msg) => {
  if (msg?.type === 'OFFSCREEN_INFER') {
    const result = await LocalWasmRuntime.infer(msg.prompt);
    // Send the result back to the service worker or content script
    await browser.runtime.sendMessage({ type: 'INFER_RESULT', result });
  }
});

class LocalWasmRuntime {
  static async init() {
    // Fetch the model from IndexedDB/cache and instantiate WASM with WebGPU
  }
  static async infer(prompt: string): Promise<string> {
    // Call the runtime and return the generated text
    return `Simulated: ${prompt}`;
  }
}
Managing model files: storage & privacy
Treat model files like any large binary asset. Store them in Cache Storage or IndexedDB with explicit user consent. Provide a clear UI to:
- Download and remove models locally
- Restrict model usage to offline/local-only mode
- Show disk and memory footprint estimates before download
Example: saving a model chunk into IndexedDB using idb-keyval or a small wrapper.
// src/modelStore.ts
import { set, get } from 'idb-keyval';

export async function saveModelBlob(name: string, blob: Blob) {
  await set(`model:${name}`, blob);
}

export async function loadModelBlob(name: string): Promise<Blob | undefined> {
  return get(`model:${name}`);
}
Privacy-first defaults and UI
To claim the privacy-first angle credibly, ship defaults that minimize data exfiltration:
- Local-only by default: disable any cloud inference until users opt in.
- Explainability: show when models are loaded and where they live on disk.
- Network rules: intercept outgoing requests used by legacy analytics or telemetry and provide a single opt-in toggle.
- Permission minimization: ask only for permissions required for the feature set.
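The "local-only by default" rule is easiest to enforce as a single settings gate that every network-reaching code path must pass. A sketch with assumed setting names (`cloudOptIn`, `telemetryOptIn` are illustrative, not from the original code):

```typescript
// Privacy settings with deny-by-default values: cloud inference and
// telemetry stay off until the user explicitly opts in.
export interface PrivacySettings {
  cloudOptIn: boolean;
  telemetryOptIn: boolean;
}

export const DEFAULT_SETTINGS: PrivacySettings = {
  cloudOptIn: false,
  telemetryOptIn: false,
};

// Single choke point: cloud inference requires both an explicit opt-in
// and connectivity, so a fresh install can never exfiltrate text.
export function mayUseCloud(settings: PrivacySettings, online: boolean): boolean {
  return settings.cloudOptIn && online;
}
```

Routing every fetch to a cloud endpoint through `mayUseCloud` makes the privacy claim auditable: one grep shows every place data could leave the device.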
Type-safe messaging model
Messaging is the spine of your extension. Declare discriminated unions and helpers so TypeScript enforces correct message shapes.
// src/messages.ts
export type RequestMessage =
  | { type: 'INFER'; id: string; prompt: string }
  | { type: 'MODEL_DOWNLOAD'; modelId: string }
  | { type: 'MODEL_REMOVE'; modelId: string };

export type ResponseMessage =
  | { type: 'INFER_RESULT'; id: string; text: string }
  | { type: 'MODEL_STATUS'; modelId: string; status: 'ready' | 'downloading' | 'error' };
Use a wrapper to send messages and handle timeouts to cope with worker restarts.
Bundling and dev cycle tips
- Use esbuild or Vite for fast dev builds and source maps.
- Keep an unpacked extension loader script in package.json for quick reloads (Chrome's "Load unpacked").
- Watch service worker logs in chrome://extensions → Inspect service worker for the background console.
- Unit-test business logic outside extension APIs; stub browser.* with webextension-polyfill mocks for CI.
Performance & resource considerations
On-device inference is constrained by memory, CPU, and battery. Optimizations that matter:
- Choose compact models (quantized formats like 4-bit GGUF where possible).
- Favor streaming and chunked token generation to reduce peak memory.
- Use WebGPU-backed WASM runtimes to offload compute to GPU (if available).
- Provide fallback: if a device can't run the model, gracefully fall back to a lightweight heuristic or show local-only disabled state.
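The streaming point above can be sketched as an async generator; the `TokenStream` interface is an assumption standing in for whatever your WASM runtime actually exposes:

```typescript
// Assumed runtime interface: yields one token per call, null when finished.
export interface TokenStream {
  next(): Promise<string | null>;
}

// Stream tokens one at a time instead of buffering the full completion,
// which keeps peak memory flat regardless of output length.
export async function* streamTokens(stream: TokenStream): AsyncGenerator<string> {
  for (;;) {
    const tok = await stream.next();
    if (tok === null) return;
    yield tok;
  }
}

// Usage sketch: forward each token as a message so the popup renders
// incrementally while still returning the final text.
export async function collect(
  stream: TokenStream,
  onToken: (t: string) => void
): Promise<string> {
  let text = '';
  for await (const tok of streamTokens(stream)) {
    text += tok;
    onToken(tok);
  }
  return text;
}
```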
Security considerations
- Be mindful of executing arbitrary WASM: sandbox model runtime and validate inputs. Even local models can be abused if the runtime mismanages memory.
- Significant model blobs should be verified (checksum) on download so users don't get tampered files.
- Avoid eval() or dynamic script imports from remote sources unless the user explicitly enables a cloud-backed mode.
Testing & debugging tips for MV3
- Use unpacked extension loading and keep your source maps referencing the dist/ bundle for easier stepping.
- Inspect service worker logs: Chrome provides an inspector for the worker in the Extensions page.
- For offscreen documents, open the offscreen URL directly (during dev) to observe the console and performance profile.
- Automate end-to-end tests by spawning a headful Chromium with the extension loaded and driving actions with Puppeteer.
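For those Puppeteer-driven end-to-end tests, extensions only load in a headful browser and need two Chromium flags. A dependency-free sketch of building the flags (the `puppeteer.launch` usage is shown as a comment and assumes Puppeteer is installed separately):

```typescript
// Build the Chromium flags needed to load an unpacked extension.
// Kept free of the puppeteer dependency so it stays trivially testable.
export function extensionLaunchArgs(extensionPath: string): string[] {
  return [
    `--disable-extensions-except=${extensionPath}`,
    `--load-extension=${extensionPath}`,
  ];
}

// In a Puppeteer test you would then do (assumption: puppeteer installed):
//   const browser = await puppeteer.launch({
//     headless: false,                     // extensions need a headful browser
//     args: extensionLaunchArgs('dist'),   // path to your unpacked build
//   });
```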
Example: Incremental migration checklist (1-3 week sprints)
- Week 1: Audit, add TypeScript + bundler, migrate background orchestration to TS.
- Week 2: Migrate manifest to MV3, implement typed messaging, add an offscreen stub with a mock runtime.
- Week 3: Integrate real WASM runtime, add model download UI, implement privacy defaults and telemetry opt-out.
Real-world considerations & case study notes
In 2025 several privacy-first mobile browsers (for example, some emerging browser projects) shipped local-AI features to demonstrate the viability of on-device LLMs. Extension authors can learn from those apps’ UX patterns: explicit model management, transparent UI about where data stays, and graceful fallback to cloud-backed inference only when users opt in.
"Ship small, observable changes. Convert background logic to TypeScript first — then add the offscreen runtime when the messaging contract is stable." — recommended migration pattern
Common migration pitfalls and how to avoid them
- Assuming service worker permanence: service workers are ephemeral. Persist state in storage and design for restart.
- Loading large models synchronously: always download models asynchronously and show progress; prefer streaming chunks into IndexedDB.
- Not validating model sources: require checksums/signatures and show provenance to users.
- Excessive permissions: avoid broad host permissions; use programmatic injection with scripting API where possible.
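Programmatic injection with the scripting API can replace broad host permissions; passing the API object in as a parameter also keeps the function unit-testable without a browser. A sketch assuming the "scripting" and "activeTab" permissions (the `dist/content.js` path mirrors the manifest above):

```typescript
// Narrow view of chrome.scripting, injected as a dependency so tests
// can substitute a fake without a running browser.
export interface ScriptingApi {
  executeScript(opts: {
    target: { tabId: number };
    files: string[];
  }): Promise<unknown>;
}

// Inject the content script only into the tab the user invoked the
// extension on, instead of declaring <all_urls> host permissions.
export async function injectOnDemand(
  api: ScriptingApi,
  tabId: number,
  file = 'dist/content.js'
): Promise<void> {
  await api.executeScript({ target: { tabId }, files: [file] });
}
```

In production you would call `injectOnDemand(chrome.scripting, tab.id)` from an action-click handler, which is exactly the flow activeTab is designed for.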
Actionable code snippets — typed sendRequest helper
Use a typed wrapper that sets a timeout and enforces response types.
// src/utils/messaging.ts
import browser from 'webextension-polyfill';
import type { RequestMessage, ResponseMessage } from '../messages';

export async function sendRequest<T extends ResponseMessage>(
  msg: RequestMessage,
  timeout = 10000
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout')), timeout);
    browser.runtime.sendMessage(msg).then((res) => {
      clearTimeout(timer);
      resolve(res as T);
    }).catch((err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
Wrap-up: the practical benefits you'll achieve
Migrating your extension to TypeScript and enabling local AI via a well-architected offscreen runtime will give you:
- Stronger developer ergonomics and fewer runtime errors with TypeScript
- Higher user trust by keeping sensitive text on-device
- Lower operational costs by avoiding per-request LLM API bills
- Competitive differentiation in a market that values privacy-first AI
Next steps & resources
To implement this migration in your codebase:
- Run a permissions and network audit of your current extension.
- Create a types-first branch and add tsconfig + bundler to iterate quickly.
- Prototype offscreen inference with a small WASM runtime and a toy model to validate UX and memory profile.
Call to action
Ready to start? Clone our migration starter (TypeScript + MV3 + offscreen boilerplate), run the checklist in your repository, and share benchmarks on memory and latency. If you want a tailored checklist for your extension, paste the list of features and I’ll suggest a concrete step-by-step migration plan you can run in sprints. Ship safer AI — locally and privately.