From Chrome Extension to Local AI Extension: A Migration Playbook in TypeScript
A practical, incremental playbook to migrate Chrome extensions to privacy-first local-AI WebExtensions using TypeScript and modern web APIs.
Why your Chrome extension should become a privacy-preserving local-AI extension — and why now
If you're maintaining a Chrome extension that processes user text, stores preferences, or automates workflows, you probably face three recurring headaches: privacy concerns, third-party API costs, and the growing expectation that AI features run locally. In 2025–2026 the shift toward on-device LLMs and improved browser capabilities (WebAssembly threads, WebGPU, SharedArrayBuffer support behind COOP/COEP) made local-AI extensions realistic for many use cases. This playbook shows a practical, incremental migration path from a traditional Chrome extension to a TypeScript-based, privacy-first local-AI WebExtension.
What you’ll get from this guide
- Concrete migration phases (audit → TS toolchain → MV3 service worker → local model runtime)
- TypeScript examples for messaging, model management, and safe I/O
- Architectural patterns for running inference inside a browser extension
- Privacy and UX considerations so the extension never leaks user text
Context & 2026 trends
By late 2025 and into 2026 browser vendors expanded support for capabilities that make local AI feasible inside extensions: stable WebGPU and better WebAssembly threading, broader SharedArrayBuffer availability under proper COOP/COEP headers, and a growing ecosystem of WASM-based model runtimes (small LLMs compiled to WASM/gguf). Mobile-first browsers and privacy-focused projects — some shipping on-device LLM support — proved the model. For extension authors, this means you can now build feature-rich, offline-first AI experiences that never leave the user's machine.
Migration overview — incremental, safe, reversible
The safest migrations are incremental. Below is a three-phase plan you can follow in parallel branches or small sprints. Each phase produces a usable artifact and keeps the extension functional for existing users.
Phase 0 — Audit & goals
- Inventory all places where user data leaves the extension (analytics, API calls, update checks).
- Identify feature candidates for local inference (summaries, completions, classifiers).
- Set success criteria: e.g., “Local inference latency & memory fit within the browser for target devices.”
- Decide model strategy: bundled small model, user-downloaded model, or optional connect-to-cloud.
Phase 1 — TypeScript + Modern Tooling (low risk)
Convert code to TypeScript gradually. The goal is to add static safety and improve DX without changing runtime behavior.
- Initialize toolchain: tsconfig, bundler (esbuild/Vite/Rollup), and type libs.
- Use allowJs and checkJs to migrate files one-by-one.
- Install WebExtension types and the promise-based polyfill: @types/chrome and webextension-polyfill (with TypeScript types).
- Keep manifest.json in sync — you’ll migrate to MV3 in Phase 2.
Phase 2 — Manifest V3 & Service Worker conversion
Manifest V3 (MV3) uses a background service worker instead of a persistent background page. The worker is ephemeral, so design for lifecycle events and move long-running work to an offscreen document or content context that supports WebGPU/WebAssembly as needed.
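Because the worker can be torn down between events, any in-memory queue is lost on restart. Below is a minimal sketch of persisting pending work in extension storage; the `pendingJobs` key and `PendingJob` shape are illustrative, not part of any existing API:

```typescript
// Sketch: survive MV3 service worker restarts by keeping job state in
// extension storage rather than in module-level variables.
// `browser` is assumed to be provided by webextension-polyfill at runtime.
declare const browser: {
  storage: {
    local: {
      get(key: string): Promise<Record<string, unknown>>;
      set(items: Record<string, unknown>): Promise<void>;
    };
  };
};

export interface PendingJob {
  id: string;
  prompt: string;
  enqueuedAt: number;
}

// Pure helper: trivially unit-testable without any extension APIs.
export function appendJob(existing: PendingJob[], job: PendingJob): PendingJob[] {
  return [...existing, job];
}

export async function enqueueJob(job: PendingJob): Promise<void> {
  const stored = await browser.storage.local.get('pendingJobs');
  const jobs = (stored.pendingJobs as PendingJob[] | undefined) ?? [];
  await browser.storage.local.set({ pendingJobs: appendJob(jobs, job) });
}
```

On the next `onMessage` or `onAlarm` wake-up, the worker reads `pendingJobs` back and resumes, so a restart mid-task costs a retry rather than lost work.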
Phase 3 — Local-AI runtime integration
Add a model manager that downloads/installs models into IndexedDB or Cache Storage, performs inference via a WASM runtime using WebGPU/WebGL/WebNN, and exposes a typed messaging API for UI and content scripts.
Concrete setup: TypeScript + bundler + typings
Start by adding TypeScript with a conservative config so you can convert files progressively.
// tsconfig.json (tailored for extensions)
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "Node",
    "lib": ["DOM", "ES2020", "WebWorker"],
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "resolveJsonModule": true,
    "rootDir": "src",
    "outDir": "dist",
    "allowJs": true,
    "checkJs": false
  },
  "include": ["src/**/*"]
}
Add essential dependencies in package.json:
- webextension-polyfill — unify chrome/browser APIs with Promise-friendly calls
- @types/chrome — helpful for legacy chrome.* types
- Bundler: esbuild or vite for fast builds
Manifest v3 basics (example)
Convert your manifest to MV3. Note: a service worker can't access DOM APIs, so move DOM work into content scripts or an offscreen document.
// manifest.json (MV3)
{
  "manifest_version": 3,
  "name": "My Local-AI Extension",
  "version": "1.0.0",
  "permissions": ["storage", "scripting", "offscreen"],
  "background": {
    "service_worker": "dist/background.js"
  },
  "action": {
    "default_popup": "popup.html"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["dist/content.js"],
      "run_at": "document_idle"
    }
  ]
}
Service worker patterns in TypeScript
Service workers in MV3 are event-driven. Use typed message handlers and keep state in IndexedDB/Cache to survive worker restarts.
// src/background.ts
import browser from 'webextension-polyfill';

type InferenceRequest = { type: 'INFER'; id: string; prompt: string };
type InferenceResult = { id: string; text: string };

// Lightweight typed message dispatcher. Return a Promise only for messages you
// actually handle, so other listeners can still respond to the rest.
browser.runtime.onMessage.addListener((msg: Partial<InferenceRequest>, _sender) => {
  if (msg?.type === 'INFER') {
    const req = msg as InferenceRequest;
    // Dispatch to the model manager (keeps model lifecycle outside worker memory if needed)
    return ModelManager.infer(req.prompt).then(
      (text): InferenceResult => ({ id: req.id, text })
    );
  }
});

// Basic model manager skeleton
class ModelManager {
  static async infer(prompt: string): Promise<string> {
    // Ensure the model is loaded (may lazy-load WASM into an offscreen document)
    // For demo, return a dummy response
    return `LocalAI response for: ${prompt}`;
  }
}
Where should inference run?
Choose where to run heavy inference carefully:
- Content Script — Pros: access to DOM and WebGPU in some cases. Cons: instantiates per-tab and may duplicate memory across tabs.
- Offscreen Document — Pros: long-running, can host WebGL/WebGPU context and DOM APIs; intentionally designed for MV3 long work. Cons: involves more plumbing and permissions.
- Service Worker — Pros: lightweight event handling, ideal for orchestration. Cons: no DOM/WebGPU access; ephemeral lifetime.
Best practice: keep the worker as the orchestrator and perform actual WASM/WebGPU inference inside an offscreen document or a privileged extension page. Use typed messages to control it.
Example: Creating an offscreen document to run a WASM runtime
Chrome provides an offscreen API to create a hidden document for long-running tasks that need a DOM or GPU. Use this to instantiate a WebAssembly LLM runtime that expects WebGL/WebGPU and threads.
// background.ts (continuation)
// Note: the offscreen API is Chrome-only and is not wrapped by
// webextension-polyfill, so call it through the chrome namespace.
async function ensureOffscreen() {
  if (!(await chrome.offscreen.hasDocument())) {
    await chrome.offscreen.createDocument({
      url: 'offscreen.html',
      // Pick the reason that matches your workload; WORKERS fits WASM inference.
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: 'Run on-device WASM model inference off the worker thread'
    });
  }
}
browser.runtime.onMessage.addListener(async (msg) => {
  if (msg?.type === 'START_INFER') {
    await ensureOffscreen();
    // Forward to the offscreen document to start model load/inference
    await browser.runtime.sendMessage({ type: 'OFFSCREEN_INFER', prompt: msg.prompt });
  }
});
offscreen.html: the inference host
The offscreen document can import a bundled JS module that initializes the WASM runtime using WebGPU or WebGL. Keep this file small and tightly typed.
// src/offscreen.ts
import browser from 'webextension-polyfill';

browser.runtime.onMessage.addListener(async (msg) => {
  if (msg?.type === 'OFFSCREEN_INFER') {
    const result = await LocalWasmRuntime.infer(msg.prompt);
    // Send the result back to the service worker or content script
    await browser.runtime.sendMessage({ type: 'INFER_RESULT', result });
  }
});

class LocalWasmRuntime {
  static async init() {
    // Fetch the model from IndexedDB/cache and instantiate WASM with WebGPU
  }
  static async infer(prompt: string): Promise<string> {
    // Call the runtime and return the generated text
    return `Simulated: ${prompt}`;
  }
}
Managing model files: storage & privacy
Treat model files like any large binary asset. Store them in Cache Storage or IndexedDB with explicit user consent. Provide a clear UI to:
- Download and remove models locally
- Restrict model usage to offline/local-only mode
- Show disk and memory footprint estimates before download
Example: saving a model chunk into IndexedDB using idb-keyval or a small wrapper.
// src/modelStore.ts
import { set, get } from 'idb-keyval';

export async function saveModelBlob(name: string, blob: Blob) {
  await set(`model:${name}`, blob);
}

export async function loadModelBlob(name: string): Promise<Blob | undefined> {
  return get(`model:${name}`);
}
Privacy-first defaults and UI
To claim the privacy-first angle credibly, ship defaults that minimize data exfiltration:
- Local-only by default: disable any cloud inference until users opt in.
- Explainability: show when models are loaded and where they live on disk.
- Network rules: intercept outgoing requests used by legacy analytics or telemetry and provide a single opt-in toggle.
- Permission minimization: ask only for permissions required for the feature set.
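The "local-only by default" rule is easiest to enforce as a single settings gate that every network-reaching code path must pass. A sketch with assumed setting names (`cloudOptIn`, `telemetryOptIn` are illustrative, not from the original code):

```typescript
// Privacy settings with deny-by-default values: cloud inference and
// telemetry stay off until the user explicitly opts in.
export interface PrivacySettings {
  cloudOptIn: boolean;
  telemetryOptIn: boolean;
}

export const DEFAULT_SETTINGS: PrivacySettings = {
  cloudOptIn: false,
  telemetryOptIn: false,
};

// Single choke point: cloud inference requires both an explicit opt-in
// and connectivity, so a fresh install can never exfiltrate text.
export function mayUseCloud(settings: PrivacySettings, online: boolean): boolean {
  return settings.cloudOptIn && online;
}
```

Routing every fetch to a cloud endpoint through `mayUseCloud` makes the privacy claim auditable: one grep shows every place data could leave the device.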
Type-safe messaging model
Messaging is the spine of your extension. Declare discriminated unions and helpers so TypeScript enforces correct message shapes.
// src/messages.ts
export type RequestMessage =
  | { type: 'INFER'; id: string; prompt: string }
  | { type: 'MODEL_DOWNLOAD'; modelId: string }
  | { type: 'MODEL_REMOVE'; modelId: string };

export type ResponseMessage =
  | { type: 'INFER_RESULT'; id: string; text: string }
  | { type: 'MODEL_STATUS'; modelId: string; status: 'ready' | 'downloading' | 'error' };
Use a wrapper to send messages and handle timeouts to cope with worker restarts.
Bundling and dev cycle tips
- Use esbuild or Vite for fast dev builds and source maps.
- Keep an unpacked extension loader script in package.json for quick reloads (Chrome's "Load unpacked").
- Watch service worker logs in chrome://extensions → Inspect service worker for the background console.
- Unit-test business logic outside extension APIs; stub browser.* with webextension-polyfill mocks for CI.
Performance & resource considerations
On-device inference is constrained by memory, CPU, and battery. Optimizations that matter:
- Choose compact models (quantized formats like 4-bit GGUF where possible).
- Favor streaming and chunked token generation to reduce peak memory.
- Use WebGPU-backed WASM runtimes to offload compute to GPU (if available).
- Provide fallback: if a device can't run the model, gracefully fall back to a lightweight heuristic or show local-only disabled state.
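The streaming point above can be sketched as an async generator; the `TokenStream` interface is an assumption standing in for whatever your WASM runtime actually exposes:

```typescript
// Assumed runtime interface: yields one token per call, null when finished.
export interface TokenStream {
  next(): Promise<string | null>;
}

// Stream tokens one at a time instead of buffering the full completion,
// which keeps peak memory flat regardless of output length.
export async function* streamTokens(stream: TokenStream): AsyncGenerator<string> {
  for (;;) {
    const tok = await stream.next();
    if (tok === null) return;
    yield tok;
  }
}

// Usage sketch: forward each token as a message so the popup renders
// incrementally while still returning the final text.
export async function collect(
  stream: TokenStream,
  onToken: (t: string) => void
): Promise<string> {
  let text = '';
  for await (const tok of streamTokens(stream)) {
    text += tok;
    onToken(tok);
  }
  return text;
}
```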
Security considerations
- Be mindful of executing arbitrary WASM: sandbox model runtime and validate inputs. Even local models can be abused if the runtime mismanages memory.
- Significant model blobs should be verified (checksum) on download so users don't get tampered files.
- Avoid eval() or dynamic script imports from remote sources unless the user explicitly enables a cloud-backed mode.
Testing & debugging tips for MV3
- Use unpacked extension loading and keep your source maps referencing the dist/ bundle for easier stepping.
- Inspect service worker logs: Chrome provides an inspector for the worker in the Extensions page.
- For offscreen documents, open the offscreen URL directly (during dev) to observe the console and performance profile.
- Automate end-to-end tests by spawning a headful Chromium with the extension loaded and driving actions with Puppeteer.
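For those Puppeteer-driven end-to-end tests, extensions only load in a headful browser and need two Chromium flags. A dependency-free sketch of building the flags (the `puppeteer.launch` usage is shown as a comment and assumes Puppeteer is installed separately):

```typescript
// Build the Chromium flags needed to load an unpacked extension.
// Kept free of the puppeteer dependency so it stays trivially testable.
export function extensionLaunchArgs(extensionPath: string): string[] {
  return [
    `--disable-extensions-except=${extensionPath}`,
    `--load-extension=${extensionPath}`,
  ];
}

// In a Puppeteer test you would then do (assumption: puppeteer installed):
//   const browser = await puppeteer.launch({
//     headless: false,                     // extensions need a headful browser
//     args: extensionLaunchArgs('dist'),   // path to your unpacked build
//   });
```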
Example: Incremental migration checklist (1-3 week sprints)
- Week 1: Audit, add TypeScript + bundler, migrate background orchestration to TS.
- Week 2: Migrate manifest to MV3, implement typed messaging, add an offscreen stub with a mock runtime.
- Week 3: Integrate real WASM runtime, add model download UI, implement privacy defaults and telemetry opt-out.
Real-world considerations & case study notes
In 2025 several privacy-first mobile browsers (for example, some emerging browser projects) shipped local-AI features to demonstrate the viability of on-device LLMs. Extension authors can learn from those apps’ UX patterns: explicit model management, transparent UI about where data stays, and graceful fallback to cloud-backed inference only when users opt in.
"Ship small, observable changes. Convert background logic to TypeScript first — then add the offscreen runtime when the messaging contract is stable." — recommended migration pattern
Common migration pitfalls and how to avoid them
- Assuming service worker permanence: service workers are ephemeral. Persist state in storage and design for restart.
- Loading large models synchronously: always download models asynchronously and show progress; prefer streaming chunks into IndexedDB.
- Not validating model sources: require checksums/signatures and show provenance to users.
- Excessive permissions: avoid broad host permissions; use programmatic injection with scripting API where possible.
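Programmatic injection with the scripting API can replace broad host permissions; passing the API object in as a parameter also keeps the function unit-testable without a browser. A sketch assuming the "scripting" and "activeTab" permissions (the `dist/content.js` path mirrors the manifest above):

```typescript
// Narrow view of chrome.scripting, injected as a dependency so tests
// can substitute a fake without a running browser.
export interface ScriptingApi {
  executeScript(opts: {
    target: { tabId: number };
    files: string[];
  }): Promise<unknown>;
}

// Inject the content script only into the tab the user invoked the
// extension on, instead of declaring <all_urls> host permissions.
export async function injectOnDemand(
  api: ScriptingApi,
  tabId: number,
  file = 'dist/content.js'
): Promise<void> {
  await api.executeScript({ target: { tabId }, files: [file] });
}
```

In production you would call `injectOnDemand(chrome.scripting, tab.id)` from an action-click handler, which is exactly the flow activeTab is designed for.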
Actionable code snippets — typed sendRequest helper
Use a typed wrapper that sets a timeout and enforces response types.
// src/utils/messaging.ts
import browser from 'webextension-polyfill';
import type { RequestMessage, ResponseMessage } from '../messages';

export async function sendRequest<T extends ResponseMessage>(
  msg: RequestMessage,
  timeout = 10000
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout')), timeout);
    browser.runtime.sendMessage(msg).then((res) => {
      clearTimeout(timer);
      resolve(res as T);
    }).catch((err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
Wrap-up: the practical benefits you'll achieve
Migrating your extension to TypeScript and enabling local AI via a well-architected offscreen runtime will give you:
- Stronger developer ergonomics and fewer runtime errors with TypeScript
- Higher user trust by keeping sensitive text on-device
- Lower operational costs by avoiding per-request LLM API bills
- Competitive differentiation in a market that values privacy-first AI
Next steps & resources
To implement this migration in your codebase:
- Run a permissions and network audit of your current extension.
- Create a types-first branch and add tsconfig + bundler to iterate quickly.
- Prototype offscreen inference with a small WASM runtime and a toy model to validate UX and memory profile.
Call to action
Ready to start? Clone our migration starter (TypeScript + MV3 + offscreen boilerplate), run the checklist in your repository, and share benchmarks on memory and latency. If you want a tailored checklist for your extension, paste the list of features and I’ll suggest a concrete step-by-step migration plan you can run in sprints. Ship safer AI — locally and privately.