From Chrome Extension to Local AI Extension: A Migration Playbook in TypeScript


2026-02-25
11 min read

A practical, incremental playbook to migrate Chrome extensions to privacy-first local-AI WebExtensions using TypeScript and modern web APIs.

Why your Chrome extension should become a privacy-preserving local-AI extension — and why now

If you're maintaining a Chrome extension that processes user text, stores preferences, or automates workflows, you probably face three recurring headaches: privacy concerns, third-party API costs, and the growing expectation that AI features run locally. In 2025–2026 the shift toward on-device LLMs and improved browser capabilities (WebAssembly threads, WebGPU, SharedArrayBuffer support behind COOP/COEP) made local-AI extensions realistic for many use cases. This playbook shows a practical, incremental migration path from a traditional Chrome extension to a TypeScript-based, privacy-first local-AI WebExtension.

What you’ll get from this guide

  • Concrete migration phases (audit → TS toolchain → MV3 service worker → local model runtime)
  • TypeScript examples for messaging, model management, and safe I/O
  • Architectural patterns for running inference inside a browser extension
  • Privacy and UX considerations so the extension never leaks user text

By late 2025 and into 2026 browser vendors expanded support for capabilities that make local AI feasible inside extensions: stable WebGPU, better WebAssembly threading, broader SharedArrayBuffer availability under proper COOP/COEP headers, and a growing ecosystem of WASM-based model runtimes (small LLMs compiled to WASM, often in quantized GGUF format). Mobile-first browsers and privacy-focused projects — some shipping on-device LLM support — proved the approach. For extension authors, this means you can now build feature-rich, offline-first AI experiences whose data never leaves the user's machine.

Migration overview — incremental, safe, reversible

The safest migrations are incremental. Below is a three-phase plan you can follow in parallel branches or small sprints. Each phase produces a usable artifact and keeps the extension functional for existing users.

Phase 0 — Audit & goals

  • Inventory all places where user data leaves the extension (analytics, API calls, update checks).
  • Identify feature candidates for local inference (summaries, completions, classifiers).
  • Set success criteria: e.g., “Local inference latency & memory fit within the browser for target devices.”
  • Decide model strategy: bundled small model, user-downloaded model, or optional connect-to-cloud.

Phase 1 — TypeScript + Modern Tooling (low risk)

Convert code to TypeScript gradually. The goal is to add static safety and improve DX without changing runtime behavior.

  1. Initialize toolchain: tsconfig, bundler (esbuild/Vite/Rollup), and type libs.
  2. Use allowJs (plus per-file // @ts-check comments) to migrate files one by one.
  3. Install WebExtension types and the promise-based polyfill: @types/chrome and webextension-polyfill (with TypeScript types).
  4. Keep manifest.json in sync — you’ll migrate to MV3 in Phase 2.

Phase 2 — Manifest V3 & Service Worker conversion

Manifest V3 (MV3) uses a background service worker instead of a persistent background page. The worker is ephemeral, so design for lifecycle events and move long-running work to an offscreen document or content context that supports WebGPU/WebAssembly as needed.

Phase 3 — Local-AI runtime integration

Add a model manager that downloads/installs models into IndexedDB or Cache Storage, performs inference via a WASM runtime using WebGPU/WebGL/WebNN, and exposes a typed messaging API for UI and content scripts.

Concrete setup: TypeScript + bundler + typings

Start by adding TypeScript with a conservative config so you can convert files progressively.

// tsconfig.json (tailored for extensions)
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "Node",
    "lib": ["DOM", "ES2020", "WebWorker"],
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "resolveJsonModule": true,
    "rootDir": "src",
    "outDir": "dist",
    "allowJs": true,
    "checkJs": false
  },
  "include": ["src/**/*"]
}

Add essential dependencies in package.json:

  • webextension-polyfill — unify chrome/browser APIs with Promise-friendly calls
  • @types/chrome — helpful for legacy chrome.* types
  • Bundler: esbuild or Vite for fast builds (a minimal script setup is sketched below)
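
For example, with esbuild the whole dev loop can be two package.json scripts. A minimal sketch; the entry points and output paths are illustrative and should match your source layout:

// package.json (scripts excerpt)
{
  "scripts": {
    "build": "esbuild src/background.ts src/content.ts src/offscreen.ts --bundle --outdir=dist --format=iife --sourcemap",
    "watch": "esbuild src/background.ts src/content.ts src/offscreen.ts --bundle --outdir=dist --format=iife --sourcemap --watch"
  }
}

The iife format keeps each bundle loadable as a classic script; if you prefer an ESM service worker instead, set "type": "module" on the background entry in the manifest.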

Manifest V3 basics (example)

Convert your manifest to MV3. Note: a service worker can't access DOM APIs, so move DOM work into content scripts or an offscreen document.

// manifest.json (MV3)
{
  "manifest_version": 3,
  "name": "My Local-AI Extension",
  "version": "1.0.0",
  "permissions": ["storage", "scripting", "offscreen"],
  "background": {
    "service_worker": "dist/background.js"
  },
  "action": {
    "default_popup": "popup.html"
  },
  "content_scripts": [
    {
      "matches": [""],
      "js": ["dist/content.js"],
      "run_at": "document_idle"
    }
  ]
}

Service worker patterns in TypeScript

Service workers in MV3 are event-driven. Use typed message handlers and keep state in IndexedDB/Cache to survive worker restarts.

// src/background.ts
import browser from 'webextension-polyfill';

type InferenceRequest = { type: 'INFER'; id: string; prompt: string };
type InferenceResult = { id: string; text: string };

// Basic model manager skeleton
class ModelManager {
  static async infer(prompt: string): Promise<string> {
    // Ensure the model is loaded (may lazy-load WASM into an offscreen document)
    // For demo purposes, return a dummy response
    return `LocalAI response for: ${prompt}`;
  }
}

// Lightweight typed message dispatcher
browser.runtime.onMessage.addListener(async (msg: unknown) => {
  const req = msg as InferenceRequest;
  if (req?.type === 'INFER') {
    // Dispatch to the model manager (keeps model lifecycle outside worker memory if needed)
    const text = await ModelManager.infer(req.prompt);
    const res: InferenceResult = { id: req.id, text };
    return res; // resolving the async listener answers the sender's sendMessage() promise
  }
});
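
Because the worker can be terminated between events, keep in-flight orchestration state in extension storage rather than in module-level variables. A minimal sketch, assuming a pendingJobs record (key names are illustrative); storage.session is in-memory, per-browser-session storage, and since not every polyfill build wraps it, the sketch calls chrome.storage.session directly:

// src/background.ts (continuation)
type PendingJob = { id: string; prompt: string; startedAt: number };

async function rememberJob(job: PendingJob): Promise<void> {
  const { pendingJobs = {} } = await chrome.storage.session.get('pendingJobs');
  pendingJobs[job.id] = job;
  await chrome.storage.session.set({ pendingJobs });
}

async function forgetJob(id: string): Promise<void> {
  const { pendingJobs = {} } = await chrome.storage.session.get('pendingJobs');
  delete pendingJobs[id]; // clean up once the result has been delivered
  await chrome.storage.session.set({ pendingJobs });
}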

Where should inference run?

Choose where to run heavy inference carefully:

  • Content Script — Pros: access to DOM and WebGPU in some cases. Cons: instantiates per-tab and may duplicate memory across tabs.
  • Offscreen Document — Pros: long-running, can host WebGL/WebGPU context and DOM APIs; intentionally designed for MV3 long work. Cons: involves more plumbing and permissions.
  • Service Worker — Pros: lightweight event handling, ideal for orchestration. Cons: no DOM/WebGPU access; ephemeral lifetime.

Best practice: keep the worker as the orchestrator and perform actual WASM/WebGPU inference inside an offscreen document or a privileged extension page. Use typed messages to control it.

Example: Creating an offscreen document to run a WASM runtime

Chrome provides an offscreen API to create a hidden document for long-running tasks that need a DOM or GPU. Use this to instantiate a WebAssembly LLM runtime that expects WebGL/WebGPU and threads.

// background.ts (continuation)
// chrome.offscreen isn't wrapped by webextension-polyfill, so call it directly.
async function ensureOffscreen() {
  if (!(await chrome.offscreen.hasDocument())) {
    await chrome.offscreen.createDocument({
      url: 'offscreen.html',
      // WORKERS: the WASM runtime spawns worker threads for inference
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: 'Host the WASM/WebGPU model runtime outside the service worker'
    });
  }
}

browser.runtime.onMessage.addListener(async (msg: { type?: string; prompt?: string }) => {
  if (msg?.type === 'START_INFER') {
    await ensureOffscreen();
    // Ask the offscreen document to load the model and start inference
    await browser.runtime.sendMessage({ type: 'OFFSCREEN_INFER', prompt: msg.prompt });
  }
});

offscreen.html: the inference host

The offscreen document can import a bundled JS module that initializes the WASM runtime using WebGPU or WebGL. Keep this file small and tightly typed.

// src/offscreen.ts
import browser from 'webextension-polyfill';

class LocalWasmRuntime {
  static async init() {
    // Fetch the model from IndexedDB/Cache Storage and instantiate the WASM
    // runtime, preferring a WebGPU backend when available
  }
  static async infer(prompt: string): Promise<string> {
    // Call the runtime and return the generated text
    return `Simulated: ${prompt}`;
  }
}

browser.runtime.onMessage.addListener(async (msg: { type?: string; prompt?: string }) => {
  if (msg?.type === 'OFFSCREEN_INFER' && typeof msg.prompt === 'string') {
    const result = await LocalWasmRuntime.infer(msg.prompt);
    // Send the result back to the service worker or content script
    await browser.runtime.sendMessage({ type: 'INFER_RESULT', result });
  }
});

Managing model files: storage & privacy

Treat model files like any large binary asset. Store them in Cache Storage or IndexedDB with explicit user consent. Provide a clear UI to:

  • Download and remove models locally
  • Restrict model usage to offline/local-only mode
  • Show disk and memory footprint estimates before download

Example: saving a model chunk into IndexedDB using idb-keyval or a small wrapper.

// src/modelStore.ts
import { set, get } from 'idb-keyval';

export async function saveModelBlob(name: string, blob: Blob) {
  await set(`model:${name}`, blob);
}

export async function loadModelBlob(name: string): Promise<Blob | undefined> {
  return get<Blob>(`model:${name}`);
}
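
For multi-hundred-megabyte models, avoid buffering the whole file in memory: stream the response body and persist each chunk as it arrives. A sketch along those lines (the chunk-key scheme and progress callback are illustrative):

// src/modelStore.ts (continuation)
export async function downloadModelChunked(
  name: string,
  url: string,
  onProgress?: (received: number, total: number) => void
) {
  const resp = await fetch(url);
  if (!resp.ok || !resp.body) throw new Error(`download failed: ${resp.status}`);
  const total = Number(resp.headers.get('content-length') ?? 0);
  const reader = resp.body.getReader();
  let received = 0;
  let chunkCount = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Persist each chunk immediately instead of holding the full model in memory
    await set(`model:${name}:chunk:${chunkCount++}`, value);
    received += value.byteLength;
    onProgress?.(received, total);
  }
  // Record the chunk count so the runtime can reassemble the model later
  await set(`model:${name}:chunks`, chunkCount);
}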

Privacy-first defaults and UI

To claim the privacy-first angle credibly, ship defaults that minimize data exfiltration:

  • Local-only by default: disable any cloud inference until users opt in.
  • Explainability: show when models are loaded and where they live on disk.
  • Network rules: intercept outgoing requests used by legacy analytics or telemetry and provide a single opt-in toggle (a declarativeNetRequest sketch follows this list).
  • Permission minimization: ask only for permissions required for the feature set.
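
For the network-rules item above, one approach is a dynamic declarativeNetRequest rule toggled from the options UI. A sketch, assuming the declarativeNetRequest permission; the rule id and telemetry hostname are illustrative:

// src/privacy.ts
const TELEMETRY_RULE_ID = 1;

export async function setTelemetryAllowed(allowed: boolean): Promise<void> {
  if (allowed) {
    await chrome.declarativeNetRequest.updateDynamicRules({
      removeRuleIds: [TELEMETRY_RULE_ID]
    });
    return;
  }
  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [TELEMETRY_RULE_ID], // idempotent: clear any stale copy first
    addRules: [{
      id: TELEMETRY_RULE_ID,
      priority: 1,
      action: { type: chrome.declarativeNetRequest.RuleActionType.BLOCK },
      condition: {
        urlFilter: '||telemetry.example.com',
        resourceTypes: [chrome.declarativeNetRequest.ResourceType.XMLHTTPREQUEST]
      }
    }]
  });
}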

Type-safe messaging model

Messaging is the spine of your extension. Declare discriminated unions and helpers so TypeScript enforces correct message shapes.

// src/messages.ts
export type RequestMessage =
  | { type: 'INFER'; id: string; prompt: string }
  | { type: 'MODEL_DOWNLOAD'; modelId: string }
  | { type: 'MODEL_REMOVE'; modelId: string };

export type ResponseMessage =
  | { type: 'INFER_RESULT'; id: string; text: string }
  | { type: 'MODEL_STATUS'; modelId: string; status: 'ready' | 'downloading' | 'error' };

Use a wrapper to send messages and enforce timeouts to cope with worker restarts; the typed sendRequest helper later in this guide shows one approach.

Bundling and dev cycle tips

  • Use esbuild or Vite for fast dev builds and source maps.
  • Keep an unpacked extension loader script in package.json for quick reloads (Chrome's "Load unpacked").
  • Watch service worker logs in chrome://extensions → Inspect service worker for background console.
  • Unit-test business logic outside extension APIs; stub browser.* with webextension-polyfill mocks for CI (a sketch follows this list).
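
For the unit-testing tip, the polyfill import is straightforward to replace with a test double. A minimal sketch assuming Vitest; the mocked surface covers only what this test touches:

// tests/messaging.test.ts
import { vi, test, expect } from 'vitest';

// Replace the polyfill module with a stub before the code under test imports it
vi.mock('webextension-polyfill', () => ({
  default: {
    runtime: {
      sendMessage: vi.fn().mockResolvedValue({ type: 'INFER_RESULT', id: '1', text: 'ok' })
    }
  }
}));

test('sendRequest resolves with the worker response', async () => {
  const { sendRequest } = await import('../src/utils/messaging');
  const res = await sendRequest({ type: 'INFER', id: '1', prompt: 'hi' });
  expect(res).toEqual({ type: 'INFER_RESULT', id: '1', text: 'ok' });
});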

Performance & resource considerations

On-device inference is constrained by memory, CPU, and battery. Optimizations that matter:

  • Choose compact models (quantized formats like 4-bit GGUF where possible).
  • Favor streaming and chunked token generation to reduce peak memory.
  • Use WebGPU-backed WASM runtimes to offload compute to GPU (if available).
  • Provide a fallback: if a device can't run the model, gracefully fall back to a lightweight heuristic or show a local-only disabled state (a capability-detection sketch follows this list).
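
A cheap way to implement that fallback is to probe device capabilities before offering local AI at all. A sketch with illustrative thresholds (deviceMemory is a coarse hint, reported in GiB):

// src/capabilities.ts
export type InferenceTier = 'webgpu' | 'wasm' | 'unsupported';

export async function detectTier(): Promise<InferenceTier> {
  // Prefer WebGPU when an adapter is available
  if ('gpu' in navigator) {
    const adapter = await (navigator as any).gpu.requestAdapter();
    if (adapter) return 'webgpu';
  }
  // Fall back to CPU WASM only on machines with enough memory headroom
  const memGiB: number = (navigator as any).deviceMemory ?? 4;
  return memGiB >= 4 ? 'wasm' : 'unsupported';
}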

Security considerations

  • Be mindful of executing arbitrary WASM: sandbox the model runtime and validate inputs. Even local models can be abused if the runtime mismanages memory.
  • Verify large model blobs with a checksum on download so users don't receive tampered files (a sketch follows this list).
  • Avoid eval() or dynamic script imports from remote sources unless the user explicitly enables a cloud-backed mode.
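
Checksum verification needs nothing beyond WebCrypto. A minimal sketch; the expected digest would ship inside your extension or a signed model manifest:

// src/verify.ts
export async function verifySha256(blob: Blob, expectedHex: string): Promise<boolean> {
  // Note: arrayBuffer() loads the blob into memory; for very large models,
  // verify per-chunk digests instead
  const digest = await crypto.subtle.digest('SHA-256', await blob.arrayBuffer());
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
  return hex === expectedHex.toLowerCase();
}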

Testing & debugging tips for MV3

  • Use unpacked extension loading and keep your source maps referencing the dist/ bundle for easier stepping.
  • Inspect service worker logs: Chrome provides an inspector for the worker in the Extensions page.
  • For offscreen documents, open the offscreen URL directly (during dev) to observe the console and performance profile.
  • Automate end-to-end tests by spawning a headful Chromium with the extension loaded and driving actions with Puppeteer, as sketched below.
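
That Puppeteer harness can be as small as the following sketch; EXT_PATH is illustrative and should point at your built, unpacked extension:

// tests/e2e.ts
import puppeteer from 'puppeteer';

const EXT_PATH = 'dist';

async function main() {
  const browser = await puppeteer.launch({
    headless: false, // extensions need a headful (or new-headless) browser
    args: [
      `--disable-extensions-except=${EXT_PATH}`,
      `--load-extension=${EXT_PATH}`
    ]
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // ...drive the content-script UI and assert on results here...
  await browser.close();
}

main();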

Example: Incremental migration checklist (three one-week sprints)

  1. Week 1: Audit, add TypeScript + bundler, migrate background orchestration to TS.
  2. Week 2: Migrate manifest to MV3, implement typed messaging, add an offscreen stub with a mock runtime.
  3. Week 3: Integrate real WASM runtime, add model download UI, implement privacy defaults and telemetry opt-out.

Real-world considerations & case study notes

In 2025 several privacy-first mobile browsers shipped local-AI features that demonstrated the viability of on-device LLMs. Extension authors can learn from those apps’ UX patterns: explicit model management, transparent UI about where data stays, and graceful fallback to cloud-backed inference only when users opt in.

"Ship small, observable changes. Convert background logic to TypeScript first — then add the offscreen runtime when the messaging contract is stable." — recommended migration pattern

Common migration pitfalls and how to avoid them

  • Assuming service worker permanence: service workers are ephemeral. Persist state in storage and design for restart.
  • Loading large models synchronously: always download models asynchronously and show progress; prefer streaming chunks into IndexedDB.
  • Not validating model sources: require checksums/signatures and show provenance to users.
  • Excessive permissions: avoid broad host permissions; use programmatic injection with the scripting API where possible (sketched below).
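
For the permissions pitfall, programmatic injection via the scripting API lets you drop broad host permissions entirely. A sketch assuming the scripting and activeTab permissions:

// src/inject.ts
// Inject the content script only when the user invokes the feature,
// instead of matching every page at install time.
export async function injectIntoActiveTab(): Promise<void> {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  if (tab?.id === undefined) return;
  await chrome.scripting.executeScript({
    target: { tabId: tab.id },
    files: ['dist/content.js']
  });
}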

Actionable code snippets — typed sendRequest helper

Use a typed wrapper that sets a timeout and enforces response types.

// src/utils/messaging.ts
import browser from 'webextension-polyfill';
import type { RequestMessage, ResponseMessage } from '../messages';

export async function sendRequest<T extends ResponseMessage>(msg: RequestMessage, timeout = 10000): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout')), timeout);
    browser.runtime.sendMessage(msg).then((res) => {
      clearTimeout(timer);
      resolve(res as T);
    }).catch((err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}

Wrap-up: the practical benefits you'll achieve

Migrating your extension to TypeScript and enabling local AI via a well-architected offscreen runtime will give you:

  • Stronger developer ergonomics and fewer runtime errors with TypeScript
  • Higher user trust by keeping sensitive text on-device
  • Lower operational costs by avoiding per-request LLM API bills
  • Competitive differentiation in a market that values privacy-first AI

Next steps & resources

To implement this migration in your codebase:

  1. Run a permissions and network audit of your current extension.
  2. Create a types-first branch and add tsconfig + bundler to iterate quickly.
  3. Prototype offscreen inference with a small WASM runtime and a toy model to validate UX and memory profile.

Call to action

Ready to start? Clone our migration starter (TypeScript + MV3 + offscreen boilerplate), run the checklist in your repository, and share benchmarks on memory and latency. If you want a tailored checklist for your extension, paste the list of features and I’ll suggest a concrete step-by-step migration plan you can run in sprints. Ship safer AI — locally and privately.

