Serverless micro apps with TypeScript and edge AI: cost, latency, and privacy trade-offs
Compare serverless vs on-device edge AI for TypeScript micro apps—cost, latency, and privacy trade-offs with actionable migration steps.
You want fast, private micro apps — but at what cost?
If you're migrating a JavaScript micro app (think: a dining recommender that friends use in a group chat) to TypeScript, you face three painful trade-offs: cost, latency, and privacy. Do you route every query to a cheap serverless function that calls a managed LLM, or do you run the model locally on a Raspberry Pi with the new AI HAT and keep data on-device? In 2026 the lines blur: web-scale serverless platforms trimmed cold-starts, while hobbyist hardware ships NPUs and reliable on-device inference. This article gives a practical, action-first comparison and helps you choose a path for incremental TypeScript adoption.
Executive summary (most important first)
Choose serverless when you need low developer friction, predictable CI/CD, and easy model upgrades — at the cost of per-request cloud fees and external data flows. Choose on-device edge AI (Raspberry Pi + AI HAT) when privacy and bounded latency matter, and you can accept a higher upfront hardware and ops cost.
Below you'll find: a practical cost model, latency measurements and guidance, TypeScript-first deployment patterns for both approaches, and a migration checklist to incrementally move a JS micro app to TypeScript while measuring impact in production.
Why this matters in 2026
By late 2025 and into 2026 we saw two important shifts that affect micro apps:
- Edge hardware matured: Raspberry Pi 5 and dedicated AI HATs (NPU-equipped boards announced in 2024–2025) make on-device inference feasible for small LLMs and retrieval-augmented generation (RAG) workflows.
- Serverless platforms evolved: major providers optimized cold starts and added predictable pricing tiers for edge functions; smaller functions now approach sub-50ms warm latency in many regions.
Those changes mean the decision to go serverless vs edge is no longer purely academic — it's measurable. We'll use a typical micro app (a dining recommender) as the running example.
Architecture overview: two patterns
Pattern A — Serverless micro app (TypeScript)
Flow: client -> edge/serverless function (Vercel, Cloudflare Workers, AWS Lambda@Edge, Deno Deploy) -> (optional) LLM API or managed model -> response to client.
Strengths:
- Developer velocity: deploy with git pushes, familiar Node/Deno toolchains, and declarative routing.
- Centralized updates: swap to a new model or prompt without touching devices.
- Observability & logging: integrated metrics, tracing, and API dashboards.
Weaknesses:
- Ongoing costs: compute + managed LLM API calls + egress.
- Latency variability: network round trips and API queue times.
- Privacy risk: user data crosses networks and is processed by third parties unless you design otherwise.
Pattern B — On-device edge AI (Raspberry Pi + AI HAT)
Flow: client -> local micro service on Pi -> on-device model inference -> local response (optionally sync back anonymized signals to a server).
Strengths:
- Low and consistent latency for local users (no network round trip).
- Strong privacy: data stays on the owner’s device; perfect for personal micro apps or small groups.
- Fixed marginal cost: after hardware purchase, per-request marginal cost is near zero.
Weaknesses:
- Ops overhead: provisioning, OS updates, hardware failures, remote management.
- Model size and capability limits: very large models aren’t practical; you rely on optimized, quantized models.
- Deployment complexity: cross-compilation, WASM builds, or native binaries for the HAT.
Case study: Dining recommender — assumptions
We'll use a small but realistic workload to compare costs and latency. Tweak numbers for your use case.
- Active users (monthly): 1,000
- Average requests per user per month: 10 (10,000 requests/month)
- Average serverless function runtime: 100ms at 256MB (if using cloud model), or 50ms at 128MB for edge workers
- Local inference latency on Pi AI HAT: 200–800ms depending on quantized model and NPU support
- Model type: small LLM (7B quantized) or a retrieval + small transformer for recommendations
Cost model (practical formulas and sample numbers)
We'll separate costs into serverless (compute + LLM API) and edge (hardware amortized + energy + ops).
Serverless cost formula (monthly)
serverless_monthly = request_cost + compute_cost + network_egress + model_api_cost
- request_cost = requests * request_unit_price (some providers include this)
- compute_cost = sum_over_invocations(GB-seconds * GBs_unit_price)
- network_egress = GB_out * egress_price
- model_api_cost = model_calls * cost_per_call (or per-token)
Example (conservative, illustrative numbers): 10,000 requests/month, with each request calling a managed LLM at $0.002 per call (prompt + completion); per-request compute charges are negligible at this scale.
model_api_cost = 10,000 * $0.002 = $20/month. Add a small serverless compute bill and egress, and the total lands around $25–$50/month at this scale.
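As a sketch, here is the same formula in TypeScript so you can plug in your own provider's rates. Every price below is an illustrative placeholder, not a quote:

function serverlessMonthly(opts: {
  requests: number;
  costPerModelCall: number;     // USD per LLM call (prompt + completion)
  gbSecondsPerRequest: number;  // memory (GB) * runtime (s)
  pricePerGbSecond: number;
  egressGb: number;
  pricePerEgressGb: number;
}): number {
  const modelApiCost = opts.requests * opts.costPerModelCall;
  const computeCost = opts.requests * opts.gbSecondsPerRequest * opts.pricePerGbSecond;
  const egressCost = opts.egressGb * opts.pricePerEgressGb;
  return modelApiCost + computeCost + egressCost;
}

// 10,000 requests at $0.002 per model call: the LLM bill (~$20) dominates at this scale.
console.log(serverlessMonthly({
  requests: 10_000,
  costPerModelCall: 0.002,
  gbSecondsPerRequest: 0.1 * 0.25,  // 100ms at 256MB
  pricePerGbSecond: 0.0000166,      // illustrative
  egressGb: 1,                      // illustrative
  pricePerEgressGb: 0.09,           // illustrative
}));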
Edge cost formula (monthly amortized)
edge_monthly = hardware_amortization + energy + ops_time_cost + occasional_model_update_cost
- hardware_amortization = (Pi + AI HAT + SD) / amortization_months (e.g., $230 / 36 ≈ $6.40/month)
- energy ≈ Watts * hours_per_day * days * energy_price
- ops_time_cost = time_spent_by_devops * hourly_rate / months
Example: Pi 5 + AI HAT ≈ $230 upfront, amortized over 36 months → about $6.40/month. Energy for a 5W average draw at $0.15/kWh is negligible (~$0.50/month). Ops time to update, monitor, and handle failures might run 1–3 hours/month (~$100–300/month in labor if externalized); if you self-maintain, the ops cost is mostly your own time.
So for a single Pi serving ~10 users (personal micro app), edge is cheap per user. For 1,000 users, you need dozens of devices or a different architecture — at that scale serverless is usually cheaper and simpler.
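For completeness, here is the edge side as a companion sketch, mirroring the Pi 5 example above with the same illustrative numbers; set the ops hourly rate to zero if you self-maintain:

function edgeMonthly(opts: {
  hardwareUsd: number;
  amortizationMonths: number;
  watts: number;
  usdPerKwh: number;
  opsHoursPerMonth: number;
  opsHourlyRate: number;  // 0 if you self-maintain
}): number {
  const hardware = opts.hardwareUsd / opts.amortizationMonths;
  const energy = (opts.watts / 1000) * 24 * 30 * opts.usdPerKwh;  // always-on device
  const ops = opts.opsHoursPerMonth * opts.opsHourlyRate;
  return hardware + energy + ops;
}

// Pi 5 + AI HAT: ~$6.40 hardware + ~$0.54 energy per month, plus whatever your ops time costs.
console.log(edgeMonthly({
  hardwareUsd: 230,
  amortizationMonths: 36,
  watts: 5,
  usdPerKwh: 0.15,
  opsHoursPerMonth: 1,
  opsHourlyRate: 0,
}));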
Latency: measured trade-offs
Latency depends on the dominant factor: network vs model runtime.
- Serverless (edge function + cloud LLM): client -> edge function (~5–50ms) + API call (~50–300ms or more) = 55–350ms typical for well-provisioned flows in 2026.
- On-device (Pi + HAT): local request handling (~1–10ms) + model inference (200–800ms for quantized 7B on NPU) = 201–810ms.
In other words: serverless can be faster for small outputs if it's calling an ultra-fast managed model with regional endpoints. But edge provides more consistent latency and avoids network tails. For interactive apps where sub-200ms is required, serverless with highly optimized models may win today; for privacy-sensitive or offline use cases, edge is better even with slightly higher latency.
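To ground these ranges in your own setup, a quick probe like the following reports p50/p95 over 100 requests against either your serverless URL or the Pi's local address. This is a sketch assuming Node 18+ (where fetch and performance are global) and a hypothetical /recommend endpoint:

// Quick end-to-end latency probe: POST 100 requests and report p50/p95/max.
// ENDPOINT is a placeholder; point it at your serverless URL or the Pi's local address.
const ENDPOINT = 'http://localhost:3000/recommend';
const N = 100;

async function probe(): Promise<void> {
  const samples: number[] = [];
  for (let i = 0; i < N; i++) {
    const start = performance.now();
    await fetch(ENDPOINT, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ cuisine: 'thai', budget: 'cheap' }),
    });
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const pct = (q: number) => samples[Math.floor(q * (samples.length - 1))].toFixed(1);
  console.log(`p50=${pct(0.5)}ms  p95=${pct(0.95)}ms  max=${pct(1)}ms`);
}

probe();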
Privacy and compliance
Privacy often dictates architecture. Here are concrete differences:
- Serverless + Managed LLMs: PII and conversational history are transmitted to a third party. You must design redaction, retention policies, and legal controls (DPA, data processing agreements). Consider token-level encryption or ephemeral keys if supported.
- On-device inference: Data never leaves the user's device by default. This simplifies GDPR/CCPA scope for many micro apps and reduces legal friction.
If privacy is a primary requirement (patient data, personal financial info, or a user’s sensitive location history), prioritize on-device inference or hybrid designs that keep sensitive signals locally and send only safe, aggregated features to the cloud.
Hybrid patterns — the best of both worlds
You don't need to be binary. Common hybrid strategies:
- Local-first with optional cloud fallback: run a small model locally; if the request needs more context or a larger model, escalate to serverless (sketched in code after this list).
- Federated telemetry: send anonymized signals for model improvement while keeping raw data local.
- Split inference: run retrieval + ranking locally, and forward a compact prompt to a cloud model for generation.
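Here is a minimal sketch of the local-first pattern from the first bullet. The localInfer stub and the cloud endpoint are hypothetical stand-ins for your on-device model call and your serverless function URL:

type Prefs = { cuisine?: string; budget?: 'cheap' | 'mid' | 'expensive' };

const CLOUD_ENDPOINT = 'https://example.com/api/recommend'; // placeholder serverless URL

// Stub standing in for the on-device model call (hypothetical).
async function localInfer(prefs: Prefs): Promise<{ recommendations: string[]; confidence: number }> {
  return { recommendations: ['Neighbourhood noodle bar'], confidence: 0.5 };
}

async function recommend(prefs: Prefs): Promise<string[]> {
  // Try the small local model first; raw preferences never leave the device.
  const local = await localInfer(prefs);
  if (local.confidence >= 0.7) return local.recommendations;

  // Escalate only when the local model is unsure, sending a compact,
  // non-sensitive payload rather than the full local context.
  const res = await fetch(CLOUD_ENDPOINT, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ cuisine: prefs.cuisine, budget: prefs.budget }),
  });
  const cloud = (await res.json()) as { recommendations: string[] };
  return cloud.recommendations;
}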
TypeScript developer workflows: serverless vs on-device
TypeScript helps here by making contracts explicit and improving maintainability during incremental migration.
Serverless TypeScript flow (example: Vercel Edge / Cloudflare Workers)
Steps:
- Start with a tiny handler: src/api/recommend.ts
- Use types for request/response payloads and a clear DTO for user preferences
- Bundle with esbuild or Vite; target ESM and keep bundle small for edge runtime limits
- Test locally with node or wrangler/deno dev, then run CI for type checks and unit tests
Example TypeScript handler (Vercel's Node serverless runtime; the Edge runtime and Cloudflare Workers use the Web-standard Request/Response API instead):
import type { VercelRequest, VercelResponse } from '@vercel/node';

type Prefs = { cuisine?: string; budget?: 'cheap' | 'mid' | 'expensive' };

export default async function handler(req: VercelRequest, res: VercelResponse) {
  // Typed DTO at the boundary: malformed payloads fail here, not deep in the ranking logic
  const prefs = (req.body ?? {}) as Prefs;
  // Lightweight ranking logic, or a call to a managed LLM
  const recs = await getRecommendations(prefs);
  res.json({ recommendations: recs });
}

// Placeholder so the example compiles; swap in your retrieval layer or LLM call
async function getRecommendations(prefs: Prefs): Promise<string[]> {
  return prefs.budget === 'cheap' ? ['Night-market noodles'] : ['Tasting menu splurge'];
}
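Step three above mentioned bundling with esbuild and targeting ESM. A minimal build script might look like the following; it assumes esbuild is installed, and the entry point, output path, and platform settings are placeholders to adjust for your runtime:

// build.mts — run with `npx tsx build.mts` (or compile it first)
import { build } from 'esbuild';

await build({
  entryPoints: ['src/api/recommend.ts'], // placeholder entry point
  bundle: true,
  format: 'esm',         // edge runtimes expect ESM
  platform: 'neutral',   // use 'node' when targeting the Node serverless runtime
  target: 'es2022',
  minify: true,          // keep the bundle small for edge size limits
  outfile: 'dist/recommend.js',
});

console.log('Bundled function to dist/recommend.js');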
On-device TypeScript flow (Pi + HAT)
Two realistic runtime choices for TypeScript on Pi:
- Run Node 20+ on ARM and compile TypeScript into JS (fast, simple)
- Bundle business logic as WebAssembly and call native C/C++ inference libs for the HAT
Example: a minimal TypeScript server that shells out to a local inference binary (./local_infer is a stand-in for your HAT's inference tool):
import express from 'express';
import { execFile } from 'node:child_process';

type Prefs = { cuisine?: string; budget?: 'cheap' | 'mid' | 'expensive' };

const app = express();
app.use(express.json());

app.post('/recommend', (req, res) => {
  // Hand the typed preferences to the local inference binary on the HAT
  const prefs = JSON.stringify(req.body as Prefs);
  execFile('./local_infer', [prefs], (err, stdout) => {
    if (err) return res.status(500).json({ error: err.message });
    try {
      res.json(JSON.parse(stdout));
    } catch {
      res.status(502).json({ error: 'Malformed inference output' });
    }
  });
});

app.listen(3000, () => console.log('Listening on :3000'));
In production you'd replace execFile with a native binding or a WASM call for better performance and safety.
Incremental migration from JS to TypeScript (practical checklist)
For many micro apps, you don't rewrite everything at once. Here's a safe path that preserves momentum but improves reliability.
- Enable allowJs and add a tsconfig with incremental: true. This lets you start adding .ts files while keeping .js files working (a minimal tsconfig sketch follows this checklist).
- Add types for boundaries: type the HTTP layer (request/response DTOs) first. Stability here prevents bugs across layers.
- Introduce strict options selectively: enable noImplicitAny and strictNullChecks in a module first; fix issues iteratively.
- Use JSDoc as transitional typing for third-party or legacy modules to get gradual safety without full rewrite.
- Write integration tests that validate the TypeScript-compiled outputs in a real runtime (Node on Pi or edge runtime emulator).
- Automate builds for both targets: your CI should produce an edge bundle and an ARM build (or Docker image) for Pi deployments. See the related reading on hosted tunnels and local testing for ops tooling patterns.
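A minimal tsconfig sketch covering the first steps of this checklist; the exact option values are assumptions to adapt to your toolchain. Start permissive, then flip noImplicitAny and strictNullChecks to true module by module and keep tsc --noEmit as the CI gate:

// tsconfig.json (tsc accepts JSONC comments)
{
  "compilerOptions": {
    "allowJs": true,
    "incremental": true,
    "noImplicitAny": false,
    "strictNullChecks": false,
    "module": "esnext",
    "moduleResolution": "bundler",
    "target": "es2022",
    "outDir": "dist"
  },
  "include": ["src"]
}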
Operational concerns and monitoring
Whether serverless or edge, put observability and update paths in place early.
- Serverless: integrate logs, tracing, and synthetic checks. Track per-request latency and API costs.
- Edge devices: set up secure remote access (ssh bastion or device manager), automated OS updates, and heartbeat reporting to a central dashboard.
- Security: sign firmware and model updates. Use certificates for mutual TLS between device and management plane.
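As one concrete piece of that update path, here is a minimal sketch of verifying a signed model update on a device before installing it, using Node's built-in crypto module. The file paths and the Ed25519 key choice are assumptions; adapt them to however you ship updates:

import { createPublicKey, verify } from 'node:crypto';
import { readFileSync } from 'node:fs';

// Hypothetical paths; wire these into your actual update pipeline.
const publicKey = createPublicKey(readFileSync('keys/update-signing.pub'));
const artifact = readFileSync('updates/model-q4.bin');
const signature = readFileSync('updates/model-q4.bin.sig');

// For Ed25519 keys the algorithm argument is null; RSA keys would pass 'sha256'.
if (!verify(null, artifact, publicKey, signature)) {
  throw new Error('Model update failed signature check; refusing to install');
}
console.log('Signature verified; safe to swap in the new model');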
When to pick which approach (decision matrix)
Use this simple rule-of-thumb:
- Pick serverless when you expect growth, need continuous model improvements, or you want minimal device ops. Ideal for public micro apps with hundreds-to-thousands of users.
- Pick on-device edge when privacy, offline availability, or predictable local latency are decisive. Ideal for personal micro apps (Where2Eat for a friend group), closed communities, or regulated data.
- Pick hybrid when you need local privacy for sensitive inputs but still want occasional cloud-level reasoning and model upgrades.
Future trends to watch (2026 & beyond)
- Smaller, standardized quantized models will make on-device inference more capable and further reduce the cloud-vs-edge gap.
- Edge runtimes for TypeScript — expect more wasm-first runtimes and lightweight Deno/Bun variants optimized for NPUs and ARM.
- Privacy-preserving pipelines: federated fine-tuning and secure enclaves will make hybrid patterns more attractive for commercial apps.
Actionable takeaways
- Start with TypeScript at the API boundary (request/response DTOs) to get immediate safety with minimal rewrite effort.
- Prototype both paths: deploy a serverless TypeScript function and a single Pi + AI HAT unit. Measure real latencies and ops cost for your exact workload.
- If privacy is more important than raw cost, prioritize on-device inference or hybrid designs that keep sensitive features local.
- Automate model updates: for edge devices use signed updates; for serverless use feature flags to roll back quickly.
- Measure end-to-end: include API call costs, network egress, hardware amortization, and devops time when comparing alternatives.
Final recommendation
For a personal dining recommender used by a handful of friends, a Raspberry Pi + AI HAT running a TypeScript microservice is compelling: the privacy gains and fixed-cost behavior outweigh the small ops overhead. For public micro apps or when you expect rapid usage growth, start serverless in TypeScript and instrument everything. Migrate selected functions to edge devices later using a hybrid approach if privacy or offline capability becomes a requirement.
Getting started checklist (next steps)
- Pick the initial architecture (serverless or Pi prototype).
- Type your API request/response DTOs and add TS build to CI.
- Deploy a minimal endpoint (10–20 lines) and run a 100-request latency test (the probe sketch in the latency section works for this).
- If building Pi prototype: order a Pi 5 + AI HAT, set up Node 20+ and a local TypeScript service, and test inference with a quantized 7B model.
- Document privacy and compliance decisions in your README and product spec.
Closing — choose measurably, iterate quickly
The right answer isn't binary. Use TypeScript to codify boundaries and reduce technical debt while you experiment with serverless and edge AI options. Prototype, measure, and let data (latency, cost, and user trust) drive the decision.
Ready to try this in your codebase? Start by converting one endpoint to TypeScript and deploying it to a serverless edge. From there, a natural next step is a starter repo for your dining recommender that includes both a Vercel edge function and a Pi deployment script.
Related Reading
- Serverless Edge for Compliance‑First Workloads — A 2026 Strategy
- Review: Top Object Storage Providers for AI Workloads — 2026 Field Guide
- Edge Orchestration and Security for Live Streaming — Practical Strategies
- Field Report: Hosted Tunnels, Local Testing and Zero‑Downtime Releases — Ops Tooling