Graceful Shutdown and Restart Patterns in TypeScript Services
Practical TypeScript patterns—signal handling, transactional shutdowns, health checks—to keep Node.js services resilient against random process terminations.
Survive the random kills: why graceful shutdowns matter for TypeScript backends in 2026
If you've lost work, leaked connections, or seen half-committed transactions after a deployment or a flaky node restart, you're not alone. Production environments in 2026 are more ephemeral than ever: containers, edge functions, and host autoscalers will kill and restart processes at unpredictable times. The result? Partial requests, stuck jobs, and corrupt state. This article gives you practical, TypeScript-first patterns—signal handling, transactional shutdowns, health checks, and restart strategies—to make your services resilient when processes die at random.
Quick summary (inverted pyramid): what to do first
- Catch signals (SIGTERM, SIGINT, SIGQUIT) and route them to a shutdown manager.
- Flip readiness immediately so load balancers stop sending new traffic.
- Stop accepting work and drain in-flight requests and sockets gracefully.
- Finish or rollback transactions and finish background jobs deterministically.
- Enforce timeouts and exit cleanly, letting the orchestrator restart you.
Context: what's changed in 2024–2026 (and why that matters)
By 2026 we see three platform trends that change how graceful shutdowns are implemented:
- Node.js and the ecosystem have broadly adopted AbortController-first APIs (core fs, timers, fetch) and many libraries accept an AbortSignal for cancellation—use it to cancel pending work.
- Containers, Kubernetes, and service meshes are default infrastructure; readiness and liveness probes, preStop hooks, and terminationGracePeriodSeconds are table stakes for shutdown logic.
- Chaos engineering is mainstream—teams regularly run automated killers (Chaos Mesh, Pumba) and expect apps to survive unexpected SIGKILLs as much as planned redeploys.
Pattern 1 — Centralized graceful shutdown manager (TypeScript)
Instead of scattering signal handlers across files, create a single orchestrator that coordinates shutdown steps and exposes an AbortSignal for the rest of the app to listen to.
Why a manager?
- Clear lifecycle: readiness flip, stepwise disposal, timeout enforcement.
- Easy testing: replace timers and probes in unit tests.
- Consistent logging and observability of shutdown duration and reasons.
Example: graceful-shutdown.ts
/* TypeScript: graceful-shutdown.ts */
import { EventEmitter } from 'events';
export type CleanupFn = (signal?: AbortSignal) => Promise<void> | void;
export class GracefulShutdown extends EventEmitter {
private cleaners: CleanupFn[] = [];
private shuttingDown = false;
private controller = new AbortController();
constructor(private readonly timeoutMs = 30_000) {
super();
this.installSignalHandlers();
}
public get signal() { return this.controller.signal; }
public register(fn: CleanupFn) {
this.cleaners.push(fn);
}
private installSignalHandlers() {
const handler = (sig: NodeJS.Signals) => {
// prevent reentrant shutdown
if (this.shuttingDown) return;
this.shuttingDown = true;
this.emit('shutdown', sig);
void this.shutdown(sig).catch((err) => {
console.error('shutdown error', err);
process.exit(1);
});
};
process.on('SIGTERM', () => handler('SIGTERM'));
process.on('SIGINT', () => handler('SIGINT'));
process.on('SIGQUIT', () => handler('SIGQUIT'));
}
private async shutdown(reason: string | NodeJS.Signals) {
console.log('Graceful shutdown started:', reason);
// stop new work by aborting shared signal
this.controller.abort();
const timeout = new Promise((_, reject) => {
setTimeout(() => reject(new Error('shutdown timeout')), this.timeoutMs);
});
const runAll = (async () => {
for (const fn of this.cleaners) {
try {
await fn(this.signal);
} catch (err) {
console.error('cleanup failed', err);
}
}
})();
await Promise.race([runAll, timeout]).catch((err) => {
console.error('Graceful shutdown timed out or failed:', err);
process.exit(1); // non-zero exit so the orchestrator records the failure
});
console.log('Graceful shutdown complete. Exiting.');
process.exit(0);
}
}
How to use it
/* app.ts */
import http from 'http';
import express from 'express';
import { GracefulShutdown } from './graceful-shutdown';
const shutdown = new GracefulShutdown(20_000);
const app = express();
let ready = true;
app.get('/health/liveness', (_req, res) => res.sendStatus(200));
app.get('/health/readiness', (_req, res) => res.sendStatus(ready ? 200 : 503));
const server = http.createServer(app);
server.listen(3000, () => console.log('listening'));
// register a cleanup that flips readiness and closes server
shutdown.register(async (signal) => {
ready = false; // immediate readiness flip
console.log('stopping accepting new connections');
await new Promise<void>((resolve) => {
server.close(() => resolve());
// if close hangs, the manager's shutdown timeout still forces exit
});
});
Pattern 2 — Drain sockets and in-flight HTTP requests
server.close() stops accepting new connections but doesn't forcibly close existing keep-alive sockets. Track sockets and destroy idle ones once the drain window expires.
/* socket-drain.ts (continued) */
const sockets = new Set<import('net').Socket>();
server.on('connection', (socket) => {
sockets.add(socket);
socket.on('close', () => sockets.delete(socket));
});
shutdown.register(async () => {
// after server.close we have only active sockets
const drainTimeout = 10_000;
const killAfter = Date.now() + drainTimeout;
for (const s of sockets) {
// setTimeout only emits a 'timeout' event; pass a callback so truly idle keep-alive sockets actually get destroyed
s.setTimeout(5_000, () => s.destroy());
}
while (sockets.size > 0 && Date.now() < killAfter) {
await new Promise((r) => setTimeout(r, 100));
}
for (const s of sockets) s.destroy();
});
Pattern 3 — Transactional shutdowns: commit or rollback deterministically
Long-running transactions are fragile during shutdown. The pattern is:
- Stop accepting new requests that start transactions (readiness flip).
- Wait for active request handlers to finish or signal cancellation via an AbortSignal.
- Commit or rollback in a finally block; prefer idempotent commits or compensating transactions for safety.
Example with a generic DB client that supports transactions:
/* tx-worker.ts */
async function handleRequest(req: RequestContext, signal?: AbortSignal) {
const client = await db.connect();
try {
await client.beginTransaction();
// pass the signal down to cancel long DB queries if controller aborts
const result = await client.query('UPDATE ...', { signal });
await client.commit();
return result;
} catch (err) {
await client.rollback();
throw err;
} finally {
client.release();
}
}
// During shutdown register a handler that waits until in-flight handlers finish
shutdown.register(async (signal) => {
// wait for all active handlers (tracked via a semaphore) to finish
await activeRequests.waitForZero({ signal });
});
Tips for safer transactions
- Keep transactions short. In 2026, prefer single-statement transactions when possible and use compensating transactions for complex workflows.
- Use optimistic concurrency and idempotency keys so retries during restarts are safe.
- If using ORMs (Prisma, TypeORM), ensure the client supports AbortSignal for cancelable queries.
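As a sketch of the idempotency-key tip, here is a minimal `withIdempotency` helper. The helper name and the `IdempotencyStore` interface are my own; in production you would back the store with Redis SET NX or a database table with a unique constraint.

```typescript
// Hypothetical store interface; back it with Redis SET NX or a unique index.
interface IdempotencyStore {
  get(key: string): Promise<string | null>;
  put(key: string, result: string): Promise<void>;
}

// Runs `work` at most once per key: a retry after a restart returns the
// stored result instead of re-executing side effects.
export async function withIdempotency<T>(
  store: IdempotencyStore,
  key: string,
  work: () => Promise<T>,
): Promise<T> {
  const cached = await store.get(key);
  if (cached !== null) return JSON.parse(cached) as T;
  const result = await work();
  await store.put(key, JSON.stringify(result));
  return result;
}
```

If the process is killed between `work()` and `put()`, the retry re-runs the work, which is why the work itself should also be idempotent or wrapped in a transaction.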
Pattern 4 — Background workers and job queues
For queue processors (BullMQ, RabbitMQ consumers), handle a two-phase shutdown:
- Stop fetching new jobs.
- Allow current job(s) to finish or abort them safely.
- Persist state so a new worker can resume if needed.
/* worker.ts */
let pulling = true;
async function pullLoop(signal?: AbortSignal) {
while (pulling && !signal?.aborted) {
const job = await queue.getNext();
if (!job) { await new Promise((r) => setTimeout(r, 100)); continue; }
try {
await processJob(job, signal);
await job.ack();
} catch (err) {
await job.nack();
}
}
}
shutdown.register(async (signal) => {
pulling = false; // stop getting new jobs
// wait for currently processing jobs to finish or be canceled by signal
await activeJobs.waitForZero({ signal });
});
Pattern 5 — Readiness & liveness: flip early, verify often
Health endpoints are your control plane during shutdown. The canonical approach:
- /health/liveness returns 200 if the process is alive (used by kubelet to restart crashed pods).
- /health/readiness returns 200 only if the instance can accept new work (used by load balancers and service meshes).
On receiving a signal, immediately make /health/readiness return 503. This prevents new traffic while you drain.
Flipping readiness first makes shutdowns predictable to orchestrators—without it, load balancers may keep sending new requests into a dying process.
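One common refinement, sketched below under assumed names (`ready`, `drainAfterReadinessFlip`), is to wait a short grace window after flipping readiness so probes and load balancers have a polling interval or two to observe the 503 before the server stops accepting connections. The 5-second default is an assumption; tune it to your probe interval.

```typescript
import http from 'node:http';

let ready = true;

const server = http.createServer((req, res) => {
  if (req.url === '/health/readiness') {
    res.statusCode = ready ? 200 : 503;
    return res.end();
  }
  res.end('ok');
});

// Flip readiness first, pause so the LB observes the 503, then stop
// accepting connections.
export async function drainAfterReadinessFlip(graceMs = 5_000): Promise<void> {
  ready = false; // new probes see 503 immediately
  await new Promise<void>((resolve) => setTimeout(resolve, graceMs));
  await new Promise<void>((resolve) => server.close(() => resolve()));
}
```

Register `drainAfterReadinessFlip` as the first cleaner in your shutdown manager so everything downstream runs against an already-drained server.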
Pattern 6 — Restart strategies (zero downtime and rolling restarts)
There are multiple ways to restart a TypeScript service without dropping traffic:
- Orchestrator-managed rolling updates (Kubernetes Deployments, ECS): prefer rolling deployments with proper readiness probes and terminationGracePeriodSeconds tuned to your drain + commit time.
- Signal-based graceful restart (process manager): PM2 or systemd can send signals for reload (SIGUSR2, SIGTERM) and manage rolling restarts across instances.
- Cluster manager: use a small master that spawns workers and performs rolling restarts by spawning a new worker and killing the old one after it signals ready.
Example snippet for a simple rolling restart using Node's cluster:
/* master.ts */
import cluster from 'cluster';
import os from 'os';
if (cluster.isPrimary) { // isMaster is deprecated in modern Node
for (let i = 0; i < os.cpus().length; i++) cluster.fork();
// rolling restart helper: replace workers one at a time
async function rollingRestart() {
for (const id in cluster.workers) {
const w = cluster.workers[id]!;
w.send({ cmd: 'shutdown' });
await new Promise((res) => w.once('exit', res));
cluster.fork();
}
}
} else {
// worker starts HTTP server and listens for shutdown message
process.on('message', (m) => {
if (m?.cmd === 'shutdown') process.kill(process.pid, 'SIGTERM');
});
}
Tooling, build configs & editor integrations (practical tips)
Make sure your development and CI tooling don't get in the way of debugging shutdowns:
- tsconfig: target the Node version you run in prod. For ESM apps set "module": "NodeNext" and enable "sourceMap": true to keep stack traces readable in TypeScript.
- Bundlers: using esbuild or swc reduces start-up time (less time spent in transient states during restarts). Keep your shutdown logic robust to faster restarts.
- VS Code: configure a launch task that sends SIGTERM to the debuggee to reproduce real shutdown behavior; use the Node debugger's 'Restart' to see how your app behaves under immediate restarts.
- Testing: run chaos tests in pre-prod—simulate SIGTERM, SIGKILL, and container stop timeouts. Use ephemeral environments (kind, k3d) to validate Kubernetes probes and lifecycle hooks.
Observability and SLOs for shutdowns
Track metrics that indicate shutdown health:
- Shutdown duration histogram (time from signal to process exit).
- Number of requests dropped during shutdown.
- Transaction abort/rollback counts.
Capture shutdown reasons in logs (SIGTERM vs. SIGINT vs. healthcheck failure) and export them to your APM. Use these signals to set SLOs: e.g., 99% of graceful shutdowns complete within the configured termination window.
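A dependency-free way to capture the shutdown-duration metric is to emit structured log lines at signal receipt and at exit; the function names below are illustrative, and with a metrics client such as prom-client you would observe a Histogram instead.

```typescript
// Records time from signal to exit and logs it as structured JSON that a
// log pipeline or APM can aggregate into a duration histogram.
let shutdownStartedAt: number | null = null;

export function markShutdownStart(reason: string): void {
  shutdownStartedAt = performance.now();
  console.log(JSON.stringify({ event: 'shutdown_start', reason }));
}

export function markShutdownEnd(): number {
  const durationMs =
    shutdownStartedAt === null ? 0 : performance.now() - shutdownStartedAt;
  console.log(JSON.stringify({ event: 'shutdown_end', duration_ms: durationMs }));
  return durationMs;
}
```

Call `markShutdownStart` in the signal handler and `markShutdownEnd` just before `process.exit` so the logged duration covers the full drain window.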
Common pitfalls and how to avoid them
- Ignoring keep-alive sockets: causes requests to hang past terminationGracePeriodSeconds. Track and destroy sockets after a short drain window.
- Not flipping readiness: LB continues to send requests to a process that's about to die.
- Long transactions: avoid multi-second transactions that force long termination windows.
- Mixing signals: ensure your orchestration (systemd, Docker, Kubernetes) sends the expected signals—Docker sends SIGTERM then SIGKILL after the stop timeout, Kubernetes respects terminationGracePeriodSeconds.
Test plan: how to validate your shutdown strategy
- Unit test: call your GracefulShutdown.shutdown manually and assert registered cleaners run and the AbortSignal is triggered.
- Integration: run the container locally and send SIGTERM; watch readiness flip, server.close, and socket cleanup logs.
- Chaos test: use a chaos tool (Chaos Mesh, Litmus, Pumba) in a staging cluster to randomly kill pods and verify no user-visible errors occur beyond expected retries.
- Performance test: measure how often shutdowns exceed timeout and adjust terminationGracePeriodSeconds and timeouts accordingly.
2026 recommendations & future-proofing
- Prefer AbortController-driven cancellation for all async primitives; by 2026 most major libs support it and it simplifies cancellation propagation.
- Design systems for idempotency and short-lived transactions so restarts are cheap and safe.
- Automate chaos tests in CI for every major release—treat random kills as a first-class test case.
- Make health endpoints and metrics part of your deployment checklist; don't rely solely on logs to diagnose shutdown issues.
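To make the AbortController recommendation concrete, here is a small sketch showing one controller cancelling both a core timer and a fetch through the same signal (the URL is a placeholder):

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// One shared signal threads through every async step; aborting the
// controller cancels whichever step is currently pending.
async function cancellableWork(signal: AbortSignal): Promise<string> {
  await sleep(1_000, undefined, { signal }); // rejects with AbortError on abort
  const res = await fetch('https://example.com/', { signal });
  return res.statusText;
}

const controller = new AbortController();
const outcome = cancellableWork(controller.signal).catch((err: Error) => err.name);
controller.abort(); // cancels the pending sleep; would reach the fetch too
```

Wiring the shutdown manager's `signal` into calls like this is what makes "stop accepting work" propagate automatically through the whole request path.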
Actionable checklist (apply this in 30 minutes)
- Add a single GracefulShutdown manager to your app and replace ad-hoc signal handlers.
- Expose /health/readiness and /health/liveness and flip readiness on signal.
- Track sockets and ensure idle keep-alives are destroyed during drain.
- Stop pulling new jobs on shutdown and wait for active jobs to finish.
- Run a SIGTERM test in a staging container with Kubernetes/ECS to validate terminationGracePeriodSeconds.
Closing thoughts
Random process terminations are inevitable. In 2026, with faster infra and ubiquitous chaos testing, applications that treat graceful shutdown as a feature—not an afterthought—will be more reliable and easier to operate. The patterns above are practical, TypeScript-first, and compatible with modern Node and container ecosystems.
Try this now: add the GracefulShutdown manager to a small service, simulate SIGTERM locally, and verify readiness flips and sockets drain. If you have CI/CD, add a chaos step in staging that sends random SIGKILLs—your users will thank you.
Call to action
Want a ready-to-run template? Clone our minimal TypeScript graceful-shutdown starter repo (includes tsconfig, Dockerfile, and Kubernetes manifest) and run chaos tests in 10 minutes. Share feedback or your own patterns in the comments—let's make TypeScript services that actually survive real-world chaos.