A language-agnostic static analysis model in TypeScript: implementing a MU-like graph representation
Build a language-agnostic MU-like graph in TypeScript to mine code changes, cluster fixes, and ship trusted ESLint rules.
If you build developer tooling long enough, you learn a hard truth: the best static analysis rules are usually not invented in a vacuum. They are mined from real code changes, reviewed by real engineers, and validated against the messy reality of production systems. That is exactly why a language-agnostic approach matters. Instead of locking your analysis into one parser, one AST shape, or one language ecosystem, you can create a higher-level graph model in TypeScript that captures the semantics of a change and makes it searchable across JavaScript, TypeScript, Python, Java, and beyond.
This guide shows how to design that model, often called a MU-like representation, and how to use it for mining code changes, clustering bug fixes, and surfacing actionable rules into ESLint and TS Server. The goal is not just academic elegance. It is to improve code hygiene, reduce false positives, and ship rules developers actually accept. For a broader view on how teams balance ownership and automation, see our guide on operate vs orchestrate and how a multi-cloud management mindset applies to analysis pipelines too.
Pro tip: If your analysis model cannot group semantically similar changes across different syntaxes, you do not have a “mining” system yet. You have a parser collection.
Why a language-agnostic graph beats AST-only analysis
ASTs are precise, but too local
ASTs are essential, but they are tightly coupled to language grammar. A JavaScript AST can tell you about call expressions, property access, and import declarations. A Python AST can do the same in a different shape, while Java brings yet another object model. That is fine for lint rules that stay inside one language, but it becomes a liability when your goal is to mine recurring fix patterns across repositories. Two developers can fix the same bug in different syntactic forms, and an AST-only pipeline may treat them as unrelated events.
A MU-like graph solves this by lifting the representation one level higher. Instead of remembering every token and parse node, it models the program change in terms of entities, operations, and relationships that matter to the rule. That makes it easier to cluster code edits that share the same intent, such as “add a null check before dereferencing,” “avoid passing mutable default config,” or “validate input before constructing a request.” This is the same rationale behind using scientific hypothesis testing: abstract away irrelevant surface differences so you can compare explanations meaningfully.
Cross-language rule mining needs semantic normalization
When you mine code changes across repos, the challenge is not just scale. It is comparability. A bug fix in React may add a dependency to a hook array, while a bug fix in Python may add a guard before a dictionary lookup. Both are “missing precondition” fixes, but they are expressed differently. A graph model with normalized node types and edge semantics can represent both in a shared frame. That lets your clustering algorithm discover patterns that would otherwise be fragmented across languages and frameworks.
This is where a TypeScript SDK becomes useful as an implementation layer. TypeScript gives you strong typing for graph nodes, transformations, and rule extraction pipelines, while still being practical for working across JSON, AST adapters, and code review integrations. In a production setting, this also improves observability and maintainability, much like the discipline described in governance and observability operating models.
What the MU representation is modeling
A higher-level graph of change, not source text
Think of MU as a graph of meaning. It does not need to preserve every syntactic detail; it needs to preserve the relationships that help you infer a rule. For each change, you can model nodes such as API calls, control-flow guards, data sources, sinks, literals, variables, and container objects. Edges capture relationships like “uses,” “guards,” “depends on,” “returns,” “propagates to,” and “transforms into.” The point is to represent the change at a level where common bug-fix structure becomes visible.
For example, in one repository a fix may change fetch(url) to fetch(url, { signal }) after introducing timeout handling. In another repo, the same intent might show up as a Python requests.get(..., timeout=5) adjustment. The syntax is different, but the graph can normalize both as a call to an external resource with a newly enforced timeout constraint. That is the power of a language-agnostic representation: it is structured around behavior, not grammar.
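To make that concrete, here is a minimal sketch of how the timeout fix could be encoded, using the MuNode, MuEdge, and MuGraph types defined later in this article. The node IDs, labels, and metadata are illustrative choices, not a fixed vocabulary.

```typescript
// Hypothetical encoding of the "enforce a timeout on an external call" fix.
// Both the JavaScript fetch change and the Python requests.get change can
// normalize to this same three-node shape.
const timeoutFix: MuGraph = {
  repository: 'example-org/web-app',   // illustrative values
  commit: 'abc123',
  filePath: 'src/api/client.ts',
  nodes: [
    { id: 'n1', kind: 'Call', label: 'http.request', metadata: { api: 'fetch' } },
    { id: 'n2', kind: 'Entity', label: 'external-resource' },
    { id: 'n3', kind: 'Condition', label: 'timeout-constraint' }
  ],
  edges: [
    { from: 'n1', to: 'n2', kind: 'targets' },  // the call reaches an external resource
    { from: 'n3', to: 'n1', kind: 'protects' }  // the new constraint guards the call
  ]
};
```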
Canonical node and edge types
To make clustering stable, define a compact schema. Keep node types few and semantic. For example, use nodes like Entity, Call, Condition, Literal, Return, and Assignment. Then define edge types that express intent: controls, targets, reads, writes, protects, and fixes. In practice, you can enrich the graph with language-specific metadata, but the clustering layer should ignore most of that metadata unless it is necessary for disambiguation.
This design mirrors how strong engineering teams standardize interfaces across domains. A useful analogy comes from partner SDK governance: keep the contract stable, isolate implementation details, and reduce the chance that downstream consumers depend on accidental behavior. A graph schema is a contract too, and the better it is designed, the more reusable your mining pipeline becomes.
Why TypeScript is a good fit for the model
TypeScript is well suited for building the MU layer because it gives you discriminated unions, generics, readonly data structures, and excellent developer ergonomics. You can encode the graph schema with strong types, validate transformations, and avoid a large class of accidental shape mismatches. At the same time, the runtime is flexible enough to ingest JSON from multiple parsers and adapters. That balance matters when you are orchestrating analysis across mixed-language repositories and need to keep the implementation approachable for tooling engineers.
```typescript
type NodeKind = 'Entity' | 'Call' | 'Condition' | 'Literal' | 'Return' | 'Assignment';
type EdgeKind = 'controls' | 'targets' | 'reads' | 'writes' | 'protects' | 'fixes';
type NodeId = string;

interface MuNode {
  id: NodeId;
  kind: NodeKind;
  label?: string;
  language?: string;
  metadata?: Record<string, unknown>;
}

interface MuEdge {
  from: NodeId;
  to: NodeId;
  kind: EdgeKind;
}

interface MuGraph {
  nodes: MuNode[];
  edges: MuEdge[];
  repository: string;
  commit: string;
  filePath: string;
}
```

Building a TypeScript pipeline for mining code changes
From commit diffs to normalized edit scripts
Your mining pipeline starts with commits, not rules. First, collect bug-fix commits from repositories using signals such as commit labels (fix, bug, hotfix) or review metadata. Then parse the before-and-after code into language-specific trees, and convert those trees into normalized edit scripts. The edit script should describe what changed at the semantic level, not just what text changed. This usually means extracting operations like “insert guard,” “replace call argument,” “move validation earlier,” or “wrap with try/catch.”
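A minimal sketch of what such an edit script could look like, assuming a small discriminated union of operation kinds; the exact vocabulary is something your pipeline has to define for itself:

```typescript
// Hypothetical normalized edit script: language-neutral operations
// extracted from a before/after diff, rather than raw text hunks.
type EditOperation =
  | { op: 'insert-guard'; target: string; condition: string }
  | { op: 'replace-argument'; call: string; index: number; newValue: string }
  | { op: 'move-validation'; target: string; direction: 'earlier' | 'later' }
  | { op: 'wrap-try-catch'; target: string };

interface EditScript {
  repo: string;
  commit: string;
  operations: EditOperation[];
}
```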
After that, translate the edit script into a MU graph that can be compared across languages. A good pipeline is: repository ingestion, commit selection, diff parsing, semantic lifting, graph normalization, and clustering. If you want to see how data pipelines can become decision tools, the structure is similar to the way teams build metrics that matter for scaled systems: raw events are not enough; you need normalization and interpretation. Similarly, your rule mining system needs curated inputs and a repeatable representation.
Handling different languages without losing signal
Different languages expose different parsing affordances. JavaScript and TypeScript can be handled with the TypeScript compiler API, Babel, or tree-sitter. Python and Java often enter through tree-sitter or language-specific parsers. The key is not choosing one parser to rule them all. The key is creating adapter layers that emit the same MU schema from each language backend. Each adapter can map language syntax to shared semantic categories, such as function invocation, conditionals, exception handling, object property access, or API parameter validation.
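One way to express that adapter contract, building on the EditScript sketch above; the method names are assumptions, not a standard interface:

```typescript
// Hypothetical adapter contract: every language backend parses its own
// syntax but must emit the shared MU schema.
interface LanguageAdapter {
  readonly language: string; // e.g. 'typescript', 'python', 'java'
  extractEdits(before: string, after: string): EditScript;
  toMuGraph(
    edits: EditScript,
    context: { repo: string; commit: string; filePath: string }
  ): MuGraph;
}
```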
This is where a language-agnostic architecture overlaps with broader platform engineering. If you have ever read about avoiding vendor sprawl during digital transformation, the lesson is the same: standardize the operating layer while allowing heterogeneous backends underneath. That keeps your analysis system extensible when you add new languages, frameworks, or ecosystem-specific edge cases.
Practical ingestion code
In TypeScript, the ingestion stage should be explicit and testable. Separate concerns into parser adapters, change normalizers, graph builders, and clusterers. Make each step output serializable artifacts so you can inspect failures. In production, graph artifacts become extremely valuable for debugging missed clusters, much like an audit trail helps verify document lineage in practical audit trails. When a rule looks wrong, you should be able to trace it back to a commit, a file, an edit, and a normalized graph.
```typescript
interface ChangeEvent {
  repo: string;
  commit: string;
  language: string;
  before: string;
  after: string;
  filePath: string;
}

function buildMuGraph(change: ChangeEvent): MuGraph {
  // 1) Parse the before/after sources with the adapter for change.language.
  // 2) Extract normalized edit operations from the two trees.
  // 3) Lift entities and control/data relationships into MU nodes and edges.
  // 4) Return the graph. The arrays stay empty in this skeleton because the
  //    lifting logic lives in the language-specific adapters.
  return {
    repository: change.repo,
    commit: change.commit,
    filePath: change.filePath,
    nodes: [],
    edges: []
  };
}
```

Clustering semantically similar code changes
Feature extraction from graphs
Once you have graphs, clustering becomes the next gate. A good clusterer does not rely on raw node IDs or file paths. It should derive a feature vector from graph topology, node kind histograms, edge patterns, API names, guard structures, and normalized literals. You may also compute graph hashes or graph embeddings to make candidate grouping faster. The trick is to include enough structure to keep unrelated changes apart, while abstracting enough detail to unite equivalent fixes.
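As a starting point, a cheap feature vector can be built from kind histograms and basic structural counts; the sketch below is deliberately simple, and a real extractor would add API families, guard shapes, and normalized literals:

```typescript
// Minimal feature extractor: histograms over node and edge kinds plus
// coarse size signals, keyed by feature name.
function extractFeatures(graph: MuGraph): Record<string, number> {
  const features: Record<string, number> = {};
  for (const node of graph.nodes) {
    const key = `node:${node.kind}`;
    features[key] = (features[key] ?? 0) + 1;
  }
  for (const edge of graph.edges) {
    const key = `edge:${edge.kind}`;
    features[key] = (features[key] ?? 0) + 1;
  }
  features['size:nodes'] = graph.nodes.length;
  features['size:edges'] = graph.edges.length;
  return features;
}
```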
In practice, I recommend starting with a hybrid approach: a cheap signature filter followed by a deeper semantic similarity score. The signature can include the target API family, the fix category, and the presence of guards or exception handling. The deeper score can compare graph neighborhoods, sequence patterns, or learned embeddings. This staged approach is similar to how teams use assessment designs that distinguish genuine understanding: a first pass narrows the field, and the deeper pass validates the real signal.
Clustering strategies that work
For an initial system, start simple. Use locality-sensitive hashing, MinHash on extracted semantic tokens, or agglomerative clustering on a graph similarity score. If you have enough data, experiment with graph embeddings and density-based clustering such as HDBSCAN. The output should be clusters that are interpretable enough for humans to review, because the final rule must be legible to developers. A cluster that cannot be summarized in one sentence is usually not ready to become a lint rule.
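As a sketch of the simple end of that spectrum, here is greedy grouping on Jaccard similarity over semantic token sets; the threshold is a tunable assumption, and a production system would use MinHash or LSH to avoid comparing every pair:

```typescript
// Naive greedy clustering: each change joins the first cluster whose
// representative is similar enough, otherwise it starts a new cluster.
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  for (const token of a) if (b.has(token)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

function clusterChanges(tokenSets: Set<string>[], threshold = 0.6): number[][] {
  const clusters: number[][] = [];
  tokenSets.forEach((tokens, i) => {
    const home = clusters.find(c => jaccard(tokenSets[c[0]], tokens) >= threshold);
    if (home) home.push(i);
    else clusters.push([i]);
  });
  return clusters;
}
```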
At scale, you also need cluster hygiene. Remove duplicates, split mixed-intent clusters, and collapse near-identical examples from the same repository if they bias the cluster too much. This is the kind of operational discipline often seen in IT leadership frameworks: know when to operate consistently and when to orchestrate across systems. In rule mining, consistency is essential, but so is judgment.
A comparison of approaches
| Approach | Strengths | Weaknesses | Best use case |
|---|---|---|---|
| AST-only matching | Simple, precise within one language | Poor cross-language generalization | Single-language linting |
| Normalized edit scripts | Good for change intent | Needs careful abstraction design | Bug-fix mining |
| MU-like graphs | Cross-language semantic grouping | More complex to implement | Rule mining at scale |
| Graph embeddings | Useful for similarity search | Harder to explain | Candidate clustering |
| Hybrid pipeline | Balanced speed and accuracy | More moving parts | Production rule generation |
Turning clusters into rules developers will accept
From cluster summary to rule statement
Mining is only half the job. The true product is the rule. A strong rule statement should explain the risky pattern, the recommended fix, and the practical reason it matters. For instance: “When calling an external HTTP API, always set a timeout or abort signal.” Or: “Before accessing a nested object property, guard for nullish values unless the API guarantees presence.” This framing makes the rule understandable to both reviewers and automated tooling.
The Amazon research summary is notable here: it reports 62 high-quality static analysis rules mined across Java, JavaScript, and Python from fewer than 600 clusters, and 73% acceptance of recommendations in code review. That acceptance rate is a powerful signal that mined rules can be more useful than hand-authored guesses because they reflect real developer pain. If you are building code hygiene tooling, that kind of acceptance is the north star. It is also why rule quality must be treated as an engineering metric, not a content generation metric.
Rule schema design
Represent each rule with a machine-readable schema that includes triggering conditions, matched graph motifs, severity, rationale, examples, and remediation advice. Keep it deterministic enough for ESLint to execute and for TS Server to suggest. A practical rule schema might include a normalized pattern, a set of required nodes, a set of prohibited edges, and a fixer template. You can also store examples from mined clusters, redacted if necessary, to support documentation and test generation.
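A sketch of such a schema, with field names chosen for illustration rather than taken from any existing tool:

```typescript
// Hypothetical machine-readable rule distilled from a mined cluster.
interface MinedRule {
  id: string;
  version: string;
  severity: 'info' | 'warn' | 'error';
  rationale: string;                              // why the pattern is risky
  requiredNodes: NodeKind[];                      // motif: nodes that must be present
  prohibitedEdges: EdgeKind[];                    // motif: edges that must be absent
  examples: { before: string; after: string }[];  // redacted cluster samples
  remediation: string;                            // human-readable fix guidance
  fixerTemplate?: string;                         // optional autofix template
}
```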
This approach is especially useful when integrated with quality workflows like risk registers and reliability patterns in other domains. The same discipline applies: define criteria, document thresholds, and make the output actionable. A rule that has no remediation path will get ignored, no matter how clever the underlying model is.
Human review is part of the product
Even with a great clustering system, you need reviewer-in-the-loop validation. Have senior engineers label clusters as true positives, false positives, or “needs split.” Track the reasons: too broad, too narrow, language-specific, framework-specific, or stylistic rather than defect-oriented. Over time, these labels become a feedback loop that improves your semantic abstraction and your downstream precision. This is exactly the kind of quality discipline that separates a scalable rules engine from a novelty demo.
Pro tip: If reviewers keep saying “this rule is technically correct but annoying,” your abstraction level is probably wrong, not your reviewers.
Surfacing rules in ESLint and TS Server
ESLint as the fast feedback layer
For JavaScript and TypeScript repositories, ESLint is the fastest route from mined rule to developer value. Convert each mined rule into a custom ESLint rule that inspects the ESTree or TypeScript AST, then use the MU-derived pattern to drive matching logic. The graph itself can live upstream as the mining artifact, while ESLint gets a distilled matcher that runs in milliseconds. That separation keeps the training/mining world independent from the enforcement world.
In practice, your ESLint rule will likely use a combination of AST selectors, symbol resolution, and semantic guards. For example, a rule mined from clusters about missing timeouts may inspect calls to fetch, axios, or a wrapper client, then verify whether a timeout/abort mechanism exists. By keeping the rule logic derived from a normalized cluster, you reduce the risk of encoding one repo’s quirks as if they were universal. If you want more on tooling ecosystems that wrap analysis around behavior, our guide to building platform-specific agents with the TypeScript SDK offers a useful pattern.
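Here is a simplified sketch of what that distilled matcher could look like as a custom ESLint rule for the timeout example. It only handles direct fetch calls with a signal option; a production rule would also recognize wrapper clients and other timeout mechanisms:

```typescript
import type { Rule } from 'eslint';

// Simplified matcher distilled from a "missing timeout" cluster.
const requireFetchTimeout: Rule.RuleModule = {
  meta: {
    type: 'problem',
    messages: {
      missingTimeout: 'Calls to external HTTP APIs should set a timeout or abort signal.'
    },
    schema: []
  },
  create(context) {
    return {
      CallExpression(node) {
        if (node.callee.type !== 'Identifier' || node.callee.name !== 'fetch') return;
        const options = node.arguments[1];
        const hasSignal =
          options?.type === 'ObjectExpression' &&
          options.properties.some(
            p => p.type === 'Property' && p.key.type === 'Identifier' && p.key.name === 'signal'
          );
        if (!hasSignal) {
          context.report({ node, messageId: 'missingTimeout' });
        }
      }
    };
  }
};

export default requireFetchTimeout;
```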
TS Server for inline intelligence
TS Server opens the door to proactive guidance inside editors. Unlike a lint-only setup, TS Server can power code actions, quick fixes, and contextual hints before the developer even saves the file. That is ideal for mined rules with clear remediation. A missing null guard, unsafe optional chaining pattern, or brittle API call can be surfaced as a diagnostic with a suggested fix. The goal is to reduce friction while preserving trust.
When designing editor integrations, keep latency and predictability in mind. Developers will tolerate a rule more readily if the warning appears instantly, explains itself clearly, and offers a fix they can inspect. If you are looking for a strategic analogy, the decision between enforcement layers is similar to the tradeoffs described in governance and observability: the best system gives you feedback at the right place in the workflow, not only after the fact.
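The usual entry point for this kind of integration is a language service plugin that proxies the existing service and appends its own diagnostics. The sketch below only wires up the proxy, with the mined-rule matching left as a stub:

```typescript
import type * as ts from 'typescript/lib/tsserverlibrary';

// Minimal TS Server plugin skeleton: proxy the language service and
// append diagnostics produced by mined-rule matchers (stubbed here).
function init(modules: { typescript: typeof ts }) {
  function create(info: ts.server.PluginCreateInfo): ts.LanguageService {
    const prior = info.languageService;
    const proxy: ts.LanguageService = Object.create(null);
    for (const key of Object.keys(prior) as Array<keyof ts.LanguageService>) {
      (proxy as any)[key] = (...args: unknown[]) => (prior as any)[key](...args);
    }
    proxy.getSemanticDiagnostics = fileName => {
      const diagnostics = prior.getSemanticDiagnostics(fileName);
      // Run mined-rule matchers on the file's AST here and push any
      // additional ts.Diagnostic entries before returning.
      return diagnostics;
    };
    return proxy;
  }
  return { create };
}

export = init;
```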
Packaging mined rules safely
Don’t ship every mined rule automatically. Create a promotion pipeline: candidate, reviewed, experimental, recommended, and enforced. Candidate rules can run in warn mode, while only the highest-confidence rules get autofix or hard-error treatment. This is especially important if the rule affects widely used libraries or critical runtime behavior. A staged rollout keeps trust high and makes rollback simple if a rule proves noisy.
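One lightweight way to encode that promotion pipeline is a stage type mapped onto ESLint severities; the mapping below is an assumption that matches the staging described above:

```typescript
// Hypothetical promotion stages mapped to enforcement behavior.
type RuleStage = 'candidate' | 'reviewed' | 'experimental' | 'recommended' | 'enforced';

const stageToEslintSeverity: Record<RuleStage, 'off' | 'warn' | 'error'> = {
  candidate: 'warn',      // runs in warn mode while evidence accumulates
  reviewed: 'warn',
  experimental: 'warn',
  recommended: 'warn',    // documented and suggested by default
  enforced: 'error'       // highest confidence; may also enable autofix
};
```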
Quality, testing, and evaluation
Measure precision, recall, and acceptance
Rule mining systems should be evaluated on more than raw cluster counts. Measure precision against a curated test set, recall on known defect patterns, and acceptance rate in real review workflows. If acceptance is low, the rule may be too noisy, too broad, or too obscure to be useful. The Amazon summary’s 73% acceptance rate is valuable because it connects analysis quality to actual developer behavior, which is the only metric that really matters in practice.
To make evaluation meaningful, create a benchmark suite of historical changes and hold-out repositories. Each benchmark should include the source diff, the MU graph, the inferred rule, and the expected outcome. This allows reproducible comparisons between AST-only matching, graph-based matching, and any embedding-based enhancements you add later. Treat it like a quality gate, not a one-time paper exercise.
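Each benchmark entry can be a small, serializable record; the field names below are illustrative:

```typescript
// Hypothetical benchmark case tying a historical change to its expected
// analysis outcome, so pipeline variants can be compared reproducibly.
interface BenchmarkCase {
  id: string;
  sourceDiff: string;           // the original before/after diff
  graph: MuGraph;               // the lifted MU representation
  inferredRuleId: string;       // the rule the pipeline should recover
  expected: 'match' | 'no-match';
}
```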
Test cases should mirror real developer mistakes
Good tests are not synthetic puzzles. They should reflect mistakes teams actually make: forgotten validations, missing resource cleanup, unsafe assumptions about API shape, weak error handling, and inconsistent defaults. Your test corpus should include both positive and negative examples, plus borderline cases that stress your abstraction boundaries. That’s how you keep a rule from becoming a style preference with a security-sounding name.
There is a useful parallel to assessment design: if the test can be gamed by surface cues alone, then it is not measuring the right skill. In static analysis, if your system only detects literal string matches, it is not truly mining behavior. Real quality comes from semantic coverage.
Operational feedback loops
Finally, build feedback loops from production usage back into the mining pipeline. Track which suggestions are accepted, dismissed, edited, or deferred. Group dismissal reasons and use them to refine cluster boundaries and rule thresholds. Over time, your mining system becomes a living quality engine rather than a one-off research prototype. That is how you move from clever graphs to durable developer infrastructure.
Implementation architecture in TypeScript
Suggested module layout
A maintainable system benefits from a clean separation of concerns. Put parsers and adapters in one layer, semantic normalization in another, graph building in a third, clustering in a fourth, and rule export in a fifth. In TypeScript, this structure can be expressed with interfaces and dependency injection so that each component is individually testable. Keep the pipeline async-friendly because repository mining and parse work will often be I/O bound.
```text
src/
  adapters/
    javascriptAdapter.ts
    pythonAdapter.ts
    javaAdapter.ts
  normalization/
    entityNormalizer.ts
    edgeBuilder.ts
  graphs/
    muGraph.ts
    similarity.ts
  clustering/
    clusterer.ts
  rules/
    ruleSchema.ts
    eslintExporter.ts
    tsServerExporter.ts
```

Serialization and reproducibility
Persist every intermediate artifact. You want raw diffs, normalized changes, graphs, cluster assignments, and published rules. This supports debugging, experiment comparison, and auditing. If a rule changes after a normalization tweak, you should be able to rerun the pipeline on the same data and explain the delta. Reproducibility is non-negotiable if you want engineering teams to trust the tool.
For storage, JSON is usually enough for the graph and rule artifacts, but make sure you include stable IDs and version tags. That lets you evolve the schema without breaking older clusters. It also makes it easier to compare results over time and answer questions like “Did the last parser upgrade reduce recall?”
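A versioned envelope around each persisted artifact is one way to get those stable IDs and version tags; the shape below is an assumption, not a standard format:

```typescript
// Hypothetical versioned envelope for persisted pipeline artifacts.
interface ArtifactEnvelope<T> {
  schemaVersion: string;  // bump when the artifact shape changes
  artifactId: string;     // stable ID that survives re-runs on the same input
  producedBy: string;     // pipeline component and its version
  createdAt: string;      // ISO timestamp for cross-run comparisons
  payload: T;             // e.g. a MuGraph, a cluster assignment, or a rule
}
```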
Type safety patterns that help
Use discriminated unions for node kinds, branded types for graph IDs, and readonly arrays for published artifacts. Add runtime validation with Zod or similar libraries at the boundaries where untrusted data enters the pipeline. This prevents a large class of silent failures that are common in analysis systems, especially when multiple languages and parsers are involved. The stronger the type boundary, the easier it is to reason about cluster quality and rule correctness.
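A short sketch of those patterns, using Zod for boundary validation; the brand name and schema shape are illustrative:

```typescript
import { z } from 'zod';

// Branded ID type: prevents accidentally passing an arbitrary string
// where a validated graph node ID is expected.
type BrandedNodeId = string & { readonly __brand: 'NodeId' };

// Runtime validation at the boundary where untrusted adapter output
// enters the pipeline.
const muNodeSchema = z.object({
  id: z.string(),
  kind: z.enum(['Entity', 'Call', 'Condition', 'Literal', 'Return', 'Assignment']),
  label: z.string().optional(),
  language: z.string().optional(),
  metadata: z.record(z.string(), z.unknown()).optional()
});

function parseUntrustedNode(input: unknown) {
  // Throws with a descriptive error if an adapter emits a bad shape.
  return muNodeSchema.parse(input);
}
```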
Common pitfalls and how to avoid them
Overfitting to one repository
One of the fastest ways to ruin a mining project is to overfit clusters to a single codebase. If all examples come from one repo, the resulting rule may simply encode local conventions. Mix repositories, frameworks, and styles deliberately. A useful rule should survive transfer to a new codebase without collapsing into false positives.
Under-modeling the fix intent
The opposite problem is abstraction that is too weak. If your graph only knows that a file changed, you will miss the signal entirely. The model must capture enough intent to distinguish “added a guard,” “removed a risky call,” and “changed an argument order.” That balance is the essence of MU-like modeling: high-level enough to generalize, concrete enough to explain.
Ignoring developer UX
Many mining systems fail because they optimize for research metrics instead of developer workflow. If the rule message is cryptic, the fix is unclear, or the warning shows up too late, adoption will stall. The integration target matters as much as the mining engine. That is why the final mile into ESLint and TS Server is not an optional feature; it is part of the product definition.
Conclusion: from code-change mining to durable code hygiene
A language-agnostic MU-like graph representation gives TypeScript teams a practical way to mine code changes across repositories, cluster semantically similar fixes, and convert them into rules developers will actually use. The real advantage is not just cross-language compatibility. It is the ability to turn messy historical changes into structured knowledge that improves code hygiene at scale. If you design the pipeline carefully, you can create a feedback loop from repository history to editor diagnostics and code review assistance.
Start by defining a compact semantic graph, then build adapters for each language you want to mine. Add clustering, human review, and a promotion pipeline that controls when a rule becomes an ESLint warning or TS Server suggestion. Finally, measure success by acceptance rate and defect reduction, not by how sophisticated the model sounds. If you want adjacent thinking on scalable tooling and platform design, revisit our pieces on TypeScript SDK platform agents, SDK governance, and metrics that matter.
Related Reading
- Build Platform-Specific Agents with the TypeScript SDK: From Scrapers to Social Listening Bots - Learn how to structure TypeScript tooling pipelines that adapt to different environments.
- Partner SDK Governance for OEM-Enabled Features: A Security Playbook - A useful model for defining stable contracts across complex integrations.
- Metrics That Matter: How to Measure Business Outcomes for Scaled AI Deployments - Helpful for evaluating mining and rule adoption outcomes.
- Assessment Designs That Distinguish AI-Polished Answers From Real Understanding - A strong analogy for building tests that actually measure semantic quality.
- Payer‑to‑Payer APIs as an Operating Model: Governance, Observability and Reliability Patterns - Relevant patterns for making analysis pipelines observable and trustworthy.
FAQ
What is a MU-like graph representation?
It is a higher-level graph model that captures the semantic structure of a code change rather than its exact syntax. The goal is to normalize changes so similar fixes across different languages can be grouped together.
Why build this in TypeScript?
TypeScript offers strong typing, good developer ergonomics, and easy integration with JavaScript tooling. It is especially useful for defining stable schemas, building adapters, and exporting rules to ESLint or editor tooling.
How is this different from AST-based analysis?
AST-based analysis is language-specific and syntax-focused. A MU-like model abstracts away surface differences to represent change intent, which is better for cross-language mining and clustering.
Can this approach work for Java, Python, and JavaScript together?
Yes. You need language-specific parsers or adapters, but they can all emit the same normalized graph schema. That shared schema is what enables cross-language clustering.
How do mined rules reach developers?
Usually through a promotion pipeline that exports the rule to ESLint for fast feedback and to TS Server for editor diagnostics and quick fixes. High-confidence rules can be enforced, while lower-confidence rules can start as warnings.
What should I measure to know if the system is working?
Measure cluster quality, rule precision, recall, acceptance rate, and reduction in recurring defects. Acceptance in code review is especially important because it reflects whether developers find the rule useful.