# Sheaf neural networks — the missing mathematics for a contradiction-preserving substrate

_Living document. 2026-06-13 (iteration 5 — refined across five passes; recent references independently verified). A research report on **sheaf neural networks** (cellular-sheaf-based GNNs) and why they may be the natural mathematical framework for donto: a bitemporal, paraconsistent, evidence-first CLAIM substrate. Researched from the primary literature; the donto mapping is the payload. (Note: the term is **sheaf** — from algebraic topology — not "sheath".)_

**One-paragraph thesis.** A standard graph says only *that* two things are connected. A **cellular sheaf** says *how* — it puts a vector space (a "stalk") on every node and edge and a linear "restriction map" on every node→edge incidence that translates one local view into the shared language of the connection. That is exactly donto's world: every claim/entity carries its own local representation; edges between them are *typed* (argument, identity-as-hypothesis, alignment); and crucially **connected things often disagree.** Sheaf theory was built to handle disagreement formally: its **cohomology** measures, computably, whether a set of local views can be glued into one consistent global picture — and *where* they can't. Standard GNNs assume the trivial sheaf (everything lives in one shared space, neighbours should agree) and so **collapse on heterophily and oversmooth** — the precise failure modes donto's claim graph would trigger. Sheaf neural networks remove that assumption. They give donto a principled, learnable operator that **reconciles where claims are consistent and localizes contradiction where they aren't — without ever deleting either side.** That is donto's entire philosophy, expressed as linear algebra over a graph.

---

## 1. What a cellular sheaf is (the 90-second version)

Take a graph `G = (V, E)`. A **cellular sheaf** `F` attaches:

- a vector space `F(v)` — the **stalk** — to each node `v` (its local feature space / "private opinion");
- a vector space `F(e)` to each edge `e` — a shared **discourse space** for that connection;
- a linear **restriction map** `F_{v⊴e} : F(v) → F(e)` for each incident node–edge pair — *how v's local view appears in the shared space of edge e.*

A standard graph is the special case where every stalk is the same space and every restriction map is the identity (the **trivial sheaf**). Sheaf theory is what you get when you stop assuming that.

**Global sections & cohomology.** A *global section* is an assignment of a vector to every node whose images under the restriction maps agree on every edge: a globally consistent state. The space of these is the **0-th cohomology `H⁰(F)`** (it always contains the zero assignment, so the interesting question is how large it is). The **1-st cohomology `H¹(F)`** measures the *obstruction*: it is the space of edge-level discrepancies that **no choice of node values can account for**. A non-zero class there means the data is genuinely contradictory, and its size quantifies *how much*. This is not a metaphor; it is a computable linear-algebra invariant.

**The sheaf Laplacian.** From the restriction maps you build a **coboundary operator `δ`** (for each edge, the difference of the two incident projections) and the **sheaf Laplacian `L_F = δᵀδ`**. It generalizes the graph Laplacian: its kernel is exactly `H⁰` (the consistent states), and the diffusion `ẋ = −L_F x` flows the system toward consistency *as the restriction maps define it* — not toward a flat global average. The trivial-sheaf case recovers ordinary graph diffusion (and ordinary GNN oversmoothing).
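A minimal NumPy sketch of the construction just described, assuming a dense matrix layout and a hypothetical helper named `coboundary` (neither comes from the literature's code). It assembles `δ` for a triangle, forms `L_F = δᵀδ`, checks that the trivial sheaf recovers the ordinary graph Laplacian, and reads the cohomology dimensions off the rank of `δ`.

```python
import numpy as np

def coboundary(n_nodes, edges, maps, d):
    """Assemble delta as a dense (|E|*d) x (|V|*d) matrix.

    edges: list of (u, v) pairs; maps[(u, v)] = (F_u, F_v), the two d x d
    restriction maps into the stalk of that edge. The orientation u -> v is
    an arbitrary choice: it flips the sign of a row block but not L_F.
    """
    delta = np.zeros((len(edges) * d, n_nodes * d))
    for k, (u, v) in enumerate(edges):
        F_u, F_v = maps[(u, v)]
        delta[k * d:(k + 1) * d, u * d:(u + 1) * d] = -F_u
        delta[k * d:(k + 1) * d, v * d:(v + 1) * d] = F_v
    return delta

d = 2
edges = [(0, 1), (1, 2), (2, 0)]            # a triangle
I = np.eye(d)
trivial = {e: (I, I) for e in edges}        # trivial sheaf: identity maps

delta = coboundary(3, edges, trivial, d)
L_F = delta.T @ delta                       # sheaf Laplacian

# Ordinary graph Laplacian of the triangle, for comparison.
L_graph = np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]], dtype=float)
assert np.allclose(L_F, np.kron(L_graph, I))

rank = np.linalg.matrix_rank(delta)
dim_H0 = delta.shape[1] - rank              # dim ker(delta)
dim_H1 = delta.shape[0] - rank              # dim coker(delta)
print(dim_H0, dim_H1)
```

Swapping any identity map for a non-trivial matrix changes `L_F` and, in general, both dimensions, which is the sense in which the restriction maps (not the graph alone) decide what counts as consistent.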

---

## 2. Sheaf neural networks: what they are and what they fix

A **sheaf neural network (SNN)** replaces the graph Laplacian in a GNN's message-passing/diffusion step with a **sheaf Laplacian**, and (in the modern versions) **learns the restriction maps from data**. The lineage:

| work | contribution |
|---|---|
| **Hansen & Gebhart, *Sheaf Neural Networks* (2020)** | First SNN: message-passing via the sheaf Laplacian; stalks can have different dimensions; restriction maps encode per-edge compatibility constraints. |
| **Bodnar, Di Giovanni, Chamberlain, Liò, Bronstein, *Neural Sheaf Diffusion* (NeurIPS 2022)** | The breakthrough: a **topological theory of heterophily and oversmoothing.** Shows GNNs implicitly use the trivial sheaf; a hierarchy of richer (learnable) sheaves provably expands the model's control over its asymptotic behaviour and achieves linear class separation in the diffusion limit. Strong heterophilic-benchmark results. ([arXiv:2202.04579](https://arxiv.org/abs/2202.04579), [code](https://github.com/twitter-research/neural-sheaf-diffusion)) |
| **Barbero et al., *Sheaf NNs with Connection Laplacians* (2022)** | Restriction maps constrained to orthogonal (O(d)) "connection" maps — a geometric, cheaper-to-learn family. |
| **Nonlinear Sheaf Diffusion (2024)** | Nonlinear sheaf Laplacians; richer dynamics (bounded-confidence / antagonistic flows). ([arXiv:2403.00337](https://arxiv.org/pdf/2403.00337)) |
| ***Sheaf theory: from deep geometry to deep learning* (2025, 117pp)** | Survey + generalization of cellular-sheaf concepts to **arbitrary posets**, a new algorithm to compute sheaf cohomology on finite posets, and an open-problems list. ([arXiv:2502.15476](https://arxiv.org/abs/2502.15476)) |

**The equations (from Neural Sheaf Diffusion), so this is concrete, not hand-wavy.** Features live in stalks of small dimension `d` (typically 1–8). The sheaf Laplacian acts at a node as

```
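(Δx)_v  =  Σ_{e : v,u ⊴ e}  F_{v⊴e}ᵀ ( F_{v⊴e} · x_v  −  F_{u⊴e} · x_u )
```

where `F_{v⊴e} : ℝ^d → ℝ^d` is the (learned) restriction map of node `v` into edge `e`, written `R_e` below when only a single map is in view. This is `L_F = δᵀδ` from §1 read off at one node. Diffusion is `∂x/∂t = −Δx`, discretized into the layer update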

```
x^{t+1}  =  x^t  −  α · Δ x^t            (α a learned diffusion coefficient; nonlinearity + channel mixing per layer)
```
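A sketch of that update as plain explicit-Euler iteration, under simplifying assumptions: a fixed (not learned) step size `alpha`, no nonlinearity or channel mixing, and the per-incidence Laplacian `L_F = δᵀδ` from §1. The helper names and the toy three-node path are illustrative only.

```python
import numpy as np

def sheaf_laplacian(n_nodes, edges, maps, d):
    """L_F = delta^T delta, with maps[(u, v)] = (F_u, F_v) for each edge."""
    delta = np.zeros((len(edges) * d, n_nodes * d))
    for k, (u, v) in enumerate(edges):
        F_u, F_v = maps[(u, v)]
        delta[k * d:(k + 1) * d, u * d:(u + 1) * d] = -F_u
        delta[k * d:(k + 1) * d, v * d:(v + 1) * d] = F_v
    return delta.T @ delta

def diffuse(x, L, alpha, steps):
    """Repeat x <- x - alpha * L x (linear diffusion, no learned parts)."""
    for _ in range(steps):
        x = x - alpha * (L @ x)
    return x

d = 2
rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])
edges = [(0, 1), (1, 2)]                     # path 0 - 1 - 2
maps = {(0, 1): (np.eye(d), rot90),          # node 1 is seen rotated on this edge
        (1, 2): (np.eye(d), np.eye(d))}
L = sheaf_laplacian(3, edges, maps, d)

x0 = np.array([1.0, 0.0, 0.5, 0.5, -1.0, 2.0])
alpha = 0.9 / np.linalg.eigvalsh(L).max()    # keeps the explicit step stable
x_inf = diffuse(x0, L, alpha, steps=500)

energy0 = x0 @ L @ x0                        # total disagreement before
energy_inf = x_inf @ L @ x_inf               # and after diffusion
```

The limit is not a flat average: node 0 ends up equal to `rot90 @ x_1`, because that is what agreement means on that edge.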

The model **learns the restriction maps** in one of three families, trading expressivity for cost: **diagonal** `R_e = diag(σ(w_e))` (cheapest), **orthogonal** `R_e ∈ O(d)`, parametrized in the paper via Householder reflections (geometric, rotation-like; preserves norms, good for "different view, same magnitude"), and **general** `R_e ∈ ℝ^{d×d}` (an unconstrained `d×d` matrix, most expressive). On the standard heterophilic node-classification suite (**Texas, Wisconsin, Cornell, Film, Squirrel, Chameleon**) sheaf diffusion beats GCN/GAT, on the order of **+10–15 accuracy points on Squirrel/Chameleon**, where vanilla convolution barely clears chance. For donto the takeaway is the *shape* of the win: it is largest exactly where neighbours disagree.
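To make the three families concrete, here is a small sketch of how each could be built from raw parameters. The function names are hypothetical, the parameters are random stand-ins for what a network would predict per edge, and the orthogonal case uses a product of Householder reflections as one standard way to land in `O(d)` (other parametrizations would illustrate the same point).

```python
import numpy as np

def diagonal_map(w):
    """diag(sigmoid(w)): d parameters, entries squashed into (0, 1)."""
    return np.diag(1.0 / (1.0 + np.exp(-w)))

def orthogonal_map(vs):
    """Product of Householder reflections I - 2 v v^T / (v^T v), one per row of vs."""
    d = vs.shape[1]
    Q = np.eye(d)
    for v in vs:
        Q = Q @ (np.eye(d) - 2.0 * np.outer(v, v) / (v @ v))
    return Q

def general_map(W):
    """Unconstrained d x d map: the raw parameters are the matrix itself."""
    return np.asarray(W, dtype=float)

rng = np.random.default_rng(0)
d = 3
R_diag = diagonal_map(rng.normal(size=d))
R_orth = orthogonal_map(rng.normal(size=(d, d)))
R_gen = general_map(rng.normal(size=(d, d)))

# Degrees of freedom of each family (not the raw parameter count used above).
dof = {"diagonal": d, "orthogonal": d * (d - 1) // 2, "general": d * d}
```

The orthogonal family preserves norms by construction (`R_orth.T @ R_orth` is the identity), which is why it suits the "different view, same magnitude" reading.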

**Recent developments (2023–2026): the field is moving toward exactly donto's constraints.** The 2022 breakthrough has fanned out into a research program, and the new directions read like a list of donto's open problems:

| work | contribution | why it matters for donto |
|---|---|---|
| **Bundle Neural Networks (ICLR 2025)** | Replaces full sheaves with **flat vector bundles** — node-level orthogonal maps that make the diffusion a *transport* with no curvature penalty; the discrete BuNN layer is a *special case* of a sheaf NN; trains at scale, mitigates **over-squashing** as well as oversmoothing. ([arXiv:2405.15540](https://arxiv.org/abs/2405.15540)) | The direct answer to this report's "scale unproven" caveat: a *cheaper* restriction-map family (orthogonal, node-level) that keeps the disagreement-aware geometry but runs on big graphs and long-range tasks. The natural Stage-1 default. |
| **Sheaf Attention Networks (2024/25)** | Puts attention on the sheaf — learns the restriction maps *and* per-edge attention weights jointly; **strictly generalizes GAT** (GAT = the trivial-sheaf special case). ([openreview](https://openreview.net/forum?id=LIDvgVjpkZr)) | donto already wants to weight edges by source reliability / standing; this is attention *and* sheaf transport in one operator — the discourse-sheaf source-weighting of §4 with a learnable attention head. |
| **Sheaf Hypergraph Networks (NeurIPS 2023) + Hypergraph Neural Sheaf Diffusion (2025)** | Lifts sheaves to **hypergraphs** — restriction maps from each node into a shared *hyperedge* stalk; captures higher-order (n-ary) relations, not just pairwise. ([arXiv:2309.17116](https://arxiv.org/abs/2309.17116), [arXiv:2505.05702](https://arxiv.org/abs/2505.05702)) | donto's **contexts are n-ary** — a claim binds subject, predicate, object, evidence, time *together*; a `donto_context` is literally a hyperedge over its statements. The hypergraph sheaf is the honest model for a context's internal consistency, not a pairwise approximation. |
| **Copresheaf Topological Neural Networks (NeurIPS 2025)** | A unifying framework: learnable **anisotropic maps on directed edges** subsume GCN/GAT/sheaf-NNs as special cases; co*presheaves* handle *asymmetric* relations natively. ([arXiv:2505.21251](https://arxiv.org/abs/2505.21251)) | donto edges are **directed and typed** (`argues-against`, `daughter-of`, `supersedes`) — asymmetric by nature. Copresheaves are the categorically-correct structure for that, where a plain sheaf's symmetric restriction would lose the direction. |
| **Sheaf ↔ transformer / topos line (Mahadevan and others, 2025–26)** | Recasts attention as a sheaf/topos construction; situates message-passing as a **local-to-global ("gluing") computation**. ([arXiv:2603.14831](https://arxiv.org/abs/2603.14831)) | The conceptual unification: the same gluing math underlies both the transformer reader and the substrate's reconciliation — donto's "emit-free, defer-joining to query time" *is* a local-to-global sheaf computation. |

The trajectory is unmistakable: **higher-order (hypergraph), directed/asymmetric (copresheaf), source-weighted (attention), and scalable (bundle)** — every one of which is a structural feature of donto's claim/context graph rather than a convenience. The field is converging on the exact object donto already is.

**The two failures SNNs were built to remove — both of which donto would hit:**

1. **Heterophily.** Standard GNNs assume connected nodes are *similar* and should be averaged together. On **heterophilic** graphs (neighbours differ), GCNs "saturate slightly above chance." donto's claim graph is *intrinsically heterophilic*: an `argues-against` edge connects claims that **disagree**; an `identity-as-hypothesis` edge connects entities that might *not* be the same. Averaging across those is exactly wrong. Sheaf restriction maps let the model *learn the transformation* between disagreeing neighbours instead of blending them.
2. **Oversmoothing.** Deep GNNs diffuse everything toward one mean vector, erasing local detail. The non-trivial sheaf Laplacian has a richer kernel (`H⁰`) than "the all-ones vector," so sheaf diffusion can run deep **without collapsing distinct claims into mush** — it converges to the *consistent-where-consistent* state, preserving genuine differences.
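Both failure modes show up in the smallest possible case: two neighbours holding opposed scalar values. The sketch below is a toy illustration, not a benchmark; the signed scalar map `-1` is a hypothetical stand-in for a learned "these two disagree" restriction map.

```python
import numpy as np

def laplacian_two_nodes(f_u, f_v):
    """Sheaf Laplacian delta^T delta for a single edge with scalar maps (d = 1)."""
    delta = np.array([[-f_u, f_v]])
    return delta.T @ delta

def diffuse(x, L, alpha=0.25, steps=200):
    """Explicit-Euler diffusion x <- x - alpha * L x."""
    for _ in range(steps):
        x = x - alpha * (L @ x)
    return x

x0 = np.array([1.0, -0.6])                    # two neighbours, opposed values

# Trivial sheaf: agreement means "equal", so diffusion averages them.
x_trivial = diffuse(x0, laplacian_two_nodes(1.0, 1.0))

# Signed sheaf: agreement means "opposite", so the opposition survives.
x_signed = diffuse(x0, laplacian_two_nodes(1.0, -1.0))

print(x_trivial)   # both entries collapse to the mean, 0.2
print(x_signed)    # settles at (0.8, -0.8)
```

Under the trivial sheaf the distinction between the two nodes is erased; under the signed sheaf the consistent state *is* the disagreement, so running the diffusion longer does not destroy it.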

---

## 3. Why this is donto's mathematics, line for line

donto's design principles each have a precise sheaf-theoretic counterpart. This is the heart of the report.

| donto concept | sheaf-theoretic counterpart |
|---|---|
| A **claim / entity** with its own free-typed local representation | a **node stalk** `F(v)` — a per-node vector space, dimensions need not match |
| **Query-time alignment / predicate folding / identity-as-hypothesis** (`occupation`↔`currentJob`, `Olympus(myth)`↔`Mt.Olympus(geo)`) | **restriction maps** — learned linear maps that translate one node's view into a shared edge space; *identity becomes a map, not an assertion* |
| **Paraconsistency** — hold incompatible claims forever, never resolve-by-deletion | **non-zero `H¹`** — the substrate *represents* inconsistency as a first-class, measured quantity instead of collapsing it |
| **Contradiction pressure** (a standing-v1 component) | the **norm of the disagreement** at an edge / a localized `H¹` contribution — a *computable* contradiction score per claim, per region |
| **"Re-rank by reality over time, don't delete on conflict"** | **sheaf diffusion** `ẋ = −L_F x` — flows toward the consistent subspace while *keeping* the contradictory components visible; reconcile-where-consistent, preserve-where-not |
| **Multi-source evidence fusion** (the same fact attested by many sources) | Robinson's thesis that **"sheaves are the canonical data structure for sensor integration"**: cohomology *measures whether sources can be reconciled and flags where they can't* |
| **Contested knowledge / source reliability** (genealogy: Federal Court vs Lorimer vs Tindale, each an interpretive witness) | **opinion dynamics on discourse sheaves** (Hansen & Ghrist) — private node opinions vs a public discourse space, with restriction maps modeling *what each source commits to publicly*, and the Laplacian driving public consensus **while private disagreement persists** — even modeling deception/propaganda |
| **Trust kernel / standing** ⟨maturity, corroboration, contradiction-pressure, recency⟩ | corroboration ≈ agreement in `H⁰`; contradiction-pressure ≈ `H¹` mass; the sheaf Laplacian gives a *single operator* that produces both |
| **The lens engine** (relationships emerge at the intersection of perspectives) | restriction maps into *different* edge spaces = different lenses; a claim's behaviour under several sheaves = its multi-perspective signature |

The fit is not loose analogy. donto already says "identity is a hypothesis, not a merge"; a sheaf says "identity is a restriction map, and whether two things glue is `H¹`." donto already says "preserve contradictions and re-rank by reality"; sheaf diffusion *is* a re-ranking flow that preserves the contradictory subspace. **Sheaf theory is, essentially, the linear-algebra formalization of the donto canon.**

---

## 4. Concrete ways donto could use it

Ordered by effort vs payoff (to be pressure-tested in later iterations of this report):

1. **`H¹` as a computed contradiction-pressure signal (low effort, high value).** donto already wants a per-claim/per-region "contradiction pressure" in standing v1. Build the sheaf Laplacian over a *sub*graph (a holder's claims, or a contested entity's neighbourhood) with simple restriction maps (even diagonal/scalar to start) and read off the local disagreement norm. This is a principled, computable replacement for ad-hoc contradiction counting — and it *localizes* conflict to specific claim pairs.
2. **Sheaf diffusion for query-time reconciliation (medium).** Today donto folds predicates via similarity at query time. A learned sheaf over the recalled claim neighbourhood would reconcile the *consistent* part (fold `occupation`/`currentJob`) while *holding* the genuinely contradictory part separate — and return both, with the contradiction quantified. This is donto's "emit-free, defer-joining" done with a principled operator.
3. **A sheaf-NN scorer over the claim graph (medium-high).** donto's claim graph is heterophilic and the contradiction/identity machinery is currently *barely exercised* (per the capability-vs-exercise report: `donto_argument` ≈ 0.006% of statements). A Neural-Sheaf-Diffusion model is the natural learner for *this* graph shape — it could rank claims, predict missing argument edges, and resolve identity hypotheses *because it's built for disagreement*, where a vanilla GNN would oversmooth the contradictions away.
4. **Discourse-sheaf source modeling for the genealogy example (high, research-y).** Model each source (court, genealogist, oral history) as a node with a private stalk and a restriction map into a shared "what-happened" discourse space. The harmonic section = the reconcilable core; the `H¹` = the irreducible contested residue. This is exactly the genealogy frontier ("figure out what actually happened across contradictory witnesses") with a rigorous engine.

**Honest caveats (from the literature itself):**
- **Compute.** Sheaf Laplacians and especially cohomology are heavier than graph Laplacians; the survey and the industry write-up both flag cost and the need for specialized optimization. donto would apply it to *recalled subgraphs*, not the 41.5M-statement whole, at least initially.
- **Modeling expertise.** Sheaves are "not plug-and-play" — choosing stalks/restriction-map families is a design act. Start with learnable low-dimensional diagonal/orthogonal maps (the Barbero/Bodnar recipes), not bespoke hand-built sheaves.
- **Scale unproven.** Billion-node sheaf systems are an open problem industry-wide. donto's advantage: it can *scope* the sheaf to a query's neighbourhood, where size is bounded.

---

## 4½. A worked example: contradiction as cohomology (and the part donto can't currently see)

**Case A: a single contested attribute (the easy win).** A real donto example: a person `e` whose *birthplace* is attested as **Coen** by one source and **McIvor** by another (an actual Rosie-search revision). Model it as a 3-node sheaf (two source nodes `s₁, s₂` and the entity `e`) with a 2-dimensional birthplace stalk (`Coen=(1,0)`, `McIvor=(0,1)`). Restriction maps carry each source's asserted value into `e`'s shared birthplace space. A **global section** (an element of `H⁰`) extending both assertions would require a single value both edges agree on. Because `(1,0) ≠ (0,1)`, none exists in that component; with the source values held fixed, the **obstruction lives in (relative) `H¹`, and the mismatch norm `‖(1,0)−(0,1)‖ = √2` *is* the contradiction-pressure** for that attribute. Re-rank the sources by reliability (a weight on each restriction map) and the harmonic/`H⁰` part shifts toward the trusted value **while `H¹` keeps the dissent on the books**: donto's "re-rank by reality, never delete" as one linear operator.
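The arithmetic of Case A can be checked in a few lines. This sketch assumes identity restriction maps and models "a weight on each restriction map" as a hypothetical reliability weight on each edge's squared disagreement; the weights `3.0` and `1.0` are made up for illustration.

```python
import numpy as np

coen = np.array([1.0, 0.0])
mcivor = np.array([0.0, 1.0])

def reconcile(a, b, w_a=1.0, w_b=1.0):
    """Entity value minimising w_a*|a - x|^2 + w_b*|b - x|^2, plus both residuals."""
    x = (w_a * a + w_b * b) / (w_a + w_b)
    return x, a - x, b - x

# Raw mismatch between the two attestations; independent of any weighting.
mismatch = np.linalg.norm(coen - mcivor)

x_equal, r_coen, r_mcivor = reconcile(coen, mcivor)
x_trusted, t_coen, t_mcivor = reconcile(coen, mcivor, w_a=3.0, w_b=1.0)

print(mismatch)     # sqrt(2)
print(x_equal)      # midway between the two values
print(x_trusted)    # pulled toward Coen
print(t_mcivor)     # the dissenting residual is still non-zero
```

Re-weighting moves the reconciled value but never zeroes the dissenting residual, which is the "re-rank, never delete" behaviour in miniature.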

**Case B — the contradiction donto currently *cannot* detect (the real argument).** donto's contradiction machinery today is **pairwise** (`donto_argument` edges between two claims). Sheaf cohomology sees a strictly larger class of conflict: **cycles that are consistent on every edge but inconsistent around the loop.** Genealogy gives a clean one. Three attestations:

- `A`:  *Kitty* is the **daughter of** Bujilkabu
- `B`:  *Kitty* is the **spouse of** Bujilkabu
- `C`:  daughter-of and spouse-of are **disjoint** for the same pair

Each pair of statements is individually fine; no single `donto_argument` edge fires. But compose the restriction maps around the `Kitty–Bujilkabu` cycle and they **fail to return to the identity** — a non-trivial **1-cocycle**, `H¹ ≠ 0`. (This is exactly the kind of incest-conflict flagged by hand in the EKY-2026 vs Brady-2013 Kitty work — *found by a person, invisible to pairwise edges*.) **Sheaf cohomology turns that hand-detection into an automatic, localizable computation.** This is the single most concrete reason to bring sheaves into donto: it upgrades contradiction detection from "two claims that explicitly negate" to "any loop of claims that can't be globally reconciled" — which is where most real-world contested-knowledge errors actually hide.
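The "compose the maps around the loop" check can be sketched directly. Everything below is illustrative: the `swap` matrix is a hypothetical stand-in for whatever map would encode the daughter-of/spouse-of disjointness, the maps are assumed invertible, and the sketch reports only the holonomy and the dimension of `H⁰`, making no claim about how the PRD computes the `H¹` class.

```python
import numpy as np

def holonomy(maps):
    """Compose edge transports F_v^{-1} F_u around a closed loop, in order."""
    d = maps[0][0].shape[0]
    H = np.eye(d)
    for F_u, F_v in maps:
        H = np.linalg.solve(F_v, F_u) @ H
    return H

def h0_dim(edges, maps, n_nodes, d):
    """dim H^0 = dim ker(delta), computed from the rank of delta."""
    delta = np.zeros((len(edges) * d, n_nodes * d))
    for k, ((u, v), (F_u, F_v)) in enumerate(zip(edges, maps)):
        delta[k * d:(k + 1) * d, u * d:(u + 1) * d] = -F_u
        delta[k * d:(k + 1) * d, v * d:(v + 1) * d] = F_v
    return n_nodes * d - np.linalg.matrix_rank(delta)

d = 2
I = np.eye(d)
swap = np.array([[0.0, 1.0], [1.0, 0.0]])    # hypothetical "roles exchanged" map
edges = [(0, 1), (1, 2), (2, 0)]             # a closed loop, head to tail

consistent = [(I, I), (I, I), (I, I)]
twisted = [(I, I), (I, I), (I, swap)]        # every edge is satisfiable alone

print(holonomy(consistent))                  # identity: the loop closes up
print(holonomy(twisted))                     # not the identity: it does not
print(h0_dim(edges, consistent, 3, d), h0_dim(edges, twisted, 3, d))
```

Each edge of the twisted loop admits locally consistent values, yet the space of globally consistent assignments shrinks. That loop-level loss is what a purely pairwise check cannot see.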

## 4¾. A minimal build path (grounded in donto's existing tables)

donto already has every input a sheaf needs — this is assembly, not green-field. A staged plan that fails cheap and only escalates when a stage pays off:

**Stage 0 — `H¹` contradiction-pressure as a read-only analytic (days, no ML training).**
- *Graph:* a **recalled subgraph**, not the 41.5M-statement whole — a holder's claims, or a contested entity's neighbourhood. Edges already exist: `donto_argument`, `donto_identity_edge`, and claim co-occurrence within a context.
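- *Stalks:* start `d=1` scalar for the pairwise signal, or reuse the existing **bge-small claim embeddings** in `donto_claim_embedding` as `d`-dim stalk features (they're already computed). The Case-B cycle check needs `d≥2` sign-carrying maps: the PRD's adversarial pass found that scalar `d=1` maps cannot produce the Kitty cocycle.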
- *Restriction maps:* start **diagonal** `R_e = diag(σ(w_e))`; seed `w_e` from the **alignment closure** (`donto_predicate_closure` / `donto_match_aligned`) — high-confidence folds → near-identity maps, weak/contested links → small maps. (This reuses the day-0 `iterative_scan` alignment work directly.)
- *Output:* build `δ`, form `L_F = δᵀδ`, compute the per-edge disagreement norm `‖(δx)_e‖` (summed over a claim's incident edges for a per-claim value) and the bottom of the spectrum (`H⁰` dim, `H¹` mass). Write the per-claim disagreement as a **standing-v1 contradiction-pressure** value, and flag any non-trivial 1-cocycle (the Case-B cycle conflicts) as new `donto_argument`/argument-cluster rows. Pure linear algebra on a bounded subgraph: cheap, idempotent, I3-safe (additive writes only).
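A rough sketch of the Stage-0 output step, assuming the recalled subgraph has already been exported into in-memory arrays. The function names, the toy features, and the seed weight `4.0` are hypothetical; nothing here queries donto's actual tables.

```python
import numpy as np

def diag_map(w):
    """Diagonal restriction map diag(sigmoid(w)), as in the Stage-0 starting point."""
    return np.diag(1.0 / (1.0 + np.exp(-np.asarray(w, dtype=float))))

def contradiction_pressure(x, edges, maps):
    """Per-edge disagreement |F_u x_u - F_v x_v| and its sum at each node.

    x: (n_nodes, d) stalk features; edges: list of (u, v);
    maps: list of (F_u, F_v) restriction maps aligned with edges.
    """
    edge_norm = np.zeros(len(edges))
    node_pressure = np.zeros(x.shape[0])
    for k, ((u, v), (F_u, F_v)) in enumerate(zip(edges, maps)):
        edge_norm[k] = np.linalg.norm(F_u @ x[u] - F_v @ x[v])
        node_pressure[u] += edge_norm[k]
        node_pressure[v] += edge_norm[k]
    return edge_norm, node_pressure

# Three claims: 0 and 1 agree, 2 asserts something different.
x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (1, 2)]
strong = diag_map([4.0, 4.0])      # a high-confidence fold, so a near-identity map
maps = [(strong, strong), (strong, strong)]

edge_norm, node_pressure = contradiction_pressure(x, edges, maps)
print(edge_norm)        # zero on the agreeing edge, positive on the contested one
print(node_pressure)    # claim 0 carries no pressure; claims 1 and 2 share it
```

Because the score is attached to edges first and only then summed per claim, the conflict stays localised to the specific claim pairs that produce it.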

**Stage 1 — a learnable sheaf over the claim graph (a real model, scoped).** Port the `twitter-research/neural-sheaf-diffusion` recipe (diagonal → O(d) → general restriction maps, `d`=2–8) onto exported recalled subgraphs. Tasks it's *built* for and donto needs: **predict missing argument edges**, **resolve identity hypotheses** (`identity-as-hypothesis` → a learned restriction map + a glue/no-glue verdict from `H¹`), and **score claims** under contradiction. This is the model that fits a *heterophilic* claim graph where a vanilla GNN oversmooths the disagreement away.

**Stage 2 — query-time sheaf reconciliation in `/recall` (the payoff).** At recall, build the scoped sheaf over the returned claims, run a few sheaf-diffusion steps: fold the consistent part (the `occupation`↔`currentJob` case), return the harmonic/`H⁰` summary as the answer-shaped fact, and attach the `H¹` residue as an explicit "contested — N sources disagree" tag. That is donto's emit-free/defer-joining and its contradiction-preservation **in one operator**, and it slots exactly where the benchmark work said donto's value lives (answer-shaping facets that help even a weak reader — §ref the scorecard).
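For the linear diffusion `ẋ = −L_F x`, the long-run limit is the orthogonal projection of the starting state onto `ker L_F`, so the "harmonic summary plus residue" split can be sketched without iterating at all. The helper name is hypothetical, and the residual here is a node-level remainder used as a stand-in for the "contested" tag, not a literal `H¹` class.

```python
import numpy as np

def harmonic_split(x, delta, tol=1e-10):
    """Split x into its projection onto ker(delta^T delta) and the remainder."""
    L = delta.T @ delta
    eigvals, eigvecs = np.linalg.eigh(L)
    kernel = eigvecs[:, eigvals < tol]        # orthonormal basis of ker L_F
    harmonic = kernel @ (kernel.T @ x)
    return harmonic, x - harmonic

# Toy recalled neighbourhood: a path of three claims, d = 1, trivial maps.
delta = np.array([[-1.0, 1.0, 0.0],
                  [0.0, -1.0, 1.0]])
x = np.array([1.0, 1.0, 4.0])                 # the third claim dissents

harmonic, residual = harmonic_split(x, delta)
print(harmonic)          # the reconcilable summary
print(residual)          # what each claim contributes beyond it
print(delta @ x)         # per-edge disagreement, non-zero only on edge (1, 2)
```

In a real `/recall` path a few diffusion steps would approximate this projection cheaply; the eigendecomposition is used here only because the example is tiny.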

**Engineering posture (per the donto canon).** Rust-first for anything load-bearing (the Stage-0 analytic can be a `pg_donto` SQL/Rust routine over a subgraph; the sheaf Laplacian is sparse and small once scoped). Prototype Stage-1 in Python/torch over *exported* subgraphs to validate, then port the winning piece into the Cargo workspace — don't scatter it into a side repo. Scope every sheaf to a query's neighbourhood so size stays bounded (the honest answer to the "billion-node sheaf is unproven" caveat: donto never builds the global sheaf).

## 5. The strategic read (and a candor note)

There is industry momentum: a widely-shared write-up argues sheaf theory is "NVIDIA's stealth deep-learning bet," frames it as the **paradigm move beyond GNNs** (upgrading the representational substrate, not just attention/residual tweaks), and points to a legal-reasoning company (**Iqidis**) building *dynamic, contradiction-aware knowledge graphs* — the same shape as donto's contested-knowledge domains. The market-size numbers in that piece are speculative and should be treated as hype-adjacent; the **mathematics and the published benchmark wins on heterophily are not.** ([source](https://www.artificialintelligencemadesimple.com/p/sheaf-theory-nvidias-stealth-deep))

The defensible conclusion: **sheaf neural networks are the most precise existing formalization of what donto claims to be** — a substrate that holds disagreement, aligns at edges, and re-ranks by consistency without deletion. Adopting the *vocabulary* alone (stalks, restriction maps, `H⁰`/`H¹`, sheaf Laplacian) sharpens the canon; adopting the *operator* (a scoped sheaf Laplacian for contradiction-pressure, then a sheaf-NN over the claim graph) is a concrete, literature-backed path to making donto's differentiating machinery load-bearing.

---

## From research to build

This report is the literature grounding; the **no-shortcuts implementation PRD** that turns it into an engineering plan lives in the donto repo at [`docs/sheaf-prd/`](https://github.com/thomasdavis/donto/tree/sheaf-prd/docs/sheaf-prd) (branch `sheaf-prd`): a 7-document suite (master + data-model + the three stage specs + engineering/rollout, with an authoritative migration ledger), generated by a 28-agent ultracode workflow (map → adversarial design → synthesis) and human-reconciled. The adversarial pass caught correctness bugs the naive design would have shipped, notably that scalar `d=1` restriction maps mathematically cannot produce the Kitty cocycle (so Stage 0 defaults to `d≥2` sign-carrying maps), and that `H¹` requires the edge-space Laplacian `L₁=δδᵀ` (whose kernel is `H¹`), not just `L_F=δᵀδ` (whose kernel is `H⁰`). See also the original-research companions on [/papers](/papers): [The Bitemporal Sheaf](/papers/bitemporal-sheaf) (the temporal generalization) and [the research agenda](/papers/research-agenda).

## References (primary)
- Hansen & Gebhart, *Sheaf Neural Networks* (2020) — [pdf](https://openreview.net/pdf?id=GgcgIJsT8HD)
- Bodnar et al., *Neural Sheaf Diffusion* (NeurIPS 2022) — [arXiv:2202.04579](https://arxiv.org/abs/2202.04579) · [code](https://github.com/twitter-research/neural-sheaf-diffusion)
- Barbero et al., *Sheaf NNs with Connection Laplacians* (2022) — [pdf](https://proceedings.mlr.press/v196/barbero22a/barbero22a.pdf)
- *Nonlinear Sheaf Diffusion in GNNs* (2024) — [arXiv:2403.00337](https://arxiv.org/pdf/2403.00337)
- Hansen & Ghrist, *Opinion Dynamics on Discourse Sheaves* — [pdf](https://www2.math.upenn.edu/~ghrist/preprints/opinion.pdf)
- Robinson, *Sheaves are the canonical data structure for sensor integration* (2017) — [arXiv:1603.01446](https://arxiv.org/pdf/1603.01446)
- *Sheaf theory: from deep geometry to deep learning* (2025 survey) — [arXiv:2502.15476](https://arxiv.org/abs/2502.15476)

## References (recent, 2023–2026)
- *Bundle Neural Networks for message diffusion on graphs* (ICLR 2025) — [arXiv:2405.15540](https://arxiv.org/abs/2405.15540)
- *Sheaf Attention Networks* — [openreview](https://openreview.net/forum?id=LIDvgVjpkZr)
- *Sheaf Hypergraph Networks* (NeurIPS 2023) — [arXiv:2309.17116](https://arxiv.org/abs/2309.17116)
- *Hypergraph Neural Sheaf Diffusion* (2025) — [arXiv:2505.05702](https://arxiv.org/abs/2505.05702)
- *Copresheaf Topological Neural Networks* (2025) — [arXiv:2505.21251](https://arxiv.org/abs/2505.21251)
- *Neural Networks as Local-to-Global Computations* — [arXiv:2603.14831](https://arxiv.org/abs/2603.14831)

_Iteration 5 (2026-06-13): added the recent-developments sweep (bundle / attention / hypergraph / copresheaf / topos lines) and the second reference block — every reference in it independently verified to resolve (titles/authors/venues confirmed: Hajij et al. CTNN NeurIPS 2025; Bosca & Ghrist 2026; Choi/Kim/Oh HNSD 2025; Bundle NNs ICLR 2025; Sheaf Attention OpenReview). Earlier iterations built the cellular-sheaf primer, the Neural-Sheaf-Diffusion equations + heterophily numbers, the line-for-line donto mapping, the worked `H¹`-as-contradiction-pressure examples (Coen-vs-McIvor; the Kitty daughter-vs-spouse cycle pairwise edges can't see), and the staged build path grounded in donto's existing tables._