# Identity as a restriction map: non-destructive entity resolution

_donto papers · original research · 2026-06-13 · method + theory, working draft v1_

**Abstract.** Entity resolution (ER) is conventionally a *merge*: decide two records refer to one entity, fuse them into a single canonical row, and rewrite every reference. That is a destructive, irreversible write, and it is catastrophic on contested data — a wrong early merge is unrecoverable, because the evidence that the two were ever distinct has been overwritten. We argue ER should **never merge**. Represent co-reference as a *learned restriction map* between two entities plus a glue / no-glue verdict read off the obstruction `H¹`; identity then lives as a reversible, evidence-weighted **edge**, and "are these the same?" becomes a query-time decision over a *recomputable* clustering rather than a one-way write. donto can make this argument concretely because its identity layer is already shaped this way, and we verified it against the live `donto-pg` (2026-06-13): `donto_entity_symbol` IRIs are *never* fused (260 symbols, all `status='active'`, none `merged`), co-reference is stored as `donto_identity_edge` (169 bitemporal edges, with an explicit `distinct_referent` "cannot-link" relation), competing resolutions coexist as `donto_identity_hypothesis` overlays (51 of them, including the real `genes-hardening-kitty-disambiguation-2026-05`), and the per-hypothesis clustering (`donto_identity_cluster_cache`, 7,659 rows, 909 flagged `contested`) is a *cache rebuilt from edges by union-find* — delete every cache row and the answer is unchanged. Invariant **I3** (no destructive overwrite; migration `0149_i3_statement_delete_guard`, registered in `migrations.rs`) forbids the destructive merge by construction. 
We give the construction, show the existing machinery is the non-neural skeleton of the sheaf PRD's Stage-1 identity task, state the theorem we believe (non-destructive ER weakly dominates merge-based ER on contested data, equality on easy cases), and are honest that **the dominance is conjectured, not measured** — donto lacks a disputed-identity benchmark, and building one is the work this paper sets up.

---

## 1. The merge is the bug

Classical ER — from Fellegi–Sunter record linkage to modern embedding-clustering pipelines — terminates in a *fusion*: pick a survivor record, repoint foreign keys, drop the duplicate. The fusion is the point; it is what makes the downstream graph "clean." It is also the failure mode, and it has three distinct costs usually conflated into one:

1. **Irreversibility.** Once `B`'s rows are repointed to `A` and `B` is dropped, the system has *forgotten that B existed as a separate hypothesis*. If later evidence shows `A ≠ B`, there is nothing to split — the un-merge requires information the merge destroyed.
2. **Eager commitment.** The decision is taken at *write* time, on whatever evidence existed then, and frozen. Contested data is precisely the regime where evidence arrives late and reverses early calls.
3. **Single-resolution monism.** A merge admits one clustering of the world. But two competent archivists can legitimately disagree about whether two "Kitty" attestations are one woman or two — and a substrate for contested knowledge must hold *both* resolutions, not adjudicate one away at ingest.

donto's genealogy consumer makes all three concrete and expensive. The live store carries hypotheses literally named `genes-hardening-kitty-disambiguation-2026-05` (19 member symbols) and `genes-hardening-rosie-disambiguation-2026-05` (14): real cases where a single surface name ("Kitty", "Rosie Rosie") fans into many candidate referents, the sources disagree, and the *correct* state is sustained ambiguity, at most resolved later and never by deletion. A merge-based ER here is not merely lossy; it would manufacture a false ancestor and an irreversible native-title error. (See [Capability vs. exercise](/reports/capability-vs-exercise) for how little this machinery is currently used, and why that is a *latent* asset rather than dead weight.)

The claim of this paper is that all three costs vanish if identity is never a merge but always an **edge with a verdict you can recompute** — and that this is not an aspiration but the shape donto's tables already have.

---

## 2. Identity as a restriction map

### 2.1 The sheaf framing

Following the sheaf construction donto adopts ([the bitemporal sheaf](/papers/bitemporal-sheaf); literature in [sheaf neural networks for donto](/reports/sheaf-neural-networks-for-donto)), put a stalk `F(a)` on each entity — a vector space holding `a`'s local representation (its embedding, or a low-`d` projection; PRD-01 adopts `donto_claim_embedding` for the stalk). A candidate co-reference between entities `a` and `b` is an *edge* `e = (a,b)` carrying restriction maps `R_{a⊴e} : F(a) → F(e)` and `R_{b⊴e} : F(b) → F(e)` into a shared discourse space. The edge's local obstruction is

```
‖ δx ‖_e  =  ‖ R_{a⊴e} x_a  −  R_{b⊴e} x_b ‖
```

and the question "do `a` and `b` glue (denote the same referent)?" becomes exactly: **is `H¹` along this edge trivial?** Low obstruction → the two views reconcile under the learned map → *same referent*; high obstruction → they cannot be glued → *distinct referent*. The verdict is **read off geometry, not written into the graph.** This is the PRD-03 §3.2 identity task verbatim: an orthogonal restriction map between the two entities' stalks ("different view, same magnitude"), with confidence a calibrated function of the obstruction norm.
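
One consequence of the orthogonality constraint can be worked out directly. Assuming, purely for illustration, that the stalks and the edge space share a dimension so that both restriction maps are square orthogonal matrices, the obstruction reduces to a comparison inside `F(a)` under a single transport map. The symbol `T_{b→a}` is introduced here for the sketch and is not PRD notation.

```latex
\begin{aligned}
\|\delta x\|_e
  &= \| R_{a \trianglelefteq e}\, x_a - R_{b \trianglelefteq e}\, x_b \| \\
  &= \| R_{a \trianglelefteq e} \bigl( x_a - R_{a \trianglelefteq e}^{\top} R_{b \trianglelefteq e}\, x_b \bigr) \| \\
  &= \| x_a - T_{b \to a}\, x_b \|,
  \qquad T_{b \to a} := R_{a \trianglelefteq e}^{\top} R_{b \trianglelefteq e}.
\end{aligned}
```

The second line uses `R Rᵀ = I` for a square orthogonal map and the third uses norm preservation. Under that assumption the glue test depends only on the composite rotation, and `‖T x_b‖ = ‖x_b‖`, which is one way to read "different view, same magnitude".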

Why a *restriction map* and not a similarity score? A scalar cosine throws away direction. Two attestations of "Kitty" can be near in embedding space yet refer to different women (a high-cosine *distinct* pair), while a person recorded once by maiden name and once by married name may be far in surface space yet glue under the right rotation. The restriction map is the learnable object that captures "these are the *same* entity seen through *different* descriptive frames" — a rotation/projection between views, not a distance. This is also why the no-brittle-logic canon applies cleanly: per PRD-03, the map's weights are *seeded* from `donto_predicate_closure.confidence` / `donto_match_aligned` and *learned* offline, never a hand-maintained alias table.
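
The contrast between a cosine score and an edge obstruction can be made concrete with a toy example. The sketch below uses 2-D vectors and hand-picked rotations as stand-ins for learned orthogonal restriction maps; the vectors, angles, and helper names are all illustrative assumptions, not values or functions from donto.

```python
import math

def rotation(theta):
    """2x2 rotation matrix: a toy orthogonal restriction map."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(R, x):
    """Matrix-vector product R @ x for small dense lists."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def obstruction(R_a, x_a, R_b, x_b):
    """Edge obstruction || R_a x_a - R_b x_b || in the shared edge space."""
    y_a, y_b = apply(R_a, x_a), apply(R_b, x_b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(y_a, y_b)))

def cosine(x, y):
    """Plain cosine similarity, the scalar score the text argues against."""
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))

identity = rotation(0.0)

# Pair 1: orthogonal in surface space, but one is a rotated view of the other.
maiden, married = [1.0, 0.0], [0.0, 1.0]
far_but_glues = obstruction(identity, maiden, rotation(-math.pi / 2), married)

# Pair 2: nearly parallel embeddings whose (hand-picked) maps disagree.
kitty_1, kitty_2 = [1.0, 0.0], [1.0, 0.1]
near_but_splits = obstruction(identity, kitty_1, rotation(math.pi / 2), kitty_2)

print("cosine", round(cosine(maiden, married), 3), "obstruction", round(far_but_glues, 3))
print("cosine", round(cosine(kitty_1, kitty_2), 3), "obstruction", round(near_but_splits, 3))
```

The first pair has zero cosine yet a vanishing obstruction; the second has a cosine close to one yet a large obstruction. Because the maps here are chosen by hand, this only illustrates what the learned object would need to express, not that learning recovers it.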

### 2.2 The verdict is reversible because it is recomputed, not stored

The decisive design move: the *answer* to "is `a = b`?" is never persisted as a mutation of `a` or `b`. It is recomputed from the standing edges. In donto this is literally true — `donto_rebuild_identity_clusters(hypothesis_id)`, a live plpgsql function whose source we read, does on each call:

1. `DELETE FROM donto_identity_cluster_cache WHERE hypothesis_id = …` — throw the entire cached answer away;
2. union-find over the current (`upper(tx_time) IS NULL`) `same_referent` edges above the hypothesis's `threshold_same`, respecting `distinct_referent` edges as hard **cannot-link** blockers above `1 − threshold_distinct`;
3. recompute each symbol's `cluster_rep` and a `contested` flag (set when a cannot-link edge falls *inside* a would-be cluster).

The cache is disposable. The source of truth is the set of bitemporal edges, and every edge can be *retracted* by closing its `tx_time` — never deleted (I3). So a wrong early link is a single reversible row, and un-resolution costs exactly one `tx_time` close + one cluster rebuild. There is nothing to "un-merge" because nothing was ever merged. (Note the rebuild reads only *currently-believed* edges; a retracted edge is excluded from the answer but stays on disk as queryable history — the as-of recovery in §4 depends on exactly this.)
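
The three rebuild steps above can be sketched in a few lines of Python. This is one possible reading of the description, not a transcription of the plpgsql source: the `Edge` class, the `current` flag (standing in for `upper(tx_time) IS NULL`), the strongest-link-first ordering, the smallest-id representative, and the threshold values in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    left: str
    right: str
    relation: str          # "same_referent" or "distinct_referent"
    confidence: float
    current: bool = True   # hypothetical stand-in for upper(tx_time) IS NULL

def rebuild_clusters(symbols, edges, threshold_same, threshold_distinct):
    """Recompute {symbol: (cluster_rep, contested)} from edges alone."""
    parent = {s: s for s in symbols}   # step 1: start from an empty answer

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    live = [e for e in edges if e.current]
    cannot_link = [(e.left, e.right) for e in live
                   if e.relation == "distinct_referent"
                   and e.confidence >= 1 - threshold_distinct]
    links = sorted((e for e in live
                    if e.relation == "same_referent"
                    and e.confidence >= threshold_same),
                   key=lambda e: -e.confidence)

    blocked = []   # endpoints of links refused by a cannot-link edge
    for e in links:   # step 2: union-find with hard blockers
        ra, rb = find(e.left), find(e.right)
        if ra == rb:
            continue
        if any({find(x), find(y)} == {ra, rb} for x, y in cannot_link):
            blocked.append((e.left, e.right))
            continue
        parent[max(ra, rb)] = min(ra, rb)   # smallest id acts as representative

    # step 3: a cluster is contested if a link into it was blocked
    contested_roots = {find(s) for pair in blocked for s in pair}
    return {s: (find(s), find(s) in contested_roots) for s in symbols}

symbols = ["K_a", "K_b", "K_c"]
x1 = [Edge("K_a", "K_b", "same_referent", 0.6)]
x2 = x1 + [Edge("K_a", "K_b", "distinct_referent", 0.9)]
print(rebuild_clusters(symbols, x1, threshold_same=0.5, threshold_distinct=0.2))
print(rebuild_clusters(symbols, x2, threshold_same=0.5, threshold_distinct=0.2))
```

Nothing in the function reads a previous answer, so calling it twice on the same edges gives the same result. That is the property the disposable cache relies on.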

---

## 3. What donto already has (this is assembly, not green-field)

The non-destructive ER skeleton is built and on disk. All figures verified against the live `donto-pg` (2026-06-13):

| ER concept | donto object | live state |
|---|---|---|
| entity that is **never fused** | `donto_entity_symbol` (PK `symbol_id`, unique `iri`) | 260 symbols, all `status='active'` (0 `merged`) |
| co-reference as a **reversible edge** | `donto_identity_edge` (`left_symbol_id < right_symbol_id`, `relation`, `confidence`, bitemporal `tx_time`) | 169 edges |
| the relation vocabulary (incl. *cannot-link*) | enum `{same_referent, possibly_same_referent, distinct_referent, not_enough_information}` | 111 / 16 / 41 / 1 |
| **competing resolutions coexisting** | `donto_identity_hypothesis` (own `threshold_same` / `threshold_distinct`) | 51 hypotheses (incl. the Kitty + Rosie disambiguations) |
| symbol→referent assignment under a hypothesis | `donto_identity_membership` (`posterior`) | 123 rows |
| candidate links awaiting adjudication | `donto_identity_proposal` (`hypothesis_kind`, `status`, `evidence_anchor_ids`) | 320 proposals (by kind: 237 `different_from`, 71 `merge_candidate`, 10 `same_as`, 1 `split_candidate`, 1 `alias_of`) |
| the **recomputed** answer (disposable cache) | `donto_identity_cluster_cache` (`cluster_rep`, `contested`, `contest_edge_ids`) | 7,659 rows; **909 contested across 12 clusters** |
| candidate generation at scale | `donto_entity_embedding` (bge-small, HNSW) + `donto_predicate_closure` | ~3.16M entity vectors; ~1.05M closure rows |
| the rebuild operator | `donto_rebuild_identity_clusters` / `_all` (union-find + cannot-link) | live plpgsql |
| **destructive merge forbidden** | invariant I3, `0149_i3_statement_delete_guard` | enforced trigger on `donto_statement`; ER never touches it |

Three properties fall out for free that a merge-based store cannot offer:

- **Multi-resolution.** Two hypotheses can cluster the *same* symbols differently — a "lumper" (the live tournament hypothesis `…/identity/broad`, `threshold_same = 0.65`) and a "splitter" (`…/identity/strict`, `0.95`) coexist as overlays. The 909 `contested` cache rows are donto *holding* the disagreement rather than resolving it — the paraconsistent stance applied to identity.
- **Bitemporal recovery.** Because edges carry `tx_time`, you can ask "what did hypothesis H believe at transaction-time `t`?" and replay the resolution as it stood — the same as-of machinery the [bitemporal sheaf](/papers/bitemporal-sheaf) uses. A merge has no such history.
- **Evidence-anchored proposals.** `donto_identity_proposal.evidence_anchor_ids` ties every candidate link to source spans, so a glue verdict is auditable and a no-glue verdict (`different_from`) is *first-class*, not the silent default.
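
The first two properties can be illustrated together: the same edge set, clustered under two thresholds and at two transaction times. The sketch below is a simplified model under stated assumptions. `SameEdge`, the integer `tx_from` / `tx_to` fields, and the sample edges are hypothetical; only the `0.65` and `0.95` thresholds come from the hypotheses named above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SameEdge:
    left: str
    right: str
    confidence: float
    tx_from: int                  # transaction time the edge was asserted
    tx_to: Optional[int] = None   # None means still believed

def clusters(symbols, edges, threshold_same, as_of):
    """Connected components over the edges believed at `as_of`."""
    parent = {s: s for s in symbols}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for e in edges:
        believed = e.tx_from <= as_of and (e.tx_to is None or as_of < e.tx_to)
        if believed and e.confidence >= threshold_same:
            ra, rb = find(e.left), find(e.right)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for s in symbols:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())

symbols = ["A", "B", "C"]
edges = [
    SameEdge("A", "B", 0.97, tx_from=1),
    SameEdge("B", "C", 0.70, tx_from=2, tx_to=5),   # retracted at t=5, kept on disk
]

broad_t3 = clusters(symbols, edges, threshold_same=0.65, as_of=3)
strict_t3 = clusters(symbols, edges, threshold_same=0.95, as_of=3)
broad_t6 = clusters(symbols, edges, threshold_same=0.65, as_of=6)
print(broad_t3, strict_t3, broad_t6)
```

The lumper and the splitter disagree at `t=3` without either overwriting the other, and the lumper's own answer at `t=6` differs from its answer at `t=3` because the retracted edge is filtered out rather than deleted.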

The one thing **not** built is the **learned** restriction map of §2.1. Today's 169 edges are seeded almost entirely non-neurally — by `human` (115), `test` (51), `live-test` (1), and only **2** by `embedding+llm_identity_adjudication`. The verdict is currently a *threshold on a stored confidence*, not an `H¹` read off a learned map over stalks. Closing that gap is PRD-03 Stage 1 — and the skeleton above is precisely the additive, I3-safe writeback surface that spec targets (the existing SQL `donto_assert_identity` plus a new `client.assert_identity` wrapper, symbol-keyed, `method='neural'`).

---

## 4. The claim, sharpened: weak dominance on contested data

> **Conjecture.** Let `M` be any merge-based ER policy and `N` the non-destructive edge-and-verdict policy of §2, given the *same* scoring model and the *same* evidence stream arriving in the *same* order. Then on a corpus with disputed identities, `N` weakly dominates `M`: (a) on *easy* instances (evidence never reverses) `N` and `M` reach identical clusterings — equal precision/recall; (b) on *hard* instances (a correct early link is later contradicted, or vice-versa), `N` recovers the post-reversal correct clustering while `M` cannot, because `M` destroyed the information the recovery requires.

The mechanism of (b) is exact and is the heart of the paper. Consider the canonical contested case, instantiated from the live Kitty disambiguation:

1. **`x₁`:** weak evidence links symbols `K_a` ("Kitty", Yarrabah 1898 baptism) and `K_b` ("Kitty", 1907 marriage cert, "spinster"). A `same_referent` edge is asserted, `confidence 0.6`. Both policies now cluster `{K_a, K_b}`.
2. **`x₂`:** the "spinster" attestation surfaces as an *anti-merge* signal (a never-married woman cannot be the earlier married Kitty). Under `N`: assert a `distinct_referent` edge at `confidence 0.9`; the next `donto_rebuild_identity_clusters` call sees a cannot-link inside the would-be cluster, **splits `K_a` from `K_b` and flags the residue `contested`**. The `same_referent` edge is then retracted (`tx_time` closed) but *retained*, so the history "these were once thought one" stays queryable. Under `M`: `K_b` was fused into `K_a` at `x₁`; the rows are gone; the split is impossible without re-ingesting the source.
3. **`x₃` (reversal-of-the-reversal):** a parish register corroborates that the "spinster" entry was a clerical error. Under `N`: close the `distinct_referent` edge, re-open `same_referent`, rebuild — one transaction, fully recovered. Under `M`: there is no `K_b` to re-link.

`N`'s worst case is `M`'s only case. That asymmetry is what "weak dominance" names, and the recoverable path is *forced* by I3: the destructive merge is unavailable, so the reversible representation is the only one the substrate permits.
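
The three-event timeline can be replayed against both policies in a toy simulation. The sketch makes a deliberate simplification: the `distinct_referent` assertion at `x₂` is collapsed into an `unlink` event, and clusterings are plain partitions. The function names and the event encoding are illustrative assumptions, not donto APIs.

```python
def partition(symbols, links):
    """Connected components of `links`, as a sorted list of sorted groups."""
    parent = {s: s for s in symbols}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for s in symbols:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())

def run_merge(symbols, events):
    """M: every link is an irreversible fusion; an unlink has nothing to act on."""
    fused, trajectory = [], []
    for kind, a, b in events:
        if kind == "link":
            fused.append((a, b))
        trajectory.append(partition(symbols, fused))
    return trajectory

def run_nondestructive(symbols, events):
    """N: links are retractable edges; the clustering is recomputed per event."""
    live, retracted, trajectory = set(), [], []
    for kind, a, b in events:
        if kind == "link":
            live.add((a, b))
        elif kind == "unlink" and (a, b) in live:
            live.remove((a, b))
            retracted.append((a, b))   # kept as history, never deleted
        trajectory.append(partition(symbols, live))
    return trajectory, retracted

symbols = ["K_a", "K_b"]
events = [
    ("link", "K_a", "K_b"),     # x1: weak same-referent evidence
    ("unlink", "K_a", "K_b"),   # x2: "spinster" anti-merge signal
    ("link", "K_a", "K_b"),     # x3: clerical error corroborated
]
m_traj = run_merge(symbols, events)
n_traj, n_history = run_nondestructive(symbols, events)
print(m_traj)
print(n_traj)
```

In this model `M` is wrong at `x₂` and happens to agree again at `x₃`, but only because a partition view hides that `K_b`'s rows no longer exist. A timeline that stops at `x₂` leaves `M` wrong at the final event, which is the case the dominance metric in §5 counts.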

**Why only donto can pose this.** The conjecture needs three things simultaneously: (1) entities stored as symbols that are never fused (`donto_entity_symbol`) so an un-resolution has operands; (2) a *cannot-link* relation as first-class as `same` (`distinct_referent`) so a reversal is *representable*, not just an absence; (3) a no-delete invariant (I3) so the retracted edge survives as history. A vector DB (dedups), a normal KG (invalidate-on-conflict), or a record-linkage pipeline (fuses) each lacks at least one. donto has all three already wired — that is the entire reason the claim is *ours* to make.

---

## 5. The experiment that would establish it

The claim is **conjectured, not measured** — the honest center of this paper. What would settle it:

**A disputed-identity benchmark.** The blocker is data. Standard ER benchmarks (the Magellan/DeepMatcher suites; Music/Citation/Product pairs) are *static* — they give a fixed gold "same/different" per pair and reward a one-shot decision, which a merge handles fine. They do not test *reversal*, so they cannot distinguish `N` from `M`. We need a benchmark whose instances carry an **evidence timeline** with at least one reversal, and a gold *trajectory* (the correct clustering after each evidence event), not a single gold label.

The genealogy corpus is the natural source — donto already holds the Kitty and Rosie disambiguations as bitemporally-staged edges with dated source spans (the genes consumer's Kitty dossier documents ~16 distinct candidate "Kittys"; the live hypothesis carries 19 member symbols). The construction:

| step | mechanism | metric |
|---|---|---|
| build timelined instances | for each contested name, order the dated attestations; mark the human-adjudicated reversal points | N instances, each a `(symbol set, ordered evidence, gold trajectory)` |
| run `M` (merge baseline) | greedy fuse on `confidence ≥ threshold_same`, no un-merge | clustering after final event |
| run `N` (donto) | assert/retract edges in timeline order; `donto_rebuild_identity_clusters` after each | clustering after **each** event |
| **easy-case parity** | instances with no reversal | `N` cluster == `M` cluster (must be exact; falsifies the framing if not) |
| **hard-case recovery** | instances with ≥1 reversal | fraction where `N`'s post-reversal clustering matches gold and `M`'s does not — the dominance metric |
| calibration | does the learned-map `H¹` (Stage 1) → confidence track the gold glue/no-glue better than the stored-confidence threshold? | AUC / ECE vs. the `human`/`registry_match` baseline |
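
The parity and recovery rows can be scored with two small functions. This is a sketch of one reasonable reading of the table, not a fixed protocol: the `Instance` record, the choice to compare clusterings at the marked reversal indices, and the sample data are all assumptions introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    gold: list    # gold clustering after each evidence event
    n_run: list   # N's clustering after each event
    m_run: list   # M's clustering after each event
    reversals: list = field(default_factory=list)   # indices of adjudicated reversal events

def easy_case_parity(instances):
    """True if N and M end in identical clusterings on every no-reversal instance."""
    easy = [i for i in instances if not i.reversals]
    return all(i.n_run[-1] == i.m_run[-1] for i in easy)

def hard_case_recovery(instances):
    """Fraction of reversal instances where N matches gold at every reversal
    point and M misses gold at one or more of them. None if there are no
    reversal instances."""
    hard = [i for i in instances if i.reversals]
    if not hard:
        return None
    wins = 0
    for i in hard:
        n_ok = all(i.n_run[t] == i.gold[t] for t in i.reversals)
        m_ok = all(i.m_run[t] == i.gold[t] for t in i.reversals)
        wins += n_ok and not m_ok
    return wins / len(hard)

together = [["K_a", "K_b"]]
apart = [["K_a"], ["K_b"]]
benchmark = [
    # easy: one event, no reversal
    Instance(gold=[together], n_run=[together], m_run=[together]),
    # hard: the x1 / x2 / x3 timeline, with reversals at events 1 and 2
    Instance(gold=[together, apart, together],
             n_run=[together, apart, together],
             m_run=[together, together, together],
             reversals=[1, 2]),
]
print(easy_case_parity(benchmark), hard_case_recovery(benchmark))
```

A parity result of `False` on real instances would falsify the framing, as the table notes. A recovery fraction well below one would point at the contested-limbo failure mode rather than at the representation itself.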

The Stage-1 hard gate (PRD-03 §0) requires the read-only `H¹` analytic to first recover the **Case-B Kitty cycle** — the `daughter-of` / `spouse-of` / disjointness loop whose obstruction is a non-trivial 1-cocycle that pairwise `donto_argument` edges cannot see — on a real subgraph. That gate is itself a piece of this benchmark: a cycle-contradiction *is* a multi-entity identity dispute that no pairwise merge decision can resolve, which is why the cohomological reading ([cycle contradictions](/papers/cycle-contradictions)) and the identity reading share an operator.

**What would falsify the conjecture.** Easy-case non-parity (if `N` and `M` disagree on instances with no reversal, the edge representation is leaking and the framing is wrong); or hard-case *non-recovery* (if `N`'s rebuild does not return to gold after a reversal — e.g. because union-find under cannot-link gets stuck in a contested-but-unresolved state). Both are measurable on the benchmark above. A sharp honest negative — *"non-destructive ER recovers, but the contested-limbo rate is high enough that users prefer a confident wrong merge"* — would be a real result.

---

## 6. Status: proven / conjectured / speculative

- **Proven / built (solid).** The non-destructive *representation* is live and verified: symbols are never fused (260 distinct, 0 `merged`), co-reference is reversible bitemporal edges (169) with a first-class cannot-link relation (41 `distinct_referent`), competing hypotheses coexist (51), and the clustering is a union-find cache *rebuilt from edges* (`donto_rebuild_identity_clusters`; 7,659 cache rows, 909 contested across 12 clusters) — delete the cache and the answer is unchanged. I3 + migration `0149` forbid the merge mechanically. That ER *can* be done without merging, on a ~42M-statement store, is demonstrated, not conjectured.
- **Conjectured (open, named failure mode).** The §4 *weak-dominance* claim. The recovery *mechanism* is exact (a reversal is one edge-close + one rebuild, traced end-to-end above), but the *aggregate* dominance is unmeasured — there is no disputed-identity benchmark yet, only the hand-staged Kitty/Rosie cases. Failure mode to watch: union-find under cannot-link can leave symbols in a sustained `contested` limbo (the live store already shows 909 such rows), and if that rate is high, "never wrong but often undecided" may be operationally worse than "occasionally wrong but always decided" for some consumers. This is exactly what the §5 benchmark is built to measure, and it could come back negative.
- **Speculative (flagged).** That the **learned restriction-map verdict** (Stage-1 `H¹` read) beats the current stored-confidence threshold on calibration. Today only 2 of 169 edges come from `embedding+llm_identity_adjudication`; the learned-map path is specified (PRD-03 §3.2) but **unbuilt**, gated behind Stage-0 Case-B recovery. Until that gate is met, the geometric verdict of §2.1 is a design, not a measurement — we flag it as such and do not claim the calibration win.

The contribution is not a leaderboard number; it is a **representation result with a forced corollary**: if identity is an edge with a recomputed verdict and the store never deletes, then entity resolution becomes reversible, and on contested data reversibility is more valuable than the cleanliness a merge buys — at least until the contested-limbo rate proves otherwise. donto is the store where that "if" is already true.

---

_See also: the contradiction operator this verdict reads, [The Bitemporal Sheaf](/papers/bitemporal-sheaf); the multi-entity loops a pairwise merge cannot resolve, [Cycle contradictions](/papers/cycle-contradictions); the math + Stage-1 identity task in [sheaf neural networks for donto](/reports/sheaf-neural-networks-for-donto); how little the identity machinery is currently exercised, [capability vs. exercise](/reports/capability-vs-exercise); the full program in the [donto research agenda](/papers/research-agenda)._
