Identity as a restriction map: non-destructive entity resolution
donto papers · original research · 2026-06-13 · method + theory, working draft v1
Abstract. Entity resolution (ER) is conventionally a merge: decide two records refer to one entity, fuse them into a single canonical row, and rewrite every reference. That is a destructive, irreversible write, and it is catastrophic on contested data — a wrong early merge is unrecoverable, because the evidence that the two were ever distinct has been overwritten. We argue ER should never merge. Represent co-reference as a learned restriction map between two entities plus a glue / no-glue verdict read off the obstruction H¹; identity then lives as a reversible, evidence-weighted edge, and "are these the same?" becomes a query-time decision over a recomputable clustering rather than a one-way write. donto can make this argument concretely because its identity layer is already shaped this way, and we verified it against the live donto-pg (2026-06-13): donto_entity_symbol IRIs are never fused (260 symbols, all status='active', none merged), co-reference is stored as donto_identity_edge (169 bitemporal edges, with an explicit distinct_referent "cannot-link" relation), competing resolutions coexist as donto_identity_hypothesis overlays (51 of them, including the real genes-hardening-kitty-disambiguation-2026-05), and the per-hypothesis clustering (donto_identity_cluster_cache, 7,659 rows, 909 flagged contested) is a cache rebuilt from edges by union-find — delete every cache row and the answer is unchanged. Invariant I3 (no destructive overwrite; migration 0149_i3_statement_delete_guard, registered in migrations.rs) forbids the destructive merge by construction. We give the construction, show the existing machinery is the non-neural skeleton of the sheaf PRD's Stage-1 identity task, state the theorem we believe (non-destructive ER weakly dominates merge-based ER on contested data, equality on easy cases), and are honest that the dominance is conjectured, not measured — donto lacks a disputed-identity benchmark, and building one is the work this paper sets up.
1. The merge is the bug
Classical ER — from Fellegi–Sunter record linkage to modern embedding-clustering pipelines — terminates in a fusion: pick a survivor record, repoint foreign keys, drop the duplicate. The fusion is the point; it is what makes the downstream graph "clean." It is also the failure mode, and it has three distinct costs usually conflated into one:
- Irreversibility. Once B's rows are repointed to A and B is dropped, the system has forgotten that B existed as a separate hypothesis. If later evidence shows A ≠ B, there is nothing to split: the un-merge requires information the merge destroyed.
- Eager commitment. The decision is taken at write time, on whatever evidence existed then, and frozen. Contested data is precisely the regime where evidence arrives late and reverses early calls.
- Single-resolution monism. A merge admits one clustering of the world. But two competent archivists can legitimately disagree about whether two "Kitty" attestations are one woman or two, and a substrate for contested knowledge must hold both resolutions, not adjudicate one away at ingest.
donto's genealogy consumer makes all three concrete and expensive. The live store carries hypotheses literally named genes-hardening-kitty-disambiguation-2026-05 (19 member symbols) and genes-hardening-rosie-disambiguation-2026-05 (14): real cases where a single surface name ("Kitty", "Rosie Rosie") fans into many candidate referents, the sources disagree, and the correct state is sustained ambiguity — at most resolved later, never by deletion. A merge-based ER here is not merely lossy; it would manufacture a false ancestor and an irreversible native-title error. (See Capability vs. exercise for how barely this machinery is currently used, and why that is a latent asset rather than dead weight.)
The claim of this paper is that all three costs vanish if identity is never a merge but always an edge with a verdict you can recompute — and that this is not an aspiration but the shape donto's tables already have.
2. Identity as a restriction map
2.1 The sheaf framing
Following the sheaf construction donto adopts (the bitemporal sheaf; literature in sheaf neural networks for donto), put a stalk F(a) on each entity — a vector space holding a's local representation (its embedding, or a low-d projection; PRD-01 adopts donto_claim_embedding for the stalk). A candidate co-reference between entities a and b is an edge e = (a,b) carrying restriction maps R_{a⊴e} : F(a) → F(e) and R_{b⊴e} : F(b) → F(e) into a shared discourse space. The edge's local obstruction is
‖ δx ‖_e = ‖ R_{a⊴e} x_a − R_{b⊴e} x_b ‖
and the question "do a and b glue (denote the same referent)?" is exactly the question of whether H¹ along this edge is trivial. Low obstruction → the two views reconcile under the learned map → same referent; high obstruction → they cannot be glued → distinct referent. The verdict is read off the geometry, not written into the graph. This is the PRD-03 §3.2 identity task verbatim: an orthogonal restriction map between the two entities' stalks ("different view, same magnitude"), with confidence a calibrated function of the obstruction norm.
Why a restriction map and not a similarity score? A scalar cosine throws away direction. Two attestations of "Kitty" can be near in embedding space yet refer to different women (a high-cosine distinct pair), while a person recorded once by maiden name and once by married name may be far in surface space yet glue under the right rotation. The restriction map is the learnable object that captures "these are the same entity seen through different descriptive frames" — a rotation/projection between views, not a distance. This is also why the no-brittle-logic canon applies cleanly: per PRD-03, the map's weights are seeded from donto_predicate_closure.confidence / donto_match_aligned and learned offline, never a hand-maintained alias table.
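To make the contrast with a scalar score concrete, here is a minimal sketch of the edge obstruction ‖R_a x_a − R_b x_b‖ with 2-D rotations standing in for the orthogonal restriction maps. The stalk vectors and map angles are hand-picked assumptions for illustration only; nothing here is learned, and none of the names below exist in donto.

```python
import math

def rotate(v, theta):
    """Apply a 2-D rotation (an orthogonal map) to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def cosine(u, v):
    """Plain cosine similarity: the scalar a similarity-only pipeline would use."""
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))

def obstruction(x_a, x_b, theta_a, theta_b):
    """Edge obstruction || R_a x_a - R_b x_b ||, with R_a and R_b rotations."""
    r_a, r_b = rotate(x_a, theta_a), rotate(x_b, theta_b)
    return math.hypot(r_a[0] - r_b[0], r_a[1] - r_b[1])

# Hypothetical stalk vectors and map angles, chosen by hand for illustration.
maiden, married = (1.0, 0.0), (0.0, 1.0)                        # far apart in surface space
kitty_1, kitty_2 = (1.0, 0.0), (math.cos(0.1), math.sin(0.1))   # near-duplicates by cosine

# (cosine, obstruction) for each pair under its assumed restriction maps.
far_but_glues = (cosine(maiden, married),
                 obstruction(maiden, married, math.pi / 2, 0.0))
near_but_distinct = (cosine(kitty_1, kitty_2),
                     obstruction(kitty_1, kitty_2, 0.0, math.pi / 2))
```

Under these assumed maps the maiden/married pair has cosine 0 yet an obstruction that is numerically zero, while the two "Kitty" vectors have cosine above 0.99 yet an obstruction above 1. The verdict depends on the map, which is the point of learning it rather than thresholding a distance.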
2.2 The verdict is reversible because it is recomputed, not stored
The decisive design move: the answer to "is a = b?" is never persisted as a mutation of a or b. It is recomputed from the standing edges. In donto this is literally true — donto_rebuild_identity_clusters(hypothesis_id), a live plpgsql function whose source we read, does on each call:
- DELETE FROM donto_identity_cluster_cache WHERE hypothesis_id = … (throw the entire cached answer away);
- union-find over the current (upper(tx_time) IS NULL) same_referent edges above the hypothesis's threshold_same, respecting distinct_referent edges as hard cannot-link blockers above 1 − threshold_distinct;
- recompute each symbol's cluster_rep and a contested flag (set when a cannot-link edge falls inside a would-be cluster).
The cache is disposable. The source of truth is the set of bitemporal edges, and every edge can be retracted by closing its tx_time — never deleted (I3). So a wrong early link is a single reversible row, and un-resolution costs exactly one tx_time close + one cluster rebuild. There is nothing to "un-merge" because nothing was ever merged. (Note the rebuild reads only currently-believed edges; a retracted edge is excluded from the answer but stays on disk as queryable history — the as-of recovery in §4 depends on exactly this.)
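The rebuild can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not the plpgsql source: the edge dictionaries, the `tx_open` flag (standing in for upper(tx_time) IS NULL), the strongest-first processing order, the inclusive threshold comparison, and the choice of the smallest symbol id as cluster_rep are all ours.

```python
def edge(left, right, relation, confidence, tx_open=True):
    """Hypothetical in-memory stand-in for one donto_identity_edge row."""
    return {"left": left, "right": right, "relation": relation,
            "confidence": confidence, "tx_open": tx_open}

def rebuild_clusters(symbols, edges, threshold_same, threshold_distinct):
    """Recompute {symbol: (cluster_rep, contested)} from currently-open edges only."""
    live = [e for e in edges if e["tx_open"]]
    cannot = [(e["left"], e["right"]) for e in live
              if e["relation"] == "distinct_referent"
              and e["confidence"] >= 1 - threshold_distinct]
    same = sorted((e for e in live
                   if e["relation"] == "same_referent"
                   and e["confidence"] >= threshold_same),
                  key=lambda e: -e["confidence"])   # strongest links first (assumption)

    parent = {s: s for s in symbols}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]           # path halving
            s = parent[s]
        return s

    contested = set()
    for e in same:
        root_l, root_r = find(e["left"]), find(e["right"])
        if root_l == root_r:
            continue
        # Refuse the union if it would put a cannot-link pair into one cluster.
        if any({find(x), find(y)} == {root_l, root_r} for x, y in cannot):
            contested.update((e["left"], e["right"]))
            continue
        parent[max(root_l, root_r)] = min(root_l, root_r)   # smallest id as cluster_rep
    return {s: (find(s), s in contested) for s in symbols}

symbols = ["K_a", "K_b", "K_c"]
edges = [edge("K_a", "K_b", "same_referent", 0.6),
         edge("K_b", "K_c", "same_referent", 0.9)]
before = rebuild_clusters(symbols, edges, threshold_same=0.5, threshold_distinct=0.2)

edges.append(edge("K_a", "K_b", "distinct_referent", 0.9))   # a cannot-link arrives
split = rebuild_clusters(symbols, edges, 0.5, 0.2)

edges[-1]["tx_open"] = False                                  # retract: close, never delete
after = rebuild_clusters(symbols, edges, 0.5, 0.2)
```

Because the function reads nothing but the edge list, there is no cached state to invalidate: `after` equals `before` once the cannot-link is closed, and the closed edge is still present in `edges` as history.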
3. What donto already has (this is assembly, not green-field)
The non-destructive ER skeleton is built and on disk. All figures verified against the live donto-pg (2026-06-13):
| ER concept | donto object | live state |
|---|---|---|
| entity that is never fused | donto_entity_symbol (PK symbol_id, unique iri) | 260 symbols, all status='active' (0 merged) |
| co-reference as a reversible edge | donto_identity_edge (left_symbol_id < right_symbol_id, relation, confidence, bitemporal tx_time) | 169 edges |
| the relation vocabulary (incl. cannot-link) | enum {same_referent, possibly_same_referent, distinct_referent, not_enough_information} | 111 / 16 / 41 / 1 |
| competing resolutions coexisting | donto_identity_hypothesis (own threshold_same / threshold_distinct) | 51 hypotheses (incl. the Kitty + Rosie disambiguations) |
| symbol→referent assignment under a hypothesis | donto_identity_membership (posterior) | 123 rows |
| candidate links awaiting adjudication | donto_identity_proposal (hypothesis_kind, status, evidence_anchor_ids) | 320 proposals (by kind: 237 different_from, 71 merge_candidate, 10 same_as, 1 split_candidate, 1 alias_of) |
| the recomputed answer (disposable cache) | donto_identity_cluster_cache (cluster_rep, contested, contest_edge_ids) | 7,659 rows; 909 contested across 12 clusters |
| candidate generation at scale | donto_entity_embedding (bge-small, HNSW) + donto_predicate_closure | ~3.16M entity vectors; ~1.05M closure rows |
| the rebuild operator | donto_rebuild_identity_clusters / _all (union-find + cannot-link) | live plpgsql |
| destructive merge forbidden | invariant I3, 0149_i3_statement_delete_guard | enforced trigger on donto_statement; ER never touches it |
Three properties fall out for free that a merge-based store cannot offer:
- Multi-resolution. Two hypotheses can cluster the same symbols differently: a "lumper" (the live tournament hypothesis …/identity/broad, threshold_same = 0.65) and a "splitter" (…/identity/strict, 0.95) coexist as overlays. The 909 contested cache rows are donto holding the disagreement rather than resolving it, which is the paraconsistent stance applied to identity.
- Bitemporal recovery. Because edges carry tx_time, you can ask "what did hypothesis H believe at transaction-time t?" and replay the resolution as it stood, using the same as-of machinery the bitemporal sheaf uses. A merge has no such history.
- Evidence-anchored proposals. donto_identity_proposal.evidence_anchor_ids ties every candidate link to source spans, so a glue verdict is auditable and a no-glue verdict (different_from) is first-class, not the silent default.
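The first two properties can be illustrated together. The sketch below assumes integer transaction times, a half-open interval [tx_lo, tx_hi) per edge, and same_referent edges only; the `Edge` type and both helper functions are hypothetical names for this example, not donto objects. Only the two thresholds (0.65 and 0.95) come from the live hypotheses named above.

```python
from typing import NamedTuple, Optional

class Edge(NamedTuple):
    left: str
    right: str
    confidence: float
    tx_lo: int                # transaction time at which the edge was asserted
    tx_hi: Optional[int]      # None while the edge is still believed

def believed_at(edges, t):
    """Edges whose transaction-time interval [tx_lo, tx_hi) contains t."""
    return [e for e in edges if e.tx_lo <= t and (e.tx_hi is None or t < e.tx_hi)]

def clusters(symbols, edges, threshold_same):
    """Union-find over same_referent edges at or above one hypothesis's threshold."""
    parent = {s: s for s in symbols}

    def find(s):
        while parent[s] != s:
            s = parent[s]
        return s

    for e in edges:
        if e.confidence >= threshold_same:
            parent[find(e.right)] = find(e.left)
    groups = {}
    for s in symbols:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())

symbols = ["s1", "s2", "s3"]
edges = [Edge("s1", "s2", 0.97, tx_lo=1, tx_hi=None),
         Edge("s2", "s3", 0.70, tx_lo=2, tx_hi=5)]    # retracted at transaction time 5

broad_at_3 = clusters(symbols, believed_at(edges, 3), threshold_same=0.65)
strict_at_3 = clusters(symbols, believed_at(edges, 3), threshold_same=0.95)
broad_at_6 = clusters(symbols, believed_at(edges, 6), threshold_same=0.65)
```

The same edge set yields one cluster for the lumper and two for the splitter at time 3, and the lumper's answer at time 6 changes only because an edge was closed. The closed edge is still in `edges`, so the time-3 answer remains reproducible.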
The one thing not built is the learned restriction map of §2.1. Today's 169 edges are seeded almost entirely non-neurally — by human (115), test (51), live-test (1), and only 2 by embedding+llm_identity_adjudication. The verdict is currently a threshold on a stored confidence, not an H¹ read off a learned map over stalks. Closing that gap is PRD-03 Stage 1 — and the skeleton above is precisely the additive, I3-safe writeback surface that spec targets (the existing SQL donto_assert_identity plus a new client.assert_identity wrapper, symbol-keyed, method='neural').
4. The claim, sharpened: weak dominance on contested data
Conjecture. Let M be any merge-based ER policy and N the non-destructive edge-and-verdict policy of §2, given the same scoring model and the same evidence stream arriving in the same order. Then on a corpus with disputed identities, N weakly dominates M: (a) on easy instances (evidence never reverses) N and M reach identical clusterings, hence equal precision/recall; (b) on hard instances (a correct early link is later contradicted, or vice-versa), N recovers the post-reversal correct clustering while M cannot, because M destroyed the information the recovery requires.
The mechanism of (b) is exact and is the heart of the paper. Consider the canonical contested case, instantiated from the live Kitty disambiguation:
- x₁: weak evidence links symbols K_a ("Kitty", Yarrabah 1898 baptism) and K_b ("Kitty", 1907 marriage cert, "spinster"). A same_referent edge is asserted, confidence 0.6. Both policies now cluster {K_a, K_b}.
- x₂: the "spinster" attestation surfaces as an anti-merge signal (a never-married woman cannot be the earlier married Kitty). Under N: assert a distinct_referent edge, confidence 0.9; the next donto_rebuild_identity_clusters call sees a cannot-link inside the cluster, splits K_a from K_b and flags the residue contested. The same_referent edge is retracted (tx_time closed) but retained, so the history "these were once thought one" stays queryable. Under M: K_b was fused into K_a at x₁; the rows are gone; the split is impossible without re-ingesting the source.
- x₃ (reversal-of-the-reversal): a parish register corroborates that the "spinster" entry was a clerical error. Under N: close the distinct_referent edge, re-open same_referent, rebuild; one transaction, fully recovered. Under M: there is no K_b to re-link.
N's worst case is M's only case. That asymmetry is what "weak dominance" names, and the recoverable path is forced by I3: the destructive merge is unavailable, so the reversible representation is the only one the substrate permits.
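The three-event timeline can be traced in a toy model of the two policies. Both classes below are hypothetical stand-ins written for this example (a dictionary for M's rows, an append-only list for N's edge log). The pairwise `same` check is a simplification of the full rebuild, and re-opening same_referent at x₃ is modelled as appending a fresh edge with the original confidence, which is our assumption.

```python
class MergeStore:
    """Policy M: a merge repoints the duplicate's rows and drops the duplicate."""

    def __init__(self, rows):
        self.rows = {symbol: list(atts) for symbol, atts in rows.items()}

    def merge(self, survivor, duplicate):
        self.rows[survivor].extend(self.rows.pop(duplicate))

    def can_split(self, duplicate):
        return duplicate in self.rows    # False once the symbol has been fused away


class EdgeStore:
    """Policy N: an append-only edge log; retracting closes an edge, never deletes it."""

    def __init__(self):
        self.log = []

    def assert_edge(self, a, b, relation, confidence):
        self.log.append({"pair": frozenset((a, b)), "relation": relation,
                         "confidence": confidence, "open": True})

    def retract(self, a, b, relation):
        for e in self.log:
            if e["pair"] == frozenset((a, b)) and e["relation"] == relation:
                e["open"] = False

    def same(self, a, b):
        live = {e["relation"] for e in self.log
                if e["open"] and e["pair"] == frozenset((a, b))}
        return "same_referent" in live and "distinct_referent" not in live


def run_timeline():
    m = MergeStore({"K_a": ["1898 baptism"], "K_b": ["1907 marriage cert"]})
    n = EdgeStore()
    trajectory = []   # one (N says same?, M can still split?) pair per event

    # x1: weak same_referent evidence; M fuses, N records an edge.
    m.merge("K_a", "K_b")
    n.assert_edge("K_a", "K_b", "same_referent", 0.6)
    trajectory.append((n.same("K_a", "K_b"), m.can_split("K_b")))

    # x2: anti-merge signal; N asserts the cannot-link and closes the earlier edge.
    n.assert_edge("K_a", "K_b", "distinct_referent", 0.9)
    n.retract("K_a", "K_b", "same_referent")
    trajectory.append((n.same("K_a", "K_b"), m.can_split("K_b")))

    # x3: the cannot-link is itself closed and a fresh same_referent edge is opened.
    n.retract("K_a", "K_b", "distinct_referent")
    n.assert_edge("K_a", "K_b", "same_referent", 0.6)
    trajectory.append((n.same("K_a", "K_b"), m.can_split("K_b")))
    return trajectory, n.log, m.rows
```

N's verdict follows the evidence through both reversals while its log keeps all three edges, two of them closed. M has no K_b row from x₁ onward, so neither the split nor the re-link has an operand.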
Why only donto can pose this. The conjecture needs three things simultaneously: (1) entities stored as symbols that are never fused (donto_entity_symbol) so an un-resolution has operands; (2) a cannot-link relation as first-class as same (distinct_referent) so a reversal is representable, not just an absence; (3) a no-delete invariant (I3) so the retracted edge survives as history. A vector DB (dedups), a normal KG (invalidate-on-conflict), or a record-linkage pipeline (fuses) each lacks at least one. donto has all three already wired — that is the entire reason the claim is ours to make.
5. The experiment that would establish it
The claim is conjectured, not measured — the honest center of this paper. What would settle it:
A disputed-identity benchmark. The blocker is data. Standard ER benchmarks (the Magellan/DeepMatcher suites; Music/Citation/Product pairs) are static — they give a fixed gold "same/different" per pair and reward a one-shot decision, which a merge handles fine. They do not test reversal, so they cannot distinguish N from M. We need a benchmark whose instances carry an evidence timeline with at least one reversal, and a gold trajectory (the correct clustering after each evidence event), not a single gold label.
The genealogy corpus is the natural source — donto already holds the Kitty and Rosie disambiguations as bitemporally-staged edges with dated source spans (the genes consumer's Kitty dossier documents ~16 distinct candidate "Kittys"; the live hypothesis carries 19 member symbols). The construction:
| step | mechanism | metric |
|---|---|---|
| build timelined instances | for each contested name, order the dated attestations; mark the human-adjudicated reversal points | N instances, each a (symbol set, ordered evidence, gold trajectory) |
| run M (merge baseline) | greedy fuse on confidence ≥ threshold_same, no un-merge | clustering after final event |
| run N (donto) | assert/retract edges in timeline order; donto_rebuild_identity_clusters after each | clustering after each event |
| easy-case parity | instances with no reversal | N cluster == M cluster (must be exact; falsifies the framing if not) |
| hard-case recovery | instances with ≥1 reversal | fraction where N's post-reversal clustering matches gold and M's does not (the dominance metric) |
| calibration | does the learned-map H¹ (Stage 1) → confidence track the gold glue/no-glue better than the stored-confidence threshold? | AUC / ECE vs. the human/registry_match baseline |
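For the calibration row, one common binary form of expected calibration error bins the predicted glue probability and compares each bin's mean prediction with its observed glue frequency. The sketch below is a generic equal-width-bin version under that assumption. The function name, the bin count, and the 0/1 label encoding (1 = gold glue) are our choices, not something the PRD fixes.

```python
def expected_calibration_error(confidences, labels, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (n_b / N) * |glue_rate_b - mean_conf_b|."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, labels):
        idx = min(int(c * n_bins), n_bins - 1)   # a confidence of 1.0 goes in the last bin
        bins[idx].append((c, y))

    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue                             # empty bins contribute nothing
        mean_conf = sum(c for c, _ in b) / len(b)
        glue_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / total * abs(glue_rate - mean_conf)
    return ece
```

Running the same function over the stored-confidence verdicts and over the learned-map verdicts on identical gold labels would give the two numbers the calibration row compares.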
The Stage-1 hard gate (PRD-03 §0) requires the read-only H¹ analytic to first recover the Case-B Kitty cycle — the daughter-of / spouse-of / disjointness loop whose obstruction is a non-trivial 1-cocycle that pairwise donto_argument edges cannot see — on a real subgraph. That gate is itself a piece of this benchmark: a cycle-contradiction is a multi-entity identity dispute that no pairwise merge decision can resolve, which is why the cohomological reading (cycle contradictions) and the identity reading share an operator.
What would falsify the conjecture. Easy-case non-parity (if N and M disagree on instances with no reversal, the edge representation is leaking and the framing is wrong); or hard-case non-recovery (if N's rebuild does not return to gold after a reversal — e.g. because union-find under cannot-link gets stuck in a contested-but-unresolved state). Both are measurable on the benchmark above. A sharp honest negative — "non-destructive ER recovers, but the contested-limbo rate is high enough that users prefer a confident wrong merge" — would be a real result.
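Both falsifiers reduce to set comparisons over final clusterings. A minimal sketch follows, assuming each benchmark instance is a dictionary with the hypothetical fields shown. The three instances are toy placeholders invented to exercise the code, not benchmark data.

```python
def canon(clustering):
    """Order-independent form of a clustering (a set of sets)."""
    return frozenset(frozenset(cluster) for cluster in clustering)

def evaluate(instances):
    easy = [i for i in instances if not i["has_reversal"]]
    hard = [i for i in instances if i["has_reversal"]]

    # Falsifier 1: N and M disagree where evidence never reverses.
    parity_failures = [i["name"] for i in easy
                       if canon(i["n_final"]) != canon(i["m_final"])]
    # Dominance metric: N matches gold after the reversal and M does not.
    n_only_wins = [i["name"] for i in hard
                   if canon(i["n_final"]) == canon(i["gold_final"])
                   and canon(i["m_final"]) != canon(i["gold_final"])]
    # Falsifier 2: N itself fails to return to gold after a reversal.
    non_recovered = [i["name"] for i in hard
                     if canon(i["n_final"]) != canon(i["gold_final"])]
    return {
        "easy_parity": not parity_failures,
        "hard_recovery": len(n_only_wins) / len(hard) if hard else None,
        "non_recovered": non_recovered,
    }

# Toy placeholder instances (invented for illustration only).
toy = [
    {"name": "toy-easy", "has_reversal": False,
     "gold_final": [["A", "B"]], "n_final": [["B", "A"]], "m_final": [["A", "B"]]},
    {"name": "toy-hard-1", "has_reversal": True,
     "gold_final": [["K_a"], ["K_b"]], "n_final": [["K_b"], ["K_a"]],
     "m_final": [["K_a", "K_b"]]},
    {"name": "toy-hard-2", "has_reversal": True,
     "gold_final": [["R1", "R2"]], "n_final": [["R1"], ["R2"]],
     "m_final": [["R1", "R2"]]},
]
report = evaluate(toy)
```

On these placeholders the report shows easy-case parity holding, a hard-case recovery of 0.5, and one non-recovered instance. That last instance is the "stuck in contested" outcome that the second falsifier is meant to surface.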
6. Status: proven / conjectured / speculative
- Proven / built (solid). The non-destructive representation is live and verified: symbols are never fused (260 distinct, 0 merged), co-reference is reversible bitemporal edges (169) with a first-class cannot-link relation (41 distinct_referent), competing hypotheses coexist (51), and the clustering is a union-find cache rebuilt from edges (donto_rebuild_identity_clusters; 7,659 cache rows, 909 contested across 12 clusters): delete the cache and the answer is unchanged. I3 + migration 0149 forbid the merge mechanically. That ER can be done without merging, on a ~42M-statement store, is demonstrated, not conjectured.
- Conjectured (open, named failure mode). The §4 weak-dominance claim. The recovery mechanism is exact (a reversal is one edge-close + one rebuild, traced end-to-end above), but the aggregate dominance is unmeasured: there is no disputed-identity benchmark yet, only the hand-staged Kitty/Rosie cases. Failure mode to watch: union-find under cannot-link can leave symbols in a sustained contested limbo (the live store already shows 909 such rows), and if that rate is high, "never wrong but often undecided" may be operationally worse than "occasionally wrong but always decided" for some consumers. This is exactly what the §5 benchmark is built to measure, and it could come back negative.
- Speculative (flagged). That the learned restriction-map verdict (Stage-1 H¹ read) beats the current stored-confidence threshold on calibration. Today only 2 of 169 edges come from embedding+llm_identity_adjudication; the learned-map path is specified (PRD-03 §3.2) but unbuilt, gated behind Stage-0 Case-B recovery. Until that gate is met, the geometric verdict of §2.1 is a design, not a measurement; we flag it as such and do not claim the calibration win.
The contribution is not a leaderboard number; it is a representation result with a forced corollary: if identity is an edge with a recomputed verdict and the store never deletes, then entity resolution becomes reversible, and on contested data reversibility is more valuable than the cleanliness a merge buys — at least until the contested-limbo rate proves otherwise. donto is the store where that "if" is already true.
See also: the contradiction operator this verdict reads, The Bitemporal Sheaf; the multi-entity loops a pairwise merge cannot resolve, Cycle contradictions; the math + Stage-1 identity task in sheaf neural networks for donto; how barely the identity machinery is currently exercised, capability vs. exercise; the full program in the donto research agenda.