Identity as a restriction map: non-destructive entity resolution

donto papers · original research · 2026-06-13 · method + theory, working draft v1

Abstract. Entity resolution (ER) is conventionally a merge: decide two records refer to one entity, fuse them into a single canonical row, and rewrite every reference. That is a destructive, irreversible write, and it is catastrophic on contested data: a wrong early merge is unrecoverable, because the evidence that the two were ever distinct has been overwritten. We argue ER should never merge. Represent co-reference as a learned restriction map between two entities plus a glue / no-glue verdict read off the obstruction H¹; identity then lives as a reversible, evidence-weighted edge, and "are these the same?" becomes a query-time decision over a recomputable clustering rather than a one-way write. donto can make this argument concretely because its identity layer is already shaped this way, and we verified it against the live donto-pg (2026-06-13): donto_entity_symbol IRIs are never fused (260 symbols, all status='active', none merged), co-reference is stored as donto_identity_edge (169 bitemporal edges, with an explicit distinct_referent "cannot-link" relation), competing resolutions coexist as donto_identity_hypothesis overlays (51 of them, including the real genes-hardening-kitty-disambiguation-2026-05), and the per-hypothesis clustering (donto_identity_cluster_cache, 7,659 rows, 909 flagged contested) is a cache rebuilt from edges by union-find: delete every cache row and the next rebuild returns the same answer. Invariant I3 (no destructive overwrite; migration 0149_i3_statement_delete_guard, registered in migrations.rs) forbids the destructive merge by construction. We give the construction, show the existing machinery is the non-neural skeleton of the sheaf PRD's Stage-1 identity task, and state the conjecture we believe (non-destructive ER weakly dominates merge-based ER on contested data, with equality on easy cases). The dominance is conjectured, not measured: donto lacks a disputed-identity benchmark, and building one is the work this paper sets up.

1. The merge is the bug

Classical ER — from Fellegi–Sunter record linkage to modern embedding-clustering pipelines — terminates in a fusion: pick a survivor record, repoint foreign keys, drop the duplicate. The fusion is the point; it is what makes the downstream graph "clean." It is also the failure mode, and it has three distinct costs usually conflated into one:

Irreversibility. Once B's rows are repointed to A and B is dropped, the system has forgotten that B existed as a separate hypothesis. If later evidence shows A ≠ B, there is nothing to split — the un-merge requires information the merge destroyed.
Eager commitment. The decision is taken at write time, on whatever evidence existed then, and frozen. Contested data is precisely the regime where evidence arrives late and reverses early calls.
Single-resolution monism. A merge admits one clustering of the world. But two competent archivists can legitimately disagree about whether two "Kitty" attestations are one woman or two — and a substrate for contested knowledge must hold both resolutions, not adjudicate one away at ingest.

donto's genealogy consumer makes all three concrete and expensive. The live store carries hypotheses literally named genes-hardening-kitty-disambiguation-2026-05 (19 member symbols) and genes-hardening-rosie-disambiguation-2026-05 (14): real cases where a single surface name ("Kitty", "Rosie Rosie") fans into many candidate referents, the sources disagree, and the correct state is sustained ambiguity — at most resolved later, never by deletion. A merge-based ER here is not merely lossy; it would manufacture a false ancestor and an irreversible native-title error. (See Capability vs. exercise for how barely this machinery is currently used, and why that is a latent asset rather than dead weight.)

The claim of this paper is that all three costs vanish if identity is never a merge but always an edge with a verdict you can recompute — and that this is not an aspiration but the shape donto's tables already have.

2. Identity as a restriction map

2.1 The sheaf framing

Following the sheaf construction donto adopts (the bitemporal sheaf; literature in sheaf neural networks for donto), put a stalk F(a) on each entity: a vector space holding a's local representation (its embedding, or a low-dimensional projection; PRD-01 adopts donto_claim_embedding for the stalk). A candidate co-reference between entities a and b is an edge e = (a,b) carrying restriction maps R_{a⊴e} : F(a) → F(e) and R_{b⊴e} : F(b) → F(e) into a shared discourse space F(e). The edge's local obstruction is

‖ δx ‖_e  =  ‖ R_{a⊴e} x_a  −  R_{b⊴e} x_b ‖

and the question "do a and b glue (denote the same referent)?" becomes exactly the question "is H¹ along this edge trivial?" Low obstruction → the two views reconcile under the learned map → same referent; high obstruction → they cannot be glued → distinct referent. The verdict is read off geometry, not written into the graph. This is the PRD-03 §3.2 identity task verbatim: an orthogonal restriction map between the two entities' stalks ("different view, same magnitude"), with confidence a calibrated function of the obstruction norm.

Why a restriction map and not a similarity score? A scalar cosine throws away direction. Two attestations of "Kitty" can be near in embedding space yet refer to different women (a high-cosine distinct pair), while a person recorded once by maiden name and once by married name may be far in surface space yet glue under the right rotation. The restriction map is the learnable object that captures "these are the same entity seen through different descriptive frames" — a rotation/projection between views, not a distance. This is also why the no-brittle-logic canon applies cleanly: per PRD-03, the map's weights are seeded from donto_predicate_closure.confidence / donto_match_aligned and learned offline, never a hand-maintained alias table.
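The contrast between a scalar cosine and the edge obstruction can be made concrete with a toy computation. The sketch below is a minimal illustration only, assuming 2-dimensional stalks and hand-picked rotations standing in for the learned orthogonal maps. The helper names (`rotation`, `apply`, `obstruction`, `cosine`) and all example vectors are hypothetical and are not part of donto or PRD-03.

```python
import math

def rotation(theta):
    """2x2 rotation matrix: an orthogonal map, so it preserves vector norms."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(R, x):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return (R[0][0] * x[0] + R[0][1] * x[1],
            R[1][0] * x[0] + R[1][1] * x[1])

def obstruction(R_a, x_a, R_b, x_b):
    """Edge obstruction || R_a x_a - R_b x_b || in the shared edge space."""
    pa, pb = apply(R_a, x_a), apply(R_b, x_b)
    return math.hypot(pa[0] - pb[0], pa[1] - pb[1])

def cosine(x, y):
    """Plain cosine similarity between two 2-vectors."""
    dot = x[0] * y[0] + x[1] * y[1]
    return dot / (math.hypot(*x) * math.hypot(*y))

identity = rotation(0.0)

# Case 1 (hypothetical maiden-name / married-name pair): orthogonal in
# surface space, yet the views agree after a quarter-turn on b's side.
maiden, married = (1.0, 0.0), (0.0, 1.0)
cos_far = cosine(maiden, married)
obs_far = obstruction(identity, maiden, rotation(-math.pi / 2), married)

# Case 2 (hypothetical pair of "Kitty" attestations): nearly parallel in
# surface space, yet this edge's hand-picked map leaves a large obstruction.
kitty_1, kitty_2 = (1.0, 0.0), (0.98, 0.2)
cos_near = cosine(kitty_1, kitty_2)
obs_near = obstruction(identity, kitty_1, rotation(math.pi / 2), kitty_2)
```

In case 1 the cosine is zero but the obstruction vanishes up to floating-point error, so the pair would glue; in case 2 the cosine is close to 1 but the obstruction is well above 1, so it would not. The maps here are chosen by hand to make that point; whether a learned map behaves this way on real stalks is what Stage 1 would have to show.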

2.2 The verdict is reversible because it is recomputed, not stored

The decisive design move: the answer to "is a = b?" is never persisted as a mutation of a or b. It is recomputed from the standing edges. In donto this is literally true — donto_rebuild_identity_clusters(hypothesis_id), a live plpgsql function whose source we read, does on each call:

DELETE FROM donto_identity_cluster_cache WHERE hypothesis_id = … — throw the entire cached answer away;
union-find over the current (upper(tx_time) IS NULL) same_referent edges above the hypothesis's threshold_same, respecting distinct_referent edges as hard cannot-link blockers above 1 − threshold_distinct;
recompute each symbol's cluster_rep and a contested flag (set when a cannot-link edge falls inside a would-be cluster).

The cache is disposable. The source of truth is the set of bitemporal edges, and every edge can be retracted by closing its tx_time — never deleted (I3). So a wrong early link is a single reversible row, and un-resolution costs exactly one tx_time close + one cluster rebuild. There is nothing to "un-merge" because nothing was ever merged. (Note the rebuild reads only currently-believed edges; a retracted edge is excluded from the answer but stays on disk as queryable history — the as-of recovery in §4 depends on exactly this.)
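The rebuild steps and the disposability of the cache can be sketched as a pure function of the edge set. This is one plausible reading of the description above, not a transcription of the plpgsql source: it assumes a directly conflicting pair is never unioned, that a cannot-link joined only transitively flags the whole cluster, and that the smallest symbol id serves as `cluster_rep`. The `Edge` class and `rebuild_clusters` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Edge:
    left: str
    right: str
    relation: str                 # "same_referent" | "distinct_referent"
    confidence: float
    tx_end: Optional[int] = None  # None models upper(tx_time) IS NULL

def rebuild_clusters(edges, threshold_same, threshold_distinct):
    """Return {symbol: (cluster_rep, contested)} from currently-believed edges."""
    live = [e for e in edges if e.tx_end is None]
    parent = {}
    for e in live:
        parent.setdefault(e.left, e.left)
        parent.setdefault(e.right, e.right)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Cannot-link pairs strong enough to act as hard blockers.
    cannot = {frozenset((e.left, e.right)) for e in live
              if e.relation == "distinct_referent"
              and e.confidence >= 1 - threshold_distinct}

    contested = set()
    for e in live:
        if e.relation != "same_referent" or e.confidence < threshold_same:
            continue
        if frozenset((e.left, e.right)) in cannot:
            contested.update((e.left, e.right))  # direct conflict: do not union
            continue
        ra, rb = find(e.left), find(e.right)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)    # smallest id becomes the rep

    # A cannot-link whose endpoints were joined transitively flags the cluster.
    bad_roots = {find(a) for a, b in map(tuple, cannot) if find(a) == find(b)}
    return {s: (find(s), s in contested or find(s) in bad_roots) for s in parent}

edges = [
    Edge("A", "B", "same_referent", 0.9),
    Edge("B", "C", "same_referent", 0.7),
    Edge("A", "C", "distinct_referent", 0.9),
    Edge("C", "D", "same_referent", 0.99, tx_end=5),  # retracted: kept, ignored
]
broad = rebuild_clusters(edges, threshold_same=0.65, threshold_distinct=0.2)
strict = rebuild_clusters(edges, threshold_same=0.95, threshold_distinct=0.2)
```

Nothing is carried over between calls, so discarding a previous result and calling `rebuild_clusters` again on the same edges returns the same mapping. Under the looser threshold the three symbols form one contested cluster; under the stricter one they stay apart and uncontested. The retracted C-D edge stays in the list but does not influence either answer.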

3. What donto already has (this is assembly, not green-field)

The non-destructive ER skeleton is built and on disk. All figures verified against the live donto-pg (2026-06-13):

ER concept	donto object	live state
entity that is never fused	`donto_entity_symbol` (PK `symbol_id`, unique `iri`)	260 symbols, all `status='active'` (0 `merged`)
co-reference as a reversible edge	`donto_identity_edge` (`left_symbol_id < right_symbol_id`, `relation`, `confidence`, bitemporal `tx_time`)	169 edges
the relation vocabulary (incl. cannot-link)	enum `{same_referent, possibly_same_referent, distinct_referent, not_enough_information}`	111 / 16 / 41 / 1
competing resolutions coexisting	`donto_identity_hypothesis` (own `threshold_same` / `threshold_distinct`)	51 hypotheses (incl. the Kitty + Rosie disambiguations)
symbol→referent assignment under a hypothesis	`donto_identity_membership` (`posterior`)	123 rows
candidate links awaiting adjudication	`donto_identity_proposal` (`hypothesis_kind`, `status`, `evidence_anchor_ids`)	320 proposals (by kind: 237 `different_from`, 71 `merge_candidate`, 10 `same_as`, 1 `split_candidate`, 1 `alias_of`)
the recomputed answer (disposable cache)	`donto_identity_cluster_cache` (`cluster_rep`, `contested`, `contest_edge_ids`)	7,659 rows; 909 contested across 12 clusters
candidate generation at scale	`donto_entity_embedding` (bge-small, HNSW) + `donto_predicate_closure`	~3.16M entity vectors; ~1.05M closure rows
the rebuild operator	`donto_rebuild_identity_clusters` / `_all` (union-find + cannot-link)	live plpgsql
destructive merge forbidden	invariant I3, `0149_i3_statement_delete_guard`	enforced trigger on `donto_statement`; ER never touches it

Three properties fall out for free that a merge-based store cannot offer:

Multi-resolution. Two hypotheses can cluster the same symbols differently — a "lumper" (the live tournament hypothesis …/identity/broad, threshold_same = 0.65) and a "splitter" (…/identity/strict, 0.95) coexist as overlays. The 909 contested cache rows are donto holding the disagreement rather than resolving it — the paraconsistent stance applied to identity.
Bitemporal recovery. Because edges carry tx_time, you can ask "what did hypothesis H believe at transaction-time t?" and replay the resolution as it stood — the same as-of machinery the bitemporal sheaf uses. A merge has no such history.
Evidence-anchored proposals. donto_identity_proposal.evidence_anchor_ids ties every candidate link to source spans, so a glue verdict is auditable and a no-glue verdict (different_from) is first-class, not the silent default.
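The multi-resolution and as-of properties can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `tx_time` is modelled as a half-open integer interval `[tx_start, tx_end)`, the `Edge` class and the functions `believed_at` and `accepted_links` are hypothetical, and the symbols and confidences are invented for illustration. Only the two thresholds (0.65 and 0.95) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Edge:
    edge_id: int
    left: str
    right: str
    relation: str
    confidence: float
    tx_start: int                 # transaction time the edge was asserted
    tx_end: Optional[int] = None  # None = still believed

def believed_at(edges, t):
    """Edges whose half-open interval [tx_start, tx_end) contains t."""
    return [e for e in edges
            if e.tx_start <= t and (e.tx_end is None or t < e.tx_end)]

def accepted_links(edges, threshold_same, t):
    """same_referent pairs a hypothesis with this threshold accepted at time t."""
    return {(e.left, e.right) for e in believed_at(edges, t)
            if e.relation == "same_referent" and e.confidence >= threshold_same}

history = [
    Edge(1, "K_a", "K_b", "same_referent", 0.70, tx_start=1, tx_end=5),  # retracted at 5
    Edge(2, "K_a", "K_b", "distinct_referent", 0.90, tx_start=5),
    Edge(3, "K_c", "K_d", "same_referent", 0.97, tx_start=2),
]

broad_then = accepted_links(history, threshold_same=0.65, t=3)
strict_then = accepted_links(history, threshold_same=0.95, t=3)
broad_now = accepted_links(history, threshold_same=0.65, t=9)
```

At t=3 the lumper accepts both links while the splitter accepts only the 0.97 one; at t=9 the lumper no longer accepts the retracted K_a/K_b link, yet the row is still present in `history` and answers the earlier as-of question unchanged.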

The one thing not built is the learned restriction map of §2.1. Today's 169 edges are seeded almost entirely non-neurally — by human (115), test (51), live-test (1), and only 2 by embedding+llm_identity_adjudication. The verdict is currently a threshold on a stored confidence, not an H¹ read off a learned map over stalks. Closing that gap is PRD-03 Stage 1 — and the skeleton above is precisely the additive, I3-safe writeback surface that spec targets (the existing SQL donto_assert_identity plus a new client.assert_identity wrapper, symbol-keyed, method='neural').

4. The claim, sharpened: weak dominance on contested data

Conjecture. Let M be any merge-based ER policy and N the non-destructive edge-and-verdict policy of §2, given the same scoring model and the same evidence stream arriving in the same order. Then on a corpus with disputed identities, N weakly dominates M: (a) on easy instances (evidence never reverses) N and M reach identical clusterings, and hence equal precision and recall; (b) on hard instances (an early link is later contradicted, whether the link or the contradiction turns out to be the error), N recovers the correct post-reversal clustering while M cannot, because M destroyed the information the recovery requires.

The mechanism of (b) is exact and is the heart of the paper. Consider the canonical contested case, instantiated from the live Kitty disambiguation:

x₁: weak evidence links symbols K_a ("Kitty", Yarrabah 1898 baptism) and K_b ("Kitty", 1907 marriage cert, "spinster"). A same_referent edge is asserted, confidence 0.6. Both policies now cluster {K_a, K_b}.
x₂: the "spinster" attestation surfaces as an anti-merge signal (a never-married woman cannot be the earlier married Kitty). Under N: assert a distinct_referent edge with confidence 0.9; the next donto_rebuild_identity_clusters call sees a cannot-link inside the cluster, splits K_a from K_b, and flags the residue contested. The same_referent edge is retracted (tx_time closed) but retained, so the history "these were once thought one" stays queryable. Under M: K_b was fused into K_a at x₁; the rows are gone; the split is impossible without re-ingesting the source.
x₃ (reversal-of-the-reversal): a parish register corroborates that the "spinster" entry was a clerical error. Under N: close the distinct_referent edge, re-open same_referent, rebuild — one transaction, fully recovered. Under M: there is no K_b to re-link.

N's worst case is M's only case. That asymmetry is what "weak dominance" names, and the recoverable path is forced by I3: the destructive merge is unavailable, so the reversible representation is the only one the substrate permits.
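The x₁ to x₃ timeline can be replayed in a toy model of both policies. The sketch assumes a two-symbol world, ignores hypothesis thresholds, and treats "re-open" as asserting a fresh same_referent row while the closed one stays as history. `EdgeStore`, `MergeStore`, and their methods are hypothetical names, not donto APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    left: str
    right: str
    relation: str                 # "same_referent" | "distinct_referent"
    confidence: float
    tx_end: Optional[int] = None  # None = currently believed

class EdgeStore:
    """Policy N: append edges, retract by closing tx_end, never delete."""
    def __init__(self):
        self.edges = []

    def assert_edge(self, left, right, relation, confidence):
        self.edges.append(Edge(left, right, relation, confidence))

    def retract(self, left, right, relation, now):
        for e in self.edges:
            if (e.left, e.right, e.relation) == (left, right, relation) and e.tx_end is None:
                e.tx_end = now

    def glued(self, left, right):
        """Two-symbol verdict: a live same edge and no live cannot-link."""
        live = {e.relation for e in self.edges
                if (e.left, e.right) == (left, right) and e.tx_end is None}
        return "same_referent" in live and "distinct_referent" not in live

class MergeStore:
    """Policy M: a merge drops the duplicate row for good."""
    def __init__(self, symbols):
        self.rows = set(symbols)

    def merge(self, survivor, duplicate):
        self.rows.discard(duplicate)

n, m = EdgeStore(), MergeStore({"K_a", "K_b"})
trajectory = []

# x1: weak evidence links the two attestations.
n.assert_edge("K_a", "K_b", "same_referent", 0.6)
m.merge("K_a", "K_b")
trajectory.append(n.glued("K_a", "K_b"))

# x2: the "spinster" signal arrives.
n.assert_edge("K_a", "K_b", "distinct_referent", 0.9)
n.retract("K_a", "K_b", "same_referent", now=2)
trajectory.append(n.glued("K_a", "K_b"))

# x3: the signal is found to be a clerical error.
n.retract("K_a", "K_b", "distinct_referent", now=3)
n.assert_edge("K_a", "K_b", "same_referent", 0.6)
trajectory.append(n.glued("K_a", "K_b"))

m_can_split = "K_b" in m.rows
```

Under N the verdict moves glued, split, glued, and all three edge rows remain on record, two of them closed. Under M the second symbol no longer exists after x₁, so neither the split at x₂ nor the re-link at x₃ has an operand.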

Why only donto can pose this. The conjecture needs three things simultaneously: (1) entities stored as symbols that are never fused (donto_entity_symbol) so an un-resolution has operands; (2) a cannot-link relation as first-class as same (distinct_referent) so a reversal is representable, not just an absence; (3) a no-delete invariant (I3) so the retracted edge survives as history. A vector DB (dedups), a normal KG (invalidate-on-conflict), or a record-linkage pipeline (fuses) each lacks at least one. donto has all three already wired — that is the entire reason the claim is ours to make.

5. The experiment that would establish it

The claim is conjectured, not measured — the honest center of this paper. What would settle it:

A disputed-identity benchmark. The blocker is data. Standard ER benchmarks (the Magellan/DeepMatcher suites; Music/Citation/Product pairs) are static — they give a fixed gold "same/different" per pair and reward a one-shot decision, which a merge handles fine. They do not test reversal, so they cannot distinguish N from M. We need a benchmark whose instances carry an evidence timeline with at least one reversal, and a gold trajectory (the correct clustering after each evidence event), not a single gold label.

The genealogy corpus is the natural source — donto already holds the Kitty and Rosie disambiguations as bitemporally-staged edges with dated source spans (the genes consumer's Kitty dossier documents ~16 distinct candidate "Kittys"; the live hypothesis carries 19 member symbols). The construction:

step	mechanism	metric
build timelined instances	for each contested name, order the dated attestations; mark the human-adjudicated reversal points	a set of instances, each a `(symbol set, ordered evidence, gold trajectory)`
run `M` (merge baseline)	greedy fuse on `confidence ≥ threshold_same`, no un-merge	clustering after final event
run `N` (donto)	assert/retract edges in timeline order; `donto_rebuild_identity_clusters` after each	clustering after each event
easy-case parity	instances with no reversal	`N` cluster == `M` cluster (must be exact; falsifies the framing if not)
hard-case recovery	instances with ≥1 reversal	fraction where `N`'s post-reversal clustering matches gold and `M`'s does not — the dominance metric
calibration	does the learned-map `H¹` (Stage 1) → confidence track the gold glue/no-glue better than the stored-confidence threshold?	AUC / ECE vs. the `human`/`registry_match` baseline
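The easy-case parity check and the hard-case dominance metric from the table can be written down directly. The sketch below assumes clusterings are compared as sets of sets and that "post-reversal" means the clustering after the final event. The `Instance` class and both function names are hypothetical, and the three instances are synthetic placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

Clustering = FrozenSet[FrozenSet[str]]

def clustering(*groups) -> Clustering:
    """Order-independent representation of a partition of symbols."""
    return frozenset(frozenset(g) for g in groups)

@dataclass
class Instance:
    gold: List[Clustering]    # gold clustering after each evidence event
    n_traj: List[Clustering]  # N's clustering after each event
    m_final: Clustering       # M's clustering after the final event
    reversals: int            # number of adjudicated reversal points

def easy_case_parity(instances) -> bool:
    """True iff N and M end identically on every instance with no reversal."""
    return all(i.n_traj[-1] == i.m_final for i in instances if i.reversals == 0)

def hard_case_dominance(instances) -> Optional[float]:
    """Fraction of reversal instances where N ends on gold and M does not."""
    hard = [i for i in instances if i.reversals >= 1]
    if not hard:
        return None
    wins = sum(i.n_traj[-1] == i.gold[-1] and i.m_final != i.gold[-1] for i in hard)
    return wins / len(hard)

together = clustering({"K_a", "K_b"})
apart = clustering({"K_a"}, {"K_b"})

# Synthetic placeholder instances: one easy, one single reversal, one double reversal.
toy = [
    Instance(gold=[together], n_traj=[together], m_final=together, reversals=0),
    Instance(gold=[together, apart], n_traj=[together, apart],
             m_final=together, reversals=1),
    Instance(gold=[together, apart, together], n_traj=[together, apart, together],
             m_final=together, reversals=2),
]

parity = easy_case_parity(toy)
dominance = hard_case_dominance(toy)
```

On these placeholders parity holds and the dominance fraction is 0.5: the single-reversal instance is a win for N, while the double-reversal instance is a tie on the final clustering because the merge happens to coincide with gold. That is one reason a final-state metric alone may undercount N, and why the gold trajectory is kept per event.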

The Stage-1 hard gate (PRD-03 §0) requires the read-only H¹ analytic to first recover the Case-B Kitty cycle — the daughter-of / spouse-of / disjointness loop whose obstruction is a non-trivial 1-cocycle that pairwise donto_argument edges cannot see — on a real subgraph. That gate is itself a piece of this benchmark: a cycle-contradiction is a multi-entity identity dispute that no pairwise merge decision can resolve, which is why the cohomological reading (cycle contradictions) and the identity reading share an operator.

What would falsify the conjecture. Easy-case non-parity (if N and M disagree on instances with no reversal, the edge representation is leaking and the framing is wrong); or hard-case non-recovery (if N's rebuild does not return to gold after a reversal — e.g. because union-find under cannot-link gets stuck in a contested-but-unresolved state). Both are measurable on the benchmark above. A sharp honest negative — "non-destructive ER recovers, but the contested-limbo rate is high enough that users prefer a confident wrong merge" — would be a real result.

6. Status: proven / conjectured / speculative

Proven / built (solid). The non-destructive representation is live and verified: symbols are never fused (260 distinct, 0 merged), co-reference is reversible bitemporal edges (169) with a first-class cannot-link relation (41 distinct_referent), competing hypotheses coexist (51), and the clustering is a union-find cache rebuilt from edges (donto_rebuild_identity_clusters; 7,659 cache rows, 909 contested across 12 clusters) — delete the cache and the answer is unchanged. I3 + migration 0149 forbid the merge mechanically. That ER can be done without merging, on a ~42M-statement store, is demonstrated, not conjectured.
Conjectured (open, named failure mode). The §4 weak-dominance claim. The recovery mechanism is exact (a reversal is one edge-close + one rebuild, traced end-to-end above), but the aggregate dominance is unmeasured — there is no disputed-identity benchmark yet, only the hand-staged Kitty/Rosie cases. Failure mode to watch: union-find under cannot-link can leave symbols in a sustained contested limbo (the live store already shows 909 such rows), and if that rate is high, "never wrong but often undecided" may be operationally worse than "occasionally wrong but always decided" for some consumers. This is exactly what the §5 benchmark is built to measure, and it could come back negative.
Speculative (flagged). That the learned restriction-map verdict (Stage-1 H¹ read) beats the current stored-confidence threshold on calibration. Today only 2 of 169 edges come from embedding+llm_identity_adjudication; the learned-map path is specified (PRD-03 §3.2) but unbuilt, gated behind Stage-0 Case-B recovery. Until that gate is met, the geometric verdict of §2.1 is a design, not a measurement — we flag it as such and do not claim the calibration win.

The contribution is not a leaderboard number; it is a representation result with a forced corollary: if identity is an edge with a recomputed verdict and the store never deletes, then entity resolution becomes reversible, and on contested data reversibility is more valuable than the cleanliness a merge buys — at least until the contested-limbo rate proves otherwise. donto is the store where that "if" is already true.

See also: the contradiction operator this verdict reads, The Bitemporal Sheaf; the multi-entity loops a pairwise merge cannot resolve, Cycle contradictions; the math + Stage-1 identity task in sheaf neural networks for donto; how barely the identity machinery is currently exercised, capability vs. exercise; the full program in the donto research agenda.