The substrate
contradiction-preserving · evidence-first
tutorial · live data

How donto works, one document at a time

donto is a bitemporal, paraconsistent, evidence-first claim substrate. Generating typed knowledge used to be the scarce step; now an LLM emits an unbounded firehose of claims for ~$0.0001 each. donto's job is to hold that firehose without throwing most of it away — keeping contradictions, anchoring every claim to its source, and re-ranking by reality over time. Below, we follow a single source document through all ten stages, end to end, with nothing skipped.

42.4M believed statements
1.1M minted predicates
3.2M evidence links
78.0K contexts
00

Source document in → answer with provenance out

the pipeline
01

Ingest the source

blob · document · revision

A source document arrives. donto content-addresses it (SHA-256), stores the bytes as a GCS-backed blob, and records a document + revision. Nothing is summarised-and-discarded — the raw resource is kept so every fact can point back to it.

source document → SHA-256 blob → document → revision

What ingest stores

stored object    what it is
blob             content-addressed bytes, deduped, GCS-backed
document         the resource (searchable body + metadata)
revision         a point-in-time version of the document
evidence_link    later: fact → span inside this revision

Rule: never summarise-and-discard. An un-stored source is a bug.
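As a rough illustration of content-addressed ingest, the sketch below keys each blob by the SHA-256 of its bytes and records a revision that points at it. `BlobStore`, `Revision`, `ingest`, the document IDs and the sample sentence are hypothetical names invented for this example; an in-memory dict stands in for the GCS-backed store.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BlobStore:
    """In-memory stand-in for the GCS-backed blob store (hypothetical)."""
    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        # The SHA-256 digest of the bytes is the blob's address, so
        # identical content is stored once however often it arrives.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        return digest


@dataclass(frozen=True)
class Revision:
    document_id: str
    blob_sha256: str
    recorded_at: datetime


def ingest(store: BlobStore, document_id: str, data: bytes) -> Revision:
    """Store the raw bytes, then record a point-in-time revision."""
    digest = store.put(data)
    return Revision(document_id, digest, datetime.now(timezone.utc))


store = BlobStore()
first = ingest(store, "doc:letter-1", b"Tommy was removed to Yarrabah.")
second = ingest(store, "doc:letter-1-copy", b"Tommy was removed to Yarrabah.")
print(first.blob_sha256 == second.blob_sha256, len(store.blobs))
```

Because the address is derived from the content, deduplication needs no separate lookup: the second ingest reuses the first blob.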

02

Extract free claims

generative abundance

A guided LLM reads the source and emits an unbounded, multi-directional space of subject–predicate–object claims, inventing predicates as it goes. A gleaning loop pushes for coverage, not a count. Typing and joining are deferred — emit free now.
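One way to picture a gleaning loop that stops on coverage rather than on a count: keep asking for more claims until a pass yields nothing new. This is only a sketch. `glean`, `stub_extract` and the parameters are hypothetical, the stub stands in for the LLM call, and `max_passes` is a safety cap added for the example, not something described by donto.

```python
from typing import Callable, Iterable

Claim = tuple[str, str, str]  # (subject, predicate, object)


def glean(
    source: str,
    extract: Callable[[str, set[Claim]], Iterable[Claim]],
    max_dry_passes: int = 1,
    max_passes: int = 20,
) -> set[Claim]:
    """Keep asking for more claims until passes stop yielding new ones."""
    seen: set[Claim] = set()
    dry = 0
    for _ in range(max_passes):
        new = set(extract(source, seen)) - seen
        if new:
            seen |= new
            dry = 0
        else:
            dry += 1
            if dry >= max_dry_passes:
                break
    return seen


def stub_extract(source: str, seen: set[Claim]) -> list[Claim]:
    # Stand-in for the LLM: reveals at most two unseen claims per pass.
    pool = [
        ("ex:tommy", "ex:knownAs", "Tommy"),
        ("ex:tommy", "removedTo", "ex:yarrabah"),
        ("ex:tommy", "hasRemovalOrderTo", "ex:yarrabah"),
    ]
    return [c for c in pool if c not in seen][:2]


claims = glean("...source text...", stub_extract)
print(len(claims))
```

The predicates are emitted as plain strings with no schema check, which mirrors the "emit free now" rule: typing and joining are left for later stages.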

freely-minted predicates · live

predicate                                      claims
ex:knownAs                                     1.1M
mem:episodic/chunk                             486K
ex:knownAtLocation                             331K
ex:normalized_claims/predicate_canonical       278K
ex:datePrecision                               277K
ex:whenText                                    245K

1,084,128 distinct predicates — the signature of abundance, not a schema to maintain.

03

Anchor the evidence

always-on citer

A separate, always-on citer attaches each claim to an evidence span in the source — or honestly flags it as an unanchorable hypothesis (never a bogus span). Extraction says WHAT; anchoring says WHERE. This is what makes donto evidence-first.

7.58% anchor coverage

Every anchored claim carries an evidence span; the rest are honestly flagged as hypotheses — never given a bogus span. 3.2M evidence links and counting.
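The anchoring rule can be sketched as "attach a span only when the supporting quote is really in the source, otherwise flag the claim". The names below (`Anchor`, `AnchoredClaim`, `anchor_claim`) and the sample text are hypothetical, and exact substring matching is a simplifying assumption; the real citer may match more loosely. The last lines check that the 7.58% coverage figure is consistent with evidence links divided by believed statements, using the live counts quoted on this page. Whether donto computes coverage exactly that way is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Anchor:
    start: int  # character offset of the span in the revision text
    end: int


@dataclass(frozen=True)
class AnchoredClaim:
    claim: str
    anchor: Optional[Anchor]  # None means "unanchored hypothesis"


def anchor_claim(claim: str, quote: str, source_text: str) -> AnchoredClaim:
    """Attach a span only if the supporting quote really occurs in the source."""
    start = source_text.find(quote) if quote else -1
    if start < 0:
        return AnchoredClaim(claim, None)  # flag it; never invent a span
    return AnchoredClaim(claim, Anchor(start, start + len(quote)))


text = "An order was made. Tommy was removed to Yarrabah."
hit = anchor_claim("ex:tommy removedTo ex:yarrabah", "removed to Yarrabah", text)
miss = anchor_claim("ex:tommy ex:knownAs Tom", "also known as Tom", text)

# Coverage as a plain ratio of the live figures quoted on this page.
coverage = 3_217_207 / 42_430_187
print(hit.anchor, miss.anchor, f"{coverage:.2%}")
```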

claim → evidence span → source text
04

Hold it — paraconsistently

bitemporal · I3

Claims are written to donto_statement: bitemporal (valid-time + transaction-time) and paraconsistent. Contradictions are legal state held forever — never overwritten (invariant I3: retract/supersede, never DELETE). A vector DB collapses; donto holds.

valid time: when the claim is true in the world
transaction time: when donto came to believe it

belief A · kept: ex:tommy removedTo ex:yarrabah
belief B · kept: ex:tommy hasRemovalOrderTo ex:yarrabah

Both beliefs are kept. 6,487,078 contested subject–predicate pairs live side by side — invariant I3: retract or supersede, never DELETE.
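A minimal sketch of what "retract, never DELETE" can look like on a bitemporal row: each statement carries a valid-time interval and a transaction-time interval, and retraction closes the transaction-time interval rather than removing the row. `Statement` and `StatementLog` are hypothetical illustrations, not the actual donto_statement schema.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[datetime]    # valid time: when it holds in the world
    valid_to: Optional[datetime]
    tx_from: datetime                 # transaction time: when it was believed
    tx_to: Optional[datetime] = None  # None = still believed


class StatementLog:
    """Append-only log: rows are closed, never removed."""

    def __init__(self) -> None:
        self.rows: list[Statement] = []

    def assert_(self, s: str, p: str, o: str,
                valid_from: Optional[datetime] = None,
                valid_to: Optional[datetime] = None) -> Statement:
        row = Statement(s, p, o, valid_from, valid_to, datetime.now(timezone.utc))
        self.rows.append(row)
        return row

    def retract(self, row: Statement) -> None:
        # Close the transaction-time interval instead of deleting the row.
        i = self.rows.index(row)
        self.rows[i] = replace(row, tx_to=datetime.now(timezone.utc))

    def believed(self) -> list[Statement]:
        return [r for r in self.rows if r.tx_to is None]


log = StatementLog()
a = log.assert_("ex:tommy", "removedTo", "ex:yarrabah")
b = log.assert_("ex:tommy", "hasRemovalOrderTo", "ex:yarrabah")
held_before = len(log.believed())  # both beliefs are held side by side
log.retract(a)
print(held_before, len(log.believed()), len(log.rows))
```

After the retraction the log still has two rows; only the set of currently believed statements shrinks, so history stays queryable.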

05

Embed into the fabric

bge-small · 384d

Claims, predicates and entities are embedded (bge-small, 384-dim, HNSW) by a distributed coordinator that hands out disjoint work, so CPU and GPU workers scale linearly and never double-embed. This powers semantic recall and alignment.

embedding fabric · bge-small 384d

claim: 1.3M / 1.3M
entity: 3.2M / 3.2M
predicate: 1.0M / 1.0M
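The "disjoint work" idea can be sketched with a coordinator that leases non-overlapping batches, so no item is ever handed to two workers. In this example threads in one process stand in for distributed CPU and GPU workers, and `Coordinator`, `lease` and `worker` are hypothetical names. How donto's coordinator actually leases work is not described here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Coordinator:
    """Hands out non-overlapping batches of IDs; each ID is leased exactly once."""

    def __init__(self, ids: list[int], batch_size: int) -> None:
        self._ids = ids
        self._batch_size = batch_size
        self._cursor = 0
        self._lock = threading.Lock()

    def lease(self) -> list[int]:
        # The lock makes "read cursor, advance cursor" atomic, so two
        # workers can never receive the same slice.
        with self._lock:
            batch = self._ids[self._cursor:self._cursor + self._batch_size]
            self._cursor += len(batch)
            return batch


def worker(coord: Coordinator) -> list[int]:
    done: list[int] = []
    while batch := coord.lease():
        done.extend(batch)  # a real worker would embed the batch here
    return done


coord = Coordinator(list(range(1000)), batch_size=64)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, [coord] * 4))

embedded = [i for r in results for i in r]
print(len(embedded), len(set(embedded)))
```

Every ID appears exactly once across all workers, which is the property that lets worker count scale without double-embedding.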


06

Align at query time

closure · identity-as-hypothesis

Freely-minted predicates are folded by similarity (donto_predicate_closure): occupation ↔ currentJob become comparable without a hand-kept synonym table. Entity identity is a hypothesis resolved at query time, not a destructive merge at write time.

occupation  ─┐
             ├─ closure fold (cos ≥ τ) ─▶ one fact
currentJob  ─┘     no synonym table — learned

ex:coen  ≈  ex:coen-qld     identity = hypothesis,
ex:coen  ≠  ex:mcivor       resolved at query time

Predicates are folded by similarity (donto_predicate_closure); entity identity stays a hypothesis until you ask — never a destructive write-time merge.
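To make "closure fold (cos ≥ τ)" concrete, the sketch below links any two predicates whose embeddings have cosine similarity at or above a threshold and then takes the transitive closure with union-find. The toy 3-d vectors, the threshold value and the choice of transitive grouping are assumptions for illustration; this is not the donto_predicate_closure implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fold(vectors: dict[str, list[float]], tau: float) -> list[set[str]]:
    """Group predicates linked by cosine >= tau, transitively (union-find)."""
    parent = {name: name for name in vectors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = list(vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= tau:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy 3-d vectors; the embeddings described on this page are 384-d.
toy = {
    "occupation": [0.9, 0.1, 0.0],
    "currentJob": [0.8, 0.2, 0.1],
    "removedTo": [0.0, 0.1, 0.9],
}
folded = fold(toy, tau=0.9)
print(folded)
```

The fold only produces groups for comparison at query time; nothing in it rewrites or merges the stored predicates.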

07

Measure contradiction

H⁰ / H¹ · cocycles

A cellular sheaf reads the argument, incompatibility and identity layers and computes H⁰ (the consistent core) and H¹ (irreducible disagreement), including loop conflicts (cocycles) that pairwise edges cannot see. The resulting contradiction pressure feeds into each claim's standing.

a real loop conflict · H¹ ≠ 0
ex:tommy ──hasRemovalOrderTo──▶ ex:yarrabah
    │                                  ▲
    └──────────── removedTo ───────────┘
loop residual ‖δx‖ = 2√2 ≈ 2.828  (cocycle)

Two claims about ex:tommy → ex:yarrabah are each locally fine but disagree around the loop — a contradiction the pairwise argument layer cannot see. Total contradiction pressure across the substrate: 8.0M.
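A toy version of a loop conflict, under strong simplifying assumptions: one real number per node, and each edge of a single cycle asserting a fixed difference between its endpoints. Any one edge can always be satisfied, but differences around a closed loop must sum to zero, so the leftover is what no assignment can explain. This is not donto's sheaf and does not reproduce the 2√2 figure above; `loop_residual` is a hypothetical helper.

```python
import math


def loop_residual(offsets: list[float]) -> float:
    """Least-squares residual of x[next] - x[cur] = offset around one cycle.

    The achievable edge-difference vectors on a cycle are exactly those
    summing to zero, so the unexplained part of `offsets` is its component
    along the all-ones direction, with norm |sum(offsets)| / sqrt(n).
    """
    n = len(offsets)
    return abs(sum(offsets)) / math.sqrt(n)


consistent = [1.0, 1.0, -2.0]  # sums to zero: some assignment fits every edge
conflicted = [1.0, 1.0, 1.0]   # each edge fine alone, impossible together

print(loop_residual(consistent), loop_residual(conflicted))
```

The conflicted loop has no single offending edge, which is why a check that looks at one pair at a time reports nothing wrong.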

08

Rank by reality

⟨maturity, corroboration, pressure, recency⟩

Every claim carries a standing vector. donto re-ranks by reality over time instead of deleting on conflict — corroborated, low-pressure, recent claims rise; contested ones are surfaced, not silently dropped.

the standing vector (scale 0 to 100)

maturity                  82
corroboration             64
contradiction pressure    28
recency                   71

donto re-ranks by reality over time instead of deleting on conflict. Contested claims are surfaced, not silently dropped.
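One possible way to turn a standing vector into an ordering is a weighted blend in which contradiction pressure counts against a claim. Collapsing the vector to a single score, the weights, the pressure threshold and the second claim's numbers are all assumptions made up for this sketch; only the first vector (82, 64, 28, 71) comes from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standing:
    maturity: float       # all four components on a 0 to 100 scale
    corroboration: float
    pressure: float       # contradiction pressure: higher is worse
    recency: float


def score(s: Standing, w: tuple[float, float, float, float] = (0.25, 0.35, 0.25, 0.15)) -> float:
    """Weighted blend; pressure is inverted. Weights are illustrative only."""
    return (w[0] * s.maturity + w[1] * s.corroboration
            + w[2] * (100 - s.pressure) + w[3] * s.recency)


claims = {
    "belief A": Standing(82, 64, 28, 71),  # the example vector from this page
    "belief B": Standing(40, 20, 75, 90),  # made-up contested claim
}
ranked = sorted(claims, key=lambda k: score(claims[k]), reverse=True)
# Nothing is dropped: contested claims stay in the list, lower and tagged.
contested = [k for k in claims if claims[k].pressure >= 50]
print(ranked, contested)
```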

09

Recall + reconcile

FTS + vector + RRF

A question triggers hybrid recall (lexical FTS + semantic vector, fused with RRF). The results are closure-folded, then reconciled by a query-time sheaf pass that tags contested claims. The returned bundle is token-efficient, abstention-aware, and scoped to the asker.

question → FTS + vector ANN → RRF fuse → sheaf reconcile → bundle

Hybrid recall arms

arm            catches
lexical FTS    exact terms, names, IDs
vector ANN     meaning: paraphrases that lexical search misses
RRF            fuses both into one ranking
reconcile      tags contested claims, re-rank-neutral
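Reciprocal Rank Fusion itself is simple enough to show in full: each result scores 1 / (k + rank) in every list it appears in, and the scores are summed. The sketch below uses k = 60, the conventional constant for RRF; the value donto uses, the claim IDs and the `contested` set are assumptions. The reconcile step is modelled as tagging only, leaving the fused order untouched.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


fts_hits = ["c1", "c2", "c3"]     # lexical arm
vector_hits = ["c3", "c1", "c4"]  # semantic arm
fused = rrf([fts_hits, vector_hits])

contested = {"c3"}  # tagged by the reconcile pass; order is unchanged
bundle = [(cid, round(s, 5), cid in contested) for cid, s in fused]
print(bundle)
```

Because RRF uses ranks rather than raw scores, the lexical and vector arms need no score normalisation before they are combined.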
10

Return with provenance

fact → span → blob

The answer comes back WITH its trail: each fact links to its evidence span, the span to its revision, the revision to the original blob — plus contradiction tags. Fully auditable: you can always get back to the source sentence.

where it all lives · contexts by domain family

family          contexts
claims          4,624
books           363
budget-smoke    7
anchor-test     3
cross           2
agent           1

fact → evidence span → revision → blob

The answer returns with its full trail — you can always get back to the source sentence.
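The trail can be pictured as a chain of references that is always walkable back to bytes. In this sketch the record types (`Fact`, `Span`, `Revision`, `Blob`), the byte-offset convention, the revision ID and the sample sentence are hypothetical; the point is only that resolving fact → span → revision → blob needs no lookup that could fail for an anchored fact.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    sha256: str
    data: bytes


@dataclass(frozen=True)
class Revision:
    revision_id: str
    blob: Blob


@dataclass(frozen=True)
class Span:
    revision: Revision
    start: int  # byte offsets into the revision's blob
    end: int


@dataclass(frozen=True)
class Fact:
    statement: str
    span: Span
    contested: bool = False


def source_sentence(fact: Fact) -> str:
    """Walk fact -> span -> revision -> blob and return the cited text."""
    blob = fact.span.revision.blob
    return blob.data[fact.span.start:fact.span.end].decode("utf-8")


raw = b"An order was made. Tommy was removed to Yarrabah."
blob = Blob(hashlib.sha256(raw).hexdigest(), raw)
span = Span(Revision("rev:1", blob), raw.index(b"Tommy"), len(raw))
fact = Fact("ex:tommy removedTo ex:yarrabah", span, contested=True)
print(source_sentence(fact), fact.contested)
```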

11

The live substrate, right now

not claimed — measured
Live substrate metrics

metric                                value         what it means
currently believed statements         42,430,187    live claim state
distinct predicates                   1,084,128     freely minted
distinct contexts                     77,989        scopes / provenance
evidence links                        3,217,207     fact → span
contested subject–predicate pairs     6,487,078     held, not resolved
total contradiction pressure          7,999,558     Σ ‖δx‖
argument edges                        3,085         typed support / attack
identity hypotheses                   169           same-referent, query-time
consumer namespaces                   353           domains sharing one instance
retracted statements                  365           superseded, never deleted
abundance — emit free, defer the rest
claim volume — grows in all directions

Claim volume grows in all directions; typing & joining happen at query time.

Live from dontosrv /discovery — refreshed every few minutes.

See it for yourself.
donto is one substrate behind memory, genealogy and more. Talk to it through the agent-memory API, the MCP server, or read how it's built.