Future substrate report — what donto is still missing
Living document. 2026-06-12. A century-scale substrate gap report: what donto already has, what is still missing, and what has to become true before AI systems can treat donto-like memory as durable infrastructure. For the full reports index, see Reports.
donto now has a credible kernel. It is not yet a civilization substrate.
The kernel is real: append-only claims, bitemporality, paraconsistency, evidence links, policy, lenses, standing, loss reports, release artifacts, and agent discovery. The missing work is larger than another feature wave. donto needs the identity, source, semantic, cryptographic, operational, governance, economic, and human machinery that lets memory survive many models, many institutions, and many generations.
TL;DR
- The current win: donto has the right memory discipline. Claims are evidence-linked, contradiction-tolerant, time-aware, and lens-scoped, and exports report their loss instead of pretending that every view is complete.
- The current gap: the system is still mostly a strong local kernel. It does not yet have universal identity, deep source packages, federation, cryptographic proof chains, governance protocols, scale architecture, or mature human review surfaces.
- The real target: not one world database. The target is an interoperable protocol where many stores can exchange claims, evidence, policies, contradictions, standing, and loss without flattening the context.
- The next move: make the kernel boring, then federate it, then govern it. A memory substrate only matters if outsiders can verify it, communities can constrain it, and future systems can migrate it.
donto should not become "the memory all AI queries." It should become the discipline by which AI memory stays attributable, contestable, policy-aware, and reconstructable.
Century-substrate scorecard
| layer | donto has now | missing before century-scale trust | next build |
|---|---|---|---|
| Claims | bitemporal, paraconsistent statements | richer modal, causal, and domain frames | domain packages + causal/event schemas |
| Evidence | spans, revisions, blobs, evidence links | media-native anchors and source packages | source package manifest + multimodal anchors |
| Identity | hypotheses, clusters, cannot-links | portable identity protocol with reversible merges | signed identity assertions + split/merge ledger |
| Semantics | free predicates + alignment closure | versioned predicate definitions and semantic diffs | predicate registry + compatibility reports |
| Policy | capsules, attestations, restrictiveness order | consent, licensing, jurisdiction, appeal workflows | policy simulation + consent receipts |
| Trust | signed release manifests | signed claims, reviews, runs, and query-result proofs | Merkle release/query proofs |
| Federation | release artifacts, interop exports | cross-instance sync with policy and loss preserved | signed exchange envelopes |
| APIs | HTTP, SDKs, CLI, DontoQL, SPARQL subset, A2A | stable /v1, streaming, subscriptions, conformance | versioned API namespace |
| Scale | Postgres-native practical kernel | partitioning, cold storage, distributed query, mirrors | 10M/1B benchmark tracks |
| Humans | TUI, docs, reports | review workbenches, contradiction dashboards, governance UI | first review surface |
1. What donto already has
The completed execution work turned many implicit conventions into durable substrate behavior:
| capability | why it matters |
|---|---|
| Append-first statement safety | memory can be corrected without erasing what was believed before |
| Bitemporal statements | donto can distinguish when a fact was true from when donto learned it |
| Paraconsistent storage | disagreement stays queryable instead of being collapsed into a fake winner |
| Evidence links | claims can point back to source spans, revisions, and blobs |
| Lenses | different users/tasks can ask for different views without mutating the substrate |
| Standing | claims can be ranked by maturity, corroboration, pressure, and policy context |
| Loss reports | exports and adapters state what they failed to carry |
| Release artifacts | datasets can be packaged, checked, and published reproducibly |
| Agent discovery | agents can find capabilities instead of hard-coding private assumptions |
That is a serious kernel. The rest of this report is the gap between a kernel and infrastructure.
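To make the first three rows concrete, here is a minimal sketch of an append-only, bitemporal, contradiction-tolerant claim log. It is not donto's schema: the `Claim` fields, the `ClaimLog` class, the `ex:` identifiers, and the dates are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Claim:
    # Hypothetical fields; donto's real statement shape may differ.
    subject: str
    predicate: str
    obj: str
    source: str
    valid_from: date           # when the claim is asserted to hold in the world
    valid_to: Optional[date]   # None means "no stated end"
    recorded_on: date          # when the store learned the claim


class ClaimLog:
    """Append-only: claims are added, never updated or removed."""

    def __init__(self):
        self._claims = []

    def append(self, claim):
        self._claims.append(claim)

    def as_of(self, subject, predicate, valid_on, known_by):
        """Every claim that covers valid_on and was recorded on or before known_by.

        Conflicting claims are all returned; nothing picks a winner here.
        """
        return [
            c for c in self._claims
            if c.subject == subject
            and c.predicate == predicate
            and c.recorded_on <= known_by
            and c.valid_from <= valid_on
            and (c.valid_to is None or valid_on < c.valid_to)
        ]


log = ClaimLog()
log.append(Claim("ex:bridge-17", "ex:builtIn", "1902", "ex:register",
                 date(1902, 1, 1), None, date(2024, 1, 5)))
log.append(Claim("ex:bridge-17", "ex:builtIn", "1904", "ex:newspaper",
                 date(1902, 1, 1), None, date(2025, 3, 1)))

# Same question about the world, asked against two states of knowledge.
early = log.as_of("ex:bridge-17", "ex:builtIn", date(1950, 1, 1), known_by=date(2024, 6, 1))
late = log.as_of("ex:bridge-17", "ex:builtIn", date(1950, 1, 1), known_by=date(2025, 6, 1))
print([c.obj for c in early])
print([c.obj for c in late])
```

The second query returns both objects. The later claim does not overwrite the earlier one, and a reader can still reconstruct what the store believed in mid-2024.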
2. The target shape
A century-scale AI memory substrate has to satisfy several properties at once:
- It remembers without pretending memory is truth.
- It preserves disagreement without turning disagreement into chaos.
- It lets many agents contribute without giving any agent total authority.
- It supports private, community-governed, public, commercial, and scientific knowledge under different rules.
- It lets future agents reconstruct why a belief existed, which evidence supported it, which policies constrained it, and what later work changed it.
- It survives software rewrites, hardware generations, organization collapse, legal changes, and model-family turnover.
- It makes extraction, inference, alignment, ranking, and forgetting accountable.
The shape is therefore a protocol, not a single database:
source bytes
-> anchored observations
-> claims + time + policy
-> contradiction-aware lenses
-> standing + loss reports
-> signed releases / query results
-> agents that know what they may use, quote, train on, or publish
3. Missing foundation map
Identity, sources, and semantics
| foundation | current kernel | missing layer |
|---|---|---|
| Durable identity | cautious identity hypotheses, clusters, edges, cannot-links | persistent subject IDs, signed identity assertions, reversible merges, versioned splits, domain-specific identity criteria, cross-instance exchange |
| Source object depth | documents, revisions, spans, blobs, evidence links | source packages for audio, video, images, code, sensor streams, samples, model checkpoints, OCR/transcript transformations, and partial/corrupt source state |
| Semantic stability | freely minted predicates and alignment closure | predicate definitions with owners/examples/domains/ranges, versioning, compatibility rules, schema packages, semantic diff reports |
Why this is first: memory fails fastest at the boundary between "same thing", "possibly same thing", "not same thing", and "same only under this lens." If identity and predicates drift silently, a century of memory becomes a bag of strings with timestamps.
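One way to picture "reversible merges" is a ledger in which merges and splits are both appended events, and clusters are derived by replaying them. The sketch below assumes that a split cancels an earlier merge of the same pair; the `IdentityLedger` class, its method names, and the `p:` identifiers are hypothetical, and signing of assertions is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityEvent:
    kind: str         # "merge" or "split"
    a: str
    b: str
    asserted_by: str
    reason: str


class IdentityLedger:
    """Hypothetical split/merge ledger: events are appended, never edited."""

    def __init__(self):
        self.events = []

    def _record(self, kind, a, b, asserted_by, reason):
        if a == b:
            raise ValueError("an identity event needs two distinct ids")
        self.events.append(IdentityEvent(kind, a, b, asserted_by, reason))

    def merge(self, a, b, asserted_by, reason):
        self._record("merge", a, b, asserted_by, reason)

    def split(self, a, b, asserted_by, reason):
        self._record("split", a, b, asserted_by, reason)

    def clusters(self, upto=None):
        """Replay the first `upto` events (all if None) and return the clusters."""
        events = self.events if upto is None else self.events[:upto]
        active = set()
        for e in events:
            pair = frozenset((e.a, e.b))
            if e.kind == "merge":
                active.add(pair)
            else:
                active.discard(pair)  # a split undoes the merge of this pair

        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path compression
                x = parent[x]
            return x

        for e in events:
            find(e.a)
            find(e.b)
        for pair in active:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

        groups = {}
        for x in list(parent):
            groups.setdefault(find(x), set()).add(x)
        return sorted(sorted(g) for g in groups.values())


ledger = IdentityLedger()
ledger.merge("p:1", "p:2", "extractor-7", "same name and birth year")
ledger.merge("p:2", "p:3", "reviewer-a", "same address")
ledger.split("p:1", "p:2", "reviewer-b", "different parents in source 12")

before_split = ledger.clusters(upto=2)
after_split = ledger.clusters()
print(before_split)
print(after_split)
```

Because the merge is never deleted, the earlier cluster stays reproducible with `upto=2`, and every change carries who asserted it and why.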
Governance, trust, and lifecycle
| foundation | current kernel | missing layer |
|---|---|---|
| Memory lifecycle | append-first, delete-resistant history | retention classes, legal/ethical forgetting, sealed knowledge, reversible redaction, tombstoned bytes, aging confidence, compaction with reconstructability |
| Global governance | policy capsules, attestations, allowed actions, restrictiveness ordering | consent receipts, delegated authority, appeal trails, jurisdictional overlays, treaty-like federation, community-governed data spaces |
| Cryptographic trust | signed release manifests | signed claims/reviews/policies/extraction runs, transparent logs, verifiable credentials, Merkle query proofs, key rotation/recovery, post-quantum migration |
Hard part: not "delete or keep." The hard part is recording what can be forgotten, what must remain auditable, what can survive only as a hash/policy event, and what cannot be used for a purpose even if it is still known.
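The "survive only as a hash/policy event" case can be sketched as a tombstone: the bytes go, the digest and the reason stay. This covers only that one case, and the `StoredBlob` shape and the `policy-event-41` identifier are assumptions, not donto's storage model.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredBlob:
    sha256: str
    content: Optional[bytes]    # None once the bytes have been redacted
    tombstone: Optional[str]    # id of the policy event that explains the redaction


def store(content: bytes) -> StoredBlob:
    return StoredBlob(hashlib.sha256(content).hexdigest(), content, None)


def redact(blob: StoredBlob, policy_event: str) -> StoredBlob:
    """Drop the bytes but keep the digest and the reason for dropping them."""
    return StoredBlob(blob.sha256, None, policy_event)


def matches(blob: StoredBlob, candidate: bytes) -> bool:
    """Someone who holds a copy can still check it is the redacted object."""
    return hashlib.sha256(candidate).hexdigest() == blob.sha256


original = store(b"interview transcript, session 3")
redacted = redact(original, "policy-event-41")

print(redacted.content)                                       # the bytes are gone
print(redacted.sha256 == original.sha256)                     # the digest survives
print(matches(redacted, b"interview transcript, session 3"))  # auditable with a copy
```

A retained digest is not harmless in every case: short or guessable content can be recovered by hashing candidates, so whether the hash itself may survive is a policy decision too.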
Federation, APIs, and scale
| foundation | current kernel | missing layer |
|---|---|---|
| Federation | release manifests, exports, local service contracts | remote release discovery, signed exchange envelopes, policy-preserving sync, cross-instance identity/predicate alignment, offline-first replication, public mirrors |
| Universal APIs | HTTP, SDKs, CLI, DontoQL, SPARQL subset, OpenAPI, A2A | versioned /v1, streaming query/release APIs, subscriptions, capability negotiation, stable errors, SDK parity, conformance tests |
| Massive scale | Postgres-native store with practical SQL surfaces | partitioning, hot/warm/cold tiers, columnar/vector sidecars, distributed query planning, lens caches, incremental standing recomputation, multi-region mirrors |
Federation rule: copy claims only with their evidence, policy, lineage, standing context, contradictions, and loss reports. Anything less is not interop; it is a misleading export.
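The federation rule reads naturally as an export guard: refuse to seal an envelope that lacks any of the required parts. The sketch below is hypothetical throughout. The part names mirror the rule, the envelope is a plain dict, and an HMAC with a shared demo key stands in for the public-key signature a real exchange envelope would need.

```python
import hashlib
import hmac
import json

# Part names taken from the federation rule; the dict layout is an assumption.
REQUIRED_PARTS = ("evidence", "policy", "lineage", "standing_context",
                  "contradictions", "loss_report")


def missing_parts(envelope):
    """Required parts that are absent. An empty list (for example, no known
    contradictions) counts as present; a missing or None value does not."""
    missing = [] if "claim" in envelope else ["claim"]
    return missing + [p for p in REQUIRED_PARTS if envelope.get(p) is None]


def _canonical(envelope):
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(envelope, key):
    missing = missing_parts(envelope)
    if missing:
        raise ValueError(f"refusing to export, missing: {missing}")
    tag = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    return {"body": envelope, "tag": tag}


def verify(sealed, key):
    expected = hmac.new(key, _canonical(sealed["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["tag"])


envelope = {
    "claim": {"s": "ex:bridge-17", "p": "ex:builtIn", "o": "1902"},
    "evidence": ["ex:register#page4"],
    "policy": "community-review",
    "lineage": ["extractor-7"],
    "standing_context": {"lens": "default"},
    "contradictions": [],
    "loss_report": {"dropped": []},
}
sealed = seal(envelope, b"demo-key")
tampered = {"body": dict(envelope, policy="public"), "tag": sealed["tag"]}

print(verify(sealed, b"demo-key"))
print(verify(tampered, b"demo-key"))
print(missing_parts({"claim": envelope["claim"]}))
```

The guard makes the rule's distinction explicit: "no contradictions known" is an empty list that travels with the claim, while a bare claim with nothing attached cannot be sealed at all.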
Multimodal, causal, and model-native memory
| foundation | current kernel | missing layer |
|---|---|---|
| Multimodal memory | claim-centric document/source model | pixel/region/timecode/table/frame anchors, cross-modal evidence links, model identity for embeddings, drift tracking, preservation formats |
| Causal and counterfactual reasoning | claims, argument links, contradictions | causal claims with intervention semantics, event graphs, mechanism vs correlation, simulation evidence, counterfactual frames, causal standing |
| Model integration | retrieval, MCP tools, agent discovery | memory write protocols, model-output quarantine, standing-aware context assembly, contradiction-aware answering, downstream outcome feedback, training-data audit hooks |
An AI using donto should know when it is reading evidence, when it is reading a summary, when it is reading a contested claim, and when it is not authorized to know something.
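A rough sketch of what that could mean during context assembly: every item handed to a model carries a label saying what kind of thing it is. The `MemoryItem` fields, the four labels, and the precedence between them (withheld, then contested, then evidence or summary) are illustrative assumptions rather than donto behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    EVIDENCE = "evidence"
    SUMMARY = "summary"
    CONTESTED = "contested claim"
    WITHHELD = "withheld by policy"


@dataclass(frozen=True)
class MemoryItem:
    # Hypothetical shape for a retrieved item.
    text: str
    is_source_span: bool
    contradicted_by: tuple
    allowed_readers: frozenset


def classify(item, reader):
    if reader not in item.allowed_readers:
        return Kind.WITHHELD
    if item.contradicted_by:
        return Kind.CONTESTED
    return Kind.EVIDENCE if item.is_source_span else Kind.SUMMARY


def assemble_context(items, reader):
    """Label every item; withheld items show the label and nothing else."""
    lines = []
    for item in items:
        kind = classify(item, reader)
        if kind is Kind.WITHHELD:
            lines.append(f"[{kind.value}]")
        else:
            lines.append(f"[{kind.value}] {item.text}")
    return lines


items = [
    MemoryItem("Register entry, page 4, line 12", True, (),
               frozenset({"agent-a", "agent-b"})),
    MemoryItem("The bridge was built in 1902", False, ("claim-88",),
               frozenset({"agent-a"})),
    MemoryItem("Overview of the bridge records", False, (),
               frozenset({"agent-a", "agent-b"})),
]

for line in assemble_context(items, "agent-b"):
    print(line)
```

Whether a reader should even learn that a withheld item exists is itself a policy question; this sketch chooses to disclose existence and nothing more.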
Human, economic, adversarial, and formal layers
| foundation | current kernel | missing layer |
|---|---|---|
| Human interfaces | TUI, docs, reports | review workbenches, contradiction dashboards, release review, policy authoring, identity resolution, predicate alignment labs, public citation pages |
| Incentives | contribution is technically possible | credit, curator reputation, stewardship budgets, reuse accounting, licensing hooks, anti-spam economics |
| Adversarial robustness | paraconsistency makes attacks contestable | claim-spam resistance, source-forgery detection, sybil-resistant reputation, poisoning audits, quarantine workflows, incident reports |
| Formal verification | Lean sidecar, shape validation | verified migrations, policy monotonicity, release completeness, lens algebra laws, query loss-report contracts, proof-carrying adapters |
Formal methods should not try to prove every stored claim. They should prove the substrate does not lie about its own behavior.
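Policy monotonicity is a good example of a property about the substrate rather than about any stored claim: combining policies should never yield something less restrictive than either input. The sketch below checks that exhaustively over a toy domain. It is a test, not a proof, and the four level names and the total order are assumptions; a real restrictiveness order may well be partial.

```python
from itertools import product

# Hypothetical restrictiveness order, least to most restrictive.
LEVELS = ("public", "attributed", "community", "sealed")
RANK = {name: i for i, name in enumerate(LEVELS)}


def combine(a, b):
    """Effective policy when two policies apply: the more restrictive one."""
    return a if RANK[a] >= RANK[b] else b


def at_least_as_restrictive(a, b):
    return RANK[a] >= RANK[b]


def check_monotone():
    """The combination is never less restrictive than either input."""
    return all(
        at_least_as_restrictive(combine(a, b), a)
        and at_least_as_restrictive(combine(a, b), b)
        for a, b in product(LEVELS, repeat=2)
    )


def check_algebra():
    """combine is idempotent, commutative, and associative on this domain."""
    idempotent = all(combine(a, a) == a for a in LEVELS)
    commutative = all(combine(a, b) == combine(b, a)
                      for a, b in product(LEVELS, repeat=2))
    associative = all(combine(combine(a, b), c) == combine(a, combine(b, c))
                      for a, b, c in product(LEVELS, repeat=3))
    return idempotent and commutative and associative


print(check_monotone())
print(check_algebra())
print(combine("attributed", "sealed"))
```

A formal sidecar would state the same properties as theorems over the real policy order, so they hold for every policy and not only for the cases enumerated here.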
4. What the future user experience should feel like
An agent encountering a claim should be able to ask:
| question | substrate answer |
|---|---|
| What is the claim? | canonical claim text plus structured subject/predicate/object |
| Who asserted it? | source, extractor, reviewer, institution, or agent identity |
| What supports it? | exact source bytes, span/region/timecode/table/frame, transformation lineage |
| What conflicts with it? | contradictions, rebuttals, cannot-links, suppressed lenses |
| Which policy controls it? | consent, license, jurisdiction, purpose, retention, publication rules |
| How reliable is it here? | standing under the current lens/task, not a global truth score |
| What loss happened? | loss report for the query, release, export, or adapter |
| Can I use it? | explicit reuse/quote/train/publish permissions and refusals |
| What would improve it? | next evidence suggestions, review tasks, identity/predicate gaps |
That is accountable memory: not omniscience, but a reconstructable trail from answer back to source, policy, disagreement, and loss.
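For the trail to be checkable by someone who does not trust the operator, the report lists Merkle release/query proofs as a missing layer. The following is a minimal inclusion-proof sketch under stated assumptions: claims are opaque byte strings, the tree duplicates the last node on odd levels (a simplification that production designs avoid), and the function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(item: bytes) -> bytes:
    return _h(b"\x00" + item)            # prefix separates leaves from inner nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def build_levels(items):
    """All tree levels, from leaf hashes up to the single root hash."""
    if not items:
        raise ValueError("cannot build a tree over zero items")
    level = [leaf_hash(i) for i in items]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left  # duplicate last node
            nxt.append(node_hash(left, right))
        level = nxt
        levels.append(level)
    return levels


def prove(levels, index):
    """Sibling hashes from leaf to root, each with a 'sibling is on the right' flag."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index              # odd level: the node was paired with itself
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify(item, proof, expected_root):
    acc = leaf_hash(item)
    for sibling, sibling_is_right in proof:
        acc = node_hash(acc, sibling) if sibling_is_right else node_hash(sibling, acc)
    return acc == expected_root


claims = [b"claim-1", b"claim-2", b"claim-3"]
levels = build_levels(claims)
release_root = levels[-1][0]

proof = prove(levels, 2)
print(verify(b"claim-3", proof, release_root))
print(verify(b"forged", proof, release_root))
```

With a signed root published in a release manifest, an agent could hand over one claim plus its proof, and the recipient could confirm the claim belongs to that release without downloading the rest.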
5. Roadmap by horizon
| horizon | focus | concrete moves |
|---|---|---|
| Near term | make the kernel boring | keep /reports current, stabilize SDKs around standing/loss, add release conformance tests, version the API, expand adapter loss reports, benchmark bigger stores |
| Medium term | make the substrate federated | signed exchange envelopes, remote release discovery, partial replication, policy-preserving sync, public mirrors, cross-instance identity/predicate alignment |
| Long term | make memory governable | consent receipts, community policy protocols, appeals, stewardship incentives, formal policy proofs, human review institutions |
| Century term | make the protocol outlive this implementation | open specs, independent implementations, archival test vectors, migration proofs, stable release formats, post-quantum signature migration, inheritable governance |
6. Core risk
The biggest risk is not that donto lacks features. The biggest risk is that it becomes a powerful local system without becoming an interoperable public discipline.
A private memory substrate can be useful. A century-scale AI substrate must be legible to outsiders, hostile auditors, future maintainers, communities whose knowledge is stored, agents that did not exist when the data was written, and institutions that need to rely on it after the original operators are gone.
7. Bottom line
donto is missing the layers that turn a strong knowledge kernel into durable civil infrastructure:
| class | missing layers |
|---|---|
| Truth discipline | durable identity, semantic stability, causal reasoning |
| Evidence discipline | deep source objects, multimodal anchors, transformation provenance |
| Governance discipline | memory lifecycle, consent, policy appeals, community authority |
| Trust discipline | signed claims/runs/reviews, transparent logs, query/release proofs |
| Network discipline | federation, universal APIs, conformance, independent implementations |
| Operational discipline | massive scale, mirrors, compaction, observability, incident response |
| Human discipline | review UI, stewardship incentives, domain packages, public citation |
The execution plan made donto credible as a governed paraconsistent substrate. The next challenge is to make it durable, federated, inspectable, and worth trusting for generations.
See also: reports index · the shape of donto's return · single-shot vs agentic memory · benchmarks.