Future substrate report — what donto is still missing

Living document. 2026-06-12. A century-scale substrate gap report: what donto already has, what is still missing, and what has to become true before AI systems can treat donto-like memory as durable infrastructure. For the full reports index, see Reports.

donto now has a credible kernel. It is not yet a civilization substrate.

The kernel is real: append-only claims, bitemporality, paraconsistency, evidence links, policy, lenses, standing, loss reports, release artifacts, and agent discovery. The missing work is larger than another feature wave. donto needs the identity, source, semantic, cryptographic, operational, governance, economic, and human machinery that lets memory survive many models, many institutions, and many generations.

TL;DR

The current win: donto has the right memory discipline. Claims are evidence-linked, contradiction-tolerant, time-aware, lens-scoped, and exported with loss reports instead of pretending that every view is complete.
The current gap: the system is still mostly a strong local kernel. It does not yet have universal identity, deep source packages, federation, cryptographic proof chains, governance protocols, scale architecture, or mature human review surfaces.
The real target: not one world database, but an interoperable protocol through which many stores can exchange claims, evidence, policies, contradictions, standing, and loss reports without flattening the context.
The next move: make the kernel boring, then federate it, then govern it. A memory substrate only matters if outsiders can verify it, communities can constrain it, and future systems can migrate it.

donto should not become "the memory all AI queries." It should become the discipline by which AI memory stays attributable, contestable, policy-aware, and reconstructable.

Century-substrate scorecard

layer	donto has now	missing before century-scale trust	next build
Claims	bitemporal, paraconsistent statements	richer modal, causal, and domain frames	domain packages + causal/event schemas
Evidence	spans, revisions, blobs, evidence links	media-native anchors and source packages	source package manifest + multimodal anchors
Identity	hypotheses, clusters, cannot-links	portable identity protocol with reversible merges	signed identity assertions + split/merge ledger
Semantics	free predicates + alignment closure	versioned predicate definitions and semantic diffs	predicate registry + compatibility reports
Policy	capsules, attestations, restrictiveness order	consent, licensing, jurisdiction, appeal workflows	policy simulation + consent receipts
Trust	signed release manifests	signed claims, reviews, runs, and query-result proofs	Merkle release/query proofs
Federation	release artifacts, interop exports	cross-instance sync with policy and loss preserved	signed exchange envelopes
APIs	HTTP, SDKs, CLI, DontoQL, SPARQL subset, A2A	stable `/v1`, streaming, subscriptions, conformance	versioned API namespace
Scale	Postgres-native practical kernel	partitioning, cold storage, distributed query, mirrors	10M/1B benchmark tracks
Humans	TUI, docs, reports	review workbenches, contradiction dashboards, governance UI	first review surface

1. What donto already has

The completed execution work turned many implicit conventions into durable substrate behavior:

capability	why it matters
Append-first statement safety	memory can be corrected without erasing what was believed before
Bitemporal statements	donto can distinguish when a fact was true from when donto learned it
Paraconsistent storage	disagreement stays queryable instead of being collapsed into a fake winner
Evidence links	claims can point back to source spans, revisions, and blobs
Lenses	different users/tasks can ask for different views without mutating the substrate
Standing	claims can be ranked by maturity, corroboration, pressure, and policy context
Loss reports	exports and adapters state what they failed to carry
Release artifacts	datasets can be packaged, checked, and published reproducibly
Agent discovery	agents can find capabilities instead of hard-coding private assumptions

That is a serious kernel. The rest of this report is the gap between a kernel and infrastructure.
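
To make the append-first and bitemporal rows concrete, here is a minimal Python sketch. The `Statement` fields, the `AppendOnlyLog` class, and the `as_of` helper are illustrative assumptions, not donto's schema or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    valid_from: date            # when the fact is claimed to start holding
    valid_to: Optional[date]    # None means "no known end"
    recorded_on: date           # when the store learned it

class AppendOnlyLog:
    def __init__(self) -> None:
        self._rows: list[Statement] = []

    def append(self, row: Statement) -> None:
        # Corrections are new rows; earlier rows are never edited or removed.
        self._rows.append(row)

    def as_of(self, valid_on: date, known_on: date) -> list[Statement]:
        """Statements valid on `valid_on`, using only rows recorded by `known_on`."""
        return [
            r for r in self._rows
            if r.recorded_on <= known_on
            and r.valid_from <= valid_on
            and (r.valid_to is None or valid_on < r.valid_to)
        ]

log = AppendOnlyLog()
log.append(Statement("alice", "livesIn", "Perth",
                     date(2010, 1, 1), None, date(2020, 3, 1)))
# A later source disagrees about the same period; both rows are kept.
log.append(Statement("alice", "livesIn", "Darwin",
                     date(2010, 1, 1), date(2015, 1, 1), date(2024, 6, 1)))

before = log.as_of(valid_on=date(2012, 1, 1), known_on=date(2021, 1, 1))
after = log.as_of(valid_on=date(2012, 1, 1), known_on=date(2025, 1, 1))
```

Asking the same question with two different `known_on` dates returns what was believed then versus what is believed now. The second query returns both conflicting rows, which is the paraconsistent behavior the table describes.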

2. The target shape

A century-scale AI memory substrate has to satisfy several properties at once:

It remembers without pretending memory is truth.
It preserves disagreement without turning disagreement into chaos.
It lets many agents contribute without giving any agent total authority.
It supports private, community-governed, public, commercial, and scientific knowledge under different rules.
It lets future agents reconstruct why a belief existed, which evidence supported it, which policies constrained it, and what later work changed it.
It survives software rewrites, hardware generations, organizational collapse, legal changes, and model-family turnover.
It makes extraction, inference, alignment, ranking, and forgetting accountable.

The shape is therefore a protocol, not a single database:

source bytes
  -> anchored observations
  -> claims + time + policy
  -> contradiction-aware lenses
  -> standing + loss reports
  -> signed releases / query results
  -> agents that know what they may use, quote, train on, or publish

3. Missing foundation map

Identity, sources, and semantics

foundation	current kernel	missing layer
Durable identity	cautious identity hypotheses, clusters, edges, cannot-links	persistent subject IDs, signed identity assertions, reversible merges, versioned splits, domain-specific identity criteria, cross-instance exchange
Source object depth	documents, revisions, spans, blobs, evidence links	source packages for audio, video, images, code, sensor streams, samples, model checkpoints, OCR/transcript transformations, and partial/corrupt source state
Semantic stability	freely minted predicates and alignment closure	predicate definitions with owners/examples/domains/ranges, versioning, compatibility rules, schema packages, semantic diff reports

Why this is first: memory fails fastest at the boundary between "same thing", "possibly same thing", "not same thing", and "same only under this lens." If identity and predicates drift silently, a century of memory becomes a bag of strings with timestamps.
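
One way to picture reversible merges next to cannot-links is an append-only ledger whose clusters are derived by replay. This is a sketch under stated assumptions: `IdentityLedger`, its event tuples, and the subject IDs `p1`..`p3` are hypothetical, and signing is left out.

```python
from dataclasses import dataclass, field

@dataclass
class IdentityLedger:
    """Append-only log of merge/split events; clusters are derived by replay."""
    events: list = field(default_factory=list)       # ("merge" | "split", a, b)
    cannot_links: set = field(default_factory=set)   # frozenset({a, b}) pairs

    def _edges(self) -> set:
        # Replay the log: a split cancels the matching merge edge.
        edges = set()
        for op, a, b in self.events:
            pair = frozenset((a, b))
            if op == "merge":
                edges.add(pair)
            else:
                edges.discard(pair)
        return edges

    def cluster(self, node: str) -> set:
        # Connected component of `node` over the currently active merge edges.
        edges, seen, stack = self._edges(), {node}, [node]
        while stack:
            current = stack.pop()
            for pair in edges:
                if current in pair:
                    for other in pair - {current}:
                        if other not in seen:
                            seen.add(other)
                            stack.append(other)
        return seen

    def merge(self, a: str, b: str) -> bool:
        # Refuse if any member of a's cluster is cannot-linked to any member of b's.
        for x in self.cluster(a):
            for y in self.cluster(b):
                if frozenset((x, y)) in self.cannot_links:
                    return False
        self.events.append(("merge", a, b))
        return True

    def split(self, a: str, b: str) -> None:
        # A split is a new event, so the earlier merge stays visible in history.
        self.events.append(("split", a, b))

ledger = IdentityLedger()
ledger.cannot_links.add(frozenset(("p2", "p3")))
ok1 = ledger.merge("p1", "p2")       # accepted
ok2 = ledger.merge("p1", "p3")       # refused: p2 and p3 are cannot-linked
merged = ledger.cluster("p1")
ledger.split("p1", "p2")
restored = ledger.cluster("p1")
```

Because the split is recorded rather than applied by deleting the merge, a later reader can reconstruct both the mistaken identity and its correction.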

Governance, trust, and lifecycle

foundation	current kernel	missing layer
Memory lifecycle	append-first, delete-resistant history	retention classes, legal/ethical forgetting, sealed knowledge, reversible redaction, tombstoned bytes, aging confidence, compaction with reconstructability
Global governance	policy capsules, attestations, allowed actions, restrictiveness ordering	consent receipts, delegated authority, appeal trails, jurisdictional overlays, treaty-like federation, community-governed data spaces
Cryptographic trust	signed release manifests	signed claims/reviews/policies/extraction runs, transparent logs, verifiable credentials, Merkle query proofs, key rotation/recovery, post-quantum migration
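
The "Merkle query proofs" entry can be illustrated with a small inclusion proof: a release publishes one root hash, and anyone holding a claim plus a short proof can check that the claim was in the release. The sketch below uses only `hashlib`. The function names are assumptions, the leaf/node prefix bytes follow the common domain-separation convention, and duplicating the last node on odd levels is a simplification that a production design would likely replace.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _parents(level: list[bytes]) -> list[bytes]:
    # Hash adjacent pairs; 0x01 marks an interior node.
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(b"\x00" + leaf) for leaf in leaves]   # 0x00 marks a leaf
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                   # simplification for odd levels
        level = _parents(level)
    return level[0]

def inclusion_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each with a flag: is the sibling on the left?"""
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _parents(level)
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = _h(b"\x01" + (sibling + node if sibling_is_left else node + sibling))
    return node == root

claims = [b"claim-1", b"claim-2", b"claim-3"]
root = merkle_root(claims)
proof = inclusion_proof(claims, 2)
```

A verifier needs only `root`, the claim bytes, and `proof`; it never has to see the other claims in the release.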

Hard part: the choice is not "delete or keep." The hard part is recording what can be forgotten, what must remain auditable, what can survive only as a hash or policy event, and what cannot be used for a given purpose even if it is still known.
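
Those four cases can be modeled as separate operations on a source record. The following is a rough sketch, assuming a hypothetical `SourceRecord` type and purpose labels such as "train" and "quote"; none of these names come from donto.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class SourceRecord:
    sha256: str                                  # always kept, so audits can still match bytes
    content: Optional[bytes]                     # None once the bytes are tombstoned
    blocked_purposes: frozenset = frozenset()    # uses refused even while content is known
    events: tuple = ()                           # append-only lifecycle trail

def ingest(content: bytes) -> SourceRecord:
    return SourceRecord(hashlib.sha256(content).hexdigest(), content)

def block_purpose(rec: SourceRecord, purpose: str, reason: str) -> SourceRecord:
    # The content stays known, but one use of it is refused and the refusal is logged.
    return replace(rec,
                   blocked_purposes=rec.blocked_purposes | {purpose},
                   events=rec.events + (("blocked:" + purpose, reason),))

def tombstone(rec: SourceRecord, reason: str) -> SourceRecord:
    # Forget the bytes; keep the hash and a policy event explaining why.
    return replace(rec, content=None,
                   events=rec.events + (("tombstoned", reason),))

def may_use(rec: SourceRecord, purpose: str) -> bool:
    return rec.content is not None and purpose not in rec.blocked_purposes

def matches(rec: SourceRecord, candidate: bytes) -> bool:
    # An auditor holding a copy can still show it is the forgotten source.
    return hashlib.sha256(candidate).hexdigest() == rec.sha256

rec = ingest(b"interview transcript")
rec = block_purpose(rec, "train", "no consent for training")
quote_ok, train_ok = may_use(rec, "quote"), may_use(rec, "train")
rec = tombstone(rec, "retention period expired")
```

After the tombstone, the bytes are gone but the hash and both lifecycle events remain, so the record is forgotten and auditable at the same time.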

Federation, APIs, and scale

foundation	current kernel	missing layer
Federation	release manifests, exports, local service contracts	remote release discovery, signed exchange envelopes, policy-preserving sync, cross-instance identity/predicate alignment, offline-first replication, public mirrors
Universal APIs	HTTP, SDKs, CLI, DontoQL, SPARQL subset, OpenAPI, A2A	versioned `/v1`, streaming query/release APIs, subscriptions, capability negotiation, stable errors, SDK parity, conformance tests
Massive scale	Postgres-native store with practical SQL surfaces	partitioning, hot/warm/cold tiers, columnar/vector sidecars, distributed query planning, lens caches, incremental standing recomputation, multi-region mirrors

Federation rule: copy claims only with their evidence, policy, lineage, standing context, contradictions, and loss reports. Anything less is not interop; it is a misleading export.
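
The federation rule reads naturally as an acceptance check on an incoming exchange envelope. This is a sketch only: the envelope is a plain dict, and the key names are assumptions chosen to mirror the rule, not a donto wire format. Signature checking is omitted.

```python
# Parts that must travel with every claim, in the order the rule lists them.
REQUIRED_PARTS = (
    "evidence", "policy", "lineage",
    "standing_context", "contradictions", "loss_report",
)

def missing_parts(envelope: dict) -> list[str]:
    # Presence is checked, not non-emptiness: a claim with no known
    # contradictions should still say so with an explicit empty list.
    return [part for part in REQUIRED_PARTS if part not in envelope]

def accept(envelope: dict) -> dict:
    if "claim" not in envelope:
        raise ValueError("envelope carries no claim")
    missing = missing_parts(envelope)
    if missing:
        raise ValueError("misleading export, missing: " + ", ".join(missing))
    return envelope

complete = {
    "claim": {"subject": "s1", "predicate": "p1", "object": "o1"},
    "evidence": ["span:doc-7:120-188"],
    "policy": "community-governed",
    "lineage": ["extraction-run-42"],
    "standing_context": {"lens": "default"},
    "contradictions": [],
    "loss_report": {"dropped_fields": {}},
}
partial = {k: complete[k] for k in ("claim", "evidence", "policy")}

accepted = accept(complete)
gaps = missing_parts(partial)
```

Requiring the keys even when their values are empty keeps "nothing to report" distinguishable from "the exporter did not look."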

Multimodal, causal, and model-native memory

foundation	current kernel	missing layer
Multimodal memory	claim-centric document/source model	pixel/region/timecode/table/frame anchors, cross-modal evidence links, model identity for embeddings, drift tracking, preservation formats
Causal and counterfactual reasoning	claims, argument links, contradictions	causal claims with intervention semantics, event graphs, mechanism vs correlation, simulation evidence, counterfactual frames, causal standing
Model integration	retrieval, MCP tools, agent discovery	memory write protocols, model-output quarantine, standing-aware context assembly, contradiction-aware answering, downstream outcome feedback, training-data audit hooks

An AI using donto should know when it is reading evidence, when it is reading a summary, when it is reading a contested claim, and when it is not authorized to know something.
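
A simple way to give a model that awareness is to label every item before it enters the context window. The sketch below is an assumption about how such labeling could work; `MemoryItem`, `Kind`, and the precedence order (withheld, then contested, then evidence, then summary) are illustrative choices.

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    EVIDENCE = "evidence"
    SUMMARY = "summary"
    CONTESTED = "contested"
    WITHHELD = "withheld"

@dataclass(frozen=True)
class MemoryItem:
    text: str
    is_source_span: bool        # True when the text is quoted source bytes
    contradiction_count: int    # known conflicting claims
    readable_by: frozenset      # agent IDs allowed to read the text

def label(item: MemoryItem, agent: str) -> tuple[Kind, str]:
    if agent not in item.readable_by:
        # The agent learns that something exists, but not what it says.
        return Kind.WITHHELD, "(not authorized)"
    if item.contradiction_count > 0:
        return Kind.CONTESTED, item.text
    if item.is_source_span:
        return Kind.EVIDENCE, item.text
    return Kind.SUMMARY, item.text

def assemble_context(items: list[MemoryItem], agent: str) -> str:
    labeled = (label(item, agent) for item in items)
    return "\n".join(f"[{kind.value}] {text}" for kind, text in labeled)

everyone = frozenset({"agent-a", "steward"})
items = [
    MemoryItem("Transcript span: we moved in 1962.", True, 0, everyone),
    MemoryItem("The family relocated in the early 1960s.", False, 0, everyone),
    MemoryItem("The move happened in 1958.", False, 2, everyone),
    MemoryItem("Restricted ceremonial detail.", False, 0, frozenset({"steward"})),
]
context = assemble_context(items, "agent-a")
```

The withheld line is kept on purpose: the agent can tell that relevant memory exists and that policy, not absence, is the reason it cannot read it.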

Human, economic, adversarial, and formal layers

foundation	current kernel	missing layer
Human interfaces	TUI, docs, reports	review workbenches, contradiction dashboards, release review, policy authoring, identity resolution, predicate alignment labs, public citation pages
Incentives	contribution is technically possible	credit, curator reputation, stewardship budgets, reuse accounting, licensing hooks, anti-spam economics
Adversarial robustness	paraconsistency makes attacks contestable	claim-spam resistance, source-forgery detection, sybil-resistant reputation, poisoning audits, quarantine workflows, incident reports
Formal verification	Lean sidecar, shape validation	verified migrations, policy monotonicity, release completeness, lens algebra laws, query loss-report contracts, proof-carrying adapters

Formal methods should not try to prove every stored claim. They should prove the substrate does not lie about its own behavior.
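
Policy monotonicity is a good example of a behavioral property that is small enough to check exhaustively. The sketch below assumes a hypothetical four-level, totally ordered restrictiveness scale; real policy orders may be partial, and an exhaustive test over a finite domain is a stand-in for the Lean-style proof the table has in mind, not a replacement for it.

```python
from enum import IntEnum
from itertools import product

class Restriction(IntEnum):
    # Hypothetical levels; a higher value means more restrictive.
    PUBLIC = 0
    ATTRIBUTION_REQUIRED = 1
    COMMUNITY_ONLY = 2
    SEALED = 3

def combine(a: Restriction, b: Restriction) -> Restriction:
    # A view derived from two inputs inherits the more restrictive policy.
    return max(a, b)

def is_monotone(combine_fn) -> bool:
    # Property: combining policies never yields something less restrictive
    # than either input. Checked over every pair of levels.
    return all(
        combine_fn(a, b) >= a and combine_fn(a, b) >= b
        for a, b in product(Restriction, repeat=2)
    )

good = is_monotone(combine)
bad = is_monotone(min)   # picking the laxer policy violates the property
```

The check says nothing about whether any stored claim is true. It only says that the substrate's own policy combination cannot silently loosen a restriction.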

4. What the future user experience should feel like

An agent encountering a claim should be able to ask:

question	substrate answer
What is the claim?	canonical claim text plus structured subject/predicate/object
Who asserted it?	source, extractor, reviewer, institution, or agent identity
What supports it?	exact source bytes, span/region/timecode/table/frame, transformation lineage
What conflicts with it?	contradictions, rebuttals, cannot-links, suppressed lenses
Which policy controls it?	consent, license, jurisdiction, purpose, retention, publication rules
How reliable is it here?	standing under the current lens/task, not a global truth score
What loss happened?	loss report for the query, release, export, or adapter
Can I use it?	explicit reuse/quote/train/publish permissions and refusals
What would improve it?	next evidence suggestions, review tasks, identity/predicate gaps

That is accountable memory: not omniscience, but a reconstructable trail from answer back to source, policy, disagreement, and loss.
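
The "What loss happened?" row is the easiest to sketch: an export into a narrower format returns a loss report alongside the data. The function name, field names, and report shape below are assumptions for illustration.

```python
def export_with_loss(claims: list[dict], supported_fields: set[str]) -> tuple[list[dict], dict]:
    """Project claims onto the fields a target format supports and report what was dropped."""
    exported, dropped = [], {}
    for claim in claims:
        exported.append({k: v for k, v in claim.items() if k in supported_fields})
        # Count each field that the target format cannot carry.
        for name in claim.keys() - supported_fields:
            dropped[name] = dropped.get(name, 0) + 1
    return exported, {"claims": len(claims), "dropped_fields": dropped}

claims = [
    {"subject": "s1", "predicate": "p1", "object": "o1",
     "valid_from": "2010-01-01", "policy": "community-governed"},
    {"subject": "s2", "predicate": "p1", "object": "o2"},
]
triples, loss = export_with_loss(claims, {"subject", "predicate", "object"})
```

Here the plain-triple export keeps both claims but reports that one lost its validity time and its policy, so a downstream consumer knows the view is incomplete.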

5. Roadmap by horizon

horizon	focus	concrete moves
Near term	make the kernel boring	keep `/reports` current, stabilize SDKs around standing/loss, add release conformance tests, version the API, expand adapter loss reports, benchmark bigger stores
Medium term	make the substrate federated	signed exchange envelopes, remote release discovery, partial replication, policy-preserving sync, public mirrors, cross-instance identity/predicate alignment
Long term	make memory governable	consent receipts, community policy protocols, appeals, stewardship incentives, formal policy proofs, human review institutions
Century term	make the protocol outlive this implementation	open specs, independent implementations, archival test vectors, migration proofs, stable release formats, post-quantum signature migration, inheritable governance

6. Core risk

The biggest risk is not that donto lacks features. The biggest risk is that it becomes a powerful local system without becoming an interoperable public discipline.

A private memory substrate can be useful. A century-scale AI substrate must be legible to outsiders, hostile auditors, future maintainers, communities whose knowledge is stored, agents that did not exist when the data was written, and institutions that need to rely on it after the original operators are gone.

7. Bottom line

donto is missing the layers that turn a strong knowledge kernel into durable civil infrastructure:

class	missing layers
Truth discipline	durable identity, semantic stability, causal reasoning
Evidence discipline	deep source objects, multimodal anchors, transformation provenance
Governance discipline	memory lifecycle, consent, policy appeals, community authority
Trust discipline	signed claims/runs/reviews, transparent logs, query/release proofs
Network discipline	federation, universal APIs, conformance, independent implementations
Operational discipline	massive scale, mirrors, compaction, observability, incident response
Human discipline	review UI, stewardship incentives, domain packages, public citation

The execution plan made donto credible as a governed paraconsistent substrate. The next challenge is to make it durable, federated, inspectable, and worth trusting for generations.