
Future substrate report — what donto is still missing

Living document. 2026-06-12. A century-scale substrate gap report: what donto already has, what is still missing, and what has to become true before AI systems can treat donto-like memory as durable infrastructure. For the full reports index, see Reports.

donto now has a credible kernel. It is not yet a civilization substrate.

The kernel is real: append-only claims, bitemporality, paraconsistency, evidence links, policy, lenses, standing, loss reports, release artifacts, and agent discovery. The missing work is larger than another feature wave. donto needs the identity, source, semantic, cryptographic, operational, governance, economic, and human machinery that lets memory survive many models, many institutions, and many generations.


TL;DR

  • The current win: donto has the right memory discipline. Claims are evidence-linked, contradiction-tolerant, time-aware, and lens-scoped, and exports carry loss reports instead of pretending that every view is complete.
  • The current gap: the system is still mostly a strong local kernel. It does not yet have universal identity, deep source packages, federation, cryptographic proof chains, governance protocols, scale architecture, or mature human review surfaces.
  • The real target: not one world database. The target is an interoperable protocol where many stores can exchange claims, evidence, policies, contradictions, standing, and loss without flattening the context.
  • The next move: make the kernel boring, then federate it, then govern it. A memory substrate only matters if outsiders can verify it, communities can constrain it, and future systems can migrate it.

donto should not become "the memory all AI queries." It should become the discipline by which AI memory stays attributable, contestable, policy-aware, and reconstructable.


Century-substrate scorecard

layer | donto has now | missing before century-scale trust | next build
Claims | bitemporal, paraconsistent statements | richer modal, causal, and domain frames | domain packages + causal/event schemas
Evidence | spans, revisions, blobs, evidence links | media-native anchors and source packages | source package manifest + multimodal anchors
Identity | hypotheses, clusters, cannot-links | portable identity protocol with reversible merges | signed identity assertions + split/merge ledger
Semantics | free predicates + alignment closure | versioned predicate definitions and semantic diffs | predicate registry + compatibility reports
Policy | capsules, attestations, restrictiveness order | consent, licensing, jurisdiction, appeal workflows | policy simulation + consent receipts
Trust | signed release manifests | signed claims, reviews, runs, and query-result proofs | Merkle release/query proofs
Federation | release artifacts, interop exports | cross-instance sync with policy and loss preserved | signed exchange envelopes
APIs | HTTP, SDKs, CLI, DontoQL, SPARQL subset, A2A | stable /v1, streaming, subscriptions, conformance | versioned API namespace
Scale | Postgres-native practical kernel | partitioning, cold storage, distributed query, mirrors | 10M/1B benchmark tracks
Humans | TUI, docs, reports | review workbenches, contradiction dashboards, governance UI | first review surface

1. What donto already has

The completed execution work turned many implicit conventions into durable substrate behavior:

capability | why it matters
Append-first statement safety | memory can be corrected without erasing what was believed before
Bitemporal statements | donto can distinguish when a fact was true from when donto learned it
Paraconsistent storage | disagreement stays queryable instead of being collapsed into a fake winner
Evidence links | claims can point back to source spans, revisions, and blobs
Lenses | different users/tasks can ask for different views without mutating the substrate
Standing | claims can be ranked by maturity, corroboration, pressure, and policy context
Loss reports | exports and adapters state what they failed to carry
Release artifacts | datasets can be packaged, checked, and published reproducibly
Agent discovery | agents can find capabilities instead of hard-coding private assumptions
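
A minimal Python sketch of how the first three rows can coexist in one append-only log. The `Statement` and `Store` names, the field layout, and the example data are illustrative assumptions, not donto's actual schema or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    valid_from: date           # valid time: when the fact is claimed to hold
    valid_to: Optional[date]   # None means open-ended
    recorded_at: date          # transaction time: when the store learned it
    source: str


class Store:
    """Hypothetical append-only store; nothing is updated or deleted."""

    def __init__(self) -> None:
        self._log: List[Statement] = []

    def assert_statement(self, st: Statement) -> None:
        self._log.append(st)

    def as_of(self, subject: str, predicate: str,
              valid_on: date, known_by: date) -> List[Statement]:
        """Statements about (subject, predicate) that cover `valid_on`
        and had been recorded on or before `known_by`."""
        return [
            s for s in self._log
            if s.subject == subject and s.predicate == predicate
            and s.recorded_at <= known_by
            and s.valid_from <= valid_on
            and (s.valid_to is None or valid_on < s.valid_to)
        ]


store = Store()
store.assert_statement(Statement("ship:aurora", "home_port", "Hobart",
                                 date(1990, 1, 1), None, date(2020, 1, 1), "register-a"))
# A later, conflicting claim is appended next to the first one, not over it.
store.assert_statement(Statement("ship:aurora", "home_port", "Sydney",
                                 date(1990, 1, 1), None, date(2021, 6, 1), "register-b"))

early = store.as_of("ship:aurora", "home_port", date(2000, 1, 1), known_by=date(2020, 12, 31))
late = store.as_of("ship:aurora", "home_port", date(2000, 1, 1), known_by=date(2022, 1, 1))
```

Here `early` holds only the Hobart statement, while `late` holds both. Choosing between them is left to a lens rather than to the store.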

That is a serious kernel. The rest of this report is the gap between a kernel and infrastructure.


2. The target shape

A century-scale AI memory substrate has to satisfy several properties at once:

  1. It remembers without pretending memory is truth.
  2. It preserves disagreement without turning disagreement into chaos.
  3. It lets many agents contribute without giving any agent total authority.
  4. It supports private, community-governed, public, commercial, and scientific knowledge under different rules.
  5. It lets future agents reconstruct why a belief existed, which evidence supported it, which policies constrained it, and what later work changed it.
  6. It survives software rewrites, hardware generations, organization collapse, legal changes, and model-family turnover.
  7. It makes extraction, inference, alignment, ranking, and forgetting accountable.

The shape is therefore a protocol, not a single database:

source bytes
  -> anchored observations
  -> claims + time + policy
  -> contradiction-aware lenses
  -> standing + loss reports
  -> signed releases / query results
  -> agents that know what they may use, quote, train on, or publish
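
One way to read the first and last arrows together: a claim handed to an agent carries an anchor into the source bytes and an explicit set of permitted actions, and anything not granted is refused. This is a minimal sketch. `Action`, `Anchor`, `Claim`, `anchor_span`, and `may` are hypothetical names, and real policy evaluation would be richer than a set lookup.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Action(Enum):
    USE = "use"
    QUOTE = "quote"
    TRAIN = "train"
    PUBLISH = "publish"


@dataclass(frozen=True)
class Anchor:
    source_sha256: str   # digest of the full source bytes
    start: int           # byte offsets of the supporting span
    end: int


@dataclass(frozen=True)
class Claim:
    text: str
    anchor: Anchor
    allowed: FrozenSet[Action]


def anchor_span(source: bytes, start: int, end: int) -> Anchor:
    """Bind a span to the exact bytes it was read from."""
    if not (0 <= start < end <= len(source)):
        raise ValueError("span outside source")
    return Anchor(hashlib.sha256(source).hexdigest(), start, end)


def may(claim: Claim, action: Action) -> bool:
    """Default deny: an action is permitted only if explicitly granted."""
    return action in claim.allowed


source = b"The bridge opened to traffic in 1932."
claim = Claim(
    text="The bridge opened in 1932.",
    anchor=anchor_span(source, 0, len(source)),
    allowed=frozenset({Action.USE, Action.QUOTE}),
)
```

With this record, `may(claim, Action.QUOTE)` is true and `may(claim, Action.TRAIN)` is false, because training was never granted.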

3. Missing foundation map

Identity, sources, and semantics

foundation | current kernel | missing layer
Durable identity | cautious identity hypotheses, clusters, edges, cannot-links | persistent subject IDs, signed identity assertions, reversible merges, versioned splits, domain-specific identity criteria, cross-instance exchange
Source object depth | documents, revisions, spans, blobs, evidence links | source packages for audio, video, images, code, sensor streams, samples, model checkpoints, OCR/transcript transformations, and partial/corrupt source state
Semantic stability | freely minted predicates and alignment closure | predicate definitions with owners/examples/domains/ranges, versioning, compatibility rules, schema packages, semantic diff reports

Why this is first: memory fails fastest at the boundary between "same thing", "possibly same thing", "not same thing", and "same only under this lens." If identity and predicates drift silently, a century of memory becomes a bag of strings with timestamps.
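
The distinctions above suggest keeping merges as ledger events rather than destructive rewrites. This is a minimal sketch, assuming a hypothetical `IdentityLedger` that is not donto's API. Merges and cannot-links are appended, clusters are derived by replay, and a split reverses a merge by reference instead of deleting it. Lens-scoped sameness is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str       # "merge", "split", or "cannot"
    a: str = ""
    b: str = ""
    ref: int = -1   # for "split": index of the merge event being reversed


class IdentityLedger:
    def __init__(self) -> None:
        self.events: List[Event] = []

    def _finder(self, pending: Optional[Tuple[str, str]] = None) -> Callable[[str], str]:
        """Replay all non-reversed merges (plus an optional tentative one)
        into a union-find and return its find function."""
        reverted = {e.ref for e in self.events if e.kind == "split"}
        pairs = [(e.a, e.b) for i, e in enumerate(self.events)
                 if e.kind == "merge" and i not in reverted]
        if pending is not None:
            pairs.append(pending)
        parent: Dict[str, str] = {}

        def find(x: str) -> str:
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        for a, b in pairs:
            parent[find(a)] = find(b)
        return find

    def same(self, a: str, b: str) -> bool:
        find = self._finder()
        return find(a) == find(b)

    def cannot_link(self, a: str, b: str) -> None:
        if self.same(a, b):
            raise ValueError("already merged; split first")
        self.events.append(Event("cannot", a, b))

    def merge(self, a: str, b: str) -> int:
        """Append a merge unless it would join a cannot-linked pair."""
        find = self._finder(pending=(a, b))
        for e in self.events:
            if e.kind == "cannot" and find(e.a) == find(e.b):
                raise ValueError(f"merge would violate cannot-link {e.a} / {e.b}")
        self.events.append(Event("merge", a, b))
        return len(self.events) - 1

    def split(self, merge_id: int) -> None:
        """Reverse a merge by appending a split; the merge event stays in the log."""
        if self.events[merge_id].kind != "merge":
            raise ValueError("can only split a merge event")
        self.events.append(Event("split", ref=merge_id))


ledger = IdentityLedger()
m = ledger.merge("person:1", "person:2")
merged_before_split = ledger.same("person:1", "person:2")
ledger.cannot_link("person:2", "person:3")
try:
    ledger.merge("person:1", "person:3")   # would join person:2 and person:3
    blocked = False
except ValueError:
    blocked = True
ledger.split(m)
merged_after_split = ledger.same("person:1", "person:2")
```

Because the split is an appended event, the ledger still records that the merge was once believed, which is what makes it reversible and auditable.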

Governance, trust, and lifecycle

foundation | current kernel | missing layer
Memory lifecycle | append-first, delete-resistant history | retention classes, legal/ethical forgetting, sealed knowledge, reversible redaction, tombstoned bytes, aging confidence, compaction with reconstructability
Global governance | policy capsules, attestations, allowed actions, restrictiveness ordering | consent receipts, delegated authority, appeal trails, jurisdictional overlays, treaty-like federation, community-governed data spaces
Cryptographic trust | signed release manifests | signed claims/reviews/policies/extraction runs, transparent logs, verifiable credentials, Merkle query proofs, key rotation/recovery, post-quantum migration

Hard part: the choice is not "delete or keep." It is recording what can be forgotten, what must remain auditable, what can survive only as a hash/policy event, and what cannot be used for a purpose even if it is still known.
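
A Merkle tree is one known way to let a forgotten item survive only as a hash while everything else stays provable. The sketch below is generic, not donto's release format. It uses SHA-256 with leaf/node prefix bytes and pads odd levels by duplicating the last node, which is a simplification that a production design would need to revisit. The payloads and the `retained` map are made up for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


def leaf_hash(payload: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + payload).digest()   # prefix separates leaves from nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("no leaves")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # pad odd levels (simplification)
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each with a flag: is the sibling on the left?"""
    level, i, proof = list(leaves), index, []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = i ^ 1
        proof.append((level[sibling], sibling < i))
        level = [node_hash(level[j], level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    acc = leaf
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


payloads = [b"claim-a", b"claim-b", b"claim-c"]
leaves = [leaf_hash(p) for p in payloads]
root = merkle_root(leaves)

# Tombstone claim-b: drop its bytes, keep only its leaf hash.
retained: Dict[int, bytes] = {0: payloads[0], 2: payloads[2]}

# The root is unchanged, so a proof for a retained claim still verifies.
proof_c = merkle_proof(leaves, 2)
ok = verify(leaf_hash(retained[2]), proof_c, root)
```

The published root still commits to the tombstoned claim, so an auditor can see that something existed at that position without being able to read it.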

Federation, APIs, and scale

foundation | current kernel | missing layer
Federation | release manifests, exports, local service contracts | remote release discovery, signed exchange envelopes, policy-preserving sync, cross-instance identity/predicate alignment, offline-first replication, public mirrors
Universal APIs | HTTP, SDKs, CLI, DontoQL, SPARQL subset, OpenAPI, A2A | versioned /v1, streaming query/release APIs, subscriptions, capability negotiation, stable errors, SDK parity, conformance tests
Massive scale | Postgres-native store with practical SQL surfaces | partitioning, hot/warm/cold tiers, columnar/vector sidecars, distributed query planning, lens caches, incremental standing recomputation, multi-region mirrors

Federation rule: copy claims only with their evidence, policy, lineage, standing context, contradictions, and loss reports. Anything less is not interop; it is a misleading export.
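
The rule can be enforced mechanically at the receiving side. This is a minimal sketch, assuming an exchange envelope is a plain dictionary. The key names and the `missing_parts` and `accept` helpers are hypothetical, not a donto wire format.

```python
from typing import Dict, List

# The seven parts named in the federation rule (key names are assumptions).
REQUIRED_PARTS = (
    "claims", "evidence", "policy", "lineage",
    "standing_context", "contradictions", "loss_report",
)


def missing_parts(envelope: Dict) -> List[str]:
    """A part counts as present if its key exists, even when its value is empty.
    An empty list of contradictions is an explicit statement; a missing key is a gap."""
    return [part for part in REQUIRED_PARTS if part not in envelope]


def accept(envelope: Dict) -> Dict:
    gaps = missing_parts(envelope)
    if gaps:
        raise ValueError("misleading export, missing: " + ", ".join(gaps))
    return envelope


complete = {
    "claims": [{"id": "c1", "text": "The bridge opened in 1932."}],
    "evidence": [{"claim": "c1", "span": [0, 37]}],
    "policy": {"allowed": ["use", "quote"]},
    "lineage": ["extractor-run-17"],
    "standing_context": {"lens": "default"},
    "contradictions": [],
    "loss_report": {"dropped_fields": []},
}
bare = {"claims": complete["claims"], "evidence": complete["evidence"]}
```

`accept(complete)` returns the envelope unchanged, while `accept(bare)` raises and lists the five parts that were left behind.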

Multimodal, causal, and model-native memory

foundation | current kernel | missing layer
Multimodal memory | claim-centric document/source model | pixel/region/timecode/table/frame anchors, cross-modal evidence links, model identity for embeddings, drift tracking, preservation formats
Causal and counterfactual reasoning | claims, argument links, contradictions | causal claims with intervention semantics, event graphs, mechanism vs correlation, simulation evidence, counterfactual frames, causal standing
Model integration | retrieval, MCP tools, agent discovery | memory write protocols, model-output quarantine, standing-aware context assembly, contradiction-aware answering, downstream outcome feedback, training-data audit hooks

An AI using donto should know when it is reading evidence, when it is reading a summary, when it is reading a contested claim, and when it is not authorized to know something.
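
One way to make those four situations explicit is to label every item before it enters a model's context. This is a minimal sketch. The `Kind` enum, the record fields, and the precedence order (withheld, then contested, then evidence, then summary) are assumptions for illustration. Whether an agent may even learn that a withheld item exists is itself a policy decision this sketch does not settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set


class Kind(Enum):
    EVIDENCE = "evidence"     # verbatim source span
    SUMMARY = "summary"       # derived text
    CONTESTED = "contested"   # has at least one recorded contradiction
    WITHHELD = "withheld"     # agent is not authorized to read it


@dataclass(frozen=True)
class ContextItem:
    kind: Kind
    text: Optional[str]       # None when the content is withheld


def label(record: Dict, grants: Set[str]) -> ContextItem:
    if record["policy"] not in grants:
        return ContextItem(Kind.WITHHELD, None)
    if record["contradicted"]:
        return ContextItem(Kind.CONTESTED, record["text"])
    if record["is_source_span"]:
        return ContextItem(Kind.EVIDENCE, record["text"])
    return ContextItem(Kind.SUMMARY, record["text"])


records: List[Dict] = [
    {"text": "opened to traffic in 1932", "is_source_span": True,
     "contradicted": False, "policy": "public"},
    {"text": "The bridge is about 90 years old.", "is_source_span": False,
     "contradicted": False, "policy": "public"},
    {"text": "The bridge opened in 1931.", "is_source_span": False,
     "contradicted": True, "policy": "public"},
    {"text": "Private maintenance notes.", "is_source_span": True,
     "contradicted": False, "policy": "restricted"},
]
context = [label(r, grants={"public"}) for r in records]
```

The fourth item reaches the agent as `WITHHELD` with no text, so the model can say it lacks authorization instead of answering as if nothing were there.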

Human, economic, adversarial, and formal layers

foundation | current kernel | missing layer
Human interfaces | TUI, docs, reports | review workbenches, contradiction dashboards, release review, policy authoring, identity resolution, predicate alignment labs, public citation pages
Incentives | contribution is technically possible | credit, curator reputation, stewardship budgets, reuse accounting, licensing hooks, anti-spam economics
Adversarial robustness | paraconsistency makes attacks contestable | claim-spam resistance, source-forgery detection, sybil-resistant reputation, poisoning audits, quarantine workflows, incident reports
Formal verification | Lean sidecar, shape validation | verified migrations, policy monotonicity, release completeness, lens algebra laws, query loss-report contracts, proof-carrying adapters

Formal methods should not try to prove every stored claim. They should prove the substrate does not lie about its own behavior.
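
As one small illustration of what "policy monotonicity" could mean: if combining two policies is defined as intersecting their allowed actions, a combination can never permit something either input refused. The sketch below checks that property exhaustively over a four-action universe. It is a brute-force test, not a proof, and both the action set and the `combine` rule are assumptions made for illustration. A proof assistant could state the same property as a theorem.

```python
from itertools import combinations
from typing import FrozenSet, List

ACTIONS = ("use", "quote", "train", "publish")   # assumed action universe

Policy = FrozenSet[str]   # a policy is modeled as its set of allowed actions


def combine(p: Policy, q: Policy) -> Policy:
    """Assumed combination rule: keep only actions both policies allow."""
    return p & q


def all_policies() -> List[Policy]:
    """Every subset of ACTIONS, from the empty policy to allow-everything."""
    return [frozenset(c)
            for r in range(len(ACTIONS) + 1)
            for c in combinations(ACTIONS, r)]


def check_monotone() -> bool:
    """True if no combination is more permissive than either of its inputs."""
    policies = all_policies()
    return all(
        combine(p, q) <= p and combine(p, q) <= q
        for p in policies
        for q in policies
    )


result = check_monotone()
```

The check covers all 16 policies and 256 ordered pairs. A rule that took the union instead of the intersection would fail it immediately.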


4. What the future user experience should feel like

An agent encountering a claim should be able to ask:

question | substrate answer
What is the claim? | canonical claim text plus structured subject/predicate/object
Who asserted it? | source, extractor, reviewer, institution, or agent identity
What supports it? | exact source bytes, span/region/timecode/table/frame, transformation lineage
What conflicts with it? | contradictions, rebuttals, cannot-links, suppressed lenses
Which policy controls it? | consent, license, jurisdiction, purpose, retention, publication rules
How reliable is it here? | standing under the current lens/task, not a global truth score
What loss happened? | loss report for the query, release, export, or adapter
Can I use it? | explicit reuse/quote/train/publish permissions and refusals
What would improve it? | next evidence suggestions, review tasks, identity/predicate gaps

That is accountable memory: not omniscience, but a reconstructable trail from answer back to source, policy, disagreement, and loss.


5. Roadmap by horizon

horizon | focus | concrete moves
Near term | make the kernel boring | keep /reports current, stabilize SDKs around standing/loss, add release conformance tests, version the API, expand adapter loss reports, benchmark bigger stores
Medium term | make the substrate federated | signed exchange envelopes, remote release discovery, partial replication, policy-preserving sync, public mirrors, cross-instance identity/predicate alignment
Long term | make memory governable | consent receipts, community policy protocols, appeals, stewardship incentives, formal policy proofs, human review institutions
Century term | make the protocol outlive this implementation | open specs, independent implementations, archival test vectors, migration proofs, stable release formats, post-quantum signature migration, inheritable governance

6. Core risk

The biggest risk is not that donto lacks features. The biggest risk is that it becomes a powerful local system without becoming an interoperable public discipline.

A private memory substrate can be useful. A century-scale AI substrate must be legible to outsiders, hostile auditors, future maintainers, communities whose knowledge is stored, agents that did not exist when the data was written, and institutions that need to rely on it after the original operators are gone.


7. Bottom line

donto is missing the layers that turn a strong knowledge kernel into durable civil infrastructure:

class | missing layers
Truth discipline | durable identity, semantic stability, causal reasoning
Evidence discipline | deep source objects, multimodal anchors, transformation provenance
Governance discipline | memory lifecycle, consent, policy appeals, community authority
Trust discipline | signed claims/runs/reviews, transparent logs, query/release proofs
Network discipline | federation, universal APIs, conformance, independent implementations
Operational discipline | massive scale, mirrors, compaction, observability, incident response
Human discipline | review UI, stewardship incentives, domain packages, public citation

The execution plan made donto credible as a governed paraconsistent substrate. The next challenge is to make it durable, federated, inspectable, and worth trusting for generations.

See also: reports index · the shape of donto's return · single-shot vs agentic memory · benchmarks.