
Past 48 hours change report - what changed and why

Living document. Window audited: 2026-06-10 10:14:50 UTC -> 2026-06-12 10:14:50 UTC. This report covers the work I committed to the main donto substrate repository and the public donto-web site during that window. For the full reports index, see Reports.

The short version: the last 48 hours turned donto from a strong claim-store kernel into a broader, documented substrate prototype. The work landed in two places:

  • donto: 42 commits, 211 files changed, 29,776 insertions, 1,014 deletions. This completed the execution-plan waves from W0-2 through H8, then published the execution/future reports in the docs app.
  • donto-web: 30 commits, 110 files changed, 24,144 insertions, 396 deletions. This built the public donto.org documentation, benchmark boards, comparison pages, reports system, LoCoMo analysis reports, and the corrected /reports index/linking behavior.

TL;DR

area | what changed | why it mattered | status
Substrate safety | delete guard, extraction provenance, blob storage migration, live config snapshots | make memory auditable and reconstructable instead of mutable folklore | shipped in donto
Judgment layer | standing, lenses, review-to-argument bridge, release exports, property-driven sweeps | give claims context, pressure, exportability, and machine-checkable review paths | shipped in donto
Loop machinery | evidence suggestions, loss reports, canonical literals, aligned matching, rule agenda, loop protocol routes | turn evaluation feedback into substrate-native work queues and durable loss records | shipped in donto
Query/interop expansion | SPARQL subset, standards exports, HTTP conformance, DontoQL identity/lens support | make the substrate usable by agents and outside systems without private assumptions | shipped in donto
Horizon features | generic rule evaluator, A2A capability cards, reputation overlays, tournaments, analogy, calibration frames, export tier closures, policy restrictiveness order | push beyond storage into governed, agent-discoverable knowledge infrastructure | shipped in donto
Public surface | donto.org docs, styleguide, comparison, questions, benchmarks, reports, sitemap entries | make the work legible, inspectable, and linkable outside the repo | shipped in donto-web
Benchmark truthfulness | Hyades extraction review, LoCoMo Config C, single-shot vs agentic correction, return-shape report | prevent benchmark theatre; show what works, what fails, and what the numbers really mean | shipped in donto-web

The main reason for the work was to move donto from "we can store many claims" toward "we can explain, constrain, export, audit, compare, and govern memory under pressure."


1. Substrate execution work in donto

The execution plan work landed as a sequence of small commits, each with a matching migration, code path, test, or documentation update where appropriate. The pattern was deliberate: do the smallest durable change, prove it with a check, then move to the next layer.

W0: Make the base state auditable

commit | change | reason
d7d047a | Enforced an I3 statement delete guard. | Keeps claim history append-first; deleting statements becomes an explicit protected operation instead of an accidental data-loss path.
c87926d | Recorded extraction run provenance. | Lets later users know which model/run/settings produced extracted claims, so evidence can be traced back to a concrete run.
63c2e42 | Migrated blob storage to GCS. | Moves large source artifacts out of local-only assumptions and toward durable object storage.
6bf490d | Snapshotted live config. | Captures operational settings so later behavior can be explained instead of guessed.

Why this wave came first: if the system cannot explain where claims came from, how blobs were stored, and what live settings were active, later "intelligence" features become unverifiable.
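
To make the W0 idea concrete, here is a minimal Python sketch of an append-first claim store with a guarded delete and per-claim run provenance. It is an illustration only: the names ClaimStore, ExtractionRun, and DeleteGuardError are hypothetical and do not come from the donto codebase, and the real I3 guard and provenance schema are not shown in this report.

```python
from dataclasses import dataclass, field
from typing import Optional


class DeleteGuardError(Exception):
    """Raised when a delete is attempted without explicit authorization."""


@dataclass(frozen=True)
class ExtractionRun:
    # Hypothetical provenance record: which model/run/settings produced a claim.
    run_id: str
    model: str
    settings: tuple  # e.g. (("context", "32K"),)


@dataclass
class Claim:
    claim_id: int
    text: str
    run: ExtractionRun
    retracted_by: Optional[int] = None  # id of the claim that retracts this one


@dataclass
class ClaimStore:
    claims: dict = field(default_factory=dict)
    _next_id: int = 1

    def assert_claim(self, text: str, run: ExtractionRun) -> Claim:
        claim = Claim(self._next_id, text, run)
        self.claims[claim.claim_id] = claim
        self._next_id += 1
        return claim

    def retract(self, claim_id: int, reason: str, run: ExtractionRun) -> Claim:
        # A correction is itself an appended claim; the original stays readable.
        correction = self.assert_claim(f"retraction of #{claim_id}: {reason}", run)
        self.claims[claim_id].retracted_by = correction.claim_id
        return correction

    def delete(self, claim_id: int, *, authorized: bool = False) -> None:
        # The guard: deletion remains possible, but never by accident.
        if not authorized:
            raise DeleteGuardError(f"refusing to delete claim #{claim_id}")
        del self.claims[claim_id]
```

In this sketch a correction adds a second claim instead of removing the first, so the history and the run that produced each claim remain traceable.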

W1: Add lenses, standing, release shape, and review paths

commit | change | reason
8682a6c | Added standing v1. | Gives claims a first maturity/standing layer instead of forcing every query to treat all claims equally.
7aecc7e | Added the lens registry. | Makes view definitions first-class so different contexts can ask for different slices without mutating the substrate.
ce2399c | Added lens query support. | Turns registered lenses into usable query constraints.
b352601 | Applied identity in DontoQL. | Lets DontoQL use identity-aware resolution rather than only raw string equality.
88aa159 | Bridged reviews to arguments. | Converts review activity into argument structure so criticism/support can be queried and preserved.
7f47075 | Added dataset release export. | Makes reproducible release artifacts possible from the live database.
b430eb6 | Drove sweeps from property constraints. | Turns property constraints into active work discovery, not passive schema decoration.

Reasoning: this is where donto starts to become a substrate rather than a store. Claims need views, standing, review pressure, and exports; otherwise downstream agents get only undifferentiated facts.
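
The lens idea can be sketched as a registry of named, read-only filters over claims. This is a hedged illustration, not donto's lens implementation: LensRegistry, the Claim fields, and the integer standing score are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    obj: str
    standing: int  # hypothetical maturity score: 0 = raw, higher = more reviewed


Lens = Callable[[Claim], bool]


class LensRegistry:
    def __init__(self) -> None:
        self._lenses: Dict[str, Lens] = {}

    def register(self, name: str, lens: Lens) -> None:
        if name in self._lenses:
            raise ValueError(f"lens {name!r} already registered")
        self._lenses[name] = lens

    def query(self, name: str, claims: List[Claim]) -> List[Claim]:
        # Returns a filtered view; the input list is never modified.
        lens = self._lenses[name]
        return [c for c in claims if lens(c)]
```

Registering a lens such as lambda c: c.standing >= 2 under the name "reviewed" lets one caller see only reviewed claims while another still sees everything, and neither query changes the stored claims.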

W2: Add feedback loops and loss accounting

commit | change | reason
b760cf2 | Added next-evidence suggestions. | Lets the system surface what evidence would improve a weak or contested claim.
95d4363 | Persisted loss reports. | Makes export/query/adapter loss explicit instead of silently pretending every representation is complete.
d06fab0 | Canonicalized constrained literals. | Improves matching and validation for literal values under known constraints.
4eeb6f2 | Folded value mappings in aligned match. | Lets aligned predicates/values resolve together during matching.
f6d9ab4 | Activated the rule agenda worker. | Turns rules into scheduled agenda work rather than static metadata.
43681f7 | Added loop protocol routes. | Exposes loop state through service routes so agents/tools can participate.
9c250b7 | Deepened lens semantics. | Makes lenses more expressive and closer to real view semantics.

Reasoning: loss reports and evidence suggestions are the accountability layer. They tell an agent not just what donto knows, but where donto is weak, lossy, or ready for another pass.
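
As a rough illustration of loss accounting, the sketch below exports records to a narrower target format and records every field it had to drop. The function export_with_loss, the LossReport shape, and the field names are hypothetical; donto's persisted loss reports may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class LossReport:
    target: str
    # One entry per dropped value: (record index, field name).
    dropped: List[Tuple[int, str]] = field(default_factory=list)

    @property
    def lossless(self) -> bool:
        return not self.dropped


def export_with_loss(
    records: List[Dict[str, Any]], supported: Set[str], target: str
) -> Tuple[List[Dict[str, Any]], LossReport]:
    """Project records onto the fields a target supports, recording every drop."""
    report = LossReport(target)
    out = []
    for i, record in enumerate(records):
        kept = {}
        for key, value in record.items():
            if key in supported:
                kept[key] = value
            else:
                report.dropped.append((i, key))
        out.append(kept)
    return out, report
```

The point of the design is that the export still succeeds, but the caller receives an explicit record of what the target representation could not carry.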

W3: Expand query, contradiction, interop, and TUI surfaces

commit | change | reason
cb697b1 | Added live claim frame examples. | Gives concrete examples for claim framing, not just abstract schema.
499d336 | Wrote release lens memberships. | Carries lens context into release artifacts.
77f835a | Filtered machinery claims from search. | Keeps internal machinery from polluting user-facing claim retrieval.
d2fe27d | Added scoped canonical shadow rebuilds. | Lets canonical material be rebuilt in scoped ways instead of broad unsafe rebuilds.
d307d04 | Added standards interop exports. | Makes external standards-oriented exchange possible.
289946b | Matured the SPARQL subset. | Improves standards-compatible query access.
b792eec | Gated HTTP contract conformance. | Protects API behavior with conformance tests.
2f44d81 | Populated standing v2 components. | Deepens standing beyond the first kernel.
0ebd2e7 | Honored identity cannot-links. | Prevents identity resolution from merging entities that have explicit negative evidence.
3e23fb8 | Composed predicate closure. | Lets predicate relationships carry through query/reasoning paths.
eafa309 | Folded incompatible predicates in paraconsistency. | Improves contradiction handling when predicates are related but incompatible.
fd5aaaa | Deepened lens algebra. | Gives lenses compositional behavior that can be tested and reasoned about.
0df4347 | Added grounded contradiction policy. | Makes contradiction handling policy-aware instead of ad hoc.
2315b94 | Surfaced observatory loops in the TUI. | Makes the feedback loops visible to operators, not just hidden in tables.

Reasoning: this wave made donto more inspectable and interoperable. It also tightened the hard parts: identity cannot-links, predicate closure, paraconsistency, lens algebra, and API conformance.
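
One way to picture "honoring cannot-links" is a union-find structure that refuses any merge blocked by explicit negative evidence. The sketch below is an assumption about how such a rule could work, not a description of donto's identity resolver; IdentityResolver and its methods are hypothetical names.

```python
class IdentityResolver:
    """Union-find over entity ids that refuses merges forbidden by cannot-links."""

    def __init__(self) -> None:
        self._parent = {}
        self._cannot = {}  # root -> set of roots it must never merge with

    def _find(self, x):
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def cannot_link(self, a, b) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            raise ValueError(f"{a!r} and {b!r} are already merged")
        self._cannot.setdefault(ra, set()).add(rb)
        self._cannot.setdefault(rb, set()).add(ra)

    def merge(self, a, b) -> bool:
        """Merge two identities; return False if a cannot-link blocks it."""
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return True
        if rb in self._cannot.get(ra, set()):
            return False
        self._parent[rb] = ra
        # The surviving root inherits rb's cannot-links, and every root that
        # pointed at rb is re-pointed at ra.
        for other in self._cannot.pop(rb, set()):
            self._cannot[other].discard(rb)
            self._cannot[other].add(ra)
            self._cannot.setdefault(ra, set()).add(other)
        return True

    def same(self, a, b) -> bool:
        return self._find(a) == self._find(b)
```

Because cannot-links are carried along when groups merge, a later chain of individually plausible merges cannot quietly join two entities that were declared distinct.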

H1-H8: Horizon features

commit | change | reason
78c3003 | Added a generic rule evaluator. | Moves rule execution toward a reusable engine.
37d1080 | Published A2A capability cards. | Lets agents discover what the service can do instead of relying on private docs.
fd574c0 | Added agent reputation overlays. | Starts tracking agent trust/context as part of memory operations.
ced0142 | Added hypothesis tournaments. | Gives competing hypotheses a substrate-native comparison path.
c8b7ebe | Added a cross-domain analogy engine. | Tests whether structured memory can support analogy across domains.
9ec690d | Added science instrument calibration frames. | Extends the substrate toward scientific provenance and calibration use cases.
cfd6306 | Added export tier closures. | Makes exports respect tiered closure rules.
72478dc | Added policy restrictiveness order. | Lets policies be ordered by restrictiveness, which is necessary for safe combination and downgrade checks.

Reasoning: the H-series is where the system starts to resemble AI-memory infrastructure: agent discovery, reputation, policy ordering, analogy, scientific calibration, and tournament-style hypothesis pressure.
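
To show why a restrictiveness order helps with safe combination and downgrade checks, here is a small sketch that models a policy as the set of actions it permits. That modelling choice, and the names Policy, combine, and is_downgrade, are assumptions for illustration; the report does not describe how donto represents policies.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Policy:
    name: str
    permits: FrozenSet[str]  # hypothetical action names, e.g. "read", "export"


def at_least_as_restrictive(a: Policy, b: Policy) -> bool:
    """True when a permits nothing that b forbids (a is no looser than b)."""
    return a.permits <= b.permits


def combine(a: Policy, b: Policy) -> Policy:
    """Safe combination: only actions both policies permit survive."""
    return Policy(f"{a.name}&{b.name}", a.permits & b.permits)


def is_downgrade(current: Policy, proposed: Policy) -> bool:
    """True when the proposed policy would permit something the current one forbids."""
    return not at_least_as_restrictive(proposed, current)
```

Under this model the order is partial: two policies can each permit something the other forbids, in which case neither is more restrictive and only their combination is safe for both.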

Documentation publication in donto

commit | change | reason
8582b7a | Published execution and future reports in the docs app. | Made the completed execution work and long-range substrate gap analysis visible in the documentation surface.
ad74f07 | Served the future report at the docs reports root. | Adjusted the docs route so the report was reachable at the expected location in that app.

2. Public site work in donto-web

The web work did two jobs: expose donto clearly to humans, and publish honest benchmark/report pages that keep the substrate claims grounded in measured behavior.

Site foundation, docs, and product surface

commit | change | reason
2e914d5 | Added the donto-green rebrand and comprehensive component library with tests. | Established a consistent visual system and reusable UI primitives.
3e9fe6b | Built donto.org/docs with eight ground-truth documentation pages. | Put core concepts, schema, solving model, embeddings, evidence, and alignment docs on the public site.
f27efeb | Added a nine-page styleguide at /styleguide. | Made the design system inspectable and reusable across future donto.org work.
0d5c996 | Added the distributed-embedding tracker page and worker error-report panel. | Improved admin observability for embedding and worker failure states.
74d51a1 | Added genealogy reports, PDF/TeX routes, people/person/source browsers. | Expanded another donto-backed public surface and report generator pattern.
7b4b474 | Snapshotted live web units. | Captured production service wiring for the web apps.
fb2a617 | Showed standing on claim pages. | Surfaced substrate standing directly in admin claim inspection.
2818bf7 | Built /questions with 15 deep-research briefs. | Made the frontier research questions visible instead of burying them in private notes.
2854e15 | Built /comparison against 24 alternatives. | Made donto's strengths and weaknesses legible against the memory/knowledge field.
3fdabbc | Built /benchmarks with living run checklists. | Gave LoCoMo, LongMemEval, and BEAM benchmark work a public status board.
b509af0 | Added /reports and the first report, the Hyades extraction review. | Created the public reports system and seeded it with the extraction-engine analysis.

Reasoning: the site needed more than a landing page. It needed docs, comparison, benchmarks, reports, admin surfaces, and visual conventions so future changes have a coherent public home.

Extraction and LoCoMo benchmark reporting

commit | change | reason
5db829b | Added prompt re-evaluation and model bake-off section. | Started documenting extraction quality as an experiment rather than a claim.
4be428a | Published bake-off A results. | Showed holo3.1 saturation and Hyades timeout behavior under a 32K setting.
c7a285d | Published bake-off B isolated results. | Separated gateway contention from per-request timeout behavior.
c9b4e79 | Published bake-off C and bottom line. | Showed 16K did not rescue heavy models and that budget affects faithfulness, not just count.
59c3fad | Updated LoCoMo extraction checklist: 271/272 chunks, ~187 facts/chunk, no empty chunks. | Made extraction progress visible and quantified.
b7fb2f6 | Prepared the LoCoMo claim layer and folding, and reported the bake-off D streaming fix. | Documented that streaming changes the Hyades architecture by avoiding silent 524 timeouts.
278e674 | Published LoCoMo Config C claims-only report. | Recorded the negative result: claims-only context shrank heavily but accuracy collapsed.
bfe5c5f | Scored LoCoMo Config C: claims-only 0.244 vs episodic 0.837. | Made the bottleneck explicit: recall/readability, not merely claim existence.
0dd4966 | Added LoCoMo run table and per-run AI transcript thread view. | Made benchmark runs navigable and inspectable.
ef71d79 | Added honest hybrid result and temporal valid-time enrichment. | Captured that temporal enrichment helps, while the hybrid assumptions still needed measurement.
871d7b1 | Recorded clean verdict that cosine beat hybrid and reverted to cosine. | Avoided keeping a more complex retrieval path after it underperformed.
68c5f2e | Updated Config C: bitemporal valid_time lifted temporal accuracy 0.50 -> 0.725. | Showed a real donto-specific advantage from valid-time modeling.
387272e | Updated Config C passage recall to 0.650 and mapped the cheap-regime ceiling. | Showed prose passages beat triples for weak readers and clarified the gap to Zep-style multi-scope rerank.

Reasoning: these reports matter because they stop the system from overclaiming. donto can hold rich claims, but the benchmark showed the reader needs the right return shape, temporal tags, and better recall/reranking before the substrate wins cleanly.
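
The size of the effects quoted in the table above is easier to judge as plain arithmetic. The snippet below only restates the reported LoCoMo Config C scores and computes the differences; it assumes the two pairs of numbers are each measured on a comparable basis, which the individual reports should be consulted to confirm.

```python
# Scores as quoted in the commit summaries above (LoCoMo Config C).
claims_only = 0.244
episodic = 0.837
temporal_before = 0.50
temporal_after = 0.725

# Absolute gap between the claims-only and episodic configurations.
gap = round(episodic - claims_only, 3)

# Absolute and relative lift in temporal accuracy from valid_time modeling.
temporal_gain = round(temporal_after - temporal_before, 3)
temporal_relative = round(temporal_gain / temporal_before, 2)

print(f"claims-only trails episodic by {gap:.3f}")
print(f"valid_time adds {temporal_gain:.3f} absolute ({temporal_relative:.0%} relative)")
```

Read this way, the claims-only configuration trails episodic by 0.593, while the valid-time change adds 0.225 to temporal accuracy, a 45% relative improvement over the 0.50 starting point.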

Reports, benchmark interpretation, and the corrected /reports index

commit | change | reason
2bfc53d | Published "The shape of donto's return." | Documented the memory-bundle schema and why each returned field exists.
38d00ba | Published "Single-shot vs agentic memory." | Compared benchmark methodology and clarified what leaderboard numbers measure.
ef54d87 | Corrected the single-shot vs agentic report after verifying Zep's methodology. | Fixed the interpretation: Zep's 94.7% is single-read with multi-scope retrieval; auto-search is 86.5%.
7e38d8e | Linked the future substrate report from the /reports index. | Fixed the route mistake by keeping /reports as the index and putting the future report on its own slug.
37b1575 | Added the return-shape facet checklist. | Mapped roughly 20 return facets, which ones were tested, and which benchmark problems they target.
d883438 | Restyled the future substrate report. | Brought the report closer to the rest of the site: TL;DR, scorecard, tables, roadmap, and clearer section rhythm.

Reasoning: /reports must be an index, not a single report. The correction split the future substrate report onto /reports/future-substrate, kept /reports as the report directory, and then improved the future report formatting to match the other public notes.
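
The routing rule behind that correction can be stated in a few lines. This is a framework-neutral sketch, not the donto-web router: the resolve function and the REPORTS table are hypothetical, and only the future-substrate slug is taken from this report.

```python
from typing import Dict, Tuple

# Hypothetical slug table; only "future-substrate" is named in this report.
REPORTS: Dict[str, str] = {
    "future-substrate": "Future Substrate Report",
}


def resolve(path: str) -> Tuple[int, str]:
    """Map a request path to (status, page title)."""
    parts = [p for p in path.split("/") if p]
    if parts == ["reports"]:
        # The root is always the directory of reports, never a single report.
        return 200, "Reports index"
    if len(parts) == 2 and parts[0] == "reports" and parts[1] in REPORTS:
        return 200, REPORTS[parts[1]]
    return 404, "Not found"
```

Keeping the root reserved for the index means a new report only ever adds a slug and can never displace the directory that links to the others.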


3. Why these changes fit together

The work was not random feature accumulation. It follows one spine:

source bytes
  -> extracted claims with run provenance
  -> evidence links, valid time, policy, identity, standing
  -> contradiction-aware lenses and query surfaces
  -> loss reports, releases, and conformance checks
  -> public reports that say what worked, failed, and remains missing

The substrate side makes the memory more accountable. The public-site side makes the accountability visible.

The most important design decisions were:

decision | reason
Append-first memory over destructive correction | AI memory needs correction history, not silent overwrite.
Bitemporal facts over plain timestamps | The system must distinguish when something was true from when donto learned it.
Paraconsistency over single-winner truth | Real memory contains conflict; deleting disagreement creates false certainty.
Lenses over global views | Different tasks need different views without corrupting the underlying store.
Standing and loss reports over raw confidence theatre | Agents need to know maturity, pressure, and what was lost in translation.
Public reports over private conclusions | Benchmark and architecture claims should be inspectable, linkable, and revisable.
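
The bitemporal decision is the easiest of these to show in code. The sketch below keeps two independent time axes per fact: when it was true in the world, and when the store learned it. The Fact fields and the as_of function are hypothetical names chosen for the example; only the valid_time concept itself comes from the reports above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date            # when it became true in the world
    valid_to: Optional[date]    # None = still true as far as is known
    recorded_on: date           # when the store learned it


def as_of(facts: List[Fact], valid_at: date, known_by: date) -> List[Fact]:
    """Facts true at `valid_at`, using only what had been recorded by `known_by`."""
    return [
        f for f in facts
        if f.recorded_on <= known_by
        and f.valid_from <= valid_at
        and (f.valid_to is None or valid_at < f.valid_to)
    ]
```

With a single timestamp, "where did Alice live in January 2024?" and "what did the store believe in February 2024 about January 2024?" collapse into one question; with two axes they can return different answers, and both answers are correct.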

4. What changed for users and agents

audience | before | after
Human readers | mostly repo-local knowledge and scattered operational context | public docs, reports, comparison, questions, benchmark pages, and a styled report index
Operators | more hidden service/config/run state | config snapshots, live unit docs, benchmark checklists, admin worker/error surfaces
Agents | fewer explicit discovery and loop affordances | A2A capability cards, loop routes, rule agenda, evidence suggestions, loss reports
Benchmark reviewers | easy to overread claims-only and leaderboard numbers | explicit LoCoMo results, Zep methodology correction, return-shape checklist
Future implementers | fewer contracts and examples | conformance gates, standards exports, docs pages, reports, and migration-backed substrate features

5. What the future should look like from here

The next phase should be less about adding isolated powers and more about making the powers boring, versioned, and externally trustworthy.

horizon | next shape
Near term | Keep /reports current; stabilize memory bundle return shapes; add stronger cross-encoder/reranker experiments; make LoCoMo run artifacts easier to inspect; finish cleaning any local benchmark work before publishing it.
Kernel hardening | Version public APIs, expand conformance tests, keep migration tests close to every substrate rule, and make standing/loss/lens behavior cheap to query.
Federation | Sign release/query artifacts, preserve policy/loss through exchange, add remote release discovery, and make cross-instance identity/predicate alignment explicit.
Governance | Add consent receipts, appeal trails, review workbenches, policy simulation, and community-level authority models.
Century substrate path | Move toward independent implementations, archival test vectors, signed memory envelopes, formal policy/lens proofs, multimodal source packages, and long-horizon migration discipline.

6. Caveats and boundaries

  • This report covers committed work visible in local git history during the audited 48-hour window.
  • It makes no claim about unrelated, uncommitted local benchmark files. When this report was written, there were uncommitted changes under the benchmark run surface; they were intentionally left out of this report commit.
  • Some public benchmark conclusions are explicitly provisional. The LoCoMo reports already say where the current cheap regime tops out and where reranking/reader strength remain blocked or unproven.
  • The report index mistake was corrected: /reports is the index, and individual reports live under their own slugs.

7. Commit ledger

donto ledger

time UTC commit subject
2026-06-12 08:54 ad74f07 docs: serve future report at reports root
2026-06-12 08:24 8582b7a docs: publish execution and future reports
2026-06-12 02:03 72478dc H8: add policy restrictiveness order
2026-06-12 01:55 cfd6306 H7: add export tier closures
2026-06-12 01:47 9ec690d H6: add science instrument calibration frames
2026-06-12 01:31 c8b7ebe H5: add cross-domain analogy engine
2026-06-12 01:25 ced0142 H4: add hypothesis tournaments
2026-06-12 01:15 fd574c0 H3: add agent reputation overlays
2026-06-12 01:01 37d1080 H2: publish A2A capability cards
2026-06-12 00:42 78c3003 H1: add generic rule evaluator
2026-06-12 00:25 2315b94 W3-14: surface observatory loops in TUI
2026-06-12 00:01 0df4347 W3-13: add grounded contradiction policy
2026-06-11 23:43 fd5aaaa W3-12: deepen lens algebra
2026-06-11 23:25 eafa309 W3-11: fold incompatible predicates in paraconsistency
2026-06-11 22:56 3e23fb8 W3-10: compose predicate closure
2026-06-11 22:16 0ebd2e7 W3-9: honor identity cannot-links
2026-06-11 22:02 2f44d81 W3-8: populate standing v2 components
2026-06-11 21:40 b792eec W3-7: gate HTTP contract conformance
2026-06-11 21:20 289946b W3-6: mature SPARQL subset
2026-06-11 20:38 d307d04 W3-5: add standards interop exports
2026-06-11 20:03 d2fe27d W3-4: add scoped canonical shadow rebuilds
2026-06-11 19:38 77f835a W3-3: filter machinery claims from search
2026-06-11 19:22 499d336 W3-2: write release lens memberships
2026-06-11 19:12 cb697b1 W3-1: add live claim frame examples
2026-06-11 18:49 9c250b7 W2-7: deepen lens semantics
2026-06-11 18:29 43681f7 W2-6: add loop protocol routes
2026-06-11 17:56 f6d9ab4 W2-5: activate rule agenda worker
2026-06-11 17:44 4eeb6f2 W2-4: fold value mappings in aligned match
2026-06-11 17:29 d06fab0 W2-3: canonicalize constrained literals
2026-06-11 17:17 95d4363 W2-2: persist loss reports
2026-06-11 12:44 b760cf2 W2-1: add next evidence suggestions
2026-06-11 12:20 b430eb6 W1-7: drive sweep from property constraints
2026-06-11 11:55 7f47075 W1-6: add dataset release export
2026-06-11 10:54 88aa159 W1-5: bridge reviews to arguments
2026-06-11 10:35 b352601 W1-4: apply identity in DontoQL
2026-06-11 10:12 ce2399c W1-3: add lens query
2026-06-11 09:45 7aecc7e W1-2: add lens registry
2026-06-11 09:35 8682a6c W1-1: add standing v1
2026-06-11 08:52 6bf490d W0-5: snapshot live config
2026-06-11 08:43 63c2e42 W0-4: migrate blob storage to GCS
2026-06-11 08:17 c87926d W0-3: record extraction run provenance
2026-06-11 07:51 d7d047a W0-2: enforce I3 statement delete guard

donto-web ledger

time UTC commit subject
2026-06-12 10:11 d883438 reports: restyle future substrate report
2026-06-12 09:54 37b1575 return-shape report: add facet checklist
2026-06-12 09:24 387272e Config C Update 4: passage recall 0.650
2026-06-12 09:18 7e38d8e reports: link future substrate report from index
2026-06-12 09:05 ef54d87 reports: correct single-shot-vs-agentic methodology
2026-06-12 09:02 38d00ba reports: single-shot vs agentic memory
2026-06-12 08:50 2bfc53d reports: the shape of donto's return
2026-06-12 05:07 68c5f2e LoCoMo Config C: bitemporal valid_time lifts temporal
2026-06-12 03:28 871d7b1 LoCoMo Config C: cosine beats hybrid
2026-06-12 02:56 ef71d79 LoCoMo Config C: hybrid result and temporal enrichment
2026-06-12 01:37 0dd4966 benchmarks/locomo: runs table and transcript thread view
2026-06-11 22:26 bfe5c5f LoCoMo Config C scored: claims-only 0.244 vs episodic 0.837
2026-06-11 21:23 278e674 reports: LoCoMo Config C claims-only report
2026-06-11 20:20 b7fb2f6 LoCoMo claim layer, folding, and bake-off D streaming fix
2026-06-11 17:44 59c3fad benchmarks/locomo: extraction done and checklist updated
2026-06-11 17:42 c9b4e79 report: bake-off C and bottom line
2026-06-11 17:27 c7a285d report: bake-off B isolated
2026-06-11 17:13 4be428a report: bake-off A results
2026-06-11 16:43 5db829b report: prompt re-evaluation and model bake-off section
2026-06-11 15:49 b509af0 home: /reports and first Hyades extraction review
2026-06-11 12:43 3fdabbc home: /benchmarks living run checklists
2026-06-11 09:54 2854e15 home: /comparison vs 24 alternatives
2026-06-11 09:36 fb2a617 W1-1: show standing on claim pages
2026-06-11 08:53 7b4b474 W0-5: snapshot live web units
2026-06-11 08:32 2818bf7 home: /questions deep-research briefs
2026-06-10 16:18 74d51a1 genealogy: reports, PDF/TeX routes, people/person/source browsers
2026-06-10 16:18 0d5c996 admin: distributed-embedding tracker and worker error panel
2026-06-10 16:17 f27efeb home: styleguide
2026-06-10 16:17 3e9fe6b home: docs
2026-06-10 16:17 2e914d5 ui: rebrand and component library

Bottom line

In the last 48 hours, donto gained a much stronger governed-memory spine: safety guards, provenance, object storage, standing, lenses, review/argument machinery, loss reporting, rule agendas, loop routes, query/interop expansion, conformance checks, agent discovery, reputation overlays, hypothesis tournaments, export closures, and policy ordering.

donto.org gained the public scaffolding needed to make that work inspectable: docs, comparison pages, benchmark boards, a reports index, detailed LoCoMo/Hyades reports, a corrected future substrate report route, and now this change report.

The future work is to make these capabilities stable enough that external agents and independent implementations can rely on them without needing private context from the current codebase.

See also: Future Substrate Report · The shape of donto's return · Single-shot vs agentic memory · LoCoMo Config C.