Past 48 hours change report - what changed and why

Living document. Window audited: 2026-06-10 10:14:50 UTC -> 2026-06-12 10:14:50 UTC. This report covers the work I committed to the main donto substrate repository and the public donto-web site during that window. For the full reports index, see Reports.

The short version: the last 48 hours turned donto from a strong claim-store kernel into a broader, documented substrate prototype. The work landed in two places:

donto: 42 commits, 211 files changed, 29,776 insertions, 1,014 deletions. This completed the execution-plan waves from W0-2 through H8, then published the execution/future reports in the docs app.
donto-web: 30 commits, 110 files changed, 24,144 insertions, 396 deletions. This built the public donto.org documentation, benchmark boards, comparison pages, reports system, and LoCoMo analysis reports, and corrected the /reports index and its linking behavior.

TL;DR

area	what changed	why it mattered	status
Substrate safety	delete guard, extraction provenance, blob storage migration, live config snapshots	make memory auditable and reconstructable instead of mutable folklore	shipped in `donto`
Judgment layer	standing, lenses, review-to-argument bridge, release exports, property-driven sweeps	give claims context, pressure, exportability, and machine-checkable review paths	shipped in `donto`
Loop machinery	evidence suggestions, loss reports, canonical literals, aligned matching, rule agenda, loop protocol routes	turn evaluation feedback into substrate-native work queues and durable loss records	shipped in `donto`
Query/interop expansion	SPARQL subset, standards exports, HTTP conformance, DontoQL identity/lens support	make the substrate usable by agents and outside systems without private assumptions	shipped in `donto`
Horizon features	generic rule evaluator, A2A capability cards, reputation overlays, tournaments, analogy, calibration frames, export tier closures, policy restrictiveness order	push beyond storage into governed, agent-discoverable knowledge infrastructure	shipped in `donto`
Public surface	donto.org docs, styleguide, comparison, questions, benchmarks, reports, sitemap entries	make the work legible, inspectable, and linkable outside the repo	shipped in `donto-web`
Benchmark truthfulness	Hyades extraction review, LoCoMo Config C, single-shot vs agentic correction, return-shape report	prevent benchmark theatre; show what works, what fails, and what the numbers really mean	shipped in `donto-web`

The main reason for the work was to move donto from "we can store many claims" toward "we can explain, constrain, export, audit, compare, and govern memory under pressure."

1. Substrate execution work in `donto`

The execution plan work landed as a sequence of small commits, each with a matching migration, code path, test, or documentation update where appropriate. The pattern was deliberate: do the smallest durable change, prove it with a check, then move to the next layer.

W0: Make the base state auditable

commit	change	reason
`d7d047a`	Enforced an I3 statement delete guard.	Keeps claim history append-first; deleting statements becomes an explicit protected operation instead of an accidental data-loss path.
`c87926d`	Recorded extraction run provenance.	Lets later users know which model/run/settings produced extracted claims, so evidence can be traced back to a concrete run.
`63c2e42`	Migrated blob storage to GCS.	Moves large source artifacts off local-only storage and into durable object storage.
`6bf490d`	Snapshotted live config.	Captures operational settings so later behavior can be explained instead of guessed.

Why this wave came first: if the system cannot explain where claims came from, how blobs were stored, and what live settings were active, later "intelligence" features become unverifiable.
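The delete guard idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `StatementLog`, `DeleteGuardError`, and the `authorized_by` parameter are hypothetical names, and the real guard in `donto` lives in the database layer, which this report does not reproduce. The point is only the shape: corrections append or retract, and physical deletion is a separate path that fails unless someone explicitly authorizes it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class DeleteGuardError(Exception):
    """Raised when a statement delete is attempted without explicit authorization."""


@dataclass
class StatementLog:
    """Hypothetical in-memory stand-in for an append-first statement table."""

    statements: Dict[str, str] = field(default_factory=dict)
    retractions: List[Tuple[str, str]] = field(default_factory=list)

    def append(self, stmt_id: str, text: str) -> None:
        # Appends never overwrite: a correction is a new statement.
        if stmt_id in self.statements:
            raise ValueError(f"{stmt_id} already exists; append a new statement instead")
        self.statements[stmt_id] = text

    def retract(self, stmt_id: str, reason: str) -> None:
        # Normal correction path: keep the row, record why it no longer holds.
        if stmt_id not in self.statements:
            raise KeyError(stmt_id)
        self.retractions.append((stmt_id, reason))

    def delete(self, stmt_id: str, authorized_by: Optional[str] = None) -> None:
        # Guarded path: physical removal needs a named authorizer.
        if not authorized_by:
            raise DeleteGuardError(f"refusing to delete {stmt_id} without authorization")
        del self.statements[stmt_id]
```

In this sketch, `retract` is the everyday operation and `delete` is the exception, which mirrors the intent of keeping claim history append-first.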

W1: Add lenses, standing, release shape, and review paths

commit	change	reason
`8682a6c`	Added standing v1.	Gives claims a first maturity/standing layer instead of forcing every query to treat all claims equally.
`7aecc7e`	Added the lens registry.	Makes view definitions first-class so different contexts can ask for different slices without mutating the substrate.
`ce2399c`	Added lens query support.	Turns registered lenses into usable query constraints.
`b352601`	Applied identity in DontoQL.	Lets DontoQL use identity-aware resolution rather than only raw string equality.
`88aa159`	Bridged reviews to arguments.	Converts review activity into argument structure so criticism/support can be queried and preserved.
`7f47075`	Added dataset release export.	Makes reproducible release artifacts possible from the live database.
`b430eb6`	Drove sweeps from property constraints.	Turns property constraints into active work discovery, not passive schema decoration.

Reasoning: this is where donto starts to become a substrate rather than a store. Claims need views, standing, review pressure, and exports, otherwise downstream agents only get undifferentiated facts.
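A rough sketch of what "views without mutating the substrate" means in practice. The `Claim` fields, the integer `standing` score, and the `LensRegistry` class are hypothetical and are not the schema or API shipped in `donto`; the sketch only shows a registered lens acting as a read-time filter over claims that stay untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    obj: str
    standing: int  # hypothetical maturity score, higher means more established


# A lens is modeled here as a plain predicate over claims (an assumption).
Lens = Callable[[Claim], bool]


class LensRegistry:
    """Hypothetical registry: named lenses are stored once and reused by queries."""

    def __init__(self) -> None:
        self._lenses: Dict[str, Lens] = {}

    def register(self, name: str, lens: Lens) -> None:
        if name in self._lenses:
            raise ValueError(f"lens {name!r} is already registered")
        self._lenses[name] = lens

    def query(self, name: str, claims: List[Claim]) -> List[Claim]:
        # Returns a filtered copy; the input list is never modified.
        lens = self._lenses[name]
        return [c for c in claims if lens(c)]
```

Because claims are frozen and `query` returns a new list, two callers can apply different lenses to the same claims without interfering with each other.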

W2: Add feedback loops and loss accounting

commit	change	reason
`b760cf2`	Added next-evidence suggestions.	Lets the system surface what evidence would improve a weak or contested claim.
`95d4363`	Persisted loss reports.	Makes export/query/adapter loss explicit instead of silently pretending every representation is complete.
`d06fab0`	Canonicalized constrained literals.	Improves matching and validation for literal values under known constraints.
`4eeb6f2`	Folded value mappings in aligned match.	Lets aligned predicates/values resolve together during matching.
`f6d9ab4`	Activated the rule agenda worker.	Turns rules into scheduled agenda work rather than static metadata.
`43681f7`	Added loop protocol routes.	Exposes loop state through service routes so agents/tools can participate.
`9c250b7`	Deepened lens semantics.	Makes lenses more expressive and closer to real view semantics.

Reasoning: loss reports and evidence suggestions are the accountability layer. They tell an agent not just what donto knows, but where donto is weak, lossy, or ready for another pass.
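As an illustration of loss accounting, here is a minimal sketch of an export step that reports what it could not carry. The function name, the field names, and the report shape are assumptions made for this example; the persisted loss reports in `donto` may record different or richer information.

```python
from typing import Any, Dict, Set, Tuple


def export_with_loss(
    record: Dict[str, Any], supported_fields: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Project a record onto a target format and report what was dropped.

    Hypothetical helper: returns (exported_record, loss_report).
    """
    exported = {k: v for k, v in record.items() if k in supported_fields}
    dropped = sorted(k for k in record if k not in supported_fields)
    loss_report = {
        "dropped_fields": dropped,
        "lossless": not dropped,
    }
    return exported, loss_report


# Usage: a target format that understands triples but not valid time or policy.
claim = {
    "subject": "alice",
    "predicate": "livesIn",
    "object": "Paris",
    "valid_time": "2020-01-01/2022-06-01",
    "policy": "internal",
}
exported, loss = export_with_loss(claim, {"subject", "predicate", "object"})
```

The design choice worth noting is that the export still succeeds; the loss is made explicit alongside it rather than blocking the operation or being silently ignored.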

W3: Expand query, contradiction, interop, and TUI surfaces

commit	change	reason
`cb697b1`	Added live claim frame examples.	Gives concrete examples for claim framing, not just abstract schema.
`499d336`	Wrote release lens memberships.	Carries lens context into release artifacts.
`77f835a`	Filtered machinery claims from search.	Keeps internal machinery from polluting user-facing claim retrieval.
`d2fe27d`	Added scoped canonical shadow rebuilds.	Lets canonical material be rebuilt in scoped ways instead of broad unsafe rebuilds.
`d307d04`	Added standards interop exports.	Makes external standards-oriented exchange possible.
`289946b`	Matured the SPARQL subset.	Improves standards-compatible query access.
`b792eec`	Gated HTTP contract conformance.	Protects API behavior with conformance tests.
`2f44d81`	Populated standing v2 components.	Deepens standing beyond the first kernel.
`0ebd2e7`	Honored identity cannot-links.	Prevents identity resolution from merging entities that have explicit negative evidence.
`3e23fb8`	Composed predicate closure.	Lets predicate relationships carry through query/reasoning paths.
`eafa309`	Folded incompatible predicates in paraconsistency.	Improves contradiction handling when predicates are related but incompatible.
`fd5aaaa`	Deepened lens algebra.	Gives lenses compositional behavior that can be tested and reasoned about.
`0df4347`	Added grounded contradiction policy.	Makes contradiction handling policy-aware instead of ad hoc.
`2315b94`	Surfaced observatory loops in the TUI.	Makes the feedback loops visible to operators, not just hidden in tables.

Reasoning: this wave made donto more inspectable and interoperable. It also tightened the hard parts: identity cannot-links, predicate closure, paraconsistency, lens algebra, and API conformance.
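To make the cannot-link behavior concrete, here is a small sketch using a union-find structure in which a merge is refused when any pair across the two clusters carries an explicit cannot-link. This is a generic constrained-clustering technique chosen for illustration; the class and method names are hypothetical, and the report does not say that `donto` implements identity resolution this way.

```python
from typing import Dict, FrozenSet, Set


class IdentityResolver:
    """Hypothetical identity clusters with explicit negative evidence."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.cannot: Set[FrozenSet[str]] = set()

    def find(self, x: str) -> str:
        # Standard union-find lookup with path halving.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def members(self, root: str) -> Set[str]:
        return {x for x in list(self.parent) if self.find(x) == root}

    def add_cannot_link(self, a: str, b: str) -> None:
        if self.find(a) == self.find(b):
            raise ValueError(f"{a} and {b} are already merged")
        self.cannot.add(frozenset((a, b)))

    def merge(self, a: str, b: str) -> bool:
        """Merge the clusters of a and b; return False if a cannot-link forbids it."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        for x in self.members(ra):
            for y in self.members(rb):
                if frozenset((x, y)) in self.cannot:
                    return False
        self.parent[rb] = ra
        return True
```

The check runs over whole clusters, not just the two named entities, so a cannot-link between `b` and `c` also blocks merging `a` with `c` once `a` and `b` are the same entity.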

H1-H8: Horizon features

commit	change	reason
`78c3003`	Added a generic rule evaluator.	Moves rule execution toward a reusable engine.
`37d1080`	Published A2A capability cards.	Lets agents discover what the service can do instead of relying on private docs.
`fd574c0`	Added agent reputation overlays.	Starts tracking agent trust/context as part of memory operations.
`ced0142`	Added hypothesis tournaments.	Gives competing hypotheses a substrate-native comparison path.
`c8b7ebe`	Added a cross-domain analogy engine.	Tests whether structured memory can support analogy across domains.
`9ec690d`	Added science instrument calibration frames.	Extends the substrate toward scientific provenance and calibration use cases.
`cfd6306`	Added export tier closures.	Makes exports respect tiered closure rules.
`72478dc`	Added policy restrictiveness order.	Lets policies be ordered by restrictiveness, which is necessary for safe combination and downgrade checks.

Reasoning: the H-series is where the system starts to resemble AI-memory infrastructure: agent discovery, reputation, policy ordering, analogy, scientific calibration, and tournament-style hypothesis pressure.
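One way to picture a restrictiveness order, sketched under a strong simplifying assumption: a policy is modeled as the set of actions it permits, so one policy is at least as restrictive as another when its permitted set is a subset. The action names and helper functions are hypothetical, and the policy model shipped in `donto` may be considerably richer. The sketch shows why an order is useful for safe combination and downgrade checks.

```python
from typing import FrozenSet

# Assumption for illustration: a policy is the set of actions it permits.
Policy = FrozenSet[str]


def at_least_as_restrictive(a: Policy, b: Policy) -> bool:
    """True if a permits nothing that b forbids (subset order)."""
    return a <= b


def combine(a: Policy, b: Policy) -> Policy:
    """Safe combination: permit only what both policies permit."""
    return a & b


def is_downgrade(old: Policy, new: Policy) -> bool:
    """True if moving from old to new would permit something old did not."""
    return not at_least_as_restrictive(new, old)


public: Policy = frozenset({"read", "export", "share"})
internal: Policy = frozenset({"read", "export"})
partner: Policy = frozenset({"read", "share"})
sealed: Policy = frozenset({"read"})
```

Note that `internal` and `partner` are incomparable in this model, which is why the order is partial; `combine(internal, partner)` yields a policy at least as restrictive as both.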

Documentation publication in `donto`

commit	change	reason
`8582b7a`	Published execution and future reports in the docs app.	Made the completed execution work and long-range substrate gap analysis visible in the documentation surface.
`ad74f07`	Served the future report at the docs reports root.	Adjusted the docs route so the report was reachable at the expected location in that app.

2. Public site work in `donto-web`

The web work did two jobs: expose donto clearly to humans, and publish honest benchmark/report pages that keep the substrate claims grounded in measured behavior.

Site foundation, docs, and product surface

commit	change	reason
`2e914d5`	Added the donto-green rebrand and comprehensive component library with tests.	Established a consistent visual system and reusable UI primitives.
`3e9fe6b`	Built `donto.org/docs` with eight ground-truth documentation pages.	Put core concepts, schema, solving model, embeddings, evidence, and alignment docs on the public site.
`f27efeb`	Added a nine-page styleguide at `/styleguide`.	Made the design system inspectable and reusable across future donto.org work.
`0d5c996`	Added the distributed-embedding tracker page and worker error-report panel.	Improved admin observability for embedding and worker failure states.
`74d51a1`	Added genealogy reports, PDF/TeX routes, people/person/source browsers.	Expanded another donto-backed public surface and report generator pattern.
`7b4b474`	Snapshotted live web units.	Captured production service wiring for the web apps.
`fb2a617`	Showed standing on claim pages.	Surfaced substrate standing directly in admin claim inspection.
`2818bf7`	Built `/questions` with 15 deep-research briefs.	Made the frontier research questions visible instead of burying them in private notes.
`2854e15`	Built `/comparison` against 24 alternatives.	Made donto's strengths and weaknesses legible against the memory/knowledge field.
`3fdabbc`	Built `/benchmarks` with living run checklists.	Gave LoCoMo, LongMemEval, and BEAM benchmark work a public status board.
`b509af0`	Added `/reports` and the first report, the Hyades extraction review.	Created the public reports system and seeded it with the extraction-engine analysis.

Reasoning: the site needed more than a landing page. It needed docs, comparison, benchmarks, reports, admin surfaces, and visual conventions so future changes have a coherent public home.

Extraction and LoCoMo benchmark reporting

commit	change	reason
`5db829b`	Added prompt re-evaluation and model bake-off section.	Started documenting extraction quality as an experiment rather than a claim.
`4be428a`	Published bake-off A results.	Showed holo3.1 saturation and Hyades timeout behavior under a 32K setting.
`c7a285d`	Published bake-off B isolated results.	Separated gateway contention from per-request timeout behavior.
`c9b4e79`	Published bake-off C and bottom line.	Showed 16K did not rescue heavy models and that budget affects faithfulness, not just count.
`59c3fad`	Updated LoCoMo extraction checklist: 271/272 chunks, ~187 facts/chunk, no empty chunks.	Made extraction progress visible and quantified.
`b7fb2f6`	Prepared LoCoMo claim layer/folding and report bake-off D streaming fix.	Documented that streaming changes the Hyades architecture by avoiding silent 524s.
`278e674`	Published LoCoMo Config C claims-only report.	Recorded the negative result: claims-only context shrank heavily but accuracy collapsed.
`bfe5c5f`	Scored LoCoMo Config C: claims-only 0.244 vs episodic 0.837.	Made the bottleneck explicit: recall/readability, not merely claim existence.
`0dd4966`	Added LoCoMo run table and per-run AI transcript thread view.	Made benchmark runs navigable and inspectable.
`ef71d79`	Added honest hybrid result and temporal valid-time enrichment.	Recorded that temporal enrichment helps, and that the assumed benefit of hybrid retrieval still had to be measured.
`871d7b1`	Recorded clean verdict that cosine beat hybrid and reverted to cosine.	Avoided keeping a more complex retrieval path after it underperformed.
`68c5f2e`	Updated Config C: bitemporal `valid_time` lifted temporal accuracy 0.50 -> 0.725.	Showed a real donto-specific advantage from valid-time modeling.
`387272e`	Updated Config C passage recall to 0.650 and mapped the cheap-regime ceiling.	Showed prose passages beat triples for weak readers and clarified the gap to Zep-style multi-scope rerank.

Reasoning: these reports matter because they stop the system from overclaiming. donto can hold rich claims, but the benchmark showed the reader needs the right return shape, temporal tags, and better recall/reranking before the substrate wins cleanly.

Reports, benchmark interpretation, and the corrected `/reports` index

commit	change	reason
`2bfc53d`	Published "The shape of donto's return."	Documented the memory-bundle schema and why each returned field exists.
`38d00ba`	Published "Single-shot vs agentic memory."	Compared benchmark methodology and clarified what leaderboard numbers measure.
`ef54d87`	Corrected the single-shot vs agentic report after verifying Zep's methodology.	Fixed the interpretation: Zep's 94.7% is single-read with multi-scope retrieval; auto-search is 86.5%.
`7e38d8e`	Linked the future substrate report from the `/reports` index.	Fixed the route mistake by making `/reports` remain the index and putting the future report on its own slug.
`37b1575`	Added the return-shape facet checklist.	Mapped roughly 20 return facets, which ones were tested, and which benchmark problems they target.
`d883438`	Restyled the future substrate report.	Brought the report closer to the rest of the site: TL;DR, scorecard, tables, roadmap, and clearer section rhythm.

Reasoning: /reports must be an index, not a single report. The correction split the future substrate report onto /reports/future-substrate, kept /reports as the report directory, and then improved the future report formatting to match the other public notes.

3. Why these changes fit together

The work was not random feature accumulation. It followed one spine:

source bytes
  -> extracted claims with run provenance
  -> evidence links, valid time, policy, identity, standing
  -> contradiction-aware lenses and query surfaces
  -> loss reports, releases, and conformance checks
  -> public reports that say what worked, failed, and remains missing

The substrate side makes the memory more accountable. The public-site side makes the accountability visible.

The most important design decisions were:

decision	reason
Append-first memory over destructive correction	AI memory needs correction history, not silent overwrite.
Bitemporal facts over plain timestamps	The system must distinguish when something was true from when donto learned it.
Paraconsistency over single-winner truth	Real memory contains conflict; deleting disagreement creates false certainty.
Lenses over global views	Different tasks need different views without corrupting the underlying store.
Standing and loss reports over raw confidence theatre	Agents need to know maturity, pressure, and what was lost in translation.
Public reports over private conclusions	Benchmark and architecture claims should be inspectable, linkable, and revisable.
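The bitemporal decision in the table can be sketched with two time axes per fact: when it was true, and when the store learned it. The `Fact` fields and the `as_of` helper below are hypothetical names for illustration and are not the `donto` schema; only the `valid_time` concept comes from the reports above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: date           # when the fact started being true
    valid_to: Optional[date]   # exclusive end; None means still true
    recorded_on: date          # when the store learned the fact


def as_of(facts: List[Fact], subject: str, valid_on: date, known_on: date) -> List[str]:
    """Values true on valid_on, using only what was recorded by known_on."""
    hits = [
        f for f in facts
        if f.subject == subject
        and f.recorded_on <= known_on
        and f.valid_from <= valid_on
        and (f.valid_to is None or valid_on < f.valid_to)
    ]
    return [f.value for f in sorted(hits, key=lambda f: f.recorded_on)]


facts = [
    Fact("alice", "Paris", date(2020, 1, 1), date(2022, 6, 1), date(2023, 3, 1)),
    Fact("alice", "Berlin", date(2022, 6, 1), None, date(2023, 3, 1)),
]
```

With a single timestamp, the question "where did Alice live in May 2021?" and the question "what did the store know in January 2022?" cannot be separated; with two axes they are independent filters.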

4. What changed for users and agents

audience	before	after
Human readers	mostly repo-local knowledge and scattered operational context	public docs, reports, comparison, questions, benchmark pages, and a styled report index
Operators	more hidden service/config/run state	config snapshots, live unit docs, benchmark checklists, admin worker/error surfaces
Agents	fewer explicit discovery and loop affordances	A2A capability cards, loop routes, rule agenda, evidence suggestions, loss reports
Benchmark reviewers	easy to overread claims-only and leaderboard numbers	explicit LoCoMo results, Zep methodology correction, return-shape checklist
Future implementers	fewer contracts and examples	conformance gates, standards exports, docs pages, reports, and migration-backed substrate features

5. What the future should look like from here

The next phase should be less about adding isolated powers and more about making the powers boring, versioned, and externally trustworthy.

horizon	next shape
Near term	Keep `/reports` current; stabilize memory bundle return shapes; add stronger cross-encoder/reranker experiments; make LoCoMo run artifacts easier to inspect; finish cleaning any local benchmark work before publishing it.
Kernel hardening	Version public APIs, expand conformance tests, keep migration tests close to every substrate rule, and make standing/loss/lens behavior cheap to query.
Federation	Sign release/query artifacts, preserve policy/loss through exchange, add remote release discovery, and make cross-instance identity/predicate alignment explicit.
Governance	Add consent receipts, appeal trails, review workbenches, policy simulation, and community-level authority models.
Century substrate path	Move toward independent implementations, archival test vectors, signed memory envelopes, formal policy/lens proofs, multimodal source packages, and long-horizon migration discipline.

6. Caveats and boundaries

This report covers committed work visible in local git history during the audited 48-hour window.
It does not claim that unrelated, uncommitted local benchmark files are finished. At the time this report was written, there were uncommitted local changes under the benchmark run surface; those were intentionally left out of this report commit.
Some public benchmark conclusions are explicitly provisional. The LoCoMo reports already say where the current cheap regime tops out and where reranking/reader strength remain blocked or unproven.
The report index mistake was corrected: /reports is the index, and individual reports live under their own slugs.
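For readers who want to re-audit a window like this one from local git history, here is a minimal sketch. It assumes `git` is on the path and that commits are listed with `git log --shortstat --format=%h`; the helper names are hypothetical. One caveat: summing per-commit stats counts a file once for every commit that touches it, so the file total can differ from a figure derived from a single diff across the window, and this report does not state which method produced its numbers.

```python
import re
import subprocess
from typing import Dict

# Matches git's shortstat line, e.g. " 3 files changed, 10 insertions(+), 2 deletions(-)".
STAT_RE = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)
HASH_RE = re.compile(r"[0-9a-f]{7,40}")


def git_window_log(repo: str, since: str, until: str) -> str:
    """Run git log for a time window and return its raw text output."""
    cmd = ["git", "-C", repo, "log", f"--since={since}", f"--until={until}",
           "--shortstat", "--format=%h"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def total_shortstat(log_text: str) -> Dict[str, int]:
    """Sum commits, files, insertions, and deletions from shortstat output."""
    totals = {"commits": 0, "files": 0, "insertions": 0, "deletions": 0}
    for line in log_text.splitlines():
        match = STAT_RE.search(line)
        if match:
            totals["files"] += int(match.group(1))
            totals["insertions"] += int(match.group(2) or 0)
            totals["deletions"] += int(match.group(3) or 0)
        elif HASH_RE.fullmatch(line.strip()):
            totals["commits"] += 1
    return totals
```

A call such as `total_shortstat(git_window_log(".", "2026-06-10 10:14:50 +0000", "2026-06-12 10:14:50 +0000"))` would produce the per-repository totals for the audited window, subject to the counting caveat above.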

7. Commit ledger

`donto` ledger

time UTC	commit	subject
2026-06-12 08:54	`ad74f07`	docs: serve future report at reports root
2026-06-12 08:24	`8582b7a`	docs: publish execution and future reports
2026-06-12 02:03	`72478dc`	H8: add policy restrictiveness order
2026-06-12 01:55	`cfd6306`	H7: add export tier closures
2026-06-12 01:47	`9ec690d`	H6: add science instrument calibration frames
2026-06-12 01:31	`c8b7ebe`	H5: add cross-domain analogy engine
2026-06-12 01:25	`ced0142`	H4: add hypothesis tournaments
2026-06-12 01:15	`fd574c0`	H3: add agent reputation overlays
2026-06-12 01:01	`37d1080`	H2: publish A2A capability cards
2026-06-12 00:42	`78c3003`	H1: add generic rule evaluator
2026-06-12 00:25	`2315b94`	W3-14: surface observatory loops in TUI
2026-06-12 00:01	`0df4347`	W3-13: add grounded contradiction policy
2026-06-11 23:43	`fd5aaaa`	W3-12: deepen lens algebra
2026-06-11 23:25	`eafa309`	W3-11: fold incompatible predicates in paraconsistency
2026-06-11 22:56	`3e23fb8`	W3-10: compose predicate closure
2026-06-11 22:16	`0ebd2e7`	W3-9: honor identity cannot-links
2026-06-11 22:02	`2f44d81`	W3-8: populate standing v2 components
2026-06-11 21:40	`b792eec`	W3-7: gate HTTP contract conformance
2026-06-11 21:20	`289946b`	W3-6: mature SPARQL subset
2026-06-11 20:38	`d307d04`	W3-5: add standards interop exports
2026-06-11 20:03	`d2fe27d`	W3-4: add scoped canonical shadow rebuilds
2026-06-11 19:38	`77f835a`	W3-3: filter machinery claims from search
2026-06-11 19:22	`499d336`	W3-2: write release lens memberships
2026-06-11 19:12	`cb697b1`	W3-1: add live claim frame examples
2026-06-11 18:49	`9c250b7`	W2-7: deepen lens semantics
2026-06-11 18:29	`43681f7`	W2-6: add loop protocol routes
2026-06-11 17:56	`f6d9ab4`	W2-5: activate rule agenda worker
2026-06-11 17:44	`4eeb6f2`	W2-4: fold value mappings in aligned match
2026-06-11 17:29	`d06fab0`	W2-3: canonicalize constrained literals
2026-06-11 17:17	`95d4363`	W2-2: persist loss reports
2026-06-11 12:44	`b760cf2`	W2-1: add next evidence suggestions
2026-06-11 12:20	`b430eb6`	W1-7: drive sweep from property constraints
2026-06-11 11:55	`7f47075`	W1-6: add dataset release export
2026-06-11 10:54	`88aa159`	W1-5: bridge reviews to arguments
2026-06-11 10:35	`b352601`	W1-4: apply identity in DontoQL
2026-06-11 10:12	`ce2399c`	W1-3: add lens query
2026-06-11 09:45	`7aecc7e`	W1-2: add lens registry
2026-06-11 09:35	`8682a6c`	W1-1: add standing v1
2026-06-11 08:52	`6bf490d`	W0-5: snapshot live config
2026-06-11 08:43	`63c2e42`	W0-4: migrate blob storage to GCS
2026-06-11 08:17	`c87926d`	W0-3: record extraction run provenance
2026-06-11 07:51	`d7d047a`	W0-2: enforce I3 statement delete guard

`donto-web` ledger

time UTC	commit	subject
2026-06-12 10:11	`d883438`	reports: restyle future substrate report
2026-06-12 09:54	`37b1575`	return-shape report: add facet checklist
2026-06-12 09:24	`387272e`	Config C Update 4: passage recall 0.650
2026-06-12 09:18	`7e38d8e`	reports: link future substrate report from index
2026-06-12 09:05	`ef54d87`	reports: correct single-shot-vs-agentic methodology
2026-06-12 09:02	`38d00ba`	reports: single-shot vs agentic memory
2026-06-12 08:50	`2bfc53d`	reports: the shape of donto's return
2026-06-12 05:07	`68c5f2e`	LoCoMo Config C: bitemporal valid_time lifts temporal
2026-06-12 03:28	`871d7b1`	LoCoMo Config C: cosine beats hybrid
2026-06-12 02:56	`ef71d79`	LoCoMo Config C: hybrid result and temporal enrichment
2026-06-12 01:37	`0dd4966`	benchmarks/locomo: runs table and transcript thread view
2026-06-11 22:26	`bfe5c5f`	LoCoMo Config C scored: claims-only 0.244 vs episodic 0.837
2026-06-11 21:23	`278e674`	reports: LoCoMo Config C claims-only report
2026-06-11 20:20	`b7fb2f6`	LoCoMo claim layer, folding, and bake-off D streaming fix
2026-06-11 17:44	`59c3fad`	benchmarks/locomo: extraction done and checklist updated
2026-06-11 17:42	`c9b4e79`	report: bake-off C and bottom line
2026-06-11 17:27	`c7a285d`	report: bake-off B isolated
2026-06-11 17:13	`4be428a`	report: bake-off A results
2026-06-11 16:43	`5db829b`	report: prompt re-evaluation and model bake-off section
2026-06-11 15:49	`b509af0`	home: /reports and first Hyades extraction review
2026-06-11 12:43	`3fdabbc`	home: /benchmarks living run checklists
2026-06-11 09:54	`2854e15`	home: /comparison vs 24 alternatives
2026-06-11 09:36	`fb2a617`	W1-1: show standing on claim pages
2026-06-11 08:53	`7b4b474`	W0-5: snapshot live web units
2026-06-11 08:32	`2818bf7`	home: /questions deep-research briefs
2026-06-10 16:18	`74d51a1`	genealogy: reports, PDF/TeX routes, people/person/source browsers
2026-06-10 16:18	`0d5c996`	admin: distributed-embedding tracker and worker error panel
2026-06-10 16:17	`f27efeb`	home: styleguide
2026-06-10 16:17	`3e9fe6b`	home: docs
2026-06-10 16:17	`2e914d5`	ui: rebrand and component library

Bottom line

In the last 48 hours, donto gained a much stronger governed-memory spine: safety guards, provenance, object storage, standing, lenses, review/argument machinery, loss reporting, rule agendas, loop routes, query/interop expansion, conformance checks, agent discovery, reputation overlays, hypothesis tournaments, export closures, and policy ordering.

donto.org gained the public scaffolding needed to make that work inspectable: docs, comparison pages, benchmark boards, a reports index, detailed LoCoMo/Hyades reports, a corrected future substrate report route, and now this change report.

The future work is to make these capabilities stable enough that external agents and independent implementations can rely on them without needing private context from the current codebase.