Past 48 hours change report - what changed and why
Living document. Window audited: 2026-06-10 10:14:50 UTC -> 2026-06-12 10:14:50 UTC. This report covers the work I committed to the main donto substrate repository and the public donto-web site during that window. For the full reports index, see Reports.
The short version: the last 48 hours turned donto from a strong claim-store kernel into a broader, documented substrate prototype. The work landed in two places:
- donto: 42 commits, 211 files changed, 29,776 insertions, 1,014 deletions. This completed the execution-plan waves from W0-2 through H8, then published the execution/future reports in the docs app.
- donto-web: 30 commits, 110 files changed, 24,144 insertions, 396 deletions. This built the public donto.org documentation, benchmark boards, comparison pages, reports system, LoCoMo analysis reports, and the corrected /reports index/linking behavior.
TL;DR
| area | what changed | why it mattered | status |
|---|---|---|---|
| Substrate safety | delete guard, extraction provenance, blob storage migration, live config snapshots | make memory auditable and reconstructable instead of mutable folklore | shipped in donto |
| Judgment layer | standing, lenses, review-to-argument bridge, release exports, property-driven sweeps | give claims context, pressure, exportability, and machine-checkable review paths | shipped in donto |
| Loop machinery | evidence suggestions, loss reports, canonical literals, aligned matching, rule agenda, loop protocol routes | turn evaluation feedback into substrate-native work queues and durable loss records | shipped in donto |
| Query/interop expansion | SPARQL subset, standards exports, HTTP conformance, DontoQL identity/lens support | make the substrate usable by agents and outside systems without private assumptions | shipped in donto |
| Horizon features | generic rule evaluator, A2A capability cards, reputation overlays, tournaments, analogy, calibration frames, export tier closures, policy restrictiveness order | push beyond storage into governed, agent-discoverable knowledge infrastructure | shipped in donto |
| Public surface | donto.org docs, styleguide, comparison, questions, benchmarks, reports, sitemap entries | make the work legible, inspectable, and linkable outside the repo | shipped in donto-web |
| Benchmark truthfulness | Hyades extraction review, LoCoMo Config C, single-shot vs agentic correction, return-shape report | prevent benchmark theatre; show what works, what fails, and what the numbers really mean | shipped in donto-web |
The main reason for the work was to move donto from "we can store many claims" toward "we can explain, constrain, export, audit, compare, and govern memory under pressure."
1. Substrate execution work in donto
The execution plan work landed as a sequence of small commits, each with a matching migration, code path, test, or documentation update where appropriate. The pattern was deliberate: do the smallest durable change, prove it with a check, then move to the next layer.
W0: Make the base state auditable
| commit | change | reason |
|---|---|---|
| d7d047a | Enforced an I3 statement delete guard. | Keeps claim history append-first; deleting statements becomes an explicit protected operation instead of an accidental data-loss path. |
| c87926d | Recorded extraction run provenance. | Lets later users know which model/run/settings produced extracted claims, so evidence can be traced back to a concrete run. |
| 63c2e42 | Migrated blob storage to GCS. | Moves large source artifacts out of local-only assumptions and toward durable object storage. |
| 6bf490d | Snapshotted live config. | Captures operational settings so later behavior can be explained instead of guessed. |
Why this wave came first: if the system cannot explain where claims came from, how blobs were stored, and what live settings were active, later "intelligence" features become unverifiable.
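The delete guard idea from the table above can be sketched in a few lines. This is a minimal illustration, not donto's implementation: `StatementStore`, `DeleteGuardError`, and the `authorized_by` parameter are hypothetical names, and the report does not spell out what the I3 invariant requires beyond making deletion an explicit protected operation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class DeleteGuardError(Exception):
    """Raised when a delete is attempted without explicit authorization."""


@dataclass
class StatementStore:
    # Hypothetical in-memory stand-in for a statement table.
    statements: Dict[str, str] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def add(self, statement_id: str, body: str) -> None:
        self.statements[statement_id] = body
        self.audit_log.append(f"add {statement_id}")

    def delete(self, statement_id: str, *, authorized_by: Optional[str] = None) -> None:
        # The guard: a plain delete call fails instead of silently losing data.
        if authorized_by is None:
            raise DeleteGuardError(
                f"refusing to delete {statement_id}: no explicit authorization"
            )
        del self.statements[statement_id]
        # Even an authorized delete leaves an audit trail.
        self.audit_log.append(f"delete {statement_id} by {authorized_by}")


store = StatementStore()
store.add("s1", "alice livesIn Berlin")
try:
    store.delete("s1")
except DeleteGuardError as err:
    print(err)
```

The design point is that the accidental path fails loudly, while the deliberate path still works and is recorded.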
W1: Add lenses, standing, release shape, and review paths
| commit | change | reason |
|---|---|---|
| 8682a6c | Added standing v1. | Gives claims a first maturity/standing layer instead of forcing every query to treat all claims equally. |
| 7aecc7e | Added the lens registry. | Makes view definitions first-class so different contexts can ask for different slices without mutating the substrate. |
| ce2399c | Added lens query support. | Turns registered lenses into usable query constraints. |
| b352601 | Applied identity in DontoQL. | Lets DontoQL use identity-aware resolution rather than only raw string equality. |
| 88aa159 | Bridged reviews to arguments. | Converts review activity into argument structure so criticism/support can be queried and preserved. |
| 7f47075 | Added dataset release export. | Makes reproducible release artifacts possible from the live database. |
| b430eb6 | Drove sweeps from property constraints. | Turns property constraints into active work discovery, not passive schema decoration. |
Reasoning: this is where donto starts to become a substrate rather than a store. Claims need views, standing, review pressure, and exports, otherwise downstream agents only get undifferentiated facts.
W2: Add feedback loops and loss accounting
| commit | change | reason |
|---|---|---|
| b760cf2 | Added next-evidence suggestions. | Lets the system surface what evidence would improve a weak or contested claim. |
| 95d4363 | Persisted loss reports. | Makes export/query/adapter loss explicit instead of silently pretending every representation is complete. |
| d06fab0 | Canonicalized constrained literals. | Improves matching and validation for literal values under known constraints. |
| 4eeb6f2 | Folded value mappings in aligned match. | Lets aligned predicates/values resolve together during matching. |
| f6d9ab4 | Activated the rule agenda worker. | Turns rules into scheduled agenda work rather than static metadata. |
| 43681f7 | Added loop protocol routes. | Exposes loop state through service routes so agents/tools can participate. |
| 9c250b7 | Deepened lens semantics. | Makes lenses more expressive and closer to real view semantics. |
Reasoning: loss reports and evidence suggestions are the accountability layer. They tell an agent not just what donto knows, but where donto is weak, lossy, or ready for another pass.
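To make the loss-report idea concrete, here is a small sketch of an export step that records what it could not carry into a narrower target format. The `LossReport` shape, the `export_with_loss` helper, and the field names in the example claim are assumptions for illustration; the report does not describe donto's actual loss-report schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set, Tuple


@dataclass(frozen=True)
class LossReport:
    # Hypothetical record of what an export could not represent.
    target: str
    claim_id: str
    dropped_fields: List[str]


def export_with_loss(
    claim: Dict[str, Any], target: str, supported: Set[str]
) -> Tuple[Dict[str, Any], LossReport]:
    """Project a claim onto the fields a target supports and report the rest."""
    exported = {k: v for k, v in claim.items() if k in supported}
    dropped = sorted(k for k in claim if k not in supported)
    report = LossReport(
        target=target,
        claim_id=str(claim.get("id", "")),
        dropped_fields=dropped,
    )
    return exported, report


claim = {
    "id": "c1",
    "subject": "alice",
    "predicate": "livesIn",
    "object": "Berlin",
    "valid_time": "2023-06-01/",
    "standing": "reviewed",
}
exported, report = export_with_loss(
    claim, "plain-triples", {"id", "subject", "predicate", "object"}
)
print(report.dropped_fields)
```

Persisting the report next to the export is what lets a later reader see that the target representation was incomplete, rather than assuming it was faithful.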
W3: Expand query, contradiction, interop, and TUI surfaces
| commit | change | reason |
|---|---|---|
| cb697b1 | Added live claim frame examples. | Gives concrete examples for claim framing, not just abstract schema. |
| 499d336 | Wrote release lens memberships. | Carries lens context into release artifacts. |
| 77f835a | Filtered machinery claims from search. | Keeps internal machinery from polluting user-facing claim retrieval. |
| d2fe27d | Added scoped canonical shadow rebuilds. | Lets canonical material be rebuilt in scoped ways instead of broad unsafe rebuilds. |
| d307d04 | Added standards interop exports. | Makes external standards-oriented exchange possible. |
| 289946b | Matured the SPARQL subset. | Improves standards-compatible query access. |
| b792eec | Gated HTTP contract conformance. | Protects API behavior with conformance tests. |
| 2f44d81 | Populated standing v2 components. | Deepens standing beyond the first kernel. |
| 0ebd2e7 | Honored identity cannot-links. | Prevents identity resolution from merging entities that have explicit negative evidence. |
| 3e23fb8 | Composed predicate closure. | Lets predicate relationships carry through query/reasoning paths. |
| eafa309 | Folded incompatible predicates in paraconsistency. | Improves contradiction handling when predicates are related but incompatible. |
| fd5aaaa | Deepened lens algebra. | Gives lenses compositional behavior that can be tested and reasoned about. |
| 0df4347 | Added grounded contradiction policy. | Makes contradiction handling policy-aware instead of ad hoc. |
| 2315b94 | Surfaced observatory loops in the TUI. | Makes the feedback loops visible to operators, not just hidden in tables. |
Reasoning: this wave made donto more inspectable and interoperable. It also tightened the hard parts: identity cannot-links, predicate closure, paraconsistency, lens algebra, and API conformance.
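One common way to honor cannot-links during identity resolution is a union-find structure that refuses any merge joining two groups with explicit negative evidence between them. The sketch below shows that general technique under stated assumptions; `IdentityResolver` and its methods are hypothetical and are not taken from the donto codebase.

```python
from typing import Dict, Set


class IdentityResolver:
    """Union-find over entity ids that rejects merges across cannot-links."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        # Maps a group root to the roots it must never be merged with.
        self.cannot: Dict[str, Set[str]] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_cannot_link(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            raise ValueError(f"{a} and {b} are already merged")
        self.cannot.setdefault(ra, set()).add(rb)
        self.cannot.setdefault(rb, set()).add(ra)

    def merge(self, a: str, b: str) -> bool:
        """Merge the groups of a and b; return False if a cannot-link forbids it."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if rb in self.cannot.get(ra, set()):
            return False
        self.parent[rb] = ra
        # Carry rb's constraints over to the surviving root ra.
        for other in self.cannot.pop(rb, set()):
            self.cannot[other].discard(rb)
            self.cannot[other].add(ra)
            self.cannot.setdefault(ra, set()).add(other)
        return True


resolver = IdentityResolver()
resolver.merge("alice@work", "alice@home")
resolver.add_cannot_link("alice@home", "alice-smith-1902")
print(resolver.merge("alice@work", "alice-smith-1902"))
```

Because constraints are tracked per group rather than per pair, the negative evidence keeps applying after either side has been merged with other ids.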
H1-H8: Horizon features
| commit | change | reason |
|---|---|---|
| 78c3003 | Added a generic rule evaluator. | Moves rule execution toward a reusable engine. |
| 37d1080 | Published A2A capability cards. | Lets agents discover what the service can do instead of relying on private docs. |
| fd574c0 | Added agent reputation overlays. | Starts tracking agent trust/context as part of memory operations. |
| ced0142 | Added hypothesis tournaments. | Gives competing hypotheses a substrate-native comparison path. |
| c8b7ebe | Added a cross-domain analogy engine. | Tests whether structured memory can support analogy across domains. |
| 9ec690d | Added science instrument calibration frames. | Extends the substrate toward scientific provenance and calibration use cases. |
| cfd6306 | Added export tier closures. | Makes exports respect tiered closure rules. |
| 72478dc | Added policy restrictiveness order. | Lets policies be ordered by restrictiveness, which is necessary for safe combination and downgrade checks. |
Reasoning: the H-series is where the system starts to resemble AI-memory infrastructure: agent discovery, reputation, policy ordering, analogy, scientific calibration, and tournament-style hypothesis pressure.
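The role of a restrictiveness order in combination and downgrade checks can be illustrated with a simple model in which a policy is the set of actions it permits. This is an assumed model chosen for clarity, not the order H8 actually implements; `Policy`, `at_least_as_restrictive`, `combine`, and `is_downgrade` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Policy:
    name: str
    permits: FrozenSet[str]


def at_least_as_restrictive(a: Policy, b: Policy) -> bool:
    """a is at least as restrictive as b if it permits nothing b forbids."""
    return a.permits <= b.permits


def combine(a: Policy, b: Policy) -> Policy:
    """Safe combination: only actions both policies permit survive."""
    return Policy(f"{a.name}&{b.name}", a.permits & b.permits)


def is_downgrade(old: Policy, new: Policy) -> bool:
    """A change is a downgrade if the new policy permits something the old did not."""
    return not at_least_as_restrictive(new, old)


public = Policy("public", frozenset({"read", "export", "train"}))
internal = Policy("internal", frozenset({"read", "export"}))
research = Policy("research", frozenset({"read", "train"}))

print(at_least_as_restrictive(internal, public))
print(combine(internal, research).permits)
print(is_downgrade(internal, public))
```

Note that `internal` and `research` are incomparable in this model: neither is more restrictive than the other, which is why the order is partial and why combination takes the intersection instead of picking a winner.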
Documentation publication in donto
| commit | change | reason |
|---|---|---|
| 8582b7a | Published execution and future reports in the docs app. | Made the completed execution work and long-range substrate gap analysis visible in the documentation surface. |
| ad74f07 | Served the future report at the docs reports root. | Adjusted the docs route so the report was reachable at the expected location in that app. |
2. Public site work in donto-web
The web work did two jobs: expose donto clearly to humans, and publish honest benchmark/report pages that keep the substrate claims grounded in measured behavior.
Site foundation, docs, and product surface
| commit | change | reason |
|---|---|---|
| 2e914d5 | Added the donto-green rebrand and comprehensive component library with tests. | Established a consistent visual system and reusable UI primitives. |
| 3e9fe6b | Built donto.org/docs with eight ground-truth documentation pages. | Put core concepts, schema, solving model, embeddings, evidence, and alignment docs on the public site. |
| f27efeb | Added a nine-page styleguide at /styleguide. | Made the design system inspectable and reusable across future donto.org work. |
| 0d5c996 | Added the distributed-embedding tracker page and worker error-report panel. | Improved admin observability for embedding and worker failure states. |
| 74d51a1 | Added genealogy reports, PDF/TeX routes, people/person/source browsers. | Expanded another donto-backed public surface and report generator pattern. |
| 7b4b474 | Snapshotted live web units. | Captured production service wiring for the web apps. |
| fb2a617 | Showed standing on claim pages. | Surfaced substrate standing directly in admin claim inspection. |
| 2818bf7 | Built /questions with 15 deep-research briefs. | Made the frontier research questions visible instead of burying them in private notes. |
| 2854e15 | Built /comparison against 24 alternatives. | Made donto's strengths and weaknesses legible against the memory/knowledge field. |
| 3fdabbc | Built /benchmarks with living run checklists. | Gave LoCoMo, LongMemEval, and BEAM benchmark work a public status board. |
| b509af0 | Added /reports and the first report, the Hyades extraction review. | Created the public reports system and seeded it with the extraction-engine analysis. |
Reasoning: the site needed more than a landing page. It needed docs, comparison, benchmarks, reports, admin surfaces, and visual conventions so future changes have a coherent public home.
Extraction and LoCoMo benchmark reporting
| commit | change | reason |
|---|---|---|
| 5db829b | Added prompt re-evaluation and model bake-off section. | Started documenting extraction quality as an experiment rather than a claim. |
| 4be428a | Published bake-off A results. | Showed holo3.1 saturation and Hyades timeout behavior under a 32K setting. |
| c7a285d | Published bake-off B isolated results. | Separated gateway contention from per-request timeout behavior. |
| c9b4e79 | Published bake-off C and bottom line. | Showed 16K did not rescue heavy models and that budget affects faithfulness, not just count. |
| 59c3fad | Updated LoCoMo extraction checklist: 271/272 chunks, ~187 facts/chunk, no empty chunks. | Made extraction progress visible and quantified. |
| b7fb2f6 | Prepared LoCoMo claim layer/folding and report bake-off D streaming fix. | Documented that streaming changes the Hyades architecture by avoiding silent 524s. |
| 278e674 | Published LoCoMo Config C claims-only report. | Recorded the negative result: claims-only context shrank heavily but accuracy collapsed. |
| bfe5c5f | Scored LoCoMo Config C: claims-only 0.244 vs episodic 0.837. | Made the bottleneck explicit: recall/readability, not merely claim existence. |
| 0dd4966 | Added LoCoMo run table and per-run AI transcript thread view. | Made benchmark runs navigable and inspectable. |
| ef71d79 | Added honest hybrid result and temporal valid-time enrichment. | Captured that temporal enrichment helps, while hybrid assumptions needed measurement. |
| 871d7b1 | Recorded clean verdict that cosine beat hybrid and reverted to cosine. | Avoided keeping a more complex retrieval path after it underperformed. |
| 68c5f2e | Updated Config C: bitemporal valid_time lifted temporal accuracy 0.50 -> 0.725. | Showed a real donto-specific advantage from valid-time modeling. |
| 387272e | Updated Config C passage recall to 0.650 and mapped the cheap-regime ceiling. | Showed prose passages beat triples for weak readers and clarified the gap to Zep-style multi-scope rerank. |
Reasoning: these reports matter because they stop the system from overclaiming. donto can hold rich claims, but the benchmark showed the reader needs the right return shape, temporal tags, and better recall/reranking before the substrate wins cleanly.
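For readers who want the size of these shifts as absolute and relative changes, the arithmetic on the figures quoted in the table can be checked with a short script. Only the four accuracy numbers come from the report; the `gain` helper is a hypothetical convenience.

```python
from typing import Tuple


def gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute change, change relative to the starting value)."""
    absolute = after - before
    return round(absolute, 3), round(absolute / before, 3)


# Temporal accuracy with bitemporal valid_time: 0.50 -> 0.725.
temporal = gain(0.50, 0.725)

# Episodic baseline 0.837 vs claims-only 0.244.
claims_only_vs_episodic = gain(0.837, 0.244)

print("temporal:", temporal)
print("claims-only vs episodic:", claims_only_vs_episodic)
```

On these numbers the valid-time change is a gain of 0.225 absolute (45% relative to the 0.50 starting point), while the claims-only configuration sits 0.593 below the episodic baseline.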
Reports, benchmark interpretation, and the corrected /reports index
| commit | change | reason |
|---|---|---|
| 2bfc53d | Published "The shape of donto's return." | Documented the memory-bundle schema and why each returned field exists. |
| 38d00ba | Published "Single-shot vs agentic memory." | Compared benchmark methodology and clarified what leaderboard numbers measure. |
| ef54d87 | Corrected the single-shot vs agentic report after verifying Zep's methodology. | Fixed the interpretation: Zep's 94.7% is single-read with multi-scope retrieval; auto-search is 86.5%. |
| 7e38d8e | Linked the future substrate report from the /reports index. | Fixed the route mistake by making /reports remain the index and putting the future report on its own slug. |
| 37b1575 | Added the return-shape facet checklist. | Mapped roughly 20 return facets, which ones were tested, and which benchmark problems they target. |
| d883438 | Restyled the future substrate report. | Brought the report closer to the rest of the site: TL;DR, scorecard, tables, roadmap, and clearer section rhythm. |
Reasoning: /reports must be an index, not a single report. The correction split the future substrate report onto /reports/future-substrate, kept /reports as the report directory, and then improved the future report formatting to match the other public notes.
3. Why these changes fit together
The work was not random feature accumulation. It follows one spine:
source bytes
-> extracted claims with run provenance
-> evidence links, valid time, policy, identity, standing
-> contradiction-aware lenses and query surfaces
-> loss reports, releases, and conformance checks
-> public reports that say what worked, failed, and remains missing
The substrate side makes the memory more accountable. The public-site side makes the accountability visible.
The most important design decisions were:
| decision | reason |
|---|---|
| Append-first memory over destructive correction | AI memory needs correction history, not silent overwrite. |
| Bitemporal facts over plain timestamps | The system must distinguish when something was true from when donto learned it. |
| Paraconsistency over single-winner truth | Real memory contains conflict; deleting disagreement creates false certainty. |
| Lenses over global views | Different tasks need different views without corrupting the underlying store. |
| Standing and loss reports over raw confidence theatre | Agents need to know maturity, pressure, and what was lost in translation. |
| Public reports over private conclusions | Benchmark and architecture claims should be inspectable, linkable, and revisable. |
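The bitemporal decision in the table is easiest to see with a query that takes two times: when the fact should hold, and what the store knew at that point. The sketch below is a generic bitemporal lookup under assumed field names (`valid_from`, `valid_to`, `recorded_at`); it is not donto's schema, and the example facts are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date            # when the fact started being true
    valid_to: Optional[date]    # exclusive end; None means still true
    recorded_at: date           # when the store learned it


def as_of(facts: List[Fact], subject: str, predicate: str,
          valid_at: date, known_at: date) -> List[Fact]:
    """Facts true at valid_at, using only what was recorded by known_at."""
    return [
        f for f in facts
        if f.subject == subject
        and f.predicate == predicate
        and f.recorded_at <= known_at
        and f.valid_from <= valid_at
        and (f.valid_to is None or valid_at < f.valid_to)
    ]


facts = [
    Fact("alice", "livesIn", "Paris", date(2020, 1, 1), date(2023, 6, 1), date(2024, 1, 10)),
    Fact("alice", "livesIn", "Berlin", date(2023, 6, 1), None, date(2024, 3, 5)),
]

# Same valid time, two different states of knowledge.
early = as_of(facts, "alice", "livesIn", date(2023, 9, 1), date(2024, 2, 1))
later = as_of(facts, "alice", "livesIn", date(2023, 9, 1), date(2024, 4, 1))
print([f.value for f in early], [f.value for f in later])
```

With a single timestamp the two queries would be indistinguishable; keeping both axes lets the store answer what was true and, separately, what it believed at the time.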
4. What changed for users and agents
| audience | before | after |
|---|---|---|
| Human readers | mostly repo-local knowledge and scattered operational context | public docs, reports, comparison, questions, benchmark pages, and a styled report index |
| Operators | more hidden service/config/run state | config snapshots, live unit docs, benchmark checklists, admin worker/error surfaces |
| Agents | fewer explicit discovery and loop affordances | A2A capability cards, loop routes, rule agenda, evidence suggestions, loss reports |
| Benchmark reviewers | easy to overread claims-only and leaderboard numbers | explicit LoCoMo results, Zep methodology correction, return-shape checklist |
| Future implementers | fewer contracts and examples | conformance gates, standards exports, docs pages, reports, and migration-backed substrate features |
5. What the future should look like from here
The next phase should be less about adding isolated powers and more about making the powers boring, versioned, and externally trustworthy.
| horizon | next shape |
|---|---|
| Near term | Keep /reports current; stabilize memory bundle return shapes; add stronger cross-encoder/reranker experiments; make LoCoMo run artifacts easier to inspect; finish cleaning any local benchmark work before publishing it. |
| Kernel hardening | Version public APIs, expand conformance tests, keep migration tests close to every substrate rule, and make standing/loss/lens behavior cheap to query. |
| Federation | Sign release/query artifacts, preserve policy/loss through exchange, add remote release discovery, and make cross-instance identity/predicate alignment explicit. |
| Governance | Add consent receipts, appeal trails, review workbenches, policy simulation, and community-level authority models. |
| Century substrate path | Move toward independent implementations, archival test vectors, signed memory envelopes, formal policy/lens proofs, multimodal source packages, and long-horizon migration discipline. |
6. Caveats and boundaries
- This report covers committed work visible in local git history during the audited 48-hour window.
- It does not claim unrelated dirty local benchmark files are finished. At the time this report was written, there were uncommitted local changes under the benchmark run surface; those were intentionally not included in this report commit.
- Some public benchmark conclusions are explicitly provisional. The LoCoMo reports already say where the current cheap regime tops out and where reranking/reader strength remain blocked or unproven.
- The report index mistake was corrected: /reports is the index, and individual reports live under their own slugs.
7. Commit ledger
donto ledger
| time UTC | commit | subject |
|---|---|---|
| 2026-06-12 08:54 | ad74f07 | docs: serve future report at reports root |
| 2026-06-12 08:24 | 8582b7a | docs: publish execution and future reports |
| 2026-06-12 02:03 | 72478dc | H8: add policy restrictiveness order |
| 2026-06-12 01:55 | cfd6306 | H7: add export tier closures |
| 2026-06-12 01:47 | 9ec690d | H6: add science instrument calibration frames |
| 2026-06-12 01:31 | c8b7ebe | H5: add cross-domain analogy engine |
| 2026-06-12 01:25 | ced0142 | H4: add hypothesis tournaments |
| 2026-06-12 01:15 | fd574c0 | H3: add agent reputation overlays |
| 2026-06-12 01:01 | 37d1080 | H2: publish A2A capability cards |
| 2026-06-12 00:42 | 78c3003 | H1: add generic rule evaluator |
| 2026-06-12 00:25 | 2315b94 | W3-14: surface observatory loops in TUI |
| 2026-06-12 00:01 | 0df4347 | W3-13: add grounded contradiction policy |
| 2026-06-11 23:43 | fd5aaaa | W3-12: deepen lens algebra |
| 2026-06-11 23:25 | eafa309 | W3-11: fold incompatible predicates in paraconsistency |
| 2026-06-11 22:56 | 3e23fb8 | W3-10: compose predicate closure |
| 2026-06-11 22:16 | 0ebd2e7 | W3-9: honor identity cannot-links |
| 2026-06-11 22:02 | 2f44d81 | W3-8: populate standing v2 components |
| 2026-06-11 21:40 | b792eec | W3-7: gate HTTP contract conformance |
| 2026-06-11 21:20 | 289946b | W3-6: mature SPARQL subset |
| 2026-06-11 20:38 | d307d04 | W3-5: add standards interop exports |
| 2026-06-11 20:03 | d2fe27d | W3-4: add scoped canonical shadow rebuilds |
| 2026-06-11 19:38 | 77f835a | W3-3: filter machinery claims from search |
| 2026-06-11 19:22 | 499d336 | W3-2: write release lens memberships |
| 2026-06-11 19:12 | cb697b1 | W3-1: add live claim frame examples |
| 2026-06-11 18:49 | 9c250b7 | W2-7: deepen lens semantics |
| 2026-06-11 18:29 | 43681f7 | W2-6: add loop protocol routes |
| 2026-06-11 17:56 | f6d9ab4 | W2-5: activate rule agenda worker |
| 2026-06-11 17:44 | 4eeb6f2 | W2-4: fold value mappings in aligned match |
| 2026-06-11 17:29 | d06fab0 | W2-3: canonicalize constrained literals |
| 2026-06-11 17:17 | 95d4363 | W2-2: persist loss reports |
| 2026-06-11 12:44 | b760cf2 | W2-1: add next evidence suggestions |
| 2026-06-11 12:20 | b430eb6 | W1-7: drive sweep from property constraints |
| 2026-06-11 11:55 | 7f47075 | W1-6: add dataset release export |
| 2026-06-11 10:54 | 88aa159 | W1-5: bridge reviews to arguments |
| 2026-06-11 10:35 | b352601 | W1-4: apply identity in DontoQL |
| 2026-06-11 10:12 | ce2399c | W1-3: add lens query |
| 2026-06-11 09:45 | 7aecc7e | W1-2: add lens registry |
| 2026-06-11 09:35 | 8682a6c | W1-1: add standing v1 |
| 2026-06-11 08:52 | 6bf490d | W0-5: snapshot live config |
| 2026-06-11 08:43 | 63c2e42 | W0-4: migrate blob storage to GCS |
| 2026-06-11 08:17 | c87926d | W0-3: record extraction run provenance |
| 2026-06-11 07:51 | d7d047a | W0-2: enforce I3 statement delete guard |
donto-web ledger
| time UTC | commit | subject |
|---|---|---|
| 2026-06-12 10:11 | d883438 | reports: restyle future substrate report |
| 2026-06-12 09:54 | 37b1575 | return-shape report: add facet checklist |
| 2026-06-12 09:24 | 387272e | Config C Update 4: passage recall 0.650 |
| 2026-06-12 09:18 | 7e38d8e | reports: link future substrate report from index |
| 2026-06-12 09:05 | ef54d87 | reports: correct single-shot-vs-agentic methodology |
| 2026-06-12 09:02 | 38d00ba | reports: single-shot vs agentic memory |
| 2026-06-12 08:50 | 2bfc53d | reports: the shape of donto's return |
| 2026-06-12 05:07 | 68c5f2e | LoCoMo Config C: bitemporal valid_time lifts temporal |
| 2026-06-12 03:28 | 871d7b1 | LoCoMo Config C: cosine beats hybrid |
| 2026-06-12 02:56 | ef71d79 | LoCoMo Config C: hybrid result and temporal enrichment |
| 2026-06-12 01:37 | 0dd4966 | benchmarks/locomo: runs table and transcript thread view |
| 2026-06-11 22:26 | bfe5c5f | LoCoMo Config C scored: claims-only 0.244 vs episodic 0.837 |
| 2026-06-11 21:23 | 278e674 | reports: LoCoMo Config C claims-only report |
| 2026-06-11 20:20 | b7fb2f6 | LoCoMo claim layer, folding, and bake-off D streaming fix |
| 2026-06-11 17:44 | 59c3fad | benchmarks/locomo: extraction done and checklist updated |
| 2026-06-11 17:42 | c9b4e79 | report: bake-off C and bottom line |
| 2026-06-11 17:27 | c7a285d | report: bake-off B isolated |
| 2026-06-11 17:13 | 4be428a | report: bake-off A results |
| 2026-06-11 16:43 | 5db829b | report: prompt re-evaluation and model bake-off section |
| 2026-06-11 15:49 | b509af0 | home: /reports and first Hyades extraction review |
| 2026-06-11 12:43 | 3fdabbc | home: /benchmarks living run checklists |
| 2026-06-11 09:54 | 2854e15 | home: /comparison vs 24 alternatives |
| 2026-06-11 09:36 | fb2a617 | W1-1: show standing on claim pages |
| 2026-06-11 08:53 | 7b4b474 | W0-5: snapshot live web units |
| 2026-06-11 08:32 | 2818bf7 | home: /questions deep-research briefs |
| 2026-06-10 16:18 | 74d51a1 | genealogy: reports, PDF/TeX routes, people/person/source browsers |
| 2026-06-10 16:18 | 0d5c996 | admin: distributed-embedding tracker and worker error panel |
| 2026-06-10 16:17 | f27efeb | home: styleguide |
| 2026-06-10 16:17 | 3e9fe6b | home: docs |
| 2026-06-10 16:17 | 2e914d5 | ui: rebrand and component library |
Bottom line
In the last 48 hours, donto gained a much stronger governed-memory spine: safety guards, provenance, object storage, standing, lenses, review/argument machinery, loss reporting, rule agendas, loop routes, query/interop expansion, conformance checks, agent discovery, reputation overlays, hypothesis tournaments, export closures, and policy ordering.
donto.org gained the public scaffolding needed to make that work inspectable: docs, comparison pages, benchmark boards, a reports index, detailed LoCoMo/Hyades reports, a corrected future substrate report route, and now this change report.
The future work is to make these capabilities stable enough that external agents and independent implementations can rely on them without needing private context from the current codebase.
See also: Future Substrate Report · The shape of donto's return · Single-shot vs agentic memory · LoCoMo Config C.