Reports
Working research notes from building donto — extraction engineering, benchmark studies, substrate design. Living documents, updated as the work progresses.
Last updated: 2026-06-12.
| Report | What it covers |
|---|---|
| Is the Internet an extension of human memory? | A reading of a note the author wrote in 2011 — catching himself storing search actions instead of facts. It independently described cognitive offloading (Sparrow's "Google Effect," named that same year) and transactive memory, named the still-open creative cost (an index has coverage but no adjacency, and adjacency is where epiphany happens — "instead of a novel, I am a dictionary"), and proposed — almost verbatim — donto's storage contract: store the idea first, then the path to the source. The throughline: the 2011 personal dilemma is the exact design problem donto solves at the substrate level (evidence-anchored claims = idea + path; the episodic-vs-claims benchmark tension; the Lens/sheaf "epiphany over held premises") |
| Sheaf neural networks for donto | Research report on sheaf neural networks (cellular-sheaf GNNs) — arguably the missing mathematics for a contradiction-preserving substrate. Cellular sheaves put a vector space (stalk) on each node + a learned restriction map on each edge; their cohomology computably measures whether local views glue into a consistent whole (H⁰) and where they can't (H¹). Maps line-for-line onto donto: claims = stalks, query-time alignment/identity = restriction maps, paraconsistency = non-zero H¹, contradiction-pressure = localized disagreement norm, multi-source fusion + contested-source reasoning = sheaf data-fusion / discourse-sheaf opinion dynamics. Built to handle heterophily + oversmoothing — the exact failures donto's claim graph triggers. Includes concrete build proposals + honest compute caveats |
| Memory benchmarks — donto's honest scorecard | A multi-day free-reader synthesis across LoCoMo + LongMemEval. The pattern: donto's recall does its job everywhere; the accuracy limits are the reader and query-time routing, not retrieval. Demonstrated wins — token-efficiency (claims+aggregates = 86% of episodic accuracy at 3.8× fewer tokens), knowledge-update out-of-box (0.923, bitemporal baked into the dated representation), and targeted answer-shaping (a distilled-preference facet lifts single-session-preference 0.70→0.767). The honest limit: answer-shaping is a scalpel (helps focused-fact questions, hurts synthesis), and single-hop is a routing gap — both reader/router-bound, not retrieval |
| The Seven Sisters across 32 cultures | A new example consumer: the Pleiades / Seven Sisters story (Greek, Aboriginal Australian, Native American, Polynesian, Andean, African, Near Eastern + the deep-time "oldest story" debate) saved to disk and extracted into donto as evidence-anchored claims via the donto-agent GLM lane. Covers the engineering (opencode 0-facts → donto-agent; chunking ~doubles fact density; the 429-throttle + pacing; the 0-fact-≠-done fix) and the substrate findings (~11k claims, ~3k invented predicates, the pursuit motif at 246 stmts, the lost-Pleiad puzzle at 114, agricultural-calendar convergence, identity-as-hypothesis) |
| The 48-hour build, read closely | A verification pass over the change report: the 48-hour work is real and tested (22 migrations 0156–0177, each a SQL capability + invariants test), but almost entirely unexercised — governance/horizon tables hold 1–16 demo rows against a ~41.5M-statement store (analogy 1, tournament 2, reputation 4; only rule_agenda at 12.7k is under real load). Reads it together with the LoCoMo results to name the selection function: answer-shaping facts convert (valid_time +45%, standing, closure, aggregates), retrieval doesn't — so the benchmark is what makes the skeleton load-bearing, and the aggregate claim is the next thing to build |
| Past 48 Hours Change Report | Full audit of the committed work from 2026-06-10 10:14 UTC to 2026-06-12 10:14 UTC across donto and donto-web: substrate execution-plan waves, public docs, reports, benchmark analysis, route fixes, why each class of change was made, and what should come next |
| Future Substrate Report | What donto is still missing to become a century-scale knowledge and memory substrate for AI systems: durable identity, deeper source objects, semantic stability, memory lifecycle, governance, cryptographic trust, federation, multimodal evidence, scale, and model integration |
| Single-shot vs agentic, one-call vs multi-scope | What the benchmarks measure, corrected from Zep's page: their 94.7% is single-read with multi-scope retrieval (5 composed searches + rerank, 5,760 tok); the plain single-call auto-search is 86.5% (2,680 tok). Two axes — retrieval composition vs reader agency — and which Zep number is donto's fair target |
| The shape of donto's return | The full schema of the memory bundle donto hands an agent — every field (subject·predicate·object, the temporal text tag, source), where it comes from in the bitemporal claim graph, and the measured reason each one earned its tokens. Substrate is maximal; the return is minimal |
| LoCoMo Config C — claims-only recall | Throwing away the dialogue and answering from the claim layer alone. Clean result: context collapses ~11× (19,029→1,690 tok) but so does accuracy — 0.244 vs episodic 0.837. An honest negative: the bottleneck is claim recall (worst on multi-hop), not claim existence. Next: semantic claim recall |
| Are we using Hyades effectively? | Review of how donto-agent drives the Hyades gateway for claim extraction; the token-budget finding from tuning GLM (40→520 facts/chunk); a model bake-off (including the streaming-mode 524 fix); and an experiment plan |
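
The token and accuracy figures quoted in the table can be cross-checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the numbers stated in the LoCoMo Config C and Zep rows; the variable names are illustrative and are not part of donto.

```python
# Figures copied from the table rows above; names are illustrative only.

# LoCoMo Config C: episodic context vs claims-only context.
episodic_tokens = 19_029
claims_only_tokens = 1_690
episodic_accuracy = 0.837
claims_only_accuracy = 0.244

# How much smaller the claims-only context is.
context_reduction = episodic_tokens / claims_only_tokens

# Fraction of episodic accuracy that survives the reduction.
accuracy_retained = claims_only_accuracy / episodic_accuracy

# Zep: multi-scope retrieval vs plain single-call auto-search.
multi_scope_tokens = 5_760
single_call_tokens = 2_680
multi_scope_accuracy_pct = 94.7
single_call_accuracy_pct = 86.5

# Extra tokens and extra accuracy points bought by multi-scope retrieval.
token_ratio = multi_scope_tokens / single_call_tokens
accuracy_gap_points = round(multi_scope_accuracy_pct - single_call_accuracy_pct, 1)

print(f"Config C context reduction: {context_reduction:.2f}x")
print(f"Config C accuracy retained: {accuracy_retained:.1%}")
print(f"Zep multi-scope token cost: {token_ratio:.2f}x single-call")
print(f"Zep multi-scope accuracy gain: {accuracy_gap_points} points")
```

The first ratio comes out a little above 11, which matches the "~11×" in the Config C row, and the same row's accuracy drop leaves under a third of the episodic score.
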
See also: memory benchmarks status board · open research questions · comparison vs the field.