# Reports

Working research notes from building donto — extraction engineering, benchmark studies, substrate design. Living documents, updated as the work progresses.

_Last updated: 2026-07-02._

| report | what it covers |
|---|---|
| **[Hyades/holo3.1: the empty `tool_calls.arguments` bug — a gateway diagnostic](/reports/hyades-holo31-empty-tool-arguments-2026-07-02)** | A full technical diagnostic for the Hyades gateway operator: **~1 in 3 structured calls to holo3.1 returns HTTP 200 with a well-formed tool-call envelope whose `arguments` is empty — while `usage` bills 600–1,000+ completion tokens.** Production-scale measurement (1,064 calls / 204 runs in a day: 43% broad-pass vs 30% nudge empties, 2× input-size correlation, flat by hour, load-independent), a serial 8-call probe reproducing at 37.5% with raw captured bodies (success vs failure side-by-side), everything ruled out (schema, budgets, load, tool_choice form, transport), ranked harness hypotheses (tool-call parser drop is #1), the server-side fix wishlist, the client-side mitigations already shipped (dominated-rung ladder fix: worst-case slot burn halved), and a copy-paste repro. Fixing it recovers ~⅓ of the lane's effective throughput |
| **[📄 donto — a contradiction-preserving substrate for knowledge (LaTeX PDF treatise)](/reports/donto-treatise)** | **The consolidated, citable account of the two-day run as a 6-page PDF.** What donto *is* (bitemporal, paraconsistent, evidence-first claim substrate for the age of generative abundance); the instrument (reader-ladder; the definitive full **1,539-Q LoCoMo at 53.3%**, reader-bound); what was built (full LongMemEval claim layer; the **claim + evidence-span** recall unit, validated and *deployed* to `/recall`; the 19h-runaway kill + disk save); the **self-correction** (cross-reader check refuted "claims beat the dump" — robust wins are token-efficiency 3.5–4.8× + claims+spans > bare-claims; the accuracy win needs the non-fitting regime, an honest tie in the proxy); and a frank answer to *can donto be the future of AI and knowledge* — what's warranted vs the larger epistemic-OS bet. Every number measured, not asserted. |
| **[Claim + evidence-span recall — the LongMemEval pivot, and an honest LoCoMo ceiling](/reports/claim-evidence-span-recall)** | The definitive full run + the substrate change it produced. **Full LoCoMo, all 1,539 questions, cheap `flash-lite` reader = 53.3%** (single-hop 0.73 — the reader reads; inference categories 0.27–0.34 — the reader refuses to infer), framed against the reader-ladder (same dump: flash-lite 45% → deepseek 75% → gpt-5.4 86%): **LoCoMo is reader/benchmark-capped, not donto-capped**, and since its conversations *fit* in context the dump saturates retrieval so donto can't shine there either way. The pivot to **LongMemEval**: the right recall unit is the **claim + its evidence span** (lifts accuracy on both readers; atomized triples hurt synthesis, the anchored snippet restores it; `valid_time` was *not* the bottleneck — an honest negative) — but a **cross-reader check corrected the headline**: claims+spans beat the dump on temporal only with a *weak* reader; a *strong* reader (GLM 1.00) wins on a dump that fits. The reader-independent win is **token-efficiency (3.5–4.8× fewer)**; the accuracy win needs the non-fitting `_m` split. Honest science over a clean story. Now **deployed**: donto-memory `/recall` ships an `agent-claims` arm (`2535ab7`) surfacing `ctx:claims/<corpus>/<chunk>` claims+spans, verified live. Plus the ops: a 19h runaway query killed, disk 99%→39% + cron, the GIN-insert perf wall. |
| **[Every LoCoMo failure, and why it fails](/reports/locomo-failure-analysis)** | A per-question post-mortem of donto's best LoCoMo run (turn-recall depth 100, **75.8%**). All 48 failures classified by root cause: **~60% reader/judge-bound** (the free reader won't infer, or the judge rejected a correct relative-date answer, so true accuracy is higher), **~40% recall-bound** (donto's lever). Key actionable pattern: multi-hop failures are almost all **list/aggregation** questions (the reader returns a subset) → pre-joined aggregate claims are the next experiment. An honest map of which levers are donto's and which belong to the reader/grader. |
| **[What moved LoCoMo: the claim layer — and where the sheaf comes in](/reports/locomo-claim-layer)** | An honest accounting of how the recent substrate program changed donto's LoCoMo results. **Semantic claim recall (enabled by taking claim-embedding coverage to 100%) lifted claims-only LoCoMo 2.4× (15.6% → 37.5%) at *fewer* tokens** (913 vs 1,435), and the **bitemporal** claim layer drove temporal accuracy **18% → 72%**. Claims-only answers use **~1/21 the context** of episodic stuffing (913 vs 19,029 tok): the scaling story. Reported faithfully: the **sheaf contradiction-math is not yet in the LoCoMo recall path** and did not produce these numbers; it's the teed-up next lever (contradiction-pressure-aware reranking for knowledge-update; claim-graph bridging for multi-hop). Plus the no-shortcuts road to 100%. |
| **[GPU acceleration for donto — a brief for an implementing friend](/reports/gpu-acceleration-brief)** | A practical hand-off for someone with a GPU. Two workloads: (1) **turnkey embedding backfill**: ~42M claims need `bge-small` (384-dim) vectors, at ~2–4/sec on our CPU vs hundreds–thousands/sec on GPU; a ready-made distributed worker (`thomasdavis/donto-embed-worker` on GitHub, `donto.org/embed` queue, `fastembed-gpu`+`EMBED_CUDA=1`) makes it plug-and-play, with no donto internals needed; (2) **sheaf NSD training**: GPU sweeps over exported subgraphs (JSON in / checkpoints out, no DB access), parity-gated into Rust inference. Includes hardware/cost guidance + a one-paragraph version to forward. |
| **[Is the Internet an extension of human memory?](/reports/internet-as-extended-memory)** | A reading of a note the author wrote in **2011** — catching himself storing *search actions* instead of *facts*. It independently described **cognitive offloading** (Sparrow's "Google Effect," named that same year) and **transactive memory**, named the still-open creative cost (an index has coverage but no *adjacency*, and adjacency is where epiphany happens — "instead of a novel, I am a dictionary"), and proposed — almost verbatim — donto's storage contract: *store the idea first, then the path to the source.* The throughline: the 2011 personal dilemma is the exact design problem donto solves at the substrate level (evidence-anchored claims = idea + path; the episodic-vs-claims benchmark tension; the Lens/sheaf "epiphany over held premises") |
| **[Sheaf neural networks for donto](/reports/sheaf-neural-networks-for-donto)** | Research report on **sheaf neural networks** (cellular-sheaf GNNs) — arguably the missing mathematics for a contradiction-preserving substrate. Cellular sheaves put a vector space (stalk) on each node + a learned restriction map on each edge; their **cohomology computably measures whether local views glue into a consistent whole (`H⁰`) and where they can't (`H¹`)**. Maps line-for-line onto donto: claims = stalks, query-time alignment/identity = restriction maps, **paraconsistency = non-zero `H¹`**, contradiction-pressure = localized disagreement norm, multi-source fusion + contested-source reasoning = sheaf data-fusion / discourse-sheaf opinion dynamics. Built to handle **heterophily + oversmoothing** — the exact failures donto's claim graph triggers. Includes concrete build proposals + honest compute caveats |
| **[Memory benchmarks — donto's honest scorecard](/reports/memory-benchmarks-scorecard)** | A multi-day free-reader synthesis across LoCoMo + LongMemEval. The pattern: **donto's recall does its job everywhere; the accuracy limits are the reader and query-time routing, not retrieval.** Demonstrated wins — **token-efficiency** (claims+aggregates = 86% of episodic accuracy at 3.8× fewer tokens), **knowledge-update** out-of-box (0.923, bitemporal baked into the dated representation), and **targeted answer-shaping** (a distilled-preference facet lifts single-session-preference 0.70→0.767). The honest limit: answer-shaping is a *scalpel* (helps focused-fact questions, hurts synthesis), and single-hop is a routing gap — both reader/router-bound, not retrieval |
| **[The Seven Sisters across 32 cultures](/reports/seven-sisters-pleiades)** | A new example consumer: the Pleiades / Seven Sisters story (Greek, Aboriginal Australian, Native American, Polynesian, Andean, African, Near Eastern + the deep-time "oldest story" debate) saved to disk and extracted into donto as evidence-anchored claims via the `donto-agent` GLM lane. Covers the engineering (opencode 0-facts → donto-agent; **chunking ~doubles fact density**; the 429-throttle + pacing; the 0-fact-≠-done fix) and the substrate findings (~11k claims, ~3k invented predicates, the pursuit motif at **246** stmts, the lost-Pleiad puzzle at **114**, agricultural-calendar convergence, identity-as-hypothesis) |
| **[The 48-hour build, read closely](/reports/capability-vs-exercise)** | A verification pass over the change report: the 48-hour work is **real and tested** (22 migrations `0156–0177`, each a SQL capability + invariants test), but **almost entirely unexercised** — governance/horizon tables hold **1–16 demo rows** against a **~41.5M-statement** store (analogy 1, tournament 2, reputation 4; only `rule_agenda` at 12.7k is under real load). Reads it together with the LoCoMo results to name the selection function: **answer-shaping facts convert (valid_time +45%, standing, closure, aggregates), retrieval doesn't** — so the benchmark is what makes the skeleton load-bearing, and the **aggregate claim** is the next thing to build |
| **[Past 48 Hours Change Report](/reports/past-48-hours-change-report)** | Full audit of the committed work from 2026-06-10 10:14 UTC to 2026-06-12 10:14 UTC across `donto` and `donto-web`: substrate execution-plan waves, public docs, reports, benchmark analysis, route fixes, why each class of change was made, and what should come next |
| **[Future Substrate Report](/reports/future-substrate)** | What donto is still missing to become a century-scale knowledge and memory substrate for AI systems: durable identity, deeper source objects, semantic stability, memory lifecycle, governance, cryptographic trust, federation, multimodal evidence, scale, and model integration. |
| **[Single-shot vs agentic, one-call vs multi-scope](/reports/single-shot-vs-agentic-memory)** | What the benchmarks measure, corrected from Zep's page: their **94.7%** is single-*read* with **multi-scope** retrieval (5 composed searches + rerank, 5,760 tok); the plain single-call **auto-search is 86.5%** (2,680 tok). Two axes — retrieval composition vs reader agency — and which Zep number is donto's fair target |
| **[The shape of donto's return](/reports/donto-memory-return-shape)** | The full schema of the memory bundle donto hands an agent — every field (`subject·predicate·object`, the temporal `text` tag, `source`), where it comes from in the bitemporal claim graph, and the measured reason each one earned its tokens. Substrate is maximal; the return is minimal |
| **[LoCoMo Config C — claims-only recall](/reports/locomo-claims-only-config-c)** | Throwing away the dialogue and answering from the claim layer alone. Clean result: context collapses **~11×** (19,029→1,690 tok) but so does accuracy — **0.244 vs episodic 0.837**. An honest negative: the bottleneck is claim *recall* (worst on multi-hop), not claim existence. Next: semantic claim recall |
| **[Are we using Hyades effectively?](/reports/hyades-extraction-effectiveness)** | Review of how donto-agent drives the Hyades gateway for claim extraction; the token-budget finding from tuning GLM (40→520 facts/chunk); a model bake-off (incl. the streaming-mode 524 fix) + experiment plan |

_See also: [memory benchmarks status board](/benchmarks) · [open research questions](/questions) · [comparison vs the field](/comparison)._

## Full archive

_All research reports (migrated from the former genes.apexpots.com/research)._

- [admin.donto.org — Observability Audit](/reports/admin-donto-observability-audit-2026-06-09)
- [Benchmarking donto-memory on LongMemEval — a faithful study of what a memory layer actually adds](/reports/donto-longmemeval-faithful-2026-06-05)
- [Can the Substrate Connect Two Poems? A Cross-Document Test on Bukowski](/reports/donto-cross-document-bukowski-2026-06-04)
- [Claim-Substrate Report — Research Appendix (raw findings)](/reports/donto-claim-substrate-appendix-2026-06-02)
- [Connecting Two Poems in the Substrate: A Cross-Document Query Test on Bukowski](/reports/donto-two-poems-connection-2026-06-04)
- [Deconstructing a Cameron Winter Song into the donto Substrate](/reports/donto-cameron-winter-deconstruction-2026-06-04)
- [Deep-mode extraction on a 109-word Discord message yields 1000 facts](/reports/donto-deep-mode-eternal-recurrence-2026-05-31)
- [Does donto Work? — 105 Queries Against an Abundance-Extracted Knowledge Graph](/reports/donto-100-queries-2026-06-03)
- [Does the Lens Sweep Generalize? A Cross-Domain Extraction Test](/reports/donto-cross-domain-extraction-2026-06-03)
- [donto API reference](/reports/donto-api-reference-2026-06-07)
- [donto as a Claim/Discovery Substrate — Iteration 3 (Product Spec)](/reports/donto-claim-substrate-2026-06-02)
- [donto box — workers, jobs & capacity (live catalogue, 2026-06-29)](/reports/donto-infra-2026-06-29)
- [Donto Canon — The Operating System for Contested Reality](/reports/donto-canon-2026-06-10)
- [donto extraction — checkpoint (2026-06-07)](/reports/donto-extraction-checkpoint-2026-06-07)
- [donto model lab — per-model characterization, #1: holo3.1 on Hyades (2026-06-29)](/reports/donto-models-holo-2026-06-29)
- [donto — Extraction-Provider Bake-Off: Cerebras vs. z.ai vs. Codex, a Faithful 5-Way Run](/reports/donto-cerebras-bakeoff-2026-06-04)
- [donto — Generative-Abundance Knowledge Extraction: Vision, System, and a Measured Run](/reports/donto-extraction-system-2026-06-03)
- [donto — How the OpenCode Extraction Engine Works (and Where It Breaks)](/reports/opencode-extraction-2026-06-03)
- [donto — Operations Session Report (through 2026-06-09)](/reports/donto-session-report-2026-06-09)
- [donto — status snapshot (2026-05-28)](/reports/donto-status-2026-05-28)
- [donto — Substrate for Generative Abundance: Research Appendix](/reports/donto-abundance-appendix-2026-06-02)
- [donto — Substrate PRD](/reports/donto-substrate-prd-2026-05-28)
- [donto — The Engine for Contested Reality](/reports/donto-engine-for-contested-reality-2026-06-21)
- [Donto — the Epistemic Operating System for Agents (iteration 5, 2026-06-10)](/reports/donto-epistemic-os-2026-06-10)
- [donto — The Substrate for Generative Abundance](/reports/donto-abundance-2026-06-02)
- [donto's Alignment Engine — and How to Help Build It](/reports/donto-alignment-contribute-2026-06-06)
- [donto-memory deep-mode — engine reference](/reports/donto-memory-deep-mode-engine-2026-05-31)
- [donto-memory on LongMemEval — a faithful study of what a memory layer adds to a frontier reader](/reports/donto-longmemeval-study-2026-06-05)
- [donto-vision — Research Appendix (raw findings)](/reports/donto-company-vision-appendix-2026-06-01)
- [donto: A Strategy for Turning a Knowledge Substrate Into a Company](/reports/donto-company-vision-2026-06-01)
- [donto: An Evidence Operating System for Contested Knowledge](/reports/donto-paper-2026-05-28)
- [Donto: State of the Program, Novelty, and Direction Map](/reports/donto-state-of-the-program-2026-06-04)
- [Exhaustive Extraction of a Minimal Literary Text: A 40-Word Bukowski Poem Through donto](/reports/donto-poem-extraction-2026-06-04)
- [Extracting a Poem: Codex Deep-Extraction of Bukowski's *I Met a Genius*](/reports/donto-bukowski-genius-extraction-2026-06-04)
- [Extraction Engineering for Generative Abundance: Provider Rotation, Gleaning Loops, and the Coverage-not-Count Principle](/reports/donto-extraction-engineering-2026-06-04)
- [Faster Inference for donto: From ~8 Months to ~4 Weeks (Free) to ~13 Days (GPU)](/reports/donto-inference-speedup-2026-06-05)
- [Omega × donto — making the integration even better (2026-06-07)](/reports/omega-donto-integration-2026-06-07)
- [PRD — donto distributed embedding fabric (spare-machine worker)](/reports/donto-distributed-embedding-prd-2026-06-05)
- [Scaling donto to BEAM-10M — billion-token memory without sacrificing the vision](/reports/donto-beam-10m-plan-2026-06-05)
- [Seeing the Substrate: A Visual Language for Claims That Refuse to Collapse](/reports/donto-visual-language-2026-06-05)
- [Should donto build its own agentic harness? — a faster, leaner alternative to opencode for the abundance firehose](/reports/donto-agentic-harness-2026-06-05)
- [Testing donto Against LongMemEval — A Long-Term Memory Benchmark Study](/reports/donto-longmemeval-2026-06-04)
- [THE BELIEF NEBULA](/reports/donto-belief-nebula-2026-06-05)
- [The Embedding Fabric: How Pervasive Embeddings Make donto's Query-Time Vision Real](/reports/donto-embedding-fabric-2026-06-03)
- [The Lens Engine — Research Appendix (raw findings)](/reports/donto-lens-engine-appendix-2026-06-01)
- [The Lens Engine: Discovery at the Intersection of Many Apertures](/reports/donto-lens-engine-2026-06-01)
- [They're Made Out of Weights: A Dialogue, Read by a Weight](/reports/donto-made-of-weights-2026-06-04)
- [Total Extraction: Deconstructing a Source Through the Whole of Human Understanding](/reports/donto-total-extraction-2026-06-03)
- [Two Cuts of the Same Conversation: Re-chunking BEAM-10M After Six Hours](/reports/donto-beam-chunking-2026-06-05)