BEAM-10M — run checklist
Goal: the full BEAM-10M run (the benchmark Hindsight/Vectorize reports on; 10 conversations × ~13.3M tokens; ~200 questions across tiers up to 10M tokens), scoring donto-memory against Hindsight's 64.1%. This is donto's hardest agent-memory stress test.
Reference (Hindsight, arXiv 2512.12818): BEAM-10M 64.1% (vs Honcho 40.6%, RAG baseline 24.9%). The benchmark itself is Tavakoli et al. (arXiv 2510.27246, ICLR 2026).
Status: ✅ done · 🔄 in progress · ⏸ paused · ⬜ not started. Last updated 2026-06-11. STATUS: PAUSED — box dedicated to the LoCoMo + LongMemEval Zep comparison.
Partial result (honest, in a deliberately under-built state)
donto scored 0.684 vs Hindsight's 0.641 on identical questions and metrics, but with only ~5% claim coverage and ~4% of vectors in place at run time. This is a floor, not a like-for-like comparison: most of BEAM hadn't been ingested into the claim layer when the run happened. Treat it as "donto clears the bar even when mostly empty," not as a final number.
Foundation
- ✅ Dataset — 10 conversations, slim JSON loader.
- ✅ Firehose ingest — episodic chunks via asyncpg COPY (~36× faster than /memorize), bitemporal dates carried through.
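A minimal sketch of what a COPY-based firehose ingest can look like with asyncpg's `copy_records_to_table`. The table name `episodic_chunk`, its columns, and the `Chunk` type are hypothetical stand-ins for illustration, not donto's actual schema; the two timestamps show one way to carry bitemporal dates (valid time plus record time) through the bulk load.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple

# Hypothetical column layout; the real donto schema is not shown in this checklist.
COLUMNS = ["conversation_id", "seq", "text", "valid_from", "recorded_at"]


class Chunk(NamedTuple):
    conversation_id: str
    seq: int
    text: str
    valid_from: datetime  # valid time: when the conversation turn happened


def to_records(chunks: Iterable[Chunk], recorded_at: datetime) -> list[tuple]:
    """Flatten chunks into tuples in COLUMNS order, stamping the record time."""
    return [
        (c.conversation_id, c.seq, c.text, c.valid_from, recorded_at)
        for c in chunks
    ]


async def firehose_ingest(dsn: str, chunks: Iterable[Chunk]) -> None:
    """Bulk-load all chunks in one COPY instead of one request per chunk."""
    import asyncpg  # third-party; imported lazily so the helpers above run without it

    conn = await asyncpg.connect(dsn)
    try:
        await conn.copy_records_to_table(
            "episodic_chunk",  # hypothetical table name
            records=to_records(chunks, datetime.now(timezone.utc)),
            columns=COLUMNS,
        )
    finally:
        await conn.close()
```

Under these assumptions a run would look like `asyncio.run(firehose_ingest(dsn, chunks))`. The speedup over a per-chunk endpoint comes from sending every row in a single COPY stream, which avoids a round trip per chunk.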
Build the layer (the long haul)
- ⏸ Chunk embeddings to 100% — ~279K of ~1.27M embedded (~22%). Paused. This is the job for the contributor fleet (donto.org/help) — the auto-enqueue fleet-starvation bug was fixed 2026-06-11, so memory_chunk work now flows to workers.
- ⏸ Claim extraction — donto-agent-beam, holo3.1, into ctx:claims/beam. ~24K chunks done of ~337K queued. Paused.
- ⬜ Reconcile — verified claims in donto_statement, no contention-empties (done ≠ extracted).
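The progress figures above reduce to simple ratios. A small sketch, using the approximate counts quoted in this checklist (the `coverage` helper is illustrative only):

```python
def coverage(done: int, total: int) -> float:
    """Return the completed share of a work queue as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * done / total


# Approximate counts taken from the checklist items above.
embedded = coverage(279_000, 1_270_000)  # chunk embeddings
extracted = coverage(24_000, 337_000)    # claim extraction, relative to queued chunks

print(f"chunk embeddings: {embedded:.1f}%")   # chunk embeddings: 22.0%
print(f"claim extraction: {extracted:.1f}%")  # claim extraction: 7.1%
```

Note that the extraction percentage is measured against the ~337K queued chunks, not against the full ~1.27M chunk corpus.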
Folding + scoring
- ⬜ Predicate + entity embeddings for BEAM claims, then alignment closure.
- ⬜ Scored answer run — codex answer phase (~200 Q).
- ⬜ Empty-instance baseline — the no-leakage control (proves recall, not memorization).
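One way to read the empty-instance baseline: score the same questions against an instance with nothing ingested and compare. The sketch below assumes per-question scores in [0, 1] keyed by question id; the function names and the score format are hypothetical, not the actual scoring harness.

```python
from typing import Mapping


def mean_score(scores: Mapping[str, float]) -> float:
    """Average per-question score for one run."""
    if not scores:
        raise ValueError("no scores to average")
    return sum(scores.values()) / len(scores)


def recall_lift(populated: Mapping[str, float], empty: Mapping[str, float]) -> float:
    """Score gained by having memory ingested, on an identical question set."""
    if populated.keys() != empty.keys():
        raise ValueError("both runs must cover the same questions")
    return mean_score(populated) - mean_score(empty)


# Toy scores for illustration only, not benchmark results.
populated_run = {"q1": 1.0, "q2": 1.0, "q3": 0.0, "q4": 1.0}
empty_run = {"q1": 0.0, "q2": 1.0, "q3": 0.0, "q4": 0.0}
lift = recall_lift(populated_run, empty_run)
```

If the empty instance scores close to the populated one, the answer model is likely drawing on what it already knows rather than on recalled memory, so a large lift is the evidence the control is meant to provide.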
Why paused: LoCoMo + LongMemEval is the current priority (smaller corpora, faster signal, same philosophy under test). BEAM resumes afterwards: the firehose ingest and the fixed embed fleet make a full build tractable.