# BEAM-10M — run checklist

**Goal:** the full **BEAM-10M** run (Hindsight/Vectorize's benchmark; 10 conversations × ~13.3M tokens; ~200 questions across tiers to 10M tokens), scored donto-memory vs **Hindsight 64.1%** — donto's hardest agent-memory stress test.

**Reference (Hindsight, arXiv 2512.12818):** BEAM-10M **64.1%** (vs Honcho 40.6%, RAG baseline 24.9%). The benchmark itself is Tavakoli et al. (arXiv 2510.27246, ICLR 2026).

_Status: ✅ done · 🔄 in progress · ⏸ paused · ⬜ not started. Last updated 2026-06-11._ **STATUS: PAUSED** — box dedicated to the LoCoMo + LongMemEval Zep comparison.

---

## Partial result (honest, in a deliberately under-built state)

donto scored **0.684** (68.4%) vs Hindsight's **0.641** (64.1%) on identical questions and metrics, **but with only ~5% claim coverage and ~4% of vectors in place at run time.** This is a *floor*, not a like-for-like comparison: most of BEAM had not been ingested into the claim layer when the run happened. Treat it as "donto clears the bar even mostly-empty," not as a final number.

## Foundation

- ✅ **Dataset** — 10 conversations, slim JSON loader.
- ✅ **Firehose ingest** — episodic chunks via asyncpg COPY (~36× faster than /memorize), bitemporal dates carried through.
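
As a rough illustration of the firehose ingest item, the sketch below bulk-loads chunks with asyncpg's `copy_records_to_table`, which streams rows through a single Postgres COPY instead of issuing one request per chunk. The table name, column names, and the two timestamp fields (when the turn was said vs. when it was recorded) are assumptions made for this example, not donto's actual schema.

```python
from datetime import datetime, timezone

# Hypothetical table and column names, for illustration only.
TABLE = "memory_chunk"
COLUMNS = ["conversation_id", "chunk_index", "body", "valid_time", "recorded_at"]


def to_records(conversation_id, turns, recorded_at):
    """Turn (said_at, text) pairs into tuples ordered like COLUMNS.

    valid_time is when the turn happened in the conversation;
    recorded_at is when we ingested it (the bitemporal pair).
    """
    return [
        (conversation_id, index, text, said_at, recorded_at)
        for index, (said_at, text) in enumerate(turns)
    ]


async def bulk_ingest(dsn, records):
    """Stream all records to Postgres in one COPY and return its status string."""
    import asyncpg  # imported lazily so to_records works without a database driver

    conn = await asyncpg.connect(dsn)
    try:
        return await conn.copy_records_to_table(
            TABLE, records=records, columns=COLUMNS
        )
    finally:
        await conn.close()
```

With a reachable database, `asyncio.run(bulk_ingest(dsn, to_records(...)))` would perform the load. The speedup quoted above is the checklist's own measurement and is not reproduced by this sketch.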

## Build the layer (the long haul)

- ⏸ **Chunk embeddings to 100%** — ~279K of ~1.27M embedded (~22%). Paused. *This is the job for the contributor fleet (donto.org/help) — the auto-enqueue fleet-starvation bug was fixed 2026-06-11, so memory_chunk work now flows to workers.*
- ⏸ **Claim extraction** — `donto-agent-beam`, holo3.1, into `ctx:claims/beam`. ~24K chunks done of ~337K queued. Paused.
- ⬜ **Reconcile** — verified claims in `donto_statement`, no contention-empties (`done` ≠ extracted).
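
The coverage figures in this section are simple ratios of the approximate counts quoted above. A minimal sketch of that arithmetic, using only those rounded counts (the helper names are made up for this example):

```python
def coverage(done, total):
    """Fraction of queued work that is complete, between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return done / total


# Approximate counts taken from the checklist items above.
PROGRESS = {
    "chunk embeddings": (279_000, 1_270_000),
    "claim extraction": (24_000, 337_000),
}


def report(progress):
    """Map each job name to its completion percentage, rounded to 0.1."""
    return {
        name: round(100 * coverage(done, total), 1)
        for name, (done, total) in progress.items()
    }


print(report(PROGRESS))
# {'chunk embeddings': 22.0, 'claim extraction': 7.1}
```

Because the inputs are rounded, the percentages are only indicative.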

## Folding + scoring

- ⬜ **Predicate + entity embeddings** for BEAM claims, then alignment closure.
- ⬜ **Scored answer run** — codex answer phase (~200 Q).
- ⬜ **Empty-instance baseline** — the no-leakage control (proves recall, not memorization).
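
One way to read the empty-instance control: run the same questions against an instance with nothing ingested, then compare. Any question the empty instance still answers correctly is a candidate for leakage from the answering model's own knowledge. The sketch below assumes per-question scores between 0.0 and 1.0 keyed by a question id; both the data shape and the function name are hypothetical, not the actual scoring harness.

```python
def leakage_report(with_memory, empty_instance):
    """Compare per-question scores from the full run and the empty-instance run."""
    if not with_memory:
        raise ValueError("no questions to compare")
    if with_memory.keys() != empty_instance.keys():
        raise ValueError("both runs must cover the same questions")

    n = len(with_memory)
    mean_full = sum(with_memory.values()) / n
    mean_empty = sum(empty_instance.values()) / n
    # Questions answered without any memory point to leakage, not recall.
    leaked = sorted(q for q, score in empty_instance.items() if score > 0)
    return {
        "with_memory": mean_full,
        "empty": mean_empty,
        "lift": mean_full - mean_empty,
        "leaked_questions": leaked,
    }
```

A large lift with few leaked questions would support the claim that the score comes from recall; a high empty-instance mean would undercut it.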

---

_Why paused: LoCoMo + LongMemEval is the current priority (smaller corpora, faster signal, same philosophy under test). BEAM resumes after — the firehose ingest + fixed embed fleet make a full build tractable. → [all benchmarks](/benchmarks)_
