locomo-pilot
unknown · reader holo3.1 (hyades) · judge same. Lean view: this run predates transcript capture, so the question and gold answer are reconstructed from the dataset; only the reader answer and judge verdict are shown.
accuracy 63.6%
answered 11
correct 7
median ctx 19031 tok
multi-hop 100%
temporal 57%
open-domain 0%
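The summary figures above can be recomputed from the per-question JSON records. A minimal sketch, assuming the records are available as a list of dicts with the fields shown in this report (`category`, `autoeval_label`, `ctx_tokens`), and assuming the category codes map as the record headers suggest (1 = multi-hop, 2 = temporal, 3 = open-domain). The names `summarize` and `CATEGORY_NAMES` are hypothetical and not taken from the pilot's code:

```python
from collections import defaultdict
from statistics import median

# Assumed mapping, inferred from the record headers in this report.
CATEGORY_NAMES = {1: "multi-hop", 2: "temporal", 3: "open-domain"}

def summarize(records):
    """Recompute overall accuracy, per-category accuracy, and median context size."""
    per_cat = defaultdict(lambda: [0, 0])  # category name -> [correct, total]
    for r in records:
        name = CATEGORY_NAMES.get(r["category"], str(r["category"]))
        per_cat[name][0] += int(r["autoeval_label"])
        per_cat[name][1] += 1
    correct = sum(c for c, _ in per_cat.values())
    total = sum(t for _, t in per_cat.values())
    return {
        "answered": total,
        "correct": correct,
        "accuracy": correct / total if total else 0.0,
        "median_ctx": median(r["ctx_tokens"] for r in records) if records else None,
        "by_category": {k: c / t for k, (c, t) in per_cat.items()},
    }

# (category, autoeval_label, ctx_tokens) for the 11 records in this report
rows = [
    (1, True, 19027), (1, True, 19027), (1, True, 19030),
    (2, True, 19032), (2, False, 19028), (2, True, 19035), (2, False, 19029),
    (2, True, 19031), (2, True, 19031), (2, False, 19036),
    (3, False, 19036),
]
records = [{"category": c, "autoeval_label": ok, "ctx_tokens": n} for c, ok, n in rows]
summary = summarize(records)
print(summary)
```

With the 11 records here this gives 7 of 11 correct (63.6%), 3 of 3 multi-hop, 4 of 7 temporal (57%), 0 of 1 open-domain, and a median of 19031 context tokens, matching the header figures.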
11 questions · the AI conversation for each (retrieved claims → reader → judge)
conv-26_q3 · multi-hop · ✓ correct · 19027 ctx tok · 431 ms recall
Q: What did Caroline research?
gold: Adoption agencies
reader
Caroline researched two main areas:
1. **Counseling and mental health career paths** – She looked into becoming a counselor or working in mental health to support others, particularly within the LGBTQ+ community.
2. **Adoption agencies** – She researched options and processes for adopting children, eventually applying to agencies and passing interviews to become a adoptive parent.
▸ full JSON record
{
"qid": "conv-26_q3",
"category": 1,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 431.1,
"ctx_tokens": 19027,
"hypothesis": "Caroline researched two main areas:\n\n1. **Counseling and mental health career paths** – She looked into becoming a counselor or working in mental health to support others, particularly within the LGBTQ+ community.\n2. **Adoption agencies** – She researched options and processes for adopting children, eventually applying to agencies and passing interviews to become a adoptive parent.",
"autoeval_label": true,
"judge_parsed": true,
"judge_raw": "VERDICT: CORRECT"
}
conv-26_q4 · multi-hop · ✓ correct · 19027 ctx tok · 566 ms recall
Q: What is Caroline's identity?
gold: Transgender woman
reader
Caroline is a transgender woman. She's actively involved in LGBTQ+ advocacy, planning to adopt children, and interested in counseling/mental health work to help others, particularly the trans community. She expresses her identity through art and believes in authenticity and being true to oneself.
▸ full JSON record
{
"qid": "conv-26_q4",
"category": 1,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 565.6,
"ctx_tokens": 19027,
"hypothesis": "Caroline is a transgender woman. She's actively involved in LGBTQ+ advocacy, planning to adopt children, and interested in counseling/mental health work to help others, particularly the trans community. She expresses her identity through art and believes in authenticity and being true to oneself.",
"autoeval_label": true,
"judge_parsed": true,
"judge_raw": "VERDICT: CORRECT"
}
conv-26_q7 · multi-hop · ✓ correct · 19030 ctx tok · 825 ms recall
Q: What is Caroline's relationship status?
gold: Single
reader
Caroline is single.
▸ full JSON record
{
"qid": "conv-26_q7",
"category": 1,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 824.6,
"ctx_tokens": 19030,
"hypothesis": "Caroline is single.",
"autoeval_label": true,
"judge_parsed": true,
"judge_raw": "VERDICT: CORRECT"
}
conv-26_q0 · temporal · ✓ correct · 19032 ctx tok · 1194 ms recall
Q: When did Caroline go to the LGBTQ support group?
gold: 7 May 2023
reader
Caroline went to the LGBTQ support group on **May 7, 2023** (she mentioned attending it 'yesterday' in a conversation on May 8, 2023).
▸ full JSON record
{
"qid": "conv-26_q0",
"category": 2,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 1194.1,
"ctx_tokens": 19032,
"hypothesis": "Caroline went to the LGBTQ support group on **May 7, 2023** (she mentioned attending it 'yesterday' in a conversation on May 8, 2023).",
"autoeval_label": true,
"judge_parsed": true,
"judge_raw": "VERDICT: CORRECT"
}
conv-26_q1 · temporal · ✗ wrong · 19028 ctx tok · 1177 ms recall
Q: When did Melanie paint a sunrise?
gold: 2022
reader
Not enough information.
▸ full JSON record
{
"qid": "conv-26_q1",
"category": 2,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 1177.2,
"ctx_tokens": 19028,
"hypothesis": "Not enough information.",
"autoeval_label": false,
"judge_parsed": true,
"judge_raw": "VERDICT: WRONG"
}
conv-26_q10 · temporal · ✓ correct · 19035 ctx tok · 1207 ms recall
Q: How long has Caroline had her current group of friends for?
gold: 4 years
reader
Four years.
▸ full JSON record
{
"qid": "conv-26_q10",
"category": 2,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 1207.4,
"ctx_tokens": 19035,
"hypothesis": "Four years.",
"autoeval_label": true,
"judge_parsed": true,
"judge_raw": "VERDICT: CORRECT"
}
conv-26_q5 · temporal · ✗ wrong · 19029 ctx tok · 790 ms recall
Q: When did Melanie run a charity race?
gold: The sunday before 25 May 2023
reader
May 20, 2023 (Saturday before the May 25, 2023 conversation)
▸ full JSON record
{
"qid": "conv-26_q5",
"category": 2,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 789.5,
"ctx_tokens": 19029,
"hypothesis": "May 20, 2023 (Saturday before the May 25, 2023 conversation)",
"autoeval_label": false,
"judge_parsed": true,
"judge_raw": "VERDICT: WRONG"
}
conv-26_q6 · temporal · ✓ correct · 19031 ctx tok · 311 ms recall
Q: When is Melanie planning on going camping?
gold: June 2023
reader
**Camping Plan:** In the conversation on 25 May 2023, Melanie said, "My kids are so excited about summer break! We're thinking about going camping next month." This indicates a plan for June 2023. No more recent specific future camping dates are mentioned in the retrieved memories.
▸ full JSON record
{
"qid": "conv-26_q6",
"category": 2,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 311.2,
"ctx_tokens": 19031,
"hypothesis": "**Camping Plan:** In the conversation on 25 May 2023, Melanie said, \"My kids are so excited about summer break! We're thinking about going camping next month.\" This indicates a plan for June 2023. No more recent specific future camping dates are mentioned in the retrieved memories.",
"autoeval_label": true,
"judge_parsed": true,
"judge_raw": "VERDICT: CORRECT"
}
conv-26_q8 · temporal · ✓ correct · 19031 ctx tok · 828 ms recall
Q: When did Caroline give a speech at a school?
gold: The week before 9 June 2023
reader
Caroline gave a speech at a school in early June 2023, specifically the week before June 9, 2023 when she told Melanie about it. She talked about her transgender journey and encouraged students to get involved in the LGBTQ community.
▸ full JSON record
{
"qid": "conv-26_q8",
"category": 2,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 827.9,
"ctx_tokens": 19031,
"hypothesis": "Caroline gave a speech at a school in early June 2023, specifically the week before June 9, 2023 when she told Melanie about it. She talked about her transgender journey and encouraged students to get involved in the LGBTQ community.",
"autoeval_label": true,
"judge_parsed": true,
"judge_raw": "VERDICT: CORRECT"
}
conv-26_q9 · temporal · ✗ wrong · 19036 ctx tok · 1926 ms recall
Q: When did Caroline meet up with her friends, family, and mentors?
gold: The week before 9 June 2023
reader
Not enough information.
▸ full JSON record
{
"qid": "conv-26_q9",
"category": 2,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 1925.9,
"ctx_tokens": 19036,
"hypothesis": "Not enough information.",
"autoeval_label": false,
"judge_parsed": true,
"judge_raw": "VERDICT: WRONG"
}
conv-26_q2 · open-domain · ✗ wrong · 19036 ctx tok · 1193 ms recall
Q: What fields would Caroline be likely to pursue in her educaton?
gold: Psychology, counseling certification
reader
Caroline would likely pursue education in counseling and mental health fields, with a focus on working with transgender individuals and the LGBTQ+ community.
▸ full JSON record
{
"qid": "conv-26_q2",
"category": 3,
"n_recalled": 19,
"n_claim_hits": 0,
"retrieval_ms": 1193,
"ctx_tokens": 19036,
"hypothesis": "Caroline would likely pursue education in counseling and mental health fields, with a focus on working with transgender individuals and the LGBTQ+ community.",
"autoeval_label": false,
"judge_parsed": true,
"judge_raw": "VERDICT: WRONG"
}
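Each record carries the judge's raw output (`judge_raw`), a flag for whether a verdict could be extracted (`judge_parsed`), and the resulting boolean (`autoeval_label`). How the pilot actually parses the verdict is not shown in this report. The following is only a sketch of one plausible approach, assuming the judge replies with a line of the form `VERDICT: CORRECT` or `VERDICT: WRONG`. The function name `parse_verdict` and the regular expression are hypothetical:

```python
import re

# Assumed judge output format: "VERDICT: CORRECT" or "VERDICT: WRONG".
_VERDICT_RE = re.compile(r"VERDICT:\s*(CORRECT|WRONG)\b", re.IGNORECASE)

def parse_verdict(judge_raw):
    """Return (judge_parsed, autoeval_label) for a raw judge reply."""
    match = _VERDICT_RE.search(judge_raw or "")
    if match is None:
        # No recognizable verdict: flag as unparsed and count as not correct.
        return False, False
    return True, match.group(1).upper() == "CORRECT"

print(parse_verdict("VERDICT: CORRECT"))
print(parse_verdict("VERDICT: WRONG"))
print(parse_verdict("I am not sure."))
```

Counting an unparseable reply as incorrect is a conservative choice assumed here. Keeping `judge_parsed` as a separate flag lets such cases be audited instead of being silently mixed in with genuine wrong answers.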