Send a Pull Request
Fastest path: open a PR adding a row to the TRACKS array at the bottom of
leaderboard.html. Include a short description of the setup and a link to your run logs
or code in the PR body.
Submitted results on ATM-Bench, ATM-Bench-Hard, and the NIAH-100 long-context stress test. Use the tabs to switch boards, the chips to filter by system type, and click any column header to sort.
A dash (-) indicates that the submitter has not reported the field.
Memory Model is the LLM used to construct the memory store;
Retriever is the embedding model used at query time.
Long-context stress test on ATM-Bench-Hard with a 100-item distractor pool.
This board is a placeholder while more NIAH-100 submissions are collected. Submit yours below.
We welcome new submissions across all three boards. To keep the leaderboard credible, please include reproduction details (system type, harness, model + version, code or commit, total token cost when applicable).
Prefer not to send a PR? File an issue with your system type, harness, scores, and a reproduction pointer. We will add the row on your behalf.
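As a rough illustration of what a submission row might carry, the sketch below builds one entry and checks it for the reproduction details requested above. Every field name here (`board`, `systemType`, `harness`, `model`, `memoryModel`, `retriever`, `score`, `tokenCost`, `code`), the helper functions, and the example URL are assumptions made for illustration. The actual schema of the `TRACKS` array is not shown on this page, so check the existing rows in leaderboard.html before copying anything.

```javascript
// Hypothetical row shape. Field names are assumptions, NOT the real
// schema of the TRACKS array in leaderboard.html.
const exampleRow = {
  board: "ATM-Bench-Hard",      // ATM-Bench, ATM-Bench-Hard, or NIAH-100
  systemType: "your-system-type", // placeholder value
  harness: "your-harness",        // placeholder value
  model: "your-model-and-version", // placeholder value
  memoryModel: null,  // LLM used to construct the memory store, if any
  retriever: null,    // embedding model used at query time, if any
  score: null,        // fill in your measured score
  tokenCost: null,    // total token cost, when applicable
  code: "https://example.com/your-repo/commit/your-commit", // placeholder
};

// Reproduction details the page asks for (assumed to be mandatory here).
const REQUIRED = ["board", "systemType", "harness", "model", "code"];

// True when a field was left unreported.
function isUnreported(value) {
  return value === undefined || value === null || value === "";
}

// Returns the names of required fields that are still missing.
function missingFields(row) {
  return REQUIRED.filter((key) => isUnreported(row[key]));
}

// Mirrors the board convention that unreported fields show as a dash.
function displayValue(value) {
  return isUnreported(value) ? "-" : String(value);
}

console.log(missingFields(exampleRow));
console.log(displayValue(exampleRow.retriever));
```

Running a check like `missingFields` before opening the PR or issue is one way to confirm that no reproduction detail was left out. Optional fields can stay unset and would then render as a dash.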
Open submission issue
This page was adopted from the Nerfies project page, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Many thanks to the Academic Project Page Template.