Index cards of companies on a British 'black list' in the 1940s. This benchmark assesses how well models recognize typed and handwritten information on the index cards.
This benchmark has been run 106 times. It is scored with a fuzzy metric.
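The page does not say how the fuzzy metric is computed. As a rough illustration only, fuzzy scoring of transcribed fields could look like the sketch below. It assumes a 0 to 100 scale (matching the range of the scores listed), case- and whitespace-insensitive comparison, and Python's standard `difflib.SequenceMatcher` as the similarity measure. The function names `fuzzy_score` and `card_score` and the field names in the usage example are hypothetical and are not taken from the benchmark itself.

```python
from difflib import SequenceMatcher


def fuzzy_score(predicted: str, expected: str) -> float:
    """Return a similarity in [0, 100] between two strings.

    Assumption: case and runs of whitespace are ignored. The real
    benchmark may normalize differently or use another similarity measure.
    """
    a = " ".join(predicted.lower().split())
    b = " ".join(expected.lower().split())
    if not a and not b:
        # Two empty fields count as a perfect match.
        return 100.0
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def card_score(predicted: dict, expected: dict) -> float:
    """Average the per-field fuzzy scores over the expected fields.

    Fields missing from the prediction are compared as empty strings.
    """
    if not expected:
        return 0.0
    scores = [
        fuzzy_score(str(predicted.get(key, "")), str(value))
        for key, value in expected.items()
    ]
    return sum(scores) / len(scores)
```

For example, `card_score({"name": "Acme"}, {"name": "Acme", "city": "Lisbon"})` averages a perfect match on the hypothetical `name` field with a zero for the missing `city` field. Unlike exact matching, a near-miss transcription of a field earns partial credit.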
Tested providers: x-ai, openrouter, genai, openai, anthropic, mistral, alibaba, scicore
Tested models: claude-haiku-4-5-20251001, google/gemma-4-26b-a4b-it, qwen3.5-397b-a17b, qwen/qwen3.5-122b-a10b, gemini-2.0-flash, claude-opus-4-6, mistral-large-2411, qwen3.5-35b-a3b, claude-sonnet-4-5-20250929, mistral-small-2506, claude-3-7-sonnet-20250219, grok-4.20-0309-reasoning, gpt-5-nano, gemini-2.0-flash-lite, mistral-medium-2505, qwen/qwen3.5-flash-02-23, claude-opus-4-20250514, gemini-2.5-flash-lite, o3, meta-llama/llama-4-maverick, gemini-2.5-flash-preview-09-2025, qwen/qwen3-vl-8b-instruct, gpt-4o-mini, ministral-14b-2512, mistral-medium-2508, gemini-2.5-flash-lite-preview-09-2025, qwen3.5-122b-a10b, pixtral-12b, claude-opus-4-5-20251101, magistral-medium-2509, claude-opus-4-1-20250805, gpt-5.5-2026-04-23, gpt-5.4-2026-03-05, gemini-3.1-pro-preview, gpt-5, gemini-2.5-pro, gpt-5.2-2025-12-11, qwen3.5-27b, qwen/qwen3.5-27b, claude-sonnet-4-20250514, gemini-3.1-flash-lite-preview, qwen/qwen3-vl-8b-thinking, gpt-5-mini, claude-opus-4-7, gpt-4.1, magistral-small-2509, qwen3.5-flash-2026-02-23, gemini-2.5-flash, qwen/qwen3.5-35b-a3b, qwen/qwen3.5-397b-a17b, qwen/qwen3.5-9b, gpt-4.1-mini, qwen/qwen3-vl-30b-a3b-instruct, claude-3-5-sonnet-20241022, qwen3.5-plus-2026-02-15, claude-sonnet-4-6, qwen/qwen3.6-plus, gpt-4.1-nano, gpt-4o, gpt-5.1-2025-11-13, gemini-3-flash-preview, pixtral-large-2411, mistral-large-2512, ministral-8b-2512, gpt-5.3-codex, GLM-4.5V-FP8, google/gemma-4-31b-it, gemini-3-pro-preview, claude-3-opus-20240229, qwen/qwen3.5-plus-02-15, x-ai/grok-4
| Score | Date | Provider | Model |
|---|---|---|---|
| 85.96 | 3 weeks ago | openai | gpt-5.5-2026-04-23 |
| 85.30 | 4 weeks ago | openrouter | qwen/qwen3.5-9b |
| 88.64 | 1 month ago | openrouter | qwen/qwen3.5-plus-02-15 |
| 91.75 | 1 month ago | openrouter | google/gemma-4-26b-a4b-it |
| 92.48 | 1 month ago | openrouter | qwen/qwen3.6-plus |
| Role | Contributors |
|---|---|
| Domain expert | Lea Kasper |
| Data curator | Sorin Marti |
| Annotator | Lea Kasper, Sorin Marti |
| Analyst | Lea Kasper, Sorin Marti |
| Engineer | Sorin Marti |