Evaluates models' ability to transcribe and interpret personnel index cards of Swiss federal employees (1941–1961), containing typed and handwritten entries on job title, work location, pay grade, salary, and related notes in German and French.
This benchmark has been run 59 times. It uses the f1_micro metric.
Tested providers: mistral, openai, openrouter, x-ai, alibaba, genai, cohere, anthropic
Tested models: ministral-14b-2512, qwen3.5-397b-a17b, pixtral-large-2411, claude-opus-4-5-20251101, o3, gpt-4.1-mini, claude-sonnet-4-6, x-ai/grok-4, magistral-small-2509, qwen/qwen3-vl-8b-thinking, qwen3.5-35b-a3b, qwen/qwen3-vl-8b-instruct, qwen/qwen3-vl-30b-a3b-instruct, gemini-2.0-flash, claude-opus-4-1-20250805, qwen3.5-flash-2026-02-23, gemini-2.5-flash, gpt-5, magistral-medium-2509, gpt-5-mini, gemini-3.1-flash-lite-preview, command-a-vision-07-2025, ministral-8b-2512, gemini-3-flash-preview, gemini-2.5-flash-preview-09-2025, mistral-small-2506, gpt-4.1, gpt-4o-mini, grok-4.20-0309-reasoning, gemini-2.5-pro, mistral-large-2512, claude-opus-4-20250514, gpt-5.3-codex, qwen3.5-27b, qwen3.5-122b-a10b, claude-sonnet-4-20250514, qwen3.5-plus-2026-02-15, gpt-4o, gemini-3.1-pro-preview, gpt-4.1-nano, meta-llama/llama-4-maverick, gemini-2.5-flash-lite, gemini-2.5-flash-lite-preview-09-2025, claude-haiku-4-5-20251001, gpt-5-nano, mistral-medium-2505, gemini-3-pro-preview, gpt-5.2-2025-12-11, mistral-large-2411, gpt-5.4-2026-03-05, claude-opus-4-6, mistral-medium-2508, gemini-2.0-flash-lite, gpt-5.1-2025-11-13, claude-sonnet-4-5-20250929
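For readers unfamiliar with f1_micro: micro-averaged F1 pools true positives, false positives, and false negatives across all fields and all cards before computing precision and recall, so each field contributes equally. The sketch below is only an illustration, not the benchmark's actual scoring code. It assumes exact string matching, counts a wrong value as both a false positive and a false negative, and uses hypothetical field names and made-up card values. The multiplication by 100 assumes that the scores reported on this page are on a 0 to 100 scale.

```python
def micro_f1(cards):
    """Micro-averaged F1 over (predicted, expected) dict pairs, one pair per card.

    Assumptions (not taken from the benchmark): exact-match comparison,
    and a wrong value counts as one false positive plus one false negative.
    """
    tp = fp = fn = 0
    for predicted, expected in cards:
        # Look at every field that appears on either side.
        for field in set(predicted) | set(expected):
            pred = predicted.get(field)
            gold = expected.get(field)
            if pred is not None and pred == gold:
                tp += 1
            else:
                if pred is not None:
                    fp += 1  # model produced a value that is not correct
                if gold is not None:
                    fn += 1  # expected value was missed or wrong
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical card: field names and values are invented for illustration.
example = [
    (
        {"job_title": "Kanzlist", "salary": "4800"},                      # predicted
        {"job_title": "Kanzlist", "salary": "4300", "location": "Bern"},  # expected
    )
]
print(round(micro_f1(example) * 100, 2))  # 40.0
```

In this example one field matches, one is wrong, and one is missing, giving a precision of 1/2, a recall of 1/3, and an F1 of 0.4.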
| Score (f1_micro) | Date | Provider | Model |
|---|---|---|---|
| 96.67 | 1 week ago | alibaba | qwen3.5-397b-a17b |
| 85.55 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 96.51 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 97.69 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 96.96 | 1 week ago | alibaba | qwen3.5-27b |
| Role | Contributors |
|---|---|
| Domain expert | tabea_wullschleger |
| Data curator | tabea_wullschleger |
| Annotator | tabea_wullschleger |
| Analyst | Maximilian Hindermann, tabea_wullschleger |
| Engineer | Maximilian Hindermann |