Evaluates models on page segmentation and handwritten text extraction from 15th-century manuscripts written in late medieval German. Tests the ability to transcribe historical handwriting, identify folio numbers, distinguish main text from marginal additions, and preserve historical spelling and formatting. Performance is measured using fuzzy string matching and Character Error Rate (CER).
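To make the two measures concrete, here is a minimal Python sketch of how CER and a fuzzy similarity score can be computed for a single transcription. It is an illustration, not the benchmark's own scoring code: the function names are hypothetical, CER is assumed to be the Levenshtein edit distance divided by the length of the reference text, and the fuzzy score is approximated with the standard library's `difflib.SequenceMatcher`, which may differ from the matcher the benchmark actually uses.

```python
from difflib import SequenceMatcher


def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,          # deletion from the reference
                current[j - 1] + 1,       # insertion into the reference
                previous[j - 1] + cost,   # substitution or exact match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length (assumed definition of CER)."""
    if not reference:
        # Convention chosen for this sketch: an empty reference is only
        # matched perfectly by an empty hypothesis.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def fuzzy_ratio(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1] from difflib; a stand-in for the benchmark's fuzzy matcher."""
    return SequenceMatcher(None, reference, hypothesis).ratio()


if __name__ == "__main__":
    # Invented example: the model modernizes the historical spelling "vnd" to "und".
    reference = "vnd der herr sprach"
    hypothesis = "und der herr sprach"
    print(f"CER:   {character_error_rate(reference, hypothesis):.4f}")
    print(f"Fuzzy: {fuzzy_ratio(reference, hypothesis):.4f}")
```

In the invented example, normalizing "vnd" to "und" is a single substitution in a 19-character reference, so the CER is 1/19. This shows why preserving historical spelling matters for the score: every silently modernized character counts as an error. Lower CER is better, while a higher fuzzy ratio is better.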
This benchmark has been run 92 times. It is scored with the Character Error Rate (CER) metric.
Tested providers: mistral, openrouter, x-ai, alibaba, genai, anthropic, scicore, openai
Tested models: ministral-14b-2512, qwen3.5-397b-a17b, pixtral-large-2411, claude-opus-4-5-20251101, o3, gpt-4.1-mini, claude-sonnet-4-6, x-ai/grok-4, magistral-small-2509, qwen/qwen3-vl-8b-thinking, qwen3.5-35b-a3b, qwen/qwen3-vl-8b-instruct, qwen/qwen3-vl-30b-a3b-instruct, pixtral-12b, gemini-2.0-flash, claude-opus-4-1-20250805, qwen3.5-flash-2026-02-23, gemini-2.5-flash, gpt-5, magistral-medium-2509, gpt-5-mini, gemini-3.1-flash-lite-preview, ministral-8b-2512, gemini-3-flash-preview, claude-3-opus-20240229, gemini-2.5-flash-preview-09-2025, mistral-small-2506, gpt-4.1, gpt-4o-mini, grok-4.20-0309-reasoning, claude-3-5-sonnet-20241022, claude-opus-4-20250514, gemini-2.5-pro, mistral-large-2512, GLM-4.5V-FP8, gpt-5.3-codex, qwen3.5-122b-a10b, claude-sonnet-4-20250514, qwen3.5-plus-2026-02-15, qwen3.5-27b, claude-3-7-sonnet-20250219, gpt-4o, gemini-3.1-pro-preview, gpt-4.1-nano, gemini-2.5-flash-lite, gemini-2.5-flash-lite-preview-09-2025, gpt-5-nano, claude-haiku-4-5-20251001, mistral-medium-2505, claude-sonnet-4-5-20250929, gemini-3-pro-preview, mistral-large-2411, gpt-5.2-2025-12-11, gpt-5.4-2026-03-05, mistral-medium-2508, claude-opus-4-6, gemini-2.0-flash-lite, gpt-5.1-2025-11-13, meta-llama/llama-4-maverick
| Score | Date | Provider | Model |
|---|---|---|---|
| 0.00 | 4 days ago | mistral | ministral-14b-2512 |
| 69.00 | 4 days ago | genai | gemini-2.5-flash-lite-preview-09-2025 |
| 58.70 | 4 days ago | genai | gemini-2.5-flash-lite |
| 68.00 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 67.40 | 1 week ago | alibaba | qwen3.5-397b-a17b |
| Role | Contributors |
|---|---|
| Domain expert | Ina Serif |
| Data curator | Ina Serif |
| Annotator | Ina Serif |
| Analyst | Maximilian Hindermann |
| Engineer | Maximilian Hindermann, Ina Serif |