Evaluates models' ability to extract bibliographic information (publication details, authors, dates, and other metadata) from digitized historical documents.
This benchmark has been run 145 times. It uses a fuzzy metric.
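The page does not say how the fuzzy metric is computed, so the following is only an illustrative sketch of one common approach: compare each extracted field to its reference value with a string-similarity ratio and average the results onto a 0-100 scale. The function names `fuzzy_field_score` and `fuzzy_record_score`, the use of Python's `difflib.SequenceMatcher`, and the case and whitespace normalization are all assumptions, not the benchmark's actual implementation.

```python
from difflib import SequenceMatcher


def fuzzy_field_score(predicted: str, expected: str) -> float:
    """Similarity in [0, 1] between two field values.

    Assumption: case and surrounding whitespace are ignored.
    """
    a = predicted.strip().lower()
    b = expected.strip().lower()
    if not a and not b:
        # Two empty values count as a perfect match.
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_record_score(predicted: dict, expected: dict) -> float:
    """Mean field similarity over the expected keys, scaled to 0-100.

    Assumption: a field missing from the prediction is scored as an
    empty string, and extra predicted fields are ignored.
    """
    if not expected:
        return 0.0
    total = sum(
        fuzzy_field_score(str(predicted.get(key, "")), str(value))
        for key, value in expected.items()
    )
    return 100.0 * total / len(expected)


if __name__ == "__main__":
    # Hypothetical reference entry and model output, for illustration only.
    reference = {"author": "Jane Doe", "year": "1887", "place": "Basel"}
    prediction = {"author": "jane doe", "year": "1887", "place": "Basle"}
    print(round(fuzzy_record_score(prediction, reference), 2))
```

Under this scheme a small transcription slip such as "Basle" for "Basel" lowers the score without zeroing it, which is the usual motivation for choosing a fuzzy metric over exact match when scoring metadata from historical sources.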
Tested providers: x-ai, openrouter, genai, openai, anthropic, mistral, alibaba, scicore
Tested models: claude-haiku-4-5-20251001, google/gemma-4-26b-a4b-it, qwen3.5-397b-a17b, qwen/qwen3.5-122b-a10b, gemini-2.0-flash, claude-opus-4-6, mistral-large-2411, qwen3.5-35b-a3b, claude-sonnet-4-5-20250929, mistral-small-2506, claude-3-7-sonnet-20250219, grok-4.20-0309-reasoning, gpt-5-nano, x-ai/grok-4, gemini-2.0-flash-lite, mistral-medium-2505, qwen/qwen3.5-flash-02-23, claude-opus-4-20250514, o3, gemini-2.5-flash-lite, meta-llama/llama-4-maverick, gpt-4o-mini, gemini-2.5-flash-preview-09-2025, ministral-14b-2512, qwen/qwen3-vl-8b-instruct, mistral-medium-2508, gemini-2.5-flash-lite-preview-09-2025, gemini-1.5-flash, pixtral-12b, claude-opus-4-5-20251101, qwen3.5-122b-a10b, magistral-medium-2509, claude-opus-4-1-20250805, gpt-5.5-2026-04-23, gpt-5.4-2026-03-05, gemini-3.1-pro-preview, gpt-5, gemini-2.5-pro, gpt-5.2-2025-12-11, qwen3.5-27b, qwen/qwen3.5-27b, claude-sonnet-4-20250514, gemini-3.1-flash-lite-preview, qwen/qwen3-vl-8b-thinking, gpt-5-mini, gpt-4.5-preview, gpt-4.1, magistral-small-2509, qwen3.5-flash-2026-02-23, gemini-2.5-flash, claude-opus-4-7, qwen/qwen3.5-35b-a3b, qwen/qwen3.5-397b-a17b, qwen/qwen3.5-9b, gpt-4.1-mini, qwen/qwen3-vl-30b-a3b-instruct, claude-3-5-sonnet-20241022, qwen3.5-plus-2026-02-15, claude-sonnet-4-6, qwen/qwen3.6-plus, gpt-4.1-nano, gpt-4o, gemini-3-flash-preview, pixtral-large-2411, mistral-large-2512, gemini-1.5-pro, ministral-8b-2512, gpt-5.3-codex, GLM-4.5V-FP8, google/gemma-4-31b-it, gemini-3-pro-preview, claude-3-opus-20240229, qwen/qwen3.5-plus-02-15, gpt-5.1-2025-11-13
| Score | Date | Provider | Model |
|---|---|---|---|
| 71.55 | 3 weeks ago | openai | gpt-5.5-2026-04-23 |
| 0.00 | 4 weeks ago | openrouter | qwen/qwen3.5-9b |
| 41.71 | 1 month ago | openrouter | google/gemma-4-26b-a4b-it |
| 46.10 | 1 month ago | openrouter | qwen/qwen3.5-9b |
| 61.35 | 1 month ago | openrouter | google/gemma-4-31b-it |
| Role | Contributors |
|---|---|
| Domain expert | Pema Frick |
| Data curator | Pema Frick |
| Annotator | Sven Burkhardt, Pema Frick |
| Analyst | Pema Frick, Sorin Marti |
| Engineer | Pema Frick, Sorin Marti |