Evaluates models' ability to extract bibliographic information, such as publication details, authors, dates, and other metadata, from digitized historical documents.
This benchmark has been run 132 times. It uses a fuzzy metric.
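The page does not say how the fuzzy metric is computed. As a rough illustration only, one common approach is to compare each extracted field to its reference value with a string-similarity ratio and average the results. The sketch below assumes that approach, uses Python's standard `difflib.SequenceMatcher`, and assumes a 0-100 scale to match the scores in the table. The function names and the field names (`title`, `year`) are hypothetical and not taken from the benchmark.

```python
from difflib import SequenceMatcher


def fuzzy_score(predicted: str, expected: str) -> float:
    """Return a similarity in [0, 1], ignoring case and surrounding whitespace."""
    a = predicted.strip().lower()
    b = expected.strip().lower()
    if not a and not b:
        return 1.0  # two empty values count as a perfect match
    return SequenceMatcher(None, a, b).ratio()


def record_score(predicted: dict, expected: dict) -> float:
    """Mean fuzzy score over the expected fields, scaled to 0-100 (assumed scale)."""
    if not expected:
        return 0.0
    total = sum(
        # A field missing from the prediction is compared as an empty string.
        fuzzy_score(str(predicted.get(key, "")), str(value))
        for key, value in expected.items()
    )
    return 100.0 * total / len(expected)


if __name__ == "__main__":
    # Hypothetical record: one field matches exactly, the other is missing.
    reference = {"title": "A", "year": "1900"}
    extracted = {"title": "A"}
    print(record_score(extracted, reference))
```

Under this scheme a near-miss such as a dropped diacritic earns partial credit instead of zero, which is the usual motivation for fuzzy scoring on transcribed historical text.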
Tested providers: mistral, openai, openrouter, x-ai, alibaba, genai, scicore, anthropic
Tested models: ministral-14b-2512, qwen3.5-397b-a17b, pixtral-large-2411, claude-opus-4-5-20251101, o3, gpt-4.1-mini, claude-sonnet-4-6, x-ai/grok-4, magistral-small-2509, qwen/qwen3-vl-8b-thinking, qwen3.5-35b-a3b, qwen/qwen3-vl-8b-instruct, qwen/qwen3-vl-30b-a3b-instruct, gemini-1.5-pro, pixtral-12b, gemini-2.0-flash, claude-opus-4-1-20250805, qwen3.5-flash-2026-02-23, gemini-2.5-flash, gpt-5, magistral-medium-2509, gpt-5-mini, gemini-3.1-flash-lite-preview, gemini-1.5-flash, ministral-8b-2512, gemini-3-flash-preview, claude-3-opus-20240229, gemini-2.5-flash-preview-09-2025, mistral-small-2506, gpt-4.1, gpt-4o-mini, grok-4.20-0309-reasoning, claude-3-5-sonnet-20241022, gpt-4.5-preview, claude-opus-4-20250514, gemini-2.5-pro, GLM-4.5V-FP8, mistral-large-2512, gpt-5.3-codex, claude-sonnet-4-20250514, qwen3.5-plus-2026-02-15, qwen3.5-27b, claude-3-7-sonnet-20250219, qwen3.5-122b-a10b, gpt-4o, gemini-3.1-pro-preview, gpt-4.1-nano, meta-llama/llama-4-maverick, gemini-2.5-flash-lite, gpt-5-nano, gemini-2.5-flash-lite-preview-09-2025, claude-haiku-4-5-20251001, mistral-medium-2505, gemini-3-pro-preview, mistral-large-2411, gpt-5.2-2025-12-11, gpt-5.4-2026-03-05, mistral-medium-2508, claude-opus-4-6, gemini-2.0-flash-lite, gpt-5.1-2025-11-13, claude-sonnet-4-5-20250929
| Score | Date | Provider | Model |
|---|---|---|---|
| 64.87 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 67.87 | 1 week ago | alibaba | qwen3.5-397b-a17b |
| 67.34 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 66.45 | 1 week ago | alibaba | qwen3.5-27b |
| 65.34 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| Role | Contributors |
|---|---|
| Domain expert | Pema Frick |
| Data curator | Pema Frick |
| Annotator | Sven Burkhardt, Pema Frick |
| Analyst | Pema Frick, Sorin Marti |
| Engineer | Pema Frick, Sorin Marti |