Evaluates models' ability to extract bibliographic information, such as publication details, authors, dates, and other metadata, from digitized historical documents.

| Score | Date | Provider | Model |
|---|---|---|---|
| 71.43 | 1 year ago | openai | gpt-4o |
| 71.05 | 2 months ago | openai | gpt-5.1-2025-11-13 |
| 70.23 | 6 months ago | genai | gemini-2.5-flash-preview-09-2025 |
| 69.83 | 2 weeks ago | openai | gpt-5.4-2026-03-05 |
| 69.18 | 1 week ago | alibaba | qwen3.5-plus-2026-02-15 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 64.87 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 67.87 | 1 week ago | alibaba | qwen3.5-397b-a17b |
| 67.34 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 66.45 | 1 week ago | alibaba | qwen3.5-27b |
| 65.34 | 1 week ago | alibaba | qwen3.5-122b-a10b |

Index cards of companies on a British 'black list' in the 1940s. Assesses models' capability to recognize typed and handwritten information from index cards.

| Score | Date | Provider | Model |
|---|---|---|---|
| 96.87 | 2 months ago | genai | gemini-2.5-pro |
| 96.76 | 2 weeks ago | anthropic | claude-sonnet-4-6 |
| 96.35 | 2 months ago | anthropic | claude-opus-4-5-20251101 |
| 95.80 | 2 weeks ago | anthropic | claude-opus-4-6 |
| 95.65 | 5 months ago | openai | gpt-4.1-mini |

| Score | Date | Provider | Model |
|---|---|---|---|
| 94.54 | 1 week ago | alibaba | qwen3.5-27b |
| 90.89 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 62.77 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 91.55 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 88.34 | 1 week ago | alibaba | qwen3.5-397b-a17b |

| Score | Date | Provider | Model |
|---|---|---|---|
| 98.61 | 1 week ago | x-ai | grok-4.20-0309-reasoning |
| 97.54 | 2 months ago | openrouter | x-ai/grok-4 |
| 97.47 | 3 months ago | anthropic | claude-sonnet-4-5-20250929 |
| 97.46 | 2 weeks ago | anthropic | claude-opus-4-6 |
| 97.39 | 2 months ago | anthropic | claude-sonnet-4-5-20250929 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 92.93 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 96.39 | 1 week ago | deepseek | deepseek-chat |
| 95.84 | 1 week ago | alibaba | qwen3.5-27b |
| 96.00 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 95.85 | 1 week ago | deepseek | deepseek-reasoner |

Tests models on extracting structured metadata from 20th-century Swiss historical correspondence, including person names, organizations, dates, locations, and other contextual information.

| Score | Date | Provider | Model |
|---|---|---|---|
| 81.00 | 1 month ago | openai | gpt-5 |
| 77.00 | 7 months ago | openai | gpt-5 |
| 72.00 | 2 months ago | genai | gemini-3-flash-preview |
| 72.00 | 2 weeks ago | genai | gemini-3.1-pro-preview |
| 71.00 | 1 month ago | openai | gpt-5 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 0.00 | 4 days ago | mistral | ministral-8b-2512 |
| 0.00 | 4 days ago | mistral | ministral-8b-2512 |
| 57.00 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 55.00 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 58.00 | 1 week ago | alibaba | qwen3.5-397b-a17b |

| Score | Date | Provider | Model |
|---|---|---|---|
| 59.80 | 2 months ago | openai | gpt-5 |
| 58.40 | 5 months ago | openai | gpt-5 |
| 55.47 | 2 weeks ago | genai | gemini-3.1-pro-preview |
| 55.47 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 55.20 | 5 months ago | openai | o3 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 52.40 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 51.13 | 1 week ago | alibaba | qwen3.5-397b-a17b |
| 51.33 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 28.27 | 1 week ago | alibaba | qwen3.5-27b |
| 46.87 | 1 week ago | alibaba | qwen3.5-122b-a10b |

Assesses models' capability to recognize and transcribe historical German Fraktur script, a Gothic typeface commonly used in German-language documents.

| Score | Date | Provider | Model |
|---|---|---|---|
| 97.90 | 2 weeks ago | genai | gemini-3.1-pro-preview |
| 97.30 | 2 weeks ago | anthropic | claude-sonnet-4-6 |
| 95.90 | 2 months ago | genai | gemini-3-flash-preview |
| 95.90 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 95.80 | 2 weeks ago | anthropic | claude-opus-4-6 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 52.90 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 76.40 | 1 week ago | alibaba | qwen3.5-27b |
| 51.30 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 66.40 | 1 week ago | alibaba | qwen3.5-397b-a17b |
| 95.90 | 1 week ago | alibaba | qwen3.5-122b-a10b |

Tests models on extracting names, locations, and signatures from table-like meeting minutes of Mines de Costano S.A., 1930s to 1960s.

| Score | Date | Provider | Model |
|---|---|---|---|
| 88.64 | 2 weeks ago | openai | gpt-5.4-2026-03-05 |
| 86.20 | 2 weeks ago | openai | gpt-5.4-2026-03-05 |
| 85.59 | 2 weeks ago | openai | gpt-5.3-codex |
| 84.93 | 3 weeks ago | openai | gpt-5.2-2025-12-11 |
| 84.79 | 1 week ago | openai | gpt-5.3-codex |

| Score | Date | Provider | Model |
|---|---|---|---|
| 83.63 | 1 week ago | openai | gpt-5.3-codex |
| 83.31 | 1 week ago | openai | gpt-5.3-codex |
| 84.79 | 1 week ago | openai | gpt-5.3-codex |
| 83.29 | 1 week ago | openai | gpt-5.3-codex |
| 81.99 | 1 week ago | openai | gpt-5.3-codex |

A benchmark for structured data extraction from digitized catalog cards in historical Swiss library records. It tests models' ability to parse complex bibliographic information, author names, dates, and hierarchical catalog structures.

| Score | Date | Provider | Model |
|---|---|---|---|
| 89.51 | 7 months ago | openai | gpt-5 |
| 89.39 | 7 months ago | openai | gpt-4.1 |
| 89.36 | 7 months ago | openai | gpt-4o |
| 89.10 | 4 months ago | genai | gemini-3-pro-preview |
| 88.46 | 1 week ago | alibaba | qwen3.5-plus-2026-02-15 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 38.61 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 86.85 | 1 week ago | alibaba | qwen3.5-27b |
| 83.80 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 85.29 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 88.25 | 1 week ago | alibaba | qwen3.5-397b-a17b |

Examines a model's ability to extract bounding boxes of advertisements from magazine pages.
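
The scoring method for predicted boxes is not stated here. A common way to compare a predicted box with a ground-truth box is intersection over union (IoU), so the following is only a sketch under that assumption. The `Box` layout `(x_min, y_min, x_max, y_max)` and the function name `iou` are hypothetical and are not taken from the benchmark.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    # Degenerate (zero-area) inputs yield 0.0 rather than dividing by zero.
    return intersection / union if union > 0 else 0.0
```

Under this assumption, a prediction would typically count as a match when its IoU with a ground-truth advertisement exceeds a chosen threshold; that threshold is a design choice this page does not specify.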

| Score | Date | Provider | Model |
|---|---|---|---|
| 88.50 | 1 week ago | openai | gpt-5.2-2025-12-11 |
| 86.00 | 2 weeks ago | openai | gpt-5.3-codex |
| 84.80 | 1 week ago | genai | gemini-3-flash-preview |
| 80.20 | 2 weeks ago | contour_local | opencv-contour |
| 78.70 | 2 weeks ago | openai | gpt-5 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 0.00 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 0.00 | 1 week ago | alibaba | qwen3.5-27b |
| 0.00 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 0.00 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 0.00 | 1 week ago | alibaba | qwen3.5-397b-a17b |

Evaluates models on page segmentation and handwritten text extraction from 15th-century manuscripts written in late medieval German. Tests the ability to transcribe historical handwriting, identify folio numbers, distinguish main text from marginal additions, and preserve historical spelling and formatting. Performance is measured using fuzzy string matching and Character Error Rate (CER).
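
The two metrics named above can be illustrated with a short sketch. It assumes CER is the Levenshtein edit distance divided by the reference length, and it approximates fuzzy matching with the standard library's `difflib.SequenceMatcher`. The benchmark's actual implementation and any text normalization are not given here, so the function names below are hypothetical.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not reference:
        # Assumed convention: an empty reference is perfect only if the output is empty too.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def fuzzy_ratio(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1]; a stand-in for whichever fuzzy matcher the benchmark uses."""
    return SequenceMatcher(None, reference, hypothesis).ratio()
```

Lower CER is better and can exceed 1.0 when the hypothesis is much longer than the reference, while the fuzzy ratio is bounded and higher is better, which is one reason the two are often reported together.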

| Score | Date | Provider | Model |
|---|---|---|---|
| 84.90 | 2 months ago | anthropic | claude-opus-4-5-20251101 |
| 80.70 | 4 months ago | genai | gemini-3-pro-preview |
| 79.80 | 2 weeks ago | anthropic | claude-opus-4-6 |
| 77.90 | 2 weeks ago | genai | gemini-3.1-flash-lite-preview |
| 77.60 | 2 months ago | genai | gemini-2.5-flash-preview-09-2025 |

| Score | Date | Provider | Model |
|---|---|---|---|
| 0.00 | 4 days ago | mistral | ministral-14b-2512 |
| 69.00 | 4 days ago | genai | gemini-2.5-flash-lite-preview-09-2025 |
| 58.70 | 4 days ago | genai | gemini-2.5-flash-lite |
| 68.00 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 67.40 | 1 week ago | alibaba | qwen3.5-397b-a17b |