RISE Humanities Data Benchmark, 0.5.0-pre1

Benchmark Results

Bibliographic Data

Evaluates models' ability to extract bibliographic information, such as publication details, authors, dates, and other metadata, from digitized historical documents.

Top 5 Runs

Score  Date          Provider       Model
71.43  1 year ago    openai         gpt-4o
71.05  2 months ago  openai         gpt-5.1-2025-11-13
70.23  6 months ago  genai          gemini-2.5-flash-preview-09-2025
69.83  2 weeks ago   openai         gpt-5.4-2026-03-05
69.18  1 week ago    alibaba        qwen3.5-plus-2026-02-15

Last 5 Runs

Score  Date          Provider       Model
64.87  1 week ago    alibaba        qwen3.5-35b-a3b
67.87  1 week ago    alibaba        qwen3.5-397b-a17b
67.34  1 week ago    alibaba        qwen3.5-flash-2026-02-23
66.45  1 week ago    alibaba        qwen3.5-27b
65.34  1 week ago    alibaba        qwen3.5-122b-a10b

Tags
  • Type(s): book-page
  • Benchmark task(s):  information-extraction
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: list
  • Language(s): en

Blacklist Cards

Index cards recording companies on a British 'black list' in the 1940s. Assesses models' ability to recognize typed and handwritten information on the cards.

Top 5 Runs

Score  Date          Provider       Model
96.87  2 months ago  genai          gemini-2.5-pro
96.76  2 weeks ago   anthropic      claude-sonnet-4-6
96.35  2 months ago  anthropic      claude-opus-4-5-20251101
95.80  2 weeks ago   anthropic      claude-opus-4-6
95.65  5 months ago  openai         gpt-4.1-mini

Last 5 Runs

Score  Date          Provider       Model
94.54  1 week ago    alibaba        qwen3.5-27b
90.89  1 week ago    alibaba        qwen3.5-35b-a3b
62.77  1 week ago    alibaba        qwen3.5-flash-2026-02-23
91.55  1 week ago    alibaba        qwen3.5-122b-a10b
88.34  1 week ago    alibaba        qwen3.5-397b-a17b

Tags
  • Type(s): index-card
  • Benchmark task(s):  information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20
  • Source Layout: n/a
  • Language(s): de

Book Advert XML files (malformed) from Avisblatt

Top 5 Runs

Score  Date          Provider       Model
98.61  1 week ago    x-ai           grok-4.20-0309-reasoning
97.54  2 months ago  openrouter     x-ai/grok-4
97.47  3 months ago  anthropic      claude-sonnet-4-5-20250929
97.46  2 weeks ago   anthropic      claude-opus-4-6
97.39  2 months ago  anthropic      claude-sonnet-4-5-20250929

Last 5 Runs

Score  Date          Provider       Model
92.93  1 week ago    alibaba        qwen3.5-35b-a3b
96.39  1 week ago    deepseek       deepseek-chat
95.84  1 week ago    alibaba        qwen3.5-27b
96.00  1 week ago    alibaba        qwen3.5-122b-a10b
95.85  1 week ago    deepseek       deepseek-reasoner

Tags
  • Type(s): newspaper-page
  • Benchmark task(s):  data-correction
  • Writing: n/a
  • Source creation (century): 18
  • Source Layout: n/a
  • Language(s): en

Business Letters

Tests models on extracting structured metadata, including person names, organizations, dates, locations, and other contextual information, from 20th-century Swiss historical correspondence.

Top 5 Runs

Score  Date          Provider       Model
81.00  1 month ago   openai         gpt-5
77.00  7 months ago  openai         gpt-5
72.00  2 months ago  genai          gemini-3-flash-preview
72.00  2 weeks ago   genai          gemini-3.1-pro-preview
71.00  1 month ago   openai         gpt-5

Last 5 Runs

Score  Date          Provider       Model
0.00   4 days ago    mistral        ministral-8b-2512
0.00   4 days ago    mistral        ministral-8b-2512
57.00  1 week ago    alibaba        qwen3.5-flash-2026-02-23
55.00  1 week ago    alibaba        qwen3.5-35b-a3b
58.00  1 week ago    alibaba        qwen3.5-397b-a17b

Tags
  • Type(s): letter
  • Benchmark task(s):  information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20
  • Source Layout: prose
  • Language(s): de

Company Lists

Top 5 Runs

Score  Date          Provider       Model
59.80  2 months ago  openai         gpt-5
58.40  5 months ago  openai         gpt-5
55.47  2 weeks ago   genai          gemini-3.1-pro-preview
55.47  1 week ago    alibaba        qwen3.5-122b-a10b
55.20  5 months ago  openai         o3

Last 5 Runs

Score  Date          Provider       Model
52.40  1 week ago    alibaba        qwen3.5-flash-2026-02-23
51.13  1 week ago    alibaba        qwen3.5-397b-a17b
51.33  1 week ago    alibaba        qwen3.5-35b-a3b
28.27  1 week ago    alibaba        qwen3.5-27b
46.87  1 week ago    alibaba        qwen3.5-122b-a10b

Tags
  • Type(s): book-page
  • Benchmark task(s):  information-extraction
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: list
  • Language(s): en, de

Fraktur Adverts

Assesses models' ability to recognize and transcribe historical German Fraktur, a blackletter (Gothic) typeface commonly used in German-language print.

Top 5 Runs

Score  Date          Provider       Model
97.90  2 weeks ago   genai          gemini-3.1-pro-preview
97.30  2 weeks ago   anthropic      claude-sonnet-4-6
95.90  2 months ago  genai          gemini-3-flash-preview
95.90  1 week ago    alibaba        qwen3.5-122b-a10b
95.80  2 weeks ago   anthropic      claude-opus-4-6

Last 5 Runs

Score  Date          Provider       Model
52.90  1 week ago    alibaba        qwen3.5-flash-2026-02-23
76.40  1 week ago    alibaba        qwen3.5-27b
51.30  1 week ago    alibaba        qwen3.5-35b-a3b
66.40  1 week ago    alibaba        qwen3.5-397b-a17b
95.90  1 week ago    alibaba        qwen3.5-122b-a10b

Tags
  • Type(s): book-page
  • Benchmark task(s):  transcription
  • Writing: printed
  • Source creation (century): 18, 19
  • Source Layout: prose
  • Language(s): de

General Meeting Minutes

Extracts names, locations, and signatures from table-like meeting minutes of Mines de Costano S.A., 1930s to 1960s.

Top 5 Runs

Score  Date          Provider       Model
88.64  2 weeks ago   openai         gpt-5.4-2026-03-05
86.20  2 weeks ago   openai         gpt-5.4-2026-03-05
85.59  2 weeks ago   openai         gpt-5.3-codex
84.93  3 weeks ago   openai         gpt-5.2-2025-12-11
84.79  1 week ago    openai         gpt-5.3-codex

Last 5 Runs

Score  Date          Provider       Model
83.63  1 week ago    openai         gpt-5.3-codex
83.31  1 week ago    openai         gpt-5.3-codex
84.79  1 week ago    openai         gpt-5.3-codex
83.29  1 week ago    openai         gpt-5.3-codex
81.99  1 week ago    openai         gpt-5.3-codex

Tags
  • Type(s): minutes
  • Benchmark task(s):  information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20
  • Source Layout: table
  • Language(s): it, fr, de

Library Cards

Focuses on catalog card analysis and information extraction from historical library catalog systems. Evaluates models on extracting structured data from digitized catalog cards in historical Swiss library records, testing their ability to parse bibliographic information, author names, dates, and hierarchical catalog structures.

Top 5 Runs

Score  Date          Provider       Model
89.51  7 months ago  openai         gpt-5
89.39  7 months ago  openai         gpt-4.1
89.36  7 months ago  openai         gpt-4o
89.10  4 months ago  genai          gemini-3-pro-preview
88.46  1 week ago    alibaba        qwen3.5-plus-2026-02-15

Last 5 Runs

Score  Date          Provider       Model
38.61  1 week ago    alibaba        qwen3.5-flash-2026-02-23
86.85  1 week ago    alibaba        qwen3.5-27b
83.80  1 week ago    alibaba        qwen3.5-35b-a3b
85.29  1 week ago    alibaba        qwen3.5-122b-a10b
88.25  1 week ago    alibaba        qwen3.5-397b-a17b

Tags
  • Type(s): index-card
  • Benchmark task(s):  information-extraction
  • Writing: typed, printed, handwritten
  • Source creation (century): 20
  • Source Layout: index
  • Language(s): n/a

Magazine Pages - Anglo Swiss Trade Review

Examines a model's ability to extract bounding boxes of advertisements from magazine pages.

Top 5 Runs

Score  Date          Provider       Model
88.50  1 week ago    openai         gpt-5.2-2025-12-11
86.00  2 weeks ago   openai         gpt-5.3-codex
84.80  1 week ago    genai          gemini-3-flash-preview
80.20  2 weeks ago   contour_local  opencv-contour
78.70  2 weeks ago   openai         gpt-5

Last 5 Runs

Score  Date          Provider       Model
0.00   1 week ago    alibaba        qwen3.5-flash-2026-02-23
0.00   1 week ago    alibaba        qwen3.5-27b
0.00   1 week ago    alibaba        qwen3.5-35b-a3b
0.00   1 week ago    alibaba        qwen3.5-122b-a10b
0.00   1 week ago    alibaba        qwen3.5-397b-a17b

Tags
  • Type(s): newspaper-page
  • Benchmark task(s):  document-understanding
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: prose, columns
  • Language(s): en

Medieval Manuscripts

Evaluates models on page segmentation and handwritten text extraction from 15th-century manuscripts written in late medieval German. Tests the ability to transcribe historical handwriting, identify folio numbers, distinguish main text from marginal additions, and preserve historical spelling and formatting. Performance is measured using fuzzy string matching and Character Error Rate (CER).
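
The description names Character Error Rate (CER) and fuzzy string matching but gives no formulas. The following is a minimal sketch, assuming the standard definition of CER (character-level Levenshtein distance divided by the length of the reference transcription) and using Python's `difflib.SequenceMatcher.ratio()` as a stand-in for the unspecified fuzzy matcher. The function names and sample strings are hypothetical; this is not the benchmark's own scoring code, and the page does not say how the two measures are combined into the reported score.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance normalized by reference length (lower is better)."""
    if not reference:
        # Assumed convention: an empty reference scores 0 only for an empty hypothesis.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def fuzzy_similarity(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1] via difflib; a stand-in for the benchmark's unspecified fuzzy matcher."""
    return SequenceMatcher(None, reference, hypothesis).ratio()


if __name__ == "__main__":
    ref = "vnd der ratt"  # hypothetical ground-truth line keeping historical spelling
    hyp = "und der rat"   # hypothetical model output that normalized the spelling
    print(f"CER:   {cer(ref, hyp):.3f}")
    print(f"fuzzy: {fuzzy_similarity(ref, hyp):.3f}")
```

With these sample strings the sketch prints a CER of 0.167 (two edits over twelve reference characters) and a fuzzy similarity of 0.870. The example also shows why preserving historical spelling matters for this task: silently modernizing "vnd" to "und" counts as an error against the reference.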

Top 5 Runs

Score  Date          Provider       Model
84.90  2 months ago  anthropic      claude-opus-4-5-20251101
80.70  4 months ago  genai          gemini-3-pro-preview
79.80  2 weeks ago   anthropic      claude-opus-4-6
77.90  2 weeks ago   genai          gemini-3.1-flash-lite-preview
77.60  2 months ago  genai          gemini-2.5-flash-preview-09-2025

Last 5 Runs

Score  Date          Provider       Model
0.00   4 days ago    mistral        ministral-14b-2512
69.00  4 days ago    genai          gemini-2.5-flash-lite-preview-09-2025
58.70  4 days ago    genai          gemini-2.5-flash-lite
68.00  1 week ago    alibaba        qwen3.5-flash-2026-02-23
67.40  1 week ago    alibaba        qwen3.5-397b-a17b

Tags
  • Type(s): manuscript
  • Benchmark task(s):  transcription
  • Writing: handwritten
  • Source creation (century): 15
  • Source Layout: prose
  • Language(s): de