RISE Humanities Data Benchmark, 0.5.1

Benchmark Results

Bibliographic Data

Evaluates models' ability to extract bibliographic information from historical documents such as publication details, authors, dates, and other metadata from digitized sources.

Top 5 Runs

Score  Date          Provider       Model
71.55  3 weeks ago   openai         gpt-5.5-2026-04-23
71.43  1 year ago    openai         gpt-4o
71.05  3 months ago  openai         gpt-5.1-2025-11-13
70.23  7 months ago  genai          gemini-2.5-flash-preview-09-2025
69.89  1 month ago   anthropic      claude-opus-4-7

Last 5 Runs

Score  Date          Provider       Model
71.55  3 weeks ago   openai         gpt-5.5-2026-04-23
0.00   4 weeks ago   openrouter     qwen/qwen3.5-9b
41.71  1 month ago   openrouter     google/gemma-4-26b-a4b-it
46.10  1 month ago   openrouter     qwen/qwen3.5-9b
61.35  1 month ago   openrouter     google/gemma-4-31b-it

Tags
  • Type(s): book-page
  • Benchmark task(s):  information-extraction
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: list
  • Language(s): en

Blacklist Cards

Index cards of companies on a British 'black list' in the 1940s. Assesses models' capability to recognize typed and handwritten information from index cards.

Top 5 Runs

Score  Date          Provider       Model
96.87  3 months ago  genai          gemini-2.5-pro
96.76  2 months ago  anthropic      claude-sonnet-4-6
96.35  3 months ago  anthropic      claude-opus-4-5-20251101
95.80  2 months ago  anthropic      claude-opus-4-6
95.65  6 months ago  openai         gpt-4.1-mini

Last 5 Runs

Score  Date          Provider       Model
85.96  3 weeks ago   openai         gpt-5.5-2026-04-23
85.30  4 weeks ago   openrouter     qwen/qwen3.5-9b
88.64  1 month ago   openrouter     qwen/qwen3.5-plus-02-15
91.75  1 month ago   openrouter     google/gemma-4-26b-a4b-it
92.48  1 month ago   openrouter     qwen/qwen3.6-plus

Tags
  • Type(s): index-card
  • Benchmark task(s):  information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20
  • Source Layout: n/a
  • Language(s): de

Book Advert XML files (malformed) from Avisblatt

Top 5 Runs

Score  Date          Provider       Model
98.61  1 month ago   x-ai           grok-4.20-0309-reasoning
98.48  3 weeks ago   openai         gpt-5.5-2026-04-23
98.21  1 month ago   openrouter     google/gemma-4-26b-a4b-it
97.62  1 month ago   anthropic      claude-opus-4-7
97.54  3 months ago  openrouter     x-ai/grok-4

Last 5 Runs

Score  Date          Provider       Model
98.48  3 weeks ago   openai         gpt-5.5-2026-04-23
97.42  3 weeks ago   deepseek       deepseek-v4-pro
97.01  3 weeks ago   deepseek       deepseek-v4-flash
18.80  4 weeks ago   openrouter     qwen/qwen3.5-9b
92.01  1 month ago   openrouter     qwen/qwen3.5-122b-a10b

Tags
  • Type(s): newspaper-page
  • Benchmark task(s):  data-correction
  • Writing: n/a
  • Source creation (century): 18
  • Source Layout: n/a
  • Language(s): en

Business Letters

Tests models on extracting structured metadata from historical correspondence, including person names, organizations, dates, locations, and other contextual information from 20th century Swiss historical letters.

Top 5 Runs

Score  Date          Provider       Model
81.00  3 months ago  openai         gpt-5
77.00  9 months ago  openai         gpt-5
72.00  3 months ago  genai          gemini-3-flash-preview
72.00  2 months ago  genai          gemini-3.1-pro-preview
71.00  3 months ago  openai         gpt-5

Last 5 Runs

Score  Date          Provider       Model
61.00  3 weeks ago   openai         gpt-5.5-2026-04-23
59.00  3 weeks ago   openai         gpt-5.5-2026-04-23
54.00  3 weeks ago   openai         gpt-5.5-2026-04-23
55.00  4 weeks ago   openrouter     qwen/qwen3.5-9b
51.00  4 weeks ago   openrouter     qwen/qwen3.5-9b

Tags
  • Type(s): letter
  • Benchmark task(s):  information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20
  • Source Layout: prose
  • Language(s): de

Company Lists

Top 5 Runs

Score  Date          Provider       Model
59.80  3 months ago  openai         gpt-5
58.40  6 months ago  openai         gpt-5
56.93  1 month ago   openrouter     qwen/qwen3.5-9b
55.80  1 month ago   openrouter     qwen/qwen3.5-plus-02-15
55.47  2 months ago  genai          gemini-3.1-pro-preview

Last 5 Runs

Score  Date          Provider       Model
46.67  3 weeks ago   openai         gpt-5.5-2026-04-23
52.00  3 weeks ago   openai         gpt-5.5-2026-04-23
45.80  4 weeks ago   openrouter     qwen/qwen3.5-9b
28.73  4 weeks ago   openrouter     qwen/qwen3.5-9b
44.93  1 month ago   openrouter     qwen/qwen3.5-397b-a17b

Tags
  • Type(s): book-page
  • Benchmark task(s):  information-extraction
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: list
  • Language(s): en, de

Fraktur Adverts

Assesses models' capability to recognize and transcribe historical German Fraktur script, a Gothic typeface commonly used in German-language documents.

Top 5 Runs

Score  Date          Provider       Model
97.90  2 months ago  genai          gemini-3.1-pro-preview
97.30  2 months ago  anthropic      claude-sonnet-4-6
96.00  3 weeks ago   openai         gpt-5.5-2026-04-23
95.90  3 months ago  genai          gemini-3-flash-preview
95.90  1 month ago   alibaba        qwen3.5-122b-a10b

Last 5 Runs

Score  Date          Provider       Model
96.00  3 weeks ago   openai         gpt-5.5-2026-04-23
48.60  4 weeks ago   openrouter     qwen/qwen3.5-9b
28.70  1 month ago   openrouter     google/gemma-4-26b-a4b-it
77.40  1 month ago   openrouter     qwen/qwen3.5-plus-02-15
51.20  1 month ago   openrouter     google/gemma-4-31b-it

Tags
  • Type(s): book-page
  • Benchmark task(s):  transcription
  • Writing: printed
  • Source creation (century): 18, 19
  • Source Layout: prose
  • Language(s): de

General Meeting Minutes

Tests extraction of names, locations, and signatures from table-like meeting minutes of Mines de Costano S.A., 1930s to 1960s.

Top 5 Runs

Score  Date          Provider       Model
88.64  2 months ago  openai         gpt-5.4-2026-03-05
86.20  2 months ago  openai         gpt-5.4-2026-03-05
85.59  2 months ago  openai         gpt-5.3-codex
84.93  2 months ago  openai         gpt-5.2-2025-12-11
84.79  1 month ago   openai         gpt-5.3-codex

Last 5 Runs

Score  Date          Provider       Model
83.63  1 month ago   openai         gpt-5.3-codex
83.31  1 month ago   openai         gpt-5.3-codex
84.79  1 month ago   openai         gpt-5.3-codex
83.29  1 month ago   openai         gpt-5.3-codex
81.99  1 month ago   openai         gpt-5.3-codex

Tags
  • Type(s): minutes
  • Benchmark task(s):  information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20
  • Source Layout: table
  • Language(s): it, fr, de

Library Cards

A comprehensive benchmark focused on catalog card analysis and information extraction from historical library catalog systems. This benchmark evaluates models on structured data extraction from digitized catalog cards, testing their ability to parse complex bibliographic information, author names, dates, and hierarchical catalog structures from historical Swiss library records.

Top 5 Runs

Score  Date          Provider       Model
89.51  8 months ago  openai         gpt-5
89.39  8 months ago  openai         gpt-4.1
89.36  8 months ago  openai         gpt-4o
89.17  1 month ago   anthropic      claude-opus-4-7
89.10  5 months ago  genai          gemini-3-pro-preview

Last 5 Runs

Score  Date          Provider       Model
88.59  3 weeks ago   openai         gpt-5.5-2026-04-23
63.98  4 weeks ago   openrouter     qwen/qwen3.5-9b
86.88  1 month ago   openrouter     qwen/qwen3.5-122b-a10b
86.65  1 month ago   openrouter     qwen/qwen3.6-plus
63.72  1 month ago   openrouter     qwen/qwen3.5-9b

Tags
  • Type(s): index-card
  • Benchmark task(s):  information-extraction
  • Writing: typed, printed, handwritten
  • Source creation (century): 20, 19
  • Source Layout: index
  • Language(s): de, fr, en, la, el, fi, sv, pl

Magazine Pages - Anglo Swiss Trade Review

Examines a model's ability to extract bounding boxes of advertisements from magazine pages.

Top 5 Runs

Score  Date          Provider       Model
95.60  3 weeks ago   openai         gpt-5.5-2026-04-23
88.50  1 month ago   openai         gpt-5.2-2025-12-11
86.00  2 months ago  openai         gpt-5.3-codex
84.80  1 month ago   genai          gemini-3-flash-preview
80.20  2 months ago  contour_local  opencv-contour

Last 5 Runs

Score  Date          Provider       Model
95.60  3 weeks ago   openai         gpt-5.5-2026-04-23
40.50  1 month ago   anthropic      claude-opus-4-7
0.00   1 month ago   openrouter     qwen/qwen3.5-27b
4.30   1 month ago   openrouter     qwen/qwen3.5-397b-a17b
5.40   1 month ago   openrouter     qwen/qwen3.5-9b

Tags
  • Type(s): newspaper-page
  • Benchmark task(s):  document-understanding
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: prose, columns
  • Language(s): en

Medieval Manuscripts

Evaluates models on page segmentation and handwritten text extraction from 15th century medieval manuscripts written in late medieval German. Tests the ability to transcribe historical handwriting, identify folio numbers, distinguish main text from marginal additions, and maintain historical spelling and formatting. Performance is measured using fuzzy string matching and Character Error Rate (CER).
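
As a rough illustration of the two measures named in this description, the Python sketch below computes a Character Error Rate from Levenshtein edit distance and a fuzzy similarity ratio using the standard library's difflib. It shows how such metrics are commonly computed and is not the benchmark's actual scoring code; the function names levenshtein, cer, and fuzzy_ratio are hypothetical, as is the choice to normalize by the reference length.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length (assumed normalization)."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def fuzzy_ratio(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1] from difflib; one of several possible fuzzy matchers."""
    return SequenceMatcher(None, reference, hypothesis).ratio()


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the benchmark data.
    truth = "vnd der herre sprach"
    output = "und der herre sprach"
    print(f"CER: {cer(truth, output):.3f}")
    print(f"Fuzzy ratio: {fuzzy_ratio(truth, output):.3f}")
```

A lower CER means fewer character-level edits are needed to turn the model output into the reference, while a higher fuzzy ratio means the two strings are more alike. Because the sketch does not normalize spelling or whitespace, it penalizes modernized spellings, which matches the stated goal of maintaining historical spelling.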

Top 5 Runs

Score  Date          Provider       Model
84.90  3 months ago  anthropic      claude-opus-4-5-20251101
84.60  1 month ago   openrouter     qwen/qwen3.5-9b
80.70  5 months ago  genai          gemini-3-pro-preview
79.80  2 months ago  anthropic      claude-opus-4-6
77.90  2 months ago  genai          gemini-3.1-flash-lite-preview

Last 5 Runs

Score  Date          Provider       Model
71.10  3 weeks ago   openai         gpt-5.5-2026-04-23
62.30  4 weeks ago   openrouter     qwen/qwen3.5-9b
71.70  1 month ago   openrouter     qwen/qwen3.5-397b-a17b
73.90  1 month ago   openrouter     google/gemma-4-31b-it
75.40  1 month ago   openrouter     qwen/qwen3.5-122b-a10b

Tags
  • Type(s): manuscript
  • Benchmark task(s):  transcription
  • Writing: handwritten
  • Source creation (century): 15
  • Source Layout: prose
  • Language(s): de