RISE Humanities Data Benchmark

Bibliographic Data

Evaluates models' ability to extract bibliographic information, such as publication details, authors, dates, and other metadata, from digitized historical documents.

Top 5 Runs

Score	Date	Provider	Model
77.10	1 month ago	mistral	mistral-medium-2508
71.55	2 months ago	openai	gpt-5.5-2026-04-23
71.43	1 year ago	openai	gpt-4o
71.20	5 days ago	anthropic	claude-fable-5
71.05	5 months ago	openai	gpt-5.1-2025-11-13

Last 5 Runs

Score	Date	Provider	Model
71.20	5 days ago	anthropic	claude-fable-5
0.00	6 days ago	anthropic	claude-sonnet-5
23.14	1 week ago	genai	gemini-3.1-flash-lite
41.68	2 weeks ago	scicore	qwen35-397b-a17b-fp8
67.72	4 weeks ago	x-ai	grok-4.3

Tags

Type(s): book-page
Benchmark task(s): information-extraction
Writing: printed
Source creation (century): 20
Source Layout: list
Language(s): en

Blacklist Cards

Index cards of companies on a British 'black list' in the 1940s. Assesses models' ability to extract typed and handwritten information from the cards.

Top 5 Runs

Score	Date	Provider	Model
96.87	5 months ago	genai	gemini-2.5-pro
96.76	3 months ago	anthropic	claude-sonnet-4-6
96.35	5 months ago	anthropic	claude-opus-4-5-20251101
95.80	3 months ago	anthropic	claude-opus-4-6
95.65	8 months ago	openai	gpt-4.1-mini

Last 5 Runs

Score	Date	Provider	Model
95.42	5 days ago	anthropic	claude-fable-5
39.50	6 days ago	anthropic	claude-sonnet-5
60.12	1 week ago	genai	gemini-3.1-flash-lite
88.80	2 weeks ago	scicore	qwen35-397b-a17b-fp8
94.70	4 weeks ago	x-ai	grok-4.3

Tags

Type(s): index-card
Benchmark task(s): information-extraction
Writing: typed, handwritten
Source creation (century): 20
Source Layout: n/a
Language(s): de

Book Advert XML files (malformed) from Avisblatt

Top 5 Runs

Score	Date	Provider	Model
98.61	3 months ago	x-ai	grok-4.20-0309-reasoning
98.60	4 weeks ago	x-ai	grok-4.3
98.48	2 months ago	openai	gpt-5.5-2026-04-23
98.21	2 months ago	openrouter	google/gemma-4-26b-a4b-it
97.90	5 days ago	anthropic	claude-fable-5

Last 5 Runs

Score	Date	Provider	Model
97.90	5 days ago	anthropic	claude-fable-5
96.57	6 days ago	anthropic	claude-sonnet-5
96.36	1 week ago	genai	gemini-3.1-flash-lite
95.78	2 weeks ago	scicore	qwen35-397b-a17b-fp8
98.60	4 weeks ago	x-ai	grok-4.3

Tags

Type(s): newspaper-page
Benchmark task(s): data-correction
Writing: n/a
Source creation (century): 18
Source Layout: n/a
Language(s): en

Business Letters

Tests models on extracting structured metadata, including person names, organizations, dates, locations, and other contextual information, from 20th-century Swiss business letters.

Top 5 Runs

Score	Date	Provider	Model
87.00	1 month ago	genai	gemini-3.5-flash
81.00	4 months ago	openai	gpt-5
78.00	1 month ago	anthropic	claude-opus-4-8
77.00	10 months ago	openai	gpt-5
74.00	4 weeks ago	x-ai	grok-4.3

Last 5 Runs

Score	Date	Provider	Model
64.00	5 days ago	anthropic	claude-fable-5
54.00	5 days ago	anthropic	claude-fable-5
59.00	5 days ago	anthropic	claude-fable-5
60.00	6 days ago	anthropic	claude-sonnet-5
51.00	6 days ago	anthropic	claude-sonnet-5

Tags

Type(s): letter
Benchmark task(s): information-extraction
Writing: typed, handwritten
Source creation (century): 20
Source Layout: prose
Language(s): de

Company Lists

Top 5 Runs

Score	Date	Provider	Model
59.80	5 months ago	openai	gpt-5
58.40	8 months ago	openai	gpt-5
56.93	2 months ago	openrouter	qwen/qwen3.5-9b
56.73	1 month ago	mistral	mistral-small-2506
55.80	2 months ago	openrouter	qwen/qwen3.5-plus-02-15

Last 5 Runs

Score	Date	Provider	Model
43.53	5 days ago	anthropic	claude-fable-5
51.47	5 days ago	anthropic	claude-fable-5
50.00	6 days ago	anthropic	claude-sonnet-5
49.87	6 days ago	anthropic	claude-sonnet-5
38.07	1 week ago	genai	gemini-3.1-flash-lite

Tags

Type(s): book-page
Benchmark task(s): information-extraction
Writing: printed
Source creation (century): 20
Source Layout: list
Language(s): en, de

Fraktur Adverts

Assesses models' ability to recognize and transcribe historical German text set in Fraktur, a blackletter (Gothic) typeface widely used in German-language print.

Top 5 Runs

Score	Date	Provider	Model
97.90	3 months ago	genai	gemini-3.1-pro-preview
97.30	3 months ago	anthropic	claude-sonnet-4-6
96.90	2 weeks ago	scicore	qwen35-397b-a17b-fp8
96.00	2 months ago	openai	gpt-5.5-2026-04-23
95.90	5 months ago	genai	gemini-3-flash-preview

Last 5 Runs

Score	Date	Provider	Model
78.50	5 days ago	anthropic	claude-fable-5
0.00	6 days ago	anthropic	claude-sonnet-5
19.70	1 week ago	genai	gemini-3.1-flash-lite
96.90	2 weeks ago	scicore	qwen35-397b-a17b-fp8
84.50	4 weeks ago	x-ai	grok-4.3

Tags

Type(s): book-page
Benchmark task(s): transcription
Writing: printed
Source creation (century): 18, 19
Source Layout: prose
Language(s): de

General Meeting Minutes

Evaluates models on extracting names, locations, and signatures from the table-like general meeting minutes of Mines de Costanzo S.A., covering the 1930s to the 1960s.

Top 5 Runs

Score	Date	Provider	Model
92.50	4 days ago	anthropic	claude-fable-5
92.47	4 days ago	genai	gemini-3-flash-preview
90.56	4 days ago	openai	gpt-5.5-2026-04-23
89.30	4 days ago	genai	gemini-3.1-pro-preview
88.64	3 months ago	openai	gpt-5.4-2026-03-05

Last 5 Runs

Score	Date	Provider	Model
80.93	4 days ago	mistral	ministral-14b-2512
47.44	4 days ago	openai	gpt-4.1-nano
85.25	4 days ago	anthropic	claude-opus-4-6
84.09	4 days ago	genai	gemini-2.5-pro
85.96	4 days ago	anthropic	claude-opus-4-7

Tags

Type(s): minutes
Benchmark task(s): information-extraction
Writing: typed, handwritten
Source creation (century): 20
Source Layout: table
Language(s): it, fr, de

Library Cards

Evaluates models on structured data extraction from digitized catalog cards of historical Swiss libraries, testing their ability to parse bibliographic details, author names, dates, and hierarchical catalog structures.

Top 5 Runs

Score	Date	Provider	Model
89.51	10 months ago	openai	gpt-5
89.39	10 months ago	openai	gpt-4.1
89.36	10 months ago	openai	gpt-4o
89.17	2 months ago	anthropic	claude-opus-4-7
89.10	7 months ago	genai	gemini-3-pro-preview

Last 5 Runs

Score	Date	Provider	Model
84.62	5 days ago	anthropic	claude-fable-5
4.68	6 days ago	anthropic	claude-sonnet-5
75.76	1 week ago	genai	gemini-3.1-flash-lite
81.01	2 weeks ago	scicore	qwen35-397b-a17b-fp8
84.56	4 weeks ago	x-ai	grok-4.3

Tags

Type(s): index-card
Benchmark task(s): information-extraction
Writing: typed, printed, handwritten
Source creation (century): 19, 20
Source Layout: index
Language(s): de, fr, en, la, el, fi, sv, pl

Magazine Pages - Anglo Swiss Trade Review

Examines models' ability to locate advertisements on magazine pages and return their bounding boxes.

Top 5 Runs

Score	Date	Provider	Model
96.00	5 days ago	anthropic	claude-fable-5
95.60	2 months ago	openai	gpt-5.5-2026-04-23
88.50	3 months ago	openai	gpt-5.2-2025-12-11
86.00	3 months ago	openai	gpt-5.3-codex
84.80	3 months ago	genai	gemini-3-flash-preview

Last 5 Runs

Score	Date	Provider	Model
96.00	5 days ago	anthropic	claude-fable-5
4.80	6 days ago	anthropic	claude-sonnet-5
0.00	1 week ago	genai	gemini-3.1-flash-lite
8.70	2 weeks ago	scicore	qwen35-397b-a17b-fp8
61.80	4 weeks ago	x-ai	grok-4.3

Tags

Type(s): newspaper-page
Benchmark task(s): document-understanding
Writing: printed
Source creation (century): 20
Source Layout: prose, columns
Language(s): en

Medieval Manuscripts

Evaluates models on page segmentation and handwritten text extraction from 15th-century manuscripts written in late medieval German. It tests their ability to transcribe historical handwriting, identify folio numbers, distinguish main text from marginal additions, and preserve historical spelling and formatting. Performance is measured with fuzzy string matching and Character Error Rate (CER).
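The page does not say which implementation is used or how the fuzzy match and CER are combined into the reported score. As a rough illustration only, the sketch below assumes the standard CER definition, the Levenshtein edit distance (substitutions + deletions + insertions) divided by the number of reference characters, and uses Python's standard-library `difflib.SequenceMatcher` as a stand-in fuzzy matcher. The helper names `levenshtein`, `cer`, and `fuzzy_similarity` are hypothetical. The example also shows why preserving historical spelling matters: normalizing "vnd" to "und" counts as an error.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, deletions,
    and insertions needed to turn `a` into `b` (dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance / reference length (assumed definition)."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(reference, hypothesis) / len(reference)


def fuzzy_similarity(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1] via difflib; a stand-in, not the benchmark's matcher."""
    return SequenceMatcher(None, reference, hypothesis).ratio()


if __name__ == "__main__":
    ref = "vnd der rat"  # historical spelling in the ground truth
    hyp = "und der rat"  # model output with modernized spelling
    print(f"CER: {cer(ref, hyp):.3f}")                 # CER: 0.091
    print(f"fuzzy: {fuzzy_similarity(ref, hyp):.3f}")  # fuzzy: 0.909
```

A lower CER is better, while a higher fuzzy similarity is better. A leaderboard score reported on a 0 to 100 scale would need some further conversion that this page does not describe.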

Top 5 Runs

Score	Date	Provider	Model
87.30	5 days ago	anthropic	claude-fable-5
84.90	5 months ago	anthropic	claude-opus-4-5-20251101
84.60	2 months ago	openrouter	qwen/qwen3.5-9b
83.00	1 month ago	anthropic	claude-opus-4-8
82.70	1 month ago	genai	gemini-3.5-flash

Last 5 Runs

Score	Date	Provider	Model
87.30	5 days ago	anthropic	claude-fable-5
0.00	6 days ago	anthropic	claude-sonnet-5
73.70	1 week ago	genai	gemini-3.1-flash-lite
68.80	2 weeks ago	scicore	qwen35-397b-a17b-fp8
65.60	4 weeks ago	x-ai	grok-4.3

Tags

Type(s): manuscript
Benchmark task(s): transcription
Writing: handwritten
Source creation (century): 15
Source Layout: prose
Language(s): de
