RISE Humanities Data Benchmark

Bibliographic Data

Evaluates models' ability to extract bibliographic information, such as publication details, authors, dates, and other metadata, from digitized historical documents.

Top 5 Runs

Score	Date	Provider	Model
71.43	1 year ago	openai	gpt-4o
71.05	2 months ago	openai	gpt-5.1-2025-11-13
70.23	6 months ago	genai	gemini-2.5-flash-preview-09-2025
69.83	2 weeks ago	openai	gpt-5.4-2026-03-05
69.18	1 week ago	alibaba	qwen3.5-plus-2026-02-15

Last 5 Runs

Score	Date	Provider	Model
64.87	1 week ago	alibaba	qwen3.5-35b-a3b
67.87	1 week ago	alibaba	qwen3.5-397b-a17b
67.34	1 week ago	alibaba	qwen3.5-flash-2026-02-23
66.45	1 week ago	alibaba	qwen3.5-27b
65.34	1 week ago	alibaba	qwen3.5-122b-a10b

Tags

Type(s): book-page
Benchmark task(s): information-extraction
Writing: printed
Source creation (century): 20
Source Layout: list
Language(s): en

Blacklist Cards

Index cards for companies on a British 'black list' in the 1940s. Assesses models' ability to recognize typed and handwritten information on the cards.

Top 5 Runs

Score	Date	Provider	Model
96.87	2 months ago	genai	gemini-2.5-pro
96.76	2 weeks ago	anthropic	claude-sonnet-4-6
96.35	2 months ago	anthropic	claude-opus-4-5-20251101
95.80	2 weeks ago	anthropic	claude-opus-4-6
95.65	5 months ago	openai	gpt-4.1-mini

Last 5 Runs

Score	Date	Provider	Model
94.54	1 week ago	alibaba	qwen3.5-27b
90.89	1 week ago	alibaba	qwen3.5-35b-a3b
62.77	1 week ago	alibaba	qwen3.5-flash-2026-02-23
91.55	1 week ago	alibaba	qwen3.5-122b-a10b
88.34	1 week ago	alibaba	qwen3.5-397b-a17b

Tags

Type(s): index-card
Benchmark task(s): information-extraction
Writing: typed, handwritten
Source creation (century): 20
Source Layout: n/a
Language(s): de

Book Advert XML files (malformed) from Avisblatt

Top 5 Runs

Score	Date	Provider	Model
98.61	1 week ago	x-ai	grok-4.20-0309-reasoning
97.54	2 months ago	openrouter	x-ai/grok-4
97.47	3 months ago	anthropic	claude-sonnet-4-5-20250929
97.46	2 weeks ago	anthropic	claude-opus-4-6
97.39	2 months ago	anthropic	claude-sonnet-4-5-20250929

Last 5 Runs

Score	Date	Provider	Model
92.93	1 week ago	alibaba	qwen3.5-35b-a3b
96.39	1 week ago	deepseek	deepseek-chat
95.84	1 week ago	alibaba	qwen3.5-27b
96.00	1 week ago	alibaba	qwen3.5-122b-a10b
95.85	1 week ago	deepseek	deepseek-reasoner

Tags

Type(s): newspaper-page
Benchmark task(s): data-correction
Writing: n/a
Source creation (century): 18
Source Layout: n/a
Language(s): en
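This section gives no task description beyond its title and the data-correction tag. Assuming the task involves repairing malformed XML advert files, one plausible building block is a well-formedness check that tells whether a corrected file now parses. The sketch below uses only Python's standard `xml.etree.ElementTree`; the function names `is_well_formed` and `parse_error` are hypothetical and not part of the benchmark.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def parse_error(xml_text: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first parse error, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) where parsing failed.
        return err.position
    return None


# Hypothetical example: a mismatched closing tag makes the document malformed.
print(is_well_formed("<advert><title>Buch</title></advert>"))
print(is_well_formed("<advert><title>Buch</advert>"))
```

A well-formedness check only confirms that the output is valid XML; judging whether the content was corrected faithfully would still need a comparison against a reference file.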

Business Letters

Tests models on extracting structured metadata, including person names, organizations, dates, locations, and other contextual information, from 20th-century Swiss business letters.
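The page does not state how extraction scores are computed. As an illustration only, assuming a field-level comparison between a predicted and a reference metadata record, the sketch below computes an F1 score in which a field counts as correct when its key matches and its value is fuzzily equal. The field names, the 0.9 threshold, and the use of `difflib.SequenceMatcher` are all assumptions, not the benchmark's own scoring.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences are ignored.
    return " ".join(value.lower().split())


def fuzzy_equal(a: str, b: str, threshold: float = 0.9) -> bool:
    # Two values match if their similarity ratio reaches the (assumed) threshold.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def field_f1(predicted: dict, gold: dict, threshold: float = 0.9) -> float:
    """Field-level F1 between a predicted and a gold metadata record.

    A predicted field is a true positive when the gold record has the same key
    and the two values are fuzzily equal.
    """
    tp = sum(
        1
        for key, value in predicted.items()
        if key in gold and fuzzy_equal(value, gold[key], threshold)
    )
    fp = len(predicted) - tp  # predicted fields that are wrong or absent in gold
    fn = len(gold) - tp       # gold fields the prediction missed or got wrong
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical records: "sender" and "date" are illustrative field names.
gold = {"sender": "Otto Müller", "date": "1926-03-14"}
print(field_f1({"sender": "Otto Muller", "date": "1926-03-14"}, gold))
```

With these settings, "Otto Muller" still matches "Otto Müller" (similarity of about 0.91), so a missing diacritic is not penalized; a stricter threshold would treat it as an error.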

Top 5 Runs

Score	Date	Provider	Model
81.00	1 month ago	openai	gpt-5
77.00	7 months ago	openai	gpt-5
72.00	2 months ago	genai	gemini-3-flash-preview
72.00	2 weeks ago	genai	gemini-3.1-pro-preview
71.00	1 month ago	openai	gpt-5

Last 5 Runs

Score	Date	Provider	Model
0.00	4 days ago	mistral	ministral-8b-2512
0.00	4 days ago	mistral	ministral-8b-2512
57.00	1 week ago	alibaba	qwen3.5-flash-2026-02-23
55.00	1 week ago	alibaba	qwen3.5-35b-a3b
58.00	1 week ago	alibaba	qwen3.5-397b-a17b

Tags

Type(s): letter
Benchmark task(s): information-extraction
Writing: typed, handwritten
Source creation (century): 20
Source Layout: prose
Language(s): de

Company Lists

Top 5 Runs

Score	Date	Provider	Model
59.80	2 months ago	openai	gpt-5
58.40	5 months ago	openai	gpt-5
55.47	2 weeks ago	genai	gemini-3.1-pro-preview
55.47	1 week ago	alibaba	qwen3.5-122b-a10b
55.20	5 months ago	openai	o3

Last 5 Runs

Score	Date	Provider	Model
52.40	1 week ago	alibaba	qwen3.5-flash-2026-02-23
51.13	1 week ago	alibaba	qwen3.5-397b-a17b
51.33	1 week ago	alibaba	qwen3.5-35b-a3b
28.27	1 week ago	alibaba	qwen3.5-27b
46.87	1 week ago	alibaba	qwen3.5-122b-a10b

Tags

Type(s): book-page
Benchmark task(s): information-extraction
Writing: printed
Source creation (century): 20
Source Layout: list
Language(s): en, de

Fraktur Adverts

Assesses models' ability to recognize and transcribe text set in Fraktur, a blackletter (Gothic) typeface commonly used in historical German-language documents.

Top 5 Runs

Score	Date	Provider	Model
97.90	2 weeks ago	genai	gemini-3.1-pro-preview
97.30	2 weeks ago	anthropic	claude-sonnet-4-6
95.90	2 months ago	genai	gemini-3-flash-preview
95.90	1 week ago	alibaba	qwen3.5-122b-a10b
95.80	2 weeks ago	anthropic	claude-opus-4-6

Last 5 Runs

Score	Date	Provider	Model
52.90	1 week ago	alibaba	qwen3.5-flash-2026-02-23
76.40	1 week ago	alibaba	qwen3.5-27b
51.30	1 week ago	alibaba	qwen3.5-35b-a3b
66.40	1 week ago	alibaba	qwen3.5-397b-a17b
95.90	1 week ago	alibaba	qwen3.5-122b-a10b

Tags

Type(s): book-page
Benchmark task(s): transcription
Writing: printed
Source creation (century): 18, 19
Source Layout: prose
Language(s): de

General Meeting Minutes

Extracts names, locations, and signatures from table-like meeting minutes of Mines de Costano S.A. (1930s to 1960s).

Top 5 Runs

Score	Date	Provider	Model
88.64	2 weeks ago	openai	gpt-5.4-2026-03-05
86.20	2 weeks ago	openai	gpt-5.4-2026-03-05
85.59	2 weeks ago	openai	gpt-5.3-codex
84.93	3 weeks ago	openai	gpt-5.2-2025-12-11
84.79	1 week ago	openai	gpt-5.3-codex

Last 5 Runs

Score	Date	Provider	Model
83.63	1 week ago	openai	gpt-5.3-codex
83.31	1 week ago	openai	gpt-5.3-codex
84.79	1 week ago	openai	gpt-5.3-codex
83.29	1 week ago	openai	gpt-5.3-codex
81.99	1 week ago	openai	gpt-5.3-codex

Tags

Type(s): minutes
Benchmark task(s): information-extraction
Writing: typed, handwritten
Source creation (century): 20
Source Layout: table
Language(s): it, fr, de

Library Cards

Evaluates structured data extraction from digitized catalog cards in historical Swiss library records. Tests models' ability to parse bibliographic information, author names, dates, and hierarchical catalog structures.

Top 5 Runs

Score	Date	Provider	Model
89.51	7 months ago	openai	gpt-5
89.39	7 months ago	openai	gpt-4.1
89.36	7 months ago	openai	gpt-4o
89.10	4 months ago	genai	gemini-3-pro-preview
88.46	1 week ago	alibaba	qwen3.5-plus-2026-02-15

Last 5 Runs

Score	Date	Provider	Model
38.61	1 week ago	alibaba	qwen3.5-flash-2026-02-23
86.85	1 week ago	alibaba	qwen3.5-27b
83.80	1 week ago	alibaba	qwen3.5-35b-a3b
85.29	1 week ago	alibaba	qwen3.5-122b-a10b
88.25	1 week ago	alibaba	qwen3.5-397b-a17b

Tags

Type(s): index-card
Benchmark task(s): information-extraction
Writing: typed, printed, handwritten
Source creation (century): 20
Source Layout: index
Language(s): n/a

Magazine Pages - Anglo Swiss Trade Review

Examines a model's ability to extract bounding boxes of advertisements from magazine pages.
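The page does not say how predicted bounding boxes are scored. A common approach, assumed here rather than taken from the benchmark, is intersection over union (IoU) between predicted and reference boxes, with a prediction counted as correct when its IoU with an unused reference box reaches a threshold. The `Box` type, the 0.5 threshold, and the greedy matching are illustrative choices.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Axis-aligned box in pixel coordinates: (x0, y0) top-left, (x1, y1) bottom-right.
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp to zero so an empty (inverted) box has no area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 when disjoint)."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def match_boxes(predicted: list[Box], gold: list[Box], threshold: float = 0.5) -> int:
    """Greedily count predictions that overlap an unused gold box with IoU >= threshold."""
    used: set[int] = set()
    matches = 0
    for p in predicted:
        best_idx, best_iou = None, threshold
        for i, g in enumerate(gold):
            if i in used:
                continue
            score = iou(p, g)
            if score >= best_iou:
                best_idx, best_iou = i, score
        if best_idx is not None:
            used.add(best_idx)  # each gold box can be matched at most once
            matches += 1
    return matches


# Two half-overlapping 10x10 boxes share 50 of 150 pixels: IoU = 1/3.
print(iou(Box(0, 0, 10, 10), Box(5, 0, 15, 10)))
```

The match count can then feed precision and recall over all adverts on a page; Hungarian matching would be a stricter alternative to the greedy loop.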

Top 5 Runs

Score	Date	Provider	Model
88.50	1 week ago	openai	gpt-5.2-2025-12-11
86.00	2 weeks ago	openai	gpt-5.3-codex
84.80	1 week ago	genai	gemini-3-flash-preview
80.20	2 weeks ago	contour_local	opencv-contour
78.70	2 weeks ago	openai	gpt-5

Last 5 Runs

Date	Provider	Model
1 week ago	alibaba	qwen3.5-flash-2026-02-23
1 week ago	alibaba	qwen3.5-27b
1 week ago	alibaba	qwen3.5-35b-a3b
1 week ago	alibaba	qwen3.5-122b-a10b
1 week ago	alibaba	qwen3.5-397b-a17b

Tags

Type(s): newspaper-page
Benchmark task(s): document-understanding
Writing: printed
Source creation (century): 20
Source Layout: prose, columns
Language(s): en

Medieval Manuscripts

Evaluates models on page segmentation and handwritten text extraction from 15th-century manuscripts written in late medieval German. Tests the ability to transcribe historical handwriting, identify folio numbers, distinguish main text from marginal additions, and preserve historical spelling and formatting. Performance is measured using fuzzy string matching and Character Error Rate (CER).
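The two metrics named above can be sketched as follows. CER is the character-level Levenshtein distance between a model's transcription and the reference, divided by the reference length, so lower is better. For fuzzy matching, `difflib.SequenceMatcher.ratio()` is used here as one possible stand-in; the benchmark's actual fuzzy-matching library and any normalization it applies are not stated, so treat this as an illustration.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters are equal)
            ))
        previous = current
    return previous[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character Error Rate: edit distance normalized by the reference length."""
    if not reference:
        # Convention chosen for this sketch: empty reference gives 0.0 only for empty output.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(hypothesis, reference) / len(reference)


def fuzzy_similarity(hypothesis: str, reference: str) -> float:
    """Similarity ratio in [0, 1]; an assumed stand-in for 'fuzzy string matching'."""
    return SequenceMatcher(None, hypothesis, reference).ratio()


# One substituted character ("J" for "I") in a 12-character reference: CER = 1/12.
print(cer("Jtem das ist", "Item das ist"))
```

Because CER counts every character difference, it penalizes normalized spellings (for example "J" for "I") even when a reader would accept them, which is why preserving historical spelling matters for this task.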

Top 5 Runs

Score	Date	Provider	Model
84.90	2 months ago	anthropic	claude-opus-4-5-20251101
80.70	4 months ago	genai	gemini-3-pro-preview
79.80	2 weeks ago	anthropic	claude-opus-4-6
77.90	2 weeks ago	genai	gemini-3.1-flash-lite-preview
77.60	2 months ago	genai	gemini-2.5-flash-preview-09-2025

Last 5 Runs

Score	Date	Provider	Model
0.00	4 days ago	mistral	ministral-14b-2512
69.00	4 days ago	genai	gemini-2.5-flash-lite-preview-09-2025
58.70	4 days ago	genai	gemini-2.5-flash-lite
68.00	1 week ago	alibaba	qwen3.5-flash-2026-02-23
67.40	1 week ago	alibaba	qwen3.5-397b-a17b

Tags

Type(s): manuscript
Benchmark task(s): transcription
Writing: handwritten
Source creation (century): 15
Source Layout: prose
Language(s): de
