RISE Humanities Data Benchmark, 0.5.0-pre1

Benchmarks

This page provides an overview of the available benchmark datasets. Each benchmark includes a detailed description of its inputs, ground truth, and evaluation metrics. You can explore the corresponding test results and access all sources on GitHub.

Bibliographic Data

Extract bibliographic data from a bibliography.

Blacklist Cards

Extract structured company information from historical index cards.

Book Advert XML files

Correct malformed XML data produced by an LLM extraction process.

Business Letters

Extract metadata from correspondence, including signature recognition.

Company Lists

Extract structured company information from various company lists.

Fraktur Adverts

Extract text set in Fraktur typeface.

General Meeting Minutes

Extract voters and votes from business meeting minutes.

Library Cards

Extract bibliographic information from index cards.

Magazine Pages

Extract bounding boxes of advertisements from magazine pages.

Medieval Manuscripts

Identify the sections that contain handwritten text and extract that text.

Personnel Cards

Extract and interpret structured salary information.