RISE Humanities Data Benchmark, 0.5.1

Benchmark Results

Bibliographic Data

Evaluates models' ability to extract bibliographic information, such as publication details, authors, dates, and other metadata, from digitized historical documents.

This benchmark has been run 145 times. It is scored with a fuzzy metric.
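
The page does not say how the fuzzy metric is computed. As a rough illustration only, the sketch below shows one way a fuzzy score over extracted fields could work: each predicted field is compared to its expected value by string similarity rather than exact equality, and the per-field similarities are averaged into a percentage. The function names `fuzzy_similarity` and `fuzzy_score`, the use of Python's `difflib.SequenceMatcher`, and the case and whitespace normalization are all assumptions made for this example, not the benchmark's actual implementation.

```python
from difflib import SequenceMatcher


def fuzzy_similarity(predicted: str, expected: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace.

    Hypothetical helper: the benchmark's real normalization is not documented here.
    """
    a = predicted.strip().lower()
    b = expected.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_score(predicted: dict, expected: dict) -> float:
    """Average per-field similarity over the expected fields, as a percentage.

    Fields missing from `predicted` are compared as empty strings.
    """
    if not expected:
        return 0.0
    total = sum(
        fuzzy_similarity(str(predicted.get(key, "")), str(value))
        for key, value in expected.items()
    )
    return 100.0 * total / len(expected)


if __name__ == "__main__":
    # Illustrative values only; not taken from the benchmark dataset.
    truth = {"author": "Smith, John", "year": "1923"}
    guess = {"author": "Smith, J.", "year": "1923"}
    print(round(fuzzy_score(guess, truth), 2))
```

Under a scheme like this, a near-miss such as an abbreviated first name earns partial credit instead of zero, which is the usual reason to prefer a fuzzy metric over exact matching for transcribed metadata.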

Overview

Tested providers: x-ai, openrouter, genai, openai, anthropic, mistral, alibaba, scicore

Tested models: claude-haiku-4-5-20251001, google/gemma-4-26b-a4b-it, qwen3.5-397b-a17b, qwen/qwen3.5-122b-a10b, gemini-2.0-flash, claude-opus-4-6, mistral-large-2411, qwen3.5-35b-a3b, claude-sonnet-4-5-20250929, mistral-small-2506, claude-3-7-sonnet-20250219, grok-4.20-0309-reasoning, gpt-5-nano, x-ai/grok-4, gemini-2.0-flash-lite, mistral-medium-2505, qwen/qwen3.5-flash-02-23, claude-opus-4-20250514, o3, gemini-2.5-flash-lite, meta-llama/llama-4-maverick, gpt-4o-mini, gemini-2.5-flash-preview-09-2025, ministral-14b-2512, qwen/qwen3-vl-8b-instruct, mistral-medium-2508, gemini-2.5-flash-lite-preview-09-2025, gemini-1.5-flash, pixtral-12b, claude-opus-4-5-20251101, qwen3.5-122b-a10b, magistral-medium-2509, claude-opus-4-1-20250805, gpt-5.5-2026-04-23, gpt-5.4-2026-03-05, gemini-3.1-pro-preview, gpt-5, gemini-2.5-pro, gpt-5.2-2025-12-11, qwen3.5-27b, qwen/qwen3.5-27b, claude-sonnet-4-20250514, gemini-3.1-flash-lite-preview, qwen/qwen3-vl-8b-thinking, gpt-5-mini, gpt-4.5-preview, gpt-4.1, magistral-small-2509, qwen3.5-flash-2026-02-23, gemini-2.5-flash, claude-opus-4-7, qwen/qwen3.5-35b-a3b, qwen/qwen3.5-397b-a17b, qwen/qwen3.5-9b, gpt-4.1-mini, qwen/qwen3-vl-30b-a3b-instruct, claude-3-5-sonnet-20241022, qwen3.5-plus-2026-02-15, claude-sonnet-4-6, qwen/qwen3.6-plus, gpt-4.1-nano, gpt-4o, gemini-3-flash-preview, pixtral-large-2411, mistral-large-2512, gemini-1.5-pro, ministral-8b-2512, gpt-5.3-codex, GLM-4.5V-FP8, google/gemma-4-31b-it, gemini-3-pro-preview, claude-3-opus-20240229, qwen/qwen3.5-plus-02-15, gpt-5.1-2025-11-13

Last 5 Runs

Score   Date          Provider     Model
71.55   3 weeks ago   openai       gpt-5.5-2026-04-23
0.00    4 weeks ago   openrouter   qwen/qwen3.5-9b
41.71   1 month ago   openrouter   google/gemma-4-26b-a4b-it
46.10   1 month ago   openrouter   qwen/qwen3.5-9b
61.35   1 month ago   openrouter   google/gemma-4-31b-it

Contributors

Role            Contributors
Domain expert   Pema Frick
Data curator    Pema Frick
Annotator       Sven Burkhardt, Pema Frick
Analyst         Pema Frick, Sorin Marti
Engineer        Pema Frick, Sorin Marti

Tags
  • Type(s): book-page
  • Benchmark task(s): information-extraction
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: list
  • Language(s): en