RISE Humanities Data Benchmark, 0.5.0-pre1

Benchmark Results

Bibliographic Data

Evaluates models' ability to extract bibliographic information, such as publication details, authors, dates, and other metadata, from digitized historical documents.


This benchmark has been run 132 times. It is scored with a fuzzy metric.
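This page does not say how the fuzzy metric is computed. As a rough illustration only, here is a minimal sketch of per-field fuzzy scoring for an extracted bibliographic record. It makes several assumptions that are not from the source: Python's standard-library `difflib.SequenceMatcher` is used as the string similarity, records are flat dicts mapping field names to strings, and the record score is the mean similarity over the ground-truth fields, scaled to 0-100. The functions `fuzzy_score` and `record_score` and the field names are hypothetical. This is not the benchmark's actual implementation.

```python
from difflib import SequenceMatcher


def fuzzy_score(expected: str, predicted: str) -> float:
    """Hypothetical similarity in [0, 1] after lowercasing and collapsing whitespace."""
    a = " ".join(expected.lower().split())
    b = " ".join(predicted.lower().split())
    if not a and not b:
        return 1.0  # two empty values count as a perfect match
    return SequenceMatcher(None, a, b).ratio()


def record_score(expected: dict, predicted: dict) -> float:
    """Hypothetical record score: mean fuzzy score over ground-truth fields, scaled to 0-100.

    A field missing from the prediction is compared against an empty string.
    """
    if not expected:
        return 100.0
    scores = [fuzzy_score(str(v), str(predicted.get(k, ""))) for k, v in expected.items()]
    return 100 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical ground truth and model output for one list entry.
    gold = {"author": "Smith, John", "title": "A History of Basel", "year": "1952"}
    pred = {"author": "Smith, J.", "title": "A history of Basel", "year": "1952"}
    print(round(record_score(gold, pred), 2))
```

With a fuzzy comparison, a near-miss such as an abbreviated first name earns partial credit instead of zero. An exact-match metric would score that field as wrong.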

Overview

Tested providers: mistral, openai, openrouter, x-ai, alibaba, genai, scicore, anthropic

Tested models: ministral-14b-2512, qwen3.5-397b-a17b, pixtral-large-2411, claude-opus-4-5-20251101, o3, gpt-4.1-mini, claude-sonnet-4-6, x-ai/grok-4, magistral-small-2509, qwen/qwen3-vl-8b-thinking, qwen3.5-35b-a3b, qwen/qwen3-vl-8b-instruct, qwen/qwen3-vl-30b-a3b-instruct, gemini-1.5-pro, pixtral-12b, gemini-2.0-flash, claude-opus-4-1-20250805, qwen3.5-flash-2026-02-23, gemini-2.5-flash, gpt-5, magistral-medium-2509, gpt-5-mini, gemini-3.1-flash-lite-preview, gemini-1.5-flash, ministral-8b-2512, gemini-3-flash-preview, claude-3-opus-20240229, gemini-2.5-flash-preview-09-2025, mistral-small-2506, gpt-4.1, gpt-4o-mini, grok-4.20-0309-reasoning, claude-3-5-sonnet-20241022, gpt-4.5-preview, claude-opus-4-20250514, gemini-2.5-pro, GLM-4.5V-FP8, mistral-large-2512, gpt-5.3-codex, claude-sonnet-4-20250514, qwen3.5-plus-2026-02-15, qwen3.5-27b, claude-3-7-sonnet-20250219, qwen3.5-122b-a10b, gpt-4o, gemini-3.1-pro-preview, gpt-4.1-nano, meta-llama/llama-4-maverick, gemini-2.5-flash-lite, gpt-5-nano, gemini-2.5-flash-lite-preview-09-2025, claude-haiku-4-5-20251001, mistral-medium-2505, gemini-3-pro-preview, mistral-large-2411, gpt-5.2-2025-12-11, gpt-5.4-2026-03-05, mistral-medium-2508, claude-opus-4-6, gemini-2.0-flash-lite, gpt-5.1-2025-11-13, claude-sonnet-4-5-20250929

Last 5 Runs

Score  Date        Provider  Model
64.87  1 week ago  alibaba   qwen3.5-35b-a3b
67.87  1 week ago  alibaba   qwen3.5-397b-a17b
67.34  1 week ago  alibaba   qwen3.5-flash-2026-02-23
66.45  1 week ago  alibaba   qwen3.5-27b
65.34  1 week ago  alibaba   qwen3.5-122b-a10b


Contributors

Role           Contributors
Domain expert  Pema Frick
Data curator   Pema Frick
Annotator      Sven Burkhardt, Pema Frick
Analyst        Pema Frick, Sorin Marti
Engineer       Pema Frick, Sorin Marti

Tags
  • Type(s): book-page
  • Benchmark task(s): information-extraction
  • Writing: printed
  • Source creation (century): 20
  • Source Layout: list
  • Language(s): en