RISE Humanities Data Benchmark, 0.5.1

Benchmark Results

Library Cards

A benchmark for catalog card analysis and information extraction from historical library catalog systems. It evaluates models on structured data extraction from digitized catalog cards in historical Swiss library records, testing their ability to parse complex bibliographic information, author names, dates, and hierarchical catalog structures.


This benchmark has been run 130 times. It uses the f1_macro metric.
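The page does not say how f1_macro is computed for this benchmark. As a rough illustration only, the sketch below follows the usual definition of macro-averaged F1: compute F1 separately for each unit, then take the unweighted mean. What counts as a "unit" here (a card, a field, a class) is an assumption, as are the function names and the example counts. The scores on this page look like percentages, so the sketch scales by 100; that scaling is also an assumption.

```python
from typing import Iterable, Tuple

# (true positives, false positives, false negatives) for one unit
Counts = Tuple[int, int, int]

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one unit; defined as 0.0 when there is nothing to match."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_macro(per_unit_counts: Iterable[Counts]) -> float:
    """Unweighted mean of the per-unit F1 scores."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in per_unit_counts]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical counts for three units (for example, three cards).
counts = [(8, 2, 0), (5, 0, 5), (0, 1, 1)]
print(round(100 * f1_macro(counts), 2))  # 51.85
```

Because every unit carries equal weight, a single unit with no correct matches (the third one above) pulls the average down sharply, regardless of how many values it contains.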

Overview

Tested providers: x-ai, openrouter, genai, openai, anthropic, mistral, alibaba, scicore

Tested models: claude-haiku-4-5-20251001, google/gemma-4-26b-a4b-it, qwen3.5-397b-a17b, qwen/qwen3.5-122b-a10b, gemini-2.0-flash, claude-opus-4-6, mistral-large-2411, qwen3.5-35b-a3b, claude-sonnet-4-5-20250929, mistral-small-2506, claude-3-7-sonnet-20250219, grok-4.20-0309-reasoning, gpt-5-nano, x-ai/grok-4, gemini-2.0-flash-lite, mistral-medium-2505, qwen/qwen3.5-flash-02-23, claude-opus-4-20250514, o3, gemini-2.5-flash-lite, meta-llama/llama-4-maverick, gpt-4o-mini, gemini-2.5-flash-preview-09-2025, ministral-14b-2512, qwen/qwen3-vl-8b-instruct, mistral-medium-2508, gemini-2.5-flash-lite-preview-09-2025, qwen3.5-122b-a10b, pixtral-12b, claude-opus-4-5-20251101, magistral-medium-2509, claude-opus-4-1-20250805, gpt-5.5-2026-04-23, gpt-5.4-2026-03-05, gemini-3.1-pro-preview, gpt-5, gemini-2.5-pro, gpt-5.2-2025-12-11, qwen3.5-27b, qwen/qwen3.5-27b, claude-sonnet-4-20250514, gemini-3.1-flash-lite-preview, qwen/qwen3-vl-8b-thinking, gpt-5-mini, claude-opus-4-7, gpt-4.1, magistral-small-2509, qwen3.5-flash-2026-02-23, gemini-2.5-flash, qwen/qwen3.5-35b-a3b, qwen/qwen3.5-397b-a17b, qwen/qwen3.5-9b, gpt-4.1-mini, qwen/qwen3-vl-30b-a3b-instruct, claude-3-5-sonnet-20241022, qwen3.5-plus-2026-02-15, claude-sonnet-4-6, qwen/qwen3.6-plus, gpt-4.1-nano, gpt-4o, gemini-3-flash-preview, pixtral-large-2411, mistral-large-2512, ministral-8b-2512, gpt-5.3-codex, GLM-4.5V-FP8, google/gemma-4-31b-it, gemini-3-pro-preview, claude-3-opus-20240229, qwen/qwen3.5-plus-02-15, gpt-5.1-2025-11-13

Last 5 Runs

Score  Date         Provider    Model
88.59  3 weeks ago  openai      gpt-5.5-2026-04-23
63.98  4 weeks ago  openrouter  qwen/qwen3.5-9b
86.88  1 month ago  openrouter  qwen/qwen3.5-122b-a10b
86.65  1 month ago  openrouter  qwen/qwen3.6-plus
63.72  1 month ago  openrouter  qwen/qwen3.5-9b


Contributors

Role           Contributors
Domain expert  Gabriel Müller
Data curator   Gabriel Müller
Annotator      Maximilian Hindermann, Gabriel Müller
Analyst        Maximilian Hindermann
Engineer       Maximilian Hindermann

Tags
  • Type(s): index-card
  • Benchmark task(s): information-extraction
  • Writing: typed, printed, handwritten
  • Source creation (century): 20, 19
  • Source Layout: index
  • Language(s): de, fr, en, la, el, fi, sv, pl