RISE Humanities Data Benchmark, 0.5.1

Benchmark Results

Blacklist Cards

Index cards of companies on a British 'black list' in the 1940s. Assesses models' capability to recognize typed and handwritten information from index cards.


This benchmark has been run 106 times. It is scored with a fuzzy metric.
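The page does not say how the fuzzy metric is computed. As a rough illustration only, the sketch below assumes it means a normalized string similarity averaged over the extracted fields and reported as a percentage. The function names, the field names, and the use of Python's `difflib` are all assumptions for illustration, not the benchmark's actual implementation.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(predicted: str, expected: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer whitespace."""
    a = predicted.strip().lower()
    b = expected.strip().lower()
    if not a and not b:
        return 1.0  # two empty fields count as a perfect match
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_score(predicted: dict, expected: dict) -> float:
    """Average field-level similarity over the expected keys, as a percentage.

    Hypothetical scoring scheme: fields missing from the prediction are
    compared as empty strings, so they lower the score instead of being skipped.
    """
    if not expected:
        return 0.0
    total = sum(
        fuzzy_ratio(str(predicted.get(key, "")), str(value))
        for key, value in expected.items()
    )
    return 100.0 * total / len(expected)


# Hypothetical card fields; the real benchmark schema is not shown on this page.
expected_card = {"company": "Müller & Co.", "location": "Basel"}
predicted_card = {"company": "Muller & Co.", "location": "Basel"}
score = fuzzy_score(predicted_card, expected_card)
```

Under this assumed scheme a near-miss transcription such as a dropped umlaut earns partial credit, whereas an exact-match metric would score the whole field as wrong.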

Overview

Tested providers: x-ai, openrouter, genai, openai, anthropic, mistral, alibaba, scicore

Tested models: claude-haiku-4-5-20251001, google/gemma-4-26b-a4b-it, qwen3.5-397b-a17b, qwen/qwen3.5-122b-a10b, gemini-2.0-flash, claude-opus-4-6, mistral-large-2411, qwen3.5-35b-a3b, claude-sonnet-4-5-20250929, mistral-small-2506, claude-3-7-sonnet-20250219, grok-4.20-0309-reasoning, gpt-5-nano, gemini-2.0-flash-lite, mistral-medium-2505, qwen/qwen3.5-flash-02-23, claude-opus-4-20250514, gemini-2.5-flash-lite, o3, meta-llama/llama-4-maverick, gemini-2.5-flash-preview-09-2025, qwen/qwen3-vl-8b-instruct, gpt-4o-mini, ministral-14b-2512, mistral-medium-2508, gemini-2.5-flash-lite-preview-09-2025, qwen3.5-122b-a10b, pixtral-12b, claude-opus-4-5-20251101, magistral-medium-2509, claude-opus-4-1-20250805, gpt-5.5-2026-04-23, gpt-5.4-2026-03-05, gemini-3.1-pro-preview, gpt-5, gemini-2.5-pro, gpt-5.2-2025-12-11, qwen3.5-27b, qwen/qwen3.5-27b, claude-sonnet-4-20250514, gemini-3.1-flash-lite-preview, qwen/qwen3-vl-8b-thinking, gpt-5-mini, claude-opus-4-7, gpt-4.1, magistral-small-2509, qwen3.5-flash-2026-02-23, gemini-2.5-flash, qwen/qwen3.5-35b-a3b, qwen/qwen3.5-397b-a17b, qwen/qwen3.5-9b, gpt-4.1-mini, qwen/qwen3-vl-30b-a3b-instruct, claude-3-5-sonnet-20241022, qwen3.5-plus-2026-02-15, claude-sonnet-4-6, qwen/qwen3.6-plus, gpt-4.1-nano, gpt-4o, gpt-5.1-2025-11-13, gemini-3-flash-preview, pixtral-large-2411, mistral-large-2512, ministral-8b-2512, gpt-5.3-codex, GLM-4.5V-FP8, google/gemma-4-31b-it, gemini-3-pro-preview, claude-3-opus-20240229, qwen/qwen3.5-plus-02-15, x-ai/grok-4

Last 5 Runs

Score  Date         Provider    Model
85.96  3 weeks ago  openai      gpt-5.5-2026-04-23
85.30  4 weeks ago  openrouter  qwen/qwen3.5-9b
88.64  1 month ago  openrouter  qwen/qwen3.5-plus-02-15
91.75  1 month ago  openrouter  google/gemma-4-26b-a4b-it
92.48  1 month ago  openrouter  qwen/qwen3.6-plus


Contributors

Role           Contributors
Domain expert  Lea Kasper
Data curator   Sorin Marti
Annotator      Lea Kasper, Sorin Marti
Analyst        Lea Kasper, Sorin Marti
Engineer       Sorin Marti

Tags
  • Type(s): index-card
  • Benchmark task(s): information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20
  • Source Layout: n/a
  • Language(s): de