RISE Humanities Data Benchmark, 0.5.0-pre1

Benchmark Results

Blacklist Cards

Index cards of companies on a British 'black list' from the 1940s. The benchmark assesses how well models recognize typed and handwritten information on these index cards.


This benchmark has been run 93 times. It is scored with a fuzzy metric.
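The page does not spell out how the fuzzy metric is computed. As a rough illustration only, the sketch below assumes a common approach: each extracted field is compared with its ground-truth value using a normalized string-similarity ratio (here the standard-library difflib.SequenceMatcher), and the per-field similarities are averaged on a 0 to 100 scale. The function names and field keys are hypothetical, and the benchmark's actual scoring code may use a different similarity measure or normalization.

```python
from difflib import SequenceMatcher


def fuzzy_field_score(predicted: str, expected: str) -> float:
    """Similarity of two strings in [0, 1] (hypothetical choice: difflib's ratio)."""
    # Ignore case and surrounding whitespace before comparing.
    a = predicted.strip().lower()
    b = expected.strip().lower()
    if not a and not b:
        # Two empty values count as a perfect match.
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_card_score(predicted: dict[str, str], expected: dict[str, str]) -> float:
    """Average field similarity over the ground-truth fields, scaled to 0-100.

    A field missing from the prediction is compared as an empty string,
    so it contributes 0 to the average.
    """
    if not expected:
        return 100.0
    total = sum(
        fuzzy_field_score(predicted.get(key, ""), value)
        for key, value in expected.items()
    )
    return 100.0 * total / len(expected)


if __name__ == "__main__":
    # Hypothetical card fields, for illustration only.
    truth = {"company": "Müller & Co.", "city": "Zürich"}
    guess = {"company": "Muller & Co.", "city": "Zürich"}
    print(round(fuzzy_card_score(guess, truth), 2))
```

In this sketch a single misread character (ü read as u) lowers the company field only slightly instead of marking it wrong, which is the main reason to prefer a fuzzy score over exact matching when handwriting and typewriter artifacts are involved.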

Overview

Tested providers: mistral, openai, openrouter, x-ai, alibaba, genai, scicore, anthropic

Tested models: ministral-14b-2512, qwen3.5-397b-a17b, pixtral-large-2411, claude-opus-4-5-20251101, o3, gpt-4.1-mini, claude-sonnet-4-6, x-ai/grok-4, magistral-small-2509, qwen/qwen3-vl-8b-thinking, qwen3.5-35b-a3b, qwen/qwen3-vl-8b-instruct, qwen/qwen3-vl-30b-a3b-instruct, pixtral-12b, gemini-2.0-flash, claude-opus-4-1-20250805, qwen3.5-flash-2026-02-23, gemini-2.5-flash, gpt-5, magistral-medium-2509, gpt-5-mini, gemini-3.1-flash-lite-preview, ministral-8b-2512, gemini-3-flash-preview, claude-3-opus-20240229, gemini-2.5-flash-preview-09-2025, mistral-small-2506, gpt-4.1, gpt-4o-mini, grok-4.20-0309-reasoning, claude-opus-4-20250514, claude-3-5-sonnet-20241022, gemini-2.5-pro, mistral-large-2512, GLM-4.5V-FP8, gpt-5.3-codex, qwen3.5-27b, claude-sonnet-4-20250514, qwen3.5-plus-2026-02-15, qwen3.5-122b-a10b, claude-3-7-sonnet-20250219, gpt-4o, gemini-3.1-pro-preview, gpt-4.1-nano, meta-llama/llama-4-maverick, gemini-2.5-flash-lite, gpt-5-nano, gemini-2.5-flash-lite-preview-09-2025, claude-haiku-4-5-20251001, mistral-medium-2505, gemini-3-pro-preview, mistral-large-2411, gpt-5.2-2025-12-11, gpt-5.4-2026-03-05, mistral-medium-2508, claude-opus-4-6, gemini-2.0-flash-lite, gpt-5.1-2025-11-13, claude-sonnet-4-5-20250929

Last 5 Runs

Score  Date        Provider  Model
94.54  1 week ago  alibaba   qwen3.5-27b
90.89  1 week ago  alibaba   qwen3.5-35b-a3b
62.77  1 week ago  alibaba   qwen3.5-flash-2026-02-23
91.55  1 week ago  alibaba   qwen3.5-122b-a10b
88.34  1 week ago  alibaba   qwen3.5-397b-a17b


Contributors

Role           Contributors
Domain expert  Lea Kasper
Data curator   Sorin Marti
Annotator      Lea Kasper, Sorin Marti
Analyst        Lea Kasper, Sorin Marti
Engineer       Sorin Marti

Tags
  • Type(s): index-card
  • Benchmark task(s): information-extraction
  • Writing: typed, handwritten
  • Source creation (century): 20th
  • Source layout: n/a
  • Language(s): de (German)