RISE Humanities Data Benchmark, 0.5.0-pre1

Benchmark Results

Medieval Manuscripts

Evaluates models on page segmentation and handwritten text extraction from 15th-century manuscripts written in late medieval German. Tests the ability to transcribe historical handwriting, identify folio numbers, distinguish main text from marginal additions, and preserve historical spelling and formatting. Performance is measured using fuzzy string matching and Character Error Rate (CER).
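As a rough illustration of the two metrics named above, the sketch below computes a Character Error Rate from Levenshtein edit distance and a fuzzy similarity ratio from Python's standard difflib module. The function names (levenshtein, cer, fuzzy_ratio) and the sample strings are hypothetical, and this is not necessarily how the benchmark itself normalizes text or calculates its scores.

```python
from difflib import SequenceMatcher


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Edit distance divided by reference length (lower is better)."""
    if not ref:
        # Assumption: an empty reference scores 0.0 only for an empty hypothesis.
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)


def fuzzy_ratio(ref: str, hyp: str) -> float:
    """Similarity in [0, 1] from difflib (higher is better)."""
    return SequenceMatcher(None, ref, hyp).ratio()


# Hypothetical example: a model modernizes "vnd" to "und".
reference = "vnd"
hypothesis = "und"
print(cer(reference, hypothesis))
print(fuzzy_ratio(reference, hypothesis))
```

Because CER divides by the reference length, a single modernized letter in a short word weighs heavily, which is one reason preserving historical spelling matters for this task.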


This benchmark has been run 92 times. It uses the CER metric.

Overview

Tested providers: mistral, openrouter, x-ai, alibaba, genai, anthropic, scicore, openai

Tested models: ministral-14b-2512, qwen3.5-397b-a17b, pixtral-large-2411, claude-opus-4-5-20251101, o3, gpt-4.1-mini, claude-sonnet-4-6, x-ai/grok-4, magistral-small-2509, qwen/qwen3-vl-8b-thinking, qwen3.5-35b-a3b, qwen/qwen3-vl-8b-instruct, qwen/qwen3-vl-30b-a3b-instruct, pixtral-12b, gemini-2.0-flash, claude-opus-4-1-20250805, qwen3.5-flash-2026-02-23, gemini-2.5-flash, gpt-5, magistral-medium-2509, gpt-5-mini, gemini-3.1-flash-lite-preview, ministral-8b-2512, gemini-3-flash-preview, claude-3-opus-20240229, gemini-2.5-flash-preview-09-2025, mistral-small-2506, gpt-4.1, gpt-4o-mini, grok-4.20-0309-reasoning, claude-3-5-sonnet-20241022, claude-opus-4-20250514, gemini-2.5-pro, mistral-large-2512, GLM-4.5V-FP8, gpt-5.3-codex, qwen3.5-122b-a10b, claude-sonnet-4-20250514, qwen3.5-plus-2026-02-15, qwen3.5-27b, claude-3-7-sonnet-20250219, gpt-4o, gemini-3.1-pro-preview, gpt-4.1-nano, gemini-2.5-flash-lite, gemini-2.5-flash-lite-preview-09-2025, gpt-5-nano, claude-haiku-4-5-20251001, mistral-medium-2505, claude-sonnet-4-5-20250929, gemini-3-pro-preview, mistral-large-2411, gpt-5.2-2025-12-11, gpt-5.4-2026-03-05, mistral-medium-2508, claude-opus-4-6, gemini-2.0-flash-lite, gpt-5.1-2025-11-13, meta-llama/llama-4-maverick

Last 5 Runs

Score   Date         Provider   Model
0.00    4 days ago   mistral    ministral-14b-2512
69.00   4 days ago   genai      gemini-2.5-flash-lite-preview-09-2025
58.70   4 days ago   genai      gemini-2.5-flash-lite
68.00   1 week ago   alibaba    qwen3.5-flash-2026-02-23
67.40   1 week ago   alibaba    qwen3.5-397b-a17b


Contributors

Role            Contributors
Domain expert   Ina Serif
Data curator    Ina Serif
Annotator       Ina Serif
Analyst         Maximilian Hindermann
Engineer        Maximilian Hindermann, Ina Serif

Tags
  • Type(s): manuscript
  • Benchmark task(s): transcription
  • Writing: handwritten
  • Source creation (century): 15
  • Source Layout: prose
  • Language(s): de