RISE Humanities Data Benchmark, 0.5.1

Benchmark Results

Fraktur Adverts

Assesses models' capability to recognize and transcribe historical German Fraktur script, a Gothic typeface commonly used in German-language documents.

This benchmark has been run 159 times. It uses the character error rate (CER) metric.
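For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between a model's transcription and the reference text, divided by the length of the reference. The following is a minimal sketch of that common definition, not the benchmark's own scoring code: the function names `levenshtein` and `cer` are illustrative, and any normalization the benchmark applies (whitespace, case, long s, ligatures) or the way it turns CER into the scores reported on this page is not stated here and is not modeled.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character in a four-character reference.
    print(cer("Haus", "Hans"))
```

Under this definition a perfect transcription has a CER of 0, and the value can exceed 1 when the hypothesis is much longer than the reference.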

Overview

Tested providers: x-ai, openrouter, genai, openai, anthropic, mistral, alibaba, scicore

Tested models: claude-opus-4-6, qwen3.5-35b-a3b, gpt-5-nano, gemini-2.0-flash-lite, qwen/qwen3.5-flash-02-23, meta-llama/llama-4-maverick, gpt-4o-mini, qwen/qwen3-vl-8b-instruct, gpt-5.4-2026-03-05, gemini-2.5-pro, qwen/qwen3-vl-8b-thinking, gpt-5-mini, gpt-4.1, magistral-small-2509, qwen/qwen3.5-35b-a3b, qwen/qwen3.5-397b-a17b, gemini-exp-1206, gpt-4.1-nano, google/gemma-4-26b-a4b-it, qwen/qwen3.5-122b-a10b, mistral-small-2506, claude-opus-4-20250514, gemini-2.5-flash-lite, mistral-medium-2508, claude-opus-4-5-20251101, gpt-5.5-2026-04-23, gemini-3.1-pro-preview, gpt-5, gemini-2.0-pro-exp-02-05, qwen/qwen3.5-9b, qwen/qwen3-vl-30b-a3b-instruct, claude-3-5-sonnet-20241022, gemini-3-flash-preview, google/gemma-4-31b-it, gemini-2.5-flash-preview-04-17, x-ai/grok-4, gpt-5.1-2025-11-13, claude-haiku-4-5-20251001, gemini-2.0-flash, claude-sonnet-4-5-20250929, grok-4.20-0309-reasoning, pixtral-12b, claude-opus-4-1-20250805, qwen/qwen3.5-27b, gemini-3.1-flash-lite-preview, gpt-4.5-preview, gemini-2.5-flash, qwen3.5-plus-2026-02-15, gpt-4o, GLM-4.5V-FP8, qwen3.5-397b-a17b, mistral-large-2411, claude-3-7-sonnet-20250219, mistral-medium-2505, o3, gemini-2.5-flash-preview-09-2025, ministral-14b-2512, gemini-2.5-pro-exp-03-25, qwen3.5-122b-a10b, gemini-2.5-flash-lite-preview-09-2025, gemini-1.5-flash, magistral-medium-2509, gpt-5.2-2025-12-11, qwen3.5-27b, claude-sonnet-4-20250514, claude-opus-4-7, qwen3.5-flash-2026-02-23, gpt-4.1-mini, claude-sonnet-4-6, qwen/qwen3.6-plus, pixtral-large-2411, mistral-large-2512, gemini-2.5-pro-preview-05-06, gemini-1.5-pro, ministral-8b-2512, gpt-5.3-codex, gemini-3-pro-preview, claude-3-opus-20240229, qwen/qwen3.5-plus-02-15

Last 5 Runs

Score   Date          Provider     Model
96.00   3 weeks ago   openai       gpt-5.5-2026-04-23
48.60   4 weeks ago   openrouter   qwen/qwen3.5-9b
28.70   1 month ago   openrouter   google/gemma-4-26b-a4b-it
77.40   1 month ago   openrouter   qwen/qwen3.5-plus-02-15
51.20   1 month ago   openrouter   google/gemma-4-31b-it

Contributors

Role            Contributors
Domain expert   Ina Serif
Data curator    Ina Serif
Annotator       Ina Serif
Analyst         Maximilian Hindermann
Engineer        Maximilian Hindermann, Ina Serif

Tags
  • Type(s): book-page
  • Benchmark task(s): transcription
  • Writing: printed
  • Source creation (century): 18, 19
  • Source Layout: prose
  • Language(s): de