RISE Humanities Data Benchmark, 0.5.0-pre1

Search Test Runs

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
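The components listed above can be pictured as a single record. The following is a minimal sketch, not the benchmark's actual schema: the class name `TestRun`, every field name, and the helper `best_run` are hypothetical and chosen only to mirror the five bullet points.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TestRun:
    """Hypothetical record mirroring the five components of a test run."""

    # Metadata
    test_id: str          # identifier such as "T0890"
    benchmark: str        # benchmark name
    run_date: date        # date the run was executed
    executed_by: str      # person who executed the run
    # Prompt and role definition
    prompt: str           # what the model was asked to do
    role: str             # perspective, e.g. "as a historian"
    # Model configuration
    provider: str
    model: str
    temperature: float
    # Results
    response: str = ""
    scores: dict = field(default_factory=dict)  # score name -> value, e.g. "f1"
    # Usage and cost data
    input_tokens: int = 0
    output_tokens: int = 0
    cost: float = 0.0


def best_run(runs, metric):
    """Return the run with the highest value for `metric`, or None if no run has it."""
    scored = [run for run in runs if metric in run.scores]
    if not scored:
        return None
    return max(scored, key=lambda run: run.scores[metric])
```

Keeping configuration, results, and cost on the same record is what makes comparisons straightforward: a call such as `best_run(runs, "f1")` can rank runs across providers without joining separate tables. How the application actually stores these values is not stated here.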

Search Results

Your search for Benchmark 'personnel_cards__true' with Search Hidden 'False' returned 57 results, showing page 1 of 6.
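The page count follows from the result count by ceiling division. The sketch below assumes a page size of 10, inferred from the ten results shown on this page rather than from any documented setting; the function names are illustrative.

```python
import math


def page_count(total_results, page_size):
    """Number of pages needed to show all results."""
    return math.ceil(total_results / page_size)


def page_bounds(page, total_results, page_size):
    """1-based index of the first and last result on a given page."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_results)
    return first, last


print(page_count(57, 10))      # 6
print(page_bounds(1, 57, 10))  # (1, 10)
print(page_bounds(6, 57, 10))  # (51, 57)
```

Under that assumption the last page holds only seven results (51 to 57).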
Result 1 of 57

Test T0890 at 2026-03-25

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 2 of 57

Test T0864 at 2026-03-25

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 3 of 57

Test T0838 at 2026-03-25

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 4 of 57

Test T0877 at 2026-03-25

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 5 of 57

Test T0851 at 2026-03-25

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 6 of 57

Test T0825 at 2026-03-24

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 7 of 57

Test T0729 at 2026-03-23

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 8 of 57

Test T0704 at 2026-03-23

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 9 of 57

Test T0602 at 2026-03-23

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}

Result 10 of 57

Test T0704 at 2026-03-17

{'document-type': ['index-card'], 'writing': ['handwritten', 'typed', 'printed'], 'century': [20], 'layout': ['table', 'form'], 'task': ['transcription', 'document-understanding', 'data-correction'], 'language': ['de', 'fr']}
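
The tag dictionaries shown with each result are printed in Python literal syntax, so they can be read back with the standard library's `ast.literal_eval`. This is a minimal sketch for working with copied search results; the helper names `parse_tags` and `has_tag` are hypothetical and not part of the benchmark's code.

```python
import ast

# Tag line copied verbatim from a search result above.
RAW_TAGS = (
    "{'document-type': ['index-card'], "
    "'writing': ['handwritten', 'typed', 'printed'], "
    "'century': [20], 'layout': ['table', 'form'], "
    "'task': ['transcription', 'document-understanding', 'data-correction'], "
    "'language': ['de', 'fr']}"
)


def parse_tags(text):
    """Parse a tag line into a dict without executing arbitrary code."""
    tags = ast.literal_eval(text)
    if not isinstance(tags, dict):
        raise ValueError("expected a dictionary of tags")
    return tags


def has_tag(tags, key, value):
    """True if `value` is listed under `key` in the parsed tags."""
    return value in tags.get(key, [])


tags = parse_tags(RAW_TAGS)
print(has_tag(tags, "language", "fr"))         # True
print(has_tag(tags, "writing", "calligraphy"))  # False
```

`ast.literal_eval` is preferred over `eval` here because it accepts only literals, and over `json.loads` because the lines use single quotes, which JSON does not allow.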