RISE Humanities Data Benchmark, 0.5.3-pre1

Search Test Runs

A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.

A test run includes:

Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
Model configuration – provider, model version, temperature, and other generation parameters.
Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
Usage and cost data – token counts and calculated API costs.
Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
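To make the components of a test run concrete, here is a minimal Python sketch of a run record and of one way to compare runs across models. The class `TestRun`, all of its field names, the helper `mean_score_by_model`, and the per-million-token pricing assumed in `cost` are hypothetical illustrations. They are not the benchmark's actual schema or code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TestRun:
    """Hypothetical record for one test run; field names are illustrative only."""
    # Metadata
    test_id: str              # run identifier, e.g. "T1230"
    benchmark: str            # benchmark name
    date: str                 # ISO date of the run
    executed_by: str          # person who executed the run
    # Model configuration
    provider: str
    model: str                # model version identifier
    temperature: float        # one of the generation parameters
    # Prompt and role definition
    prompt: str               # what the model was asked to do
    role: str                 # perspective, e.g. "as a historian"
    # Results
    response: str             # the model's actual answer
    scores: dict[str, float] = field(default_factory=dict)  # e.g. {"f1": 0.8}
    # Usage data
    input_tokens: int = 0
    output_tokens: int = 0

    def cost(self, usd_per_m_input: float, usd_per_m_output: float) -> float:
        """API cost in USD, assuming prices are quoted per million tokens."""
        return (self.input_tokens * usd_per_m_input
                + self.output_tokens * usd_per_m_output) / 1_000_000


def mean_score_by_model(runs: list[TestRun], metric: str) -> dict[str, float]:
    """Average one metric per provider/model pair, skipping runs that lack it."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        if metric in run.scores:
            grouped[f"{run.provider}/{run.model}"].append(run.scores[metric])
    return {key: mean(values) for key, values in grouped.items()}
```

Keying the comparison on provider and model together keeps identically named models from different providers apart. Grouping on further configuration fields, such as `temperature`, would follow the same pattern.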

Search Results


Your search for Benchmark 'general_meeting_minutes__true' with Search Hidden 'False' returned 77 results, showing page 1 of 8.

Result 1 of 77

Test T1230 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Cite: Hindermann, Marti, Alberto, et al. (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 2 of 77

Test T1214 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 3 of 77

Test T1213 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 4 of 77

Test T1237 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 5 of 77

Test T1254 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 6 of 77

Test T1246 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 7 of 77

Test T1261 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 8 of 77

Test T1268 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 9 of 77

Test T1265 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}


Result 10 of 77

Test T1263 at 2026-07-03

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

