RISE Humanities Data Benchmark

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
Model configuration – provider, model version, temperature, and other generation parameters.
Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
Usage and cost data – token counts and calculated API costs.
Metadata – information like the test date, benchmark name, and person who executed it.
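
As a rough illustration, the five parts listed above could be held in a single record. The sketch below is a hypothetical container, not the benchmark's own data model: the class name RunRecord and all field names are assumptions chosen to mirror the list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical record mirroring the five parts of a test run."""
    # Prompt and role definition
    prompt: str
    role: Optional[str]
    # Model configuration
    provider: str
    model: str
    temperature: float
    # Results: raw response plus evaluation scores (e.g. fuzzy, F1)
    response: Optional[str] = None
    scores: dict = field(default_factory=dict)
    # Usage and cost data
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    # Metadata
    test_id: str = ""
    date: str = ""
    executed_by: str = ""


# Example values taken from the first result on this page.
run = RunRecord(
    prompt="Extract the table content from the page image ...",
    role=None,
    provider="openai",
    model="gpt-5.3-codex",
    temperature=1.0,
    scores={"fuzzy": 0.83},
    test_id="T0708",
    date="2026-03-23",
)
```

Keeping configuration and results in one record is what makes two runs directly comparable: any difference in score can be read next to the settings that produced it.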

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.

Result 1 of 20

Test T0708 at 2026-03-23

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	83.31 %
Test time	unknown seconds

Prompt

Extract the table content from the page image and return it in the required JSON structure.

The page contains a table of shareholders with the following columns:

1. N° D'ORDRE (row number)
2. NOM, PRÉNOMS ET DOMICILE DES ACTIONNAIRES (name and address)
3. ACTIONS O
4. ACTIONS P
5. NOMBRE DE VOIX
6. SIGNATURE DES ACTIONNAIRES OU DES MANDATAIRES

ROW STRUCTURE
-------------
Each entry begins with the row number in the first column.
All lines belonging to that number form one entry.

The second column contains BOTH name and address in the same field.
Split this content into:

- name
- address

The name is always the first line.
All following lines belong to the address.

Preserve the line breaks inside the address exactly as in the source.

Do NOT modify spelling or add accents. Transcribe the text exactly as shown.

Dashes inside addresses must be preserved.
Only remove a dash if it clearly separates the name from the address on the same line.

NUMERIC COLUMNS
---------------
Extract the values from the table columns:

- actions_o
- actions_p
- no_de_voix

Leave fields empty if no value is present.

SIGNATURE COLUMN
----------------
If the signature column contains handwriting or text:

signature_present = true
signature = transcription of the visible text

If the column is empty:

signature_present = false
signature = ""

TOTALS
------
Totals may appear below the table near the text "A REPORTER".
Only extract totals if explicit numbers are present.
Otherwise return empty strings for:

total_o
total_p
total_voix

DOCUMENT METADATA
-----------------
filename = {filename}
page_number = {page_number}
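
The row-structure and signature rules in the prompt above can be sketched in code. This is only an illustration of the rules as worded, applied to already transcribed text; the function names are hypothetical, and it assumes a separating dash is written as " - " on the name line. The sample strings are invented.

```python
def split_name_address(cell: str) -> tuple:
    """Split a combined cell: the first line is the name, the rest the address."""
    lines = cell.split("\n")
    name = lines[0]
    rest = lines[1:]
    # Assumption: a dash that separates name and address on the same line
    # appears as " - ". Dashes inside the address are left untouched.
    if " - " in name:
        name, same_line = name.split(" - ", 1)
        rest = [same_line] + rest
    # Line breaks inside the address are preserved exactly.
    return name.strip(), "\n".join(rest)


def signature_fields(cell: str) -> dict:
    """Derive signature_present and signature from the signature column."""
    text = cell.strip()
    return {"signature_present": bool(text), "signature": text}


# Invented sample cells, for illustration only.
plain = split_name_address("Dupont Jean\n12, rue du Rhone\nGeneve")
dashed = split_name_address("Dupont Jean - Bale\nSt. Alban-Vorstadt 5")
empty_signature = signature_fields("   ")
```

Note that the hyphen in "St. Alban-Vorstadt" survives, because only a dash on the name line with surrounding spaces is treated as a separator.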

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.83	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a
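
For this run the precision, recall and F1 columns are n/a, but when TP, FP and FN counts are available the micro-averaged values follow from the standard definitions. The sketch below uses those textbook formulas; the function name and the example counts are hypothetical and are not taken from the benchmark.

```python
def micro_scores(tp: int, fp: int, fn: int) -> dict:
    """Micro precision, recall and F1 from pooled TP/FP/FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented counts, for illustration only.
scores = micro_scores(tp=8, fp=2, fn=2)
```

Returning 0.0 when a denominator is zero is one common convention; a scorer could equally report n/a in that case.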

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 23.7K IT + 6.1K OT = 29.8K TT

Cost: 0.042$ + 0.085$ = 0.127$
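
The cost line is the sum of an input-token cost and an output-token cost. A minimal sketch of that arithmetic follows; since the pricing date is n/a, the per-million-token prices in the example are placeholders, not the provider's actual rates, and the function name is hypothetical.

```python
def run_cost(input_tokens: int, output_tokens: int,
             usd_per_million_input: float,
             usd_per_million_output: float) -> dict:
    """Compute token totals and API cost from per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * usd_per_million_input
    output_cost = output_tokens / 1_000_000 * usd_per_million_output
    return {
        "total_tokens": input_tokens + output_tokens,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }


# Placeholder prices (2.00 and 10.00 USD per million tokens) for illustration.
example = run_cost(1_000_000, 500_000, 2.0, 10.0)

# Consistency check on the figures reported above: 0.042 + 0.085.
reported_total = round(0.042 + 0.085, 3)
```

The reported token and cost figures are rounded for display, so sums of the displayed parts can differ from the displayed total in the last digit.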

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 2 of 20

Test T0706 at 2026-03-23

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	84.79 %
Test time	unknown seconds

Prompt

Filename: {filename}
Page: {page_number}

Prompt 2

CRITICAL RULES
--------------

1) Name and Address are in the same table field.
   - Extract them into separate fields: "name" and "address".
   - Remove only the visual splitting characters (e.g., a dash used only to visually separate name and address).
   - Preserve line breaks exactly as in the source.
   - Addresses may contain multiple dashes; preserve those. Only remove the dash(es) that act purely as a separator between name and address.

2) Transcription fidelity (no normalization):
   - Many entries are in French. The original often omits accents (é, è) and shows only "e".
   - Do NOT correct spelling and do NOT add French diacritics unless they are explicitly present in the original image.
   - Transcribe exactly what is visible, including punctuation and spacing as much as possible.

3) totals_actions must NOT be computed:
   - Do NOT calculate totals.
   - Only fill "total_actions" if the totals are explicitly written on the page.
   - You can always find the information for the totals at the end of the page after “A Reporter”
   - If there is no totals row/section, still output the JSON but set total_actions fields to empty strings OR omit total_actions only if your parser requires it (prefer empty strings if the field is required).

4) actions_o and actions_p content:
   - These fields may contain digits as well as "/" and "-" characters. Preserve them exactly as seen.

5) Column order may vary:
   - The order of the "actions_o" and "actions_p" columns can change from page to page.
   - Do NOT assume a fixed column order. Always verify from the header / column labels / layout in the original image which column corresponds to "O" and which to "P", and map values accordingly.
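
Rule 5 asks for values to be mapped by column label rather than by position. A minimal sketch of that idea on already transcribed cells is shown below; the function name is hypothetical, and it assumes the headers read "ACTIONS O" and "ACTIONS P". The sample values are invented.

```python
def map_action_columns(header: list, row: list) -> dict:
    """Assign row values to actions_o / actions_p by header label, not position."""
    mapping = {}
    for label, value in zip(header, row):
        key = label.strip().upper()
        if key == "ACTIONS O":
            mapping["actions_o"] = value  # kept verbatim, including "/" and "-"
        elif key == "ACTIONS P":
            mapping["actions_p"] = value
    return mapping


# Same values, two different column orders (invented sample data).
usual = map_action_columns(["ACTIONS O", "ACTIONS P"], ["12", "-"])
swapped = map_action_columns(["ACTIONS P", "ACTIONS O"], ["-", "12"])
```

Both calls yield the same mapping, which is the behaviour the rule is meant to guarantee when the columns are swapped on a page.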

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.85	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 24.0K IT + 5.0K OT = 29.0K TT

Cost: 0.042$ + 0.071$ = 0.113$

Result 3 of 20

Test T0705 at 2026-03-23

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	81.99 %
Test time	unknown seconds

Prompt

Please extract the metadata according to the given output format.
Name and Address are in the same table field. Your task is to extract them into separate fields.
Lose any dashes between name and address but preserve linebreaks.
Be aware: Addresses may contain multiple dashes; preserve them, only remove "visual splitting characters" between name and address.
Filename: {filename}
Page: {page_number}
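
The {filename} and {page_number} placeholders in the prompt above are filled in per page before the request is sent. One way this could be done is with Python's str.format, as sketched below; the benchmark's actual templating mechanism is not shown on this page, and the function name and file name are hypothetical.

```python
PROMPT_TEMPLATE = (
    "Please extract the metadata according to the given output format.\n"
    "Filename: {filename}\n"
    "Page: {page_number}"
)


def render_prompt(template: str, filename: str, page_number: int) -> str:
    """Fill the per-page placeholders of a prompt template."""
    return template.format(filename=filename, page_number=page_number)


# Hypothetical file name, for illustration only.
prompt = render_prompt(PROMPT_TEMPLATE, "page_001.jpg", 1)
```

A caveat with str.format is that any literal braces in a template, such as an inline JSON example, would have to be doubled to avoid being read as placeholders.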

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.82	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 21.2K IT + 5.8K OT = 27.0K TT

Cost: 0.037$ + 0.081$ = 0.118$

Result 4 of 20

Test T0709 at 2026-03-23

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	83.29 %
Test time	unknown seconds

Prompt

Please extract the metadata according to the given output format.

Name and address are in the same table field. Split them into:

- name
- address

The name is always the first line.
All following lines belong to the address.

Preserve line breaks exactly as they appear.
Do not normalize spelling or accents.

Steps:
1. Identify each row of the table using the row number.
2. For each row extract the table columns.
3. Split the name/address field.
4. Return the final JSON.

Document Metadata:
filename = {filename}
page_number = {page_number}

Return only the JSON.
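
The four steps in the prompt above describe a small pipeline. The sketch below applies the same steps to already transcribed text lines, purely to make the logic concrete; the model itself works from the page image. It assumes that only row-starting lines begin with a number followed by whitespace, and the function name, field names and sample lines are hypothetical.

```python
import json
import re

# Assumption: a new entry starts with a row number followed by whitespace.
ROW_START = re.compile(r"^(\d+)\s+(.*)$")


def extract_entries(lines: list) -> list:
    """Group lines into numbered entries and split name from address."""
    entries = []
    for line in lines:
        match = ROW_START.match(line)
        if match:
            # Steps 1 and 2: a row number opens a new entry.
            entries.append({"no": match.group(1),
                            "name": match.group(2),
                            "address_lines": []})
        elif entries:
            # Step 3: every following line belongs to the address.
            entries[-1]["address_lines"].append(line)
    return [
        {"no": e["no"], "name": e["name"],
         "address": "\n".join(e["address_lines"])}
        for e in entries
    ]


# Invented sample lines, for illustration only.
sample = ["1 Dupont Jean", "Rue du Rhone", "Geneve", "2 Muller Hans", "Bale"]
entries = extract_entries(sample)
# Step 4: return the final JSON, leaving non-ASCII characters unescaped.
output = json.dumps(entries, ensure_ascii=False)
```

The number-at-line-start heuristic would misfire on an address line such as a house number followed by a space, which is one reason the real task relies on the table layout in the image rather than on text alone.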

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.83	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 21.6K IT + 4.9K OT = 26.5K TT

Cost: 0.038$ + 0.069$ = 0.107$

Result 5 of 20

Test T0707 at 2026-03-23

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	83.63 %
Test time	unknown seconds

Prompt

Please extract the metadata according to the given output format.
Name and Address are in the same table field. Your task is to extract them into separate fields.
Lose any dashes between name and address but preserve linebreaks.
Be aware: Addresses may contain multiple dashes; preserve them, only remove "visual splitting characters" between name and address.
Each numbered entry may span multiple lines in the name/address field.
The presented image may contain rotated pages, extract everything, treat them as a MinutesPage.
Filename: {filename}
Page: {page_number}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.84	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 21.5K IT + 5.9K OT = 27.4K TT

Cost: 0.038$ + 0.083$ = 0.121$

Result 6 of 20

Test T0706 at 2026-03-17

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	84.12 %
Test time	unknown seconds

Prompt

Filename: {filename}
Page: {page_number}

Prompt 2

CRITICAL RULES
--------------

1) Name and Address are in the same table field.
   - Extract them into separate fields: "name" and "address".
   - Remove only the visual splitting characters (e.g., a dash used only to visually separate name and address).
   - Preserve line breaks exactly as in the source.
   - Addresses may contain multiple dashes; preserve those. Only remove the dash(es) that act purely as a separator between name and address.

2) Transcription fidelity (no normalization):
   - Many entries are in French. The original often omits accents (é, è) and shows only "e".
   - Do NOT correct spelling and do NOT add French diacritics unless they are explicitly present in the original image.
   - Transcribe exactly what is visible, including punctuation and spacing as much as possible.

3) totals_actions must NOT be computed:
   - Do NOT calculate totals.
   - Only fill "total_actions" if the totals are explicitly written on the page.
   - You can always find the information for the totals at the end of the page after “A Reporter”
   - If there is no totals row/section, still output the JSON but set total_actions fields to empty strings OR omit total_actions only if your parser requires it (prefer empty strings if the field is required).

4) actions_o and actions_p content:
   - These fields may contain digits as well as "/" and "-" characters. Preserve them exactly as seen.

5) Column order may vary:
   - The order of the "actions_o" and "actions_p" columns can change from page to page.
   - Do NOT assume a fixed column order. Always verify from the header / column labels / layout in the original image which column corresponds to "O" and which to "P", and map values accordingly.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.84	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 24.0K IT + 5.1K OT = 29.0K TT

Cost: 0.042$ + 0.071$ = 0.113$

Result 7 of 20

Test T0708 at 2026-03-17

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	82.96 %
Test time	unknown seconds

Prompt

Extract the table content from the page image and return it in the required JSON structure.

The page contains a table of shareholders with the following columns:

1. N° D'ORDRE (row number)
2. NOM, PRÉNOMS ET DOMICILE DES ACTIONNAIRES (name and address)
3. ACTIONS O
4. ACTIONS P
5. NOMBRE DE VOIX
6. SIGNATURE DES ACTIONNAIRES OU DES MANDATAIRES

ROW STRUCTURE
-------------
Each entry begins with the row number in the first column.
All lines belonging to that number form one entry.

The second column contains BOTH name and address in the same field.
Split this content into:

- name
- address

The name is always the first line.
All following lines belong to the address.

Preserve the line breaks inside the address exactly as in the source.

Do NOT modify spelling or add accents. Transcribe the text exactly as shown.

Dashes inside addresses must be preserved.
Only remove a dash if it clearly separates the name from the address on the same line.

NUMERIC COLUMNS
---------------
Extract the values from the table columns:

- actions_o
- actions_p
- no_de_voix

Leave fields empty if no value is present.

SIGNATURE COLUMN
----------------
If the signature column contains handwriting or text:

signature_present = true
signature = transcription of the visible text

If the column is empty:

signature_present = false
signature = ""

TOTALS
------
Totals may appear below the table near the text "A REPORTER".
Only extract totals if explicit numbers are present.
Otherwise return empty strings for:

total_o
total_p
total_voix

DOCUMENT METADATA
-----------------
filename = {filename}
page_number = {page_number}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.83	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 23.7K IT + 6.3K OT = 30.1K TT

Cost: 0.042$ + 0.089$ = 0.130$

Result 8 of 20

Test T0707 at 2026-03-17

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	83.70 %
Test time	unknown seconds

Prompt

Please extract the metadata according to the given output format.
Name and Address are in the same table field. Your task is to extract them into separate fields.
Lose any dashes between name and address but preserve linebreaks.
Be aware: Addresses may contain multiple dashes; preserve them, only remove "visual splitting characters" between name and address.
Each numbered entry may span multiple lines in the name/address field.
The presented image may contain rotated pages, extract everything, treat them as a MinutesPage.
Filename: {filename}
Page: {page_number}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.84	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 21.5K IT + 5.4K OT = 26.8K TT

Cost: 0.038$ + 0.075$ = 0.113$

Result 9 of 20

Test T0705 at 2026-03-17

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	85.59 %
Test time	unknown seconds

Prompt

Please extract the metadata according to the given output format.
Name and Address are in the same table field. Your task is to extract them into separate fields.
Lose any dashes between name and address but preserve linebreaks.
Be aware: Addresses may contain multiple dashes; preserve them, only remove "visual splitting characters" between name and address.
Filename: {filename}
Page: {page_number}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.86	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 21.2K IT + 5.4K OT = 26.6K TT

Cost: 0.037$ + 0.076$ = 0.113$

Result 10 of 20

Test T0709 at 2026-03-17

{'document-type': ['minutes'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['it', 'fr', 'de'], 'layout': ['table'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5.3-codex

Temperature	1.0
Dataclass	MinutesPage

Normalized Score	78.49 %
Test time	unknown seconds

Prompt

Please extract the metadata according to the given output format.

Name and address are in the same table field. Split them into:

- name
- address

The name is always the first line.
All following lines belong to the address.

Preserve line breaks exactly as they appear.
Do not normalize spelling or accents.

Steps:
1. Identify each row of the table using the row number.
2. For each row extract the table columns.
3. Split the name/address field.
4. Return the final JSON.

Document Metadata:
filename = {filename}
page_number = {page_number}

Return only the JSON.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
0.78	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 21.6K IT + 4.6K OT = 26.2K TT

Cost: 0.038$ + 0.065$ = 0.103$
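
The ten runs above cover five tests, each executed on 2026-03-17 and 2026-03-23 with the same model and temperature. A short sketch of how they could be compared per test is given below, using the normalized scores reported on this page; the function name and the choice of mean and spread as summary statistics are assumptions.

```python
from collections import defaultdict

# (test id, date, normalized score in %) as reported in the results above.
RUNS = [
    ("T0708", "2026-03-23", 83.31), ("T0706", "2026-03-23", 84.79),
    ("T0705", "2026-03-23", 81.99), ("T0709", "2026-03-23", 83.29),
    ("T0707", "2026-03-23", 83.63), ("T0706", "2026-03-17", 84.12),
    ("T0708", "2026-03-17", 82.96), ("T0707", "2026-03-17", 83.70),
    ("T0705", "2026-03-17", 85.59), ("T0709", "2026-03-17", 78.49),
]


def summarize(runs: list) -> dict:
    """Mean and max-min spread of the normalized score per test id."""
    by_test = defaultdict(list)
    for test_id, _date, score in runs:
        by_test[test_id].append(score)
    return {
        test_id: {"mean": sum(s) / len(s), "spread": max(s) - min(s)}
        for test_id, s in by_test.items()
    }


summary = summarize(RUNS)
for test_id in sorted(summary):
    stats = summary[test_id]
    print(f"{test_id}: mean {stats['mean']:.2f} %, spread {stats['spread']:.2f}")
```

With two runs per test the spread is simply the difference between the two dates. It reaches 4.80 points for T0709, so small score differences between prompts should be read with some caution at temperature 1.0.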
