RISE Humanities Data Benchmark

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

- Prompt and role definition: what the model was asked to do and from what perspective (e.g. “as a historian”).
- Model configuration: provider, model version, temperature, and other generation parameters.
- Results: the model’s actual response and its evaluation (scores such as F1 or accuracy).
- Usage and cost data: token counts and calculated API costs.
- Metadata: information such as the test date, benchmark name, and the person who executed it.
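
As an illustration only, the components listed above could be held in a record like the one below. This is a minimal sketch, not the benchmark's own data model: the class names `ModelConfig` and `RunRecord` and all field names are assumptions, and the sample values are taken from test T0343 on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfig:
    provider: str                      # e.g. "openai"
    model: str                         # e.g. "gpt-4.1-mini-2025-04-14"
    temperature: float = 0.5


@dataclass
class RunRecord:
    """Hypothetical container for one test run (field names are assumed)."""
    test_id: str                       # e.g. "T0343"
    date: str                          # date of the run, ISO format
    prompt: str                        # what the model was asked to do
    config: ModelConfig
    role: Optional[str] = None         # e.g. "as a historian"
    response: Optional[str] = None     # the model's raw answer
    scores: dict = field(default_factory=dict)
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    benchmark: Optional[str] = None    # benchmark name
    executed_by: Optional[str] = None  # person who ran the test

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


run = RunRecord(
    test_id="T0343",
    date="2026-01-25",
    prompt="The image you are presented with stems from a digitized book "
           "containing lists of companies.",
    config=ModelConfig("openai", "gpt-4.1-mini-2025-04-14", 0.5),
    scores={"f1_micro": 0.49, "f1_macro": 0.47},
    input_tokens=43_600,
    output_tokens=9_100,
    cost_usd=0.032,
)
print(run.config.model, run.total_tokens)
```

Keeping the configuration in its own object makes it straightforward to group or compare runs that share a provider, model, and temperature.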

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.

Result 101 of 186

Test T0343 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-4.1-mini-2025-04-14

Temperature	0.5
Dataclass	ListPage

Normalized Score	46.93 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}
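
The output rules in this prompt can be checked mechanically. The sketch below is one possible validator, not the benchmark's actual parsing code: the function name `validate_entries`, the page ID `p1`, and the company name `Example AG` are all hypothetical. It treats the field list in the prompt as a description of the expected keys rather than as literal JSON.

```python
import json

# Field names taken from the prompt's description of each entry.
REQUIRED_FIELDS = ("entry_id", "company_name", "location")


def validate_entries(raw: str, page_id: str) -> list[str]:
    """Return a list of problems found in a model response; empty means OK."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["top-level value is not an array"]

    problems = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"item {i}: not an object")
            continue
        for name in REQUIRED_FIELDS:
            if name not in entry:
                problems.append(f"item {i}: missing '{name}'")
        entry_id = entry.get("entry_id")
        if isinstance(entry_id, str) and not entry_id.startswith(f"{page_id}-"):
            problems.append(f"item {i}: entry_id does not start with '{page_id}-'")
        location = entry.get("location")
        if location is not None and not isinstance(location, str):
            problems.append(f"item {i}: location must be a string or null")
    return problems


good = '[{"entry_id": "p1-1", "company_name": "Example AG", "location": null}]'
bad = '[{"entry_id": "p1-1", "company_name": "Example AG"}]'
print(validate_entries(good, "p1"))  # []
print(validate_entries(bad, "p1"))   # ["item 0: missing 'location'"]
```

Returning a list of problems instead of raising on the first one makes it easier to see how far a response deviates from the requested format.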

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.49	0.47	0.50	0.48	15	481	489	511
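
The micro-averaged figures in this table follow from the TP, FP, and FN counts using the standard definitions of precision, recall, and F1. The sketch below recomputes them for this run; the function name `micro_metrics` is an assumption, and the benchmark's own scoring code may differ in details such as how entries are matched. The macro F1 cannot be recovered from these totals alone, since it presumably averages per-instance scores.

```python
def micro_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Micro precision, recall, and F1 from pooled counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Counts reported for test T0343.
p, r, f1 = micro_metrics(tp=481, fp=489, fn=511)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 0.50 0.48 0.49
```

The guards against empty denominators matter for runs that return nothing usable, where TP and FP are both zero.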

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 43.6K IT + 9.1K OT = 52.7K TT

Cost: 0.017$ + 0.015$ = 0.032$
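
The totals above are plain sums of the input and output figures. As a small sketch of that arithmetic (the function name `summarize_usage` is an assumption, and the token counts are the rounded values shown on this page, so derived figures are approximate):

```python
def summarize_usage(input_tokens, output_tokens, input_cost, output_cost):
    """Return total tokens, total cost, and blended cost per 1K tokens."""
    total_tokens = input_tokens + output_tokens
    total_cost = round(input_cost + output_cost, 3)
    cost_per_1k = total_cost / total_tokens * 1000
    return total_tokens, total_cost, cost_per_1k


# Figures reported for test T0343: 43.6K IT + 9.1K OT, 0.017$ + 0.015$.
tokens, cost, per_1k = summarize_usage(43_600, 9_100, 0.017, 0.015)
print(tokens, cost, f"{per_1k:.5f}")  # 52700 0.032 0.00061
```

A blended cost per 1K tokens is only one way to compare runs; because input and output tokens are usually priced differently, runs with long outputs will look more expensive per token.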

Cite: Hindermann, Marti, Alberto, et al. (2025). RISE-UNIBAS/humanities_data_benchmark. DOI: 10.5281/zenodo.16941752

Result 102 of 186

Test T0540 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	mistral
Model	mistral-large-2512

Temperature	0.5
Dataclass	ListPage

Normalized Score	0.27 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.00	0.00	0.01	0.00	15	2	292	990

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 8.2K IT + 10.0K OT = 18.2K TT

Cost: 0.004$ + 0.015$ = 0.019$

Result 103 of 186

Test T0338 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-4o-2024-08-06

Temperature	0.5
Dataclass	ListPage

Normalized Score	36.53 %
Test time	unknown

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.37	0.37	0.37	0.38	15	375	637	617

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 18.8K IT + 8.6K OT = 27.4K TT

Cost: 0.047$ + 0.086$ = 0.133$

Result 104 of 186

Test T0344 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-4.1-mini-2025-04-14

Temperature	0.5
Dataclass	ListPage

Normalized Score	36.93 %
Test time	unknown

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.39	0.37	0.39	0.39	15	387	606	605

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 38.5K IT + 8.8K OT = 47.3K TT

Cost: 0.015$ + 0.014$ = 0.029$

Result 105 of 186

Test T0393 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3-vl-8b-thinking

Temperature	0.5
Dataclass	ListPage

Normalized Score	0.00 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.00	0.00	0.00	0.00	15	0	0	992

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 28.2K IT + 13.9K OT = 42.1K TT

Cost: 0.003$ + 0.019$ = 0.022$

Result 106 of 186

Test T0351 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5-nano-2025-08-07

Temperature	0.5
Dataclass	ListPage

Normalized Score	45.00 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.45	0.45	0.47	0.43	15	429	492	563

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 40.8K IT + 92.7K OT = 133.5K TT

Cost: 0.002$ + 0.037$ = 0.039$

Result 107 of 186

Test T0397 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3-vl-30b-a3b-instruct

Temperature	0.5
Dataclass	ListPage

Normalized Score	41.40 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.40	0.41	0.38	0.41	15	410	664	582

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 46.9K IT + 13.7K OT = 60.6K TT

Cost: 0.006$ + 0.007$ = 0.013$

Result 108 of 186

Test T0508 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	genai
Model	gemini-3-flash-preview

Temperature	0.5
Dataclass	ListPage

Normalized Score	39.60 %
Test time	unknown

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.40	0.40	0.39	0.41	15	407	646	585

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 16.7K IT + 11.5K OT = 28.2K TT

Cost: 0.008$ + 0.035$ = 0.043$

Result 109 of 186

Test T0552 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	mistral
Model	ministral-14b-2512

Temperature	0.5
Dataclass	ListPage

Normalized Score	0.00 %
Test time	unknown

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.00	0.00	0.00	0.00	15	0	705	992

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 3.0K IT + 10.1K OT = 13.1K TT

Cost: 0.001$ + 0.002$ = 0.003$

Result 110 of 186

Test T0349 at 2026-01-25

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openai
Model	gpt-5-mini-2025-08-07

Temperature	0.5
Dataclass	ListPage

Normalized Score	38.20 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.37	0.38	0.36	0.38	15	380	667	612

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 34.1K IT + 41.2K OT = 75.3K TT

Cost: 0.009$ + 0.082$ = 0.091$
