RISE Humanities Data Benchmark

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
Model configuration – provider, model version, temperature, and other generation parameters.
Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
Usage and cost data – token counts and calculated API costs.
Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
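As a rough illustration, the components listed above could be modeled as a single record. This is only a sketch: the class and all field names are hypothetical and do not reflect the benchmark's actual schema. The example values are taken from Test T0965.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TestRun:
    # All field names are illustrative assumptions, not the benchmark's schema.
    test_id: str
    date: str
    provider: str
    model: str
    temperature: float
    prompt: str
    role: Optional[str] = None          # e.g. "as a historian"
    response: Optional[str] = None      # raw model output, if any
    scores: dict = field(default_factory=dict)
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    def config_key(self) -> tuple:
        """Key for grouping runs that share the same model configuration."""
        return (self.provider, self.model, self.temperature)


run = TestRun(
    test_id="T0965",
    date="2026-04-21",
    provider="openrouter",
    model="qwen/qwen3.5-plus-20260216",
    temperature=0.5,
    prompt="(prompt text)",
    scores={"f1_micro": 0.56},
)
```

Grouping runs by `config_key()` is one way to compare configurations across benchmarks; the real tool may organize its data differently.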

Result 51 of 186

Test T0965 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
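The tag line above is written as a Python dict literal (single quotes), so it is not valid JSON as shown. A minimal sketch, assuming one wants to read such a line and filter runs by tag; the helper `has_tags` is hypothetical:

```python
import ast

raw = ("{'document-type': ['book-page'], 'writing': ['printed'], "
       "'century': [20], 'language': ['en', 'de'], 'layout': ['list'], "
       "'entry-type': ['company'], 'task': ['information-extraction']}")

# literal_eval only evaluates literals, so it is safe for untrusted text.
tags = ast.literal_eval(raw)


def has_tags(tags: dict, wanted: dict) -> bool:
    """True if every wanted value appears under the corresponding tag key."""
    return all(value in tags.get(key, []) for key, value in wanted.items())


print(has_tags(tags, {"language": "de", "layout": "list"}))
```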

Configuration

Provider	openrouter
Model	qwen/qwen3.5-plus-20260216

Temperature	0.5
Dataclass	ListPage

Normalized Score	55.80 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}
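The prompt above mixes a `{page_id}` placeholder with literal JSON braces, so Python's `str.format` would fail on it. A minimal sketch of one way to fill the placeholder and check a model response against the three fields the prompt asks for. How the benchmark itself does this is not shown in the source; the function names and the page ID `page-0001` are assumptions.

```python
import json

REQUIRED_FIELDS = ("entry_id", "company_name", "location")


def fill_prompt(template: str, page_id: str) -> str:
    # Plain replacement leaves the literal braces of the JSON schema untouched.
    return template.replace("{page_id}", page_id)


def validate_entries(raw: str) -> list:
    """Parse a response and check it is an array of objects with the required fields."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of entries")
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        missing = [k for k in REQUIRED_FIELDS if k not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing fields: {missing}")
        if entry["location"] is not None and not isinstance(entry["location"], str):
            raise ValueError(f"entry {i}: location must be a string or null")
    return data


print(fill_prompt('e.g. "{page_id}-1"', "page-0001"))  # e.g. "page-0001-1"
```

A response that fails such a check is one plausible reason a run could show "no valid result", though the source does not state the cause.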

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.56	0.56	0.55	0.57	15	567	455	425
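The micro-averaged values in this table can be reproduced from the TP, FP and FN counts using the standard definitions of precision, recall and F1. The sketch below assumes the benchmark uses those definitions; the function name is hypothetical.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from pooled true/false positive/negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts from Test T0965: TP=567, FP=455, FN=425
p, r, f1 = micro_scores(567, 455, 425)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.55 0.57 0.56
```

The macro F1 cannot be recomputed this way, since it averages per-instance scores that are not listed here.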

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 46.8K IT + 49.5K OT = 96.3K TT

Cost: 0.012$ + 0.077$ = 0.089$
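Since no pricing date is given, the per-token rates behind these costs are not stated. They can be estimated from the reported token counts (IT = input tokens, OT = output tokens) and costs. The sketch below is only an approximation: both inputs are rounded in the display, so the implied rates are not the provider's actual prices.

```python
def implied_rate_per_million(cost_usd: float, tokens: int) -> float:
    """Approximate USD price per one million tokens from a reported cost."""
    return cost_usd / tokens * 1_000_000


# Values from Test T0965 (rounded as displayed)
input_rate = implied_rate_per_million(0.012, 46_800)    # roughly 0.26 USD per 1M
output_rate = implied_rate_per_million(0.077, 49_500)   # roughly 1.56 USD per 1M
total_cost = 0.012 + 0.077                              # 0.089 USD, as reported

print(f"{input_rate:.2f} {output_rate:.2f} {total_cost:.3f}")
```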

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 52 of 186

Test T1018 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	google/gemma-4-31b-it-20260402

Temperature	0.5
Dataclass	ListPage

Normalized Score	29.33 %
Test time	unknown

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.31	0.29	0.33	0.29	15	286	583	706

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 4.8K IT + 12.8K OT = 17.6K TT

Cost: 0.001$ + 0.005$ = 0.005$


Result 53 of 186

Test T0913 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3.5-122b-a10b-20260224

Temperature	0.5
Dataclass	ListPage

Normalized Score	54.93 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.55	0.55	0.55	0.55	15	549	457	443

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 46.8K IT + 15.2K OT = 62.0K TT

Cost: 0.012$ + 0.032$ = 0.044$


Result 54 of 186

Test T0966 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3.5-plus-20260216

Temperature	0.5
Dataclass	ListPage

Normalized Score	42.13 %
Test time	unknown

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.41	0.42	0.40	0.41	15	410	617	582

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 41.3K IT + 43.1K OT = 84.4K TT

Cost: 0.011$ + 0.067$ = 0.078$


Result 55 of 186

Test T0900 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3.6-plus-04-02

Temperature	0.5
Dataclass	ListPage

Normalized Score	53.80 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.53	0.54	0.53	0.53	15	525	463	467

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 46.7K IT + 63.4K OT = 110.1K TT

Cost: 0.015$ + 0.124$ = 0.139$


Result 56 of 186

Test T0978 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3.5-flash-20260224

Temperature	0.5
Dataclass	ListPage

Normalized Score	54.47 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.54	0.54	0.53	0.55	15	543	483	449

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 46.8K IT + 15.8K OT = 62.6K TT

Cost: 0.003$ + 0.004$ = 0.007$


Result 57 of 186

Test T0992 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3.5-9b-20260310

Temperature	0.5
Dataclass	ListPage

Normalized Score	24.40 %
Test time	unknown

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.22	0.24	0.27	0.19	15	190	512	802

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 40.7K IT + 74.9K OT = 115.6K TT

Cost: 0.004$ + 0.011$ = 0.015$


Result 58 of 186

Test T1004 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	google/gemma-4-26b-a4b-it-20260403

Temperature	0.5
Dataclass	ListPage

Normalized Score	14.73 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.14	0.15	0.14	0.15	15	144	919	848

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 10.4K IT + 19.9K OT = 30.3K TT

Cost: 0.001$ + 0.007$ = 0.008$


Result 59 of 186

Test T0991 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	openrouter
Model	qwen/qwen3.5-9b-20260310

Temperature	0.5
Dataclass	ListPage

Normalized Score	56.93 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.57	0.57	0.57	0.57	15	567	426	425

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 47.3K IT + 59.2K OT = 106.5K TT

Cost: 0.005$ + 0.009$ = 0.014$


Result 60 of 186

Test T1030 at 2026-04-21

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	anthropic
Model	claude-opus-4-7

Temperature	0.5
Dataclass	ListPage

Normalized Score	46.00 %
Test time	unknown

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro	F1 macro	Micro Precision	Micro Recall	Instances	TP	FP	FN
n/a	0.45	0.46	0.45	0.45	15	447	538	545

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 79.0K IT + 18.0K OT = 96.9K TT

Cost: 0.395$ + 0.449$ = 0.844$

