RISE Humanities Data Benchmark

A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.

A test run includes:

Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
Model configuration – provider, model version, temperature, and other generation parameters.
Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
Usage and cost data – token counts and calculated API costs.
Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.

Result 1 of 198

Test T1198 at 2026-07-02

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	anthropic
Model	claude-fable-5

Temperature	0.5
Dataclass	ListPage

Normalized Score	51.47 %
Test time	unknown seconds

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.51	0.51	0.51	0.51	15	501	484	491
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 73.0K IT + 16.9K OT = 89.9K TT

Cost: 0.730$ + 0.844$ = 1.574$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 2 of 198

Test T1199 at 2026-07-02

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	anthropic
Model	claude-fable-5

Temperature	0.5
Dataclass	ListPage

Normalized Score	43.53 %
Test time	unknown seconds

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.43	0.44	0.41	0.45	15	444	627	548
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 65.8K IT + 18.7K OT = 84.5K TT

Cost: 0.658$ + 0.935$ = 1.594$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 3 of 198

Test T1186 at 2026-07-01

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	anthropic
Model	claude-sonnet-5

Temperature	0.5
Dataclass	ListPage

Normalized Score	50.00 %
Test time	unknown seconds

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.49	0.50	0.47	0.51	15	510	564	482
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 66.8K IT + 16.8K OT = 83.6K TT

Cost: 0.134$ + 0.168$ = 0.302$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 4 of 198

Test T1185 at 2026-07-01

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	anthropic
Model	claude-sonnet-5

Temperature	0.5
Dataclass	ListPage

Normalized Score	49.87 %
Test time	unknown seconds

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.49	0.50	0.49	0.49	15	489	505	503
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 73.9K IT + 17.3K OT = 91.2K TT

Cost: 0.148$ + 0.173$ = 0.321$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 5 of 198

Test T1172 at 2026-06-29

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	genai
Model	gemini-3.1-flash-lite

Temperature	0.5
Dataclass	ListPage

Normalized Score	53.07 %
Test time	unknown seconds

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.54	0.53	0.54	0.54	15	533	460	459
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 22.3K IT + 16.3K OT = 38.6K TT

Cost: 0.006$ + 0.025$ = 0.030$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 6 of 198

Test T1173 at 2026-06-29

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	genai
Model	gemini-3.1-flash-lite

Temperature	0.5
Dataclass	ListPage

Normalized Score	38.07 %
Test time	unknown seconds

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.40	0.38	0.41	0.39	15	388	569	604
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 16.7K IT + 14.4K OT = 31.1K TT

Cost: 0.004$ + 0.022$ = 0.026$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 7 of 198

Test T1159 at 2026-06-22

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	scicore
Model	qwen35-397b-a17b-fp8

Temperature	0.5
Dataclass	ListPage

Normalized Score	45.00 %
Test time	unknown seconds

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.43	0.45	0.42	0.44	15	439	602	553
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 47.3K IT + 77.2K OT = 124.5K TT

Cost: 0.000$ + 0.000$ = 0.000$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 8 of 198

Test T1160 at 2026-06-22

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	scicore
Model	qwen35-397b-a17b-fp8

Temperature	0.5
Dataclass	ListPage

Normalized Score	23.60 %
Test time	unknown seconds

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.23	0.24	0.22	0.25	15	250	899	742
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 41.8K IT + 51.9K OT = 93.7K TT

Cost: 0.000$ + 0.000$ = 0.000$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 9 of 198

Test T1111 at 2026-06-08

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	x-ai
Model	grok-4.3

Temperature	0.5
Dataclass	ListPage

Normalized Score	41.47 %
Test time	unknown seconds

Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.43	0.41	0.45	0.42	15	413	514	579
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 40.5K IT + 10.1K OT = 50.6K TT

Cost: 0.051$ + 0.025$ = 0.076$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Result 10 of 198

Test T1110 at 2026-06-08

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration

Provider	x-ai
Model	grok-4.3

Temperature	0.5
Dataclass	ListPage

Normalized Score	55.40 %
Test time	unknown seconds

Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring

Fuzzy Score	F1 micro / macro		Micro precision/recall		Tue/False Positives
n/a	0.57	0.55	0.58	0.56	15	558	408	434
			Micro Precision	Micro Recall	Instances	TP	FP	FN

Costs / Pricing

Pricing Date: n/a, n/a.

Tokens: 45.5K IT + 11.2K OT = 56.6K TT

Cost: 0.057$ + 0.028$ = 0.085$

Cite: Hindermann, Marti, Alberto, et al., (2025). RISE-UNIBAS/humanities_data_benchmark, 10.5281/zenodo.16941752

Search Test Runs

Search Results
Show compact results Refine Search New Search

Download JSON Download CSV

Test T1198 at 2026-07-02

Test T1199 at 2026-07-02

Test T1186 at 2026-07-01

Test T1185 at 2026-07-01

Test T1172 at 2026-06-29

Test T1173 at 2026-06-29

Test T1159 at 2026-06-22

Test T1160 at 2026-06-22

Test T1111 at 2026-06-08

Test T1110 at 2026-06-08

Search Test Runs

Search Results Show compact results Refine Search New Search Download Download JSON Download CSV

Test T1198 at 2026-07-02

Test T1199 at 2026-07-02

Test T1186 at 2026-07-01

Test T1185 at 2026-07-01

Test T1172 at 2026-06-29

Test T1173 at 2026-06-29

Test T1159 at 2026-06-22

Test T1160 at 2026-06-22

Test T1111 at 2026-06-08

Test T1110 at 2026-06-08

Search Results
Show compact results Refine Search New Search

Download JSON Download CSV