RISE Humanities Data Benchmark, 0.5.0-pre1

Search Test Runs


A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
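As a rough illustration, the parts of a test run listed above could be held in a single record. The `TestRun` class and its field names are hypothetical, not the benchmark's own schema. The values are those shown for Test T0689 further down this page, with token counts taken from the rounded figures "22.3K" and "18.0K".

```python
from dataclasses import dataclass, field


@dataclass
class TestRun:
    """Hypothetical record grouping the parts of a test run listed above."""

    # Metadata
    test_id: str
    date: str
    benchmark: str
    # Model configuration
    provider: str
    model: str
    temperature: float
    # Prompt (an excerpt is enough for this illustration)
    prompt: str
    # Results: the evaluation score, as a percentage
    normalized_score: float
    # Usage and cost data
    input_tokens: int
    output_tokens: int
    cost_usd: float
    # Free-form tags such as language or layout
    tags: dict = field(default_factory=dict)

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


run = TestRun(
    test_id="T0689",
    date="2026-03-16",
    benchmark="company_lists__true",
    provider="genai",
    model="gemini-3.1-pro-preview",
    temperature=0.5,
    prompt="Your task is to extract structured information about each company listed on the page.",
    normalized_score=55.47,
    input_tokens=22_300,
    output_tokens=18_000,
    cost_usd=0.261,
)
print(run.model, run.total_tokens)  # gemini-3.1-pro-preview 40300
```

Because the listing rounds token counts to the nearest hundred, a sum computed from the rounded parts (40,300) can differ slightly from the listed total (40.4K).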

Search Results

Your search for Benchmark 'company_lists__true' with Search Hidden 'False' returned 170 results, showing page 3 of 17.
Result 21 of 170

Test T0689 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
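The tag line on each result appears to be the printed form of a Python dictionary whose values are lists. Assuming it is always rendered this way, it can be read back safely with `ast.literal_eval` and used to filter runs. The helper `has_tag` is hypothetical and not part of the benchmark.

```python
import ast

# Tag line copied from the listing; it is written as a Python dict literal.
raw_tags = (
    "{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], "
    "'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], "
    "'task': ['information-extraction']}"
)

# literal_eval only accepts literals, so no code in the string is executed.
tags = ast.literal_eval(raw_tags)


def has_tag(tags: dict, key: str, value) -> bool:
    """Return True if `value` is listed under `key` (hypothetical helper)."""
    return value in tags.get(key, [])


print(has_tag(tags, "language", "de"))  # True
print(has_tag(tags, "century", 19))     # False
```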

Configuration
Provider: genai
Model: gemini-3.1-pro-preview
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 55.47 %
Test time: unknown
Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}
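The prompt describes the expected output as a JSON array of objects with `entry_id`, `company_name`, and `location`, where `location` may be null and `entry_id` follows the pattern `{page_id}-1`. The sketch below checks a response against only those stated rules. It is not the benchmark's `ListPage` dataclass, whose definition is not shown here; the function name, the page ID `p1`, and the sample entry are invented for illustration.

```python
import json
import re


def validate_page_output(text: str, page_id: str) -> list[str]:
    """Return the problems found in a model response (an empty list if none).

    Checks only the rules stated in the prompt: a JSON array of objects with
    entry_id, company_name and location, where location may be null.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["top-level value is not an array"]

    problems = []
    # entry_id is expected to look like '{page_id}-1', '{page_id}-2', ...
    id_pattern = re.compile(re.escape(page_id) + r"-\d+")
    required = {"entry_id", "company_name", "location"}
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"item {i} is not an object")
            continue
        missing = required - set(entry)
        if missing:
            problems.append(f"item {i} lacks {sorted(missing)}")
            continue
        if not id_pattern.fullmatch(str(entry["entry_id"])):
            problems.append(f"item {i} has unexpected entry_id {entry['entry_id']!r}")
        if not isinstance(entry["company_name"], str) or not entry["company_name"].strip():
            problems.append(f"item {i} has an empty or non-string company_name")
        if entry["location"] is not None and not isinstance(entry["location"], str):
            problems.append(f"item {i} has a location that is neither a string nor null")
    return problems


# Invented sample response for a hypothetical page ID "p1"
sample = '[{"entry_id": "p1-1", "company_name": "Example AG", "location": null}]'
print(validate_page_output(sample, "p1"))  # []
```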

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.56 / 0.55
Micro precision / recall: 0.56 / 0.55
Instances: 15
TP / FP / FN: 550 / 433 / 442
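Assuming the standard definitions, the micro-averaged scores follow directly from the pooled counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch reproduces the values reported for T0689; `micro_scores` is a hypothetical helper name. The macro F1 presumably averages per-instance scores over the 15 instances, so it cannot be recomputed from these totals.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 from pooled entry counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Counts reported for Test T0689
p, r, f1 = micro_scores(tp=550, fp=433, fn=442)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.56 R=0.55 F1=0.56
```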
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 22.3K input + 18.0K output = 40.4K total
Cost: $0.045 input, $0.217 output, $0.261 total
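The cost line can be read as an input cost plus an output cost: for every run on this page, the first two amounts add up to the third within rounding. Dividing each amount by its token count gives the per-million-token rate implied by the listing. This is only an approximation, since both token counts and costs are rounded, and it is not a statement of the provider's official price. The function name `implied_rate_per_million` is hypothetical.

```python
def implied_rate_per_million(cost_usd: float, tokens: int) -> float:
    """USD per one million tokens implied by a cost and its token count."""
    return cost_usd / tokens * 1_000_000


# Rounded figures listed for Test T0689
input_cost, output_cost, total_cost = 0.045, 0.217, 0.261
input_rate = implied_rate_per_million(input_cost, 22_300)
output_rate = implied_rate_per_million(output_cost, 18_000)

# Input plus output matches the listed total to within rounding.
assert abs(input_cost + output_cost - total_cost) <= 0.002
print(f"input ~ ${input_rate:.2f}/M tokens, output ~ ${output_rate:.2f}/M tokens")
# input ~ $2.02/M tokens, output ~ $12.06/M tokens
```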
Result 22 of 170

Test T0649 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-sonnet-4-6
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 43.67 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.42 / 0.44
Micro precision / recall: 0.40 / 0.43
Instances: 15
TP / FP / FN: 431 / 640 / 561
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 35.8K input + 13.9K output = 49.7K total
Cost: $0.107 input, $0.209 output, $0.316 total
Result 23 of 170

Test T0648 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-sonnet-4-6
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 48.73 %
Test time: unknown
Prompt

Same prompt as Test T0689 (Result 21 above).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.48 / 0.49
Micro precision / recall: 0.48 / 0.47
Instances: 15
TP / FP / FN: 471 / 520 / 521
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 41.6K input + 15.9K output = 57.5K total
Cost: $0.125 input, $0.238 output, $0.363 total
Result 24 of 170

Test T0636 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-opus-4-6
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 52.80 %
Test time: unknown
Prompt

Same prompt as Test T0689 (Result 21 above).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.53 / 0.53
Micro precision / recall: 0.53 / 0.53
Instances: 15
TP / FP / FN: 523 / 464 / 469
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 41.6K input + 13.8K output = 55.4K total
Cost: $0.208 input, $0.346 output, $0.554 total
Result 25 of 170

Test T0637 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-opus-4-6
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 48.40 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.49 / 0.48
Micro precision / recall: 0.47 / 0.51
Instances: 15
TP / FP / FN: 502 / 572 / 490
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 35.8K input + 13.8K output = 49.7K total
Cost: $0.179 input, $0.346 output, $0.525 total
Result 26 of 170

Test T0678 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: genai
Model: gemini-3.1-flash-lite-preview
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 42.00 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.44 / 0.42
Micro precision / recall: 0.45 / 0.44
Instances: 15
TP / FP / FN: 432 / 537 / 560
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 16.7K input + 14.5K output = 31.2K total
Cost: $0.004 input, $0.022 output, $0.026 total
Result 27 of 170

Test T0661 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5.4-2026-03-05
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 47.13 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.48 / 0.47
Micro precision / recall: 0.48 / 0.48
Instances: 15
TP / FP / FN: 474 / 519 / 518
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 43.2K input + 9.1K output = 52.4K total
Cost: $0.108 input, $0.137 output, $0.245 total
Result 28 of 170

Test T0660 at 2026-03-16

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5.4-2026-03-05
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 54.93 %
Test time: unknown
Prompt

Same prompt as Test T0689 (Result 21 above).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.56 / 0.55
Micro precision / recall: 0.56 / 0.56
Instances: 15
TP / FP / FN: 553 / 436 / 439
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 48.3K input + 9.0K output = 57.3K total
Cost: $0.121 input, $0.135 output, $0.256 total
Result 29 of 170

Test T0497 at 2026-02-10

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5.2-2025-12-11
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 41.80 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.42 / 0.42
Micro precision / recall: 0.42 / 0.42
Instances: 15
TP / FP / FN: 421 / 574 / 571
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 29.0K input + 10.3K output = 39.4K total
Cost: $0.051 input, $0.145 output, $0.196 total
Result 30 of 170

Test T0496 at 2026-02-10

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5.2-2025-12-11
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 35.53 %
Test time: unknown
Prompt

Same prompt as Test T0689 (Result 21 above).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.32 / 0.36
Micro precision / recall: 0.33 / 0.31
Instances: 15
TP / FP / FN: 304 / 609 / 688
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 34.1K input + 9.6K output = 43.8K total
Cost: $0.060 input, $0.135 output, $0.194 total
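The figures on this page can be tabulated to compare the two prompt variants: the full prompt shown for T0689, and the two-line prompt shown for T0649. The labels "full" and "minimal" are chosen here for convenience and are not the benchmark's terms. The sketch also checks that TP + FN, the number of ground-truth entries, is identical across runs, as it should be if every run was scored against the same 15 instances. Each configuration appears only once on this page and was run at temperature 0.5, so the differences are indicative rather than conclusive.

```python
# Figures copied from the ten runs on this page:
# (test_id, model, prompt variant, normalized score %, TP, FN)
runs = [
    ("T0689", "gemini-3.1-pro-preview", "full", 55.47, 550, 442),
    ("T0649", "claude-sonnet-4-6", "minimal", 43.67, 431, 561),
    ("T0648", "claude-sonnet-4-6", "full", 48.73, 471, 521),
    ("T0636", "claude-opus-4-6", "full", 52.80, 523, 469),
    ("T0637", "claude-opus-4-6", "minimal", 48.40, 502, 490),
    ("T0678", "gemini-3.1-flash-lite-preview", "minimal", 42.00, 432, 560),
    ("T0661", "gpt-5.4-2026-03-05", "minimal", 47.13, 474, 518),
    ("T0660", "gpt-5.4-2026-03-05", "full", 54.93, 553, 439),
    ("T0497", "gpt-5.2-2025-12-11", "minimal", 41.80, 421, 571),
    ("T0496", "gpt-5.2-2025-12-11", "full", 35.53, 304, 688),
]

# TP + FN is the number of ground-truth entries; it should not depend on the model.
gold_sizes = {tp + fn for *_, tp, fn in runs}
print(gold_sizes)  # {992}

# Group scores by model, then by prompt variant.
by_model: dict[str, dict[str, float]] = {}
for _, model, variant, score, _, _ in runs:
    by_model.setdefault(model, {})[variant] = score

# Score difference (full minus minimal) for models run with both variants.
diffs = {
    model: round(scores["full"] - scores["minimal"], 2)
    for model, scores in by_model.items()
    if "full" in scores and "minimal" in scores
}
for model, diff in diffs.items():
    print(f"{model}: {diff:+.2f} points")
```

On this page, the full prompt scores higher for the two Claude models and for GPT-5.4, and lower for GPT-5.2.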