RISE Humanities Data Benchmark, 0.5.2-pre1

Search Test Runs

 

A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
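
As an illustration only, the components listed above could be modelled as a small record type. This is a minimal sketch, not the benchmark's actual schema: the class names (ModelConfig, Usage, RunRecord) and all field names are hypothetical. The example values are taken from result 171 below, with token counts as rounded on the page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfig:
    provider: str        # e.g. "anthropic"
    model: str           # e.g. "claude-opus-4-1-20250805"
    temperature: float   # generation parameter


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cost_total: Optional[float] = None  # calculated API cost, if known

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


@dataclass
class RunRecord:
    # Hypothetical container for one test run; field names are assumptions.
    test_id: str
    benchmark: str
    date: str
    prompt: str
    config: ModelConfig
    role: Optional[str] = None          # e.g. "as a historian"
    response: Optional[str] = None      # the model's raw answer
    scores: dict = field(default_factory=dict)
    usage: Optional[Usage] = None
    executed_by: Optional[str] = None


# Example built from the values displayed for test T0377.
run = RunRecord(
    test_id="T0377",
    benchmark="company_lists__true",  # as shown in the search filter
    date="2025-10-28",
    prompt="(prompt text omitted)",
    config=ModelConfig("anthropic", "claude-opus-4-1-20250805", 0.5),
    scores={"normalized": 45.20},
    usage=Usage(input_tokens=28_000, output_tokens=17_100, cost_total=1.699),
)
print(run.usage.total_tokens)
```

Keeping configuration and usage in separate nested records makes it straightforward to group runs by provider or model when comparing them.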

Search Results

Your search for Benchmark 'company_lists__true' with Search Hidden 'False' returned 228 results, showing page 18 of 23.
Result 171 of 228

Test T0377 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-opus-4-1-20250805
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 45.20 %
Test time: unknown
Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}
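
The output rules in this prompt can be checked mechanically. The following is a minimal sketch under stated assumptions: fill_prompt and parse_entries are hypothetical helpers, not functions of the benchmark, and the benchmark's own validation (the ListPage dataclass named in the configuration) is not shown on this page. Because the prompt contains literal braces in its JSON example, the sketch substitutes the {page_id} placeholder with a plain string replacement rather than str.format.

```python
import json
from typing import Any, Optional

REQUIRED_FIELDS = ("entry_id", "company_name", "location")


def fill_prompt(template: str, page_id: str) -> str:
    # Replace only the placeholder; str.format would trip over the
    # literal braces of the JSON example inside the prompt.
    return template.replace("{page_id}", page_id)


def parse_entries(raw: str) -> Optional[list[dict[str, Any]]]:
    """Return the parsed entries, or None if the response breaks the rules."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None  # the prompt asks for an array of objects
    for item in data:
        if not isinstance(item, dict):
            return None
        if any(name not in item for name in REQUIRED_FIELDS):
            return None
        if not isinstance(item["entry_id"], str):
            return None
        if not isinstance(item["company_name"], str):
            return None
        # location is either a string or null
        if item["location"] is not None and not isinstance(item["location"], str):
            return None
    return data


sample = '[{"entry_id": "p1-1", "company_name": "Acme Ltd", "location": null}]'
entries = parse_entries(sample)
```

The sample response and the page ID "p1" are invented for illustration and do not come from any test run.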

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.45 / 0.45
Micro precision / recall: 0.45 / 0.45
Instances: 15
True Positives / False Positives / False Negatives: 445 / 550 / 547
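
The micro-averaged figures can be recomputed from the TP, FP and FN counts reported for this run (445, 550, 547). The sketch below uses the standard definitions of precision, recall and F1; micro_scores is a hypothetical helper, and the page does not show how the benchmark itself computes these values. The macro F1 cannot be reproduced this way, because it requires per-instance counts that are not listed here.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Micro precision, recall and F1 from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


p, r, f1 = micro_scores(445, 550, 547)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # prints: 0.45 0.45 0.45
```

The guards for zero denominators cover runs that return no usable entries at all, such as a run with zero true positives.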
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 28.0K IT + 17.1K OT = 45.1K TT
Cost: 0.421$ (input), 1.279$ (output), 1.699$ (total)
Result 172 of 228

Test T0389 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-large-latest
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 43.73 %
Test time: unknown
Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.44 / 0.44
Micro precision / recall: 0.44 / 0.44
Instances: 15
True Positives / False Positives / False Negatives: 434 / 556 / 558
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 22.6K IT + 14.2K OT = 36.8K TT
Cost: 0.045$ (input), 0.085$ (output), 0.130$ (total)
Result 173 of 228

Test T0373 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-opus-4-20250514
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 49.33 %
Test time: unknown
Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.50 / 0.49
Micro precision / recall: 0.50 / 0.50
Instances: 15
True Positives / False Positives / False Negatives: 495 / 502 / 497
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 28.0K IT + 17.0K OT = 45.0K TT
Cost: 0.421$ (input), 1.275$ (output), 1.695$ (total)
Result 174 of 228

Test T0352 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5-nano
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 26.13 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.27 / 0.26
Micro precision / recall: 0.32 / 0.24
Instances: 15
True Positives / False Positives / False Negatives: 235 / 503 / 757
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 18.0K IT + 66.3K OT = 84.4K TT
Cost: 0.001$ (input), 0.027$ (output), 0.027$ (total)
Result 175 of 228

Test T0392 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: scicore
Model: GLM-4.5V-FP8
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 0.00 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.00 / 0.00
Micro precision / recall: 0.00 / 0.00
Instances: 15
True Positives / False Positives / False Negatives: 0 / 15 / 992
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 0 IT + 0 OT = 0 TT
Cost: 0.000$ (input), 0.000$ (output), 0.000$ (total)
Result 176 of 228

Test T0364 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: genai
Model: gemini-2.5-flash-lite
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 30.60 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.27 / 0.31
Micro precision / recall: 0.26 / 0.29
Instances: 15
True Positives / False Positives / False Negatives: 284 / 793 / 708
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 4.2K IT + 16.9K OT = 21.1K TT
Cost: 0.000$ (input), 0.007$ (output), 0.007$ (total)
Result 177 of 228

Test T0379 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-sonnet-4-5-20250929
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 41.20 %
Test time: unknown
Prompt

The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.

About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.

About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".

About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.

{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
  ]
}

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.41 / 0.41
Micro precision / recall: 0.41 / 0.41
Instances: 15
True Positives / False Positives / False Negatives: 410 / 578 / 582
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 32.1K IT + 18.1K OT = 50.2K TT
Cost: 0.096$ (input), 0.272$ (output), 0.368$ (total)
Result 178 of 228

Test T0382 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: pixtral-large-latest
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 30.40 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.32 / 0.30
Micro precision / recall: 0.31 / 0.32
Instances: 15
True Positives / False Positives / False Negatives: 315 / 688 / 677
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 45.6K IT + 21.3K OT = 66.8K TT
Cost: 0.091$ (input), 0.128$ (output), 0.219$ (total)
Result 179 of 228

Test T0386 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-medium-2505
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 38.93 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.36 / 0.39
Micro precision / recall: 0.38 / 0.34
Instances: 15
True Positives / False Positives / False Negatives: 336 / 537 / 656
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 16.2K IT + 11.8K OT = 27.9K TT
Cost: 0.006$ (input), 0.024$ (output), 0.030$ (total)
Result 180 of 228

Test T0378 at 2025-10-28

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-opus-4-1-20250805
Temperature: 0.5
Dataclass: ListPage
Normalized Score: 45.87 %
Test time: unknown
Prompt

- Answer in valid JSON.
- The page ID is given as {page_id}.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.45 / 0.46
Micro precision / recall: 0.44 / 0.47
Instances: 15
True Positives / False Positives / False Negatives: 467 / 601 / 525
Costs / Pricing
Pricing Date: 2025-10-28
Tokens: 22.3K IT + 16.9K OT = 39.2K TT
Cost: 0.334$ (input), 1.268$ (output), 1.602$ (total)
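
To compare the ten runs on this page side by side, their normalized scores and costs can be collected into a small table and sorted. This is only a sketch: the tuple layout and the 40 % threshold are arbitrary choices made for illustration, and the cost column assumes that the last of the three cost figures in each record is the total. The runs do not all show the same prompt text, so the ranking is not a like-for-like comparison of models.

```python
# (test id, model, normalized score in %, assumed total cost in $)
runs = [
    ("T0377", "claude-opus-4-1-20250805", 45.20, 1.699),
    ("T0389", "mistral-large-latest", 43.73, 0.130),
    ("T0373", "claude-opus-4-20250514", 49.33, 1.695),
    ("T0352", "gpt-5-nano", 26.13, 0.027),
    ("T0392", "GLM-4.5V-FP8", 0.00, 0.000),
    ("T0364", "gemini-2.5-flash-lite", 30.60, 0.007),
    ("T0379", "claude-sonnet-4-5-20250929", 41.20, 0.368),
    ("T0382", "pixtral-large-latest", 30.40, 0.219),
    ("T0386", "mistral-medium-2505", 38.93, 0.030),
    ("T0378", "claude-opus-4-1-20250805", 45.87, 1.602),
]

# Rank by normalized score, best first.
ranked = sorted(runs, key=lambda run: run[2], reverse=True)

# Cheapest run among those scoring at least 40 % (threshold is arbitrary).
THRESHOLD = 40.0
cheapest = min((run for run in runs if run[2] >= THRESHOLD), key=lambda run: run[3])

for test_id, model, score, cost in ranked:
    print(f"{test_id}  {model:<28} {score:6.2f} %  {cost:.3f}$")
print("cheapest above threshold:", cheapest[0])
```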