A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
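Judging by the records listed on this page, a test run includes the benchmark tags, the model configuration (provider, model, temperature, dataclass), the prompt, the resulting scores, and the token usage and cost.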
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
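As a rough illustration of how such runs might be held and compared, the sketch below models a run as a small record and ranks three of the runs listed on this page by normalized score. The `RunRecord` class and its field names are hypothetical and are not taken from the benchmark's own code; only the values are copied from the records on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record for one test run (field names are assumptions)."""
    provider: str
    model: str
    temperature: float
    normalized_score: float  # percent, as reported on this page


# Values copied from three of the runs listed on this page.
runs = [
    RunRecord("alibaba", "qwen3.5-plus", 0.5, 49.27),
    RunRecord("x-ai", "grok-4.20-0309-reasoning", 0.5, 30.60),
    RunRecord("anthropic", "claude-opus-4-6", 0.5, 52.80),
]

# Rank runs from highest to lowest normalized score.
ranked = sorted(runs, key=lambda r: r.normalized_score, reverse=True)
for run in ranked:
    print(f"{run.provider}/{run.model}: {run.normalized_score:.2f} %")
```

Because every run here uses the same temperature and dataclass, the ranking compares models rather than settings; runs with different configurations would need to be grouped before being compared.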
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-plus |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 49.27 % |
| Test time | unknown |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
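The prompt above asks for JSON objects with the fields `entry_id`, `company_name` and `location`. A minimal sketch of how an answer could be checked against that description follows. It assumes the answer is a top-level JSON array, as the prompt's wording suggests; the function name `parse_entries`, the page ID `p1` and the company names are invented for illustration and do not come from the benchmark.

```python
import json

# Field names taken from the prompt text.
REQUIRED_FIELDS = ("entry_id", "company_name", "location")


def parse_entries(raw: str, page_id: str) -> list[dict]:
    """Parse a model answer and check it against the fields named in the prompt."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    for entry in data:
        if not isinstance(entry, dict):
            raise ValueError("each entry must be a JSON object")
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            raise ValueError(f"entry is missing fields: {missing}")
        # The prompt gives '{page_id}-1' as the pattern for identifiers.
        if not str(entry["entry_id"]).startswith(f"{page_id}-"):
            raise ValueError(f"unexpected entry_id: {entry['entry_id']!r}")
        # A missing location must be null, not an empty or absent field.
        if entry["location"] is not None and not isinstance(entry["location"], str):
            raise ValueError("location must be a string or null")
    return data


# Made-up example answer; the company names are invented for illustration.
answer = """[
  {"entry_id": "p1-1", "company_name": "Example & Co.", "location": "Zurich"},
  {"entry_id": "p1-2", "company_name": "Sample Ltd.", "location": null}
]"""
entries = parse_entries(answer, page_id="p1")
print(len(entries))
```

A check of this kind only confirms the shape of the answer. Whether the extracted names and locations are correct is a separate question, answered by the scores reported for each run.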
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.51 | 0.49 | 0.52 | 0.50 | 15 | 497 | 456 | 495 |
| Pricing Date: n/a, n/a. | Tokens: 46.5K IT + 15.4K OT = 61.9K TT | Cost: 0.019$ + 0.037$ = 0.056$ |
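The micro-averaged figures in the record above can be recomputed from the raw counts. The sketch below assumes that the last three values of the metrics row are true positives, false positives and false negatives (497, 456 and 495), which is what the TP, FP and FN labels suggest, and it uses the standard definitions of precision, recall and F1. The function name `micro_metrics` is illustrative and not taken from the benchmark.

```python
def micro_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) computed from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall, written in terms of counts.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Counts from the first qwen3.5-plus run on this page.
precision, recall, f1 = micro_metrics(tp=497, fp=456, fn=495)
print(f"{precision:.2f} {recall:.2f} {f1:.2f}")  # prints "0.52 0.50 0.51"
```

The results match the reported micro precision (0.52), micro recall (0.50) and micro F1 (0.51). The macro F1 cannot be recomputed this way because it averages per-instance scores, which are not shown on this page.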
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-plus |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 45.27 % |
| Test time | unknown |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.45 | 0.45 | 0.45 | 0.45 | 15 | 444 | 533 | 548 |
| Pricing Date: n/a, n/a. | Tokens: 40.9K IT + 14.4K OT = 55.4K TT | Cost: 0.016$ + 0.035$ = 0.051$ |
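The token and cost line above can be read as simple arithmetic, taking IT, OT and TT to mean input, output and total tokens. The sketch below derives an implied price per million tokens from the figures of this run. Since the page shows rounded values, the results are approximations only and are not the provider's official prices; the function name `price_per_million` is illustrative.

```python
def price_per_million(cost_usd: float, tokens_thousands: float) -> float:
    """Implied price in USD per one million tokens, from rounded page figures."""
    return cost_usd / (tokens_thousands * 1_000) * 1_000_000


# Figures from the run above: 40.9K input tokens, 14.4K output tokens.
input_price = price_per_million(0.016, 40.9)
output_price = price_per_million(0.035, 14.4)
total_cost = 0.016 + 0.035

print(f"input:  ~{input_price:.2f} USD per million tokens")
print(f"output: ~{output_price:.2f} USD per million tokens")
print(f"total:  {total_cost:.3f} USD")
```

Adding the rounded token counts gives 55.3K rather than the reported 55.4K, presumably because the page rounds each figure separately from the exact counts.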
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5.3-codex |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 50.53 % |
| Test time | unknown |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.50 | 0.51 | 0.50 | 0.50 | 15 | 492 | 500 | 500 |
| Pricing Date: n/a, n/a. | Tokens: 35.0K IT + 13.9K OT = 48.9K TT | Cost: 0.061$ + 0.195$ = 0.256$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5.3-codex |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 46.93 % |
| Test time | unknown |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.48 | 0.47 | 0.48 | 0.48 | 15 | 474 | 519 | 518 |
| Pricing Date: n/a, n/a. | Tokens: 29.9K IT + 8.9K OT = 38.8K TT | Cost: 0.052$ + 0.125$ = 0.177$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | x-ai |
| Model | grok-4.20-0309-reasoning |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 30.60 % |
| Test time | unknown |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.27 | 0.31 | 0.28 | 0.26 | 15 | 257 | 661 | 735 |
| Pricing Date: n/a, n/a. | Tokens: 12.1K IT + 12.2K OT = 24.3K TT | Cost: 0.024$ + 0.073$ = 0.098$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | x-ai |
| Model | grok-4.20-0309-reasoning |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 22.40 % |
| Test time | unknown |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.18 | 0.22 | 0.20 | 0.17 | 15 | 168 | 666 | 824 |
| Pricing Date: n/a, n/a. | Tokens: 7.1K IT + 10.2K OT = 17.2K TT | Cost: 0.014$ + 0.061$ = 0.075$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5.3-codex |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 47.40 % |
| Test time | unknown |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.48 | 0.47 | 0.48 | 0.48 | 15 | 475 | 515 | 517 |
| Pricing Date: n/a, n/a. | Tokens: 29.9K IT + 8.8K OT = 38.7K TT | Cost: 0.052$ + 0.124$ = 0.176$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5.3-codex |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 51.40 % |
| Test time | unknown |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.50 | 0.51 | 0.50 | 0.50 | 15 | 495 | 495 | 497 |
| Pricing Date: n/a, n/a. | Tokens: 35.0K IT + 12.7K OT = 47.7K TT | Cost: 0.061$ + 0.178$ = 0.239$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-3.1-pro-preview |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 38.80 % |
| Test time | unknown |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.41 | 0.39 | 0.42 | 0.40 | 15 | 392 | 532 | 600 |
| Pricing Date: n/a, n/a. | Tokens: 16.7K IT + 144.0K OT = 160.7K TT | Cost: 0.033$ + 1.728$ = 1.761$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-6 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 52.80 % |
| Test time | unknown |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.53 | 0.53 | 0.53 | 0.53 | 15 | 523 | 464 | 469 |
| Pricing Date: n/a, n/a. | Tokens: 41.6K IT + 13.8K OT = 55.4K TT | Cost: 0.208$ + 0.346$ = 0.554$ |