A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
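A test run includes the model configuration (provider, model, temperature, and dataclass), the prompt, the model's response, and the resulting scores, token usage, and cost.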
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-4o-2024-08-06 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 42.93 % |
| Test time | unknown |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
{
  "entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
  "company_name": "The name of the company or person",
  "location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
}
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
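The output format described in the prompt can be sketched as a small Python data model. This is only an illustration: the field names come from the prompt, while the `Entry` class, the `entries` attribute, and the `parse_list_page` helper are assumptions and may not match the benchmark's actual `ListPage` dataclass.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One company or person, using the field names given in the prompt."""
    entry_id: str
    company_name: str
    location: Optional[str] = None  # null when the page gives no location


@dataclass
class ListPage:
    """Hypothetical container; the real ListPage dataclass may differ."""
    entries: List[Entry]


def parse_list_page(raw: str) -> ListPage:
    """Parse a model response that is a JSON array of entry objects."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return ListPage(entries=[Entry(**item) for item in data])


# Made-up response, for illustration only.
example = '[{"entry_id": "p1-1", "company_name": "Example AG", "location": null}]'
page = parse_list_page(example)
print(page.entries[0])
```

A response that is not valid JSON raises `json.JSONDecodeError` here, and a response that is valid JSON but not an array raises `ValueError`. An evaluation harness would need to handle both cases.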
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.43 | 0.43 | 0.44 | 0.42 | 15 | 415 | 531 | 577 |
| Pricing Date: n/a, n/a. | Tokens: 23.9K IT + 8.6K OT = 32.5K TT | Cost: 0.060$ + 0.086$ = 0.146$ |
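The micro-averaged scores in this run can be reproduced from the true positive, false positive, and false negative counts. The sketch below assumes the standard definitions of precision, recall, and F1; the function name `micro_scores` is illustrative and not part of the benchmark code.

```python
from typing import Tuple


def micro_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) computed from pooled counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for the gpt-4o-2024-08-06 run above.
p, r, f1 = micro_scores(tp=415, fp=531, fn=577)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.44 recall=0.42 f1=0.43
```

The rounded results agree with the micro precision, micro recall, and micro F1 values reported for this run.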
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5-2025-08-07 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 59.80 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.60 | 0.60 | 0.60 | 0.60 | 15 | 591 | 396 | 401 |
| Pricing Date: n/a, n/a. | Tokens: 20.9K IT + 60.1K OT = 81.1K TT | Cost: 0.026$ + 0.601$ = 0.627$ |
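The token and cost figures in this row imply an effective price per million tokens. The sketch below derives it from the reported numbers only. The token counts are rounded to the nearest 0.1K, so the results are approximations rather than an official price list, and `rate_per_million` is a hypothetical helper name.

```python
def rate_per_million(cost_usd: float, tokens: float) -> float:
    """Effective price in USD per one million tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return cost_usd / tokens * 1_000_000


# Figures reported for the gpt-5-2025-08-07 run above.
input_rate = rate_per_million(0.026, 20_900)   # input cost, input tokens (IT)
output_rate = rate_per_million(0.601, 60_100)  # output cost, output tokens (OT)
print(f"input ~ {input_rate:.2f} $/M, output ~ {output_rate:.2f} $/M")
```

For this run, output tokens account for most of the total cost (0.601$ of 0.627$), so the output rate dominates the comparison.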
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-3-flash-preview |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 55.07 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.55 | 0.55 | 0.55 | 0.55 | 15 | 547 | 444 | 445 |
| Pricing Date: n/a, n/a. | Tokens: 22.3K IT + 11.9K OT = 34.2K TT | Cost: 0.011$ + 0.036$ = 0.047$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3-vl-8b-thinking |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 22.40 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.27 | 0.22 | 0.44 | 0.20 | 15 | 194 | 248 | 798 |
| Pricing Date: n/a, n/a. | Tokens: 22.1K IT + 195.2K OT = 217.3K TT | Cost: 0.003$ + 0.267$ = 0.269$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5-nano-2025-08-07 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 27.53 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.28 | 0.28 | 0.33 | 0.24 | 15 | 235 | 470 | 757 |
| Pricing Date: n/a, n/a. | Tokens: 35.7K IT + 67.3K OT = 103.0K TT | Cost: 0.002$ + 0.027$ = 0.029$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | x-ai/grok-4 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 6.07 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.05 | 0.06 | 0.06 | 0.04 | 15 | 42 | 698 | 950 |
| Pricing Date: n/a, n/a. | Tokens: 16.9K IT + 101.5K OT = 118.4K TT | Cost: 0.051$ + 1.523$ = 1.573$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3-vl-8b-instruct |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 23.33 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.23 | 0.23 | 0.33 | 0.18 | 15 | 177 | 358 | 815 |
| Pricing Date: n/a, n/a. | Tokens: 36.3K IT + 7.7K OT = 44.0K TT | Cost: 0.003$ + 0.004$ = 0.007$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | ministral-14b-2512 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 0.00 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.00 | 0.00 | 0.00 | 0.00 | 15 | 0 | 15 | 992 |
| Pricing Date: n/a, n/a. | Tokens: 8.2K IT + 255 OT = 8.5K TT | Cost: 0.002$ + 0.000$ = 0.002$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | meta-llama/llama-4-maverick |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 35.07 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.37 | 0.35 | 0.38 | 0.36 | 15 | 355 | 569 | 637 |
| Pricing Date: n/a, n/a. | Tokens: 39.1K IT + 13.0K OT = 52.1K TT | Cost: 0.006$ + 0.008$ = 0.014$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | magistral-small-2509 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 0.00 % |
| Test time | unknown |
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.00 | 0.00 | 0.00 | 0.00 | 15 | 0 | 31 | 992 |
| Pricing Date: n/a, n/a. | Tokens: 3.0K IT + 1.7K OT = 4.7K TT | Cost: 0.001$ + 0.001$ = 0.002$ |
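To compare the runs listed on this page side by side, the normalized scores and total costs can be collected and ranked. The sketch below copies the figures from the run tables above; the `Run` class and the ranking rule (highest score first, lower cost breaking ties) are illustrative choices, not part of the benchmark.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    model: str
    score_pct: float  # "Normalized Score" from the run table
    cost_usd: float   # total cost from the pricing row


RUNS: List[Run] = [
    Run("gpt-4o-2024-08-06", 42.93, 0.146),
    Run("gpt-5-2025-08-07", 59.80, 0.627),
    Run("gemini-3-flash-preview", 55.07, 0.047),
    Run("qwen/qwen3-vl-8b-thinking", 22.40, 0.269),
    Run("gpt-5-nano-2025-08-07", 27.53, 0.029),
    Run("x-ai/grok-4", 6.07, 1.573),
    Run("qwen/qwen3-vl-8b-instruct", 23.33, 0.007),
    Run("ministral-14b-2512", 0.00, 0.002),
    Run("meta-llama/llama-4-maverick", 35.07, 0.014),
    Run("magistral-small-2509", 0.00, 0.002),
]

# Highest score first; among equal scores, the cheaper run comes first.
ranked = sorted(RUNS, key=lambda run: (-run.score_pct, run.cost_usd))

for run in ranked:
    print(f"{run.model:32s} {run.score_pct:6.2f} %  {run.cost_usd:.3f} $")
```

Each model appears in a single run here, all at temperature 0.5, so the ranking reflects these individual executions and not repeated trials.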