A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.
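A test run includes its configuration (provider, model, temperature, and dataclass), the prompt, the resulting scores, and the token usage with its cost.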
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-flash-lite-preview-09-2025 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 37.93 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.38 | 0.38 | 0.37 | 0.39 | 15 | 385 | 664 | 607 |
| Pricing Date: 2025-10-28 | Tokens: 4.2K IT + 16.9K OT = 21.1K TT | Cost: 0.000$ + 0.007$ = 0.007$ |
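The micro-averaged figures in the run above can be reproduced from its raw counts (385 true positives, 664 false positives, 607 false negatives). The following is a minimal sketch using the standard definitions of precision, recall, and F1; the function name `micro_metrics` is illustrative and not taken from the benchmark's code.

```python
def micro_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Micro-averaged precision, recall and F1 from pooled counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts reported for the gemini-2.5-flash-lite-preview-09-2025 run.
metrics = micro_metrics(tp=385, fp=664, fn=607)
print({name: round(value, 2) for name, value in metrics.items()})
# prints {'precision': 0.37, 'recall': 0.39, 'f1': 0.38}
```

The rounded values match the Micro Precision, Micro Recall, and F1 micro entries reported for that run.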
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-4.1 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 29.53 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.33 | 0.30 | 0.35 | 0.31 | 15 | 305 | 578 | 687 |
| Pricing Date: 2025-10-28 | Tokens: 13.7K IT + 7.6K OT = 21.4K TT | Cost: 0.027$ + 0.061$ = 0.088$ |
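The run above reports different micro and macro F1 values (0.33 and 0.30). A sketch of why the two can diverge follows. It assumes that macro F1 is the unweighted mean of per-instance F1 scores, which is the usual definition but is not confirmed by this page. The per-page counts are made up for illustration.

```python
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical (tp, fp, fn) counts for two pages, not taken from the benchmark.
pages = [(9, 1, 1), (1, 4, 5)]

# Micro: pool all counts first, then compute one F1.
micro_f1 = f1_score(
    sum(p[0] for p in pages),
    sum(p[1] for p in pages),
    sum(p[2] for p in pages),
)

# Macro (assumed definition): compute F1 per page, then average.
macro_f1 = mean(f1_score(*p) for p in pages)

print(f"micro={micro_f1:.3f} macro={macro_f1:.3f}")
```

In this example the page with more entries dominates the micro score, while the macro score weights both pages equally.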
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3-vl-8b-instruct |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 25.27 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.26 | 0.25 | 0.31 | 0.22 | 15 | 218 | 493 | 774 |
| Pricing Date: 2025-10-28 | Tokens: 20.6K IT + 10.4K OT = 31.1K TT | Cost: 0.002$ + 0.005$ = 0.007$ |
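The prompt in the run above contains a `{page_id}` placeholder next to literal JSON braces. How the benchmark fills this placeholder is not stated here. The sketch below shows one way to do it, using plain string replacement, because `str.format` would try to interpret the JSON braces as fields. The template excerpt, the helper `fill_prompt`, and the page ID `page_0042` are illustrative assumptions.

```python
# Shortened excerpt of the prompt, kept only to show the placeholder.
PROMPT_TEMPLATE = """- The page ID is given as {page_id}.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'"
}"""


def fill_prompt(template: str, page_id: str) -> str:
    """Replace only the {page_id} token and leave all other braces alone."""
    return template.replace("{page_id}", page_id)


prompt = fill_prompt(PROMPT_TEMPLATE, "page_0042")  # hypothetical page ID
print(prompt)
```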
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3-vl-30b-a3b-instruct |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 25.60 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.24 | 0.26 | 0.24 | 0.24 | 15 | 242 | 754 | 750 |
| Pricing Date: 2025-10-28 | Tokens: 11.6K IT + 13.5K OT = 25.1K TT | Cost: 0.002$ + 0.009$ = 0.012$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5-mini |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 45.20 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.45 | 0.45 | 0.45 | 0.45 | 15 | 445 | 547 | 547 |
| Pricing Date: 2025-10-28 | Tokens: 20.0K IT + 40.1K OT = 60.1K TT | Cost: 0.005$ + 0.080$ = 0.085$ |
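The prompt asks for valid JSON with the fields `entry_id`, `company_name`, and `location`, where `location` may be null. A minimal sketch of checking a model response against those requirements is shown below. It follows the prompt's wording that the answer is an array of objects. The function `parse_entries` and the sample response are hypothetical and are not the benchmark's own validation code.

```python
import json

REQUIRED_FIELDS = ("entry_id", "company_name", "location")


def parse_entries(raw: str) -> list[dict]:
    """Parse a response and check that every entry has the required fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of entries")
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ValueError(f"entry {index} is missing fields: {missing}")
    return data


# Hypothetical response for a hypothetical page ID.
sample = (
    '[{"entry_id": "page_0042-1", '
    '"company_name": "Example & Co.", "location": null}]'
)
entries = parse_entries(sample)
print(entries[0]["company_name"], entries[0]["location"])
```

A missing location arrives as JSON `null` and is therefore `None` in Python, which keeps it distinct from an empty string.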
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | o3 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 55.20 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.57 | 0.55 | 0.58 | 0.56 | 15 | 556 | 409 | 436 |
| Pricing Date: 2025-10-28 | Tokens: 17.4K IT + 29.7K OT = 47.1K TT | Cost: 0.035$ + 0.238$ = 0.273$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-4.1 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 39.60 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.40 | 0.40 | 0.41 | 0.40 | 15 | 393 | 573 | 599 |
| Pricing Date: 2025-10-28 | Tokens: 18.8K IT + 9.2K OT = 28.0K TT | Cost: 0.038$ + 0.073$ = 0.111$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.0-flash-lite |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 46.60 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.45 | 0.47 | 0.45 | 0.45 | 15 | 447 | 536 | 545 |
| Pricing Date: 2025-10-28 | Tokens: 33.5K IT + 16.5K OT = 50.0K TT | Cost: 0.003$ + 0.005$ = 0.007$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-sonnet-4-5-20250929 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 39.00 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.39 | 0.39 | 0.38 | 0.39 | 15 | 388 | 627 | 604 |
| Pricing Date: 2025-10-28 | Tokens: 26.3K IT + 16.7K OT = 43.0K TT | Cost: 0.079$ + 0.250$ = 0.329$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-20250514 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 49.33 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.50 | 0.49 | 0.50 | 0.50 | 15 | 495 | 502 | 497 |
| Pricing Date: 2025-10-28 | Tokens: 28.0K IT + 17.0K OT = 45.0K TT | Cost: 0.421$ + 1.275$ = 1.695$ |
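To illustrate the kind of comparison that test runs make possible, the sketch below ranks the runs listed on this page by normalized score and then picks the cheapest run above a score threshold. The scores and total costs are copied from the runs above. The threshold of 45 % is an arbitrary value chosen for the example and is not part of the benchmark.

```python
# (model, normalized score in %, total cost in $) as listed on this page.
runs = [
    ("gemini-2.5-flash-lite-preview-09-2025", 37.93, 0.007),
    ("gpt-4.1", 29.53, 0.088),
    ("qwen/qwen3-vl-8b-instruct", 25.27, 0.007),
    ("qwen/qwen3-vl-30b-a3b-instruct", 25.60, 0.012),
    ("gpt-5-mini", 45.20, 0.085),
    ("o3", 55.20, 0.273),
    ("gpt-4.1", 39.60, 0.111),
    ("gemini-2.0-flash-lite", 46.60, 0.007),
    ("claude-sonnet-4-5-20250929", 39.00, 0.329),
    ("claude-opus-4-20250514", 49.33, 1.695),
]

# Rank all runs by normalized score, best first.
by_score = sorted(runs, key=lambda run: run[1], reverse=True)
for model, score, cost in by_score:
    print(f"{score:6.2f} %  {cost:6.3f} $  {model}")

# Cheapest run that still reaches the (arbitrary) score threshold.
THRESHOLD = 45.0
cheapest = min((run for run in runs if run[1] >= THRESHOLD), key=lambda run: run[2])
print("best score:", by_score[0][0])
print("cheapest above threshold:", cheapest[0])
```

Note that `gpt-4.1` appears twice because it was run with two different prompts, so a comparison keyed on the model name alone would conflate the two configurations.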