A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
A test run includes:
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
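To make the idea of comparing runs concrete, here is a minimal sketch in Python. The `RunRecord` class and the `rank_runs` helper are hypothetical names chosen for illustration, not part of the benchmark's code. The fields mirror the per-run table shown for each run on this page, and the two sample records are copied from runs listed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical container for the per-run fields shown on this page."""
    provider: str
    model: str
    temperature: float
    dataclass_name: str      # e.g. "ListPage"
    normalized_score: float  # percent, 0-100


def rank_runs(runs: List[RunRecord]) -> List[RunRecord]:
    """Return runs ordered from highest to lowest normalized score."""
    return sorted(runs, key=lambda r: r.normalized_score, reverse=True)


# Two of the runs listed on this page.
runs = [
    RunRecord("openrouter", "qwen/qwen3.6-plus-04-02", 0.5, "ListPage", 50.40),
    RunRecord("alibaba", "qwen3.5-122b-a10b", 0.5, "ListPage", 55.47),
]

for run in rank_runs(runs):
    print(f"{run.normalized_score:6.2f} %  {run.provider}  {run.model}")
```

Keeping provider, model, and temperature in the same record as the score means a comparison always carries the configuration that produced it.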
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.6-plus-04-02 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 50.40 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.49 | 0.50 | 0.48 | 0.50 | 15 | 493 | 528 | 499 |
| Pricing Date: n/a, n/a. | Tokens: 41.2K IT + 34.7K OT = 75.9K TT | Cost: 0.013$ + 0.068$ = 0.081$ |
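The micro precision, recall, and F1 reported for this run can be reproduced from its true positive, false positive, and false negative counts using the standard definitions. The sketch below assumes the benchmark uses exactly those definitions, and `micro_metrics` is a hypothetical helper name.

```python
def micro_metrics(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the run above: TP = 493, FP = 528, FN = 499.
p, r, f1 = micro_metrics(493, 528, 499)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.48 recall=0.50 f1=0.49
```

The guards against empty denominators matter for runs that return no correct entries at all, where all three values fall to zero.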
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | google/gemma-4-26b-a4b-it-20260403 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 14.73 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
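A sketch of how a harness might fill in this prompt and check a model's answer is shown below. It is an illustration under stated assumptions, not the benchmark's actual code: `fill_prompt`, `Entry`, and `parse_entries` are hypothetical names, the company in the usage example is made up, and because the schema above does not make clear whether the array is wrapped in an outer object, the parser accepts both shapes.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


def fill_prompt(template: str, page_id: str) -> str:
    """Insert the page ID. str.replace is used because str.format would
    try to interpret the literal JSON braces in the prompt as fields."""
    return template.replace("{page_id}", page_id)


@dataclass
class Entry:
    entry_id: str
    company_name: str
    location: Optional[str]  # None when the page gives no location


def parse_entries(raw: str) -> List[Entry]:
    """Parse a model answer into entries, raising ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    # Assumption: accept a bare array or an object wrapping a single list.
    if isinstance(data, dict):
        lists = [v for v in data.values() if isinstance(v, list)]
        if len(lists) != 1:
            raise ValueError("expected exactly one list of entries")
        data = lists[0]
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of entries")
    try:
        return [
            Entry(str(item["entry_id"]), str(item["company_name"]), item.get("location"))
            for item in data
        ]
    except (KeyError, TypeError, AttributeError) as exc:
        raise ValueError(f"malformed entry: {exc}") from exc


# Usage with a made-up answer for a hypothetical page ID "p1".
answer = '[{"entry_id": "p1-1", "company_name": "Example & Co.", "location": null}]'
entries = parse_entries(answer)
print(entries[0].company_name, entries[0].location)
```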
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.14 | 0.15 | 0.14 | 0.15 | 15 | 144 | 919 | 848 |
| Pricing Date: n/a, n/a. | Tokens: 10.4K IT + 19.9K OT = 30.3K TT | Cost: 0.001$ + 0.007$ = 0.008$ |
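The results report both a micro and a macro F1. The page does not define them, so the sketch below uses the usual reading as an assumption: micro F1 pools the counts of all instances before computing F1, while macro F1 averages the F1 of each instance. The per-page counts are invented for illustration, since the page only publishes pooled totals.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (tp, fp, fn) for one instance (page)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; equivalent to 2PR / (P + R)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(pages: List[Counts]) -> float:
    """Pool counts over all pages, then compute one F1."""
    tp = sum(p[0] for p in pages)
    fp = sum(p[1] for p in pages)
    fn = sum(p[2] for p in pages)
    return f1(tp, fp, fn)


def macro_f1(pages: List[Counts]) -> float:
    """Compute F1 per page, then take the unweighted mean."""
    return sum(f1(*p) for p in pages) / len(pages) if pages else 0.0


# Hypothetical counts: one page extracted well, one poorly.
pages = [(90, 10, 10), (5, 20, 25)]
print(f"micro={micro_f1(pages):.3f} macro={macro_f1(pages):.3f}")
```

Under this reading, micro F1 is dominated by pages with many entries, whereas macro F1 weights every page equally. In the run above, the normalized score of 14.73 % is close to the macro F1 of 0.15, which suggests, without the page confirming it, that the normalized score is derived from the macro value.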
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-35b-a3b-20260224 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 43.07 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.44 | 0.43 | 0.45 | 0.44 | 15 | 433 | 530 | 559 |
| Pricing Date: n/a, n/a. | Tokens: 41.5K IT + 23.1K OT = 64.6K TT | Cost: 0.007$ + 0.030$ = 0.037$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-35b-a3b-20260224 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 54.40 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.53 | 0.54 | 0.53 | 0.54 | 15 | 531 | 463 | 461 |
| Pricing Date: n/a, n/a. | Tokens: 47.1K IT + 50.1K OT = 97.2K TT | Cost: 0.008$ + 0.065$ = 0.073$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-flash-20260224 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 49.80 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.51 | 0.50 | 0.50 | 0.52 | 15 | 511 | 514 | 481 |
| Pricing Date: n/a, n/a. | Tokens: 41.3K IT + 14.3K OT = 55.6K TT | Cost: 0.003$ + 0.004$ = 0.006$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-397b-a17b-20260216 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 44.93 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.46 | 0.45 | 0.46 | 0.45 | 15 | 446 | 520 | 546 |
| Pricing Date: n/a, n/a. | Tokens: 41.7K IT + 47.3K OT = 89.1K TT | Cost: 0.016$ + 0.111$ = 0.127$ |
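In the usage line above, the displayed parts do not add up exactly (41.7K + 47.3K is shown as 89.1K), presumably because each figure is rounded separately from unrounded counts. The sketch below parses such a line and checks the totals within a rounding tolerance. The regular expression and the names `parse_usage` and `totals_consistent` are assumptions based only on the line format seen on this page.

```python
import re

# IT = input tokens, OT = output tokens, TT = total tokens (in thousands).
USAGE_RE = re.compile(
    r"Tokens: (?P<it>[\d.]+)K IT \+ (?P<ot>[\d.]+)K OT = (?P<tt>[\d.]+)K TT"
    r".*Cost: (?P<ci>[\d.]+)\$ \+ (?P<co>[\d.]+)\$ = (?P<ct>[\d.]+)\$"
)


def parse_usage(line: str) -> dict:
    """Extract token counts (in K) and costs (in $) from a usage line."""
    match = USAGE_RE.search(line)
    if match is None:
        raise ValueError("not a usage line")
    return {key: float(value) for key, value in match.groupdict().items()}


def totals_consistent(u: dict, token_tol: float = 0.1, cost_tol: float = 0.001) -> bool:
    """True if parts and totals agree up to one unit in the last shown digit."""
    eps = 1e-9  # absorbs floating-point noise in the comparison
    tokens_ok = abs(u["it"] + u["ot"] - u["tt"]) <= token_tol + eps
    cost_ok = abs(u["ci"] + u["co"] - u["ct"]) <= cost_tol + eps
    return tokens_ok and cost_ok


line = "| Pricing Date: n/a, n/a. | Tokens: 41.7K IT + 47.3K OT = 89.1K TT | Cost: 0.016$ + 0.111$ = 0.127$ |"
usage = parse_usage(line)
print(usage["tt"], totals_consistent(usage))
```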
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-flash-20260224 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 54.47 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.54 | 0.54 | 0.53 | 0.55 | 15 | 543 | 483 | 449 |
| Pricing Date: n/a, n/a. | Tokens: 46.8K IT + 15.8K OT = 62.6K TT | Cost: 0.003$ + 0.004$ = 0.007$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-122b-a10b-20260224 |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 41.67 % |
| Test time | unknown seconds |
- Answer in valid JSON.
- The page ID is given as {page_id}.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.42 | 0.42 | 0.41 | 0.42 | 15 | 421 | 594 | 571 |
| Pricing Date: n/a, n/a. | Tokens: 41.2K IT + 15.1K OT = 56.4K TT | Cost: 0.011$ + 0.031$ = 0.042$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-122b-a10b |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 55.47 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.56 | 0.55 | 0.56 | 0.56 | 15 | 553 | 440 | 439 |
| Pricing Date: n/a, n/a. | Tokens: 46.5K IT + 15.6K OT = 62.0K TT | Cost: 0.019$ + 0.050$ = 0.068$ |
{'document-type': ['book-page'], 'writing': ['printed'], 'century': [20], 'language': ['en', 'de'], 'layout': ['list'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-27b |
| Temperature | 0.5 |
| Dataclass | ListPage |
| Normalized Score | 0.00 % |
| Test time | unknown seconds |
The image you are presented with stems from a digitized book containing lists of companies.
Your task is to extract structured information about each company listed on the page.
About the source:
- The image stems from a trade index of the British Swiss Chamber of Commerce.
- The image can show an alphabetical or a thematic list of companies.
- The companies are mostly located in Switzerland and the UK.
- The image stems from a trade index between 1925 and 1958.
- Most pages have one column but some years have two columns.
- The source itself is in English and German but the company names can be in English, German, French or Italian.
About the entries:
- Each entry describes a single company or person.
- Alphabetical entries have filling dots between the company name and the page number. Dots and page numbers are not part of the data and should be ignored.
- Alphabetical entries seldom to never have locations.
- Thematic entries often have locations.
- Thematic entries are listed under headings that describe the type of business.
- Some thematic headings are only references to other headings, e.g. "X, s. Y".
About the output:
- Answer in valid JSON. The JSON should be an array of objects with the following fields:
- The page ID is given as {page_id}.
- Do not add country information, if it is not directly written with the location.
{
"entry_id": "A unique identifier for the entry, e.g. '{page_id}-1'",
"company_name": "The name of the company or person",
"location": "The location of the company, e.g. 'Zurich' or 'London, UK'. If no location is given, set to null."
]
}
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.00 | 0.00 | 0.00 | 0.00 | 15 | 0 | 300 | 992 |
| Pricing Date: n/a, n/a. | Tokens: 46.5K IT + 4.0K OT = 50.5K TT | Cost: 0.014$ + 0.010$ = 0.024$ |