A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
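As the listings below show, a test run includes the benchmark tags, the provider, model, temperature, and dataclass, the normalized score and test time, the prompt and the model output, the detailed metrics, and the token usage with its cost.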
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
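As an illustration, a test run can be modelled as a small record and compared with other runs. The sketch below is a hypothetical Python structure, not the benchmark's actual schema: the class name `RunRecord`, its field names, and the helper `best_run` are assumptions, chosen to mirror the fields shown in the run listings on this page. The two sample records reuse values reported in those listings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record for one test run; field names are assumptions."""
    provider: str
    model: str
    temperature: float
    dataclass_name: str
    normalized_score: float  # percent, 0 to 100


def best_run(runs):
    """Return the run with the highest normalized score."""
    return max(runs, key=lambda r: r.normalized_score)


# Values copied from two of the runs listed on this page.
runs = [
    RunRecord("openrouter", "qwen/qwen3.5-35b-a3b-20260224", 0.0, "Document", 59.0),
    RunRecord("anthropic", "claude-opus-4-7", 0.0, "Document", 61.0),
]
print(best_run(runs).model)
```

Keeping the configuration (provider, model, temperature) in the same record as the score is what makes such a comparison reproducible: two runs are only comparable when their settings are visible side by side.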
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-35b-a3b-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 59.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.53 | 0.59 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 178.9K IT + 44.8K OT = 223.6K TT | Cost: 0.029$ + 0.058$ = 0.087$ |
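The prompt above asks for a JSON object with four keys whose values are always lists. A minimal sketch of how a response could be checked against that shape follows. It assumes strict parsing with Python's standard `json` module and an exact match on the four key names; the function name `validate_response` is hypothetical and is not the benchmark's own scoring code. Note that the EXAMPLE printed in the prompt ends with a trailing comma before the closing brace, which a strict JSON parser rejects.

```python
import json

# The four keys named in the prompt.
EXPECTED_KEYS = {"letter_title", "sender_persons", "send_date", "receiver_persons"}


def validate_response(text):
    """Return the parsed dict if it has the requested shape, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # not valid JSON (for example, a trailing comma)
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        return None  # missing or unexpected keys
    if not all(isinstance(value, list) for value in data.values()):
        return None  # every value must be a list, even a single one
    return data


good = (
    '{"letter_title": ["Petition for Environmental Protection"], '
    '"send_date": ["1993-03-12"], '
    '"sender_persons": ["Lisa Simpson"], '
    '"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"]}'
)
trailing_comma = '{"letter_title": ["x"],}'

print(validate_response(good) is not None)
print(validate_response(trailing_comma) is None)
```

Requiring an exact key set is a deliberately strict choice for this sketch; a more lenient checker might accept extra keys or wrap bare strings in a list.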
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-flash-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 48.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.54 | 0.48 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 103.5K IT + 2.3K OT = 105.8K TT | Cost: 0.007$ + 0.001$ = 0.007$ |
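The score tables name micro precision, micro recall, and F1 alongside TP, FP, and FN counts, although the counts themselves are shown as n/a here. As a reminder of how these quantities relate, the sketch below applies the standard definitions. Whether the benchmark computes its scores in exactly this way is an assumption, the function name `micro_scores` is hypothetical, and the counts in the usage line are illustrative values, not results from any run on this page.

```python
def micro_scores(tp, fp, fn):
    """Compute micro precision, recall, and F1 from pooled counts.

    tp, fp, fn are true positives, false positives, and false negatives
    summed over all instances. Undefined ratios are returned as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only.
precision, recall, f1 = micro_scores(tp=6, fp=2, fn=4)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

Micro averaging pools the counts before dividing, so fields with many values weigh more than fields with few; macro averaging would instead average per-field scores.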
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-397b-a17b-20260216 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 59.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.54 | 0.59 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 189.3K IT + 144.1K OT = 333.4K TT | Cost: 0.074$ + 0.337$ = 0.411$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-plus-20260216 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 59.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.61 | 0.59 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 278.7K IT + 49.5K OT = 328.2K TT | Cost: 0.072$ + 0.077$ = 0.150$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-plus-20260216 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 60.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.70 | 0.60 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 103.5K IT + 19.1K OT = 122.6K TT | Cost: 0.027$ + 0.030$ = 0.057$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-122b-a10b-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 43.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.51 | 0.43 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 103.5K IT + 2.0K OT = 105.5K TT | Cost: 0.027$ + 0.004$ = 0.031$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-7 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 61.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.58 | 0.61 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 296.6K IT + 4.1K OT = 300.7K TT | Cost: 1.483$ + 0.103$ = 1.586$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-27b-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 53.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.54 | 0.53 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 281.8K IT + 6.1K OT = 287.9K TT | Cost: 0.055$ + 0.009$ = 0.064$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | google/gemma-4-26b-a4b-it-20260403 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 50.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.46 | 0.50 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 28.5K IT + 2.6K OT = 31.1K TT | Cost: 0.002$ + 0.001$ = 0.003$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-27b-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 59.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro / macro | Micro precision/recall | True/False Positives | |||||
| n/a | 0.54 | 0.59 | n/a | n/a | n/a | n/a | n/a | n/a |
| Micro Precision | Micro Recall | Instances | TP | FP | FN | |||
| Pricing Date: n/a, n/a. | Tokens: 178.3K IT + 3.5K OT = 181.8K TT | Cost: 0.035$ + 0.005$ = 0.040$ |