A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
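A test run includes the run configuration (provider, model, temperature, and dataclass), the prompt sent to the model, the model output, the resulting scores, and the token usage and cost.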
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
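As a rough illustration, a test run could be modelled as a small record type. This is a minimal sketch, not the benchmark's actual schema: the class name `RunRecord`, its field names, and the helper `runs_for_provider` are all hypothetical and simply mirror the fields shown in the run tables on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RunRecord:
    """One benchmark execution. Field names are illustrative assumptions."""
    provider: str
    model: str
    temperature: float
    dataclass_name: str
    normalized_score: float  # percent, 0-100
    tags: dict = field(default_factory=dict)


def runs_for_provider(runs, provider):
    """Return only the runs executed through the given provider."""
    return [r for r in runs if r.provider == provider]


# Two records built from values that appear in the run tables on this page.
runs = [
    RunRecord("scicore", "GLM-4.5V-FP8", 0.0, "Document", 50.0),
    RunRecord("openai", "o3", 0.0, "Document", 60.0),
]
print([r.model for r in runs_for_provider(runs, "openai")])
```

Keeping the record immutable (`frozen=True`) reflects the idea that a finished run is a fixed observation; a new configuration would produce a new record rather than modify an old one.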
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | scicore |
| Model | GLM-4.5V-FP8 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 50.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
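The prompt asks for a JSON object in which each of the four keys maps to a list. A response could be checked against that shape roughly as follows. This is a sketch under stated assumptions: the function name `parse_response` is hypothetical, and the page does not say how the benchmark itself decides that a result is valid.

```python
import json

# The four keys requested in the prompt.
REQUIRED_KEYS = ("letter_title", "sender_persons", "send_date", "receiver_persons")


def parse_response(text):
    """Return the parsed dict if it matches the requested shape, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # not parseable as strict JSON
    if not isinstance(data, dict):
        return None
    for key in REQUIRED_KEYS:
        # Every key must be present and hold a list, even for a single value.
        if not isinstance(data.get(key), list):
            return None
    return data


good = (
    '{"letter_title": ["Petition for Environmental Protection"], '
    '"send_date": ["1993-03-12"], '
    '"sender_persons": ["Lisa Simpson"], '
    '"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"]}'
)
print(parse_response(good) is not None)
```

Note that a strict parser such as Python's `json` module rejects a trailing comma before the closing brace, so a response that copies the comma placement of the prompt's example would not parse under this check.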
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.46 | 0.50 | n/a | n/a | n/a | n/a | n/a | n/a |
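The score table reports F1 in micro and macro variants alongside TP, FP, and FN counts. The usual relationship between these quantities can be sketched as below. The counts are made up for illustration, and the page does not state whether the benchmark averages per document or per key, so the grouping unit here is an assumption.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts):
    """Pool all counts first, then compute a single F1."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute F1 per group, then take the unweighted mean."""
    return sum(f1(*c) for c in counts) / len(counts)


# Hypothetical (TP, FP, FN) counts for two groups, e.g. two documents.
counts = [(2, 0, 0), (0, 1, 1)]
print(round(micro_f1(counts), 3), round(macro_f1(counts), 3))
```

Micro averaging lets groups with many instances dominate, while macro averaging weights every group equally, which is why the two columns can differ for the same run.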
| Pricing date: 2025-10-17 | Tokens: 70.8K input + 2.3K output = 73.1K total | Cost: $0.000 + $0.000 = $0.000 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | scicore |
| Model | GLM-4.5V-FP8 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 48.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.50 | 0.48 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-17 | Tokens: 112.5K input + 3.8K output = 116.3K total | Cost: $0.000 + $0.000 = $0.000 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | scicore |
| Model | GLM-4.5V-FP8 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 46.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.57 | 0.46 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-17 | Tokens: 41.6K input + 1.5K output = 43.2K total | Cost: $0.000 + $0.000 = $0.000 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openai |
| Model | o3 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 60.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.63 | 0.60 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-01 | Tokens: 94.0K input + 39.7K output = 133.6K total | Cost: $0.188 + $0.317 = $0.505 |
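The cost figures in this row are consistent with simple per-token pricing applied separately to input and output tokens. The sketch below shows that arithmetic. The prices of 2.00 and 8.00 dollars per million tokens are assumptions inferred from this row, not values stated on the page, and `run_cost` is a hypothetical helper name.

```python
def run_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return (input cost, output cost, total cost) in dollars."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost, output_cost, input_cost + output_cost


# Token counts from the o3 row; per-million prices are assumed, not sourced.
input_cost, output_cost, total = run_cost(94_000, 39_700, 2.00, 8.00)
print(f"{input_cost:.3f} + {output_cost:.3f} = {total:.3f}")
```

Because the table shows token counts rounded to the nearest 0.1K, a recomputation like this can differ from the reported cost in the last digit.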
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-large-latest |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 46.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.43 | 0.46 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-01 | Tokens: 80.6K input + 2.4K output = 83.0K total | Cost: $0.161 + $0.015 = $0.176 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-large-latest |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 42.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.45 | 0.42 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-01 | Tokens: 129.4K input + 4.2K output = 133.6K total | Cost: $0.259 + $0.025 = $0.284 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-flash-lite |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 49.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.44 | 0.49 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-01 | Tokens: 26.4K input + 3.5K output = 29.9K total | Cost: $0.003 + $0.001 = $0.004 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-sonnet-4-5-20250929 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 42.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.38 | 0.42 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-01 | Tokens: 99.6K input + 3.4K output = 103.0K total | Cost: $0.299 + $0.051 = $0.349 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-medium-2508 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 41.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.51 | 0.41 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-01 | Tokens: 48.8K input + 1.8K output = 50.6K total | Cost: $0.020 + $0.004 = $0.023 |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-medium-2508 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 45.00 % |
| Test time | unknown |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.42 | 0.45 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing date: 2025-10-01 | Tokens: 80.6K input + 2.4K output = 83.0K total | Cost: $0.032 + $0.005 = $0.037 |
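One way to compare the runs listed above is to average the normalized score per model. The sketch below does this with the scores copied from the run tables on this page. The helper name `mean_score_by_model` is hypothetical, and because the page does not say whether every run covers the same test items, the resulting ranking is illustrative only.

```python
from collections import defaultdict
from statistics import mean

# (model, normalized score in percent), copied from the run tables above.
runs = [
    ("GLM-4.5V-FP8", 50.0), ("GLM-4.5V-FP8", 48.0), ("GLM-4.5V-FP8", 46.0),
    ("o3", 60.0),
    ("mistral-large-latest", 46.0), ("mistral-large-latest", 42.0),
    ("gemini-2.5-flash-lite", 49.0),
    ("claude-sonnet-4-5-20250929", 42.0),
    ("mistral-medium-2508", 41.0), ("mistral-medium-2508", 45.0),
]


def mean_score_by_model(runs):
    """Group scores by model name and return the mean of each group."""
    grouped = defaultdict(list)
    for model, score in runs:
        grouped[model].append(score)
    return {model: mean(scores) for model, scores in grouped.items()}


# Sort models from highest to lowest mean score.
ranking = sorted(mean_score_by_model(runs).items(), key=lambda kv: kv[1], reverse=True)
for model, score in ranking:
    print(f"{model}: {score:.1f}")
```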