A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.
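A test run includes the benchmark's tags, the model configuration (provider, model, temperature, and dataclass), the prompt, the model's output, the resulting scores, and the token usage and cost.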
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
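As a rough illustration of such a comparison, the sketch below models a run as a small Python record and orders a few runs by their normalized score. The class name `RunRecord`, its field names, and the helper `rank_runs` are hypothetical and are not taken from the benchmark's own code; the values are copied from the first three runs listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One benchmark execution. Field names are illustrative assumptions."""
    provider: str
    model: str
    temperature: float
    dataclass_name: str
    normalized_score: float  # percent, 0 to 100


def rank_runs(runs):
    """Return the runs ordered from highest to lowest normalized score."""
    return sorted(runs, key=lambda run: run.normalized_score, reverse=True)


# Values copied from the first three runs on this page.
runs = [
    RunRecord("genai", "gemini-2.5-flash-lite", 0.0, "Document", 44.0),
    RunRecord("genai", "gemini-2.5-flash", 0.0, "Document", 50.0),
    RunRecord("openai", "gpt-4o-mini-2024-07-18", 0.0, "Document", 49.0),
]

for run in rank_runs(runs):
    print(f"{run.provider} / {run.model}: {run.normalized_score:.2f} %")
```

Because every run here shares the same temperature and dataclass, the score differences in this small sample come from the model alone.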
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-flash-lite |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 44.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
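The output format requested by the prompt can be checked mechanically. The following is a minimal sketch, assuming the response arrives as a plain string; the function names `check_response` and `is_iso_date` are hypothetical and do not describe how the benchmark itself validates results. It verifies that the response parses as JSON, that each of the four keys is present with a list value, and that every `send_date` entry is either `"null"` or a `YYYY-MM-DD` date.

```python
import json
import re
from datetime import date

# Keys named in the prompt.
REQUIRED_KEYS = ("letter_title", "sender_persons", "send_date", "receiver_persons")


def is_iso_date(value) -> bool:
    """True if value is a string of the form YYYY-MM-DD naming a real date."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_response(raw: str) -> list[str]:
    """Return the problems found in a response; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for key in REQUIRED_KEYS:
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], list):
            problems.append(f"value is not a list: {key}")

    # The prompt asks for "null" when a piece of information is absent.
    dates = data.get("send_date")
    for value in dates if isinstance(dates, list) else []:
        if value != "null" and not is_iso_date(value):
            problems.append(f"send_date is not an ISO date: {value!r}")
    return problems
```

Note that the example object in the prompt ends its last entry with a trailing comma, which a strict parser such as Python's `json.loads` rejects, so a response that copied the example literally would fail the first check in this sketch.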
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.44 | 0.44 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 44.4K IT + 6.3K OT = 50.7K TT | Cost: 0.004$ + 0.003$ = 0.007$ |
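The metric columns reported for each run (micro precision, micro recall, and F1 alongside TP, FP, and FN counts) are related by the standard definitions sketched below. This is a generic illustration, not the benchmark's scoring code, and the counts in the usage example are hypothetical because the runs on this page report them as n/a. "Micro" here means the counts are pooled over all extracted values before the ratios are taken.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute (precision, recall, F1) from pooled true/false positive counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall, f1 = micro_scores(tp=11, fp=9, fn=14)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The zero checks keep the function defined for a run that returns nothing usable, where all three scores fall to 0.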
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-flash |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 50.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.57 | 0.50 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 17.4K IT + 1.6K OT = 19.0K TT | Cost: 0.005$ + 0.004$ = 0.009$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-4o-mini-2024-07-18 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 49.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.50 | 0.49 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 3.6M IT + 4.0K OT = 3.6M TT | Cost: 0.546$ + 0.002$ = 0.548$ |
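The token and cost figures in a pricing row can be combined to estimate the price per million tokens that a run implies. The sketch below is an illustration only: the helper names `parse_tokens` and `price_per_million` are hypothetical, and because the page reports rounded token counts and costs, the result is an approximation rather than the provider's actual rate.

```python
def parse_tokens(text: str) -> float:
    """Convert a rounded count such as '3.6M' or '4.0K' to a number of tokens."""
    scale = {"K": 1_000, "M": 1_000_000}
    return float(text[:-1]) * scale[text[-1]]


def price_per_million(tokens: str, cost_usd: float) -> float:
    """Estimate the implied cost in dollars per one million tokens."""
    return cost_usd / parse_tokens(tokens) * 1_000_000


# Figures from the pricing row above: input tokens (IT) and output tokens (OT).
input_rate = price_per_million("3.6M", 0.546)
output_rate = price_per_million("4.0K", 0.002)
print(f"input: ~{input_rate:.2f} $/M tokens, output: ~{output_rate:.2f} $/M tokens")
```

The same calculation applies to any other run on this page by substituting its IT and OT figures.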
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-pro |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 55.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.65 | 0.55 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 17.4K IT + 2.3K OT = 19.7K TT | Cost: 0.022$ + 0.023$ = 0.045$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-pro |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 54.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.54 | 0.54 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 44.4K IT + 5.5K OT = 50.0K TT | Cost: 0.056$ + 0.055$ = 0.111$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-1-20250805 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 48.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.44 | 0.48 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 128.6K IT + 3.5K OT = 132.1K TT | Cost: 1.929$ + 0.264$ = 2.193$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-pro |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 56.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.50 | 0.56 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 27.0K IT + 3.2K OT = 30.2K TT | Cost: 0.034$ + 0.032$ = 0.066$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-medium-2508 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.00 | 0.00 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 18.7K IT + 2.1K OT = 20.8K TT | Cost: 0.007$ + 0.004$ = 0.012$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-20250514 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 44.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.51 | 0.44 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 78.4K IT + 2.7K OT = 81.1K TT | Cost: 1.176$ + 0.203$ = 1.379$ |
{'document-type': ['letter'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'layout': ['prose'], 'entry-type': ['person', 'location'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-4o-mini-2024-07-18 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 53.00 % |
| Test time | unknown seconds |
IDENTITY and PURPOSE:
You are presented with a series of images constituting a historical letter. Your task is to extract the
values of the following keys from the letter and return them in a JSON file where the values
corresponding to each key should be stored as a list, even if there is only a single value for a
key:
- letter_title: Title of the letter.
- sender_persons: Name(s) of the person(s) who wrote the letter.
- send_date: The exact or approximate date the letter was written.
- receiver_persons: Name(s) of the person(s) who received the letter.
Take a deep breath and think step by step about how to best accomplish this goal. Map out all the claims
and implications on a virtual whiteboard in your mind. Do not use OCR. Use the ISO format YYYY-MM-DD for
dates. If a piece of information is not included in the letter, set the value for the corresponding key
to "null". Do not return anything except the JSON file.
EXAMPLE:
{
"letter_title": ["Petition for Environmental Protection"],
"send_date": ["1993-03-12"],
"sender_persons": ["Lisa Simpson"],
"receiver_persons": ["Mayor Joe Quimby", "Seymour Skinner"],
}
OUTPUT:
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.49 | 0.53 | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 2.3M IT + 2.3K OT = 2.3M TT | Cost: 0.350$ + 0.001$ = 0.352$ |