A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
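As the records below show, a test run includes the benchmark tags, the provider and model, the configuration (temperature and dataclass), the prompt, the resulting scores, and the token usage and cost.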
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
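To make the structure of a run concrete, the fields that recur in every record on this page can be collected into a single typed record. This is only a sketch: the class name `RunRecord` and all field names are assumptions chosen for illustration, not the benchmark's actual schema. The values are taken from the first run listed below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One benchmark execution. Field names are illustrative, not the site's schema."""
    provider: str
    model: str
    temperature: float
    dataclass_name: str       # the "Dataclass" row, e.g. "Card"
    normalized_score: float   # percent, 0-100
    input_tokens_k: float     # "IT", in thousands of tokens
    output_tokens_k: float    # "OT", in thousands of tokens
    cost_usd: float           # total cost of the run

    @property
    def total_tokens_k(self) -> float:
        # "TT" is the sum of input and output tokens, shown to one decimal.
        return round(self.input_tokens_k + self.output_tokens_k, 1)


run = RunRecord(
    provider="openrouter",
    model="meta-llama/llama-4-maverick",
    temperature=0.5,
    dataclass_name="Card",
    normalized_score=74.13,
    input_tokens_k=88.5,
    output_tokens_k=3.9,
    cost_usd=0.016,
)
print(run.model, run.normalized_score, run.total_tokens_k)
```

Keeping the record immutable (`frozen=True`) fits the idea that a run is a fixed observation at a specific time rather than something to be edited later.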
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | meta-llama/llama-4-maverick |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 74.13 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
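The JSON format requested by this prompt maps naturally onto a small typed record. The sketch below shows one way a response could be parsed. The page names the dataclass `Card` but does not show its definition, so the fields and the helper `parse_card` are assumptions derived from the prompt, and the sample response is invented for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Card:
    # Hypothetical layout inferred from the prompt's JSON format.
    company: str = ""
    location: str = ""
    b_id: str = ""
    date: str = ""
    information: list[str] = field(default_factory=list)


def parse_card(raw: str) -> Card:
    """Parse a model response that follows the JSON format in the prompt."""
    data = json.loads(raw)

    def text(key: str) -> str:
        # A missing or null field falls back to an empty transcription.
        return (data.get(key) or {}).get("transcription", "")

    return Card(
        company=text("company"),
        location=text("location"),
        b_id=text("b_id"),
        date=data.get("date", ""),
        information=[b.get("transcription", "") for b in data.get("information", [])],
    )


# Invented sample response, for illustration only.
sample = json.dumps({
    "company": {"transcription": "Muster AG"},
    "location": {"transcription": "Basel"},
    "b_id": {"transcription": "B. 123"},
    "date": "",
    "information": [{"transcription": "line one\nline two"}],
})
card = parse_card(sample)
print(card)
```

A response that contains text outside the JSON object would make `json.loads` raise an error, which is presumably why the prompt forbids explanatory text.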
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.74 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 88.5K IT + 3.9K OT = 92.4K TT | Cost: 0.013$ + 0.002$ = 0.016$ |
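The summary line reports input tokens (IT), output tokens (OT), and total tokens (TT), followed by input cost, output cost, and total cost. The displayed parts do not always add up exactly (here 0.013$ + 0.002$ is shown as 0.016$), presumably because the totals are computed from unrounded values before display. The sketch below parses such a line and checks the sums up to display rounding. The helper names `parse_summary` and `consistent` are hypothetical.

```python
import re


def parse_summary(line: str) -> dict[str, list[float]]:
    """Extract the three token figures (in thousands) and the three cost figures."""
    tokens = [float(x) for x in re.findall(r"([\d.]+)K", line)]
    costs = [float(x) for x in re.findall(r"([\d.]+)\$", line)]
    return {"tokens_k": tokens, "cost_usd": costs}


def consistent(parts: list[float], step: float) -> bool:
    """True if parts[0] + parts[1] matches parts[2] up to display rounding."""
    a, b, total = parts
    # Each of the three displayed values may be off by half a step.
    return abs(a + b - total) <= 1.5 * step + 1e-9


line = "| Pricing Date: n/a, n/a. | Tokens: 88.5K IT + 3.9K OT = 92.4K TT | Cost: 0.013$ + 0.002$ = 0.016$ |"
parsed = parse_summary(line)
print(parsed)
print(consistent(parsed["tokens_k"], step=0.1))
print(consistent(parsed["cost_usd"], step=0.001))
```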
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5.1-2025-11-13 |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 88.66 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.89 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 45.6K IT + 3.5K OT = 49.1K TT | Cost: 0.057$ + 0.035$ = 0.092$ |
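The page does not say how the Fuzzy Score is computed. In the records shown, the Normalized Score appears to be the same value expressed as a percentage (0.89 against 88.66 % in this run). As a rough illustration of the idea only, the sketch below averages a character-level similarity ratio over the extracted fields, using `difflib.SequenceMatcher` from the Python standard library. The benchmark's actual metric may differ, and the sample values are invented.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(expected: str, predicted: str) -> float:
    """Similarity in [0, 1]; two empty strings count as a perfect match."""
    if not expected and not predicted:
        return 1.0
    return SequenceMatcher(None, expected, predicted).ratio()


def fuzzy_score(expected: dict[str, str], predicted: dict[str, str]) -> float:
    """Mean similarity over all expected fields; a missing field scores as empty."""
    ratios = [fuzzy_ratio(value, predicted.get(key, "")) for key, value in expected.items()]
    return sum(ratios) / len(ratios)


# Invented ground truth and prediction, for illustration only.
truth = {"company": "Muster AG", "location": "Basel"}
guess = {"company": "Muster AG", "location": "Basle"}
score = fuzzy_score(truth, guess)
print(f"{score:.2f}")
```

A fuzzy metric of this kind gives partial credit for near-miss transcriptions, which suits handwritten or typed archival material where small character errors are common.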
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | x-ai/grok-4 |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 90.25 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.90 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 48.4K IT + 63.7K OT = 112.1K TT | Cost: 0.145$ + 0.955$ = 1.101$ |
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-sonnet-4-20250514 |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 84.34 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.84 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 84.2K IT + 6.4K OT = 90.6K TT | Cost: 0.253$ + 0.096$ = 0.348$ |
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-20250514 |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 89.37 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.89 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 84.2K IT + 6.2K OT = 90.4K TT | Cost: 1.264$ + 0.462$ = 1.726$ |
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-haiku-4-5-20251001 |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 82.87 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.83 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 93.2K IT + 6.4K OT = 99.7K TT | Cost: 0.093$ + 0.032$ = 0.125$ |
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-flash-lite |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 87.22 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.87 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 17.5K IT + 5.6K OT = 23.1K TT | Cost: 0.002$ + 0.002$ = 0.004$ |
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-large-2512 |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 14.75 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.15 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 19.7K IT + 1.6K OT = 21.4K TT | Cost: 0.010$ + 0.002$ = 0.012$ |
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.0-flash-lite |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 91.13 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.91 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 70.5K IT + 26.4K OT = 96.8K TT | Cost: 0.005$ + 0.008$ = 0.013$ |
{'document-type': ['index-card'], 'writing': ['typed', 'handwritten'], 'century': [20], 'language': ['de'], 'entry-type': ['company'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | magistral-medium-2509 |
| Temperature | 0.5 |
| Dataclass | Card |
| Normalized Score | 14.64 % |
| Test time | unknown seconds |
You are a meticulous archivist extracting data from an index card image. Analyze the provided image and extract the following information. Return the data ONLY as a valid JSON object.
- "company": The primary company name, usually in the top-left. Exclude the location.
- "location": The city or town, often following the company name.
- "b_id": The identifier code, usually in the top-right, starting with "B.".
- "date": Any stamped dates on the card in YYYY-MM-DD format. If no date is present, use an empty string.
- "information": A list of text blocks from the main body of the card. Each block should be a separate string in the list. Maintain line breaks with \\n.
Here is the required JSON format:
{
"company": {"transcription": ""},
"location": {"transcription": ""},
"b_id": {"transcription": ""},
"date": "",
"information": [
{"transcription": ""}
]
}
If you cannot find a value for a field, leave its transcription value as an empty string. Do not add any explanatory text outside of the JSON object.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| 0.15 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Pricing Date: n/a, n/a. | Tokens: 19.7K IT + 2.8K OT = 22.6K TT | Cost: 0.039$ + 0.014$ = 0.054$ |
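Because every run above uses the same prompt, temperature, and dataclass, the runs can be compared directly. The sketch below ranks the ten runs by Normalized Score and picks out those that no other run beats on both score and cost. The scores and costs are copied from the records above. The function name `pareto_front` and the choice of a score-versus-cost comparison are assumptions made for illustration, not part of the benchmark.

```python
# (model, normalized score in percent, total cost in dollars), copied from the runs above.
runs = [
    ("meta-llama/llama-4-maverick", 74.13, 0.016),
    ("gpt-5.1-2025-11-13", 88.66, 0.092),
    ("x-ai/grok-4", 90.25, 1.101),
    ("claude-sonnet-4-20250514", 84.34, 0.348),
    ("claude-opus-4-20250514", 89.37, 1.726),
    ("claude-haiku-4-5-20251001", 82.87, 0.125),
    ("gemini-2.5-flash-lite", 87.22, 0.004),
    ("mistral-large-2512", 14.75, 0.012),
    ("gemini-2.0-flash-lite", 91.13, 0.013),
    ("magistral-medium-2509", 14.64, 0.054),
]


def pareto_front(rows):
    """Keep runs that no other run matches or beats on both score and cost."""
    front = []
    for name, score, cost in rows:
        beaten = any(
            s >= score and c <= cost and (s > score or c < cost)
            for _, s, c in rows
        )
        if not beaten:
            front.append((name, score, cost))
    return sorted(front, key=lambda r: r[2])  # cheapest first


by_score = sorted(runs, key=lambda r: r[1], reverse=True)
for name, score, cost in by_score:
    print(f"{score:6.2f} %  {cost:6.3f} $  {name}")

print([name for name, _, _ in pareto_front(runs)])
```

A single run per model is a small sample, so a ranking like this is best read as a snapshot of these particular executions rather than a general verdict on the models.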