A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.
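A test run includes the benchmark tags, the provider and model, the temperature, the dataclass, the normalized score, the test time, the prompt, the evaluation metrics, and the token usage and cost.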
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-large-2411 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.00 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
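The schema and rules in the prompt above can be sketched as Python dataclasses with a small parser. This is only an illustration: the benchmark's real `Document` dataclass is not shown on this page, so every class name, the `parse_response` helper, and the choice to represent a missing year as `None` are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mirror of the JSON format requested in the prompt.
ALLOWED_TYPES = {"Dissertation or thesis", "Reference"}


@dataclass
class Author:
    last_name: str = ""
    first_name: str = ""


@dataclass
class Publication:
    title: str = ""
    year: Optional[int] = None  # rule 5: year is omitted when not present
    place: str = ""
    pages: str = ""
    publisher: str = ""
    format: str = ""


@dataclass
class LibraryReference:
    shelfmark: str = ""
    subjects: str = ""


@dataclass
class Document:
    type: str = "Dissertation or thesis"
    author: Author = field(default_factory=Author)
    publication: Publication = field(default_factory=Publication)
    library_reference: LibraryReference = field(default_factory=LibraryReference)


def parse_response(raw: str) -> Document:
    """Parse a model response and apply the card-type and page-suffix rules."""
    data = json.loads(raw)
    card_type = data["type"]["type"]
    if card_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected card type: {card_type!r}")
    publication = dict(data.get("publication", {}))
    pages = publication.get("pages", "")
    if pages.endswith(" S."):
        publication["pages"] = pages[: -len(" S.")]  # rule 3: drop the " S." suffix
    return Document(
        type=card_type,
        author=Author(**data.get("author", {})),
        publication=Publication(**publication),
        library_reference=LibraryReference(**data.get("library_reference", {})),
    )
```

A response that is not valid JSON raises `json.JSONDecodeError`, and a response with unexpected keys raises `TypeError`, so malformed output is rejected rather than silently accepted.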
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.00 | 0.00 | 0.00 | 0.00 | 263 | 0 | 0 | 2415 |
| Pricing Date: n/a, n/a. | Tokens: 340.8K IT + 41.0K OT = 381.9K TT | Cost: 0.682$ + 0.246$ = 0.928$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-small-2506 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.00 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.00 | 0.00 | 0.00 | 0.00 | 263 | 0 | 0 | 2415 |
| Pricing Date: n/a, n/a. | Tokens: 314.5K IT + 40.7K OT = 355.2K TT | Cost: 0.031$ + 0.012$ = 0.044$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.0-flash-lite |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 81.01 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.82 | 0.81 | 0.84 | 0.80 | 263 | 1931 | 369 | 484 |
| Pricing Date: n/a, n/a. | Tokens: 676.7K IT + 44.3K OT = 721.0K TT | Cost: 0.051$ + 0.013$ = 0.064$ |
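The micro precision, recall, and F1 values reported for this run are consistent with the standard definitions applied to the TP, FP, and FN counts. The sketch below assumes those textbook formulas; how the benchmark decides that an extracted field counts as a true positive (for example, any fuzzy-matching threshold) is not described here, and the function name `micro_metrics` is hypothetical.

```python
def micro_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from pooled counts, using 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the gemini-2.0-flash-lite run above.
p, r, f1 = micro_metrics(tp=1931, fp=369, fn=484)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 0.84 0.80 0.82
```

The zero-division guards matter for runs that returned no usable output: with TP = 0 and FP = 0, precision is undefined and is reported as 0.00.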
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-flash |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 86.37 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.87 | 0.86 | 0.85 | 0.89 | 263 | 2149 | 379 | 266 |
| Pricing Date: n/a, n/a. | Tokens: 187.3K IT + 48.1K OT = 235.4K TT | Cost: 0.056$ + 0.120$ = 0.176$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | pixtral-large-2411 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.00 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.00 | 0.00 | 0.00 | 0.00 | 263 | 0 | 8 | 2415 |
| Pricing Date: n/a, n/a. | Tokens: 340.8K IT + 57.3K OT = 398.1K TT | Cost: 0.682$ + 0.344$ = 1.025$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-medium-2508 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.28 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.01 | 0.00 | 0.09 | 0.00 | 263 | 7 | 67 | 2408 |
| Pricing Date: n/a, n/a. | Tokens: 314.5K IT + 41.2K OT = 355.7K TT | Cost: 0.126$ + 0.082$ = 0.208$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-sonnet-4-20250514 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.02 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.85 | 0.84 | 0.84 | 0.86 | 263 | 2088 | 408 | 327 |
| Pricing Date: n/a, n/a. | Tokens: 634.2K IT + 54.5K OT = 688.6K TT | Cost: 1.903$ + 0.817$ = 2.720$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-4.1-2025-04-14 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.20 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.85 | 0.84 | 0.85 | 0.84 | 263 | 2033 | 356 | 382 |
| Pricing Date: n/a, n/a. | Tokens: 457.8K IT + 361.0K OT = 818.9K TT | Cost: 0.916$ + 2.888$ = 3.804$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-5-20251101 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 85.65 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.86 | 0.86 | 0.83 | 0.90 | 263 | 2175 | 449 | 240 |
| Pricing Date: n/a, n/a. | Tokens: 706.0K IT + 56.0K OT = 762.0K TT | Cost: 3.530$ + 1.401$ = 4.931$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | genai |
| Model | gemini-2.5-flash-lite |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 68.34 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| n/a | 0.69 | 0.68 | 0.73 | 0.66 | 263 | 1598 | 590 | 817 |
| Pricing Date: n/a, n/a. | Tokens: 187.3K IT + 175.3K OT = 362.5K TT | Cost: 0.019$ + 0.070$ = 0.089$ |
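As one way to compare the runs listed above, the following sketch ranks them by normalized score and by score per dollar. The numbers are copied from the run records on this page; the `Run` type and the "score per dollar" measure are illustrative assumptions and not part of the benchmark itself.

```python
from typing import NamedTuple


class Run(NamedTuple):
    provider: str
    model: str
    score: float  # normalized score in percent
    cost: float   # total cost in dollars


RUNS = [
    Run("mistral", "mistral-large-2411", 0.00, 0.928),
    Run("mistral", "mistral-small-2506", 0.00, 0.044),
    Run("genai", "gemini-2.0-flash-lite", 81.01, 0.064),
    Run("genai", "gemini-2.5-flash", 86.37, 0.176),
    Run("mistral", "pixtral-large-2411", 0.00, 1.025),
    Run("mistral", "mistral-medium-2508", 0.28, 0.208),
    Run("anthropic", "claude-sonnet-4-20250514", 84.02, 2.720),
    Run("openai", "gpt-4.1-2025-04-14", 84.20, 3.804),
    Run("anthropic", "claude-opus-4-5-20251101", 85.65, 4.931),
    Run("genai", "gemini-2.5-flash-lite", 68.34, 0.089),
]

# Rank by normalized score, highest first.
by_score = sorted(RUNS, key=lambda run: run.score, reverse=True)

# Illustrative value measure: score points obtained per dollar spent.
best_value = max(RUNS, key=lambda run: run.score / run.cost)

for run in by_score:
    print(f"{run.model:<28} {run.score:6.2f} %  {run.cost:6.3f} $")
print("best score per dollar:", best_value.model)
```

Runs with a score near zero most likely reflect a failure to return usable output rather than poor extraction quality, so they are better read as missing data than as a ranking result.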