A test run is a single execution of a benchmark test using a defined model configuration.
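Each run records how a particular large language model (LLM), such as GPT-4, Claude 3, or Gemini, performed on a given task at a specific time and with specific settings.
A test run records the benchmark tags, the provider and model, the temperature, the dataclass, the normalized score, the test time, the prompt, the evaluation metrics, and the token usage and cost.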
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | openai |
| Model | qwen/qwen3-vl-30b-a3b-instruct |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 53.63 % |
| Test time | unknown |

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
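The prompt asks for one fixed JSON shape, so before any field is scored a harness has to decide whether a reply matches that shape at all. Below is a minimal sketch of such a check, assuming the reply arrives as a raw string. `check_reply` and its messages are hypothetical and not the benchmark's own validator. Following rule 5, `year` is only checked when present, and missing text fields are expected as empty strings rather than `null`.

```python
import json

# Hypothetical checker mirroring the shape requested in the prompt;
# it is not the benchmark's own validation code.
CARD_TYPES = {"Dissertation or thesis", "Reference"}
STRING_FIELDS = {
    "author": ["last_name", "first_name"],
    "publication": ["title", "place", "pages", "publisher", "format"],
    "library_reference": ["shelfmark", "subjects"],
}


def check_reply(raw: str) -> list[str]:
    """Return the problems found in a model reply; an empty list means the shape looks valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    card_type = data.get("type")
    if not isinstance(card_type, dict) or card_type.get("type") not in CARD_TYPES:
        problems.append("type.type must be 'Dissertation or thesis' or 'Reference'")

    for section, fields in STRING_FIELDS.items():
        block = data.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing object: {section}")
            continue
        for field in fields:
            # Rule 5: missing text fields should be "" rather than absent or null.
            if not isinstance(block.get(field), str):
                problems.append(f"{section}.{field} must be a string")

    publication = data.get("publication")
    if isinstance(publication, dict) and "year" in publication:
        year = publication["year"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(year, int) or isinstance(year, bool):
            problems.append("publication.year must be an integer when present")
    return problems
```

A reply such as `{"type": {"type": "Book"}, ...}` would be flagged on its card type, and a year given as the string `"1905"` would be flagged as a non-integer.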
no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.61 | 0.54 | 0.66 | 0.57 | 263 | 1372 | 693 | 1043 |

Pricing date: n/a, n/a. Tokens: 345.3K input + 43.7K output = 389.0K total. Cost: 0.000$ + 0.000$ = 0.025$.
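The micro-averaged columns follow from the pooled TP, FP and FN counts: precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The short check below reproduces the qwen row. Returning 0.0 for an empty denominator is an assumption on our part; the card does not say how that case is handled.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 from counts pooled over all fields and cards."""
    # Assumption: an empty denominator yields 0.0 (the card itself does not say).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Counts from the qwen/qwen3-vl-30b-a3b-instruct run above.
precision, recall, f1 = micro_scores(tp=1372, fp=693, fn=1043)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.66 recall=0.57 f1=0.61
```

The same function applied to a row with TP = FP = 0 returns all zeros instead of raising `ZeroDivisionError`.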
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | openai |
| Model | gpt-4o-2024-08-06 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.97 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.86 | 0.85 | 0.86 | 0.85 | 263 | 2059 | 332 | 356 |

Pricing date: n/a, n/a. Tokens: 462.1K input + 30.8K output = 492.8K total. Cost: 1.155$ + 0.308$ = 1.463$.
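Because the pricing date is n/a, the rate a run was billed at can still be recovered by dividing each cost by its token count. The sketch below does this for the gpt-4o row. It assumes the first cost term belongs to input tokens and the second to output tokens, and the result is approximate because token counts are rounded to 0.1K. `usd_per_million` is a hypothetical helper name.

```python
def usd_per_million(cost_usd: float, tokens_k: float) -> float:
    """Price per one million tokens implied by a cost and a token count reported in thousands (K)."""
    return cost_usd / tokens_k * 1_000


# Figures from the gpt-4o-2024-08-06 run above; token counts are rounded to 0.1K,
# so the implied rates are approximate.
input_rate = usd_per_million(1.155, 462.1)
output_rate = usd_per_million(0.308, 30.8)
print(f"input ≈ ${input_rate:.2f}/M tokens, output ≈ ${output_rate:.2f}/M tokens")
# input ≈ $2.50/M tokens, output ≈ $10.00/M tokens
```

The same division works for any other run whose cost line is internally consistent.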
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | mistral |
| Model | mistral-medium-2508 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.28 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.01 | 0.00 | 0.09 | 0.00 | 263 | 7 | 67 | 2408 |

Pricing date: n/a, n/a. Tokens: 314.5K input + 41.2K output = 355.7K total. Cost: 0.126$ + 0.082$ = 0.208$.
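Each result table reports F1 twice. Under the standard definitions, micro-F1 pools TP, FP and FN before computing a single score, while macro-F1 computes one F1 per category and takes the unweighted mean, so a field that is rarely extracted correctly pulls macro-F1 down more than micro-F1. The cards do not state what the benchmark averages over. The sketch assumes per-field averaging and uses invented counts purely for illustration.

```python
from statistics import mean


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical per-field (TP, FP, FN) counts, made up for illustration;
# they are not taken from any run on this page.
per_field = {
    "author.last_name": (90, 5, 5),
    "publication.year": (80, 10, 10),
    "library_reference.subjects": (2, 8, 40),
}

# Micro: pool the counts over all fields first, then compute one F1.
totals = [sum(counts[i] for counts in per_field.values()) for i in range(3)]
micro_f1 = f1(*totals)
# Macro: compute F1 per field, then take the unweighted mean.
macro_f1 = mean(f1(*counts) for counts in per_field.values())
print(f"micro={micro_f1:.2f} macro={macro_f1:.2f}")  # micro=0.82 macro=0.64
```

The weak `subjects` field barely moves the pooled score but costs the macro average about 0.18.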
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | mistral |
| Model | mistral-medium-2505 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.17 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.00 | 0.00 | 0.10 | 0.00 | 263 | 4 | 36 | 2411 |

Pricing date: n/a, n/a. Tokens: 314.5K input + 41.2K output = 355.8K total. Cost: 0.126$ + 0.082$ = 0.208$.
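Every figure on the token and cost lines is rounded, tokens to 0.1K and costs to 0.001$, so a total may legitimately differ from the sum of its parts. Here 314.5K + 41.2K gives 355.7K, while the card shows 355.8K. The minimal check below separates rounding noise from a real discrepancy; `could_be_rounding` is a hypothetical helper. The qwen run's cost line (0.000$ + 0.000$ = 0.025$) fails it, so at least one of its figures cannot be taken at face value.

```python
def could_be_rounding(parts: list[float], total: float, step: float) -> bool:
    """True if `total` can equal the sum of `parts` when each displayed value is rounded to `step`."""
    # Each displayed number may differ from its true value by up to step / 2,
    # so the parts and the total together allow (len(parts) + 1) * step / 2 of slack.
    slack = (len(parts) + 1) * step / 2
    return abs(sum(parts) - total) <= slack + 1e-9


# Token line of the mistral-medium-2505 run above (values in K, rounded to 0.1K).
print(could_be_rounding([314.5, 41.2], 355.8, step=0.1))  # True
# Cost line of the qwen/qwen3-vl-30b-a3b-instruct run (USD, rounded to 0.001).
print(could_be_rounding([0.000, 0.000], 0.025, step=0.001))  # False
```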
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | genai |
| Model | gemini-2.0-flash-lite |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 81.01 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.82 | 0.81 | 0.84 | 0.80 | 263 | 1931 | 369 | 484 |

Pricing date: n/a, n/a. Tokens: 676.7K input + 44.3K output = 721.0K total. Cost: 0.051$ + 0.013$ = 0.064$.
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | genai |
| Model | gemini-2.0-flash |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 80.51 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.82 | 0.81 | 0.82 | 0.81 | 263 | 1959 | 417 | 456 |

Pricing date: n/a, n/a. Tokens: 676.7K input + 46.4K output = 723.1K total. Cost: 0.068$ + 0.019$ = 0.086$.
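Under the usual definitions, TP + FN counts every expected value, found or missed, so it should be identical for runs scored against the same ground truth. The illustrative check below re-reads the published counts from the ten runs on this page. Each run sums to 2,415 expected values across 263 cards, about 9.2 per card, which means the FN column can be compared directly between runs.

```python
# (TP, FN) pairs copied from the result tables of the ten runs on this page.
counts = {
    "qwen/qwen3-vl-30b-a3b-instruct": (1372, 1043),
    "gpt-4o-2024-08-06": (2059, 356),
    "mistral-medium-2508": (7, 2408),
    "mistral-medium-2505": (4, 2411),
    "gemini-2.0-flash-lite": (1931, 484),
    "gemini-2.0-flash": (1959, 456),
    "gemini-2.5-flash": (2149, 266),
    "claude-sonnet-4-5-20250929": (2112, 303),
    "mistral-small-2506": (0, 2415),
    "gemini-2.5-flash-lite-preview-09-2025": (1533, 882),
}

# TP + FN is the number of expected (ground-truth) values a run was scored against.
expected = {model: tp + fn for model, (tp, fn) in counts.items()}
assert len(set(expected.values())) == 1, expected
gold_values = next(iter(expected.values()))
print(gold_values, round(gold_values / 263, 2))  # 2415 9.18
```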
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | genai |
| Model | gemini-2.5-flash |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 86.37 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.87 | 0.86 | 0.85 | 0.89 | 263 | 2149 | 379 | 266 |

Pricing date: n/a, n/a. Tokens: 187.3K input + 48.1K output = 235.4K total. Cost: 0.056$ + 0.120$ = 0.176$.
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | anthropic |
| Model | claude-sonnet-4-5-20250929 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.70 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.86 | 0.85 | 0.84 | 0.87 | 263 | 2112 | 395 | 303 |

Pricing date: n/a, n/a. Tokens: 706.0K input + 56.2K output = 762.1K total. Cost: 2.118$ + 0.842$ = 2.960$.
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | mistral |
| Model | mistral-small-2506 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 0.00 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.00 | 0.00 | 0.00 | 0.00 | 263 | 0 | 0 | 2415 |

Pricing date: n/a, n/a. Tokens: 314.5K input + 40.7K output = 355.2K total. Cost: 0.031$ + 0.012$ = 0.044$.
Tags: document-type: index-card; writing: typed, printed, handwritten; century: 20, 19; language: de, fr, en, la, el, fi, sv, pl; layout: index; entry-type: bibliographic; task: information-extraction

| Setting | Value |
|---|---|
| Provider | genai |
| Model | gemini-2.5-flash-lite-preview-09-2025 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 64.85 % |
| Test time | unknown |

no valid result

| Fuzzy Score | F1 (micro) | F1 (macro) | Micro Precision | Micro Recall | Instances | TP | FP | FN |
|---|---|---|---|---|---|---|---|---|
| n/a | 0.66 | 0.65 | 0.69 | 0.63 | 263 | 1533 | 677 | 882 |

Pricing date: n/a, n/a. Tokens: 187.3K input + 503.0K output = 690.2K total. Cost: 0.019$ + 0.201$ = 0.220$.
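Taken together, the runs above can be compared side by side. The sketch below ranks them by normalized score and shows cost per card, using the scores and total costs printed on each run. A few caveats apply. Each model has a single run at temperature 0.0, so the ranking says nothing about run-to-run variance. The qwen total is used as printed even though its parts do not add up. The 80% threshold for the "cheapest good run" line is an arbitrary choice for illustration.

```python
INSTANCES = 263  # cards per run, from the Instances column

# (normalized score in %, total cost in USD) as printed on each run above.
runs = {
    "gemini-2.5-flash": (86.37, 0.176),
    "gpt-4o-2024-08-06": (84.97, 1.463),
    "claude-sonnet-4-5-20250929": (84.70, 2.960),
    "gemini-2.0-flash-lite": (81.01, 0.064),
    "gemini-2.0-flash": (80.51, 0.086),
    "gemini-2.5-flash-lite-preview-09-2025": (64.85, 0.220),
    "qwen/qwen3-vl-30b-a3b-instruct": (53.63, 0.025),
    "mistral-medium-2508": (0.28, 0.208),
    "mistral-medium-2505": (0.17, 0.208),
    "mistral-small-2506": (0.00, 0.044),
}

# Highest normalized score first.
ranked = sorted(runs.items(), key=lambda item: item[1][0], reverse=True)
for model, (score, cost) in ranked:
    print(f"{score:6.2f}%  ${cost / INSTANCES:.5f} per card  {model}")

# Cheapest run among those scoring at least 80% (threshold chosen for illustration).
cheapest = min((cost, model) for model, (score, cost) in runs.items() if score >= 80.0)
print("cheapest at >= 80%:", cheapest[1])
```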