A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
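A test run includes the benchmark tags, the provider and model, the temperature, the dataclass, the normalized score, the test time, the prompt, the evaluation metrics, and the token usage and cost.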
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | x-ai |
| Model | grok-4.3 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.56 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.86 | 0.85 | 0.85 | 0.86 | 263 | 2074 | 353 | 341 |
| Pricing Date: n/a, n/a. | Tokens: 468.1K IT + 41.3K OT = 509.4K TT | Cost: 0.585$ + 0.103$ = 0.688$ |
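The micro-averaged figures reported for this run appear consistent with the standard definitions of precision, recall, and F1 computed from the pooled TP, FP, and FN counts. The following is a minimal sketch under that assumption; the benchmark's own scoring code is not shown in this section, and the function name `micro_scores` is hypothetical.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from pooled TP, FP, and FN counts.

    Assumes the standard micro-averaged definitions; empty denominators
    yield 0.0 instead of raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the grok-4.3 run listed above.
precision, recall, f1 = micro_scores(tp=2074, fp=353, fn=341)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.85 recall=0.86 f1=0.86
```

Rounded to two decimals, these values match the micro precision, micro recall, and F1 micro figures reported for the run. The macro F1 cannot be reproduced this way because it requires per-field counts, which are not listed here.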
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | stepfun/step-3.7-flash-20260528 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 83.05 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.83 | 0.83 | 0.82 | 0.85 | 263 | 2051 | 455 | 364 |
| Pricing Date: n/a, n/a. | Tokens: 221.7K IT + 702.9K OT = 924.6K TT | Cost: 0.044$ + 0.808$ = 0.853$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | meta-llama/llama-4-scout-17b-16e-instruct |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 71.88 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.73 | 0.72 | 0.71 | 0.75 | 263 | 1806 | 754 | 609 |
| Pricing Date: n/a, n/a. | Tokens: 401.7K IT + 45.0K OT = 446.7K TT | Cost: 0.032$ + 0.014$ = 0.046$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | pixtral-large-2411 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 76.59 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.80 | 0.77 | 0.80 | 0.79 | 263 | 1915 | 471 | 500 |
| Pricing Date: n/a, n/a. | Tokens: 488.3K IT + 42.2K OT = 530.5K TT | Cost: 0.977$ + 0.253$ = 1.230$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-medium-2508 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 77.57 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.79 | 0.78 | 0.81 | 0.77 | 263 | 1867 | 447 | 548 |
| Pricing Date: n/a, n/a. | Tokens: 506.4K IT + 41.1K OT = 547.4K TT | Cost: 0.203$ + 0.082$ = 0.285$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | ministral-8b-2512 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 85.10 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.86 | 0.85 | 0.85 | 0.87 | 263 | 2093 | 376 | 322 |
| Pricing Date: n/a, n/a. | Tokens: 524.2K IT + 42.7K OT = 566.9K TT | Cost: 0.079$ + 0.006$ = 0.085$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-large-2512 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 17.76 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.29 | 0.18 | 0.79 | 0.18 | 263 | 435 | 117 | 1980 |
| Pricing Date: n/a, n/a. | Tokens: 115.0K IT + 9.9K OT = 124.9K TT | Cost: 0.057$ + 0.015$ = 0.072$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-small-2506 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 83.38 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.86 | 0.83 | 0.87 | 0.84 | 263 | 2030 | 300 | 385 |
| Pricing Date: n/a, n/a. | Tokens: 504.4K IT + 40.2K OT = 544.6K TT | Cost: 0.050$ + 0.012$ = 0.062$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | ministral-14b-2512 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 83.56 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.84 | 0.84 | 0.82 | 0.86 | 263 | 2081 | 445 | 334 |
| Pricing Date: n/a, n/a. | Tokens: 524.2K IT + 43.5K OT = 567.7K TT | Cost: 0.105$ + 0.009$ = 0.114$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | mistral |
| Model | mistral-medium-3.5 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 82.31 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
no valid result
| Fuzzy Score | F1 Micro | F1 Macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.83 | 0.82 | 0.80 | 0.85 | 263 | 2060 | 501 | 355 |
| Pricing Date: n/a, n/a. | Tokens: 524.2K IT + 45.0K OT = 569.2K TT | Cost: 0.786$ + 0.337$ = 1.124$ |
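Because every run above reports a normalized score and a total cost, the runs can be compared programmatically. The sketch below is illustrative only: the `RunSummary` class, the helper names, and the 80 % threshold are assumptions made for this example, not part of the benchmark tooling. Scores and costs are copied from the runs listed in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical container for the two figures compared here."""
    model: str
    normalized_score: float  # percent, as listed per run
    cost: float              # total cost in $, as listed per run


# Values copied from the test runs listed above.
RUNS = [
    RunSummary("grok-4.3", 84.56, 0.688),
    RunSummary("stepfun/step-3.7-flash-20260528", 83.05, 0.853),
    RunSummary("meta-llama/llama-4-scout-17b-16e-instruct", 71.88, 0.046),
    RunSummary("pixtral-large-2411", 76.59, 1.230),
    RunSummary("mistral-medium-2508", 77.57, 0.285),
    RunSummary("ministral-8b-2512", 85.10, 0.085),
    RunSummary("mistral-large-2512", 17.76, 0.072),
    RunSummary("mistral-small-2506", 83.38, 0.062),
    RunSummary("ministral-14b-2512", 83.56, 0.114),
    RunSummary("mistral-medium-3.5", 82.31, 1.124),
]


def rank_by_score(runs: list[RunSummary]) -> list[RunSummary]:
    """Sort by score, highest first; cheaper runs win ties."""
    return sorted(runs, key=lambda r: (-r.normalized_score, r.cost))


def cheapest_above(runs: list[RunSummary], min_score: float) -> Optional[RunSummary]:
    """Return the lowest-cost run scoring at least min_score, or None."""
    eligible = [r for r in runs if r.normalized_score >= min_score]
    return min(eligible, key=lambda r: r.cost) if eligible else None


best = rank_by_score(RUNS)[0]
budget = cheapest_above(RUNS, 80.0)  # the 80 % threshold is an arbitrary example
print(f"Top score: {best.model} ({best.normalized_score:.2f} %)")
print(f"Cheapest run at or above 80 %: {budget.model} ({budget.cost:.3f}$)")
```

With the figures listed in this section, the top-scoring run is ministral-8b-2512 (85.10 %), and the cheapest run scoring at least 80 % is mistral-small-2506 (0.062$).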