A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.
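A test run includes the benchmark tags, the model configuration, the prompt, the resulting scores, and the token usage and cost.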
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
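As a rough illustration, one test run could be held in a small record like the one below. This is only a sketch: the class name `RunRecord` and all field names are hypothetical and are not the benchmark's own schema. The example values are those of the first run listed on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """One execution of a benchmark test with a fixed model configuration.

    Hypothetical structure; field names are chosen for illustration only.
    """
    provider: str
    model: str
    temperature: float
    dataclass_name: str
    normalized_score: float                    # percent, 0 to 100
    test_time_seconds: Optional[float] = None  # None when the duration is unknown
    tags: dict = field(default_factory=dict)

    def config_key(self) -> tuple:
        """Identify the model configuration so repeated runs can be grouped."""
        return (self.provider, self.model, self.temperature)


run = RunRecord(
    provider="openrouter",
    model="qwen/qwen3.5-flash-20260224",
    temperature=0.0,
    dataclass_name="Document",
    normalized_score=83.87,
)
print(run.config_key())
```

Grouping by `config_key()` is one possible way to line up runs of the same configuration taken at different times.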
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-flash-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 83.87 % |
| Test time | unknown seconds |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
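The format and rules in this prompt can be checked mechanically. The following is a minimal sketch of such a check, not the benchmark's actual scoring code: the function name `check_response`, the error messages, and the sample reply are all invented for illustration.

```python
import json

ALLOWED_TYPES = {"Dissertation or thesis", "Reference"}
# Text fields per section, as listed in the prompt's JSON format.
TEXT_FIELDS = {
    "author": ("last_name", "first_name"),
    "publication": ("title", "place", "pages", "publisher", "format"),
    "library_reference": ("shelfmark", "subjects"),
}


def check_response(raw: str) -> list:
    """Return a list of rule violations; an empty list means the reply passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]

    problems = []
    card_type = data.get("type")
    if not isinstance(card_type, dict) or card_type.get("type") not in ALLOWED_TYPES:
        problems.append("type.type must be one of the two allowed card types")

    for section, keys in TEXT_FIELDS.items():
        block = data.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing object: {section}")
            continue
        for key in keys:
            if not isinstance(block.get(key), str):
                problems.append(f'{section}.{key} must be a string (use "" if missing)')

    publication = data.get("publication")
    if isinstance(publication, dict):
        # Rule 5: year is an integer when present and omitted otherwise.
        year = publication.get("year")
        if "year" in publication and (isinstance(year, bool) or not isinstance(year, int)):
            problems.append("publication.year must be an integer or be omitted")
        # Rule 3: the " S." page suffix should already be removed.
        pages = publication.get("pages")
        if isinstance(pages, str) and pages.endswith(" S."):
            problems.append('publication.pages still carries the " S." suffix')
    return problems


# Invented sample reply, not taken from the benchmark data.
sample = {
    "type": {"type": "Reference"},
    "author": {"last_name": "Example", "first_name": ""},
    "publication": {"title": "Example title", "place": "", "pages": "48 S.",
                    "publisher": "", "format": "8°"},
    "library_reference": {"shelfmark": "", "subjects": ""},
}
print(check_response(json.dumps(sample)))
```

In this sample the year is omitted, which rule 5 allows, so the only reported problem is the page suffix that was left in place.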
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.85 | 0.84 | 0.81 | 0.88 | 263 | 2136 | 499 | 279 |
| Pricing Date: n/a, n/a. | Tokens: 476.2K IT + 50.9K OT = 527.2K TT | Cost: 0.031$ + 0.013$ = 0.044$ |
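The micro scores reported for this run are consistent with the standard definitions of precision, recall, and F1 computed from the pooled counts (2136 true positives, 499 false positives, 279 false negatives). The sketch below assumes those standard definitions; the function name `micro_metrics` is illustrative, and the page does not state how the benchmark itself computes the values. The macro F1 cannot be reproduced this way, because it needs per-field or per-instance counts that are not shown here.

```python
def micro_metrics(tp: int, fp: int, fn: int) -> dict:
    """Micro-averaged precision, recall, and F1 from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts from the first run (qwen/qwen3.5-flash-20260224 via openrouter).
metrics = micro_metrics(tp=2136, fp=499, fn=279)
print({name: round(value, 2) for name, value in metrics.items()})
# → {'precision': 0.81, 'recall': 0.88, 'f1': 0.85}
```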
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-7 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 89.17 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.90 | 0.89 | 0.88 | 0.93 | 263 | 2236 | 314 | 179 |
| Pricing Date: n/a, n/a. | Tokens: 875.3K IT + 71.7K OT = 947.0K TT | Cost: 4.377$ + 1.792$ = 6.168$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | google/gemma-4-31b-it-20260402 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 81.17 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.82 | 0.81 | 0.78 | 0.86 | 263 | 2069 | 583 | 346 |
| Pricing Date: n/a, n/a. | Tokens: 301.1K IT + 43.1K OT = 344.2K TT | Cost: 0.039$ + 0.016$ = 0.056$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-397b-a17b-20260216 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 78.41 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.79 | 0.78 | 0.75 | 0.84 | 263 | 2021 | 690 | 394 |
| Pricing Date: n/a, n/a. | Tokens: 302.3K IT + 845.4K OT = 1.1M TT | Cost: 0.118$ + 1.978$ = 2.096$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-27b-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 86.62 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.87 | 0.87 | 0.85 | 0.90 | 263 | 2163 | 370 | 252 |
| Pricing Date: n/a, n/a. | Tokens: 478.7K IT + 49.5K OT = 528.2K TT | Cost: 0.093$ + 0.077$ = 0.171$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | google/gemma-4-26b-a4b-it-20260403 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 79.71 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.81 | 0.80 | 0.76 | 0.86 | 263 | 2082 | 674 | 333 |
| Pricing Date: n/a, n/a. | Tokens: 204.9K IT + 54.8K OT = 259.6K TT | Cost: 0.016$ + 0.019$ = 0.036$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-122b-a10b-20260224 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 86.88 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.88 | 0.87 | 0.85 | 0.91 | 263 | 2194 | 396 | 221 |
| Pricing Date: n/a, n/a. | Tokens: 478.5K IT + 50.2K OT = 528.7K TT | Cost: 0.124$ + 0.104$ = 0.229$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.6-plus-04-02 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 86.65 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.88 | 0.87 | 0.87 | 0.89 | 263 | 2146 | 316 | 269 |
| Pricing Date: n/a, n/a. | Tokens: 477.6K IT + 866.9K OT = 1.3M TT | Cost: 0.155$ + 1.690$ = 1.846$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openrouter |
| Model | qwen/qwen3.5-9b-20260310 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 63.72 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.75 | 0.64 | 0.86 | 0.66 | 263 | 1591 | 258 | 824 |
| Pricing Date: n/a, n/a. | Tokens: 496.0K IT + 3.1M OT = 3.6M TT | Cost: 0.050$ + 0.465$ = 0.514$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-flash-2026-02-23 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 38.61 % |
| Test time | unknown seconds |
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro precision | Micro recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.53 | 0.39 | 0.82 | 0.40 | 263 | 954 | 207 | 1461 |
| Pricing Date: n/a, n/a. | Tokens: 220.0K IT + 22.2K OT = 242.2K TT | Cost: 0.022$ + 0.009$ = 0.031$ |
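One possible way to compare the runs listed above is to rank them by normalized score and then keep only those that no other run beats on both score and total cost. The sketch below does this with the scores and costs copied from this page. The names `Run` and `pareto_front` are hypothetical, and the cost-versus-score filter is an illustrative choice, not something the benchmark prescribes.

```python
from typing import NamedTuple


class Run(NamedTuple):
    provider: str
    model: str
    score: float  # normalized score in percent
    cost: float   # total cost in dollars


RUNS = [
    Run("openrouter", "qwen/qwen3.5-flash-20260224", 83.87, 0.044),
    Run("anthropic", "claude-opus-4-7", 89.17, 6.168),
    Run("openrouter", "google/gemma-4-31b-it-20260402", 81.17, 0.056),
    Run("openrouter", "qwen/qwen3.5-397b-a17b-20260216", 78.41, 2.096),
    Run("openrouter", "qwen/qwen3.5-27b-20260224", 86.62, 0.171),
    Run("openrouter", "google/gemma-4-26b-a4b-it-20260403", 79.71, 0.036),
    Run("openrouter", "qwen/qwen3.5-122b-a10b-20260224", 86.88, 0.229),
    Run("openrouter", "qwen/qwen3.6-plus-04-02", 86.65, 1.846),
    Run("openrouter", "qwen/qwen3.5-9b-20260310", 63.72, 0.514),
    Run("alibaba", "qwen3.5-flash-2026-02-23", 38.61, 0.031),
]


def pareto_front(runs):
    """Keep runs that no other run matches or beats on both score and cost."""
    front = []
    for r in runs:
        dominated = any(
            o.score >= r.score and o.cost <= r.cost
            and (o.score > r.score or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return front


ranked = sorted(RUNS, key=lambda r: r.score, reverse=True)
front = sorted(pareto_front(RUNS), key=lambda r: r.cost)

for r in ranked:
    print(f"{r.score:6.2f} %  {r.cost:6.3f} $  {r.provider}  {r.model}")
print("Not dominated on score and cost:", [r.model for r in front])
```

Because every run here uses temperature 0.0 and the same 263 instances, the comparison isolates the model and provider. Runs with other settings would need to be grouped by configuration first.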