A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time, with specific settings.
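A test run includes the provider and model, the temperature, the dataclass, the prompt, the resulting scores, and the token usage and cost.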
Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
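As a rough illustration of what such a record might hold, the sketch below models one run as a Python dataclass. The class name `RunRecord` and its field names are assumptions chosen for this example, not the benchmark's actual schema; the values come from the first run listed on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record for one benchmark run (field names are assumptions)."""
    provider: str
    model: str
    temperature: float
    dataclass_name: str
    normalized_score: float                    # percent, 0-100
    test_time_seconds: Optional[float] = None  # None when the duration is unknown

    def label(self) -> str:
        """Short identifier for comparing runs side by side."""
        return f"{self.provider}/{self.model} @ T={self.temperature}"


run = RunRecord("alibaba", "qwen3.5-35b-a3b", 0.0, "Document", 83.80)
print(run.label(), run.normalized_score)
```

Keeping the record immutable (`frozen=True`) fits the idea that a run is a snapshot of one execution rather than something edited afterwards.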
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-35b-a3b |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 83.80 % |
| Test time | unknown |
Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:
```json
{
"type": {
"type": "Dissertation or thesis" OR "Reference"
},
"author": {
"last_name": "string",
"first_name": "string"
},
"publication": {
"title": "string",
"year": integer,
"place": "string or empty string",
"pages": "string or empty string",
"publisher": "string or empty string",
"format": "string or empty string"
},
"library_reference": {
"shelfmark": "string or empty string",
"subjects": "string or empty string"
}
}
```
EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".
2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.
3. **Publication**:
- title: The main title of the work
- year: Publication year as integer
- place: Publication place
- pages: Page count (remove " S." suffix if present)
- publisher: Publishing house/institution
- format: Usually "8°", "8'", or "4°" - single value only
4. **Library Reference**:
- shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
- subjects: Subject classifications or keywords
5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.
6. **Ignore**: Disregard any information that doesn't fit into these categories.
Return ONLY the JSON object, no additional text or explanation.
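A response to this prompt could be checked mechanically before it is scored. The sketch below is one way to do that with the Python standard library; the function name `validate_card`, the helper `_section`, and the specific checks are assumptions derived from the format and rules above, not the benchmark's own validation code.

```python
import json

TEXT_FIELDS = {
    "author": ("last_name", "first_name"),
    "publication": ("title", "place", "pages", "publisher", "format"),
    "library_reference": ("shelfmark", "subjects"),
}
CARD_TYPES = ("Dissertation or thesis", "Reference")


def _section(card: dict, name: str) -> dict:
    """Return a nested object, or an empty dict if it is missing or malformed."""
    value = card.get(name)
    return value if isinstance(value, dict) else {}


def validate_card(raw: str) -> list:
    """Return a list of problems found in a model response (empty if none)."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top-level value is not an object"]

    problems = []
    if _section(card, "type").get("type") not in CARD_TYPES:
        problems.append("type.type must be 'Dissertation or thesis' or 'Reference'")
    for section, fields in TEXT_FIELDS.items():
        for field in fields:
            if not isinstance(_section(card, section).get(field), str):
                problems.append(f"{section}.{field} must be a string")

    publication = _section(card, "publication")
    # Rule 5: year is omitted when absent; if present it must be an integer.
    if "year" in publication:
        year = publication["year"]
        if isinstance(year, bool) or not isinstance(year, int):
            problems.append("publication.year must be an integer")
    # Rule 3: the " S." suffix should already be stripped from pages.
    pages = publication.get("pages")
    if isinstance(pages, str) and pages.endswith(" S."):
        problems.append("publication.pages still carries the ' S.' suffix")
    return problems


# Made-up card values for illustration only.
example = json.dumps({
    "type": {"type": "Dissertation or thesis"},
    "author": {"last_name": "Muster", "first_name": "Anna"},
    "publication": {"title": "Beispiel", "year": 1923, "place": "Basel",
                    "pages": "84 S.", "publisher": "", "format": "8°"},
    "library_reference": {"shelfmark": "Diss. 1234", "subjects": ""},
})
print(validate_card(example))
```

Returning a list of problems instead of raising on the first one makes it easier to see every way a response deviates from the requested format.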
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.84 | 0.84 | 0.83 | 0.86 | 263 | 2068 | 415 | 347 |
| Pricing Date: n/a, n/a. | Tokens: 470.7K IT + 48.0K OT = 518.7K TT | Cost: 0.118$ + 0.096$ = 0.214$ |
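The micro-averaged precision, recall, and F1 reported for this run can be reproduced from its TP, FP, and FN counts. The sketch below assumes the benchmark uses the standard definitions of these metrics; the function name `micro_metrics` is made up for this example.

```python
def micro_metrics(tp: int, fp: int, fn: int) -> dict:
    """Micro-averaged precision, recall, and F1 from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts from the qwen3.5-35b-a3b run above: TP=2068, FP=415, FN=347.
m = micro_metrics(2068, 415, 347)
print({k: round(v, 2) for k, v in m.items()})
# {'precision': 0.83, 'recall': 0.86, 'f1': 0.84}
```

The rounded values match the 0.83, 0.86, and 0.84 shown for this run. Macro F1 cannot be recomputed this way, because it needs per-field counts that the page does not list.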
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-397b-a17b |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 88.25 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.89 | 0.88 | 0.86 | 0.92 | 263 | 2215 | 357 | 200 |
| Pricing Date: n/a, n/a. | Tokens: 470.7K IT + 47.6K OT = 518.3K TT | Cost: 0.282$ + 0.171$ = 0.454$ |
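The token and cost line follows a regular pattern, so it can be parsed and sanity-checked with a few lines of Python. This is only a sketch: the function name `parse_usage` is hypothetical, and reading IT, OT, and TT as input, output, and total tokens is an assumption, since the page does not spell the abbreviations out.

```python
import re

USAGE = re.compile(
    r"Tokens: ([\d.]+)K IT \+ ([\d.]+)K OT = ([\d.]+)K TT \| "
    r"Cost: ([\d.]+)\$ \+ ([\d.]+)\$ = ([\d.]+)\$"
)


def parse_usage(line: str) -> dict:
    """Extract token counts (in thousands) and costs (in dollars) from a usage line."""
    match = USAGE.search(line)
    if match is None:
        raise ValueError("line does not match the expected usage format")
    keys = ("input_k", "output_k", "total_k", "input_cost", "output_cost", "total_cost")
    return dict(zip(keys, map(float, match.groups())))


line = ("| Pricing Date: n/a, n/a. | Tokens: 470.7K IT + 47.6K OT = 518.3K TT "
        "| Cost: 0.282$ + 0.171$ = 0.454$ |")
usage = parse_usage(line)

# Reported totals are rounded, so allow a small tolerance when re-adding the parts.
assert abs(usage["input_k"] + usage["output_k"] - usage["total_k"]) < 0.1
assert abs(usage["input_cost"] + usage["output_cost"] - usage["total_cost"]) < 0.002

# 263 is the instance count reported for this run.
print(round(usage["total_cost"] / 263 * 1000, 3), "dollars per 1000 instances")
```

The tolerance matters: here the parts add up to 0.453 $ while the line reports 0.454 $, presumably because each figure was rounded separately.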
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-122b-a10b |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 85.29 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.86 | 0.85 | 0.84 | 0.88 | 263 | 2131 | 408 | 284 |
| Pricing Date: n/a, n/a. | Tokens: 470.7K IT + 48.0K OT = 518.7K TT | Cost: 0.188$ + 0.154$ = 0.342$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-27b |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 86.85 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.88 | 0.87 | 0.87 | 0.88 | 263 | 2137 | 329 | 278 |
| Pricing Date: n/a, n/a. | Tokens: 470.7K IT + 46.9K OT = 517.6K TT | Cost: 0.141$ + 0.112$ = 0.254$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-flash-2026-02-23 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 38.61 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.53 | 0.39 | 0.82 | 0.40 | 263 | 954 | 207 | 1461 |
| Pricing Date: n/a, n/a. | Tokens: 220.0K IT + 22.2K OT = 242.2K TT | Cost: 0.022$ + 0.009$ = 0.031$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | alibaba |
| Model | qwen3.5-plus |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 88.46 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.89 | 0.88 | 0.86 | 0.92 | 263 | 2221 | 354 | 194 |
| Pricing Date: n/a, n/a. | Tokens: 470.7K IT + 47.7K OT = 518.4K TT | Cost: 0.188$ + 0.114$ = 0.303$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5.3-codex |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.71 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.86 | 0.85 | 0.83 | 0.88 | 263 | 2122 | 424 | 293 |
| Pricing Date: n/a, n/a. | Tokens: 498.1K IT + 42.1K OT = 540.1K TT | Cost: 0.872$ + 0.589$ = 1.461$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | x-ai |
| Model | grok-4.20-0309-reasoning |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 85.87 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.86 | 0.86 | 0.85 | 0.88 | 263 | 2115 | 363 | 300 |
| Pricing Date: n/a, n/a. | Tokens: 342.4K IT + 41.2K OT = 383.6K TT | Cost: 0.685$ + 0.247$ = 0.932$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | openai |
| Model | gpt-5.3-codex |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.19 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.85 | 0.84 | 0.83 | 0.87 | 263 | 2105 | 433 | 310 |
| Pricing Date: n/a, n/a. | Tokens: 498.1K IT + 42.2K OT = 540.3K TT | Cost: 0.872$ + 0.591$ = 1.462$ |
{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
| Provider | anthropic |
| Model | claude-opus-4-6 |
| Temperature | 0.0 |
| Dataclass | Document |
| Normalized Score | 84.63 % |
| Test time | unknown |
Prompt: identical to the prompt shown for the first run above.
no valid result
| Fuzzy Score | F1 micro | F1 macro | Micro Precision | Micro Recall | Instances | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n/a | 0.85 | 0.85 | 0.82 | 0.89 | 263 | 2155 | 486 | 260 |
| Pricing Date: n/a, n/a. | Tokens: 706.2K IT + 57.4K OT = 763.6K TT | Cost: 3.531$ + 1.435$ = 4.966$ |
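To compare the runs listed above side by side, their headline figures can be collected into one small table and sorted. The sketch below copies the normalized score, total cost, and TP/FN counts from each run; the tuple layout and variable names are assumptions for illustration only.

```python
# (model, normalized score in %, total cost in $, TP, FN), copied from the runs above.
RUNS = [
    ("qwen3.5-35b-a3b", 83.80, 0.214, 2068, 347),
    ("qwen3.5-397b-a17b", 88.25, 0.454, 2215, 200),
    ("qwen3.5-122b-a10b", 85.29, 0.342, 2131, 284),
    ("qwen3.5-27b", 86.85, 0.254, 2137, 278),
    ("qwen3.5-flash-2026-02-23", 38.61, 0.031, 954, 1461),
    ("qwen3.5-plus", 88.46, 0.303, 2221, 194),
    ("gpt-5.3-codex", 84.71, 1.461, 2122, 293),
    ("grok-4.20-0309-reasoning", 85.87, 0.932, 2115, 300),
    ("gpt-5.3-codex", 84.19, 1.462, 2105, 310),
    ("claude-opus-4-6", 84.63, 4.966, 2155, 260),
]

# TP + FN is the number of expected values, so it should match across runs
# that were scored against the same ground truth.
expected_totals = {tp + fn for _, _, _, tp, fn in RUNS}
print("expected values per run:", expected_totals)

# Rank by normalized score, highest first, and show the cost next to it.
for model, score, cost, _, _ in sorted(RUNS, key=lambda r: r[1], reverse=True):
    print(f"{model:<28}{score:>7.2f} %{cost:>8.3f} $")
```

In these runs every TP + FN sum comes to 2415, and the run with the highest normalized score (qwen3.5-plus, 88.46 %) is far from the most expensive one.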