RISE Humanities Data Benchmark, 0.5.0-pre1

Search Test Runs

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
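
The components listed above can be pictured as one record per run. The sketch below is only an illustration: the class name `RunRecord`, its field names, and the placeholder values are assumptions and do not come from the benchmark's own code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical record mirroring the components of a test run."""
    # Metadata
    test_id: str
    date: str
    benchmark: str
    # Model configuration
    provider: str
    model: str
    temperature: float
    # Prompt and role definition
    prompt: str
    role: Optional[str] = None
    # Results: raw response plus evaluation scores such as F1 or accuracy
    response: Optional[str] = None
    scores: dict[str, float] = field(default_factory=dict)
    # Usage and cost data
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    @property
    def total_tokens(self) -> int:
        # Total tokens are the sum of input and output tokens.
        return self.input_tokens + self.output_tokens


# Placeholder values, not taken from a real run.
run = RunRecord(
    test_id="T0000", date="2026-01-01", benchmark="example_benchmark",
    provider="example_provider", model="example-model", temperature=0.0,
    prompt="Extract the bibliographic information ...", role="as a historian",
    input_tokens=1200, output_tokens=300,
)
print(run.total_tokens)  # 1500
```

Keeping configuration, results, and usage in one record is what would let two runs be compared field by field.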

Search Results

Your search for Benchmark 'library_cards__true' with Search Hidden 'False' returned 105 results, showing page 3 of 11.
Result 21 of 105

Test T0559 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: ministral-8b-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result
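
The prompt asks for a single JSON object with a fixed nested structure. A minimal sketch of how such a response could be checked is given below. The function `check_response` and the `EXPECTED` table are hypothetical, and this is not necessarily how the benchmark itself decides whether a result is valid. `year` is treated as optional because the prompt says to omit it when it is not present.

```python
import json

# Sections and fields requested by the prompt ("year" handled separately).
EXPECTED = {
    "type": {"type"},
    "author": {"last_name", "first_name"},
    "publication": {"title", "place", "pages", "publisher", "format"},
    "library_reference": {"shelfmark", "subjects"},
}
CARD_TYPES = ("Dissertation or thesis", "Reference")


def check_response(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]

    problems = []
    for section, keys in EXPECTED.items():
        block = data.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in sorted(keys - set(block)):
            problems.append(f"missing field: {section}.{key}")

    type_block = data.get("type")
    if isinstance(type_block, dict) and "type" in type_block:
        if type_block["type"] not in CARD_TYPES:
            problems.append("type.type is not one of the two allowed values")

    publication = data.get("publication")
    if isinstance(publication, dict) and "year" in publication:
        if not isinstance(publication["year"], int):
            problems.append("publication.year is not an integer")
    return problems
```

For example, `check_response("no valid result")` returns a single "not valid JSON" problem, while a complete object with an integer `year` returns an empty list.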

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.00 / 0.00
Micro precision / recall: 0.00 / 0.00
Instances: 263
True/False Positives (TP / FP / FN): 0 / 0 / 2415
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 314.5K IT + 52.9K OT = 367.5K TT
Cost: 0.047$, 0.008$, 0.055$
Result 22 of 105

Test T0164 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-4o-mini-2024-07-18
Temperature: 0.0
Dataclass: Document
Normalized Score: 1.51 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.02 / 0.02
Micro precision / recall: 0.54 / 0.01
Instances: 263
True/False Positives (TP / FP / FN): 27 / 23 / 2388
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 6.5M IT + 37.0K OT = 6.6M TT
Cost: 0.980$, 0.022$, 1.003$
Result 23 of 105

Test T0270 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: x-ai/grok-4
Temperature: 0.0
Dataclass: Document
Normalized Score: None
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: None
F1 micro / macro: None / None
Micro precision / recall: None / None
Instances: None
True/False Positives (TP / FP / FN): None / None / None
Costs / Pricing
Pricing Date: None, None
Tokens: None IT + None OT = None TT
Cost: None$, None$, None$
Result 24 of 105

Test T0264 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3-vl-8b-instruct
Temperature: 0.0
Dataclass: Document
Normalized Score: 61.59 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.70 / 0.62
Micro precision / recall: 0.78 / 0.64
Instances: 263
True/False Positives (TP / FP / FN): 1536 / 429 / 879
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 439.0K IT + 42.6K OT = 481.6K TT
Cost: 0.035$, 0.021$, 0.056$
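
The micro precision, recall, and F1 reported for this run (Test T0264) are consistent with the standard definitions applied to its counts of 1536 true positives, 429 false positives, and 879 false negatives. The sketch below uses those textbook formulas. That the benchmark computes its scores in exactly this way is an assumption, and the function name `micro_scores` is made up for the illustration.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 from pooled counts."""
    # Guard against division by zero when a run produced no predictions
    # (tp + fp == 0) or when there is nothing to find (tp + fn == 0).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = micro_scores(tp=1536, fp=429, fn=879)
print(f"{precision:.2f} {recall:.2f} {f1:.2f}")  # 0.78 0.64 0.70
```

Macro F1 cannot be recovered from these pooled counts, because it averages per-field scores that the page does not list.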
Result 25 of 105

Test T0570 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: magistral-small-2509
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.04 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.00 / 0.00
Micro precision / recall: 0.09 / 0.00
Instances: 263
True/False Positives (TP / FP / FN): 1 / 10 / 2414
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 314.5K IT + 158.8K OT = 473.3K TT
Cost: 0.157$, 0.079$, 0.237$
Result 26 of 105

Test T0162 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-4.1-nano-2025-04-14
Temperature: 0.0
Dataclass: Document
Normalized Score: 64.54 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.66 / 0.65
Micro precision / recall: 0.74 / 0.59
Instances: 263
True/False Positives (TP / FP / FN): 1428 / 494 / 987
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 642.8K IT + 25.7K OT = 668.5K TT
Cost: 0.064$, 0.010$, 0.075$
Result 27 of 105

Test T0504 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: genai
Model: gemini-3-flash-preview
Temperature: 0.0
Dataclass: Document
Normalized Score: 85.62 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.87 / 0.86
Micro precision / recall: 0.83 / 0.91
Instances: 263
True/False Positives (TP / FP / FN): 2187 / 450 / 228
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 406.2K IT + 38.3K OT = 444.5K TT
Cost: 0.203$, 0.115$, 0.318$
Result 28 of 105

Test T0408 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5.1-2025-11-13
Temperature: 0.0
Dataclass: Document
Normalized Score: 84.16 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.85 / 0.84
Micro precision / recall: 0.85 / 0.84
Instances: 263
True/False Positives (TP / FP / FN): 2033 / 353 / 382
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 421.9K IT + 135.1K OT = 557.0K TT
Cost: 0.527$, 1.351$, 1.879$
Result 29 of 105

Test T0548 at 2026-01-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: ministral-14b-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.61 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.02 / 0.01
Micro precision / recall: 0.02 / 0.02
Instances: 263
True/False Positives (TP / FP / FN): 43 / 2075 / 2372
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 314.5K IT + 161.6K OT = 476.1K TT
Cost: 0.063$, 0.032$, 0.095$
Result 30 of 105

Test T0258 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: qwen/qwen3-vl-30b-a3b-instruct
Temperature: 0.0
Dataclass: Document
Normalized Score: 53.63 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 21 (Test T0559).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.61 / 0.54
Micro precision / recall: 0.66 / 0.57
Instances: 263
True/False Positives (TP / FP / FN): 1372 / 693 / 1043
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 345.3K IT + 43.7K OT = 389.0K TT
Cost: 0.000$, 0.000$, 0.025$