RISE Humanities Data Benchmark, 0.5.2-pre1

Search Test Runs

 

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
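As an illustration of the components listed above, a test run could be modelled as a small record type. This is a minimal sketch, not the benchmark's actual data model: the class name `TestRun` and every field name are assumptions, and the values are taken from the first result on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TestRun:
    """One benchmark execution. All field names here are illustrative assumptions."""

    # Metadata
    test_id: str
    date: str
    benchmark: str
    # Model configuration
    provider: str
    model: str
    temperature: float
    # Prompt and role definition
    prompt: str
    role: Optional[str] = None
    executed_by: Optional[str] = None
    # Results
    response: Optional[str] = None
    scores: Dict[str, float] = field(default_factory=dict)
    # Usage and cost data
    input_tokens: int = 0
    output_tokens: int = 0
    total_cost: float = 0.0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


run = TestRun(
    test_id="T1107",
    date="2026-06-08",
    benchmark="library_cards",  # assumed from the search key 'library_cards__true'
    provider="x-ai",
    model="grok-4.3",
    temperature=0.0,
    prompt="Extract the bibliographic information ...",  # abbreviated here
    scores={"f1_micro": 0.86, "f1_macro": 0.85},
    input_tokens=468_100,
    output_tokens=41_300,
    total_cost=0.688,
)
print(run.model, run.total_tokens)
```

Keeping configuration, results, and usage in one record is what makes runs directly comparable across models and providers.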

Search Results

Your search for Benchmark 'library_cards__true' with Search Hidden 'False' returned 134 results, showing page 1 of 14.
Result 1 of 134

Test T1107 at 2026-06-08

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: x-ai
Model: grok-4.3
Temperature: 0.0
Dataclass: Document
Normalized Score: 84.56 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.
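
The output format requested by this prompt can be checked mechanically. The following is a minimal sketch of such a check, assuming the response arrives as a raw string; the function name `validate_card` and the sample card are invented for illustration and are not part of the benchmark.

```python
import json

CARD_TYPES = {"Dissertation or thesis", "Reference"}
# Section name -> string fields the prompt lists for that section.
STRING_FIELDS = {
    "type": ("type",),
    "author": ("last_name", "first_name"),
    "publication": ("title", "place", "pages", "publisher", "format"),
    "library_reference": ("shelfmark", "subjects"),
}


def validate_card(raw):
    """Return a list of problems in a model response; an empty list means it conforms."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for section, fields in STRING_FIELDS.items():
        block = card.get(section)
        if not isinstance(block, dict):
            problems.append(f"{section}: missing or not an object")
            continue
        for name in fields:
            if not isinstance(block.get(name), str):
                problems.append(f"{section}.{name}: expected a string")

    card_type = card.get("type")
    if isinstance(card_type, dict) and isinstance(card_type.get("type"), str):
        if card_type["type"] not in CARD_TYPES:
            problems.append("type.type: unknown card type")

    # Rule 5 allows the year to be omitted, so only check it when present.
    publication = card.get("publication")
    if isinstance(publication, dict) and "year" in publication:
        year = publication["year"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(year, bool) or not isinstance(year, int):
            problems.append("publication.year: expected an integer")
    return problems


# Invented sample card, not taken from the benchmark data.
sample = {
    "type": {"type": "Dissertation or thesis"},
    "author": {"last_name": "Muster", "first_name": "Anna"},
    "publication": {"title": "Beispieltitel", "year": 1923, "place": "Basel",
                    "pages": "84", "publisher": "", "format": "8°"},
    "library_reference": {"shelfmark": "Diss. 1923", "subjects": ""},
}
print(validate_card(json.dumps(sample)))
sample["publication"]["year"] = "1923"  # a string where an integer is required
print(validate_card(json.dumps(sample)))
```

A response that is not JSON at all yields a single "not valid JSON" problem instead of raising an exception.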

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.86 / 0.85
Micro precision / recall: 0.85 / 0.86
Instances: 263
True Positives (TP): 2074
False Positives (FP): 353
False Negatives (FN): 341
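
The micro-averaged figures in this scoring block can be reproduced from the TP, FP, and FN counts using the standard definitions of precision, recall, and F1. The sketch below assumes the benchmark uses these textbook formulas; the reported values are consistent with that, but the source does not state it. The function name `micro_metrics` is hypothetical.

```python
def micro_metrics(tp, fp, fn):
    """Compute micro precision, recall and F1 from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Counts reported for Test T1107 (grok-4.3).
precision, recall, f1 = micro_metrics(tp=2074, fp=353, fn=341)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.85 recall=0.86 f1=0.86
```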
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 468.1K IT + 41.3K OT = 509.4K TT
Cost: 0.585$ (input) / 0.103$ (output) / 0.688$ (total)
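
The token and cost figures for this run can be cross-checked with a little arithmetic. The sketch below assumes that IT, OT, and TT stand for input, output, and total tokens and that the three cost figures follow the same order. The helper `parse_k` is hypothetical, and the per-million-token rates are only what the rounded figures imply, not published prices.

```python
def parse_k(text):
    """Convert a count such as '468.1K' into a number of tokens."""
    return round(float(text.rstrip("K")) * 1000)


input_tokens = parse_k("468.1K")
output_tokens = parse_k("41.3K")
total_tokens = parse_k("509.4K")
input_cost, output_cost, total_cost = 0.585, 0.103, 0.688

tokens_consistent = input_tokens + output_tokens == total_tokens
# The displayed costs are rounded to three decimals, so allow a small tolerance.
costs_consistent = abs(input_cost + output_cost - total_cost) < 0.0015

# Implied rates per million tokens, derived from the rounded figures above.
input_rate = input_cost / input_tokens * 1_000_000
output_rate = output_cost / output_tokens * 1_000_000

print(tokens_consistent, costs_consistent)
print(f"{input_rate:.2f} $ / {output_rate:.2f} $ per million input / output tokens")
```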
Result 2 of 134

Test T1133 at 2026-06-05

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: stepfun/step-3.7-flash-20260528
Temperature: 0.0
Dataclass: Document
Normalized Score: 83.05 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.83 / 0.83
Micro precision / recall: 0.82 / 0.85
Instances: 263
True Positives (TP): 2051
False Positives (FP): 455
False Negatives (FN): 364
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 221.7K IT + 702.9K OT = 924.6K TT
Cost: 0.044$ (input) / 0.808$ (output) / 0.853$ (total)
Result 3 of 134

Test T1120 at 2026-06-05

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: meta-llama/llama-4-scout-17b-16e-instruct
Temperature: 0.0
Dataclass: Document
Normalized Score: 71.88 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.73 / 0.72
Micro precision / recall: 0.71 / 0.75
Instances: 263
True Positives (TP): 1806
False Positives (FP): 754
False Negatives (FN): 609
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 401.7K IT + 45.0K OT = 446.7K TT
Cost: 0.032$ (input) / 0.014$ (output) / 0.046$ (total)
Result 4 of 134

Test T0159 at 2026-06-04

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: pixtral-large-2411
Temperature: 0.0
Dataclass: Document
Normalized Score: 76.59 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.80 / 0.77
Micro precision / recall: 0.80 / 0.79
Instances: 263
True Positives (TP): 1915
False Positives (FP): 471
False Negatives (FN): 500
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 488.3K IT + 42.2K OT = 530.5K TT
Cost: 0.977$ (input) / 0.253$ (output) / 1.230$ (total)
Result 5 of 134

Test T0179 at 2026-06-04

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-medium-2508
Temperature: 0.0
Dataclass: Document
Normalized Score: 77.57 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.79 / 0.78
Micro precision / recall: 0.81 / 0.77
Instances: 263
True Positives (TP): 1867
False Positives (FP): 447
False Negatives (FN): 548
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 506.4K IT + 41.1K OT = 547.4K TT
Cost: 0.203$ (input) / 0.082$ (output) / 0.285$ (total)
Result 6 of 134

Test T0559 at 2026-06-04

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: ministral-8b-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 85.10 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.86 / 0.85
Micro precision / recall: 0.85 / 0.87
Instances: 263
True Positives (TP): 2093
False Positives (FP): 376
False Negatives (FN): 322
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 524.2K IT + 42.7K OT = 566.9K TT
Cost: 0.079$ (input) / 0.006$ (output) / 0.085$ (total)
Result 7 of 134

Test T0537 at 2026-06-04

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-large-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 17.76 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.29 / 0.18
Micro precision / recall: 0.79 / 0.18
Instances: 263
True Positives (TP): 435
False Positives (FP): 117
False Negatives (FN): 1980
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 115.0K IT + 9.9K OT = 124.9K TT
Cost: 0.057$ (input) / 0.015$ (output) / 0.072$ (total)
Result 8 of 134

Test T0440 at 2026-06-04

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-small-2506
Temperature: 0.0
Dataclass: Document
Normalized Score: 83.38 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.86 / 0.83
Micro precision / recall: 0.87 / 0.84
Instances: 263
True Positives (TP): 2030
False Positives (FP): 300
False Negatives (FN): 385
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 504.4K IT + 40.2K OT = 544.6K TT
Cost: 0.050$ (input) / 0.012$ (output) / 0.062$ (total)
Result 9 of 134

Test T0548 at 2026-06-04

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: ministral-14b-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 83.56 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.84 / 0.84
Micro precision / recall: 0.82 / 0.86
Instances: 263
True Positives (TP): 2081
False Positives (FP): 445
False Negatives (FN): 334
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 524.2K IT + 43.5K OT = 567.7K TT
Cost: 0.105$ (input) / 0.009$ (output) / 0.114$ (total)
Result 10 of 134

Test T1094 at 2026-06-04

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-medium-3.5
Temperature: 0.0
Dataclass: Document
Normalized Score: 82.31 %
Test time: unknown
Prompt

Identical to the prompt shown in full under Result 1 (Test T1107).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.83 / 0.82
Micro precision / recall: 0.80 / 0.85
Instances: 263
True Positives (TP): 2060
False Positives (FP): 501
False Negatives (FN): 355
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 524.2K IT + 45.0K OT = 569.2K TT
Cost: 0.786$ (input) / 0.337$ (output) / 1.124$ (total)
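
The ten runs on this page can be compared side by side on score and cost. The sketch below copies each run's normalized score and total cost from the results above. The tuple layout is an assumption made for illustration, and the 80 % threshold is an arbitrary cut-off, not one the benchmark defines.

```python
# (model, normalized score in %, total cost in $) as listed in results 1 to 10.
runs = [
    ("grok-4.3", 84.56, 0.688),
    ("stepfun/step-3.7-flash-20260528", 83.05, 0.853),
    ("meta-llama/llama-4-scout-17b-16e-instruct", 71.88, 0.046),
    ("pixtral-large-2411", 76.59, 1.230),
    ("mistral-medium-2508", 77.57, 0.285),
    ("ministral-8b-2512", 85.10, 0.085),
    ("mistral-large-2512", 17.76, 0.072),
    ("mistral-small-2506", 83.38, 0.062),
    ("ministral-14b-2512", 83.56, 0.114),
    ("mistral-medium-3.5", 82.31, 1.124),
]

# Highest normalized score on this page.
best = max(runs, key=lambda run: run[1])

# Cheapest run that still reaches the (arbitrary) 80 % threshold.
threshold = 80.0
cheapest_good = min((run for run in runs if run[1] >= threshold), key=lambda run: run[2])

for model, score, cost in sorted(runs, key=lambda run: run[1], reverse=True):
    print(f"{model:45s} {score:6.2f} %  {cost:6.3f} $")
print("best score:", best[0])
print("cheapest at or above threshold:", cheapest_good[0])
```

On this page the highest-scoring run is also among the cheapest, which is the kind of trade-off the per-run cost data is meant to expose.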