RISE Humanities Data Benchmark, 0.5.2-pre1

Search Test Runs

 

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
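
As a rough illustration, the parts of a test run listed above could be held in a record like the one below. This is a sketch only: the class names `RunRecord` and `Usage` and all field names are hypothetical and are not taken from the benchmark's code. The sample values come from Test T0200 on this page; the benchmark name is read off the search string and may not be the exact internal name.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Usage:
    """Token counts and cost for one run (hypothetical structure)."""
    input_tokens: int
    output_tokens: int
    cost_usd: Optional[float] = None  # None when no pricing data is available

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


@dataclass
class RunRecord:
    """One execution of a benchmark test with a fixed model configuration."""
    test_id: str
    date: str
    benchmark: str
    provider: str
    model: str
    temperature: float
    prompt: str
    role: Optional[str] = None          # e.g. "as a historian"
    response: Optional[str] = None      # raw model output, if stored
    scores: Dict[str, float] = field(default_factory=dict)
    usage: Optional[Usage] = None


# Values copied from Test T0200; the prompt is truncated here.
run = RunRecord(
    test_id="T0200",
    date="2025-11-24",
    benchmark="library_cards",  # assumption: derived from the search string
    provider="genai",
    model="gemini-2.5-flash",
    temperature=0.0,
    prompt="Extract the bibliographic information about a historical dissertation ...",
    scores={"f1_micro": 0.87, "f1_macro": 0.86},
    usage=Usage(input_tokens=187_300, output_tokens=38_200, cost_usd=0.152),
)

print(run.usage.total_tokens)
```

Keeping configuration, scores, and usage in one record is what makes two runs directly comparable: any two records for the same benchmark can be set side by side on model, temperature, score, and cost.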

Search Results

Your search for Benchmark 'library_cards__true' with Search Hidden 'False' returned 134 results, showing page 9 of 14.
Result 81 of 134

Test T0537 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}
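
The tag line above is printed as a Python dictionary literal, so it can be read back safely with `ast.literal_eval` from the standard library. The helper `has_tag` below is a hypothetical convenience function, not part of the benchmark.

```python
import ast

# The tag line exactly as it appears on the page.
raw = (
    "{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], "
    "'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], "
    "'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}"
)

# literal_eval only accepts Python literals, so no code in the string is executed.
tags = ast.literal_eval(raw)


def has_tag(tags: dict, key: str, value) -> bool:
    """Return True if `value` is listed under `key` (hypothetical helper)."""
    return value in tags.get(key, [])


print(has_tag(tags, "language", "la"))
print(has_tag(tags, "writing", "engraved"))
```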

Configuration
Provider: mistral
Model: mistral-large-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.
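
A response to this prompt can be checked mechanically against the format and rules above. The following is a minimal sketch of such a check, not the benchmark's own evaluation code: the names `validate_card`, `ALLOWED_TYPES`, and `TEXT_FIELDS` are hypothetical, and the sample card is invented for illustration rather than taken from the dataset.

```python
import json
from typing import List

# Allowed values for type.type, as given in rule 1 of the prompt.
ALLOWED_TYPES = {"Dissertation or thesis", "Reference"}

# Text fields per section; rule 5 says missing text fields are empty strings.
TEXT_FIELDS = {
    "author": ("last_name", "first_name"),
    "publication": ("title", "place", "pages", "publisher", "format"),
    "library_reference": ("shelfmark", "subjects"),
}


def validate_card(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the response fits the format."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top-level value is not an object"]

    problems = []
    type_section = card.get("type")
    if not isinstance(type_section, dict) or type_section.get("type") not in ALLOWED_TYPES:
        problems.append("type.type must be one of the two allowed values")

    for section, keys in TEXT_FIELDS.items():
        block = card.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if not isinstance(block.get(key), str):
                problems.append(f"{section}.{key} must be a string")

    # Rule 5: year is omitted when unknown, but must be an integer when present.
    pub = card.get("publication")
    if isinstance(pub, dict) and "year" in pub:
        year = pub["year"]
        if isinstance(year, bool) or not isinstance(year, int):
            problems.append("publication.year must be an integer when present")
    return problems


# Invented sample card (not from the dataset); year is omitted as rule 5 allows.
sample = json.dumps({
    "type": {"type": "Reference"},
    "author": {"last_name": "Muster", "first_name": ""},
    "publication": {"title": "Beispiel", "place": "", "pages": "",
                    "publisher": "", "format": "8°"},
    "library_reference": {"shelfmark": "", "subjects": ""},
})

print(validate_card(sample))
print(validate_card("Here is the JSON you asked for"))
```

A response that wraps the object in extra text fails at the first step, which is the practical reason for the "Return ONLY the JSON object" instruction.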

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.00 / 0.00
Micro Precision: 0.00
Micro Recall: 0.00
Instances: 263, TP: 0, FP: 0, FN: 2415
Costs / Pricing
Pricing Date: n/a
Tokens: 314.5K IT + 24.7K OT = 339.3K TT
Cost: 0.157$ (input), 0.037$ (output), 0.194$ (total)
Result 82 of 134

Test T0252 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: meta-llama/llama-4-maverick
Temperature: 0.0
Dataclass: Document
Normalized Score: 60.81 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.71 / 0.61
Micro Precision: 0.81
Micro Recall: 0.63
Instances: 263, TP: 1519, FP: 349, FN: 896
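
The micro-averaged figures for this run are consistent with the standard definitions of precision, recall, and F1 computed from the pooled TP, FP, and FN counts. The sketch below applies those textbook formulas to the counts reported for this run; it assumes, but cannot confirm, that the benchmark computes its scores the same way, and `micro_metrics` is a hypothetical helper name.

```python
def micro_metrics(tp: int, fp: int, fn: int):
    """Micro precision, recall and F1 from pooled counts (standard definitions)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Counts reported for Test T0252 (2026-01-24): TP 1519, FP 349, FN 896.
p, r, f1 = micro_metrics(1519, 349, 896)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 0.81 0.63 0.71
```

The zero-count guards matter for runs such as T0537, where TP and FP are both 0: without them the precision would be a division by zero rather than the reported 0.00.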
Costs / Pricing
Pricing Date: n/a
Tokens: 482.5K IT + 176.7K OT = 659.2K TT
Cost: 0.000$ (input), 0.000$ (output), 0.079$ (total)
Result 83 of 134

Test T0161 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-4.1-mini-2025-04-14
Temperature: 0.0
Dataclass: Document
Normalized Score: 46.49 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.58 / 0.46
Micro Precision: 0.76
Micro Recall: 0.47
Instances: 263, TP: 1136, FP: 355, FN: 1279
Costs / Pricing
Pricing Date: n/a
Tokens: 531.9K IT + 72.3K OT = 604.2K TT
Cost: 0.213$ (input), 0.116$ (output), 0.328$ (total)
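
Token counts and costs can be related by simple arithmetic. The sketch below assumes that the three cost figures are input cost, output cost, and total cost in that order (the page does not label them) and derives the implied price per million tokens for this run. Because the displayed values are rounded, the results are approximate; `price_per_million` is a hypothetical helper name.

```python
def price_per_million(cost_usd: float, tokens: int) -> float:
    """Implied price in USD per one million tokens."""
    return cost_usd / tokens * 1_000_000


# Figures shown for Test T0161: 531.9K input tokens, 72.3K output tokens.
input_rate = price_per_million(0.213, 531_900)
output_rate = price_per_million(0.116, 72_300)

print(f"input:  {input_rate:.2f} $ per 1M tokens")
print(f"output: {output_rate:.2f} $ per 1M tokens")
```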
Result 84 of 134

Test T0440 at 2025-11-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-small-2506
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.00 / 0.00
Micro Precision: 0.00
Micro Recall: 0.00
Instances: 263, TP: 0, FP: 0, FN: 2415
Costs / Pricing
Pricing Date: n/a
Tokens: 254.6K IT + 41.3K OT = 295.9K TT
Cost: 0.025$ (input), 0.012$ (output), 0.038$ (total)
Result 85 of 134

Test T0408 at 2025-11-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5.1-2025-11-13
Temperature: 0.0
Dataclass: Document
Normalized Score: 81.82 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.82 / 0.82
Micro Precision: 0.82
Micro Recall: 0.82
Instances: 263, TP: 1979, FP: 422, FN: 436
Costs / Pricing
Pricing Date: n/a
Tokens: 365.2K IT + 33.2K OT = 398.4K TT
Cost: 0.456$ (input), 0.332$ (output), 0.789$ (total)
Result 86 of 134

Test T0200 at 2025-11-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: genai
Model: gemini-2.5-flash
Temperature: 0.0
Dataclass: Document
Normalized Score: 86.02 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.87 / 0.86
Micro Precision: 0.85
Micro Recall: 0.89
Instances: 263, TP: 2141, FP: 389, FN: 274
Costs / Pricing
Pricing Date: n/a
Tokens: 187.3K IT + 38.2K OT = 225.5K TT
Cost: 0.056$ (input), 0.096$ (output), 0.152$ (total)
Result 87 of 134

Test T0252 at 2025-10-17

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: meta-llama/llama-4-maverick
Temperature: 0.0
Dataclass: Document
Normalized Score: 62.33 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.67 / 0.62
Micro Precision: 0.78
Micro Recall: 0.59
Instances: 263, TP: 1433, FP: 399, FN: 982
Costs / Pricing
Pricing Date: 2025-10-17 (7 months ago)
Tokens: 390.9K IT + 819.9K OT = 1.2M TT
Cost: 0.059$ (input), 0.492$ (output), 0.551$ (total)
Result 88 of 134

Test T0247 at 2025-10-17

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3-vl-8b-thinking
Temperature: 0.0
Dataclass: Document
Normalized Score: 5.90 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.10 / 0.06
Micro Precision: 0.33
Micro Recall: 0.06
Instances: 263, TP: 148, FP: 305, FN: 2267
Costs / Pricing
Pricing Date: 2025-10-17 (7 months ago)
Tokens: 495.2K IT + 44.6K OT = 539.8K TT
Cost: 0.089$ (input), 0.094$ (output), 0.183$ (total)
Result 89 of 134

Test T0242 at 2025-10-17

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: scicore
Model: GLM-4.5V-FP8
Temperature: 0.0
Dataclass: Document
Normalized Score: 19.71 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.26 / 0.20
Micro Precision: 0.56
Micro Recall: 0.17
Instances: 263, TP: 407, FP: 322, FN: 2008
Costs / Pricing
Pricing Date: 2025-10-17 (7 months ago)
Tokens: 263.1K IT + 3.4M OT = 3.7M TT
Cost: 0.000$ (input), 0.000$ (output), 0.000$ (total)
Result 90 of 134

Test T0164 at 2025-10-03

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-4o-mini
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.95 %
Test time: unknown
Prompt

Identical to the prompt shown for Result 81 (Test T0537).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 (micro / macro): 0.01 / 0.01
Micro Precision: 0.07
Micro Recall: 0.01
Instances: 263, TP: 20, FP: 275, FN: 2395
Costs / Pricing
Pricing Date: 2025-10-03 (8 months ago)
Tokens: 131.3K IT + 727 OT = 132.0K TT
Cost: 0.020$ (input), 0.000$ (output), 0.020$ (total)