RISE Humanities Data Benchmark, 0.5.0-pre1

Search Test Runs

 

A test run is a single execution of a benchmark test using a defined model configuration.
Each run records how a particular large language model (LLM), such as GPT-4, Claude-3, or Gemini, performed on a given task at a specific time and with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
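
As an illustration of the parts listed above, a test run could be modelled as a small record type. This is a minimal sketch, not the benchmark's actual data model: the class names `ModelConfig` and `RunRecord` and all field names are hypothetical, and the example values are copied (rounded as displayed) from Test T0515 on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfig:
    """Model configuration: provider, model version, generation parameters."""
    provider: str
    model: str
    temperature: float


@dataclass
class RunRecord:
    """One test run (hypothetical field names, for illustration only)."""
    test_id: str
    date: str                      # execution date, ISO format
    benchmark: str                 # benchmark name as it appears in the search line
    prompt: str
    config: ModelConfig
    response: Optional[str] = None
    scores: dict[str, float] = field(default_factory=dict)
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


run = RunRecord(
    test_id="T0515",
    date="2026-01-24",
    benchmark="library_cards__true",
    prompt="Extract the bibliographic information ...",
    config=ModelConfig("anthropic", "claude-opus-4-5-20251101", 0.0),
    scores={"f1_micro": 0.86, "f1_macro": 0.86},
    input_tokens=706_000,          # shown on the page as 706.0K
    output_tokens=56_000,          # shown on the page as 56.0K
    cost_usd=4.931,
)
print(run.total_tokens)
```

Keeping configuration separate from results makes it straightforward to group runs by provider or model when comparing them.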

Search Results

Your search for Benchmark 'library_cards__true' with Search Hidden 'False' returned 105 results, showing page 6 of 11.
Result 51 of 105

Test T0515 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-opus-4-5-20251101
Temperature: 0.0
Dataclass: Document
Normalized Score: 85.65 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.
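
The format requested by this prompt can be checked mechanically before scoring. The following is a minimal sketch of such a check, not the benchmark's own validation or scoring code: the function name `check_card` and the sample card are invented for illustration, and the rules encoded are only those stated in the prompt (four sections, string fields, optional integer `year`, two allowed card types).

```python
import json

# Text fields required in each section; "year" is handled separately
# because rule 5 of the prompt allows it to be omitted.
REQUIRED = {
    "type": ["type"],
    "author": ["last_name", "first_name"],
    "publication": ["title", "place", "pages", "publisher", "format"],
    "library_reference": ["shelfmark", "subjects"],
}
CARD_TYPES = {"Dissertation or thesis", "Reference"}


def check_card(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the response fits the format."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top level is not a JSON object"]

    problems = []
    for section, keys in REQUIRED.items():
        block = card.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if not isinstance(block.get(key), str):
                problems.append(f"{section}.{key} should be a string")

    card_type = card.get("type")
    if isinstance(card_type, dict) and isinstance(card_type.get("type"), str):
        if card_type["type"] not in CARD_TYPES:
            problems.append("type.type is not an allowed card type")

    publication = card.get("publication")
    if isinstance(publication, dict) and "year" in publication:
        year = publication["year"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(year, bool) or not isinstance(year, int):
            problems.append("publication.year should be an integer")
    return problems


# Invented sample response, not taken from the benchmark data.
sample = (
    '{"type": {"type": "Reference"},'
    ' "author": {"last_name": "Muster", "first_name": ""},'
    ' "publication": {"title": "Beispiel", "year": 1901, "place": "",'
    ' "pages": "", "publisher": "", "format": "8°"},'
    ' "library_reference": {"shelfmark": "", "subjects": ""}}'
)
print(check_card(sample))
```

A check like this only tests the shape of the response; whether the extracted values are correct still has to be judged against the ground truth.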

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.86 / 0.86
Micro precision / recall: 0.83 / 0.90
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 2175 / 449 / 240
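
The micro-averaged values can be reproduced from the TP, FP and FN counts. This sketch assumes the standard definitions of precision, recall and F1; the page does not state the benchmark's exact formulas, but the standard ones agree with the figures displayed for this run. The function name `micro_scores` is illustrative.

```python
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Micro precision, recall and F1 from pooled counts (standard definitions)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Counts shown for Test T0515
p, r, f1 = micro_scores(tp=2175, fp=449, fn=240)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 0.83 0.90 0.86
```

Macro F1 cannot be recomputed this way, because it averages per-instance or per-field scores that are not shown on this page.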
Costs / Pricing
Pricing Date: n/a
Tokens: 706.0K IT + 56.0K OT = 762.0K TT
Cost: 3.530$ (input), 1.401$ (output), 4.931$ (total)
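
The token and cost totals can be cross-checked with simple arithmetic. This sketch reads the three cost figures as input cost, output cost and total, and IT/OT/TT as input, output and total tokens; that reading is an assumption based on how the numbers add up. The implied per-million prices are derived from the rounded figures on this page, not taken from any provider price list, and the helper `per_million` is illustrative.

```python
def per_million(cost_usd: float, tokens: float) -> float:
    """Implied price in USD per one million tokens."""
    return cost_usd / tokens * 1_000_000


# Figures for Test T0515, rounded as displayed
input_tokens, output_tokens = 706_000, 56_000
input_cost, output_cost = 3.530, 1.401

total_tokens = input_tokens + output_tokens
total_cost = round(input_cost + output_cost, 3)
print(total_tokens, total_cost)                            # 762000 4.931

print(round(per_million(input_cost, input_tokens), 2))     # 5.0
print(round(per_million(output_cost, output_tokens), 2))   # 25.02
```

Because the displayed values are rounded, totals recomputed this way can differ from the displayed total by one unit in the last digit for some runs.
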
Result 52 of 105

Test T0179 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-medium-2508
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.28 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.01 / 0.00
Micro precision / recall: 0.09 / 0.00
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 7 / 67 / 2408
Costs / Pricing
Pricing Date: n/a
Tokens: 314.5K IT + 41.2K OT = 355.7K TT
Cost: 0.126$ (input), 0.082$ (output), 0.208$ (total)
Result 53 of 105

Test T0151 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: genai
Model: gemini-2.0-flash
Temperature: 0.0
Dataclass: Document
Normalized Score: 80.51 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.82 / 0.81
Micro precision / recall: 0.82 / 0.81
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 1959 / 417 / 456
Costs / Pricing
Pricing Date: n/a
Tokens: 676.7K IT + 46.4K OT = 723.1K TT
Cost: 0.068$ (input), 0.019$ (output), 0.086$ (total)
Result 54 of 105

Test T0537 at 2026-01-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-large-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.00 / 0.00
Micro precision / recall: 0.00 / 0.00
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 0 / 0 / 2415
Costs / Pricing
Pricing Date: n/a
Tokens: 314.5K IT + 24.7K OT = 339.3K TT
Cost: 0.157$ (input), 0.037$ (output), 0.194$ (total)
Result 55 of 105

Test T0440 at 2025-11-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: mistral
Model: mistral-small-2506
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.00 / 0.00
Micro precision / recall: 0.00 / 0.00
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 0 / 0 / 2415
Costs / Pricing
Pricing Date: n/a
Tokens: 254.6K IT + 41.3K OT = 295.9K TT
Cost: 0.025$ (input), 0.012$ (output), 0.038$ (total)
Result 56 of 105

Test T0408 at 2025-11-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openai
Model: gpt-5.1-2025-11-13
Temperature: 0.0
Dataclass: Document
Normalized Score: 81.82 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.82 / 0.82
Micro precision / recall: 0.82 / 0.82
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 1979 / 422 / 436
Costs / Pricing
Pricing Date: n/a
Tokens: 365.2K IT + 33.2K OT = 398.4K TT
Cost: 0.456$ (input), 0.332$ (output), 0.789$ (total)
Result 57 of 105

Test T0200 at 2025-11-24

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: genai
Model: gemini-2.5-flash
Temperature: 0.0
Dataclass: Document
Normalized Score: 86.02 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.87 / 0.86
Micro precision / recall: 0.85 / 0.89
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 2141 / 389 / 274
Costs / Pricing
Pricing Date: n/a
Tokens: 187.3K IT + 38.2K OT = 225.5K TT
Cost: 0.056$ (input), 0.096$ (output), 0.152$ (total)
Result 58 of 105

Test T0252 at 2025-10-17

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: meta-llama/llama-4-maverick
Temperature: 0.0
Dataclass: Document
Normalized Score: 62.33 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.67 / 0.62
Micro precision / recall: 0.78 / 0.59
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 1433 / 399 / 982
Costs / Pricing
Pricing Date: 2025-10-17 (5 months ago)
Tokens: 390.9K IT + 819.9K OT = 1.2M TT
Cost: 0.059$ (input), 0.492$ (output), 0.551$ (total)
Result 59 of 105

Test T0247 at 2025-10-17

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3-vl-8b-thinking
Temperature: 0.0
Dataclass: Document
Normalized Score: 5.90 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.10 / 0.06
Micro precision / recall: 0.33 / 0.06
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 148 / 305 / 2267
Costs / Pricing
Pricing Date: 2025-10-17 (5 months ago)
Tokens: 495.2K IT + 44.6K OT = 539.8K TT
Cost: 0.089$ (input), 0.094$ (output), 0.183$ (total)
Result 60 of 105

Test T0242 at 2025-10-17

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: scicore
Model: GLM-4.5V-FP8
Temperature: 0.0
Dataclass: Document
Normalized Score: 19.71 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.26 / 0.20
Micro precision / recall: 0.56 / 0.17
Instances: 263
True positives (TP) / false positives (FP) / false negatives (FN): 407 / 322 / 2008
Costs / Pricing
Pricing Date: 2025-10-17 (5 months ago)
Tokens: 263.1K IT + 3.4M OT = 3.7M TT
Cost: 0.000$ (input), 0.000$ (output), 0.000$ (total)