RISE Humanities Data Benchmark, 0.5.2-pre1

Search Test Runs

 

A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
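
The components listed above could be modelled as a single record. The following is a minimal sketch, not the benchmark's actual schema: the class name `RunRecord` and every field name are hypothetical, chosen only to mirror the bullet points. The example values are taken from Test T0975 on this page; the token counts are approximations of the abbreviated figures shown there, and the benchmark name is assumed from the search filter.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical container for one test run (field names are assumptions)."""
    # Metadata
    test_id: str
    date: str
    benchmark: str
    # Model configuration
    provider: str
    model: str
    temperature: float
    # Prompt and role definition
    prompt: str
    role: Optional[str] = None
    # Results: evaluation scores such as F1 or accuracy
    scores: dict = field(default_factory=dict)
    # Usage and cost data
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    @property
    def total_tokens(self) -> int:
        """Input plus output tokens."""
        return self.input_tokens + self.output_tokens


run = RunRecord(
    test_id="T0975",
    date="2026-04-21",
    benchmark="library_cards",  # assumed from the filter 'library_cards__true'
    provider="openrouter",
    model="qwen/qwen3.5-flash-20260224",
    temperature=0.0,
    prompt="Extract the bibliographic information ...",
    scores={"f1_micro": 0.85, "f1_macro": 0.84},
    input_tokens=476_200,   # shown on the page as 476.2K
    output_tokens=50_900,   # shown on the page as 50.9K
    cost_usd=0.044,
)
print(run.model, run.scores["f1_micro"], run.total_tokens)
```

Keeping the configuration, scores, and usage data in one record makes it straightforward to sort or filter runs by model, provider, or cost.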

Search Results

Your search for Benchmark 'library_cards__true' with Search Hidden 'False' returned 134 results, showing page 3 of 14.
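
The paging figures above are consistent with ten results per page: 134 results fill 14 pages, and page 3 begins at result 21. A small sketch of that arithmetic follows; the function name `page_bounds` and the default page size of 10 are assumptions inferred from these numbers, not documented settings.

```python
import math


def page_bounds(total: int, page: int, per_page: int = 10) -> tuple:
    """Return (page_count, first_result, last_result) for a 1-based page number."""
    page_count = math.ceil(total / per_page)
    if not 1 <= page <= page_count:
        raise ValueError(f"page must be between 1 and {page_count}")
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total)  # the final page may be shorter
    return page_count, first, last


print(page_bounds(134, 3))  # (14, 21, 30)
```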
Result 21 of 134

Test T0975 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3.5-flash-20260224
Temperature: 0.0
Dataclass: Document
Normalized Score: 83.87 %
Test time: unknown
Prompt

Extract the bibliographic information about a historical dissertation from this index card and return it as a structured JSON object with the following exact format:

```json
{
  "type": {
    "type": "Dissertation or thesis" OR "Reference"
  },
  "author": {
    "last_name": "string",
    "first_name": "string"
  },
  "publication": {
    "title": "string",
    "year": integer,
    "place": "string or empty string",
    "pages": "string or empty string",
    "publisher": "string or empty string",
    "format": "string or empty string"
  },
  "library_reference": {
    "shelfmark": "string or empty string",
    "subjects": "string or empty string"
  }
}
```

EXTRACTION RULES:
1. **Card Type**: If a card contains the note "s." on a separate line, it is a "Reference". Otherwise, it is a "Dissertation or thesis".

2. **Author**: Extract last_name and first_name. If only one name is given, put it in last_name and leave first_name empty.

3. **Publication**:
   - title: The main title of the work
   - year: Publication year as integer
   - place: Publication place
   - pages: Page count (remove " S." suffix if present)
   - publisher: Publishing house/institution
   - format: Usually "8°", "8'", or "4°" - single value only

4. **Library Reference**:
   - shelfmark: Often begins with "Diss." or "AT", may be marked with "Standort:"
   - subjects: Subject classifications or keywords

5. **Missing Information**: Use empty string "" for missing text fields, omit year fields entirely if not present.

6. **Ignore**: Disregard any information that doesn't fit into these categories.

Return ONLY the JSON object, no additional text or explanation.
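
One way to check a model response against the format and rules in this prompt is a small validator. This is a sketch only, not the benchmark's own scoring code: `validate_card`, `ALLOWED_TYPES`, and `TEXT_FIELDS` are hypothetical names, and the sample card is invented for illustration.

```python
import json

ALLOWED_TYPES = ("Dissertation or thesis", "Reference")
TEXT_FIELDS = {
    "author": ["last_name", "first_name"],
    "publication": ["title", "place", "pages", "publisher", "format"],
    "library_reference": ["shelfmark", "subjects"],
}


def validate_card(raw: str) -> list:
    """Return a list of problems found in a response; an empty list means none."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top level is not a JSON object"]

    problems = []

    # Rule 1: the card type is one of the two allowed labels.
    card_type = card.get("type")
    if not isinstance(card_type, dict) or card_type.get("type") not in ALLOWED_TYPES:
        problems.append("type.type must be 'Dissertation or thesis' or 'Reference'")

    # Rules 2-5: text fields are strings, with "" standing in for missing values.
    for section, names in TEXT_FIELDS.items():
        block = card.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing object: {section}")
            continue
        for name in names:
            if not isinstance(block.get(name), str):
                problems.append(f'{section}.{name} must be a string (use "" if missing)')

    # Rule 5: year is an integer when present and is omitted otherwise.
    publication = card.get("publication")
    if isinstance(publication, dict) and "year" in publication:
        year = publication["year"]
        if isinstance(year, bool) or not isinstance(year, int):
            problems.append("publication.year must be an integer or be omitted")

    return problems


# Invented sample card, for illustration only.
sample = json.dumps({
    "type": {"type": "Reference"},
    "author": {"last_name": "Muster", "first_name": ""},
    "publication": {"title": "Beispieltitel", "year": 1923, "place": "Basel",
                    "pages": "84", "publisher": "", "format": "8°"},
    "library_reference": {"shelfmark": "Diss. 1923", "subjects": ""},
})
print(validate_card(sample))  # []
print(validate_card("Here is the JSON you asked for"))
```

A check of this kind only covers the shape of the response; whether the extracted values are correct still has to be judged against ground truth.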

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.85 / 0.84
Micro precision / recall: 0.81 / 0.88
Instances: 263
True positives (TP): 2136
False positives (FP): 499
False negatives (FN): 279
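
The micro-averaged figures in this scoring block can be reproduced from the TP, FP, and FN counts using the standard definitions of precision, recall, and F1. The sketch below assumes those standard definitions; how the benchmark decides whether an extracted field counts as a true positive is not shown on this page, and the function name `micro_scores` is hypothetical.

```python
def micro_scores(tp: int, fp: int, fn: int) -> dict:
    """Micro-averaged precision, recall, and F1 from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts reported for Test T0975: TP = 2136, FP = 499, FN = 279.
scores = micro_scores(tp=2136, fp=499, fn=279)
print({name: round(value, 2) for name, value in scores.items()})
# {'precision': 0.81, 'recall': 0.88, 'f1': 0.85}
```

The rounded values agree with the micro precision, micro recall, and micro F1 reported for this run.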
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 476.2K IT + 50.9K OT = 527.2K TT
Cost: 0.031$ input, 0.013$ output, 0.044$ total
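
The token counts on this page are abbreviated with K and M suffixes, so totals recomputed from the displayed parts can differ from the displayed total in the last digit. A minimal sketch for converting and summing such figures follows; `parse_count` is a hypothetical helper, and reading IT, OT, and TT as input, output, and total tokens is an assumption.

```python
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text: str) -> float:
    """Convert an abbreviated count such as '476.2K' or '3.1M' to a number."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1].upper()]
    return float(text)


# Figures shown for Test T0975.
input_tokens = parse_count("476.2K")
output_tokens = parse_count("50.9K")
print(round(input_tokens + output_tokens))  # 527100

# The three cost figures read as input cost, output cost, and their sum.
print(round(0.031 + 0.013, 3))  # 0.044
```

The recomputed token total of 527.1K is 0.1K below the displayed 527.2K, which is what one would expect if the page rounds each figure separately from unrounded counts.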
Result 22 of 134

Test T1027 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: anthropic
Model: claude-opus-4-7
Temperature: 0.0
Dataclass: Document
Normalized Score: 89.17 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.90 / 0.89
Micro precision / recall: 0.88 / 0.93
Instances: 263
True positives (TP): 2236
False positives (FP): 314
False negatives (FN): 179
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 875.3K IT + 71.7K OT = 947.0K TT
Cost: 4.377$ input, 1.792$ output, 6.168$ total
Result 23 of 134

Test T1014 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: google/gemma-4-31b-it-20260402
Temperature: 0.0
Dataclass: Document
Normalized Score: 81.17 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.82 / 0.81
Micro precision / recall: 0.78 / 0.86
Instances: 263
True positives (TP): 2069
False positives (FP): 583
False negatives (FN): 346
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 301.1K IT + 43.1K OT = 344.2K TT
Cost: 0.039$ input, 0.016$ output, 0.056$ total
Result 24 of 134

Test T0949 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3.5-397b-a17b-20260216
Temperature: 0.0
Dataclass: Document
Normalized Score: 78.41 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.79 / 0.78
Micro precision / recall: 0.75 / 0.84
Instances: 263
True positives (TP): 2021
False positives (FP): 690
False negatives (FN): 394
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 302.3K IT + 845.4K OT = 1.1M TT
Cost: 0.118$ input, 1.978$ output, 2.096$ total
Result 25 of 134

Test T0923 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3.5-27b-20260224
Temperature: 0.0
Dataclass: Document
Normalized Score: 86.62 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.87 / 0.87
Micro precision / recall: 0.85 / 0.90
Instances: 263
True positives (TP): 2163
False positives (FP): 370
False negatives (FN): 252
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 478.7K IT + 49.5K OT = 528.2K TT
Cost: 0.093$ input, 0.077$ output, 0.171$ total
Result 26 of 134

Test T1001 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: google/gemma-4-26b-a4b-it-20260403
Temperature: 0.0
Dataclass: Document
Normalized Score: 79.71 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.81 / 0.80
Micro precision / recall: 0.76 / 0.86
Instances: 263
True positives (TP): 2082
False positives (FP): 674
False negatives (FN): 333
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 204.9K IT + 54.8K OT = 259.6K TT
Cost: 0.016$ input, 0.019$ output, 0.036$ total
Result 27 of 134

Test T0910 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3.5-122b-a10b-20260224
Temperature: 0.0
Dataclass: Document
Normalized Score: 86.88 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.88 / 0.87
Micro precision / recall: 0.85 / 0.91
Instances: 263
True positives (TP): 2194
False positives (FP): 396
False negatives (FN): 221
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 478.5K IT + 50.2K OT = 528.7K TT
Cost: 0.124$ input, 0.104$ output, 0.229$ total
Result 28 of 134

Test T0897 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3.6-plus-04-02
Temperature: 0.0
Dataclass: Document
Normalized Score: 86.65 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.88 / 0.87
Micro precision / recall: 0.87 / 0.89
Instances: 263
True positives (TP): 2146
False positives (FP): 316
False negatives (FN): 269
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 477.6K IT + 866.9K OT = 1.3M TT
Cost: 0.155$ input, 1.690$ output, 1.846$ total
Result 29 of 134

Test T0988 at 2026-04-21

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: openrouter
Model: qwen/qwen3.5-9b-20260310
Temperature: 0.0
Dataclass: Document
Normalized Score: 63.72 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.75 / 0.64
Micro precision / recall: 0.86 / 0.66
Instances: 263
True positives (TP): 1591
False positives (FP): 258
False negatives (FN): 824
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 496.0K IT + 3.1M OT = 3.6M TT
Cost: 0.050$ input, 0.465$ output, 0.514$ total
Result 30 of 134

Test T0884 at 2026-03-25

{'document-type': ['index-card'], 'writing': ['typed', 'printed', 'handwritten'], 'century': [20, 19], 'language': ['de', 'fr', 'en', 'la', 'el', 'fi', 'sv', 'pl'], 'layout': ['index'], 'entry-type': ['bibliographic'], 'task': ['information-extraction']}

Configuration
Provider: alibaba
Model: qwen3.5-flash-2026-02-23
Temperature: 0.0
Dataclass: Document
Normalized Score: 38.61 %
Test time: unknown
Prompt

Same prompt as for Test T0975 (Result 21 of 134).

Results

no valid result

Scoring
Fuzzy Score: n/a
F1 micro / macro: 0.53 / 0.39
Micro precision / recall: 0.82 / 0.40
Instances: 263
True positives (TP): 954
False positives (FP): 207
False negatives (FN): 1461
Costs / Pricing
Pricing Date: n/a, n/a
Tokens: 220.0K IT + 22.2K OT = 242.2K TT
Cost: 0.022$ input, 0.009$ output, 0.031$ total