RISE Humanities Data Benchmark, 0.5.0-pre1

Search Test Runs

 

A test run is a single execution of a benchmark test using a defined model configuration.
Each run represents how a particular large language model (LLM) — such as GPT-4, Claude-3, or Gemini — performed on a given task at a specific time, with specific settings.

A test run includes:

  • Prompt and role definition – what the model was asked to do and from what perspective (e.g. “as a historian”).
  • Model configuration – provider, model version, temperature, and other generation parameters.
  • Results – the model’s actual response and its evaluation (scores such as F1 or accuracy).
  • Usage and cost data – token counts and calculated API costs.
  • Metadata – information like the test date, benchmark name, and person who executed it.

Together, test runs make it possible to compare models, providers, and configurations across benchmarks in a transparent and reproducible way.
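As a rough illustration, the components listed above could be held in a record like the one sketched below. The class name `RunRecord`, its field names, and the helper `best_run` are hypothetical and not taken from the benchmark's code base. The token counts are approximations of the rounded figures shown on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical container for one test run (names are assumptions)."""
    test_id: str                      # e.g. "T0199"
    date: str                         # test date, YYYY-MM-DD
    provider: str
    model: str
    temperature: float
    prompt: str = ""
    response: Optional[str] = None    # None when no valid result was returned
    scores: dict = field(default_factory=dict)
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def best_run(runs, metric="fuzzy"):
    """Return the run with the highest value for `metric`; missing scores count as 0."""
    return max(runs, key=lambda r: r.scores.get(metric, 0.0))


# Two runs from this results page, with token counts rounded as displayed.
flash = RunRecord("T0199", "2026-01-24", "genai", "gemini-2.5-flash", 0.0,
                  scores={"fuzzy": 0.94}, input_tokens=3900, output_tokens=10400)
mini = RunRecord("T0084", "2026-01-24", "openai", "gpt-4.1-mini-2025-04-14", 0.0,
                 scores={"fuzzy": 0.97}, input_tokens=13500, output_tokens=6400)
print(best_run([flash, mini]).model)
```

Keeping scores in a dictionary lets a run carry only the metrics that apply to its benchmark.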

Search Results

Your search for Benchmark 'fraktur_adverts__true' with Search Hidden 'False' returned 113 results, showing page 4 of 12.
Result 31 of 113

Test T0207 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: genai
Model: gemini-2.5-flash-lite
Temperature: 0.0
Dataclass: Document
Normalized Score: 14.70 %
Test time: unknown
Prompt

## IDENTITY AND PURPOSE

You are an OCR and information extraction system trained to process historical newspaper pages printed in 18th-century German using Fraktur type. The pages contain mostly classified advertisements. Your task is to identify and extract each advertisement *exactly as printed*, including historical spellings, typographic errors, punctuation, and formatting.


## INSTRUCTIONS

- Extract **all advertisements** from the input image, one after the other, following the sequence on the page.
- Maintain the **original spelling**, capitalization, and any **typos or non-standard forms**.
- Follow these transcription rules: 
  - the long s (ſ) is transcribed as "s"
  - "/" is transcribed as ","
- Use the masthead of the newspaper only to extract the date, ignore other content.
- The layout is typically **two-column**; extract ads from both columns, including the ad number.
- Return the result as a **JSON object** in the specified format and **nothing else** (no explanations, summaries, or additional text).
- For each advertisement, include:
  - `"date"`: the publication date of the page in ISO 8061 format (YYYY-MM-DD)
  - `"tags_section"`: the heading under which the advertisement appears
  - `"text"`: the full advertisement text

## EXAMPLE OUTPUT

{
  "advertisements": [
    {
      "date": "1731-01-02",
      "tags_section": "Es werden zum Verkauff offeriert",
      "text": "5. Ein kleines, jedoch listiges Lehrbuch der Zauberkunst, lange im Gebrauche des jungen Bartolomeus Simpson."
    },
    {
      "date": "1731-01-02",
      "tags_section": "Es werden zum Verkauff offeriert",
      "text": "6. Ein rarer, mit Edelsteinen besetzter Saxophon-Kasten, aus dem Besitze der Jungfer Lisa Simpson."
    },
    {
      "date": "1731-01-02",
      "tags_section": "Es werden zu Entleihen begehrt",
      "text": "7. Ein gar prachtvoller, jedoch etwas zerlesener Band mit Rezepten von Margaretha Simpsonin."
    }
  ]
}
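Outside the prompt itself, a response in the format requested above could be checked before scoring with a sketch like the following. The function `validate_output` and its messages are hypothetical; this is not the benchmark's own validation code, and the date check only confirms that the value parses as year-month-day.

```python
import json
from datetime import datetime

REQUIRED_KEYS = {"date", "tags_section", "text"}


def validate_output(raw: str) -> list:
    """Return a list of problems in a model response; an empty list means it conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    ads = data.get("advertisements")
    if not isinstance(ads, list):
        return ['missing "advertisements" list']
    problems = []
    for i, ad in enumerate(ads):
        if not isinstance(ad, dict):
            problems.append(f"ad {i}: not an object")
            continue
        missing = REQUIRED_KEYS - ad.keys()
        if missing:
            problems.append(f"ad {i}: missing {sorted(missing)}")
        try:
            # Accepts year-month-day dates such as "1731-01-02".
            datetime.strptime(str(ad.get("date", "")), "%Y-%m-%d")
        except ValueError:
            problems.append(f"ad {i}: date is not YYYY-MM-DD")
    return problems


sample = ('{"advertisements": [{"date": "1731-01-02", '
          '"tags_section": "Es werden zum Verkauff offeriert", '
          '"text": "5. Ein kleines Lehrbuch."}]}')
print(validate_output(sample))
```

A response wrapped in explanatory text fails at the first step, which is one way a run can end with no valid result.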

Results

no valid result

Scoring
Fuzzy Score: 0.15
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
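This page does not state how the Fuzzy Score is computed. Purely as an illustration of fuzzy string similarity in general, the sketch below uses `difflib.SequenceMatcher` from Python's standard library as an assumed stand-in. The function names and the positional pairing of advertisements are assumptions, so its output should not be expected to reproduce the scores listed here.

```python
from difflib import SequenceMatcher


def fuzzy_similarity(expected: str, actual: str) -> float:
    """Similarity in [0, 1] between a reference text and a transcription."""
    return SequenceMatcher(None, expected, actual).ratio()


def mean_fuzzy_score(expected_texts, actual_texts) -> float:
    """Average similarity over positionally paired texts; missing predictions score 0."""
    if not expected_texts:
        return 0.0
    total = 0.0
    for i, expected in enumerate(expected_texts):
        actual = actual_texts[i] if i < len(actual_texts) else ""
        total += fuzzy_similarity(expected, actual)
    return total / len(expected_texts)


reference = ["5. Ein kleines Lehrbuch.", "6. Ein rarer Kasten."]
prediction = ["5. Ein kleines Lehrbuch."]  # second advertisement missing
print(mean_fuzzy_score(reference, prediction))
```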
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 3.9K IT + 136.5K OT = 140.5K TT
Cost: 0.000$ (input) + 0.055$ (output) = 0.055$ (total)
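The three values on the cost line appear to be input cost, output cost, and total, mirroring the IT/OT/TT token counts. A minimal sketch of that arithmetic follows. The function name and the per-million-token rates are assumptions chosen for illustration and are not given on this page.

```python
def run_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Return (input cost, output cost, total cost) in USD for one run."""
    input_cost = input_tokens / 1_000_000 * usd_per_m_input
    output_cost = output_tokens / 1_000_000 * usd_per_m_output
    return input_cost, output_cost, input_cost + output_cost


# Token counts approximated from the rounded figures above (3.9K IT, 136.5K OT).
# The rates of 0.10 and 0.40 USD per million tokens are hypothetical.
costs = run_cost(3_900, 136_500, usd_per_m_input=0.10, usd_per_m_output=0.40)
print(" + ".join(f"{c:.3f}$" for c in costs[:2]), "=", f"{costs[2]:.3f}$")
```

Because the page shows rounded values, a displayed total can differ from the sum of the displayed parts by one unit in the last digit.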
Result 32 of 113

Test T0215 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: genai
Model: gemini-2.5-flash-lite-preview-09-2025
Temperature: 0.0
Dataclass: Document
Normalized Score: 60.30 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.63
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 3.9K IT + 75.3K OT = 79.2K TT
Cost: 0.000$ (input) + 0.030$ (output) = 0.031$ (total)
Result 33 of 113

Test T0082 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: openai
Model: gpt-4o-mini-2024-07-18
Temperature: 0.0
Dataclass: Document
Normalized Score: 34.80 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.42
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 153.7K IT + 5.9K OT = 159.5K TT
Cost: 0.023$ (input) + 0.004$ (output) = 0.027$ (total)
Result 34 of 113

Test T0177 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: mistral
Model: mistral-medium-2508
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.00
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 3.9K IT + 1.1K OT = 5.1K TT
Cost: 0.002$ (input) + 0.002$ (output) = 0.004$ (total)
Result 35 of 113

Test T0199 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: genai
Model: gemini-2.5-flash
Temperature: 0.0
Dataclass: Document
Normalized Score: 92.20 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.94
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 3.9K IT + 10.4K OT = 14.3K TT
Cost: 0.001$ (input) + 0.026$ (output) = 0.027$ (total)
Result 36 of 113

Test T0191 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: mistral
Model: mistral-large-2411
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.70 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.01
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 4.5K IT + 1.1K OT = 5.6K TT
Cost: 0.009$ (input) + 0.007$ (output) = 0.016$ (total)
Result 37 of 113

Test T0123 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: anthropic
Model: claude-opus-4-1-20250805
Temperature: 0.0
Dataclass: Document
Normalized Score: 18.50 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.19
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 5.5K IT + 6.9K OT = 12.4K TT
Cost: 0.082$ (input) + 0.517$ (output) = 0.600$ (total)
Result 38 of 113

Test T0536 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: mistral
Model: mistral-large-2512
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.00
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 3.9K IT + 53 OT = 4.0K TT
Cost: 0.002$ (input) + 0.000$ (output) = 0.002$ (total)
Result 39 of 113

Test T0084 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: openai
Model: gpt-4.1-mini-2025-04-14
Temperature: 0.0
Dataclass: Document
Normalized Score: 95.70 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.97
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 13.5K IT + 6.4K OT = 19.8K TT
Cost: 0.005$ (input) + 0.010$ (output) = 0.016$ (total)
Result 40 of 113

Test T0229 at 2026-01-24

{'document-type': ['book-page'], 'writing': ['printed'], 'century': [18, 19], 'language': ['de'], 'layout': ['prose'], 'script-style': ['fraktur'], 'task': ['transcription']}

Configuration
Provider: anthropic
Model: claude-sonnet-4-5-20250929
Temperature: 0.0
Dataclass: Document
Normalized Score: 0.00 %
Test time: unknown
Prompt

Identical to the prompt shown under Result 31 (Test T0207).

Results

no valid result

Scoring
Fuzzy Score: 0.00
F1 micro / macro: n/a / n/a
Micro Precision / Recall: n/a / n/a
True/False Positives: Instances n/a, TP n/a, FP n/a, FN n/a
Costs / Pricing
Pricing Date: n/a n/a
Tokens: 6.0K IT + 4.8K OT = 10.8K TT
Cost: 0.018$ (input) + 0.072$ (output) = 0.090$ (total)