RISE Humanities Data Benchmark, 0.4.0

Leaderboard

Introduction

This leaderboard presents a comparative overview of model performance across all benchmarks.
It brings together results from every model and provider evaluated.

  • Benchmark Difficulty Ranking shows which tasks are most and least challenging on average.
  • The Provider and Model Performance sections compare accuracy and consistency across AI providers and individual models.
  • The Cost Effectiveness and Model Speed visualizations balance performance against efficiency, highlighting practical trade-offs between quality, speed, and cost.

Together, these graphs offer an evidence-based snapshot of how current large language models perform on data-intensive tasks in the humanities.
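
As a rough, hedged illustration of what these charts aggregate, the sketch below computes a difficulty ranking and a provider/model comparison from a hypothetical results.csv; the file name and its columns (benchmark, provider, model, accuracy) are assumptions for illustration, not the benchmark's actual schema.

    # Minimal sketch, assuming a hypothetical results.csv with one row per
    # benchmark run and columns: benchmark, provider, model, accuracy (0-1).
    import pandas as pd

    results = pd.read_csv("results.csv")

    # Benchmark difficulty ranking: mean accuracy per benchmark across all
    # models; lower mean accuracy indicates a harder task.
    difficulty = results.groupby("benchmark")["accuracy"].mean().sort_values()

    # Provider and model comparison: mean accuracy, with the standard
    # deviation as a rough consistency measure.
    by_provider = results.groupby("provider")["accuracy"].agg(["mean", "std"])
    by_model = results.groupby("model")["accuracy"].agg(["mean", "std"])

    print(difficulty)  # hardest benchmarks first
    print(by_model.sort_values("mean", ascending=False))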

On which Benchmark do models perform best

[Chart: benchmark difficulty ranking, averaged across all models]

Which Provider performs best

[Charts: accuracy and consistency by provider]

Which Model performs best

[Chart: accuracy and consistency by model]

Cost Effectiveness

[Chart: accuracy versus cost by model]
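
As a hedged sketch only: a cost-effectiveness view of this kind can be drawn as a scatter of mean accuracy against mean cost per model. It reuses the hypothetical results.csv from above with an assumed cost_usd column (cost per run in US dollars).

    # Hedged sketch; the cost_usd column is an assumption for illustration.
    import pandas as pd
    import matplotlib.pyplot as plt

    results = pd.read_csv("results.csv")
    per_model = results.groupby("model").agg(
        accuracy=("accuracy", "mean"),
        cost_usd=("cost_usd", "mean"),
    )

    fig, ax = plt.subplots()
    ax.scatter(per_model["cost_usd"], per_model["accuracy"])
    for name, row in per_model.iterrows():
        ax.annotate(name, (row["cost_usd"], row["accuracy"]))
    ax.set_xlabel("mean cost per run (USD)")
    ax.set_ylabel("mean accuracy")
    plt.show()

Models toward the upper left of such a plot deliver the most accuracy per dollar.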

Model Speed
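
A similarly hedged sketch for the speed view, assuming a hypothetical latency_s column (seconds per run) in the same results.csv:

    # Mean latency per model; latency_s is an assumed column name.
    import pandas as pd

    results = pd.read_csv("results.csv")
    speed = results.groupby("model")["latency_s"].mean().sort_values()
    print(speed)  # fastest models first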