This leaderboard presents a comparative overview of model performance across all benchmarks. It aggregates the results from every model and provider evaluated.
- Benchmark Difficulty Ranking shows which tasks are, on average, the most and least challenging (see the sketch after this list).
- Provider and Model Performance sections compare accuracy and consistency across different AI providers and individual models.
- Cost Effectiveness and Model Speed visualizations balance performance against efficiency, highlighting practical trade-offs between quality, speed, and cost.
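The difficulty ranking can be thought of as a simple aggregation over per-run results. Below is a minimal sketch of that computation, assuming results are available as rows of (model, provider, benchmark, accuracy); the column names, example values, and DataFrame schema are illustrative assumptions, not the leaderboard's actual data model.

```python
import pandas as pd

# Hypothetical per-run results; the schema and values below are
# placeholders, not the leaderboard's real data.
results = pd.DataFrame(
    {
        "model": ["model-a", "model-a", "model-b", "model-b"],
        "provider": ["provider-1", "provider-1", "provider-2", "provider-2"],
        "benchmark": ["benchmark-x", "benchmark-y", "benchmark-x", "benchmark-y"],
        "accuracy": [0.82, 0.41, 0.76, 0.37],
    }
)

# Difficulty ranking: order benchmarks by mean accuracy across all
# models; a lower mean accuracy indicates a more challenging task.
difficulty = (
    results.groupby("benchmark")["accuracy"]
    .mean()
    .sort_values()
)
print(difficulty)
```

The same groupby pattern extends naturally to the provider- and model-level comparisons: grouping by `provider` or `model` instead of `benchmark` yields the per-provider and per-model averages those sections visualize.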
Together, these graphs offer an evidence-based snapshot of how current large language models perform on data-intensive tasks in the humanities.