Examines a model's ability to extract bounding boxes of advertisements from magazine pages.
This benchmark has been run 52 times. It is scored with the F1 metric.
Tested providers: mistral, openai, openrouter, contour_local, x-ai, alibaba, genai, anthropic
Tested models: ministral-14b-2512, qwen3.5-397b-a17b, pixtral-large-2411, claude-opus-4-5-20251101, o3, gpt-4.1-mini, claude-sonnet-4-6, magistral-small-2509, qwen/qwen3-vl-8b-thinking, qwen3.5-35b-a3b, qwen/qwen3-vl-8b-instruct, qwen/qwen3-vl-30b-a3b-instruct, gemini-2.0-flash, claude-opus-4-1-20250805, qwen3.5-flash-2026-02-23, gemini-2.5-flash, gpt-5, magistral-medium-2509, gpt-5-mini, gemini-3.1-flash-lite-preview, ministral-8b-2512, gemini-3-flash-preview, mistral-small-2506, gpt-4.1, gpt-4o-mini, grok-4.20-0309-reasoning, claude-opus-4-20250514, gemini-2.5-pro, mistral-large-2512, gpt-5.3-codex, qwen3.5-27b, qwen3.5-122b-a10b, opencv-contour, claude-sonnet-4-20250514, qwen3.5-plus-2026-02-15, gpt-4o, gemini-3.1-pro-preview, meta-llama/llama-4-maverick, gpt-4.1-nano, gemini-2.5-flash-lite, gemini-2.5-flash-lite-preview-09-2025, claude-haiku-4-5-20251001, gpt-5-nano, mistral-medium-2505, mistral-large-2411, gpt-5.2-2025-12-11, gpt-5.4-2026-03-05, claude-opus-4-6, mistral-medium-2508, gemini-2.0-flash-lite, gpt-5.1-2025-11-13, claude-sonnet-4-5-20250929
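The page does not say how predicted boxes are matched against annotated ones before F1 is computed. As a rough illustration only, the sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` form, a greedy one-to-one match, and an intersection-over-union (IoU) threshold of 0.5. The names `iou`, `f1_score`, and `threshold` are hypothetical and are not taken from the benchmark's code.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max). The benchmark's real format is not stated.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_score(predicted: List[Box], truth: List[Box], threshold: float = 0.5) -> float:
    """F1 over boxes, using greedy one-to-one matching at an assumed IoU threshold."""
    unmatched = list(truth)
    true_pos = 0
    for box in predicted:
        # Pick the still-unmatched annotation that overlaps this prediction most.
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(truth) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    # Assumption: a page with no ads and no predictions counts as a perfect score.
    return 2 * true_pos / denom if denom else 1.0


truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(0, 0, 10, 10), (50, 50, 60, 60)]
print(f1_score(predicted, truth))  # one hit, one miss, one spurious box
```

Under these assumptions, a model that returns no boxes for a page that contains advertisements scores 0, since it has no true positives.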
| Score | Date | Provider | Model |
|---|---|---|---|
| 0.00 | 1 week ago | alibaba | qwen3.5-flash-2026-02-23 |
| 0.00 | 1 week ago | alibaba | qwen3.5-27b |
| 0.00 | 1 week ago | alibaba | qwen3.5-35b-a3b |
| 0.00 | 1 week ago | alibaba | qwen3.5-122b-a10b |
| 0.00 | 1 week ago | alibaba | qwen3.5-397b-a17b |
| Role | Contributors |
|---|---|
| Domain expert | Lea Kasper |
| Data curator | Lea Kasper, Sorin Marti |
| Annotator | Lea Kasper, Sorin Marti |
| Analyst | arno_bosse |
| Engineer | Sorin Marti |