Meta-Harness: End-to-End Optimization of Model Harnesses

Page Freshness

Signal Canvas proof surface

Canonical route: /signal-canvas/meta-harness-end-to-end-optimization-of-model-harnesses

Proof freshness: stale
Proof status: unverified
Display score: 7/10
Last proof check: 2026-03-31
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 76
Source count: 4
Coverage: 50%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
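The score fields above imply a simple date comparison. The following is a minimal sketch of that check, using only the "Score updated" and "Score fresh until" dates shown on this page. The function name `score_is_fresh` and the choice to treat the deadline day as still fresh are assumptions for illustration. The proof freshness window is not stated on the page, so it is not modeled here.

```python
from datetime import date

# Dates copied from the "Score updated" and "Score fresh until" fields.
SCORE_UPDATED = date(2026, 4, 2)
SCORE_FRESH_UNTIL = date(2026, 5, 2)


def score_is_fresh(today: date, fresh_until: date = SCORE_FRESH_UNTIL) -> bool:
    """Return True while the score is inside its freshness window.

    Assumption: the deadline day itself still counts as fresh.
    """
    return today <= fresh_until


# Length of the score window implied by the two dates.
window_days = (SCORE_FRESH_UNTIL - SCORE_UPDATED).days
```

With these dates the implied score window is 30 days.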

Agent Handoff

Canonical ID: meta-harness-end-to-end-optimization-of-model-harnesses
Route: /signal-canvas/meta-harness-end-to-end-optimization-of-model-harnesses

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/meta-harness-end-to-end-optimization-of-model-harnesses
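The same request can be sketched in Python with the standard library. The base URL is taken from the curl line above. The helper names `handoff_url` and `fetch_handoff` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas"


def handoff_url(canonical_id: str) -> str:
    """Build the agent-handoff URL for a canonical ID (hypothetical helper)."""
    return f"{BASE_URL}/{quote(canonical_id, safe='')}"


def fetch_handoff(canonical_id: str, timeout: float = 10.0):
    """Fetch the handoff document.

    Assumption: the response body is JSON; the page does not document it.
    """
    with urlopen(handoff_url(canonical_id), timeout=timeout) as resp:
        return json.load(resp)


url = handoff_url("meta-harness-end-to-end-optimization-of-model-harnesses")
```

Building the URL is kept separate from the network call so the path can be checked without contacting the server.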

MCP example

{
  "tool": "search_signal_canvas",
  "arguments": {
    "mode": "paper",
    "paper_ref": "meta-harness-end-to-end-optimization-of-model-harnesses",
    "query_text": "Summarize Meta-Harness: End-to-End Optimization of Model Harnesses"
  }
}
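The MCP payload above can also be assembled programmatically. This is a minimal sketch that only builds the same structure as a Python dict. The function name `build_search_call` is hypothetical, and how the payload is sent to an MCP server depends on the client in use, which this page does not specify.

```python
import json


def build_search_call(paper_ref: str, title: str) -> dict:
    """Build the tool-call payload shown in the MCP example (hypothetical helper)."""
    return {
        "tool": "search_signal_canvas",
        "arguments": {
            "mode": "paper",
            "paper_ref": paper_ref,
            "query_text": f"Summarize {title}",
        },
    }


call = build_search_call(
    "meta-harness-end-to-end-optimization-of-model-harnesses",
    "Meta-Harness: End-to-End Optimization of Model Harnesses",
)
payload = json.dumps(call, indent=2)  # serialized form, ready to hand to a client
```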

source_context

{
  "surface": "signal_canvas",
  "mode": "paper",
  "query": "Meta-Harness: End-to-End Optimization of Model Harnesses",
  "normalized_query": "2603.28052",
  "route": "/signal-canvas/meta-harness-end-to-end-optimization-of-model-harnesses",
  "paper_ref": "meta-harness-end-to-end-optimization-of-model-harnesses",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Evidence Receipt

Route status: building

Claims: 8

References: 76

Proof: Verification pending

Freshness state: computing

Source paper: Meta-Harness: End-to-End Optimization of Model Harnesses

PDF: https://arxiv.org/pdf/2603.28052v1

Source count: 4

Coverage: 50%

Last proof check: 2026-03-31T20:20:38.991Z

Signal Canvas receipt window

Watch and verify: Meta-Harness: End-to-End Optimization of Model Harnesses

/buildability/meta-harness-end-to-end-optimization-of-model-harnesses

Subject: Meta-Harness: End-to-End Optimization of Model Harnesses

Verdict

Watch

The verdict is Watch because viability or proof quality is intermediate; both should be re-evaluated before execution.

Time to first demo

Preparing verified analysis

GitHub Code Pulse

No public code linked for this paper yet.

Claim map

Strong: 8 | Mixed: 0 | Weak: 0

Evidence (partial): Meta-Harness improves online text classification accuracy while using a smaller input context.
Implication (partial): Directly stated in abstract and analysis with specific numeric comparisons in Table 2 and text.
Verification: partial

Evidence (partial): Meta-Harness improves accuracy on 200 IMO-level problems by 4.7 points on average across five held-out models.
Implication (partial): Explicitly stated in abstract and supported by Table 6 showing average improvement.
Verification: partial

Evidence (partial): On agentic coding, discovered harnesses surpass the best hand-engineered baselines on TerminalBench-2.
Implication (partial): Directly stated in abstract and supported by Figure 1 showing performance comparison.
Verification: partial

Evidence (partial): This paper considers settings that yield orders-of-magnitude more context per artifact evaluation.
Implication (partial): Supported by Table 1 showing Meta-Harness with 10.0 Mtok/iter versus much lower values for other methods, with explanatory text.
Verification: partial

Evidence (partial): Access to raw execution traces is the key ingredient for enabling harness search.
Implication (partial): Directly stated in analysis section interpreting ablation results.
Verification: partial

Evidence (partial): Given only the current metrics and the desired trade-off, the proposer is able to discover harnesses across a broad range of the frontier.
Implication (partial): Stated in analysis with reference to Pareto frontier and optimization capability.
Verification: partial

Evidence (partial): Meta-Harness outperforms the next best method by 2.9 points on these 9 previously unseen tasks.
Implication (partial): Directly supported by Table 5 showing test accuracy comparisons.
Verification: partial

Evidence (partial): Changing the harness around a fixed large language model (LLM) can produce a 6× performance gap on the same benchmark.
Implication (partial): Explicitly stated in introduction with citation, establishing the importance of harness optimization.
Verification: partial

Author intelligence and commercialization panels stay hidden until the proof receipt is verified, cites at least 3 references, includes at least 2 sources, and clears 50% coverage. The paper narrative and citation surfaces remain public while verification is pending.
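The gating rule above can be read as a conjunction of four conditions. The following is a minimal sketch of that rule, not the site's implementation. The `Receipt` type and `panels_visible` function are hypothetical names, and reading "clears 50% coverage" as "at least 50%" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Receipt:
    verified: bool    # proof receipt has been verified
    references: int   # number of cited references
    sources: int      # number of distinct sources
    coverage: float   # coverage as a fraction between 0 and 1


def panels_visible(r: Receipt) -> bool:
    """Apply the four gating conditions described above.

    Assumption: "clears 50% coverage" means coverage >= 0.50.
    """
    return (
        r.verified
        and r.references >= 3
        and r.sources >= 2
        and r.coverage >= 0.50
    )


# Values taken from this page: unverified, 76 references, 4 sources, 50% coverage.
current = Receipt(verified=False, references=76, sources=4, coverage=0.50)
```

Under this reading, the values on this page fail only the verification condition; the reference, source, and coverage thresholds are already met.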
