MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants | Signal Canvas | ScienceToStartup

MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

Stale · 68d ago · Verification pending / evidence receipt incomplete

Use this Signal Canvas via API or MCP

Route this paper proof surface into REST, MCP, or developer workflows while preserving the same evidence receipt and related-resource context.

Page Freshness

Signal Canvas proof surface

Canonical route: /signal-canvas/miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants

Proof freshness: stale
Proof status: unverified
Display score: 8/10
Last proof check: 2026-04-02
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 0
Source count: 0
Coverage: 17%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
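
The freshness rule described above can be sketched as a plain date comparison. This is a minimal illustration, assuming the window simply ends on the "Score fresh until" date; the helper names `is_stale` and `days_past_window` are hypothetical and not part of any documented Signal Canvas API.

```python
from datetime import date


def is_stale(fresh_until: date, today: date) -> bool:
    """Return True once `today` is past the end of the freshness window.

    Assumption: the window is inclusive of the `fresh_until` day itself.
    """
    return today > fresh_until


def days_past_window(fresh_until: date, today: date) -> int:
    """Days elapsed since the window closed (0 while the score is still fresh)."""
    return max((today - fresh_until).days, 0)


# Value copied from the "Score fresh until" field on this page.
FRESH_UNTIL = date(2026, 5, 2)
```

Calling `is_stale(FRESH_UNTIL, date.today())` would indicate whether a client should treat the score bundle as a last-landed snapshot rather than current proof data.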

Agent Handoff

MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

Canonical ID miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants | Route /signal-canvas/miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants
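
The same request can be sketched in Python using only the standard library. The URL pattern is taken from the curl example above; the helper names `handoff_url` and `fetch_handoff` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas"

# Canonical ID as listed in the Agent Handoff section.
CANONICAL_ID = (
    "miniappbench-evaluating-the-shift-from-text-to-interactive-"
    "html-responses-in-llm-powered-assistants"
)


def handoff_url(canonical_id: str) -> str:
    """Build the agent-handoff URL for a given canonical ID."""
    return f"{BASE_URL}/{canonical_id}"


def fetch_handoff(canonical_id: str, timeout: float = 10.0):
    """Perform the GET request and decode the body.

    Assumption: the response is UTF-8 encoded JSON. Adjust the parsing
    if the endpoint returns something else.
    """
    with urllib.request.urlopen(handoff_url(canonical_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping URL construction separate from the network call makes the route easy to check without issuing a request.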

MCP example

{
  "tool": "search_signal_canvas",
  "arguments": {
    "mode": "paper",
    "paper_ref": "miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants",
    "query_text": "Summarize MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants"
  }
}
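
A client might assemble this payload programmatically rather than by hand. The sketch below builds the same structure shown above and, as an assumption about the transport, wraps it in the JSON-RPC `tools/call` envelope that MCP clients typically send. The function names `build_tool_call` and `to_jsonrpc` are hypothetical; only the tool name and argument keys come from this page.

```python
import json

PAPER_REF = (
    "miniappbench-evaluating-the-shift-from-text-to-interactive-"
    "html-responses-in-llm-powered-assistants"
)
PAPER_TITLE = (
    "MiniAppBench: Evaluating the Shift from Text to Interactive "
    "HTML Responses in LLM-Powered Assistants"
)


def build_tool_call(paper_ref: str, title: str) -> dict:
    """Reproduce the tool-call payload shown in the MCP example."""
    return {
        "tool": "search_signal_canvas",
        "arguments": {
            "mode": "paper",
            "paper_ref": paper_ref,
            "query_text": f"Summarize {title}",
        },
    }


def to_jsonrpc(call: dict, request_id: int = 1) -> dict:
    """Wrap the payload in a JSON-RPC 2.0 `tools/call` request.

    Assumption: the server speaks standard MCP over JSON-RPC; this page
    does not document the transport.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": call["tool"], "arguments": call["arguments"]},
    }


if __name__ == "__main__":
    print(json.dumps(to_jsonrpc(build_tool_call(PAPER_REF, PAPER_TITLE)), indent=2))
```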

source_context

{
  "surface": "signal_canvas",
  "mode": "paper",
  "query": "MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants",
  "normalized_query": "2603.09652",
  "route": "/signal-canvas/miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants",
  "paper_ref": "miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Paper mode · single-doc scope · scope: miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants

GitHub Code Pulse

No public code linked for this paper yet.

Claim map

Strong 8 · Mixed 0 · Weak 0

Evidence (partial): To address this gap, we introduce MiniAppBench, the first comprehensive benchmark designed to evaluate principle-driven, interactive application generation.
Implication (partial): Directly stated in the abstract with explicit 'first comprehensive benchmark' phrasing
Verification: partial

Evidence (partial): However, existing benchmarks primarily focus on algorithmic correctness or static layout reconstruction, failing to capture the capabilities required for this new paradigm.
Implication (partial): Directly stated in abstract as a motivation for the new benchmark
Verification: partial

Evidence (partial): MiniAppBench distills 500 tasks across six domains (e.g., Games, Science, and Tools).
Implication (partial): Directly stated in abstract with specific numbers and domain examples
Verification: partial

Evidence (partial): Sourced from a real-world application with 10M+ generations, MiniAppBench distills 500 tasks across six domains
Implication (partial): Directly stated in abstract with specific numeric evidence
Verification: partial

Evidence (partial): MiniAppEval demonstrates high alignment with human judgment, establishing a reliable standard for future research.
Implication (partial): Directly stated in abstract as a result of experiments
Verification: partial

Evidence (partial): Our experiments reveal that current LLMs still face significant challenges in generating high-quality MiniApps
Implication (partial): Directly stated in abstract as a key finding from experiments
Verification: partial

Evidence (partial): Leveraging browser automation, it performs human-like exploratory testing to systematically assess applications across three dimensions: Intention, Static, and Dynamic.
Implication (partial): Directly stated in abstract with specific dimensions mentioned
Verification: partial

Evidence (partial): human-AI interaction is evolving from static text responses to dynamic, interactive HTML-based applications, which we term MiniApps.
Implication (partial): Directly stated in abstract as the premise of the research
Verification: partial
