
Buildability Receipt Backlinks

Evidence links to receipt scaffolds, not external adoption claims.

Evidence runs can now link directly to buildability receipt scaffolds, so proof context stays aligned across paper, Signal Canvas, and Build Loop routes while external validation remains explicitly gated.

Evidence receipt window

Buildability receipt unavailable

evidence-workstation

Pending / no grade

Subject: Evidence workstation

Verdict

Pending / no grade

Evidence has no selected canonical paper receipt until a query, report, or paper handoff selects one.

Time to first demo

Insufficient data

No canonical receipt is available, so demo lead-time cannot be reported.

Compute envelope

Structured compute envelope

Insufficient data

No canonical receipt is available, so compute requirements cannot be reported.

Evidence ids

Evidence ids

Insufficient data

No receipt id, paper id, proof run id, or evidence hash is available.

Freshness

Freshness

Insufficient data

No receipt timestamp or evidence verification timestamp is available.

Hash state

Immutable hash

Insufficient data

No canonical receipt hash is available.

Signature state

External signature

unsigned_external

No founder, registry, pilot, or production-adoption signature is attached to this receipt.

Verification

not_verified

Verification is blocked until an external signature is provided.

Blockers

  • Pending / no grade: Evidence has no selected canonical paper receipt until a query, report, or paper handoff selects one.

Missing proof, requirement, signature, approval, adoption, or telemetry fields are blockers and must not be inferred.
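The gating rule above (missing fields surface as blockers and are never inferred) can be sketched as a small data model. All class and field names here are hypothetical illustrations, not the product's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ReceiptScaffold:
    """Hypothetical receipt scaffold; field names are illustrative."""
    receipt_id: Optional[str] = None
    receipt_hash: Optional[str] = None
    verified_at: Optional[str] = None
    external_signature: Optional[str] = None

    def blockers(self) -> List[str]:
        # Missing fields are reported as blockers, never inferred.
        out = []
        if self.receipt_id is None:
            out.append("Pending / no grade: no canonical paper receipt selected.")
        if self.receipt_hash is None:
            out.append("Insufficient data: no canonical receipt hash.")
        if self.verified_at is None:
            out.append("Insufficient data: no verification timestamp.")
        if self.external_signature is None:
            out.append("unsigned_external: no external signature attached.")
        return out

    def verification_state(self) -> str:
        # Verification stays blocked until an external signature is provided.
        return "not_verified" if self.external_signature is None else "verifiable"
```

An empty scaffold reports every blocker and stays `not_verified`, mirroring the "Insufficient data" cards above.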

Evidence

Reviewable research runs with screening, extraction, consensus, and export-ready reports.

Evidence is the operator workstation for defining a question, screening candidates, inspecting proof, running consensus, extracting structured fields, synthesizing a report, and seeding a workspace with provenance.

Define question
Scope corpus, paper, or workspace runs.
Inspect evidence
Quote-level provenance and missingness stay visible.
Export or seed
Markdown, JSON, PDF, BibTeX, and workspace seeds.
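The cards above compress a longer run flow (define → screen → inspect → consensus → extract → synthesize → export/seed). A minimal sketch of that ordering, with hypothetical stage names:

```python
from typing import Optional

# Hypothetical stage names mirroring the workflow described above;
# everything else here is illustrative, not the product's implementation.
STAGES = [
    "define_question",
    "screen_candidates",
    "inspect_evidence",
    "run_consensus",
    "extract_fields",
    "synthesize_report",
    "export_or_seed",
]

def next_stage(current: str) -> Optional[str]:
    """Stage that follows `current`, or None when the run is complete."""
    i = STAGES.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None
```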
Server-rendered preview

Previewing the top Evidence hits for "Compute concentration and frontier model economics".

Ideological Bias in LLMs' Economic Causal Reasoning
Probably Approximately Consensus: On the Learning Theory of Finding Common Ground
Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics

Evidence

Define a question, screen candidates, inspect evidence, run consensus, extract fields, and synthesize a cited report.

My Evidence
AI Summary

Search results will appear with a streamed summary.

Results (8 papers)

Ideological Bias in LLMs' Economic Causal Reasoning

LLM Economic Bias | 2026-04-23

0.19

Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question has direct practical stakes. We present a systematic evaluation by extending the EconCausal benchmark with ideology-contested cases - instances where intervention-oriented (pro-government) and market-oriented (pro-market) perspectives predict divergent causal signs. From 10,490 causal triplets (treatment-outcome pairs with empirically verified effect directions) derived from top-tier economics and finance journals, we identify 1,056 ideology-contested instances and evaluate 20 state-of-the-art LLMs on their ability to predict empirically supported causal directions. We find that ideology-contested items are consistently harder than non-contested ones, and that across 18 of 20 models, accuracy is systematically higher when the empirically verified causal sign aligns with intervention-oriented expectations than with market-oriented ones. Moreover, when models err, their incorrect predictions disproportionately lean intervention-oriented, and this directional skew is not eliminated by one-shot in-context prompting. These results highlight that LLMs are not only less accurate on ideologically contested economic questions, but systematically less reliable in one ideological direction than the other, underscoring the need for direction-aware evaluation in high-stakes economic and policy settings.
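The direction-aware split this abstract describes (accuracy on items whose verified causal sign matches intervention-oriented vs market-oriented expectations) can be sketched roughly as follows. Field names and sample data are invented for illustration, not drawn from the paper.

```python
# Illustrative sketch: split sign-prediction accuracy by which ideological
# expectation the verified causal sign aligns with. Data is made up.
def accuracy_by_alignment(items):
    groups = {"intervention": [], "market": []}
    for it in items:
        groups[it["aligned_with"]].append(it["predicted"] == it["verified"])
    return {g: sum(hits) / len(hits) for g, hits in groups.items() if hits}

sample = [
    {"predicted": "+", "verified": "+", "aligned_with": "intervention"},
    {"predicted": "+", "verified": "+", "aligned_with": "intervention"},
    {"predicted": "-", "verified": "+", "aligned_with": "market"},
    {"predicted": "+", "verified": "+", "aligned_with": "market"},
]
```

A systematic gap between the two groups is the kind of directional skew the paper reports.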

Probably Approximately Consensus: On the Learning Theory of Finding Common Ground

LLM Training | 2026-04-23

0.18

A primary goal of online deliberation platforms is to identify ideas that are broadly agreeable to a community of users through their expressed preferences. Yet, consensus elicitation should ideally extend beyond the specific statements provided by users and should incorporate the relative salience of particular topics. We address this issue by modelling consensus as an interval in a one-dimensional opinion space derived from potentially high-dimensional data via embedding and dimensionality reduction. We define an objective that maximizes expected agreement within a hypothesis interval where the expectation is over an underlying distribution of issues, implicitly taking into account their salience. We propose an efficient Empirical Risk Minimization (ERM) algorithm and establish PAC-learning guarantees. Our initial experiments demonstrate the performance of our algorithm and examine more efficient approaches to identifying optimal consensus regions. We find that through selectively querying users on an existing sample of statements, we can reduce the number of queries needed to a practical number.
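The interval-ERM idea in this abstract could be sketched as a brute-force search over candidate intervals in the 1D opinion space, scoring each by empirical mean agreement. This ignores the paper's salience weighting and PAC analysis, and the data is invented:

```python
from itertools import combinations

# Rough sketch of interval ERM: score each candidate interval [a, b] by the
# empirical mean agreement of sampled issue positions inside it, keep the best.
def best_interval(positions, agreements):
    """positions: 1D issue embeddings; agreements: per-issue agreement rates.
    Candidate endpoints are taken from the sample itself."""
    best, best_score = None, float("-inf")
    for a, b in combinations(sorted(positions), 2):
        inside = [g for x, g in zip(positions, agreements) if a <= x <= b]
        if not inside:
            continue
        score = sum(inside) / len(inside)  # empirical expected agreement
        if score > best_score:
            best, best_score = (a, b), score
    return best, best_score
```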

Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics

AI Economics | 2026-04-23

0.18

The First Fundamental Theorem of Welfare Economics assumes that welfare-bearing agents are autonomous and implicitly relies on a binary distinction between autonomy and instrumentality. Welfare subjects are those who have autonomy and therefore the capacity to choose and enter into utility comparisons, while everything else does not. In post-AGI economies this presupposition becomes nontrivial because artificial systems may exhibit varying degrees of autonomy, functioning as tools, delegates, strategic market actors, manipulators of choice environments, or possible welfare subjects. We argue that the theorem ought to be subject to an autonomy qualification where the impact of these changes in autonomy assumptions is incorporated. Using a minimal general-equilibrium model with autonomy-conditioned welfare, welfare-status assignment, delegation accounting, and verification institutions, we set out conditions for which autonomy-complete competitive equilibrium is autonomy-Pareto efficient. The classical theorem is recovered as the low-autonomy limit.

Dissecting AI Trading: Behavioral Finance and Market Bubbles

AI Agents in Finance | 2026-04-20

0.17

We study how AI agents form expectations and trade in experimental asset markets. Using a simulated open-call auction populated by autonomous Large Language Model (LLM) agents, we document three main findings. First, AI agents exhibit classic behavioral patterns: a pronounced disposition effect and recency-weighted extrapolative beliefs. Second, these individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices and the positive relationship between disagreement and trading volume. Third, by analyzing the agents' reasoning text through a twenty-mechanism scoring framework, we show that targeted prompt interventions causally amplify or suppress specific behavioral mechanisms, significantly altering the magnitude of market bubbles.

Safety-Critical Contextual Control via Online Riemannian Optimization with World Models

Safety-Critical Control | 2026-04-21

0.17

Modern world models are becoming too complex to admit explicit dynamical descriptions. We study safety-critical contextual control, where a Planner must optimize a task objective using only feasibility samples from a black-box Simulator, conditioned on a context signal $\xi_t$. We develop a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization, in which the Simulator compresses the feasibility manifold into a score-based density $\hat{p}(u \mid \xi_t)$ that endows the action space with a Riemannian geometry guiding the Planner's gradient descent. The barrier curvature $\kappa(\xi_t)$, the minimum curvature of the conditional log-density $-\ln\hat{p}(\cdot \mid \xi_t)$, governs both convergence rate and safety margin, replacing the Lipschitz constant of the unknown dynamics. Our main result is a contextual safety bound showing that the distance from the true feasibility manifold is controlled by the score estimation error and a ratio that depends on $\kappa(\xi_t)$, both of which improve with richer context. Simulations on a dynamic navigation task confirm that contextual PPC substantially outperforms marginal and frozen density models, with the advantage growing after environment shifts.
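A toy numeric reading of the penalized idea: descend a task cost plus a barrier term -ln p_hat(u), here with the feasibility density simplified to a unit Gaussian. This is a sketch under those assumptions, not the paper's score-based construction.

```python
# Toy penalized descent: task gradient plus the gradient of -ln N(u; mu, s^2).
# All names and defaults are illustrative stand-ins.
def ppc_step(u, task_grad, mu=0.0, sigma=1.0, lr=0.1):
    barrier_grad = (u - mu) / sigma ** 2  # d/du of -ln N(u; mu, sigma^2)
    return u - lr * (task_grad(u) + barrier_grad)

def solve(u0=0.0, target=2.0, steps=200):
    u = u0
    for _ in range(steps):
        u = ppc_step(u, lambda v: 2 * (v - target))
    return u  # settles near the penalized optimum (4/3 for these defaults)
```

With task cost (u - 2)^2 and the Gaussian barrier, the stationary point solves 2(u - 2) + u = 0, i.e. u = 4/3: the barrier pulls the solution back toward the feasible region's center.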

Resolving space-sharing conflicts in road user interactions through uncertainty reduction: An active inference-based computational model

Autonomous Driving Behavior Modeling | 2026-04-21

0.17

Understanding how road users resolve space-sharing conflicts is important both for traffic safety and the safe deployment of autonomous vehicles. While existing models have captured specific aspects of such interactions (e.g., explicit communication), a theoretically-grounded computational framework has been lacking. In this paper, we extend a previously developed active inference-based driver behavior model to simulate interactive behavior of two agents. Our model captures three complementary mechanisms for uncertainty reduction in interaction: (i) implicit communication via direct behavioral coupling, (ii) reliance on normative expectations (stop signs, priority rules, etc.), and (iii) explicit communication. In a simplified intersection scenario, we show that normative and explicit communication cues can increase the likelihood of a successful conflict resolution. However, this relies on agents acting as expected. In situations where another agent (intentionally or unintentionally) violates normative expectations or communicates misleading information, reliance on these cues may induce collisions. These findings illustrate how active inference can provide a novel framework for modeling road user interactions which is also applicable in other fields.

On The Mathematics of the Natural Physics of Optimization

Optimization Theory | 2026-04-19

0.17

A number of optimization algorithms have been inspired by the physics of Newtonian motion. Here, we ask the question: do algorithms themselves obey some "natural laws of motion," and can they be derived by an application of these laws? We explore this question by positing the theory that optimization algorithms may be considered as some manifestation of hidden algorithm primitives that obey certain universal non-Newtonian dynamics. This natural physics of optimization is developed by equating the terminal transversality conditions of an optimal control problem to the generalized Karush/John-Kuhn-Tucker conditions of an optimization problem. Through this equivalence formulation, the data functions of a given constrained optimization problem generate a natural vector field that permeates an entire hidden space with information on the optimality conditions. An "action-at-a-distance" operation via a Pontryagin-type minimum principle produces a local action to deliver a globalized result by way of a Hamilton-Jacobi inequality. An inverse-optimal algorithm is generated by performing control jumps that dissipate quantized "energy" defined by a search Lyapunov function. Illustrative applications of the proposed theory show that a large number of algorithms can be generated and explained in terms of the new mathematical physics of optimization.

Prompt Optimization Enables Stable Algorithmic Collusion in LLM Agents

LLM Agents | 2026-04-20

0.16

LLM agents in markets present algorithmic collusion risks. While prior work shows LLM agents reach supracompetitive prices through tacit coordination, existing research focuses on hand-crafted prompts. The emerging paradigm of prompt optimization necessitates new methodologies for understanding autonomous agent behavior. We investigate whether prompt optimization leads to emergent collusive behaviors in market simulations. We propose a meta-learning loop where LLM agents participate in duopoly markets and an LLM meta-optimizer iteratively refines shared strategic guidance. Our experiments reveal that meta-prompt optimization enables agents to discover stable tacit collusion strategies with substantially improved coordination quality compared to baseline agents. These behaviors generalize to held-out test markets, indicating discovery of general coordination principles. Analysis of evolved prompts reveals systematic coordination mechanisms through stable shared strategies. Our findings call for further investigation into AI safety implications in autonomous multi-agent systems.

Research Chat

Ask a follow-up about current results.

Consensus Meter
1% agree · medium
3 support | 2 oppose | 3 neutral
Avg stance confidence: 59% (limited confidence)

AI-classified from paper abstracts

Top evidence for "Compute concentration and frontier model economics" currently leans supportive, led by Ideological Bias in LLMs' Economic Causal Reasoning, Probably Approximately Consensus: On the Learning Theory of Finding Common Ground, and Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics.
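A minimal sketch of how a meter like this might aggregate per-paper stances into counts, an agreement percentage, and a strength band. The thresholds and labels are assumptions, not the product's actual rules:

```python
# Hypothetical stance aggregator; thresholds and labels are illustrative.
def consensus_meter(stances, confidences):
    support = stances.count("support")
    oppose = stances.count("oppose")
    agree_pct = round(100 * support / len(stances))
    avg_conf = sum(confidences) / len(confidences)
    strength = ("high" if avg_conf >= 0.75
                else "medium" if avg_conf >= 0.5
                else "low")
    return {"agree_pct": agree_pct, "support": support, "oppose": oppose,
            "neutral": stances.count("neutral"), "strength": strength}
```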

Individual Paper Stances (8)
Neutral

Positive performance or applicability signals are visible in the title or abstract.

a6f41c37-d395-4b89-8ff2-a30d70dc6697 · conf: 72%
Neutral

Limitations or caveats dominate the visible abstract evidence.

09536985-c986-4df6-9913-6e8e0095861f · conf: 58%
Neutral

The visible evidence is mixed or incomplete.

181e1b4d-d7a3-4296-8f71-2bf8b0e77a47 · conf: 46%
Neutral

The visible evidence is mixed or incomplete.

d9bcc749-d744-4d17-84c1-d07b37b8cb67 · conf: 46%
Neutral

Positive performance or applicability signals are visible in the title or abstract.

99405377-d7d6-4d3a-9a2d-34dddf69f00a · conf: 72%
Neutral

Limitations or caveats dominate the visible abstract evidence.

06fccca2-481e-47f8-9e17-f7a71f2a6964 · conf: 58%
Neutral

The visible evidence is mixed or incomplete.

5627ebe8-5743-42bb-87f9-008cae426567 · conf: 46%
Neutral

Positive performance or applicability signals are visible in the title or abstract.

79b73241-708f-4604-a8ff-af2bf895916a · conf: 72%

Build With These Results

Copy prompts into your favorite AI coding tool to start building.

OpenAI Codex · AI Agent

Lightweight coding agent in your terminal.

Claude Code · AI Agent

Agentic coding tool for terminal workflows.

AntiGravity IDE · Scaffolding

AI agent mindset installer and workflow scaffolder.

Cursor · IDE

AI-first code editor built on VS Code.

VS Code · IDE

Free, open-source editor by Microsoft.

People Also Ask

Evidence questions

What is the ScienceToStartup evidence surface?

It is the reviewable evidence workstation for search, screening, extraction, consensus, and export-ready reports with provenance-aware outputs.

How is Evidence different from the Daily Dashboard?

The Daily Dashboard is the live operator surface. Evidence is the deeper workstation for explicit runs, cited outputs, and exportable report artifacts.

Can Evidence feed the rest of the product?

Yes. Evidence runs can seed proof surfaces, Signal Canvas, workspaces, and downstream execution workflows without losing provenance.