
Buildability Receipt Backlinks

Evidence links to receipt scaffolds, not external adoption claims.

Evidence runs can now link directly to buildability receipt scaffolds, so proof context stays aligned across paper, Signal Canvas, and Build Loop routes while external validation remains explicitly gated.

Evidence receipt window

Buildability receipt unavailable

evidence-workstation

Pending / no grade

Subject: Evidence workstation

Verdict

Pending / no grade

Evidence has no selected canonical paper receipt until a query, report, or paper handoff selects one.

Time to first demo

Insufficient data

No canonical receipt is available, so demo lead-time cannot be reported.

Compute envelope

Structured compute envelope

Insufficient data

No canonical receipt is available, so compute requirements cannot be reported.

Evidence ids

Evidence ids

Insufficient data

No receipt id, paper id, proof run id, or evidence hash is available.

Freshness

Freshness

Insufficient data

No receipt timestamp or evidence verification timestamp is available.

Hash state

Immutable hash

Insufficient data

No canonical receipt hash is available.

Signature state

External signature

unsigned_external

No founder, registry, pilot, or production-adoption signature is attached to this receipt.

Verification

not_verified

Verification is blocked until an external signature is provided.

Blockers

  • Pending / no grade: Evidence has no selected canonical paper receipt until a query, report, or paper handoff selects one.

Missing proof, requirement, signature, approval, adoption, or telemetry fields are blockers and must not be inferred.
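The gating rule above (missing fields surface as blockers and are never inferred) can be sketched as a small data model. All class and field names here are hypothetical illustrations, not the product's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ReceiptScaffold:
    """Hypothetical receipt scaffold; field names are illustrative."""
    receipt_id: Optional[str] = None
    receipt_hash: Optional[str] = None
    verified_at: Optional[str] = None
    external_signature: Optional[str] = None

    def blockers(self) -> List[str]:
        # Missing fields are reported as blockers, never inferred.
        out = []
        if self.receipt_id is None:
            out.append("Pending / no grade: no canonical paper receipt selected.")
        if self.receipt_hash is None:
            out.append("Insufficient data: no canonical receipt hash.")
        if self.verified_at is None:
            out.append("Insufficient data: no verification timestamp.")
        if self.external_signature is None:
            out.append("unsigned_external: no external signature attached.")
        return out

    def verification_state(self) -> str:
        # Verification stays blocked until an external signature is provided.
        return "not_verified" if self.external_signature is None else "verifiable"
```

An empty scaffold reports every blocker and stays `not_verified`, mirroring the "Insufficient data" cards above.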

Evidence

Reviewable research runs with screening, extraction, consensus, and export-ready reports.

Evidence is the operator workstation for defining a question, screening candidates, inspecting proof, running consensus, extracting structured fields, synthesizing a report, and seeding a workspace with provenance.

Define question
Scope corpus, paper, or workspace runs.
Inspect evidence
Quote-level provenance and missingness stay visible.
Export or seed
Markdown, JSON, PDF, BibTeX, and workspace seeds.
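The cards above compress a longer run flow (define → screen → inspect → consensus → extract → synthesize → export/seed). A minimal sketch of that ordering, with hypothetical stage names:

```python
from typing import Optional

# Hypothetical stage names mirroring the workflow described above;
# everything else here is illustrative, not the product's implementation.
STAGES = [
    "define_question",
    "screen_candidates",
    "inspect_evidence",
    "run_consensus",
    "extract_fields",
    "synthesize_report",
    "export_or_seed",
]

def next_stage(current: str) -> Optional[str]:
    """Stage that follows `current`, or None when the run is complete."""
    i = STAGES.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None
```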
Server-rendered preview

Previewing the top Evidence hits for "Compute concentration and frontier model economics".

Ideological Bias in LLMs' Economic Causal Reasoning
Probably Approximately Consensus: On the Learning Theory of Finding Common Ground
Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics

Evidence

Define a question, screen candidates, inspect evidence, run consensus, extract fields, and synthesize a cited report.

My Evidence
AI Summary

Search results will appear with a streamed summary.

Results (8 papers)

Ideological Bias in LLMs' Economic Causal Reasoning

LLM Economic Bias | 2026-04-23

0.19

Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question has direct practical stakes. We present a systematic evaluation by extending the EconCausal benchmark with ideology-contested cases - instances where intervention-oriented (pro-government) and market-oriented (pro-market) perspectives predict divergent causal signs. From 10,490 causal triplets (treatment-outcome pairs with empirically verified effect directions) derived from top-tier economics and finance journals, we identify 1,056 ideology-contested instances and evaluate 20 state-of-the-art LLMs on their ability to predict empirically supported causal directions. We find that ideology-contested items are consistently harder than non-contested ones, and that across 18 of 20 models, accuracy is systematically higher when the empirically verified causal sign aligns with intervention-oriented expectations than with market-oriented ones. Moreover, when models err, their incorrect predictions disproportionately lean intervention-oriented, and this directional skew is not eliminated by one-shot in-context prompting. These results highlight that LLMs are not only less accurate on ideologically contested economic questions, but systematically less reliable in one ideological direction than the other, underscoring the need for direction-aware evaluation in high-stakes economic and policy settings.
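The direction-aware split this abstract describes (accuracy on items whose verified causal sign matches intervention-oriented vs market-oriented expectations) can be sketched roughly as follows. Field names and sample data are invented for illustration, not drawn from the paper.

```python
# Illustrative sketch: split sign-prediction accuracy by which ideological
# expectation the verified causal sign aligns with. Data is made up.
def accuracy_by_alignment(items):
    groups = {"intervention": [], "market": []}
    for it in items:
        groups[it["aligned_with"]].append(it["predicted"] == it["verified"])
    return {g: sum(hits) / len(hits) for g, hits in groups.items() if hits}

sample = [
    {"predicted": "+", "verified": "+", "aligned_with": "intervention"},
    {"predicted": "+", "verified": "+", "aligned_with": "intervention"},
    {"predicted": "-", "verified": "+", "aligned_with": "market"},
    {"predicted": "+", "verified": "+", "aligned_with": "market"},
]
```

A systematic gap between the two groups is the kind of directional skew the paper reports.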

Probably Approximately Consensus: On the Learning Theory of Finding Common Ground

LLM Training | 2026-04-23

0.18

A primary goal of online deliberation platforms is to identify ideas that are broadly agreeable to a community of users through their expressed preferences. Yet, consensus elicitation should ideally extend beyond the specific statements provided by users and should incorporate the relative salience of particular topics. We address this issue by modelling consensus as an interval in a one-dimensional opinion space derived from potentially high-dimensional data via embedding and dimensionality reduction. We define an objective that maximizes expected agreement within a hypothesis interval where the expectation is over an underlying distribution of issues, implicitly taking into account their salience. We propose an efficient Empirical Risk Minimization (ERM) algorithm and establish PAC-learning guarantees. Our initial experiments demonstrate the performance of our algorithm and examine more efficient approaches to identifying optimal consensus regions. We find that through selectively querying users on an existing sample of statements, we can reduce the number of queries needed to a practical number.
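The interval-ERM idea in this abstract could be sketched as a brute-force search over candidate intervals in the 1D opinion space, scoring each by empirical mean agreement. This ignores the paper's salience weighting and PAC analysis, and the data is invented:

```python
from itertools import combinations

# Rough sketch of interval ERM: score each candidate interval [a, b] by the
# empirical mean agreement of sampled issue positions inside it, keep the best.
def best_interval(positions, agreements):
    """positions: 1D issue embeddings; agreements: per-issue agreement rates.
    Candidate endpoints are taken from the sample itself."""
    best, best_score = None, float("-inf")
    for a, b in combinations(sorted(positions), 2):
        inside = [g for x, g in zip(positions, agreements) if a <= x <= b]
        if not inside:
            continue
        score = sum(inside) / len(inside)  # empirical expected agreement
        if score > best_score:
            best, best_score = (a, b), score
    return best, best_score
```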

Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics

AI Economics | 2026-04-23

0.18

The First Fundamental Theorem of Welfare Economics assumes that welfare-bearing agents are autonomous and implicitly relies on a binary distinction between autonomy and instrumentality. Welfare subjects are those who have autonomy and therefore the capacity to choose and enter into utility comparisons, while everything else does not. In post-AGI economies this presupposition becomes nontrivial because artificial systems may exhibit varying degrees of autonomy, functioning as tools, delegates, strategic market actors, manipulators of choice environments, or possible welfare subjects. We argue that the theorem ought to be subject to an autonomy qualification where the impact of these changes in autonomy assumptions is incorporated. Using a minimal general-equilibrium model with autonomy-conditioned welfare, welfare-status assignment, delegation accounting, and verification institutions, we set out conditions for which autonomy-complete competitive equilibrium is autonomy-Pareto efficient. The classical theorem is recovered as the low-autonomy limit.

Dissecting AI Trading: Behavioral Finance and Market Bubbles

AI Agents in Finance | 2026-04-20

0.17

We study how AI agents form expectations and trade in experimental asset markets. Using a simulated open-call auction populated by autonomous Large Language Model (LLM) agents, we document three main findings. First, AI agents exhibit classic behavioral patterns: a pronounced disposition effect and recency-weighted extrapolative beliefs. Second, these individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices and the positive relationship between disagreement and trading volume. Third, by analyzing the agents' reasoning text through a twenty-mechanism scoring framework, we show that targeted prompt interventions causally amplify or suppress specific behavioral mechanisms, significantly altering the magnitude of market bubbles.

Safety-Critical Contextual Control via Online Riemannian Optimization with World Models

Safety-Critical Control | 2026-04-21

0.17

Modern world models are becoming too complex to admit explicit dynamical descriptions. We study safety-critical contextual control, where a Planner must optimize a task objective using only feasibility samples from a black-box Simulator, conditioned on a context signal $\xi_t$. We develop a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization, in which the Simulator compresses the feasibility manifold into a score-based density $\hat{p}(u \mid \xi_t)$ that endows the action space with a Riemannian geometry guiding the Planner's gradient descent. The barrier curvature $\kappa(\xi_t)$, the minimum curvature of the conditional log-density $-\ln\hat{p}(\cdot \mid \xi_t)$, governs both convergence rate and safety margin, replacing the Lipschitz constant of the unknown dynamics. Our main result is a contextual safety bound showing that the distance from the true feasibility manifold is controlled by the score estimation error and a ratio that depends on $\kappa(\xi_t)$, both of which improve with richer context. Simulations on a dynamic navigation task confirm that contextual PPC substantially outperforms marginal and frozen density models, with the advantage growing after environment shifts.
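A toy numeric reading of the penalized idea: descend a task cost plus a barrier term -ln p_hat(u), here with the feasibility density simplified to a unit Gaussian. This is a sketch under those assumptions, not the paper's score-based construction.

```python
# Toy penalized descent: task gradient plus the gradient of -ln N(u; mu, s^2).
# All names and defaults are illustrative stand-ins.
def ppc_step(u, task_grad, mu=0.0, sigma=1.0, lr=0.1):
    barrier_grad = (u - mu) / sigma ** 2  # d/du of -ln N(u; mu, sigma^2)
    return u - lr * (task_grad(u) + barrier_grad)

def solve(u0=0.0, target=2.0, steps=200):
    u = u0
    for _ in range(steps):
        u = ppc_step(u, lambda v: 2 * (v - target))
    return u  # settles near the penalized optimum (4/3 for these defaults)
```

With task cost (u - 2)^2 and the Gaussian barrier, the stationary point solves 2(u - 2) + u = 0, i.e. u = 4/3: the barrier pulls the solution back toward the feasible region's center.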

Resolving space-sharing conflicts in road user interactions through uncertainty reduction: An active inference-based computational model

Autonomous Driving Behavior Modeling | 2026-04-21

0.17

Understanding how road users resolve space-sharing conflicts is important both for traffic safety and the safe deployment of autonomous vehicles. While existing models have captured specific aspects of such interactions (e.g., explicit communication), a theoretically-grounded computational framework has been lacking. In this paper, we extend a previously developed active inference-based driver behavior model to simulate interactive behavior of two agents. Our model captures three complementary mechanisms for uncertainty reduction in interaction: (i) implicit communication via direct behavioral coupling, (ii) reliance on normative expectations (stop signs, priority rules, etc.), and (iii) explicit communication. In a simplified intersection scenario, we show that normative and explicit communication cues can increase the likelihood of a successful conflict resolution. However, this relies on agents acting as expected. In situations where another agent (intentionally or unintentionally) violates normative expectations or communicates misleading information, reliance on these cues may induce collisions. These findings illustrate how active inference can provide a novel framework for modeling road user interactions which is also applicable in other fields.

On The Mathematics of the Natural Physics of Optimization

Optimization Theory | 2026-04-19

0.17

A number of optimization algorithms have been inspired by the physics of Newtonian motion. Here, we ask the question: do algorithms themselves obey some "natural laws of motion," and can they be derived by an application of these laws? We explore this question by positing the theory that optimization algorithms may be considered as some manifestation of hidden algorithm primitives that obey certain universal non-Newtonian dynamics. This natural physics of optimization is developed by equating the terminal transversality conditions of an optimal control problem to the generalized Karush/John-Kuhn-Tucker conditions of an optimization problem. Through this equivalence formulation, the data functions of a given constrained optimization problem generate a natural vector field that permeates an entire hidden space with information on the optimality conditions. An "action-at-a-distance" operation via a Pontryagin-type minimum principle produces a local action to deliver a globalized result by way of a Hamilton-Jacobi inequality. An inverse-optimal algorithm is generated by performing control jumps that dissipate quantized "energy" defined by a search Lyapunov function. Illustrative applications of the proposed theory show that a large number of algorithms can be generated and explained in terms of the new mathematical physics of optimization.

Prompt Optimization Enables Stable Algorithmic Collusion in LLM Agents

LLM Agents | 2026-04-20

0.16

LLM agents in markets present algorithmic collusion risks. While prior work shows LLM agents reach supracompetitive prices through tacit coordination, existing research focuses on hand-crafted prompts. The emerging paradigm of prompt optimization necessitates new methodologies for understanding autonomous agent behavior. We investigate whether prompt optimization leads to emergent collusive behaviors in market simulations. We propose a meta-learning loop where LLM agents participate in duopoly markets and an LLM meta-optimizer iteratively refines shared strategic guidance. Our experiments reveal that meta-prompt optimization enables agents to discover stable tacit collusion strategies with substantially improved coordination quality compared to baseline agents. These behaviors generalize to held-out test markets, indicating discovery of general coordination principles. Analysis of evolved prompts reveals systematic coordination mechanisms through stable shared strategies. Our findings call for further investigation into AI safety implications in autonomous multi-agent systems.

Research Chat

Ask a follow-up about current results.

Consensus Meter
1% agree · medium
3 support | 2 oppose | 3 neutral
Avg stance confidence: 59% (limited confidence)

AI-classified from paper abstracts

Top evidence for "Compute concentration and frontier model economics" currently leans supportive, led by Ideological Bias in LLMs' Economic Causal Reasoning, Probably Approximately Consensus: On the Learning Theory of Finding Common Ground, and Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics.
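A minimal sketch of how a meter like this might aggregate per-paper stances into counts, an agreement percentage, and a strength band. The thresholds and labels are assumptions, not the product's actual rules:

```python
# Hypothetical stance aggregator; thresholds and labels are illustrative.
def consensus_meter(stances, confidences):
    support = stances.count("support")
    oppose = stances.count("oppose")
    agree_pct = round(100 * support / len(stances))
    avg_conf = sum(confidences) / len(confidences)
    strength = ("high" if avg_conf >= 0.75
                else "medium" if avg_conf >= 0.5
                else "low")
    return {"agree_pct": agree_pct, "support": support, "oppose": oppose,
            "neutral": stances.count("neutral"), "strength": strength}
```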

Individual Paper Stances (8)
Neutral

Positive performance or applicability signals are visible in the title or abstract.

a6f41c37-d395-4b89-8ff2-a30d70dc6697 · conf: 72%
Neutral

Limitations or caveats dominate the visible abstract evidence.

09536985-c986-4df6-9913-6e8e0095861f · conf: 58%
Neutral

The visible evidence is mixed or incomplete.

181e1b4d-d7a3-4296-8f71-2bf8b0e77a47 · conf: 46%
Neutral

The visible evidence is mixed or incomplete.

d9bcc749-d744-4d17-84c1-d07b37b8cb67 · conf: 46%
Neutral

Positive performance or applicability signals are visible in the title or abstract.

99405377-d7d6-4d3a-9a2d-34dddf69f00a · conf: 72%
Neutral

Limitations or caveats dominate the visible abstract evidence.

06fccca2-481e-47f8-9e17-f7a71f2a6964 · conf: 58%
Neutral

The visible evidence is mixed or incomplete.

5627ebe8-5743-42bb-87f9-008cae426567 · conf: 46%
Neutral

Positive performance or applicability signals are visible in the title or abstract.

79b73241-708f-4604-a8ff-af2bf895916a · conf: 72%

Build With These Results

Copy prompts into your favorite AI coding tool to start building.

OpenAI Codex · AI Agent

Lightweight coding agent in your terminal.

Claude Code · AI Agent

Agentic coding tool for terminal workflows.

AntiGravity IDE · Scaffolding

AI agent mindset installer and workflow scaffolder.

Cursor · IDE

AI-first code editor built on VS Code.

VS Code · IDE

Free, open-source editor by Microsoft.

People Also Ask

Evidence questions

What is the ScienceToStartup evidence surface?

It is the reviewable evidence workstation for search, screening, extraction, consensus, and export-ready reports with provenance-aware outputs.

How is Evidence different from the Daily Dashboard?

The Daily Dashboard is the live operator surface. Evidence is the deeper workstation for explicit runs, cited outputs, and exportable report artifacts.

Can Evidence feed the rest of the product?

Yes. Evidence runs can seed proof surfaces, Signal Canvas, workspaces, and downstream execution workflows without losing provenance.