How does ScienceToStartup rank AI research papers?

Our Signal Fusion algorithm combines viability scores, community predictions, GitHub activity, and evidence freshness into a single score, and ranks each paper by that score as a measure of commercialization potential.
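The page names the four Signal Fusion inputs but not the formula. The sketch below makes several assumptions: each input is rescaled to [0, 1], freshness decays exponentially with evidence age, and the inputs are blended with fixed weights. All of these are hypothetical, including `WEIGHTS`, the 30-day half-life, and the `PaperSignals` type.

```python
from dataclasses import dataclass

# Hypothetical weights and half-life: illustrative assumptions, not published values.
WEIGHTS = {"viability": 0.4, "community": 0.2, "github_activity": 0.2, "freshness": 0.2}
FRESHNESS_HALF_LIFE_DAYS = 30.0


@dataclass
class PaperSignals:
    viability: float          # viability score rescaled to [0, 1]
    community: float          # community prediction, e.g. a market probability in [0, 1]
    github_activity: float    # repository activity rescaled to [0, 1]
    evidence_age_days: float  # days since the newest supporting evidence


def freshness(age_days: float) -> float:
    """Exponential decay: 1.0 for brand-new evidence, 0.5 after one half-life."""
    return 0.5 ** (max(age_days, 0.0) / FRESHNESS_HALF_LIFE_DAYS)


def signal_fusion(s: PaperSignals) -> float:
    """Weighted blend of the four inputs, scaled to 0-100 like the dashboard's Signal value."""
    parts = {
        "viability": s.viability,
        "community": s.community,
        "github_activity": s.github_activity,
        "freshness": freshness(s.evidence_age_days),
    }
    return 100.0 * sum(WEIGHTS[name] * value for name, value in parts.items())


def rank_papers(papers: dict) -> list:
    """Return (title, score) pairs, highest fused score first."""
    scored = [(title, signal_fusion(sig)) for title, sig in papers.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A fixed linear blend makes each input's contribution easy to explain on a card. The half-life controls how quickly stale evidence stops contributing.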

What is a viability score for AI research?

A viability score estimates how commercially viable a research paper is. It draws on five inputs: technical strength, market readiness, team signals, method quality, and evidence coverage.
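This page lists the viability inputs but does not say how they are weighted. The sketch below assumes a weighted mean of the four 0-10 component scores, discounted by evidence coverage. `COMPONENT_WEIGHTS`, the discount, and `ViabilityComponents` are all hypothetical, so the sketch will not reproduce the dashboard's Overall values.

```python
from dataclasses import dataclass, asdict

# Hypothetical weights over the four 0-10 components (they sum to 1).
COMPONENT_WEIGHTS = {"technical": 0.3, "market": 0.3, "team": 0.2, "method": 0.2}


@dataclass
class ViabilityComponents:
    technical: float          # technical strength, 0-10
    market: float             # market readiness, 0-10
    team: float               # team signals, 0-10
    method: float             # method quality, 0-10
    evidence_coverage: float  # share of expected evidence found, 0-1


def viability_score(c: ViabilityComponents) -> float:
    values = asdict(c)
    for name in COMPONENT_WEIGHTS:
        if not 0.0 <= values[name] <= 10.0:
            raise ValueError(f"{name} must be in [0, 10], got {values[name]}")
    if not 0.0 <= c.evidence_coverage <= 1.0:
        raise ValueError("evidence_coverage must be in [0, 1]")
    base = sum(w * values[name] for name, w in COMPONENT_WEIGHTS.items())
    # Assumption: thin evidence discounts the score instead of being averaged in.
    return round(base * c.evidence_coverage, 1)
```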

How do I find AI papers with startup potential?

Use the dashboard snapshot to review today's highest-ranked papers, inspect the evidence receipt, open Signal Canvas, seed a workspace, or move directly into Build Loop and Talent from the same canonical card.

ScienceToStartup is an Agent Operating System for Research Commercialization

API and MCP Platform for Turning Research Papers into Buildable Product Signals.

Turn papers, topics, benchmarks, datasets, and Signal Canvas threads into buildable product signals for your agents and operator workflows.

ScienceToStartup

Research Intelligence

Apr 9

Papers: 167

Opportunities: 127

Research Map

Daily cluster surface for the landed snapshot.

Daily Brief

167 ranked papers landed for 2026-04-09: 120 are high-potential, 50 are quick builds, and the opportunity share is 76.0%. DMax: Aggressive Parallel Decoding for dLLMs ranked #1 with a Signal Fusion score of 84.0, fresh evidence, 0 references, and 83% evidence coverage.
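The header counts on this snapshot (167 papers, 127 opportunities) are consistent with the 76.0% opportunity share if the share is defined as opportunities divided by ranked papers. That definition is an assumption. Here is the arithmetic:

```python
ranked_papers = 167   # "Papers" count on the Apr 9 snapshot
opportunities = 127   # "Opportunities" count on the same snapshot

# Assumption: opportunity share = opportunities / ranked papers, as a percentage.
opportunity_share = round(100 * opportunities / ranked_papers, 1)
print(f"{opportunity_share}%")  # 76.0%
```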

Built entirely from persisted dashboard metric snapshots and canonical opportunity kernels.

Top Anomalies

No anomaly crossed the dashboard trust threshold versus the last landed snapshot.

Action Rail

Sources counted: 0

Daily Snapshot

Every card below renders the canonical `MetricContract`, including freshness, provenance, and formula labels.

Trending Today

The highest-ranked papers in the canonical dashboard snapshot, rendered without client-side score recomputation.

Rank #1 | Signal 84.0 | fresh

DMax: Aggressive Parallel Decoding for dLLMs

We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings. At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further propose Soft Parallel Decoding. We represent each intermediate decoding state as an interpolation between the predicted token embedding and the mask embedding, enabling iterative self-revising in embedding space. Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax. Compared with the original LLaDA-2.0-mini, our method improves TPF (tokens per forward pass) on GSM8K from 2.04 to 5.47 while preserving accuracy. On MBPP, it increases TPF from 2.71 to 5.86 while maintaining comparable performance. On two H200 GPUs, our model achieves an average of 1,338 TPS (tokens per second) at batch size 1. Code is available at: https://github.com/czg1225/DMax
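The abstract states only that each intermediate decoding state interpolates between the predicted token embedding and the mask embedding. The sketch below shows one such step. It assumes the interpolation weight is the model's probability for its top token, which is an illustrative choice and not DMax's published rule. `soft_decoding_state` is a hypothetical helper.

```python
def soft_decoding_state(token_probs, embeddings, mask_embedding):
    """One soft decoding step for a single position.

    token_probs: next-token probabilities for this position (sums to 1).
    embeddings: one embedding vector per vocabulary entry.
    mask_embedding: the embedding of the mask token.
    Assumption: the interpolation weight is the top token's probability, so a
    confident prediction moves the state close to that token's embedding.
    """
    best = max(range(len(token_probs)), key=token_probs.__getitem__)
    alpha = token_probs[best]
    predicted = embeddings[best]
    return [alpha * p + (1.0 - alpha) * m for p, m in zip(predicted, mask_embedding)]
```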

Why This Ranked Here

Signal Fusion 84.0 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof unverified, repo active, 0 refs, 4 sources

Score Breakdown: Overall 8.0, Technical 6.4, Commercial 5.0, Market 5.5, Team 4.9, Method 4.7

Tags: LLaDA-2.0-mini

GitHub Velocity: 0/wk, Health C (repository stars tracked from cached pulse or recent historical snapshots)

Rank #2 | Signal 77.2 | fresh

SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly improved large language model (LLM) reasoning in formal domains such as mathematics and code. Despite these advancements, LLMs still struggle with general reasoning tasks requiring capabilities such as causal inference and temporal understanding. Extending RLVR to general reasoning is fundamentally constrained by the lack of high-quality, verifiable training data that spans diverse reasoning skills. To address this challenge, we propose SUPERNOVA, a data curation framework for RLVR aimed at enhancing general reasoning. Our key insight is that instruction-tuning datasets containing expert-annotated ground-truth encode rich reasoning patterns that can be systematically adapted for RLVR. To study this, we conduct 100+ controlled RL experiments to analyze how data design choices impact downstream reasoning performance. In particular, we investigate three key factors: (i) source task selection, (ii) task mixing strategies, and (iii) synthetic interventions for improving data quality. Our analysis reveals that source task selection is non-trivial and has a significant impact on downstream reasoning performance. Moreover, selecting tasks based on their performance for individual target tasks outperforms strategies based on overall average performance. Finally, models trained on SUPERNOVA outperform strong baselines (e.g., Qwen3.5) on challenging reasoning benchmarks including BBEH, Zebralogic, and MMLU-Pro. In particular, training on SUPERNOVA yields relative improvements of up to 52.8% on BBEH across model sizes, demonstrating the effectiveness of principled data curation for RLVR. Our findings provide practical insights for curating human-annotated resources to extend RLVR to general reasoning. The code and data are available at https://github.com/asuvarna31/supernova.

Why This Ranked Here

Signal Fusion 77.2 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof unverified, repo active, 0 refs, 4 sources

Score Breakdown: Overall 8.0, Technical 6.4, Commercial 5.0, Market 5.5, Team 6.1, Method 4.7

Tags: PyTorch, GitHub

GitHub Velocity: 0/wk, Health C

Rank #3 | Signal 75.8 | fresh

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challenges: the extreme variance in reward topologies across diverse visual tasks, and the inherent difficulty of balancing fine-grained perception with multi-step reasoning capabilities. To address these issues, we introduce Gaussian GRPO (G²RPO), a novel RL training objective that replaces standard linear scaling with non-linear distributional matching. By mathematically forcing the advantage distribution of any given task to strictly converge to a standard normal distribution, N(0, 1), G²RPO theoretically ensures inter-task gradient equity, mitigates vulnerabilities to heavy-tail outliers, and offers symmetric updates for positive and negative rewards. Leveraging the enhanced training stability provided by G²RPO, we introduce two task-level shaping mechanisms to seamlessly balance perception and reasoning. First, response length shaping dynamically elicits extended reasoning chains for complex queries while enforcing direct outputs to bolster visual grounding. Second, entropy shaping tightly bounds the model's exploration zone, effectively preventing both entropy collapse and entropy explosion. Integrating these methodologies, we present OpenVLThinkerV2, a highly robust, general-purpose multimodal model. Extensive evaluations across 18 diverse benchmarks demonstrate its superior performance over strong open-source and leading proprietary frontier models.
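The abstract contrasts standard GRPO's linear advantage scaling with a non-linear mapping that forces each task's advantages toward N(0, 1). A rank-based inverse-normal transform is one way to get that behavior. It is used here as an illustration and is not necessarily how G²RPO performs the matching. Both helper names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def linear_advantages(rewards):
    """Standard GRPO-style scaling: (r - group mean) / group std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def gaussianized_advantages(rewards):
    """Rank-based inverse-normal transform: map each reward's rank to a N(0, 1) quantile.

    Only the ordering of rewards matters, so one extreme outlier cannot stretch
    the scale, and the output is symmetric around zero. Tied rewards share their
    average rank and therefore receive the same advantage.
    """
    n = len(rewards)
    order = sorted(range(n), key=lambda i: rewards[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and rewards[order[end + 1]] == rewards[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2  # average 0-based rank of the tie group
        start = end + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((rank + 0.5) / n) for rank in ranks]
```

For rewards `[0, 1, 2, 3]` and `[0, 1, 2, 1000]`, `gaussianized_advantages` returns identical values. With `linear_advantages`, the 1000 dominates the scale and squeezes the other three rewards together.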

Why This Ranked Here

Signal Fusion 75.8 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof partial, repo active, 0 refs, 4 sources

Score Breakdown: Overall 7.0, Technical 1.4, Commercial 5.0, Market 5.5, Team 6.2, Method 4.3

Tags: PyTorch, GRPO, G2RPO

GitHub Velocity: 139 stars, +1/wk, Health C

Ranked Opportunities

Canonical score, evidence, and direct execution links.

Filtered locally for discovery only. Rank, score, and freshness remain server-owned.

Rank #4 | Signal 75.7 | fresh

DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing

Model editing aims to update knowledge to add new concepts and change relevant information without retraining. Lifelong editing is a challenging task, prone to disrupting previously learned concepts, especially for Vision Language Models (VLMs), because sequential edits can lead to degraded reasoning and cross-modal misalignment. Existing VLM knowledge editing methods based on gated adapters, activation edits, and parameter merging techniques address catastrophic forgetting seen in full fine-tuning; however, they still operate in the shared representation space of the VLM, where concepts are entangled, so edits interfere with other non-relevant concepts. We hypothesize that this instability persists because current methods algorithmically control edits via optimization rather than structurally separating knowledge. We introduce Dynamic Subspace Concept Alignment (DSCA), which by design mitigates this limitation by decomposing the representation space into a set of orthogonal semantic subspaces and proposing edits only in those transformed spaces. These subspaces are obtained through incremental clustering and PCA on joint vision-language representations. This process structurally isolates concepts, enabling precise, non-interfering edits by turning isolation from a soft training objective into an architectural property. The surgical edits are guided by a multi-term loss function for maintaining task fidelity, edit locality, and cross-modal alignment. With the base model frozen, our method achieves 98 percent single-edit success, remains over 95 percent after 1000 sequential edits, lowers hallucination by 3 to 5 percent, and achieves the best backward transfer (BWT) scores on continual instruction tuning benchmarks. Extensive experiments demonstrate DSCA's state-of-the-art stability and knowledge retention capability in continual lifelong editing across various datasets and benchmarks.
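DSCA confines edits to orthogonal semantic subspaces found by clustering and PCA. The isolation this provides can be shown by projecting an edit onto a concept's subspace before applying it. The sketch assumes the subspace is already given as orthonormal basis vectors. It is not DSCA's training procedure or loss, and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_onto_subspace(delta, basis):
    """Keep only the part of `delta` that lies in span(basis).

    Assumes `basis` holds orthonormal vectors, e.g. top PCA directions
    fitted to one concept cluster.
    """
    projected = [0.0] * len(delta)
    for b in basis:
        coeff = dot(delta, b)
        projected = [p + coeff * x for p, x in zip(projected, b)]
    return projected


def apply_subspace_edit(hidden, delta, basis):
    """Add the projected edit; directions outside the subspace stay unchanged."""
    step = project_onto_subspace(delta, basis)
    return [h + s for h, s in zip(hidden, step)]
```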

Why This Ranked Here

Signal Fusion 75.7 with fresh evidence, 0 references, and 67% evidence coverage.

Evidence Receipt: proof verified, repo active, 0 refs, 4 sources

Score Breakdown: Overall 8.0, Technical 6.4, Commercial 5.0, Market 2.0, Team 4.0, Method 0.0

GitHub Velocity: 712 stars, 0/wk

Rank #5 | Signal 73.1 | fresh

Data Selection for Multi-turn Dialogue Instruction Tuning

Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage that performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues, with a local structural stage that evaluates within-dialogue reliability through entity-grounded topic grounding and information progress, together with query-answer form consistency for functional alignment. MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines on three multi-turn benchmarks and an in-domain Banking test set, achieving the best overall rank across reference-free and reference-based metrics, and is more robust on long conversations under the same training budget. Code and resources are included in the supplementary materials.

Why This Ranked Here

Signal Fusion 73.1 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof partial, repo active, 0 refs, 6 sources

Score Breakdown: Overall 8.0, Technical 1.4, Commercial 5.0, Market 5.5, Team 3.8, Method 4.7

GitHub Velocity: 0/wk, Health C

Rank #6 | Signal 72.2 | fresh

PIArena: A Platform for Prompt Injection Evaluation

Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the community faces a critical gap: the lack of a unified platform for prompt injection evaluation. This makes it challenging to reliably compare defenses, understand their true robustness under diverse attacks, or assess how well they generalize across tasks and benchmarks. For instance, many defenses initially reported as effective were later found to exhibit limited robustness on diverse datasets and attacks. To bridge this gap, we introduce PIArena, a unified and extensible platform for prompt injection evaluation that enables users to easily integrate state-of-the-art attacks and defenses and evaluate them across a variety of existing and new benchmarks. We also design a dynamic strategy-based attack that adaptively optimizes injected prompts based on defense feedback. Through comprehensive evaluation using PIArena, we uncover critical limitations of state-of-the-art defenses: limited generalizability across tasks, vulnerability to adaptive attacks, and fundamental challenges when an injected task aligns with the target task. The code and datasets are available at https://github.com/sleeepeer/PIArena.

Why This Ranked Here

Signal Fusion 72.2 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof partial, repo active, 0 refs, 4 sources

Score Breakdown: Overall 7.0, Technical 1.4, Commercial 5.0, Market 5.5, Team 4.2, Method 4.7

Tags: GitHub

GitHub Velocity: 0/wk, Health C

Rank #7 | Signal 71.2 | fresh

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in ways that far exceed the variability of rigids. Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain rooted in rigid-body abstractions, producing mismatched geometry, fragile soft dynamics, and motion primitives poorly suited for cloth interaction. We posit that simulation fails not for being synthetic, but for being ungrounded. To address this, we introduce SIM1, a physics-aligned real-to-sim-to-real data engine that grounds simulation in the physical world. Given limited demonstrations, the system digitizes scenes into metric-consistent twins, calibrates deformable dynamics through elastic modeling, and expands behaviors via diffusion-based trajectory generation with quality filtering. This pipeline transforms sparse observations into scaled synthetic supervision with near-demonstration fidelity. Experiments show that policies trained on purely synthetic data achieve parity with real-data baselines at a 1:15 equivalence ratio, while delivering 90% zero-shot success and 50% generalization gains in real-world deployment. These results validate physics-aligned simulation as scalable supervision for deformable manipulation and a practical pathway for data-efficient policy learning.

Why This Ranked Here

Signal Fusion 71.2 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof unverified, repo active, 0 refs, 6 sources

Score Breakdown: Overall 7.0, Technical 1.4, Commercial 5.0, Market 2.0, Team 6.5, Method 4.7

Tags: PyTorch, Hugging Face

GitHub Velocity: +1/wk, Health C

Rank #8 | Signal 68.2 | fresh

EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

Deep research requires reasoning over web evidence to answer open-ended questions, and it is a core capability for AI agents. Yet many deep research agents still rely on implicit, unstructured search behavior that causes redundant exploration and brittle evidence aggregation. Motivated by Anthropic's "think" tool paradigm and insights from the information-retrieval literature, we introduce Q+, a set of query and evidence processing tools that make web search more deliberate by guiding query planning, monitoring search progress, and extracting evidence from long web snapshots. We integrate Q+ into the browser sub-agent of Eigent, an open-source, production-ready multi-agent workforce for computer use, yielding EigentSearch-Q+. Across four benchmarks (SimpleQA-Verified, FRAMES, WebWalkerQA, and X-Bench DeepSearch), Q+ improves Eigent's browser agent benchmark-size-weighted average accuracy by 3.0, 3.8, and 0.6 percentage points (pp) for GPT-4.1, GPT-5.1, and Minimax M2.5 model backends, respectively. Case studies further suggest that EigentSearch-Q+ produces more coherent tool-calling trajectories by making search progress and evidence handling explicit.

Why This Ranked Here

Signal Fusion 68.2 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof partial, repo active, 0 refs, 4 sources

Score Breakdown: Overall 7.0, Technical 1.4, Commercial 5.0, Market 2.0, Team 6.5, Method 4.2

Tags: PyTorch, GPT-4, GPT-5, Minimax

GitHub Velocity: 0/wk

Rank #9 | Signal 67.4 | fresh

PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation

Talking-head generation has advanced rapidly with diffusion-based generative models, but training usually depends on centralized face-video and speech datasets, raising major privacy concerns. The problem is more acute for personalized talking-head generation, where identity-specific data are highly sensitive and often cannot be pooled across users or devices. PrivFedTalk is presented as a privacy-aware federated framework for personalized talking-head generation that combines conditional latent diffusion with parameter-efficient identity adaptation. A shared diffusion backbone is trained across clients, while each client learns lightweight LoRA identity adapters from local private audio-visual data, avoiding raw data sharing and reducing communication cost. To address heterogeneous client distributions, Identity-Stable Federated Aggregation (ISFA) weights client updates using privacy-safe scalar reliability signals computed from on-device identity consistency and temporal stability estimates. Temporal-Denoising Consistency (TDC) regularization is introduced to reduce inter-frame drift, flicker, and identity drift during federated denoising. To limit update-side privacy risk, secure aggregation and client-level differential privacy are applied to adapter updates. The implementation supports both low-memory GPU execution and multi-GPU client-parallel training on heterogeneous shared hardware. Comparative experiments on the present setup across multiple training and aggregation conditions with PrivFedTalk, FedAvg, and FedProx show stable federated optimization and successful end-to-end training and evaluation under constrained resources. The results support the feasibility of privacy-aware personalized talking-head training in federated environments, while suggesting that stronger component-wise, privacy-utility, and qualitative claims need further standardized evaluation.
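The abstract says ISFA weights client updates by privacy-safe scalar reliability signals but does not give the formula. The sketch below assumes those scalars are already computed on-device and that aggregation is a normalized weighted average, i.e. FedAvg with reliability in place of sample counts. Secure aggregation and client-level differential privacy are omitted, and `reliability_weighted_average` is a hypothetical helper.

```python
def reliability_weighted_average(updates, reliabilities):
    """Average client adapter updates, weighting each client by a reliability scalar.

    updates: one flat list of adapter parameter deltas per client (equal lengths).
    reliabilities: one non-negative scalar per client (hypothetical inputs here).
    """
    if not updates or len(updates) != len(reliabilities):
        raise ValueError("need exactly one reliability score per client update")
    if any(r < 0 for r in reliabilities):
        raise ValueError("reliability scores must be non-negative")
    total = sum(reliabilities)
    if total == 0:
        raise ValueError("at least one client needs a positive reliability score")
    weights = [r / total for r in reliabilities]  # normalized to sum to 1
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]
```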

Why This Ranked Here

Signal Fusion 67.4 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof partial, repo active, 0 refs, 4 sources

Score Breakdown: Overall 7.0, Technical 1.4, Commercial 5.0, Market 5.5, Team 5.3, Method 4.7

Tags: PyTorch, LoRA

GitHub Velocity: 0/wk, Health C

Rank #10 | Signal 66.7 | fresh

SeLaR: Selective Latent Reasoning in Large Language Models

Chain-of-Thought (CoT) has become a cornerstone of reasoning in large language models, yet its effectiveness is constrained by the limited expressiveness of discrete token sampling. Recent latent reasoning approaches attempt to alleviate this limitation by replacing discrete tokens with soft embeddings (probability-weighted mixtures of token embeddings) or hidden states, but they commonly suffer from two issues: (1) global activation injects perturbations into high-confidence steps, impairing reasoning stability; and (2) soft embeddings quickly collapse toward the highest-probability token, limiting exploration of alternative trajectories. To address these challenges, we propose SeLaR (Selective Latent Reasoning), a lightweight and training-free framework. SeLaR introduces an entropy-gated mechanism that activates soft embeddings only at low-confidence steps, while preserving discrete decoding at high-confidence steps. Additionally, we propose an entropy-aware contrastive regularization that pushes soft embeddings away from the dominant (highest-probability) token's direction, encouraging sustained exploration of multiple latent reasoning paths. Experiments on five reasoning benchmarks demonstrate that SeLaR consistently outperforms standard CoT and state-of-the-art training-free methods.
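SeLaR's entropy gate switches between discrete decoding at confident steps and soft embeddings at uncertain ones. The sketch below covers only that gating and omits the entropy-aware contrastive regularization. The `threshold` value and the helper names are hypothetical; they do not come from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def next_input_embedding(probs, embeddings, threshold=1.0):
    """Entropy-gated choice between a discrete token and a soft embedding.

    Low entropy (confident step): feed the top token's embedding, as in standard CoT.
    High entropy (uncertain step): feed the probability-weighted mixture of embeddings.
    `threshold` is a hypothetical value in nats.
    """
    if entropy(probs) <= threshold:
        best = max(range(len(probs)), key=probs.__getitem__)
        return list(embeddings[best]), "discrete"
    dim = len(embeddings[0])
    soft = [sum(p * e[i] for p, e in zip(probs, embeddings)) for i in range(dim)]
    return soft, "soft"
```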

Why This Ranked Here

Signal Fusion 66.7 with fresh evidence, 0 references, and 83% evidence coverage.

Evidence Receipt: proof partial, repo active, 0 refs, 4 sources

Score Breakdown: Overall 7.0, Technical 1.4, Commercial 5.0, Market 2.0, Team 4.0, Method 4.7

GitHub Velocity: 0/wk

Snapshot ready: 2026-04-09

Canonical dashboard metrics and ranked papers are current.

Computed: Apr 10, 5:30 PM | Coverage: 56% | Sources counted: 1859 | Last landed snapshot: 2026-04-09

Missing: `trend_points.opportunity_share`
