Build Loop

# | PAPER | CLUSTER | DATE | SCORE
1 | EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents | Prompt Optimization | 1d | 60.2
2 | A History-Aware Visually Grounded Critic for Computer Use Agents | Uncategorized | 1d | 36.73
3 | Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations | Uncategorized | 1d | 36.02
4 | The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment | Uncategorized | 1d | 35.89
5 | A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design | Supervised Learning | 1d | 35.77
6 | Speech Meets ELF: Audio Conditional Continuous-Target Diffusion for Speech Recognition and Translation | Uncategorized | 1d | 35.27
7 | FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model | Uncategorized | 1d | 35.27
8 | From Context-Aware to Conflict-Aware: Generalizing Contrastive Decoding for Knowledge Conflict in LLMs | Uncategorized | 1d | 34.72
9 | One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA | Uncategorized | 1d | 33.84
10 | Beyond Static Evaluation: Co-Evolutionary Mechanisms for LLM-Driven Strategy Evolution in Adversarial Games | Uncategorized | 1d | 33.77
11 | Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding | Uncategorized | 1d | 33.33
12 | ++nnU-Net: Scaling nnU-Net with Prefix-Based Data Augmentation | Uncategorized | 1d | 33.22
13 | The Role of Feedback Alignment in Self-Distillation | Machine Learning | 1d | 32.57
14 | LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination | Uncategorized | 1d | 32.42
15 | Piper: A Programmable Distributed Training System | Distributed Training Systems | 1d | 31.77
16 | Self-Distillation Policy Optimization via Visual Feedback: Bridging Code and Visual Artifacts | Uncategorized | 1d | 24.12
17 | Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields | Uncategorized | 1d | 23.81
18 | Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam? | Uncategorized | 1d | 23.26
19 | Machine Learning Methods for Studying Latent Neural Activity Dynamics | Uncategorized | 1d | 23.15
20 | ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics | Uncategorized | 1d | 23.15
21 | Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution | Uncategorized | 1d | 23.15
22 | T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains | Uncategorized | 1d | 23.15
23 | Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning | Uncategorized | 1d | 23.15
24 | CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs | Uncategorized | 1d | 23.15
25 | Dep-LLM: Training-Free Depression Diagnosis via Evidence-Guided Structured Multi-factor with Reliable LLM Reasoning | Uncategorized | 1d | 23.12
26 | Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies | Uncategorized | 1d | 23.11
27 | Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation | Uncategorized | 1d | 22.81
28 | Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction | Uncategorized | 1d | 22.71
29 | Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning | Uncategorized | 1d | 22.71
30 | Expert-Level Crisis Detection in Mental Health Conversations | Uncategorized | 1d | 22.71

Build Loop · Decide which papers become startups. · Today's queue.