# ScienceToStartup

> Research commercialization OS for agents.

## Start Here
- Human setup guide: https://sciencetostartup.com/developers/start-here
- Create key: https://sciencetostartup.com/developers/keys
- Verify key context: GET https://sciencetostartup.com/api/developers/whoami
- MCP endpoint: https://sciencetostartup.com/api/mcp
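
A minimal sketch of the key check in Python, using only the standard library. It builds the `whoami` request without sending it. Sending the key as a bearer token in the `Authorization` header is an assumption, not something this page confirms; the human setup guide is the authority on how keys are passed.

```python
import urllib.request

BASE_URL = "https://sciencetostartup.com"


def build_whoami_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the whoami endpoint.

    Assumption: the key travels as a bearer token in the Authorization
    header. Check the start-here guide for the actual scheme.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return urllib.request.Request(
        f"{BASE_URL}/api/developers/whoami",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON body; the response fields are not documented in this file.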

## Golden Path
- Recommended sequence:
  1. `search_papers`
  2. `workspace_create_from_seed`
  3. `workspace_run_action` with `run_kind=sidekick_brief`
  4. `workspace_run_action` with `run_kind=launch_pack`
  5. `get_launch_pack`
- In MCP clients, call `whoami` before step 1 to confirm the key owner, rate limits, and accessible workspaces.
- Workspace-native tools are bound to the authenticated key owner on the server. Do not supply arbitrary `user_id` values.
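
The golden path can be sketched as an ordered list of MCP `tools/call` requests (JSON-RPC 2.0, as defined by the Model Context Protocol). Only the tool names and the `run_kind` values come from this page; the argument names `query`, `seed_type`, `seed_id`, and `workspace_id` are hypothetical placeholders. For brevity the sketch takes `paper_id` and `workspace_id` up front, whereas a real client would read them from the `search_papers` and `workspace_create_from_seed` responses and would complete the MCP `initialize` handshake before any tool call.

```python
from typing import Any, Dict, List


def tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one MCP tool invocation in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def golden_path(query: str, paper_id: str, workspace_id: str) -> List[Dict[str, Any]]:
    """Return the golden-path requests in order, preceded by whoami.

    Hypothetical argument names: query, seed_type, seed_id, workspace_id.
    Only run_kind and its values are taken from the docs. No user_id is
    sent: the server binds workspace tools to the key owner.
    """
    steps = [
        ("whoami", {}),
        ("search_papers", {"query": query}),
        ("workspace_create_from_seed", {"seed_type": "paper", "seed_id": paper_id}),
        ("workspace_run_action", {"workspace_id": workspace_id, "run_kind": "sidekick_brief"}),
        ("workspace_run_action", {"workspace_id": workspace_id, "run_kind": "launch_pack"}),
        ("get_launch_pack", {"workspace_id": workspace_id}),
    ]
    return [tool_call(i, name, args) for i, (name, args) in enumerate(steps, start=1)]
```

Each request gets a distinct `id` so responses can be matched to calls, as JSON-RPC requires.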

## REST Discovery
- OpenAPI spec: https://sciencetostartup.com/api/openapi.json
- Search papers: https://sciencetostartup.com/api/v1/free/papers
- Signal Fusion rankings: https://sciencetostartup.com/api/v1/signal-fusion
- Research search: POST https://sciencetostartup.com/api/v1/research/search
- Deep-search runs: POST https://sciencetostartup.com/api/v1/research/deep-search
- Canonical research report: GET https://sciencetostartup.com/api/v1/research/report/{runId}
- Research rerun diff: GET https://sciencetostartup.com/api/v1/research/report/{runId}/diff
- Research artifact: GET https://sciencetostartup.com/api/v1/research/report/{runId}/artifacts/{kind}
- JSON feed: https://sciencetostartup.com/api/feed/papers.json
- Signal Fusion feed: https://sciencetostartup.com/api/feed/signal-fusion.json
- Buildable papers feed: https://sciencetostartup.com/api/feed/buildable-papers.json
- Agent run-log feed: https://sciencetostartup.com/api/feed/agent-runs.json
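
The three research-report endpoints share the `report/{runId}` prefix, so a small URL builder keeps them consistent. This sketch assumes `runId` is an opaque string and percent-encodes it; the `kind` check uses the research artifact kinds listed under Stable Contracts. The function names and the run id in the usage note are illustrative.

```python
from urllib.parse import quote

API_ROOT = "https://sciencetostartup.com/api/v1/research"

# Research artifact kinds from the "Stable Contracts" section.
ARTIFACT_KINDS = frozenset(
    {"markdown", "json", "csv", "bibtex", "pdf", "diff", "workspace_seed"}
)


def _run_segment(run_id: str) -> str:
    if not run_id:
        raise ValueError("run_id must be non-empty")
    # Encode every reserved character so a run id cannot add path segments.
    return quote(run_id, safe="")


def report_url(run_id: str) -> str:
    """GET target for the canonical research report."""
    return f"{API_ROOT}/report/{_run_segment(run_id)}"


def report_diff_url(run_id: str) -> str:
    """GET target for the rerun diff of a report."""
    return f"{report_url(run_id)}/diff"


def artifact_url(run_id: str, kind: str) -> str:
    """GET target for one export of a report, validated against known kinds."""
    if kind not in ARTIFACT_KINDS:
        raise ValueError(f"unknown artifact kind: {kind!r}")
    return f"{report_url(run_id)}/artifacts/{kind}"
```

For example, `artifact_url("run_42", "bibtex")` yields `https://sciencetostartup.com/api/v1/research/report/run_42/artifacts/bibtex`.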

## MCP Tools
- Identity: `whoami`
- Discovery: `search_papers`, `get_paper`, `get_signal_fusion_rankings`, `search_signal_canvas`
- Workspace: `workspace_list`, `workspace_get`, `workspace_create_from_seed`, `workspace_run_action`, `workspace_run_log`, `workspace_resolve_approval`, `workspace_deal_room_get`, `workspace_record_decision`
- Sidekick: `sidekick_profile_get`, `sidekick_profile_upsert`, `sidekick_background_overview`
- Execution: `build_room_create`, `build_room_get`, `build_room_run`, `get_launch_pack`, `get_diligence_memo_export`
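
An MCP client can compare the catalog above with what the server actually advertises through the protocol's `tools/list` method. This is a sketch: `DOCUMENTED_TOOLS` mirrors the list on this page, and `missing_tools` expects the `result` object of a `tools/list` response (a `tools` array of objects with a `name` field, per the MCP specification). It inspects a single page only, since `tools/list` can be paginated with `nextCursor`.

```python
from typing import Dict, List, Optional

# Mirrors the "MCP Tools" list on this page.
DOCUMENTED_TOOLS: Dict[str, List[str]] = {
    "Identity": ["whoami"],
    "Discovery": ["search_papers", "get_paper", "get_signal_fusion_rankings", "search_signal_canvas"],
    "Workspace": [
        "workspace_list", "workspace_get", "workspace_create_from_seed", "workspace_run_action",
        "workspace_run_log", "workspace_resolve_approval", "workspace_deal_room_get",
        "workspace_record_decision",
    ],
    "Sidekick": ["sidekick_profile_get", "sidekick_profile_upsert", "sidekick_background_overview"],
    "Execution": [
        "build_room_create", "build_room_get", "build_room_run", "get_launch_pack",
        "get_diligence_memo_export",
    ],
}


def category_of(tool: str) -> Optional[str]:
    """Return the documented category for a tool name, or None if unlisted."""
    for category, names in DOCUMENTED_TOOLS.items():
        if tool in names:
            return category
    return None


def missing_tools(tools_list_result: dict) -> Dict[str, List[str]]:
    """Group documented tools that a tools/list result does not advertise.

    Only the given page is inspected; follow nextCursor for further pages.
    """
    advertised = {tool["name"] for tool in tools_list_result.get("tools", [])}
    missing: Dict[str, List[str]] = {}
    for category, names in DOCUMENTED_TOOLS.items():
        absent = [name for name in names if name not in advertised]
        if absent:
            missing[category] = absent
    return missing
```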

## Stable Contracts
- Workspace seeds/source types: `paper`, `evidence_query`, `deep_search_run`, `report_run`, `topic`, `lab`, `institution`, `watchlist_entity`, `launch_pack`, `hugging_face_asset`
- Workspace run kinds: `workspace_refresh`, `evidence_rerun`, `recommendation`, `sidekick_brief`, `next_action_draft`, `launch_pack`
- Evidence receipts are included with workspace briefs, canonical report exports, and launch-pack exports, in both JSON and markdown form.
- Research artifact kinds: `markdown`, `json`, `csv`, `bibtex`, `pdf`, `diff`, `workspace_seed`
- Full docs: https://sciencetostartup.com/llms-full.txt
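
Client code can pin these stable enumerations as Python `Enum` types so that a mistyped `run_kind` or seed type fails locally instead of at the server. The values are copied from the lists above; the class and function names are illustrative, not part of the API.

```python
from enum import Enum


class SeedType(str, Enum):
    """Workspace seed/source types."""
    PAPER = "paper"
    EVIDENCE_QUERY = "evidence_query"
    DEEP_SEARCH_RUN = "deep_search_run"
    REPORT_RUN = "report_run"
    TOPIC = "topic"
    LAB = "lab"
    INSTITUTION = "institution"
    WATCHLIST_ENTITY = "watchlist_entity"
    LAUNCH_PACK = "launch_pack"
    HUGGING_FACE_ASSET = "hugging_face_asset"


class RunKind(str, Enum):
    """Values accepted as run_kind by workspace_run_action."""
    WORKSPACE_REFRESH = "workspace_refresh"
    EVIDENCE_RERUN = "evidence_rerun"
    RECOMMENDATION = "recommendation"
    SIDEKICK_BRIEF = "sidekick_brief"
    NEXT_ACTION_DRAFT = "next_action_draft"
    LAUNCH_PACK = "launch_pack"


class ArtifactKind(str, Enum):
    """Research artifact kinds."""
    MARKDOWN = "markdown"
    JSON = "json"
    CSV = "csv"
    BIBTEX = "bibtex"
    PDF = "pdf"
    DIFF = "diff"
    WORKSPACE_SEED = "workspace_seed"


def parse_run_kind(value: str) -> RunKind:
    """Validate a run_kind string before sending it to the server."""
    try:
        return RunKind(value)
    except ValueError:
        allowed = ", ".join(kind.value for kind in RunKind)
        raise ValueError(f"unsupported run_kind {value!r}; expected one of: {allowed}") from None
```

Because each enum subclasses `str`, members compare equal to their plain string values and can be placed directly in request payloads.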

## Top Papers (by viability score)
- [AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning](https://sciencetostartup.com/paper/adareasoner-dynamic-tool-orchestration-for-iterative-visual-reasoning) (10/10) — AdaReasoner offers dynamic tool orchestration for enhanced visual reasoning in AI models.
- [Type-Aware Retrieval-Augmented Generation with Dependency Closure for Solver-Executable Industrial Optimization Modeling](https://sciencetostartup.com/paper/type-aware-retrieval-augmented-generation-with-dependency-closure-for-solver-executable-industrial-optimization-modeling) (10/10) — Automate industrial optimization modeling using a type-aware retrieval-augmented generation system that ensures solver-e
- [AI-CARE: Carbon-Aware Reporting Evaluation Metric for AI Models](https://sciencetostartup.com/paper/ai-care-carbon-aware-reporting-evaluation-metric-for-ai-models) (9/10) — AI-CARE revolutionizes AI evaluation by providing a carbon-aware metric that empowers sustainable model deployment decis
- [FlowAct-R1: Towards Interactive Humanoid Video Generation](https://sciencetostartup.com/paper/flowact-r1-towards-interactive-humanoid-video-generation) (9/10) — FlowAct-R1 generates lifelike interactive humanoid videos in real-time for virtual avatars and digital companions.
- [JobMatchAI An Intelligent Job Matching Platform Using Knowledge Graphs, Semantic Search and Explainable AI](https://sciencetostartup.com/paper/jobmatchai-an-intelligent-job-matching-platform-using-knowledge-graphs-semantic-search-and-explainable-ai) (9/10) — JobMatchAI is an intelligent job matching platform that leverages knowledge graphs and semantic search to optimize hirin
- [ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning](https://sciencetostartup.com/paper/arise-agent-reasoning-with-intrinsic-skill-evolution-in-hierarchical-reinforcement-learning) (9/10) — ARISE enhances mathematical reasoning in language models through a hierarchical reinforcement learning framework that ev
- [Odin: Multi-Signal Graph Intelligence for Autonomous Discovery in Knowledge Graphs](https://sciencetostartup.com/paper/odin-multi-signal-graph-intelligence-for-autonomous-discovery-in-knowledge-graphs) (9/10) — Odin offers a cutting-edge graph intelligence engine for autonomous pattern discovery in knowledge graphs, transforming 
- [POLCA: Stochastic Generative Optimization with LLM](https://sciencetostartup.com/paper/polca-stochastic-generative-optimization-with-llm) (9/10) — POLCA is a scalable framework that optimizes complex systems using generative language models guided by feedback.
- [RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation](https://sciencetostartup.com/paper/relate-a-reinforcement-learning-enhanced-llm-framework-for-advertising-text-generation) (9/10) — A reinforcement learning-powered framework for optimizing advertising text generation in real-time ad platforms.
- [One Size, Many Fits: Aligning Diverse Group-Wise Click Preferences in Large-Scale Advertising Image Generation](https://sciencetostartup.com/paper/one-size-many-fits-aligning-diverse-group-wise-click-preferences-in-large-scale-advertising-image-generation) (9/10) — A framework to tailor advertising images for diverse user groups, boosting CTR and ad effectiveness.
- [Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting](https://sciencetostartup.com/paper/integratingweather-foundation-model-and-satellite-to-enable-fine-grained-solar-irradiance-forecasting) (9/10) — Baguan-solar integrates weather models and satellite imagery for precise solar irradiance forecasting.
- [Adaptive Block-Scaled Data Types](https://sciencetostartup.com/paper/adaptive-block-scaled-data-types) (9/10) — Design and implement low-precision data types that improve the performance and efficiency of large language models on mo
- [Feature Recalibration Based Olfactory-Visual Multimodal Model for Fine-Grained Rice Deterioration Detection](https://sciencetostartup.com/paper/feature-recalibration-based-olfactory-visual-multimodal-model-for-fine-grained-rice-deterioration-detection) (9/10) — A multimodal AI model for precision detection of rice deterioration, enhancing accuracy and cost-effectiveness in agrifo
- [SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era](https://sciencetostartup.com/paper/scizoom-a-large-scale-benchmark-for-hierarchical-scientific-summarization-across-the-llm-era) (9/10) — SciZoom is a comprehensive benchmark for hierarchical scientific summarization, analyzing the evolution of scientific wr
- [ECHOSAT: Estimating Canopy Height Over Space And Time](https://sciencetostartup.com/paper/echosat-estimating-canopy-height-over-space-and-time) (9/10) — ECHOSAT provides a dynamic global tree height map for enhanced forest monitoring and carbon accounting.
- [A Scalable Curiosity-Driven Game-Theoretic Framework for Long-Tail Multi-Label Learning in Data Mining](https://sciencetostartup.com/paper/a-scalable-curiosity-driven-game-theoretic-framework-for-long-tail-multi-label-learning-in-data-mining) (9/10) — A scalable, curiosity-driven game-theoretic framework to enhance multi-label classification for imbalanced datasets in r
- [The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning](https://sciencetostartup.com/paper/the-agentic-researcher-a-practical-guide-to-ai-assisted-research-in-mathematics-and-machine-learning) (9/10) — An open-source framework that transforms AI coding agents into autonomous research assistants for mathematics and machin
- [Learning to Watermark in the Latent Space of Generative Models](https://sciencetostartup.com/paper/learning-to-watermark-in-the-latent-space-of-generative-models) (9/10) — Enhancing AI-generated content integrity with robust and efficient latent space watermarking.
- [Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training](https://sciencetostartup.com/paper/rethinking-umm-visual-generation-masked-modeling-for-efficient-image-only-pre-training) (9/10) — IOMM revolutionizes visual generation by enabling efficient image-only pre-training for unified multimodal models.
- [Multi UAVs Preflight Planning in a Shared and Dynamic Airspace](https://sciencetostartup.com/paper/multi-uavs-preflight-planning-in-a-shared-and-dynamic-airspace) (9/10) — A scalable and efficient solution for preflight planning of large UAV fleets in dynamic urban airspaces.
- [Satellite-Based Detection of Looted Archaeological Sites Using Machine Learning](https://sciencetostartup.com/paper/satellite-based-detection-of-looted-archaeological-sites-using-machine-learning) (9/10) — AI-powered tool to automatically detect looted archaeological sites from satellite imagery, protecting cultural heritage
- [SpiralDiff: Spiral Diffusion with LoRA for RGB-to-RAW Conversion Across Cameras](https://sciencetostartup.com/paper/spiraldiff-spiral-diffusion-with-lora-for-rgb-to-raw-conversion-across-cameras) (9/10) — SpiralDiff revolutionizes RGB-to-RAW image conversion using a diffusion-based framework with camera-specific adaptations
- [Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models](https://sciencetostartup.com/paper/grounding-the-score-explicit-visual-premise-verification-for-reliable-vision-language-process-reward-models) (9/10) — EVPV enhances vision-language models by providing explicit verification of visual premises to improve reasoning accuracy
- [Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation](https://sciencetostartup.com/paper/beyond-the-embedding-bottleneck-adaptive-retrieval-augmented-3d-ct-report-generation) (9/10) — AdaRAG-CT enhances automated radiology report generation by overcoming visual representation bottlenecks with adaptive r
- [UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors](https://sciencetostartup.com/paper/umo-unified-in-context-learning-unlocks-motion-foundation-model-priors) (9/10) — UMO is a unified framework that enhances text-to-motion generation by adapting pretrained models for diverse motion task
- [Hunting CUDA Bugs at Scale with cuFuzz](https://sciencetostartup.com/paper/hunting-cuda-bugs-at-scale-with-cufuzz) (9/10) — cuFuzz is a CUDA-oriented fuzzer that enhances GPU program testing by effectively identifying memory-safety and concurre
- [OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework](https://sciencetostartup.com/paper/onesearch-v2-the-latent-reasoning-enhanced-self-distillation-generative-search-framework) (9/10) — OneSearch-V2 enhances e-commerce search with reasoning and self-distillation, boosting conversion rates and reducing sea
- [Detect Anything in Real Time: From Single-Prompt Segmentation to Multi-Class Detection](https://sciencetostartup.com/paper/detect-anything-in-real-time-from-single-prompt-segmentation-to-multi-class-detection) (9/10) — DART transforms promptable detection into a real-time multi-class system with significant speed improvements.
- [Designing probabilistic AI monsoon forecasts to inform agricultural decision-making](https://sciencetostartup.com/paper/designing-probabilistic-ai-monsoon-forecasts-to-inform-agricultural-decision-making) (9/10) — AI-powered monsoon forecasting system deployed to 38 million farmers, providing tailored insights for planting decisions
- [OpenClaw-RL: Train Any Agent Simply by Talking](https://sciencetostartup.com/paper/openclaw-rl-train-any-agent-simply-by-talking) (9/10) — OpenClaw-RL enables agents to learn from user interactions in real-time, enhancing their performance through continuous 
- [CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation](https://sciencetostartup.com/paper/cigpose-causal-intervention-graph-neural-network-for-whole-body-pose-estimation) (9/10) — CIGPose leverages causal intervention to enhance whole-body pose estimation, achieving state-of-the-art accuracy with ro
- [MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild](https://sciencetostartup.com/paper/metaclaw-just-talk-an-agent-that-meta-learns-and-evolves-in-the-wild) (9/10) — MetaClaw is a continual meta-learning framework that enables LLM agents to adapt and evolve in real-time without downtim
- [Synergistic Directed Execution and LLM-Driven Analysis for Zero-Day AI-Generated Malware Detection](https://sciencetostartup.com/paper/synergistic-directed-execution-and-llm-driven-analysis-for-zero-day-ai-generated-malware-detection) (9/10) — A hybrid analysis framework leveraging LLMs and deep learning to detect AI-generated malware with high accuracy.
- [RealVLG-R1: A Large-Scale Real-World Visual-Language Grounding Benchmark for Robotic Perception and Manipulation](https://sciencetostartup.com/paper/realvlg-r1-a-large-scale-real-world-visual-language-grounding-benchmark-for-robotic-perception-and-manipulation) (9/10) — RealVLG-R1 revolutionizes robotic manipulation by integrating visual-language grounding with a comprehensive dataset and
- [Self-Conditioned Denoising for Atomistic Representation Learning](https://sciencetostartup.com/paper/self-conditioned-denoising-for-atomistic-representation-learning) (9/10) — Self-Conditioned Denoising (SCD) revolutionizes atomistic data representation learning by significantly enhancing perfor
- [Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents](https://sciencetostartup.com/paper/legal-dc-benchmarking-retrieval-augmented-generation-for-legal-documents) (9/10) — Legal-DC offers a specialized benchmark and framework for enhancing retrieval-augmented generation in Chinese legal docu
- [Symphony: A Cognitively-Inspired Multi-Agent System for Long-Video Understanding](https://sciencetostartup.com/paper/symphony-a-cognitively-inspired-multi-agent-system-for-long-video-understanding) (9/10) — Symphony is a multi-agent system designed for enhanced long-video understanding through cognitive-inspired reasoning.
- [MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval](https://sciencetostartup.com/paper/mcot-mvs-multi-level-vision-selection-by-multi-modal-chain-of-thought-reasoning-for-composed-image-retrieval) (9/10) — MCoT-MVS enhances composed image retrieval by integrating multi-level vision features with multi-modal reasoning for imp
- [Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation](https://sciencetostartup.com/paper/generalizable-geometric-prior-and-recurrent-spiking-feature-learning-for-humanoid-robot-manipulation) (9/10) — Platform for leveraging geometric prior and spiking features to enhance humanoid robot manipulation capabilities in new 
- [A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion](https://sciencetostartup.com/paper/a-new-dataset-and-framework-for-robust-road-surface-classification-via-camera-imu-fusion) (9/10) — A robust framework for road surface classification using a new multimodal dataset that enhances predictive maintenance v
- [Context and Transcripts Improve Detection of Deepfake Audios of Public Figures](https://sciencetostartup.com/paper/context-and-transcripts-improve-detection-of-deepfake-audios-of-public-figures) (9/10) — A cutting-edge audio deepfake detection tool that leverages context and transcripts for significantly improved accuracy 
- [How does information access affect LLM monitors' ability to detect sabotage?](https://sciencetostartup.com/paper/how-does-information-access-affect-llm-monitors-ability-to-detect-sabotage) (9/10) — Develop a robust LLM monitoring tool using the extract-and-evaluate method to detect sabotage with minimal information e
- [Solver-in-the-Loop: MDP-Based Benchmarks for Self-Correction and Behavioral Rationality in Operations Research](https://sciencetostartup.com/paper/solver-in-the-loop-mdp-based-benchmarks-for-self-correction-and-behavioral-rationality-in-operations-research) (9/10) — A new benchmark suite for iterative self-correction and bias reduction in operations research, outperforming existing me
- [Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech](https://sciencetostartup.com/paper/omnilingual-sonar-cross-lingual-and-cross-modal-sentence-embeddings-bridging-massively-multilingual-text-and-speech) (9/10) — OmniSONAR offers an unprecedented omnilingual cross-modal embedding solution for multilingual translation and search app
- [Optimization and Mobile Deployment for Anthropocene Neural Style Transfer](https://sciencetostartup.com/paper/optimization-and-mobile-deployment-for-anthropocene-neural-style-transfer) (9/10) — AnthropoCam brings real-time neural style transfer to mobile for expressive, environmental visualization of Anthropocene
- [ACPV-Net: All-Class Polygonal Vectorization for Seamless Vector Map Generation from Aerial Imagery](https://sciencetostartup.com/paper/acpv-net-all-class-polygonal-vectorization-for-seamless-vector-map-generation-from-aerial-imagery) (9/10) — Transform aerial imagery into seamless, topology-consistent vector maps with ACPV-Net.
- [ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation](https://sciencetostartup.com/paper/expressmind-a-multimodal-pretrained-large-language-model-for-expressway-operation) (9/10) — ExpressMind is a pioneering multimodal AI solution optimizing expressway operations through advanced reasoning and real-
- [Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning](https://sciencetostartup.com/paper/memory-augmented-vision-language-agents-for-persistent-and-semantically-consistent-object-captioning) (9/10) — A memory-augmented vision-language model ensuring consistent multi-view object captioning for better embodied agent navi
- [MobileFetalCLIP: Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis](https://sciencetostartup.com/paper/mobilefetalclip-selective-repulsive-knowledge-distillation-for-mobile-fetal-ultrasound-analysis) (9/10) — Real-time fetal ultrasound analysis on mobile devices, outperforming larger models with a novel knowledge distillation t
- [CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?](https://sciencetostartup.com/paper/cyberthreat-eval-can-large-language-models-automate-real-world-threat-research) (9/10) — CyberThreat-Eval automates threat research using LLMs, enhancing the accuracy and efficiency of Cyber Threat Intelligenc
- [Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition](https://sciencetostartup.com/paper/zipper-lora-dynamic-parameter-decoupling-for-speech-llm-based-multilingual-speech-recognition) (9/10) — Zipper-LoRA enhances multilingual speech recognition by dynamically optimizing language-specific and shared model parame
- [Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs](https://sciencetostartup.com/paper/shattering-the-shortcut-a-topology-regularized-benchmark-for-multi-hop-medical-reasoning-in-llms) (9/10) — ShatterMed-QA is a benchmark designed to enhance multi-hop diagnostic reasoning in LLMs by addressing shortcut learning 
- [Meeting SLOs, Slashing Hours: Automated Enterprise LLM Optimization with OptiKIT](https://sciencetostartup.com/paper/meeting-slos-slashing-hours-automated-enterprise-llm-optimization-with-optikit) (9/10) — OptiKIT automates LLM optimization to save time and resources for enterprises by enhancing GPU throughput and enabling A
- [Feed-forward Gaussian Registration for Head Avatar Creation and Editing](https://sciencetostartup.com/paper/feed-forward-gaussian-registration-for-head-avatar-creation-and-editing) (9/10) — MATCH enables rapid creation and editing of personalized head avatars using a novel multi-view Gaussian registration met
- [Learning Personalized Agents from Human Feedback](https://sciencetostartup.com/paper/learning-personalized-agents-from-human-feedback) (9/10) — A new AI framework that dynamically personalizes agents to user preferences via live feedback, enhancing user interactio
- [Surg-R1: A Hierarchical Reasoning Foundation Model for Scalable and Interpretable Surgical Decision Support with Multi-Center Clinical Validation](https://sciencetostartup.com/paper/surg-r1-a-hierarchical-reasoning-foundation-model-for-scalable-and-interpretable-surgical-decision-support-with-multi-ce) (9/10) — Surg-R1 is a hierarchical reasoning foundation model designed to enhance surgical decision support through interpretable
- [SDF-Net: Structure-Aware Disentangled Feature Learning for Opticall-SAR Ship Re-identification](https://sciencetostartup.com/paper/sdf-net-structure-aware-disentangled-feature-learning-for-opticall-sar-ship-re-identification) (9/10) — SDF-Net uses a structure-aware network to enhance cross-modal ship re-identification between optical and SAR imagery.
- [RoboSubtaskNet: Temporal Sub-task Segmentation for Human-to-Robot Skill Transfer in Real-World Environments](https://sciencetostartup.com/paper/robosubtasknet-temporal-sub-task-segmentation-for-human-to-robot-skill-transfer-in-real-world-environments) (9/10) — RoboSubtaskNet enables effective human-to-robot skill transfer for precise and adaptive task automation in collaborative
- [VitalDiagnosis: AI-Driven Ecosystem for 24/7 Vital Monitoring and Chronic Disease Management](https://sciencetostartup.com/paper/vitaldiagnosis-ai-driven-ecosystem-for-24-7-vital-monitoring-and-chronic-disease-management) (9/10) — VitalDiagnosis: AI-driven chronic disease management through proactive engagement and wearable device integration.
- [MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management](https://sciencetostartup.com/paper/metabonet-the-largest-publicly-available-consolidated-dataset-for-type-1-diabetes-management) (9/10) — MetaboNet offers a standardized, consolidated dataset for type 1 diabetes management, poised to become the benchmark for
- [LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers](https://sciencetostartup.com/paper/lookat-lookup-optimized-key-attention-for-memory-efficient-transformers) (9/10) — Introduce LOOKAT to significantly compress KV-cache for edge deployment without architecture changes.
- [PathGLS: Evaluating Pathology Vision-Language Models without Ground Truth through Multi-Dimensional Consistency](https://sciencetostartup.com/paper/pathgls-evaluating-pathology-vision-language-models-without-ground-truth-through-multi-dimensional-consistency) (9/10) — PathGLS is a novel evaluation framework for pathology vision-language models that quantifies hallucination rates and rob
- [Gen-Searcher: Reinforcing Agentic Search for Image Generation](https://sciencetostartup.com/paper/gen-searcher-reinforcing-agentic-search-for-image-generation) (9/10) — Gen-Searcher leverages agentic reinforcement learning for search-augmented image generation, delivering contextually rel
- [Operationalising Cyber Risk Management Using AI: Connecting Cyber Incidents to MITRE ATT&CK Techniques, Security Controls, and Metrics](https://sciencetostartup.com/paper/operationalising-cyber-risk-management-using-ai-connecting-cyber-incidents-to-mitre-att-ck-techniques-security-controls-) (9/10) — An AI-driven framework that automates the mapping of cyber incidents to security controls, enhancing threat intelligence
- [OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data](https://sciencetostartup.com/paper/openseeker-democratizing-frontier-search-agents-by-fully-open-sourcing-training-data) (9/10) — Fully open-source search agent democratizing high-performance frontier search through open data and code.
- [Less Data, Faster Convergence: Goal-Driven Data Optimization for Multimodal Instruction Tuning](https://sciencetostartup.com/paper/less-data-faster-convergence-goal-driven-data-optimization-for-multimodal-instruction-tuning) (9/10) — GDO optimizes data usage for multimodal instruction tuning, achieving faster convergence with fewer samples.
- [Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking](https://sciencetostartup.com/paper/jailbreak-foundry-from-papers-to-runnable-attacks-for-reproducible-benchmarking) (9/10) — Automatically convert jailbreak research into standardized attack modules for consistent benchmarking.
- [Point Cloud as a Foreign Language for Multi-modal Large Language Model](https://sciencetostartup.com/paper/point-cloud-as-a-foreign-language-for-multi-modal-large-language-model) (9/10) — SAGE is an end-to-end 3D multi-modal large language model that processes raw point clouds for enhanced 3D understanding.
- [KG-CRAFT: Knowledge Graph-based Contrastive Reasoning with LLMs for Enhancing Automated Fact-checking](https://sciencetostartup.com/paper/kg-craft-knowledge-graph-based-contrastive-reasoning-with-llms-for-enhancing-automated-fact-checking) (9/10) — KG-CRAFT uses knowledge graphs and contrastive reasoning to enhance fact-checking accuracy, achieving state-of-the-art r
- [Exploring Reasoning Reward Model for Agents](https://sciencetostartup.com/paper/exploring-reasoning-reward-model-for-agents) (9/10) — A breakthrough reinforcement learning platform that enhances agent reasoning with multi-level feedback, improving perfor
- [HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems](https://sciencetostartup.com/paper/hubscan-detecting-hubness-poisoning-in-retrieval-augmented-generation-systems) (9/10) — HubScan detects and mitigates hubness poisoning attacks in retrieval-augmented generation systems for secure AI data acc
- [ReHARK: Refined Hybrid Adaptive RBF Kernels for Robust One-Shot Vision-Language Adaptation](https://sciencetostartup.com/paper/rehark-refined-hybrid-adaptive-rbf-kernels-for-robust-one-shot-vision-language-adaptation) (9/10) — ReHARK offers a novel training-free framework for robust one-shot adaptation of Vision-Language Models, achieving state-
- [ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning](https://sciencetostartup.com/paper/reflexicoder-teaching-large-language-models-to-self-reflect-on-generated-code-and-self-correct-it-via-reinforcement-lear) (9/10) — ReflexiCoder is an RL-trained LLM that self-reflects and corrects code, achieving SOTA performance with improved token e
- [AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers](https://sciencetostartup.com/paper/adaptertune-zero-initialized-low-rank-adapters-for-frozen-vision-transformers) (9/10) — AdapterTune optimizes Vision Transformers by introducing zero-initialized low-rank adapters, significantly improving tra
- [Logics-Parsing-Omni Technical Report](https://sciencetostartup.com/paper/logics-parsing-omni-technical-report) (9/10) — AI-driven framework for parsing unstructured multimedia into structured, machine-readable knowledge.
- [Toward Complex-Valued Neural Networks for Waveform Generation](https://sciencetostartup.com/paper/toward-complex-valued-neural-networks-for-waveform-generation) (9/10) — ComVo is a complex-valued neural vocoder that enhances waveform generation with structured feedback and improved trainin
- [SynCABEL: Synthetic Contextualized Augmentation for Biomedical Entity Linking](https://sciencetostartup.com/paper/syncabel-synthetic-contextualized-augmentation-for-biomedical-entity-linking) (9/10) — Revolutionize biomedical entity linking using synthetic augmentation to significantly reduce data annotation costs.
- [Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception](https://sciencetostartup.com/paper/zooming-without-zooming-region-to-image-distillation-for-fine-grained-multimodal-perception) (9/10) — Region-to-Image Distillation for improving fine-grained multimodal perception in MLLMs.
- [Meissa: Multi-modal Medical Agentic Intelligence](https://sciencetostartup.com/paper/meissa-multi-modal-medical-agentic-intelligence) (9/10) — Meissa is a lightweight, offline multi-modal medical language model that enhances clinical decision-making with agentic 
- [Robo-Saber: Generating and Simulating Virtual Reality Players](https://sciencetostartup.com/paper/robo-saber-generating-and-simulating-virtual-reality-players) (9/10) — Robo-Saber revolutionizes VR game testing by automatically generating realistic player data to streamline development an
- [Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis](https://sciencetostartup.com/paper/physics-informed-neural-engine-sound-modeling-with-differentiable-pulse-train-synthesis) (9/10) — A physics-informed neural engine sound modeling tool that synthesizes realistic engine audio using pulse-train resonator
- [A Methodology for Thermal Limit Bias Predictability Through Artificial Intelligence](https://sciencetostartup.com/paper/a-methodology-for-thermal-limit-bias-predictability-through-artificial-intelligence) (9/10) — A deep learning methodology that predicts and corrects thermal limit bias in nuclear power plants to enhance operational
- [FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid](https://sciencetostartup.com/paper/fame-force-adaptive-rl-for-expanding-the-manipulation-envelope-of-a-full-scale-humanoid) (9/10) — FAME is a force-adaptive reinforcement learning framework that enhances humanoid manipulation by adapting to external fo
- [DyWeight: Dynamic Gradient Weighting for Few-Step Diffusion Sampling](https://sciencetostartup.com/paper/dyweight-dynamic-gradient-weighting-for-few-step-diffusion-sampling) (9/10) — DyWeight introduces a dynamic gradient weighting method to enhance the efficiency of diffusion models in generative task
- [CEI-3D: Collaborative Explicit-Implicit 3D Reconstruction for Realistic and Fine-Grained Object Editing](https://sciencetostartup.com/paper/cei-3d-collaborative-explicit-implicit-3d-reconstruction-for-realistic-and-fine-grained-object-editing) (9/10) — CEI-3D is a collaborative 3D reconstruction pipeline that enables realistic and fine-grained object editing with localiz
- [WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching](https://sciencetostartup.com/paper/waft-stereo-warping-alone-field-transforms-for-stereo-matching) (9/10) — A revolutionary warping-based stereo matching solution that outperforms existing methods in accuracy and speed.
- [PyHealth 2.0: A Comprehensive Open-Source Toolkit for Accessible and Reproducible Clinical Deep Learning](https://sciencetostartup.com/paper/pyhealth-2-0-a-comprehensive-open-source-toolkit-for-accessible-and-reproducible-clinical-deep-learning) (9/10) — PyHealth 2.0 offers an open-source toolkit for accessible and reproducible clinical AI, bridging the gap between technic
- [PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration](https://sciencetostartup.com/paper/pathoscribe-transforming-pathology-data-into-a-living-library-with-a-unified-llm-driven-framework-for-semantic-retrieval) (9/10) — PathoScribe transforms static pathology archives into an interactive, LLM-driven living library for enhanced clinical de
- [Causal-JEPA: Learning World Models through Object-Level Latent Interventions](https://sciencetostartup.com/paper/causal-jepa-learning-world-models-through-object-level-latent-interventions) (9/10) — C-JEPA offers an efficient object-centric world model enhancing visual question answering and agent control with latent 
- [PACE-RAG: Patient-Aware Contextual and Evidence-based Policy RAG for Clinical Drug Recommendation](https://sciencetostartup.com/paper/pace-rag-patient-aware-contextual-and-evidence-based-policy-rag-for-clinical-drug-recommendation) (9/10) — PACE-RAG is a personalized drug recommendation system that integrates patient context with clinical prescribing patterns
- [Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models](https://sciencetostartup.com/paper/think-while-watching-online-streaming-segment-level-memory-for-multi-turn-video-reasoning-in-multimodal-large-language-m) (9/10) — A memory-anchored framework for real-time multi-turn video reasoning in multimodal large language models.
- [Unleashing Video Language Models for Fine-grained HRCT Report Generation](https://sciencetostartup.com/paper/unleashing-video-language-models-for-fine-grained-hrct-report-generation) (9/10) — AbSteering leverages Video Language Models for precise HRCT report generation, enhancing diagnostic accuracy in medical 
- [Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation](https://sciencetostartup.com/paper/medical-sam3-a-foundation-model-for-universal-prompt-driven-medical-image-segmentation) (9/10) — Medical SAM3 delivers a universal, prompt-driven segmentation model for medical imaging, solving domain shift challenges
- [Listen to the Layers: Mitigating Hallucinations with Inter-Layer Disagreement](https://sciencetostartup.com/paper/listen-to-the-layers-mitigating-hallucinations-with-inter-layer-disagreement) (9/10) — CoCoA offers a novel training-free method to significantly reduce AI hallucinations at inference time, enhancing LLM reliability.
- [Enhanced Portable Ultra Low-Field Diffusion Tensor Imaging with Bayesian Artifact Correction and Deep Learning-Based Super-Resolution](https://sciencetostartup.com/paper/enhanced-portable-ultra-low-field-diffusion-tensor-imaging-with-bayesian-artifact-correction-and-deep-learning-based-sup) (9/10) — Develops a portable ultra-low-field MRI enhancement tool for improved neuroimaging quality with Bayesian and super-resol
- [MV-SAM3D: Adaptive Multi-View Fusion for Layout-Aware 3D Generation](https://sciencetostartup.com/paper/mv-sam3d-adaptive-multi-view-fusion-for-layout-aware-3d-generation) (9/10) — MV-SAM3D enhances 3D generation by integrating multi-view consistency and physical plausibility without additional training.
- [DesertFormer: Transformer-Based Semantic Segmentation for Off-Road Desert Terrain Classification in Autonomous Navigation Systems](https://sciencetostartup.com/paper/desertformer-transformer-based-semantic-segmentation-for-off-road-desert-terrain-classification-in-autonomous-navigation) (9/10) — DesertFormer is a semantic segmentation tool for classifying off-road desert terrain to enhance autonomous navigation safety.
- [ACE-LoRA: Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models](https://sciencetostartup.com/paper/ace-lora-graph-attentive-context-enhancement-for-parameter-efficient-adaptation-of-medical-vision-language-models) (9/10) — ACE-LoRA enhances medical vision-language models with parameter-efficient adaptation for improved diagnostic accuracy.
- [RUVA: Personalized Transparent On-Device Graph Reasoning](https://sciencetostartup.com/paper/ruva-personalized-transparent-on-device-graph-reasoning) (9/10) — RUVA offers on-device, transparent, and editable personal AI knowledge management, ensuring user privacy and control.
- [Rethinking ANN-based Retrieval: Multifaceted Learnable Index for Large-scale Recommendation System](https://sciencetostartup.com/paper/rethinking-ann-based-retrieval-multifaceted-learnable-index-for-large-scale-recommendation-system) (9/10) — A real-time recommendation framework that replaces ANN search with a learnable multifaceted index for better efficiency 
- [MDER-DR: Multi-Hop Question Answering with Entity-Centric Summaries](https://sciencetostartup.com/paper/mder-dr-multi-hop-question-answering-with-entity-centric-summaries) (9/10) — MDER-DR is a robust, LLM-driven QA pipeline that enhances multi-hop question answering using knowledge graphs.
- [SafeLand: Safe Autonomous Landing in Unknown Environments with Bayesian Semantic Mapping](https://sciencetostartup.com/paper/safeland-safe-autonomous-landing-in-unknown-environments-with-bayesian-semantic-mapping) (9/10) — SafeLand is a vision-based system for safe autonomous landing of UAVs in dynamic environments without prior information.
- [Tabular LLMs for Interpretable Few-Shot Alzheimer's Disease Prediction with Multimodal Biomedical Data](https://sciencetostartup.com/paper/tabular-llms-for-interpretable-few-shot-alzheimer-s-disease-prediction-with-multimodal-biomedical-data) (9/10) — TAP-GPT is a domain-adapted tabular LLM for accurate Alzheimer's disease prediction using multimodal biomedical data.
- [Fish Audio S2 Technical Report](https://sciencetostartup.com/paper/fish-audio-s2-technical-report) (9/10) — Fish Audio S2 is an open-sourced text-to-speech system that enables multi-speaker, instruction-following audio generation.
- [When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning](https://sciencetostartup.com/paper/when-should-i-search-more-adaptive-complex-query-optimization-with-reinforcement-learning) (9/10) — Adaptive Complex Query Optimization (ACQO) leverages reinforcement learning to revolutionize query optimization in Retri
- [Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents](https://sciencetostartup.com/paper/self-evolving-recommendation-system-end-to-end-autonomous-model-optimization-with-llm-agents) (9/10) — Uses LLM agents to autonomously optimize recommendation models end to end, improving user engagement.
- [SoftJAX & SoftTorch: Empowering Automatic Differentiation Libraries with Informative Gradients](https://sciencetostartup.com/paper/softjax-softtorch-empowering-automatic-differentiation-libraries-with-informative-gradients) (9/10) — SoftJAX and SoftTorch provide open-source libraries for soft differentiable programming, enhancing automatic differentiation.
- [MAC: A Conversion Rate Prediction Benchmark Featuring Labels Under Multiple Attribution Mechanisms](https://sciencetostartup.com/paper/mac-a-conversion-rate-prediction-benchmark-featuring-labels-under-multiple-attribution-mechanisms) (9/10) — MAC provides a conversion rate prediction benchmark featuring multi-attribution labels, significantly enhancing accuracy
- [AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection](https://sciencetostartup.com/paper/aw-moe-all-weather-mixture-of-experts-for-robust-multi-modal-3d-object-detection) (9/10) — AW-MoE enhances 3D object detection in adverse weather conditions using a novel Mixture of Experts framework.
- [Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind](https://sciencetostartup.com/paper/dancing-in-chains-strategic-persuasion-in-academic-rebuttal-via-theory-of-mind) (8.8/10) — Revolutionize academic rebuttals with AI-driven strategic persuasion leveraging Theory of Mind.
- [LLM-Guided Quantified SMT Solving over Uninterpreted Functions](https://sciencetostartup.com/paper/llm-guided-quantified-smt-solving-over-uninterpreted-functions) (8.7/10) — AquaForte leverages large language models to optimize SMT solving over uninterpreted functions, significantly outperform
- [Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs](https://sciencetostartup.com/paper/private-llm-inference-on-consumer-blackwell-gpus-a-practical-guide-for-cost-effective-local-deployment-in-smes) (8.7/10) — Deploy cost-effective private LLM inference on consumer GPUs for SMEs, enhancing privacy and reducing costs.
- [Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning](https://sciencetostartup.com/paper/simple-recipe-works-vision-language-action-models-are-natural-continual-learners-with-reinforcement-learning) (8/10) — A novel approach to continual reinforcement learning for vision-language-action models that enhances adaptability and re
- [Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation](https://sciencetostartup.com/paper/seeing-beyond-extrapolative-domain-adaptive-panoramic-segmentation) (8/10) — EDA-PSeg enhances panoramic semantic segmentation by addressing geometric distortions and unseen classes through innovat
- [SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning](https://sciencetostartup.com/paper/simuagent-an-llm-based-simulink-modeling-assistant-enhanced-with-reinforcement-learning) (8/10) — SimuAgent provides an AI-driven, efficient modeling assistant for Simulink users, enhancing design productivity and accuracy.
- [Safe and Scalable Web Agent Learning via Recreated Websites](https://sciencetostartup.com/paper/safe-and-scalable-web-agent-learning-via-recreated-websites) (8/10) — VeriEnv is a framework that enables safe and scalable training of web agents by recreating real-world websites into synt
- [Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation](https://sciencetostartup.com/paper/rethinking-diffusion-models-with-symmetries-through-canonicalization-with-applications-to-molecular-graph-generation) (8/10) — Introducing a novel canonical diffusion framework for efficient and expressive molecular graph generation.
- [Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models](https://sciencetostartup.com/paper/reasoning-oriented-programming-chaining-semantic-gadgets-to-jailbreak-large-vision-language-models) (8/10) — Introducing a framework that exploits vulnerabilities in large vision-language models to bypass safety alignment.
- [Rethinking Refinement: Correcting Generative Bias without Noise Injection](https://sciencetostartup.com/paper/rethinking-refinement-correcting-generative-bias-without-noise-injection) (8/10) — Bi-stage Flow Refinement (BFR) framework offers state-of-the-art bias correction for generative models, improving image 
- [Safe Consensus of Cooperative Manipulation with Hierarchical Event-Triggered Control Barrier Functions](https://sciencetostartup.com/paper/safe-consensus-of-cooperative-manipulation-with-hierarchical-event-triggered-control-barrier-functions) (8/10) — A distributed control framework for multi-robot cooperative manipulation that ensures safety and reduces computational c
- [RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies](https://sciencetostartup.com/paper/robomme-benchmarking-and-understanding-memory-for-robotic-generalist-policies) (8/10) — RoboMME provides a standardized benchmark and evaluation suite for enhancing memory capabilities in vision-language-action models.
- [S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight](https://sciencetostartup.com/paper/s-vam-shortcut-video-action-model-by-self-distilling-geometric-and-semantic-foresight) (8/10) — S-VAM is a shortcut video-action model that enhances robot learning through efficient geometric and semantic foresight.
- [SciDER: Scientific Data-centric End-to-end Researcher](https://sciencetostartup.com/paper/scider-scientific-data-centric-end-to-end-researcher) (8/10) — SciDER is a Python package that automates scientific research by analyzing data and producing executable code, accelerat
- [Seed2Scale: A Self-Evolving Data Engine for Embodied AI via Small to Large Model Synergy and Multimodal Evaluation](https://sciencetostartup.com/paper/seed2scale-a-self-evolving-data-engine-for-embodied-ai-via-small-to-large-model-synergy-and-multimodal-evaluation) (8/10) — Seed2Scale is a self-evolving data engine for embodied AI that leverages small and large model synergy to generate high-
- [Self-MedRAG: a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering](https://sciencetostartup.com/paper/self-medrag-a-self-reflective-hybrid-retrieval-augmented-generation-framework-for-reliable-medical-question-answering) (8/10) — Self-MedRAG enhances medical question answering reliability by integrating hybrid retrieval and iterative self-reflection.
- [Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets](https://sciencetostartup.com/paper/sim2real-image-translation-enables-viewpoint-robust-policies-from-fixed-camera-datasets) (8/10) — MANGO enables robust robot vision policies through sim2real image translation, leveraging viewpoint diversity from simulation.
- [Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation](https://sciencetostartup.com/paper/simulation-distillation-pretraining-world-models-in-simulation-for-rapid-real-world-adaptation) (8/10) — SimDist enables rapid real-world adaptation in robotics by distilling structural priors from simulation for efficient planning.
- [RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset](https://sciencetostartup.com/paper/radar-closed-loop-robotic-data-generation-via-semantic-planning-and-autonomous-causal-environment-reset) (8/10) — RADAR is an autonomous data generation engine that revolutionizes robotic learning by eliminating human intervention in 
- [ProgAgent: A Continual RL Agent with Progress-Aware Rewards](https://sciencetostartup.com/paper/progagent-a-continual-rl-agent-with-progress-aware-rewards) (8/10) — ProgAgent is a continual reinforcement learning agent that learns from unlabeled expert videos and adapts to new tasks, 
- [Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective](https://sciencetostartup.com/paper/randomization-boosts-kv-caching-learning-balances-query-load-a-joint-perspective) (8/10) — Optimize LLM inference with a novel algorithm for KV caching that dramatically reduces latency and boosts efficiency.
- [Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection](https://sciencetostartup.com/paper/phishing-the-phishers-with-specularnet-hierarchical-graph-autoencoding-for-reference-free-web-phishing-detection) (8/10) — SpecularNet offers a lightweight, reference-free framework for rapid phishing detection using hierarchical graph autoencoding.
- [Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders](https://sciencetostartup.com/paper/penguin-vl-exploring-the-efficiency-limits-of-vlm-with-llm-based-vision-encoders) (8/10) — Penguin-VL offers a lightweight, high-fidelity VLM solution for resource-constrained devices, outperforming leading VLMs
- [Pinterest Canvas: Large-Scale Image Generation at Pinterest](https://sciencetostartup.com/paper/pinterest-canvas-large-scale-image-generation-at-pinterest) (8/10) — Pinterest Canvas is a large-scale image generation system that fine-tunes diffusion models for specific image editing tasks.
- [Ranking Reasoning LLMs under Test-Time Scaling](https://sciencetostartup.com/paper/ranking-reasoning-llms-under-test-time-scaling) (8/10) — Scorio is an open-source library for ranking reasoning LLMs under test-time scaling using advanced statistical methods.
- [PACE: A Personalized Adaptive Curriculum Engine for 9-1-1 Call-taker Training](https://sciencetostartup.com/paper/pace-a-personalized-adaptive-curriculum-engine-for-9-1-1-call-taker-training) (8/10) — PACE is a co-pilot system that personalizes 9-1-1 call-taker training, accelerating competence and improving mastery.
- [Orion-RAG: Path-Aligned Hybrid Retrieval for Graphless Data](https://sciencetostartup.com/paper/orion-rag-path-aligned-hybrid-retrieval-for-graphless-data) (8/10) — Orion-RAG optimizes data retrieval by creating lightweight paths linking fragmented documents to transform them into sem
- [PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments](https://sciencetostartup.com/paper/panoaffordancenet-towards-holistic-affordance-grounding-in-360-indoor-environments) (8/10) — PanoAffordanceNet enables holistic affordance grounding in 360° indoor environments, enhancing scene-level perception fo
- [One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies](https://sciencetostartup.com/paper/one-step-flow-policy-self-distillation-for-fast-visuomotor-policies) (8/10) — One-Step Flow Policy revolutionizes robotic control with high-fidelity, low-latency action generation.
- [O3N: Omnidirectional Open-Vocabulary Occupancy Prediction](https://sciencetostartup.com/paper/o3n-omnidirectional-open-vocabulary-occupancy-prediction) (8/10) — O3N is an omnidirectional occupancy prediction framework that enhances 3D perception for autonomous agents through advan
- [OneWorld: Taming Scene Generation with 3D Unified Representation Autoencoder](https://sciencetostartup.com/paper/oneworld-taming-scene-generation-with-3d-unified-representation-autoencoder) (8/10) — OneWorld is a framework for generating high-quality 3D scenes with superior cross-view consistency using a unified representation.
- [Panoramic Multimodal Semantic Occupancy Prediction for Quadruped Robots](https://sciencetostartup.com/paper/panoramic-multimodal-semantic-occupancy-prediction-for-quadruped-robots) (8/10) — Develop VoxelHound, a panoramic multimodal perception framework for quadruped robots, using the new PanoMMOcc dataset.
- [RareAlert: Aligning heterogeneous large language model reasoning for early rare disease risk screening](https://sciencetostartup.com/paper/rarealert-aligning-heterogeneous-large-language-model-reasoning-for-early-rare-disease-risk-screening) (8/10) — RareAlert provides early risk screening for rare diseases using calibrated LLM reasoning, facilitating quicker diagnosis
- [SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization](https://sciencetostartup.com/paper/singeo-unlock-single-model-s-potential-for-robust-cross-view-geo-localization) (8/10) — SinGeo is a robust framework for cross-view geo-localization using a single model, outperforming existing methods with s
- [Multi-DNN Inference of Sparse Models on Edge SoCs](https://sciencetostartup.com/paper/multi-dnn-inference-of-sparse-models-on-edge-socs) (8/10) — SparseLoom enhances multi-DNN inference on edge devices by optimizing model deployment without retraining.
- [mmGAT: Pose Estimation by Graph Attention with Mutual Features from mmWave Radar Point Cloud](https://sciencetostartup.com/paper/mmgat-pose-estimation-by-graph-attention-with-mutual-features-from-mmwave-radar-point-cloud) (8/10) — mmGAT leverages mmWave radar and Graph Neural Networks to achieve state-of-the-art human pose estimation, offering a pri
- [Multilingual Reference Need Assessment System for Wikipedia](https://sciencetostartup.com/paper/multilingual-reference-need-assessment-system-for-wikipedia) (8/10) — A multilingual machine learning system that assists Wikipedia editors in identifying claims needing citations, enhancing
- [MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals](https://sciencetostartup.com/paper/merlin-building-low-snr-robust-multimodal-llms-for-electromagnetic-signals) (8/10) — MERLIN is a robust MLLM framework for electromagnetic signals, enhanced for low-SNR environments, with a released dataset.
- [Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution](https://sciencetostartup.com/paper/med-v1-small-language-models-for-zero-shot-and-scalable-biomedical-evidence-attribution) (8/10) — Med-V1 is a family of small language models that efficiently and accurately performs biomedical evidence attribution, of
- [MessyKitchens: Contact-rich object-level 3D scene reconstruction](https://sciencetostartup.com/paper/messykitchens-contact-rich-object-level-3d-scene-reconstruction) (8/10) — MessyKitchens offers a novel dataset and advanced methods for accurate 3D scene reconstruction in cluttered environments
- [Multi-turn Physics-informed Vision-language Model for Physics-grounded Anomaly Detection](https://sciencetostartup.com/paper/multi-turn-physics-informed-vision-language-model-for-physics-grounded-anomaly-detection) (8/10) — A physics-informed vision-language model for robust anomaly detection in dynamic systems.
- [MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents](https://sciencetostartup.com/paper/ma-egoqa-question-answering-over-egocentric-videos-from-multiple-embodied-agents) (8/10) — MA-EgoQA enables effective question answering over multiple egocentric videos from embodied agents, enhancing human-agen
- [LLM2Vec-Gen: Generative Embeddings from Large Language Models](https://sciencetostartup.com/paper/llm2vec-gen-generative-embeddings-from-large-language-models) (8/10) — LLM2Vec-Gen leverages self-supervised learning to create generative embeddings from large language models, enhancing per
- [MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design](https://sciencetostartup.com/paper/mac-amp-a-closed-loop-multi-agent-collaboration-system-for-multi-objective-antimicrobial-peptide-design) (8/10) — Advanced AI-driven system for designing effective and non-toxic antimicrobial peptides against resistant pathogens.
- [Lifelong Imitation Learning with Multimodal Latent Replay and Incremental Adjustment](https://sciencetostartup.com/paper/lifelong-imitation-learning-with-multimodal-latent-replay-and-incremental-adjustment) (8/10) — A new framework for lifelong imitation learning enabling adaptive robot behavior across evolving tasks using multimodal 
- [Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data](https://sciencetostartup.com/paper/learning-athletic-humanoid-tennis-skills-from-imperfect-human-motion-data) (8/10) — LATENT enables humanoid robots to learn tennis skills from imperfect human motion data, achieving robust performance in 
- [LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph Learning](https://sciencetostartup.com/paper/lion-a-clifford-neural-paradigm-for-multimodal-attributed-graph-learning) (8/10) — Develop a multimodal-attributed graph learning tool using Clifford algebra to enhance data representation and performance.
- [Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification](https://sciencetostartup.com/paper/multimodal-mixture-of-experts-with-retrieval-augmentation-for-protein-active-site-identification) (8/10) — MERA leverages retrieval-augmented, multimodal mixture-of-experts for state-of-the-art protein active site identification.
- [Multimodal OCR: Parse Anything from Documents](https://sciencetostartup.com/paper/multimodal-ocr-parse-anything-from-documents) (8/10) — A next-gen OCR system that parses documents into structured text and graphics for seamless integration and data retrieval.
- [KARL: Knowledge Agents via Reinforcement Learning](https://sciencetostartup.com/paper/karl-knowledge-agents-via-reinforcement-learning) (8/10) — KARL leverages innovative reinforcement learning for affordable, high-performance enterprise search agents.
- [Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning](https://sciencetostartup.com/paper/interpretable-traffic-responsibility-from-dashcam-video-via-legal-multi-agent-reasoning) (8/10) — C-TRAIL transforms dashcam video evidence into legal responsibility assessments using a multimodal approach.
- [KCoEvo: A Knowledge Graph Augmented Framework for Evolutionary Code Generation](https://sciencetostartup.com/paper/kcoevo-a-knowledge-graph-augmented-framework-for-evolutionary-code-generation) (8/10) — KCoEvo is a knowledge graph-augmented framework that helps developers automatically migrate code when APIs evolve, impro
- [Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG](https://sciencetostartup.com/paper/improving-through-interaction-searching-behavioral-representation-spaces-with-cma-es-ig) (8/10) — CMA-ES-IG enhances robot user interaction by optimizing preference learning through user-friendly behavior rankings.
- [Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent](https://sciencetostartup.com/paper/human-ai-co-reasoning-for-clinical-diagnosis-with-evidence-integrated-language-agent) (8/10) — PULSE is a medical reasoning agent that enhances diagnostic decision-making by integrating a domain-tuned language model
- [IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation](https://sciencetostartup.com/paper/imse-intrinsic-mixture-of-spectral-experts-fine-tuning-for-test-time-adaptation) (8/10) — IMSE leverages spectral experts in Vision Transformers for test-time adaptation, offering state-of-the-art performance w
- [KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models](https://sciencetostartup.com/paper/kdflow-a-user-friendly-and-efficient-knowledge-distillation-framework-for-large-language-models) (8/10) — KDFlow streamlines the distillation of large language models with a novel, efficient framework featuring user-friendly A
- [HiMemVLN: Enhancing Reliability of Open-Source Zero-Shot Vision-and-Language Navigation with Hierarchical Memory System](https://sciencetostartup.com/paper/himemvln-enhancing-reliability-of-open-source-zero-shot-vision-and-language-navigation-with-hierarchical-memory-system) (8/10) — HiMemVLN enhances open-source vision-language navigation by addressing Navigation Amnesia with a Hierarchical Memory System.
- [HeteroFedSyn: Differentially Private Tabular Data Synthesis for Heterogeneous Federated Settings](https://sciencetostartup.com/paper/heterofedsyn-differentially-private-tabular-data-synthesis-for-heterogeneous-federated-settings) (8/10) — HeteroFedSyn is a framework for differentially private tabular data synthesis in heterogeneous federated settings, enabl
- [High-Fidelity Medical Shape Generation via Skeletal Latent Diffusion](https://sciencetostartup.com/paper/high-fidelity-medical-shape-generation-via-skeletal-latent-diffusion) (8/10) — Generate high-fidelity 3D medical shapes from a latent diffusion model, enabling faster and more accurate anatomical modeling.
- [GNNVerifier: Graph-based Verifier for LLM Task Planning](https://sciencetostartup.com/paper/gnnverifier-graph-based-verifier-for-llm-task-planning) (8/10) — GNNVerifier enhances task planning for LLMs by using a graph-based approach to identify and correct flaws in generated plans.
- [Generative Visual Code Mobile World Models](https://sciencetostartup.com/paper/generative-visual-code-mobile-world-models) (8/10) — Build the next generation of mobile GUI agents with gWorld: efficient, high-fidelity, code-generating visual world model
- [GUITester: Enabling GUI Agents for Exploratory Defect Discovery](https://sciencetostartup.com/paper/guitester-enabling-gui-agents-for-exploratory-defect-discovery) (8/10) — A multi-agent framework for autonomous exploratory GUI testing that significantly outperforms existing methods.
- [High-Fidelity Pruning for Large Language Models](https://sciencetostartup.com/paper/high-fidelity-pruning-for-large-language-models) (8/10) — Efficiently prune large language models using information entropy to reduce computational costs without sacrificing performance.
- [SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL](https://sciencetostartup.com/paper/skinflow-efficient-information-transmission-for-open-dermatological-diagnosis-via-dynamic-visual-encoding-and-staged-rl) (8/10) — SkinFlow revolutionizes dermatological diagnosis with efficient visual encoding and reinforcement learning.
- [Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation](https://sciencetostartup.com/paper/size-matters-reconstructing-real-scale-3d-models-from-monocular-images-for-food-portion-estimation) (8/10) — Precision nutrition tool for accurate food portion estimation using true-to-scale 3D models from photos.
- [SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks](https://sciencetostartup.com/paper/skillsbench-benchmarking-how-well-agent-skills-work-across-diverse-tasks) (8/10) — SkillsBench evaluates the effectiveness of procedural Skills in boosting LLM agent task performance.
- [Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization](https://sciencetostartup.com/paper/frequency-matters-fast-model-agnostic-data-curation-for-pruning-and-quantization) (8/10) — ZipCal is a fast, model-agnostic data curation strategy for optimizing calibration data in model compression.
- [Follow the Saliency: Supervised Saliency for Retrieval-augmented Dense Video Captioning](https://sciencetostartup.com/paper/follow-the-saliency-supervised-saliency-for-retrieval-augmented-dense-video-captioning) (8/10) — STaRC enhances Dense Video Captioning by using supervised saliency for improved temporal segmentation and caption generation.
- [From Natural Language to Executable Option Strategies via Large Language Models](https://sciencetostartup.com/paper/from-natural-language-to-executable-option-strategies-via-large-language-models) (8/10) — Transform natural language trading intents into executable option strategies using a novel query language and LLMs.
- [Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients](https://sciencetostartup.com/paper/fine-grained-post-training-quantization-for-large-vision-language-models-with-quantization-aware-integrated-gradients) (8/10) — A fine-grained quantization strategy for large vision language models that enhances accuracy while reducing computationa
- [Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery](https://sciencetostartup.com/paper/fast-sam-3d-body-accelerating-sam-3d-body-for-real-time-full-body-human-mesh-recovery) (8/10) — Fast SAM 3D Body accelerates real-time full-body human mesh recovery for interactive applications.
- [Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA](https://sciencetostartup.com/paper/fanar-sadiq-a-multi-agent-architecture-for-grounded-islamic-qa) (8/10) — Fanar-Sadiq is a multi-agent Islamic assistant that provides grounded answers to religious queries, accessible via API a
- [Fast-WAM: Do World Action Models Need Test-time Future Imagination?](https://sciencetostartup.com/paper/fast-wam-do-world-action-models-need-test-time-future-imagination) (8/10) — Fast-WAM optimizes embodied control by eliminating test-time future imagination while maintaining competitive performanc
- [FINER: MLLMs Hallucinate under Fine-grained Negative Queries](https://sciencetostartup.com/paper/finer-mllms-hallucinate-under-fine-grained-negative-queries) (8/10) — FINER addresses hallucinations in multimodal large language models through innovative fine-grained negative queries and 
- [From Offline to Periodic Adaptation for Pose-Based Shoplifting Detection in Real-world Retail Security](https://sciencetostartup.com/paper/from-offline-to-periodic-adaptation-for-pose-based-shoplifting-detection-in-real-world-retail-security) (8/10) — Pose-based anomaly detection tool for shoplifting, leveraging IoT devices for low-latency monitoring in retail environments.
- [Federated Active Learning Under Extreme Non-IID and Global Class Imbalance](https://sciencetostartup.com/paper/federated-active-learning-under-extreme-non-iid-and-global-class-imbalance) (8/10) — FairFAL is an adaptive federated active learning framework that enhances performance in class-imbalanced and non-IID settings.
- [FiLoRA: Focus-and-Ignore LoRA for Controllable Feature Reliance](https://sciencetostartup.com/paper/filora-focus-and-ignore-lora-for-controllable-feature-reliance) (8/10) — FiLoRA offers controllable feature reliance for robust multimodal model predictions using parameter-efficient adaptation
- [FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling](https://sciencetostartup.com/paper/flashprefill-instantaneous-pattern-discovery-and-thresholding-for-ultra-fast-long-context-prefilling) (8/10) — FlashPrefill accelerates long-context LLM prefilling by 27x with a novel pattern discovery and thresholding technique, o
- [FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding](https://sciencetostartup.com/paper/fluxmem-adaptive-hierarchical-memory-for-streaming-video-understanding) (8/10) — FluxMem offers real-time adaptive video compression and understanding for resource-efficient streaming applications.
- [Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis](https://sciencetostartup.com/paper/forest-chat-adapting-vision-language-agents-for-interactive-forest-change-analysis) (8/10) — Forest-Chat: An interactive AI tool for forest change analysis using vision-language models to enhance environmental mon
- [From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation](https://sciencetostartup.com/paper/from-isolated-scoring-to-collaborative-ranking-a-comparison-native-framework-for-llm-based-paper-evaluation) (8/10) — A novel framework for collaborative ranking of scientific papers using LLMs to enhance evaluation accuracy.
- [GAVEL: Towards rule-based safety through activation monitoring](https://sciencetostartup.com/paper/gavel-towards-rule-based-safety-through-activation-monitoring) (8/10) — GAVEL offers an interpretable, customizable rule-based safety framework for real-time activation monitoring in LLMs.
- [Generative Video Compression with One-Dimensional Latent Representation](https://sciencetostartup.com/paper/generative-video-compression-with-one-dimensional-latent-representation) (8/10) — GVC1D revolutionizes video compression by using a compact one-dimensional latent representation to enhance efficiency an
- [Urban Socio-Semantic Segmentation with Vision-Language Reasoning](https://sciencetostartup.com/paper/urban-socio-semantic-segmentation-with-vision-language-reasoning) (8/10) — Revolutionizing urban planning with advanced socio-semantic segmentation from satellite imagery using vision-language mo
- [GMT: Goal-Conditioned Multimodal Transformer for 6-DOF Object Trajectory Synthesis in 3D Scenes](https://sciencetostartup.com/paper/gmt-goal-conditioned-multimodal-transformer-for-6-dof-object-trajectory-synthesis-in-3d-scenes) (8/10) — GMT is a multimodal transformer that generates realistic 6-DOF object manipulation trajectories for robots in complex 3D
- [Gradually Excavating External Knowledge for Implicit Complex Question Answering](https://sciencetostartup.com/paper/gradually-excavating-external-knowledge-for-implicit-complex-question-answering) (8/10) — A framework for open-domain complex question answering that iteratively acquires external information and reasons based 
- [Halfway to 3D: Ensembling 2.5D and 3D Models for Robust COVID-19 CT Diagnosis](https://sciencetostartup.com/paper/halfway-to-3d-ensembling-2-5d-and-3d-models-for-robust-covid-19-ct-diagnosis) (8/10) — A deep learning framework that enhances COVID-19 detection from chest CT scans by integrating 2.5D and 3D models for imp
- [HG-Lane: High-Fidelity Generation of Lane Scenes under Adverse Weather and Lighting Conditions without Re-annotation](https://sciencetostartup.com/paper/hg-lane-high-fidelity-generation-of-lane-scenes-under-adverse-weather-and-lighting-conditions-without-re-annotation) (8/10) — HG-Lane generates high-fidelity lane scenes under adverse conditions to improve autonomous vehicle safety without re-ann
- [HIFICL: High-Fidelity In-Context Learning for Multimodal Tasks](https://sciencetostartup.com/paper/hificl-high-fidelity-in-context-learning-for-multimodal-tasks) (8/10) — HIFICL enhances In-Context Learning for multimodal tasks with a novel approach to context modeling.
- [HMR-1: Hierarchical Massage Robot with Vision-Language-Model for Embodied Healthcare](https://sciencetostartup.com/paper/hmr-1-hierarchical-massage-robot-with-vision-language-model-for-embodied-healthcare) (8/10) — A hierarchical massage robot leveraging vision-language models to enhance physical therapy and rehabilitation.
- [HSC-VLA: Hierarchical Scene-Clearing for Robust Bimanual Manipulation in Dense Clutter](https://sciencetostartup.com/paper/hsc-vla-hierarchical-scene-clearing-for-robust-bimanual-manipulation-in-dense-clutter) (8/10) — HSC-VLA is a hierarchical framework that improves robot manipulation in cluttered environments by decoupling high-level 
- [ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA](https://sciencetostartup.com/paper/id-lora-identity-driven-audio-video-personalization-with-in-context-lora) (8/10) — ID-LoRA personalizes audio and video together using a single model driven by text prompts and reference media.
- [Impermanent: A Live Benchmark for Temporal Generalization in Time Series Forecasting](https://sciencetostartup.com/paper/impermanent-a-live-benchmark-for-temporal-generalization-in-time-series-forecasting) (8/10) — Impermanent provides a live benchmark and dashboard for evaluating time-series forecasting models, enabling real-time pe
- [Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification](https://sciencetostartup.com/paper/inference-time-scaling-of-verification-self-evolving-deep-research-agents-via-test-time-rubric-guided-verification) (8/10) — A test-time, rubric-guided verification system that lets self-improving deep research agents (DRAs) improve their performance.
- [InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing](https://sciencetostartup.com/paper/interedit-navigating-text-guided-multi-human-3d-motion-editing) (8/10) — InterEdit enables advanced text-guided multi-human 3D motion editing with a new dataset and state-of-the-art performance
- [Is this Idea Novel? An Automated Benchmark for Judgment of Research Ideas](https://sciencetostartup.com/paper/is-this-idea-novel-an-automated-benchmark-for-judgment-of-research-ideas) (8/10) — RINoBench offers an automated benchmark for evaluating the novelty of research ideas, streamlining the assessment proces
- [KAN-FIF: Spline-Parameterized Lightweight Physics-based Tropical Cyclone Estimation on Meteorological Satellite](https://sciencetostartup.com/paper/kan-fif-spline-parameterized-lightweight-physics-based-tropical-cyclone-estimation-on-meteorological-satellite) (8/10) — Develop a lightweight, high-performance AI tool for tropical cyclone monitoring on edge devices.
- [LaMoGen: Language to Motion Generation Through LLM-Guided Symbolic Inference](https://sciencetostartup.com/paper/lamogen-language-to-motion-generation-through-llm-guided-symbolic-inference) (8/10) — LaMoGen leverages symbolic reasoning to generate interpretable and linguistically grounded human motion from text.
- [Learn Structure, Adapt on the Fly: Multi-Scale Residual Learning and Online Adaptation for Aerial Manipulators](https://sciencetostartup.com/paper/learn-structure-adapt-on-the-fly-multi-scale-residual-learning-and-online-adaptation-for-aerial-manipulators) (8/10) — A predictive-adaptive framework for real-time modeling and compensation in autonomous aerial manipulators.
- [Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory](https://sciencetostartup.com/paper/learning-query-aware-budget-tier-routing-for-runtime-agent-memory) (8/10) — BudgetMem provides a runtime memory framework for LLMs with query-aware budget-tier routing to optimize performance-cost
- [Less is More: Decoder-Free Masked Modeling for Efficient Skeleton Representation Learning](https://sciencetostartup.com/paper/less-is-more-decoder-free-masked-modeling-for-efficient-skeleton-representation-learning) (8/10) — SLiM is a novel framework for efficient skeleton-based action representation learning that eliminates the need for decod
- [LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction](https://sciencetostartup.com/paper/llm-as-rnn-a-recurrent-language-model-for-memory-updates-and-sequence-prediction) (8/10) — Turn frozen LLMs into error-correcting, recurrent sequence predictors with interpretable memory updates.
- [LLM-Augmented Release Intelligence: Automated Change Summarization and Impact Analysis in Cloud-Native CI/CD Pipelines](https://sciencetostartup.com/paper/llm-augmented-release-intelligence-automated-change-summarization-and-impact-analysis-in-cloud-native-ci-cd-pipelines) (8/10) — An AI-driven framework that automates change summarization and impact analysis for cloud-native CI/CD pipelines.
- [LLMs can Compress LLMs: Adaptive Pruning by Agents](https://sciencetostartup.com/paper/llms-can-compress-llms-adaptive-pruning-by-agents) (8/10) — AI agent-guided pruning enhances LLM compression without retraining, reducing costs while retaining performance.
- [M2P: Improving Visual Foundation Models with Mask-to-Point Weakly-Supervised Learning for Dense Point Tracking](https://sciencetostartup.com/paper/m2p-improving-visual-foundation-models-with-mask-to-point-weakly-supervised-learning-for-dense-point-tracking) (8/10) — M2P enhances visual foundation models for dense point tracking using weakly-supervised learning with video object segmen
- [MANSION: Multi-floor lANguage-to-3D Scene generatIOn for loNg-horizon tasks](https://sciencetostartup.com/paper/mansion-multi-floor-language-to-3d-scene-generation-for-long-horizon-tasks) (8/10) — MANSION is a language-driven framework for generating complex multi-floor 3D environments for robotic tasks.
- [MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing](https://sciencetostartup.com/paper/masfactory-a-graph-centric-framework-for-orchestrating-llm-based-multi-agent-systems-with-vibe-graphing) (8/10) — MASFactory is a graph-centric framework that simplifies the creation and orchestration of LLM-based multi-agent systems 
- [MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering](https://sciencetostartup.com/paper/medsteer-counterfactual-endoscopic-synthesis-via-training-free-activation-steering) (8/10) — MedSteer is a training-free activation-steering framework for endoscopic synthesis, enabling counterfactual data generat
- [MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal](https://sciencetostartup.com/paper/mer-bench-a-comprehensive-benchmark-for-multimodal-meme-reappraisal) (8/10) — MER-Bench enables the transformation of negative memes into constructive ones through emotion-controllable multimodal ge
- [MIL-PF: Multiple Instance Learning on Precomputed Features for Mammography Classification](https://sciencetostartup.com/paper/mil-pf-multiple-instance-learning-on-precomputed-features-for-mammography-classification) (8/10) — MIL-PF is a scalable framework for efficient mammography classification using precomputed features and lightweight aggre
- [MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery](https://sciencetostartup.com/paper/mmai-gym-for-science-training-liquid-foundation-models-for-drug-discovery) (8/10) — MMAI Gym for Science provides a tailored platform to efficiently train and deploy Liquid Foundation Models for high-perf
- [MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation](https://sciencetostartup.com/paper/molmob0t-large-scale-simulation-enables-zero-shot-manipulation) (8/10) — MolmoBot enables effective zero-shot manipulation in robotics using large-scale simulated data.
- [MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning](https://sciencetostartup.com/paper/mssr-memory-aware-adaptive-replay-for-continual-llm-fine-tuning) (8/10) — MSSR is an adaptive replay framework for continual fine-tuning of LLMs that mitigates catastrophic forgetting while ensu
- [NOIR: Neural Operator mapping for Implicit Representations](https://sciencetostartup.com/paper/noir-neural-operator-mapping-for-implicit-representations) (8/10) — NOIR uses neural operators to perform resolution-independent transformations in medical imaging.
- [NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models](https://sciencetostartup.com/paper/nsr-boost-a-neuro-symbolic-residual-boosting-framework-for-industrial-legacy-models) (8/10) — Upgrade legacy industrial models with NSR-Boost for smarter risk management without intrusive retraining costs.
- [Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation](https://sciencetostartup.com/paper/omni-i2c-a-holistic-benchmark-for-high-fidelity-image-to-code-generation) (8/10) — Omni-I2C is a benchmark for evaluating Large Multimodal Models in generating executable code from complex digital graphi
- [One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers](https://sciencetostartup.com/paper/one-model-many-budgets-elastic-latent-interfaces-for-diffusion-transformers) (8/10) — ELIT enhances diffusion transformers by optimizing compute allocation through a dynamic latent interface.
- [Open-Source Reproduction and Explainability Analysis of Corrective Retrieval Augmented Generation](https://sciencetostartup.com/paper/open-source-reproduction-and-explainability-analysis-of-corrective-retrieval-augmented-generation) (8/10) — An open-source implementation of Corrective Retrieval Augmented Generation that enhances robustness and explainability i
- [Optimizing Prompts for Large Language Models: A Causal Approach](https://sciencetostartup.com/paper/optimizing-prompts-for-large-language-models-a-causal-approach) (8/10) — Causal Prompt Optimization offers a robust method to tailor LLM prompts for specific queries, enhancing enterprise workf
- [OSM-based Domain Adaptation for Remote Sensing VLMs](https://sciencetostartup.com/paper/osm-based-domain-adaptation-for-remote-sensing-vlms) (8/10) — OSMDA is a self-contained domain adaptation framework for Vision-Language Models that eliminates the need for costly ext
- [OV-DEIM: Real-time DETR-Style Open-Vocabulary Object Detection with GridSynthetic Augmentation](https://sciencetostartup.com/paper/ov-deim-real-time-detr-style-open-vocabulary-object-detection-with-gridsynthetic-augmentation) (8/10) — OV-DEIM is a real-time, DETR-style open-vocabulary object detector with state-of-the-art performance and available code,
- [Parallel-in-Time Nonlinear Optimal Control via GPU-native Sequential Convex Programming](https://sciencetostartup.com/paper/parallel-in-time-nonlinear-optimal-control-via-gpu-native-sequential-convex-programming) (8/10) — A GPU-native trajectory optimization framework for real-time nonlinear control in autonomous systems.
- [PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs](https://sciencetostartup.com/paper/pathmem-toward-cognition-aligned-memory-transformation-for-pathology-mllms) (8/10) — PathMem is a memory-centric multimodal framework that enhances pathology MLLMs by integrating structured knowledge for i
- [PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration](https://sciencetostartup.com/paper/persianpunc-a-large-scale-dataset-and-bert-based-approach-for-persian-punctuation-restoration) (8/10) — PersianPunc restores punctuation in Persian text with a lightweight BERT model, outperforming LLMs in accuracy and effic
- [PhasorFlow: A Python Library for Unit Circle Based Computing](https://sciencetostartup.com/paper/phasorflow-a-python-library-for-unit-circle-based-computing) (8/10) — PhasorFlow is an open-source Python library for efficient unit circle-based computing, enabling advanced predictive lear
- [PolyFormer: learning efficient reformulations for scalable optimization under complex physical constraints](https://sciencetostartup.com/paper/polyformer-learning-efficient-reformulations-for-scalable-optimization-under-complex-physical-constraints) (8/10) — PolyFormer simplifies constrained optimization problems by learning geometric structures and transforming them into effi
- [ProAct: Agentic Lookahead in Interactive Environments](https://sciencetostartup.com/paper/proact-agentic-lookahead-in-interactive-environments) (8/10) — ProAct enables AI agents to excel in long-horizon planning with enhanced lookahead reasoning and stable decision-making.
- [Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity](https://sciencetostartup.com/paper/prune-redundancy-preserve-essence-vision-token-compression-in-vlms-via-synergistic-importance-diversity) (8/10) — PruneSID optimizes visual token compression in vision-language models, enhancing efficiency and performance.
- [QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model](https://sciencetostartup.com/paper/qusr-quality-aware-and-uncertainty-guided-image-super-resolution-diffusion-model) (8/10) — QUSR is a novel diffusion model for high-quality image super-resolution that adapts noise levels based on uncertainty.
- [Real-Time Drone Detection in Event Cameras via Per-Pixel Frequency Analysis](https://sciencetostartup.com/paper/real-time-drone-detection-in-event-cameras-via-per-pixel-frequency-analysis) (8/10) — Real-time drone detection API using event camera data and frequency analysis, outperforming YOLO in accuracy and latency
- [Reasoning-guided Collaborative Filtering with Language Models for Explainable Recommendation](https://sciencetostartup.com/paper/reasoning-guided-collaborative-filtering-with-language-models-for-explainable-recommendation) (8/10) — Develop an efficient and scalable explainable recommendation system using reasoning-guided collaborative filtering with 
- [RedSage: A Cybersecurity Generalist LLM](https://sciencetostartup.com/paper/redsage-a-cybersecurity-generalist-llm) (8/10) — RedSage is an open-source cybersecurity assistant LLM with domain-aware capabilities, surpassing benchmarks and ensuring
- [ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly](https://sciencetostartup.com/paper/retac-act-a-state-gated-vision-tactile-fusion-transformer-for-precision-assembly) (8/10) — ReTac-ACT enhances precision in robotic assembly by seamlessly integrating vision and tactile feedback.
- [Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression](https://sciencetostartup.com/paper/revisiting-chebyshev-polynomial-and-anisotropic-rbf-models-for-tabular-regression) (8/10) — Develop Scikit-learn-compatible smooth-basis models for improved generalization in CPU-constrained tabular regression ta
- [Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons](https://sciencetostartup.com/paper/robometer-scaling-general-purpose-robotic-reward-models-via-trajectory-comparisons) (8/10) — Robometer offers scalable robot reward modeling using trajectory comparisons for enhanced automation learning.
- [Rotation Equivariant Mamba for Vision Tasks](https://sciencetostartup.com/paper/rotation-equivariant-mamba-for-vision-tasks) (8/10) — EQ-VMamba introduces a rotation equivariant architecture for vision tasks, enhancing robustness and efficiency in visual
- [RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning](https://sciencetostartup.com/paper/ruleplanner-all-in-one-reinforcement-learner-for-unifying-design-rules-in-3d-floorplanning) (8/10) — An all-in-one deep reinforcement learning tool that unifies design rules in 3D IC floorplanning.
- [SaPaVe: Towards Active Perception and Manipulation in Vision-Language-Action Models for Robotics](https://sciencetostartup.com/paper/sapave-towards-active-perception-and-manipulation-in-vision-language-action-models-for-robotics) (8/10) — SaPaVe is an end-to-end framework that enhances robotic interaction through unified active perception and manipulation.
- [SceneAssistant: A Visual Feedback Agent for Open-Vocabulary 3D Scene Generation](https://sciencetostartup.com/paper/sceneassistant-a-visual-feedback-agent-for-open-vocabulary-3d-scene-generation) (8/10) — SceneAssistant transforms text commands into high-quality 3D scenes with minimal user input.
- [SecureRAG-RTL: A Retrieval-Augmented, Multi-Agent, Zero-Shot LLM-Driven Framework for Hardware Vulnerability Detection](https://sciencetostartup.com/paper/securerag-rtl-a-retrieval-augmented-multi-agent-zero-shot-llm-driven-framework-for-hardware-vulnerability-detection) (8/10) — SecureRAG-RTL enhances LLM-based hardware vulnerability detection by 30% using RAG and a curated HDL dataset, enabling s
- [See and Switch: Vision-Based Branching for Interactive Robot-Skill Programming](https://sciencetostartup.com/paper/see-and-switch-vision-based-branching-for-interactive-robot-skill-programming) (8/10) — See & Switch enables intuitive robot programming via vision-based branching, allowing for efficient in-situ recovery dem
- [SegviGen: Repurposing 3D Generative Model for Part Segmentation](https://sciencetostartup.com/paper/segvigen-repurposing-3d-generative-model-for-part-segmentation) (8/10) — SegviGen repurposes 3D generative models for efficient part segmentation with minimal training data.
- [Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild](https://sciencetostartup.com/paper/shape-of-you-fused-gromov-wasserstein-optimal-transport-for-semantic-correspondence-in-the-wild) (8/10) — Shape-of-You offers a novel approach to semantic correspondence using Fused Gromov-Wasserstein optimal transport, achiev
- [ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling](https://sciencetostartup.com/paper/shuttleenv-an-interactive-data-driven-rl-environment-for-badminton-strategy-modeling) (8/10) — ShuttleEnv is an interactive simulation environment for badminton that leverages reinforcement learning to model strateg
- [SignSparK: Efficient Multilingual Sign Language Production via Sparse Keyframe Learning](https://sciencetostartup.com/paper/signspark-efficient-multilingual-sign-language-production-via-sparse-keyframe-learning) (8/10) — Efficiently generate natural multilingual sign language avatars with sparse keyframe learning.
- [Facial Expression Recognition Using Residual Masking Network](https://sciencetostartup.com/paper/facial-expression-recognition-using-residual-masking-network) (8/10) — Residual Masking Network enhances facial expression recognition by using a segmentation network to refine feature maps, 
- [Explainable Deep Learning for Pediatric Pneumonia Detection in Chest X-Ray Images](https://sciencetostartup.com/paper/explainable-deep-learning-for-pediatric-pneumonia-detection-in-chest-x-ray-images) (8/10) — AI-based diagnostic tool for accurate pediatric pneumonia detection using explainable deep learning.
- [FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models](https://sciencetostartup.com/paper/factcorrector-a-graph-inspired-approach-to-long-form-factuality-correction-of-large-language-models) (8/10) — FactCorrector offers a domain-adaptive solution for correcting factual errors in LLM outputs, backed by the VELI5 benchm
- [EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification](https://sciencetostartup.com/paper/evolvereason-self-evolving-reasoning-paradigm-for-explainable-deepfake-facial-image-identification) (8/10) — EvolveReason is an explainable deepfake detection system that uses reinforcement learning to iteratively improve its rea
- [EvolVE: Evolutionary Search for LLM-based Verilog Generation and Optimization](https://sciencetostartup.com/paper/evolve-evolutionary-search-for-llm-based-verilog-generation-and-optimization) (8/10) — EvolVE uses evolutionary algorithms to optimize Verilog generation, significantly improving hardware design efficiency.
- [Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory](https://sciencetostartup.com/paper/evolving-contextual-safety-in-multi-modal-large-language-models-via-inference-time-self-reflective-memory) (8/10) — EchoSafe enhances safety in multi-modal large language models by leveraging a self-reflective memory framework for conte
- [Fair Lung Disease Diagnosis from Chest CT via Gender-Adversarial Attention Multiple Instance Learning](https://sciencetostartup.com/paper/fair-lung-disease-diagnosis-from-chest-ct-via-gender-adversarial-attention-multiple-instance-learning) (8/10) — A fairness-aware framework for diagnosing lung diseases from chest CT scans using gender-adversarial attention.
- [ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling](https://sciencetostartup.com/paper/esainstod-a-unified-end-to-end-schema-aware-instruction-tuning-framework-for-task-oriented-dialog-modeling) (8/10) — ESAinsTOD is a unified framework that enhances task-oriented dialog systems through schema-aware instruction tuning.
- [Emulating Clinician Cognition via Self-Evolving Deep Clinical Research](https://sciencetostartup.com/paper/emulating-clinician-cognition-via-self-evolving-deep-clinical-research) (8/10) — DxEvolve is a self-evolving diagnostic agent that enhances clinical diagnosis through continuous learning and improved a
- [Evaluating Generative Models via One-Dimensional Code Distributions](https://sciencetostartup.com/paper/evaluating-generative-models-via-one-dimensional-code-distributions) (8/10) — Evaluate generative models using discrete visual tokens for improved perceptual quality assessment, offering a training-
- [Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning](https://sciencetostartup.com/paper/emergent-dexterity-via-diverse-resets-and-large-scale-reinforcement-learning) (8/10) — A scalable framework for robust reinforcement learning in dexterous manipulation tasks using minimal human input.
- [Efficient Cross-Architecture Knowledge Transfer for Large-Scale Online User Response Prediction](https://sciencetostartup.com/paper/efficient-cross-architecture-knowledge-transfer-for-large-scale-online-user-response-prediction) (8/10) — CrossAdapt offers an efficient method for deploying new architectures in online prediction systems by minimizing retrain
- [Empowering Locally Deployable Medical Agent via State Enhanced Logical Skills for FHIR-based Clinical Tasks](https://sciencetostartup.com/paper/empowering-locally-deployable-medical-agent-via-state-enhanced-logical-skills-for-fhir-based-clinical-tasks) (8/10) — SELSM enhances locally deployable medical agents by distilling simulated clinical trajectories into entity-agnostic oper
- [EventGeM: Global-to-Local Feature Matching for Event-Based Visual Place Recognition](https://sciencetostartup.com/paper/eventgem-global-to-local-feature-matching-for-event-based-visual-place-recognition) (8/10) — EventGeM is a real-time, state-of-the-art visual place recognition pipeline for event cameras, enabling accurate robotic
- [FairGU: Fairness-aware Graph Unlearning in Social Network](https://sciencetostartup.com/paper/fairgu-fairness-aware-graph-unlearning-in-social-network) (8/10) — FairGU offers a fairness-aware graph unlearning solution for safeguarding privacy and maintaining algorithmic fairness i
- [From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation](https://sciencetostartup.com/paper/from-passive-observer-to-active-critic-reinforcement-learning-elicits-process-reasoning-for-robotic-manipulation) (8/10) — PRIMO R1 transforms video MLLMs into active critics for enhanced robotic manipulation through process reasoning.
- [DuFal: Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-view CBCT Reconstruction](https://sciencetostartup.com/paper/dufal-dual-frequency-aware-learning-for-high-fidelity-extremely-sparse-view-cbct-reconstruction) (8/10) — DuFal offers a breakthrough system for high-fidelity sparse-view CBCT reconstruction, transforming medical imaging of fi
- [DSA-SRGS: Super-Resolution Gaussian Splatting for Dynamic Sparse-View DSA Reconstruction](https://sciencetostartup.com/paper/dsa-srgs-super-resolution-gaussian-splatting-for-dynamic-sparse-view-dsa-reconstruction) (8/10) — DSA-SRGS enhances resolution in dynamic 4D angiography models, improving cerebrovascular diagnosis precision.
- [DVD: Deterministic Video Depth Estimation with Generative Priors](https://sciencetostartup.com/paper/dvd-deterministic-video-depth-estimation-with-generative-priors) (8/10) — DVD is a state-of-the-art deterministic video depth estimation tool leveraging generative priors for 3D scene understand
- [Domain-Adaptation through Synthetic Data: Fine-Tuning Large Language Models for German Law](https://sciencetostartup.com/paper/domain-adaptation-through-synthetic-data-fine-tuning-large-language-models-for-german-law) (8/10) — Adapt large language models for German legal question answering using high-quality synthetic data.
- [dLLM: Simple Diffusion Language Modeling](https://sciencetostartup.com/paper/dllm-simple-diffusion-language-modeling) (8/10) — dLLM unifies diffusion language modeling components into a customizable, open-source framework for easy deployment and e
- [PureCLIP-Depth: Prompt-Free and Decoder-Free Monocular Depth Estimation within CLIP Embedding Space](https://sciencetostartup.com/paper/pureclip-depth-prompt-free-and-decoder-free-monocular-depth-estimation-within-clip-embedding-space) (8/10) — PureCLIP-Depth offers a novel, prompt-free method for monocular depth estimation leveraging CLIP embeddings.
- [DynHD: Hallucination Detection for Diffusion Large Language Models via Denoising Dynamics Deviation Learning](https://sciencetostartup.com/paper/dynhd-hallucination-detection-for-diffusion-large-language-models-via-denoising-dynamics-deviation-learning) (8/10) — DynHD offers a novel approach to detect hallucinations in diffusion large language models by analyzing token-level uncer
- [DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation](https://sciencetostartup.com/paper/dino-sae-dino-spherical-autoencoder-for-high-fidelity-image-reconstruction-and-generation) (8/10) — DINO-SAE is a high-fidelity image reconstruction and generation tool using hyperspherical model alignment.
- [DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping](https://sciencetostartup.com/paper/dexgrasp-zero-a-morphology-aligned-policy-for-zero-shot-cross-embodiment-dexterous-grasping) (8/10) — DexGrasp-Zero enables zero-shot dexterous grasping across diverse robotic hands using a novel morphology-aligned policy.
- [Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation](https://sciencetostartup.com/paper/directing-the-narrative-a-finetuning-method-for-controlling-coherence-and-style-in-story-generation) (8/10) — A novel framework for generating coherent and stylistically consistent story visuals using advanced attention mechanisms
- [DermaFlux: Synthetic Skin Lesion Generation with Rectified Flows for Enhanced Image Classification](https://sciencetostartup.com/paper/dermaflux-synthetic-skin-lesion-generation-with-rectified-flows-for-enhanced-image-classification) (8/10) — DermaFlux generates synthetic skin lesion images to enhance classification accuracy in dermatology.
- [DepthCache: Depth-Guided Training-Free Visual Token Merging for Vision-Language-Action Model Inference](https://sciencetostartup.com/paper/depthcache-depth-guided-training-free-visual-token-merging-for-vision-language-action-model-inference) (8/10) — DepthCache is a training-free framework that optimizes visual token merging for faster robotic manipulation without degr
- [Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems](https://sciencetostartup.com/paper/designing-production-scale-ocr-for-india-multilingual-and-domain-specific-systems) (8/10) — Multilingual, domain-specific OCR system for India's diverse documents with state-of-the-art results.
- [Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations](https://sciencetostartup.com/paper/disentangling-perception-and-reasoning-for-improving-data-efficiency-in-learning-cloth-manipulation-without-demonstratio) (8/10) — Develop a lightweight, efficient RL-based solution for robotic cloth manipulation, offering significant performance impr
- [E-MMKGR: A Unified Multimodal Knowledge Graph Framework for E-commerce Applications](https://sciencetostartup.com/paper/e-mmkgr-a-unified-multimodal-knowledge-graph-framework-for-e-commerce-applications) (8/10) — A new framework for e-commerce applications that unifies item representations using multimodal knowledge graphs to impro
- [Deep Learning-Based Early-Stage IR-Drop Estimation via CNN Surrogate Modeling](https://sciencetostartup.com/paper/deep-learning-based-early-stage-ir-drop-estimation-via-cnn-surrogate-modeling) (8/10) — Deep learning model for actionable early-stage IR-drop estimation in VLSI design.
- [DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining](https://sciencetostartup.com/paper/datedgpt-preventing-lookahead-bias-in-large-language-models-with-time-aware-pretraining) (8/10) — DatedGPT offers a solution to lookahead bias in financial forecasting by using time-aware pretraining of large language 
- [DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice](https://sciencetostartup.com/paper/deepasmr-llm-based-zero-shot-asmr-speech-generation-for-anyone-of-any-voice) (8/10) — DeepASMR enables anyone to synthesize zero-shot ASMR speech from ordinary samples, leveraging a new dataset and advanced
- [Cut to the Chase: Training-free Multimodal Summarization via Chain-of-Events](https://sciencetostartup.com/paper/cut-to-the-chase-training-free-multimodal-summarization-via-chain-of-events) (8/10) — CoE is a training-free multimodal summarization framework that leverages a chain-of-events guided by a hierarchical even
- [CrossEarth-SAR: A SAR-Centric and Billion-Scale Geospatial Foundation Model for Domain Generalizable Semantic Segmentation](https://sciencetostartup.com/paper/crossearth-sar-a-sar-centric-and-billion-scale-geospatial-foundation-model-for-domain-generalizable-semantic-segmentatio) (8/10) — A billion-scale SAR foundation model for domain-generalizable semantic segmentation of SAR imagery.
- [CVGL: Causal Learning and Geometric Topology](https://sciencetostartup.com/paper/cvgl-causal-learning-and-geometric-topology) (8/10) — CLGT is a framework that enhances cross-view geo-localization by integrating causal learning and geometric topology for 
- [DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing](https://sciencetostartup.com/paper/deepgen-1-0-a-lightweight-unified-multimodal-model-for-advancing-image-generation-and-editing) (8/10) — DeepGen 1.0 offers a lightweight but powerful multimodal model for image generation and editing, surpassing larger model
- [Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation](https://sciencetostartup.com/paper/cross-modal-attention-network-with-dual-graph-learning-in-multimodal-recommendation) (8/10) — Cross-modal recommendation engine enhancing user-item interactions for personalized content suggestions with dual graph 
- [CounterRefine: Answer-Conditioned Counterevidence Retrieval for Inference-Time Knowledge Repair in Factual Question Answering](https://sciencetostartup.com/paper/counterrefine-answer-conditioned-counterevidence-retrieval-for-inference-time-knowledge-repair-in-factual-question-answe) (8/10) — CounterRefine enhances factual question answering by refining answers through evidence retrieval and validation.
- [CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation](https://sciencetostartup.com/paper/crimson-a-clinically-grounded-llm-based-metric-for-generative-radiology-report-evaluation) (8/10) — CRIMSON is a clinically-grounded evaluation framework for chest X-ray report generation that prioritizes clinically cons
- [Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints](https://sciencetostartup.com/paper/controllable-egocentric-video-generation-via-occlusion-aware-sparse-3d-hand-joints) (8/10) — A novel framework for generating high-fidelity egocentric videos using sparse 3D hand joints for motion control.
- [A Family of LLMs Liberated from Static Vocabularies](https://sciencetostartup.com/paper/a-family-of-llms-liberated-from-static-vocabularies) (8/10) — A family of LLMs utilizing a novel hierarchical autoregressive transformer architecture to improve tokenization and lang
- [Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI](https://sciencetostartup.com/paper/conversational-demand-response-bidirectional-aggregator-prosumer-coordination-through-agentic-ai) (8/10) — Conversational Demand Response uses agentic AI to enable bidirectional communication between energy aggregators and pros
- [Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics](https://sciencetostartup.com/paper/cross-domain-policy-optimization-via-bellman-consistency-and-hybrid-critics) (8/10) — QAvatar enhances cross-domain reinforcement learning by effectively leveraging source-domain knowledge for improved tran
- [DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation](https://sciencetostartup.com/paper/deeppresenter-environment-grounded-reflection-for-agentic-presentation-generation) (8/10) — DeepPresenter offers an adaptive, feedback-driven AI framework for automated presentation generation and refinement.
- [EARCP: Self-Regulating Coherence-Aware Ensemble Architecture for Sequential Decision Making -- Ensemble Auto-Regule par Coherence et Performance](https://sciencetostartup.com/paper/earcp-self-regulating-coherence-aware-ensemble-architecture-for-sequential-decision-making-ensemble-auto-regule-par-cohe) (8/10) — EARCP is a self-regulating ensemble architecture that adapts model weights dynamically for improved sequential decision-
- [From Words to Worlds: Benchmarking Cross-Cultural Cultural Understanding in Machine Translation](https://sciencetostartup.com/paper/from-words-to-worlds-benchmarking-cross-cultural-cultural-understanding-in-machine-translation) (8/10) — CulT-Eval is a benchmark for evaluating machine translation of culturally grounded expressions, addressing gaps in curre
- [ComFree-Sim: A GPU-Parallelized Analytical Contact Physics Engine for Scalable Contact-Rich Robotics Simulation and Control](https://sciencetostartup.com/paper/comfree-sim-a-gpu-parallelized-analytical-contact-physics-engine-for-scalable-contact-rich-robotics-simulation-and-contr) (8/10) — ComFree-Sim is a GPU-parallelized contact physics engine that enhances robotics simulation and control with near-linear 
- [CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation](https://sciencetostartup.com/paper/coco-code-as-cot-for-text-to-image-preview-and-rare-concept-generation) (8/10) — CoCo is a code-driven text-to-image generation framework that uses executable code for precise and controllable image cr
- [CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling](https://sciencetostartup.com/paper/comet-collaborative-memory-transformer-for-efficient-long-context-modeling) (8/10) — CoMeT enables efficient long-context processing in existing Transformers with constant memory usage.
- [Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation](https://sciencetostartup.com/paper/cheers-decoupling-patch-details-from-semantic-representations-enables-unified-multimodal-comprehension-and-generation) (8/10) — CHEERS unifies multimodal comprehension and generation in a single model, delivering efficient, high-quality text and image generation.
- [CFEAR-Teach-and-Repeat: Fast and Accurate Radar-only Localization](https://sciencetostartup.com/paper/cfear-teach-and-repeat-fast-and-accurate-radar-only-localization) (8/10) — CFEAR-TR provides robust and accurate radar-only localization for autonomous navigation, offering a deployable solution 
- [CDF-Glove: A Cable-Driven Force Feedback Glove for Dexterous Teleoperation](https://sciencetostartup.com/paper/cdf-glove-a-cable-driven-force-feedback-glove-for-dexterous-teleoperation) (8/10) — A low-cost, cable-driven force feedback glove for dexterous teleoperation that significantly improves task success rates
- [CG-DMER: Hybrid Contrastive-Generative Framework for Disentangled Multimodal ECG Representation Learning](https://sciencetostartup.com/paper/cg-dmer-hybrid-contrastive-generative-framework-for-disentangled-multimodal-ecg-representation-learning) (8/10) — Develop a cutting-edge ECG analysis tool using a novel contrastive-generative framework to improve cardiovascular diagno
- [CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference](https://sciencetostartup.com/paper/chess-context-aware-hierarchical-efficient-semantic-selection-for-long-context-llm-inference) (8/10) — CHESS optimizes long-context LLM inference by drastically reducing KV cache demands, improving throughput by over 4x wit
- [CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering](https://sciencetostartup.com/paper/compactrag-reducing-llm-calls-and-token-overhead-in-multi-hop-question-answering) (8/10) — CompactRAG revolutionizes multi-hop question answering by reducing LLM calls and token overhead, offering a cost-efficie
- [Categorical Belief Propagation: Sheaf-Theoretic Inference via Descent and Holonomy](https://sciencetostartup.com/paper/categorical-belief-propagation-sheaf-theoretic-inference-via-descent-and-holonomy) (8/10) — Develop an advanced belief propagation tool utilizing sheaf-theoretic inference for faster and exact complex graph analy
- [CI4A: Semantic Component Interfaces for Agents Empowering Web Automation](https://sciencetostartup.com/paper/ci4a-semantic-component-interfaces-for-agents-empowering-web-automation) (8/10) — Leverage CI4A to empower web agents with enhanced semantic integration for efficient UI manipulation.
- [Invisible failures in human-AI interactions](https://sciencetostartup.com/paper/invisible-failures-in-human-ai-interactions) (8/10) — A taxonomy of invisible AI failures to enhance reliability in human-AI interactions.
- [Classifier Pooling for Modern Ordinal Classification](https://sciencetostartup.com/paper/classifier-pooling-for-modern-ordinal-classification) (8/10) — A model-agnostic method for ordinal classification that enhances performance using modern machine learning techniques.
- [COAD: Constant-Time Planning for Continuous Goal Manipulation with Compressed Library and Online Adaptation](https://sciencetostartup.com/paper/coad-constant-time-planning-for-continuous-goal-manipulation-with-compressed-library-and-online-adaptation) (8/10) — COAD enables constant-time planning for robotic manipulation tasks by using a compressed library and online adaptation.
- [Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness](https://sciencetostartup.com/paper/cognitively-layered-data-synthesis-for-domain-adaptation-of-llms-to-space-situational-awareness) (8/10) — A framework for generating high-quality fine-tuning datasets for LLMs in space situational awareness.
- [Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass](https://sciencetostartup.com/paper/coherent-human-scene-reconstruction-from-multi-person-multi-view-video-in-a-single-pass) (8/10) — CHROMM offers a unified framework for real-time human-scene reconstruction from multi-view videos without preprocessing.
- [ConceptCaps -- a Distilled Concept Dataset for Interpretability in Music Models](https://sciencetostartup.com/paper/conceptcaps-a-distilled-concept-dataset-for-interpretability-in-music-models) (8/10) — A new dataset, ConceptCaps, facilitates improved interpretability of music models using clearly labeled music-caption-au
- [Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation](https://sciencetostartup.com/paper/leveling3d-leveling-up-3d-reconstruction-with-feed-forward-3d-gaussian-splatting-and-geometry-aware-generation) (8/10) — Leveling3D enhances 3D reconstruction by integrating geometry-aware generation for improved novel-view synthesis.
- [CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization](https://sciencetostartup.com/paper/constant-towards-high-quality-one-shot-handwriting-generation-with-patch-contrastive-enhancement-and-style-aware-quantiz) (8/10) — Generate realistic handwriting from a single reference image using a novel diffusion model with style-aware quantization
- [Controllable Complex Human Motion Video Generation via Text-to-Skeleton Cascades](https://sciencetostartup.com/paper/controllable-complex-human-motion-video-generation-via-text-to-skeleton-cascades) (8/10) — Generate controllable human motion videos from text using a cascaded text-to-skeleton and pose-conditioned diffusion mod
- [CORE-Acu: Structured Reasoning Traces and Knowledge Graph Safety Verification for Acupuncture Clinical Decision Support](https://sciencetostartup.com/paper/core-acu-structured-reasoning-traces-and-knowledge-graph-safety-verification-for-acupuncture-clinical-decision-support) (8/10) — CORE-Acu provides a safe and interpretable AI-powered clinical decision support system for acupuncture, leveraging struc
- [Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning](https://sciencetostartup.com/paper/cosmos-policy-fine-tuning-video-models-for-visuomotor-control-and-planning) (8/10) — Cosmos Policy transforms pretrained video models into efficient robot control policies, offering breakthrough visuomotor
- [CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification](https://sciencetostartup.com/paper/cove-training-interactive-tool-use-agents-via-constraint-guided-verification) (8/10) — CoVe offers a robust framework for generating high-quality training data for interactive tool-use agents, outperforming 
- [CRAFT: A Tendon-Driven Hand with Hybrid Hard-Soft Compliance](https://sciencetostartup.com/paper/craft-a-tendon-driven-hand-with-hybrid-hard-soft-compliance) (8/10) — CRAFT is an open-source tendon-driven anthropomorphic hand designed for efficient contact-rich manipulation.
- [Cross-modal learning for plankton recognition](https://sciencetostartup.com/paper/cross-modal-learning-for-plankton-recognition) (8/10) — A self-supervised cross-modal approach for efficient plankton recognition using minimal labeled data.
- [CrossADR: enhancing adverse drug reactions prediction for combination pharmacotherapy with cross-layer feature integration and cross-level associative learning](https://sciencetostartup.com/paper/crossadr-enhancing-adverse-drug-reactions-prediction-for-combination-pharmacotherapy-with-cross-layer-feature-integratio) (8/10) — CrossADR enhances adverse drug reactions prediction for combination pharmacotherapy using advanced graph neural networks
- [CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model](https://sciencetostartup.com/paper/cupid-a-plug-in-framework-for-joint-aleatoric-and-epistemic-uncertainty-estimation-with-a-single-model) (8/10) — CUPID is a plug-in framework that enables joint estimation of aleatoric and epistemic uncertainty in deep learning model
- [Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation](https://sciencetostartup.com/paper/curriculum-dpo-direct-preference-optimization-via-data-and-model-curricula-for-text-to-image-generation) (8/10) — Curriculum-DPO++ improves text-to-image AI by optimizing learning sequences for better preference alignment.
- [CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control](https://sciencetostartup.com/paper/cyclerl-sim-to-real-deep-reinforcement-learning-for-robust-autonomous-bicycle-control) (8/10) — CycleRL is a sim-to-real deep reinforcement learning framework for robust autonomous bicycle control, leveraging advance
- [DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning](https://sciencetostartup.com/paper/datachef-cooking-up-optimal-data-recipes-for-llm-adaptation-via-reinforcement-learning) (8/10) — DataChef automates the creation of optimized data pipelines for LLM training, enhancing model adaptation and performance
- [Decoder-Free Distillation for Quantized Image Restoration](https://sciencetostartup.com/paper/decoder-free-distillation-for-quantized-image-restoration) (8/10) — A framework for edge-deployed image restoration that enhances visual quality through quantization-aware training and dec
- [Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration](https://sciencetostartup.com/paper/deep-graphrag-a-balanced-approach-to-hierarchical-retrieval-and-adaptive-integration) (8/10) — Deep GraphRAG optimizes hierarchical search and integration for efficient, accurate information retrieval.
- [Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment](https://sciencetostartup.com/paper/deja-vu-in-plots-leveraging-cross-session-evidence-with-retrieval-augmented-llms-for-live-streaming-risk-assessment) (8/10) — Revolutionizing live streaming risk assessment with CS-VAR, a real-time detection system powered by Retrieval-Augmented 
- [Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves](https://sciencetostartup.com/paper/depth-recurrent-attention-mixtures-giving-latent-reasoning-the-attention-it-deserves) (8/10) — Depth-Recurrent Attention Mixtures optimize scalable depth-recurrent models to significantly surpass current state-of-th
- [DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning](https://sciencetostartup.com/paper/dermabench-a-clinician-annotated-benchmark-dataset-for-dermatology-visual-question-answering-and-reasoning) (8/10) — Develop a dermatology visual question answering tool utilizing the DermaBench dataset for enhanced clinical decision sup
- [Detecting and Correcting Hallucinations in LLM-Generated Code via Deterministic AST Analysis](https://sciencetostartup.com/paper/detecting-and-correcting-hallucinations-in-llm-generated-code-via-deterministic-ast-analysis) (8/10) — A deterministic AST-based tool to auto-correct semantic errors in LLM-generated code, enhancing reliability without runt
- [Detecting Fake Reviewer Groups in Dynamic Networks: An Adaptive Graph Learning Method](https://sciencetostartup.com/paper/detecting-fake-reviewer-groups-in-dynamic-networks-an-adaptive-graph-learning-method) (8/10) — Detect fake reviewer groups on e-commerce platforms with a graph learning model, improving trust and fair competition.
- [Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models](https://sciencetostartup.com/paper/devil-is-in-narrow-policy-unleashing-exploration-in-driving-vla-models) (8/10) — Curious-VLA unlocks the exploratory potential of autonomous driving models by addressing the exploit-explore dilemma, ac
- [DexViTac: Collecting Human Visuo-Tactile-Kinematic Demonstrations for Contact-Rich Dexterous Manipulation](https://sciencetostartup.com/paper/dexvitac-collecting-human-visuo-tactile-kinematic-demonstrations-for-contact-rich-dexterous-manipulation) (8/10) — DexViTac is a portable data collection system that captures high-fidelity visuo-tactile-kinematic demonstrations for imp
- [Differentiable Inverse Graphics for Zero-shot Scene Reconstruction and Robot Grasping](https://sciencetostartup.com/paper/differentiable-inverse-graphics-for-zero-shot-scene-reconstruction-and-robot-grasping) (8/10) — Develop a robot grasping system using differentiable inverse graphics for zero-shot scene reconstruction.
- [Driving on Registers](https://sciencetostartup.com/paper/driving-on-registers) (8/10) — DrivoR is a transformer-based autonomous driving system offering efficient, adaptive, end-to-end driving with high bench
- [DiT4DiT: Jointly Modeling Video Dynamics and Actions for Generalizable Robot Control](https://sciencetostartup.com/paper/dit4dit-jointly-modeling-video-dynamics-and-actions-for-generalizable-robot-control) (8/10) — DiT4DiT offers an enhanced robot control model leveraging video-action synthesis for superior robotic manipulation.
- [Do You See What I Am Pointing At? Gesture-Based Egocentric Video Question Answering](https://sciencetostartup.com/paper/do-you-see-what-i-am-pointing-at-gesture-based-egocentric-video-question-answering) (8/10) — EgoPointVQA enables AI assistants to understand and respond to user gestures in egocentric videos.
- [Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT](https://sciencetostartup.com/paper/does-the-question-really-matter-training-free-data-selection-for-vision-language-sft) (8/10) — CVS is a training-free data selection method that enhances vision-language model performance by identifying samples requ
- [DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces](https://sciencetostartup.com/paper/dreamcad-scaling-multi-modal-cad-generation-using-differentiable-parametric-surfaces) (8/10) — DreamCAD is a multi-modal generative framework that directly produces editable CAD models from point clouds, text, or im
- [DroneVLA: VLA based Aerial Manipulation](https://sciencetostartup.com/paper/dronevla-vla-based-aerial-manipulation) (8/10) — DroneVLA: Enabling drones to autonomously understand and execute human language commands for object retrieval and delive
- [A practical artificial intelligence framework for legal age estimation using clavicle computed tomography scans](https://sciencetostartup.com/paper/a-practical-artificial-intelligence-framework-for-legal-age-estimation-using-clavicle-computed-tomography-scans) (8/10) — A robust AI framework for legal age estimation using clavicle CT scans, enhancing forensic decision-making.
- [DUCTILE: Agentic LLM Orchestration of Engineering Analysis in Product Development Practice](https://sciencetostartup.com/paper/ductile-agentic-llm-orchestration-of-engineering-analysis-in-product-development-practice) (8/10) — DUCTILE automates engineering analysis through LLM orchestration, adapting to evolving product requirements.
- [Early Warning of Intraoperative Adverse Events via Transformer-Driven Multi-Label Learning](https://sciencetostartup.com/paper/early-warning-of-intraoperative-adverse-events-via-transformer-driven-multi-label-learning) (8/10) — IAENet offers a Transformer-driven early warning system for predicting multiple intraoperative adverse events, enhancing
- [EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation](https://sciencetostartup.com/paper/effectmaker-unifying-reasoning-and-generation-for-customized-visual-effect-creation) (8/10) — EffectMaker is a unified reasoning-generation framework that enables reference-based VFX customization, offering a scala
- [Efficient Reasoning on the Edge](https://sciencetostartup.com/paper/efficient-reasoning-on-the-edge) (8/10) — A lightweight approach to enable efficient reasoning in small LLMs for mobile devices using LoRA adapters and reinforcem
- [ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework](https://sciencetostartup.com/paper/ellmob-event-driven-human-mobility-generation-with-self-aligned-llm-framework) (8/10) — ELLMob generates realistic human mobility trajectories during large-scale events by reconciling habitual patterns and ev
- [Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models](https://sciencetostartup.com/paper/towards-reliable-truth-aligned-uncertainty-estimation-in-large-language-models) (8/10) — A post-hoc calibration method for large language models that improves the reliability of uncertainty estimation by align
- [End-to-End Dexterous Grasp Learning from Single-View Point Clouds via a Multi-Object Scene Dataset](https://sciencetostartup.com/paper/end-to-end-dexterous-grasp-learning-from-single-view-point-clouds-via-a-multi-object-scene-dataset) (8/10) — DGS-Net is an end-to-end grasp prediction network that learns dense grasp configurations from single-view point clouds i
- [EPOFusion: Exposure aware Progressive Optimization Method for Infrared and Visible Image Fusion](https://sciencetostartup.com/paper/epofusion-exposure-aware-progressive-optimization-method-for-infrared-and-visible-image-fusion) (8/10) — EPOFusion is an exposure-aware model that enhances infrared and visible image fusion, particularly in overexposed region
- [ESAA-Security: An Event-Sourced, Verifiable Architecture for Agent-Assisted Security Audits of AI-Generated Code](https://sciencetostartup.com/paper/esaa-security-an-event-sourced-verifiable-architecture-for-agent-assisted-security-audits-of-ai-generated-code) (8/10) — ESAA-Security provides a verifiable architecture for agent-assisted security audits of AI-generated code, ensuring trace
- [Do LLMs Know What Is Private Internally? Probing and Steering Contextual Privacy Norms in Large Language Model Representations](https://sciencetostartup.com/paper/do-llms-know-what-is-private-internally-probing-and-steering-contextual-privacy-norms-in-large-language-model-representa) (8/10) — A method to probe and steer LLMs' internal understanding of contextual privacy norms, enabling more reliable control ove
- [EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation](https://sciencetostartup.com/paper/evodrivevla-evolving-autonomous-driving-vision-language-action-model-via-collaborative-perception-planning-distillation) (8/10) — EvoDriveVLA enhances autonomous driving with state-of-the-art Vision-Language-Action models through innovative perceptio
- [Heterogeneous Vertiport Selection Optimization for On-Demand Air Taxi Services: A Deep Reinforcement Learning Approach](https://sciencetostartup.com/paper/heterogeneous-vertiport-selection-optimization-for-on-demand-air-taxi-services-a-deep-reinforcement-learning-approach) (8/10) — Optimize on-demand air taxi routing with deep reinforcement learning to cut urban travel time.
- [Internal APIs Are All You Need: Shadow APIs, Shared Discovery, and the Case Against Browser-First Agent Architectures](https://sciencetostartup.com/paper/internal-apis-are-all-you-need-shadow-apis-shared-discovery-and-the-case-against-browser-first-agent-architectures) (8/10) — Unbrowse transforms web interaction for agents by converting redundant browser discoveries into a shared API index, vast
- [Exemplar Diffusion: Improving Medical Object Detection with Opportunistic Labels](https://sciencetostartup.com/paper/exemplar-diffusion-improving-medical-object-detection-with-opportunistic-labels) (8/10) — A framework that enhances medical object detection by utilizing existing labels at inference for improved accuracy and r
- [Experience-Guided Self-Adaptive Cascaded Agents for Breast Cancer Screening and Diagnosis with Reduced Biopsy Referrals](https://sciencetostartup.com/paper/experience-guided-self-adaptive-cascaded-agents-for-breast-cancer-screening-and-diagnosis-with-reduced-biopsy-referrals) (8/10) — A cascaded AI framework for enhanced breast cancer screening and diagnosis that reduces unnecessary biopsies, saving cos
- [GUIDE: GenAI Units In Digital Design Education](https://sciencetostartup.com/paper/guide-genai-units-in-digital-design-education) (8/10) — GUIDE is an open courseware repository that enhances digital design education through AI-assisted learning units and int
- [Exploring Open-Vocabulary Object Recognition in Images using CLIP](https://sciencetostartup.com/paper/exploring-open-vocabulary-object-recognition-in-images-using-clip) (8/10) — A streamlined open-vocabulary object recognition framework leveraging CLIP and CNN/MLP-based encoding for enhanced gener
- [MotionGrounder: Grounded Multi-Object Motion Transfer via Diffusion Transformer](https://sciencetostartup.com/paper/motiongrounder-grounded-multi-object-motion-transfer-via-diffusion-transformer) (8/10) — MotionGrounder is a Diffusion Transformer framework enabling multi-object motion transfer with fine-grained control, gro
- [Fanar 2.0: Arabic Generative AI Stack](https://sciencetostartup.com/paper/fanar-2-0-arabic-generative-ai-stack) (8/10) — Fanar 2.0 is a sovereign Arabic generative AI platform that delivers advanced language and multimodal capabilities.
- [FAR-Drive: Frame-AutoRegressive Video Generation in Closed-Loop Autonomous Driving](https://sciencetostartup.com/paper/far-drive-frame-autoregressive-video-generation-in-closed-loop-autonomous-driving) (8/10) — FAR-Drive is a closed-loop video generation framework for autonomous driving that ensures high fidelity and low latency.
- [Fast-HaMeR: Boosting Hand Mesh Reconstruction using Knowledge Distillation](https://sciencetostartup.com/paper/fast-hamer-boosting-hand-mesh-reconstruction-using-knowledge-distillation) (8/10) — Boost lightweight 3D hand reconstruction on mobile and VR devices with Fast-HaMeR.
- [FeasibleCap: Real-Time Embodiment Constraint Guidance for In-the-Wild Robot Demonstration Collection](https://sciencetostartup.com/paper/feasiblecap-real-time-embodiment-constraint-guidance-for-in-the-wild-robot-demonstration-collection) (8/10) — FeasibleCap enables real-time feedback during robot demonstration collection, improving replay success and reducing infe
- [FedBPrompt: Federated Domain Generalization Person Re-Identification via Body Distribution Aware Visual Prompts](https://sciencetostartup.com/paper/fedbprompt-federated-domain-generalization-person-re-identification-via-body-distribution-aware-visual-prompts) (8/10) — FedBPrompt enhances federated person re-identification by using learnable visual prompts to improve feature discriminati
- [Few-for-Many Personalized Federated Learning](https://sciencetostartup.com/paper/few-for-many-personalized-federated-learning) (8/10) — FedFew optimizes federated learning with minimal server models to personalize and scale efficiently.
- [Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression](https://sciencetostartup.com/paper/fighting-hallucinations-with-counterfactuals-diffusion-guided-perturbations-for-lvlm-hallucination-suppression) (8/10) — CIPHER is a training-free method that suppresses hallucinations in vision-language models using counterfactual image per
- [FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use](https://sciencetostartup.com/paper/fintoolbench-evaluating-llm-agents-for-real-world-financial-tool-use) (8/10) — FinToolBench provides a real-world benchmark and evaluation framework for LLM agents using financial tools, enabling aud
- [FLANS at SemEval-2026 Task 7: RAG with Open-Sourced Smaller LLMs for Everyday Knowledge Across Diverse Languages and Cultures](https://sciencetostartup.com/paper/flans-at-semeval-2026-task-7-rag-with-open-sourced-smaller-llms-for-everyday-knowledge-across-diverse-languages-and-cult) (8/10) — Culturally aware AI-driven question-answering system for multilingual contexts using open-sourced LLMs.
- [Fly, Track, Land: Infrastructure-less Magnetic Localization for Heterogeneous UAV-UGV Teaming](https://sciencetostartup.com/paper/fly-track-land-infrastructure-less-magnetic-localization-for-heterogeneous-uav-ugv-teaming) (8/10) — An infrastructure-less magnetic localization system for precise UAV-UGV docking and collaboration.
- [FLUX: Accelerating Cross-Embodiment Generative Navigation Policies via Rectified Flow and Static-to-Dynamic Learning](https://sciencetostartup.com/paper/flux-accelerating-cross-embodiment-generative-navigation-policies-via-rectified-flow-and-static-to-dynamic-learning) (8/10) — FLUX is a flow-based navigation policy that enhances autonomous navigation efficiency and robustness across diverse envi
- [ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation](https://sciencetostartup.com/paper/forcevla2-unleashing-hybrid-force-position-control-with-force-awareness-for-contact-rich-manipulation) (8/10) — ForceVLA2 enhances robotic manipulation by integrating hybrid force-position control with explicit force awareness for i
- [FraudFox: Adaptable Fraud Detection in the Real World](https://sciencetostartup.com/paper/fraudfox-adaptable-fraud-detection-in-the-real-world) (8/10) — FraudFox is an adaptable fraud detection tool for e-commerce platforms that leverages Kalman Filters to dynamically upda
- [From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness](https://sciencetostartup.com/paper/from-fewer-samples-to-fewer-bits-reframing-dataset-distillation-as-joint-optimization-of-precision-and-compactness) (8/10) — QuADD offers a quantization-aware dataset distillation framework optimizing data compactness and precision for efficient
- [From Ideal to Real: Stable Video Object Removal under Imperfect Conditions](https://sciencetostartup.com/paper/from-ideal-to-real-stable-video-object-removal-under-imperfect-conditions) (8/10) — SVOR is a robust framework for removing objects from videos while maintaining visual consistency under real-world imperf
- [FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes](https://sciencetostartup.com/paper/funcineforge-a-unified-dataset-toolkit-and-model-for-zero-shot-movie-dubbing-in-diverse-cinematic-scenes) (8/10) — FunCineForge offers a groundbreaking end-to-end toolkit for improving movie dubbing with unmatched synthesis quality and
- [GATE-AD: Graph Attention Network Encoding For Few-Shot Industrial Visual Anomaly Detection](https://sciencetostartup.com/paper/gate-ad-graph-attention-network-encoding-for-few-shot-industrial-visual-anomaly-detection) (8/10) — A few-shot visual anomaly detection tool for industrial quality assurance using graph attention networks.
- [GazeShift: Unsupervised Gaze Estimation and Dataset for VR](https://sciencetostartup.com/paper/gazeshift-unsupervised-gaze-estimation-and-dataset-for-vr) (8/10) — GazeShift provides a real-time, unsupervised gaze estimation solution for VR, complete with a large-scale dataset and co
- [Generalist Multimodal LLMs Gain Biometric Expertise via Human Salience](https://sciencetostartup.com/paper/generalist-multimodal-llms-gain-biometric-expertise-via-human-salience) (8/10) — A multimodal LLM solution for iris presentation attack detection that respects privacy constraints and outperforms tradi
- [Geometric Autoencoder for Diffusion Models](https://sciencetostartup.com/paper/geometric-autoencoder-for-diffusion-models) (8/10) — Geometric Autoencoder (GAE) optimizes latent space in diffusion models for superior generative performance, surpassing s
- [GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision](https://sciencetostartup.com/paper/geosolver-scaling-test-time-reasoning-in-remote-sensing-with-fine-grained-process-supervision) (8/10) — GeoSolver enhances remote sensing interpretation through verifiable, process-supervised reasoning.
- [GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts](https://sciencetostartup.com/paper/glimprouter-efficient-collaborative-inference-by-glimpsing-one-token-of-thoughts) (8/10) — GlimpRouter optimizes AI inference by routing tasks efficiently between small and large models, saving time and resourc
- [GlyphBanana: Advancing Precise Text Rendering Through Agentic Workflows](https://sciencetostartup.com/paper/glyphbanana-advancing-precise-text-rendering-through-agentic-workflows) (8/10) — GlyphBanana enhances text rendering precision through innovative agentic workflows and a dedicated benchmark.
- [GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture](https://sciencetostartup.com/paper/got-jepa-generic-object-tracking-with-model-adaptation-and-occlusion-handling-using-joint-embedding-predictive-architect) (8/10) — An AI-powered object tracking framework enhancing visibility estimation for improved dynamic environment adaptation and 
- [Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients](https://sciencetostartup.com/paper/gradient-atoms-unsupervised-discovery-attribution-and-steering-of-model-behaviors-via-sparse-decomposition-of-training-g) (8/10) — Gradient Atoms offers an unsupervised method for discovering and steering model behaviors through sparse decomposition o
- [GreenRFM: Toward a resource-efficient radiology foundation model](https://sciencetostartup.com/paper/greenrfm-toward-a-resource-efficient-radiology-foundation-model) (8/10) — GreenRFM provides resource-efficient radiology foundation models that achieve state-of-the-art performance on a single G
- [Guiding Diffusion Models with Semantically Degraded Conditions](https://sciencetostartup.com/paper/guiding-diffusion-models-with-semantically-degraded-conditions) (8/10) — A novel guidance method for text-to-image models that enhances compositional accuracy by using strategically degraded co
- [HCVR Scene Generation: High Compatibility Virtual Reality Environment Generation for Extended Redirected Walking](https://sciencetostartup.com/paper/hcvr-scene-generation-high-compatibility-virtual-reality-environment-generation-for-extended-redirected-walking) (8/10) — HCVR generates VR environments optimized for redirected walking, significantly reducing physical collisions.
- [HeBA: Heterogeneous Bottleneck Adapters for Robust Vision-Language Models](https://sciencetostartup.com/paper/heba-heterogeneous-bottleneck-adapters-for-robust-vision-language-models) (8/10) — HeBA adapts Vision-Language Models efficiently with innovative architectural biases for enhanced downstream task perform
- [Hi-SAM: A Hierarchical Structure-Aware Multi-modal Framework for Large-Scale Recommendation](https://sciencetostartup.com/paper/hi-sam-a-hierarchical-structure-aware-multi-modal-framework-for-large-scale-recommendation) (8/10) — Hi-SAM leverages multi-modal data to enhance large-scale recommendation systems for improved user engagement.
- [Hierarchical Concept-to-Appearance Guidance for Multi-Subject Image Generation](https://sciencetostartup.com/paper/hierarchical-concept-to-appearance-guidance-for-multi-subject-image-generation) (8/10) — A framework for generating consistent multi-subject images from textual prompts, using hierarchical concept-to-appearanc
- [History-Conditioned Spatio-Temporal Visual Token Pruning for Efficient Vision-Language Navigation](https://sciencetostartup.com/paper/history-conditioned-spatio-temporal-visual-token-pruning-for-efficient-vision-language-navigation) (8/10) — A training-free token pruning framework that significantly improves the efficiency of vision-language navigation for rob
- [HiSync: Spatio-Temporally Aligning Hand Motion from Wearable IMU and On-Robot Camera for Command Source Identification in Long-Range HRI](https://sciencetostartup.com/paper/hisync-spatio-temporally-aligning-hand-motion-from-wearable-imu-and-on-robot-camera-for-command-source-identification-in) (8/10) — HiSync enhances command source identification in long-range human-robot interactions using a novel optical-inertial fusi
- [HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning](https://sciencetostartup.com/paper/homura-taming-the-sand-glass-for-time-constrained-llm-translation-via-reinforcement-learning) (8/10) — Reinforcement learning framework to optimize translations for time-constrained media with precise syllable-level duratio
- [How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference](https://sciencetostartup.com/paper/how-to-peel-with-a-knife-aligning-fine-grained-manipulation-with-human-preference) (8/10) — A robotic system that learns to peel fruits and vegetables with human-like precision and preference alignment.
- [HumDex: Humanoid Dexterous Manipulation Made Easy](https://sciencetostartup.com/paper/humdex-humanoid-dexterous-manipulation-made-easy) (8/10) — HumDex is a portable teleoperation system that simplifies humanoid dexterous manipulation through advanced motion tracki
- [Huntington Disease Automatic Speech Recognition with Biomarker Supervision](https://sciencetostartup.com/paper/huntington-disease-automatic-speech-recognition-with-biomarker-supervision) (8/10) — A specialized ASR system for Huntington's disease that leverages clinical speech data and biomarker supervision.
- [IGASA: Integrated Geometry-Aware and Skip-Attention Modules for Enhanced Point Cloud Registration](https://sciencetostartup.com/paper/igasa-integrated-geometry-aware-and-skip-attention-modules-for-enhanced-point-cloud-registration) (8/10) — IGASA is a novel framework for robust point cloud registration that enhances accuracy through advanced multi-scale featu
- [IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation](https://sciencetostartup.com/paper/immaculate-a-practical-llm-auditing-framework-via-verifiable-computation) (8/10) — IMMACULATE provides a practical framework for auditing LLM API services to detect economic abuses like model substitutio
- [InCoder-32B: Code Foundation Model for Industrial Scenarios](https://sciencetostartup.com/paper/incoder-32b-code-foundation-model-for-industrial-scenarios) (8/10) — InCoder-32B is a specialized code foundation model designed to enhance programming tasks in industrial scenarios.
- [INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems](https://sciencetostartup.com/paper/infa-guard-mitigating-malicious-propagation-via-infection-aware-safeguarding-in-llm-based-multi-agent-systems) (8/10) — INFA-Guard is a security framework for LLM-based multi-agent systems that mitigates malicious influence propagation by a
- [InnoAds-Composer: Efficient Condition Composition for E-Commerce Poster Generation](https://sciencetostartup.com/paper/innoads-composer-efficient-condition-composition-for-e-commerce-poster-generation) (8/10) — InnoAds-Composer is a single-stage framework for e-commerce poster generation that efficiently controls subject, text, a
- [Interactive World Simulator for Robot Policy Training and Evaluation](https://sciencetostartup.com/paper/interactive-world-simulator-for-robot-policy-training-and-evaluation) (8/10) — Interactive World Simulator enables scalable robot policy training and evaluation by generating realistic, interaction-c
- [Invisible Safety Threat: Malicious Finetuning for LLM via Steganography](https://sciencetostartup.com/paper/invisible-safety-threat-malicious-finetuning-for-llm-via-steganography) (8/10) — Steg-AI provides a security layer for LLMs by detecting steganographically hidden malicious prompts and responses, preve
- [LEMMA: Laplacian pyramids for Efficient Marine SeMAntic Segmentation](https://sciencetostartup.com/paper/lemma-laplacian-pyramids-for-efficient-marine-semantic-segmentation) (8/10) — Deploy a lightweight semantic segmentation model for real-time marine environment analysis on resource-constrained devic
- [JOPP-3D: Joint Open Vocabulary Semantic Segmentation on Point Clouds and Panoramas](https://sciencetostartup.com/paper/jopp-3d-joint-open-vocabulary-semantic-segmentation-on-point-clouds-and-panoramas) (8/10) — JOPP-3D enables language-driven semantic segmentation on point clouds and panoramas, offering a unified scene understand
- [Kakugo: Distillation of Low-Resource Languages into Small Language Models](https://sciencetostartup.com/paper/kakugo-distillation-of-low-resource-languages-into-small-language-models) (8/10) — Kakugo: Cost-effective pipeline for developing AI models in low-resource languages using distillation under $50 per lang
- [KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement](https://sciencetostartup.com/paper/knowbias-mitigating-social-bias-in-llms-via-know-bias-neuron-enhancement) (8/10) — KnowBias reduces social biases in LLMs through neuron enhancement, preserving model performance.
- [KohakuRAG: A simple RAG framework with hierarchical document indexing](https://sciencetostartup.com/paper/kohakurag-a-simple-rag-framework-with-hierarchical-document-indexing) (8/10) — KohakuRAG is an open-source hierarchical RAG framework that achieves state-of-the-art performance with precise citation 
- [Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking](https://sciencetostartup.com/paper/latent-gaussian-splatting-for-4d-panoptic-occupancy-tracking) (8/10) — Innovative 4D panoptic occupancy tracking system for enhanced robotic perception in dynamic environments.
- [Layer Consistency Matters: Elegant Latent Transition Discrepancy for Generalizable Synthetic Image Detection](https://sciencetostartup.com/paper/layer-consistency-matters-elegant-latent-transition-discrepancy-for-generalizable-synthetic-image-detection) (8/10) — A novel approach for detecting synthetic images by analyzing latent transition discrepancies across network layers.
- [Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs](https://sciencetostartup.com/paper/learning-from-trials-and-errors-reflective-test-time-planning-for-embodied-llms) (8/10) — Reflective Test-Time Planning transforms embodied AI with self-improvement capabilities through real-time action reflect
- [Learning Neural Operators from Partial Observations via Latent Autoregressive Modeling](https://sciencetostartup.com/paper/learning-neural-operators-from-partial-observations-via-latent-autoregressive-modeling) (8/10) — A state-of-the-art framework for learning neural operators from partial observations, applicable to real-world scientifi
- [Learning to Share: Selective Memory for Efficient Parallel Agentic Systems](https://sciencetostartup.com/paper/learning-to-share-selective-memory-for-efficient-parallel-agentic-systems) (8/10) — Launch a system for efficient parallel agentic operations using selective memory to reduce computational cost and enhanc
- [Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations](https://sciencetostartup.com/paper/learning-whole-body-human-humanoid-interaction-from-human-human-demonstrations) (8/10) — Launch an advanced humanoid robot interaction system leveraging PAIR and D-STAR technologies to enhance synchronized hum
- [LLaTTE: Scaling Laws for Multi-Stage Sequence Modeling in Large-Scale Ads Recommendation](https://sciencetostartup.com/paper/llatte-scaling-laws-for-multi-stage-sequence-modeling-in-large-scale-ads-recommendation) (8/10) — LLaTTE leverages scaling laws for sequence modeling to enhance large-scale ads recommendations with a fast, scalable sol
- [VRUD: A Drone Dataset for Complex Vehicle-VRU Interactions within Mixed Traffic](https://sciencetostartup.com/paper/vrud-a-drone-dataset-for-complex-vehicle-vru-interactions-within-mixed-traffic) (8/10) — A novel drone-based dataset and method for capturing complex vehicle-VRU interactions in unstructured urban traffic, ena
- [LLM-Assisted Causal Structure Disambiguation and Factor Extraction for Legal Judgment Prediction](https://sciencetostartup.com/paper/llm-assisted-causal-structure-disambiguation-and-factor-extraction-for-legal-judgment-prediction) (8/10) — An LLM-based framework for improving legal judgment prediction through enhanced causal inference and factor extraction.
- [LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation](https://sciencetostartup.com/paper/llm-biasscope-a-real-time-bias-analysis-platform-for-comparative-llm-evaluation) (8/10) — LLM BiasScope is a web application for real-time bias analysis and comparative evaluation of large language models.
- [Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models](https://sciencetostartup.com/paper/prompt-attack-detection-with-llm-as-a-judge-and-mixture-of-models) (8/10) — Leveraging lightweight LLMs as low-latency judges to secure public chatbots against prompt attacks in real-time producti
- [Logos: An evolvable reasoning engine for rational molecular design](https://sciencetostartup.com/paper/logos-an-evolvable-reasoning-engine-for-rational-molecular-design) (8/10) — Logos is a compact molecular reasoning engine that integrates logical reasoning with chemical consistency for reliable m
- [M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts](https://sciencetostartup.com/paper/m2ir-proactive-all-in-one-image-restoration-via-mamba-style-modulation-and-mixture-of-experts) (8/10) — M2IR is a proactive image restoration framework that enhances detail recovery by actively controlling degradation propag
- [M³-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering](https://sciencetostartup.com/paper/m-3-ace-rectifying-visual-perception-in-multimodal-math-reasoning-via-multi-agentic-context-engineering) (8/10) — M3-ACE is a multi-agent system that improves visual math reasoning by rectifying visual perception, achieving state-of-t
- [Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars](https://sciencetostartup.com/paper/making-avatars-interact-towards-text-driven-human-object-interaction-for-controllable-talking-avatars) (8/10) — Create controllable talking avatars that interact with objects through text-driven animations.
- [Mango-GS: Enhancing Spatio-Temporal Consistency in Dynamic Scenes Reconstruction using Multi-Frame Node-Guided 4D Gaussian Splatting](https://sciencetostartup.com/paper/mango-gs-enhancing-spatio-temporal-consistency-in-dynamic-scenes-reconstruction-using-multi-frame-node-guided-4d-gaussia) (8/10) — Mango-GS offers a novel framework for high-fidelity 4D reconstruction of dynamic scenes with enhanced temporal consisten
- [MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments](https://sciencetostartup.com/paper/mapvit-a-two-stage-vit-based-framework-for-real-time-radio-quality-map-prediction-in-dynamic-environments) (8/10) — MapViT enables real-time predictions of radio quality maps for autonomous mobile robots in dynamic environments.
- [MAXS: Meta-Adaptive Exploration with LLM Agents](https://sciencetostartup.com/paper/maxs-meta-adaptive-exploration-with-llm-agents) (8/10) — Develop a reasoning framework using LLM agents for stable, efficient multi-tool integration with proven performance gain
- [MedPruner: Training-Free Hierarchical Token Pruning for Efficient 3D Medical Image Understanding in Vision-Language Models](https://sciencetostartup.com/paper/medpruner-training-free-hierarchical-token-pruning-for-efficient-3d-medical-image-understanding-in-vision-language-model) (8/10) — MedPruner is a training-free framework for efficient 3D medical image understanding through hierarchical token pruning.
- [MedSAD-CLIP: Supervised CLIP with Token-Patch Cross-Attention for Medical Anomaly Detection and Segmentation](https://sciencetostartup.com/paper/medsad-clip-supervised-clip-with-token-patch-cross-attention-for-medical-anomaly-detection-and-segmentation) (8/10) — MedSAD-CLIP enhances medical anomaly detection and segmentation using supervised CLIP adaptation for improved localizati
- [MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning](https://sciencetostartup.com/paper/memocr-layout-aware-visual-memory-for-efficient-long-horizon-reasoning) (8/10) — MemOCR optimizes long-horizon reasoning by using adaptive visual layout memory to compress interaction histories efficie
- [MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering](https://sciencetostartup.com/paper/menvagent-scalable-polyglot-environment-construction-for-verifiable-software-engineering) (8/10) — MEnvAgent automates the creation of scalable, verifiable software engineering environments across multiple languages wit
- [Meta-Reinforcement Learning with Self-Reflection for Agentic Search](https://sciencetostartup.com/paper/meta-reinforcement-learning-with-self-reflection-for-agentic-search) (8/10) — MR-Search enhances agentic search through self-reflection and meta reinforcement learning for improved exploration strat
- [MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources](https://sciencetostartup.com/paper/metricanything-scaling-metric-depth-pretraining-with-noisy-heterogeneous-sources) (8/10) — MetricAnything provides a scalable pretraining framework for metric depth estimation from diverse 3D data sources, achie
- [MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants](https://sciencetostartup.com/paper/miniappbench-evaluating-the-shift-from-text-to-interactive-html-responses-in-llm-powered-assistants) (8/10) — MiniAppBench is a benchmark for evaluating LLM-generated interactive HTML applications, enhancing human-AI interaction.
- [MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data](https://sciencetostartup.com/paper/mm-ts-multi-modal-temperature-and-margin-schedules-for-contrastive-learning-with-long-tail-data) (8/10) — MM-TS dynamically adjusts temperature and margin in multi-modal contrastive learning to improve performance on long-tail
- [Model-Based and Neural-Aided Approaches for Dog Dead Reckoning](https://sciencetostartup.com/paper/model-based-and-neural-aided-approaches-for-dog-dead-reckoning) (8/10) — A lightweight, low-cost positioning solution for biological and robotic dogs using inertial sensors and neural networks,
- [Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding](https://sciencetostartup.com/paper/molmo2-open-weights-and-data-for-vision-language-models-with-video-understanding-and-grounding) (8/10) — Open-source video-language models with state-of-the-art video grounding capabilities for applications in security, video
- [MOSIV: Multi-Object System Identification from Videos](https://sciencetostartup.com/paper/mosiv-multi-object-system-identification-from-videos) (8/10) — MOSIV is a framework that identifies and simulates multi-object interactions from videos, enabling more accurate robotic
- [MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE](https://sciencetostartup.com/paper/motioncrafter-dense-geometry-and-motion-reconstruction-with-a-4d-vae) (8/10) — MotionCrafter enables state-of-the-art dense 4D geometry and motion reconstruction from monocular videos using a novel 4
- [ProbeFlow: Training-Free Adaptive Flow Matching for Vision-Language-Action Models](https://sciencetostartup.com/paper/probeflow-training-free-adaptive-flow-matching-for-vision-language-action-models) (8/10) — ProbeFlow accelerates action decoding in Vision-Language-Action models for responsive robotic control.
- [Natural Language-Driven Global Mapping of Martian Landforms](https://sciencetostartup.com/paper/natural-language-driven-global-mapping-of-martian-landforms) (8/10) — MarScope is a natural language-driven framework revolutionizing planet-scale geomorphic mapping by enabling rapid, label
- [NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution](https://sciencetostartup.com/paper/notai-ai-explainable-detection-of-machine-generated-text-via-curvature-and-feature-attribution) (8/10) — NOTAI.AI is an explainable AI-generated text detection tool with a user-friendly web interface, feature attribution, and
- [OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models](https://sciencetostartup.com/paper/oddgridbench-exposing-the-lack-of-fine-grained-visual-discrepancy-sensitivity-in-multimodal-large-language-models) (8/10) — OddGridBench is a benchmark and framework designed to enhance the visual discrepancy sensitivity of multimodal large lan
- [OilSAM2: Memory-Augmented SAM2 for Scalable SAR Oil Spill Detection](https://sciencetostartup.com/paper/oilsam2-memory-augmented-sam2-for-scalable-sar-oil-spill-detection) (8/10) — OilSAM2 is a memory-augmented segmentation framework designed for accurate oil spill detection in SAR imagery.
- [Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models](https://sciencetostartup.com/paper/omanic-towards-step-wise-evaluation-of-multi-hop-reasoning-in-large-language-models) (8/10) — Omanic provides a structured approach to evaluate multi-hop reasoning in large language models through detailed annotati
- [OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms](https://sciencetostartup.com/paper/omnivln-omnidirectional-3d-perception-and-token-efficient-llm-reasoning-for-visual-language-navigation-across-air-and-gr) (8/10) — OmniVLN enhances visual-language navigation for robots using efficient 3D perception and reasoning.
- [One-for-All Model Initialization with Frequency-Domain Knowledge](https://sciencetostartup.com/paper/one-for-all-model-initialization-with-frequency-domain-knowledge) (8/10) — FRONT enables efficient transfer learning by extracting and transferring task-agnostic knowledge from pre-trained models
- [OnFly: Onboard Zero-Shot Aerial Vision-Language Navigation toward Safety and Efficiency](https://sciencetostartup.com/paper/onfly-onboard-zero-shot-aerial-vision-language-navigation-toward-safety-and-efficiency) (8/10) — OnFly enables UAVs to navigate using natural language instructions with enhanced safety and efficiency through real-time
- [The Silicon Mirror: Dynamic Behavioral Gating for Anti-Sycophancy in LLM Agents](https://sciencetostartup.com/paper/the-silicon-mirror-dynamic-behavioral-gating-for-anti-sycophancy-in-llm-agents) (8/10) — The Silicon Mirror is an orchestration framework that dynamically detects user persuasion tactics and adjusts AI behavio
- [Optimistic Policy Regularization](https://sciencetostartup.com/paper/optimistic-policy-regularization) (8/10) — Optimistic Policy Regularization improves reinforcement learning sample efficiency by reinforcing successful trajectorie
- [Optimizing Multi-Modal Models for Image-Based Shape Retrieval: The Role of Pre-Alignment and Hard Contrastive Learning](https://sciencetostartup.com/paper/optimizing-multi-modal-models-for-image-based-shape-retrieval-the-role-of-pre-alignment-and-hard-contrastive-learning) (8/10) — Improve 3D model retrieval from images using pre-trained multi-modal encoders and hard contrastive learning, enabling ze
- [Organ-Aware Attention Improves CT Triage and Classification](https://sciencetostartup.com/paper/organ-aware-attention-improves-ct-triage-and-classification) (8/10) — Develop a CT triage system with organ-aware attention to improve radiology workflow efficiency.
- [OSExpert: Computer-Use Agents Learning Professional Skills via Exploration](https://sciencetostartup.com/paper/osexpert-computer-use-agents-learning-professional-skills-via-exploration) (8/10) — OSExpert enhances computer-use agents with a GUI-based exploration algorithm, achieving near-expert performance and clos
- [Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores](https://sciencetostartup.com/paper/ostrakon-vl-towards-domain-expert-mllm-for-food-service-and-retail-stores) (8/10) — Ostrakon-VL enhances retail and food-service operations with a domain-specific AI model for robust perception and decisi
- [OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing](https://sciencetostartup.com/paper/outlineforge-hierarchical-reinforcement-learning-with-explicit-states-for-scientific-writing) (8/10) — A reinforcement learning framework optimizing scientific writing by enhancing document planning, coherence, and citation
- [PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery](https://sciencetostartup.com/paper/panovggt-feed-forward-3d-reconstruction-from-panoramic-imagery) (8/10) — PanoVGGT is a Transformer framework for accurate 3D reconstruction from panoramic imagery, leveraging a unique dataset a
- [PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR](https://sciencetostartup.com/paper/papersearchqa-learning-to-search-and-reason-over-scientific-papers-with-rlvr) (8/10) — Develop a search agent utilizing RLVR to enhance scientific paper QA in fields like biomedicine.
- [Parameter-Efficient Quality Estimation via Frozen Recursive Models](https://sciencetostartup.com/paper/parameter-efficient-quality-estimation-via-frozen-recursive-models) (8/10) — Parameter-efficient quality estimation for low-resource languages using frozen recursive models.
- [PatchDenoiser: Parameter-efficient multi-scale patch learning and fusion denoiser for medical images](https://sciencetostartup.com/paper/patchdenoiser-parameter-efficient-multi-scale-patch-learning-and-fusion-denoiser-for-medical-images) (8/10) — A lightweight AI denoiser for medical images that outperforms traditional methods while drastically reducing parameters 
- [PDD: Manifold-Prior Diverse Distillation for Medical Anomaly Detection](https://sciencetostartup.com/paper/pdd-manifold-prior-diverse-distillation-for-medical-anomaly-detection) (8/10) — PDD offers a novel manifold-prior diverse distillation framework for medical anomaly detection, significantly improving 
- [PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning](https://sciencetostartup.com/paper/perlad-towards-enhanced-closed-loop-end-to-end-autonomous-driving-with-pseudo-simulation-based-reinforcement-learning) (8/10) — PerlAD revolutionizes autonomous driving with efficient closed-loop training using pseudo-simulation-based reinforcement
- [PET-F2I: A Comprehensive Benchmark and Parameter-Efficient Fine-Tuning of LLMs for PET/CT Report Impression Generation](https://sciencetostartup.com/paper/pet-f2i-a-comprehensive-benchmark-and-parameter-efficient-fine-tuning-of-llms-for-pet-ct-report-impression-generation) (8/10) — PET-F2I is a benchmark and fine-tuning method for generating diagnostic impressions from PET/CT reports using LLMs.
- [PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs](https://sciencetostartup.com/paper/phasecoder-microphone-geometry-agnostic-spatial-audio-understanding-for-multimodal-llms) (8/10) — PhaseCoder allows any device to perform spatial audio reasoning and transcription using a microphone-agnostic transforme
- [Prompt-Free Universal Region Proposal Network](https://sciencetostartup.com/paper/prompt-free-universal-region-proposal-network) (8/10) — A novel object detection network that identifies potential objects without relying on external prompts, enhancing flexib
- [POLAR: A Per-User Association Test in Embedding Space](https://sciencetostartup.com/paper/polar-a-per-user-association-test-in-embedding-space) (8/10) — POLAR offers a novel per-user lexical association test to analyze author-level variations in social media interactions.
- [Post-Training Fairness Control: A Single-Train Framework for Dynamic Fairness in Recommendation](https://sciencetostartup.com/paper/post-training-fairness-control-a-single-train-framework-for-dynamic-fairness-in-recommendation) (8/10) — Cofair offers dynamic, post-training fairness control in recommendation systems without retraining.
- [Preserving Continuous Symmetry in Discrete Spaces: Geometric-Aware Quantization for SO(3)-Equivariant GNNs](https://sciencetostartup.com/paper/preserving-continuous-symmetry-in-discrete-spaces-geometric-aware-quantization-for-so-3-equivariant-gnns) (8/10) — Geometric-Aware Quantization (GAQ) accelerates equivariant models while preserving continuous symmetry, enabling faster 
- [Prompting with the human-touch: evaluating model-sensitivity of foundation models for musculoskeletal CT segmentation](https://sciencetostartup.com/paper/prompting-with-the-human-touch-evaluating-model-sensitivity-of-foundation-models-for-musculoskeletal-ct-segmentation) (8/10) — A benchmarking tool for evaluating promptable foundation models in musculoskeletal CT segmentation.
- [ProvAgent: Threat Detection Based on Identity-Behavior Binding and Multi-Agent Collaborative Attack Investigation](https://sciencetostartup.com/paper/provagent-threat-detection-based-on-identity-behavior-binding-and-multi-agent-collaborative-attack-investigation) (8/10) — ProvAgent improves threat detection by combining multi-agent systems with traditional models for autonomous invest
- [PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models](https://sciencetostartup.com/paper/pvminerllm-structured-extraction-of-patient-voice-from-patient-generated-text-using-large-language-models) (8/10) — PVminerLLM extracts structured patient voice data from unstructured text, enabling scalable analysis of social and exper
- [Radiometric fingerprinting of object surfaces using mobile laser scanning and semantic 3D road space models](https://sciencetostartup.com/paper/radiometric-fingerprinting-of-object-surfaces-using-mobile-laser-scanning-and-semantic-3d-road-space-models) (8/10) — A system for creating radiometric fingerprints of urban surfaces using LiDAR data to enhance semantic 3D city models.
- [Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contract Security?](https://sciencetostartup.com/paper/re-evaluating-evmbench-are-ai-agents-ready-for-smart-contract-security) (8/10) — EVMbench enhances AI agents for smart contract security, providing a benchmark for vulnerability detection and exploitat
- [ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles](https://sciencetostartup.com/paper/ready-go-real-to-sim-dynamic-3d-gaussian-splatting-simulation-for-environment-specific-visual-navigation-with-moving-obs) (8/10) — Develop a real-to-sim simulation tool for robust visual navigation in dynamic environments like households and factories
- [RealWonder: Real-Time Physical Action-Conditioned Video Generation](https://sciencetostartup.com/paper/realwonder-real-time-physical-action-conditioned-video-generation) (8/10) — RealWonder is a real-time system for action-conditioned video generation that uses physics simulation as an intermediate
- [Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation](https://sciencetostartup.com/paper/reason-and-verify-a-framework-for-faithful-retrieval-augmented-generation) (8/10) — A domain-specific framework for enhancing the factuality of Retrieval-Augmented Generation in high-stakes domains throug
- [ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting Using Render-and-Compare](https://sciencetostartup.com/paper/recosplat-autoregressive-feed-forward-gaussian-splatting-using-render-and-compare) (8/10) — ReCoSplat is an innovative model for online novel view synthesis that enhances scene reconstruction from unposed observa
- [Recurrent Structural Policy Gradient for Partially Observable Mean Field Games](https://sciencetostartup.com/paper/recurrent-structural-policy-gradient-for-partially-observable-mean-field-games) (8/10) — Develop advanced algorithms for optimizing large-scale multi-agent systems under uncertainty using Recurrent Structural 
- [Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages](https://sciencetostartup.com/paper/reinforcement-learning-for-diffusion-llms-with-entropy-guided-step-selection-and-stepwise-advantages) (8/10) — A novel reinforcement learning approach for optimizing diffusion language models with state-of-the-art performance on co
- [Resilient Routing: Risk-Aware Dynamic Routing in Smart Logistics via Spatiotemporal Graph Learning](https://sciencetostartup.com/paper/resilient-routing-risk-aware-dynamic-routing-in-smart-logistics-via-spatiotemporal-graph-learning) (8/10) — Optimize smart logistics with dynamic risk-aware routing using spatiotemporal graph learning.
- [Rethinking VLMs for Image Forgery Detection and Localization](https://sciencetostartup.com/paper/rethinking-vlms-for-image-forgery-detection-and-localization) (8/10) — AI system using vision-language models for advanced image forgery detection and localization.
- [RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback](https://sciencetostartup.com/paper/retroagent-from-solving-to-evolving-via-retrospective-dual-intrinsic-feedback) (8/10) — RetroAgent is an online RL framework that enables LLM-based agents to continuously adapt and improve in complex interact
- [RexDrug: Reliable Multi-Drug Combination Extraction through Reasoning-Enhanced LLMs](https://sciencetostartup.com/paper/rexdrug-reliable-multi-drug-combination-extraction-through-reasoning-enhanced-llms) (8/10) — RexDrug is a reasoning-enhanced LLM framework for extracting reliable multi-drug combinations from biomedical literature
- [RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation](https://sciencetostartup.com/paper/robogene-boosting-vla-pre-training-via-diversity-driven-agentic-framework-for-real-world-task-generation) (8/10) — Automate diverse and feasible robotic task generation with RoboGene for enhanced model pre-training and real-world appli
- [RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation](https://sciencetostartup.com/paper/robovip-multi-view-video-generation-with-visual-identity-prompting-augments-robot-manipulation) (8/10) — Enhance robot manipulation datasets with multi-view video generation using visual identity prompts.
- [Rotated Robustness: A Training-Free Defense against Bit-Flip Attacks on Large Language Models](https://sciencetostartup.com/paper/rotated-robustness-a-training-free-defense-against-bit-flip-attacks-on-large-language-models) (8/10) — Rotated Robustness offers a training-free defense against bit-flip attacks on Large Language Models, ensuring reliabilit
- [RSGen: Enhancing Layout-Driven Remote Sensing Image Generation with Diverse Edge Guidance](https://sciencetostartup.com/paper/rsgen-enhancing-layout-driven-remote-sensing-image-generation-with-diverse-edge-guidance) (8/10) — RSGen enhances layout-driven remote sensing image generation with diverse edge guidance for improved control and accurac
- [RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning](https://sciencetostartup.com/paper/rubicap-rubric-guided-reinforcement-learning-for-dense-image-captioning) (8/10) — RubiCap leverages reinforcement learning with LLM-written rubrics to enhance diversity and quality in dense image captio
- [SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models](https://sciencetostartup.com/paper/safepickle-robust-and-generic-ml-detection-of-malicious-pickle-based-ml-models) (8/10) — SafePickle offers a machine learning-based solution to detect malicious Pickle files in model repositories, enhancing se
- [SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving](https://sciencetostartup.com/paper/samoe-vla-a-scene-adaptive-mixture-of-experts-vision-language-action-model-for-autonomous-driving) (8/10) — SAMoE-VLA is a scene-adaptive Vision-Language-Action model for autonomous driving that outperforms existing approaches w
- [SAVA-X: Ego-to-Exo Imitation Error Detection via Scene-Adaptive View Alignment and Bidirectional Cross View Fusion](https://sciencetostartup.com/paper/sava-x-ego-to-exo-imitation-error-detection-via-scene-adaptive-view-alignment-and-bidirectional-cross-view-fusion) (8/10) — SAVA-X enhances error detection in industrial training by aligning ego and exo video demonstrations for improved accurac
- [SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation](https://sciencetostartup.com/paper/scdp-learning-humanoid-locomotion-from-partial-observations-via-mixed-observation-distillation) (8/10) — SCDP enables humanoid locomotion using only onboard sensors, eliminating the need for complex state estimation.
- [scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery](https://sciencetostartup.com/paper/scpilot-large-language-model-reasoning-toward-automated-single-cell-analysis-and-discovery) (8/10) — scPilot automates single-cell RNA-seq data analysis through LLM-driven reasoning, enhancing accuracy and interpretabilit
- [SecAgent: Efficient Mobile GUI Agent with Semantic Context](https://sciencetostartup.com/paper/secagent-efficient-mobile-gui-agent-with-semantic-context) (8/10) — SecAgent is a 3B-scale mobile GUI agent that automates smartphone tasks using a novel semantic context mechanism and a n
- [Security-by-Design for LLM-Based Code Generation: Leveraging Internal Representations for Concept-Driven Steering Mechanisms](https://sciencetostartup.com/paper/security-by-design-for-llm-based-code-generation-leveraging-internal-representations-for-concept-driven-steering-mechani) (8/10) — A mechanism to enhance security in LLM-based code generation by steering internal representations towards secure outputs
- [See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent](https://sciencetostartup.com/paper/see-act-adapt-active-perception-for-unsupervised-cross-domain-visual-adaptation-via-personalized-vlm-guided-agent) (8/10) — Improves perception-model effectiveness in new domains through a light-touch, VLM-guided adaptation approach.
- [BuildMamba: A Visual State-Space Based Model for Multi-Task Building Segmentation and Height Estimation from Satellite Images](https://sciencetostartup.com/paper/buildmamba-a-visual-state-space-based-model-for-multi-task-building-segmentation-and-height-estimation-from-satellite-im) (8/10) — BuildMamba offers a fast and accurate solution for building segmentation and height estimation from satellite imagery, e
- [Can Vision-Language Models Solve the Shell Game?](https://sciencetostartup.com/paper/can-vision-language-models-solve-the-shell-game) (8/10) — VET-Bench is a new benchmark and method (SGCoT) to improve VLMs' ability to track visually identical objects over time, 
- [Self-supervised Disentanglement of Disease Effects from Aging in 3D Medical Shapes](https://sciencetostartup.com/paper/self-supervised-disentanglement-of-disease-effects-from-aging-in-3d-medical-shapes) (8/10) — A framework for disentangling disease effects from aging in 3D medical shapes to enhance biomarker development.
- [Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding](https://sciencetostartup.com/paper/self-supervised-multi-modal-world-model-with-4d-space-time-embedding) (8/10) — DeepEarth is a self-supervised multi-modal world model with a novel 4D space-time positional encoder, achieving state-of
- [ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation](https://sciencetostartup.com/paper/shotverse-advancing-cinematic-camera-control-for-text-driven-multi-shot-video-creation) (8/10) — ShotVerse automates cinematic camera control for text-driven multi-shot video creation through a data-centric framework.
- [Show, Don't Tell: Detecting Novel Objects by Watching Human Videos](https://sciencetostartup.com/paper/show-don-t-tell-detecting-novel-objects-by-watching-human-videos) (8/10) — A self-supervised system that enables robots to recognize novel objects through human demonstrations without complex lan
- [SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment](https://sciencetostartup.com/paper/sia-a-synthesize-inject-align-framework-for-knowledge-grounded-and-secure-e-commerce-search-llms-with-industrial-deploym) (8/10) — A framework for building knowledgeable and secure e-commerce search LLMs to enhance intent-aware recommendations.
- [WarPGNN: A Parametric Thermal Warpage Analysis Framework with Physics-aware Graph Neural Network](https://sciencetostartup.com/paper/warpgnn-a-parametric-thermal-warpage-analysis-framework-with-physics-aware-graph-neural-network) (8/10) — A physics-informed Graph Neural Network framework that accelerates thermal warpage analysis for chiplet-based designs by
- [Bridging the Skill Gap in Clinical CBCT Interpretation with CBCTRepD](https://sciencetostartup.com/paper/bridging-the-skill-gap-in-clinical-cbct-interpretation-with-cbctrepd) (8/10) — CBCTRepD is an AI-driven system that enhances oral and maxillofacial CBCT reporting by integrating with radiologists to 
- [Bridging Discrete Marks and Continuous Dynamics: Dual-Path Cross-Interaction for Marked Temporal Point Processes](https://sciencetostartup.com/paper/bridging-discrete-marks-and-continuous-dynamics-dual-path-cross-interaction-for-marked-temporal-point-processes) (8/10) — NEXTPP is a dual-channel framework that enhances event sequence prediction by integrating discrete and continuous repres
- [ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits](https://sciencetostartup.com/paper/butterflymoe-sub-linear-ternary-experts-via-structured-butterfly-orbits) (8/10) — Enable high-efficiency AI models on edge devices with ButterflyMoE's memory-reducing geometric parameterization.
- [CAPT: Confusion-Aware Prompt Tuning for Reducing Vision-Language Misalignment](https://sciencetostartup.com/paper/capt-confusion-aware-prompt-tuning-for-reducing-vision-language-misalignment) (8/10) — CAPT uses confusion-aware prompt tuning to enhance vision-language model accuracy by learning from misalignments in visu
- [Breaking the Martingale Curse: Multi-Agent Debate via Asymmetric Cognitive Potential Energy](https://sciencetostartup.com/paper/breaking-the-martingale-curse-multi-agent-debate-via-asymmetric-cognitive-potential-energy) (8/10) — AceMAD leverages cognitive potential energy in multi-agent debate to overcome the Martingale Curse, enabling more accura
- [BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding](https://sciencetostartup.com/paper/brainstack-neuro-moe-with-functionally-guided-expert-routing-for-eeg-based-language-decoding) (8/10) — BrainStack offers a novel neuro-inspired framework for EEG-based language decoding, outperforming state-of-the-art model
- [Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models](https://sciencetostartup.com/paper/breaking-training-bottlenecks-effective-and-stable-reinforcement-learning-for-coding-models) (8/10) — MicroCoder-GRPO improves code generation model training with innovations for stability, diversity, and efficiency, outpe
- [Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models](https://sciencetostartup.com/paper/bootstrapping-based-regularisation-for-reducing-individual-prediction-instability-in-clinical-risk-prediction-models) (8/10) — Develop a regularisation framework for stabilizing clinical prediction models' outputs, enhancing reliability and interp
- [Benchmarking Interaction, Beyond Policy: a Reproducible Benchmark for Collaborative Instance Object Navigation](https://sciencetostartup.com/paper/benchmarking-interaction-beyond-policy-a-reproducible-benchmark-for-collaborative-instance-object-navigation) (8/10) — A benchmark and lightweight model for collaborative object navigation that separates navigation and question-asking asse
- [Bounded State in an Infinite Horizon: Proactive Hierarchical Memory for Ad-Hoc Recall over Streaming Dialogues](https://sciencetostartup.com/paper/bounded-state-in-an-infinite-horizon-proactive-hierarchical-memory-for-ad-hoc-recall-over-streaming-dialogues) (8/10) — ProStream offers a proactive hierarchical memory system enabling efficient ad-hoc recall in streaming dialogues for real
- [BrickSim: A Physics-Based Simulator for Manipulating Interlocking Brick Assemblies](https://sciencetostartup.com/paper/bricksim-a-physics-based-simulator-for-manipulating-interlocking-brick-assemblies) (8/10) — BRICKSIM offers a real-time simulator for realistic robotic manipulation of interlocking brick assemblies, integrating e
- [CarPLAN: Context-Adaptive and Robust Planning with Dynamic Scene Awareness for Autonomous Driving](https://sciencetostartup.com/paper/carplan-context-adaptive-and-robust-planning-with-dynamic-scene-awareness-for-autonomous-driving) (8/10) — CarPLAN enhances autonomous vehicle motion planning with context-adaptive decision-making for diverse traffic scenarios.
- [Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning](https://sciencetostartup.com/paper/communication-free-collective-navigation-for-a-swarm-of-uavs-via-lidar-based-deep-reinforcement-learning) (8/10) — Develop a communication-free drone swarm navigation system using DRL and LiDAR for complex and obstructive environments.
- [Bio-Inspired Self-Supervised Learning for Wrist-worn IMU Signals](https://sciencetostartup.com/paper/bio-inspired-self-supervised-learning-for-wrist-worn-imu-signals) (8/10) — A novel self-supervised learning approach for robust human activity recognition using wrist-worn IMU signals.
- [Bilevel Layer-Positioning LoRA for Real Image Dehazing](https://sciencetostartup.com/paper/bilevel-layer-positioning-lora-for-real-image-dehazing) (8/10) — A novel approach to real image dehazing using a bilevel layer-positioning strategy for targeted adaptation.
- [Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety](https://sciencetostartup.com/paper/bioalignment-measuring-and-improving-llm-disposition-toward-biological-systems-for-ai-safety) (8/10) — A fine-tuning approach to align large language models with biological solutions, enhancing AI safety.
- [Beyond Sequential Distance: Inter-Modal Distance Invariant Position Encoding](https://sciencetostartup.com/paper/beyond-sequential-distance-inter-modal-distance-invariant-position-encoding) (8/10) — A novel position encoding mechanism that enhances visual grounding in long-context multimodal language models.
- [Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes](https://sciencetostartup.com/paper/beyond-scattered-acceptance-fast-and-coherent-inference-for-dlms-via-longest-stable-prefixes) (8/10) — Longest Stable Prefix (LSP) scheduler accelerates Diffusion Language Model inference by up to 3.4x by optimizing KV cach
- [Beyond Static Frames: Temporal Aggregate-and-Restore Vision Transformer for Human Pose Estimation](https://sciencetostartup.com/paper/beyond-static-frames-temporal-aggregate-and-restore-vision-transformer-for-human-pose-estimation) (8/10) — TAR-ViTPose enhances video-based human pose estimation by aggregating temporal cues, offering more robust and accurate p
- [BioGait-VLM: A Tri-Modal Vision-Language-Biomechanics Framework for Interpretable Clinical Gait Assessment](https://sciencetostartup.com/paper/biogait-vlm-a-tri-modal-vision-language-biomechanics-framework-for-interpretable-clinical-gait-assessment) (8/10) — BioGait-VLM offers interpretable clinical gait assessment by incorporating biomechanics into vision-language models, ach
- [Beyond Hungarian: Match-Free Supervision for End-to-End Object Detection](https://sciencetostartup.com/paper/beyond-hungarian-match-free-supervision-for-end-to-end-object-detection) (8/10) — A matching-free training scheme for DETR-based object detectors that eliminates the Hungarian algorithm, enhancing train
- [Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI](https://sciencetostartup.com/paper/beyond-benchmark-islands-toward-representative-trustworthiness-evaluation-for-agentic-ai) (8/10) — A comprehensive framework for evaluating the trustworthiness of agentic AI systems in real-world scenarios.
- [Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models](https://sciencetostartup.com/paper/beyond-length-scaling-synergizing-breadth-and-depth-for-generative-reward-models) (8/10) — Mix-GRM enhances generative reward models through modular frameworks and verifiable reinforcement learning, outperformin
- [Bayesian Transformer for Probabilistic Load Forecasting in Smart Grids](https://sciencetostartup.com/paper/bayesian-transformer-for-probabilistic-load-forecasting-in-smart-grids) (8/10) — Bayesian Transformer provides calibrated probabilistic load forecasting for smart grids, enabling better risk management
- [BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search](https://sciencetostartup.com/paper/bapo-boundary-aware-policy-optimization-for-reliable-agentic-search) (8/10) — Boundary-Aware Policy Optimization enhances reliability for LLM-driven agentic search by teaching AI to recognize its kn
- [BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries](https://sciencetostartup.com/paper/bayesianvla-bayesian-decomposition-of-vision-language-action-models-via-latent-action-queries) (8/10) — BayesianVLA enhances robot manipulation by robustly integrating language and vision through a novel Bayesian framework, 
- [Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)](https://sciencetostartup.com/paper/beyond-prompting-efficient-and-robust-contextual-biasing-for-speech-llms-via-logit-space-integration-logic) (8/10) — Introducing LOGIC, an efficient framework for enhancing Speech LLMs with domain-specific term recognition, overcoming li
- [BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation](https://sciencetostartup.com/paper/blackmirror-black-box-backdoor-detection-for-text-to-image-models-via-instruction-response-deviation) (8/10) — BlackMirror is a plug-and-play, training-free framework that detects backdoors in text-to-image models by identifying se
- [Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation](https://sciencetostartup.com/paper/bridging-online-and-offline-rl-contextual-bandit-learning-for-multi-turn-code-generation) (8/10) — Cobalt enhances code generation in LLMs using a cost-effective hybrid of online and offline RL.
- [Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models](https://sciencetostartup.com/paper/beyond-dominant-patches-spatial-credit-redistribution-for-grounded-vision-language-models) (8/10) — A practical solution to reduce hallucination in vision-language models through inference-time spatial credit redistribut
- [Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts](https://sciencetostartup.com/paper/avenir-web-human-experience-imitating-multimodal-web-agents-with-mixture-of-grounding-experts) (8/10) — Avenir-Web: An open-source state-of-the-art agent for executing tasks on dynamic web interfaces using multimodal groundi
- [AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations](https://sciencetostartup.com/paper/autofigure-generating-and-refining-publication-ready-scientific-illustrations) (8/10) — AutoFigure automates the generation of publication-ready scientific illustrations from long-form texts, streamlining sci
- [Audio ControlNet for Fine-Grained Audio Generation and Editing](https://sciencetostartup.com/paper/audio-controlnet-for-fine-grained-audio-generation-and-editing) (8/10) — Audio ControlNet enhances text-to-audio models with precise control over audio attributes and editing capabilities.
- [Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems](https://sciencetostartup.com/paper/automated-rubrics-for-reliable-evaluation-of-medical-dialogue-systems) (8/10) — Automated rubric generation for evaluating and refining medical dialogue systems.
- [Back to the Future: Look-ahead Augmentation and Parallel Self-Refinement for Time Series Forecasting](https://sciencetostartup.com/paper/back-to-the-future-look-ahead-augmentation-and-parallel-self-refinement-for-time-series-forecasting) (8/10) — BTTF enhances time series forecasting accuracy by leveraging look-ahead augmentation and self-refinement.
- [AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory](https://sciencetostartup.com/paper/asyncmde-real-time-monocular-depth-estimation-via-asynchronous-spatial-memory) (8/10) — AsyncMDE is a real-time monocular depth estimation system that efficiently reduces computational costs for edge deployme
- [DeepHistoViT: An Interpretable Vision Transformer Framework for Histopathological Cancer Classification](https://sciencetostartup.com/paper/deephistovit-an-interpretable-vision-transformer-framework-for-histopathological-cancer-classification) (8/10) — DeepHistoViT is an interpretable Vision Transformer framework that automates cancer classification from histopathologica
- [Atlas 2 -- Foundation models for clinical deployment](https://sciencetostartup.com/paper/atlas-2-foundation-models-for-clinical-deployment) (8/10) — Atlas 2 offers state-of-the-art pathology vision models designed for clinical deployment with enhanced performance and e
- [Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education](https://sciencetostartup.com/paper/artificial-intelligence-for-detecting-fetal-orofacial-clefts-and-advancing-medical-education) (8/10) — AI-powered medical copilot for prenatal orofacial cleft detection, improving diagnostic accuracy and accelerating specia
- [Approximate Imitation Learning for Event-based Quadrotor Flight in Cluttered Environments](https://sciencetostartup.com/paper/approximate-imitation-learning-for-event-based-quadrotor-flight-in-cluttered-environments) (8/10) — An end-to-end neural network enables quadrotors to navigate cluttered environments at high speed using event camera data
- [Argument Reconstruction as Supervision for Critical Thinking in LLMs](https://sciencetostartup.com/paper/argument-reconstruction-as-supervision-for-critical-thinking-in-llms) (8/10) — A framework that enhances LLMs' critical thinking by teaching them to reconstruct arguments.
- [AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots](https://sciencetostartup.com/paper/atomicvla-unlocking-the-potential-of-atomic-skill-learning-in-robots) (8/10) — AtomicVLA enables robots to learn and execute long-horizon tasks by decomposing them into atomic skills, offering a scal
- [BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder](https://sciencetostartup.com/paper/backdoorids-zero-shot-backdoor-detection-for-pretrained-vision-encoder) (8/10) — BackdoorIDS offers a zero-shot method for detecting backdoor attacks in pretrained vision encoders, enhancing security i
- [BLooP: Zero-Shot Abstractive Summarization using Large Language Models with Bigram Lookahead Promotion](https://sciencetostartup.com/paper/bloop-zero-shot-abstractive-summarization-using-large-language-models-with-bigram-lookahead-promotion) (8/10) — BLooP enhances zero-shot abstractive summarization in LLMs by promoting bigram generation without any training.
- [Component-Aware Sketch-to-Image Generation Using Self-Attention Encoding and Coordinate-Preserving Fusion](https://sciencetostartup.com/paper/component-aware-sketch-to-image-generation-using-self-attention-encoding-and-coordinate-preserving-fusion) (8/10) — A novel framework for transforming sketches into photorealistic images using self-attention and coordinate-preserving te
- [An Efficient and Effective Evaluator for Text2SQL Models on Unseen and Unlabeled Data](https://sciencetostartup.com/paper/an-efficient-and-effective-evaluator-for-text2sql-models-on-unseen-and-unlabeled-data) (8/10) — FusionSQL enables rapid evaluation and monitoring of Text2SQL models on unseen data, ensuring quality and timely detecti
- [Alkaid: Resilience to Edit Errors in Provably Secure Steganography via Distance-Constrained Encoding](https://sciencetostartup.com/paper/alkaid-resilience-to-edit-errors-in-provably-secure-steganography-via-distance-constrained-encoding) (8/10) — Alkaid provides provably secure and robust steganography resilient to edit errors, enabling reliable message recovery in
- [An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs](https://sciencetostartup.com/paper/an-exploration-analysis-disambiguation-reasoning-framework-for-word-sense-disambiguation-with-low-parameter-llms) (8/10) — Fine-tuned small LLMs rival GPT-4 in word sense disambiguation, enabling efficient and accurate NLP solutions.
- [AgriWorld:A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents](https://sciencetostartup.com/paper/agriworld-a-world-tools-protocol-framework-for-verifiable-agricultural-reasoning-with-code-executing-llm-agents) (8/10) — AgriWorld: An agentic framework enabling LLMs to execute precise agricultural queries via a Python-based toolset.
- [Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows](https://sciencetostartup.com/paper/agentics-2-0-logical-transduction-algebra-for-agentic-data-workflows) (8/10) — Agentics 2.0 is a Python framework enabling reliable and scalable agentic data workflows with logical transduction algeb
- [AgenticSimLaw: A Juvenile Courtroom Multi-Agent Debate Simulation for Explainable High-Stakes Tabular Decision Making](https://sciencetostartup.com/paper/agenticsimlaw-a-juvenile-courtroom-multi-agent-debate-simulation-for-explainable-high-stakes-tabular-decision-making) (8/10) — AgenticSimLaw provides an explainable multi-agent debate framework for transparent high-stakes decision-making in juveni
- [Aligning What EEG Can See: Structural Representations for Brain-Vision Matching](https://sciencetostartup.com/paper/aligning-what-eeg-can-see-structural-representations-for-brain-vision-matching) (8/10) — Unlock non-invasive brain-computer interfaces with an EEG decoding method that achieves state-of-the-art accuracy by al
- [ASDA: Automated Skill Distillation and Adaptation for Financial Reasoning](https://sciencetostartup.com/paper/asda-automated-skill-distillation-and-adaptation-for-financial-reasoning) (8/10) — ASDA automates skill distillation for financial reasoning, enhancing LLMs without fine-tuning.
- [AegisUI: Behavioral Anomaly Detection for Structured User Interface Protocols in AI Agent Systems](https://sciencetostartup.com/paper/aegisui-behavioral-anomaly-detection-for-structured-user-interface-protocols-in-ai-agent-systems) (8/10) — AegisUI detects behavioral anomalies in AI-generated user interface protocols to prevent malicious actions from disguise
- [AEGIS: No Tool Call Left Unchecked -- A Pre-Execution Firewall and Audit Layer for AI Agents](https://sciencetostartup.com/paper/aegis-no-tool-call-left-unchecked-a-pre-execution-firewall-and-audit-layer-for-ai-agents) (8/10) — AEGIS is a pre-execution firewall for AI agents that ensures safe tool usage through real-time risk scanning and human a
- [AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models](https://sciencetostartup.com/paper/ag-vas-anchor-guided-zero-shot-visual-anomaly-segmentation-with-large-multimodal-models) (8/10) — AG-VAS offers advanced zero-shot visual anomaly segmentation for industrial and medical applications using multimodal mo
- [Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment](https://sciencetostartup.com/paper/advancing-model-refinement-muon-optimized-distillation-and-quantization-for-llm-deployment) (8/10) — Deploy LLMs on edge devices using advanced Muon-optimized distillation and quantization for efficient inference.
- [AgenticSCR: An Autonomous Agentic Secure Code Review for Immature Vulnerabilities Detection](https://sciencetostartup.com/paper/agenticscr-an-autonomous-agentic-secure-code-review-for-immature-vulnerabilities-detection) (8/10) — AgenticSCR automates secure code review to catch immature vulnerabilities more accurately than traditional tools.
- [Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs](https://sciencetostartup.com/paper/advancing-automated-algorithm-design-via-evolutionary-stagewise-design-with-llms) (8/10) — EvoStage is an evolutionary algorithm design tool using LLMs that iteratively refines algorithms with real-time feedback
- [AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors](https://sciencetostartup.com/paper/ahoy-animatable-humans-under-occlusion-from-youtube-videos-with-gaussian-splatting-and-video-diffusion-priors) (8/10) — AHOY reconstructs animatable 3D avatars from occluded YouTube videos using advanced Gaussian splatting techniques.
- [ALIGNAgent: Adaptive Learner Intelligence for Gap Identification and Next-step guidance](https://sciencetostartup.com/paper/alignagent-adaptive-learner-intelligence-for-gap-identification-and-next-step-guidance) (8/10) — Personalized learning framework integrating skill-gap identification and targeted resource recommendations to improve ed
- [Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment](https://sciencetostartup.com/paper/alignment-pretraining-ai-discourse-causes-self-fulfilling-mis-alignment) (8/10) — Develop AI systems with inherent alignment by leveraging discourse-influenced pretraining techniques.
- [AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose](https://sciencetostartup.com/paper/alphaface-high-fidelity-and-real-time-face-swapper-robust-to-facial-pose) (8/10) — AlphaFace offers a real-time, high-fidelity face-swapping tool robust to diverse facial poses, outperforming current sol
- [ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs](https://sciencetostartup.com/paper/alter-asymmetric-lora-for-token-entropy-guided-unlearning-of-llms) (8/10) — ALTER enables efficient unlearning in LLMs without compromising performance, using token-entropy-guided asymmetric LoRA.
- [Compact Keyframe-Optimized Multi-Agent Gaussian Splatting SLAM](https://sciencetostartup.com/paper/compact-keyframe-optimized-multi-agent-gaussian-splatting-slam) (8/10) — A compact multi-agent SLAM system using Gaussian Splatting that significantly reduces communication bandwidth for real-t
- [Anonymous-by-Construction: An LLM-Driven Framework for Privacy-Preserving Text](https://sciencetostartup.com/paper/anonymous-by-construction-an-llm-driven-framework-for-privacy-preserving-text) (8/10) — An LLM-driven framework that anonymizes text while preserving its utility, ensuring responsible AI deployment.
- [ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models](https://sciencetostartup.com/paper/argllm-app-an-interactive-system-for-argumentative-reasoning-with-large-language-models) (8/10) — ArgLLM-App is an interactive web tool enabling explainable decision-making with argumentative reasoning over large langu
- [DPD-Cancer: Explainable Graph-based Deep Learning for Small Molecule Anti-Cancer Activity Prediction](https://sciencetostartup.com/paper/dpd-cancer-explainable-graph-based-deep-learning-for-small-molecule-anti-cancer-activity-prediction) (8/10) — DPD-Cancer offers a state-of-the-art, explainable AI tool for predicting small molecule anti-cancer activity, enhancing 
- [TRIMS: Trajectory-Ranked Instruction Masked Supervision for Diffusion Language Models](https://sciencetostartup.com/paper/trims-trajectory-ranked-instruction-masked-supervision-for-diffusion-language-models) (8/10) — TRIMS is a trajectory-guided fine-tuning method for Diffusion Language Models that improves accuracy-parallelism trade-o
- [SGI: Structured 2D Gaussians for Efficient and Compact Large Image Representation](https://sciencetostartup.com/paper/sgi-structured-2d-gaussians-for-efficient-and-compact-large-image-representation) (8/10) — SGI offers a compact and efficient image representation framework using structured 2D Gaussians, enabling significant co
- [Advancing Visual Reliability: Color-Accurate Underwater Image Enhancement for Real-Time Underwater Missions](https://sciencetostartup.com/paper/advancing-visual-reliability-color-accurate-underwater-image-enhancement-for-real-time-underwater-missions) (8/10) — A lightweight, real-time underwater image enhancement framework that restores color accuracy for underwater missions.
- [Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints](https://sciencetostartup.com/paper/articulat3d-reconstructing-articulated-digital-twins-from-monocular-videos-with-geometric-and-motion-constraints) (8/10) — Articulat3D reconstructs high-fidelity digital twins from monocular videos using advanced geometric and motion constrain
- [Perturb-and-Restore: Simulation-driven Structural Augmentation Framework for Imbalance Chromosomal Anomaly Detection](https://sciencetostartup.com/paper/perturb-and-restore-simulation-driven-structural-augmentation-framework-for-imbalance-chromosomal-anomaly-detection) (8/10) — A simulation-driven framework that generates synthetic chromosomal data to overcome severe imbalance and scarcity for st
- [Attribution as Retrieval: Model-Agnostic AI-Generated Image Attribution](https://sciencetostartup.com/paper/attribution-as-retrieval-model-agnostic-ai-generated-image-attribution) (8/10) — LIDA is a model-agnostic framework for efficient attribution of AI-generated images, addressing the challenges of tradit
- [Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection](https://sciencetostartup.com/paper/surprised-by-attention-predictable-query-dynamics-for-time-series-anomaly-detection) (8/10) — AxonAD is an unsupervised anomaly detection tool for multivariate time series that leverages predictable query dynamics 
- [AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge](https://sciencetostartup.com/paper/autochecklist-composable-pipelines-for-checklist-generation-and-scoring-with-llm-as-a-judge) (8/10) — AutoChecklist is an open-source library for composable checklist-based LLM evaluation, enabling fine-grained analysis an
- [Automatic End-to-End Data Integration using Large Language Models](https://sciencetostartup.com/paper/automatic-end-to-end-data-integration-using-large-language-models) (8/10) — An automatic data integration pipeline using GPT-5.2 that reduces manual effort and costs significantly.
- [Automatic Generation of High-Performance RL Environments](https://sciencetostartup.com/paper/automatic-generation-of-high-performance-rl-environments) (8/10) — A framework for automatically generating high-performance reinforcement learning environments with minimal engineering e
- [Automating Supply Chain Disruption Monitoring via an Agentic AI Approach](https://sciencetostartup.com/paper/automating-supply-chain-disruption-monitoring-via-an-agentic-ai-approach) (8/10) — Revolutionizing supply chain resilience with agentic AI for autonomous disruption monitoring and mitigation.
- [Autonomous Integration and Improvement of Robotic Assembly using Skill Graph Representations](https://sciencetostartup.com/paper/autonomous-integration-and-improvement-of-robotic-assembly-using-skill-graph-representations) (8/10) — A framework for autonomous integration and improvement of robotic assembly systems using Skill Graph representations.
- [Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents](https://sciencetostartup.com/paper/ontology-constrained-neural-reasoning-in-enterprise-agentic-systems-a-neurosymbolic-architecture-for-domain-grounded-ai) (8/10) — A neurosymbolic architecture for enterprise AI agents that enforces regulatory compliance and domain grounding, outperfo
- [Bandwidth-Efficient Multi-Agent Communication through Information Bottleneck and Vector Quantization](https://sciencetostartup.com/paper/bandwidth-efficient-multi-agent-communication-through-information-bottleneck-and-vector-quantization) (8/10) — A bandwidth-efficient communication framework for multi-agent systems using information bottleneck theory and vector qua
- [BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization](https://sciencetostartup.com/paper/batquant-outlier-resilient-mxfp4-quantization-via-learnable-block-wise-optimization) (8/10) — BATQuant optimizes quantization for multi-modal large language models, achieving state-of-the-art performance while mini
- [Bayesian Optimization for Design Parameters of 3D Image Data Analysis](https://sciencetostartup.com/paper/bayesian-optimization-for-design-parameters-of-3d-image-data-analysis) (8/10) — Optimize and automate 3D biomedical image analysis using Bayesian Optimization.
- [SODIUM: From Open Web Data to Queryable Databases](https://sciencetostartup.com/paper/sodium-from-open-web-data-to-queryable-databases) (8/10) — An AI agent that automatically queries the open web to build structured databases for analytical tasks, achieving over 9
- [BEACON: Language-Conditioned Navigation Affordance Prediction under Occlusion](https://sciencetostartup.com/paper/beacon-language-conditioned-navigation-affordance-prediction-under-occlusion) (8/10) — BEACON enhances robot navigation by predicting traversable locations in occluded environments using language instruction
- [Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation](https://sciencetostartup.com/paper/beyond-end-to-end-video-models-an-llm-based-multi-agent-system-for-educational-video-generation) (8/10) — LASEV is a modular AI platform for automated, high-fidelity educational video production, with a 95% cost reduction.
- [Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL](https://sciencetostartup.com/paper/beyond-rule-based-workflows-an-information-flow-orchestrated-multi-agents-paradigm-via-agent-to-agent-communication-from) (8/10) — Transform workflows from rule-based decision trees to dynamic agent communication for better task handling and efficienc
- [Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing](https://sciencetostartup.com/paper/beyond-rows-to-reasoning-agentic-retrieval-for-multimodal-spreadsheet-understanding-and-editing) (8/10) — BRTR is an agentic framework that enables LLMs to understand and edit complex spreadsheets through iterative tool-callin
- [Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors](https://sciencetostartup.com/paper/extending-precipitation-nowcasting-horizons-via-spectral-fusion-of-radar-observations-and-foundation-model-priors) (8/10) — A novel frequency-domain fusion framework that integrates radar observations with weather foundation model forecasts to 
- [SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis](https://sciencetostartup.com/paper/sharp-spectrum-aware-highly-dynamic-adaptation-for-resolution-promotion-in-remote-sensing-synthesis) (8/10) — A novel training-free method for high-resolution remote sensing image synthesis that dynamically adapts positional embed
- [Learning Trajectory-Aware Multimodal Large Language Models for Video Reasoning Segmentation](https://sciencetostartup.com/paper/learning-trajectory-aware-multimodal-large-language-models-for-video-reasoning-segmentation) (8/10) — A unified framework for video object segmentation that leverages bidirectional text-trajectory alignment within multimod
- [BiCLIP: Domain Canonicalization via Structured Geometric Transformation](https://sciencetostartup.com/paper/biclip-domain-canonicalization-via-structured-geometric-transformation) (8/10) — BiCLIP enhances cross-modal alignment in vision-language models through structured geometric transformations for special
- [A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment](https://sciencetostartup.com/paper/a-multidisciplinary-ai-board-for-multimodal-dementia-characterization-and-risk-assessment) (8/10) — An interactive multi-agent AI system that synthesizes patient data from EHR, notes, and imaging to provide clinicians wi
- [Binary Latent Protein Fitness Landscapes for Quantum Annealing Optimization](https://sciencetostartup.com/paper/binary-latent-protein-fitness-landscapes-for-quantum-annealing-optimization) (8/10) — Q-BIOLAT optimizes protein fitness landscapes using binary latent representations and quantum annealing techniques.
- [BinWalker: Development and Field Evaluation of a Quadruped Manipulator Platform for Sustainable Litter Collection](https://sciencetostartup.com/paper/binwalker-development-and-field-evaluation-of-a-quadruped-manipulator-platform-for-sustainable-litter-collection) (8/10) — A quadruped robotic platform designed for autonomous litter collection in challenging outdoor environments.
- [HamVision: Hamiltonian Dynamics as Inductive Bias for Medical Image Analysis](https://sciencetostartup.com/paper/hamvision-hamiltonian-dynamics-as-inductive-bias-for-medical-image-analysis) (8/10) — A medical image analysis framework leveraging Hamiltonian dynamics for improved segmentation and classification.
- [RefracGS: Novel View Synthesis Through Refractive Water Surfaces with 3D Gaussian Ray Tracing](https://sciencetostartup.com/paper/refracgs-novel-view-synthesis-through-refractive-water-surfaces-with-3d-gaussian-ray-tracing) (8/10) — A novel framework for high-fidelity novel view synthesis through refractive water surfaces by jointly modeling the water
- [Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching](https://sciencetostartup.com/paper/boosting-maximum-entropy-reinforcement-learning-via-one-step-flow-matching) (8/10) — Accelerate RL with FLAME, delivering one-step flow matching for optimal policy efficiency and low latency.
- [LLM-Powered Workflow Optimization for Multidisciplinary Software Development: An Automotive Industry Case Study](https://sciencetostartup.com/paper/llm-powered-workflow-optimization-for-multidisciplinary-software-development-an-automotive-industry-case-study) (8/10) — An LLM-powered workflow optimization system for multidisciplinary software development that drastically reduces developm
- [Brain-WM: Brain Glioblastoma World Model](https://sciencetostartup.com/paper/brain-wm-brain-glioblastoma-world-model) (8/10) — Brain-WM is a brain glioblastoma world model that predicts optimal treatment plans and generates future MRI scans, offer
- [TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference](https://sciencetostartup.com/paper/tide-token-informed-depth-execution-for-per-token-early-exit-in-llm-inference) (8/10) — TIDE is a post-training system that significantly reduces LLM inference latency and increases throughput by enabling per
- [BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation](https://sciencetostartup.com/paper/brandfusion-a-multi-agent-framework-for-seamless-brand-integration-in-text-to-video-generation) (8/10) — BrandFusion seamlessly integrates brands into text-to-video content, revolutionizing advertising possibilities in conten
- [Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation](https://sciencetostartup.com/paper/breaking-the-blocks-continuous-low-rank-decomposed-scaling-for-unified-llm-quantization-and-adaptation) (8/10) — A unified LLM quantization and adaptation framework that significantly improves performance and efficiency.
- [Chameleons do not Forget: Prompt-Based Online Continual Learning for Next Activity Prediction](https://sciencetostartup.com/paper/chameleons-do-not-forget-prompt-based-online-continual-learning-for-next-activity-prediction) (8/10) — CNAPwP is a prompt-based continual learning approach for next activity prediction that mitigates catastrophic forgetting
- [BridgeDiff: Bridging Human Observations and Flat-Garment Synthesis for Virtual Try-Off](https://sciencetostartup.com/paper/bridgediff-bridging-human-observations-and-flat-garment-synthesis-for-virtual-try-off) (8/10) — BridgeDiff performs virtual try-off, accurately synthesizing flat-garment representations from dressed imag
- [Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation](https://sciencetostartup.com/paper/bridging-scene-generation-and-planning-driving-with-world-model-via-unifying-vision-and-motion-representation) (8/10) — WorldDrive unifies scene generation and motion planning for enhanced autonomous driving performance.
- [Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search](https://sciencetostartup.com/paper/probe-then-plan-environment-aware-planning-for-industrial-e-commerce-search) (8/10) — A novel industrial e-commerce search framework that enhances user conversion by grounding search plans in real-time retr
- [BUSSARD: Normalizing Flows for Bijective Universal Scene-Specific Anomalous Relationship Detection](https://sciencetostartup.com/paper/bussard-normalizing-flows-for-bijective-universal-scene-specific-anomalous-relationship-detection) (8/10) — BUSSARD leverages normalizing flows for efficient and robust anomaly detection in scene graphs.
- [Building Production-Ready Probes For Gemini](https://sciencetostartup.com/paper/building-production-ready-probes-for-gemini) (8/10) — Deploy cost-effective AI misuse detection systems using flexible activation probes for context adaptation.
- [C$^2$-Explorer: Contiguity-Driven Task Allocation with Connectivity-Aware Task Representation for Decentralized Multi-UAV Exploration](https://sciencetostartup.com/paper/c-2-explorer-contiguity-driven-task-allocation-with-connectivity-aware-task-representation-for-decentralized-multi-uav-e) (8/10) — C$^2$-Explorer is a decentralized multi-UAV exploration framework that significantly reduces exploration time and path l
- [Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare](https://sciencetostartup.com/paper/caging-the-agents-a-zero-trust-security-architecture-for-autonomous-ai-in-healthcare) (8/10) — A Zero Trust security architecture for autonomous AI agents in healthcare that protects sensitive data against exploitable vulnerabilities.
- [DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving](https://sciencetostartup.com/paper/dlwm-dual-latent-world-models-enable-holistic-gaussian-centric-pre-training-in-autonomous-driving) (8/10) — DLWM: A novel dual latent world model system for holistic Gaussian-centric pre-training in autonomous driving, significa
- [CAST-CKT: Chaos-Aware Spatio-Temporal and Cross-City Knowledge Transfer for Traffic Flow Prediction](https://sciencetostartup.com/paper/cast-ckt-chaos-aware-spatio-temporal-and-cross-city-knowledge-transfer-for-traffic-flow-prediction) (8/10) — CAST-CKT enhances traffic flow prediction across cities with a chaos-aware framework, offering significant performance g
- [CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling](https://sciencetostartup.com/paper/cdrrm-contrast-driven-rubric-generation-for-reliable-and-interpretable-reward-modeling) (8/10) — CDRRM offers a scalable, interpretable, and data-efficient solution for reward modeling by generating high-quality rubri
- [Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study](https://sciencetostartup.com/paper/agentic-multi-source-grounding-for-enhanced-query-intent-understanding-a-doordash-case-study) (8/10) — A novel AI system for accurately understanding customer intent in multi-category marketplaces, boosting search accuracy 
- [Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting](https://sciencetostartup.com/paper/chain-of-look-spatial-reasoning-for-dense-surgical-instrument-counting) (8/10) — Automated high-density surgical instrument counting using visual chain reasoning.
- [Characterization, Analytical Planning, and Hybrid Force Control for the Inspire RH56DFX Hand](https://sciencetostartup.com/paper/characterization-analytical-planning-and-hybrid-force-control-for-the-inspire-rh56dfx-hand) (8/10) — Transform the Inspire RH56DFX hand into a reliable research tool for dexterous manipulation with enhanced control and pl
- [ChatAD: Reasoning-Enhanced Time-Series Anomaly Detection with Multi-Turn Instruction Evolution](https://sciencetostartup.com/paper/chatad-reasoning-enhanced-time-series-anomaly-detection-with-multi-turn-instruction-evolution) (8/10) — A next-gen time-series anomaly detection platform leveraging LLMs for enhanced reasoning and dialogue capabilities.
- [Chatting with Images for Introspective Visual Thinking](https://sciencetostartup.com/paper/chatting-with-images-for-introspective-visual-thinking) (8/10) — ViLaVT enables more interactive and precise visual reasoning by dynamically integrating language guidance into vision pr
- [Chinese Labor Law Large Language Model Benchmark](https://sciencetostartup.com/paper/chinese-labor-law-large-language-model-benchmark) (8/10) — A specialized AI model for Chinese labor law applications, improving the efficiency and accuracy of legal practice.
- [CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts](https://sciencetostartup.com/paper/circuitlm-a-multi-agent-llm-aided-design-framework-for-generating-circuit-schematics-from-natural-language-prompts) (8/10) — CircuitLM enables non-experts to generate accurate circuit schematics from natural language prompts, bridging the gap be
- [CMHANet: A Cross-Modal Hybrid Attention Network for Point Cloud Registration](https://sciencetostartup.com/paper/cmhanet-a-cross-modal-hybrid-attention-network-for-point-cloud-registration) (8/10) — CMHANet enhances point cloud registration by integrating 2D image context with 3D geometric details for improved robustn
- [CN-CBF: Composite Neural Control Barrier Function for Safe Robot Navigation in Dynamic Environments](https://sciencetostartup.com/paper/cn-cbf-composite-neural-control-barrier-function-for-safe-robot-navigation-in-dynamic-environments) (8/10) — A neural control barrier function method for safe robot navigation in dynamic environments, demonstrated on both ground 
- [Code-Mix Sentiment Analysis on Hinglish Tweets](https://sciencetostartup.com/paper/code-mix-sentiment-analysis-on-hinglish-tweets) (8/10) — AI-powered sentiment analysis tailored for the Indian market, decoding code-mixed Hinglish on social media platforms.
- [CodePercept: Code-Grounded Visual STEM Perception for MLLMs](https://sciencetostartup.com/paper/codepercept-code-grounded-visual-stem-perception-for-mllms) (8/10) — CodePercept enhances visual reasoning in STEM for MLLMs by leveraging executable code as a perceptual medium.
- [CognitionCapturerPro: Towards High-Fidelity Visual Decoding from EEG/MEG via Multi-modal Information and Asymmetric Alignment](https://sciencetostartup.com/paper/cognitioncapturerpro-towards-high-fidelity-visual-decoding-from-eeg-meg-via-multi-modal-information-and-asymmetric-align) (8/10) — CognitionCapturerPro enhances visual stimuli reconstruction from EEG using multi-modal integration and advanced scoring 
- [Cognitively-Inspired Tokens Overcome Egocentric Bias in Multimodal Models](https://sciencetostartup.com/paper/cognitively-inspired-tokens-overcome-egocentric-bias-in-multimodal-models) (8/10) — Cognitively-Inspired Tokens enhance multimodal models by overcoming egocentric bias, enabling better spatial reasoning f
- [PC-SAM: Patch-Constrained Fine-Grained Interactive Road Segmentation in High-Resolution Remote Sensing Images](https://sciencetostartup.com/paper/pc-sam-patch-constrained-fine-grained-interactive-road-segmentation-in-high-resolution-remote-sensing-images) (8/10) — A unified framework for fine-grained interactive road segmentation in high-resolution remote sensing images, combining a
- [Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information](https://sciencetostartup.com/paper/compression-favors-consistency-not-truth-when-and-why-language-models-prefer-correct-information) (8/10) — A study revealing how language models prioritize consistent information over truth, with implications for model training
- [Conditional Generative Framework with Peak-Aware Attention for Robust Chemical Detection under Interferences](https://sciencetostartup.com/paper/conditional-generative-framework-with-peak-aware-attention-for-robust-chemical-detection-under-interferences) (8/10) — A robust AI framework for enhancing GC-MS chemical detection accuracy under interference conditions.
- [CONE: Embeddings for Complex Numerical Data Preserving Unit and Variable Semantics](https://sciencetostartup.com/paper/cone-embeddings-for-complex-numerical-data-preserving-unit-and-variable-semantics) (8/10) — Develop CONE, a hybrid transformer model that improves numerical reasoning in large-scale datasets for various domains b
- [Adaptive Confidence Regularization for Multimodal Failure Detection](https://sciencetostartup.com/paper/adaptive-confidence-regularization-for-multimodal-failure-detection) (8/10) — ACR framework for reliable failure detection in multimodal AI systems, critical for safety in high-stakes domains like a
- [Constructing Synthetic Instruction Datasets for Improving Reasoning in Domain-Specific LLMs: A Case Study in the Japanese Financial Domain](https://sciencetostartup.com/paper/constructing-synthetic-instruction-datasets-for-improving-reasoning-in-domain-specific-llms-a-case-study-in-the-japanese) (8/10) — Build domain-specific datasets for improving reasoning in LLMs with demonstrated success in the Japanese financial secto
- [Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations](https://sciencetostartup.com/paper/contrastive-reasoning-alignment-reinforcement-learning-from-hidden-representations) (8/10) — CRAFT is a robust alignment framework that enhances reasoning safety in AI models against jailbreak attacks.
- [Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning](https://sciencetostartup.com/paper/contribution-aware-token-compression-for-efficient-video-understanding-via-reinforcement-learning) (8/10) — Optimize video understanding efficiency through a contribution-aware token compression algorithm leveraging reinforcemen
- [MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding](https://sciencetostartup.com/paper/moon3-0-reasoning-aware-multimodal-representation-learning-for-e-commerce-product-understanding) (8/10) — MOON3.0 is a reasoning-aware multimodal LLM for e-commerce that explicitly models fine-grained product attributes using 
- [CORAL: Scalable Multi-Task Robot Learning via LoRA Experts](https://sciencetostartup.com/paper/coral-scalable-multi-task-robot-learning-via-lora-experts) (8/10) — CORAL is a scalable framework for multi-task robotic learning that mitigates task interference using lightweight LoRA ex
- [CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning](https://sciencetostartup.com/paper/core-seg-reasoning-driven-segmentation-for-complex-lesions-via-reinforcement-learning) (8/10) — CORE-Seg is an end-to-end framework for reasoning-driven complex lesion segmentation in medical images, leveraging reinf
- [CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute](https://sciencetostartup.com/paper/corefine-confidence-guided-self-refinement-for-adaptive-test-time-compute) (8/10) — CoRefine reduces compute costs for LLMs by leveraging confidence-guided self-refinement to achieve competitive accuracy.
- [CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models](https://sciencetostartup.com/paper/countervid-counterfactual-video-generation-for-mitigating-action-and-temporal-hallucinations-in-video-language-models) (8/10) — CounterVid enhances video-language models by generating counterfactual videos to reduce action and temporal hallucinatio
- [Toward Physically Consistent Driving Video World Models under Challenging Trajectories](https://sciencetostartup.com/paper/toward-physically-consistent-driving-video-world-models-under-challenging-trajectories) (8/10) — A world model for autonomous driving that generates physically consistent videos even from challenging or invalid trajec
- [Coverage-Guided Multi-Agent Harness Generation for Java Library Fuzzing](https://sciencetostartup.com/paper/coverage-guided-multi-agent-harness-generation-for-java-library-fuzzing) (8/10) — Automated fuzz harness generation for Java libraries using LLM-powered agents, improving coverage and bug discovery.
- [Cybersecurity AI: Hacking Consumer Robots in the AI Era](https://sciencetostartup.com/paper/cybersecurity-ai-hacking-consumer-robots-in-the-ai-era) (8/10) — Democratizing robot cybersecurity assessments with an AI-powered vulnerability scanner that automates penetration testin
- [Adaptive Anchor Policies for Efficient 4D Gaussian Streaming](https://sciencetostartup.com/paper/adaptive-anchor-policies-for-efficient-4d-gaussian-streaming) (8/10) — Efficient Gaussian Streaming optimizes anchor selection for real-time rendering, enhancing quality and efficiency in dyn
- [RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning](https://sciencetostartup.com/paper/refinerl-advancing-competitive-programming-with-self-refinement-reinforcement-learning) (8/10) — RefineRL enhances LLMs for competitive programming by enabling self-refinement through a skeptical agent and reinforceme
- [Risk-Aware Batch Testing for Performance Regression Detection](https://sciencetostartup.com/paper/risk-aware-batch-testing-for-performance-regression-detection) (8/10) — Building a CI tool to save over $490K annually in infrastructure costs by optimizing performance regression testing with
- [Cross-RAG: Zero-Shot Retrieval-Augmented Time Series Forecasting via Cross-Attention](https://sciencetostartup.com/paper/cross-rag-zero-shot-retrieval-augmented-time-series-forecasting-via-cross-attention) (8/10) — Cross-RAG enhances zero-shot time series forecasting by leveraging selective query-relevant retrieval.
- [CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning](https://sciencetostartup.com/paper/ctrlcot-dual-granularity-chain-of-thought-compression-for-controllable-reasoning) (8/10) — Develop an AI tool for compressing reasoning chains in large language models to reduce latency and costs without sacrifi
- [CUA-Skill: Develop Skills for Computer Using Agent](https://sciencetostartup.com/paper/cua-skill-develop-skills-for-computer-using-agent) (8/10) — CUA-Skill provides a structured skills library for autonomous computer-using agents to enhance their efficiency and reli
- [CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation](https://sciencetostartup.com/paper/cure-curriculum-guided-multi-task-training-for-reliable-anatomy-grounded-report-generation) (8/10) — CURE enhances medical report generation by improving visual grounding and factual consistency using a data-efficient cur
- [cuRoboV2: Dynamics-Aware Motion Generation with Depth-Fused Distance Fields for High-DoF Robots](https://sciencetostartup.com/paper/curobov2-dynamics-aware-motion-generation-with-depth-fused-distance-fields-for-high-dof-robots) (8/10) — cuRoboV2 is a unified, dynamics-aware motion generation stack that scales from single-arm manipulators to full humanoids
- [Adaptive Language-Aware Image Reflection Removal Network](https://sciencetostartup.com/paper/adaptive-language-aware-image-reflection-removal-network) (8/10) — ALANet removes complex image reflections using language guidance, even with inaccurate language descriptions, and provid
- [Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation](https://sciencetostartup.com/paper/adaptive-clinical-aware-latent-diffusion-for-multimodal-brain-image-generation-and-missing-modality-imputation) (8/10) — AI framework using adaptive clinical-aware diffusion for generating complete brain imaging modalities in Alzheimer's dia
- [A 4D Representation for Training-Free Agentic Reasoning from Monocular Laparoscopic Video](https://sciencetostartup.com/paper/a-4d-representation-for-training-free-agentic-reasoning-from-monocular-laparoscopic-video) (8/10) — Enabling AI agents to perform spatiotemporal reasoning in surgery by grounding language models in a 4D representation of
- [Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution](https://sciencetostartup.com/paper/darwinian-memory-a-training-free-self-regulating-memory-system-for-gui-agent-evolution) (8/10) — Develop a training-free, self-evolving memory system for GUI automation that enhances MLLM agents' performance without a
- [Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models](https://sciencetostartup.com/paper/adaptive-activation-cancellation-for-hallucination-mitigation-in-large-language-models) (8/10) — Adaptive Activation Cancellation is a real-time framework that mitigates hallucinations in large language models without
- [daVinci-Env: Open SWE Environment Synthesis at Scale](https://sciencetostartup.com/paper/davinci-env-open-swe-environment-synthesis-at-scale) (8/10) — Building the largest open-source SWE environment for training scalable and verifiable software engineering agents.
- [Decoding the Human Factor: High Fidelity Behavioral Prediction for Strategic Foresight](https://sciencetostartup.com/paper/decoding-the-human-factor-high-fidelity-behavioral-prediction-for-strategic-foresight) (8/10) — Large Behavioral Model predicts individual strategic decisions for applications in foresight, negotiation, and decision 
- [Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge](https://sciencetostartup.com/paper/deep-expert-injection-for-anchoring-retinal-vlms-with-domain-specific-knowledge) (8/10) — EyExIn anchors retinal VLMs with expert knowledge for precise ophthalmic diagnosis, outperforming proprietary systems an
- [AceTone: Bridging Words and Colors for Conditional Image Grading](https://sciencetostartup.com/paper/acetone-bridging-words-and-colors-for-conditional-image-grading) (8/10) — A unified framework for conditional image grading that bridges words and colors, producing visually pleasing and stylist
- [DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search](https://sciencetostartup.com/paper/deepread-document-structure-aware-reasoning-to-enhance-agentic-search) (8/10) — DeepRead enhances document question answering by utilizing structure-aware reasoning in LLMs for effective, human-like d
- [Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients](https://sciencetostartup.com/paper/deployment-and-evaluation-of-an-ehr-integrated-large-language-model-powered-tool-to-triage-surgical-patients) (8/10) — An LLM-powered tool that automates the triage of surgical patients, integrating seamlessly with electronic health record
- [Deploying Semantic ID-based Generative Retrieval for Large-Scale Podcast Discovery at Spotify](https://sciencetostartup.com/paper/deploying-semantic-id-based-generative-retrieval-for-large-scale-podcast-discovery-at-spotify) (8/10) — GLIDE is a generative recommender system that enhances podcast discovery by combining semantic reasoning with user conte
- [Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation](https://sciencetostartup.com/paper/adaptive-radial-projection-on-fourier-magnitude-spectrum-for-document-image-skew-estimation) (8/10) — A novel skew estimation method for document images with superior performance and available source code, ready for integr
- [Agentic Planning with Reasoning for Image Styling via Offline RL](https://sciencetostartup.com/paper/agentic-planning-with-reasoning-for-image-styling-via-offline-rl) (8/10) — Agentic planning with offline RL for image styling enables nuanced transformations via interpretable tool sequences, imp
- [Anchored Alignment: Preventing Positional Collapse in Multimodal Recommender Systems](https://sciencetostartup.com/paper/anchored-alignment-preventing-positional-collapse-in-multimodal-recommender-systems) (8/10) — AnchorRec is a multimodal recommendation framework that enhances item representations while preserving modality-specific
- [Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models](https://sciencetostartup.com/paper/design-behaviour-codes-dbcs-a-taxonomy-driven-layered-governance-benchmark-for-large-language-models) (8/10) — A governance layer to reduce risk exposure in large language models, enhancing compliance and safety.
- [ACD-U: Asymmetric co-teaching with machine unlearning for robust learning with noisy labels](https://sciencetostartup.com/paper/acd-u-asymmetric-co-teaching-with-machine-unlearning-for-robust-learning-with-noisy-labels) (8/10) — ACD-U is a robust noisy label learning framework that leverages asymmetric co-teaching and machine unlearning to achieve
- [Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning](https://sciencetostartup.com/paper/thinking-wrong-in-silence-backdoor-attacks-on-continuous-latent-reasoning) (8/10) — ThoughtSteer: A novel backdoor attack on continuous latent reasoning in language models that evades existing defenses an
- [AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation](https://sciencetostartup.com/paper/accelaes-accelerating-diffusion-transformers-for-training-free-aesthetic-enhanced-image-generation) (8/10) — AccelAes accelerates diffusion transformers for enhanced image generation by optimizing computation based on aesthetic d
- [Detection-Gated Glottal Segmentation with Zero-Shot Cross-Dataset Transfer and Clinical Feature Extraction](https://sciencetostartup.com/paper/detection-gated-glottal-segmentation-with-zero-shot-cross-dataset-transfer-and-clinical-feature-extraction) (8/10) — A zero-shot glottal segmentation AI for real-time clinical voice assessment using videoendoscopy.
- [ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning](https://sciencetostartup.com/paper/activeultrafeedback-efficient-preference-data-generation-using-active-learning) (8/10) — ActiveUltraFeedback optimizes preference data generation for training language models using active learning techniques.
- [DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation](https://sciencetostartup.com/paper/dexhil-a-human-in-the-loop-framework-for-vision-language-action-model-post-training-in-dexterous-manipulation) (8/10) — DexHiL is a human-in-the-loop framework that enhances dexterous manipulation in robotic systems through coordinated inte
- [A Unified Language Model for Large Scale Search, Recommendation, and Reasoning](https://sciencetostartup.com/paper/a-unified-language-model-for-large-scale-search-recommendation-and-reasoning) (8/10) — NEO is a unified language model that enhances recommendation, search, and reasoning across large catalogs with language-
- [Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System](https://sciencetostartup.com/paper/dial-a-knowledge-grounded-dialect-specific-nl2sql-system) (8/10) — Dial is a knowledge-grounded NL2SQL system that translates natural language into dialect-specific SQL queries for hetero
- [A Swap-Adversarial Framework for Improving Domain Generalization in Electroencephalography-Based Parkinson's Disease Prediction](https://sciencetostartup.com/paper/a-swap-adversarial-framework-for-improving-domain-generalization-in-electroencephalography-based-parkinson-s-disease-pre) (8/10) — Swap-Adversarial Framework for enhanced Parkinson's prediction using EEG data with strong domain generalization.
- [CARE: Privacy-Compliant Agentic Reasoning with Evidence Discordance](https://sciencetostartup.com/paper/care-privacy-compliant-agentic-reasoning-with-evidence-discordance) (8/10) — CARE is a privacy-compliant agentic reasoning framework that uses a remote LLM for guidance and a local LLM for decision
- [A Unified XAI-LLM Approach for EndotrachealSuctioning Activity Recognition](https://sciencetostartup.com/paper/a-unified-xai-llm-approach-for-endotrachealsuctioning-activity-recognition) (8/10) — Develop an AI-powered tool to improve nurse training in endotracheal suctioning through video-based activity recognition
- [Distilling LLM Reasoning into Graph of Concept Predictors](https://sciencetostartup.com/paper/distilling-llm-reasoning-into-graph-of-concept-predictors) (8/10) — GCP offers a reasoning-aware distillation framework to efficiently transfer LLM capabilities into lightweight, interpret
- [AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments](https://sciencetostartup.com/paper/adacleargrasp-learning-adaptive-clearing-for-zero-shot-robust-dexterous-grasping-in-densely-cluttered-environments) (8/10) — AdaClearGrasp enables robots to adaptively decide between clearing obstacles or grasping targets in cluttered environmen
- [DNS-GT: A Graph-based Transformer Approach to Learn Embeddings of Domain Names from DNS Queries](https://sciencetostartup.com/paper/dns-gt-a-graph-based-transformer-approach-to-learn-embeddings-of-domain-names-from-dns-queries) (8/10) — DNS-GT leverages a Transformer-based model to enhance domain name embeddings for improved network intrusion detection.
- [A Novel Contrastive Loss for Zero-Day Network Intrusion Detection](https://sciencetostartup.com/paper/a-novel-contrastive-loss-for-zero-day-network-intrusion-detection) (8/10) — Revolutionize network security with a novel contrastive learning algorithm that excels in zero-day threat detection.
- [DocSage: An Information Structuring Agent for Multi-Doc Multi-Entity Question Answering](https://sciencetostartup.com/paper/docsage-an-information-structuring-agent-for-multi-doc-multi-entity-question-answering) (8/10) — DocSage is an advanced framework for multi-document multi-entity question answering that enhances relational reasoning a
- [A Novel Multi-Agent Architecture to Reduce Hallucinations of Large Language Models in Multi-Step Structural Modeling](https://sciencetostartup.com/paper/a-novel-multi-agent-architecture-to-reduce-hallucinations-of-large-language-models-in-multi-step-structural-modeling) (8/10) — Automate structural modeling and analysis with a multi-agent architecture that reduces hallucinations in LLMs, achieving
- [DrawSim-PD: Simulating Student Science Drawings to Support NGSS-Aligned Teacher Diagnostic Reasoning](https://sciencetostartup.com/paper/drawsim-pd-simulating-student-science-drawings-to-support-ngss-aligned-teacher-diagnostic-reasoning) (8/10) — DrawSim-PD is a generative framework for simulating student science drawings to enhance teacher diagnostic training unde
- [A Multi-task Large Reasoning Model for Molecular Science](https://sciencetostartup.com/paper/a-multi-task-large-reasoning-model-for-molecular-science) (8/10) — A multi-task reasoning model that enhances molecular science through structured reasoning and reflection.
- [DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models](https://sciencetostartup.com/paper/dreamplan-efficient-reinforcement-fine-tuning-of-vision-language-planners-via-video-world-models) (8/10) — DreamPlan enhances Vision-Language Models for robotic manipulation through efficient reinforcement fine-tuning using vid
- [PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding](https://sciencetostartup.com/paper/pixelprune-pixel-level-adaptive-visual-token-reduction-via-predictive-coding) (8/10) — A training-free, pixel-level compression method that prunes redundant image patches before ViT encoding to accelerate do
- [DScheLLM: Enabling Dynamic Scheduling through a Fine-Tuned Dual-System Large language Model](https://sciencetostartup.com/paper/dschellm-enabling-dynamic-scheduling-through-a-fine-tuned-dual-system-large-language-model) (8/10) — DScheLLM revolutionizes dynamic production scheduling with a fine-tuned large language model for adaptive and intelligen
- [A Neuro-Symbolic Framework Combining Inductive and Deductive Reasoning for Autonomous Driving Planning](https://sciencetostartup.com/paper/a-neuro-symbolic-framework-combining-inductive-and-deductive-reasoning-for-autonomous-driving-planning) (8/10) — A neuro-symbolic framework for safe and interpretable trajectory planning in autonomous driving.
- [DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation Learning](https://sciencetostartup.com/paper/dt-behrt-disease-trajectory-aware-transformer-for-interpretable-patient-representation-learning) (8/10) — DT-BEHRT leverages a graph-enhanced transformer for interpretable patient representation learning from electronic health
- [A Semi-Supervised Framework for Breast Ultrasound Segmentation with Training-Free Pseudo-Label Generation and Label Refinement](https://sciencetostartup.com/paper/a-semi-supervised-framework-for-breast-ultrasound-segmentation-with-training-free-pseudo-label-generation-and-label-refi) (8/10) — A semi-supervised breast ultrasound segmentation framework leveraging vision-language models for training-free pseudo-la
- [EmoScene: A Dual-space Dataset for Controllable Affective Image Generation](https://sciencetostartup.com/paper/emoscene-a-dual-space-dataset-for-controllable-affective-image-generation) (8/10) — EmoScene is a dual-space dataset and controllable generation framework for nuanced emotional image synthesis using diffu
- [Adaptation of Weakly Supervised Localization in Histopathology by Debiasing Predictions](https://sciencetostartup.com/paper/adaptation-of-weakly-supervised-localization-in-histopathology-by-debiasing-predictions) (8/10) — A novel method for improving weakly supervised localization in histopathology by debiasing predictions to enhance perfor
- [ECSEL: Explainable Classification via Signomial Equation Learning](https://sciencetostartup.com/paper/ecsel-explainable-classification-via-signomial-equation-learning) (8/10) — ECSEL provides an efficient, explainable classification tool for exposing biases and supporting counterfactual reasoning
- [A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge](https://sciencetostartup.com/paper/a-lightweight-modular-framework-for-constructing-autonomous-agents-driven-by-large-language-models-design-implementation) (8/10) — AgentForge is an open-source Python framework simplifying the creation and deployment of LLM-driven autonomous agents.
- [Efficient Reasoning with Balanced Thinking](https://sciencetostartup.com/paper/efficient-reasoning-with-balanced-thinking) (8/10) — ReBalance is a training-free framework that enhances reasoning efficiency in Large Reasoning Models by balancing overthi
- [A Hybrid Vision Transformer Approach for Mathematical Expression Recognition](https://sciencetostartup.com/paper/a-hybrid-vision-transformer-approach-for-mathematical-expression-recognition) (8/10) — A novel Hybrid Vision Transformer for mathematical expression recognition that outperforms state-of-the-art methods, ena
- [Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation](https://sciencetostartup.com/paper/efficiently-aligning-draft-models-via-parameter-and-data-efficient-adaptation) (8/10) — Efficient Draft Adaptation (EDA) optimizes LLM fine-tuning by reducing training costs while enhancing performance throug
- [A Lightweight Multi-Cancer Tumor Localization Framework for Deployable Digital Pathology](https://sciencetostartup.com/paper/a-lightweight-multi-cancer-tumor-localization-framework-for-deployable-digital-pathology) (8/10) — A robust multi-cancer tumor localization framework that enhances digital pathology workflows.
- [A Learnable Wavelet Transformer for Long-Short Equity Trading and Risk-Adjusted Return Optimization](https://sciencetostartup.com/paper/a-learnable-wavelet-transformer-for-long-short-equity-trading-and-risk-adjusted-return-optimization) (8/10) — A cutting-edge AI system for optimizing intraday equity trading strategies using wavelet-based transformations.
- [A Guideline-Aware AI Agent for Zero-Shot Target Volume Auto-Delineation](https://sciencetostartup.com/paper/a-guideline-aware-ai-agent-for-zero-shot-target-volume-auto-delineation) (8/10) — OncoAgent is a guideline-aware AI agent that automates target volume delineation in radiotherapy without the need for re
- [Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs](https://sciencetostartup.com/paper/claudini-autoresearch-discovers-state-of-the-art-adversarial-attack-algorithms-for-llms) (8/10) — Claudini autonomously discovers advanced adversarial attacks on LLMs, offering cutting-edge cybersecurity solutions.
- [Enhancing TableQA through Verifiable Reasoning Trace Reward](https://sciencetostartup.com/paper/enhancing-tableqa-through-verifiable-reasoning-trace-reward) (8/10) — RE-Tab enhances TableQA with a plug-and-play framework that boosts model reasoning using verifiable reward feedback, off
- [AirDDE: Multifactor Neural Delay Differential Equations for Air Quality Forecasting](https://sciencetostartup.com/paper/airdde-multifactor-neural-delay-differential-equations-for-air-quality-forecasting) (8/10) — AirDDE leverages neural delay differential equations for improved air quality forecasting by integrating delay modeling 
- [A protocol for evaluating robustness to H&E staining variation in computational pathology models](https://sciencetostartup.com/paper/a-protocol-for-evaluating-robustness-to-h-e-staining-variation-in-computational-pathology-models) (8/10) — A protocol for evaluating the robustness of computational pathology models to H&E staining variations.
- [When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools](https://sciencetostartup.com/paper/when-ai-meets-early-childhood-education-large-language-models-as-assessment-teammates-in-chinese-preschools) (8/10) — AI tool automating teacher-child interaction quality assessments in Chinese preschools for scalable, continuous monitori
- [A Multi-Objective Optimization Approach for Sustainable AI-Driven Entrepreneurship in Resilient Economies](https://sciencetostartup.com/paper/a-multi-objective-optimization-approach-for-sustainable-ai-driven-entrepreneurship-in-resilient-economies) (8/10) — EcoAI-Resilience framework optimizes AI deployment for sustainability, economic resilience, and environmental cost minim
- [Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing](https://sciencetostartup.com/paper/boosting-document-parsing-efficiency-and-performance-with-coarse-to-fine-visual-processing) (8/10) — PaddleOCR-VL enhances document parsing efficiency by focusing on semantically relevant regions with a coarse-to-fine pro
- [A Brain-inspired Embodied Intelligence for Fluid and Fast Reflexive Robotics Control](https://sciencetostartup.com/paper/a-brain-inspired-embodied-intelligence-for-fluid-and-fast-reflexive-robotics-control) (8/10) — NeuroVLA is a neuromorphic robotics framework offering energy-efficient, biologically inspired motor control for advance
- [EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection](https://sciencetostartup.com/paper/evoguard-an-extensible-agentic-rl-based-framework-for-practical-and-evolving-ai-generated-image-detection) (8/10) — EvoGuard is an extensible framework for detecting AI-generated images using a dynamic orchestration of multimodal detect
- [A 360-degree Multi-camera System for Blue Emergency Light Detection Using Color Attention RT-DETR and the ABLDataset](https://sciencetostartup.com/paper/a-360-degree-multi-camera-system-for-blue-emergency-light-detection-using-color-attention-rt-detr-and-the-abldataset) (8/10) — Develop a multi-camera system for detecting blue emergency lights to enhance ADAS and road safety.
- [LLM for Large-Scale Optimization Model Auto-Formulation: A Lightweight Few-Shot Learning Approach](https://sciencetostartup.com/paper/llm-for-large-scale-optimization-model-auto-formulation-a-lightweight-few-shot-learning-approach) (8/10) — Streamline large-scale business optimization with an LLM-driven auto-formulation tool enhanced by benchmarks and real-wo
- [A Closed-Loop Multi-Agent System Driven by LLMs for Meal-Level Personalized Nutrition Management](https://sciencetostartup.com/paper/a-closed-loop-multi-agent-system-driven-by-llms-for-meal-level-personalized-nutrition-management) (8/10) — A mobile app using LLM-driven agents to provide personalized nutrition by analyzing meal photos and adapting meal plans.
- [Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics](https://sciencetostartup.com/paper/efficient-protein-optimization-via-structure-aware-hamiltonian-dynamics) (8/10) — HADES uses Hamiltonian dynamics for efficient protein sequence optimization, enhancing drug and enzyme development.
- [4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video](https://sciencetostartup.com/paper/4dequine-disentangling-motion-and-appearance-for-4d-equine-reconstruction-from-monocular-video) (8/10) — 4DEquine offers a novel approach to 4D reconstruction of equines from monocular video, enhancing animal welfare through 
- [EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery](https://sciencetostartup.com/paper/evoscientist-towards-multi-agent-evolving-ai-scientists-for-end-to-end-scientific-discovery) (8/10) — EvoScientist is a multi-agent AI scientist framework that evolves research strategies through persistent memory, enablin
- [Signals: Trajectory Sampling and Triage for Agentic Interactions](https://sciencetostartup.com/paper/signals-trajectory-sampling-and-triage-for-agentic-interactions) (8/10) — A lightweight, signal-based framework for triaging agentic interaction trajectories to improve post-deployment optimizat
- [Exp-Force: Experience-Conditioned Pre-Grasp Force Selection with Vision-Language Models](https://sciencetostartup.com/paper/exp-force-experience-conditioned-pre-grasp-force-selection-with-vision-language-models) (8/10) — Exp-Force uses a vision-language model conditioned on prior grasping experiences to predict the minimum feasible graspin
- [98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router](https://sciencetostartup.com/paper/98-times-faster-llm-routing-without-a-dedicated-gpu-flash-attention-prompt-compression-and-near-streaming-for-the-vllm-s) (8/10) — A high-performance semantic router for LLMs that dramatically reduces latency and memory usage without needing a dedicat
- [Explainable Innovation Engine: Dual-Tree Agent-RAG with Methods-as-Nodes and Verifiable Write-Back](https://sciencetostartup.com/paper/explainable-innovation-engine-dual-tree-agent-rag-with-methods-as-nodes-and-verifiable-write-back) (8/10) — A novel Explainable Innovation Engine that enhances retrieval-augmented generation with methods-as-nodes for improved co
- [A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification](https://sciencetostartup.com/paper/a-contrastive-learning-framework-empowered-by-attention-based-feature-adaptation-for-street-view-image-classification) (8/10) — CLIP-MHAdapter offers efficient and accurate street-view image classification by leveraging an adaptive contrastive lea
- [A Multi-scale Linear-time Encoder for Whole-Slide Image Analysis](https://sciencetostartup.com/paper/a-multi-scale-linear-time-encoder-for-whole-slide-image-analysis) (8/10) — MARBLE offers a scalable, efficient tool for multi-scale whole-slide image analysis with significant accuracy improvemen
- [AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection](https://sciencetostartup.com/paper/adaptevolve-improving-efficiency-of-evolutionary-ai-agents-through-adaptive-model-selection) (8/10) — AdaptEvolve optimizes AI agent efficiency by dynamically selecting the best-suited LLM for each decision point, cutting 
- [A^3: Towards Advertising Aesthetic Assessment](https://sciencetostartup.com/paper/a-3-towards-advertising-aesthetic-assessment) (8/10) — A framework and multimodal LLM for objective, scalable, and interpretable assessment of advertising image aesthetics to 
- [AnchorVLA4D: an Anchor-Based Spatial-Temporal Vision-Language-Action Model for Robotic Manipulation](https://sciencetostartup.com/paper/anchorvla4d-an-anchor-based-spatial-temporal-vision-language-action-model-for-robotic-manipulation) (8/10) — AnchorVLA4D enhances robotic manipulation by integrating visual anchors for improved spatial-temporal reasoning.
- [Schema on the Inside: A Two-Phase Fine-Tuning Method for High-Efficiency Text-to-SQL at Scale](https://sciencetostartup.com/paper/schema-on-the-inside-a-two-phase-fine-tuning-method-for-high-efficiency-text-to-sql-at-scale) (8/10) — A self-hosted, specialized text-to-SQL model that drastically cuts API costs and latency for conversational data queryin
- [Decoding the Critique Mechanism in Large Reasoning Models](https://sciencetostartup.com/paper/decoding-the-critique-mechanism-in-large-reasoning-models) (8/10) — A study revealing the hidden critique ability in Large Reasoning Models to enhance error detection and self-correction.
- [Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection](https://sciencetostartup.com/paper/leave-no-stone-unturned-uncovering-holistic-audio-visual-intrinsic-coherence-for-deepfake-detection) (8/10) — A novel deepfake detection system that leverages intrinsic audio-visual coherence, outperforming state-of-the-art with a
- [CritiSense: Critical Digital Literacy and Resilience Against Misinformation](https://sciencetostartup.com/paper/critisense-critical-digital-literacy-and-resilience-against-misinformation) (8/10) — CritiSense boosts digital literacy to combat misinformation through a multilingual mobile app with interactive challenge
- [Uncertainty-Aware Vision-based Risk Object Identification via Conformal Risk Tube Prediction](https://sciencetostartup.com/paper/uncertainty-aware-vision-based-risk-object-identification-via-conformal-risk-tube-prediction) (8/10) — A novel AI system for hazard detection in intelligent driving that quantifies risk uncertainty to improve safety and red
- [DISCOVER: A Solver for Distributional Counterfactual Explanations](https://sciencetostartup.com/paper/discover-a-solver-for-distributional-counterfactual-explanations) (8/10) — DISCOVER is a model-agnostic solver that enhances distributional counterfactual explanations for non-differentiable mode
- [Semantic Iterative Reconstruction: One-Shot Universal Anomaly Detection](https://sciencetostartup.com/paper/semantic-iterative-reconstruction-one-shot-universal-anomaly-detection) (8/10) — A universal AI model that detects anomalies in medical images with minimal normal samples, outperforming existing method
- [Point-to-Mask: From Arbitrary Point Annotations to Mask-Level Infrared Small Target Detection](https://sciencetostartup.com/paper/point-to-mask-from-arbitrary-point-annotations-to-mask-level-infrared-small-target-detection) (8/10) — Point-to-Mask revolutionizes infrared small target detection by transforming low-cost point annotations into accurate ma
- [Mind the Hitch: Dynamic Calibration and Articulated Perception for Autonomous Trucks](https://sciencetostartup.com/paper/mind-the-hitch-dynamic-calibration-and-articulated-perception-for-autonomous-trucks) (8/10) — A vision-based framework for autonomous trucks that dynamically calibrates perception systems to handle articulated trai
- [Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections](https://sciencetostartup.com/paper/ablate-and-rescue-a-causal-analysis-of-residual-stream-hyper-connections) (8/10) — An open-source multi-stream transformer model that addresses representation collapse through causal analysis of residual
- [FeedAIde: Guiding App Users to Submit Rich Feedback Reports by Asking Context-Aware Follow-Up Questions](https://sciencetostartup.com/paper/feedaide-guiding-app-users-to-submit-rich-feedback-reports-by-asking-context-aware-follow-up-questions) (8/10) — FeedAIde enriches mobile app feedback by guiding users with smart follow-up questions, improving interaction between dev
- [ReFORM: Review-aggregated Profile Generation via LLM with Multi-Factor Attention for Restaurant Recommendation](https://sciencetostartup.com/paper/reform-review-aggregated-profile-generation-via-llm-with-multi-factor-attention-for-restaurant-recommendation) (8/10) — ReFORM enhances restaurant recommendations by generating user and item profiles from reviews using LLMs and multi-factor
- [FG-CLTP: Fine-Grained Contrastive Language Tactile Pretraining for Robotic Manipulation](https://sciencetostartup.com/paper/fg-cltp-fine-grained-contrastive-language-tactile-pretraining-for-robotic-manipulation) (8/10) — FG-CLTP enhances robotic manipulation by integrating fine-grained tactile sensing with vision-language-action models.
- [LLM-Powered Flood Depth Estimation from Social Media Imagery: A Vision-Language Model Framework with Mechanistic Interpretability for Transportation Resilience](https://sciencetostartup.com/paper/llm-powered-flood-depth-estimation-from-social-media-imagery-a-vision-language-model-framework-with-mechanistic-interpre) (8/10) — FloodLlama is an open-source vision-language model for real-time flood depth estimation from social media imagery, enhan
- [Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents](https://sciencetostartup.com/paper/open-reliable-and-collective-a-community-driven-framework-for-tool-using-ai-agents) (8/10) — A community-driven framework for reliable tool-using AI agents with standardized schemas, plug-and-play wrappers, and au
- [ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control](https://sciencetostartup.com/paper/echo-edge-cloud-humanoid-orchestration-for-language-to-motion-control) (8/10) — ECHO enables language-driven control of humanoid robots through an innovative edge-cloud framework.
- [FL-MedSegBench: A Comprehensive Benchmark for Federated Learning on Medical Image Segmentation](https://sciencetostartup.com/paper/fl-medsegbench-a-comprehensive-benchmark-for-federated-learning-on-medical-image-segmentation) (8/10) — FL-MedSegBench is a benchmark toolkit for evaluating federated learning methods in medical image segmentation.
- [NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation](https://sciencetostartup.com/paper/navthinker-action-conditioned-world-models-for-coupled-prediction-and-planning-in-social-navigation) (8/10) — NavThinker offers a future-aware framework for social navigation using action-conditioned world models and reinforcement
- [FlashSampling: Fast and Memory-Efficient Exact Sampling](https://sciencetostartup.com/paper/flashsampling-fast-and-memory-efficient-exact-sampling) (8/10) — FlashSampling optimizes large-vocabulary decoding by integrating exact sampling directly into the matrix multiplication 
- [Fine-tuning RoBERTa for CVE-to-CWE Classification: A 125M Parameter Model Competitive with LLMs](https://sciencetostartup.com/paper/fine-tuning-roberta-for-cve-to-cwe-classification-a-125m-parameter-model-competitive-with-llms) (8/10) — A lightweight model for precise CVE-to-CWE classification enhancing cybersecurity vulnerability management.
- [Architecture-Agnostic Feature Synergy for Universal Defense Against Heterogeneous Generative Threats](https://sciencetostartup.com/paper/architecture-agnostic-feature-synergy-for-universal-defense-against-heterogeneous-generative-threats) (8/10) — A framework for universal defense against diverse generative threats using architecture-agnostic feature synergy.
- [LLMind: Bio-inspired Training-free Adaptive Visual Representations for Vision-Language Models](https://sciencetostartup.com/paper/llmind-bio-inspired-training-free-adaptive-visual-representations-for-vision-language-models) (8/10) — LLMind offers a training-free framework for adaptive visual representations in Vision-Language Models, enhancing efficie
- [AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving](https://sciencetostartup.com/paper/automot-a-unified-vision-language-action-model-with-asynchronous-mixture-of-transformers-for-end-to-end-autonomous-drivi) (8/10) — A unified vision-language-action model for enhancing autonomous driving performance through efficient reasoning and acti
- [ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors](https://sciencetostartup.com/paper/expertgen-scalable-sim-to-real-expert-policy-learning-from-imperfect-behavior-priors) (8/10) — ExpertGen automates expert policy learning in simulation for scalable sim-to-real transfer in robotics.
- [ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph](https://sciencetostartup.com/paper/forgedreamer-industrial-text-to-3d-generation-with-multi-expert-lora-and-cross-view-hypergraph) (8/10) — ForgeDreamer improves industrial text-to-3D generation by leveraging a Multi-Expert LoRA Ensemble and Cross-View H
- [VectorWorld: Efficient Streaming World Model via Diffusion Flow on Vector Graphs](https://sciencetostartup.com/paper/vectorworld-efficient-streaming-world-model-via-diffusion-flow-on-vector-graphs) (8/10) — VectorWorld offers real-time, high-fidelity autonomous driving simulation using novel vector graph diffusion flows.
- [Frequency-Modulated Visual Restoration for Matryoshka Large Multimodal Models](https://sciencetostartup.com/paper/frequency-modulated-visual-restoration-for-matryoshka-large-multimodal-models) (8/10) — FMVR enhances visual semantics in large multimodal models while reducing computational load.
- [MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers](https://sciencetostartup.com/paper/moe-act-scaling-multi-task-bimanual-manipulation-with-sparse-language-conditioned-mixture-of-experts-transformers) (8/10) — MoE-ACT enhances robotic manipulation by integrating language-conditioned Mixture-of-Experts into a lightweight multi-ta
- [From Horizontal to Rotated: Cross-View Object Geo-Localization with Orientation Awareness](https://sciencetostartup.com/paper/from-horizontal-to-rotated-cross-view-object-geo-localization-with-orientation-awareness) (8/10) — OSGeo improves cross-view object geo-localization by using Rotated Bounding Boxes for high precision with lower an
- [HYDRA: Unifying Multi-modal Generation and Understanding via Representation-Harmonized Tokenization](https://sciencetostartup.com/paper/hydra-unifying-multi-modal-generation-and-understanding-via-representation-harmonized-tokenization) (8/10) — HYDRA-TOK unifies visual understanding and generation through a novel representation-harmonized approach.
- [When AI and Experts Agree on Error: Intrinsic Ambiguity in Dermatoscopic Images](https://sciencetostartup.com/paper/when-ai-and-experts-agree-on-error-intrinsic-ambiguity-in-dermatoscopic-images) (8/10) — This research identifies intrinsic ambiguity in dermatoscopic images that challenges both AI and human experts, suggesti
- [PiGRAND: Physics-informed Graph Neural Diffusion for Intelligent Additive Manufacturing](https://sciencetostartup.com/paper/pigrand-physics-informed-graph-neural-diffusion-for-intelligent-additive-manufacturing) (8/10) — PiGRAND leverages physics-informed graph neural diffusion to optimize heat transport in 3D printing applications.
- [FusionNet: a frame interpolation network for 4D heart models](https://sciencetostartup.com/paper/fusionnet-a-frame-interpolation-network-for-4d-heart-models) (8/10) — FusionNet enhances cardiac imaging by providing high-resolution 4D heart models from short CMR scans.
- [Intelligent Co-Design: An Interactive LLM Framework for Interior Spatial Design via Multi-Modal Agents](https://sciencetostartup.com/paper/intelligent-co-design-an-interactive-llm-framework-for-interior-spatial-design-via-multi-modal-agents) (8/10) — An interactive LLM framework that transforms natural language and imagery into optimized 3D interior designs, enhancing 
- [GazeMoE: Perception of Gaze Target with Mixture-of-Experts](https://sciencetostartup.com/paper/gazemoe-perception-of-gaze-target-with-mixture-of-experts) (8/10) — GazeMoE is an end-to-end framework that selectively leverages gaze-target-related cues from a frozen foundation model th
- [AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation](https://sciencetostartup.com/paper/adarubric-task-adaptive-rubrics-for-llm-agent-evaluation) (8/10) — Develops a dynamic rubric generation system for LLM agents that significantly improves evaluation accuracy and agent per
- [GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models](https://sciencetostartup.com/paper/gem-vg-towards-generalized-multi-image-visual-grounding-with-multimodal-large-language-models) (8/10) — GeM-VG offers superior multi-image visual grounding capabilities, leveraging a novel dataset and hybrid reinforcement fi
- [DreamControl-v2: Simpler and Scalable Autonomous Humanoid Skills via Trainable Guided Diffusion Priors](https://sciencetostartup.com/paper/dreamcontrol-v2-simpler-and-scalable-autonomous-humanoid-skills-via-trainable-guided-diffusion-priors) (8/10) — Scalable autonomous humanoid skills using trainable guided diffusion models trained on diverse motion data.
- [GenHOI: Towards Object-Consistent Hand-Object Interaction with Temporally Balanced and Spatially Selective Object Injection](https://sciencetostartup.com/paper/genhoi-towards-object-consistent-hand-object-interaction-with-temporally-balanced-and-spatially-selective-object-injecti) (8/10) — GenHOI enhances video generation models with object-consistent hand-object interaction by injecting reference object inf
- [ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation](https://sciencetostartup.com/paper/shortcoder-knowledge-augmented-syntax-optimization-for-token-efficient-code-generation) (8/10) — ShortCoder optimizes code generation by reducing token usage while maintaining functionality and readability.
- [GigaWorld-Policy: An Efficient Action-Centered World--Action Model](https://sciencetostartup.com/paper/gigaworld-policy-an-efficient-action-centered-world-action-model) (8/10) — GigaWorld-Policy improves robot policy learning with an efficient action-centered model that enhances performance 
- [GIAT: A Geologically-Informed Attention Transformer for Lithology Identification](https://sciencetostartup.com/paper/giat-a-geologically-informed-attention-transformer-for-lithology-identification) (8/10) — GIAT is a novel Transformer framework that enhances lithology identification by integrating geological priors for improv
- [GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators](https://sciencetostartup.com/paper/gist-gauge-invariant-spectral-transformers-for-scalable-graph-neural-operators) (8/10) — GIST is a novel graph transformer architecture that achieves scalable, gauge-invariant learning for graph-structured dat
- [IRIS: Intersection-aware Ray-based Implicit Editable Scenes](https://sciencetostartup.com/paper/iris-intersection-aware-ray-based-implicit-editable-scenes) (8/10) — IRIS enables efficient, interactive editing of 3D scenes using intersection-aware, ray-based implicit representations.
- [Global Cross-Modal Geo-Localization: A Million-Scale Dataset and a Physical Consistency Learning Framework](https://sciencetostartup.com/paper/global-cross-modal-geo-localization-a-million-scale-dataset-and-a-physical-consistency-learning-framework) (8/10) — CORE is a million-scale dataset for cross-modal geo-localization, enabling a physical-law-aware network (PLANET) that si
- [TextOVSR: Text-Guided Real-World Opera Video Super-Resolution](https://sciencetostartup.com/paper/textovsr-text-guided-real-world-opera-video-super-resolution) (8/10) — TextOVSR leverages text prompts to enhance the super-resolution of degraded opera videos, outperforming existing methods
- [Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models](https://sciencetostartup.com/paper/q-mask-query-driven-causal-masks-for-text-anchoring-in-ocr-oriented-vision-language-models) (8/10) — Q-Mask is an OCR framework using query-driven causal masks for accurate text anchoring in vision-language models, traine
- [Real-Time Oriented Object Detection Transformer in Remote Sensing Images](https://sciencetostartup.com/paper/real-time-oriented-object-detection-transformer-in-remote-sensing-images) (8/10) — A real-time oriented object detection transformer that improves angle representation and training stability for remote s
- [GR-SAP: Generative Replay for Safety Alignment Preservation during Fine-Tuning](https://sciencetostartup.com/paper/gr-sap-generative-replay-for-safety-alignment-preservation-during-fine-tuning) (8/10) — GR-SAP is a framework that synthesizes domain-specific alignment data to preserve safety alignment in fine-tuning large 
- [InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems](https://sciencetostartup.com/paper/intervenebench-benchmarking-llms-for-intervention-reasoning-and-causal-study-design-in-real-social-systems) (8/10) — InterveneBench benchmarks LLMs for intervention reasoning in social science, enhancing causal study design.
- [Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures](https://sciencetostartup.com/paper/graph-native-cognitive-memory-for-ai-agents-formal-belief-revision-semantics-for-versioned-memory-architectures) (8/10) — Kumiho is a graph-native cognitive memory architecture that enhances AI agents' memory capabilities through formal belie
- [KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning](https://sciencetostartup.com/paper/kg-hopper-empowering-compact-open-llms-with-knowledge-graph-reasoning-via-reinforcement-learning) (8/10) — A reinforcement learning framework that enables compact open LLMs to perform multi-hop knowledge graph reasoning in a si
- [Grounding Synthetic Data Generation With Vision and Language Models](https://sciencetostartup.com/paper/grounding-synthetic-data-generation-with-vision-and-language-models) (8/10) — A vision-language framework for interpretable synthetic data generation and evaluation in remote sensing.
- [HARMONI: Multimodal Personalization of Multi-User Human-Robot Interactions with LLMs](https://sciencetostartup.com/paper/harmoni-multimodal-personalization-of-multi-user-human-robot-interactions-with-llms) (8/10) — HARMONI enhances human-robot interactions with personalized, multimodal capabilities for multi-user environments.
- [Harnessing the Power of Foundation Models for Accurate Material Classification](https://sciencetostartup.com/paper/harnessing-the-power-of-foundation-models-for-accurate-material-classification) (8/10) — A novel framework leveraging foundation models to enhance material classification accuracy through innovative dataset ge
- [HeartAgent: An Autonomous Agent System for Explainable Differential Diagnosis in Cardiology](https://sciencetostartup.com/paper/heartagent-an-autonomous-agent-system-for-explainable-differential-diagnosis-in-cardiology) (8/10) — HeartAgent is an autonomous agent system that enhances differential diagnosis in cardiology with explainable AI.
- [Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning](https://sciencetostartup.com/paper/health-facility-location-in-ethiopia-leveraging-llms-to-integrate-expert-knowledge-into-algorithmic-planning) (8/10) — A framework integrating LLMs and optimization to enhance health facility location planning in Ethiopia.
- [FecalFed: Privacy-Preserving Poultry Disease Detection via Federated Learning](https://sciencetostartup.com/paper/fecalfed-privacy-preserving-poultry-disease-detection-via-federated-learning) (8/10) — A privacy-preserving federated learning framework for poultry disease detection using fecal imaging, with a curated and 
- [HGP-Mamba: Integrating Histology and Generated Protein Features for Mamba-based Multimodal Survival Risk Prediction](https://sciencetostartup.com/paper/hgp-mamba-integrating-histology-and-generated-protein-features-for-mamba-based-multimodal-survival-risk-prediction) (8/10) — HGP-Mamba integrates histology and generated protein features for advanced cancer survival risk prediction.
- [EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing](https://sciencetostartup.com/paper/edithf-1m-a-million-scale-rich-human-preference-feedback-for-image-editing) (8/10) — A million-scale human preference dataset and evaluation model for optimizing text-guided image editing.
- [Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models](https://sciencetostartup.com/paper/hierarchical-orthogonal-residual-spread-for-precise-massive-editing-in-large-language-models) (8/10) — HORSE is a method for precise and stable massive editing (many simultaneous knowledge edits) of large language models.
- [SlovKE: A Large-Scale Dataset and LLM Evaluation for Slovak Keyphrase Extraction](https://sciencetostartup.com/paper/slovke-a-large-scale-dataset-and-llm-evaluation-for-slovak-keyphrase-extraction) (8/10) — SlovKE provides a large-scale dataset and LLM evaluation for keyphrase extraction in Slovak, addressing a critical gap i
- [TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning](https://sciencetostartup.com/paper/tr-icrl-test-time-rethinking-for-in-context-reinforcement-learning) (8/10) — A framework for LLMs to learn from external rewards during inference by using retrieved instances and pseudo-labels for 
- [Token Coherence: Adapting MESI Cache Protocols to Minimize Synchronization Overhead in Multi-Agent LLM Systems](https://sciencetostartup.com/paper/token-coherence-adapting-mesi-cache-protocols-to-minimize-synchronization-overhead-in-multi-agent-llm-systems) (8/10) — A system that minimizes synchronization overhead in multi-agent LLMs by adapting MESI cache protocols.
- [History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis](https://sciencetostartup.com/paper/history-is-not-enough-an-adaptive-dataflow-system-for-financial-time-series-synthesis) (8/10) — An adaptive dataflow system that improves financial trading model robustness and performance through dynamic data manage
- [Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation](https://sciencetostartup.com/paper/0-an-open-foundation-model-towards-universal-humanoid-loco-manipulation) (8/10) — Psi-Zero is an open-source foundation model for humanoid robot loco-manipulation tasks with state-of-the-art perfo
- [Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence](https://sciencetostartup.com/paper/holi-spatial-evolving-video-streams-into-holistic-3d-spatial-intelligence) (8/10) — Holi-Spatial is a large-scale, automatically generated 3D spatial dataset that significantly improves performance on spa
- [How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework](https://sciencetostartup.com/paper/how-to-build-ai-agents-by-augmenting-llms-with-codified-human-expert-domain-knowledge-a-software-engineering-framework) (8/10) — Transform specialized domain knowledge into AI agents for expert-level visualization generation.
- [How to Model Your Crazyflie Brushless](https://sciencetostartup.com/paper/how-to-model-your-crazyflie-brushless) (8/10) — An open-source dynamics model for the Crazyflie Brushless, enabling rapid development and testing of agile nano-quadcopt
- [EnterpriseLab: A Full-Stack Platform for developing and deploying agents in Enterprises](https://sciencetostartup.com/paper/enterpriselab-a-full-stack-platform-for-developing-and-deploying-agents-in-enterprises) (8/10) — EnterpriseLab is a full-stack platform enabling enterprises to develop and deploy specialized, cost-effective AI agents 
- [HumanDiffusion: A Vision-Based Diffusion Trajectory Planner with Human-Conditioned Goals for Search and Rescue UAV](https://sciencetostartup.com/paper/humandiffusion-a-vision-based-diffusion-trajectory-planner-with-human-conditioned-goals-for-search-and-rescue-uav) (8/10) — Develop a UAV trajectory planner that uses vision-based diffusion for delivering medical aid in disaster scenarios.
- [Iterative Refinement Improves Compositional Image Generation](https://sciencetostartup.com/paper/iterative-refinement-improves-compositional-image-generation) (8/10) — Improves compositional text-to-image generation through iterative refinement guided by vision-language model feedback for hi
- [Hunt Globally: Deep Research AI Agents for Drug Asset Scouting in Investing, Business Development, and Search & Evaluation](https://sciencetostartup.com/paper/hunt-globally-deep-research-ai-agents-for-drug-asset-scouting-in-investing-business-development-and-search-evaluation) (8/10) — AI agents for comprehensive global drug asset scouting in biopharma investments.
- [DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing](https://sciencetostartup.com/paper/dreamlite-a-lightweight-on-device-unified-model-for-image-generation-and-editing) (8/10) — DreamLite provides efficient on-device image generation and editing within a single compact model.
- [ILV: Iterative Latent Volumes for Fast and Accurate Sparse-View CT Reconstruction](https://sciencetostartup.com/paper/ilv-iterative-latent-volumes-for-fast-and-accurate-sparse-view-ct-reconstruction) (8/10) — ILV is a novel framework for fast and accurate 3D reconstruction from sparse-view CT projections, enhancing clinical ima
- [CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains](https://sciencetostartup.com/paper/ciqi-agent-aligning-vision-tools-and-aesthetics-in-multimodal-agent-for-cultural-reasoning-on-chinese-porcelains) (8/10) — Develop an AI-powered platform for accessible Chinese porcelain connoisseurship using multimodal reasoning and fine-grai
- [IgPose: A Generative Data-Augmented Pipeline for Robust Immunoglobulin-Antigen Binding Prediction](https://sciencetostartup.com/paper/igpose-a-generative-data-augmented-pipeline-for-robust-immunoglobulin-antigen-binding-prediction) (8/10) — IgPose is a generative data-augmented framework for robust immunoglobulin-antigen binding prediction, enhancing antibody
- [ProCap: Projection-Aware Captioning for Spatial Augmented Reality](https://sciencetostartup.com/paper/procap-projection-aware-captioning-for-spatial-augmented-reality) (8/10) — A framework for spatial augmented reality that separates projected content from physical scenes, enabling intelligent in
- [In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach](https://sciencetostartup.com/paper/in-context-autonomous-network-incident-response-an-end-to-end-large-language-model-agent-approach) (8/10) — An end-to-end LLM agent for faster and smarter autonomous network incident response.
- [MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures](https://sciencetostartup.com/paper/markushgrapher-2-end-to-end-multimodal-recognition-of-chemical-structures) (8/10) — MarkushGrapher-2 enables automated extraction of complex chemical structures from patents using multimodal recognition, 
- [Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination](https://sciencetostartup.com/paper/inducing-epistemological-humility-in-large-language-models-a-targeted-sft-approach-to-reducing-hallucination) (8/10) — A targeted fine-tuning approach to reduce hallucinations in large language models by teaching epistemological humility.
- [Knowledge Restoration-driven Prompt Optimization: Unlocking LLM Potential for Open-Domain Relational Triplet Extraction](https://sciencetostartup.com/paper/knowledge-restoration-driven-prompt-optimization-unlocking-llm-potential-for-open-domain-relational-triplet-extraction) (8/10) — Optimizing prompts for LLMs to enhance open-domain relational triplet extraction.
- [Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions](https://sciencetostartup.com/paper/infusion-shaping-model-behavior-by-editing-training-data-via-influence-functions) (8/10) — Infusion leverages influence functions to craft subtle training data perturbations that reshape AI model behavior withou
- [InstantHDR: Single-forward Gaussian Splatting for High Dynamic Range 3D Reconstruction](https://sciencetostartup.com/paper/instanthdr-single-forward-gaussian-splatting-for-high-dynamic-range-3d-reconstruction) (8/10) — InstantHDR offers a fast, feed-forward solution for reconstructing high dynamic range 3D scenes from low dynamic range i
- [IntelliSA: An Intelligent Static Analyzer for IaC Security Smell Detection Using Symbolic Rules and Neural Inference](https://sciencetostartup.com/paper/intellisa-an-intelligent-static-analyzer-for-iac-security-smell-detection-using-symbolic-rules-and-neural-inference) (8/10) — IntelliSA is an efficient static analyzer that detects security smells in Infrastructure as Code with high accuracy and 
- [Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models](https://sciencetostartup.com/paper/miner-mining-intrinsic-mastery-for-data-efficient-rl-in-large-reasoning-models) (8/10) — Miner, an RL method, leverages intrinsic uncertainty for data-efficient training in large reasoning models, significa
- [IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework](https://sciencetostartup.com/paper/introsvg-learning-from-rendering-feedback-for-text-to-svg-generation-via-an-introspective-generator-critic-framework) (8/10) — IntroSVG enhances text-to-SVG generation by integrating visual feedback into an introspective generator-critic framework
- [Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute](https://sciencetostartup.com/paper/scaling-atomistic-protein-binder-design-with-generative-pretraining-and-test-time-compute) (8/10) — A novel generative AI method for atomistic protein binder design that unifies generative modeling and sequence optimizat
- [PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference](https://sciencetostartup.com/paper/packforcing-short-video-training-suffices-for-long-video-sampling-and-long-context-inference) (8/10) — PackForcing: Efficient long-video generation using short-video training with reduced memory footprint and improved tempo
- [Event6D: Event-based Novel Object 6D Pose Tracking](https://sciencetostartup.com/paper/event6d-event-based-novel-object-6d-pose-tracking) (8/10) — A real-time 6D object pose tracking system for dynamic scenes using event cameras, capable of tracking novel objects wit
- [KA2L: A Knowledge-Aware Active Learning Framework for LLMs](https://sciencetostartup.com/paper/ka2l-a-knowledge-aware-active-learning-framework-for-llms) (8/10) — KA2L is a framework that enhances LLMs' performance through targeted active learning by identifying knowledge gaps.
- [Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates](https://sciencetostartup.com/paper/just-in-time-reinforcement-learning-continual-learning-in-llm-agents-without-gradient-updates) (8/10) — JitRL offers cost-effective continual learning for LLM agents by optimizing policies without gradient updates, drastical
- [Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers](https://sciencetostartup.com/paper/just-in-time-training-free-spatial-acceleration-for-diffusion-transformers) (8/10) — JiT is a training-free framework that accelerates Diffusion Transformers by optimizing spatial computations for faster i
- [Automated Generation of Cybersecurity Exercise Scenarios](https://sciencetostartup.com/paper/automated-generation-of-cybersecurity-exercise-scenarios) (8/10) — An automated system for generating diverse cybersecurity exercise scenarios, complete with a simulation environment and 
- [KLong: Training LLM Agent for Extremely Long-horizon Tasks](https://sciencetostartup.com/paper/klong-training-llm-agent-for-extremely-long-horizon-tasks) (8/10) — KLong offers a high-performance LLM agent designed for tackling extremely long-horizon tasks in AI research and developm
- [KnowDiffuser: A Knowledge-Guided Diffusion Planner with LM Reasoning and Prior-Informed Trajectory Initialization](https://sciencetostartup.com/paper/knowdiffuser-a-knowledge-guided-diffusion-planner-with-lm-reasoning-and-prior-informed-trajectory-initialization) (8/10) — KnowDiffuser integrates language models and diffusion models for advanced motion planning in autonomous driving.
- [Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models](https://sciencetostartup.com/paper/knowing-without-acting-the-disentangled-geometry-of-safety-mechanisms-in-large-language-models) (8/10) — Surgical attacks on LLM safety mechanisms enable novel jailbreaking and reveal architectural vulnerabilities, paving the
- [Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization](https://sciencetostartup.com/paper/kernel-smith-a-unified-recipe-for-evolutionary-kernel-optimization) (8/10) — Kernel-Smith optimizes GPU kernels for enhanced performance using an evolutionary approach, surpassing state-of-the-art 
- [LAP: A Language-Aware Planning Model For Procedure Planning In Instructional Videos](https://sciencetostartup.com/paper/lap-a-language-aware-planning-model-for-procedure-planning-in-instructional-videos) (8/10) — LAP is a language-aware planning model that enhances procedure planning in instructional videos by leveraging language d
- [AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents](https://sciencetostartup.com/paper/androtmem-from-interaction-trajectories-to-anchored-memory-in-long-horizon-gui-agents) (8/10) — A diagnostic framework and memory mechanism for long-horizon GUI agents that significantly improves task completion rate
- [LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries](https://sciencetostartup.com/paper/latentrefusal-latent-signal-refusal-for-unanswerable-text-to-sql-queries) (8/10) — LatentRefusal ensures safe deployment of text-to-SQL systems by preemptively refusing unanswerable queries using interna
- [Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills](https://sciencetostartup.com/paper/trace2skill-distill-trajectory-local-lessons-into-transferable-agent-skills) (8/10) — Automatically distill transferable agent skills from execution experience, enabling LLM agents to tackle complex tasks w
- [Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints](https://sciencetostartup.com/paper/learning-flexible-job-shop-scheduling-under-limited-buffers-and-material-kitting-constraints) (8/10) — Learned scheduling for flexible job-shop manufacturing under limited-buffer and material-kitting constraints.
- [AutoDrive-P³: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning](https://sciencetostartup.com/paper/autodrive-text-p-3-unified-chain-of-perception-prediction-planning-thought-via-reinforcement-fine-tuning) (8/10) — A unified framework for autonomous driving that integrates perception, prediction, and planning through chain-of-thought
- [Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems](https://sciencetostartup.com/paper/learning-latency-aware-orchestration-for-parallel-multi-agent-systems) (8/10) — Optimize multi-agent system execution for reduced latency in parallel environments.
- [AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding](https://sciencetostartup.com/paper/adapttoken-entropy-based-adaptive-token-selection-for-mllm-long-video-understanding) (8/10) — AdaptToken: Efficient token selection for MLLMs to enhance long video understanding by leveraging entropy for global con
- [Learning to Present: Inverse Specification Rewards for Agentic Slide Generation](https://sciencetostartup.com/paper/learning-to-present-inverse-specification-rewards-for-agentic-slide-generation) (8/10) — Automate professional slide deck creation using LLMs and a novel inverse specification reward system.
- [BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers](https://sciencetostartup.com/paper/binaryattention-one-bit-qk-attention-for-vision-and-diffusion-transformers) (8/10) — BinaryAttention offers a highly efficient binary quantization method for Transformers, doubling speed for vision tasks w
- [Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos](https://sciencetostartup.com/paper/learning-transferable-temporal-primitives-for-video-reasoning-via-synthetic-videos) (8/10) — SynRL is a post-training framework that enhances video understanding by teaching models fundamental temporal primitives 
- [LibScan: Smart Contract Library Misuse Detection with Iterative Feedback and Static Verification](https://sciencetostartup.com/paper/libscan-smart-contract-library-misuse-detection-with-iterative-feedback-and-static-verification) (8/10) — LibScan is an automated framework that detects smart contract library misuse by combining LLM reasoning with static code
- [Live or Lie: Action-Aware Capsule Multiple Instance Learning for Risk Assessment in Live Streaming Platforms](https://sciencetostartup.com/paper/live-or-lie-action-aware-capsule-multiple-instance-learning-for-risk-assessment-in-live-streaming-platforms) (8/10) — An approach that uses action-aware capsule multiple instance learning to assess risk on live streaming platforms.
- [Adaptive Learned Image Compression with Graph Neural Networks](https://sciencetostartup.com/paper/adaptive-learned-image-compression-with-graph-neural-networks) (8/10) — A novel graph neural network approach for adaptive image compression that significantly outperforms state-of-the-art met
- [Separate Before You Compress: The WWHO Tokenization Architecture](https://sciencetostartup.com/paper/separate-before-you-compress-the-wwho-tokenization-architecture) (8/10) — A novel tokenization architecture and algorithm that significantly reduces token count and inference costs for complex s
- [Ruka-v2: Tendon Driven Open-Source Dexterous Hand with Wrist and Abduction for Robot Learning](https://sciencetostartup.com/paper/ruka-v2-tendon-driven-open-source-dexterous-hand-with-wrist-and-abduction-for-robot-learning) (8/10) — Ruka-v2 is an affordable open-source humanoid robotic hand with advanced dexterity and wrist mobility enabling efficient
- [LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery](https://sciencetostartup.com/paper/llm-augmented-intervenable-multimodal-adaptor-for-post-operative-complication-prediction-in-lung-cancer-surgery) (8/10) — MIRACLE predicts postoperative complications in lung cancer surgeries using multimodal data and LLM explanations for act
- [Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning](https://sciencetostartup.com/paper/thinking-with-tables-enhancing-multi-modal-tabular-understanding-via-neuro-symbolic-reasoning) (8/10) — A neuro-symbolic reasoning system that significantly enhances multi-modal understanding of tabular data, outperforming e
- [LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning](https://sciencetostartup.com/paper/llm-autodp-automatic-data-processing-via-llm-agents-for-model-fine-tuning) (8/10) — Automate data processing for LLM fine-tuning with minimal human intervention, enhancing model performance and efficiency
- [Ghost-FWL: A Large-Scale Full-Waveform LiDAR Dataset for Ghost Detection and Removal](https://sciencetostartup.com/paper/ghost-fwl-a-large-scale-full-waveform-lidar-dataset-for-ghost-detection-and-removal) (8/10) — A new large-scale dataset and self-supervised learning method for accurate ghost point removal in full-waveform LiDAR, s
- [SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation](https://sciencetostartup.com/paper/soma-strategic-orchestration-and-memory-augmented-system-for-vision-language-action-model-robustness-via-in-context-adap) (8/10) — SOMA enhances existing robotic vision-language-action models for robust performance in challenging, out-of-distribution 
- [Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models](https://sciencetostartup.com/paper/loc3r-vlm-language-based-localization-and-3d-reasoning-with-vision-language-models) (8/10) — Loc3R-VLM enhances Vision-Language Models with advanced 3D understanding for improved spatial reasoning.
- [Local-Global Prompt Learning via Sparse Optimal Transport](https://sciencetostartup.com/paper/local-global-prompt-learning-via-sparse-optimal-transport) (8/10) — Improve few-shot classification and OOD detection by learning shared global prompts and class-specific local prompts wit
- [LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation](https://sciencetostartup.com/paper/lookaheadkv-fast-and-accurate-kv-cache-eviction-by-glimpsing-into-the-future-without-generation) (8/10) — LookaheadKV enables fast and accurate key-value cache eviction for transformer models without incurring high latency.
- [M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining](https://sciencetostartup.com/paper/m-2-miner-multi-agent-enhanced-mcts-for-mobile-gui-agent-data-mining) (8/10) — Automate high-quality GUI agent data mining with a multi-agent MCTS framework for improved mobile interface interaction.
- [$M^2$-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs](https://sciencetostartup.com/paper/m-2-occ-resilient-3d-semantic-occupancy-prediction-for-autonomous-driving-with-incomplete-camera-inputs) (8/10) — M^2-Occ enhances 3D semantic occupancy prediction for autonomous driving by effectively handling incomplete camera input
- [LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration](https://sciencetostartup.com/paper/llava-le-large-language-and-vision-assistant-for-lunar-exploration) (8/10) — A specialized vision-language model and dataset for lunar exploration, enabling detailed terrain characterization and an
- [PanoAir: A Panoramic Visual-Inertial SLAM with Cross-Time Real-World UAV Dataset](https://sciencetostartup.com/paper/panoair-a-panoramic-visual-inertial-slam-with-cross-time-real-world-uav-dataset) (8/10) — A panoramic Visual-Inertial SLAM framework with a novel real-world UAV dataset, offering superior accuracy and robustnes
- [Machines acquire scientific taste from institutional traces](https://sciencetostartup.com/paper/machines-acquire-scientific-taste-from-institutional-traces) (8/10) — A fine-tuned language model that automates the evaluation of research pitches, enhancing decision-making in scientific p
- [Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts](https://sciencetostartup.com/paper/making-llms-optimize-multi-scenario-cuda-kernels-like-experts) (8/10) — CUDAMaster automates GPU kernel optimization across diverse scenarios, rivaling hand-tuned libraries and offering a demo
- [COIN: Collaborative Interaction-Aware Multi-Agent Reinforcement Learning for Self-Driving Systems](https://sciencetostartup.com/paper/coin-collaborative-interaction-aware-multi-agent-reinforcement-learning-for-self-driving-systems) (8/10) — A novel MARL framework for self-driving systems that significantly improves safety and efficiency in dense urban traffic
- [OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models](https://sciencetostartup.com/paper/omnivoice-towards-omnilingual-zero-shot-text-to-speech-with-diffusion-language-models) (8/10) — OmniVoice is a zero-shot text-to-speech model supporting over 600 languages, achieved through a novel diffusion language
- [Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding](https://sciencetostartup.com/paper/martingale-foresight-sampling-a-principled-approach-to-inference-time-llm-decoding) (8/10) — Martingale Foresight Sampling optimizes language model decoding with principled probability theory, improving accuracy a
- [Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs](https://sciencetostartup.com/paper/mask-is-what-dllm-needs-a-masked-data-training-paradigm-for-diffusion-llms) (8/10) — A novel masked data training paradigm that enhances reasoning in diffusion language models through information density-d
- [MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching](https://sciencetostartup.com/paper/matchtir-fine-grained-supervision-for-tool-integrated-reasoning-via-bipartite-matching) (8/10) — A fine-grained supervision framework for improving tool-integrated reasoning in large language models, outperforming lar
- [MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration](https://sciencetostartup.com/paper/medkco-medical-vision-language-pretraining-via-knowledge-driven-cognitive-orchestration) (8/10) — MedKCO enhances medical vision-language models through a knowledge-driven approach for improved feature representation.
- [SurgPhase: Time efficient pituitary tumor surgery phase recognition via an interactive web platform](https://sciencetostartup.com/paper/surgphase-time-efficient-pituitary-tumor-surgery-phase-recognition-via-an-interactive-web-platform) (8/10) — An AI-powered web platform that automatically recognizes surgical phases in pituitary tumor surgeries, enabling data-dri
- [Lookalike3D: Seeing Double in 3D](https://sciencetostartup.com/paper/lookalike3d-seeing-double-in-3d) (8/10) — A novel 3D object understanding system that leverages repeated objects to improve reconstruction and perception quality,
- [S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models](https://sciencetostartup.com/paper/s0-tuning-zero-overhead-adaptation-of-hybrid-recurrent-attention-models) (8/10) — S0 Tuning offers zero-inference overhead adaptation for hybrid recurrent-attention LLMs by tuning a single initial state
- [MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation](https://sciencetostartup.com/paper/medvl-sam2-a-unified-3d-medical-vision-language-model-for-multimodal-reasoning-and-prompt-driven-segmentation) (8/10) — A unified 3D medical vision-language model for advanced multimodal reasoning and precise 3D segmentation.
- [MeMix: Writing Less, Remembering More for Streaming 3D Reconstruction](https://sciencetostartup.com/paper/memix-writing-less-remembering-more-for-streaming-3d-reconstruction) (8/10) — MeMix is a plug-and-play module that enhances streaming 3D reconstruction by mitigating catastrophic forgetting without 
- [Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory](https://sciencetostartup.com/paper/memory-v2v-augmenting-video-to-video-diffusion-models-with-memory) (8/10) — Memory-V2V is a video editing tool that enhances consistency in multi-turn edits through memory-augmented diff
- [AVControl: Efficient Framework for Training Audio-Visual Controls](https://sciencetostartup.com/paper/avcontrol-efficient-framework-for-training-audio-visual-controls) (8/10) — A lightweight, extendable framework for efficient audio-visual control in video generation, enabling modular training of
- [Meta Context Engineering via Agentic Skill Evolution](https://sciencetostartup.com/paper/meta-context-engineering-via-agentic-skill-evolution) (8/10) — Meta Context Engineering optimizes large language model outputs through bi-level skill evolution.
- [EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models](https://sciencetostartup.com/paper/entropycache-decoded-token-entropy-guided-kv-caching-for-diffusion-language-models) (8/10) — EntropyCache offers a training-free KV caching method for diffusion language models that significantly speeds up inferen
- [EgoSim: Egocentric World Simulator for Embodied Interaction Generation](https://sciencetostartup.com/paper/egosim-egocentric-world-simulator-for-embodied-interaction-generation) (8/10) — EgoSim is a closed-loop egocentric world simulator that generates spatially consistent interaction videos and updates 3D
- [QuadFM: Foundational Text-Driven Quadruped Motion Dataset for Generation and Control](https://sciencetostartup.com/paper/quadfm-foundational-text-driven-quadruped-motion-dataset-for-generation-and-control) (8/10) — A foundational dataset and unified framework for text-driven quadruped motion generation and control, enabling real-time
- [MindfulAgents: Personalizing Mindfulness Meditation via an Expert-Aligned Multi-Agent System](https://sciencetostartup.com/paper/mindfulagents-personalizing-mindfulness-meditation-via-an-expert-aligned-multi-agent-system) (8/10) — MindfulAgents is a personalized mindfulness meditation app using LLMs to improve user engagement and mental well-being.
- [Accurate Point Measurement in 3DGS -- A New Alternative to Traditional Stereoscopic-View Based Measurements](https://sciencetostartup.com/paper/accurate-point-measurement-in-3dgs-a-new-alternative-to-traditional-stereoscopic-view-based-measurements) (8/10) — A web-based tool leveraging 3D Gaussian Splatting for highly accurate and accessible 3D point measurements, outperformin
- [MipSLAM: Alias-Free Gaussian Splatting SLAM](https://sciencetostartup.com/paper/mipslam-alias-free-gaussian-splatting-slam) (8/10) — MipSLAM delivers high-fidelity, anti-aliased novel view synthesis and robust pose estimation, offering a superior SLAM s
- [Mixture-of-Depths Attention](https://sciencetostartup.com/paper/mixture-of-depths-attention) (8/10) — Mixture-of-depths attention enhances large language models by improving feature recovery in deeper layers while maintain
- [OmniMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory](https://sciencetostartup.com/paper/omnimem-autoresearch-guided-discovery-of-lifelong-multimodal-agent-memory) (8/10) — OmniMem is an autonomous multimodal memory system enhancing AI agents' lifelong memory with a 23-stage autoresearch pipe
- [MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics](https://sciencetostartup.com/paper/mo-playground-massively-parallelized-multi-objective-reinforcement-learning-for-robotics) (8/10) — MO-Playground accelerates multi-objective reinforcement learning for robotics with a GPU-native algorithm and a user-fri
- [Modeling Endogenous Logic: Causal Neuro-Symbolic Reasoning Model for Explainable Multi-Behavior Recommendation](https://sciencetostartup.com/paper/modeling-endogenous-logic-causal-neuro-symbolic-reasoning-model-for-explainable-multi-behavior-recommendation) (8/10) — Neuro-Symbolic Reasoning Model providing explainable recommendations by operationalizing endogenous logic through causal
- [Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing](https://sciencetostartup.com/paper/molecular-identifier-visual-prompt-and-verifiable-reinforcement-learning-for-chemical-reaction-diagram-parsing) (8/10) — A novel approach to chemical reaction diagram parsing that uses molecular identifier visual prompts and verifiable reinforcement learning.
- [MoLoRA: Composable Specialization via Per-Token Adapter Routing](https://sciencetostartup.com/paper/molora-composable-specialization-via-per-token-adapter-routing) (8/10) — MoLoRA enables efficient per-token routing for multimodal and mixed-capability tasks, enhancing model specialization wit
- [MoMaStage: Skill-State Graph Guided Planning and Closed-Loop Execution for Long-Horizon Indoor Mobile Manipulation](https://sciencetostartup.com/paper/momastage-skill-state-graph-guided-planning-and-closed-loop-execution-for-long-horizon-indoor-mobile-manipulation) (8/10) — MoMaStage is a structured vision-language framework for long-horizon mobile manipulation that uses a skill-state graph t
- [MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts](https://sciencetostartup.com/paper/most-mixing-speech-and-text-with-modality-aware-mixture-of-experts) (8/10) — MoST integrates speech and text processing into an efficient open-source modality-aware language model, outpacing existi
- [MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies](https://sciencetostartup.com/paper/motionbits-video-segmentation-through-motion-level-analysis-of-rigid-bodies) (8/10) — MotionBits provides a novel, learning-free approach to video segmentation for robotic manipulation by identifying moving
- [A global dataset of continuous urban dashcam driving](https://sciencetostartup.com/paper/a-global-dataset-of-continuous-urban-dashcam-driving) (8/10) — CROWD is a large-scale, manually curated urban dashcam dataset with temporal continuity and diverse global coverage, des
- [MultiVis-Agent: A Multi-Agent Framework with Logic Rules for Reliable and Comprehensive Cross-Modal Data Visualization](https://sciencetostartup.com/paper/multivis-agent-a-multi-agent-framework-with-logic-rules-for-reliable-and-comprehensive-cross-modal-data-visualization) (8/10) — A reliable multi-agent framework for cross-modal data visualization outperforming existing solutions.
- [MWM: Mobile World Models for Action-Conditioned Consistent Prediction](https://sciencetostartup.com/paper/mwm-mobile-world-models-for-action-conditioned-consistent-prediction) (8/10) — MWM is a mobile world model that improves action-conditioned rollout consistency for planning-based image-goal navigatio
- [NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval](https://sciencetostartup.com/paper/nanovdr-distilling-a-2b-vision-language-retriever-into-a-70m-text-only-encoder-for-visual-document-retrieval) (8/10) — NanoVDR distills a large vision-language model into a lightweight text-only encoder for efficient visual document retrie
- [Not All Candidates are Created Equal: A Heterogeneity-Aware Approach to Pre-ranking in Recommender Systems](https://sciencetostartup.com/paper/not-all-candidates-are-created-equal-a-heterogeneity-aware-approach-to-pre-ranking-in-recommender-systems) (8/10) — Develop a heterogeneity-aware pre-ranking system for recommender systems to enhance efficiency and accuracy without addi
- [NovaPlan: Zero-Shot Long-Horizon Manipulation via Closed-Loop Video Language Planning](https://sciencetostartup.com/paper/novaplan-zero-shot-long-horizon-manipulation-via-closed-loop-video-language-planning) (8/10) — NovaPlan enables robots to perform zero-shot, long-horizon manipulations using video language planning, achieving state-
- [YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction](https://sciencetostartup.com/paper/yieldsat-a-multimodal-benchmark-dataset-for-high-resolution-crop-yield-prediction) (8/10) — YieldSAT provides high-resolution crop yield predictions using a multimodal dataset to improve agricultural productivity
- [NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving](https://sciencetostartup.com/paper/nova-next-step-open-vocabulary-autoregression-for-3d-multi-object-tracking-in-autonomous-driving) (8/10) — NOVA leverages LLMs for 3D multi-object tracking, achieving significant performance gains in novel categories, making it
- [Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning](https://sciencetostartup.com/paper/online-reasoning-calibration-test-time-training-enables-generalizable-conformal-llm-reasoning) (8/10) — ORCA calibrates LLM sampling at test-time using conformal prediction and meta-learning, providing valid confidence estim
- [LaMP: Learning Vision-Language-Action Policies with 3D Scene Flow as Latent Motion Prior](https://sciencetostartup.com/paper/lamp-learning-vision-language-action-policies-with-3d-scene-flow-as-latent-motion-prior) (8/10) — LaMP offers a cutting-edge robotic manipulation framework leveraging 3D scene flow for enhanced vision-language-action a
- [Agent Control Protocol: Admission Control for Agent Actions](https://sciencetostartup.com/paper/agent-control-protocol-admission-control-for-agent-actions) (8/10) — A formal protocol for secure and auditable admission control of autonomous agents in enterprise environments.
- [Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA](https://sciencetostartup.com/paper/click-to-ask-an-ai-live-streaming-assistant-with-offline-copywriting-and-online-interactive-qa) (8/10) — An AI assistant that automates product copywriting and provides real-time Q&A for live streamers, boosting sales and eng
- [OmniForcing: Unleashing Real-time Joint Audio-Visual Generation](https://sciencetostartup.com/paper/omniforcing-unleashing-real-time-joint-audio-visual-generation) (8/10) — OmniForcing is a real-time joint audio-visual generation framework that achieves state-of-the-art performance with low l
- [Omnilingual MT: Machine Translation for 1,600 Languages](https://sciencetostartup.com/paper/omnilingual-mt-machine-translation-for-1-600-languages) (8/10) — Omnilingual MT offers high-quality machine translation for over 1,600 languages, significantly expanding multilingual ca
- [One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment](https://sciencetostartup.com/paper/one-adapts-to-any-meta-reward-modeling-for-personalized-llm-alignment) (8/10) — Meta Reward Modeling enables personalized alignment of LLMs to individual user preferences through meta-learning.
- [One-Eval: An Agentic System for Automated and Traceable LLM Evaluation](https://sciencetostartup.com/paper/one-eval-an-agentic-system-for-automated-and-traceable-llm-evaluation) (8/10) — One-Eval automates and streamlines the evaluation of large language models through customizable workflows based on natur
- [JAMMEval: A Refined Collection of Japanese Benchmarks for Reliable VLM Evaluation](https://sciencetostartup.com/paper/jammeval-a-refined-collection-of-japanese-benchmarks-for-reliable-vlm-evaluation) (8/10) — A refined collection of Japanese benchmarks for reliable vision-language model evaluation, addressing issues in existing
- [Memento-Skills: Let Agents Design Agents](https://sciencetostartup.com/paper/memento-skills-let-agents-design-agents) (8/10) — An agent that autonomously designs, adapts, and improves task-specific agents using a memory-based reinforcement learnin
- [ORACAL: A Robust and Explainable Multimodal Framework for Smart Contract Vulnerability Detection with Causal Graph Enrichment](https://sciencetostartup.com/paper/oracal-a-robust-and-explainable-multimodal-framework-for-smart-contract-vulnerability-detection-with-causal-graph-enrich) (8/10) — ORACAL is a multimodal graph learning framework that uses RAG and LLMs to detect smart contract vulnerabilities with exp
- [OnlineHMR: Video-based Online World-Grounded Human Mesh Recovery](https://sciencetostartup.com/paper/onlinehmr-video-based-online-world-grounded-human-mesh-recovery) (8/10) — OnlineHMR enables real-time 3D human mesh recovery from monocular videos for interactive applications like AR/VR.
- [Open-World Motion Forecasting](https://sciencetostartup.com/paper/open-world-motion-forecasting) (8/10) — A novel framework for open-world motion forecasting that enables autonomous vehicles to adapt to new object classes in r
- [OpenACMv2: An Accuracy-Constrained Co-Optimization Framework for Approximate DCiM](https://sciencetostartup.com/paper/openacmv2-an-accuracy-constrained-co-optimization-framework-for-approximate-dcim) (8/10) — OpenACMv2 is an open framework for optimizing power-performance-area in approximate DCiM with accuracy constraints.
- [PicoSAM3: Real-Time In-Sensor Region-of-Interest Segmentation](https://sciencetostartup.com/paper/picosam3-real-time-in-sensor-region-of-interest-segmentation) (8/10) — PicoSAM3 is a lightweight, real-time visual segmentation model optimized for edge devices, enabling efficient on-device 
- [Optimizing Mission Planning for Multi-Debris Rendezvous Using Reinforcement Learning with Refueling and Adaptive Collision Avoidance](https://sciencetostartup.com/paper/optimizing-mission-planning-for-multi-debris-rendezvous-using-reinforcement-learning-with-refueling-and-adaptive-collisi) (8/10) — Develop a reinforcement learning platform for efficient and safe multi-debris removal using small satellites.
- [Orcheo: A Modular Full-Stack Platform for Conversational Search](https://sciencetostartup.com/paper/orcheo-a-modular-full-stack-platform-for-conversational-search) (8/10) — Orcheo is an open-source, full-stack platform enabling quick development and deployment of conversational search applica
- [Orchestrating Intelligence: Confidence-Aware Routing for Efficient Multi-Agent Collaboration across Multi-Scale Models](https://sciencetostartup.com/paper/orchestrating-intelligence-confidence-aware-routing-for-efficient-multi-agent-collaboration-across-multi-scale-models) (8/10) — Confidence-aware routing that adaptively selects among models of different scales for efficient, cost-effective multi-agent collaboration.
- [OSCAR: Occupancy-based Shape Completion via Acoustic Neural Implicit Representations](https://sciencetostartup.com/paper/oscar-occupancy-based-shape-completion-via-acoustic-neural-implicit-representations) (8/10) — Reconstruct complete 3D anatomical geometry from partial ultrasound observations using acoustic neural implicit represen
- [AU Codes, Language, and Synthesis: Translating Anatomy to Text for Facial Behavior Synthesis](https://sciencetostartup.com/paper/au-codes-language-and-synthesis-translating-anatomy-to-text-for-facial-behavior-synthesis) (8/10) — Synthesize anatomically plausible and behaviorally rich facial expressions from natural language descriptions of Action 
- [When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making](https://sciencetostartup.com/paper/when-names-change-verdicts-intervention-consistency-reveals-systematic-bias-in-llm-decision-making) (8/10) — ICE-Guard provides a framework to detect and mitigate systematic bias in LLMs across high-stakes decisions, with code an
- [OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security](https://sciencetostartup.com/paper/oss-crs-liberating-aixcc-cyber-reasoning-systems-for-real-world-open-source-security) (8/10) — OSS-CRS is an open, locally deployable framework for running and combining AI-based cyber reasoning techniques against r
- [Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding](https://sciencetostartup.com/paper/think-act-build-an-agentic-framework-with-vision-language-models-for-zero-shot-3d-visual-grounding) (8/10) — An agentic framework using Vision Language Models to perform zero-shot 3D visual grounding by dynamically reconstructing
- [Recolour What Matters: Region-Aware Colour Editing via Token-Level Diffusion](https://sciencetostartup.com/paper/recolour-what-matters-region-aware-colour-editing-via-token-level-diffusion) (8/10) — A unified diffusion framework for precise, region-aware image color editing using token-level fusion and a novel dataset
- [LiPS: Lightweight Panoptic Segmentation for Resource-Constrained Robotics](https://sciencetostartup.com/paper/lips-lightweight-panoptic-segmentation-for-resource-constrained-robotics) (8/10) — LiPS: A lightweight panoptic segmentation model for resource-constrained robotics, offering comparable accuracy with sig
- [SEAR: Simple and Efficient Adaptation of Visual Geometric Transformers for RGB+Thermal 3D Reconstruction](https://sciencetostartup.com/paper/sear-simple-and-efficient-adaptation-of-visual-geometric-transformers-for-rgb-thermal-3d-reconstruction) (8/10) — A simple fine-tuning strategy that adapts existing visual geometry models for accurate RGB-thermal 3D reconstruction, ou
- [LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments](https://sciencetostartup.com/paper/paper-title-lov3d-grounding-cognitive-prognosis-reasoning-in-longitudinal-3d-brain-mri-via-regional-volume-assessments) (8/10) — LoV3D is a verifiable AI model for interpreting longitudinal 3D brain MRI to support dementia diagnosis.
- [Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance](https://sciencetostartup.com/paper/paper2rebuttal-a-multi-agent-framework-for-transparent-author-response-assistance) (8/10) — RebuttalAgent assists researchers in crafting evidence-based responses to peer reviews, reducing cognitive load and impr
- [Parallelised Differentiable Straightest Geodesics for 3D Meshes](https://sciencetostartup.com/paper/parallelised-differentiable-straightest-geodesics-for-3d-meshes) (8/10) — A parallel GPU implementation for differentiable geodesics on 3D meshes, enhancing learning and optimization pipelines.
- [Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation](https://sciencetostartup.com/paper/parameter-efficient-modality-balanced-symmetric-fusion-for-multimodal-remote-sensing-semantic-segmentation) (8/10) — MoBaNet is a parameter-efficient framework for multimodal remote sensing semantic segmentation that balances modality co
- [Parametric Social Identity Injection and Diversification in Public Opinion Simulation](https://sciencetostartup.com/paper/parametric-social-identity-injection-and-diversification-in-public-opinion-simulation) (8/10) — A framework for enhancing diversity in public opinion simulations using large language models.
- [PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development](https://sciencetostartup.com/paper/pashtocorp-a-1-25-billion-word-corpus-evaluation-suite-and-reproducible-pipeline-for-low-resource-language-development) (8/10) — Build cutting-edge NLP models for Pashto using the largest available Pashto language corpus, PashtoCorp.
- [PCFEx: Point Cloud Feature Extraction for Graph Neural Networks](https://sciencetostartup.com/paper/pcfex-point-cloud-feature-extraction-for-graph-neural-networks) (8/10) — A novel GNN architecture for 3D point cloud processing, achieving state-of-the-art results in human pose estimation and 
- [PCodeTrans: Translate Decompiled Pseudocode to Compilable and Executable Equivalent](https://sciencetostartup.com/paper/pcodetrans-translate-decompiled-pseudocode-to-compilable-and-executable-equivalent) (8/10) — PCodeTrans is a feedback-driven framework that translates decompiled pseudocode into compilable and executable code with
- [PEAR: Pixel-aligned Expressive humAn mesh Recovery](https://sciencetostartup.com/paper/pear-pixel-aligned-expressive-human-mesh-recovery) (8/10) — PEAR offers real-time, pixel-level accurate 3D human mesh recovery for immersive applications using a ViT-based streamli
- [Perception-Aware Multimodal Spatial Reasoning from Monocular Images](https://sciencetostartup.com/paper/perception-aware-multimodal-spatial-reasoning-from-monocular-images) (8/10) — Enhance autonomous driving spatial reasoning by equipping VLMs with object-centric grounding using visual reference toke
- [PlotTwist: A Creative Plot Generation Framework with Small Language Models](https://sciencetostartup.com/paper/plottwist-a-creative-plot-generation-framework-with-small-language-models) (8/10) — PlotTwist is a framework that empowers small language models to generate high-quality plots efficiently.
- [Person Re-ID in 2025: Supervised, Self-Supervised, and Language-Aligned. What Works?](https://sciencetostartup.com/paper/person-re-id-in-2025-supervised-self-supervised-and-language-aligned-what-works) (8/10) — A novel AI-driven person re-identification system using language-aligned vision models for robust cross-domain performan
- [PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records](https://sciencetostartup.com/paper/personalalign-hierarchical-implicit-intent-alignment-for-personalized-gui-agent-with-long-term-user-centric-records) (8/10) — PersonalAlign turns GUI agents into proactive, personalized assistants that align with users' implicit intents.
- [Pointy - A Lightweight Transformer for Point Cloud Foundation Models](https://sciencetostartup.com/paper/pointy-a-lightweight-transformer-for-point-cloud-foundation-models) (8/10) — Pointy is a lightweight transformer architecture for point cloud data that outperforms larger models with fewer training
- [PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses](https://sciencetostartup.com/paper/pismith-reinforcement-learning-based-red-teaming-for-prompt-injection-defenses) (8/10) — PISmith is a reinforcement learning framework that enhances prompt injection defenses for LLM applications by systematic
- [PlaneCycle: Training-Free 2D-to-3D Lifting of Foundation Models Without Adapters](https://sciencetostartup.com/paper/planecycle-training-free-2d-to-3d-lifting-of-foundation-models-without-adapters) (8/10) — Develop 3D-enabled AI models from existing 2D models without retraining, leveraging PlaneCycle's adapter-free technology
- [POCI-Diff: Position Objects Consistently and Interactively with 3D-Layout Guided Diffusion](https://sciencetostartup.com/paper/poci-diff-position-objects-consistently-and-interactively-with-3d-layout-guided-diffusion) (8/10) — POCI-Diff supports 3D content creation by enabling high-fidelity, interactive text-to-image generation with precis
- [BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics](https://sciencetostartup.com/paper/boxmind-closed-loop-ai-strategy-optimization-for-elite-boxing-validated-in-the-2024-olympics) (8/10) — BoxMind uses AI to optimize boxing strategies, enhancing athlete performance with data-driven insights.
- [Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR](https://sciencetostartup.com/paper/polyglot-lion-efficient-multilingual-asr-for-singapore-via-balanced-fine-tuning-of-qwen3-asr) (8/10) — Polyglot-Lion offers efficient multilingual ASR tailored for Singapore's diverse languages at a fraction of the cost of 
- [POP: Prefill-Only Pruning for Efficient Large Model Inference](https://sciencetostartup.com/paper/pop-prefill-only-pruning-for-efficient-large-model-inference) (8/10) — POP offers a novel pruning method to make large language and vision-language models faster and cheaper to deploy without
- [Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards](https://sciencetostartup.com/paper/post-training-local-llm-agents-for-linux-privilege-escalation-with-verifiable-rewards) (8/10) — A local LLM agent for Linux privilege escalation that achieves high success rates with verifiable rewards.
- [Preference-Conditioned Reinforcement Learning for Space-Time Efficient Online 3D Bin Packing](https://sciencetostartup.com/paper/preference-conditioned-reinforcement-learning-for-space-time-efficient-online-3d-bin-packing) (8/10) — STEP optimizes robotic bin packing by using a preference-conditioned Transformer-based RL policy to balance space utiliz
- [Prompt-Driven Lightweight Foundation Model for Instance Segmentation-Based Fault Detection in Freight Trains](https://sciencetostartup.com/paper/prompt-driven-lightweight-foundation-model-for-instance-segmentation-based-fault-detection-in-freight-trains) (8/10) — Deployable fault detection system for freight trains using lightweight AI segmentation.
- [PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue](https://sciencetostartup.com/paper/promptdla-a-domain-aware-prompt-document-layout-analysis-framework-with-descriptive-knowledge-as-a-cue) (8/10) — PromptDLA enhances document layout analysis by integrating domain-specific cues for improved model performance.
- [Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum](https://sciencetostartup.com/paper/prosody-guided-harmonic-attention-for-phase-coherent-neural-vocoding-in-the-complex-spectrum) (8/10) — Leverage improved prosody-guided neural vocoding for superior speech synthesis applications.
- [Protein Counterfactuals via Diffusion-Guided Latent Optimization](https://sciencetostartup.com/paper/protein-counterfactuals-via-diffusion-guided-latent-optimization) (8/10) — MCCOP enables precise protein engineering by generating biologically plausible sequence edits to optimize protein proper
- [PSTNet: Physically-Structured Turbulence Network](https://sciencetostartup.com/paper/pstnet-physically-structured-turbulence-network) (8/10) — PSTNet is a lightweight, physically-structured neural network for real-time atmospheric turbulence estimation, offering 
- [Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting](https://sciencetostartup.com/paper/pushing-the-frontier-of-black-box-lvlm-attacks-via-fine-grained-detail-targeting) (8/10) — Enhance the security of vision-language models with a highly effective black-box adversarial attack tool.
- [QdaVPR: A novel query-based domain-agnostic model for visual place recognition](https://sciencetostartup.com/paper/qdavpr-a-novel-query-based-domain-agnostic-model-for-visual-place-recognition) (8/10) — QdaVPR is a domain-agnostic visual place recognition model that achieves state-of-the-art performance on multiple benchm
- [Quantifying Membership Disclosure Risk for Tabular Synthetic Data Using Kernel Density Estimators](https://sciencetostartup.com/paper/quantifying-membership-disclosure-risk-for-tabular-synthetic-data-using-kernel-density-estimators) (8/10) — A practical method to quantify membership disclosure risk in synthetic datasets using kernel density estimators.
- [OpenCap Monocular: 3D Human Kinematics and Musculoskeletal Dynamics from a Single Smartphone Video](https://sciencetostartup.com/paper/opencap-monocular-3d-human-kinematics-and-musculoskeletal-dynamics-from-a-single-smartphone-video) (8/10) — OpenCap Monocular turns any smartphone into a 3D movement analytics tool for musculoskeletal insights.
- [RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation](https://sciencetostartup.com/paper/rbtact-rebuttal-as-supervision-for-actionable-review-feedback-generation) (8/10) — RbtAct enhances peer review processes by generating actionable feedback using rebuttal as supervision.
- [The 1st Winner for 5th PVUW MeViS-Text Challenge: Strong MLLMs Meet SAM3 for Referring Video Object Segmentation](https://sciencetostartup.com/paper/the-1st-winner-for-5th-pvuw-mevis-text-challenge-strong-mllms-meet-sam3-for-referring-video-object-segmentation) (8/10) — A training-free pipeline combining Gemini and SAM3 for referring video object segmentation, achieving state-of-the-art r
- [REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation](https://sciencetostartup.com/paper/react-efficient-cross-attention-for-real-time-scene-graph-generation) (8/10) — REACT++ is a state-of-the-art scene graph generation model that balances speed and accuracy, enabling real-time applicat
- [DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving](https://sciencetostartup.com/paper/drivevlm-rl-neuroscience-inspired-reinforcement-learning-with-vision-language-models-for-safe-and-deployable-autonomous) (8/10) — A neuroscience-inspired reinforcement learning framework integrating vision-language models for safer and deployable aut
- [Real-Time Trust Verification for Safe Agentic Actions using TrustBench](https://sciencetostartup.com/paper/real-time-trust-verification-for-safe-agentic-actions-using-trustbench) (8/10) — TrustBench provides real-time trust verification for autonomous agents to prevent harmful actions before execution.
- [HiSpatial: Taming Hierarchical 3D Spatial Understanding in Vision-Language Models](https://sciencetostartup.com/paper/hispatial-taming-hierarchical-3d-spatial-understanding-in-vision-language-models) (8/10) — HiSpatial provides state-of-the-art 3D spatial intelligence for vision-language models, suitable for enhancing autonomou
- [ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack](https://sciencetostartup.com/paper/reasalign-reasoning-enhanced-safety-alignment-against-prompt-injection-attack) (8/10) — ReasAlign provides enhanced safety alignment for LLMs against prompt injection attacks using reasoning techniques.
- [Reasoning with Pixel-level Precision: QVLM Architecture and SQuID Dataset for Quantitative Geospatial Analytics](https://sciencetostartup.com/paper/reasoning-with-pixel-level-precision-qvlm-architecture-and-squid-dataset-for-quantitative-geospatial-analytics) (8/10) — Develop a geospatial analytics tool for precise quantitative reasoning using pixel-level data with QVLM and SQuID datase
- [RECAP: Resistance Capture in Text-based Mental Health Counseling with Large Language Models](https://sciencetostartup.com/paper/recap-resistance-capture-in-text-based-mental-health-counseling-with-large-language-models) (8/10) — PsyFIRE enhances text-based mental health counseling by accurately detecting client resistance, aiding counselor interve
- [Recover to Predict: Progressive Retrospective Learning for Variable-Length Trajectory Prediction](https://sciencetostartup.com/paper/recover-to-predict-progressive-retrospective-learning-for-variable-length-trajectory-prediction) (8/10) — A novel framework for improving trajectory prediction in autonomous driving using variable-length observations.
- [Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress](https://sciencetostartup.com/paper/recurrent-reasoning-with-vision-language-models-for-estimating-long-horizon-embodied-task-progress) (8/10) — Recurrent Reasoning Vision-Language Model ($R^2$VLM) enhances task progress estimation for embodied agents using a novel
- [REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models](https://sciencetostartup.com/paper/reforge-multi-modal-attacks-reveal-vulnerable-concept-unlearning-in-image-generation-models) (8/10) — REFORGE is a black-box red-teaming framework that enhances the robustness of image generation model unlearning against a
- [Regression Models Meet Foundation Models: A Hybrid-AI Approach to Practical Electricity Price Forecasting](https://sciencetostartup.com/paper/regression-models-meet-foundation-models-a-hybrid-ai-approach-to-practical-electricity-price-forecasting) (8/10) — FutureBoosting enhances regression-based electricity price forecasts by integrating forecasted features from a frozen ti
- [Reinforcement Learning with Conditional Expectation Reward](https://sciencetostartup.com/paper/reinforcement-learning-with-conditional-expectation-reward) (8/10) — Conditional Expectation Reward enhances reasoning in large language models by providing a flexible verification mechanis
- [ReMAP-DP: Reprojected Multi-view Aligned PointMaps for Diffusion Policy](https://sciencetostartup.com/paper/remap-dp-reprojected-multi-view-aligned-pointmaps-for-diffusion-policy) (8/10) — ReMAP-DP enhances robot manipulation tasks by integrating 3D spatial awareness with advanced diffusion policies.
- [Hierarchical Chain-of-Thought Prompting: Enhancing LLM Reasoning Performance and Efficiency](https://sciencetostartup.com/paper/hierarchical-chain-of-thought-prompting-enhancing-llm-reasoning-performance-and-efficiency) (8/10) — Enhance LLM reasoning by structuring prompts into hierarchical plans and execution steps, improving accuracy and efficie
- [Rethinking Video Generation Model for the Embodied World](https://sciencetostartup.com/paper/rethinking-video-generation-model-for-the-embodied-world) (8/10) — RBench offers a comprehensive framework for evaluating and training video generation models for robotics in embodied AI.
- [Retrieval-Augmented Generation with Covariate Time Series](https://sciencetostartup.com/paper/retrieval-augmented-generation-with-covariate-time-series) (8/10) — RAG4CTS provides a cutting-edge, training-free framework for anomaly detection in industrial time-series applications li
- [Retrieving Counterfactuals Improves Visual In-Context Learning](https://sciencetostartup.com/paper/retrieving-counterfactuals-improves-visual-in-context-learning) (8/10) — CIRCLES enhances vision-language models by using counterfactual examples for improved in-context learning and causal rea
- [Revisiting Text Ranking in Deep Research](https://sciencetostartup.com/paper/revisiting-text-ranking-in-deep-research) (8/10) — A new approach to text ranking for deep research with code and dataset available, ready for application in search produc
- [Reward Prediction with Factorized World States](https://sciencetostartup.com/paper/reward-prediction-with-factorized-world-states) (8/10) — StateFactory transforms unstructured observations into structured representations for accurate reward prediction across 
- [Riemannian Liquid Spatio-Temporal Graph Network](https://sciencetostartup.com/paper/riemannian-liquid-spatio-temporal-graph-network) (8/10) — RLSTG enables businesses to accurately model complex, non-Euclidean graph dynamics, unlocking deeper insights in spatio-
- [RL-Augmented MPC for Non-Gaited Legged and Hybrid Locomotion](https://sciencetostartup.com/paper/rl-augmented-mpc-for-non-gaited-legged-and-hybrid-locomotion) (8/10) — A novel RL and MPC framework for efficient locomotion control in legged robots.
- [RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions](https://sciencetostartup.com/paper/robumtl-enhancing-multi-task-learning-robustness-against-weather-conditions) (8/10) — RobuMTL enhances autonomous systems' robustness in adverse weather using adaptive multi-task learning.
- [Robotic Ultrasound Makes CBCT Alive](https://sciencetostartup.com/paper/robotic-ultrasound-makes-cbct-alive) (8/10) — A real-time deformation-aware framework that updates static CBCT images using robotic ultrasound for enhanced surgical n
- [Retrieval-Augmented LLMs for Security Incident Analysis](https://sciencetostartup.com/paper/retrieval-augmented-llms-for-security-incident-analysis) (8/10) — A RAG-based system that automates security incident analysis by filtering logs and semantically reasoning with LLMs to r
- [RoCo Challenge at AAAI 2026: Benchmarking Robotic Collaborative Manipulation for Assembly Towards Industrial Automation](https://sciencetostartup.com/paper/roco-challenge-at-aaai-2026-benchmarking-robotic-collaborative-manipulation-for-assembly-towards-industrial-automation) (8/10) — The RoCo Challenge benchmarks robotic collaborative manipulation for industrial assembly, providing a dataset and evalua
- [RoTri-Diff: A Spatial Robot-Object Triadic Interaction-Guided Diffusion Model for Bimanual Manipulation](https://sciencetostartup.com/paper/rotri-diff-a-spatial-robot-object-triadic-interaction-guided-diffusion-model-for-bimanual-manipulation) (8/10) — RoTri-Diff is a diffusion-based imitation learning framework that enables robots to perform stable and coordinated biman
- [RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting](https://sciencetostartup.com/paper/rs-worldmodel-a-unified-model-for-remote-sensing-understanding-and-future-sense-forecasting) (8/10) — RS-WorldModel is a unified model for remote sensing that enhances understanding of changes and forecasts future scenes u
- [RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection](https://sciencetostartup.com/paper/rtd-guard-a-black-box-textual-adversarial-detection-framework-via-replacement-token-detection) (8/10) — RTD-Guard is a lightweight black-box framework for detecting textual adversarial attacks in NLP systems.
- [RTFDNet: Fusion-Decoupling for Robust RGB-T Segmentation](https://sciencetostartup.com/paper/rtfdnet-fusion-decoupling-for-robust-rgb-t-segmentation) (8/10) — RTFDNet enhances RGB-T segmentation for robust robotic systems in low-light environments through innovative fusion-decou
- [Learning Humanoid Navigation from Human Data](https://sciencetostartup.com/paper/learning-humanoid-navigation-from-human-data) (8/10) — EgoNav enables humanoid robots to autonomously navigate diverse environments using human walking data, bypassing traditi
- [OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation](https://sciencetostartup.com/paper/ocp-orthogonal-constrained-projection-for-sparse-scaling-in-industrial-commodity-recommendation) (8/10) — A novel projection method for industrial recommendation systems that significantly improves scalability and performance 
- [SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems](https://sciencetostartup.com/paper/safecrs-personalized-safety-alignment-for-llm-based-conversational-recommender-systems) (8/10) — SafeCRS offers a personalized safety alignment framework for conversational recommendation systems, optimizing user-spec
- [SAGE: Multi-Agent Self-Evolution for LLM Reasoning](https://sciencetostartup.com/paper/sage-multi-agent-self-evolution-for-llm-reasoning) (8/10) — SAGE is a self-evolving multi-agent framework that enhances reasoning in LLMs through closed-loop training with minimal 
- [Translating MRI to PET through Conditional Diffusion Models with Enhanced Pathology Awareness](https://sciencetostartup.com/paper/translating-mri-to-pet-through-conditional-diffusion-models-with-enhanced-pathology-awareness) (8/10) — A novel AI framework generates high-quality, pathology-aware synthetic PET scans from MRI, improving Alzheimer's diagnos
- [DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation](https://sciencetostartup.com/paper/domagent-leveraging-knowledge-graphs-and-case-based-reasoning-for-domain-specific-code-generation) (8/10) — DomAgent enables LLMs to generate domain-specific code by combining knowledge graphs and case-based reasoning, significa
- [Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning](https://sciencetostartup.com/paper/scaling-tasks-not-samples-mastering-humanoid-control-through-multi-task-model-based-reinforcement-learning) (8/10) — Develop a scalable robot control solution using multi-task model-based reinforcement learning with a focus on task scali
- [ScalSelect: Scalable Training-Free Multimodal Data Selection for Efficient Visual Instruction Tuning](https://sciencetostartup.com/paper/scalselect-scalable-training-free-multimodal-data-selection-for-efficient-visual-instruction-tuning) (8/10) — ScalSelect offers an efficient data selection tool that reduces training costs for vision-language models by 84% without
- [SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning](https://sciencetostartup.com/paper/scout-scalable-communication-via-utility-guided-temporal-grouping-in-multi-agent-reinforcement-learning) (8/10) — SCoUT enhances MARL communication with scalable, utility-driven temporal grouping, delivering precise credit
- [SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models](https://sciencetostartup.com/paper/ssam-singular-subspace-alignment-for-merging-multimodal-large-language-models) (8/10) — A training-free framework to merge existing multimodal LLMs into a single model capable of handling any combination of i