DMax: Aggressive Parallel Decoding for dLLMs Build Now
DMax offers aggressive parallel decoding for diffusion language models, significantly increasing throughput while preserving generation quality.
GitHub 49 stars Velocity flat History 1 snapshot LLM Inference Apr 9 Pending High viability
CLEAR: Context Augmentation from Contrastive Learning of Experience via Agentic Reflection Build Now
A framework that trains LLM agents to generate task-specific context, improving decision-making and task completion rates.
GitHub 0 stars Velocity flat History 1 snapshot LLM Agents Apr 8 Pending High viability
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions Build Now
SUPERNOVA enhances general reasoning in language models through a curated data framework for reinforcement learning with verifiable rewards.
GitHub 0 stars Velocity flat History 1 snapshot AI Reasoning Enhancement Apr 9 Pending High viability
Safe Large-Scale Robust Nonlinear MPC in Milliseconds via Reachability-Constrained System Level Synthesis on the GPU Build Now
GPU-accelerated framework for safe, robust nonlinear model predictive control that achieves real-time performance on high-dimensional robotic systems.
GitHub 0 stars Velocity flat History 1 snapshot Robotics Control Apr 8 Pending High viability
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Build Now
A generalist multimodal reasoning model that significantly outperforms frontier models across 18 benchmarks by introducing a novel RL training objective and task-level shaping mechanisms.
GitHub 139 stars Velocity flat History 1 snapshot Multimodal AI Apr 9 Pending High viability
DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing Build Now
DSCA offers a novel approach to lifelong VLM editing by structurally separating concepts into orthogonal semantic subspaces, enabling precise, non-interfering edits with state-of-the-art stability.
GitHub 712 stars Velocity flat History 1 snapshot VLM Editing Apr 9 Pending High viability
Data Selection for Multi-turn Dialogue Instruction Tuning Build Now
MDS is a dialogue-level framework that enhances instruction-tuned language models by selecting high-quality multi-turn dialogues.
GitHub 2 stars Velocity flat History 1 snapshot Dialogue Systems Apr 9 Pending High viability
PIArena: A Platform for Prompt Injection Evaluation Build Now
A unified platform for evaluating prompt injection attacks and defenses, enabling reliable comparison and uncovering limitations of current security measures.
GitHub 10 stars Velocity flat History 1 snapshot LLM Security Apr 9 Pending High viability
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds Build Now
A physics-aligned simulator that acts as a zero-shot data scaler for robotic manipulation of deformable objects, achieving parity with real-data baselines using purely synthetic data.
GitHub 47 stars Velocity flat History 1 snapshot Robotics Apr 9 Pending High viability
SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation Build Now
SYN-DIGITS is a model-agnostic framework that calibrates AI persona simulations to align with real human behavior, improving reliability for market research and social sciences.
GitHub 0 stars Velocity flat History 1 snapshot AI Persona Calibration Apr 8 Pending High viability
EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools Build Now
Enhances AI agents for deep research by integrating structured reasoning tools (Q+) into web search for deliberate query planning and evidence extraction, improving accuracy.
GitHub 4 stars Velocity flat History 1 snapshot Agents Apr 9 Pending High viability
PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation Build Now
A privacy-preserving federated learning framework for personalized talking-head generation using lightweight identity adapters.
GitHub 0 stars Velocity flat History 1 snapshot Generative AI Apr 9 Pending High viability
DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification Build Now
Accelerates LLM inference by relaxing the rigid verification step of speculative decoding through dynamic ensemble verification.
GitHub 0 stars Velocity flat History 1 snapshot LLM Inference Optimization Apr 8 Pending High viability
SeLaR: Selective Latent Reasoning in Large Language Models Build Now
SeLaR is a training-free framework that selectively uses latent reasoning to improve LLM performance on complex reasoning tasks.
GitHub 5 stars Velocity flat History 1 snapshot LLM Reasoning Apr 9 Pending High viability
Reinforcement-Guided Synthetic Data Generation for Privacy-Sensitive Identity Recognition Build Now
A reinforcement-guided framework generates high-fidelity synthetic data for privacy-sensitive identity recognition tasks.
GitHub 712 stars Velocity flat History 1 snapshot Synthetic Data Generation Apr 9 Pending High viability
Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark Build Now
An end-to-end multimodal reasoning framework that uses LLMs to interpret encrypted network traffic, generating human-readable reports grounded in raw byte data.
GitHub 0 stars Velocity flat History 1 snapshot Encrypted Traffic Interpretation Apr 9 Pending High viability
How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace Build Now
A benchmark and dataset for evaluating large multimodal models in goal-oriented urban airspace navigation, revealing current limitations and future improvement directions.
GitHub 0 stars Velocity flat History 1 snapshot Embodied AI Navigation Apr 9 Pending High viability
Distributed Multi-Layer Editing for Rule-Level Knowledge in Large Language Models Build Now
A distributed multi-layer editing approach for rule-level knowledge in LLMs, improving consistency and understanding across different knowledge forms.
GitHub stars n/a Velocity flat History pending LLM Editing Apr 9 Pending High viability
Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory Build Now
A task-adaptive retrieval system that uses learned graph memory to improve the relevance of results retrieved from web interaction histories.
GitHub 0 stars Velocity flat History 1 snapshot Information Retrieval Apr 9 Pending High viability
DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation Build Now
DCD is a domain-oriented design for controlled RAG that structures knowledge hierarchically and uses multi-stage routing to improve factual accuracy and answer relevance without modifying the base LLM.
GitHub 0 stars Velocity flat History 1 snapshot Retrieval-Augmented Generation Apr 8 Pending High viability
InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding Build Now
InstAP is an instance-aware pre-training framework for vision-language models that improves both instance-level and global understanding.
GitHub 712 stars Velocity flat History 1 snapshot Vision-Language Models Apr 9 Pending High viability
SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving Build Now
SearchAD is a large-scale dataset designed for rare image retrieval in autonomous driving, enhancing data curation and perception research.
GitHub 712 stars Velocity flat History 1 snapshot Autonomous Driving Dataset Apr 9 Pending High viability
HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation Build Now
HiRO-Nav is an embodied navigation agent that adaptively uses reasoning only for high-entropy actions, reducing computational cost while improving decision quality.
GitHub 712 stars Velocity flat History 1 snapshot Embodied AI Apr 9 Pending High viability
CIAO - Code In Architecture Out - Automated Software Architecture Documentation with Large Language Models Build Now
Automated system-level software architecture documentation generation from code repositories using LLMs, providing valuable and cost-effective insights.
GitHub stars n/a Velocity flat History pending LLM Applications Apr 9 Code High viability
ACF: A Collaborative Framework for Agent Covert Communication under Cognitive Asymmetry Watch
A framework for covert communication among autonomous agents overcoming cognitive asymmetry.
GitHub 3 stars Velocity flat History 1 snapshot Agent Communication Apr 9 Pending
OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation Build Now
A training-free framework that stitches sub-image features to enable global context awareness for open-vocabulary semantic segmentation.
GitHub 712 stars Velocity flat History 1 snapshot Open-Vocabulary Segmentation Apr 9 Pending High viability
HistDiT: A Structure-Aware Latent Conditional Diffusion Model for High-Fidelity Virtual Staining in Histopathology Build Now
A structure-aware latent conditional diffusion model for high-fidelity virtual staining in histopathology, outperforming existing methods.
GitHub 0 stars Velocity flat History 1 snapshot Medical AI Apr 9 Pending High viability
Beyond Pedestrians: Caption-Guided CLIP Framework for High-Difficulty Video-based Person Re-Identification Build Now
CG-CLIP is a caption-guided CLIP framework for high-difficulty video-based person re-identification that outperforms state-of-the-art methods in challenging scenarios.
GitHub 712 stars Velocity flat History 1 snapshot Computer Vision Apr 9 Pending High viability
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Build Now
A new benchmark and simulator for evaluating personalized mobile agents that can infer user preferences and provide proactive assistance in real-time GUI environments.
GitHub 1865 stars Velocity flat History 1 snapshot Agents Apr 9 Pending High viability
Visual Perceptual to Conceptual First-Order Rule Learning Networks Build Now
A differentiable framework for learning symbolic rules directly from image data, enabling explainable AI and enhanced reasoning.
GitHub 1865 stars Velocity flat History 1 snapshot Explainable AI / Rule Learning Apr 9 Pending High viability
From Ground Truth to Measurement: A Statistical Framework for Human Labeling Ignore
This paper proposes a statistical framework to decompose human labeling variation into interpretable sources like instance difficulty, annotator bias, and situational noise, moving beyond treating all disagreement as noise.
GitHub 169 stars Velocity flat History 1 snapshot Data Annotation Apr 8 Pending
RewardFlow: Generate Images by Optimizing What You Reward Watch
A new image generation tool leveraging multi-reward Langevin dynamics for state-of-the-art image editing.
GitHub 712 stars Velocity flat History 1 snapshot AI Image Generation Apr 9 Pending
Enabling Intrinsic Reasoning over Dense Geospatial Embeddings with DFR-Gemma Build Now
DFR-Gemma enables LLMs to reason directly over dense geospatial embeddings, offering a more efficient and accurate approach to multimodal geospatial intelligence.
GitHub stars n/a Velocity flat History 1 snapshot Geospatial AI Apr 8 Code High viability
IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling Build Now
IoT-Brain bridges LLMs and sensor networks for proactive, intent-driven physical world interaction through semantic-spatial sensor scheduling.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 9 Code High viability
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models Build Now
A framework that significantly reduces tool invocations in agentic multimodal models while improving reasoning accuracy by decoupling accuracy and efficiency optimization.
GitHub 1865 stars Velocity flat History 1 snapshot Agents Apr 9 Pending High viability
ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning Watch
A reinforcement fine-tuning framework that enhances LLM reasoning for complex recommendation tasks.
GitHub stars n/a Velocity flat History 1 snapshot Recommendation AI Apr 9 Pending
Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models Build Now
A novel feedforward graph architecture that composes heterogeneous frozen LLMs to achieve state-of-the-art performance on challenging benchmarks with minimal trainable parameters.
GitHub 1865 stars Velocity flat History 1 snapshot LLM Composition Apr 9 Pending High viability
Rhizome OS-1: Rhizome's Semi-Autonomous Operating System for Small Molecule Drug Discovery Build Now
Rhizome OS-1 is a semi-autonomous operating system for small molecule drug discovery, leveraging multi-modal AI agents and graph neural networks to generate novel chemical matter.
GitHub stars n/a Velocity flat History 1 snapshot Drug Discovery Agents Apr 8 Code High viability
ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training Build Now
A self-training framework for generative reward models that improves LLM alignment by ensuring consistency in generated critiques and rewards.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 8 Code High viability
OceanMAE: A Foundation Model for Ocean Remote Sensing Build Now
A foundation model for ocean remote sensing that improves marine pollutant detection and bathymetry estimation by integrating multispectral data with ocean-specific features.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 9 Code High viability
ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection Build Now
A novel alignment framework for medical LLMs that uses fine-grained clinical criteria and explicit injection to achieve superior accuracy and safety compliance.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 9 Code High viability
AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning Build Now
An agentic system that synthesizes realistic industrial anomalies for data augmentation, improving anomaly detection.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI for Industrial Data Apr 9 Code High viability
AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models Build Now
AtlasOCR is the first open-source Darija OCR model, fine-tuned from a VLM using efficient training strategies and a custom dataset.
GitHub stars n/a Velocity flat History 1 snapshot OCR Apr 9 Code High viability
FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding Build Now
FlowGuard is a lightweight framework for in-generation safety detection in diffusion models, reducing computational costs significantly.
GitHub stars n/a Velocity flat History 1 snapshot Safety Detection Apr 9 Code High viability
TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation Build Now
TSUBASA enhances personalized LLMs for long-horizon tasks by evolving memory and self-learning with context distillation.
GitHub stars n/a Velocity flat History 1 snapshot Personalized LLMs Apr 9 Code High viability
Exponential quantum advantage in processing massive classical data Ignore
Demonstrates exponential quantum advantage in processing massive classical data for classification and dimension reduction using quantum oracle sketching.
GitHub 22 stars Velocity flat History 1 snapshot Quantum ML Apr 8 Pending
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Ignore
SkillClaw enables LLM agents to collectively evolve their skills by learning from cross-user interactions, improving performance system-wide without user effort.
GitHub 250 stars Velocity flat History 1 snapshot Agents Apr 9 Pending
From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction Build Now
An LLM-powered workflow automatically extracts and reconstructs structured materials data from research articles, enabling scalable database construction.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science Data Extraction Apr 8 Code High viability
Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems Watch
MOSAIC is a data selection framework for autonomous driving that uses neural scaling laws to optimize training data mixtures, reducing data needs by up to 80%.
GitHub 712 stars Velocity flat History 1 snapshot Autonomous Driving Apr 9 Pending
Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling Build Now
A new benchmark and evaluation suite for training reward models that align AI agents capable of complex tool use and reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference Build Now
Alloc-MoE optimizes expert activation allocation in Mixture-of-Experts models for efficient inference, maintaining performance under budget constraints.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 9 Code High viability
ClawBench: Can AI Agents Complete Everyday Online Tasks? Build Now
ClawBench is a new benchmark for evaluating AI agents on everyday online tasks across 144 live platforms; it reveals that current frontier models complete only a small fraction of these complex, real-world tasks.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 9 Code High viability
CMP: Robust Whole-Body Tracking for Loco-Manipulation via Competence Manifold Projection Build Now
A robust whole-body tracking system for robots that projects control policies onto a competence manifold to handle unexpected inputs.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Control Apr 8 Code High viability
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation Build Now
A task-driven benchmark and evaluation framework for text-to-audio-video generation that reveals significant gaps in semantic controllability.
GitHub stars n/a Velocity flat History 1 snapshot Generative Media Apr 9 Code High viability
Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover Build Now
A small amount of domain-specific agentic data can reactivate dormant tool-use capabilities in large language models, significantly improving performance on diverse tasks.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution Build Now
A framework that intelligently orchestrates multiple LLMs of varying costs to optimize performance and efficiency for verifier-free evolutionary inference.
GitHub stars n/a Velocity flat History 1 snapshot LLM Orchestration Apr 9 Code High viability
M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery Build Now
An AI agent that uses multimodal evidence and art historical axioms to discover and verify implicit artistic influences.
GitHub stars n/a Velocity flat History 1 snapshot Art History AI Apr 8 Code High viability
Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities Build Now
A modality-agnostic framework for deepfake detection that captures shared latent forgery knowledge, enabling generalization to unseen modalities.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 9 Code High viability
Joint Task Offloading, Inference Optimization and UAV Trajectory Planning for Generative AI Empowered Intelligent Transportation Digital Twin Build Now
Jointly optimizes task offloading, inference, and UAV trajectory planning for generative AI in intelligent transportation digital twins to maximize system utility and minimize delay.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI for Digital Twins Apr 9 Code High viability
Incremental Residual Reinforcement Learning Toward Real-World Learning for Social Navigation Build Now
A novel incremental residual reinforcement learning method for real-world social navigation in mobile robots.
GitHub stars n/a Velocity flat History 1 snapshot Social Navigation Apr 9 Code High viability
Small Vision-Language Models are Smart Compressors for Long Video Understanding Build Now
An efficient framework that compresses long videos for downstream understanding using small vision-language models.
GitHub stars n/a Velocity flat History 1 snapshot Video Understanding Apr 9 Code High viability
Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models Build Now
A training-free framework that identifies and elicits latent anomaly-sensitive neurons in vision-language models to achieve state-of-the-art anomaly detection with neuron-level interpretability.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 9 Code High viability
Pruning Extensions and Efficiency Trade-Offs for Sustainable Time Series Classification Build Now
A framework for energy-efficient time series classification through a novel pruning strategy.
GitHub stars n/a Velocity flat History 1 snapshot Time Series Classification Apr 9 Code High viability
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization Build Now
Faithful GRPO improves visual spatial reasoning in multimodal models by enforcing logical consistency and visual grounding, leading to more accurate and trustworthy answers.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal LLMs Apr 9 Code High viability
A GAN and LLM-Driven Data Augmentation Framework for Dynamic Linguistic Pattern Modeling in Chinese Sarcasm Detection Build Now
A GAN and LLM-driven framework that dynamically models user linguistic patterns for enhanced Chinese sarcasm detection, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History 1 snapshot Sarcasm Detection Apr 9 Code High viability
U-CECE: A Universal Multi-Resolution Framework for Conceptual Counterfactual Explanations Build Now
A universal framework for generating conceptual counterfactual explanations for AI models, balancing expressivity and efficiency across different data regimes.
GitHub stars n/a Velocity flat History 1 snapshot AI Explainability Apr 9 Code High viability
Dynamic Attentional Context Scoping: Agent-Triggered Focus Sessions for Isolated Per-Agent Steering in Multi-Agent LLM Orchestration Watch
A novel context management system for multi-agent LLM orchestration that isolates agent contexts to prevent pollution and improve decision quality.
GitHub 0 stars Velocity flat History 1 snapshot Multi-Agent LLM Orchestration Apr 9 Pending
How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles Build Now
A statistical framework to audit and mitigate behavioral entanglement in large language models, improving ensemble verification accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Analysis Apr 8 Code High viability
DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics Build Now
A novel approach to infer articulated object kinematics from single static images by synthesizing an opened state to reveal hidden motion cues.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 9 Code High viability
MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning Build Now
MedVR enables annotation-free medical visual reasoning for VLMs using agentic reinforcement learning, improving robustness and transparency.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 9 Code High viability
ACIArena: Toward Unified Evaluation for Agent Cascading Injection Ignore
A unified framework for evaluating the security of multi-agent systems against cascading injection attacks, providing a benchmark and insights into robust design.
GitHub 4 stars Velocity flat History 1 snapshot Multi-Agent Systems Security Apr 9 Pending
CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning Build Now
CrashSight is a large-scale benchmark for traffic crash scene understanding using roadside camera data, evaluating vision-language models' reasoning in safety-critical scenarios.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 9 Code High viability
LPM 1.0: Video-based Character Performance Model Build Now
LPM 1.0 offers a video-based character performance model for interactive conversations, minimizing conventional 3D pipeline complexities.
GitHub stars n/a Velocity flat History 1 snapshot Video Synthesis for Character Animation Apr 9 Code High viability
Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts Build Now
A method that addresses the 'Seeing but Not Thinking' problem in multimodal MoE models, guiding expert activation to overcome routing distraction and improve reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 9 Code High viability
DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues Build Now
A benchmark and evaluation framework for dialogue-conditioned background music recommendation, addressing a gap in media production.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal Recommendation Apr 9 Code High viability
LogAct: Enabling Agentic Reliability via Shared Logs Build Now
LogAct is a new abstraction for LLM-driven agents that enables reliability through shared logs, allowing for introspection and recovery.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework Build Now
ReflectRM enhances Generative Reward Models by incorporating self-reflection to improve preference modeling and mitigate bias in LLM alignment.
GitHub stars n/a Velocity flat History 1 snapshot Generative Reward Models Apr 8 Code High viability
MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems Build Now
A multimodal benchmark and system for industry classification using text and geospatial data, achieving significant performance gains.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 9 Code High viability
Towards Knowledgeable Deep Research: Framework and Benchmark Build Now
A framework and benchmark for LLM agents to perform deep research using both structured and unstructured knowledge, generating multimodal reports.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking Build Now
A framework that adaptively prunes reasoning steps in large language models to reduce token usage without sacrificing accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Optimization Apr 9 Code High viability
HyperMem: Hypergraph Memory for Long-Term Conversations Build Now
HyperMem is a hypergraph-based memory architecture for conversational agents that captures high-order associations for more coherent and personalized long-term dialogues.
GitHub stars n/a Velocity flat History 1 snapshot Conversational AI Apr 9 Code High viability
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models Build Now
PokeGym is a visually-driven benchmark for Vision-Language Models in complex 3D environments, revealing physical deadlock recovery as a key bottleneck.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 9 Code High viability
Neural-Symbolic Knowledge Tracing: Injecting Educational Knowledge into Deep Learning for Responsible Learner Modelling Build Now
A neural-symbolic approach that injects educational knowledge into deep learning models for responsible learner modelling.
GitHub stars n/a Velocity flat History 1 snapshot Educational AI Apr 9 Code High viability
Learning Who Disagrees: Demographic Importance Weighting for Modeling Annotator Distributions with DiADEM Build Now
A neural architecture that models annotator disagreement by learning the importance of demographic factors, outperforming LLMs on subjective content labeling.
GitHub stars n/a Velocity flat History 1 snapshot LLM Interpretation Apr 9 Code High viability
KV Cache Offloading for Context-Intensive Tasks Build Now
A new benchmark and improved KV cache offloading strategy to address performance degradation on context-intensive LLM tasks, enabling more accurate long-context processing.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 9 Code High viability
PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory Build Now
Develops a proactive agent paradigm with long-term memory and a real-world benchmark for streaming AI agents.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator Build Now
A unified framework for video generation and understanding that uses a diffusion-based video generator as its foundation.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 9 Code High viability
AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification Build Now
AtomEval is a validity-aware evaluation framework for fact verification that decomposes claims into atomic units to detect factual corruption in adversarial rewrites.
GitHub stars n/a Velocity flat History 1 snapshot Fact Verification Apr 9 Code High viability
EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization Build Now
EditCaption synthesizes human-aligned instructions for image editing by combining supervised fine-tuning and direct preference optimization, significantly improving VLM accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Generative Vision Apr 9 Code High viability
OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance Build Now
OVS-DINO revitalizes edge-sensitivity in DINO for open-vocabulary segmentation by structurally aligning with SAM, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 9 Code High viability
Revise: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy Build Now
A framework for correcting OCR errors and improving document retrieval and question answering.
GitHub stars n/a Velocity flat History 1 snapshot Document AI Apr 9 Code High viability
MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models Build Now
MIMIC-Py is a reusable framework for personality-driven LLM agents to automate game testing, bridging research prototypes and practical applications.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents Build Now
A framework for self-evolving agents that optimizes policy and tool graph memory for more practical and efficient learning in resource-constrained environments.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense Build Now
A real-time, training-free defense system that detects LLM jailbreaks by analyzing hidden state trajectories during decoding, achieving high accuracy with low latency.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 9 Code High viability
TEMPER: Testing Emotional Perturbation in Quantitative Reasoning Build Now
A framework for testing and mitigating the impact of emotional framing on LLM quantitative reasoning, demonstrating accuracy degradation and proposing neutralization as a solution.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 9 Code High viability
Toward Generalizable Graph Learning for 3D Engineering AI: Explainable Workflows for CAE Mode Shape Classification and CFD Field Prediction Build Now
A graph learning framework for 3D engineering AI that converts heterogeneous data into physics-aware graph representations for explainable CAE and CFD decision support.
GitHub stars n/a Velocity flat History 1 snapshot Graph Neural Networks Apr 9 Code High viability
On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning Build Now
A method for distilling knowledge from large language models for efficient motion planning in autonomous vehicles.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Vehicle Planning Apr 9 Code High viability
TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization Build Now
A framework and dataset for Turkish educational video summarization that automatically generates gold-standard summaries based on human consensus.
GitHub stars n/a Velocity flat History 1 snapshot Educational Video Summarization Apr 8 Code High viability
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction Build Now
A plug-and-play framework for vision-language models that reduces hallucinations by selectively intervening in latent space without altering generation behavior.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 9 Code High viability
Show Me the Infographic I Imagine: Intent-Aware Infographic Retrieval for Authoring Support Build Now
An intent-aware infographic retrieval framework that uses a taxonomy to align user queries with infographic designs for authoring support.
GitHub stars n/a Velocity flat History 1 snapshot Infographic Retrieval Apr 9 Code High viability
Learning Without Losing Identity: Capability Evolution for Embodied Agents Build Now
A capability-centric evolution paradigm for embodied agents that decouples capability learning from agent identity, enabling continuous improvement without loss of stability.
GitHub stars n/a Velocity flat History 1 snapshot Embodied AI Apr 9 Code High viability
EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents Build Now
A multi-LLM agent system generates synthetic medical dialogues for training diagnostic prediction models, creating a valuable dataset and improving model performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents & Datasets Apr 8 Code High viability
TOOLCAD: Exploring Tool-Using Large Language Models in Text-to-CAD Generation with Reinforcement Learning Build Now
An agentic framework using LLMs and reinforcement learning to generate CAD models from text, competitive with proprietary models.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code High viability
Cluster Attention for Graph Machine Learning Build Now
Cluster attention improves graph machine learning models on graph datasets by enlarging receptive fields while preserving graph structure.
GitHub stars n/a Velocity flat History 1 snapshot Graph Machine Learning Apr 8 Code High viability
Towards Real-Time Human-AI Musical Co-Performance: Accompaniment Generation with Latent Diffusion Models and MAX/MSP Build Now
Enable real-time human-AI musical co-performance with latent diffusion models generating accompaniment via MAX/MSP.
GitHub stars n/a Velocity flat History 1 snapshot Generative Music Apr 8 Code High viability
LINE: LLM-based Iterative Neuron Explanations for Vision Models Build Now
A black-box, training-free method using LLMs to iteratively label neurons in vision models with open-vocabulary concepts, improving interpretability and safety.
GitHub stars n/a Velocity flat History 1 snapshot Model Interpretability Apr 9 Code High viability
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders Build Now
A novel method to improve LLM-based recommenders by selectively augmenting item knowledge, boosting accuracy and context efficiency without fine-tuning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Recommenders Apr 9 Code High viability
Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI Build Now
A selective attention system for device-addressed speech detection that operates on-device with low latency and footprint, achieving high accuracy with optional video fusion.
GitHub stars n/a Velocity flat History 1 snapshot On-Device Voice AI Apr 9 Code High viability
ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification Build Now
ADAPT is a new pre-training paradigm for time-series data that enables mixed-batch pre-training across diverse datasets, setting new state-of-the-art performance for classification benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Time Series Classification Apr 9 Code High viability
PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent Build Now
PRIME is a gradient-free framework for proactive, collaborative agents that learn from human-AI interactions without expensive training.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 8 Code High viability
Investigation of Automated Design of Quantum Circuits for Imaginary Time Evolution Methods Using Deep Reinforcement Learning Build Now
An automated framework for designing quantum circuits using deep reinforcement learning.
GitHub stars n/a Velocity flat History 1 snapshot Quantum Circuit Design Apr 9 Code High viability
Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation Build Now
A privacy-preserving method for generating realistic synthetic text data from private sources using differential privacy and private seeds.
GitHub stars n/a Velocity flat History 1 snapshot Synthetic Data Generation Apr 8 Code High viability
Sensitivity-Positional Co-Localization in GQA Transformers Build Now
A novel method for efficient LLM fine-tuning that decouples sensitivity and positional encoding adaptation, achieving strong performance across benchmarks with reduced compute.
GitHub stars n/a Velocity flat History 1 snapshot LLM Adaptation Apr 9 Code High viability
Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs Build Now
A reasoning-based framework uses LLMs to refine unsupervised text clusters, improving coherence and interpretability without supervision.
GitHub stars n/a Velocity flat History 1 snapshot LLM Clustering Apr 8 Code High viability
Beyond Human-Readable: Rethinking Software Engineering Conventions for the Agentic Development Era Build Now
Optimize software engineering conventions for AI agents by prioritizing semantic density over human readability to improve agent efficiency and reduce costs.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 8 Code High viability
Synthetic Data for any Differentiable Target Watch
A flexible synthetic data generator for customizable machine learning tasks.
GitHub stars n/a Velocity flat History 1 snapshot Synthetic Data Apr 9 Code
Reasoning Graphs: Deterministic Agent Accuracy through Evidence-Centric Chain-of-Thought Feedback Build Now
This paper introduces reasoning graphs, a novel memory mechanism for language model agents that persists and reuses evidence-centric chains of thought to deterministically improve accuracy and reduce variance without retraining.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 8 Code High viability
HST-HGN: Heterogeneous Spatial-Temporal Hypergraph Networks with Bidirectional State Space Models for Global Fatigue Assessment Build Now
A novel hypergraph network with bidirectional state space models for efficient and accurate driver fatigue assessment from untrimmed videos, suitable for edge deployment.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 9 Code High viability
MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security Watch
A taxonomy and analysis of defense placement for Model Context Protocol security in LLM agents, identifying gaps in current mitigation strategies.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 8 Code
Learning is Forgetting: LLM Training As Lossy Compression Ignore
This paper frames LLM training as lossy compression, linking representational structure to downstream performance through an information-theoretic lens.
GitHub 4 stars Velocity flat History 1 snapshot LLM Theory Apr 8 Pending
From Universal to Individualized Actionability: Revisiting Personalization in Algorithmic Recourse Watch
Formalizing personalization in algorithmic recourse to provide actionable recommendations tailored to individual constraints and preferences.
GitHub stars n/a Velocity flat History 1 snapshot Responsible AI Apr 9 Code
Phantasia: Context-Adaptive Backdoors in Vision Language Models Ignore
A novel backdoor attack for Vision-Language Models that dynamically aligns poisoned outputs with input semantics for improved stealth and adaptability.
GitHub 712 stars Velocity flat History 1 snapshot Vision Language Models Apr 9 Pending
Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation Watch
An inference-time intervention framework that dynamically silences LLM guardrails by ablating contextual representations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 9 Code
Munkres' General Topology Autoformalized in Isabelle/HOL Ignore
An experiment demonstrating LLM-assisted autoformalization of a comprehensive general topology textbook into a formal proof system.
GitHub 0 stars Velocity flat History 1 snapshot Formal Verification Apr 8 Pending
DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection Watch
A dual-branch multimodal framework for robust out-of-distribution detection.
GitHub stars n/a Velocity flat History 1 snapshot Out-of-Distribution Detection Apr 9 Code
An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks Watch
An agentic evaluation architecture detects biases in educational textbooks using a multimodal screening approach.
GitHub stars n/a Velocity flat History 1 snapshot Bias Detection Apr 9 Code
Small-scale photonic Kolmogorov-Arnold networks using standard telecom nonlinear modules Watch
Small-scale photonic Kolmogorov-Arnold networks implemented with standard telecom components for ultrafast inference across various tasks, overcoming optical-electrical-optical bottlenecks.
GitHub stars n/a Velocity flat History 1 snapshot Photonic Computing Apr 9 Code
PolicyLong: Towards On-Policy Context Extension Ignore
Dynamically generate high-quality long-context data for LLMs by aligning data construction with model evolution.
GitHub 1865 stars Velocity flat History 1 snapshot LLM Training Apr 9 Pending
Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification Watch
The C-Score metric quantifies explanation consistency in medical image classification, providing an early warning signal of model instability and enabling architecture-specific deployment recommendations.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 9 Code
Behavior-Aware Item Modeling via Dynamic Procedural Solution Representations for Knowledge Tracing Ignore
A framework that enhances knowledge tracing by integrating dynamic procedural solution stages into item representations, improving prediction of learner performance.
GitHub stars n/a Velocity flat History 1 snapshot Educational AI Apr 9 Pending
Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents Watch
TrACE adaptively allocates LLM compute for agents by measuring inter-rollout action agreement, reducing LLM calls while maintaining accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code
A Machine Learning Framework for Turbofan Health Estimation via Inverse Problem Formulation Watch
A machine learning framework for turbofan health estimation using a new dataset and self-supervised learning to address the ill-posed inverse problem.
GitHub stars n/a Velocity flat History 1 snapshot MLOps Apr 9 Code
Scalable Neural Decoders for Practical Fault-Tolerant Quantum Computation Watch
A convolutional neural network decoder for quantum error correction achieves significantly lower logical error rates and higher throughput than existing methods.
GitHub stars n/a Velocity flat History 1 snapshot Quantum Computing Apr 9 Code
PyVRP+: LLM-Driven Metacognitive Heuristic Evolution for Hybrid Genetic Search in Vehicle Routing Problems Watch
A novel framework using LLMs to enhance metaheuristic search for vehicle routing problems.
GitHub stars n/a Velocity flat History 1 snapshot Optimization Algorithms Apr 9 Code
Grounding Clinical AI Competency in Human Cognition Through the Clinical World Model and Skill-Mix Framework Ignore
A framework to formalize clinical AI competency by modeling the interactions between patient, provider, and ecosystem, grounded in clinical cognition.
GitHub 0 stars Velocity flat History 1 snapshot Medical AI Apr 9 Pending
QARIMA: A Quantum Approach To Classical Time Series Analysis Watch
A quantum-inspired ARIMA methodology for enhanced time series analysis.
GitHub stars n/a Velocity flat History 1 snapshot Quantum Time Series Analysis Apr 9 Code
ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models Watch
ImplicitMemBench is a new benchmark for LLM agents that measures unconscious behavioral adaptation, revealing limitations in current models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 9 Code
Mitigating Distribution Sharpening in Math RLVR via Distribution-Aligned Hint Synthesis and Backward Hint Annealing Watch
This work introduces a novel hint synthesis and annealing method to improve reasoning accuracy and coverage in math RLVR for LLMs.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9 Code
More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration Ignore
LLM agents exhibit cooperation failures in frictionless environments, indicating that scaling intelligence alone is insufficient for multi-agent coordination.
GitHub 1865 stars Velocity flat History 1 snapshot Multi-Agent Systems Apr 9 Pending
Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection Ignore
A dual continual learning framework for facial DeepFake detection that fuses spatial and frequency-domain features to adapt to evolving forgery patterns without historical data replay.
GitHub stars n/a Velocity flat History 1 snapshot DeepFake Detection Apr 9 Code
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers Ignore
Recurrent-depth transformers show potential for improved implicit reasoning and compositional generalization in LLMs, addressing limitations in systematic generalization and depth extrapolation.
GitHub 0 stars Velocity flat History 1 snapshot LLM Reasoning Apr 9 Pending
Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification Ignore
Investigates the performance degradation of medical multimodal LLMs in image classification, identifying key failure modes to guide future development.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 9 Code
RL-ASL: A Dynamic Listening Optimization for TSCH Networks Using Reinforcement Learning Watch
A reinforcement learning framework dynamically optimizes listening slots in TSCH networks to significantly reduce power consumption and latency in IIoT devices.
GitHub stars n/a Velocity flat History 1 snapshot IoT Network Optimization Apr 8
Automatic Generation of Executable BPMN Models from Medical Guidelines Watch
Automate the conversion of complex medical guidelines into executable simulation models for policy evaluation.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 9
The ecosystem of machine learning competitions: Platforms, participants, and their impact on AI development Watch
An analysis of machine learning competitions and their impact on AI development, emphasizing collaboration and innovation.
GitHub stars n/a Velocity flat History 1 snapshot Machine Learning Competitions Apr 9 Code
Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs Ignore
A comprehensive survey and taxonomy of abductive reasoning in large language models, addressing conceptual confusion and task definitions.
GitHub stars n/a Velocity flat History 1 snapshot Abductive Reasoning in LLMs Apr 9 Code
CivBench: Progress-Based Evaluation for LLMs' Strategic Decision-Making in Civilization V Ignore
A benchmark for evaluating LLM strategic decision-making in complex, multi-agent games like Civilization V, providing richer signals than simple win/loss outcomes.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 9 Code
WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models Watch
A framework that uses generative world models to create supervision signals for vision-language navigation trajectory prediction.
GitHub stars n/a Velocity flat History 1 snapshot Embodied AI Apr 9
Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search Ignore
Hierarchical Experience (HiExp) framework enhances LLM search agents by transforming raw reasoning trajectories into hierarchical knowledge for strategic, experience-driven exploration.
GitHub stars n/a Velocity flat History 1 snapshot Agentic Search Apr 9 Code
Rethinking Data Mixing from the Perspective of Large Language Models Ignore
A theoretical framework and reweighting method for optimizing LLM training data mixing to improve generalization.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9 Code
A Decomposition Perspective to Long-context Reasoning for LLMs Ignore
Decomposes long-context reasoning into atomic skills and uses reinforcement learning on synthesized datasets to improve LLM performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 9 Code
Generative Experiences for Digital Mental Health Interventions: Evidence from a Randomized Study Watch
A system that generates personalized mental health intervention experiences at runtime, reducing stress and improving user experience.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI for Mental Health Apr 8
The Weaponization of Computer Vision: Tracing Military-Surveillance Ties through Conference Sponsorship Ignore
Uncover the military and surveillance ties within computer vision research by analyzing conference sponsorship data.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 9 Code
From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants Watch
A gaze-aware AI assistant that uses egocentric video to understand user cognitive needs and provide more accurate and personalized assistance.
GitHub stars n/a Velocity flat History 1 snapshot AI Assistants Apr 9
Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions Ignore
A taxonomy of attacks and defenses for retrieval-augmented generation (RAG) systems, highlighting fragmented current defenses.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 9 Code
SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility Ignore
A self-paced curriculum framework that dynamically adjusts reward weights and data importance for LLM alignment.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9 Code
TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems Ignore
A framework for adaptive differential privacy that modulates privacy budgets based on user trust and uses manifold embedding to disrupt inference attacks.
GitHub stars n/a Velocity flat History 1 snapshot Privacy-Preserving AI Apr 9 Code
Dual-Loop Control in DCVerse: Advancing Reliable Deployment of AI in Data Centers via Digital Twins Ignore
A digital twin framework for reliable AI control in data centers, improving energy efficiency and reducing outage risk.
GitHub stars n/a Velocity flat History 1 snapshot AI for Data Centers Apr 8
On-board Telemetry Monitoring in Autonomous Satellites: Challenges and Opportunities Ignore
A framework for explainable AI in autonomous satellite fault detection, using neural anomaly detectors with interpretable indicators.
GitHub stars n/a Velocity flat History 1 snapshot Robotics AI Apr 9
TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs Ignore
TASU2 is a controllable CTC simulation framework for speech LLMs that enables principled post-training curricula for improved alignment and low-resource adaptation.
GitHub stars n/a Velocity flat History 1 snapshot Speech LLMs Apr 9
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing Ignore
Develop a self-auditing tool for LLM agents to ensure faithful reasoning.
GitHub stars n/a Velocity flat History 1 snapshot AI Auditing and Reliability Apr 9 Code
Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals Ignore
Route software engineering tasks to cost-effective LLM tiers using code quality signals to reduce inference costs without sacrificing output quality.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 8
AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan Ignore
The AT-ADD challenge aims to advance all-type audio deepfake detection beyond speech-centric methods for robust multimedia forensics.
GitHub stars n/a Velocity flat History 1 snapshot Audio AI Apr 9 Code
Exploring Temporal Representation in Neural Processes for Multimodal Action Prediction Ignore
A revised Conditional Neural Process model that improves temporal representation for multimodal action prediction in robotics.
GitHub stars n/a Velocity flat History 1 snapshot Robotics AI Apr 9
3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience Ignore
A training-free framework that teaches LLMs to draw 3D sketches using early contrastive experience and geometric feedback.
GitHub stars n/a Velocity flat History 1 snapshot Generative 3D Apr 9
Active Reward Machine Inference From Raw State Trajectories Ignore
A theoretical framework for learning reward machines from raw state trajectories without explicit reward or label observations.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 8 Code
Zero-shot Multivariate Time Series Forecasting Using Tabular Prior Fitted Networks Ignore
A framework for multivariate time series forecasting using tabular foundation models by recasting the problem into a series of scalar regression problems solvable zero-shot.
GitHub stars n/a Velocity flat History 1 snapshot Time Series Forecasting Apr 9
LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows Ignore
A system for micro-serving text-to-image diffusion workflows that optimizes resource management and performance.
GitHub stars n/a Velocity flat History 1 snapshot AI Infrastructure Apr 9
Lightweight LLM Agent Memory with Small Language Models Ignore
LightMem is a lightweight memory system for LLM agents that uses Small Language Models to modularize memory operations, improving accuracy and reducing latency.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 9
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning Ignore
A video-generative value model for robot reinforcement learning that improves task progress estimation by leveraging spatiotemporal priors from video data.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 9
Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems Ignore
A framework to compare human and GUI-agent behavior at a trace level in production search systems, revealing differences in navigation strategies beyond task success.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9
PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents Ignore
PSI is a shared-state architecture that transforms independently generated AI modules into coherent, connected personal computing environments accessible through GUIs and chat agents.
GitHub stars n/a Velocity flat History 1 snapshot Personal AI Agents Apr 9
Too long; didn't solve Ignore
This research investigates the impact of prompt and solution length on large language model performance in mathematical reasoning tasks, finding that longer inputs correlate with increased model failure.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 8 Code
An Imperfect Verifier is Good Enough: Learning with Noisy Rewards Ignore
This research demonstrates that imperfect reward verification in Reinforcement Learning is sufficient for effective LLM training, suggesting a more practical approach to RLVR.
LLM Training Apr 9
From Safety Risk to Design Principle: Peer-Preservation in Multi-Agent LLM Systems and Its Implications for Orchestrated Democratic Discourse Analysis Ignore
This paper explores emergent 'peer-preservation' alignment issues in multi-agent LLM systems and proposes architectural mitigations for orchestrated democratic discourse analysis.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9 Code
ASPECT: Analogical Semantic Policy Execution via Language Conditioned Transfer Ignore
A reinforcement learning agent that uses a Large Language Model as a semantic operator to achieve zero-shot transfer to novel analogous tasks.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 9
Governed Capability Evolution for Embodied Agents: Safe Upgrade, Compatibility Checking, and Runtime Rollback for Embodied Capability Modules Ignore
A framework for safely upgrading and managing capabilities of embodied agents, ensuring compatibility and preventing runtime failures.
GitHub stars n/a Velocity flat History 1 snapshot Embodied Agents Apr 9
Automotive Engineering-Centric Agentic AI Workflow Framework Ignore
An industrial vision framework that models automotive engineering workflows as constrained, history-aware sequential decision processes for AI agent support.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9
QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training-Inference Mismatch Ignore
A novel RL framework for LLM training that aligns training with quantized rollouts to improve speed and stability.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9
Differentially Private Language Generation and Identification in the Limit Ignore
This research explores differentially private language generation and identification, showing privacy has no qualitative cost for generation but creates fundamental barriers for identification.
GitHub stars n/a Velocity flat History 1 snapshot Differential Privacy Apr 9
TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis Ignore
TTVS enables large reasoning models to self-evolve at test time by dynamically synthesizing diverse variations of unlabeled queries, improving performance in specialized domains.
GitHub stars n/a Velocity flat History 1 snapshot LLM Adaptation Apr 9
Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest Ignore
This paper analyzes how current LLMs navigate conflicts of interest between user welfare and company incentives, finding a majority prioritize company revenue over user benefit.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics & Alignment Apr 9
"Why This Avoidance Maneuver?" Contrastive Explanations in Human-Supervised Maritime Autonomous Navigation Ignore
Contrastive explanations for maritime autonomous navigation systems to improve human supervisor understanding of avoidance maneuvers.
GitHub stars n/a Velocity flat History 1 snapshot Explainable AI Apr 9
Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence Ignore
A novel activation steering method for large language models that improves alignment without sacrificing coherence, addressing brittleness in generation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9
From Phenomenological Fitting to Endogenous Deduction: A Paradigm Leap via Meta-Principle Physics Architecture Ignore
A Meta-Principle Physics Architecture that embeds core physical principles like connectivity, conservation, and periodicity into neural networks for improved physical reasoning and generalization.
GitHub stars n/a Velocity flat History 1 snapshot AI for Science Apr 9
What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal Ignore
This research investigates the internal mechanisms of steering vectors in large language models, revealing their primary interaction with the attention mechanism's OV circuit and enabling significant sparsification.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 9
Are we still able to recognize pearls? Machine-driven peer review and the risk to creativity: An explainable RAG-XAI detection framework with markers extraction Ignore
An explainable RAG-XAI framework is proposed to detect machine-driven peer review patterns and markers, aiming to preserve creativity in science.
GitHub stars n/a Velocity flat History 1 snapshot AI Detection Apr 9
Multi-Modal Learning meets Genetic Programming: Analyzing Alignment in Latent Space Optimization Ignore
Investigates the effectiveness of multi-modal latent space optimization for symbolic regression, revealing limitations in current alignment techniques.
GitHub stars n/a Velocity flat History 1 snapshot Symbolic Regression Apr 9
The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives Ignore
An impossibility theorem proving that agentic AI systems violate accountability assumptions once autonomy exceeds a computable threshold, necessitating distributed accountability mechanisms.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance Apr 9
Capture-Quiet Decomposition: A Verification Theorem for Chess Endgame Tablebases Ignore
A structural theorem for verifying chess endgame tablebases by decomposing positions into terminal, capture, or quiet categories.
GitHub stars n/a Velocity flat History 1 snapshot Formal Verification Apr 9
Evaluating Counterfactual Explanation Methods on Incomplete Inputs Ignore
This paper evaluates counterfactual explanation methods under the challenge of incomplete inputs, highlighting the need for robust solutions.
GitHub stars n/a Velocity flat History 1 snapshot Counterfactual Explanations Apr 9
Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey Ignore
A survey on energy efficiency challenges in agentic AI inference.
GitHub stars n/a Velocity flat History 1 snapshot Energy Efficiency Apr 9
The Shrinking Lifespan of LLMs in Science Ignore
This paper analyzes the adoption and abandonment trends of large language models in scientific research, revealing a compressing lifespan for models over time.
GitHub stars n/a Velocity flat History 1 snapshot LLM Adoption Trends Apr 8
Hidden Biases in Conditioning Autoregressive Models Ignore
An exploration of hidden biases in autoregressive models for constrained generation.
GitHub stars n/a Velocity flat History 1 snapshot Model Bias Apr 9
Agentic Copyright, Data Scraping & AI Governance: Toward a Coasean Bargain in the Era of Artificial Intelligence Ignore
This paper proposes a theoretical framework for agentic copyright and a supervised multi-agent governance model to address market failures in AI-driven creative industries.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance & Copyright Apr 8
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning Ignore
A survey that unifies various LLM post-training methods by framing them as structured interventions on model behavior, categorized by trajectory provenance and behavioral roles.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9
Don't Measure Once: Measuring Visibility in AI Search (GEO) Ignore
This paper proposes a new method for measuring visibility in AI search by accounting for the probabilistic nature of generative search results.
GitHub stars n/a Velocity flat History 1 snapshot AI Search Apr 8
Trust the AI, Doubt Yourself: The Effect of Urgency on Self-Confidence in Human-AI Interaction Ignore
Urgency in human-AI interaction does not affect trust in AI, but it can lower human self-confidence and degrade performance.
GitHub stars n/a Velocity flat History 1 snapshot Human-AI Interaction Apr 8
Google, AI Literacy, and the Learning Sciences: Multiple Modes of Research, Industry, and Practice Partnerships Ignore
Exploring multi-stakeholder partnerships between research, industry, and practice to advance AI literacy.
GitHub stars n/a Velocity flat History 1 snapshot AI Literacy Apr 8
The Cartesian Cut in Agentic AI Ignore
This paper proposes a theoretical framework for understanding control in agentic AI systems, contrasting Cartesian agency with integrated approaches.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 9
Sinkhorn doubly stochastic attention rank decay analysis Ignore
Analyzes rank decay in self-attention mechanisms, showing that Sinkhorn doubly stochastic attention preserves rank more effectively than standard Softmax attention.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9
When Switching Algorithms Helps: A Theoretical Study of Online Algorithm Selection Ignore
A theoretical study exploring how switching between optimization algorithms can lead to faster problem-solving in specific scenarios.
GitHub stars n/a Velocity flat History 1 snapshot Optimization Theory Apr 8
Sheaf-Laplacian Obstruction and Projection Hardness for Cross-Modal Compatibility on a Modality-Independent Site Ignore
A theoretical framework for analyzing cross-modal compatibility using sheaf theory and spectral gaps.
GitHub stars n/a Velocity flat History 1 snapshot Theoretical AI Apr 8
Emotion Concepts and their Function in a Large Language Model Ignore
Investigates the internal representations of emotion concepts in LLMs and how they causally influence model behavior, including alignment-relevant actions.
GitHub stars n/a Velocity flat History 1 snapshot LLM Behavior Apr 9
Human-AI Collaboration Reconfigures Group Regulation from Socially Shared to Hybrid Co-Regulation Ignore
This paper investigates how generative AI impacts group regulation in collaborative learning environments, shifting from socially shared to hybrid co-regulation.
GitHub stars n/a Velocity flat History 1 snapshot Human-AI Collaboration Apr 9
Can Vision Language Models Judge Action Quality? An Empirical Evaluation Ignore
Evaluating the capability of Vision Language Models for Action Quality Assessment, revealing significant limitations and biases.
Vision Language Models Apr 9
Agentivism: a learning theory for the age of artificial intelligence Ignore
A new learning theory to understand how humans learn effectively in the age of generative AI.
GitHub stars n/a Velocity flat History 1 snapshot Learning Theory Apr 9