PilotBench: A Benchmark for General Aviation Agents with Safety Constraints [Build Now]
A benchmark for general aviation agents that evaluates LLMs on safety-critical flight trajectory and attitude prediction, revealing a trade-off between precision and controllability.
GitHub stars: 1 | Velocity: flat | History: 1 snapshot | Embodied AI | Apr 10 | Code | High viability
U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster [Build Now]
A surprisingly simple and efficient AI weather forecaster that matches state-of-the-art performance with significantly reduced compute.
GitHub stars: 4 | Velocity: flat | History: 1 snapshot | AI Weather Forecasting | Apr 10 | Pending | High viability
E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning [Build Now]
E3-TIR is a warm-up paradigm for LLM agents that enhances tool-use reasoning with less data and improved efficiency.
GitHub stars: 1 | Velocity: flat | History: 1 snapshot | Agents | Apr 10 | Pending | High viability
LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving [Build Now]
LMGenDrive is a unified framework for autonomous driving that combines multimodal understanding with generative world models for robust, end-to-end control.
GitHub stars: 713 | Velocity: flat | History: 1 snapshot | Autonomous Driving | Apr 9 | Pending | High viability
Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition [Build Now]
Frequency-Enhanced Diffusion Models improve zero-shot skeleton action recognition by addressing spectral bias and recovering fine-grained motion details.
GitHub stars: 1 | Velocity: flat | History: 1 snapshot | Skeleton Action Recognition | Apr 10 | Pending | High viability
Learning Vision-Language-Action World Models for Autonomous Driving [Build Now]
VLA-World is a Vision-Language-Action world model that unifies predictive imagination with reflective reasoning for improved autonomous driving foresight and safety.
GitHub stars: 713 | Velocity: flat | History: 1 snapshot | Autonomous Driving | Apr 10 | Code | High viability
Many-Tier Instruction Hierarchy in LLM Agents [Build Now]
A new benchmark and paradigm for LLM agents to resolve instruction conflicts across many privilege levels, addressing a critical gap in agent safety and effectiveness.
GitHub stars: 2 | Velocity: flat | History: 1 snapshot | Agents | Apr 10 | Code | High viability
DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation [Build Now]
DeepGuard enhances LLM code generation security by aggregating multi-layer representations to detect and mitigate vulnerabilities.
GitHub stars: 1 | Velocity: flat | History: 1 snapshot | Secure Code Generation | Apr 10 | Pending | High viability
LLM-Rosetta: A Hub-and-Spoke Intermediate Representation for Cross-Provider LLM API Translation [Build Now]
An open-source framework for seamless cross-provider LLM API translation using an intermediate representation.
GitHub stars: 3 | Velocity: flat | History: 1 snapshot | LLM API Translation | Apr 10 | Pending | High viability
The AI Codebase Maturity Model: From Assisted Coding to Self-Sustaining Systems [Build Now]
A maturity model and framework for evolving AI-assisted coding into self-sustaining development systems, emphasizing feedback loops over the choice of AI model.
GitHub stars: 44 | Velocity: flat | History: 1 snapshot | AI Development Tools | Apr 10 | Pending | High viability
Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems [Build Now]
A benchmark and empirical study, with accompanying analysis code, showing how multi-agent system architectures amplify bias even when the individual agents are neutral.
GitHub stars: 2 | Velocity: flat | History: 1 snapshot | AI Ethics | Apr 10 | Pending | High viability
RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval [Build Now]
RecaLLM addresses the 'lost-in-thought' phenomenon in LLMs by interleaving explicit in-context retrieval with reasoning, improving long-context performance without expensive training data.
GitHub stars: 4 | Velocity: flat | History: 1 snapshot | LLM Reasoning | Apr 10 | Pending | High viability
HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation [Build Now]
HTNav is a hybrid navigation framework for urban aerial vision-and-language tasks, integrating imitation and reinforcement learning with a tiered structure for improved planning and control.
GitHub stars: 713 | Velocity: flat | History: 1 snapshot | Robotics | Apr 10 | Code | High viability
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help? [Build Now]
A benchmark and training method for AI agents to intelligently ask for help when faced with ambiguity, improving task completion and reducing errors.
GitHub stars: 0 | Velocity: flat | History: 1 snapshot | Agents | Apr 10 | Code | High viability
HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing [Build Now]
HM-Bench is the first benchmark for evaluating multimodal large language models on hyperspectral image understanding, featuring a dual-modality framework and a large-scale dataset.
GitHub stars: 1 | Velocity: flat | History: 1 snapshot | Multimodal AI | Apr 10 | Pending | High viability
VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning [Build Now]
A reinforcement learning framework that decouples confidence calibration in large vision-language models into visual and reasoning components to reduce hallucinations and improve accuracy.
GitHub stars: 1 | Velocity: flat | History: 1 snapshot | Vision-Language Models | Apr 10 | Pending | High viability
Large-Scale Universal Defect Generation: Foundation Models and Datasets [Build Now]
A foundation model and large dataset for universal defect generation in images, enabling reference-based and text-guided editing.
GitHub stars: 0 | Velocity: flat | History: 1 snapshot | Generative AI | Apr 10 | Pending | High viability
Neural Distribution Prior for LiDAR Out-of-Distribution Detection [Build Now]
A framework for robust out-of-distribution object detection in LiDAR data for autonomous driving, significantly improving safety by identifying unexpected objects.
GitHub stars: 713 | Velocity: flat | History: 1 snapshot | Autonomous Driving Perception | Apr 10 | Code | High viability
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation [Build Now]
BERT-as-a-Judge offers a scalable and robust alternative to lexical methods for LLM evaluation, matching larger models at lower computational cost.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Evaluation | Apr 10 | Code | High viability
PhysInOne: Visual Physics Learning and Reasoning in One Suite [Build Now]
A massive synthetic dataset for training AI to understand and generate physically plausible videos, enabling advancements in world models.
GitHub stars: 713 | Velocity: flat | History: 1 snapshot | Generative AI | Apr 10 | Code | High viability
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images [Watch]
VisionFoundry teaches VLMs visual perception skills using task-specific synthetic images generated by LLMs and text-to-image models.
GitHub stars: 34 | Velocity: flat | History: 1 snapshot | Synthetic Data for VLMs | Apr 10 | Code
SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning [Build Now]
SafeAdapt provides provable safety guarantees for updating reinforcement learning policies in non-stationary environments.
GitHub stars: 0 | Velocity: flat | History: 1 snapshot | Reinforcement Learning | Apr 10 | Pending | High viability
Neighbourhood Transformer: Switchable Attention for Monophily-Aware Graph Learning [Build Now]
A novel graph neural network architecture that overcomes homophily assumptions by applying self-attention within local neighborhoods, with significant performance gains and code availability.
GitHub stars: 0 | Velocity: flat | History: 1 snapshot | Graph Learning | Apr 10 | Pending | High viability
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion [Build Now]
CLIP-Inspector detects backdoors in prompt-tuned CLIP models by inverting out-of-distribution triggers, enabling model vetting and repair.
GitHub stars: 713 | Velocity: flat | History: 1 snapshot | Model Security | Apr 10 | Code | High viability
SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment [Build Now]
A new benchmark for evaluating self-evolving agents that quantifies their ability to accumulate experience and optimize strategies across task boundaries.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Agents | Apr 10 | Code | High viability
Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise [Watch]
VisPrompt enhances vision-language models' robustness to label noise by injecting visual semantics into prompt learning, improving performance on noisy datasets.
GitHub stars: 3 | Velocity: flat | History: 1 snapshot | Robust Vision-Language Models | Apr 10 | Pending
Camera Artist: A Multi-Agent Framework for Cinematic Language Storytelling Video Generation [Watch]
Camera Artist automates narrative video creation with advanced cinematic storytelling techniques.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI Video & Graphics | Apr 10 | Code
Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction [Build Now]
A tutor-student multi-agent system that enhances LLM problem-solving performance in coding tasks through structured, role-differentiated interaction.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Agents | Apr 10 | Code | High viability
SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering [Ignore]
A framework that automatically optimizes agent skill bundles for software engineering tasks, improving success rates and reducing costs.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Agent Skills | Apr 10
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion [Build Now]
An efficient diffusion-based vision-language model for one-step chest X-ray report generation that significantly speeds up inference while maintaining clinical accuracy.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Medical AI | Apr 10 | Code | High viability
TME-PSR: Time-aware, Multi-interest, and Explanation Personalization for Sequential Recommendation [Build Now]
A personalized sequential recommendation model that integrates time, multi-interest, and explanation personalization for improved accuracy and explanation quality at lower computational cost.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Recommendation Systems | Apr 10 | Code | High viability
MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator [Build Now]
A human-in-the-loop web application for systematically evaluating LLM text simplifications across diverse prompts and architectures.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Evaluation | Apr 10 | Code | High viability
Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition [Build Now]
An agentic framework for interactive ASR that uses LLM-as-a-Judge for semantic evaluation and multi-turn correction to improve recognition quality.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Interactive Speech Recognition | Apr 10 | Code | High viability
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks [Build Now]
SPPO is a scalable algorithm for aligning LLMs in long-horizon reasoning tasks, offering improved sample efficiency and stability over standard PPO.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Alignment | Apr 10 | Code | High viability
Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models [Build Now]
An agentic system and benchmark for predicting multilingual model performance in low-resource scenarios, demonstrating structured reasoning for evaluation.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Multilingual AI | Apr 10 | Code | High viability
Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization [Build Now]
Mosaic is a multimodal jailbreak framework that overcomes surrogate dependency to achieve state-of-the-art attack success rates against closed-source vision-language models.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Security | Apr 10 | Code | High viability
Vision Transformers for Preoperative CT-Based Prediction of Histopathologic Chemotherapy Response Score in High-Grade Serous Ovarian Carcinoma [Build Now]
A multimodal deep learning framework that uses Vision Transformers to predict chemotherapy response in ovarian cancer from preoperative CT scans, intended as a decision-support tool for multidisciplinary teams.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Medical AI | Apr 10 | Code | High viability
Visually-Guided Policy Optimization for Multimodal Reasoning [Ignore]
A framework to enhance visual focus and counteract visual forgetting in multimodal reasoning models.
GitHub stars: 5 | Velocity: flat | History: 1 snapshot | Multimodal Reasoning | Apr 10 | Code
Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations [Build Now]
ViSA-R2 infers analytical solutions for physical fields from visualizations, outperforming existing models and providing a benchmark for visual-to-symbolic reasoning.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Scientific Reasoning | Apr 10 | Code | High viability
NyayaMind: A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System [Build Now]
NyayaMind is an open-source framework for transparent legal reasoning and judgment prediction in the Indian legal system, improving explanation quality and evidence alignment.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Legal AI | Apr 10 | Code | High viability
StaRPO: Stability-Augmented Reinforcement Policy Optimization [Build Now]
A reinforcement learning framework that optimizes LLM reasoning by incorporating stability metrics like autocorrelation and path efficiency.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Agents | Apr 10 | Code | High viability
Process Reward Agents for Steering Knowledge-Intensive Reasoning [Build Now]
Process Reward Agents provide step-wise rewards to frozen LLMs for improved reasoning accuracy in knowledge-intensive domains like medicine.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Agents | Apr 10 | Code | High viability
DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math? [Build Now]
DRBENCHER is a novel benchmark generator for evaluating AI agents on tasks that require both web browsing and multi-step computation, revealing significant performance gaps in current frontier models.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI Agents | Apr 10 | Code | High viability
Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring [Build Now]
A plug-and-play module that drastically reduces computational cost in monocular SLAM by predicting frame utility before expensive processing, achieving a 5x throughput speedup.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | SLAM | Apr 9 | Code | High viability
Persona-E²: A Human-Grounded Dataset for Personality-Shaped Emotional Responses to Textual Events [Build Now]
A human-grounded dataset and LLM evaluation for personality-shaped emotional responses to text, addressing the 'personality illusion' in affective computing.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Affective Computing | Apr 10 | Code | High viability
Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs [Build Now]
A framework for fine-grained control of multimodal LLM activations that enhances safety without compromising general capabilities, using a curated concept dictionary and sparse autoencoders.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Multimodal LLM Safety | Apr 10 | Code | High viability
PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing [Build Now]
A text-to-speech system that achieves natural automated dubbing by synchronizing translated text phonetically and temporally with source speech.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Text-to-Speech Dubbing | Apr 10 | Code | High viability
SAGE: A Service Agent Graph-guided Evaluation Benchmark [Build Now]
A benchmark for evaluating customer service LLMs with dynamic dialogue graphs and adversarial testing, revealing an 'Execution Gap' in action derivation.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Customer Service Agents | Apr 10 | Code | High viability
ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering [Build Now]
A novel architecture for complex table question answering that reconstructs tables into logical semantic trees and uses a dual-mode reasoning framework to achieve state-of-the-art performance.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Table QA | Apr 10 | Code | High viability
Identification and Anonymization of Named Entities in Unstructured Information Sources for Use in Social Engineering Detection [Watch]
A system for collecting, transcribing, and anonymizing Telegram data to build datasets for social engineering detection while complying with privacy regulations.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Cybersecurity AI | Apr 10 | Code
XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers [Ignore]
XFED is the first aggregation-agnostic, non-collusive model poisoning attack against Byzantine-robust federated classifiers, demonstrating significant security vulnerabilities.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Federated Learning Security | Apr 10 | Code
SenBen: Sensitive Scene Graphs for Explainable Content Moderation [Build Now]
SenBen provides a large-scale benchmark and a compact, efficient model for explainable content moderation by generating sensitive scene graphs.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Explainable Content Moderation | Apr 9 | Code | High viability
Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection [Build Now]
A user-side method that injects visual prompts into images to prevent multi-modal LLMs from analyzing them, protecting sensitive information.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI Safety | Apr 10 | Code | High viability
Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures [Build Now]
An open-access dataset and benchmark for energy-aware LLM inference on heterogeneous GPUs, enabling significant energy savings.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Infrastructure | Apr 10 | Code | High viability
AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models [Build Now]
AudioGuard provides comprehensive audio safety protection against diverse threats with low latency.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Audio Safety | Apr 10 | Code | High viability
Skill-Conditioned Visual Geolocation for Vision-Language [Build Now]
A training-free framework that uses an evolving skill graph to improve vision-language geolocation and enable autonomous self-evolution.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Vision-Language | Apr 10 | Code | High viability
eBandit: Kernel-Driven Reinforcement Learning for Adaptive Video Streaming [Build Now]
eBandit is a kernel-resident reinforcement learning framework for adaptive video streaming that optimizes quality of experience by directly monitoring transport layer signals.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Network Optimization | Apr 9 | Code | High viability
Do LLMs Follow Their Own Rules? A Reflexive Audit of Self-Stated Safety Policies [Build Now]
The Symbolic-Neural Consistency Audit (SNCA) framework measures the gap between LLMs' self-stated safety policies and their actual behavior, revealing systematic compliance gaps.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Safety | Apr 10 | Code | High viability
Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS) [Watch]
An adaptive dual residual U-Net with attention mechanisms for precise brain tumor segmentation.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Medical AI | Apr 10 | Code
Hypergraph Neural Networks Accelerate MUS Enumeration [Build Now]
Accelerates the enumeration of Minimal Unsatisfiable Subsets (MUSes) in constraint satisfaction problems using domain-agnostic Hypergraph Neural Networks.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Constraint Satisfaction | Apr 10 | Code | High viability
PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos [Build Now]
PinpointQA provides a benchmark for improving AI's ability to understand small object locations in indoor videos.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Dataset and Benchmarks | Apr 10 | Code | High viability
CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation [Build Now]
CORA provides statistically grounded safeguards for mobile GUI automation to prevent harmful actions.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI Safety and Automation | Apr 10 | Code | High viability
Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA [Build Now]
A lightweight router that improves two-hop question answering by intelligently selecting retrieval strategies based on question characteristics.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Question Answering | Apr 10 | Code | High viability
Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories [Build Now]
A video diffusion model that jointly learns video frames and camera trajectories, enabling novel view synthesis and pose estimation.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Generative Video | Apr 10 | Code | High viability
A Mathematical Framework for Temporal Modeling and Counterfactual Policy Simulation of Student Dropout [Ignore]
A mathematical framework for simulating counterfactual student dropout policies using temporal engagement data.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Student Dropout Prediction | Apr 10 | Pending
A Closer Look at the Application of Causal Inference in Graph Representation Learning [Ignore]
A theoretical model for causal inference in graph representation learning that guarantees causal validity by operating on indivisible graph units, with an integrated module for existing pipelines.
GitHub stars: 1865 | Velocity: flat | History: 1 snapshot | Graph Representation Learning | Apr 10 | Code
Demystifying the Silence of Correctness Bugs in PyTorch Compiler [Watch]
AlignGuard is a new technique that detects critical correctness bugs in PyTorch's compiler, improving the reliability of LLM applications.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI Infrastructure | Apr 9 | High viability
Noise-Aware In-Context Learning for Hallucination Mitigation in ALLMs [Build Now]
A plug-and-play method that uses noise-aware in-context learning to reduce hallucinations in auditory large language models without fine-tuning.
GitHub stars: 0 | Velocity: flat | History: 1 snapshot | LLM Hallucination Mitigation | Apr 10 | Code | High viability
Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation [Watch]
A novel multilingual story moral generation task and dataset to evaluate the cultural alignment and diversity of LLMs.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Evaluation | Apr 9 | Code
Yes, But Not Always. Generative AI Needs Nuanced Opt-in [Watch]
An agent-based system for nuanced, inference-time consent verification in generative AI, balancing rights holder control with developer flexibility.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Generative AI | Apr 10 | Code
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision [Ignore]
A framework for evidence-grounded reasoning that trains models to explicitly depend on provided evidence for decision-making, improving accuracy in domains like radiology.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Evidence Verification | Apr 10
Envisioning the Future, One Step at a Time [Build Now]
Forecasts future event outcomes from static images, with potential uses in simulation and analytics.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Visual Forecasting AI | Apr 10 | Code | High viability
VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning [Watch]
An agentic visual retrieval-augmented generation system that improves on the state of the art, targeting visual data workflows.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI & Computer Vision | Apr 10 | Code
Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching [Watch]
SatIR is a scalable constraint-satisfaction method for clinical trial matching that uses LLMs to improve recall and interpretability over existing techniques.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Clinical Trial Matching | Apr 10
TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training [Watch]
TensorHub is a production-ready system for scalable and elastic weight transfer in LLM RL training, significantly reducing GPU stall time.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Training Infrastructure | Apr 10
Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym [Ignore]
Spatial-Gym: A new environment for step-by-step evaluation of agent spatial reasoning capabilities.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Agent Spatial Reasoning | Apr 10 | Code
LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs [Watch]
A graph-based parser outperforms LLMs on supervised relation extraction for complex graphs, offering a lighter and more effective solution.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Relation Extraction | Apr 9 | Code
The Fast Lane Hypothesis: Von Economo Neurons Implement a Biological Speed-Accuracy Tradeoff [Ignore]
A computational model proposing that Von Economo neurons facilitate rapid social decision-making by implementing a biological speed-accuracy tradeoff.
GitHub stars: 0 | Velocity: flat | History: 1 snapshot | Computational Neuroscience | Apr 10 | Pending
Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation? [Build Now]
A new Bayesian optimization framework that focuses on finding a single high-quality solution rather than approximating the entire Pareto front for many-objective problems.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Bayesian Optimization | Apr 10 | Code | High viability
Model Space Reasoning as Search in Feedback Space for Planning Domain Generation [Ignore]
Uses agentic language models and heuristic search to improve the generation of AI planning domains from natural language.
GitHub stars: 1865 | Velocity: flat | History: 1 snapshot | AI Planning | Apr 9 | Pending
Semantic Rate-Distortion for Bounded Multi-Agent Communication: Capacity-Derived Semantic Spaces and the Communication Cost of Alignment [Ignore]
A theoretical framework for understanding communication between agents with different computational capacities, defining capacity-derived semantic spaces and identifying critical communication rates.
GitHub stars: 0 | Velocity: flat | History: 1 snapshot | Multi-Agent Communication | Apr 10 | Pending
Beyond Isolated Clients: Integrating Graph-Based Embeddings into Event Sequence Models [Ignore]
Integrates graph-based embeddings into event sequence models to improve user attribute prediction for fraud prevention and recommendations.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Graph-Based Embeddings | Apr 10 | Code
Ge²mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer [Ignore]
A novel architecture for spiking transformers that achieves ultra-high energy efficiency through multi-dimensional grouping.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Efficient AI | Apr 10 | Code
GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking [Watch]
A utility-aware framework for jailbreaking audio LLMs by selectively perturbing frequency bands, balancing attack success with transcription quality.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Audio LLM Security | Apr 10
Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations [Ignore]
A survival-oriented benchmark for temporal student dropout risk modeling with harmonized representations.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Student Dropout Prediction | Apr 10 | Code
On the Role of DAG topology in Energy-Aware Cloud Scheduling: A GNN-Based Deep Reinforcement Learning Approach [Ignore]
Identifies limitations in GNN-based reinforcement learning for cloud scheduling under distribution shifts and proposes more robust representations.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Cloud Scheduling | Apr 10 | Code
Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents [Ignore]
A framework for language-based drug discovery agents that improves success rates by precisely diagnosing and correcting protocol violations at the set level.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Drug Discovery Agents | Apr 10
CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space [Ignore]
A new benchmark evaluates large language models on conditional decision-making in complex, compositional action spaces.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | LLM Agents | Apr 10 | Code
AI Driven Soccer Analysis Using Computer Vision [Watch]
An AI system uses computer vision to track players and predict their positions on a soccer field for tactical analysis.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Sports Analytics AI | Apr 9
Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence [Ignore]
A novel OSINT methodology to detect real-world AI scheming incidents by analyzing chatbot and command-line transcripts.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI Safety | Apr 10 | Code
Towards Linguistically-informed Representations for English as a Second or Foreign Language: Review, Construction and Application [Ignore]
Develops linguistically informed representations for English as a Second or Foreign Language (ESFL) to improve language acquisition research.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | NLP for Education | Apr 10 | Code
MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification [Watch]
An uncertainty-routed Transformer for medical image classification that improves calibration and selective prediction.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Medical Image AI | Apr 10
RAMP: Hybrid DRL for Online Learning of Numeric Action Models [Watch]
RAMP is a hybrid DRL strategy for online learning of numeric action models in automated planning, outperforming PPO on standard IPC numeric domains.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Reinforcement Learning | Apr 9
WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning [Ignore]
A framework for robust and sample-efficient reinforcement learning in robotics by jointly generating and transferring experience from source to target tasks.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Reinforcement Learning | Apr 10 | Code
Deep Learning-Based Tracking and Lineage Reconstruction of Ligament Breakup [Ignore]
A deep learning framework for tracking and reconstructing the lineage of ligament breakup in liquid sheets from high-speed images.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Computer Vision | Apr 9
BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning [Ignore]
A novel backdoor attack formulation targeting model-in-skill threats within agent ecosystems.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | AI Security | Apr 10 | Code
DDSP-QbE++: Improving Speech Quality for Speech Anonymisation for Atypical Speech [Ignore]
This work improves speech anonymization by enhancing the excitation stage of DDSP-QbE with explicit voicing detection and PolyBLEP correction for cleaner, more natural synthesized speech.
GitHub stars: n/a | Velocity: flat | History: 1 snapshot | Audio Synthesis | Apr 10
HiFloat4 Format for Language Model Pre-training on Ascend NPUs Ignore
Investigates the HiFloat4 format for language model pre-training on Ascend NPUs to improve computational and memory efficiency.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Efficiency Apr 9 Code
SatQNet: Satellite-assisted Quantum Network Entanglement Routing Using Directed Line Graph Neural Networks Ignore
A reinforcement learning approach using graph neural networks for decentralized entanglement routing in dynamic satellite-assisted quantum networks.
GitHub stars n/a Velocity flat History 1 snapshot Quantum Networks Apr 10
PSIRNet: Deep Learning-based Free-breathing Rapid Acquisition Late Enhancement Imaging Ignore
PSIRNet is a deep learning method for rapid, free-breathing cardiac MRI that significantly reduces acquisition time while maintaining diagnostic image quality.
GitHub stars n/a Velocity flat History 1 snapshot Medical Imaging Apr 9
InstrAct: Towards Action-Centric Understanding in Instructional Videos Ignore
A pretraining framework for instructional videos that improves action-centric understanding by disentangling actions from objects and modeling temporal structure.
GitHub stars n/a Velocity flat History 1 snapshot Video Understanding Apr 9
SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion Ignore
SafeMind is a differentiable control framework for quadrupeds that unifies safety guarantees with adaptive locomotion.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 10
PDE-regularized Dynamics-informed Diffusion with Uncertainty-aware Filtering for Long-Horizon Dynamics Ignore
A dynamics-informed diffusion framework for stable long-horizon spatiotemporal prediction using PDE regularization and uncertainty-aware filtering.
GitHub stars n/a Velocity flat History 1 snapshot Dynamics Prediction Apr 10 Code
Advantage-Guided Diffusion for Model-Based Reinforcement Learning Ignore
Advantage-Guided Diffusion for Model-Based Reinforcement Learning improves sample efficiency and return by steering diffusion with advantage estimates.
GitHub stars n/a Velocity flat History 1 snapshot Model-Based RL Apr 10
Artificial intelligence can persuade people to take political actions Ignore
AI models can significantly persuade people to take real-world actions like signing petitions and donating to charity, but attitudinal persuasion does not correlate with behavioral outcomes.
GitHub stars n/a Velocity flat History 1 snapshot AI Persuasion Apr 10
PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment Ignore
A new training strategy for LLMs that balances persona expressivity with task performance by mitigating the trade-off between the two that arises under reinforcement learning with verifiable rewards (RLVR).
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 10
Physics-guided surrogate learning enables zero-shot control of turbulent wings Ignore
Physics-guided surrogate learning enables zero-shot control of turbulent wings, significantly reducing drag while drastically lowering training cost.
GitHub stars n/a Velocity flat History 1 snapshot Aerospace AI Apr 10
EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers Ignore
An advanced SE(3)-equivariant graph attention Transformer for efficient and expressive 3D atomistic modeling, achieving state-of-the-art results on multiple benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot 3D Atomistic Modeling Apr 10
Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition Ignore
A novel framework using tensor decomposition to quantify uncertainty in Large Language Model-based Multi-Agent Systems.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 9
Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning Ignore
Evaluating two LLM-based agent scaffolding approaches ('structuring' vs. 'problematizing') for improving diagnostic reasoning in vocational training.
GitHub stars n/a Velocity flat History 1 snapshot Educational AI Apr 10
Decomposing the Delta: What Do Models Actually Learn from Preference Pairs? Ignore
This paper investigates how preference data quality impacts language model reasoning, offering insights for more effective alignment techniques.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 9
Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models Ignore
Benchmarks humor alignment in large language models by having them play Cards Against Humanity, revealing modest alignment with human preferences and strong agreement among the models themselves.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 9
Artifacts as Memory Beyond the Agent Boundary Ignore
Formalizes how the environment can serve as an agent's memory in reinforcement learning, reducing the information needed to represent history through observable artifacts.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 9
Beyond Relevance: Utility-Centric Retrieval in the LLM Era Ignore
A tutorial arguing for a shift from relevance-centric to utility-centric retrieval in the LLM era, focusing on how retrieved information aids LLM generation.
GitHub stars n/a Velocity flat History 1 snapshot Information Retrieval Apr 10
Statistical Properties of the King Wen Sequence: An Anti-Habituation Structure That Does Not Improve Neural Network Training Ignore
This paper statistically analyzes the King Wen sequence and finds it does not improve neural network training, despite its anti-habituation properties.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 10
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism Ignore
This research identifies a distinct, unified mechanism for harmful content generation in LLMs, suggesting a path for more principled safety approaches.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 10
Three Modalities, Two Design Probes, One Prototype, and No Vision: Experience-Based Co-Design of a Multi-modal 3D Data Visualization Tool Ignore
This paper details the co-design process for an accessible, multi-modal 3D data visualization tool for blind and low-vision users.
GitHub stars n/a Velocity flat History 1 snapshot Accessibility Tools Apr 10
Parameterized Complexity Of Representing Models Of MSO Formulas Ignore
This theoretical paper studies how to represent the models of MSO (monadic second-order) formulas as decision diagrams, parameterized by the treewidth of the graph and the size of the formula, and draws connections to knowledge representation.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Representation Apr 9
Overhang Tower: Resource-Rational Adaptation in Sequential Physical Planning Ignore
This research explores how humans adapt physical prediction and planning strategies under cognitive resource constraints, revealing a hierarchical, resource-rational architecture.
GitHub stars n/a Velocity flat History 1 snapshot Cognitive Science Apr 10
AI-Induced Human Responsibility (AIHR) in AI-Human teams Ignore
This research investigates how humans attribute responsibility when working with AI teammates, finding that people assign more blame to humans in AI-human teams than in human-human teams.
GitHub stars n/a Velocity flat History 1 snapshot Human-AI Interaction Apr 10
On the Representational Limits of Quantum-Inspired 1024-D Document Embeddings: An Experimental Evaluation Framework Ignore
This paper introduces a framework to evaluate quantum-inspired document embeddings, finding they have structural limitations and are best used as auxiliary components.
GitHub stars n/a Velocity flat History 1 snapshot Information Retrieval Apr 10
Generalization and Scaling Laws for Mixture-of-Experts Transformers Ignore
Theoretical analysis of generalization and scaling laws for Mixture-of-Experts Transformers to guide trade-offs among model size, data size, and compute.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 10
Leveraging discarded AI models from a 'scrapyard' to enable frugal experimentation and reconfigure legacy systems for new applications.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 9
Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games Ignore
LLMs exhibit both baseline and strategic algorithmic monoculture in coordination games, but lag humans in sustaining rewarded heterogeneity.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 10
Building Better Environments for Autonomous Cyber Defence Ignore
This paper details findings from a workshop on building better reinforcement learning environments for autonomous cyber defense.
GitHub stars n/a Velocity flat History 1 snapshot Cybersecurity RL Environments Apr 9
Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective Ignore
Revisiting the capacity gap in Chain-of-Thought distillation, this paper proposes a more realistic evaluation protocol and offers practical guidance for selecting teacher-student pairs.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 10