HP-Edit: A Human-Preference Post-Training Framework for Image Editing Build Now
HP-Edit enhances image editing models with human-preference alignment using a scalable dataset and reward function.
GitHub 713 stars Velocity flat History 1 snapshot Generative Image Editing Apr 21 Pending High viability
Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration Build Now
Enhance multi-modal model resilience against distribution shifts with adaptive Gaussian calibration.
GitHub 1 star Velocity flat History 1 snapshot Multi-modal Test-time Adaptation Apr 21 Pending High viability
VLA Foundry: A Unified Framework for Training Vision-Language-Action Models Build Now
VLA Foundry: An open-source framework for seamless training of Vision-Language-Action models from data preparation to fine-tuning using a unified codebase.
GitHub 207 stars Velocity flat History 1 snapshot AI Frameworks Apr 21 Pending High viability
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language Build Now
A benchmark and agentic framework to automate the generation of executable visual workflows from natural language, addressing the costly manual engineering process.
GitHub 17 stars Velocity flat History 1 snapshot Workflow Generation Apr 21 Pending High viability
SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization Build Now
SCURank enhances summarization by ranking candidate summaries based on content units, outperforming traditional and LLM-based methods.
GitHub stars n/a Velocity flat History 1 snapshot LLM Summarization Apr 21 Pending High viability
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning Build Now
A multi-agent framework that learns and reuses optimization strategies for generative engines, improving answer quality and citation accuracy.
GitHub stars n/a Velocity flat History pending Generative AI Optimization Apr 21 Pending High viability
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models Ignore
Investigate safety-focused strategies for multimodal LLM planning.
GitHub 1 star Velocity flat History 1 snapshot AI Safety Apr 21 Pending
DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling Build Now
An iterative training framework for multimodal reward models that constructs debiased preference data and achieves state-of-the-art performance.
GitHub 0 stars Velocity flat History 1 snapshot Multimodal AI Apr 21 Pending High viability
AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos Build Now
A controllable framework for generating adverse weather automotive videos that significantly improves perception robustness for autonomous driving.
GitHub 0 stars Velocity flat History 1 snapshot Generative Video Apr 21 Pending High viability
From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers Build Now
Arbiter-K is a governance-first execution architecture for agentic AI that enforces security as a microarchitectural property, achieving high interception rates of unsafe trajectories.
GitHub stars n/a Velocity flat History pending Agentic AI Security Apr 20 Pending High viability
FASTER: Value-Guided Sampling for Fast RL Build Now
FASTER accelerates reinforcement learning by efficiently filtering action candidates during the denoising process of diffusion-based policies, reducing computational cost without sacrificing performance.
GitHub 1 star Velocity flat History 1 snapshot Reinforcement Learning Apr 21 Pending High viability
SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning Build Now
A parameter-efficient fine-tuning framework that improves LLM multi-task learning by semantically routing inputs to specialized LoRA experts and adaptively scaling their contributions.
GitHub 2 stars Velocity flat History 1 snapshot LLM Fine-tuning Apr 21 Pending High viability
Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs Build Now
A new test set and methodology to quantify implicit local and global biases in multilingual LLMs, with available code.
GitHub 0 stars Velocity flat History 1 snapshot LLM Bias Apr 21 Pending High viability
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models Watch
Introducing HalluAudio, a comprehensive benchmark for detecting hallucinations in large audio-language models across speech, sound, and music.
GitHub 1 star Velocity flat History 1 snapshot Audio AI Apr 21 Code
Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery Build Now
An adversarial multi-agent review methodology to improve LLM-assisted defect discovery precision by filtering false positives.
GitHub 0 stars Velocity flat History 1 snapshot LLM Security Apr 21 Pending High viability
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents Build Now
This paper introduces a four-axis decision alignment framework for long-horizon enterprise AI agents, enabling granular evaluation of factual precision, reasoning coherence, compliance, and abstention.
GitHub 0 stars Velocity flat History 1 snapshot AI Agents Apr 21 Pending High viability
Hierarchically Robust Zero-shot Vision-language Models Ignore
This paper proposes a hierarchical adversarial fine-tuning method to improve the robustness of vision-language models against attacks targeting class hierarchies.
GitHub 713 stars Velocity flat History 1 snapshot Vision-Language Models Apr 20 Code
Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning Build Now
This research evaluates whether advanced LLMs 'game' formalization by generating logically valid but unfaithful proofs, offering a method to detect and differentiate these failure modes.
GitHub 1 star Velocity flat History 1 snapshot LLM Reasoning Apr 21 Pending High viability
IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text Build Now
A new benchmark and evaluation framework for assessing LLM performance on Indian financial regulatory text, with publicly available code and dataset.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 21 Pending High viability
Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network Build Now
An unsupervised industrial defect detection system using diffusion models for data generation and an asymmetric teacher-student network for precise localization.
GitHub stars n/a Velocity flat History 1 snapshot Industrial AI Apr 21 Code High viability
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Build Now
A principled framework grounded in game theory for training language agents with social intelligence, achieving state-of-the-art performance in complex interpersonal interactions.
GitHub 0 stars Velocity flat History 1 snapshot Agents Apr 21 Code High viability
Human-Guided Harm Recovery for Computer Use Agents Build Now
A framework for human-guided recovery of AI agents from harmful states, using learned preferences to steer agents back to safety with a new benchmark for evaluation.
GitHub 1867 stars Velocity flat History 1 snapshot AI Agent Safety Apr 20 Code High viability
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety Build Now
ProjLens is an interpretability framework that demystifies backdoor vulnerabilities in multimodal LLMs by analyzing the role of projector layers.
GitHub stars n/a Velocity flat History 1 snapshot AI Safety Apr 21 Code High viability
Beyond Semantic Similarity: A Component-Wise Evaluation Framework for Medical Question Answering Systems with Health Equity Implications Build Now
A component-wise evaluation framework for medical QA systems that reveals health equity implications, with available code and analysis of major LLMs.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 21 Code High viability
Revac: A Social Deduction Reasoning Agent Build Now
A first-place AI agent for social deduction games that integrates memory, social graph analysis, and adaptive communication.
GitHub 2 stars Velocity flat History 1 snapshot Agents Apr 21 Pending High viability
LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues Build Now
A neuro-symbolic framework that combines LLMs with structured statistical reasoning to improve legal issue identification by learning interpretable factor weights.
GitHub stars n/a Velocity flat History 1 snapshot Legal AI Apr 21 Code High viability
Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teaming Build Now
A framework that unifies task and motion adaptation for more efficient and fluid human-robot collaboration, validated in simulation and with a physical robot.
GitHub 0 stars Velocity flat History 1 snapshot Human-Robot Teaming Apr 21 Pending High viability
Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning Build Now
An AI-powered framework that uses active learning and human-machine collaboration to significantly reduce effort and improve accuracy in identifying and assigning bug reports.
GitHub stars n/a Velocity flat History 1 snapshot Software Engineering AI Apr 20 Code High viability
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models Build Now
GRASPrune is a post-training framework for structured pruning of LLMs that jointly prunes FFN channels and KV head groups under a global budget, significantly reducing parameters with minimal accuracy loss.
GitHub stars n/a Velocity flat History 1 snapshot LLM Pruning Apr 21 Code High viability
EgoSelf: From Memory to Personalized Egocentric Assistant Build Now
EgoSelf builds a personalized egocentric assistant by creating a graph-based memory of user interactions and learning user-specific profiles.
GitHub stars n/a Velocity flat History 1 snapshot Personalized Assistants Apr 21 Code High viability
Reasoning-Aware AIGC Detection via Alignment and Reinforcement Build Now
A reasoning-aware framework for detecting AI-generated content with interpretable explanations and state-of-the-art accuracy.
GitHub stars n/a Velocity flat History 1 snapshot AI Generated Content Detection Apr 21 Code High viability
OLLM: Options-based Large Language Models Build Now
OLLM replaces standard next-token prediction with a set of learned options, offering explicit control and improved robustness for LLMs in complex reasoning tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Control Apr 21 Code High viability
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest Build Now
This paper benchmarks multiple LLMs on social media analytics tasks, releasing code and data for reproducible evaluation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 21 Code High viability
ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving Build Now
A training-free, plug-and-play framework for pruning vision-language models in autonomous driving, significantly reducing computational overhead without sacrificing performance.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 21 Code High viability
Personalized Benchmarking: Evaluating LLMs by Individual Preferences Build Now
This paper introduces personalized LLM benchmarking, showing individual preferences diverge from aggregate rankings.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 21 Code High viability
DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning Build Now
DW-Bench: A new benchmark for evaluating LLMs on graph-topology reasoning over data warehouse schemas, showing tool-augmented methods significantly outperform static approaches.
GitHub 0 stars Velocity flat History 1 snapshot LLM Benchmarking Apr 21 Code High viability
Gated Memory Policy Build Now
A visuomotor policy for robotic manipulation that learns to selectively recall and construct memory for non-Markovian tasks.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees Build Now
A neuro-symbolic framework that restructures statement autoformalization into a modular pipeline, improving accuracy and error localization for mathematical statements.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 21 Code High viability
Streamliners for Answer Set Programming Ignore
This paper adapts LLMs to generate streamliner constraints for Answer Set Programming, achieving significant speedups on benchmark problems.
GitHub 8 stars Velocity flat History 1 snapshot AI for Programming Apr 21 Code
SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning Build Now
SAHM is a new benchmark and instruction-tuning dataset for Arabic financial and Shari'ah-compliant reasoning, with an accompanying instruction-tuned model.
GitHub stars n/a Velocity flat History 1 snapshot Arabic Financial NLP Apr 21 Code High viability
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction Build Now
A unified audio front-end LLM that handles multiple speech tasks for seamless, full-duplex conversational AI.
GitHub stars n/a Velocity flat History 1 snapshot Conversational AI Apr 21 Code High viability
M²GRPO: Mamba-based Multi-Agent Group Relative Policy Optimization for Biomimetic Underwater Robots Pursuit Build Now
A Mamba-based multi-agent policy optimization framework for cooperative pursuit in underwater robots, outperforming baselines in simulations and real-world tests.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Robotics Apr 21 Code High viability
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges Build Now
An open-source benchmark for evaluating LLM agents in realistic cybersecurity Capture The Flag challenges, with a novel partial-credit scoring system.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
LASER: Learning Active Sensing for Continuum Field Reconstruction Build Now
A closed-loop framework that uses reinforcement learning to actively guide sensor placement for high-fidelity reconstruction of physical fields under sparse data.
GitHub stars n/a Velocity flat History 1 snapshot Active Sensing Apr 21 Code High viability
The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models Build Now
A systematic analysis and metric for quantifying verbal tics in large language models, revealing trade-offs in alignment and naturalness.
GitHub stars n/a Velocity flat History 1 snapshot LLM Analysis Apr 21 Code High viability
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams Build Now
Identifies harmful intent as a geometrically recoverable feature within LLM residual streams, enabling robust detection across various models and alignment states.
GitHub 0 stars Velocity flat History 1 snapshot LLM Safety Apr 20 Pending High viability
CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks Ignore
A new benchmark, CulturALL, evaluates LLMs' multilingual and multicultural competence on complex, real-world tasks across 14 languages.
GitHub 33 stars Velocity flat History 1 snapshot LLM Evaluation Apr 21 Code
Time Series Augmented Generation for Financial Applications Build Now
Introduces a novel evaluation framework and benchmark for assessing LLM agent reasoning in financial time-series analysis, demonstrating near-perfect tool-use accuracy with minimal hallucination.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents for Finance Apr 21 Code High viability
Reinforcement Learning Enabled Adaptive Multi-Task Control for Bipedal Soccer Robots Build Now
A modular reinforcement learning framework enables bipedal soccer robots to achieve adaptive multi-task control, combining stable gaits with complex actions and rapid fall recovery.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens Build Now
OmniMouse, a multi-modal brain model trained on 150B neural tokens, achieves state-of-the-art performance in neural prediction and decoding, demonstrating data-limited scaling properties.
GitHub 0 stars Velocity flat History 1 snapshot Neuroscience AI Apr 20 Pending High viability
TACENR: Task-Agnostic Contrastive Explanations for Node Representations Build Now
A task-agnostic method for explaining node representations in graphs by identifying key attribute, proximity, and structural features.
GitHub stars n/a Velocity flat History 1 snapshot Graph Representation Learning Apr 21 Code High viability
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion Build Now
A federated LLM adaptation framework that uses a proxy SLM and heterogeneity-aware fusion to achieve secure, high-performance fine-tuning without compromising client privacy or LLM IP.
GitHub stars n/a Velocity flat History 1 snapshot Federated LLM Fine-Tuning Apr 21 Code High viability
On Accelerating Grounded Code Development for Research Build Now
A framework that provides coding agents with instant access to research repositories and technical documentation, enabling real-time, context-aware operation for specialized scientific and technical domains.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 21 Code High viability
A new benchmark for evaluating AI agents on complex, cross-application business workflows via API orchestration.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
Fine-Tuning Small Reasoning Models for Quantum Field Theory Build Now
This study fine-tunes small reasoning models for Quantum Field Theory, releasing a data pipeline and training data.
GitHub stars n/a Velocity flat History pending LLM Fine-Tuning Apr 21 Code High viability
REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction Build Now
A multimodal vision-language framework that aligns retinal images with clinical risk factors to predict Alzheimer's and dementia up to 8 years earlier, enabling proactive intervention.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 20 Code High viability
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning Build Now
ShadowPEFT introduces a centralized layer-space adaptation framework for parameter-efficient fine-tuning of LLMs, offering improved flexibility and performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 21 Code High viability
Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior Build Now
This approach enables humanoid robots to learn five distinct gaits using a unified RL framework with a selective adversarial motion prior strategy for improved stability and dynamic expressiveness.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model Build Now
Develops a novel autoregressive model for real-time target speaker extraction that maintains high intelligibility and stability in streaming scenarios, outperforming offline baselines.
GitHub stars n/a Velocity flat History 1 snapshot Real-time Audio Processing Apr 21 Code High viability
RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation Build Now
RoboWM-Bench is a new benchmark for evaluating the physical plausibility and robotic executability of behaviors predicted by video world models.
GitHub stars n/a Velocity flat History 1 snapshot Robotic Manipulation Benchmarking Apr 21 Code High viability
Skillful Global Ocean Emulation and the Role of Correlation-Aware Loss Build Now
A novel ocean emulation model adapted from GraphCast achieves skillful 10-15 day forecasts using a correlation-aware loss function for improved accuracy and downstream applications.
GitHub stars n/a Velocity flat History 1 snapshot Ocean Emulation Apr 20 Code High viability
CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation Build Now
CoDA enables LLMs to effectively transfer knowledge across different domains by aligning latent reasoning representations, significantly improving performance in expertise-scarce fields.
GitHub stars n/a Velocity flat History pending LLM Domain Adaptation Apr 21 Code High viability
Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning Build Now
Leveraging Low-Rank Adaptation (LoRA) to improve critic capacity and stability in off-policy reinforcement learning, showing consistent gains in critic loss and policy performance.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 21 Code High viability
LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation Build Now
A lightweight binarization framework for LLMs that enables efficient deployment on resource-constrained devices with minimal accuracy loss.
GitHub stars n/a Velocity flat History 1 snapshot LLM Quantization Apr 21 Code High viability
Characterizing AlphaEarth Embedding Geometry for Agentic Environmental Reasoning Build Now
An agentic system leveraging geometric understanding of Earth observation embeddings for improved environmental reasoning and retrieval.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
PLaMo 2.1-VL Technical Report Build Now
A lightweight, deployable Vision Language Model for autonomous devices, excelling in Japanese VQA and anomaly detection.
GitHub stars n/a Velocity flat History 1 snapshot Vision Language Models Apr 21 Code High viability
Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation Build Now
Develops a stealthy attack framework to degrade the utility of Retrieval-Augmented Generation systems by inducing non-informative responses, outperforming existing methods.
GitHub stars n/a Velocity flat History pending LLM Security Apr 20 Code High viability
SimDiff: Depth Pruning via Similarity and Difference Build Now
A novel depth pruning method for LLMs that significantly improves efficiency and performance by considering both layer similarity and transformation difference, with demonstrated results on LLaMA models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 21 Code High viability
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Build Now
A novel framework to mitigate cognitive bias in LLM agents by enforcing perspective-invariant reasoning through dialectical alignment.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
How Adversarial Environments Mislead Agentic AI? Watch
This research introduces a framework to test the vulnerability of tool-using AI agents to deceptive tool outputs, revealing a significant robustness gap.
GitHub stars n/a Velocity flat History 1 snapshot Agent Robustness Apr 20 Code
Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers Watch
A multi-party privacy-preserving entity alignment protocol for Vertical Federated Learning that hides intersection membership and supports noisy identifiers.
GitHub stars n/a Velocity flat History 1 snapshot Privacy-Preserving AI Apr 21 Code
Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility Build Now
FeatGEO optimizes generative answer engine visibility by abstracting webpages into interpretable features, outperforming token-level methods while maintaining content quality.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Optimization Apr 21 Code High viability
Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression Build Now
LightEdit enables scalable and cost-effective lifelong knowledge editing for LLMs by selectively suppressing outdated information and incorporating new knowledge.
GitHub stars n/a Velocity flat History 1 snapshot LLM Editing Apr 21 Code High viability
Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM Build Now
A hardware-aware framework enabling multi-LoRA LLMs on smartphones with significant memory and latency improvements for on-device acceleration.
GitHub stars n/a Velocity flat History pending Edge AI / LLM Deployment Apr 20 Code High viability
CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation Build Now
CoCo-SAM3 enhances open-vocabulary semantic segmentation by resolving concept conflicts between prompts for more stable and accurate multi-class predictions.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 21 Code High viability
DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax Build Now
A novel framework and dataset for fine-grained, text-driven controllable dance generation, achieving state-of-the-art performance in motion quality and naturalness.
GitHub stars n/a Velocity flat History pending Generative Video Apr 20 Code High viability
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling Build Now
UniT is a framework that bridges the gap between human and humanoid robot policy learning and world modeling by creating a unified physical language grounded in visual consequences.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling Build Now
Nexusformer: A novel nonlinear attention mechanism enabling stable and inheritable transformer scaling with improved efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Transformer Scaling Apr 21 Code High viability
Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization Build Now
A unified ASR framework with consistency regularization bridges the gap between offline and streaming performance, reducing costs and improving accuracy.
GitHub stars n/a Velocity flat History 1 snapshot ASR Apr 21 Code High viability
RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models Build Now
A geometry-driven method using the Ramer-Douglas-Peucker algorithm to identify critical layers for parameter-efficient LLM adaptation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Adaptation Apr 21 Code High viability
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System Build Now
An adaptive framework for red-teaming and repairing vulnerabilities in policy-reward systems for LLM alignment.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 20 Code High viability
Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models Build Now
A novel forecast-necessity testing framework for interpretable causal discovery in nonlinear time-series models, moving beyond coefficient magnitude for reliable reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Causal Discovery Apr 20 Code High viability
Towards Understanding the Robustness of Sparse Autoencoders Build Now
Integrating sparse autoencoders into LLMs at inference time reduces jailbreak success rates by up to 5x, offering a robust defense mechanism.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 20 Code High viability
A neural operator framework for data-driven discovery of stability and receptivity in physical systems Build Now
A data-driven neural operator framework that automatically discovers stability and receptivity properties in complex physical systems without requiring governing equations.
GitHub stars n/a Velocity flat History 1 snapshot Scientific Discovery Apr 21 Code High viability
Tadabur: A Large-Scale Quran Audio Dataset Ignore
A large-scale dataset of Quranic recitation audio to advance research in Quranic speech analysis.
GitHub 78 stars Velocity flat History 1 snapshot Audio AI Apr 21 Code
Environmental Sound Deepfake Detection Using Deep-Learning Framework Build Now
A deep learning framework for detecting environmental sound deepfakes, outperforming existing methods with fine-tuned pre-trained models.
GitHub stars n/a Velocity flat History 1 snapshot Audio AI Apr 21 Code High viability
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models Ignore
A novel training method for iterative refinement models that improves performance on complex reasoning tasks by introducing a curriculum of intermediate states.
GitHub 1867 stars Velocity flat History 1 snapshot LLM Training Apr 20
Owner-Harm: A Missing Threat Model for AI Agent Safety Build Now
Introducing Owner-Harm, a new threat model and benchmark for AI agent safety, with a proposed defense system that significantly improves detection of deployer-harming behaviors.
GitHub stars n/a Velocity flat History pending AI Agent Safety Apr 20 Code High viability
Towards Optimal Agentic Architectures for Offensive Security Tasks Build Now
This research benchmarks agentic architectures for offensive security, identifying optimal coordination topologies and demonstrating a non-monotonic cost-quality frontier for tool-using LLMs.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian Build Now
A Romanian legal-domain dataset, with evaluated baseline models, for grammatical error detection and correction, aimed at improving legal document accuracy.
GitHub stars n/a Velocity flat History 1 snapshot NLP Tools Apr 21 Code High viability
Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks Build Now
This research evaluates the robustness of LLM tutors against adversarial student attacks that aim to extract answers, proposing a fine-tuned adversarial agent and defense strategies to improve tutor security.
GitHub stars n/a Velocity flat History pending AI in Education Apr 20 Code High viability
Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling Build Now
A framework that models patient clinical trajectories as autoregressive sequences, handling missing modalities and improving diagnostic accuracy with interpretable AI.
GitHub stars n/a Velocity flat History 1 snapshot Healthcare AI Apr 20 Code High viability
Improved Anomaly Detection in Medical Images via Mean Shift Density Enhancement Build Now
A medical image anomaly detection framework using self-supervised learning and Mean Shift Density Enhancement, outperforming state-of-the-art methods.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 21 Code High viability
DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs Build Now
A privacy-preserving federated framework for log anomaly detection using parameter-efficient LLMs and differential privacy, matching centralized performance.
GitHub stars n/a Velocity flat History 1 snapshot Federated Log Anomaly Detection Apr 21 Code High viability
RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora Build Now
A framework for building realistic RAG benchmarks that accounts for document redundancy, revealing robustness gaps in current evaluation methods.
GitHub stars n/a Velocity flat History 1 snapshot RAG Evaluation Apr 21 Code High viability
Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring Build Now
A novel framework for auditing LLM similarity scoring reveals consistent biases and model-specific fingerprints, enabling more reliable LLM comparisons.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Code High viability
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps Build Now
A lightweight system to detect hallucinations in speech models at inference time using attention maps, improving accuracy and generalization.
GitHub stars n/a Velocity flat History 1 snapshot Speech AI Apr 21 Code High viability
Co-Refine: AI-Powered Tool Supporting Qualitative Analysis Build Now
An AI platform that provides real-time feedback on coding consistency for qualitative researchers, reducing interpretation drift.
GitHub stars n/a Velocity flat History 1 snapshot AI-Powered Tools Apr 21 Code High viability
From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS Build Now
A neuro-symbolic pipeline translates natural language reasoning into executable formal logic, validated by runtime execution and adaptable via fine-tuning.
GitHub stars n/a Velocity flat History 1 snapshot Neuro-Symbolic Reasoning Apr 20 Code High viability
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control Watch
A control-theoretic approach for real-time alignment of language model outputs that exploits local linearity.
GitHub 0 stars Velocity flat History 1 snapshot AI Alignment Apr 21 Pending
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps Build Now
An LLM agent benchmark for evaluating threat hunting capabilities in cybersecurity, revealing current model limitations.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
Learning Hybrid-Control Policies for High-Precision In-Contact Manipulation Under Uncertainty Build Now
MATCH learns hybrid position-force control policies for high-precision in-contact manipulation under uncertainty, improving success rates and reducing damage.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
Multimodal Transformer for Sample-Aware Prediction of Metal-Organic Framework Properties Build Now
A multimodal transformer that predicts material properties by considering sample-specific experimental data, improving accuracy over traditional methods.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science AI Apr 21 Code High viability
Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning Ignore
A causal inference framework, CD-GNN, that disentangles spurious inductive subgraphs for improved heterophilic graph learning.
GitHub stars n/a Velocity flat History 1 snapshot Graph Learning Apr 21 Code
A Proxy Consistency Loss for Grounded Fusion of Earth Observation and Location Encoders Watch
A proxy consistency loss for grounded fusion of Earth observation data with location encoders, improving prediction accuracy with sparse labels.
GitHub 713 stars Velocity flat History 1 snapshot Earth Observation AI Apr 20
Error-free Training for MedMNIST Datasets Ignore
Achieves error-free training for medical image classification models on the MedMNIST datasets, demonstrating perfect accuracy on most of them.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 20 Code
Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training Ignore
A novel intrinsic reward mechanism for world model training that focuses on cumulative prediction error improvement.
GitHub 1 stars Velocity flat History 1 snapshot LLM Training Apr 20 Pending
$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction Watch
A framework that reduces decoding redundancy in Diffusion Large Language Models to significantly accelerate inference speed while maintaining generation quality.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Apr 21 Code
LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models Watch
A new benchmark and evaluation framework to measure tone-induced hallucination in Vision-Language Models under graded prompt intensity.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 20 Code
Reasoning Structure Matters for Safety Alignment of Reasoning Models Watch
AltTrain is a post-training method that alters reasoning structure to improve LLM safety without complex RL.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 21 Code
A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding Build Now
A-MAR uses agent-based multimodal retrieval with structured reasoning for fine-grained artwork understanding.
GitHub 0 stars Velocity flat History 1 snapshot Art Technology Apr 21 Pending High viability
The Cost of Relaxation: Evaluating the Error in Convex Neural Network Verification Ignore
This paper analyzes the theoretical error bounds of convex relaxations in neural network verification, providing quantitative insights into the divergence from original network outputs.
GitHub 0 stars Velocity flat History 1 snapshot Neural Network Verification Apr 20 Pending
Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics Watch
An AI system using wearable data to predict heat stress in construction workers, achieving high accuracy and offering interpretable safety insights.
GitHub stars n/a Velocity flat History 1 snapshot Wearable Health AI Apr 21
Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference Watch
Product-of-Experts training reduces reliance on dataset artifacts in Natural Language Inference models while maintaining accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Debiasing Apr 21 Code
Design Rules for Extreme-Edge Scientific Computing on AI Engines Watch
This work provides design rules and a new metric (LARE) to guide the implementation of extreme-edge scientific AI applications on FPGA AI Engines versus programmable logic.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI Hardware Apr 21 Code
Temporal UI State Inconsistency in Desktop GUI Agents: Formalizing and Defending Against TOCTOU Attacks on Computer-Use Agents Ignore
A formalization and defense mechanism against temporal UI state inconsistencies in GUI agents, addressing TOCTOU attacks with a layered verification system.
GitHub stars n/a Velocity flat History 1 snapshot AI Agent Security Apr 20
Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports Watch
Reinforcement learning improves LLM accuracy and reasoning for disease classification from radiology reports.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 21 Code
MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation Build Now
A new benchmark dataset and task for evaluating multilingual LLMs' ability to handle grammatical gender in text generation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Code High viability
Revisiting Catastrophic Forgetting in Continual Knowledge Graph Embedding Ignore
Introduces a corrected evaluation protocol for continual knowledge graph embedding methods to address overlooked entity interference, revealing performance overestimations.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Graph Embeddings Apr 21 Code
Has Automated Essay Scoring Reached Sufficient Accuracy? Deriving Achievable QWK Ceilings from Classical Test Theory Ignore
Derives theoretical and practical accuracy ceilings for automated essay scoring based on classical test theory to clarify performance benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Automated Essay Scoring Apr 21 Code
When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift Ignore
A new evaluation protocol and code release demonstrate that traditional GNNs are outperformed by simpler models for Bitcoin fraud detection under real-world temporal shifts.
GitHub stars n/a Velocity flat History pending Fraud Detection Apr 21 Code
Adaptive MSD-Splitting: Enhancing C4.5 and Random Forests for Skewed Continuous Attributes Watch
Adaptive MSD-Splitting enhances C4.5 and Random Forests for skewed continuous attributes, improving accuracy and efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Decision Trees Apr 21 Code
Detecting Data Contamination in Large Language Models Ignore
A study evaluating state-of-the-art methods for detecting data contamination in LLMs, finding current methods unreliable.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 21 Code
Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph Watch
EvoGraph is an IDE plugin that visualizes AI-assisted programming history as a graph, enabling developers to explore and manage code changes.
GitHub stars n/a Velocity flat History 1 snapshot Developer Tools Apr 20
Multi-Level Temporal Graph Networks with Local-Global Fusion for Industrial Fault Diagnosis Ignore
A multi-level temporal graph network with local-global fusion for improved industrial fault diagnosis.
GitHub stars n/a Velocity flat History 1 snapshot Industrial AI Apr 20 Code
Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems Watch
A semantic infrastructure protocol for multi-agent LLM systems that enables cross-session cognitive collaboration and memory persistence.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection Ignore
A semi-supervised meta-additive model that automatically identifies informative variables and updates similarity matrices for robust predictions.
GitHub stars n/a Velocity flat History 1 snapshot Statistical Modeling Apr 21 Code
Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs Ignore
This study reveals distinct and often unreliable multi-turn conversational behaviors in LLMs, particularly concerning their engagement with repair mechanisms.
GitHub stars n/a Velocity flat History 1 snapshot LLM Behavior Analysis Apr 21 Code
Experiments or Outcomes? Probing Scientific Feasibility in Large Language Models Watch
Evaluating large language models' ability to assess scientific feasibility by analyzing their responses to hypotheses, experiments, and outcomes.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 20 Code
HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation Ignore
A framework to improve long-horizon manipulation in vision-language-action models by addressing memory, verification, and recovery gaps.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language-Action Apr 20
Towards Energy Impact on AI-Powered 6G IoT Networks: Centralized vs. Decentralized Ignore
A comparative analysis of centralized vs. decentralized ML architectures for 6G IoT networks, showing distributed models can reduce energy consumption by up to 70% while maintaining predictive accuracy.
GitHub stars n/a Velocity flat History 1 snapshot IoT Networks Apr 21
Curvature-Aware PCA with Geodesic Tangent Space Aggregation for Semi-Supervised Learning Ignore
A geometric extension of PCA that integrates curvature awareness and geodesic consistency for improved representation learning.
GitHub stars n/a Velocity flat History 1 snapshot Dimensionality Reduction Apr 20 Code
Quantum inspired qubit qutrit neural networks for real time financial forecasting Watch
Quantum Qutrit-based Neural Networks offer superior accuracy, efficiency, and adaptability for real-time financial forecasting compared to classical and qubit-based models.
GitHub stars n/a Velocity flat History 1 snapshot Financial AI Apr 20 Code
Large Language Models Exhibit Normative Conformity Ignore
Investigating normative conformity in large language models to understand and potentially control their behavior in multi-agent systems.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 21 Code
Formally Verified Patent Analysis via Dependent Type Theory: Machine-Checkable Certificates from a Hybrid AI + Lean 4 Pipeline Watch
A hybrid AI and Lean 4 pipeline for formally verified patent analysis, offering machine-checkable certificates for IP use cases.
GitHub stars n/a Velocity flat History 1 snapshot Formal Verification Apr 20 Code
Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data Ignore
An attention-based multi-modal deep learning model for high-accuracy spatio-temporal crop yield prediction using satellite, soil, and climate data.
GitHub stars n/a Velocity flat History 1 snapshot Agricultural AI Apr 21
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories Ignore
AblateCell is an agent that automates the process of reproducing and ablating components in virtual cell repositories to identify critical factors for performance.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents for Scientific Research Apr 21
Fairness Audits of Institutional Risk Models in Deployed ML Pipelines Watch
A replicable methodology for auditing deployed institutional risk models to reveal and quantify fairness disparities across student demographics in higher education.
GitHub stars n/a Velocity flat History 1 snapshot Fairness Audits Apr 21
Evaluation-driven Scaling for Scientific Discovery Ignore
A framework for scaling evaluation-driven scientific discovery loops with LLMs, demonstrating significant gains across multiple domains.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 21
Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations Ignore
GROVE is an interactive visualization tool that represents multiple language model generations as text graphs to reveal distributional structure and aid prompt iteration.
GitHub stars n/a Velocity flat History 1 snapshot LLM Visualization Apr 20
Distillation Traps and Guards: A Calibration Knob for LLM Distillability Ignore
Proposing a post-hoc calibration method to control LLM distillability via reinforcement fine-tuning, enabling better knowledge transfer and model IP protection.
GitHub stars n/a Velocity flat History 1 snapshot LLM Distillation Apr 21
Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms Ignore
Evaluating the deployment trade-offs of small language models enhanced by agent paradigms for resource-constrained applications.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 21
Benign Overfitting in Adversarial Training for Vision Transformers Ignore
Theoretical analysis of adversarial training for Vision Transformers reveals conditions for benign overfitting, improving robustness.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 21 Code
Self-Improving Tabular Language Models via Iterative Group Alignment Ignore
Introducing TabGRAA, a self-improving framework for tabular data generation that uses automated feedback to iteratively enhance model quality and mitigate data leakage.
GitHub stars n/a Velocity flat History 1 snapshot Tabular Data Generation Apr 21
Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic Ignore
Enhancing visual semantic arithmetic through multi-modal reasoning with large language models.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 21 Code
Gradient-Based Program Synthesis with Neurally Interpreted Languages Ignore
Develops a neural interpreter that learns its own symbolic programming language for end-to-end gradient-based program synthesis and adaptation.
GitHub stars n/a Velocity flat History 1 snapshot Program Synthesis Apr 20 Code
Generalization at the Edge of Stability Ignore
A theoretical framework for understanding generalization in neural network training by analyzing optimization dynamics as random dynamical systems and introducing a novel 'sharpness dimension'.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 21 Code
An AI Agent Execution Environment to Safeguard User Data Ignore
An execution environment that guarantees confidentiality for private user data accessed by AI agents through deterministic enforcement of user permissions.
GitHub stars n/a Velocity flat History 1 snapshot AI Agent Security Apr 21
BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps Ignore
A novel tokenization method for symbolic music that improves generation quality and efficiency by representing uniform temporal steps.
GitHub stars n/a Velocity flat History 1 snapshot Generative Music Apr 21
Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI Ignore
This research proposes a formal logic (CTLF) with counting worlds semantics to reason about and mitigate bias in a series of generative AI outputs.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Bias Mitigation Apr 21
Intentional Updates for Streaming Reinforcement Learning Ignore
A new approach to streaming reinforcement learning that aims for predictable per-step changes in function output, improving stability and performance.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 21
Explicit Trait Inference for Multi-Agent Coordination Ignore
A new method for LLM-based multi-agent systems to infer and track partner traits, improving coordination and performance in complex tasks.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models Ignore
Compares the consistency of exercise prescriptions generated by three LLMs, revealing distinct generative behaviors that impact reliability for deployment.
GitHub stars n/a Velocity flat History 1 snapshot LLM Consistency Analysis Apr 21
GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes Ignore
GOLD-BEV is a framework for dense bird's-eye-view semantic mapping of dynamic scenes, using synchronized aerial imagery for supervision during training.
GitHub stars n/a Velocity flat History 1 snapshot BEV Mapping Apr 21
Relational AI in Education: Reciprocity, Participatory Design, and Indigenous Worldviews Ignore
This paper explores how to design AI in education to foster relational learning, drawing inspiration from Indigenous worldviews and participatory design.
GitHub stars n/a Velocity flat History 1 snapshot AI in Education Apr 21
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning Ignore
Analyzing how answer tokens read reasoning traces in LLMs to improve quantitative reasoning accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 21
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training Ignore
A novel reinforcement learning approach for LLM post-training that adaptively optimizes baseline selection to reduce variance and improve performance across various tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 21
ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation Ignore
A human-symbiotic agent paradigm for cross-user autonomous cooperation with layered identity and scoped authorization.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs Ignore
This paper identifies specific neurons responsible for citation hallucination in LLMs and mitigates their effect using internal model signals.
GitHub stars n/a Velocity flat History 1 snapshot LLM Analysis Apr 20
A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities Ignore
A framework for evaluating the utility of synthetic trajectory generators and evidence of privacy vulnerabilities, highlighting challenges in adversarial evaluation.
GitHub stars n/a Velocity flat History 1 snapshot Synthetic Data Privacy Apr 21
Geometric Decoupling: Diagnosing the Structural Instability of Latent Ignore
Introduces a Riemannian framework to diagnose latent space instability in Latent Diffusion Models by analyzing the generative Jacobian.
GitHub stars n/a Velocity flat History 1 snapshot Generative Models Apr 20
Learning Lifted Action Models from Unsupervised Visual Traces Ignore
A deep learning framework that learns lifted action models from visual sequences by jointly predicting states and actions, corrected by a mixed-integer linear program.
GitHub stars n/a Velocity flat History 1 snapshot AI Planning Apr 21
Integrating Anomaly Detection into Agentic AI for Proactive Risk Management in Human Activity Ignore
A conceptual framework for agentic AI to integrate anomaly detection for proactive risk management in human activity, focusing on fall prevention.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
Safety-Critical Contextual Control via Online Riemannian Optimization with World Models Ignore
A theoretical framework for safety-critical contextual control using online Riemannian optimization with black-box world models.
GitHub stars n/a Velocity flat History 1 snapshot Robotics & Control Apr 21
AI scientists produce results without reasoning scientifically Ignore
Evaluates LLM-based scientific agents, finding they execute workflows but lack scientific reasoning patterns, making their output unreliable.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 20
Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI Ignore
Analysis of how large language models are changing academic peer review by making reviews longer and more fluent but less focused on deep critical reasoning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Impact Analysis Apr 21
Position: No Retroactive Cure for Infringement during Training Ignore
This paper argues that post-hoc mitigation methods cannot retroactively cure liability from unlawful AI training data acquisition, advocating for verifiable ex-ante process compliance.
AI Legal & Ethics Apr 20
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments Ignore
A comparative analysis of LLM quantization methods, highlighting reproducibility issues and clarifying theoretical differences.
GitHub stars n/a Velocity flat History 1 snapshot LLM Quantization Apr 21
Lyapunov-Certified Direct Switching Theory for Q-Learning Ignore
Theoretical analysis of Q-learning using a direct stochastic switching system representation to derive finite-time bounds.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Theory Apr 21
Regulating Artificial Intimacy: From Locks and Blocks to Relational Accountability Ignore
This paper analyzes emerging regulations for companion chatbots, proposing a duty of care to address power asymmetries and mitigate risks.
GitHub stars n/a Velocity flat History 1 snapshot AI Regulation Apr 20
The Triadic Loop: A Framework for Negotiating Alignment in AI Co-hosted Livestreaming Ignore
A conceptual framework for understanding and designing AI co-hosts in livestreaming, focusing on bidirectional adaptation and multi-party alignment.
GitHub stars n/a Velocity flat History 1 snapshot Human-AI Interaction Apr 20
Plausible Reasoning and First-Order Plausible Logic Ignore
Defines a first-order logic called Plausible Logic (PL) that reasons correctly with defeasible statements without using numbers.
GitHub stars n/a Velocity flat History 1 snapshot Logic and Reasoning Apr 21