Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning Build Now
A fine-tuning framework for multimodal LLMs to analyze ancient Chinese character evolution, releasing a benchmark and trained models.
GitHub stars n/a Velocity flat History pending Multimodal LLMs Apr 13 Pending High viability
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents Build Now
A unified framework for LLM agent tool-use that standardizes representation, data, and evaluation, achieving state-of-the-art performance on complex tasks.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 13 Pending High viability
Towards Proactive Information Probing: Customer Service Chatbots Harvesting Value from Conversation Build Now
PROCHATIP is a proactive chatbot framework that strategically probes users for valuable business intelligence, enhancing customer service quality and redefining commercial utility.
GitHub stars n/a Velocity flat History pending Customer Service AI Apr 13 Pending High viability
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection Build Now
ClawGuard is a runtime security framework for LLM agents that deterministically enforces user-confirmed rules at tool-call boundaries to prevent indirect prompt injection.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems Build Now
A benchmark for evaluating the safety and governability of embodied AI systems beyond simple task completion.
GitHub stars n/a Velocity flat History pending Embodied Agents Apr 13 Pending High viability
Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation Build Now
A runtime architecture for multi-robot coordination that federates single-agent robots without internal fragmentation.
GitHub stars n/a Velocity flat History pending Robotics Apr 13 Pending High viability
A Mamba-Based Multimodal Network for Multiscale Blast-Induced Rapid Structural Damage Assessment Build Now
A Mamba-based multimodal network that integrates multi-scale blast-loading information with optical remote sensing images for rapid structural damage assessment after explosions.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 13 Pending High viability
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context Build Now
A software agent framework that manages dynamic reasoning context by using a sliding window and compressed digests to prevent context explosion and redundant re-analysis in complex software engineering tasks.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Pending High viability
C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts Build Now
A comprehensive Chinese benchmark for AI-generated text detection derived from real-world prompts, enabling reliable detection and generalization to unseen LLMs.
GitHub stars n/a Velocity flat History pending LLM Security Apr 13 Pending High viability
CodeTracer: Towards Traceable Agent States Build Now
CodeTracer reconstructs and localizes failures in complex code agent workflows, enabling debugging and recovery of failed runs.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Code High viability
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization Build Now
A novel preference optimization method for mobile GUI agents that personalizes user privacy preferences by analyzing execution trajectories, improving alignment and task executability.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Pending High viability
METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Build Now
A method to autonomously induce dialogue strategies from expert transcripts, enabling scalable non-collaborative agent development.
GitHub stars n/a Velocity flat History pending LLM Applications Apr 13 Pending High viability
RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering Build Now
A dual-view retrieval pipeline for materials science that combines paragraph context with LLM-extracted procedural summaries to improve evidence retrieval for question answering.
GitHub stars n/a Velocity flat History pending Information Retrieval Apr 13 Pending High viability
Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning Build Now
An LLM-based framework for transforming legal cases into logical formulas using few-shot learning, improving generalization and accuracy without additional training.
GitHub stars n/a Velocity flat History 1 snapshot Legal AI Apr 13 Pending High viability
Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems Build Now
A command-line tool enabling agents to build a library of 100+ problem types and 200+ reduction rules in under three months.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 13 Pending High viability
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind Build Now
AI double agents that learn to steer user beliefs using theory of mind, outperforming current LLMs in adversarial conversational scenarios.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Pending High viability
METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Build Now
A new benchmark and analysis tool to systematically evaluate and diagnose multi-level contextual causal reasoning in Large Language Models.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 13 Pending High viability
Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models Build Now
Adaptive Stealing (AS) is a novel watermark stealing algorithm that significantly increases attack efficiency against LLM watermarks.
GitHub stars n/a Velocity flat History pending LLM Security Apr 13 Pending High viability
Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning Build Now
GRIP is a unified framework that embeds retrieval control directly into generation for more efficient and dynamic multi-step inference with fewer parameters.
GitHub stars n/a Velocity flat History pending LLM Augmentation Apr 13 Pending High viability
Efficient Training for Cross-lingual Speech Language Models Build Now
CSLM is an efficient training method for cross-lingual speech LLMs that aligns modalities and languages without massive speech data, enabling scalable and natural human-AI interaction.
GitHub stars n/a Velocity flat History pending Speech LLMs Apr 13 Pending High viability
Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems Build Now
A new architecture for orchestrating enterprise knowledge in agentic AI systems, analogous to Kubernetes, with a prototype and experiments demonstrating significant improvements in data governance and security.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Pending High viability
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Build Now
RationalRewards improves visual generation by applying reasoning-based rewards at both training time and test time.
GitHub stars n/a Velocity flat History 1 snapshot AI Tools Apr 13 Code High viability
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass Build Now
A multimodal reward model that efficiently scores multiple responses in a single forward pass, outperforming existing models and improving RL policies.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 13 Code High viability
Bottleneck Tokens for Unified Multimodal Retrieval Build Now
Bottleneck Tokens (BToks) and Generative Information Condensation enable decoder-only MLLMs to achieve state-of-the-art unified multimodal retrieval with negligible inference overhead.
GitHub stars n/a Velocity flat History pending Multimodal Retrieval Apr 13 Code High viability
WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark Watch
Automated benchmarking platform for browser agents resolving the realism-reproducibility-scalability trilemma.
GitHub stars n/a Velocity flat History 1 snapshot Benchmarking Apr 13 Pending
Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees Build Now
A novel evaluation scheduling method for prompt optimization that significantly improves accuracy and reduces token consumption with formal guarantees.
GitHub stars n/a Velocity flat History pending LLM Optimization Apr 13 Code High viability
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead Build Now
A unified and configurable CPU matrix extension architecture that significantly speeds up AI workloads across diverse CPU platforms with minimal design overhead and a unified software stack.
GitHub stars n/a Velocity flat History pending CPU Architecture Apr 13 Code High viability
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Build Now
Audio Flamingo Next is a next-generation open audio-language model offering advanced reasoning over speech, sound, and music with support for long audio inputs and temporal chain-of-thought.
GitHub stars n/a Velocity flat History pending Audio-Language Models Apr 13 Code High viability
A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Build Now
A framework and adaptive strategy to dynamically route LLM personas, improving cognitive capabilities without additional training.
GitHub stars n/a Velocity flat History pending LLM Persona Apr 13 Code High viability
From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python Build Now
A methodology for LLM-assisted continuous code translation of production AI agents, evolving a Rust codebase to a Python superset with enhanced features.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 13 Code High viability
StarVLA-α: Reducing Complexity in Vision-Language-Action Systems Build Now
A simplified baseline for Vision-Language-Action models that achieves state-of-the-art performance across multiple robotic benchmarks, enabling systematic study of design choices.
GitHub stars n/a Velocity flat History pending Robotics Agents Apr 13 Pending High viability
Pando: Do Interpretability Methods Work When Models Won't Explain Themselves? Build Now
A benchmark and tools to evaluate AI interpretability methods by disentangling true model understanding from prompt-based elicitation.
GitHub stars n/a Velocity flat History pending AI Interpretability Apr 13 Code High viability
From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience Build Now
ReflectiChain is an LLM-driven world model framework for semiconductor supply chain resilience, integrating latent trajectory rehearsal and retrospective RL to achieve significant improvements in planning and operability.
GitHub stars n/a Velocity flat History pending Supply Chain AI Apr 13 Code High viability
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration Build Now
NExt accelerates LLM RLVR training by modeling and non-linearly extrapolating low-rank parameter trajectories, reducing computational overhead by 37.5%.
GitHub stars n/a Velocity flat History pending LLM Training Apr 13 Pending High viability
Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games Build Now
A collaborative multi-agent framework for generating game scripts that enhances Vision-Language Models' reasoning capabilities in imperfect-information multiplayer games.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 13 Code High viability
Introspective Diffusion Language Models Build Now
A new language model providing parallel generation with state-of-the-art performance improvements.
GitHub stars n/a Velocity flat History 1 snapshot Language Models Apr 13 Code High viability
EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models Build Now
EmergentBridge enhances zero-shot cross-modal transfer in unified embedding models by learning a bridging framework that improves performance on unpaired modality pairs without exhaustive supervision.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 13 Code High viability
Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech Build Now
A lightweight transformer predicts robot co-speech gestures from text and emotion, outperforming GPT-4o and suitable for real-time embodied agents.
GitHub stars n/a Velocity flat History pending Robotics AI Apr 13 Code High viability
Dynamic Summary Generation for Interpretable Multimodal Depression Detection Build Now
This framework uses LLMs to generate interpretable multimodal summaries for accurate depression detection, improving on the state of the art in both accuracy and transparency.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code High viability
Towards Autonomous Mechanistic Reasoning in Virtual Cells Build Now
A multi-agent framework for autonomous mechanistic reasoning in virtual cells, generating and validating biological explanations to accelerate scientific discovery.
GitHub stars n/a Velocity flat History pending Scientific Discovery Apr 13 Code High viability
AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation Build Now
A scalable simulation framework that generates affordance-aware robotic manipulation data by integrating open-vocabulary 3D affordance prediction into trajectory generation.
GitHub stars n/a Velocity flat History pending Robotics Apr 13 Code High viability
Intelligent Approval of Access Control Flow in Office Automation Systems via Relational Modeling Build Now
RMIA is a relational modeling framework that automates access control flow approval in office automation systems by fusing binary and ternary relation models for intelligent decision-making.
GitHub stars n/a Velocity flat History pending Office Automation AI Apr 13 Code High viability
Sanity Checks for Agentic Data Science Build Now
Lightweight sanity checks for agentic data science pipelines that use perturbations to ensure reliable signal detection and expose unsupported conclusions.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code High viability
Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving Build Now
A novel reinforcement learning framework for cooperative driving that significantly improves safety and efficiency by stabilizing online fine-tuning of diffusion-based trajectory planners.
GitHub stars n/a Velocity flat History pending Autonomous Driving Apr 13 Code High viability
Towards Adaptive Open-Set Object Detection via Category-Level Collaboration Knowledge Mining Build Now
An adaptive open-set object detection method that mines category-level collaboration knowledge to improve generalization to novel categories across domains.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 13 Code High viability
EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models Build Now
EdgeCIM is a hardware-software co-design framework that dramatically improves the energy efficiency and throughput of small language model inference on edge devices.
GitHub stars n/a Velocity flat History pending Edge AI Hardware Apr 13 Code High viability
Grounded World Model for Semantically Generalizable Planning Build Now
A Grounded World Model that enables visuomotor Model Predictive Control to generalize semantically across unseen environments and tasks by aligning vision and language goals.
GitHub stars n/a Velocity flat History pending Robotics Agents Apr 13 Code High viability
ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation Build Now
ReSpinQuant offers efficient layer-wise LLM quantization by fusing activation rotations into weights, achieving state-of-the-art accuracy with minimal inference overhead.
GitHub stars n/a Velocity flat History pending LLM Optimization Apr 13 Code High viability
Hodoscope: Unsupervised Monitoring for AI Misbehaviors Build Now
Hodoscope offers unsupervised AI monitoring by highlighting distinctive behavioral anomalies, reducing review effort and aiding in the discovery of novel vulnerabilities.
GitHub stars n/a Velocity flat History pending AI Monitoring Apr 13 Code High viability
Semantic-Geometric Dual Compression: Training-Free Visual Token Reduction for Ultra-High-Resolution Remote Sensing Understanding Build Now
DualComp is a task-adaptive dual-stream token compression framework for efficient and accurate ultra-high-resolution remote sensing understanding.
GitHub stars n/a Velocity flat History pending Remote Sensing AI Apr 13 Code High viability
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems Build Now
Salami Attack is a novel framework that exploits cumulative low-risk inputs to bypass LLM security, achieving a success rate of over 90% on GPT-4o and Gemini; defense strategies are also proposed.
GitHub stars n/a Velocity flat History pending LLM Security Apr 13 Code High viability
BoxTuning: Directly Injecting the Object Box for Multimodal Model Fine-Tuning Build Now
BoxTuning injects object spatial-temporal information directly into the visual modality for more efficient and accurate video question answering.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 13 Code High viability
Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories Build Now
A framework that synthesizes reasoning paths by learning from the contrasts between successful and failed search trajectories, enabling 20x data reduction for LLM fine-tuning.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 13 Code High viability
Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations Build Now
A new dataset and method to address structural alignment bias in LLM tool invocations, improving reliability.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 13 Code High viability
Mem²Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation Build Now
A co-evolutionary framework for self-evolving AI agents that expands capabilities by integrating experience and asset creation, outperforming existing methods.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code High viability
AbLWR: A Context-Aware Listwise Ranking Framework for Antibody-Antigen Binding Affinity Prediction via Positive-Unlabeled Learning Build Now
AbLWR is a context-aware listwise ranking framework for antibody-antigen binding affinity prediction that uses positive-unlabeled learning and homologous antigen sampling to outperform state-of-the-art baselines.
GitHub stars n/a Velocity flat History pending Drug Discovery Apr 13 Code High viability
Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents Build Now
This research reveals that constraining AI coding agents on what NOT to do, rather than prescribing what to do, significantly improves performance and reduces unintended negative consequences, offering a safer configuration principle.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code High viability
RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents Build Now
An automated framework for evaluating LLM-based role-playing agents in complex environments, providing objective metrics for performance assessment.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code High viability
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory Build Now
A drop-in temporal knowledge graph module that uses continuous phase rotation to manage evolving and persistent facts for agentic memory.
GitHub stars n/a Velocity flat History pending Agentic Memory Apr 13 Code High viability
MAFIG: Multi-agent Driven Formal Instruction Generation Framework Build Now
MAFIG is a multi-agent framework that uses LLMs to rapidly generate formal instructions for repairing scheduling logic during emergencies, achieving high success rates with low latency.
GitHub stars n/a Velocity flat History pending Scheduling Agents Apr 13 Code High viability
E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning Build Now
E2E-REME is an end-to-end auto-remediation model for microservices, trained via experience-simulation reinforcement fine-tuning, that generates executable playbooks from diagnosis reports.
GitHub stars n/a Velocity flat History pending Microservices Remediation Apr 13 Code High viability
CoRe-ECG: Advancing Self-Supervised Representation Learning for 12-Lead ECG via Contrastive and Reconstructive Synergy Build Now
CoRe-ECG is a self-supervised learning framework for ECG analysis that combines contrastive and reconstructive methods with novel augmentation techniques to achieve state-of-the-art performance.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code High viability
Rethinking Token-Level Credit Assignment in RLVR: A Polarity-Entropy Analysis Build Now
A novel method to improve LLM reasoning by focusing learning signals on high-entropy tokens, addressing credit assignment in RLVR.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 13 Code High viability
Anthropogenic Regional Adaptation in Multimodal Vision-Language Model Build Now
A new paradigm and method for adapting multimodal vision-language models to specific regional contexts while maintaining global generalization, showing significant gains in cultural relevance.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 13 Code High viability
MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis Build Now
A hierarchical framework for synthesizing high-quality mathematical reasoning data by adversarially evolving structured generation blueprints and instantiating them into natural language scenarios.
GitHub stars n/a Velocity flat History pending LLM Data Synthesis Apr 13 Code High viability
ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval Build Now
ZoomR reduces LLM inference memory by 4x for long output generation through adaptive summarization and dynamic KV cache selection.
GitHub stars n/a Velocity flat History pending LLM Optimization Apr 13 Code High viability
Hardening x402: PII-Safe Agentic Payments via Pre-Execution Metadata Filtering Watch
Middleware for privacy-safe payments by filtering metadata in agent protocols.
GitHub stars n/a Velocity flat History 1 snapshot Privacy-Preserving Technology Apr 13 Pending
From Redaction to Restoration: Deep Learning for Medical Image Anonymization and Reconstruction Build Now
An end-to-end deep learning pipeline that anonymizes medical images by redacting sensitive information and inpainting plausible anatomy, preserving downstream analysis utility.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code High viability
Detecting Safety Violations Across Many Agent Traces Build Now
Meerkat uses clustering and agentic search to detect complex, rare, and hidden safety violations across many agent traces.
GitHub stars n/a Velocity flat History pending Agent Safety Apr 13 Code High viability
Towards Automated Solar Panel Integrity: Hybrid Deep Feature Extraction for Advanced Surface Defect Identification Build Now
An automated system for solar panel defect detection using a hybrid approach of handcrafted and deep learning features, achieving 99.17% accuracy.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 13 Code High viability
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators Build Now
A method that trains LLMs for physics reasoning by generating synthetic data from physics simulators and applying reinforcement learning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 13 Code High viability
Min-k Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics Build Now
A novel sampling strategy for large language models that dynamically determines truncation boundaries to improve text quality and temperature invariance.
GitHub stars n/a Velocity flat History pending LLM Decoding Apr 13 Code High viability
A collaborative agent with two lightweight synergistic models for autonomous crystal materials research Build Now
A lightweight, collaborative agent system for autonomous crystal materials research that accelerates discovery by 100x.
GitHub stars n/a Velocity flat History pending Materials Science Agents Apr 13 Code High viability
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Build Now
General365 is a new benchmark designed to evaluate the general reasoning capabilities of LLMs across diverse and challenging tasks, revealing significant room for improvement beyond domain-specific expertise.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 13 Code High viability
QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits Build Now
QShield is a hybrid quantum-classical neural network architecture that significantly enhances adversarial robustness for deep learning models.
GitHub stars n/a Velocity flat History pending AI Security Apr 13 Code High viability
Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment Build Now
H²-EMV is a framework for robots to learn hierarchical episodic memory with language-model-based relevance estimation and user feedback, reducing memory size and improving query accuracy.
GitHub stars n/a Velocity flat History pending Robotics Memory Apr 13 Code High viability
Panoptic Pairwise Distortion Graph Build Now
A novel approach to image assessment that represents image pairs as structured graphs of their regions, enabling fine-grained distortion understanding and outperforming current multimodal models.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 13 Code High viability
Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books Build Now
A framework that decouples reasoning from generation for improved character description accuracy in long-form narratives.
GitHub stars n/a Velocity flat History pending LLM Applications Apr 13 Code High viability
Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method Build Now
A new benchmark and a two-stage explanation-based thinking framework (XoT) for improving LLM reasoning over conflicting textual and knowledge graph evidence.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 13 Code High viability
GenTac: Generative Modeling and Forecasting of Soccer Tactics Build Now
GenTac is a diffusion-based generative framework that models and forecasts stochastic soccer tactics, enabling diverse trajectory generation and controllable counterfactual simulations.
GitHub stars n/a Velocity flat History pending Generative Modeling Apr 13 Code High viability
ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding Build Now
A new video QA benchmark for ultrasound procedure understanding, enabling the development of AI systems for training, guidance, and robotic automation in medical imaging.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code High viability
CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation Build Now
A novel reinforcement learning framework that disentangles optimization for table-to-LaTeX generation by assigning component-specific rewards to improve structural, style, and content fidelity.
GitHub stars n/a Velocity flat History pending Structured Data Generation Apr 13 Code High viability
Diffusion-CAM: Faithful Visual Explanations for dMLLMs Build Now
Diffusion-CAM is the first interpretability method tailored for diffusion multimodal large language models, providing faithful visual explanations by addressing challenges in parallel denoising architectures.
GitHub stars n/a Velocity flat History pending Multimodal AI Interpretability Apr 13 Code High viability
NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks Build Now
A proactive Kubernetes autoscaling system using deep reinforcement learning and LSTMs to forecast workload and optimize resource allocation for improved performance and cost efficiency.
GitHub stars n/a Velocity flat History pending Cloud Infrastructure Apr 13 Code High viability
Designing Adaptive Digital Nudging Systems with LLM-Driven Reasoning Build Now
An LLM-driven architecture for adaptive digital nudging systems that balances behavioral effectiveness with ethical compliance and user modeling.
GitHub stars n/a Velocity flat History pending Adaptive Systems Apr 13 Code High viability
Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering Build Now
A novel framework that leverages LLMs to inject intrinsic semantic knowledge into tabular data clustering, significantly outperforming existing methods.
GitHub stars n/a Velocity flat History pending Tabular Data AI Apr 13 Code High viability
FlowCoMotion: Text-to-Motion Generation via Token-Latent Flow Modeling Build Now
FlowCoMotion generates realistic human motion from text by unifying discrete semantic cues with continuous motion dynamics through a novel token-latent flow modeling approach.
GitHub stars n/a Velocity flat History pending Generative Video Apr 13 Code High viability
RAG-KT: Cross-platform Explainable Knowledge Tracing with Multi-view Fusion Retrieval Generation Build Now
RAG-KT is a retrieval-augmented framework for cross-platform knowledge tracing, enabling explainable predictions and improved generalization.
GitHub stars n/a Velocity flat History pending Educational AI Apr 13 Code High viability
ADD for Multi-Bit Image Watermarking Build Now
A novel multi-bit image watermarking method that offers high capacity, resilience to distortions, and significant computational gains with theoretical justification.
GitHub stars n/a Velocity flat History pending Image Watermarking Apr 13 Code High viability
Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo Build Now
A brain-inspired persona memory system for AI agents that achieves high accuracy and adversarial robustness, reducing hallucinations.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code High viability
Uncertainty-Aware Web-Conditioned Scientific Fact-Checking Build Now
A fact-checking system that selectively uses web search for technical claims, improving accuracy and controlling costs.
GitHub stars n/a Velocity flat History pending Fact Checking Apr 13 Code High viability
THEIA: Learning Complete Kleene Three-Valued Logic in a Pure-Neural Modular Architecture Build Now
THEIA is a modular neural architecture that learns complete Kleene three-valued logic end-to-end, demonstrating superior compositional generalization and faster training than Transformer baselines.
GitHub stars n/a Velocity flat History pending Logic Systems Apr 13 Code High viability
Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds Build Now
A cross-architecture study revealing universal emotion geometry in small LLMs and a layered decomposition of methodological confounds.
GitHub stars n/a Velocity flat History pending LLM Representation Apr 13 Code High viability
A Compact and Efficient 1.251 Million Parameter Machine Learning CNN Model PD36-C for Plant Disease Detection: A Case Study Build Now
A compact, efficient CNN model for plant disease detection with a user-friendly desktop application for edge deployment.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code High viability
Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning Build Now
A geometric methodology using a Topological Auditor to mitigate shortcut learning in deep neural networks, forcing them to learn ethical representations and outperforming L1 Regularization and JTT.
GitHub stars n/a Velocity flat History pending AI Ethics Apr 13 Code High viability
Evaluating the Impact of Medical Image Reconstruction on Downstream AI Fairness and Performance Watch
A scalable framework evaluates the impact of medical image reconstruction on downstream AI fairness and performance, revealing that pixel-level metrics poorly track task outcomes and biases.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code
MADQRL: Distributed Quantum Reinforcement Learning Framework for Multi-Agent Environments Watch
MADQRL is a distributed quantum reinforcement learning framework that enables efficient learning in high-dimensional multi-agent environments.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 13 Code
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Watch
ClawGUI streamlines the creation and deployment of advanced GUI automation agents with state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot GUI Automation Apr 13 Code
Cost-optimal Sequential Testing via Doubly Robust Q-learning Watch
A doubly robust Q-learning framework for learning cost-optimal sequential testing policies from retrospective clinical data.
GitHub stars n/a Velocity flat History pending Clinical Decision Support Apr 13 Code
PaperScope: A Multi-Modal Multi-Document Benchmark for Agentic Deep Research Across Massive Scientific Papers Watch
PaperScope is a multi-modal, multi-document benchmark for evaluating agentic deep research across scientific papers, highlighting current system limitations.
GitHub stars n/a Velocity flat History pending AI Research Benchmarking Apr 13 Code
Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents Watch
A three-tier inference scaffolding pipeline that doubles the performance of small LLM agents on complex tasks without additional training.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 13 Code
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping Watch
A reinforcement learning framework that uses historical behavioral signals to shape rewards, reducing repeated errors and increasing sampling diversity.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 13 Code
CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning Watch
A multimodal framework that synthesizes knowledge tuples to guide symbolic reasoning for enhanced tabular data comprehension.
GitHub stars n/a Velocity flat History pending Multimodal Reasoning Apr 13 Code
Minimal Embodiment Enables Efficient Learning of Number Concepts in Robot Watch
A robot learning system that uses minimal embodiment to achieve highly efficient and biologically plausible learning of number concepts, outperforming vision-only models.
GitHub stars n/a Velocity flat History pending Robotics Apr 13 Code
Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models Watch
Investigating how perceived user demographics influence sycophancy in LLMs, this research uses Anthropic's Petri framework to reveal differential false validation rates in GPT-5-nano and Claude Haiku 4.5.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 13 Code
Delving Aleatoric Uncertainty in Medical Image Segmentation via Vision Foundation Models Build Now
Leveraging vision foundation models to quantify aleatoric uncertainty in medical image segmentation for improved robustness and data filtering.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code High viability
S$^3$: Structured Sparsity Specification Watch
A mathematical framework for specifying and implementing structured sparsity patterns in neural networks, improving efficiency and performance.
GitHub stars n/a Velocity flat History pending LLM Training Apr 13 Code
MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models Build Now
A large-scale multimodal dataset and a reasoning-based model for general anomaly detection using large language models.
GitHub stars n/a Velocity flat History pending Anomaly Detection Apr 13 Code High viability
BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows Build Now
An AI benchmark for investment banking workflows, enabling evaluation of agents on complex, multi-file deliverables with industry-defined rubrics.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code High viability
PAC-BENCH: Evaluating Multi-Agent Collaboration under Privacy Constraints Watch
A benchmark for evaluating multi-agent collaboration under privacy constraints, revealing significant performance degradation and coordination breakdowns.
GitHub stars n/a Velocity flat History pending Multi-Agent Systems Apr 13 Code
Lightweight Low-Light Image Enhancement via Distribution-Normalizing Preprocessing and Depthwise U-Net Build Now
A lightweight, two-stage framework for low-light image enhancement using distribution-normalizing preprocessing and a depthwise U-Net, achieving competitive quality with fewer parameters.
GitHub stars n/a Velocity flat History pending Image Enhancement Apr 13 Code High viability
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models Build Now
Introduces an information-theoretic probing framework to diagnose and overcome 'pseudo-unification' in multimodal models, enabling more genuine synergy for text-to-image generation.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 13 Code High viability
CocoaBench: Evaluating Unified Digital Agents in the Wild Watch
CocoaBench is a new benchmark for unified LLM agents, evaluating their ability to compose vision, search, and coding capabilities on long-horizon tasks.
GitHub stars n/a Velocity flat History pending Agent Benchmarking Apr 13 Code
bacpipe: a Python package to make bioacoustic deep learning models accessible Watch
A Python package providing accessible bioacoustic deep learning models and evaluation pipelines for ecologists and computer scientists.
GitHub stars n/a Velocity flat History pending Bioacoustics AI Apr 13 Code
CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space Watch
An adaptive similarity computation method for image retrieval that reframes VLM embedding spaces without additional training.
GitHub stars n/a Velocity flat History pending Vision-Language Apr 13 Code
On the Robustness of Watermarking for Autoregressive Image Generation Ignore
Investigates the vulnerabilities of watermarking techniques for autoregressive image generation, revealing susceptibility to removal and forgery attacks.
GitHub stars n/a Velocity flat History pending AI Watermarking / Image Generation Apr 13 Code
AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis Watch
A RAG-enhanced LLM framework, AOP-Smart, significantly improves the accuracy and reliability of toxicological Adverse Outcome Pathway analysis by mitigating hallucination.
RAG for Toxicology Apr 13
NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment Ignore
A new benchmark for evaluating LLMs on academic paper novelty assessment to improve peer review.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 13 Code
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing Ignore
ActorMind is a reasoning framework that enables AI to perform speech role-playing with personalized verbal traits by emulating human actor processes.
GitHub stars n/a Velocity flat History pending Speech AI Apr 13 Code
One Scale at a Time: Scale-Autoregressive Modeling for Fluid Flow Distributions Watch
Scale-autoregressive modeling (SAR) provides a practical tool for fast and accurate estimation of statistical flow quantities by hierarchically sampling fluid flows from coarse to fine.
GitHub stars n/a Velocity flat History pending Generative Simulation Apr 13 Code
DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation Ignore
A novel framework for robust heterogeneous graph adaptation that preserves invariant knowledge across domains using decoupled information bottleneck and online distillation.
GitHub stars n/a Velocity flat History pending Graph Adaptation Apr 13 Code
Inspectable AI for Science: A Research Object Approach to Generative AI Governance Ignore
This paper proposes AI as a Research Object (AI-RO) for governing generative AI in science, treating AI interactions as inspectable components with a focus on provenance and accountability.
GitHub stars n/a Velocity flat History pending AI Governance Apr 13 Code
Emulating Non-Differentiable Metrics via Knowledge-Guided Learning: Introducing the Minkowski Image Loss Ignore
A framework for training models on non-differentiable scientific metrics by creating differentiable surrogate functions.
GitHub stars n/a Velocity flat History pending Scientific ML Apr 13 Code
DreamKG: A KG-Augmented Conversational System for People Experiencing Homelessness Ignore
A conversational AI system augmented with knowledge graphs to reliably provide information about community services for people experiencing homelessness.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code
From Attribution to Action: A Human-Centered Application of Activation Steering Watch
An interactive workflow combining attribution and activation steering for vision models, enabling practitioners to shift from inspecting explanations to intervening and testing hypotheses.
Explainable AI Apr 13
Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning Watch
A symbiotic framework using reinforcement learning to actively curate context for LLM agents, improving performance and reducing token consumption.
LLM Agents Apr 13
PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk Watch
The PRISM framework introduces hierarchy-based red lines for AI behavioral risk, offering an anticipatory and measurable approach to AI safety beyond case-specific violations.
GitHub stars n/a Velocity flat History pending AI Safety Apr 13 Code
Regional Explanations: Bridging Local and Global Variable Importance Ignore
A new method for regional explanations that bridges local and global variable importance by segmenting the input space and applying attribution methods within regions.
GitHub stars n/a Velocity flat History pending Explainable AI Apr 13 Code
NetworkNet: A Deep Neural Network Approach for Random Networks with Sparse Nodal Attributes and Complex Nodal Heterogeneity Ignore
A deep neural network approach for modeling nodal heterogeneity and selecting influential attributes in random networks, offering interpretability and scalability.
GitHub stars n/a Velocity flat History pending Network Analysis Apr 13 Code
The Missing Knowledge Layer in Cognitive Architectures for AI Agents Ignore
This paper proposes a novel four-layer cognitive architecture for AI agents with distinct persistence semantics for knowledge, memory, wisdom, and intelligence, addressing a gap in current frameworks.
GitHub stars n/a Velocity flat History pending AI Agents Apr 13 Code
Evaluating Cooperation in LLM Social Groups through Elected Leadership Ignore
An evaluation of cooperation in LLM-based social groups, using elected leadership to study group dynamics.
GitHub stars n/a Velocity flat History 1 snapshot Social AI Apr 13 Code
Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Using a Large Language Model Ignore
This study evaluates the consistency of AI-generated exercise prescriptions, highlighting areas of high semantic agreement and quantitative variability, suggesting a need for prompt refinement and expert validation before clinical use.
GitHub stars n/a Velocity flat History pending Medical AI Apr 13 Code
Governance by Design: A Parsonian Institutional Architecture for Internet-Wide Agent Societies Ignore
This paper proposes a Parsonian institutional architecture for governing internet-wide agent societies, identifying a significant governance gap in existing ecosystems like OpenClaw.
GitHub stars n/a Velocity flat History pending Agents Apr 13 Code
EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation Ignore
A closed-loop multi-agent reinforcement learning framework for robust medium-horizon equity allocation, outperforming benchmarks with a novel layered policy architecture.
Financial AI Apr 13
Ambiguity Detection and Elimination in Automated Executable Process Modeling Ignore
A framework to detect and eliminate ambiguity in LLM-generated executable process models by analyzing behavioral inconsistencies.
LLM Applications Apr 13
Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net Ignore
A budget-aware uncertainty framework built on nnU-Net for quality assurance in radiotherapy segmentation.
Medical Imaging AI Apr 13
Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models Ignore
This study investigates how evolving LLM backbones impact Vision-Language Model performance, finding that newer backbones don't always improve performance and the effect is task-dependent.
Vision Language Models Apr 13
From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning Ignore
A framework for trustworthy clinical diagnostic reasoning using Toulmin-guided curriculum learning to generate explicit arguments.
Trustworthy AI Apr 13
When Verification Fails: How Compositionally Infeasible Claims Escape Rejection Ignore
This paper investigates how current models fail at compositional claim verification by relying on salient constraints, proposing new benchmarks to expose this weakness.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 13 Code
Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval Ignore
A systematic evaluation of graph-based and agentic retrieval methods for improving cyber threat intelligence analysis beyond standard RAG.
Cybersecurity AI Apr 13
Continuous-time Online Learning via Mean-Field Neural Networks: Regret Analysis in Diffusion Environments Ignore
Develops a theoretical framework for continuous-time online learning in diffusion environments using mean-field neural networks, with potential applications in adaptive systems.
GitHub stars n/a Velocity flat History pending LLM Training Apr 13 Code
RTMC: Step-Level Credit Assignment via Rollout Trees Ignore
RTMC is a novel advantage estimation method for multi-step agentic reinforcement learning that aggregates return statistics across rollouts to improve credit assignment without learned critics.
Reinforcement Learning Apr 13
Beyond LLMs, Sparse Distributed Memory, and Neuromorphics <A Hyper-Dimensional SRAM-CAM "VaCoAl" for Ultra-High Speed, Ultra-Low Power, and Low Cost> Ignore
A novel hyperdimensional computing architecture offers a new paradigm for AI, addressing limitations of LLMs with ultra-high speed and low power consumption.
Hyperdimensional Computing Apr 13
A Triadic Suffix Tokenization Scheme for Numerical Reasoning Ignore
A novel tokenization scheme for LLMs to improve numerical reasoning by preserving digit structure and magnitude.
GitHub stars n/a Velocity flat History pending LLM Tokenization Apr 13 Code
ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks Ignore
A framework for evaluating LLM continuity that highlights the shortcomings of existing benchmarks.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 13 Code
Enabling and Inhibitory Pathways of Students' AI Use Concealment Intention in Higher Education: Evidence from SEM and fsQCA Ignore
Investigating student AI use concealment intentions in higher education through dual-method analysis.
GitHub stars n/a Velocity flat History pending Education AI Apr 13 Code
A Proposed Biomedical Data Policy Framework to Reduce Fragmentation, Improve Quality, and Incentivize Sharing in Indian Healthcare in the era of Artificial Intelligence and Digital Health Ignore
A proposed policy framework to reduce fragmentation, improve quality, and incentivize sharing of biomedical data in Indian healthcare.
GitHub stars n/a Velocity flat History pending Healthcare Data Policy Apr 13 Code
Quantization Dominates Rank Reduction for KV-Cache Compression Ignore
Demonstrates that quantization significantly outperforms rank reduction for KV-cache compression in transformer inference, retaining high accuracy at substantial compression.
LLM Inference Optimization Apr 13
Brief2Design: A Multi-phased, Compositional Approach to Prompt-based Graphic Design Ignore
A graphic design tool that supports a structured workflow for translating abstract client briefs into visual designs by assisting with requirement extraction, element exploration, and recombination.
Generative Design Tools Apr 13
SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models Ignore
A training-free method for pruning vision tokens in multimodal models using Singular Value Decomposition to improve efficiency without performance loss.
Vision-Language Models Apr 13
SCNO: Spiking Compositional Neural Operator -- Towards a Neuromorphic Foundation Model for Nuclear PDE Solving Ignore
A modular, spiking neural operator architecture for solving coupled PDEs, offering compositionality and reduced parameter count.
Scientific AI Apr 13
Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation Ignore
An efficient KernelSHAP framework for 3D medical image segmentation that reduces computation by restricting analysis to regions of interest and caching predictions.
Medical AI Apr 13
Discourse Diversity in Multi-Turn Empathic Dialogue Ignore
A reinforcement learning framework that trains LLMs to diversify discourse moves in multi-turn empathic dialogue, improving empathy and reducing repetition.
LLM Dialogue Apr 13
FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning Ignore
An LLM-powered framework for automated compositional reasoning and bug detection in large-scale software systems.
Software Engineering Apr 13
Network Effects and Agreement Drift in LLM Debates Ignore
This paper investigates how LLM agents exhibit agreement drift in debates, highlighting the need to distinguish structural effects from model biases before using LLMs as proxies for human groups.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 13 Code
Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure Ignore
OIDA is a framework that structures organizational knowledge with epistemic properties to improve AI agent fidelity beyond simple retrieval.
Organizational AI Apr 13
Product Review Based on Optimized Facial Expression Detection Ignore
A faster, more accurate facial expression recognition method for product review, based on optimized feature point extraction.
Computer Vision Apr 13
SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering Ignore
An open-source multi-agent application framework focused on harness engineering for general-purpose personal AI agents.
Agents Apr 13
OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems Ignore
OOM-RL uses financial market losses as a harsh training signal to align multi-agent systems in high-stakes environments.
GitHub stars n/a Velocity flat History 1 snapshot AI Alignment for Multi-Agent Systems Apr 13 Code
Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models Ignore
This research empirically maps the 'Authority Stack' of 8 AI models across 366,120 forced-choice responses to reveal their value priorities, evidence preferences, and source trust hierarchies.
GitHub stars n/a Velocity flat History pending AI System Analysis Apr 13 Code
Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems Ignore
A reactor-model-of-computation approach using the Lingua Franca framework to address nondeterminism in agentic AI-powered human-in-the-loop cyber-physical systems, demonstrated with an agentic driving coach.
Agents Apr 13
ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values Ignore
A novel Shapley value method for attributing prediction shifts in machine learning models to changes in interpretable data subgroups, aiding model monitoring.
Model Interpretability Apr 13
Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs Ignore
This research reveals that current safety evaluations for personalized LLMs are incomplete, as prompting and activation steering expose different vulnerabilities, leading to potentially missed failure modes.
LLM Safety Apr 13
Why Do Large Language Models Generate Harmful Content? Ignore
Identifies specific model layers and neurons responsible for harmful content generation in LLMs, providing insights for mitigation.
LLM Safety Apr 13
A molecular clock for writing systems reveals the quantitative impact of imperial power on cultural evolution Ignore
Analyzes the evolution of writing systems using a molecular clock model and quantitative data, revealing the impact of imperial power on cultural change.
AI for Humanities Apr 13
When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies Ignore
Investigating the effectiveness of LLM-generated features for RL trading agents, highlighting a gap between feature validity and policy robustness under distribution shifts.
LLM Training Apr 13
CASK: Core-Aware Selective KV Compression for Reasoning Traces Ignore
CASK is a core-aware selective KV compression method for LLM reasoning traces, prioritizing core preservation over elaborate scorer engineering for improved fidelity.
LLM Optimization Apr 13
Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems Ignore
A thermodynamically consistent neural network for reliable solar irradiance forecasting in off-grid systems.
Forecasting AI Apr 13
Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape Ignore
A survey of software engineering researchers on the adoption and implications of Generative AI in research practices, highlighting benefits, challenges, and governance needs.
AI in Software Engineering Research Apr 13
3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS Ignore
A robotic manipulation system using Monte Carlo Tree Search and a 3D world model for persistent spatial memory and replanning.
Robotics Apr 13
Reasoning as Data: Representation-Computation Unity and Its Implementation in a Domain-Algebraic Inference Engine Ignore
A novel symbolic engine implements representation-computation unity for domain-specific inference, eliminating the separation of storage and computation.
Knowledge Representation Apr 13
SLALOM: Simulation Lifecycle Analysis via Longitudinal Observation Metrics for Social Simulation Ignore
A framework for validating LLM agent social simulations by assessing process fidelity rather than just final outcomes.
LLM Agents Apr 13
From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent Execution Ignore
A theoretical framework for structuring LLM agent execution using graph-based scheduling to improve controllability and verifiability.
LLM Agents Apr 13
Environmental Footprint of GenAI Research: Insights from the Moshi Foundation Model Ignore
An analysis of the environmental footprint of GenAI research, providing guidelines for more sustainable AI development.
AI Sustainability Apr 13
Evolving Many Worlds: Towards Open-Ended Discovery in Petri Dish NCA via Population-Based Training Ignore
A meta-evolutionary algorithm that evolves a population of Neural Cellular Automata to generate sustained, open-ended complexity and emergent lifelike phenomena.
Artificial Life Apr 13
A Mechanistic Analysis of Looped Reasoning Language Models Ignore
A mechanistic analysis of looped reasoning language models, investigating how their internal dynamics differ from standard feedforward models.
LLM Internals Apr 13
Lectures on AI for Mathematics Ignore
A book introducing the principles and applications of AI for advancing mathematical research, pattern discovery, and theorem proving.
AI for Mathematics Apr 13
Optimal Stability of KL Divergence under Gaussian Perturbations Ignore
Theoretical analysis of KL divergence stability under Gaussian perturbations for out-of-distribution detection.
Theoretical ML Apr 13
Layerwise Dynamics for In-Context Classification in Transformers Ignore
This paper theoretically analyzes the internal dynamics of in-context classification in Transformers, identifying an emergent, geometry-driven update rule.
LLM Interpretability Apr 13
Compliant But Unsatisfactory: The Gap Between Auditing Standards and Practices for Probabilistic Genotyping Software Ignore
Examines the gap between auditing standards and practices for probabilistic genotyping software, highlighting design flaws that hinder effective AI governance.
AI Governance Apr 13
Limited Perfect Monotonical Surrogates constructed using low-cost recursive linkage discovery with guaranteed output Ignore
A parameterless surrogate model that can be trained on the fly to enable efficient optimization of complex problems by comparing solutions.
Optimization Apr 13
On the Complexity of the Discussion-based Semantics in Abstract Argumentation Ignore
This paper analyzes the complexity of discussion-based semantics in argumentation theory, reducing it to automata equivalence.
AI Theory Apr 13
Minimizing classical resources in variational measurement-based quantum computation for generative modeling Ignore
A restricted variational measurement-based quantum computation model that uses fewer parameters for generative modeling.
Quantum Generative Modeling Apr 13
Endogenous Information in Routing Games: Memory-Constrained Equilibria, Recall Braess Paradoxes, and Memory Design Ignore
This paper explores endogenous information in routing games, developing theories for memory-constrained equilibria and recall paradoxes, with potential applications in traffic management systems.
Game Theory / Routing Apr 13
A Quantitative Definition of Intelligence Ignore
Proposes an operational, quantitative definition of intelligence based on intelligence density, distinguishing between memorization and knowing.
AI Theory Apr 13
Examining EAP Students' AI Disclosure Intention: A Cognition-Affect-Conation Perspective Ignore
Examining the psychological factors influencing EAP students' intention to disclose AI tool usage in academic writing.
AI Ethics Apr 13
AI Integrity: A New Paradigm for Verifiable AI Governance Ignore
AI Integrity proposes a new paradigm for verifiable AI governance by focusing on the auditable reasoning process and the protection of an AI's Authority Stack.
AI Governance Apr 13