WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark Build Now
WebForge is an automated framework and benchmark for creating reproducible, realistic, and scalable browser agent environments without human annotation.
GitHub 8 stars Velocity flat History 1 snapshot Browser Agents Apr 13 Pending High viability
CodeTracer: Towards Traceable Agent States Build Now
CodeTracer reconstructs and localizes failures in complex code agent workflows by parsing run artifacts into a traceable state history.
GitHub 4 stars Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems Build Now
A library of problem reductions for NP-hard problems, scaled up through AI agent integration.
GitHub 23 stars Velocity flat History 1 snapshot optimization Apr 13 Pending High viability
TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training Build Now
TorchUMM is a unified codebase for evaluating, analyzing, and post-training diverse unified multimodal models across various tasks and datasets.
GitHub 38 stars Velocity flat History 1 snapshot Multimodal AI Apr 12 Pending High viability
Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning Build Now
A glyph-driven fine-tuning framework (GEVO) to enhance multimodal LLMs for ancient Chinese character evolution analysis, releasing a benchmark and trained models.
GitHub 0 stars Velocity flat History 1 snapshot Multimodal LLMs Apr 13 Pending High viability
Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning Build Now
GRIP is a unified framework that embeds retrieval control directly into generation, enabling end-to-end coordination for question answering.
GitHub 20 stars Velocity flat History 1 snapshot LLM Agents Apr 13 Pending High viability
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead Build Now
A unified and configurable CPU matrix extension architecture that significantly boosts AI workload performance across diverse architectures with minimal overhead.
GitHub 9 stars Velocity flat History 1 snapshot CPU Architecture Apr 13 Pending High viability
METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Build Now
A method to autonomously induce dialogue strategies from expert transcripts for scalable non-collaborative agent development.
GitHub 4 stars Velocity flat History 1 snapshot LLM Applications Apr 13 Pending High viability
Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning Build Now
An LLM-based framework that transforms legal cases into logical formulas using few-shot learning, improving generalization and interpretability in legal reasoning.
GitHub 0 stars Velocity flat History 1 snapshot Legal AI Apr 13 Pending High viability
Hardening x402: PII-Safe Agentic Payments via Pre-Execution Metadata Filtering Build Now
Pre-execution metadata filtering that keeps PII out of x402 agentic payments.
GitHub 0 stars Velocity flat History 1 snapshot AI Privacy and Security Apr 13 Pending High viability
On the Robustness of Watermarking for Autoregressive Image Generation Build Now
This research reveals significant vulnerabilities in existing watermarking techniques for autoregressive image generation, showing they are susceptible to removal and forgery attacks, impacting content attribution and dataset filtering.
GitHub 13 stars Velocity flat History 1 snapshot AI Watermarking / Content Provenance Apr 13 Pending High viability
Mem²Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation Build Now
A co-evolutionary framework that integrates experience memory and asset memory to enable self-evolving AI agents with expanded capabilities and guided asset creation.
GitHub 117300 stars Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents Build Now
A unified framework for LLM agent tool-use that standardizes data, representation, and evaluation, significantly outperforming commercial models.
GitHub 0 stars Velocity flat History 1 snapshot LLM Agents Apr 13 Pending High viability
A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness Build Now
An executable benchmark for evaluating knowledge graph readiness in gap and overlap analysis for policy documents.
GitHub 0 stars Velocity flat History 1 snapshot Knowledge Graphs Apr 12 Pending High viability
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context Build Now
A software agent framework that manages dynamic reasoning context by using a sliding window and compressed digests to prevent context explosion and redundant re-analysis in complex software engineering tasks.
GitHub 1 star Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
Towards Proactive Information Probing: Customer Service Chatbots Harvesting Value from Conversation Build Now
PROCHATIP is a proactive chatbot framework that intelligently probes users for valuable business intelligence, significantly outperforming reactive support tools in both information gathering and service quality.
GitHub 0 stars Velocity flat History 1 snapshot Customer Service AI Apr 13 Pending High viability
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection Build Now
ClawGuard is a runtime security framework for LLM agents that deterministically enforces user-confirmed rules at tool-call boundaries to prevent indirect prompt injection.
GitHub 0 stars Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems Build Now
A benchmark for evaluating the safety and governability of embodied AI systems, moving beyond simple task completion metrics.
GitHub stars n/a Velocity flat History 1 snapshot Embodied Agents Apr 13 Pending High viability
Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation Build Now
A runtime architecture for multi-robot coordination that maintains individual robot autonomy, improving governance and recovery containment.
GitHub 0 stars Velocity flat History 1 snapshot Robotics Apr 13 Pending High viability
A Mamba-Based Multimodal Network for Multiscale Blast-Induced Rapid Structural Damage Assessment Build Now
A Mamba-based multimodal network that integrates multi-scale blast-loading information with optical remote sensing images for rapid structural damage assessment after explosions.
GitHub 0 stars Velocity flat History 1 snapshot Multimodal AI Apr 13 Pending High viability
C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts Build Now
A comprehensive Chinese benchmark for AI-generated text detection derived from real-world prompts, enabling reliable detection and generalization to unseen LLMs.
GitHub 0 stars Velocity flat History 1 snapshot LLM Security Apr 13 Pending High viability
StarVLA-α: Reducing Complexity in Vision-Language-Action Systems Build Now
A simplified baseline for Vision-Language-Action models that achieves state-of-the-art performance across multiple robotic benchmarks, enabling systematic study of design choices.
GitHub 1829 stars Velocity flat History 1 snapshot Robotics Agents Apr 13 Pending High viability
Grounded World Model for Semantically Generalizable Planning Build Now
A Grounded World Model that enables visuomotor Model Predictive Control to generalize to unseen environments and instructions by aligning vision and language.
GitHub 4 stars Velocity flat History 1 snapshot Robotics Agents Apr 13 Pending High viability
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization Build Now
A novel preference optimization method for mobile GUI agents that personalizes user privacy by analyzing and weighting execution trajectories, improving alignment and task success.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering Build Now
A dual-view retrieval pipeline that combines paragraph context with LLM-extracted procedural summaries to improve evidence retrieval for materials science question answering.
GitHub 0 stars Velocity flat History 1 snapshot Information Retrieval Apr 13 Pending High viability
Efficient Training for Cross-lingual Speech Language Models Build Now
CSLM is an efficient training method for cross-lingual speech LLMs that aligns modalities and languages without massive speech data, enabling scalable natural human-AI interaction.
GitHub 1 star Velocity flat History 1 snapshot Cross-lingual Speech LLMs Apr 13 Pending High viability
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind Build Now
Develop AI agents that learn to steer user beliefs by understanding and manipulating their intentions, outperforming current frontier models in adversarial scenarios.
GitHub 0 stars Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
Pando: Do Interpretability Methods Work When Models Won't Explain Themselves? Build Now
A benchmark and tools to evaluate AI interpretability methods by breaking the elicitation confounder, showing when white-box methods truly add value.
GitHub 1 star Velocity flat History 1 snapshot AI Interpretability Apr 13 Pending High viability
METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Build Now
METER is a new benchmark and analysis framework for evaluating multi-level contextual causal reasoning in LLMs, with code and dataset available.
GitHub 0 stars Velocity flat History 1 snapshot LLM Evaluation Apr 13 Pending High viability
Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models Build Now
An adaptive watermark stealing algorithm for LLMs that dynamically selects attack perspectives to significantly increase steal efficiency against target watermarks.
GitHub 0 stars Velocity flat History 1 snapshot LLM Security Apr 13 Pending High viability
Delving Aleatoric Uncertainty in Medical Image Segmentation via Vision Foundation Models Build Now
Leveraging vision foundation models to quantify aleatoric uncertainty in medical image segmentation for improved robustness and data quality.
GitHub 713 stars Velocity flat History 1 snapshot Medical AI Apr 13 Pending High viability
Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems Build Now
A new architecture for orchestrating enterprise knowledge in agentic AI systems, akin to Kubernetes for containers, with proven security and freshness benefits.
GitHub 0 stars Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models Build Now
A large-scale multimodal dataset and baseline model for general anomaly detection using MLLMs.
GitHub 713 stars Velocity flat History 1 snapshot Anomaly Detection Apr 13 Pending High viability
Hodoscope: Unsupervised Monitoring for AI Misbehaviors Build Now
Hodoscope offers unsupervised AI monitoring by detecting behavioral anomalies, reducing review effort and discovering novel vulnerabilities.
GitHub 0 stars Velocity flat History 1 snapshot AI Monitoring Apr 13 Pending High viability
BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows Build Now
A benchmark for evaluating AI agents in end-to-end investment banking workflows, designed with input from 502 investment bankers.
GitHub 1865 stars Velocity flat History 1 snapshot Agents Apr 13 Pending High viability
Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories Build Now
A framework that synthesizes reasoning paths by learning from the contrasts between successful and failed search trajectories, enabling 20x data reduction for model training.
GitHub 0 stars Velocity flat History 1 snapshot Reasoning Apr 13 Pending High viability
Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations Build Now
A novel dataset and method to address structural alignment bias in LLMs' tool invocation, improving reliability.
GitHub 0 stars Velocity flat History 1 snapshot LLM Agents Apr 13 Pending High viability
Introspective Diffusion Language Models Build Now
Introspective Diffusion Language Models improve efficiency and quality of language model generation by ensuring introspective consistency.
GitHub stars n/a Velocity flat History 1 snapshot Language Model Enhancement Apr 13 Code High viability
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Build Now
RationalRewards teaches reward models to provide multi-dimensional critiques, improving visual generation at both training and test time through reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 13 Code High viability
Evaluating Cooperation in LLM Social Groups through Elected Leadership Build Now
An open-source framework simulating LLM social groups with elected leadership, demonstrating significant improvements in cooperation and social welfare for common-pool resource management.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents / Multi-Agent Systems Apr 13 Code High viability
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass Build Now
A multimodal reward model that efficiently scores multiple responses in a single forward pass, outperforming existing models and improving RL policies.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 13 Code High viability
Sanity Checks for Agentic Data Science Build Now
Lightweight sanity checks based on the PCS framework to ensure agentic data science pipelines reliably distinguish signal from noise, exposing unsupported conclusions and improving trustworthiness.
GitHub stars n/a Velocity flat History 1 snapshot Agentic AI Apr 13 Code High viability
Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees Build Now
An intelligent evaluation scheduling system for prompt optimization that significantly improves accuracy and reduces token consumption.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 13 Code High viability
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration Build Now
A novel framework that models and extrapolates low-rank parameter trajectories non-linearly to accelerate LLM reinforcement learning with verifiable rewards.
GitHub 3 stars Velocity flat History 1 snapshot LLM Training Apr 13 Pending High viability
Lightweight Low-Light Image Enhancement via Distribution-Normalizing Preprocessing and Depthwise U-Net Build Now
A lightweight, two-stage framework for low-light image enhancement that achieves competitive quality with fewer parameters.
GitHub 713 stars Velocity flat History 1 snapshot Image Enhancement Apr 13 Pending High viability
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems Build Now
An automatic framework for multi-turn LLM jailbreaking that chains low-risk inputs to bypass security and achieve high attack success rates.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 13 Code High viability
Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis Build Now
A cross-language code analysis framework that grounds LLM reasoning in execution-based validation to ensure trustworthy vulnerability detection and repair.
GitHub stars n/a Velocity flat History 1 snapshot Agentic Code Analysis Apr 12 Code High viability
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models Build Now
Introduces an information-theoretic framework to diagnose 'pseudo-unification' in multimodal models, revealing divergence in encoding and response patterns to enable genuine synergy.
GitHub 713 stars Velocity flat History 1 snapshot Multimodal AI Apr 13 Pending High viability
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Build Now
Audio Flamingo Next is an open-source audio-language model capable of understanding and reasoning over 30 minutes of speech, sound, and music, with a novel temporal reasoning paradigm.
GitHub stars n/a Velocity flat History 1 snapshot Audio-Language Models Apr 13 Code High viability
MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis Build Now
A framework for synthesizing high-quality mathematical reasoning data by adversarially evolving constraint graphs, outperforming existing datasets.
GitHub stars n/a Velocity flat History 1 snapshot LLM Data Synthesis Apr 13 Code High viability
bacpipe: a Python package to make bioacoustic deep learning models accessible Watch
A Python package that makes bioacoustic deep learning models accessible for analysis and research.
GitHub 40 stars Velocity flat History 1 snapshot Bioacoustics AI Apr 13 Pending
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators Build Now
Leveraging physics simulators and reinforcement learning to train LLMs for deep physical reasoning, achieving significant zero-shot sim-to-real transfer on physics benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 13 Code High viability
Min-k Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics Build Now
A novel sampling strategy for LLMs that dynamically adjusts truncation based on logit distribution to improve text quality and temperature invariance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Sampling Strategies Apr 13 Code High viability
From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python Build Now
A methodology for LLM-assisted code translation and evolution of production AI agents, demonstrating code reduction and feature expansion.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 13 Code High viability
From Attribution to Action: A Human-Centered Application of Activation Steering Watch
An interactive workflow combining activation steering with XAI for instance-level analysis in vision models, enabling practitioners to shift from inspection to intervention-based hypothesis testing.
GitHub 713 stars Velocity flat History 1 snapshot Explainable AI Apr 13 Pending
TInR: Exploring Tool-Internalized Reasoning in Large Language Models Watch
A framework for Tool-Internalized Reasoning (TInR) that integrates tool knowledge directly into LLMs, improving reasoning efficiency and performance without external documentation.
GitHub stars n/a Velocity flat History 1 snapshot Tool-Integrated Reasoning Apr 12 Pending
From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience Build Now
ReflectiChain is an LLM-driven world model framework that enhances supply chain resilience by integrating latent trajectory rehearsal and retrospective agentic RL for autonomous policy evolution.
GitHub stars n/a Velocity flat History 1 snapshot Supply Chain AI Apr 13 Code High viability
Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games Build Now
A collaborative multi-agent framework for generating game scripts to enhance vision-language models' reasoning in imperfect-information multiplayer games.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Reasoning Apr 13 Code High viability
Evaluating the Impact of Medical Image Reconstruction on Downstream AI Fairness and Performance Build Now
A framework that evaluates how medical image reconstruction affects downstream AI diagnostic performance and fairness, revealing that pixel-level metrics capture neither task performance nor bias amplification.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13 Code High viability
EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models Build Now
EmergentBridge improves zero-shot cross-modal transfer in multimodal embedding models by learning a bridging framework that strengthens connections between unpaired modalities without requiring exhaustive pairwise supervision.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 13 Code High viability
Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech Build Now
A lightweight transformer predicts robot co-speech gestures from text and emotion, outperforming GPT-4o and suitable for real-time deployment.
GitHub stars n/a Velocity flat History 1 snapshot Robotics AI Apr 13 Code High viability
Dynamic Summary Generation for Interpretable Multimodal Depression Detection Build Now
A multi-stage framework using LLMs for interpretable multimodal depression detection generates clinical summaries to guide fusion of text, audio, and video features, improving accuracy and transparency.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13 Code High viability
Towards Autonomous Mechanistic Reasoning in Virtual Cells Build Now
A multi-agent framework for autonomous mechanistic reasoning in virtual cells, generating and validating biological explanations to improve scientific discovery.
GitHub stars n/a Velocity flat History 1 snapshot Scientific Discovery Apr 13 Code High viability
AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation Build Now
A scalable simulation framework that generates affordance-aware robotic manipulation data by integrating open-vocabulary 3D affordance prediction into trajectory generation.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 13 Code High viability
Intelligent Approval of Access Control Flow in Office Automation Systems via Relational Modeling Build Now
RMIA is a relational modeling framework that automates access control flow approval in office automation systems by fusing binary and ternary relation modeling for intelligent decision-making.
GitHub stars n/a Velocity flat History 1 snapshot Office Automation AI Apr 13 Code High viability
Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving Build Now
A novel multi-agent reinforcement learning framework for cooperative driving that significantly improves safety and efficiency by stabilizing online fine-tuning of diffusion planners.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Driving Apr 13 Code High viability
Bottleneck Tokens for Unified Multimodal Retrieval Build Now
Bottleneck Tokens (BToks) and Generative Information Condensation enable unified multimodal retrieval by providing explicit pooling and token-level supervision for semantic compression.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal Retrieval Apr 13 Code High viability
Towards Adaptive Open-Set Object Detection via Category-Level Collaboration Knowledge Mining Build Now
An adaptive open-set object detection system that mines category-level collaboration knowledge to generalize to novel categories with significant performance gains.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 13 Code High viability
EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models Build Now
EdgeCIM is a hardware-software co-design framework that dramatically improves the energy efficiency and throughput of small language model inference on edge devices.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI Hardware Apr 13 Code High viability
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Build Now
ClawGUI is a unified framework for training, evaluating, and deploying GUI automation agents.
GitHub stars n/a Velocity flat History 1 snapshot AI Frameworks & Tools Apr 13 Code High viability
ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation Build Now
ReSpinQuant offers efficient layer-wise LLM quantization by fusing activation rotations offline and matching bases in residual subspaces, achieving state-of-the-art accuracy with minimal overhead.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 13 Code High viability
Semantic-Geometric Dual Compression: Training-Free Visual Token Reduction for Ultra-High-Resolution Remote Sensing Understanding Build Now
DualComp is a task-adaptive dual-stream token compression framework for ultra-high-resolution remote sensing understanding, improving efficiency and accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Remote Sensing MLLMs Apr 13 Code High viability
BoxTuning: Directly Injecting the Object Box for Multimodal Model Fine-Tuning Build Now
BoxTuning injects object spatial-temporal information directly into the visual modality for multimodal models, reducing token costs and improving video question answering accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Video MLLMs Apr 13 Code High viability
AbLWR: A Context-Aware Listwise Ranking Framework for Antibody-Antigen Binding Affinity Prediction via Positive-Unlabeled Learning Build Now
AbLWR is a context-aware listwise ranking framework for antibody-antigen binding affinity prediction that uses positive-unlabeled learning and homologous antigen sampling to outperform state-of-the-art baselines.
GitHub stars n/a Velocity flat History 1 snapshot Biotech AI Apr 13 Code High viability
Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents Build Now
This research reveals that constraining AI coding agents on what NOT to do, rather than prescribing actions, significantly improves performance and reduces reliability risks, offering a clear principle for safer agent configuration.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Code High viability
RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents Build Now
An automated framework for evaluating LLM-based role-playing agents, providing objective metrics for complex, constraint-heavy environments.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Code High viability
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory Build Now
A drop-in temporal knowledge graph module for agentic memory that uses continuous phase rotation to manage evolving and persistent facts.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Code High viability
PaperScope: A Multi-Modal Multi-Document Benchmark for Agentic Deep Research Across Massive Scientific Papers Build Now
PaperScope is a multi-modal, multi-document benchmark for evaluating agentic deep research systems across scientific papers.
GitHub stars n/a Velocity flat History 1 snapshot AI Research Benchmarking Apr 13 Code High viability
BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving Build Now
A test-time adaptation framework to bridge the open-loop to closed-loop gap in autonomous driving policies.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Driving Apr 12 Code High viability
MAFIG: Multi-agent Driven Formal Instruction Generation Framework Build Now
MAFIG is a multi-agent framework that uses LLMs to rapidly generate formal instructions for repairing scheduling logic during emergencies, reducing latency.
GitHub stars n/a Velocity flat History 1 snapshot Scheduling Agents Apr 13 Code High viability
E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning Build Now
E2E-REME is an end-to-end auto-remediation model for microservices, trained via experience-simulation reinforcement fine-tuning, that generates executable playbooks from diagnosis reports.
GitHub stars n/a Velocity flat History 1 snapshot Microservices Auto-Remediation Apr 13 Code High viability
CoRe-ECG: Advancing Self-Supervised Representation Learning for 12-Lead ECG via Contrastive and Reconstructive Synergy Build Now
CoRe-ECG is a self-supervised learning framework for ECG analysis that combines contrastive and reconstructive methods with novel augmentation techniques to achieve state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13 Code High viability
Rethinking Token-Level Credit Assignment in RLVR: A Polarity-Entropy Analysis Build Now
A novel framework and optimization technique to improve LLM reasoning by addressing the credit assignment problem in RLVR through entropy analysis.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 13 Code High viability
Anthropogenic Regional Adaptation in Multimodal Vision-Language Model Build Now
A novel paradigm and method for adapting multimodal vision-language models to specific regional contexts while maintaining global generalization, showing significant gains in cultural relevance.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 13 Code High viability
ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval Build Now
A novel approach to reduce LLM inference memory by adaptively compressing reasoning thoughts and dynamically selecting KV cache details, achieving over 4x memory reduction.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 13 Code High viability
DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation Build Now
A novel framework for robust heterogeneous graph adaptation that preserves invariant knowledge across domains using decoupled information bottleneck and online distillation.
GitHub stars n/a Velocity flat History 1 snapshot Graph Adaptation Apr 13 Code High viability
From Redaction to Restoration: Deep Learning for Medical Image Anonymization and Reconstruction Build Now
An end-to-end deep learning pipeline that anonymizes medical images by redacting sensitive information and inpainting plausible anatomy, preserving downstream analysis utility.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13 Code High viability
A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Build Now
A framework and adaptive strategy to induce and dynamically route personas in LLMs, showing stable shifts in cognitive capabilities and outperforming static personas.
GitHub stars n/a Velocity flat History 1 snapshot LLM Persona Apr 13 Code High viability
Detecting Safety Violations Across Many Agent Traces Build Now
Meerkat uses clustering and agentic search to detect rare and complex safety violations across many agent traces, improving auditing scalability.
GitHub stars n/a Velocity flat History 1 snapshot Agent Safety Apr 13 Code High viability
Towards Automated Solar Panel Integrity: Hybrid Deep Feature Extraction for Advanced Surface Defect Identification Build Now
An automated system for solar panel defect detection using a hybrid approach of handcrafted and deep learning features, achieving 99.17% accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 13 Code High viability
A collaborative agent with two lightweight synergistic models for autonomous crystal materials research Build Now
A lightweight collaborative agent system for crystal materials research that significantly outperforms larger models and accelerates discovery.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science Agents Apr 13 Code High viability
Minimal Embodiment Enables Efficient Learning of Number Concepts in Robot Build Now
A robot learning system that uses minimal embodiment to achieve highly efficient acquisition of abstract number concepts, mirroring human cognitive development.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 13 Code High viability
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Build Now
General365 is a new benchmark designed to rigorously assess and improve the general reasoning capabilities of large language models across diverse, challenging tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 13 Code High viability
QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits Build Now
Presents QShield, a hybrid quantum-classical neural network architecture that enhances adversarial robustness of deep learning models by integrating quantum circuits for feature encoding.
GitHub stars n/a Velocity flat History 1 snapshot AI Security Apr 13 Code High viability
Panoptic Pairwise Distortion Graph Build Now
A novel approach to image assessment that uses distortion graphs to represent region-level degradation, outperforming current multimodal LLMs and offering a new direction for fine-grained image analysis.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 13 Code High viability
Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books Build Now
A framework that decouples reasoning from generation for improved character description accuracy in long-form narratives.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 13 Code High viability
Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method Build Now
A new benchmark and a two-stage reasoning framework address LLMs' struggles with conflicting knowledge from text and knowledge graphs, improving faithful reasoning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 13 Code High viability
DreamKG: A KG-Augmented Conversational System for People Experiencing Homelessness Build Now
A conversational AI system that uses knowledge graphs to provide reliable, location-aware information about community services for people experiencing homelessness.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Code High viability
GenTac: Generative Modeling and Forecasting of Soccer Tactics Build Now
GenTac is a diffusion-based generative framework that models and forecasts soccer tactics as a stochastic process of player trajectories and tactical events.
GitHub stars n/a Velocity flat History 1 snapshot Generative Sports Analytics Apr 13 Code High viability
ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding Build Now
A video QA benchmark for procedure-centric ultrasound understanding, enabling the development of AI systems for training, guidance, and robotic automation in medical procedures.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13 Code High viability
CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation Build Now
A novel reinforcement learning framework that disentangles optimization across table structure, style, and content to improve fidelity in table-to-LaTeX generation.
GitHub stars n/a Velocity flat History 1 snapshot Structured Generation Apr 13 Code High viability
Diffusion-CAM: Faithful Visual Explanations for dMLLMs Build Now
The first interpretability method specifically designed for diffusion multimodal large language models to provide faithful visual explanations.
GitHub stars n/a Velocity flat History 1 snapshot AI Interpretability Apr 13 Code High viability
NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks Build Now
Proactive Kubernetes autoscaling using deep reinforcement learning to optimize performance and cost efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Cloud Infrastructure Apr 13 Code High viability
Retinal Cyst Detection from Optical Coherence Tomography Images Build Now
An AI model that accurately detects and quantifies retinal cysts from OCT images, improving upon existing methods for early disease detection.
GitHub stars n/a Velocity flat History 1 snapshot Medical Imaging AI Apr 12 Code High viability
Designing Adaptive Digital Nudging Systems with LLM-Driven Reasoning Build Now
An LLM-driven architecture for digital nudging systems balances behavioral effectiveness with ethical compliance, validated by user studies.
GitHub stars n/a Velocity flat History 1 snapshot Adaptive Systems Apr 13 Code High viability
Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering Build Now
A novel framework that uses LLMs to distill intrinsic semantics from tabular data for improved clustering in finance and healthcare.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 13 Code High viability
FlowCoMotion: Text-to-Motion Generation via Token-Latent Flow Modeling Build Now
FlowCoMotion is a novel text-to-motion generation framework that unifies continuous and discrete motion representations using token-latent coupling for improved semantic alignment and high-fidelity motion details.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 13 Code High viability
RAG-KT: Cross-platform Explainable Knowledge Tracing with Multi-view Fusion Retrieval Generation Build Now
RAG-KT provides cross-platform explainable knowledge tracing using retrieval-augmented LLMs for improved accuracy and robustness.
GitHub stars n/a Velocity flat History 1 snapshot Educational AI Apr 13 Code High viability
ADD for Multi-Bit Image Watermarking Build Now
ADD is a fast and robust multi-bit image watermarking method that achieves 100% decoding accuracy; code is available.
GitHub stars n/a Velocity flat History 1 snapshot Image Watermarking Apr 13 Code High viability
Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo Build Now
A brain-inspired persona memory system for AI agents that achieves high accuracy and adversarial robustness, reducing hallucination.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13 Code High viability
Uncertainty-Aware Web-Conditioned Scientific Fact-Checking Build Now
A fact-checking system that selectively uses web search only when uncertain, providing traceable rationales and predictable costs for specialized domains.
GitHub stars n/a Velocity flat History 1 snapshot Fact Checking Apr 13 Code High viability
THEIA: Learning Complete Kleene Three-Valued Logic in a Pure-Neural Modular Architecture Build Now
THEIA is a modular neural architecture that learns complete Kleene three-valued logic end-to-end, demonstrating superior compositional generalization and faster training than Transformer baselines.
GitHub stars n/a Velocity flat History 1 snapshot Logic AI Apr 13 Code High viability
Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds Build Now
A cross-architecture study revealing shared emotion geometry in small LLMs and a layered decomposition of methodological confounds in representation analysis.
GitHub stars n/a Velocity flat History 1 snapshot LLM Representation Apr 13 Code High viability
A Compact and Efficient 1.251 Million Parameter Machine Learning CNN Model PD36-C for Plant Disease Detection: A Case Study Build Now
A compact and efficient CNN model for plant disease detection with a user-friendly desktop application for edge deployment.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13 Code High viability
Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning Build Now
A geometric methodology using a Topological Auditor to mitigate shortcut learning in deep neural networks by isolating gradient-monopolizing features, forcing higher geometric capacity for ethical representations.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics Apr 13 Code High viability
MeloTune: On-Device Arousal Learning and Peer-to-Peer Mood Coupling for Proactive Music Curation Build Now
An on-device iPhone music agent that proactively curates music based on individual user arousal and peer mood, with a learned personal arousal function and peer-to-peer mood coupling.
GitHub stars n/a Velocity flat History 1 snapshot Proactive Music Curation Apr 12 Code High viability
Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making Watch
This framework learns reward functions from clinical narratives to improve sequential treatment decision-making in healthcare, aligning with improved patient recovery outcomes.
GitHub stars n/a Velocity flat History 1 snapshot Healthcare AI Apr 12 Code
CocoaBench: Evaluating Unified Digital Agents in the Wild Watch
CocoaBench is a new benchmark and scaffold for evaluating unified LLM agents that combine vision, search, and coding capabilities on long-horizon tasks.
GitHub stars n/a Velocity flat History 1 snapshot Unified AI Agents Apr 13 Code
Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents Watch
A three-tier inference scaffolding pipeline that doubles the performance of small LLM agents on complex tasks without additional training.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 13 Code
One Scale at a Time: Scale-Autoregressive Modeling for Fluid Flow Distributions Watch
Scale-autoregressive modeling (SAR) offers a hierarchical approach to sampling fluid flow distributions, achieving higher accuracy and speed than diffusion models.
GitHub 1865 stars Velocity flat History 1 snapshot Generative Modeling Apr 13 Pending
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping Watch
A memory-enhanced dynamic reward shaping framework (MEDS) to improve reinforcement learning for LLMs by penalizing recurring error patterns.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 13 Code
CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning Watch
A coarse-to-fine multimodal framework for enhanced tabular reasoning using MLLMs and symbolic engines.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal Reasoning Apr 13 Code
Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment Watch
A framework for robots to learn selective forgetting of episodic memory through user interaction, reducing memory size and improving query time.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Memory Apr 13 Code
Emulating Non-Differentiable Metrics via Knowledge-Guided Learning: Introducing the Minkowski Image Loss Watch
A framework and loss function to bridge the differentiability gap for non-differentiable scientific metrics in Earth system modeling.
GitHub stars n/a Velocity flat History 1 snapshot Scientific AI Apr 13 Code
Task2vec Readiness: Diagnostics for Federated Learning from Pre-Training Embeddings Ignore
A diagnostic tool using pre-training embeddings to predict federated learning performance before training begins.
GitHub 1865 stars Velocity flat History 1 snapshot Federated Learning Diagnostics Apr 12 Pending
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing Ignore
ActorMind is a reasoning framework that enables AI to perform speech role-playing with personalized verbal traits by emulating human actor processes.
GitHub 1 stars Velocity flat History 1 snapshot Speech AI Apr 13 Pending
Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models Watch
Investigates how perceived user demographics influence LLM sycophancy, revealing differential validation rates and proposing identity-aware safety evaluations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 13 Code
OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems Watch
Adopts financial market constraints to make alignment of LLM-based multi-agent systems more robust.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning for Financial Systems Apr 13 Code
AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis Watch
A RAG-enhanced LLM framework that significantly improves the accuracy and reliability of toxicological adverse outcome pathway analysis by retrieving relevant knowledge.
GitHub stars n/a Velocity flat History 1 snapshot RAG for Toxicology Apr 13 High viability
Prosociality by Coupling, Not Mere Observation: Homeostatic Sharing in an Inspectable Recurrent Artificial Life Agent Ignore
This paper explores how to induce prosocial behavior in artificial agents through a novel 'homeostatic sharing' mechanism, demonstrating its effectiveness in simulated environments.
GitHub 2 stars Velocity flat History 1 snapshot Artificial Life Agents Apr 12 Pending
PAC-BENCH: Evaluating Multi-Agent Collaboration under Privacy Constraints Watch
A benchmark for evaluating multi-agent collaboration under privacy constraints, revealing significant performance degradation and coordination breakdowns.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 13 Code
Cost-optimal Sequential Testing via Doubly Robust Q-learning Watch
A doubly robust Q-learning framework for learning cost-optimal sequential testing policies from retrospective clinical data.
GitHub stars n/a Velocity flat History 1 snapshot Clinical Decision Support Apr 13 Code
Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction Watch
This paper probes BERT embeddings to test whether they encode narrative dimensions such as time, space, causality, and character in fiction; a linear probe reaches 94% accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Analysis Apr 12 Code
Resilient Write: A Six-Layer Durable Write Surface for LLM Coding Agents Watch
A robust file writing system for LLM coding agents that prevents data loss and improves self-correction through a multi-layered approach.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agent Tools Apr 12
CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space Watch
CLAY enables adaptive and multi-conditioned image retrieval by reframing VLM embedding spaces without additional training.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Apr 13 Code
Discourse Diversity in Multi-Turn Empathic Dialogue Ignore
A reinforcement learning framework to improve discourse move diversity in multi-turn empathic dialogue, addressing formulaic responses from LLMs.
GitHub 1 stars Velocity flat History 1 snapshot Empathic Dialogue Apr 13 Pending
Regional Explanations: Bridging Local and Global Variable Importance Watch
A new method for regional explanations that bridges local and global variable importance by segmenting the input space and applying attribution methods within regions.
GitHub stars n/a Velocity flat History 1 snapshot Explainable AI Apr 13 Code
CheeseBench: Evaluating Large Language Models on Rodent Behavioral Neuroscience Paradigms Watch
CheeseBench evaluates LLMs on rodent behavioral neuroscience paradigms, revealing current models lag behind animal baselines and are sensitive to interface parameters.
GitHub stars n/a Velocity flat History 1 snapshot LLM Benchmarking Apr 12 Code
NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment Ignore
A new benchmark for evaluating LLMs' ability to assess academic paper novelty, revealing current limitations in understanding scientific novelty.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 13 Code
SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models Ignore
A training-free method for pruning tokens in vision-language models using Singular Value Decomposition to improve efficiency without sacrificing performance.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 13 Pending
MADQRL: Distributed Quantum Reinforcement Learning Framework for Multi-Agent Environments Ignore
MADQRL is a distributed quantum reinforcement learning framework that enables independent agent learning to tackle high-dimensional, multi-agent environments.
GitHub stars n/a Velocity flat History 1 snapshot Quantum Reinforcement Learning Apr 13 Code
Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series Ignore
Optimizing Polish language modeling by developing a dedicated tokenizer for the Bielik v3 LLM series, improving efficiency and context window utilization.
GitHub stars n/a Velocity flat History 1 snapshot LLM Tokenizer Optimization Apr 12 Code
Your Model Diversity, Not Method, Determines Reasoning Strategy Ignore
Optimal LLM reasoning strategies depend on model diversity, requiring characterization before exploration.
GitHub 1865 stars Velocity flat History 1 snapshot LLM Reasoning Apr 12 Pending
Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net Watch
A budget-aware uncertainty framework built on nnU-Net for radiotherapy segmentation QA, guiding manual review by highlighting uncertain regions.
GitHub stars n/a Velocity flat History 1 snapshot Medical Imaging AI Apr 13
Inspectable AI for Science: A Research Object Approach to Generative AI Governance Ignore
This paper proposes AI as a Research Object (AI-RO) for governing generative AI in science, treating AI interactions as inspectable components of the research process with a focus on documentation and provenance.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance Apr 13 Code
Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning Watch
A symbiotic framework using reinforcement learning to actively curate context for LLM agents, improving performance and reducing token consumption on long-horizon tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 13
S$^3$: Structured Sparsity Specification Ignore
An algebraic framework for defining and implementing structured sparsity patterns in machine learning models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 13 Code
A Mechanistic Analysis of Looped Reasoning Language Models Ignore
A mechanistic analysis of looped reasoning language models, investigating how their internal dynamics differ from standard feedforward models.
GitHub 11 stars Velocity flat History 1 snapshot LLM Internals Apr 13 Pending
The Missing Knowledge Layer in Cognitive Architectures for AI Agents Ignore
This paper proposes a novel four-layer cognitive architecture for AI agents with distinct persistence semantics for knowledge, memory, wisdom, and intelligence, addressing a gap in current frameworks.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 13 Code
Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Using a Large Language Model Ignore
This study evaluates the consistency of AI-generated exercise prescriptions, finding high semantic consistency but variability in quantitative components, suggesting a need for prompt structure and expert validation before clinical deployment.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13 Code
Governance by Design: A Parsonian Institutional Architecture for Internet-Wide Agent Societies Ignore
This paper proposes a Parsonian institutional architecture for governing internet-wide agent societies, identifying significant governance gaps in existing ecosystems like OpenClaw.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 13 Code
EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation Ignore
A closed-loop multi-agent reinforcement learning framework for robust medium-horizon equity allocation, integrating advanced techniques for improved performance and generalization.
GitHub stars n/a Velocity flat History 1 snapshot Finance AI Apr 13
Ambiguity Detection and Elimination in Automated Executable Process Modeling Ignore
A framework to detect and eliminate ambiguity in LLM-generated executable process models by analyzing behavioral inconsistency and repairing source text.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 13
Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models Ignore
This study investigates how evolving LLM backbones impact Vision-Language Model performance across different downstream tasks.
GitHub stars n/a Velocity flat History 1 snapshot Vision Language Models Apr 13
Network Effects and Agreement Drift in LLM Debates Ignore
This paper investigates agreement drift in LLM debates to understand the reliability of LLM simulations for social systems.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 13 Code
From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning Ignore
A framework for trustworthy clinical diagnostic reasoning using Toulmin-guided curriculum learning to generate explicit diagnostic arguments.
GitHub stars n/a Velocity flat History 1 snapshot Trustworthy AI Apr 13
Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure Ignore
OIDA is a framework that structures organizational knowledge with epistemic properties to improve AI decision-making and identify knowledge gaps.
GitHub stars n/a Velocity flat History 1 snapshot Organizational AI Apr 13
When Verification Fails: How Compositionally Infeasible Claims Escape Rejection Ignore
This paper investigates how current models fail at compositional claim verification, identifying a shortcut reasoning issue.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 13 Code
Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval Ignore
This paper systematically evaluates graph-based and agentic retrieval methods for cyber threat intelligence analysis, showing hybrid approaches improve performance on complex queries.
GitHub stars n/a Velocity flat History 1 snapshot Cybersecurity AI Apr 13
Continuous-time Online Learning via Mean-Field Neural Networks: Regret Analysis in Diffusion Environments Ignore
Develops a theoretical framework for continuous-time online learning in diffusion environments using mean-field neural networks, providing regret bounds and simulation insights.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 13 Code
RTMC: Step-Level Credit Assignment via Rollout Trees Ignore
RTMC is a novel advantage estimation method for multi-step agentic reinforcement learning that aggregates return statistics across rollouts to improve credit assignment without a learned critic.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 13
Beyond LLMs, Sparse Distributed Memory, and Neuromorphics <A Hyper-Dimensional SRAM-CAM "VaCoAl" for Ultra-High Speed, Ultra-Low Power, and Low Cost> Ignore
A novel hyperdimensional computing architecture offers a new paradigm for AI, addressing limitations of LLMs with ultra-high speed, low power, and reversible multi-hop reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Hyperdimensional Computing Apr 13
A Triadic Suffix Tokenization Scheme for Numerical Reasoning Ignore
A novel tokenization scheme for LLMs to improve numerical reasoning by explicitly encoding magnitude.
GitHub stars n/a Velocity flat History 1 snapshot LLM Tokenization Apr 13 Code
ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks Ignore
A framework for evaluating LLM continuity that highlights the limitations of existing benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 13 Code
A Proposed Biomedical Data Policy Framework to Reduce Fragmentation, Improve Quality, and Incentivize Sharing in Indian Healthcare in the era of Artificial Intelligence and Digital Health Ignore
A proposed framework to reduce fragmentation, improve quality, and incentivize sharing of Indian biomedical data for AI and digital health initiatives.
GitHub stars n/a Velocity flat History 1 snapshot Biomedical Data Policy Apr 13 Code
NetworkNet: A Deep Neural Network Approach for Random Networks with Sparse Nodal Attributes and Complex Nodal Heterogeneity Ignore
A deep neural network approach for modeling nodal heterogeneity and selecting influential attributes in random networks.
GitHub stars n/a Velocity flat History 1 snapshot Network Analysis Apr 13 Code
Quantization Dominates Rank Reduction for KV-Cache Compression Ignore
Quantization significantly outperforms rank reduction for KV-cache compression in transformer inference, retaining high accuracy while substantially reducing cache size.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 13
When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies Ignore
Investigating the effectiveness of LLM-generated features for RL trading agents, revealing a gap between feature validity and policy robustness under distribution shifts.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 13
CASK: Core-Aware Selective KV Compression for Reasoning Traces Ignore
CASK is a KV cache compression method for LLMs that preserves reasoning traces by partitioning into a core and mergeable scratch, outperforming existing methods at matched budgets.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 13
Brief2Design: A Multi-phased, Compositional Approach to Prompt-based Graphic Design Ignore
A graphic design tool that supports professional designers by structuring ambiguous client briefs into visual elements.
GitHub stars n/a Velocity flat History 1 snapshot Generative Design Tools Apr 13
SCNO: Spiking Compositional Neural Operator -- Towards a Neuromorphic Foundation Model for Nuclear PDE Solving Ignore
A modular, spiking neural operator architecture for solving coupled PDEs, offering compositional expansion and reduced parameter count.
GitHub stars n/a Velocity flat History 1 snapshot Scientific AI Apr 13
Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation Ignore
An efficient KernelSHAP framework for 3D medical image segmentation that reduces computation and improves interpretability of explanations.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 13
FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning Ignore
An LLM-based framework for automated compositional reasoning and bug detection in large-scale software systems.
GitHub stars n/a Velocity flat History 1 snapshot Software Engineering Apr 13
Product Review Based on Optimized Facial Expression Detection Ignore
A faster, more accurate facial expression recognition method for product reviews, obtained by optimizing feature extraction.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 13
SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering Ignore
An open-source multi-agent application framework for general-purpose personal AI agents, focusing on harness engineering.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13
PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk Ignore
The PRISM framework defines hierarchy-based 'red lines' for AI behavioral risk, offering an anticipatory and measurable approach to AI safety.
GitHub stars n/a Velocity flat History 1 snapshot AI Safety Apr 13 Code
Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models Ignore
This research empirically maps the 'Authority Stack' of 8 AI models across 366,120 forced-choice responses to understand their value priorities, evidence preferences, and source trust hierarchies.
GitHub stars n/a Velocity flat History 1 snapshot AI System Analysis Apr 13 Code
Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems Ignore
A reactor-model-of-computation approach using the Lingua Franca framework to address nondeterminism in agentic AI-powered human-in-the-loop cyber-physical systems, demonstrated with an agentic driving coach.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 13
ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values Ignore
A novel Shapley value method for explaining prediction shifts in machine learning models by attributing them to changes in interpretable data subgroups.
GitHub stars n/a Velocity flat History 1 snapshot Model Interpretability Apr 13
Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs Ignore
This research shows that single-method LLM safety evaluations are incomplete: prompting and activation steering expose different vulnerability profiles, so more comprehensive testing methodologies are needed.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 13
3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS Ignore
A system for robotic manipulation that uses Monte Carlo Tree Search with a 3D-consistent world model for improved spatial memory and replanning.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 13
Why Do Large Language Models Generate Harmful Content? Ignore
Identifies specific model layers and neurons responsible for harmful content generation in LLMs, providing insights for mitigation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 13
Environmental Footprint of GenAI Research: Insights from the Moshi Foundation Model Ignore
An analysis of the environmental footprint of GenAI research, focusing on the compute and resource consumption of foundation model development.
GitHub 1865 stars Velocity flat History 1 snapshot AI Sustainability Apr 13 Pending
Lung Cancer Detection Using Deep Learning Ignore
This paper explores deep learning algorithms like InceptionV3, MobileNetV2, VGG16, and ResNet152 for lung cancer detection, proposing a 16-layer CNN model.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 12
Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems Ignore
A thermodynamically consistent neural network for ultra-lightweight, zero-lag solar irradiance forecasting in off-grid systems.
GitHub stars n/a Velocity flat History 1 snapshot Forecasting AI Apr 13
Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape Ignore
A survey analyzing the widespread adoption and implications of Generative AI in software engineering research practices.
GitHub stars n/a Velocity flat History 1 snapshot AI in Software Engineering Research Apr 13
Compliant But Unsatisfactory: The Gap Between Auditing Standards and Practices for Probabilistic Genotyping Software Ignore
Examines the disconnect between audit standards and actual practices for probabilistic genotyping software, highlighting how poorly designed standards can mask inadequate systems.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance Apr 13
LLMs for Qualitative Data Analysis Fail on Security-specific Comments in Human Experiments Ignore
LLMs struggle to reliably identify security-specific comments in human experiment data, failing to replace human annotators.
GitHub stars n/a Velocity flat History 1 snapshot LLM Application Apr 12
Reasoning as Data: Representation-Computation Unity and Its Implementation in a Domain-Algebraic Inference Engine Ignore
A novel symbolic engine unifies knowledge representation and computation by embedding domain context directly into data structures, enabling domain-scoped inference without external rules.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Representation Apr 13
SLALOM: Simulation Lifecycle Analysis via Longitudinal Observation Metrics for Social Simulation Ignore
A framework for validating LLM agent simulations by assessing the sociological plausibility of their trajectories, not just final outcomes.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 13
A molecular clock for writing systems reveals the quantitative impact of imperial power on cultural evolution Ignore
Analyzes the evolution of writing systems using a molecular clock approach, revealing the quantitative impact of imperial power on cultural change and script extinction.
GitHub stars n/a Velocity flat History 1 snapshot Cultural Evolution Apr 13
Evolving Many Worlds: Towards Open-Ended Discovery in Petri Dish NCA via Population-Based Training Ignore
A meta-evolutionary algorithm that evolves Neural Cellular Automata to generate sustained, open-ended complexity and emergent lifelike phenomena.
GitHub stars n/a Velocity flat History 1 snapshot Artificial Life Apr 13
Query Lower Bounds for Diffusion Sampling Ignore
Establishes theoretical lower bounds for score query acceleration in diffusion model sampling.
GitHub stars n/a Velocity flat History 1 snapshot Generative Models Apr 12
Lectures on AI for Mathematics Ignore
A book introducing the principles and applications of AI for advancing mathematical research, pattern discovery, and theorem proving.
GitHub stars n/a Velocity flat History 1 snapshot AI for Mathematics Apr 13
Enabling and Inhibitory Pathways of Students' AI Use Concealment Intention in Higher Education: Evidence from SEM and fsQCA Ignore
Investigating student AI use concealment intentions through cognitive, affective, and conative pathways.
GitHub stars n/a Velocity flat History 1 snapshot Educational AI Apr 13 Code
From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent Execution Ignore
A theoretical framework for structuring LLM agent execution using graph-based methods to improve controllability and verifiability.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 13
Optimal Stability of KL Divergence under Gaussian Perturbations Ignore
A theoretical framework for understanding KL divergence stability under Gaussian perturbations for arbitrary distributions.
GitHub stars n/a Velocity flat History 1 snapshot Theoretical ML Apr 13
Speaking to No One: Ontological Dissonance and the Double Bind of Conversational AI Ignore
Conversational AI can induce delusional experiences through ontological dissonance and communicative double binds, impacting user mental well-being.
GitHub stars n/a Velocity flat History 1 snapshot Conversational AI Apr 12
Layerwise Dynamics for In-Context Classification in Transformers Ignore
This paper explores the internal dynamics of in-context classification within Transformers, identifying an emergent, geometry-driven update rule.
GitHub stars n/a Velocity flat History 1 snapshot Transformers Apr 13
Limited Perfect Monotonical Surrogates constructed using low-cost recursive linkage discovery with guaranteed output Ignore
A parameterless surrogate model that can be trained on the fly to enable efficient optimization of complex problems by comparing solutions.
GitHub stars n/a Velocity flat History 1 snapshot Optimization Apr 13
Harnessing Photonics for Machine Intelligence Ignore
A review of integrated photonics for AI acceleration, focusing on cross-layer co-design and electronic-photonic design automation.
GitHub stars n/a Velocity flat History 1 snapshot AI Hardware Acceleration Apr 12
On the Complexity of the Discussion-based Semantics in Abstract Argumentation Ignore
This paper analyzes the complexity of discussion-based semantics in argumentation theory, reducing it to automata equivalence for polynomial-time decidability.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory Apr 13
Minimizing classical resources in variational measurement-based quantum computation for generative modeling Ignore
A restricted variational measurement-based quantum computation model that uses fewer parameters for generative modeling.
GitHub stars n/a Velocity flat History 1 snapshot Quantum Generative Modeling Apr 13
Endogenous Information in Routing Games: Memory-Constrained Equilibria, Recall Braess Paradoxes, and Memory Design Ignore
This paper explores theoretical models of routing games where travelers' route choices are influenced by memory and recall mechanisms, introducing new equilibrium concepts and paradoxes.
GitHub stars n/a Velocity flat History 1 snapshot Game Theory / Optimization Apr 13
A Quantitative Definition of Intelligence Ignore
Proposes a quantitative definition of intelligence based on intelligence density, aiming to provide a substrate-independent continuum from logic gates to brains.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory Apr 13
Examining EAP Students' AI Disclosure Intention: A Cognition-Affect-Conation Perspective Ignore
Examining the psychological factors influencing EAP students' intention to disclose AI use in academic writing, highlighting the importance of institutional policies and pedagogical environments.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics Apr 13
AI Integrity: A New Paradigm for Verifiable AI Governance Ignore
AI Integrity proposes a new paradigm for verifiable AI governance by focusing on the auditable reasoning process rather than just outcomes.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance Apr 13