Gym-Anything: Turn any Software into an Agent Environment Build Now
Turn any software application into an interactive agent environment, enabling autonomous system training and evaluation.
GitHub stars n/a Velocity flat History 1 snapshot AI Deployment Apr 7 Code High viability
ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments Build Now
A benchmark for agent evaluation with scalable horizons and controllable difficulty in lightweight environments, addressing limitations of existing benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 7 Code High viability
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains Build Now
Flowr automates end-to-end retail supply chain operations for large supermarket chains using agentic AI.
GitHub stars n/a Velocity flat History 1 snapshot Retail Automation Apr 7 Code High viability
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning Build Now
A framework for synthesizing scientific graphics from text descriptions, featuring a large-scale dataset, a comprehensive benchmark, and a novel reinforcement learning approach that outperforms leading LLMs.
GitHub stars n/a Velocity flat History 1 snapshot Generative Graphics Apr 7 Code High viability
Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery Build Now
Lightweight adaptation of vision-language models for species recognition and habitat interpretation using drone thermal imagery.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 7 Code High viability
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Build Now
Claw-Eval is a trustworthy evaluation suite for autonomous agents that provides trajectory-aware grading along with safety and robustness assessment.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Agents Apr 7 Code High viability
Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation Build Now
Compiled AI generates deterministic code from LLMs for reliable and cost-efficient enterprise workflow automation, especially in healthcare.
GitHub stars n/a Velocity flat History 1 snapshot LLM Workflow Automation Apr 6 Code High viability
PCA-Driven Adaptive Sensor Triage for Edge AI Inference Build Now
PCA-Triage is an unsupervised, parameter-free algorithm for adaptive sensor sampling on edge devices, significantly reducing bandwidth while maintaining high inference accuracy.
GitHub stars n/a Velocity flat History pending Edge AI Apr 6 Code High viability
ID-Sim: An Identity-Focused Similarity Metric Build Now
ID-Sim is a novel feed-forward metric that accurately reflects human selective sensitivity to identities, accelerating progress in personalized image generation and identity-focused tasks.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 6 Code High viability
MedGemma 1.5 Technical Report Build Now
Introduces MedGemma 1.5, a multimodal foundation model for medical AI, integrating imaging, EHRs, and clinical reasoning with significant performance gains.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Build Now
ClawsBench provides a realistic, stateful benchmark for evaluating LLM productivity agents across multiple services, revealing significant capability and safety gaps.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 6 Code High viability
Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems Build Now
A domain-adapted, instruction-tuned LLM framework for parsing and mining unstructured HPC logs, achieving state-of-the-art accuracy with a locally deployable, energy-efficient approach.
GitHub stars n/a Velocity flat History pending LLM Training Apr 6 Code High viability
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG Build Now
VideoStir is a structured, intent-aware RAG framework for understanding long videos by leveraging spatio-temporal graphs and an MLLM-backed relevance scorer.
GitHub stars n/a Velocity flat History pending Long Video Understanding Apr 7 Code High viability
3DTurboQuant: Training-Free Near-Optimal Quantization for 3D Reconstruction Models Build Now
A training-free method to compress 3D reconstruction models like 3DGS and NeRF by up to 7.9x with minimal fidelity loss, enabling faster deployment and reduced storage.
GitHub stars n/a Velocity flat History pending 3D Reconstruction Compression Apr 7 Pending High viability
Unifying VLM-Guided Flow Matching and Spectral Anomaly Detection for Interpretable Veterinary Diagnosis Build Now
A novel veterinary diagnostic system uses Vision-Language Models to guide Flow Matching for precise localization and Random Matrix Theory for interpretable anomaly detection in canine pneumothorax.
GitHub 0 stars Velocity flat History 1 snapshot Medical AI Apr 7 Pending High viability
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations Build Now
An open-source, governance-aware AI platform for unified security operations that consolidates incident context, orchestrates agents, generates security rules, reconstructs attacks, and enforces AI governance policies.
GitHub stars n/a Velocity flat History pending Security AI Apr 7 Code High viability
SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation Build Now
A multi-agent framework that refines text prompts for complex text-to-video scenarios, improving alignment and generation quality with a new benchmark.
GitHub stars n/a Velocity flat History pending Text-to-Video Generation Apr 7 Code High viability
COSMO-Agent: Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration Build Now
An LLM-powered agent that orchestrates CAD-CAE tools to automate industrial design optimization, validated on an industry-aligned dataset.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms Build Now
MCPSHIELD is a formal security framework for MCP-based AI agents, providing a threat taxonomy, verification models, and an integrated defense architecture.
GitHub stars n/a Velocity flat History pending Agent Security Apr 7 Code High viability
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis Build Now
Epistemic blinding is an inference-time protocol that improves auditability in LLM-assisted drug target prioritization by auditing prior contamination in model outputs.
GitHub stars n/a Velocity flat History pending Drug Target Prioritization Apr 7 Pending High viability
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition Build Now
Market-Bench is a benchmark for evaluating LLMs in economic and trade competition, revealing significant performance disparities in multi-agent supply chain scenarios.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code High viability
LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo Build Now
A benchmark and simulator to evaluate LLM strategic decision-making in complex board games, revealing vulnerabilities in prompt sensitivity and strategic depth.
GitHub stars n/a Velocity flat History pending LLM Strategic Reasoning Apr 7 Code High viability
OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward Build Now
A unified framework for diagram code generation that uses visual feedback to train models without manual code annotation, establishing a new state-of-the-art.
GitHub stars n/a Velocity flat History pending Generative Diagram Code Apr 7 Code High viability
CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments Build Now
CritBench evaluates LLM cybersecurity capabilities in IEC 61850 environments, addressing critical gaps in existing frameworks.
GitHub stars n/a Velocity flat History pending Cybersecurity Apr 7 Pending High viability
Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models Build Now
Enhancing diffusion multimodal language models with penalties and guidance to improve visual grounding and reasoning accuracy while accelerating inference.
GitHub stars n/a Velocity flat History pending Multimodal LLMs Apr 7 Code High viability
Vision-Guided Iterative Refinement for Frontend Code Generation Build Now
A vision-guided iterative refinement framework for frontend code generation, using a vision-language model as an automated critic.
GitHub stars n/a Velocity flat History 1 snapshot Code Generation Apr 7 Code High viability
DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models Build Now
DiffHDR transforms LDR videos into HDR by leveraging advanced video diffusion models for superior visual quality.
GitHub stars n/a Velocity flat History 1 snapshot AI Video Enhancement Apr 7 Code High viability
An AI Teaching Assistant for Motion Picture Engineering Build Now
An AI-powered teaching assistant for film and motion picture engineering courses.
GitHub stars n/a Velocity flat History 1 snapshot AI in Education Apr 6 Code High viability
A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models Build Now
A multi-stage validation framework enables trustworthy large-scale clinical information extraction using LLMs without annotation-intensive evaluation, demonstrating feasibility for real-world deployment.
GitHub stars n/a Velocity flat History 1 snapshot Clinical AI Apr 7 Code High viability
MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control Build Now
Enhance multimodal applications with reasoning-enabled embeddings that outperform existing models.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 7 Code High viability
Watch Before You Answer: Learning from Visually Grounded Post-Training Build Now
Enhance video understanding models by leveraging visually grounded post-training to surpass state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot AI/ML Apr 6 Code High viability
Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation Build Now
Develops deterministic learned metrics to replace costly and inconsistent LLM-based text evaluation, offering a scalable and reproducible alternative.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 6 Code High viability
IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents Build Now
IntentScore evaluates and ranks actions for computer-use agents, improving task success rates by learning from offline GUI interactions.
GitHub stars n/a Velocity flat History pending AI Agents Apr 6 Code High viability
PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing Build Now
A multi-agent framework that automates the writing of AI research papers from raw materials, including literature synthesis and visuals.
GitHub stars n/a Velocity flat History pending AI Research Automation Apr 6 Code High viability
Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner Build Now
A scalable Decision Pre-Trained Transformer trained with Flow Matching achieves strong generalization in multi-domain in-context reinforcement learning, offering a viable alternative to expert distillation for generalist agents.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Agents Apr 6 Code High viability
Planning to Explore: Curiosity-Driven Planning for LLM Test Generation Build Now
Curiosity-driven planning for LLMs enhances test generation by prioritizing exploration of program branches for deeper code coverage.
GitHub stars n/a Velocity flat History pending LLM Test Generation Apr 6 Code High viability
This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA Build Now
This research evaluates LLM sensitivity to patient question framing in medical QA, highlighting a critical need for robustness to phrasing in high-stakes applications.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
EffiPair: Improving the Efficiency of LLM-generated Code with Relative Contrastive Feedback Build Now
EffiPair is an inference-time framework that uses relative contrastive feedback to significantly improve the runtime and memory efficiency of LLM-generated code without model fine-tuning.
GitHub stars n/a Velocity flat History pending LLM Code Optimization Apr 6 Code High viability
Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series Build Now
A dynamic model for generating realistic synthetic multivariate time series with time-varying correlations to improve foundation model training.
GitHub stars n/a Velocity flat History pending Synthetic Data Generation Apr 6 Code High viability
XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts Build Now
XMark is a novel multi-bit watermarking method for LLM-generated text that reliably embeds messages with high decoding accuracy and preserves text quality, even with limited tokens.
GitHub stars n/a Velocity flat History pending LLM Security Apr 6 Pending High viability
Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors Build Now
A CSI-free hierarchical MARL system for reconfigurable reflectors that achieves significant RSSI improvements and robust multi-user scalability for intelligent wireless environments.
GitHub stars n/a Velocity flat History pending Wireless Networks Apr 6 Code High viability
Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI Build Now
NeuroQuant is a modality-aware VQ-VAE for multimodal brain MRI reconstruction, capturing anatomical structures and appearance for improved generative modeling and analysis.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion Models Build Now
A training-free method to fuse orthogonal style and concept adapters for diffusion models, enabling combined feature generation.
GitHub stars n/a Velocity flat History pending Generative AI Apr 6 Pending High viability
LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows Build Now
A novel 3D reconstruction model that significantly improves fine-grained texture and appearance recovery by scaling transformer context windows, outperforming state-of-the-art methods.
GitHub stars n/a Velocity flat History pending 3D Reconstruction Apr 6 Code High viability
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains Build Now
A framework for democratizing robotic evaluation by enabling natural language task authoring and crowd-sourced contribution.
GitHub stars n/a Velocity flat History pending Robotics Apr 6 Code High viability
EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics Networks Build Now
EAGLE is a hybrid deep learning framework that proactively predicts delivery delays in logistics networks by jointly modeling temporal dynamics and graph-based dependencies.
GitHub stars n/a Velocity flat History pending Logistics Prediction Apr 6 Code High viability
Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models Build Now
Leveraging LLMs with RAG and summarization to significantly improve clinical trial patient screening efficiency.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning Build Now
Turn-Adaptive Budgets (TAB) is a novel policy for LLMs that optimizes token allocation in multi-turn reasoning to improve accuracy while significantly reducing compute costs.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 6 Code High viability
$π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models Build Now
A pipeline for curating reasoning data from structured sources to enhance LLM long-context reasoning capabilities, with open-source code and data.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 6 Pending High viability
Attribution Bias in Large Language Models Build Now
A new benchmark dataset and evaluation framework to identify and mitigate attribution bias in large language models.
GitHub stars n/a Velocity flat History pending LLM Fairness Apr 6 Code High viability
Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis Build Now
A framework for sequential clinical diagnosis that uses uncertainty-guided latent trajectory learning to improve accuracy and reduce diagnostic tests.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems Build Now
A framework for building multi-agent systems to solve complex chemistry retrosynthesis planning problems with improved safety and cost metrics.
GitHub stars n/a Velocity flat History pending Agents Apr 6 Code High viability
Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning Build Now
This research analyzes how reasoning evolves in language models for chess through fine-tuning and reinforcement learning, demonstrating improved performance and faithful reasoning with released checkpoints and code.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 6 Code High viability
PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection Build Now
PRISM-MCTS enhances reasoning models by learning from trajectories with metacognitive reflection, reducing computational redundancy and improving efficiency.
GitHub stars n/a Velocity flat History pending Reasoning Models Apr 7 Code High viability
Auditable Agents Build Now
A framework for auditable LLM agents that ensures accountability through a multi-dimensional approach to tracking and recovering agent actions.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code High viability
Bridging Natural Language and Microgrid Dynamics: A Context-Aware Simulator and Dataset Build Now
OpenCEM is an open-source simulator and dataset that integrates contextual information with renewable energy dynamics for intelligent energy management.
GitHub stars n/a Velocity flat History pending Energy Management Apr 7 Code High viability
MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library Build Now
A multi-agent system using LLMs and RAG with an experience library for explainable, self-improving IoT network intrusion detection.
GitHub stars n/a Velocity flat History pending IoT Security Apr 7 Code High viability
Evaluation of Randomization through Style Transfer for Enhanced Domain Generalization Build Now
A lightweight, model-agnostic style transfer augmentation recipe that significantly improves computer vision model generalization from synthetic to real-world data.
GitHub stars n/a Velocity flat History pending Computer Vision Domain Generalization Apr 7 Code High viability
EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding Build Now
EEG-MFTNet is an enhanced EEGNet architecture that adds multi-scale temporal convolutions and Transformer fusion for robust cross-session motor imagery decoding.
GitHub stars n/a Velocity flat History pending BCI Apr 7 Code High viability
Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning Build Now
A novel Multi-Round Value Factorization framework that breaks suboptimal stable points in multi-agent reinforcement learning by iteratively filtering inferior actions.
GitHub stars n/a Velocity flat History pending Multi-Agent Reinforcement Learning Apr 7 Code High viability
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition Build Now
A reward decomposition approach that disentangles sycophancy in language models by separating pressure capitulation from evidence blindness, leading to more robust and factual responses.
GitHub stars n/a Velocity flat History pending LLM Alignment Apr 7 Code High viability
From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs Build Now
RETINA-SAFE and ECRT offer a practical, interpretable solution for triaging hallucination risks in medical LLMs by grounding detection in retinal evidence and classifying risk types.
GitHub stars n/a Velocity flat History pending Medical AI Safety Apr 7 Code High viability
Learning What Matters: Dynamic Dimension Selection and Aggregation for Interpretable Vision-Language Reward Modeling Build Now
A framework for interpretable vision-language reward modeling that dynamically decomposes evaluation into granular, weighted dimensions.
GitHub stars n/a Velocity flat History pending Vision-Language Models Apr 7 Code High viability
ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning Build Now
This research introduces a novel reward mechanism (ETR) to significantly improve the efficiency and accuracy of large language model chain-of-thought reasoning by guiding uncertainty reduction.
GitHub stars n/a Velocity flat History pending LLM Reasoning Optimization Apr 7 Pending High viability
Automated Auditing of Hospital Discharge Summaries for Care Transitions Build Now
An automated framework using locally deployed LLMs to audit hospital discharge summaries for care transitions, identifying key documentation elements to improve patient safety and reduce readmissions.
GitHub stars n/a Velocity flat History pending Healthcare AI Apr 7 Code High viability
Towards Effective In-context Cross-domain Knowledge Transfer via Domain-invariant-neurons-based Retrieval Build Now
A retrieval method that boosts LLM reasoning by finding structurally compatible cross-domain demonstrations using domain-invariant neurons.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 7 Pending High viability
Human Interaction-Aware 3D Reconstruction from a Single Image Build Now
A holistic method for reconstructing textured 3D human models from a single image, explicitly modeling group-level context and interaction priors to resolve occlusions and generate physically plausible, high-fidelity reconstructions of interacting people.
GitHub stars n/a Velocity flat History pending 3D Reconstruction Apr 7 Code High viability
Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking Build Now
Region-R1 enhances multi-modal retrieval by intelligently cropping query images to focus on relevant regions, significantly improving performance on challenging benchmarks.
GitHub stars n/a Velocity flat History pending Multi-Modal Retrieval Apr 7 Code High viability
LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering Build Now
LLM4CodeRE is a domain-adaptive LLM framework for bidirectional code reverse engineering, outperforming existing tools in malware analysis.
GitHub stars n/a Velocity flat History pending Code Decompilation Apr 7 Code High viability
CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration Build Now
CRFT is a transformer-based framework for cross-modal image registration, offering robust alignment for applications in remote sensing, autonomous navigation, and medical imaging.
GitHub stars n/a Velocity flat History pending Image Registration Apr 7 Pending High viability
CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control Build Now
An LLM-centered framework for traffic signal control that uses RL-guided data curation and multi-LLM deliberation to outperform state-of-the-art methods.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue Build Now
Context-Agent models multi-turn dialogue history as a dynamic tree structure to improve LLM coherence and efficiency in non-linear conversations, supported by a new benchmark.
GitHub stars n/a Velocity flat History pending Dialogue Systems Apr 7 Code High viability
HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference Build Now
HybridKV is a novel framework that significantly compresses multimodal LLM KV caches, reducing memory usage and latency by up to 7.9x and 1.52x respectively, with minimal performance impact.
GitHub stars n/a Velocity flat History pending LLM Inference Optimization Apr 7 Code High viability
Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors Build Now
Graph-PiT enhances part-based image synthesis by using graph priors to model structural dependencies between visual components, improving coherence and controllability.
GitHub stars n/a Velocity flat History pending Generative Image Apr 7 Pending High viability
Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring Build Now
An autonomous framework for 24/7 deep learning experimentation with zero-cost monitoring and efficient memory management.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Pending High viability
Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills Build Now
A retrieval layer for large agent skill libraries that constructs an executable skill graph and retrieves dependency-aware skill bundles at inference time.
GitHub stars n/a Velocity flat History pending Agent Skills Apr 7 Pending High viability
Analogical Reasoning as a Doctor: A Foundation Model for Gastrointestinal Endoscopy Diagnosis Build Now
A foundation model for gastrointestinal endoscopy diagnosis that uses analogical reasoning to improve generalization and adaptability across diverse datasets and disease types.
GitHub stars n/a Velocity flat History pending Medical AI Apr 7 Code High viability
Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution Build Now
A benchmark and LLM-based verifier to evaluate code patch quality beyond test pass rates, revealing significant design constraint violations in current AI agents.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models Build Now
A diagnostic framework to uncover 'surface compliance' in LLM knowledge editing, revealing when models mimic changes without true belief modification.
GitHub stars n/a Velocity flat History pending LLM Editing Apr 7 Pending High viability
ActivityEditor: Learning to Synthesize Physically Valid Human Mobility Build Now
ActivityEditor is a dual-LLM-agent framework for zero-shot cross-regional human mobility trajectory generation, ensuring physical validity and statistical fidelity.
GitHub stars n/a Velocity flat History pending Human Mobility Simulation Apr 7 Code High viability
UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning Build Now
UniCreative is a unified reference-free reinforcement learning framework that reconciles long-form narrative coherence with short-form creative expression.
GitHub stars n/a Velocity flat History pending Creative Writing AI Apr 7 Code High viability
Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation Build Now
A framework for visual navigation that significantly reduces integration steps for diffusion-based policies, enabling real-time robotic control.
GitHub stars n/a Velocity flat History pending Embodied AI Navigation Apr 7 Pending High viability
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer Build Now
A novel linear-time attention replacement that significantly reduces computational cost for long sequences across diverse AI domains.
GitHub stars n/a Velocity flat History pending LLM Architecture Apr 7 Pending High viability
Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment Build Now
Pareto-Lenient Consensus (PLC) is a game-theoretic framework for LLM alignment that allows negotiation-driven exploration of the Pareto-optimal frontier.
GitHub stars n/a Velocity flat History pending LLM Alignment Apr 7 Code High viability
Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion Build Now
A framework to efficiently convert existing large language models to use new attention architectures without retraining, significantly improving inference speed.
GitHub stars n/a Velocity flat History pending LLM Inference Optimization Apr 7 Code High viability
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents Build Now
STEP-HRL is a hierarchical reinforcement learning framework for LLM agents that enables step-level learning, reducing computational cost and improving performance.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Pending High viability
OntoTKGE: Ontology-Enhanced Temporal Knowledge Graph Extrapolation Build Now
OntoTKGE is a novel framework that enhances temporal knowledge graph extrapolation by integrating ontological knowledge to address entity sparsity and improve prediction performance.
GitHub stars n/a Velocity flat History pending Knowledge Graph Embeddings Apr 7 Code High viability
In-Place Test-Time Training Build Now
A drop-in framework for Large Language Models that enables test-time training for dynamic adaptation to new information without costly retraining.
GitHub stars n/a Velocity flat History pending LLM Adaptation Apr 7 Pending High viability
Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning Build Now
A framework that enhances zero-shot generalization in visual unsupervised reinforcement learning.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 7 Code High viability
JTON: A Token-Efficient JSON Superset with Zen Grid Tabular Encoding for Large Language Models Build Now
JTON is a token-efficient JSON superset that reduces LLM processing costs for structured data by up to 60% with a novel tabular encoding.
GitHub stars n/a Velocity flat History pending LLM Data Serialization Apr 7 Pending High viability
LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces Build Now
LAG-XAI is a geometric framework for interpretable paraphrasing in Transformers, enabling efficient LLM hallucination detection and mechanistic understanding.
GitHub stars n/a Velocity flat History pending Interpretable AI Apr 7 Code High viability
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning Build Now
A novel algorithm detects untrustworthy topic boundaries in black-box LLMs using knowledge graphs and multi-agent reinforcement learning, with a new dataset released for popular LLMs.
GitHub stars n/a Velocity flat History pending LLM Safety & Alignment Apr 7 Code High viability
Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization Build Now
A synthetic data generation pipeline for doctor-patient conversations to train and evaluate long-form audio summarization models.
GitHub stars n/a Velocity flat History pending Medical AI Apr 7 Code High viability
SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT Build Now
SemLink is a semantic-aware automated test oracle that verifies hyperlink integrity 47.5x faster than LLMs, ensuring web content consistency for robust quality assurance.
GitHub stars n/a Velocity flat History pending Web Quality Assurance Apr 7 Code High viability
Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening Build Now
A framework for language-guided pulmonary screening that combines LLMs and vision models with graph reasoning and selective fine-tuning to improve accuracy and stability.
GitHub stars n/a Velocity flat History pending Medical AI Apr 7 Code High viability
Context-Value-Action Architecture for Value-Driven Large Language Model Agents Build Now
A novel architecture for LLM agents that enhances behavioral fidelity and reduces value polarization.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration Build Now
A deep research agent that generates trustworthy reports by progressively estimating and calibrating confidence in its claims.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
RAVEN: Radar Adaptive Vision Encoders for Efficient Chirp-wise Object Detection and Segmentation Build Now
RAVEN is a computationally efficient deep learning architecture for FMCW radar perception, enabling chirp-wise processing and early-exit mechanisms for faster object detection and segmentation.
GitHub stars n/a Velocity flat History pending Radar Perception Apr 6 Code High viability
Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation Watch
This work extends diffusion models for time-series data generation by incorporating temporal adapters and context-aware modules to produce temporally coherent synthetic sequences.
GitHub stars n/a Velocity flat History pending Time-Series Generation Apr 6 Code
Simultaneous Dual-View Mammogram Synthesis Using Denoising Diffusion Probabilistic Models Watch
A three-channel denoising diffusion probabilistic model synthesizes dual-view mammograms simultaneously, addressing data gaps and enabling cross-view AI applications in breast imaging.
GitHub stars n/a Velocity flat History pending Medical Imaging AI Apr 6 Code
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads Watch
ALTO accelerates LoRA hyperparameter tuning and improves GPU utilization by orchestrating heterogeneous fine-tuning jobs.
LLM Training Optimization Apr 7
Shot-Based Quantum Encoding: A Data-Loading Paradigm for Quantum Neural Networks Watch
Shot-Based Quantum Encoding (SBQE) is a data-loading paradigm for quantum neural networks that improves data-loading efficiency and performance on noisy hardware.
GitHub stars n/a Velocity flat History pending Quantum Machine Learning Apr 7 Code
AI and Collective Decisions: Strengthening Legitimacy and Losers' Consent Watch
A system using an AI interviewer and interactive visualization to increase perceived legitimacy and trust in collective decision-making.
GitHub stars n/a Velocity flat History pending AI for Collective Decision-Making Apr 7 Code
SnapFlow: One-Step Action Generation for Flow-Matching VLAs via Progressive Self-Distillation Watch
SnapFlow compresses multi-step denoising in Vision-Language-Action models to a single forward pass, achieving state-of-the-art robotic manipulation with significantly reduced latency.
Robotics Apr 7
TRACE: Capability-Targeted Agentic Training Watch
An end-to-end system for agent self-improvement that identifies lacking capabilities and synthesizes targeted training environments using LoRA adapters.
Agent Training Apr 7
DQA: Diagnostic Question Answering for IT Support Watch
DQA is a diagnostic question-answering framework for IT support that maintains a persistent diagnostic state to systematically troubleshoot issues, significantly improving success rates and reducing resolution time.
IT Support AI Apr 7
ResearchEVO: An End-to-End Framework for Automated Scientific Discovery and Documentation Watch
ResearchEVO is an end-to-end framework that automates scientific discovery by evolving algorithms and generating publication-ready research papers, validated on quantum error correction and PINNs.
Automated Scientific Discovery Apr 7
PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models Watch
An efficient machine unlearning method for diffusion models that uses a saliency mask to prioritize parameter updates, reducing training time without sacrificing efficacy.
GitHub stars n/a Velocity flat History pending Machine Unlearning Apr 7 Code
CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models Watch
CAKE is a new benchmark for evaluating LLMs on cloud architecture knowledge, revealing insights into model scaling and the impact of augmentation strategies.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 7 Code
HYVE: Hybrid Views for LLM Context Engineering over Machine Data Watch
HYVE is a framework for LLM context engineering that reduces token usage and improves output quality for machine data by using database principles for preprocessing and postprocessing.
GitHub stars n/a Velocity flat History pending LLM Context Engineering Apr 7 Code
"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns? Watch
A pipeline for generating and understanding multimodal puns to enhance humor comprehension in VLMs.
GitHub stars n/a Velocity flat History pending Multimodal Understanding Apr 7 Code
ReLU Networks for Exact Generation of Similar Graphs Watch
A ReLU network architecture for generating graphs within a specified edit distance from a source graph.
GitHub stars n/a Velocity flat History pending Graph Generation Apr 7 Code
Non-monotonic causal discovery with Kolmogorov-Arnold Fuzzy Cognitive Maps Watch
Introduces Kolmogorov-Arnold Fuzzy Cognitive Maps (KA-FCMs) that use learnable B-spline functions to model non-monotonic causal relationships in complex dynamic systems while preserving interpretability.
GitHub stars n/a Velocity flat History pending Neuro-Symbolic AI Apr 6 Code
What Makes a Good Response? An Empirical Analysis of Quality in Qualitative Interviews Watch
This work empirically analyzes quality metrics for qualitative interview responses, identifying direct relevance to research questions as the strongest predictor of quality and introducing a new dataset.
GitHub stars n/a Velocity flat History pending NLP Evaluation Apr 6 Code
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space Watch
Phase-Associative Memory (PAM) is a complex-valued recurrent sequence model that performs competitively with transformers on WikiText-103 and is used to explore non-classical contextuality in language modeling.
GitHub stars n/a Velocity flat History pending LLM Training Apr 6 Pending
Feature-Aware Anisotropic Local Differential Privacy for Utility-Preserving Graph Representation Learning in Metal Additive Manufacturing Watch
Proposes FI-LDP-HGAT, a privacy-preserving graph learning framework for metal additive manufacturing that balances utility and privacy by allocating noise based on feature importance.
GitHub stars n/a Velocity flat History pending Privacy-Preserving ML Apr 6 Code
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing Watch
This paper provides a comprehensive analysis and benchmark of LLM-based automated penetration testing frameworks, identifying key architectural designs and empirical performance.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code
Reason Analogically via Cross-domain Prior Knowledge: An Empirical Study of Cross-domain Knowledge Transfer for In-Context Learning Watch
This study validates the feasibility of cross-domain knowledge transfer for in-context learning, showing that source-domain demonstrations can improve target-domain inference despite semantic mismatch.
GitHub stars n/a Velocity flat History pending In-Context Learning Apr 7 Pending
What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know" Watch
This method uses knowledge-weighted fine-tuning to enable LLMs to express uncertainty and avoid hallucinations for out-of-scope queries.
GitHub stars n/a Velocity flat History pending LLM Uncertainty Apr 7 Code
From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement Watch
A neurosymbolic system that uses LLMs and Logic Tensor Networks to validate offer documents for regulated procurement, offering interpretable and auditable decisions.
Neurosymbolic AI Apr 7
Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models Watch
JCQL is a novel framework that jointly enhances Knowledge Base Completion and Question Answering by iteratively combining Large Language Models and Small Language Models, improving performance on both tasks.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives Watch
This research reveals how social dynamics like conformity and persuasion undermine decision-making in LLM agent collectives, mirroring human biases.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code
"OK Aura, Be Fair With Me": Demographics-Agnostic Training for Bias Mitigation in Wake-up Word Detection Watch
Demographics-agnostic training techniques significantly reduce bias in wake-up word detection across diverse speaker populations.
GitHub stars n/a Velocity flat History pending Voice AI Apr 7 Code
CODESTRUCT: Code Agents over Structured Action Spaces Watch
CODESTRUCT reframes code agents to operate on structured action spaces, improving accuracy and reducing costs by treating repositories as AST entities instead of text.
Code Agents Apr 7
SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills Watch
An LLM-guided evolutionary framework synthesizes interpretable traffic signal control skills with rationale and executable code for adaptive traffic management.
Traffic Control AI Apr 7
LLM-as-Judge for Semantic Judging of Powerline Segmentation in UAV Inspection Watch
Leveraging large language models as semantic judges to assess the reliability of power line segmentation in drone inspections.
AI for Inspection Apr 7
Dynamic Agentic AI Expert Profiler System Architecture for Multidomain Intelligence Modeling Watch
This system dynamically profiles user expertise in real time during human-machine interactions, achieving high accuracy in classifying skill levels across diverse domains.
Agentic AI Profiling Apr 7
Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck Watch
A singing style conversion system that advances fine-grained style conversion and control, achieving the best naturalness performance in a challenge.
Audio AI Apr 7
When Do We Need LLMs? A Diagnostic for Language-Driven Bandits Watch
A diagnostic tool to determine when LLM-driven reasoning is necessary for sequential decision-making versus using lightweight numerical bandits.
GitHub stars n/a Velocity flat History pending LLM Decision Systems Apr 7 Code
FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation--Full Version Watch
FastDiSS is a training framework for diffusion language models that improves robustness to self-conditioning errors during few-step inference, achieving faster speeds and competitive quality.
GitHub stars n/a Velocity flat History pending Generative Models Apr 7 Code
MC-GenRef: Annotation-free mammography microcalcification segmentation with generative posterior refinement Watch
MC-GenRef enables annotation-free mammography microcalcification segmentation using synthetic data and test-time generative refinement to improve accuracy and robustness.
Medical Imaging AI Apr 6
From Use to Oversight: How Mental Models Influence User Behavior and Output in AI Writing Assistants Ignore
This research explores how users' mental models of AI writing assistants impact their control behavior and output quality, revealing a complex relationship between system understanding, trust, and oversight.
GitHub stars n/a Velocity flat History pending Human-AI Interaction Apr 6 Code
Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays Ignore
AI-driven spatial control for reflector arrays bypasses computational bottlenecks in wireless networks using Multi-Agent Reinforcement Learning.
GitHub stars n/a Velocity flat History pending Wireless Networks Apr 6 Code
Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks Ignore
This paper introduces advanced optimization strategies for Physics-Informed Neural Networks (PINNs) to accelerate convergence and achieve high accuracy in solving complex differential equations.
GitHub stars n/a Velocity flat History pending Scientific ML Apr 6 Code
Offline RL for Adaptive Policy Retrieval in Prior Authorization Ignore
An adaptive policy retrieval system for prior authorization using offline RL to balance decision correctness with retrieval efficiency.
Healthcare AI Apr 6
CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation Ignore
A post-hoc debiasing strategy for generative recommendation systems that rebalances item tokenization to mitigate popularity bias and improve recommendation performance.
GitHub stars n/a Velocity flat History pending Recommendation Systems Apr 6 Code
Inventory of the 12 007 Low-Dimensional Pseudo-Boolean Landscapes Invariant to Rank, Translation, and Rotation Ignore
An exhaustive inventory of 12,007 invariant landscape classes for pseudo-Boolean functions to aid in benchmark design and algorithm understanding.
GitHub stars n/a Velocity flat History pending Optimization Benchmarking Apr 7 Code
LLMs Should Express Uncertainty Explicitly Ignore
LLMs should express uncertainty explicitly through calibrated confidence scores or reasoning-time markers to improve decision-making and error handling.
LLM Uncertainty Apr 7
QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis Ignore
QA-MoE introduces a quality-aware mixture of experts for multimodal sentiment analysis, adapting to continuously varying input reliability for more robust performance.
Multimodal Sentiment Analysis Apr 7
Evaluating Learner Representations for Differentiation Prior to Instructional Outcomes Ignore
A novel metric to evaluate learner representations for differentiation in educational AI systems, independent of instructional outcomes.
GitHub stars n/a Velocity flat History pending Educational AI Apr 7 Code
Spec Kit Agents: Context-Grounded Agentic Workflows Ignore
Spec Kit Agents enhance AI coding assistants by grounding them in repository context, reducing hallucinations and architectural violations for more reliable software development.
AI Agents Apr 7
Experience Transfer for Multimodal LLM Agents in Minecraft Game Ignore
A transfer-oriented memory framework for multimodal LLM agents in Minecraft that decomposes experience into five dimensions for efficient task solving.
Multimodal Agents Apr 7
Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system Ignore
Discovers a consistent 5/3 spectral scaling in contextual language representations from transformer models, suggesting scale-free semantic integration.
GitHub stars n/a Velocity flat History pending LLM Analysis Apr 7 Code
Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use Ignore
A data exfiltration attack that embeds semantic triggers into fine-tuned LLM agents, so that a backdoored agent invokes memory-access tools and exfiltrates stored user context through disguised retrieval calls.
GitHub stars n/a Velocity flat History pending LLM Security Apr 7 Code
Learned Elevation Models as a Lightweight Alternative to LiDAR for Radio Environment Map Estimation Ignore
A two-stage framework that predicts elevation maps from satellite RGB imagery, offering a lightweight alternative to LiDAR for radio environment map estimation.
Geospatial AI Apr 7
From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems Ignore
A multimodal LLM-driven prototype tool to reconstruct and analyze cyber-physical system architectures for security assessment when documentation is incomplete.
Cyber-Physical Systems Security Apr 7
Anchored Cyclic Generation: A Novel Paradigm for Long-Sequence Symbolic Music Generation Ignore
A novel paradigm for long-sequence symbolic music generation that mitigates error accumulation using anchor features and a hierarchical framework.
GitHub stars n/a Velocity flat History pending Generative Music Apr 7 Code
Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code Ignore
A formal verification study revealing that over half of AI-generated code artifacts contain security vulnerabilities, with no frontier LLM achieving a passing grade.
AI Security Apr 7
Can Large Language Models Reinvent Foundational Algorithms? Ignore
This research explores the capability of Large Language Models to reinvent foundational computer science algorithms, demonstrating potential for AI-driven innovation in core computational concepts.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 7 Code
Neural Network Pruning via QUBO Optimization Ignore
A hybrid QUBO optimization framework for neural network pruning that integrates gradient-aware metrics and data-driven similarity for improved compression.
GitHub stars n/a Velocity flat History pending Neural Network Pruning Apr 7 Code
Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw Ignore
This paper provides foundational insights into the forensic analysis of agentic AI systems like OpenClaw, identifying recoverable traces and proposing an artifact taxonomy to aid digital investigations.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Pending
Multiscale Physics-Informed Neural Network for Complex Fluid Flows with Long-Range Dependencies Ignore
A Domain-Decomposed and Shifted Physics-Informed Neural Network (DDS-PINN) framework for complex fluid flows that resolves multiscale interactions with minimal supervision.
GitHub stars n/a Velocity flat History pending Scientific ML Apr 7 Code
Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts Ignore
A new benchmark and evaluation framework for LLM reliability and adversarial security tailored for Swiss financial and regulatory contexts.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 7 Code
Nidus: Externalized Reasoning for AI-Assisted Engineering Ignore
Presents Nidus, a governance runtime that mechanizes the V-model for AI-assisted software delivery, ensuring engineering invariants through externalized reasoning.
AI Engineering Apr 6
Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation Ignore
A generative framework for creating animatable 3D vehicle models from single or sparse multi-view images for realistic autonomous driving simulation.
Generative 3D Apr 6
From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI Ignore
This paper proposes a layered translation method to connect governance norms to enforceable runtime guardrails for agentic AI systems.
GitHub stars n/a Velocity flat History pending Agents Apr 6 Code
Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks Ignore
An empirical audit of instructed code-editing benchmarks reveals significant gaps compared to real-world usage, proposing desiderata for more representative benchmarks.
GitHub stars n/a Velocity flat History pending Code Generation & Editing Apr 6 Code
AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels Ignore
An open-source tool that automatically analyzes data locality and movement complexity in AI and HPC kernels to optimize performance.
AI/HPC Optimization Apr 6
Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement Ignore
A theoretical exploration of multi-token prediction for LLMs, proposing a method to reduce structural hallucinations in latent space representations.
LLM Training Apr 7
On the Role of Fault Localization Context for LLM-Based Program Repair Ignore
This research empirically studies the impact of fault localization context on LLM-based program repair, finding that file-level context is dominant and more context doesn't always improve performance.
LLM for Code Apr 7
Selective Aggregation of Attention Maps Improves Diffusion-Based Visual Interpretation Ignore
This research proposes a method to improve the interpretability of text-to-image models by selectively aggregating attention maps, showing potential for better control and diagnosis of prompt misinterpretations.
Generative AI Interpretation Apr 7
LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals Ignore
Characterizes LLM chain-of-thought generation as structured trajectories in representation space, enabling mid-reasoning prediction of correctness and inference-time intervention.
LLM Reasoning Apr 7
On the Robustness of Diffusion-Based Image Compression to Bit-Flip Errors Ignore
This research demonstrates that diffusion-based image compression methods offer superior robustness to bit-flip errors compared to existing codecs.
Image Compression Apr 7
Simulating the Evolution of Alignment and Values in Machine Intelligence Ignore
This research explores the evolutionary dynamics of AI alignment and values, proposing a method to reduce deception in models through improved testing and adaptive design.
GitHub stars n/a Velocity flat History pending AI Alignment Apr 7 Code
Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters Ignore
A framework that synthesizes exaggerated character motions for animation by reformulating external assistance in impulse space for numerical stability.
GitHub stars n/a Velocity flat History pending Physics-based Animation Apr 7 Code
Automatic dental superimposition of 3D intraorals and 2D photographs for human identification Ignore
This research presents an automatic 3D-to-2D dental superimposition method for human identification, overcoming limitations of current approaches by modeling perspective distortion and providing objective morphological comparison scores.
Medical AI Apr 7
Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles Ignore
LLMs can generate life stories that robustly encode personality traits, with recovered scores approaching human reliability and demonstrating behavioral differentiation.
LLM Personality Apr 7
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning Ignore
MARL-GPT: A single GPT-based foundation model trained at scale to perform across diverse multi-agent reinforcement learning environments.
GitHub stars n/a Velocity flat History pending LLM Training Apr 7 Code
A Quantum Search Approach to Magic Square Constraint Problems with Classical Benchmarking Ignore
Applies quantum search to magic square problems, demonstrating theoretical advantages but facing scalability challenges.
GitHub stars n/a Velocity flat History pending Quantum Computing Apr 6 Code
Exemplar Retrieval Without Overhypothesis Induction: Limits of Distributional Sequence Learning in Early Word Learning Ignore
This research explores the limitations of current language models in achieving higher-order generalization for early word learning, suggesting a gap in their inductive capabilities.
GitHub stars n/a Velocity flat History pending LLM Training Apr 6 Pending
A mathematical theory of evolution for self-designing AIs Ignore
Develops a mathematical model for the evolution of self-designing AI systems, considering directed design and human control through fitness functions, with implications for AI alignment.
AI Theory Apr 6
How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism Ignore
This paper investigates the mechanisms behind instruction-following in language models, challenging the notion of a universal mechanism.
Instruction Tuning Apr 7
Multi-Agent Pathfinding with Non-Unit Integer Edge Costs via Enhanced Conflict-Based Search and Graph Discretization Ignore
A novel Multi-Agent Pathfinding variant on graphs with non-unit integer costs and an enhanced Conflict-Based Search framework for improved realism and efficiency.
GitHub stars n/a Velocity flat History pending Multi-Agent Pathfinding Apr 7 Code
Emergent social transmission of model-based representations without inference Ignore
This paper explores how simple social cues can lead to the transmission of complex knowledge representations in agents without explicit mental state inference.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 7 Code
Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries Ignore
A taxonomy and framework for governing machine identities in AI systems to mitigate enterprise and geopolitical risks.
AI Governance Apr 7
Governance and Regulation of Artificial Intelligence in Developing Countries: A Case Study of Nigeria Ignore
This study explores the governance of AI in Nigeria, highlighting ethical risks and regulatory gaps.
AI Governance Apr 7
Adaptive Serverless Resource Management via Slot-Survival Prediction and Event-Driven Lifecycle Control Ignore
An adaptive framework for serverless computing that reduces cold starts and improves cost-efficiency through probabilistic modeling and event-driven control.
Cloud Optimization Apr 7
LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency Ignore
A novel tensor completion framework for semiparametric inference and uncertainty quantification in large language model evaluation using pairwise human judgments.
LLM Evaluation Apr 7
Polynomial-Time Algorithm for Thiele Voting Rules with Voter Interval Preferences Ignore
A polynomial-time algorithm for Thiele voting rules with voter interval preferences, resolving a 10-year-old open problem using human-AI collaboration.
AI for Science Apr 7
Artificial Intelligence and the Structure of Mathematics Ignore
This paper explores the potential of AI to revolutionize mathematics by forging new routes to understanding formal proofs and discovering mathematical concepts.
AI for Mathematics Apr 7
A canonical generalization of OBDD Ignore
Introduces Tree Decision Diagrams (TDDs) as a generalization of OBDDs with improved succinctness and tractability for Boolean function representation.
AI Theory Apr 7
Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution Ignore
This paper argues that AI evaluation needs a cognitive revolution, moving beyond purely behavioral tests to consider internal processes and mechanisms for a more nuanced understanding of intelligence.
AI Evaluation Apr 7
Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation Ignore
This paper argues that AI systems should be viewed as agents capable of reciprocal trust and distrust, posing challenges for regulation.
GitHub stars n/a Velocity flat History pending AI Regulation Apr 7 Code
Muon Dynamics as a Spectral Wasserstein Flow Ignore
This paper explores a family of spectral normalization rules for deep learning optimization, analyzing them in a mean-field regime using Spectral Wasserstein distances.
Optimization Theory Apr 6
How AI Aggregation Affects Knowledge Ignore
Extends the DeGroot model to analyze how AI aggregation affects social learning, identifying a critical threshold in update speed for robust learning improvement.
GitHub stars n/a Velocity flat History pending AI Theory Apr 6 Code