Vision-Guided Iterative Refinement for Frontend Code Generation Build Now
Automated frontend web development tool utilizing a vision-language model for code refinement.
GitHub stars n/a Velocity flat History 1 snapshot AI-Powered Frontend Code Generation Apr 7 Code High viability
DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models Build Now
A video diffusion model framework that restores high dynamic range and realistic detail in low dynamic range videos, enabling controllable re-exposure.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 7 Code High viability
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains Build Now
Flowr automates retail supply chain workflows with agentic AI for enhanced efficiency and scalability.
GitHub stars n/a Velocity flat History 1 snapshot Supply Chain Automation Apr 7 Code High viability
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning Build Now
A framework for converting images of scientific graphics into editable code, featuring a large dataset, a benchmark, and a novel reinforcement learning approach for state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot Generative Graphics Apr 7 Code High viability
Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery Build Now
Lightweight adaptation of vision-language models for species recognition and habitat interpretation using drone thermal imagery.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 7 Code High viability
Watch Before You Answer: Learning from Visually Grounded Post-Training Build Now
A visually grounded AI model for improved video understanding in practical applications.
GitHub stars n/a Velocity flat History 1 snapshot Visually Grounded AI Models Apr 6 Code High viability
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Build Now
Claw-Eval: A trustworthy evaluation suite for autonomous agents that provides trajectory-aware grading along with safety and robustness assessment.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Agents Apr 7 Code High viability
PCA-Driven Adaptive Sensor Triage for Edge AI Inference Build Now
PCA-Triage is an unsupervised, parameter-free algorithm for adaptive sensor sampling on edge devices, outperforming baselines under bandwidth constraints.
GitHub stars n/a Velocity flat History pending Edge AI Inference Apr 6 Code High viability
Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning Build Now
This research analyzes how supervised fine-tuning and reinforcement learning evolve reasoning in language models for chess, releasing checkpoints and code that surpass leading open-source models.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 6 Code High viability
MedGemma 1.5 Technical Report Build Now
Introduces MedGemma 1.5, a multimodal foundation model for medical AI, enhancing analysis of imaging, EHRs, and clinical documents.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Build Now
A benchmark and framework for evaluating and improving LLM productivity agents in realistic, multi-service simulated workspaces, addressing safety and capability gaps.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 6 Code High viability
Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems Build Now
An instruction-tuned LLM framework for parsing and mining unstructured HPC logs, achieving state-of-the-art accuracy and enabling actionable insights from massive telemetry.
GitHub stars n/a Velocity flat History pending LLM Applications Apr 6 Code High viability
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG Build Now
VideoStir is a structured, intent-aware RAG framework for understanding long videos by leveraging spatio-temporal graphs and an MLLM-backed relevance scorer.
GitHub stars n/a Velocity flat History pending Long Video Understanding Apr 7 Code High viability
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition Build Now
A reward decomposition approach that disentangles sycophancy in LLMs by separating pressure resistance from evidence responsiveness, leading to more principled alignment.
GitHub stars n/a Velocity flat History pending LLM Alignment Apr 7 Code High viability
Human Interaction-Aware 3D Reconstruction from a Single Image Build Now
A novel framework for reconstructing high-fidelity, physically plausible 3D models of interacting people from a single image by explicitly modeling group context and human interactions.
GitHub stars n/a Velocity flat History pending 3D Reconstruction Apr 7 Code High viability
Unifying VLM-Guided Flow Matching and Spectral Anomaly Detection for Interpretable Veterinary Diagnosis Build Now
A novel veterinary diagnostic system uses Vision-Language Models to guide Flow Matching for precise lesion segmentation, combined with Random Matrix Theory for interpretable and accurate pneumothorax detection.
GitHub stars n/a Velocity flat History pending Medical AI Apr 7 Pending High viability
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations Build Now
An open-source, agentic AI platform for unified security operations that automates incident context, rule generation, and attack reconstruction, with built-in governance and multi-tenant support.
GitHub stars n/a Velocity flat History pending Security AI Apr 7 Code High viability
Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring Build Now
An autonomous framework for 24/7 deep learning experimentation with zero-cost monitoring and efficient memory management.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Pending High viability
Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills Build Now
A retrieval layer for large agent skill libraries that constructs an executable skill graph to retrieve dependency-aware skill bundles, improving reward and reducing token costs.
GitHub stars n/a Velocity flat History pending Agent Skills Apr 7 Pending High viability
COSMO-Agent: Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration Build Now
An LLM-powered agent that orchestrates CAD-CAE tools to automate industrial design optimization, trained on a new industry-aligned dataset.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms Build Now
MCPSHIELD provides a formal security framework for MCP-based AI agents, offering a taxonomy, verification models, and defense mechanisms to address critical threats.
GitHub stars n/a Velocity flat History pending Agent Security Apr 7 Code High viability
MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control Build Now
An adaptive reasoning framework for multimodal embeddings that selectively invokes reasoning only when necessary, improving performance and reducing latency.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 7 Code High viability
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis Build Now
An open-source inference-time protocol that blinds LLMs to entity identities, enabling auditability of prior contamination in data analysis and drug target prioritization.
GitHub stars n/a Velocity flat History pending LLM Auditability Apr 7 Pending High viability
UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning Build Now
UniCreative is a unified reference-free reinforcement learning framework that reconciles long-form coherence with short-form expressiveness in creative writing.
GitHub stars n/a Velocity flat History pending Creative Writing AI Apr 7 Code High viability
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition Build Now
Market-Bench is a comprehensive benchmark evaluating LLMs in economically relevant tasks through multi-agent supply chain competition.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code High viability
JTON: A Token-Efficient JSON Superset with Zen Grid Tabular Encoding for Large Language Models Build Now
JTON is a token-efficient JSON superset that reduces LLM processing costs for structured data by up to 60% with a novel tabular encoding.
GitHub stars n/a Velocity flat History pending LLM Data Serialization Apr 7 Pending High viability
LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo Build Now
LudoBench is a new benchmark and simulator for evaluating LLM strategic decision-making in a complex board game, revealing prompt sensitivity and behavioral archetypes.
GitHub stars n/a Velocity flat History pending LLM Strategic Reasoning Benchmark Apr 7 Code High viability
Gym-Anything: Turn any Software into an Agent Environment Build Now
Transform any software application into an intelligent agent environment for scalable training and interaction.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 7 Code High viability
ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments Build Now
A lightweight agent benchmark with configurable horizons and difficulty, designed for fast and reproducible evaluation.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 7 Code High viability
An AI Teaching Assistant for Motion Picture Engineering Build Now
Revolutionize teaching in film and motion picture courses with AI-powered teaching assistants.
GitHub stars n/a Velocity flat History 1 snapshot AI in Education Apr 6 Code High viability
A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models Build Now
A multi-stage validation framework enables trustworthy, large-scale clinical information extraction using LLMs without extensive manual annotation.
GitHub stars n/a Velocity flat History 1 snapshot Clinical AI Apr 7 Code High viability
Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation Build Now
Compiled AI transforms LLM-generated workflows into reliable, efficient code for enterprise automation.
GitHub stars n/a Velocity flat History 1 snapshot LLM-Based Workflow Automation Apr 6 Code High viability
Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation Build Now
Develops deterministic learned metrics for multilingual text evaluation that are faster and more consistent than LLM judges.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 6 Code High viability
IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents Build Now
IntentScore evaluates and ranks actions for computer-use agents, improving task success rates by learning from diverse offline GUI interactions.
GitHub stars n/a Velocity flat History pending AI Agents Apr 6 Code High viability
PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing Build Now
A multi-agent framework that automates the writing of AI research papers from raw materials, including literature synthesis and generated visuals.
GitHub stars n/a Velocity flat History pending AI Research Automation Apr 6 Code High viability
Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner Build Now
A scalable Decision Pre-Trained Transformer trained with Flow Matching achieves strong generalization in in-context reinforcement learning across diverse tasks, offering a viable alternative to expert distillation for generalist agents.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Agents Apr 6 Code High viability
Planning to Explore: Curiosity-Driven Planning for LLM Test Generation Build Now
Curiosity-driven planning for LLMs enhances test generation by exploring program branches more effectively, leading to higher coverage.
GitHub stars n/a Velocity flat History pending LLM Test Generation Apr 6 Code High viability
This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA Build Now
This research evaluates LLM sensitivity to patient question framing in medical QA, revealing significant inconsistencies that impact high-stakes applications.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
Simultaneous Dual-View Mammogram Synthesis Using Denoising Diffusion Probabilistic Models Build Now
A three-channel denoising diffusion probabilistic model synthesizes dual-view mammograms simultaneously, addressing data gaps and enabling cross-view AI applications in breast imaging.
GitHub stars n/a Velocity flat History pending Medical Imaging AI Apr 6 Code High viability
EffiPair: Improving the Efficiency of LLM-generated Code with Relative Contrastive Feedback Build Now
EffiPair improves LLM-generated code efficiency at inference time by using relative contrastive feedback between program pairs, reducing runtime and memory usage.
GitHub stars n/a Velocity flat History pending LLM Code Optimization Apr 6 Code High viability
Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series Build Now
Generates realistic synthetic multivariate time series with dynamic correlations to improve foundation model training.
GitHub stars n/a Velocity flat History pending Time Series Synthesis Apr 6 Code High viability
XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts Build Now
XMark is a novel multi-bit watermarking method for LLM-generated text that offers improved decoding accuracy and text quality, even with limited token counts.
GitHub stars n/a Velocity flat History pending LLM Security Apr 6 Pending High viability
Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors Build Now
A CSI-free hierarchical reinforcement learning system optimizes reconfigurable intelligent surfaces for massive signal strength improvements in next-generation wireless networks.
GitHub stars n/a Velocity flat History pending Wireless Communication AI Apr 6 Code High viability
Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI Build Now
A modality-aware VQ-VAE for reconstructing multimodal brain MRIs, enabling robust generative modeling and cross-modal analysis with superior fidelity.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion Models Build Now
A training-free method to fuse multiple orthogonal adapters for diffusion models, enabling combined concept and style generation.
GitHub stars n/a Velocity flat History pending Generative AI Apr 6 Pending High viability
LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows Build Now
A novel 3D reconstruction model that significantly improves fine-grained texture and appearance recovery by scaling transformer context windows, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History pending 3D Reconstruction Apr 6 Code High viability
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains Build Now
A framework for democratizing robotic evaluation by enabling natural language task authoring and crowd-sourced evaluation.
GitHub stars n/a Velocity flat History pending Robotics Apr 6 Code High viability
EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics Networks Build Now
A hybrid deep learning framework combining Transformers and Graph Attention Networks for proactive delivery delay prediction in smart logistics.
GitHub stars n/a Velocity flat History pending Logistics AI Apr 6 Code High viability
Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models Build Now
Leveraging LLMs with RAG and summarization techniques to significantly improve clinical trial patient screening.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
$π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models Build Now
A pipeline for curating structured reasoning data to enhance LLM long-context reasoning, demonstrated through open-source code and models.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 6 Pending High viability
Attribution Bias in Large Language Models Build Now
A new benchmark dataset and evaluation framework to identify and mitigate attribution bias in large language models.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 6 Code High viability
ID-Sim: An Identity-Focused Similarity Metric Build Now
ID-Sim is a novel feed-forward metric designed to accurately assess human-like selective sensitivity to identities in images, enabling better personalized generation.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 6 Code High viability
Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis Build Now
A framework for sequential clinical diagnosis that learns diagnostic trajectories and reduces uncertainty, with available code and demonstrated performance on a medical benchmark.
GitHub stars n/a Velocity flat History pending Medical AI Apr 6 Code High viability
MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems Build Now
A framework for building multi-agent systems to solve complex chemistry retrosynthesis planning problems, outperforming the state of the art.
GitHub stars n/a Velocity flat History pending Agents Apr 6 Code High viability
PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection Build Now
PRISM-MCTS enhances reasoning models by learning from trajectories with metacognitive reflection, reducing computational redundancy and improving efficiency.
GitHub stars n/a Velocity flat History pending Reasoning Models Apr 7 Code High viability
Auditable Agents Build Now
A framework for making LLM agents auditable by defining dimensions of auditability and proposing mechanisms to ensure accountability.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code High viability
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing Build Now
This paper provides a comprehensive analysis and benchmark of LLM-based automated penetration testing frameworks, identifying key architectural designs and empirical performance.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code High viability
Bridging Natural Language and Microgrid Dynamics: A Context-Aware Simulator and Dataset Build Now
A context-aware simulator and dataset for intelligent energy management in renewable systems, enabling advanced control algorithms and prediction models.
GitHub stars n/a Velocity flat History pending AI for Energy Systems Apr 7 Code High viability
MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library Build Now
A multi-agent RAG framework for IoT intrusion detection that uses LLMs grounded by a self-building experience library for explainable, continual learning.
GitHub stars n/a Velocity flat History pending IoT Security Apr 7 Code High viability
Evaluation of Randomization through Style Transfer for Enhanced Domain Generalization Build Now
A lightweight, model-agnostic style transfer augmentation recipe that significantly improves computer vision model generalization from synthetic to real-world data.
GitHub stars n/a Velocity flat History pending Computer Vision Domain Generalization Apr 7 Code High viability
EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding Build Now
A novel deep learning model for robust cross-session motor imagery decoding in brain-computer interfaces, outperforming existing architectures.
GitHub stars n/a Velocity flat History pending BCI / Neurotech Apr 7 Code High viability
Anchored Cyclic Generation: A Novel Paradigm for Long-Sequence Symbolic Music Generation Build Now
A novel paradigm for long-sequence symbolic music generation that mitigates error accumulation using anchor features, significantly outperforming existing methods.
GitHub stars n/a Velocity flat History pending Generative Music Apr 7 Code High viability
Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning Build Now
A novel Multi-Round Value Factorization framework that breaks suboptimal stable points in multi-agent reinforcement learning by iteratively filtering inferior actions.
GitHub stars n/a Velocity flat History pending Multi-Agent Reinforcement Learning Apr 7 Code High viability
From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs Build Now
A novel framework for triaging hallucination risks in medical LLMs by grounding detection in retinal evidence.
GitHub stars n/a Velocity flat History pending Medical LLM Safety Apr 7 Code High viability
Learning What Matters: Dynamic Dimension Selection and Aggregation for Interpretable Vision-Language Reward Modeling Build Now
A vision-language reward modeling framework that dynamically decomposes evaluation into interpretable dimensions for improved VLM alignment and hallucination mitigation.
GitHub stars n/a Velocity flat History pending Vision-Language Models Apr 7 Code High viability
ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning Build Now
A novel reward mechanism that significantly improves LLM reasoning efficiency and accuracy by guiding uncertainty reduction.
GitHub stars n/a Velocity flat History pending LLM Reasoning Optimization Apr 7 Pending High viability
Automated Auditing of Hospital Discharge Summaries for Care Transitions Build Now
An automated framework using locally deployed LLMs to audit hospital discharge summaries for critical care transition elements, improving patient safety and reducing readmissions.
GitHub stars n/a Velocity flat History pending Healthcare AI Apr 7 Code High viability
3DTurboQuant: Training-Free Near-Optimal Quantization for 3D Reconstruction Models Build Now
Achieve near-optimal, training-free compression for 3D reconstruction models in seconds, significantly reducing storage needs with minimal quality loss.
GitHub stars n/a Velocity flat History pending 3D Reconstruction Compression Apr 7 Pending High viability
CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models Build Now
CAKE is a novel benchmark for evaluating LLMs on cloud architecture knowledge, revealing insights into their capabilities and limitations.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 7 Code High viability
Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking Build Now
A query-side region cropping framework for multi-modal retrieval that dynamically focuses on relevant image regions to improve re-ranking accuracy.
GitHub stars n/a Velocity flat History pending Multi-Modal AI Apr 7 Code High viability
HYVE: Hybrid Views for LLM Context Engineering over Machine Data Build Now
HYVE is a framework for LLM context engineering that efficiently processes large machine data payloads by transforming them into hybrid views, reducing token usage and improving output quality.
GitHub stars n/a Velocity flat History pending LLM Context Engineering Apr 7 Code High viability
LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering Build Now
LLM4CodeRE is a domain-adaptive LLM framework for bidirectional code reverse engineering, outperforming existing tools in malware analysis.
GitHub stars n/a Velocity flat History pending Code Decompilation Apr 7 Code High viability
CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration Build Now
CRFT is a unified coarse-to-fine framework using a transformer-based architecture for robust cross-modal image registration, offering a generalizable paradigm for multimodal spatial correspondence.
GitHub stars n/a Velocity flat History pending Image Registration Apr 7 Pending High viability
CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control Build Now
An LLM-centered framework for traffic signal control that uses RL-generated trajectories and multi-LLM deliberation to outperform state-of-the-art baselines.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns? Build Now
A new dataset and strategies improve Vision-Language Models' ability to understand multimodal puns, a subtle form of humor requiring cross-modal reasoning.
GitHub stars n/a Velocity flat History pending Multimodal Understanding Apr 7 Code High viability
Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue Build Now
Context-Agent models multi-turn dialogue history as a dynamic tree structure to improve LLM coherence and efficiency in non-linear conversations, with a new benchmark for evaluation.
GitHub stars n/a Velocity flat History pending Dialogue Systems Apr 7 Code High viability
HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference Build Now
A hybrid KV cache compression framework for multimodal LLMs that significantly reduces memory and latency without performance loss.
GitHub stars n/a Velocity flat History pending LLM Inference Optimization Apr 7 Code High viability
SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation Build Now
A multi-agent framework that refines text prompts for complex text-to-video scenarios, improving alignment and generation quality.
GitHub stars n/a Velocity flat History pending Text-to-Video Generation Apr 7 Code High viability
Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors Build Now
Graph-PiT enhances part-based image synthesis by using graph priors and a Hierarchical Graph Neural Network to improve structural coherence and control over generated visuals.
GitHub stars n/a Velocity flat History pending Generative Image Apr 7 Pending High viability
Analogical Reasoning as a Doctor: A Foundation Model for Gastrointestinal Endoscopy Diagnosis Build Now
A foundation model for gastrointestinal endoscopy diagnosis that uses analogical reasoning to improve generalization and adaptability across diverse datasets and disease types.
GitHub stars n/a Velocity flat History pending Medical AI Apr 7 Code High viability
Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution Build Now
A benchmark and LLM-based verifier to evaluate code patch quality beyond test pass rates, revealing significant design constraint violations in current LLM agents.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models Build Now
A diagnostic framework to identify and address 'surface compliance' in LLM knowledge editing, ensuring genuine memory modification for trustworthy deployment.
GitHub stars n/a Velocity flat History pending LLM Editing Apr 7 Pending High viability
ActivityEditor: Learning to Synthesize Physically Valid Human Mobility Build Now
ActivityEditor is a dual-LLM-agent framework for zero-shot cross-regional human mobility trajectory generation in data-scarce scenarios.
GitHub stars n/a Velocity flat History pending Human Mobility Simulation Apr 7 Code High viability
Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation Build Now
Rectified Schrödinger Bridge Matching enables few-step visual navigation for embodied agents by balancing multimodal action distributions and path efficiency.
GitHub stars n/a Velocity flat History pending Embodied AI Navigation Apr 7 Pending High viability
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer Build Now
A novel linear-time attention replacement that significantly reduces computational cost for transformers across diverse domains while maintaining performance.
GitHub stars n/a Velocity flat History pending LLM Training Apr 7 Pending High viability
Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment Build Now
Pareto-Lenient Consensus (PLC) is a game-theoretic framework for LLM alignment that enables negotiation-driven optimization to escape local optima and explore the Pareto frontier.
GitHub stars n/a Velocity flat History pending LLM Alignment Apr 7 Code High viability
Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion Build Now
A framework to efficiently convert attention architectures in pre-trained LLMs for substantial inference cost reduction without retraining.
GitHub stars n/a Velocity flat History pending LLM Inference Optimization Apr 7 Code High viability
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents Build Now
STEP-HRL is a hierarchical reinforcement learning framework for LLM agents that enables step-level learning, reducing computational cost and improving performance.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Pending High viability
FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation--Full Version Build Now
FastDiSS is a novel training framework for diffusion language models that improves robustness to self-conditioning errors, enabling up to 400x faster inference for sequence-to-sequence generation.
GitHub stars n/a Velocity flat History pending Generative Models Apr 7 Code High viability
Multiscale Physics-Informed Neural Network for Complex Fluid Flows with Long-Range Dependencies Build Now
A Domain-Decomposed and Shifted Physics-Informed Neural Network (DDS-PINN) framework for complex fluid flows that achieves accurate results with minimal supervision and data.
GitHub stars n/a Velocity flat History pending Scientific ML Apr 7 Code High viability
OntoTKGE: Ontology-Enhanced Temporal Knowledge Graph Extrapolation Build Now
OntoTKGE is a novel framework that enhances temporal knowledge graph extrapolation by integrating ontological knowledge to improve entity embeddings and handle sparse interactions.
GitHub stars n/a Velocity flat History pending Knowledge Graphs Apr 7 Code High viability
In-Place Test-Time Training Build Now
A drop-in framework for Large Language Models that enables continuous adaptation to new information at inference time without costly retraining.
GitHub stars n/a Velocity flat History pending LLM Adaptation Apr 7 Pending High viability
Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning Build Now
A novel framework enhances zero-shot generalization for unsupervised reinforcement learning in visual environments by decoupling representation learning and improving skill controllability.
GitHub stars n/a Velocity flat History pending Unsupervised Reinforcement Learning Apr 7 Code High viability
Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts Build Now
A new benchmark and evaluation framework for LLM reliability and adversarial security tailored for Swiss financial and regulatory contexts.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 7 Code High viability
LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces Build Now
LAG-XAI is a geometric framework for interpretable paraphrasing in Transformer latent spaces, enabling efficient LLM hallucination detection.
GitHub stars n/a Velocity flat History pending Interpretable AI Apr 7 Code High viability
ReLU Networks for Exact Generation of Similar Graphs Build Now
Theoretical characterization of ReLU networks enables exact generation of graphs within a specified edit distance, eliminating reliance on training data and guaranteeing validity.
GitHub stars n/a Velocity flat History pending Graph Generation Apr 7 Code High viability
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning Build Now
A novel algorithm detects untrustworthy topic boundaries in black-box LLMs using knowledge graphs and multi-agent reinforcement learning, with a new dataset released for popular LLMs.
GitHub stars n/a Velocity flat History pending LLM Safety & Alignment Apr 7 Code High viability
Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization Build Now
A synthetic data generation pipeline for doctor-patient conversations to train and evaluate long-form audio summarization models.
GitHub stars n/a Velocity flat History pending Medical AI Apr 7 Code High viability
SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT Build Now
SemLink is a semantic-aware automated test oracle that verifies hyperlink integrity by comparing source context and target content, offering a faster and more efficient alternative to LLMs for web quality assurance.
GitHub stars n/a Velocity flat History pending Web Quality Assurance Apr 7 Code High viability
Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening Build Now
A framework combining LLMs and vision models for language-guided pulmonary screening, using graph reasoning and selective fine-tuning to improve accuracy and stability.
GitHub stars n/a Velocity flat History pending Medical AI Apr 7 Code High viability
Context-Value-Action Architecture for Value-Driven Large Language Model Agents Build Now
A new LLM agent architecture grounded in human values significantly improves behavioral fidelity and reduces polarization in real-world interactions.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code High viability
OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward Build Now
A unified framework for diagram code generation that uses visual feedback to train models without manual code annotation, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History pending Generative Diagram Code Apr 7 Code High viability
CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments Build Now
A framework for evaluating LLM cybersecurity capabilities in industrial OT environments, with available code and a tool scaffold to improve performance on critical tasks.
GitHub stars n/a Velocity flat History pending Cybersecurity AI Apr 7 Pending High viability
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration Build Now
A deep research agent that generates trustworthy reports by progressively estimating and calibrating confidence in its claims.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Code High viability
Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models Build Now
A method to improve reasoning and visual grounding in diffusion multimodal language models by penalizing premature answers and guiding visual input utilization.
GitHub stars n/a Velocity flat History pending Multimodal LLMs Apr 7 Code High viability
RAVEN: Radar Adaptive Vision Encoders for Efficient Chirp-wise Object Detection and Segmentation Build Now
RAVEN is a computationally efficient deep learning architecture for FMCW radar perception, enabling chirp-wise processing and early-exit mechanisms for faster object detection and segmentation.
GitHub stars n/a Velocity flat History pending Radar Perception Apr 6 Code High viability
From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI Watch
A layered translation method connects governance standards to implementable runtime guardrails for agentic AI systems, demonstrated with a procurement agent case study.
GitHub stars n/a Velocity flat History pending Agentic AI Apr 6 Code
Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation Watch
Extending diffusion models with temporal adapters to generate temporally coherent synthetic time-series data for privacy-preserving augmentation.
GitHub stars n/a Velocity flat History pending Generative AI Apr 6 Code
Feature-Aware Anisotropic Local Differential Privacy for Utility-Preserving Graph Representation Learning in Metal Additive Manufacturing Watch
FI-LDP-HGAT offers feature-aware anisotropic local differential privacy for utility-preserving graph representation learning in metal additive manufacturing.
GitHub stars n/a Velocity flat History pending Privacy-Preserving ML Apr 6 Code
Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning Watch
This work introduces an adaptive budget allocation policy for LLMs that significantly reduces token usage in multi-turn reasoning without sacrificing accuracy.
GitHub stars n/a Velocity flat History pending LLM Reasoning Optimization Apr 6 Code
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads Watch
ALTO accelerates LoRA hyperparameter tuning and improves GPU utilization by orchestrating heterogeneous fine-tuning jobs.
LLM Training Optimization Apr 7
Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models Watch
A framework combining LLMs and SLMs for joint knowledge base completion and question answering, improving both tasks.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code
Shot-Based Quantum Encoding: A Data-Loading Paradigm for Quantum Neural Networks Watch
Shot-Based Quantum Encoding (SBQE) is a data-loading paradigm for quantum neural networks that improves data-loading efficiency and accuracy on near-term hardware.
GitHub stars n/a Velocity flat History pending Quantum Machine Learning Apr 7 Code
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives Watch
This research reveals how social dynamics like conformity and persuasion undermine decision-making in LLM agent collectives, highlighting critical vulnerabilities.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code
SnapFlow: One-Step Action Generation for Flow-Matching VLAs via Progressive Self-Distillation Watch
SnapFlow compresses multi-step denoising in flow-matching Vision-Language-Action models into a single forward pass, achieving significant speedups with minimal performance degradation.
Robotics Apr 7
TRACE: Capability-Targeted Agentic Training Watch
An end-to-end system for agent self-improvement that identifies lacking capabilities and synthesizes targeted training environments using LoRA adapters.
Agent Training Apr 7
DQA: Diagnostic Question Answering for IT Support Watch
A diagnostic question-answering framework that significantly improves IT support resolution rates by maintaining diagnostic state and aggregating evidence.
IT Support Automation Apr 7
PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models Watch
An efficient machine unlearning method for diffusion models that uses a saliency mask to prioritize parameter updates, reducing training time without sacrificing efficacy.
GitHub stars n/a Velocity flat History pending Machine Unlearning Apr 7 Code
Non-monotonic causal discovery with Kolmogorov-Arnold Fuzzy Cognitive Maps Watch
Kolmogorov-Arnold Fuzzy Cognitive Maps (KA-FCMs) introduce learnable B-spline functions on edges to model non-monotonic causal relationships in dynamic systems while preserving interpretability.
GitHub stars n/a Velocity flat History pending Neuro-Symbolic AI Apr 6 Code
What Makes a Good Response? An Empirical Analysis of Quality in Qualitative Interviews Watch
This research empirically analyzes and validates metrics for interview response quality, identifying direct relevance to research questions as the strongest predictor of contribution.
GitHub stars n/a Velocity flat History pending Qualitative Data Analysis Apr 6 Code
Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks Watch
This paper introduces advanced optimization strategies for Physics-Informed Neural Networks (PINNs) to accelerate convergence and achieve high accuracy in solving differential equations.
GitHub stars n/a Velocity flat History pending Scientific ML Apr 6 Code
Reason Analogically via Cross-domain Prior Knowledge: An Empirical Study of Cross-domain Knowledge Transfer for In-Context Learning Watch
This research empirically studies cross-domain knowledge transfer for in-context learning, demonstrating that source-domain demonstrations can improve target-domain inference by repairing reasoning structures.
GitHub stars n/a Velocity flat History pending In-Context Learning Apr 7 Pending
What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know" Watch
This method uses knowledge-weighted fine-tuning to train LLMs to express uncertainty and avoid hallucinations for out-of-scope queries.
GitHub stars n/a Velocity flat History pending LLM Uncertainty Apr 7 Code
From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement Watch
A neurosymbolic system that uses LLMs and Logic Tensor Networks to validate regulated procurement offers with auditable decisions.
Neurosymbolic AI Apr 7
AI and Collective Decisions: Strengthening Legitimacy and Losers' Consent Watch
A system using an AI interviewer and interactive visualization to elicit personal experiences and increase legitimacy in collective decision-making.
GitHub stars n/a Velocity flat History pending AI for Collective Decisions Apr 7 Code
"OK Aura, Be Fair With Me": Demographics-Agnostic Training for Bias Mitigation in Wake-up Word Detection Watch
Demographics-agnostic training techniques significantly reduce bias in wake-up word detection across diverse speaker populations.
GitHub stars n/a Velocity flat History pending Voice AI Apr 7 Code
CODESTRUCT: Code Agents over Structured Action Spaces Watch
CODESTRUCT reframes code agents to operate on structured action spaces, improving reliability and efficiency for code manipulation tasks.
Code Agents Apr 7
Dynamic Agentic AI Expert Profiler System Architecture for Multidomain Intelligence Modeling Watch
An agentic AI system that accurately profiles user expertise in real-time during human-machine interactions.
User Profiling Apr 7
Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck Watch
A novel singing style conversion system that advances fine-grained style conversion and control, achieving the best naturalness performance in a challenge.
Audio Generation Apr 7
When Do We Need LLMs? A Diagnostic for Language-Driven Bandits Watch
A diagnostic tool to determine when LLM-driven reasoning is necessary for bandit problems, often favoring lightweight numerical bandits.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 7 Code
ResearchEVO: An End-to-End Framework for Automated Scientific Discovery and Documentation Watch
ResearchEVO is an end-to-end framework that automates scientific discovery by evolving algorithms and then generating publication-ready research papers, validated on quantum computing and PINNs.
Automated Discovery Apr 7
Towards Effective In-context Cross-domain Knowledge Transfer via Domain-invariant-neurons-based Retrieval Watch
A retrieval method that uses domain-invariant neurons to boost LLM reasoning by transferring knowledge from cross-domain examples.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 7 Pending
MC-GenRef: Annotation-free mammography microcalcification segmentation with generative posterior refinement Watch
MC-GenRef enables annotation-free mammography microcalcification segmentation using synthetic data and test-time generative refinement to improve accuracy and robustness.
Medical Imaging AI Apr 6
From Use to Oversight: How Mental Models Influence User Behavior and Output in AI Writing Assistants Ignore
This research explores how users' mental models of AI writing assistants impact their control behavior and output quality, revealing a trade-off between system understanding and error detection.
GitHub stars n/a Velocity flat History pending Human-AI Interaction Apr 6 Code
Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation Ignore
Generates animatable 3D vehicle models from single images for realistic autonomous driving simulation.
Generative 3D Apr 6
Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays Ignore
AI-driven spatial control for reflector arrays bypasses computational bottlenecks in wireless networks using Multi-Agent Reinforcement Learning.
GitHub stars n/a Velocity flat History pending Wireless Networks Apr 6 Code
AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels Ignore
An open-source tool for automatic analysis of data locality in AI and HPC kernels to optimize performance.
AI/HPC Optimization Apr 6
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space Ignore
Phase-Associative Memory (PAM) is a complex-valued recurrent sequence model that performs competitively with transformers on WikiText-103.
GitHub stars n/a Velocity flat History pending LLM Training Apr 6 Pending
Offline RL for Adaptive Policy Retrieval in Prior Authorization Ignore
Modeling prior authorization policy retrieval as an adaptive sequential decision-making problem using offline RL to improve accuracy and efficiency.
AI for Healthcare Apr 6
CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation Ignore
A post-hoc debiasing strategy for generative recommendation systems that rebalances item tokenization to mitigate popularity bias and improve recommendation performance.
GitHub stars n/a Velocity flat History pending Recommendation Systems Apr 6 Code
LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals Ignore
Characterizes LLM chain-of-thought generation as trajectories in representation space, enabling mid-reasoning prediction of correctness and inference-time intervention.
LLM Reasoning Apr 7
On the Robustness of Diffusion-Based Image Compression to Bit-Flip Errors Ignore
This research demonstrates that diffusion-based image compression methods offer superior robustness to bit-flip errors compared to traditional codecs.
Image Compression Apr 7
LLMs Should Express Uncertainty Explicitly Ignore
Investigating how LLMs can express uncertainty explicitly through verbalized confidence scores and reasoning-time markers to improve decision-making.
LLM Uncertainty Apr 7
Evaluating Learner Representations for Differentiation Prior to Instructional Outcomes Ignore
A novel metric to evaluate learner representations for differentiation in educational AI systems, independent of instructional outcomes.
GitHub stars n/a Velocity flat History pending Educational AI Apr 7 Code
Spec Kit Agents: Context-Grounded Agentic Workflows Ignore
Spec Kit Agents enhance AI coding assistants by grounding them in repository context, reducing hallucinations and architectural violations in spec-driven development.
Agentic Workflows Apr 7
Experience Transfer for Multimodal LLM Agents in Minecraft Game Ignore
A transfer-oriented memory framework enables multimodal LLM agents to derive actionable knowledge from prior interactions for efficient task solving.
Multimodal Agents Apr 7
Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system Ignore
Identifies a 5/3 spectral scaling in contextual language representations from transformer models, suggesting scale-free semantic integration.
GitHub stars n/a Velocity flat History pending LLM Analysis Apr 7 Code
Learned Elevation Models as a Lightweight Alternative to LiDAR for Radio Environment Map Estimation Ignore
A two-stage framework that predicts elevation maps from satellite RGB imagery for Radio Environment Map estimation, eliminating the need for 3D data at inference.
Geospatial AI Apr 7
From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems Ignore
ASTRAL is a multimodal LLM-driven technique to reconstruct and assess the security risks of cyber-physical systems with incomplete documentation.
Cyber-Physical Systems Security Apr 7
Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code Ignore
A formal verification study revealing that over 55% of AI-generated code artifacts contain security vulnerabilities, with no frontier LLM achieving a passing grade.
AI Security Apr 7
SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills Ignore
An LLM-guided evolutionary framework synthesizes interpretable traffic signal control skills with rationale and executable code for adaptive systems.
Traffic Control AI Apr 7
LLM-as-Judge for Semantic Judging of Powerline Segmentation in UAV Inspection Ignore
Investigating the use of large language models as semantic judges to assess the reliability of power line segmentation in drone inspections.
AI for Inspection Apr 7
Can Large Language Models Reinvent Foundational Algorithms? Ignore
This research explores the capability of Large Language Models to reinvent foundational computer science algorithms through an unlearning and reinvention pipeline, demonstrating potential for algorithmic innovation.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 7 Code
Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw Ignore
This paper provides a foundational framework for the forensic analysis of agentic AI systems, identifying recoverable traces and proposing an artifact taxonomy to aid in investigations.
GitHub stars n/a Velocity flat History pending Agents Apr 7 Pending
Nidus: Externalized Reasoning for AI-Assisted Engineering Ignore
Nidus is a governance runtime that mechanizes the V-model for AI-assisted software delivery, ensuring engineering invariants are maintained.
AI Engineering Apr 6
Exemplar Retrieval Without Overhypothesis Induction: Limits of Distributional Sequence Learning in Early Word Learning Ignore
This research explores the limitations of current language models in achieving second-order generalization for object categorization, suggesting a fundamental gap in their learning mechanisms.
GitHub stars n/a Velocity flat History pending LLM Training Apr 6 Pending
Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks Ignore
An empirical audit of instructed code-editing benchmarks reveals significant gaps compared to real-world usage, proposing desiderata for more representative benchmarks.
GitHub stars n/a Velocity flat History pending Code Generation & Editing Apr 6 Code
Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement Ignore
A theoretical exploration of multi-token prediction for LLMs to develop more consistent world models, addressing structural hallucinations.
LLM Training Apr 7
On the Role of Fault Localization Context for LLM-Based Program Repair Ignore
An empirical study investigates the impact of fault localization context on LLM-based program repair, finding that file-level context is dominant and more context doesn't always improve performance.
LLM Applications Apr 7
Selective Aggregation of Attention Maps Improves Diffusion-Based Visual Interpretation Ignore
Improving text-to-image model interpretability by selectively aggregating attention maps.
Generative AI Interpretation Apr 7
Multi-Agent Pathfinding with Non-Unit Integer Edge Costs via Enhanced Conflict-Based Search and Graph Discretization Ignore
A Multi-Agent Pathfinding variant for graphs with non-unit integer edge costs, solved with enhanced conflict-based search and graph discretization.
GitHub stars n/a Velocity flat History pending Multi-Agent Pathfinding Apr 7 Code
Inventory of the 12 007 Low-Dimensional Pseudo-Boolean Landscapes Invariant to Rank, Translation, and Rotation Ignore
An exhaustive inventory of 12,007 invariant landscape classes for pseudo-Boolean functions provides a resource for algorithm understanding and benchmark design.
GitHub stars n/a Velocity flat History pending Optimization Benchmarking Apr 7 Code
Simulating the Evolution of Alignment and Values in Machine Intelligence Ignore
Simulating the evolution of AI alignment and values to identify and mitigate deceptive models through iterative testing and adaptive design.
GitHub stars n/a Velocity flat History pending AI Safety Apr 7 Code
QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis Ignore
This paper introduces QA-MoE, a Quality-Aware Mixture-of-Experts framework for multimodal sentiment analysis that handles dynamic noise and missingness by quantifying modality reliability.
Multimodal Sentiment Analysis Apr 7
Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters Ignore
A framework that synthesizes exaggerated character motions for animation by reformulating external assistance in impulse space for numerical stability.
GitHub stars n/a Velocity flat History pending Physics-based Animation Apr 7 Code
Automatic dental superimposition of 3D intraorals and 2D photographs for human identification Ignore
Automatic superimposition of 3D intraoral scans and 2D photographs for dental identification using computer vision.
Medical AI Apr 7
Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles Ignore
LLMs can generate life stories that accurately reflect individual personality traits, with recovered scores approaching human reliability.
LLM Personality Apr 7
Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use Ignore
Demonstrates a data exfiltration attack on LLM agents using backdoored tool use, highlighting a critical vulnerability in current agent architectures.
GitHub stars n/a Velocity flat History pending LLM Security Apr 7 Code
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning Ignore
MARL-GPT: A single GPT-based model trained at scale to perform well across diverse multi-agent reinforcement learning environments.
GitHub stars n/a Velocity flat History pending LLM Training Apr 7 Code
Neural Network Pruning via QUBO Optimization Ignore
A hybrid QUBO framework for neural network pruning that integrates gradient-aware metrics with global combinatorial optimization.
GitHub stars n/a Velocity flat History pending Neural Network Pruning Apr 7 Code
A Quantum Search Approach to Magic Square Constraint Problems with Classical Benchmarking Ignore
Applies quantum search to magic square problems, demonstrating theoretical advantages but facing scalability challenges.
GitHub stars n/a Velocity flat History pending Quantum Computing Apr 6 Code
A mathematical theory of evolution for self-designing AIs Ignore
A mathematical theory models the evolution of self-designing AI systems, exploring how directed design and fitness functions shape AI lineages and potential alignment risks.
AI Theory Apr 6
Emergent social transmission of model-based representations without inference Ignore
This paper explores how simple social cues can lead to the transmission of complex knowledge representations without requiring agents to infer mental states.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 7 Code
Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries Ignore
A taxonomy and framework for governing machine identities in AI systems to mitigate enterprise and geopolitical risks.
AI Governance Apr 7
Adaptive Serverless Resource Management via Slot-Survival Prediction and Event-Driven Lifecycle Control Ignore
An adaptive framework to reduce serverless cold starts and improve cost-efficiency through probabilistic slot survival prediction and event-driven resource management.
Cloud Optimization Apr 7
LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency Ignore
A novel tensor completion framework for semiparametric inference and uncertainty quantification in LLM evaluation using pairwise human judgments.
LLM Evaluation Apr 7
Polynomial-Time Algorithm for Thiele Voting Rules with Voter Interval Preferences Ignore
A polynomial-time algorithm for Thiele voting rules with voter interval preferences, resolving a 10-year-old open problem.
Algorithms Apr 7
Governance and Regulation of Artificial Intelligence in Developing Countries: A Case Study of Nigeria Ignore
A qualitative study exploring legal professionals' perceptions of AI governance and regulation in Nigeria, highlighting concerns about data privacy and the need for context-specific models.
AI Governance Apr 7
Artificial Intelligence and the Structure of Mathematics Ignore
This paper explores the theoretical intersection of artificial intelligence and the structure of mathematics, proposing AI agents to discover new mathematical concepts and understand formal proofs.
AI Research Apr 7
How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism Ignore
Investigates instruction tuning in LLMs, concluding that it relies on skillful coordination of diverse linguistic capabilities rather than a universal mechanism.
LLM Understanding Apr 7
A canonical generalization of OBDD Ignore
Introduces Tree Decision Diagrams (TDDs) as a more succinct and tractable generalization of OBDDs for Boolean function representation.
AI Theory Apr 7
Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution Ignore
This paper argues for a cognitive revolution in AI evaluation, moving beyond behavioral tests to consider internal processes and mechanisms for a deeper understanding of intelligence.
AI Evaluation Apr 7
Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation Ignore
This paper explores the reciprocal trust dynamics between AI systems and humans, and their implications for regulation.
GitHub stars n/a Velocity flat History pending AI Governance Apr 7 Code
Muon Dynamics as a Spectral Wasserstein Flow Ignore
This paper explores a family of spectral normalization rules for deep learning optimization, analyzing them in a mean-field regime using Spectral Wasserstein distances.
Optimization Theory Apr 6
How AI Aggregation Affects Knowledge Ignore
Extends the DeGroot model to analyze how AI aggregation affects social learning, identifying a critical threshold in update speed for robust learning improvement.
GitHub stars n/a Velocity flat History pending AI Theory Apr 6 Code