IDOBE: Infectious Disease Outbreak forecasting Benchmark Ecosystem Build Now
A comprehensive benchmark ecosystem for infectious disease outbreak forecasting, enabling standardized evaluation of statistical and machine learning models.
GitHub stars n/a Velocity flat History 1 snapshot Epidemic Forecasting Apr 20 Pending High viability
DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization Build Now
DuQuant++ enhances microscaling FP4 quantization for LLM inference by adapting outlier-aware rotation to the MXFP4 format, achieving state-of-the-art performance with reduced computational cost.
GitHub stars n/a Velocity flat History 1 snapshot LLM Quantization Apr 20 Pending High viability
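If you are new to the format, a minimal NumPy sketch of MXFP4 fake quantization may help. Following the OCP Microscaling description, each block of 32 values shares one power-of-two scale, and each element is rounded to the FP4 (E2M1) grid. The sketch also applies an orthonormal Hadamard rotation before quantizing, to show why rotations help with outliers. This is not DuQuant++'s algorithm. The function names are hypothetical, and ties round toward the smaller magnitude rather than to even.

```python
import numpy as np

# Magnitudes representable by an FP4 E2M1 element.
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_MAX_EXP = 2  # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def mxfp4_quantize(x, block: int = 32) -> np.ndarray:
    """Fake-quantize x to MXFP4: each block of `block` values shares one power-of-two scale."""
    blocks = np.asarray(x, dtype=np.float64).reshape(-1, block)
    out = np.zeros_like(blocks)
    for i, b in enumerate(blocks):
        amax = np.max(np.abs(b))
        if amax == 0.0:
            continue  # an all-zero block stays zero
        # Shared scale: align the block's largest exponent with E2M1's largest exponent.
        scale = 2.0 ** (np.floor(np.log2(amax)) - E2M1_MAX_EXP)
        scaled = np.abs(b) / scale
        # Round each magnitude to the nearest grid point; values above 6 saturate to 6.
        idx = np.abs(scaled[:, None] - FP4_E2M1_GRID[None, :]).argmin(axis=1)
        out[i] = np.sign(b) * FP4_E2M1_GRID[idx] * scale
    return out.reshape(np.shape(x))


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def quant_error(x: np.ndarray, rotate: bool) -> float:
    """Squared error of MXFP4 quantization, optionally applied in a Hadamard-rotated basis."""
    if not rotate:
        return float(np.sum((x - mxfp4_quantize(x)) ** 2))
    h = hadamard(x.size)
    x_hat = h.T @ mxfp4_quantize(h @ x)  # quantize rotated values, then rotate back
    return float(np.sum((x - x_hat) ** 2))


# One block of small activations with a single large outlier.
x = np.full(32, 0.1)
x[0] = 10.0
print(quant_error(x, rotate=False), quant_error(x, rotate=True))
```

On this toy block, the outlier forces a large shared scale, so every small value rounds to zero (squared error 4.31). After rotation, the outlier's energy is spread across the block and the error drops to roughly 2.04.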
Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriever Evaluation Strategies Build Now
CARE, a context-aware retriever evaluation strategy for RAG systems, outperforms existing methods in evaluating multi-hop reasoning, especially for complex queries.
GitHub stars n/a Velocity flat History 1 snapshot RAG Evaluation Apr 20 Pending High viability
Soft Label Pruning and Quantization for Large-Scale Dataset Distillation Build Now
LPQLD reduces soft label storage by up to 500x in dataset distillation while improving accuracy, enabling efficient large-scale dataset compression.
GitHub stars n/a Velocity flat History 1 snapshot Dataset Distillation Apr 20 Pending High viability
Scalable Neighborhood-Based Multi-Agent Actor-Critic Build Now
MADDPG-K scales multi-agent reinforcement learning by restricting critics to nearby agents, offering competitive performance and faster convergence.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent RL Apr 20 Pending High viability
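A small NumPy sketch can illustrate the neighborhood idea. Each agent's critic input is built from its own observation-action pair plus those of its k nearest agents by Euclidean distance. The distance criterion, the feature layout, and the function name are assumptions for illustration, not MADDPG-K's exact design.

```python
import numpy as np


def neighborhood_critic_inputs(positions, obs, actions, k):
    """Per-agent critic inputs restricted to the agent itself plus its k nearest neighbors.

    positions: (N, d) agent positions used to define "nearby" (an assumption here).
    obs: (N, o) observations; actions: (N, a) actions.
    Returns (inputs, neighbors): inputs has shape (N, (k + 1) * (o + a)), and row i
    starts with agent i's own features; neighbors has shape (N, k).
    """
    positions = np.asarray(positions, dtype=float)
    n = positions.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of agents")
    feats = np.concatenate([obs, actions], axis=1)
    # Pairwise Euclidean distances; exclude self by setting the diagonal to infinity.
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]  # k closest agents, nearest first
    rows = [np.concatenate([feats[i], feats[neighbors[i]].ravel()]) for i in range(n)]
    return np.stack(rows), neighbors
```

Each row has (k + 1) * (o + a) entries no matter how many agents exist. The critic's input size therefore stays fixed as the team grows, which is the scaling argument behind neighborhood-restricted critics.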
EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations Build Now
EVE enables verifiable self-evolution of MLLMs through executable visual transformations, generating diverse and challenging training data with execution-verified ground truth.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal LLM Evolution Apr 20 Pending High viability
DSAINet: An Efficient Dual-Scale Attentive Interaction Network for General EEG Decoding Build Now
A novel dual-scale attentive network for generalizable EEG decoding that outperforms existing methods across multiple datasets with a single architecture.
GitHub stars n/a Velocity flat History 1 snapshot EEG Decoding Apr 20 Pending High viability
DocQAC: Adaptive Trie-Guided Decoding for Effective In-Document Query Auto-Completion Build Now
An adaptive trie-guided decoding framework for effective in-document query auto-completion that steers language models towards high-quality completions using document context.
GitHub stars n/a Velocity flat History 1 snapshot Document Search Apr 20 Pending High viability
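The core mechanism of trie-guided decoding restricts each generated token to continuations that exist in the document. It can be sketched in plain Python as below. Here the trie is built from word-level phrases, and `score_fn` is a hypothetical stand-in for language-model next-token scores. DocQAC's adaptive steering is not reproduced.

```python
from typing import Callable, Dict, List, Optional, Sequence


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # True if a full document phrase ends here


class PhraseTrie:
    def __init__(self, phrases: Sequence[str]) -> None:
        self.root = TrieNode()
        for phrase in phrases:
            node = self.root
            for tok in phrase.split():
                node = node.children.setdefault(tok, TrieNode())
            node.is_end = True

    def node_for(self, prefix: Sequence[str]) -> Optional[TrieNode]:
        """Return the node reached by `prefix`, or None if the prefix is not in the document."""
        node = self.root
        for tok in prefix:
            node = node.children.get(tok)
            if node is None:
                return None
        return node


def constrained_greedy(trie: PhraseTrie, prefix: List[str],
                       score_fn: Callable[[List[str], str], float],
                       max_len: int = 10) -> List[str]:
    """Greedily extend `prefix`, allowing only tokens that keep it inside the trie."""
    node = trie.node_for(prefix)
    out = list(prefix)
    while node is not None and node.children and len(out) < max_len:
        # The trie acts as a mask: only its children are candidate next tokens.
        best = max(node.children, key=lambda tok: score_fn(out, tok))
        out.append(best)
        node = node.children[best]
    return out
```

For example, given the phrases "neural network pruning", "neural network quantization", and "neural radiance fields", the prefix ["neural"] can only be completed along those paths, whatever `score_fn` prefers.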
Brain-Inspired Capture: Evidence-Driven Neuromimetic Perceptual Simulation for Visual Decoding Build Now
A neuromimetic simulation paradigm for visual decoding from neurophysiological signals, improving brain-computer interfaces by emulating human visual system processing.
GitHub stars n/a Velocity flat History 1 snapshot Brain-Computer Interfaces Apr 20 Pending High viability
One Pass for All: A Discrete Diffusion Model for Knowledge Graph Triple Set Prediction Build Now
A discrete diffusion model that predicts entire sets of missing knowledge graph triples in one pass, ensuring consistency and achieving state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Graph AI Apr 20 Pending High viability
Evolutionary Negative Module Pruning for Better LoRA Merging Build Now
A plug-and-play method to prune detrimental LoRA modules before merging, improving performance across language and vision tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 20 Pending High viability
Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF Build Now
A framework for generating controllable toxic data to improve LLM safety and red teaming.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 20 Pending High viability
Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search Build Now
CalibAdv, an advantage calibration method for deep search agents, improves performance and stability by downscaling negative advantages and rebalancing positive and negative advantages.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 20 Pending High viability
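For context, GRPO computes group-relative advantages by standardizing rewards within a group of rollouts for the same prompt. Below is a minimal sketch of that step, followed by one simple form of negative-advantage downscaling. The factor `neg_scale` and the absence of any further rebalancing step are assumptions, not CalibAdv's exact rule.

```python
import numpy as np


def grpo_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages: (r - mean) / std over one group of rollouts."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def downscale_negatives(adv: np.ndarray, neg_scale: float = 0.5) -> np.ndarray:
    """Shrink negative advantages by `neg_scale` (hypothetical knob); positives are unchanged."""
    return np.where(adv < 0, neg_scale * adv, adv)


group_rewards = [1.0, 0.0, 0.0, 0.0]  # one successful search trajectory out of four
adv = grpo_advantages(group_rewards)
print(adv, downscale_negatives(adv))
```

Standardized advantages within a group sum to zero. Downscaling the negative ones therefore tilts the group's net update toward the successful trajectory.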
Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling Build Now
DASH is a training-free method that uses attention dynamics to selectively halt stabilized tokens, significantly speeding up LLM prefilling while preserving accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 20 Pending High viability
OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning Build Now
A novel framework that unifies offline teacher guidance and online reinforcement learning for improved LLM reasoning and exploration.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 20 Pending High viability
LoReC: Rethinking Large Language Models for Graph Data Analysis Build Now
LoReC is a plug-and-play method that enhances Large Language Models for graph data analysis by improving their understanding of graph structures.
GitHub stars n/a Velocity flat History 1 snapshot Graph LLMs Apr 20 Pending High viability
Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck? Build Now
This research demonstrates that data, rather than architecture, is the main bottleneck in LLMs' formal linguistic competence, and that targeted data augmentation improves it; code is available.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Data Apr 20 Pending High viability
Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation Build Now
Asset Harvester is an end-to-end pipeline that converts sparse, in-the-wild object observations from autonomous driving logs into complete, simulation-ready 3D assets.
GitHub stars n/a Velocity flat History 1 snapshot 3D Asset Generation Apr 20 Pending High viability
TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics Build Now
Generate adaptable and scalable football tactics using a multi-agent diffusion transformer, grounded in game context and expert validation.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 20 Code High viability
A multimodal and temporal foundation model for virtual patient representations at healthcare system scale Build Now
Develop a multimodal foundation model for predicting patient outcomes and improving hospital operations using integrated health records.
GitHub stars n/a Velocity flat History 1 snapshot Healthcare AI Apr 20 Code High viability
WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent Build Now
An autonomous web agent framework that tackles dual-level uncertainty in planning and reasoning using adaptive planning and Monte Carlo Tree Search for robust decision-making.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework Build Now
HiRRA is a graph-enhanced framework for 3D medical imaging report generation that mimics radiologist workflow, achieving SOTA performance and significant clinical metric improvements.
GitHub stars n/a Velocity flat History 1 snapshot Medical Imaging AI Apr 20 Pending High viability
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent Build Now
LiteResearcher is a scalable RL training framework for research agents. It trains in a lightweight virtual world that mirrors real-world search dynamics, enabling a small agent to outperform large commercial models.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Pending High viability
WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference Build Now
WISV is a wireless-informed semantic verification framework for distributed speculative decoding in edge LLM inference, significantly improving accepted length and reducing interaction rounds.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI Apr 20 Code High viability
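WISV modifies the verification step of speculative decoding. For reference, here is a sketch of the standard token-level verification rule such work starts from. A drafted token is accepted with probability min(1, p/q); otherwise a replacement is resampled from the normalized residual max(0, p - q). This does not implement WISV's wireless-informed semantic verification. The distributions are toy vectors over a small vocabulary, and the extra token drawn from the target when every draft is accepted is omitted for brevity.

```python
import numpy as np


def verify_draft(draft_tokens, q_probs, p_probs, rng):
    """Standard speculative-sampling verification of a drafted token sequence.

    draft_tokens: token ids proposed by the on-device draft model.
    q_probs[i]: draft-model distribution used to sample draft_tokens[i].
    p_probs[i]: target-model distribution at the same position.
    Returns the accepted prefix plus one corrected token if a rejection occurs.
    """
    out = []
    for tok, q, p in zip(draft_tokens, q_probs, p_probs):
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(int(tok))  # accepted: keep the drafted token
            continue
        # Rejected: resample from the part of p not already covered by q, then stop.
        residual = np.maximum(p - q, 0.0)
        residual /= residual.sum()
        out.append(int(rng.choice(len(p), p=residual)))
        break
    return out
```

When the draft and target distributions agree, every drafted token is accepted. The listing above describes WISV as replacing this exact check with wireless-informed semantic verification to increase accepted length; that part is not modeled here.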
ProtoCLIP: Prototype-Aligned Latent Refinement for Robust Zero-Shot Chest X-Ray Classification Build Now
ProtoCLIP enhances zero-shot chest X-ray classification by refining vision-language models with curated data and prototype alignment.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 20 Code High viability
Tight Auditing of Differential Privacy in MST and AIM Watch
A Gaussian Differential Privacy-based auditing framework for synthetic data generators that provides tight audits in the strong-privacy regime.
GitHub stars n/a Velocity flat History 1 snapshot Differential Privacy Apr 20 Pending
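The quantity such audits estimate can be made concrete. Under mu-Gaussian differential privacy, any membership test with false-positive rate alpha has a false-negative rate of at least Phi(Phi^-1(1 - alpha) - mu). An attack's observed (alpha, beta) therefore implies mu >= Phi^-1(1 - alpha) - Phi^-1(beta). The sketch below computes that point estimate and the standard mu-GDP to (epsilon, delta) conversion using only the Python standard library. A real audit would use confidence intervals on alpha and beta (for example, Clopper-Pearson) rather than raw rates. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def gdp_mu_lower_bound(fpr: float, fnr: float) -> float:
    """mu implied by an attack with the given FPR and FNR (both strictly in (0, 1))."""
    return _N.inv_cdf(1.0 - fpr) - _N.inv_cdf(fnr)


def gdp_delta(mu: float, eps: float) -> float:
    """delta(eps) for a mu-GDP mechanism, via the standard GDP-to-(eps, delta) conversion."""
    return _N.cdf(-eps / mu + mu / 2.0) - math.exp(eps) * _N.cdf(-eps / mu - mu / 2.0)


# An attack at FPR = 0.5 that misses members only Phi(-1) of the time implies mu of about 1.
mu_hat = gdp_mu_lower_bound(0.5, _N.cdf(-1.0))
print(mu_hat, gdp_delta(mu_hat, eps=1.0))
```

A pair (alpha, beta) with alpha + beta >= 1 gives mu <= 0, meaning no evidence of leakage. An audit is tight when its lower bound approaches the mu the mechanism claims.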
Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation Build Now
An autonomous drone system using YOLO for rapid drowning swimmer detection and localization, significantly reducing rescue response times.
GitHub stars n/a Velocity flat History 1 snapshot Search and Rescue Drones Apr 20 Code High viability
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Build Now
Stratagem learns transferable reasoning in language models via trajectory-modulated game self-play, improving performance on mathematical, general reasoning, and code generation benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Reasoning Apr 20 Pending High viability
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Build Now
Agent-World is a self-evolving training arena that synthesizes realistic environments and tasks to co-evolve general-purpose AI agents, outperforming proprietary models on challenging benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Agent Training Apr 20 Code High viability
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks Build Now
A framework for analyzing LLM failures on realistic benchmarks using contrastive attribution.
GitHub stars n/a Velocity flat History 1 snapshot LLM Interpretability Apr 20 Code High viability
When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias Build Now
A paradigm to improve VLM-as-a-Judge reliability by balancing informativeness and image-grounded correctness.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 20 Code High viability
RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs Build Now
RAVEN is a framework using LLM agents and RAG to synthesize comprehensive vulnerability analysis reports for memory corruption in code and binaries.
GitHub stars n/a Velocity flat History 1 snapshot Cybersecurity AI Apr 20 Code High viability
Sessa: Selective State Space Attention Watch
Sessa introduces a novel decoder architecture that places attention within a feedback path for improved long-context sequence modeling.
GitHub stars n/a Velocity flat History 1 snapshot Sequence Modeling Apr 20 Pending
Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement Build Now
A plug-and-play reward calibration method that mitigates likelihood displacement in preference optimization for LLMs, improving downstream performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 20 Pending High viability
LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL Build Now
A modular adapter composition strategy for curriculum learning that improves complex code generation by training tier-specific adapters sequentially on increasing levels of complexity.
GitHub stars n/a Velocity flat History 1 snapshot Code Generation Apr 20 Pending High viability
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents Build Now
Automate generation of diverse environments for claw-like agents, reducing manual effort and costs in robotics training.
GitHub stars n/a Velocity flat History 1 snapshot AI Tools for Robotics Apr 20 Pending High viability
Bounded Ratio Reinforcement Learning Build Now
Bounded Ratio Reinforcement Learning (BRRL) provides a theoretically grounded framework for policy optimization, outperforming PPO in stability and performance.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 20 Code High viability
A Control Architecture for Training-Free Memory Use Build Now
A training-free control architecture that enhances LLM reasoning by managing memory use, significantly improving performance on arithmetic benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 20 Code High viability
Enhancing Tabular Anomaly Detection via Pseudo-Label-Guided Generation Build Now
PLAG enhances tabular anomaly detection by using pseudo-labels to generate localized, feature-level anomalies, achieving state-of-the-art performance and boosting existing unsupervised detectors.
GitHub stars n/a Velocity flat History 1 snapshot Tabular Anomaly Detection Apr 20 Code High viability
Revisiting Change VQA in Remote Sensing with Structured and Native Multimodal Qwen Models Build Now
Leveraging Qwen multimodal models with LoRA for improved change visual question answering in remote sensing imagery.
GitHub stars n/a Velocity flat History 1 snapshot Remote Sensing AI Apr 20 Code High viability
GeGS-PCR: Effective and Robust 3D Point Cloud Registration with Two-Stage Color-Enhanced Geometric-3DGS Fusion Build Now
A two-stage point cloud registration method that fuses geometric and color information for robust performance in challenging low-overlap scenarios.
GitHub stars n/a Velocity flat History 1 snapshot 3D Computer Vision Apr 20 Code High viability
An Integrated Deep-Learning Framework for Peptide-Protein Interaction Prediction and Target-Conditioned Peptide Generation with ConGA-PePPI and TC-PepGen Build Now
An integrated AI framework for peptide-protein interaction prediction and target-conditioned peptide generation to accelerate drug discovery.
GitHub stars n/a Velocity flat History 1 snapshot Biotech AI Apr 20 Code High viability
Toward Zero-Egress Psychiatric AI: On-Device LLM Deployment for Privacy-Preserving Mental Health Decision Support Build Now
A privacy-preserving, on-device AI platform for psychiatric decision support using lightweight LLMs, enabling real-time, local inference for sensitive mental health data.
GitHub stars n/a Velocity flat History 1 snapshot On-Device LLM Apr 20 Code High viability
Style-Based Neural Architectures for Real-Time Weather Classification Build Now
Style-based neural network architectures, including Multi-PatchGAN and Truncated ResNet50 with Gram Matrix and Attention, for real-time weather classification and other appearance-based tasks.
GitHub stars n/a Velocity flat History 1 snapshot Image Classification Apr 20 Code High viability
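The Gram-matrix component refers to the style representation familiar from neural style transfer. It captures channel-by-channel correlations of a convolutional feature map, which discard spatial layout and keep texture and appearance statistics. A minimal NumPy sketch follows. Normalizing by the number of spatial positions is one common convention and may differ from the paper's.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Style (Gram) matrix of a (C, H, W) feature map: G[i, j] = <F_i, F_j> / (H * W)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)  # one row per channel
    return flat @ flat.T / (h * w)


rng = np.random.default_rng(0)
fmap = rng.standard_normal((64, 14, 14))  # e.g. an intermediate CNN feature map
g = gram_matrix(fmap)
print(g.shape)  # (64, 64)
```

G is C x C whatever the input resolution, and it ignores where in the image each feature fires. That is why Gram features are a natural fit for appearance-based tasks such as weather classification.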
AIT Academy: Cultivating the Complete Agent with a Confucian Three-Domain Curriculum Build Now
A curriculum framework for AI agents that organizes capability development across scientific, humanities, and social domains, demonstrating improved security and social reasoning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 20 Code High viability
RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models Build Now
A time-aware LLM framework that integrates structured EHR encoders via prompt tuning to capture longitudinal patient information and population-level patterns for clinical prediction.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 20 Code High viability
Aether: Network Validation Using Agentic AI and Digital Twin Build Now
Automate network change validation with agentic AI and a digital twin, reducing manual effort and errors.
GitHub stars n/a Velocity flat History 1 snapshot Network Operations AI Apr 20 Code High viability
Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations Build Now
A regularization technique that improves next dialogue act prediction in counselling conversations by incorporating empirical dialogue-flow statistics.
GitHub stars n/a Velocity flat History 1 snapshot Dialogue Systems Apr 20 Code High viability
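Here is one way to picture dialogue-flow statistics as a regularizer. First, estimate an empirical act-to-act transition matrix from training conversations. Then penalize the divergence between the model's predicted next-act distribution and the row for the previous act. The sketch below uses add-alpha smoothing and a KL penalty. Both the smoothing and the exact form of the penalty are assumptions, not the paper's stated loss.

```python
import numpy as np


def transition_matrix(sequences, n_acts: int, alpha: float = 1.0) -> np.ndarray:
    """Row-stochastic matrix T[a, b] ~ P(next act = b | current act = a), add-alpha smoothed."""
    counts = np.full((n_acts, n_acts), alpha)
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def transition_kl_penalty(pred_probs: np.ndarray, prev_acts: np.ndarray, T: np.ndarray) -> float:
    """Mean KL(T[prev] || pred) over a batch; pred_probs must be strictly positive."""
    prior = T[prev_acts]  # (batch, n_acts) empirical next-act distributions
    kl = np.sum(prior * (np.log(prior) - np.log(pred_probs)), axis=1)
    return float(kl.mean())


# Toy acts: 0 = question, 1 = reflection, 2 = advice.
T = transition_matrix([[0, 1, 2], [0, 1, 1, 2], [0, 2]], n_acts=3)
pred = np.array([[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]])
print(transition_kl_penalty(pred, np.array([0, 1]), T))
```

In training, the penalty would be added to the cross-entropy loss with a tuned weight. It is zero when the model's prediction matches the empirical row exactly.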
LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation Watch
LQM is a linguistically motivated framework for evaluating machine translation quality, designed to capture dialect- and culture-specific errors in diglossic languages like Arabic.
GitHub stars n/a Velocity flat History 1 snapshot Machine Translation Apr 20 Pending
AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models Build Now
A hierarchical framework for vision-language-action models that separates global trajectory planning from local execution refinement to improve robotic manipulation precision.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 20 Code High viability
SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks Build Now
A syntactic probing framework to detect and quantify benchmark contamination in NL2SQL datasets, revealing leakage in older benchmarks and validating newer ones.
GitHub stars n/a Velocity flat History 1 snapshot NL2SQL Apr 20 Code High viability
Long-Text-to-Image Generation via Compositional Prompt Decomposition Build Now
PRISM enables pre-trained text-to-image models to generate images from long descriptive paragraphs by decomposing prompts and merging independent noise predictions.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 20 Code High viability
TPS-CalcBench: A Benchmark and Diagnostic Evaluation Framework for LLM Analytical Calculation Competence in Hypersonic Thermal Protection System Engineering Build Now
TPS-CalcBench is a diagnostic benchmark and evaluation framework for LLM analytical calculation competence in safety-critical aerospace engineering, including intervention methods.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Code High viability
MHSafeEval: Role-Aware Interaction-Level Evaluation of Mental Health Safety in Large Language Models Build Now
An agent-based framework for evaluating mental health safety in LLMs by simulating multi-turn counseling interactions and identifying role-dependent harms.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 20 Code High viability
QuantumQA: Enhancing Scientific Reasoning via Physics-Consistent Dataset and Verification-Aware Reinforcement Learning Build Now
A physics-consistent dataset and verification-aware reinforcement learning approach to enhance LLM reliability in scientific domains.
GitHub stars n/a Velocity flat History 1 snapshot Scientific LLMs Apr 20 Code High viability
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation Build Now
AJ-Bench is a benchmark for evaluating Agent-as-a-Judge across search, data systems, and GUIs, demonstrating performance gains over LLM-as-a-Judge baselines.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 20 Code High viability
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization Build Now
AQPIM quantizes LLM activations directly in memory to drastically reduce decoding latency and improve speed for Processing-in-Memory architectures.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 20 Code High viability
Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization Build Now
A co-evolutionary framework that evolves agent architectures and reasoning paths for automated optimization tasks, outperforming existing methods.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
TLoRA: Task-aware Low Rank Adaptation of Large Language Models Build Now
TLoRA is a unified framework for parameter-efficient LLM fine-tuning that jointly optimizes initialization and rank allocation for improved performance across diverse tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 20 Code High viability
SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression Build Now
A self-evolving framework for LLMs to achieve state-of-the-art emotion recognition and consistent expression in conversations through self-play and reinforcement learning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 20 Code High viability
Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety Build Now
A benchmark and dataset for evaluating the stylistic robustness of frontier model safety refusals, revealing significant weaknesses in generalization.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 20 Code High viability
Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD Build Now
A neural surrogate model trained on expert-validated CFD data for interactive aerodynamics design, enabling rapid exploration of design spaces in motorsport.
GitHub stars n/a Velocity flat History 1 snapshot Aerodynamics Apr 20 Code High viability
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval Build Now
A large-scale, multimodal, and multilingual benchmark for evaluating mathematical reasoning and retrieval in generative and embedding-based systems.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal Reasoning Benchmark Apr 20 Code High viability
Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report Build Now
This research introduces a validity scaling framework for LLM metacognitive self-report, identifying construct-level invalid models and providing a portable screening protocol.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Pending High viability
STaD: Scaffolded Task Design for Identifying Compositional Skill Gaps in LLMs Build Now
STaD is a framework for generating scaffolded tasks to systematically identify and visualize compositional skill gaps in LLMs.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Code High viability
Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. Screenshots Build Now
A benchmark and study of LLM-driven smartphone automation failures, revealing insights into multimodal vs. text-only inputs and common error patterns.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking Build Now
A neuro-symbolic framework that enhances LLM belief reasoning for theory-of-mind tasks by explicitly tracking environment states using PDDL.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
Class-specific diffusion models improve military object detection in a low-data domain Build Now
Leverage class-specific diffusion models and structural guidance to generate synthetic data for significantly improving military object detection in low-data scenarios.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI for Computer Vision Apr 20 Code High viability
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Watch
A multimodal benchmark for evaluating LLMs in end-to-end web coding, including generation, editing, and repair.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Code
Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens Build Now
A novel unlearning method for LLMs that selectively removes informative tokens based on predictive entropy, preserving model utility while mitigating adversarial behaviors.
GitHub stars n/a Velocity flat History 1 snapshot LLM Unlearning Apr 20 Code High viability
Voronoi-guided Bilateral 2D Gaussian Splatting for Arbitrary-Scale Hyperspectral Image Super-Resolution Build Now
A Gaussian-Splatting based framework for arbitrary-scale hyperspectral image super-resolution that adaptively reconstructs spatial details while preserving spectral fidelity.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 20 Code High viability
Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks Build Now
Analyzes distinct jailbreaking methods for open-weight LLMs, revealing divergent behavioral and mechanistic properties despite similar harmfulness.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 20 Code High viability
ContraPrompt: Contrastive Prompt Optimization via Dyadic Reasoning Trace Analysis Build Now
ContraPrompt optimizes LLM prompts by analyzing the reasoning traces of successful and failed attempts, significantly outperforming existing methods on multiple benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 20 Pending High viability
Screen Before You Interpret: A Portable Validity Protocol for Benchmark-Based LLM Confidence Signals Watch
A portable protocol for validating LLM confidence signals, adapted from clinical psychology, can be applied across benchmarks and probe formats.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Pending
SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models Build Now
A framework to prevent safety alignment erosion in LLMs during continual domain adaptation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 20 Code High viability
Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition Build Now
Adversarial Arena crowdsources high-quality conversational datasets for LLM training through interactive competition, demonstrating significant improvements in secure code generation.
GitHub stars n/a Velocity flat History 1 snapshot Data Generation Apr 20 Pending High viability
AI Approach for MRI-only Full-Spine Vertebral Segmentation and 3D Reconstruction in Paediatric Scoliosis Build Now
An AI framework enables radiation-free 3D spine reconstruction from MRI for pediatric scoliosis assessment, significantly reducing processing time and improving accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 20 Code High viability
ExAI5G: A Logic-Based Explainable AI Framework for Intrusion Detection in 5G Networks Build Now
A logic-based explainable AI framework for 5G intrusion detection that integrates Transformer models with XAI techniques to achieve high accuracy and transparent reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Explainable AI for Cybersecurity Apr 20 Code High viability
First, Do No Harm (With LLMs): Mitigating Racial Bias via Agentic Workflows Build Now
An agentic workflow that leverages retrieval to mitigate racial bias in LLM-generated medical cases and differential diagnoses, with code available.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Bias Mitigation Apr 20 Code High viability
Bridging the Reasoning Gap in Vietnamese with Small Language Models via Test-Time Scaling Build Now
This paper demonstrates that Test-Time Scaling and Supervised Fine-Tuning can bridge the reasoning gap in Vietnamese Small Language Models, outperforming complex agentic workflows for edge deployment.
GitHub stars n/a Velocity flat History 1 snapshot Small Language Models Apr 20 Code High viability
Prompting Foundation Models for Zero-Shot Ship Instance Segmentation in SAR Imagery Build Now
Enabling zero-shot ship instance segmentation in SAR imagery by prompting foundation models with bounding boxes from a SAR-trained detector.
GitHub stars n/a Velocity flat History 1 snapshot SAR Image Analysis Apr 20 Code High viability
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering Build Now
Latent Phase-Shift Rollback (LPSR) is an inference-time technique that corrects reasoning errors in LLMs that would otherwise be unrecoverable, by monitoring residual streams and steering the KV-cache, without fine-tuning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Apr 20 Code High viability
WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation Build Now
WorldDB is a novel vector graph-of-worlds memory engine that significantly improves agentic system performance by enabling recursive world composition and ontology-aware write-time reconciliation.
GitHub stars n/a Velocity flat History 1 snapshot Memory Engines Apr 20 Pending High viability
CADMAS-CTX: Contextual Capability Calibration for Multi-Agent Delegation Build Now
CADMAS-CTX is a framework for contextual capability calibration in multi-agent delegation, improving teamwork by calibrating estimates of each agent's capabilities to the task context.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 20 Code High viability
A Generalized Synthetic Control Method for Baseline Estimation in Demand Response Services Build Now
A Generalized Synthetic Control Method for baseline estimation in demand response services that outperforms existing methods by treating baseline estimation as a dynamic counterfactual prediction problem.
GitHub stars n/a Velocity flat History 1 snapshot Causal Inference Apr 20 Code High viability
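For orientation, the classic synthetic control estimator that the generalized method builds on can be sketched. It chooses non-negative donor weights summing to one so that a weighted combination of non-participating customers matches the participant's pre-event load. The same weights are then applied during the event to produce the counterfactual baseline. This is classic SCM fitted by projected gradient descent, not the paper's generalized method. The variable names and toy data are illustrative.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u * ks > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def synthetic_control_weights(donors_pre: np.ndarray, target_pre: np.ndarray,
                              n_iter: int = 10000) -> np.ndarray:
    """Minimize 0.5 * ||donors_pre @ w - target_pre||^2 over the simplex by projected gradient.

    donors_pre: (T_pre, J) pre-event loads of J donor customers; target_pre: (T_pre,).
    """
    n_donors = donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    step = 1.0 / np.linalg.norm(donors_pre, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = donors_pre.T @ (donors_pre @ w - target_pre)
        w = project_to_simplex(w - step * grad)
    return w


rng = np.random.default_rng(0)
donors = rng.uniform(1.0, 2.0, size=(72, 3))  # hourly loads of three non-participants
target = donors @ np.array([0.3, 0.7, 0.0])   # participant whose load is an exact mix
w = synthetic_control_weights(donors[:48], target[:48])  # fit on pre-event hours only
baseline = donors[48:] @ w  # counterfactual load during the event window
```

The simplex constraint keeps the baseline an interpolation of real customers' loads. That property distinguishes synthetic control from unconstrained regression.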
Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation Ignore
Identifies diversity collapse in multi-agent LLM systems due to structural coupling, offering insights for designing more creative AI collaborations.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 20 Pending
Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion Build Now
A systematic evaluation of cloud vs. local LLMs for System Dynamics AI assistance, providing insights into performance trade-offs and practical deployment guidance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Benchmarking Apr 20 Code High viability
Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research Watch
A GenAI platform for policy experts that provides evidence-based syntheses with verifiable citations and reasoned abstention, saving users a significant number of hours each week.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 High viability
Using large language models for embodied planning introduces systematic safety risks Watch
A benchmark and analysis revealing systematic safety risks in using large language models for robotic planning.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Agents Apr 20 Code
A novel LSTM music generator based on the fractional time-frequency feature extraction Watch
An AI music generator using fractional Fourier transform for feature extraction and LSTM networks for generating high-quality music comparable to human compositions.
GitHub stars n/a Velocity flat History 1 snapshot Generative Audio Apr 20 Code
Learning the Riccati solution operator for time-varying LQR via Deep Operator Networks Watch
A deep operator network framework that learns a surrogate for the Riccati solution operator, enabling fast online evaluation of optimal feedbacks for Linear Quadratic Regulator problems.
GitHub stars n/a Velocity flat History 1 snapshot Optimal Control Apr 20 Code
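A surrogate of this kind needs reference solutions to train against. For the finite-horizon problem, the Riccati matrix P(t) satisfies -dP/dt = A^T P + P A - P B R^-1 B^T P + Q with terminal condition P(T) = Q_f, and the optimal feedback is u = -R^-1 B^T P(t) x. The minimal NumPy sketch below integrates this equation backward in time with fixed-step RK4. The solver, step count, and function names are assumptions for illustration, not the paper's data-generation pipeline.

```python
import numpy as np


def riccati_rhs(t, P, A, B, Q, R_inv):
    """f(t, P) = A^T P + P A - P B R^-1 B^T P + Q, so that dP/dt = -f(t, P)."""
    At, Bt = A(t), B(t)
    return At.T @ P + P @ At - P @ Bt @ R_inv @ Bt.T @ P + Q


def solve_riccati(A, B, Q, R, Qf, T, n_steps=1000):
    """Integrate the Riccati ODE backward from P(T) = Qf to t = 0 with fixed-step RK4.

    A, B: callables t -> matrix (time-varying system). Returns times (ascending) and P(t).
    """
    R_inv = np.linalg.inv(R)
    h = T / n_steps
    P = np.array(Qf, dtype=float)
    times, Ps = [T], [P.copy()]
    for k in range(n_steps):
        t = T - k * h
        # In reversed time s = T - t, dP/ds = f(T - s, P); one RK4 step moves t to t - h.
        k1 = riccati_rhs(t, P, A, B, Q, R_inv)
        k2 = riccati_rhs(t - h / 2, P + h / 2 * k1, A, B, Q, R_inv)
        k3 = riccati_rhs(t - h / 2, P + h / 2 * k2, A, B, Q, R_inv)
        k4 = riccati_rhs(t - h, P + h * k3, A, B, Q, R_inv)
        P = P + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        times.append(t - h)
        Ps.append(P.copy())
    return np.array(times[::-1]), np.array(Ps[::-1])


def feedback_gain(P, B_t, R):
    """K(t) = R^-1 B(t)^T P(t); the optimal control is u = -K(t) x."""
    return np.linalg.solve(R, B_t.T @ P)


# Scalar check: a = 0, b = q = r = 1, Qf = 0 gives P(t) = tanh(T - t).
zero, one = np.zeros((1, 1)), np.ones((1, 1))
times, Ps = solve_riccati(lambda t: zero, lambda t: one, one, one, zero, T=2.0)
print(Ps[0, 0, 0], np.tanh(2.0))
```

Pairs of sampled coefficient functions (A(t), B(t)) and the resulting P(t) are the kind of input/output data an operator network would be trained on.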
Mix and Match: Context Pairing for Scalable Topic-Controlled Educational Summarisation Watch
A data augmentation strategy for training smaller language models to perform topic-controlled educational summarization, improving performance with less data.
GitHub stars n/a Velocity flat History 1 snapshot Educational Summarization Apr 20 Code
PARM: Pipeline-Adapted Reward Model Watch
A pipeline-adapted reward model that aligns LLM rewards with downstream pipeline execution outcomes for improved consistency in multi-stage LLM applications.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 20 Code
Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations Watch
MARC is a novel framework that compresses LLM representations for recommendation systems by controlling modularity, achieving a significant lift in online A/B tests.
GitHub stars n/a Velocity flat History 1 snapshot Recommendation Systems Apr 20 High viability
Latent Abstraction for Retrieval-Augmented Generation Watch
A unified framework for RAG that performs encoding, retrieval, and generation entirely within an LLM's latent space, improving efficiency and performance.
GitHub stars n/a Velocity flat History 1 snapshot Retrieval Augmented Generation Apr 20 Code
IceBreaker for Conversational Agents: Breaking the First-Message Barrier with Personalized Starters Watch
IceBreaker generates personalized conversation starters for AI agents, overcoming the initial user engagement barrier and demonstrably increasing user activity and click-through rates in production.
GitHub stars n/a Velocity flat History 1 snapshot Conversational AI Apr 20 High viability
Latent Preference Modeling for Cross-Session Personalized Tool Calling Watch
A benchmark and memory-augmented method for LLM agents to personalize tool calling by representing user preferences as evolving hypotheses, improving accuracy with minimal token usage.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval Watch
A Bayesian active learning framework that uses Gaussian Processes guided by LLM relevance scoring to improve dense passage retrieval efficiency and coverage.
GitHub stars n/a Velocity flat History 1 snapshot Information Retrieval Apr 20 Code
AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation Build Now
AdaCluster is a training-free framework that accelerates video diffusion transformers with adaptive query-key clustering, achieving significant speedups with negligible quality loss.
GitHub stars n/a Velocity flat History 1 snapshot Video Generation Apr 20 Pending High viability
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions Build Now
A framework for real-time video understanding that aligns responses with evidence and provides transparent decision-making.
GitHub stars n/a Velocity flat History 1 snapshot Video Understanding Apr 20 Pending High viability
Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs Watch
A commit-open protocol uses sparse autoencoder feature traces to detect silent model substitutions in hosted LLMs, outperforming existing methods.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 20 High viability
Multilingual Training and Evaluation Resources for Vision-Language Models Watch
A comprehensive suite of multilingual resources for training and evaluating Vision-Language Models across five European languages, demonstrating consistent benefits for non-English benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 20 Code
Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study Watch
Integrating explicit physical feasibility supervision into Vision-Language-Action models improves robot policy reliability and learning efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 20 Code
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration Watch
LLM agents are trained for spontaneous, reward-free self-evolution by exploring world knowledge, leading to significant performance gains on downstream tasks.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20
PBSBench: A Multi-Level Vision-Language Framework and Benchmark for Hematopathology Whole Slide Image Interpretation Build Now
PBSBench offers a targeted vision-language framework for improving the analysis of hematopathology slides, enhancing diagnostic precision.
GitHub stars n/a Velocity flat History 1 snapshot AI in Healthcare Apr 19 Pending High viability
RASP-Tuner: Retrieval-Augmented Soft Prompts for Context-Aware Black-Box Optimization in Non-Stationary Environments Watch
RASP-Tuner is a retrieval-augmented soft-prompt method for efficient, context-aware black-box optimization in non-stationary environments.
GitHub stars n/a Velocity flat History 1 snapshot Black-Box Optimization Apr 20 Code
Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes Ignore
This research explores effective RLVR fine-tuning strategies for small language models in low-data environments, demonstrating improved sample efficiency and generalization through procedural datasets.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 20 Code
Periodic Steady-State Control of a Handkerchief-Spinning Task Using a Parallel Anti-Parallelogram Tendon-driven Wrist Ignore
A novel tendon-driven wrist and hierarchical control scheme enable precise, high-speed spinning of flexible objects like handkerchiefs.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Control Apr 20 Code
PV-SQL: Synergizing Database Probing and Rule-based Verification for Text-to-SQL Agents Build Now
An agentic framework that synergizes database probing and rule-based verification to improve text-to-SQL accuracy and efficiency for complex queries.
GitHub stars n/a Velocity flat History 1 snapshot Text-to-SQL Apr 19 Code High viability
Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation Build Now
A text-conditioned video-to-music generation model that balances musical fidelity and semantic understanding with fine-grained creator control and faster inference.
GitHub stars n/a Velocity flat History 1 snapshot Generative Audio Apr 19 Code High viability
Is SAM3 ready for pathology segmentation? Ignore
Evaluates the capability of SAM3 for pathology segmentation to understand its limitations and guide domain adaptation.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 20 Code
Beyond Reproduction: A Paired-Task Framework for Assessing LLM Comprehension and Creativity in Literary Translation Ignore
A framework for evaluating LLM literary translation that disentangles comprehension from creativity, revealing significant gaps in current models.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Code
Training and Agentic Inference Strategies for LLM-based Manim Animation Generation Ignore
A novel training and inference pipeline for LLM-based Manim animation generation that improves code quality and visual outputs.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 20
MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge Ignore
A benchmark to identify and mitigate compositional biases in multimodal large language models used for automated evaluation.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Code
Document-as-Image Representations Fall Short for Scientific Retrieval Ignore
A new benchmark and analysis showing that text-based representations outperform image-based ones for scientific document retrieval, even for figure-based queries.
GitHub stars n/a Velocity flat History 1 snapshot Document Retrieval Apr 20 Code
Implicit neural representations as a coordinate-based framework for continuous environmental field reconstruction from sparse ecological observations Ignore
Implicit neural representations offer a coordinate-based framework for continuous environmental field reconstruction from sparse ecological data.
GitHub stars n/a Velocity flat History 1 snapshot Environmental Field Reconstruction Apr 20 Code
Six Llamas: Comparative Religious Ethics Through LoRA-Adapted Language Models Watch
Fine-tunes Llama models with LoRA on religious texts to analyze how their ethical reasoning patterns differ.
GitHub stars n/a Velocity flat History 1 snapshot LLM Ethics Apr 20
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale Ignore
This research challenges the notion of cross-modal representational convergence in neural networks, suggesting that models trained on different modalities learn distinct, rather than shared, representations of reality.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 20 Code
On the Importance and Evaluation of Narrativity in Natural Language AI Explanations Ignore
Proposes new metrics to evaluate and generate narrative explanations for AI, moving beyond feature importance to provide a more understandable "why" behind predictions.
GitHub stars n/a Velocity flat History 1 snapshot Explainable AI (XAI) Apr 20 Code
LLM Safety From Within: Detecting Harmful Content with Internal Representations Ignore
Detects harmful content using the internal representations of LLMs, aiming to improve content moderation and safety.
GitHub stars n/a Velocity flat History 1 snapshot AI Safety/Content Moderation Apr 20 Pending
Poly-EPO: Training Exploratory Reasoning Models Build Now
A framework for post-training language models that encourages optimistic exploration and synergizes exploration with exploitation for improved generalization and diversity.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 19 Pending High viability
DIRCR: Dual-Inference Rule-Contrastive Reasoning for Solving RAVENs Build Now
A dual-inference rule-contrastive reasoning model that significantly enhances abstract visual reasoning robustness and generalization on RAVEN datasets.
GitHub stars n/a Velocity flat History 1 snapshot Abstract Visual Reasoning Apr 19 Pending High viability
Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories Build Now
A dataset of reward-hackable environments and exploit trajectories for testing frontier LLM security against sophisticated attacks.
GitHub stars n/a Velocity flat History 1 snapshot AI Safety Apr 19 Pending High viability
SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning Watch
A lightweight inference-time framework for LLMs that uses real-time entropy monitoring to detect and rectify logical failures during long-chain reasoning without compromising fluency.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 20 Pending
DGSSM: Diffusion guided state-space models for multimodal salient object detection Build Now
A diffusion-guided state-space model for multimodal salient object detection that improves boundary accuracy and outperforms existing methods.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal Object Detection Apr 19 Code High viability
Semantic Density Effect (SDE): Maximizing Information Per Token Improves LLM Accuracy Build Now
A novel prompting technique that maximizes information per token to improve LLM accuracy and reduce hallucinations without additional tokens or latency.
GitHub stars n/a Velocity flat History 1 snapshot LLM Prompting Apr 19 Code High viability
Beyond Static Snapshots: A Grounded Evaluation Framework for Language Models at the Agentic Frontier Build Now
A new framework and system for evaluating and fine-tuning LLM agents that eliminates reward hacking and reduces hardware requirements.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 19 Code High viability
Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing Ignore
Accelerates LLM text and code editing by recasting generation as structured decoding over a copy-and-generate grammar, significantly reducing regeneration time.
LLM Editing Apr 20
Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts Ignore
A lightweight CNN framework for high-accuracy, fast classification of legal case law texts.
Legal AI Apr 20
Semantic Entanglement in Vector-Based Retrieval: A Formal Framework and Context-Conditioned Disentanglement Pipeline for Agentic RAG Systems Watch
A pipeline that resolves semantic entanglement in vector embeddings to improve retrieval precision in RAG systems.
RAG Systems Apr 20
State Transfer Reveals Reuse in Controlled Routing Ignore
This research uses controlled routing tasks to reveal how prompt-based interventions alter LLM behavior and identify where relevant state is represented.
GitHub stars n/a Velocity flat History 1 snapshot LLM Interpretability Apr 20
Party Autonomy in Determining the Law Applicable to Non-contractual Obligations concerning Cross-Border Data Transfers Ignore
Proposes party autonomy as a solution for determining applicable law in cross-border data transfer disputes, aligning non-contractual obligations with contractual choices.
GitHub stars n/a Velocity flat History 1 snapshot Legal AI Apr 20 Code
Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective Ignore
Investigates secret leakage risks in code LLMs, identifying a 'gibberish bias' in BPE tokenization as a root cause for memorization.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 20
Ranking Abuse via Strategic Pairwise Data Perturbations Ignore
This paper explores the vulnerability of ranking systems to strategic data manipulation, proposing an attack method to identify high-impact perturbations and highlighting the need for more robust aggregation methods.
GitHub stars n/a Velocity flat History 1 snapshot AI Safety & Robustness Apr 20 Code
AlphaContext: An Evolutionary Tree-based Psychometric Context Generator for Creativity Assessment Ignore
An evolutionary tree-based generator for creating psychometric contexts to assess creativity.
GitHub stars n/a Velocity flat History 1 snapshot AI for Creativity Assessment Apr 20
When Can LLMs Learn to Reason with Weak Supervision? Ignore
This paper investigates when Large Language Models can learn to reason effectively with weak supervision by analyzing training dynamics and identifying key properties for generalization.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 20
Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs Ignore
A new approach to agentic forecasting that uses sequential Bayesian updating of linguistic beliefs to forecast future events.
GitHub stars n/a Velocity flat History 1 snapshot Predictive Analytics Apr 20 Code
Randomly Initialized Networks Can Learn from Peer-to-Peer Consensus Ignore
Demonstrates that randomly initialized networks can learn representations through peer-to-peer consensus, without complex mechanisms.
GitHub stars n/a Velocity flat History 1 snapshot Self-Supervised Learning Apr 20 Code
Dissecting AI Trading: Behavioral Finance and Market Bubbles Ignore
This study uses LLM agents in simulated markets to reveal behavioral finance patterns and demonstrates how prompt interventions can control market bubble formation.
GitHub stars n/a Velocity flat History 1 snapshot AI Trading Apr 20
Latent Fourier Transform Ignore
A framework for generative music models that uses a latent-space Fourier transform to provide frequency-domain controls for timescale-based manipulation and blending.
GitHub stars n/a Velocity flat History 1 snapshot Generative Audio Apr 20
Physics-Informed Causal MDPs for Sequential Constraint Repair in Engineering Simulation Pipelines Ignore
A framework for constrained reinforcement learning in engineering simulations that uses causal identification and physics-guided estimation to improve success rates.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 20 Code
Community-Led AI Integration for Wildfire Risk Assessment: A Participatory AI Literacy and Explainability Integration (PALEI) Framework in Los Angeles, CA Ignore
A community-led framework for integrating AI into wildfire risk assessment with a focus on literacy and explainability.
GitHub stars n/a Velocity flat History 1 snapshot AI for Climate Apr 20
From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives? Ignore
Investigates conditions under which LLMs can outperform human annotators in estimating aggregate subgroup opinions on subjective tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 20 Pending
Characterizing Model-Native Skills Watch
This paper proposes model-native skills derived from internal representations for more effective LLM intervention and behavior modification.
GitHub stars n/a Velocity flat History 1 snapshot LLM Intervention Apr 19 High viability
KnowledgeBerg: Evaluating Systematic Knowledge Coverage and Compositional Reasoning in Large Language Models Watch
KnowledgeBerg is a new benchmark to evaluate LLMs on systematic knowledge coverage and compositional reasoning, revealing significant limitations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 19 Code
LEPO: Latent Reasoning Policy Optimization for Large Language Models Ignore
A novel framework for large language models that applies reinforcement learning directly to continuous latent representations to improve reasoning diversity and performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 20
CAPO: Counterfactual Credit Assignment in Sequential Cooperative Teams Ignore
This paper introduces CAPO, a critic-free policy-gradient algorithm for sequential cooperative teams that derives a per-agent learning signal to improve individual learnability.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 20
On the Reliability of Computer Use Agents Ignore
This research identifies key factors like stochasticity, ambiguity, and behavioral variability that cause unreliability in computer-use AI agents.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 20
Semantic-based Distributed Learning for Diverse and Discriminative Representations Ignore
A novel distributed learning framework for diverse and discriminative representations in large-scale scenarios, theoretically proven to maintain desired properties.
GitHub stars n/a Velocity flat History 1 snapshot Distributed Learning Apr 20 Pending
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment Ignore
A framework for improving few-shot reinforcement learning with verifiable rewards by aligning entropy dynamics between general and target domains.
GitHub stars n/a Velocity flat History 1 snapshot Few-Shot RL Apr 20
Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients? Ignore
This paper investigates methods to improve policy gradient reinforcement learning by addressing discontinuities in differentiable simulators, proposing new estimators for better performance.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 20
The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data Ignore
A novel logic-to-topology encoding framework to bridge formal logic, topology, and neural processing for mechanistic interpretability in neuro-symbolic AI.
GitHub stars n/a Velocity flat History 1 snapshot Neuro-Symbolic AI Apr 20 Code
Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought Ignore
A calibrated reinforcement learning approach for multi-attempt chain-of-thought reasoning that optimizes verification success rates.
GitHub stars n/a Velocity flat History 1 snapshot Chain-of-Thought Reasoning Apr 20
Toward Reusability of AI Models Using Dynamic Updates of AI Documentation Watch
This work introduces a methodology for creating agile, data-driven AI model documentation to improve model reusability.
GitHub stars n/a Velocity flat History 1 snapshot AI Documentation Apr 19 Code
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures Ignore
This survey reviews classical multi-agent systems and explores their evolution towards foundation model-enabled futures, outlining challenges and opportunities.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 20
Architectural Design Decisions in AI Agent Harnesses Ignore
This paper analyzes architectural design decisions in 70 publicly available AI agent systems to identify recurring patterns and provide guidance for framework designers.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 20 Pending
Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition Ignore
A theoretical exploration of post-training quantization techniques for language models, focusing on understanding error sources rather than product development.
GitHub stars n/a Velocity flat History 1 snapshot LLM Quantization Apr 20
STEP-PD: Stage-Aware and Explainable Parkinson's Disease Severity Classification Using Multimodal Clinical Assessments Watch
An interpretable machine learning framework for stage-aware Parkinson's disease severity classification using multimodal clinical data.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 19
How Much Data is Enough? The Zeta Law of Discoverability in Biomedical Data, featuring the enigmatic Riemann zeta function Ignore
A theoretical framework using the Riemann zeta function to predict when additional data will substantially improve performance in biomedical discovery.
GitHub stars n/a Velocity flat History 1 snapshot Biomedical Data Discoverability Apr 19 Code
Concurrent Criterion Validation of a Validity Screen for LLM Confidence Signals via Selective Prediction Ignore
Shows that a validity screen for LLM confidence signals predicts selective prediction performance across various models and datasets.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Pending
Understanding Human Actions through the Lens of Executable Models Ignore
Introduces EXACT, a domain-specific language that represents human motions as underspecified motion programs, enabling zero-shot policy inference and compositional modeling.
GitHub stars n/a Velocity flat History 1 snapshot Human Action Understanding Apr 20
A Sugeno Integral View of Binarized Neural Network Inference Ignore
This paper establishes a theoretical connection between binarized neural networks and Sugeno integrals, offering a new framework for understanding input importance and interactions.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory Apr 20
Prompt Optimization Enables Stable Algorithmic Collusion in LLM Agents Ignore
Prompt optimization can lead to emergent and stable algorithmic collusion in LLM agents participating in market simulations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 20 Pending
The Collaboration Gap in Human-AI Work Ignore
A conceptual framework for understanding the fragility of human-AI collaboration, identifying factors beyond model capability that impact stable interaction.
GitHub stars n/a Velocity flat History 1 snapshot Human-AI Collaboration Apr 20
Provable Coordination for LLM Agents via Message Sequence Charts Ignore
A domain-specific language and framework for provable coordination of LLM agents, ensuring deadlock-free communication.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 19
SafeAgent: A Runtime Protection Architecture for Agentic Systems Ignore
A runtime security architecture for LLM agents that improves robustness against prompt-injection attacks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 19
How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers Ignore
This paper theoretically analyzes the depth-cache tradeoffs in KV-compressed Transformers, focusing on memory bottlenecks during inference.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 20
Symbolic Synthesis for LTLf+ Obligations Ignore
A theoretical framework for symbolic synthesis of obligation properties in LTLf+, showing efficiency comparable to LTLf synthesis.
GitHub stars n/a Velocity flat History 1 snapshot Formal Methods Apr 20
The implicated scientist: on the role of AI researchers in the development of weapons systems Ignore
This paper examines the ethical implications of AI researchers' involvement in the development of weapons systems and explores avenues for solidarity with victims of technologically-driven injustices.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics Apr 20
AIRA: AI-Induced Risk Audit: A Structured Inspection Framework for AI-Generated Code Ignore
A framework for auditing AI-generated code to detect failure-untruthful patterns, crucial for safety-critical systems.
GitHub stars n/a Velocity flat History 1 snapshot AI Code Auditing Apr 19
On the Emergence of Syntax by Means of Local Interaction Ignore
A minimal neural cellular automaton spontaneously develops a structured internal representation for syntactic processing, mimicking CKY algorithm principles.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory Apr 20
Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs Ignore
A parameter-free decomposition for Mixture-of-Experts models that separates control signals from content channels to improve compositional specialization across layers.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 20 Pending
On The Mathematics of the Natural Physics of Optimization Ignore
This paper explores the theoretical foundations of optimization algorithms by drawing parallels to natural physics and non-Newtonian dynamics.
GitHub stars n/a Velocity flat History 1 snapshot Optimization Theory Apr 19
Polarization and Integration in Global AI Research Ignore
Analysis of polarization and integration in global AI research over three decades, highlighting the diverging influence of the US and China.
GitHub stars n/a Velocity flat History 1 snapshot Research Trends Apr 19