IDOBE: Infectious Disease Outbreak forecasting Benchmark Ecosystem Build Now
A comprehensive benchmark ecosystem for infectious disease outbreak forecasting, providing curated datasets and baseline models for reproducible research and development.
GitHub stars n/a Velocity flat History 1 snapshot Epidemiological Forecasting Apr 20 Pending High viability
DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization Build Now
DuQuant++ enhances microscaling FP4 quantization for LLM inference by adapting outlier-aware rotation to MXFP4, achieving state-of-the-art performance with reduced computational cost.
GitHub stars n/a Velocity flat History pending LLM Quantization Apr 20 Pending High viability
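For context on the target format, here is a minimal sketch of plain MXFP4 quantize-dequantize in NumPy. It assumes the OCP microscaling layout: blocks of 32 values share one power-of-two scale, and each element is stored as FP4 E2M1. It does not implement DuQuant++'s outlier-aware rotation, and the function name `mxfp4_quant_dequant` is hypothetical.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (sign is stored separately).
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32        # elements sharing one scale in MX formats
E2M1_EMAX = 2     # largest E2M1 exponent: 6 = 1.5 * 2**2

def mxfp4_quant_dequant(x):
    """Simulate MXFP4: quantize each 32-element block, then dequantize."""
    x = np.asarray(x, dtype=np.float64)
    assert x.size % BLOCK == 0, "length must be a multiple of the block size"
    blocks = x.reshape(-1, BLOCK)
    out = np.empty_like(blocks)
    for i, b in enumerate(blocks):
        amax = np.max(np.abs(b))
        if amax == 0:
            out[i] = 0.0
            continue
        # Shared power-of-two scale chosen so amax lands near the top of the grid.
        scale = 2.0 ** (np.floor(np.log2(amax)) - E2M1_EMAX)
        scaled = np.clip(b / scale, -6.0, 6.0)  # saturate values beyond +/-6
        # Round each magnitude to the nearest representable grid point.
        idx = np.abs(np.abs(scaled)[:, None] - FP4_E2M1_GRID[None, :]).argmin(axis=1)
        out[i] = np.sign(scaled) * FP4_E2M1_GRID[idx] * scale
    return out.reshape(x.shape)
```

A single large outlier inflates the shared scale and pushes the block's small values toward zero, which is the problem that rotation-based methods aim to mitigate before quantization.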
Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriever Evaluation Strategies Build Now
CARE, a context-aware retriever evaluation strategy for RAG systems, significantly improves multi-hop reasoning evaluation compared to existing methods.
GitHub stars n/a Velocity flat History pending RAG Evaluation Apr 20 Pending High viability
Soft Label Pruning and Quantization for Large-Scale Dataset Distillation Build Now
A method to drastically reduce soft label storage in large-scale dataset distillation by pruning and quantizing labels, improving accuracy and compression.
GitHub stars n/a Velocity flat History pending Dataset Distillation Apr 20 Pending High viability
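To illustrate where the storage savings come from, here is a minimal sketch that keeps only the top-k probabilities per sample and stores them as 8-bit codes. This generic top-k pruning plus uniform quantization is an assumption and is not the paper's exact scheme. The names `prune_and_quantize` and `reconstruct` are hypothetical.

```python
import numpy as np

def prune_and_quantize(soft_labels, k, bits=8):
    """Keep the k largest probabilities per sample and store them as unsigned ints."""
    assert 1 <= bits <= 8, "this sketch stores codes in uint8"
    idx = np.argsort(soft_labels, axis=1)[:, ::-1][:, :k]   # top-k class indices
    vals = np.take_along_axis(soft_labels, idx, axis=1)     # their probabilities
    levels = 2 ** bits - 1
    codes = np.round(vals * levels).astype(np.uint8)         # uniform quantization on [0, 1]
    return idx, codes

def reconstruct(idx, codes, num_classes, bits=8):
    """Dequantize, scatter back into a dense vector, and renormalize each row."""
    levels = 2 ** bits - 1
    dense = np.zeros((idx.shape[0], num_classes))
    np.put_along_axis(dense, idx, codes / levels, axis=1)
    totals = dense.sum(axis=1, keepdims=True)
    return dense / np.maximum(totals, 1e-12)                 # guard against all-zero rows
```

Storing k indices and k one-byte codes per sample replaces a dense float vector over all classes. The savings grow with the number of classes.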
EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations Build Now
EVE enables verifiable self-evolution of MLLMs through executable visual transformations, generating diverse and challenging training data with verified ground truth.
GitHub stars n/a Velocity flat History pending MLLM Self-Evolution Apr 20 Pending High viability
Screen Before You Interpret: A Portable Validity Protocol for Benchmark-Based LLM Confidence Signals Build Now
A portable protocol for validating LLM confidence signals, inspired by clinical psychology, provides a three-tier classification system to ensure reliable use.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Pending High viability
DSAINet: An Efficient Dual-Scale Attentive Interaction Network for General EEG Decoding Build Now
A novel dual-scale attentive network for general EEG decoding that outperforms existing methods across multiple datasets with a single architecture.
GitHub stars n/a Velocity flat History pending EEG Decoding Apr 20 Pending High viability
DocQAC: Adaptive Trie-Guided Decoding for Effective In-Document Query Auto-Completion Build Now
An adaptive trie-guided decoding framework for effective in-document query auto-completion that steers language models towards high-quality completions using document context.
GitHub stars n/a Velocity flat History pending Document Search & Completion Apr 20 Pending High viability
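The core idea of trie-guided decoding can be sketched as follows: phrases extracted from the document populate a token trie, and at each step the decoder may only pick among the trie's children, ranked by a language-model score. The `score_fn` argument is a hypothetical stand-in for model log-probabilities, and the paper's adaptive stopping and weighting are not modeled.

```python
from typing import Callable, Dict, List, Optional, Sequence

class Trie:
    """Token-level trie built from phrases that occur in the document."""
    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_end = False

    def insert(self, tokens: Sequence[str]) -> None:
        node = self
        for t in tokens:
            node = node.children.setdefault(t, Trie())
        node.is_end = True

    def walk(self, tokens: Sequence[str]) -> Optional["Trie"]:
        """Return the node reached by following tokens, or None if the path is absent."""
        node = self
        for t in tokens:
            node = node.children.get(t)
            if node is None:
                return None
        return node

def trie_guided_complete(trie: Trie, prefix: Sequence[str],
                         score_fn: Callable[[List[str], str], float],
                         max_len: int = 10) -> List[str]:
    """Greedily extend prefix, restricting each step to tokens allowed by the trie."""
    out = list(prefix)
    node = trie.walk(out)
    while node is not None and node.children and len(out) < max_len:
        best = max(node.children, key=lambda tok: score_fn(out, tok))
        out.append(best)
        node = node.children[best]
    return out
```

Because every emitted token must follow a trie edge, completions stay grounded in phrases that actually appear in the document, while the model score decides among them.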
Brain-Inspired Capture: Evidence-Driven Neuromimetic Perceptual Simulation for Visual Decoding Build Now
A neuromimetic simulation paradigm that aligns neural and visual modalities for improved brain-computer interfaces by emulating human visual system processing.
GitHub stars n/a Velocity flat History pending Brain-Computer Interfaces Apr 20 Pending High viability
One Pass for All: A Discrete Diffusion Model for Knowledge Graph Triple Set Prediction Build Now
A discrete diffusion model that generates complete knowledge graphs in one pass, ensuring consistency and achieving state-of-the-art performance.
GitHub stars n/a Velocity flat History pending Knowledge Graph AI Apr 20 Pending High viability
Evolutionary Negative Module Pruning for Better LoRA Merging Build Now
A plug-and-play method to prune negative LoRA modules before merging, improving performance across language and vision tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 20 Pending High viability
Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF Build Now
A framework for automatically generating controllable toxic data to improve LLM safety and red teaming.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 20 Pending High viability
Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search Build Now
CalibAdv, an advantage calibration method for deep search agents, improves performance and stability by downscaling negative advantages and rebalancing positive/negative advantages.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 20 Pending High viability
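As a reference point, here is a minimal sketch of GRPO-style group-normalized advantages followed by a simple downscaling of negative values. The single `neg_scale` factor is a hypothetical simplification. CalibAdv's actual calibration and rebalancing rule is not reproduced here.

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: rewards normalized within one prompt's group of rollouts."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

def downscale_negative(adv, neg_scale=0.5):
    """Hypothetical calibration: shrink only the negative advantages by neg_scale."""
    adv = np.asarray(adv, dtype=np.float64)
    return np.where(adv < 0, neg_scale * adv, adv)
```

With `neg_scale < 1`, failed rollouts still push probability away from their tokens, but less aggressively than successful rollouts pull it toward theirs.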
Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling Build Now
DASH is a training-free method that selectively halts stabilized tokens during LLM prefilling to achieve significant speedups while preserving accuracy and hardware efficiency.
GitHub stars n/a Velocity flat History pending LLM Inference Optimization Apr 20 Pending High viability
OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning Build Now
A novel framework for reinforcement learning that unifies offline teacher guidance and online learning to improve LLM reasoning and exploration capabilities.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 20 Pending High viability
Scalable Neighborhood-Based Multi-Agent Actor-Critic Build Now
A scalable multi-agent reinforcement learning method reduces critic computation by focusing on nearby agents, enabling better performance and faster convergence.
GitHub stars n/a Velocity flat History pending Multi-Agent RL Apr 20 Pending High viability
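The saving comes from what the critic sees, which a minimal sketch can illustrate. It assumes "nearby" means the k nearest agents by Euclidean distance, and it shows only how a fixed-size critic input is assembled, not the paper's training procedure. The helper names are hypothetical.

```python
import numpy as np

def neighborhood_indices(positions, k):
    """Indices of each agent's k nearest other agents (Euclidean distance)."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude the agent itself
    return np.argsort(d, axis=1)[:, :k]

def critic_inputs(obs, actions, positions, k):
    """Per-agent critic input: own obs/action plus those of its k neighbors."""
    nbrs = neighborhood_indices(positions, k)
    feats = []
    for i in range(obs.shape[0]):
        idx = np.concatenate(([i], nbrs[i]))
        feats.append(np.concatenate([obs[idx].ravel(), actions[idx].ravel()]))
    return np.stack(feats)                 # shape (N, (k + 1) * (obs_dim + act_dim))
```

The critic input size depends on k rather than on the total number of agents N, so per-agent critic cost stays constant as the team grows.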
LoReC: Rethinking Large Language Models for Graph Data Analysis Build Now
LoReC is a plug-and-play method that enhances Large Language Models for graph data analysis by improving their understanding of graph structures and information.
GitHub stars n/a Velocity flat History pending Graph LLMs Apr 20 Pending High viability
Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck? Build Now
This research demonstrates that targeted data augmentation during LLM training can significantly improve performance on under-represented linguistic phenomena, suggesting that data composition is key.
GitHub stars n/a Velocity flat History pending LLM Data Composition Apr 20 Pending High viability
Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation Build Now
An end-to-end pipeline that extracts simulation-ready 3D assets from autonomous driving logs, enabling scalable AV development.
GitHub stars n/a Velocity flat History pending 3D Asset Generation Apr 20 Code High viability
TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics Build Now
Generate adaptable and scalable football tactics using a multi-agent diffusion transformer, validated by experts.
GitHub stars n/a Velocity flat History pending Generative AI Apr 20 Code High viability
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework Build Now
A graph-enhanced framework for generating region-grounded reports from 3D medical imaging, leveraging a new annotated dataset.
GitHub stars n/a Velocity flat History 1 snapshot Medical Imaging AI Apr 20 Code High viability
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent Build Now
LiteResearcher is a scalable RL training framework for research agents that uses a lite virtual world to achieve state-of-the-art performance on benchmarks, outperforming larger models.
GitHub stars n/a Velocity flat History pending Agentic RL Training Apr 20 Code High viability
WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference Build Now
WISV is a wireless-informed semantic verification framework for distributed speculative decoding in device-edge LLM inference, significantly improving accepted length and reducing interaction rounds.
GitHub stars n/a Velocity flat History pending Edge LLM Inference Apr 20 Code High viability
ProtoCLIP: Prototype-Aligned Latent Refinement for Robust Zero-Shot Chest X-Ray Classification Build Now
ProtoCLIP enhances zero-shot chest X-ray classification by refining vision-language models with curated data and prototype alignment.
GitHub stars n/a Velocity flat History pending Medical AI Apr 20 Code High viability
Tight Auditing of Differential Privacy in MST and AIM Watch
A Gaussian Differential Privacy-based auditing framework for synthetic data generators that provides tight audits in the strong-privacy regime.
GitHub stars n/a Velocity flat History pending Differential Privacy Apr 20 Pending
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Build Now
STRATAGEM learns transferable reasoning in language models by modulating game self-play with a focus on abstract, domain-agnostic patterns and adaptive development.
GitHub stars n/a Velocity flat History pending Reasoning Transfer Apr 20 Code High viability
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Build Now
Agent-World is a self-evolving training arena that synthesizes realistic environments and tasks to advance general agent intelligence through continuous learning.
GitHub stars n/a Velocity flat History pending Agents Apr 20 Code High viability
RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs Build Now
RAVEN is a framework using LLM agents and RAG to automatically generate comprehensive vulnerability analysis reports for user code and binary programs.
GitHub stars n/a Velocity flat History pending Cybersecurity AI Apr 20 Code High viability
Sessa: Selective State Space Attention Watch
Sessa introduces a novel decoder architecture that places attention within a feedback path for improved long-context sequence modeling.
GitHub stars n/a Velocity flat History pending Sequence Modeling Apr 20 Pending
Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement Build Now
A plug-and-play reward calibration method that mitigates likelihood displacement in preference optimization for LLMs, improving downstream performance.
GitHub stars n/a Velocity flat History pending LLM Alignment Apr 20 Pending High viability
LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL Build Now
A modular adapter composition strategy for curriculum learning that improves complex code generation by sequentially training tier-specific adapters on incremental complexity levels.
GitHub stars n/a Velocity flat History pending Code Generation Apr 20 Code High viability
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents Build Now
ClawEnvKit automates environment generation for training and evaluating claw-like agents from natural language inputs.
GitHub stars n/a Velocity flat History 1 snapshot AI Tooling Apr 20 Code High viability
LLM Safety From Within: Detecting Harmful Content with Internal Representations Build Now
Safeguarding AI systems by detecting harmful content using internal representations with reduced computational overhead.
GitHub stars n/a Velocity flat History 1 snapshot AI Safety Apr 20 Code High viability
CADMAS-CTX: Contextual Capability Calibration for Multi-Agent Delegation Build Now
CADMAS-CTX calibrates multi-agent delegation by dynamically adjusting agent capabilities based on task context, improving teamwork and reducing misdelegation.
GitHub stars n/a Velocity flat History pending Multi-Agent Systems Apr 20 Code High viability
WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent Build Now
A novel autonomous web agent framework that tackles dual-level uncertainty in planning and reasoning using adaptive planning and Monte Carlo tree search with uncertainty quantification.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
Enhancing Tabular Anomaly Detection via Pseudo-Label-Guided Generation Build Now
PLAG enhances tabular anomaly detection by generating pseudo-anomalies to identify localized feature-level abnormalities, outperforming existing methods.
GitHub stars n/a Velocity flat History pending Tabular AI Apr 20 Code High viability
Revisiting Change VQA in Remote Sensing with Structured and Native Multimodal Qwen Models Build Now
A multimodal model for answering questions about changes in remote sensing images.
GitHub stars n/a Velocity flat History pending Remote Sensing Apr 20 Code High viability
Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation Watch
Identifies diversity collapse in multi-agent LLM systems due to structural coupling, impacting open-ended idea generation.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 20 Pending
GeGS-PCR: Effective and Robust 3D Point Cloud Registration with Two-Stage Color-Enhanced Geometric-3DGS Fusion Build Now
A two-stage point cloud registration method that fuses geometric and color information for robust performance in challenging low-overlap scenarios.
GitHub stars n/a Velocity flat History pending 3D Computer Vision Apr 20 Code High viability
An Integrated Deep-Learning Framework for Peptide-Protein Interaction Prediction and Target-Conditioned Peptide Generation with ConGA-PePPI and TC-PepGen Build Now
An integrated AI framework for peptide-protein interaction prediction and target-conditioned peptide generation to accelerate drug discovery.
GitHub stars n/a Velocity flat History pending Biotech AI Apr 20 Code High viability
Toward Zero-Egress Psychiatric AI: On-Device LLM Deployment for Privacy-Preserving Mental Health Decision Support Build Now
A privacy-preserving on-device AI platform for psychiatric decision support, enabling local LLM inference for sensitive mental health data.
GitHub stars n/a Velocity flat History pending On-Device AI Apr 20 Code High viability
Style-Based Neural Architectures for Real-Time Weather Classification Build Now
Style-based neural architectures, including Multi-PatchGAN and Truncated ResNet50 with Gram Matrix and Attention, for real-time weather classification that outperform state-of-the-art methods.
GitHub stars n/a Velocity flat History pending Image Classification Apr 20 Code High viability
RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models Build Now
A time-aware LLM framework that integrates structured EHR data through recurrent prompt tuning for improved clinical prediction.
GitHub stars n/a Velocity flat History pending LLM Applications Apr 20 Code High viability
Aether: Network Validation Using Agentic AI and Digital Twin Build Now
Automate network change validation with agentic AI and a digital twin, reducing manual effort and errors.
GitHub stars n/a Velocity flat History pending Network Operations AI Apr 20 Code High viability
Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations Build Now
A regularization technique that incorporates empirical dialogue-flow statistics to significantly improve next dialogue act prediction in counselling conversations.
GitHub stars n/a Velocity flat History pending Dialogue Systems Apr 20 Code High viability
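A minimal sketch of the general idea: estimate a smoothed act-to-act transition matrix from training dialogues, then add a penalty pulling the model's next-act distribution toward the row for the previous act. The KL direction, the weight `lam`, and the smoothing `alpha` are assumptions, not the paper's exact formulation.

```python
import numpy as np

def empirical_transition_matrix(sequences, num_acts, alpha=1.0):
    """Row-normalized counts of act -> next-act transitions with add-alpha smoothing."""
    counts = np.full((num_acts, num_acts), float(alpha))  # alpha > 0 keeps every entry positive
    for seq in sequences:
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[prev, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_softmax(logits):
    m = logits.max()
    return logits - (m + np.log(np.exp(logits - m).sum()))

def regularized_loss(logits, target, prev_act, T, lam=0.1):
    """Cross-entropy plus lam * KL(T[prev_act] || model distribution)."""
    logp = log_softmax(np.asarray(logits, dtype=np.float64))
    ce = -logp[target]
    prior = T[prev_act]
    kl = np.sum(prior * (np.log(prior) - logp))
    return ce + lam * kl
```

When the model's prediction already matches the empirical dialogue-flow statistics, the KL term vanishes and the loss reduces to plain cross-entropy.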
LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation Watch
LQM is a linguistically motivated, multidimensional quality metric framework for machine translation, designed to capture dialect- and culture-specific errors, starting with Arabic.
GitHub stars n/a Velocity flat History pending Machine Translation Apr 20 Pending
AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models Build Now
A hierarchical framework for vision-language-action models that separates global trajectory planning from local refinement to improve precision in robotic manipulation.
GitHub stars n/a Velocity flat History pending Robotics Apr 20 Code High viability
SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks Build Now
A syntactic probing framework to detect and quantify benchmark contamination in NL2SQL datasets, revealing leakage in older benchmarks and validating newer ones.
GitHub stars n/a Velocity flat History pending NL2SQL Apr 20 Code High viability
Long-Text-to-Image Generation via Compositional Prompt Decomposition Build Now
PRISM enables pre-trained text-to-image models to generate images from long descriptive paragraphs by decomposing prompts and merging component-wise predictions.
GitHub stars n/a Velocity flat History pending Generative AI Apr 20 Code High viability
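One plausible way to merge component-wise predictions is sketched below. It assumes each sub-prompt yields its own noise prediction from the same denoiser, and it combines them with a composable-diffusion-style weighted guidance rule. That rule is a swapped-in technique, not PRISM's published merging step, and the prompt decomposition itself is not shown.

```python
import numpy as np

def merge_component_predictions(eps_uncond, eps_components, weights, guidance_scale=7.5):
    """Combine per-sub-prompt noise predictions into one guided prediction.

    Assumed rule: eps = eps_uncond + s * sum_i w_i * (eps_i - eps_uncond)
    """
    eps_uncond = np.asarray(eps_uncond, dtype=np.float64)
    delta = np.zeros_like(eps_uncond)
    for w, eps_i in zip(weights, eps_components):
        delta += w * (np.asarray(eps_i, dtype=np.float64) - eps_uncond)
    return eps_uncond + guidance_scale * delta
```

Each sub-prompt stays within the text encoder's token limit, so no single conditioning pass has to fit the whole paragraph.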
TPS-CalcBench: A Benchmark and Diagnostic Evaluation Framework for LLM Analytical Calculation Competence in Hypersonic Thermal Protection System Engineering Build Now
TPS-CalcBench is a diagnostic benchmark and evaluation framework for LLMs in safety-critical aerospace engineering, focusing on analytical calculation competence and reasoning quality.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Code High viability
MHSafeEval: Role-Aware Interaction-Level Evaluation of Mental Health Safety in Large Language Models Build Now
An agent-based framework for evaluating mental health safety in LLMs by simulating multi-turn counseling interactions and identifying role-dependent harms.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 20 Code High viability
QuantumQA: Enhancing Scientific Reasoning via Physics-Consistent Dataset and Verification-Aware Reinforcement Learning Build Now
A new dataset and reinforcement learning approach for LLMs that ensures scientific accuracy in domains like quantum mechanics, outperforming baselines and offering a parameter-efficient alternative to scaling.
GitHub stars n/a Velocity flat History pending Scientific Reasoning LLMs Apr 20 Code High viability
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation Build Now
AJ-Bench is a benchmark for evaluating Agent-as-a-Judge across search, data systems, and GUIs; results on it show consistent performance gains over LLM-as-a-Judge baselines.
GitHub stars n/a Velocity flat History pending AI Agents Apr 20 Code High viability
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization Build Now
A novel PIM-aware activation quantization framework that significantly reduces LLM decoding latency and computational overhead by performing quantization directly within memory.
GitHub stars n/a Velocity flat History pending LLM Optimization Apr 20 Code High viability
Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation Build Now
An autonomous drone system for drowning swimmer rescue, utilizing YOLO for image-based localization and simulation to demonstrate significant response time reduction.
GitHub stars n/a Velocity flat History pending Search and Rescue Drones Apr 20 Code High viability
Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization Build Now
A co-evolutionary framework that evolves agent architectures and reasoning paths for automated optimization, improving performance and interpretability.
GitHub stars n/a Velocity flat History pending AI Agents Apr 20 Code High viability
PARM: Pipeline-Adapted Reward Model Build Now
A pipeline-adapted reward model that aligns LLM rewards with downstream feedback in multi-stage applications, improving output quality and stability.
GitHub stars n/a Velocity flat History pending LLM Pipelines Apr 20 Code High viability
TLoRA: Task-aware Low Rank Adaptation of Large Language Models Build Now
TLoRA is a unified framework for parameter-efficient LLM fine-tuning that jointly optimizes initialization and rank allocation for improved performance across diverse tasks.
GitHub stars n/a Velocity flat History pending LLM Fine-tuning Apr 20 Code High viability
SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression Build Now
A self-evolution framework for LLMs that improves emotion recognition and consistent expression through self-play and reinforcement learning, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 20 Code High viability
Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety Build Now
A benchmark and dataset for evaluating the stylistic robustness of LLM safety refusals, revealing significant vulnerabilities.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 20 Code High viability
Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD Build Now
A neural surrogate model trained on expert-validated CFD data for interactive aerodynamics design, enabling rapid exploration of design spaces in motorsport.
GitHub stars n/a Velocity flat History pending Aerodynamics Apr 20 Code High viability
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval Build Now
A large-scale, multimodal, and multilingual benchmark for evaluating mathematical reasoning and retrieval in generative and embedding-based systems.
GitHub stars n/a Velocity flat History pending Multimodal Reasoning Benchmark Apr 20 Code High viability
Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report Build Now
This research introduces a validity scaling framework for LLM metacognitive self-report, identifying construct-level invalid models and providing a portable screening protocol.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Pending High viability
Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. Screenshots Build Now
A benchmark and failure analysis for LLM-driven smartphone automation, revealing insights into multimodal vs. text-only inputs and common agent errors.
GitHub stars n/a Velocity flat History pending Agents Apr 20 Code High viability
PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking Build Now
A neuro-symbolic framework that improves LLM belief reasoning for theory-of-mind tasks by explicitly tracking environment states using PDDL.
GitHub stars n/a Velocity flat History pending Agents Apr 20 Code High viability
Class-specific diffusion models improve military object detection in a low-data domain Build Now
Leverage class-specific diffusion models and structural guidance to significantly improve military object detection in low-data scenarios, offering an alternative to traditional simulation pipelines.
GitHub stars n/a Velocity flat History pending Generative AI for Computer Vision Apr 20 Code High viability
Latent Abstraction for Retrieval-Augmented Generation Build Now
A unified framework for RAG that performs encoding, retrieval, and generation entirely within an LLM's latent space, improving efficiency and accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Retrieval Augmented Generation Apr 20 Code High viability
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Watch
A multimodal benchmark for evaluating LLMs in end-to-end web coding, including generation, editing, and repair.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 20 Code
Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens Build Now
A novel unlearning method for LLMs that selectively removes informative tokens based on predictive entropy, preserving model utility while mitigating adversarial behaviors.
GitHub stars n/a Velocity flat History pending LLM Unlearning Apr 20 Code High viability
Voronoi-guided Bilateral 2D Gaussian Splatting for Arbitrary-Scale Hyperspectral Image Super-Resolution Build Now
A Gaussian-Splatting framework for arbitrary-scale hyperspectral image super-resolution, improving spatial and spectral fidelity.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 20 Code High viability
When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias Build Now
A new paradigm for vision-language model evaluation that corrects for informativeness bias, leading to more reliable judgments.
GitHub stars n/a Velocity flat History pending Vision-Language Models Apr 20 Code High viability
ContraPrompt: Contrastive Prompt Optimization via Dyadic Reasoning Trace Analysis Build Now
ContraPrompt optimizes LLM prompts by analyzing the reasoning differences between successful and failed attempts, significantly improving performance on benchmarks.
GitHub stars n/a Velocity flat History pending LLM Optimization Apr 20 Code High viability
SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models Build Now
A framework that prevents safety alignment erosion in LLMs during continual domain adaptation, significantly outperforming baselines.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 20 Code High viability
Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition Build Now
Adversarial Arena crowdsources high-quality, diverse conversational data for LLM training through interactive competition, demonstrating significant improvements in secure code generation.
GitHub stars n/a Velocity flat History pending LLM Data Generation Apr 20 Code High viability
AI Approach for MRI-only Full-Spine Vertebral Segmentation and 3D Reconstruction in Paediatric Scoliosis Build Now
An AI framework enables radiation-free 3D spine deformity assessment from MRI alone, automating segmentation and reconstruction for pediatric scoliosis care.
GitHub stars n/a Velocity flat History pending Medical AI Apr 20 Code High viability
ExAI5G: A Logic-Based Explainable AI Framework for Intrusion Detection in 5G Networks Build Now
ExAI5G integrates Transformer-based deep learning with logic-based XAI to provide a trustworthy and effective intrusion detection system for 5G networks with high accuracy and transparent reasoning.
GitHub stars n/a Velocity flat History pending Explainable AI for Cybersecurity Apr 20 Code High viability
First, Do No Harm (With LLMs): Mitigating Racial Bias via Agentic Workflows Build Now
An agentic workflow that leverages retrieval to mitigate racial bias in LLM-generated medical cases and differential diagnoses.
GitHub stars n/a Velocity flat History pending Medical AI Bias Mitigation Apr 20 Code High viability
Bridging the Reasoning Gap in Vietnamese with Small Language Models via Test-Time Scaling Build Now
This paper demonstrates that Test-Time Scaling with Supervised Fine-Tuning substantially narrows the reasoning gap for Vietnamese Small Language Models on elementary mathematics, outperforming complex agentic workflows.
GitHub stars n/a Velocity flat History pending Small Language Models Apr 20 Code High viability
Prompting Foundation Models for Zero-Shot Ship Instance Segmentation in SAR Imagery Build Now
Leveraging foundation models for zero-shot ship instance segmentation in SAR imagery by using a detector to prompt a segmentation model, eliminating pixel-level annotation needs.
GitHub stars n/a Velocity flat History pending SAR Image Analysis Apr 20 Code High viability
Randomly Initialized Networks Can Learn from Peer-to-Peer Consensus Build Now
Exploring self-distillation in randomly initialized networks for improved learning.
GitHub stars n/a Velocity flat History pending Self-Supervised Learning Apr 20 Code High viability
WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation Watch
A novel vector graph-of-worlds memory engine for long-running agentic systems that significantly improves accuracy and temporal reasoning.
GitHub stars n/a Velocity flat History 1 snapshot AI Memory Systems Apr 20 High viability
A Generalized Synthetic Control Method for Baseline Estimation in Demand Response Services Build Now
A generalized synthetic control method for more accurate baseline estimation in demand response services, outperforming existing benchmarks.
GitHub stars n/a Velocity flat History pending Causal Inference for Energy Apr 20 Code High viability
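For orientation, here is a minimal sketch of the classic synthetic control baseline that the method generalizes. Donor weights are constrained to be non-negative and to sum to one, and they are fitted on pre-event load by projected gradient descent. The paper's generalization is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(treated_pre, donors_pre, steps=5000):
    """Minimize 0.5 * ||donors_pre @ w - treated_pre||^2 over the simplex.

    treated_pre: (T0,) pre-event load of the enrolled customer.
    donors_pre:  (T0, J) pre-event load of J non-enrolled donors.
    """
    J = donors_pre.shape[1]
    w = np.full(J, 1.0 / J)
    lr = 1.0 / np.linalg.norm(donors_pre, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = donors_pre.T @ (donors_pre @ w - treated_pre)
        w = project_to_simplex(w - lr * grad)
    return w

def estimate_baseline(donors_post, w):
    """Counterfactual load during the event: the weighted donor combination."""
    return donors_post @ w
```

The demand-response reduction is then estimated as the estimated baseline minus the metered load during the event window.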
Bounded Ratio Reinforcement Learning Watch
Bounded Ratio Reinforcement Learning (BRRL) offers a new framework for policy optimization that ensures monotonic performance improvement and outperforms PPO in empirical evaluations.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 20 Code
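For comparison, here is the standard PPO clipped surrogate, the objective BRRL is evaluated against. BRRL's own bounded-ratio objective is not reproduced here.

```python
import numpy as np

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = np.asarray(ratio, dtype=np.float64)
    adv = np.asarray(adv, dtype=np.float64)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * adv
    return np.minimum(unclipped, clipped)   # pessimistic bound; maximized during training
```

Clipping removes the incentive to move the probability ratio outside [1 - eps, 1 + eps], but unlike a bounded-ratio formulation it gives no monotonic-improvement guarantee.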
Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion Build Now
A systematic evaluation of cloud vs. local LLMs for System Dynamics AI assistance, providing insights into performance trade-offs and practical deployment guidance.
GitHub stars n/a Velocity flat History pending LLM Benchmarking Apr 20 Code High viability
AIT Academy: Cultivating the Complete Agent with a Confucian Three-Domain Curriculum Watch
A curriculum framework for AI agents that organizes capability development across three domains (Science/Tech, Humanities, Social Science) to cultivate complete, well-rounded agents.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 20 Code
Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research Watch
An AI platform for policy and development experts that provides evidence-based syntheses with verifiable citations and reasoned abstention, saving users significant weekly hours.
Agents Apr 20 High viability
Using large language models for embodied planning introduces systematic safety risks Watch
A benchmark and analysis revealing systematic safety risks in using large language models for robotic planning.
GitHub stars n/a Velocity flat History pending Robotics Agents Apr 20 Code
Learning the Riccati solution operator for time-varying LQR via Deep Operator Networks Watch
A DeepONet framework that learns a surrogate operator for Riccati equations, enabling fast online evaluation of optimal feedbacks for Linear Quadratic Regulator problems.
GitHub stars n/a Velocity flat History pending Optimal Control Apr 20 Code
Mix and Match: Context Pairing for Scalable Topic-Controlled Educational Summarisation Watch
A data augmentation strategy for training small language models to perform topic-controlled educational summarization, achieving competitive performance with fewer parameters.
GitHub stars n/a Velocity flat History pending Educational Summarization Apr 20 Code
Multilingual Training and Evaluation Resources for Vision-Language Models Watch
A comprehensive suite of multilingual resources for training and evaluating Vision-Language Models across five European languages, demonstrating consistent benefits for non-English benchmarks.
GitHub stars n/a Velocity flat History pending Vision-Language Models Apr 20 Code
Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations Watch
This paper proposes a modular representation compression technique for LLMs that improves recommendation efficiency and effectiveness, achieving a lift in online A/B tests.
Recommendation Systems Apr 20 High viability
IceBreaker for Conversational Agents: Breaking the First-Message Barrier with Personalized Starters Watch
IceBreaker is a system that generates personalized conversation starters for AI agents, improving user engagement and overcoming the initial message barrier.
Conversational AI Apr 20 High viability
Latent Preference Modeling for Cross-Session Personalized Tool Calling Watch
A benchmark and memory-augmented method to improve LLM agent tool-calling accuracy by modeling evolving user preferences across sessions.
GitHub stars n/a Velocity flat History pending Agents Apr 20 Code
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering Watch
Latent Phase-Shift Rollback (LPSR) is an inference-time technique that corrects otherwise unrecoverable reasoning errors in LLMs by monitoring the residual stream and steering the KV-cache, without fine-tuning.
GitHub stars n/a Velocity flat History pending LLM Inference Apr 20 Code
RASP-Tuner: Retrieval-Augmented Soft Prompts for Context-Aware Black-Box Optimization in Non-Stationary Environments Watch
RASP-Tuner uses retrieval-augmented soft prompts for efficient context-aware black-box optimization in non-stationary environments.
GitHub stars n/a Velocity flat History pending Black-Box Optimization Apr 20 Code
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval Watch
A Bayesian active learning framework that uses Gaussian Processes guided by LLM relevance scoring to improve dense passage retrieval efficiency and effectiveness.
GitHub stars n/a Velocity flat History pending Information Retrieval Apr 20 Code
AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation Build Now
AdaCluster is a training-free framework that accelerates video diffusion transformers with adaptive query-key clustering, achieving significant speedups with negligible quality loss.
GitHub stars n/a Velocity flat History pending Video Generation Apr 20 Code High viability
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions Build Now
A framework for real-time video understanding that aligns responses with evidence and provides transparent decision-making.
GitHub stars n/a Velocity flat History pending Video Understanding Apr 20 Code High viability
Is SAM3 ready for pathology segmentation? Watch
An evaluation of SAM3's capabilities for pathology segmentation that identifies its limitations to guide domain adaptation.
GitHub stars n/a Velocity flat History pending Medical AI Apr 20 Code
A novel LSTM music generator based on the fractional time-frequency feature extraction Watch
An AI music generator that uses fractional Fourier transform for feature extraction and LSTM networks for generating high-quality music comparable to human compositions.
GitHub stars n/a Velocity flat History pending Generative Audio Apr 20 Code
STaD: Scaffolded Task Design for Identifying Compositional Skill Gaps in LLMs Watch
A Scaffolded Task Design framework systematically identifies compositional skill gaps in LLMs by generating controlled task variations.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Code
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks Ignore
A framework that applies contrastive attribution to analyze LLM failures on realistic benchmarks, with code available for further research.
GitHub stars n/a Velocity flat History pending LLM Interpretability Apr 20 Code
Document-as-Image Representations Fall Short for Scientific Retrieval Watch
A new benchmark and analysis showing that text-based representations outperform image-based ones for scientific document retrieval, even for figure-based queries.
GitHub stars n/a Velocity flat History pending Document Retrieval Apr 20 Code
A multimodal and temporal foundation model for virtual patient representations at healthcare system scale Ignore
A multimodal, temporal foundation model that integrates patient data across a healthcare system into virtual patient representations to support clinical decision-making.
GitHub stars n/a Velocity flat History 1 snapshot Healthcare AI Apr 20 Code
Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes Ignore
This research explores effective RLVR fine-tuning strategies for small language models in low-data environments, demonstrating improved sample efficiency through procedural datasets.
GitHub stars n/a Velocity flat History pending LLM Fine-tuning Apr 20 Code
A Control Architecture for Training-Free Memory Use Ignore
A training-free memory control architecture improves LLM arithmetic reasoning by intelligently applying retrieved information.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 20 Code
SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning Watch
A lightweight inference-time framework for LLMs that uses entropy gating to dynamically repair logical errors during long-chain reasoning.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 20 Code
Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs Watch
A commit-open protocol that uses sparse autoencoder (SAE) features to detect silent model substitutions in hosted LLMs, offering robust auditing with minimal overhead.
LLM Security Apr 20
Periodic Steady-State Control of a Handkerchief-Spinning Task Using a Parallel Anti-Parallelogram Tendon-driven Wrist Ignore
A novel tendon-driven wrist and hierarchical control scheme enable precise, high-speed spinning of flexible objects like handkerchiefs.
GitHub stars n/a Velocity flat History pending Robotics Control Apr 20 Code
Beyond Reproduction: A Paired-Task Framework for Assessing LLM Comprehension and Creativity in Literary Translation Ignore
A framework for evaluating LLM literary translation that disentangles comprehension from creativity, revealing significant gaps in current models.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Code
Training and Agentic Inference Strategies for LLM-based Manim Animation Generation Ignore
A novel training and inference pipeline for LLM-based Manim animation generation that improves code and visual outputs, outperforming GPT-4.1.
Generative Video Apr 20
MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge Ignore
A benchmark to identify and mitigate compositional biases in multimodal large language models used for automated evaluation.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 20 Code
Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study Ignore
An empirical study investigating the benefits of explicit physical feasibility supervision for improving Vision-Language-Action (VLA) model learning in robotics.
GitHub stars n/a Velocity flat History pending Robotics Apr 20 Code
Implicit neural representations as a coordinate-based framework for continuous environmental field reconstruction from sparse ecological observations Ignore
Implicit neural representations offer a coordinate-based framework for continuous environmental field reconstruction from sparse ecological data, showing stable and predictable computational characteristics.
GitHub stars n/a Velocity flat History pending Environmental Field Reconstruction Apr 20 Code
Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks Ignore
An analysis of different methods for jailbreaking LLMs, revealing distinct behavioral and mechanistic properties across various unsafe routes.
GitHub stars n/a Velocity flat History pending LLM Jailbreaking Analysis Apr 20 Code
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale Ignore
This research challenges the notion of cross-modal representational convergence, suggesting that models trained on different modalities learn distinct, rather than shared, representations of reality.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 20 Code
Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing Ignore
Accelerates LLM text and code editing by recasting generation as structured decoding over a copy-and-generate grammar, significantly reducing regeneration time.
LLM Editing Apr 20
CAPO: Counterfactual Credit Assignment in Sequential Cooperative Teams Ignore
CAPO is a critic-free policy-gradient algorithm for sequential cooperative teams that derives a per-agent learning signal to improve individual learnability.
Multi-Agent Reinforcement Learning Apr 20
Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts Ignore
A lightweight CNN framework for high-accuracy, fast classification of legal case law texts.
Legal AI Apr 20
Semantic Entanglement in Vector-Based Retrieval: A Formal Framework and Context-Conditioned Disentanglement Pipeline for Agentic RAG Systems Watch
A pipeline to disentangle semantic entanglement in vector embeddings for improved retrieval precision in RAG systems.
RAG Systems Apr 20
On the Reliability of Computer Use Agents Ignore
This research investigates the sources of unreliability in computer-use agents, highlighting the need for repeated execution evaluation and ambiguity resolution.
AI Agents Apr 20
Ranking Abuse via Strategic Pairwise Data Perturbations Ignore
This research identifies vulnerabilities in pairwise ranking systems to strategic data manipulation, proposing an attack method to highlight the need for more robust aggregation techniques.
GitHub stars n/a Velocity flat History pending AI Safety & Robustness Apr 20 Code
When Can LLMs Learn to Reason with Weak Supervision? Ignore
This paper investigates when Large Language Models can learn to reason effectively under weak supervision by analyzing training dynamics and identifying key properties for generalization.
LLM Reasoning Apr 20
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration Ignore
Trains LLM agents with an intrinsic meta-evolution capability to spontaneously learn about unseen environments without external rewards or human guidance.
Agents Apr 20
Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs Ignore
An agentic forecasting approach that maintains linguistic beliefs and revises them through sequential Bayesian updating to improve forecasts across domains.
GitHub stars n/a Velocity flat History 1 snapshot AI Forecasting Apr 20 Code
On the Importance and Evaluation of Narrativity in Natural Language AI Explanations Ignore
Proposes new metrics to evaluate the narrativity of AI explanations, aiming to improve their understandability beyond feature importance lists.
GitHub stars n/a Velocity flat History pending Explainable AI Apr 20 Code
Dissecting AI Trading: Behavioral Finance and Market Bubbles Ignore
This study analyzes LLM agents in simulated asset markets, revealing behavioral patterns and the impact of prompt interventions on market bubbles.
AI Trading Apr 20
Latent Fourier Transform Ignore
Latent Fourier Transform provides frequency-domain controls for generative music models, enabling manipulation of musical patterns by timescale for variations and blends.
Generative Audio Apr 20
Physics-Informed Causal MDPs for Sequential Constraint Repair in Engineering Simulation Pipelines Ignore
A framework for constrained reinforcement learning in engineering simulations that uses causal identification and physics-guided estimation to improve constraint repair success rates.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 20 Code
From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives? Ignore
This work challenges the presumption that LLMs are fallback annotators, showing they can outperform humans in estimating aggregate subgroup opinions on subjective tasks under specific conditions.
LLM Applications Apr 20
LEPO: Latent Reasoning Policy Optimization for Large Language Models Ignore
A novel framework for applying reinforcement learning directly to continuous latent representations in LLMs to enhance reasoning diversity and performance.
LLM Optimization Apr 20
Semantic-based Distributed Learning for Diverse and Discriminative Representations Ignore
A novel distributed learning framework that ensures diverse and discriminative representations by decoupling global optimization and leveraging semantic information.
Distributed Learning Apr 20
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment Ignore
A framework to improve few-shot reinforcement learning with verifiable rewards by aligning entropy dynamics between target and general domains.
Few-Shot RL Apr 20
State Transfer Reveals Reuse in Controlled Routing Ignore
This research explores how prompt-based interventions in LLMs reveal where behaviorally relevant state is represented, distinguishing fixed-interface reuse from prompt relocation.
LLM Interpretability Apr 20
Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective Ignore
Investigates how BPE tokenization in code LLMs leads to secret leakage through a 'gibberish bias', impacting cybersecurity risks.
LLM Security Apr 20
Understanding Human Actions through the Lens of Executable Models Ignore
Introduces a domain-specific language (EXACT) to represent human motions as underspecified motion programs for zero-shot policy inference and compositional modeling.
Human Action Understanding Apr 20
Symbolic Synthesis for LTLf+ Obligations Ignore
A theoretical framework for symbolic synthesis of obligation properties in LTLf+, demonstrating efficiency comparable to LTLf synthesis.
Formal Methods Apr 20
The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data Ignore
A theoretical framework for bridging logic, topology, and neural processing to improve mechanistic interpretability in neuro-symbolic AI.
GitHub stars n/a Velocity flat History pending Neuro-Symbolic AI Apr 20 Code
Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought Ignore
A calibrated reinforcement learning approach for multi-attempt chain-of-thought reasoning that optimizes verification success by carefully weighting individual attempts.
Reasoning Models Apr 20
Architectural Design Decisions in AI Agent Harnesses Ignore
Analyzes architectural design decisions in 70 publicly available AI agent systems to identify recurring patterns and provide guidance for framework designers.
AI Agents Apr 20
The Collaboration Gap in Human-AI Work Ignore
A conceptual framework analyzing the fragility of human-AI collaboration, identifying grounding conditions and interaction structures that lead to breakdowns.
Human-AI Collaboration Apr 20
Community-Led AI Integration for Wildfire Risk Assessment: A Participatory AI Literacy and Explainability Integration (PALEI) Framework in Los Angeles, CA Ignore
A community-led framework for integrating AI into wildfire risk assessment, focusing on literacy and explainability.
AI for Climate Risk Apr 20
Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition Ignore
A theoretical exploration of post-training quantization techniques for language models, focusing on understanding error origins rather than product development.
LLM Quantization Apr 20
Party Autonomy in Determining the Law Applicable to Non-contractual Obligations concerning Cross-Border Data Transfers Ignore
Discusses party autonomy in determining applicable law for cross-border data transfer liabilities in the context of cloud computing and AI.
GitHub stars n/a Velocity flat History pending Legal AI Apr 20 Code
AlphaContext: An Evolutionary Tree-based Psychometric Context Generator for Creativity Assessment Ignore
A psychometric context generator for creativity assessment that uses evolutionary, tree-based search.
Creativity Assessment Apr 20
Six Llamas: Comparative Religious Ethics Through LoRA-Adapted Language Models Ignore
A study on ethical reasoning patterns in language models fine-tuned on religious texts.
Ethics in AI Apr 20
Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients? Ignore
This paper investigates how to improve policy gradient reinforcement learning by addressing discontinuities in simulators and controlling variance in gradient estimators.
Reinforcement Learning Apr 20
A Sugeno Integral View of Binarized Neural Network Inference Ignore
This paper establishes a theoretical connection between binarized neural networks and Sugeno integrals, offering a new mathematical framework for understanding neuron activation and input interactions.
AI Theory Apr 20
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures Ignore
A survey comparing classical multi-agent systems with those enabled by large foundation models, outlining future research directions.
Multi-Agent Systems Apr 20
Concurrent Criterion Validation of a Validity Screen for LLM Confidence Signals via Selective Prediction Ignore
A validity screen for LLM confidence signals demonstrates its ability to predict selective prediction performance across various models.
LLM Evaluation Apr 20
Prompt Optimization Enables Stable Algorithmic Collusion in LLM Agents Ignore
Prompt optimization can lead to emergent and stable algorithmic collusion in LLM agents participating in market simulations.
AI Agents Apr 20
How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers Ignore
This paper theoretically analyzes the trade-offs between KV cache compression and Transformer reasoning depth, identifying bandwidth barriers and adaptive vs. oblivious error scaling.
LLM Inference Optimization Apr 20
The implicated scientist: on the role of AI researchers in the development of weapons systems Ignore
This paper examines the ethical implications of AI researchers' involvement in the development of weapons systems and explores avenues for solidarity with victims.
AI Ethics Apr 20
On the Emergence of Syntax by Means of Local Interaction Ignore
A minimal neural cellular automaton spontaneously develops a structured internal representation akin to syntactic processing.
AI Theory Apr 20
Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs Ignore
A parameter-free decomposition for Mixture-of-Experts models that separates control signals from content channels to enable compositional specialization across layers.
LLM Training Apr 20