Key insights from the latest papers on AI advancements.
ScienceToStartup Editorial
Good morning. Today’s set has a clear vibe: make the training signal match where the model actually operates, and test “reasoning” in a way that can’t be faked by vibes and priors. Also: hierarchical RL gets a realism upgrade—because unimodal Gaussian policies are a polite lie in long-horizon tasks.


🎨 Diffusion Models

The Rundown
DiNa-LRM trains a reward model directly on noisy diffusion states, so you're not paying the VLM tax and not forcing a latent-space generator to optimize against a pixel-space judge.
The details
Why it matters
If you’re aligning diffusion models, the “VLM as a reward oracle” pattern is expensive and awkward. This is a clean alternative: reward lives where the generator lives, and the paper claims you get better training dynamics without dragging a huge multimodal model through every step.
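To make the "reward lives where the generator lives" point concrete, here is a minimal sketch of scoring a noisy latent directly at timestep t, with no decode-to-pixels step and no VLM in the loop. Everything here (the `LatentRewardModel` class, the linear head, the schedule values) is illustrative, not the paper's architecture.

```python
# Sketch: reward computed directly on a noisy diffusion latent x_t.
# Assumed/illustrative: LatentRewardModel, the linear head, schedule params.
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # standard DDPM-style noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

def forward_noise(x0, t, eps):
    """q(x_t | x_0): noise a clean latent x0 to timestep t."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

class LatentRewardModel:
    """Toy linear reward head over (noisy latent, timestep)."""
    def __init__(self, dim):
        self.w = rng.normal(scale=0.01, size=dim + 1)
    def __call__(self, x_t, t):
        feat = np.concatenate([x_t, [t / T]])   # crude timestep conditioning
        return float(feat @ self.w)

# The key point: the reward is evaluated on x_t itself --
# no decoding, no pixel-space judge, no multimodal model per step.
x0 = rng.normal(size=16)                # a clean latent
t = 400
x_t = forward_noise(x0, t, rng.normal(size=16))
rm = LatentRewardModel(dim=16)
score = rm(x_t, t)
```

Because the reward model consumes `x_t` and `t` directly, it can supply a training signal at any point along the noising trajectory, which is exactly where the generator operates during training.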
🧠 AI Evaluation

The Rundown
GENIUS tries to measure whether multimodal generators can infer patterns, execute unusual constraints, and adapt to novel contexts without leaning on memorized schemas.
The details
Why it matters
Teams keep shipping multimodal features that look solid in demos, then die the moment a user asks for “same thing, but with one extra constraint.” Benchmarks like this push evaluation toward that real failure mode. If GENIUS catches on, it becomes a forcing function: you either improve controllability, or your model looks dumb in public.
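The "one extra constraint" failure mode lends itself to a simple harness: score a generator not on plausibility, but on the fraction of explicit constraint checks it passes. The constraint format and scoring below are my own illustrative sketch, not GENIUS's actual protocol.

```python
# Sketch of constraint-based evaluation: each constraint is a predicate
# over a (mocked) structured model output. Illustrative, not the benchmark.
def constraint_pass_rate(outputs, constraints):
    """Fraction of (output, constraint) pairs where the checker passes."""
    checks = [check(out) for out in outputs for check in constraints]
    return sum(checks) / len(checks)

# "Same thing, but with one extra constraint":
constraints = [
    lambda out: out["n_objects"] == 3,        # exact-count constraint
    lambda out: "red" not in out["colors"],   # exclusion constraint
]

memorizer = [{"n_objects": 4, "colors": ["red", "blue"]}]    # ignores constraints
follower  = [{"n_objects": 3, "colors": ["blue", "green"]}]  # respects them

rate_memorizer = constraint_pass_rate(memorizer, constraints)  # 0.0
rate_follower  = constraint_pass_rate(follower, constraints)   # 1.0
```

A demo-pretty model that leans on memorized schemas scores like `memorizer` here; the benchmark's job is to make that gap visible before users do.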
🚀 Reinforcement Learning

The Rundown
NF-HIQL swaps simple Gaussian policies for normalizing-flow policies at both hierarchy levels, aiming for better offline/data-scarce performance on long-horizon tasks.
The details
Why it matters
Hierarchical RL often fails in practice because the policy class is too simple for messy multimodal behavior. Flow policies are a direct attack on that bottleneck. If the reported robustness holds up across more settings, this is the kind of change that makes offline robotics training feel less like gambling.
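For readers who haven't touched normalizing flows: the swap is from a policy that can only express one Gaussian bump to an invertible transform of a base Gaussian, with exact log-probs via change of variables. Below is a single affine-coupling step with fixed toy parameters; a real NF-HIQL-style policy would condition the coupling nets on state and stack many such layers, so treat this as a shape-of-the-idea sketch only.

```python
# Sketch: one affine-coupling flow step (RealNVP-style) as a policy head.
# Assumed/illustrative: the toy linear conditioners (scale * z1, shift * z1).
import numpy as np

def coupling_forward(z, scale, shift):
    """Transform base sample z = (z1, z2): z2 is warped conditioned on z1."""
    z1, z2 = z[0], z[1]
    s, t = scale * z1, shift * z1      # stand-ins for learned conditioner nets
    x2 = z2 * np.exp(s) + t
    log_det = s                        # log |d x2 / d z2|
    return np.array([z1, x2]), log_det

def coupling_inverse(x, scale, shift):
    """Exact inverse: z1 = x1 passes through, so s and t are recoverable."""
    x1, x2 = x[0], x[1]
    s, t = scale * x1, shift * x1
    z2 = (x2 - t) * np.exp(-s)
    return np.array([x1, z2])

def log_prob(x, scale, shift):
    """Change of variables: log p(x) = log N(z; 0, I) - log_det."""
    z = coupling_inverse(x, scale, shift)
    log_base = -0.5 * np.sum(z**2) - np.log(2 * np.pi)  # 2-D standard normal
    return log_base - scale * x[0]

rng = np.random.default_rng(0)
scale, shift = 0.5, 1.0
z = rng.normal(size=2)
action, _ = coupling_forward(z, scale, shift)
```

The invertibility is what makes this usable offline: you still get exact, tractable log-probs for behavior-cloning or advantage-weighted objectives, but the action distribution is no longer forced to be a single Gaussian mode.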