AI Learns Physics, Understands Spectra, and Writes Better SQL | ScienceToStartup
Generative video gets physical, scientific images get understood, and SQL queries get reliable
May 1, 2026 • 4 min read
ScienceToStartup Editorial
This week's AI research pushes the boundaries of generative models and specialized understanding. Researchers are teaching AI to grasp physical laws for more realistic video, enabling machines to interpret complex scientific spectra, and building systems that generate more accurate SQL queries from natural language. These developments offer significant potential for startups in simulation, scientific research, and data analysis.
Visualizing physical properties in generative video.
The Rundown
Modern generative video models often falter on physical consistency: objects drift unnaturally, collisions lack realistic rebound, and material responses rarely match their real-world counterparts. The PhyCo framework tackles this by injecting continuous, interpretable, physically grounded control into video generation. It draws on a dataset of over 100,000 photorealistic simulation videos in which parameters such as friction, restitution, and deformation are systematically varied. PhyCo fine-tunes a pretrained diffusion model using a ControlNet conditioned on pixel-aligned physical property maps. It also employs VLM-guided reward optimization, in which a vision-language model evaluates generated videos against targeted physics queries and provides differentiable feedback. The resulting model produces physically consistent outputs that respond to changes in physical attributes, with no simulator or geometry reconstruction needed at inference time. PhyCo significantly improves physical realism on the Physics-IQ benchmark, and human studies confirm clearer control over physical attributes, a step toward more believable and controllable AI-generated video.
The details
PhyCo integrates a dataset of over 100,000 photorealistic simulation videos with systematically varied physical parameters.
It uses physics-supervised fine-tuning of a diffusion model via a ControlNet conditioned on physical property maps.
VLM-guided reward optimization provides feedback on physics queries, enhancing control.
PhyCo achieves significant improvements in physical realism on the Physics-IQ benchmark compared to baselines.
Human studies confirm clearer and more faithful control over physical attributes in generated videos.
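To make "pixel-aligned physical property maps" more concrete, here is a minimal sketch of how per-object parameters might be rasterized into per-pixel channels. The article does not describe PhyCo's actual encoding, so the `PhysicalProps` class, the `build_property_maps` function, the three-channel layout, and the use of plain nested lists are all illustrative assumptions; a real pipeline would more likely use tensors.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PhysicalProps:
    """Hypothetical per-object parameters named in the article."""
    friction: float
    restitution: float
    deformation: float


def build_property_maps(height, width, objects, background=0.0):
    """Rasterize per-object properties into one H x W map per property.

    `objects` is a list of (mask, props) pairs, where `mask` is a set of
    (row, col) pixels covered by the object. Later objects overwrite
    earlier ones wherever masks overlap.
    """
    channel_names = [f.name for f in fields(PhysicalProps)]
    maps = {
        name: [[background] * width for _ in range(height)]
        for name in channel_names
    }
    for mask, props in objects:
        for row, col in mask:
            if not (0 <= row < height and 0 <= col < width):
                raise ValueError(
                    f"pixel {(row, col)} lies outside the {height}x{width} frame"
                )
            for name in channel_names:
                maps[name][row][col] = getattr(props, name)
    return maps


if __name__ == "__main__":
    # A bouncy ball above a high-friction floor in a tiny 4x3 frame.
    ball = ({(0, 1), (1, 1)}, PhysicalProps(0.2, 0.9, 0.1))
    floor = ({(3, 0), (3, 1), (3, 2)}, PhysicalProps(0.8, 0.3, 0.0))
    maps = build_property_maps(4, 3, [ball, floor])
    for name, grid in maps.items():
        print(name, grid)
```

Because every pixel carries its own value, raising the ball's restitution changes only the pixels under the ball's mask. That is the kind of localized, continuous control signal a ControlNet-style conditioning branch could consume.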
Why it matters
For startups in gaming, VFX, or simulation, PhyCo offers a path to generating more realistic and controllable visual assets. This reduces the need for manual physics simulation and opens doors for faster content creation and more immersive experiences.
Example of spectral data and associated questions.
The Rundown
Spectra, dense forms of scientific imagery, pose a significant challenge for current multimodal large language models (MLLMs) due to their unstructured and domain-specific nature. The new SpecVQA benchmark aims to evaluate these models on their ability to understand scientific spectral data. Covering seven representative spectrum types, SpecVQA features expert-annotated question-answer pairs curated from peer-reviewed literature. It targets both direct information extraction and domain-specific reasoning, comprising 620 figures and 3,100 QA pairs. To manage token length while preserving essential curve characteristics, the researchers propose a spectral data sampling and interpolation reconstruction approach, which shows substantial performance improvements on the benchmark. Prominent MLLMs were tested, and a leaderboard is presented. This work is a critical step toward enhancing spectral understanding in multimodal models, suggesting directions for extending visual-language models to broader scientific research and data analysis applications.
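The sampling-and-reconstruction idea mentioned above can be illustrated with a small sketch: keep only a subset of points from a dense curve (fewer numbers to serialize into a prompt), then rebuild the curve by interpolation and check how much shape was lost. The article does not say which sampling or interpolation scheme the SpecVQA authors use, so the uniform sampling, the linear interpolation, and the function names here are assumptions chosen for simplicity.

```python
import math
from bisect import bisect_right


def downsample(xs, ys, n_points):
    """Keep roughly n_points evenly spaced samples, always including both ends."""
    if not 2 <= n_points <= len(xs):
        raise ValueError("n_points must be between 2 and len(xs)")
    last = len(xs) - 1
    indices = sorted({round(i * last / (n_points - 1)) for i in range(n_points)})
    return [xs[i] for i in indices], [ys[i] for i in indices]


def interpolate(sample_xs, sample_ys, query_xs):
    """Linearly interpolate the sampled curve at each query position.

    Assumes sample_xs is strictly increasing; queries outside the sampled
    range are clamped to the nearest endpoint value.
    """
    result = []
    for x in query_xs:
        if x <= sample_xs[0]:
            result.append(sample_ys[0])
        elif x >= sample_xs[-1]:
            result.append(sample_ys[-1])
        else:
            hi = bisect_right(sample_xs, x)
            lo = hi - 1
            t = (x - sample_xs[lo]) / (sample_xs[hi] - sample_xs[lo])
            result.append(sample_ys[lo] + t * (sample_ys[hi] - sample_ys[lo]))
    return result


if __name__ == "__main__":
    # Synthetic single-peak "spectrum" with 101 points.
    xs = [float(i) for i in range(101)]
    ys = [math.exp(-((x - 50.0) / 8.0) ** 2) for x in xs]

    sample_xs, sample_ys = downsample(xs, ys, 21)
    rebuilt = interpolate(sample_xs, sample_ys, xs)
    max_error = max(abs(a - b) for a, b in zip(rebuilt, ys))
    print(f"kept {len(sample_xs)} of {len(xs)} points, max error {max_error:.4f}")
```

The trade-off is visible directly: fewer retained points means a shorter input but a larger reconstruction error near sharp peaks, which is why a scheme that preserves "essential curve characteristics" matters for spectra.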
The details
SpecVQA is a benchmark for evaluating multimodal models on scientific spectral understanding.
It covers 7 representative spectrum types with expert-annotated question-answer pairs from peer-reviewed literature.
The benchmark includes 620 figures and 3,100 QA pairs for direct information extraction and domain-specific reasoning.
A spectral data sampling and interpolation reconstruction approach is proposed to reduce token length while preserving curve characteristics.
Performance improvements were observed using this reconstruction approach on the SpecVQA benchmark.
Why it matters
Startups in scientific research, materials science, or drug discovery can leverage SpecVQA to benchmark their multimodal AI tools. Improved spectral understanding can accelerate hypothesis generation, data analysis, and the discovery of new materials or compounds.
Community AI Usage
Every newsletter, we showcase how a reader is using AI to work smarter, save time, or make life easier.
Community Spotlight in 💬
“I'm a freelance graphic designer, and I've been experimenting with AI tools to speed up my workflow. I recently used Midjourney to generate a series of concept art pieces for a client's new product launch. I gave it very specific prompts about the product's aesthetic and target audience, and within minutes, I had over 50 unique visual directions. The client loved the variety and the speed at which I could present these ideas. It saved me hours of manual sketching and brainstorming, allowing me to focus on refining the best concepts.”
The Rundown
Large language models (LLMs) have advanced text-to-SQL generation, but real-world deployment faces hurdles like inconsistent accuracy and invalid SQL generation, especially with complex or unseen schemas. Template Constrained Decoding (TeCoD) addresses these issues by exploiting query pattern recurrence in labeled workloads. TeCoD converts historical natural language-SQL pairs into reusable templates. Its robust template selection module uses a fine-tuned natural language inference model to efficiently match or reject queries. Once a template is selected, TeCoD enforces it during SQL generation via grammar-constrained decoding, using a novel partitioned strategy for syntactic validity and efficiency. This system achieves up to 36% higher execution accuracy than in-context learning (ICL) and reduces latency by 2.2x on matched queries. TeCoD's approach offers a more reliable path for integrating LLMs into data analysis workflows, particularly for recurring query patterns.
The details
TeCoD addresses inconsistent accuracy and invalid SQL generation in text-to-SQL tasks.
It converts historical NL-SQL pairs into reusable templates for efficient query matching.
A fine-tuned NLI model is used for robust template selection, matching or rejecting queries.
Grammar-constrained decoding enforces selected templates during SQL generation.
TeCoD achieves up to 36% higher execution accuracy than ICL and 2.2x lower latency on matched queries.
Why it matters
For startups building data analytics platforms or offering BI tools, TeCoD offers a significant improvement in reliability and efficiency for text-to-SQL capabilities. This can lower the barrier to entry for non-technical users wanting to query structured data, driving adoption and value.
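The TeCoD flow described above (templates built from historical pairs, match-or-reject selection, then template-enforced generation) can be sketched in a few lines. This is a toy illustration, not the authors' implementation. Token-overlap similarity stands in for the fine-tuned NLI model, a regular-expression check on slot values stands in for grammar-constrained decoding, and every name (`Template`, `make_template`, `select_template`, `fill_template`) is hypothetical.

```python
import re
from dataclasses import dataclass

# String or numeric literals in SQL; these become template slots.
LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")
# Slot values must be a bare number or a simple quoted string.
SAFE_VALUE = re.compile(r"\d+(?:\.\d+)?|'[^']*'")


@dataclass(frozen=True)
class Template:
    question: str  # historical natural-language question
    sql: str       # SQL with literals replaced by {0}, {1}, ...
    slots: int     # number of placeholders in `sql`


def make_template(question, sql):
    """Turn one historical NL-SQL pair into a reusable template."""
    count = 0

    def mask(_match):
        nonlocal count
        placeholder = "{" + str(count) + "}"
        count += 1
        return placeholder

    return Template(question, LITERAL.sub(mask, sql), count)


def _tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def select_template(question, templates, threshold=0.5):
    """Return the best-matching template, or None to reject the match.

    Jaccard token overlap is an assumption standing in for the NLI model.
    """
    query = _tokens(question)
    best, best_score = None, 0.0
    for template in templates:
        candidate = _tokens(template.question)
        union = query | candidate
        score = len(query & candidate) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = template, score
    return best if best_score >= threshold else None


def fill_template(template, values):
    """Fill slots, allowing only values that fit the literal pattern."""
    if len(values) != template.slots:
        raise ValueError("wrong number of slot values")
    for value in values:
        if not SAFE_VALUE.fullmatch(value):
            raise ValueError(f"value {value!r} is not a valid literal")
    return template.sql.format(*values)


if __name__ == "__main__":
    history = [
        ("How many orders were placed in 2023?",
         "SELECT COUNT(*) FROM orders WHERE year = 2023"),
        ("List customers from 'Berlin'",
         "SELECT name FROM customers WHERE city = 'Berlin'"),
    ]
    templates = [make_template(q, s) for q, s in history]
    match = select_template("How many orders were placed in 2024?", templates)
    print(fill_template(match, ["2024"]))
    print(select_template("What is the average salary by department?", templates))
```

In this sketch the caller supplies the slot values. In the system the article describes, the LLM would generate them while the decoder restricts its output to the selected template. A rejected match (`None`) would fall back to ordinary, unconstrained text-to-SQL generation.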