How can query-aware performance-cost control in AI infrastructure optimize LLM runtime memory usage?