How can query-aware performance-cost control in AI infrastructure optimize LLM memory usage?