This equation captures one of the core mathematical components of the system. $J(\pi) = \mathbb{E}_{\tau \sim \pi}[\,\cdot\,]$, where a trajectory $\tau = (S_0, A_0, S_1, A_1, \dots)$ follows policy $\pi$. The optimal policy $\pi^*$ maximizes this objective $J$.
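An objective defined as an expectation over trajectories can be approximated by sampling rollouts and averaging. The sketch below is illustrative only: the quantity inside the expectation is not recoverable from the text above, so it assumes a discounted return. The two-state environment, its reward, and all function names are hypothetical.

```python
import random


def sample_trajectory(policy, horizon, rng):
    """Roll out tau = (S0, A0, S1, A1, ...) in a toy two-state environment."""
    state = 0
    trajectory = []
    for _ in range(horizon):
        # policy[state] is the probability of choosing action 1 in that state.
        action = 1 if rng.random() < policy[state] else 0
        trajectory.append((state, action))
        # Assumed toy dynamics: the chosen action becomes the next state.
        state = action
    return trajectory


def trajectory_return(trajectory, gamma):
    """Assumed integrand: discounted sum of a toy reward (1 while in state 1)."""
    return sum(
        (gamma ** t) * (1.0 if state == 1 else 0.0)
        for t, (state, _action) in enumerate(trajectory)
    )


def estimate_objective(policy, n_samples=2000, horizon=20, gamma=0.9, seed=0):
    """Monte Carlo estimate of J(pi): average return over sampled trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += trajectory_return(sample_trajectory(policy, horizon, rng), gamma)
    return total / n_samples


if __name__ == "__main__":
    # Compare three candidate policies; the one with the largest estimate
    # is the best of the three under this toy objective.
    for name, policy in [("always 0", [0.0, 0.0]),
                         ("coin flip", [0.5, 0.5]),
                         ("always 1", [1.0, 1.0])]:
        print(name, estimate_objective(policy))
```

In this toy setting the policy that always picks action 1 attains the largest estimate, which mirrors the role of $\pi^*$ as the maximizer of $J$ over the policies considered.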