# Exact Deterministic Dynamic Programming

**Setup**

$x_{k+1} = f(x_k, u_k)$ for $k = 0, 1, …, N - 1$

where

- $k$ is the time index
- $x_k$ is the state of the system (what separates/makes independent the past from the future)
- $u_k$ is the action or decision or control selected at time $k$ from a set $U_k(x_k)$
- $f_k$ describes the dynamics of the system
- N is the finite time horizon

The problem also has a cost function $g_k(x_k, u_k)$ associated with it, which is the cost of going from $x_k$ to $x_{k+1}$, and the terminal cost $g_N(x_N)$ of the last state.

The total cost $J(x_0)$, which we sometimes write as $J_{u_0, \dots, u_{N_1}}(x_0)$ or $J(x_0; u_0, \dots, u_{N_1})$ or $J(x_0, u_0, \dots, u_{N_1})$, is:

$J(x_0) = g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k)$

We want to choose $u_k$ so as to achieve the minimum cost $J^{*} (x_0)$:

$J^{*} (x_0) = \min_{u_k \in U_k(x_k)} J(x_0; u_0, \dots, u_{N_1})$

**Shortest path view**

The setup above can be completely described as a shortest weighed path problem on a graph. We set up the graph with an artificial terminal node:

**Principle of optimality**

The tail of an optimal sequence is optimal for the tail subproblem.

“For an auto travel analogy, suppose that the fastest route from Los Angeles to Boston passes through Chicago. The principle of optimality translates to the obvious fact that the Chicago to Boston portion of the route is also the fastest route for a trip that starts from Chicago and ends in Boston”

The proof is an example of the Extremal Principle (Engels, Problem Solving Strategies).

**Example: Deterministic Scheduling Problem**

Figure 1.1.6 in Chapter 1 of Bertsekas.

N = 3 (initial state starts at $k = 0$)

$x_N = x_3 \in $ {ABC, ACB, ACD, CAB, CAD, CDA}

[DP algorithm: Start with: $J_N^{*} (x_N) = g_N (x_N)$ for all $x_N$]

$J_3^{*} (ABC) = 6, J_3^{*} (ACB) = 1, J_3^{*} (ACD) = 3, J_3^{*} (CAB) = 1, J_3^{*} (CAD) = 3, J_3^{*} (CDA) = 2$

[DP algorithm: And for k = N-1, … 0, let $J_k^{*} (x_k) = \min_{u_k \in U_k(x_k)} \left[ g_k(x_k, u_k) + J^{*}_{k+1} (f_k(x_k, u_k)) \right]$ for all $x_k$]

[Note that $J^{*}_{k+1} (f_k(x_k, u_k))$ is not a constant. And this calculation is not just for one path along the graph. It’s a calculation for every state/node in the graph.]

$k = N - 1$

$x_{N-1} = x_2 \in $ {AB, AC, CA, CD}

$J_2^{*} (AB) = \min_{[ABC]} \left[ g_2(x_2, u_2) + J^{*}_{3} (f_2(x_2, u_2)) \right] = \left[ g_2(AB, ABC) + J^{*}_{3} (ABC) \right] = 3 + 6 = 9$

$J_2^{*} (AC) = \min_{[ACB, ACD]} \left[ g_2(x_2, u_2) + J^{*}_{3} (f_2(x_2, u_2)) \right] = \min \left[ g_2(AC, ACB) + J^{*}_{3} (ACB), g_2(AC, ACD) + J^{*}_{3} (ACD)\right] = 5$

$\cdots$

Now we have $J_k^{*}(x_k)$ for all $x_k$, i.e., $J_0^{*} (x_0), J_1^{*} (A), J_1^{*} (C), J_2^{*} (AB), J_2^{*}(AC), J_2^{*}(CA), J_2^{*}(CD), J_3^{*}(ABC), \dots, J_3^{*}(CDA)$

**DP algorithm**

Start at $k = N$. Label all the terminal nodes (the set of all possible $x_N$) with the terminal cost at that node.

Proceed backwards.

$k = N - 1$. Label all the nodes (the set of all possible $x_{N-1}$) with the optimal cost if you started at that node, i.e., the mimimum over all edges of the sum of the cost of the edge to the terminal node and the label of the terminal node it connects to. And then do the same for $k = N - 2$ and so on all the way back to $k = 0$.

**Construction of optimal control sequence**

After the DP algorithm / labeling of the graph, we have $J_k^{*}(x_k)$ for all $x_k$. Reverse the process. Start from the initial state and choose the action with the minimum sum of the edge weight and label of the node it travels to.

**Sources**