On Implementation of Dynamic Programming for Optimal Control Problems with Final State Constraints

Abstract — On Implementation of Dynamic Programming for Optimal Control Problems with Final State Constraints — In this paper we present issues related to the implementation of dynamic programming for optimal control of a one-dimensional dynamic model, such as the hybrid electric vehicle energy management problem. A study on the resolution of the discretized state space emphasizes the need for careful implementation. A new method is presented to treat numerical issues appropriately. In particular, the method deals with numerical problems that arise due to high gradients in the optimal cost-to-go function. These gradients mainly occur on the border of the feasible state region. The proposed method not only enhances the accuracy of the final global optimum but also allows for a reduction of the state-space resolution with maintained accuracy. The latter substantially reduces the computational effort to calculate the global optimum. This allows for further applications of dynamic programming for hybrid electric vehicles such as extensive parameter studies.


INTRODUCTION
Developing and optimizing hybrid electric powertrains involves several tasks, such as energy management strategy design and component dimensioning. Both when designing the energy management strategy and when dimensioning the powertrain components, it is a clear advantage if the globally optimal fuel consumption is known for a given configuration and drive cycle. Knowing the globally optimal fuel consumption allows a proposed control system to be evaluated with respect to the global optimum. This evaluation must be carried out for the same final state-of-charge using the proposed controller and the globally optimal control. As a result, the optimization problem includes final state constraints. Furthermore, hybrid vehicles inherently include constraints on the battery state-of-charge, since a too high or too low state-of-charge can damage the battery. What is more, the control signal that decides the power split between electric and thermal driving is also constrained due to the power limits of the two power sources. The optimal control problem of deciding the power split in a hybrid electric vehicle therefore includes final state constraints, state constraints, and control signal constraints.
For the energy management strategy design, and also for powertrain dimensioning, the hybrid vehicle model can be simplified, with maintained accuracy, and reduced to having the battery state-of-charge as the only dynamic state variable [1].
To calculate the optimal control signal trajectory for a hybrid vehicle model with a single state variable, including state and input constraints, most studies [2,3] use the dynamic programming algorithm [4,5]. The theory behind dynamic programming as a tool for calculating the optimal control is relatively simple. However, numerical problems arise when implementing the algorithm. In general, if these problems are not treated, numerical errors can have a major impact on the final result. This paper deals with numerical problems that arise due to high gradients in the optimal cost-to-go function, which occur on the boundary of the feasible state region. One method for handling state constraints when using dynamic programming is to apply a penalty scheme for infeasible states [6,7]. This paper presents a new method for including final state constraints. The proposed method is compared to using a penalty scheme for infeasible states when using dynamic programming for two different dynamic systems. First, the issues are illustrated with the well-known optimal control problem of a fishery based on a Lotka-Volterra system [8]. The proposed method is compared to the penalty scheme and to the analytic optimal solution of the Lotka-Volterra fishery problem. Second, numerical issues are investigated that are associated with the dynamic programming algorithm for solving the optimal control problem of a parallel hybrid electric vehicle on a given drive cycle. The numerical errors in the hybrid vehicle case are then evaluated for the proposed method and compared to the penalty scheme.
The proposed method improves the dynamic programming only if the optimal state trajectory is close to the bounds of the feasible region at some points. However, this is typically the case for constrained optimal control problems such as the hybrid electric vehicle problem.

OPTIMAL CONTROL PROBLEM
In this paper a special class of optimal control problems is studied, namely problems with fixed final time and a partially constrained final state. Furthermore, the considered problems are assumed to include state constraints and input constraints. What is more, the dynamic systems in this study include only a single state variable, and the disturbances are assumed to be perfectly known. In summary, this problem can be written as the optimal control problem

min_{u(t)} J(u(t))   (1)

s.t.

ẋ(t) = F(x(t), u(t), t)   (2)
x(0) = x_0   (3)
x(t_f) ∈ [x_f,min, x_f,max]   (4)
x(t) ∈ X(t)   (5)
u(t) ∈ U(t)   (6)

where

J(u(t)) = G(x(t_f)) + ∫_0^{t_f} H(x(t), u(t), t) dt   (7)

is the cost functional.

DYNAMIC PROGRAMMING
This section gives a brief overview of the deterministic dynamic programming algorithm [4], which throughout this study is referred to as dynamic programming (DP). Since dynamic programming is a numerical algorithm that is used here to solve a continuous control problem, the continuous-time model (2) must be discretized. Let the discrete-time model be given by

x_{k+1} = F_k(x_k, u_k),  k = 0, 1, ..., N − 1   (8)

with the state variable x_k ∈ X_k and the control signal u_k ∈ U_k. Furthermore, assume that the disturbance is perfectly known in advance and at every time instance k.

Basic Algorithm
Let π = {µ_0, µ_1, ..., µ_{N−1}} be a control policy. Further, let the discretized cost of Equation (7), using π with the initial state x_0, be

J_π(x_0) = g_N(x_N) + φ_N(x_N) + Σ_{k=0}^{N−1} [ h_k(x_k, µ_k(x_k)) + φ_k(x_k) ]   (9)

where g_N(x_N) + φ_N(x_N) is the final cost. The first term, g_N(x_N), corresponds to the final cost G in Equation (7). The second term is the additional penalty function φ_N(x_N) forcing a partially constrained final state (4). The function h_k(x_k, µ_k(x_k)) is the discretized counterpart of the integrand H in Equation (7). The state constraints (5) are enforced by the penalty functions φ_k(x_k) for k = 0, 1, ..., N − 1.
The optimal control policy π° is the policy that minimizes the cost:

J°(x_0) = min_{π ∈ Π} J_π(x_0)   (10)

where Π is the set of all admissible policies. Based on the principle of optimality [4], dynamic programming is the algorithm that evaluates the optimal cost-to-go function (1) J_k(x^i) at every node of the discretized state-time space (2) by proceeding backward in time:
1. End cost calculation step:
   J_N(x^i) = g_N(x^i) + φ_N(x^i)   (11)
2. Intermediate calculation step, for k = N − 1 to 0:
   J_k(x^i) = min_{u_k ∈ U_k} { h_k(x^i, u_k) + φ_k(x^i) + J_{k+1}(F_k(x^i, u_k)) }   (12)
The optimal control is given by the argument that minimizes the right-hand side of Equation (12) for each x^i at time index k of the discretized state-time space.
The cost-to-go function J_{k+1}(x) used in Equation (12) is evaluated only at discretized points in the state space. Furthermore, the output of the model function F_k(x^i, u_k) is a continuous variable in the state space, which can lie between the nodes of the state grid. Consequently, the last term in Equation (12), namely J_{k+1}(F_k(x^i, u_k)), must be evaluated appropriately. There exist several methods of finding the appropriate cost-to-go value J_{k+1}(F_k(x^i, u_k)), such as using a nearest-neighbor approximation or using an interpolation scheme. Throughout this study, linear interpolation of the cost-to-go function J_{k+1} is used to account for the problem of the discretized state space.

(1) The terms cost-to-go and optimal cost-to-go are used equivalently throughout this paper, referring to the optimal cost-to-go. It is important to note that the term optimal is used in the sense of the optimality achievable despite the numeric errors.
(2) The following notation is used: x^i_k denotes the state variable x in the discretized state-time space at the node with time index k and state index i, while x_k denotes a (state-)continuous state variable at time k.
The output of the algorithm (11-12) is an optimal control signal map. This map is used to find the optimal control signal during a forward simulation of the model (8), starting from a given initial state x 0 , to generate the optimal state trajectory. In the map the control signal is only given for the discrete points in the state-space grid. The control signal is therefore interpolated when the actual state does not coincide with the points in the state grid.
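The backward recursion (11-12) with a finite penalty and linear interpolation of the cost-to-go can be sketched as follows. This is a minimal illustration, not the authors' implementation: the model function `f`, stage cost `h`, penalty value `J_INF`, and the helper names are all assumptions for the sketch.

```python
# Sketch of the basic DP recursion (Eq. 11-12) on a fixed scalar state grid,
# with a big, finite penalty J_INF for infeasible nodes and piecewise-linear
# interpolation of the cost-to-go. All names are illustrative assumptions.

J_INF = 1e4  # large, finite penalty for infeasible states

def interp(xs, Js, x):
    """Piecewise-linear interpolation of the cost-to-go on the state grid."""
    if x <= xs[0]:
        return Js[0]
    if x >= xs[-1]:
        return Js[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - w) * Js[i] + w * Js[i + 1]

def basic_dp(xs, us, f, h, g_N, feasible, N):
    """Backward DP over state grid xs and input set us, N time steps.

    f(x, u, k)        -- discrete-time model, next state
    h(x, u, k)        -- stage cost
    g_N(x)            -- final cost
    feasible(x, k)    -- state (and final-state) constraint check
    Returns the cost-to-go at k = 0 and the control-signal map (policy).
    """
    J = [g_N(x) if feasible(x, N) else J_INF for x in xs]
    policy = []
    for k in range(N - 1, -1, -1):
        J_new, u_map = [], []
        for x in xs:
            if not feasible(x, k):
                J_new.append(J_INF); u_map.append(None); continue
            best, best_u = J_INF, None  # costs >= J_INF are treated as infeasible
            for u in us:
                x_next = f(x, u, k)
                c = h(x, u, k) + interp(xs, J, x_next)
                if c < best:
                    best, best_u = c, u
            J_new.append(best); u_map.append(best_u)
        J, policy = J_new, [u_map] + policy
    return J, policy
```

The returned `policy` plays the role of the optimal control signal map: in a forward simulation the control is looked up (and, for off-grid states, interpolated) from it.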

Numerical Issues on the Boundary Line
When implementing the algorithm, numerical errors must be considered and minimized. One issue to consider is the definition of the cost function for infeasible states and inputs. Infeasible states and inputs are of course infinitely expensive and should therefore have infinite cost, φ_k(x^i) → ∞ for x^i ∉ X_k and k = 1, ..., N, since the defined objectives (such as final state constraints and model limitations) cannot be achieved. When using infinite cost for such states, however, substantial numerical errors occur due to the discretization of time and state space.
Define the set of states Ω^i_k that are reachable over one time step from a given state x^i using all admissible inputs:

Ω^i_k = { F_k(x^i, u_k) | u_k ∈ U_k }   (13)

Consider the grid point/time step domain in Figure 1 (bottom) and the fact that the DP algorithm is calculating the cost-to-go for the state x^i at time k + 1. If an infinite cost were used for infeasible states together with a linear interpolation, the feasible part of Ω^i_{k+1} would use an interpolation between an infinite cost-to-go J_{k+2}(x^i) and a finite cost-to-go J_{k+2}(x^{i+1}). As a result, the cost-to-go for x^i at time k + 1 becomes infinite, i.e., J_{k+1}(x^i) → ∞, although the grid point {k + 1, i} lies perfectly within the feasible domain. Now consider the algorithm at time k and the step of calculating the cost-to-go for the state x^i. For the same reason as at time k + 1, the cost-to-go J_k(x^i) will be infinite, since J_{k+1}(x^i) was previously calculated to be infinite. As these effects continue while the algorithm proceeds backward in time, the calculated infeasible region grows into the actual feasible region.
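The contamination mechanism described above can be seen in a two-line numeric experiment; the grid and cost values below are arbitrary illustrations.

```python
# Illustration (not from the paper) of how an infinite penalty spreads
# through linear interpolation: any next state that falls between an
# infeasible and a feasible grid node inherits an infinite cost-to-go.
import math

INF = math.inf
grid = [0.0, 1.0]      # x^i (infeasible node), x^{i+1} (feasible node)
J_next = [INF, 5.0]    # cost-to-go values at time k+2

x_next = 0.9           # a reachable state well inside the feasible part
w = (x_next - grid[0]) / (grid[1] - grid[0])
J_interp = (1 - w) * J_next[0] + w * J_next[1]
# J_interp is infinite although x_next is feasible: 0.1*inf + 0.9*5 = inf
```

With a large finite penalty instead of infinity, `J_interp` stays finite but is still strongly biased upward near the boundary, which is exactly the blurring effect discussed next.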
A first step to tackle this problem is to use a big but finite value for the cost instead of infinity, φ_k(x^i) = J_∞ for x^i ∉ X_k and k = 1, ..., N. This big, finite value J_∞ must be larger than the maximum value of the cost-to-go function J_k(x^i). Using a finite cost value for infeasible domains improves the solution, but the effect shown above for infinity cannot be completely eliminated close to the boundary line. Throughout this paper, the method of using a finite cost value J_∞ for infeasible domains together with the algorithm in Section 2.1 is referred to as basic DP. Due to the interpolation between feasible and infeasible states, the infinite gradient at the boundary line is blurred. This is shown in Figure 2 for the fishing problem (introduced later), where the dashed line is the cost-to-go computed by DP with a finite cost for infeasible states, i.e., the basic DP method. The solid line corresponds to the cost-to-go from DP improved by the new method introduced in this paper. As a result of the blurred cost-to-go function, the optimal state trajectory cannot approach the boundary line, since the computed cost-to-go near the boundary line is too high. Figure 3 shows the corresponding state trajectory (dashed) being deviated by this effect. The solid line is the state trajectory from DP improved by the new method.

Figure 2 caption: Section of the cost-to-go function J_k(x) at the time index k such that t = 180 h for the fishing problem. State-space discretization is ∆x = 10; infeasible states are penalized with a finite value.


BOUNDARY-LINE METHOD
The method presented in this paper tackles the problem of a blurred gradient at the boundary line due to interpolation of the cost-to-go between a feasible and an infeasible state-grid point. To this end, the boundary line between the feasible and infeasible regions must be found. This is shown in the first part of this section. The second part shows a simple yet powerful method to improve the DP by accounting for this boundary line. This improved DP is referred to as boundary-line DP.
Throughout this section, the discrete-time model (8) is reformulated in increment form:

x_{k+1} = x_k + F̃_k(x_k, u_k)   (14)

Computation of the Boundary Line
There exist infeasible regions in the state-time space of an optimization problem with fixed final time and a partially constrained final state if the state dynamics are bounded. Since the dynamic system is assumed to be one-dimensional, there exist only two infeasible regions, namely an upper and a lower region. This is depicted in Figure 1.
In this section, the lower boundary line between the feasible and the infeasible region is derived. The upper boundary line is found analogously. The partially constrained final state is given by Equation (4). The lower boundary line is defined as the lowest state x_k,low at each time instance k that allows achieving the minimal final state x_f,min. Note that the lower boundary line is only discretized in time, i.e., it is continuous in the state variable. The lower boundary line can be evaluated by sequentially going backward in time from k = N − 1 to k = 0 and solving the following optimization problem at each time instance k:

x_{k,low} = min_{u_k ∈ U_k, x_k ∈ X_k} x_k   (16)
s.t. x_{k+1,low} = F_k(x_k, u_k)   (17)
The problem is initialized with x_{N,low} = x_f,min. At each time step, u_k and x_k,low are the only unknowns, while x_{k+1,low} is a parameter at time k. By solving Equation (17) for x_k,low and inserting it into Equation (16), the following, more direct problem (20-23) is obtained, in which x_k,low appears both as the minimization objective and inside the model function. If the state is assumed to be unconstrained, i.e., (23) is omitted, the following equivalent formulation results:

x_{k,low} = f_k(x_{k,low})   (24)

Equation (24) is a so-called fixed-point problem (x = f(x)), where x_k,low is the unknown. The lower boundary line is finally found by the following algorithm:
1. Initialize with the lower bound of the partially constrained final state, x_{N,low} = x_f,min.
2. Proceed backward in time for k = N − 1, ..., 0:
   (a) solve the fixed-point problem (24) without state constraints as shown below in (25-27);
   (b) check whether the solution found respects the state constraints;
   (c) if the constraints are not respected, solve the general problem (20-23);
   (d) store the solution x_k,low with the respective minimizer u_k,low and the cost-to-go J_k,low.
The fixed-point problem (24) at time step k without state constraints can be solved with the following iteration (3):
1. Initialize with x^0 = x_{k+1,low}   (25)
2. Iterate over j until a specified tolerance ε is achieved:
   x^{j+1} = f_k(x^j)   (26)
   |x^{j+1} − x^j| < ε   (27)
Note that the iteration (25-27) finds the limit value x_k,low in the first iteration if the update function f_k is independent of the state variable x_k.
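One backward step of the boundary-line computation reduces to this fixed-point iteration. The sketch below is generic: the update function `f_k`, the initial guess, and the tolerance are illustrative assumptions, and convergence requires `f_k` to be a contraction near the solution.

```python
# Sketch of a scalar fixed-point iteration x^{j+1} = f_k(x^j) as used for
# one backward step of the lower boundary line. f_k, x_init, and tol are
# illustrative assumptions.

def solve_fixed_point(f_k, x_init, tol=1e-10, max_iter=1000):
    """Iterate x^{j+1} = f_k(x^j) until |x^{j+1} - x^j| < tol."""
    x = x_init
    for _ in range(max_iter):
        x_new = f_k(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new  # note: a constant f_k yields the limit in one update
    raise RuntimeError("fixed-point iteration did not converge")
```

For example, a contraction such as `lambda x: 0.5 * x + 1.0` converges to its fixed point 2.0, and a state-independent update returns its limit immediately, matching the remark after (25-27).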

Interpolation Near the Boundary Line
It is assumed that the state boundary lines x k,low (and x k,high ) shown in Figure 1 with their corresponding cost-to-go J k,low (and J k,high ) along the boundary line have been calculated prior to the DP algorithm. Therefore, when the set Ω i k contains the boundary it is possible to interpolate between the exact boundary and a feasible state grid point, as illustrated in Figure 4 with the solid and the dashed lines. The dotted line illustrates the interpolation by the basic algorithm at the boundary between feasible and infeasible regions.
Consider the DP algorithm evaluating the cost-to-go for the state-grid point x^i at time k + 1 (see Fig. 1, bottom). Starting from state x^i, the state achieved at the end of this time step can reach the feasible as well as the infeasible region. The corresponding cost-to-go J_{k+2}(x_{k+2}) is evaluated by interpolation between J_{k+2}(x^{i+1}) and J_{k+2,low} if the state x_{k+2} is above or on the boundary line x_{k+2,low}. Otherwise, the cost-to-go is set to infinity or to the big, finite value J_∞. This procedure allows maintaining the same accuracy close to the boundary line as is achieved within the feasible domain. The application of the optimal control signal map in the forward simulation mentioned in Section 2.1 is improved by the boundary line analogously. Since the control signal on the boundary line has been evaluated beforehand, interpolation of the control signal is carried out between the grid points of the feasible domain or between the feasible domain and the boundary line.

(3) The top-right index of x denotes the iteration index here; it is not the index of the state grid as used in the rest of the paper.
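The boundary-aware interpolation can be sketched as a small lookup function. The names (`x_low`, `J_low`, `J_INF`) and the grid values used in the usage example are illustrative assumptions; the query state is assumed to lie within the grid.

```python
# Sketch of boundary-aware cost-to-go lookup: below the boundary x_low the
# finite penalty J_INF applies; between x_low and the first feasible grid
# node the cost is interpolated against the exact boundary value J_low
# instead of against a penalized grid node. Names are illustrative.

J_INF = 1e4

def cost_to_go(x, xs, Js, x_low, J_low):
    if x < x_low:
        return J_INF                  # truly infeasible
    if x >= xs[-1]:
        return Js[-1]
    # first grid node at or above the boundary
    i = next(j for j, xg in enumerate(xs) if xg >= x_low)
    if x <= xs[i]:
        if xs[i] == x_low:            # boundary coincides with a grid node
            return Js[i]
        # interpolate between the exact boundary point and the grid node
        w = (x - x_low) / (xs[i] - x_low)
        return (1 - w) * J_low + w * Js[i]
    # ordinary linear interpolation inside the feasible grid
    while x > xs[i + 1]:
        i += 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1 - w) * Js[i] + w * Js[i + 1]
```

With, say, `xs = [0.0, 1.0, 2.0]`, `Js = [J_INF, 10.0, 4.0]`, a boundary at `x_low = 0.5` with `J_low = 12.0`, a query at 0.75 interpolates between 12.0 and 10.0 rather than between the penalty and 10.0, which is the whole point of the boundary-line method.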

EXAMPLE 1: SIMPLE DYNAMIC MODEL
This section studies a well-known optimal control problem, namely the optimal fishing in a Lotka-Volterra fish population. The fishing problem is chosen because it has an analytic solution.

Continuous-Time Problem
The continuous-time dynamic Lotka-Volterra system is

ẋ(t) = 0.02 · x(t) · (1 − x(t)/1000) − u(t)   (29)

where the state variable x(t) is the amount of fish in a lake and the control signal u(t) is the fishing rate. The control signal is constrained to u(t) ∈ [0, 10]. For the considered system the state is constrained to x(t) ∈ [0, 1000]. The objective is to maximize the amount of fish caught, which is equivalent to minimizing the negative total catch within a fixed time t_f, while the minimal amount of fish in the population at the final time must be x_f,min = 750. This can be stated as the optimal control problem

min_{u(t)} J = − ∫_0^{t_f} u(t) dt
s.t.
ẋ(t) = 0.02 · x(t) · (1 − x(t)/1000) − u(t)
x(t_f) ≥ x_f,min = 750
x(0) = 250   (34)
x(t) ∈ [0, 1000],  u(t) ∈ [0, 10]

The solution to this optimal control problem is straightforward to determine and consists of three parts: first, there is no fishing so that the population grows; then there is fishing such that the population is kept constant; then there is no fishing again so that the population grows to the final condition. The optimal control expressed in time is

u°(t) = 0 for 0 ≤ t < t_a,  u°(t) = 5 for t_a ≤ t < t_f − t_b,  u°(t) = 0 for t_f − t_b ≤ t ≤ t_f   (39)

where

t_a = t_b = 100 · artanh(1/2) ≈ 54.93 h   (40)

The final maximum amount of fish caught is 5 · (t_f − t_a − t_b).
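The switching time (40) can be checked numerically. The sketch below assumes the logistic growth rate 0.02 1/h and carrying capacity 1000 used in this example; the helper `logistic_time` is an illustrative name.

```python
# Numeric check of Eq. (40): with logistic growth rate r = 0.02 1/h and
# carrying capacity K = 1000 (assumed from the example), the unfished
# growth times 250 -> 500 and 500 -> 750 both equal 100*artanh(1/2).
import math

t_switch = 100.0 * math.atanh(0.5)       # = 50*ln(3), about 54.93 h

def logistic_time(x0, x1, r=0.02, K=1000.0):
    """Time for the unfished logistic population to grow from x0 to x1."""
    C0 = K / x0 - 1.0
    C1 = K / x1 - 1.0
    return math.log(C0 / C1) / r

print(round(t_switch, 2))                # prints 54.93
```

Both growth phases take the same time because 250 and 750 are symmetric about the maximum-sustainable-yield population K/2 = 500, which also explains t_a = t_b in (40).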

Discrete-Time Problem
In order to evaluate the optimal solution by means of dynamic programming, the continuous-time state dynamics (29) must be discretized. Using an Euler forward approximation with a time step t_s = 0.2 h, the discrete-time model is

x_{k+1} = x_k + t_s · (0.02 · x_k · (1 − x_k/1000) − u_k)

The state x_k is the amount of fish in the lake, while the control signal u_k is the constant fishing rate during one time step. The discrete-time optimal control problem is

min_{u_0, ..., u_{N−1}} J = − Σ_{k=0}^{N−1} t_s · u_k   (44)
s.t.
x_{k+1} = x_k + t_s · (0.02 · x_k · (1 − x_k/1000) − u_k)
x_0 = 250
x_N ≥ x_f,min = 750
x_k ∈ [0, 1000],  u_k ∈ [0, 10]
As mentioned in Section 2.2, the use of a big but finite value J_∞ to penalize infeasible states improves the numerics. This value should be chosen as small as possible, but larger than any value of the (feasible) cost-to-go that could occur. Since this simple example allows for an analytic solution, the extremal cost-to-go of the continuous-time problem is evaluated in order to choose a suitable value for J_∞. The minimum of the cost-to-go J_t(x) is obviously attained at t = 0 and x = 1000, yielding the value in Equation (50). For the fishing problem, at t = 0, the minimum cost-to-go (50) is approximately 1118 less than the cost-to-go at the bound. The infeasible states at t_f must therefore be penalized by a value larger than 1118 to ensure that the final state constraint is met. Consequently, the penalty J_∞ used for the cost-to-go of infeasible states must satisfy J_∞ > 1118; for the example here, such a value is chosen and used in the DP algorithm.

The output of the dynamic programming algorithm is an optimal control signal map, specifying the optimal control signal at each time step k and each state x_k ∈ X_k. The optimal control signal map for the Lotka-Volterra system is shown in Figure 5. It shows that at the beginning of the problem the optimal control is "not fishing" (u = 0) if the fish population is small (x < 500), "moderate fishing" (u = 5) if the population is x = 500, and "full fishing" (u = 10) if the population is large (x > 500). Toward the end of the problem, one must stop fishing as late as possible, such that the population reaches the specified minimum final size of x_f,min = 750. The resulting optimal state trajectory, i.e., the fish population for an initial state of x_0 = 250, is shown as the black solid line. The solution of the dynamic programming clearly reflects the optimal control found for the continuous problem (39).

Figure 5 caption: The optimal control signal map, determined using dynamic programming, for the discrete-time Lotka-Volterra system. The optimal state trajectory for x_0 = 250 when using the map is shown as the solid black line.
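The Euler-forward model step with t_s = 0.2 h can be sketched directly; the growth rate 0.02 and capacity 1000 are the values assumed for this example.

```python
# Euler-forward discretization of the Lotka-Volterra dynamics, a sketch
# under the assumed parameters r = 0.02 1/h, K = 1000, t_s = 0.2 h:
# x_{k+1} = x_k + t_s * (0.02 * x_k * (1 - x_k / 1000) - u_k)

T_S = 0.2

def f_discrete(x_k, u_k):
    """One discrete-time step of the fish population under fishing rate u_k."""
    return x_k + T_S * (0.02 * x_k * (1.0 - x_k / 1000.0) - u_k)
```

With zero fishing the population grows for 0 < x < 1000 and is stationary at the carrying capacity; at x = 500 the per-step growth with u = 0 is exactly t_s · 5 = 1 fish, consistent with the sustained fishing rate u = 5 in the map of Figure 5.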

Resolution Study
As mentioned earlier, the accuracy of the solution obtained with dynamic programming can degrade due to numeric issues. The state space must be discretized for the DP algorithm. The resolution of the state-space discretization is a critical quantity. On one hand, the computational effort increases with a higher resolution. On the other hand, the accuracy of the solution improves with increasing resolution.
Therefore, a study is carried out here to quantify the accuracy of the solution obtained by DP for the simple example of the fishing problem. The fishing problem has been chosen because an analytic solution exists that can be used as a benchmark. The resolution study is carried out for the basic DP, but also for the new method presented in this paper, i.e., the boundary-line DP.
The quality of the solution is expressed as the relative difference between the optimal cost obtained by DP and the analytic optimal solution. Figure 6 shows this deviation of the optimal solution evaluated with DP (basic and boundary-line) from the analytic optimal solution.

Figure 6 caption: The relative deviation of the cost computed by dynamic programming compared to the optimal analytic solution for the fishing problem, over the state-space discretization ∆x.
Since the analytic solution is evaluated for the original continuous-time problem, the discrete-time solution can never achieve the analytic optimal solution. This discretization error is indicated in Figure 6 with the dotted line marked as "minimum time-discretization error". It emphasizes that the solution using the boundary-line DP converges well toward the discrete-time optimum. Furthermore, this figure illustrates the importance of the boundary line: the numeric solution with the boundary-line DP is closer to the analytic solution by a factor of 50 to 68 000 than with the basic DP for the same resolution. It is interesting to note that the cost (44) resulting from the boundary-line DP is lower than the cost resulting from the basic DP over the entire range of resolutions that was investigated. The solution using the boundary-line DP at the lowest resolution (∆x = 125) is closer to the analytic solution than the solution of the basic DP at the highest resolution (∆x = 1).
The relative deviation of the final state achieved by the DP (basic and boundary-line) from the optimal final state is shown in Figure 7 for different resolutions. The optimal final state is the lowest admissible final state for this example, i.e., x°(t_f) = x_f,min. The figure clearly shows that the final state deviation of the basic DP decreases with decreasing ∆x, i.e., with increasing resolution. Using the boundary-line DP, the final state deviation is negligible over the entire range of resolutions investigated here.

Figure 7 caption: The relative deviation of the actual final state from the optimal final state for the fishing problem.

Computational Effort
The computational effort of an optimization method is often a crucial factor that determines whether the method is applied in practice for a given problem. Therefore, not only the accuracy of a solution, as shown in Section 4.3, is relevant, but also the corresponding computational cost. The number of model-function evaluations for the basic DP with an equally spaced grid is given by

N_feval^basic = N · N_x · N_u

This holds for a single-dimensional state space and a scalar control signal. The variable N_x represents the number of grid points for the state space, N_u for the control signal, and N for the time discretization. When using the boundary-line DP, the infeasible domain is known exactly. Consequently, the computation for the grid points in this infeasible domain (see Fig. 1) can be omitted [2]. The number of infeasible grid points at a time step k is denoted as N_{k,x}^infeas. Hence, the number of function evaluations that can be saved is

N_feval^saved = Σ_{k=0}^{N−1} N_{k,x}^infeas · N_u

The cost of evaluating the boundary line cannot be neglected. The number of function evaluations needed to compute the line is denoted by N_feval^line. Consequently, the number of function evaluations required for solving the DP with the boundary-line method is given by

N_feval^bl = N · N_x · N_u − N_feval^saved + N_feval^line

Figure 8 shows the number of function evaluations for the fishing problem over the discretization step ∆x. It shows that more computations can be saved due to the infeasible domain than are required to evaluate the boundary line. The boundary-line DP requires fewer function evaluations than the basic DP, by a factor of 1.5 to 1.85, over the entire range of discretization steps investigated here. Furthermore, it should be recalled that the accuracy of the solution is considerably higher, even though the computational burden is lower.

Figure 8 caption: Number of function evaluations needed to find the solution of the fishing problem.
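The evaluation-count bookkeeping above can be sketched as two small helpers; the input values in the usage example are illustrative, not the paper's measured counts.

```python
# Bookkeeping sketch for the function-evaluation counts of the basic DP
# and the boundary-line DP. N, N_x, N_u, the per-step infeasible counts,
# and the boundary-line cost n_line are illustrative inputs.

def feval_basic(N, N_x, N_u):
    """Evaluations for basic DP on an equally spaced grid: N * N_x * N_u."""
    return N * N_x * N_u

def feval_boundary_line(N, N_x, N_u, n_infeas, n_line):
    """Basic count minus skipped infeasible nodes, plus the line's own cost."""
    saved = sum(n_infeas) * N_u   # sum over k of N_{k,x}^infeas * N_u
    return feval_basic(N, N_x, N_u) - saved + n_line
```

For instance, with N = 10, N_x = 5, N_u = 3, one infeasible node per step, and 20 evaluations spent on the line, the boundary-line count is 150 − 30 + 20 = 140; whenever the skipped nodes outweigh the line cost, the method is cheaper, as reported for the fishing problem.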
More interesting is a comparison of solutions of similar accuracy. Therefore, the solution using the basic DP at its smallest discretization step of ∆x = 1 is compared to the solution of the boundary-line DP at its largest discretization step of ∆x = 125. The values obtained are shown in Table 1. These results reveal that the boundary-line DP is computationally more efficient than the basic DP by a factor of 1 002 001 000 / 4 877 143 ≈ 205, even though the accuracy of the solution is still better (449.340 > 445.936). This result motivates applying the method to more complex systems.

EXAMPLE 2: HYBRID ELECTRIC VEHICLE
This section studies the control problem of deciding the power split between the two power converters in a full parallel electric hybrid vehicle throughout a given drive cycle.
A hybrid vehicle includes state-of-charge constraints and normally also a final state-of-charge constraint to ensure a charge-sustaining solution. As a result, the dynamic programming solution is sensitive to the method used to account for final state constraints. The basic DP is compared to the proposed boundary-line DP for the optimal control problem of the hybrid vehicle. The focus of this section is not on modeling, but rather on handling the state constraints appropriately when implementing dynamic programming. Readers interested in detailed equations of the model are referred to [9].

Discrete-Time Problem
The full parallel hybrid electric vehicle model is a quasi-static, discrete-time model. The modeling follows the theory described in [1,10]. Essentially, the model contains the battery state-of-charge as the only state variable, which is sufficient for energy management considerations. To summarize the model: the combustion engine is modeled using an affine Willans approximation, the electric motor is modeled using an electric-power map (derived from detailed simulations), and the battery is modeled as a voltage source together with a resistance in series. The vehicle model includes air drag, rolling friction, and inertial forces. The gearbox is modeled using a constant efficiency of 95%. The hybrid vehicle considered in this study has a 20% hybridization as defined in [9]. The model equations can be summarized and described as

x_{k+1} = F(x_k, u_k, v_k, a_k, i_k)

where x_k is the battery state-of-charge, u_k is the torque split factor, v_k is the vehicle speed, a_k is the vehicle acceleration, and i_k is the gear number. The model assumes isothermal conditions, no extra fuel consumption during the startup of the combustion engine, and no energy losses during gear shifting. A constant auxiliary electric power demand of 350 W is used in the model. Since the drive cycle is assumed to be known in advance, the particular driving speed v_k, acceleration a_k, and gear number i_k at instance k can be included in the model function to form the time-variant model

x_{k+1} = F_k(x_k, u_k)

The optimization problem of minimizing the total fuel mass consumed for the hybrid vehicle over a given drive cycle, here the New European Driving Cycle (NEDC), can be stated as the discrete-time optimal control problem (59-64)

min_{u_k} Σ_{k=0}^{N−1} ∆m_f(x_k, u_k)
s.t.
x_{k+1} = F_k(x_k, u_k)
x_0 given
x_N ∈ [x_f,min, x_f,max]
x_k ∈ X_k,  u_k ∈ U_k

Figure 9 caption: Output of the dynamic programming algorithm: the optimal input map for a full HEV driving the NEDC and the state-of-charge trajectory (black) when using the optimal control signal map.
where ∆m_f is the fuel mass consumed at each time step. The time step in this example is t_s = 1 s. Throughout this section, the optimal control problem (59-64) is studied and solved using dynamic programming for the NEDC. In Figure 9 the optimal torque split is shown at each state-of-charge over the duration of the drive cycle for the hybrid vehicle. Similarly to the fishing problem, the optimal state-of-charge trajectory is close to the boundary of the feasible state region at the end of the problem. As mentioned in Section 2.2 for the fishing problem, the cost-to-go function is blurred when a finite value J_∞ is used together with an interpolation. For the hybrid vehicle problem, the blurring effect is similar to that of the fishing problem. Figure 10 shows the cost-to-go at t = 1100 s when the basic DP and the boundary-line DP are used. Since the blurring effect depends on the state-space resolution, the next section investigates this dependency for both the basic DP and the boundary-line DP.

Figure 10 caption: Section of the cost-to-go function J_k(x) at the time index k such that t = 1100 s for the HEV problem. State-space discretization is ∆x = 2.15 × 10⁻³; the penalty for infeasible states is set to J_∞ = 10⁴.
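A highly simplified state-of-charge update in the spirit of the voltage-source-plus-series-resistance battery model can be sketched as follows. This is NOT the model of [9]: all parameter values, the power-to-current relation used, and the function names are assumptions for illustration only.

```python
# Illustrative (assumed) SoC dynamics: battery as open-circuit voltage U_OC
# with series resistance R_I; the electric power demand p_el follows from
# the torque split. Parameters are placeholders, not the values of [9].
import math

U_OC = 300.0            # open-circuit voltage, V (assumed)
R_I = 0.3               # inner resistance, ohm (assumed)
Q_0 = 6.5 * 3600.0      # battery charge capacity, As (assumed)
T_S = 1.0               # time step, s (as in the HEV example)

def soc_step(x_k, p_el):
    """Next SoC from SoC x_k and terminal power p_el (W, > 0 discharging).

    Battery current from the terminal power balance p_el = U_OC*i - R_I*i^2.
    """
    i_batt = (U_OC - math.sqrt(U_OC**2 - 4.0 * R_I * p_el)) / (2.0 * R_I)
    return x_k - T_S * i_batt / Q_0
```

A model step of this shape, wrapped with the drive-cycle data (v_k, a_k, i_k) and the torque-split-dependent power flows, is what the DP recursion evaluates N_x · N_u times per time step.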

Resolution Study
Dynamic programming is often used to calculate the optimal fuel consumption for hybrid vehicles. The optimal fuel consumption is then used, for example, to compare different causal energy management strategies or component dimensions. However, such comparisons rely on the accuracy of the dynamic programming solution. In general, a higher resolution of the state space improves the accuracy of the solution. This section studies different state-space resolutions for the hybrid vehicle problem using the basic DP and the boundary-line DP. Figure 11 shows the fuel consumption calculated with the basic DP and the boundary-line DP. The fuel consumption is proportional to the total cost J, which is the fuel mass consumed. In contrast to the simple model in Section 4, no analytic solution to the hybrid vehicle problem is known. Consequently, the absolute value of the fuel consumption is shown instead of the relative deviation from the optimum. Figure 11 shows that the fuel consumption is considerably lower when using the boundary-line DP compared to the basic DP, especially for large state-space discretizations ∆x.

Figure 11 caption: Fuel consumption of the HEV obtained using DP for different state-space discretizations.
Figure 12 caption: The relative deviation of the actual final state from the optimal final state for the HEV problem, over the state-space discretization ∆x.
The boundary-line DP, in contrast to the basic DP, is insensitive to the state-space discretization. Figure 12 shows the relative deviation of the final state achieved by the DP algorithm from the optimal final state x°(t_f) = x_f,min. The behavior of the fuel consumption shown in Figure 11 resembles the behavior of the relative deviation of the final state in Figure 12. This is a consequence of the fact that any final state deviation is accumulated electric energy that could have been used to save fuel during the drive cycle. For low resolutions of the state space, the blurring effect of the cost-to-go substantially affects the final state-of-charge when using the basic DP. For a state-space discretization of ∆x = 0.01 the error is greater than 16%. However, when increasing the state-space resolution, the final state approaches the lowest admissible final state x_f,min, and the fuel consumption decreases. In general, for the hybrid vehicle problem, the errors in the final state-of-charge and in the total cost originate from two main problems. First, the blurring effect unrealistically increases the cost-to-go and therefore increases the final state-of-charge. Second, the state-space discretization must include enough points to capture the nonlinearities in the cost-to-go, a limitation both methods suffer from. However, for the hybrid vehicle problem, the error associated with the blurring effect is by far the largest error source. Using the boundary-line method eliminates this blurring effect and is therefore an appropriate method for increasing the accuracy of the final solution.

Computational Effort
The accuracy of the final solution described in the previous section must be put in relation to the computational effort of the particular discretization. This section shows the relationship between the accuracy of the solution and the computational effort, expressed as model function evaluations, similarly to those described in Section 4.4. Figure 13 shows the number of function evaluations for the hybrid vehicle problem over the discretization step ∆x. The difference between the boundary-line DP and the basic DP is due to the removal of infeasible states in the state space for the boundary-line DP. However, this effect is not as large as in the fishing problem since it depends on the problem length and on the boundary line. When comparing the boundary-line DP and the basic DP at similar accuracy, the difference in computational effort becomes clear, as shown in Table 2. For the basic DP using ∆x = 10^-4 the total cost of the HEV problem is 0.3790 kg of consumed fuel. The boundary-line method achieves a similar total cost of 0.3788 kg using ∆x = 1.3 × 10^-3. Consequently, with 5 578 935 model function evaluations instead of 74 364 780, the boundary-line DP is 74 364 780 / 5 578 935 ≈ 13 times more computationally efficient than the basic DP.
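The efficiency figure follows directly from the two evaluation counts reported above (Table 2), taken at comparable accuracy:

```python
# Ratio of model function evaluations at comparable accuracy (from Table 2):
# basic DP at Delta_x = 1e-4 versus boundary-line DP at Delta_x = 1.3e-3.
evals_basic_dp = 74_364_780
evals_boundary_dp = 5_578_935

speedup = evals_basic_dp / evals_boundary_dp
print(round(speedup, 1))   # prints 13.3
```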
Similarly to the fishing problem, the biggest reduction in computational effort is due to the possibility of reducing the state-space discretization while maintaining the accuracy of the solution.

CONCLUSIONS
The boundary-line DP presented in this paper considerably improves the efficiency of dynamic programming for the case of a single state variable. Not only is the final state constraint fulfilled with high accuracy, but the calculated optimal cost is also very close to the true solution, even for a moderate resolution of the discretized state space. The computational cost is substantially reduced since the new method maintains accuracy at a much lower state-space resolution. Furthermore, the number of calculated grid points is reduced by evaluating the feasible domain only. The method presented here improves dynamic programming only if the optimal state trajectory is close to the bounds of the feasible region at some points. This is typically the case for constrained optimal control problems.
The new method is well suited for optimal control of hybrid electric vehicles because the optimal trajectory coincides with the boundary line at the end of most drive cycles and because the system can be described sufficiently well with a single state variable. The resulting gain in computational time can be used for extensive parametric studies, such as optimal component dimensioning, without increasing the total computational effort.
In this study, the proposed method is applied to partially constrained optimal control problems where only the lower state boundary line is determined. However, the proposed method can easily be extended to constrained optimal control problems where both a lower and an upper state boundary line have to be determined.
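The two-sided case can be sketched by propagating both boundary lines backward from the final constraint. The scalar integrator dynamics x_{k+1} = x_k + u_k with bounded input, as well as the horizon and bounds below, are illustrative assumptions and not the paper's vehicle model.

```python
# Sketch: backward computation of lower and upper boundary lines for a
# hypothetical scalar system x_{k+1} = x_k + u_k, u_k in [u_min, u_max],
# x in [x_lo, x_hi]. The lower boundary at step k is the smallest state
# from which x_N >= x_f_min is still reachable; the upper boundary is the
# largest state from which x_N <= x_f_max can still be met.
def boundary_lines(N, x_f_min, x_f_max, u_min, u_max, x_lo, x_hi):
    lower = [0.0] * (N + 1)
    upper = [0.0] * (N + 1)
    lower[N], upper[N] = x_f_min, x_f_max
    for k in range(N - 1, -1, -1):
        # One backward step: undo the most favorable admissible input.
        lower[k] = max(x_lo, lower[k + 1] - u_max)
        upper[k] = min(x_hi, upper[k + 1] - u_min)
    return lower, upper

lo, hi = boundary_lines(N=5, x_f_min=0.4, x_f_max=0.4,
                        u_min=-0.1, u_max=0.1, x_lo=0.0, x_hi=1.0)
print(lo)  # lower boundary narrows toward the final constraint x_f = 0.4
print(hi)  # upper boundary does so symmetrically from above
```

Grid points outside the funnel enclosed by the two lines can then be skipped, exactly as the one-sided method skips points below the lower line.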
Future work includes investigating a possible extension of the proposed method to multi-dimensional discrete optimal control problems.