oO(ML Discuss)
Talking about ICML 2010
Should one compute the Temporal Difference fix point or minimize the Bellman Residual?
by Bruno Scherrer, at ICML 2010
We consider the choice of projection method for evaluating a linear approximation of the value function of a Markov Decision Process policy. We review the two best-known methods, the Temporal Difference (TD) fix-point computation and the Bellman Residual (BR) minimization. We describe two new examples, one in which TD outperforms BR and one in which BR outperforms TD. We provide an overview of the available analytical results, which we extend: we describe a clear relation between the two methods, a new sufficient condition and bound for the stability of TD, a simple and natural performance bound for the BR method involving the condition number of the Bellman Equation, and a unified rewriting of the recent results of Yu & Bertsekas (2009). These analytical arguments, along with extensive simulations, suggest that the TD solution is usually slightly better than the BR solution. However, the TD method, unlike the BR method, has no performance guarantee and is therefore riskier: its inherent numerical instability makes it very bad in some cases, and thus worse on average.
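As a rough illustration of the two projection methods named in the abstract, the sketch below computes both solutions for a two-state chain with a single feature. The chain, rewards, feature vector, uniform state weighting and all function names are assumptions chosen for this illustration; they are not taken from the text above. With one feature phi and psi = phi - gamma * P * phi, the standard closed forms reduce to scalars: w_TD = <phi, r> / <phi, psi> and w_BR = <psi, r> / <psi, psi>, where <., .> is the inner product weighted by the state distribution xi.

```python
import math

def matvec(P, v):
    """Multiply a row-stochastic matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def wdot(xi, a, b):
    """Inner product of a and b weighted by the state distribution xi."""
    return sum(w * x * y for w, x, y in zip(xi, a, b))

def td_and_br(P, r, phi, xi, gamma):
    """Return (w_td, w_br) for a single feature vector phi.

    w_td is None when the TD equation is numerically singular.
    """
    p_phi = matvec(P, phi)
    psi = [f - gamma * g for f, g in zip(phi, p_phi)]  # (I - gamma P) phi
    denom_td = wdot(xi, phi, psi)
    w_td = wdot(xi, phi, r) / denom_td if abs(denom_td) > 1e-12 else None
    w_br = wdot(xi, psi, r) / wdot(xi, psi, psi)
    return w_td, w_br

def true_value(P, r, gamma, sweeps=5000):
    """Approximate v = r + gamma P v by repeated application (gamma < 1)."""
    v = [0.0] * len(r)
    for _ in range(sweeps):
        pv = matvec(P, v)
        v = [ri + gamma * x for ri, x in zip(r, pv)]
    return v

def weighted_error(xi, v, phi, w):
    """xi-weighted Euclidean distance between v and the estimate w * phi."""
    diff = [vi - w * f for vi, f in zip(v, phi)]
    return math.sqrt(wdot(xi, diff, diff))

# Hypothetical two-state chain: both states move to state 2.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [0.0, -1.0]
phi = [1.0, 2.0]
xi = [0.5, 0.5]

for gamma in (0.5, 0.8, 0.9):
    w_td, w_br = td_and_br(P, r, phi, xi, gamma)
    v = true_value(P, r, gamma)
    err_td = None if w_td is None else weighted_error(xi, v, phi, w_td)
    err_br = weighted_error(xi, v, phi, w_br)
    print(gamma, "TD:", w_td, err_td, "BR:", w_br, err_br)
```

In this particular chain the value function is exactly representable at gamma = 0.5, so both methods return the same weight. The TD denominator equals 0.5 * (5 - 6 * gamma), which vanishes at gamma = 5/6, so the TD weight grows without bound near that discount while the BR weight stays finite. This is only one hand-picked case and says nothing about average behaviour.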