We develop a second-order, Hessian-free optimization method for training deep neural networks and, without using pre-training, obtain results superior to those reported by Hinton & Salakhutdinov (2006) on deep auto-encoders. Our method is practical, easy to use, scales well to very large datasets, and is not limited to auto-encoders. We also discuss issues of "curvature" as a possible explanation for the difficulty of deep learning, and how our method effectively deals with them.