Most existing approaches for non-parametric kernel learning (NPKL) suffer from expensive computation, which would limit their applications to large-scale problems. To address the scalability problem of NPKL, we propose a novel algorithm called BCDNPKL, which is very efficient and scalable. Superior to most existing approaches, BCDNPKL keeps away from semidefinite programming (SDP) and eigen-decomposition, which benefits from two findings: 1) The original SDP framework of NPKL can be reduced into a far smaller-sized counterpart which is corresponding to the sub-kernel (referred to as boundary kernel) learning; 2) The sub-kernel learning can be efficiently solved by using the proposed block coordinate descent (BCD) technique. We provide a formal proof of global convergence for the proposed BCDNPKL algorithm. The extensive experiments verify the scalability and effectiveness of BCDNPKL, compared with the state-of-the-art algorithms.