Modern machine learning applications require large models, lots of data, and complicated optimization. I will discuss scaling machine learning by decomposing learning problems into simpler sub-problems. This decomposition allows us to trade off accuracy, computational complexity, and potential for parallelization, where a small sacrifice in one can mean a big gain in another. Moreover, we can tailor our decomposition to our model and data in order to optimize these trade-offs. I will present two examples. First, I will discuss parallel optimization for regression, where the goal is to model or predict a label given many other measurements. Our Shotgun algorithm parallelizes coordinate descent, a seemingly sequential method. Shotgun theoretically achieves near-linear speedups and empirically is one of the fastest methods for multicore sparse regression. Second, I will discuss parameter learning for Probabilistic Graphical Models, a powerful class of models for probability distributions. In both examples, our analysis provides strong theoretical guarantees which guide our very practical implementations.

Host: Reid Porter
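
To give a flavor of the Shotgun idea described above, the sketch below shows parallel coordinate descent for the Lasso objective 0.5*||Xw - y||^2 + lambda*||w||_1: each round picks P coordinates at random and updates them from the same stale residual, as P parallel workers would. This is an illustrative simulation under assumed notation (the function name shotgun_lasso, the parameter P, and the sequential "parallel" loop are ours), not the speakers' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the prox of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def shotgun_lasso(X, y, lam, P=4, n_iters=200, seed=0):
    """Illustrative Shotgun-style parallel coordinate descent for
    0.5*||Xw - y||^2 + lam*||w||_1 (a sketch, not the original code).

    Each round selects P coordinates uniformly at random and computes their
    updates against the same (stale) residual, mimicking P parallel workers;
    here the "parallel" updates are simulated sequentially for clarity."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)      # per-column squared norms
    resid = y - X @ w                  # current residual y - Xw
    for _ in range(n_iters):
        coords = rng.choice(d, size=min(P, d), replace=False)
        # Compute all P coordinate updates from the same residual.
        deltas = {}
        for j in coords:
            rho = X[:, j] @ resid + col_sq[j] * w[j]
            deltas[j] = soft_threshold(rho, lam) / col_sq[j] - w[j]
        # Apply the updates; a real multicore implementation would use
        # atomic increments rather than this sequential refresh.
        for j, dw in deltas.items():
            w[j] += dw
            resid -= dw * X[:, j]
    return w
```

As a usage example, calling shotgun_lasso(X, y, lam=0.1, P=8) on a tall sparse design matrix X returns a sparse weight vector; the Shotgun analysis referenced in the abstract concerns how large P can be while retaining near-linear speedups.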