This talk gives a brief introduction to deep learning models and the energy they consume during training and inference. We then discuss which methods currently exist for handling their complexity, and how neural network parameter counts might still grow by orders of magnitude despite the end of Moore's law.
Although deep learning has been declared dead numerous times, the hype around it is bigger than ever. With Large Language Models and Diffusion Models becoming a commodity, we ask how bad their energy consumption *really* is, what we can do about it, and how it is possible to run cutting-edge language models on off-the-shelf GPUs.
We will look at the various techniques people have devised to rein in the resource hunger of deep learning models, and why we still struggle to keep up with the demands of modern neural network architectures: from low-bitwidth integer representations (quantization), through pruning redundant connections and using a large network to teach a small one (knowledge distillation), all the way to quickly adapting existing models with low-rank adaptation (LoRA).
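To give a flavor of the first of these techniques, here is a minimal sketch of 8-bit post-training quantization: mapping float32 weights to int8 with a single per-tensor scale factor. This is an illustrative toy, not the scheme used by the talk or by any particular library; real toolkits use per-channel scales, zero-points, and calibration.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 using one per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale

# Quantize a random weight tensor and check storage and error.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)            # int8 takes 1/4 the memory of float32
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)  # rounding error is bounded
```

The memory footprint drops by 4x, and the reconstruction error per weight stays within half a quantization step, which is why int8 (and even 4-bit) inference often loses surprisingly little accuracy.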
This talk aims to give the audience an estimate of how much energy modern machine learning models consume, to allow for more informed decisions about their usage and regulation. In the second part, we discuss the most common techniques for running modern architectures on commodity hardware, outside of data centers. Hopefully, deeper insight into these methods will help improve experimentation with, and access to, deep learning models.
This talk was translated into multiple languages. The files available for download contain all languages as separate audio tracks. Most desktop video players allow you to choose between them.
Please look for "audio tracks" in your desktop video player.