Fall 2022

UC Berkeley

An advanced course on deep networks in computer vision, language technology, robotics, and control. It covers the underlying themes of deep learning, contemporary model families, and real-world applications. A strong mathematical background in calculus, linear algebra, probability, optimization, and statistical learning is required.

Deep networks have revolutionized computer vision, language technology, robotics, and control. They have a growing impact in many other areas of science and engineering and, increasingly, on commerce and society. They do not, however, follow any currently known compact set of theoretical principles. In Yann LeCun's words, they require "an interplay between intuitive insights, theoretical modeling, practical implementations, empirical studies, and scientific analyses." This is a fancy way of saying "we don't understand this stuff nearly well enough, but we have no choice but to muddle through anyway." This course attempts to cover that ground and show you how to muddle through even as we aspire to do more.

This is a graduate-level/advanced undergraduate course about a particular approach to information processing using (simulated) analog circuits, where the desired circuit behavior is tuned via optimization over data, since we have no idea how to do such tuning by hand at scale. Probabilistic framings are useful for understanding what is going on, as well as for navigating certain design choices. Overall, we expect students to have a strong mathematical background in calculus, linear algebra, probability, optimization, and statistical learning. Berkeley undergraduate courses that can help build maturity include:

- **Calculus**: Math 53 (note: Math 1B or AP Math is not enough)
- **Linear Algebra and Optimization**: EECS 16B and EECS 127/227A is ideal, but EECS 16B alone might be enough if students have complete mastery of that material. Math 110 is also helpful. (note: Math 54 or EECS 16A is required as a minimum, but is not nearly enough.)
- **Probability**: EECS 126, Stat 134, or Stat 140 (note: CS 70 is required at a minimum, but might not be enough for everyone)
- **Statistical Learning**: CS 189/289A or Stat 154 (note: Data 102 is insufficient, even when combined with Data 100.)

*Math 53, EECS 126, EECS 127, and CS 189 together are the recommended background.*

Prerequisites are not enforced for enrollment, but we encourage you to consider taking some of the classes listed above and save this course for a future semester if you feel shaky on the fundamentals.

The course assumes familiarity with programming in a high-level language with data structures. Homeworks and projects will typically use Python. We encourage you to check out this tutorial if you haven’t used it before. Students who have taken Berkeley courses like CS 61A and CS 61B are well-prepared for the programming components of the class.

We do not have the staff bandwidth to help students with material that they should have understood before taking this course. If you choose to proceed with this course, you are accepting full responsibility to teach yourself anything in your background that you are missing. We will not be slowing down to accommodate you, and questions pertaining to background material will always have the lowest priority in all course forums.

The goal is to teach a principled course in Deep Learning that serves the diverse needs of our students while also codifying the present understanding of the field. Topics covered may include, but are not limited to:

- Underlying themes of deep learning, building beyond core machine learning concepts such as supervised vs. unsupervised learning, regression and classification, training/validation/test splits, distribution shift, regularization, and the fundamental tradeoffs;
- Defining and training neural networks: features, computation graphs, backpropagation, iterative optimization (SGD, Newton’s Method, Momentum, RMSProp, AdaGrad, Adam), strategies for training (explicit and implicit regularization, batch and layer normalization, weight initialization, gradient clipping, ensembles, dropout), hyperparameter tuning
- Families of contemporary models: fully connected networks, convolutional nets, graph neural nets, recurrent neural nets, transformers
- Problems that utilize neural networks: computer vision, natural language processing, generative models, and others.
- Conducting experiments in a systematic, repeatable way, leveraging and presenting data from experiments to reason about network behavior.
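To make the second bullet concrete, here is a minimal sketch (our own illustration, not course material) of defining and training a tiny neural network: a forward pass through a computation graph, backpropagation derived by hand via the chain rule, and plain SGD updates. All names, hyperparameters, and the toy dataset are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + 1 plus a little noise.
X = rng.uniform(-1, 1, size=(64, 1))
y = 2 * X + 1 + 0.01 * rng.normal(size=(64, 1))

# Parameters of a tanh hidden layer followed by a linear output layer.
W1 = rng.normal(scale=0.5, size=(1, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))

lr = 0.1
for step in range(500):
    # Forward pass (the "computation graph").
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # Backward pass: the chain rule applied node by node.
    dpred = 2 * (pred - y) / len(X)
    dW2 = h.T @ dpred
    db2 = dpred.sum(axis=0, keepdims=True)
    dh = dpred @ W2.T
    dz = dh * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz
    db1 = dz.sum(axis=0, keepdims=True)

    # Gradient-descent update (full-batch here for simplicity).
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g

print(f"final MSE: {loss:.4f}")
```

In practice a framework such as PyTorch builds the computation graph and computes these gradients automatically; the course topics above (momentum, Adam, normalization, dropout, etc.) are refinements layered on this same loop.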


Attention (machine learning), Autoencoders, Computer Vision, Convolutional neural network (CNN), Fine-tuning, Generative Models, Graph neural network (GNN), Long Short-Term Memory (LSTM), Meta-learning, Recurrent neural network (RNN), Self-supervision, Sequence-to-sequence (Seq2Seq), Transfer learning, Transformer (machine learning model)