Stochastic gradient descent (SGD)

Stochastic Optimization

Core-sets and importance sampling

VC dimension

Inverse Problems

Noncommutative Chernoff bounds

Rademacher complexity

Learning representations

Online Learning

Graph

Metric embedding

Kaczmarz algorithm

Matrix approximation

Restricted isometries

Randomized algorithm

Diagonally dominant systems

Compressed Sensing

Random Matrices

Covering numbers

Statistical complexity

CS 294 - The Mathematics of Information and Data

UC Berkeley

Fall 2013

This course investigates the mathematical principles behind data and information analysis. It brings together concepts from statistics, optimization, and computer science, with a focus on large deviation inequalities and convex analysis. It's tailored towards advanced graduate students who wish to incorporate these theories into their research.

This course will explore the foundations of an emerging discipline: the mathematics of information and data. Through recent and classic texts in mathematical statistics, optimization, and computer science, we will find unifying themes in these three disciplinary approaches. We will draw connections between how we analyze running time, statistical accuracy, and implementation of data-driven computations. We will focus in particular on large deviation inequalities, convex analysis and their applications in minimax statistics; sparse and stochastic optimization; and discrete and convex geometry. This course is ideal for advanced graduate students who would like to apply these theoretical and algorithmic developments to their own research.

The current list of topics (which will change depending on the course we chart) is:

1. Stochastic Optimization
    - stochastic gradients, online learning, and the Kaczmarz algorithm
    - core sets and importance sampling
    - randomized algorithms for linear systems
1. Random Matrices
    - Elementary analysis of random matrices
    - Graph sparsification, frames, and matrix approximation
    - Noncommutative Chernoff Bounds
1. Average Case Analysis of Optimization Problems
    - covering numbers, VC dimension, Rademacher complexity
    - metric embedding and restricted isometries
    - compressed sensing and all that it has wrought

Consent of the instructor is required. Graduate-level courses in probability and optimization will be necessary.

CS 294 - The Mathematics of Information and Data

Theoretical Computer Science

Histories of AI

Societal impacts of AI

Vector (mathematics and physics)

Dot product

Geometric interpretations

Taking gradients

Discrete random variables

Probability distributions

Mean

Variance

Marginal distributions

Conditional distribution

Big-O notation

Computational complexity

Recurrence relation

Dynamic programming

Continuous optimization

Objective functions

Gradient descent

Machine learning

Loss minimization

Hinge loss

Equitable performance

Design and organization of features

Computation graphs

Deep learning models composition

Cross-validation (statistics)

Search problem

Breadth-first search (BFS)

Uniform cost search (UCS)

UCS heuristics

Markov Decision Process (MDP)

Transportation problem

Discounting factor

Reinforcement learning (RL)

Model-free Monte Carlo

Q-learning

Function approximation

Game theory

Minimax algorithm

Temporal difference (TD) learning

Variable-based models

Backtracking search

Gibbs sampling

Bayesian network

Laplace smoothing

Logic

Entailment

Satisfiability

Soundness

Modus ponens

First-order logic

Unification (computer science)

Python

Linear regression

Linear classification

Non-linear functions

Neural network

Backpropagation

Generalization

K-means clustering

Exhaustive search

Depth-first search (DFS)

UCS correctness

Relaxed search problems

Dice game

Policy evaluation

Value iteration

Model-based Monte Carlo

State–action–reward–state–action (SARSA)

Epsilon-greedy exploration

Deep Reinforcement Learning

Halving game

Alpha-beta pruning

Nash equilibrium

Factor graphs

AC-3 algorithm

Markov random field

Hidden Markov Model (HMM)

Maximum Likelihood Estimation (MLE)

Propositional calculus (Propositional logic)

Contradiction

Model checking

Completeness

Horn clauses

Substitution (logic)

Skolem functions

CS 221 Artificial Intelligence: Principles and Techniques

Stanford University

Autumn 2022-2023

Stanford's CS 221 course teaches foundational principles and practical implementation of AI systems. It covers machine learning, game playing, constraint satisfaction, graphical models, and logic. It is a rigorous course that requires solid foundational skills in programming, math, and probability.

The goal of artificial intelligence (AI) is to tackle complex real-world problems with rigorous mathematical tools. In this course, you will learn the foundational principles and practice implementing various AI systems. Specific topics include machine learning, search, Markov decision processes, game playing, constraint satisfaction, graphical models, and logic.

This course is fast-paced and covers a lot of ground, so it is important that you have a solid foundation in a number of areas. Here are the basic skills that you need and the classes that teach those skills:

- Programming (ideally Python): [CS 106A](http://www.stanford.edu/class/cs106a/), [CS 106B](http://www.stanford.edu/class/cs106b/), [CS 107](http://www.stanford.edu/class/cs107/)
- Discrete math, mathematical rigor: [CS 103](http://www.stanford.edu/class/cs103/)
- Probability: [CS 109](http://www.stanford.edu/class/cs109/)
- Linear algebra: [Math 51](https://web.stanford.edu/class/math51/textbook.html)

It is less important that you know particular things (e.g., we don't use eigenvectors in this course even though that's a pillar of any linear algebra course), and more important that you've done enough related things that you feel at ease with it. While it is possible to fill in the gaps, this course does move quickly, and ideally you want to be focusing your energy on learning AI rather than catching up on prerequisites. We have made a few [prerequisite modules](https://stanford-cs221.github.io/autumn2022/modules) that you can review to refresh your memory, and the first homework (foundations) will allow you to also get some practice on these basics.

### Further Reading

There are no required textbooks for this class, and you should be able to learn everything from the lecture notes and homeworks. However, if you would like to pursue more advanced topics or get another perspective on the same material, here are some great resources:

- [Russell and Norvig. Artificial Intelligence: A Modern Approach](http://aima.cs.berkeley.edu/). A comprehensive reference for all the AI topics that we will cover.
- [Koller and Friedman. Probabilistic Graphical Models](http://mitpress.mit.edu/books/probabilistic-graphical-models). Covers factor graphs and Bayesian networks (this is the textbook for CS228).
- [Sutton and Barto. Reinforcement Learning: An Introduction](https://mitpress.mit.edu/books/reinforcement-learning). Covers Markov decision processes and reinforcement learning (free online).
- [Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/). Covers machine learning from a rigorous statistical perspective (free online).
- [Tsang. Foundations of Constraint Satisfaction](http://www.bracil.net/edward/fcs.html). Covers constraint satisfaction problems (free online).

Note that some of these books use different notation and terminology from this course, so it may take some effort to make the appropriate connections.

CS 221 Artificial Intelligence: Principles and Techniques

Artificial Intelligence

Connectionist Machines

McCulloch and Pitts model

Hebb’s learning rule

Rosenblatt’s perceptron

Universal Approximator

Perceptron learning rule

Empirical risk minimization

Optimization

Back-propagation

Momentum

Nesterov

Convergence

Learning Rates

Optimization Algorithms

RMSProp

Acceleration

Overfitting

Regularization (mathematics)

Convolutional neural network (CNN)

Translation Invariance

Cascade Correlation Filters

Recurrent neural network (RNN)

Bidirectional RNNs

Sequence Prediction

Long Short-Term Memory (LSTM)

Connectionist Temporal Classification (CTC)

Representations

Autoencoders

Hopfield Networks

Boltzmann Machines

Normalizing Flows

Variational autoencoder (VAE)

Generative adversarial network (GAN)

Multi-layer Perceptron

Sequence-to-sequence (Seq2Seq)

AdaGrad

11-785 Introduction to Deep Learning

Carnegie Mellon University

Spring 2020

This course provides a comprehensive introduction to deep learning, starting from foundational concepts and moving towards complex topics such as sequence-to-sequence models. Students gain hands-on experience with PyTorch and can fine-tune models through practical assignments. A basic understanding of calculus, linear algebra, and Python programming is required.

“Deep Learning” systems, typified by deep neural networks, are increasingly taking over all AI tasks, ranging from language understanding, and speech and image recognition, to machine translation, planning, and even game playing and autonomous driving. As a result, expertise in deep learning is fast changing from an esoteric desirable to a mandatory prerequisite in many advanced academic settings, and a large advantage in the industrial job market.

In this course we will learn about the basics of deep neural networks, and their applications to various AI tasks. By the end of the course, it is expected that students will have significant familiarity with the subject, and be able to apply Deep Learning to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and extend their knowledge through further study.

If you are only interested in the lectures, you can watch them on the YouTube channel listed below.

### Course description from student point of view

The course is well rounded in terms of concepts. It helps us understand the fundamentals of Deep Learning. The course starts off gradually with MLPs and progresses to more complicated concepts such as attention and sequence-to-sequence models. We get complete hands-on experience with PyTorch, which is very important for implementing Deep Learning models. As a student, you will learn the tools required for building Deep Learning models. The homeworks usually have two components, Autolab and Kaggle. The Kaggle components allow us to explore multiple architectures and understand how to fine-tune and continuously improve models. The tasks for all the homeworks were similar, and it was interesting to learn how the same task can be solved using multiple Deep Learning approaches. Overall, at the end of this course you will be confident enough to build and tune Deep Learning models.

1. We will be using one of several toolkits (the primary toolkit for recitations/instruction is PyTorch). The toolkits are largely programmed in Python. You will need to be able to program in at least one of these languages. Alternately, you will be responsible for finding and learning a toolkit that requires programming in a language you are comfortable with.
1. You will need familiarity with basic calculus (differentiation, chain rule), linear algebra and basic probability.

11-785 Introduction to Deep Learning

Deep Learning

Decision Trees

Clustering

PAC Learning

Error Decomposition

Boosting

Decision Making

Similarity Learning

Neural Networks Learning

Mathematical Optimization

Multiclass Learning

COS 324 - Introduction to Machine Learning

Princeton University

Fall 2017

A thorough introduction to machine learning principles such as online learning, decision making, gradient-based learning, and empirical risk minimization. It also explores regression, classification, dimensionality reduction, ensemble methods, neural networks, and deep learning. The course material is self-contained and based on freely available resources.

The course provides an introduction to machine learning.

**Topics covered**:

- Online learning and decision making
- Learning from examples and generalization
- Empirical risk minimization and regularization
- Introduction to convex analysis
- Gradient-based learning
- Implementation and analysis of learning algorithms for regression, binary classification, multiclass categorization, and ranking problems
- Dimensionality reduction methods
- Ensemble methods and boosting
- Neural networks and deep learning
- Markov decision processes

**NOTICE:** All material of the course is self-contained and based on freely available books and surveys.

Main references:

- [Understanding Machine Learning: From Theory to Algorithms](http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/), by Shai Shalev-Shwartz and Shai Ben-David
- [Online convex optimization](http://ocobook.cs.princeton.edu/), by Elad Hazan
- [Machine Learning](http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html), by Tom Mitchell
- An Introduction to Computational Learning Theory, by Michael Kearns & Umesh Vazirani
- Machine Learning: A Probabilistic Perspective, by Kevin Murphy

Further advanced references:

- [Convex Optimization](http://stanford.edu/~boyd/cvxbook/), by Stephen Boyd and Lieven Vandenberghe
- [Convex optimization: algorithms and complexity](https://arxiv.org/abs/1405.4980), by Sebastien Bubeck
- Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig

Python tutorials:

- [An interactive python tutorial](https://learnpython.org/) from LearnPython.com
- [Tutorial for Python 2.7](https://docs.python.org/2.7/tutorial/) from python.org
- [Tutorial for Python 3](https://docs.python.org/3/tutorial/) from python.org

COS 324 - Introduction to Machine Learning

Machine Learning

K-Nearest Neighbors

Linear Classifiers

Adam

Learning Rate Schedules

Support Vector Machine (SVM)

Softmax Loss

Higher-level representations

Image Features

Pooling

Batch Normalization

Transfer learning

AlexNet, VGG, GoogLeNet, ResNet

Activation Functions

Data Processing

Weight initialization

Hyperparameter Tuning

Data Augmentation

Feature visualization and inversion

Adversarial Examples

DeepDream and Style Transfer

Single-stage detectors

Two-stage detectors

Semantic/Instance/Panoptic segmentation

Gated recurrent unit (GRU)

Language Modeling

Image Captioning

Object Detection

Self-Attention

Transformer (machine learning model)

Video classification

3D CNNs

Two-stream networks

Multimodal Video Understanding

Supervised learning

Unsupervised learning

Pixel RNN, Pixel CNN

CS231n: Deep Learning for Computer Vision

Stanford University

Spring 2022

This is a deep dive into the details of deep learning architectures for visual recognition tasks. The course provides students with the ability to implement and train their own neural networks and to understand state-of-the-art computer vision research. It requires Python proficiency and familiarity with calculus, linear algebra, probability, and statistics.

Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into the details of deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement and train their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. Additionally, the final assignment will give them the opportunity to train and apply multi-million parameter networks on real-world vision problems of their choice. Through multiple hands-on assignments and the final course project, students will acquire the toolset for setting up deep learning tasks and practical engineering tricks for training and fine-tuning deep neural networks.

- Proficiency in Python  
    All class assignments will be in Python (and use NumPy); we provide a tutorial [here](http://cs231n.github.io/python-numpy-tutorial/) for those who aren't as familiar with Python. If you have a lot of programming experience but in a different language (e.g. C/C++/Matlab/JavaScript) you will probably be fine.
- College Calculus, Linear Algebra (e.g. MATH 19 or 41, MATH 51)  
    You should be comfortable taking derivatives and understanding matrix vector operations and notation.
- Basic Probability and Statistics (e.g. CS 109 or other stats course)  
    You should know basics of probabilities, Gaussian distributions, mean, standard deviation, etc.

CS231n: Deep Learning for Computer Vision