Apache Spark

Clustering

Community Detection in Graphs

Computational Advertising

Decision Trees

Dimensionality Reduction

Frequent Itemsets Mining

Graph Representation Learning

Graph neural network (GNN)

Learning Embeddings

Learning through Experimentation

Locality-Sensitive Hashing

Matrix Sketching

Mining Data Streams

Optimizing Submodular Functions

PageRank

Recommender Systems

CS246: Mining Massive Data Sets

Stanford University, Spring 2023

This course focuses on data mining and machine learning algorithms for large-scale data analysis. The emphasis is on parallel algorithms built with tools such as MapReduce and Spark. Topics include frequent itemsets, locality-sensitive hashing, clustering, link analysis, and large-scale supervised machine learning. Familiarity with Java, Python, basic probability theory, linear algebra, and algorithmic analysis is required.

### What is this course about? [[Info Handout](https://web.stanford.edu/class/cs246/handouts/CS246_Info_Handout.pdf)]

The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce and [Spark](http://spark.apache.org/) as tools for creating parallel algorithms that can process such data at scale.

**Topics include**: Frequent itemsets and Association rules, Near Neighbor Search in High-Dimensional Data, Locality-Sensitive Hashing (LSH), Dimensionality reduction, Recommendation Systems, Clustering, Link Analysis, Large-scale Supervised Machine Learning, Data streams, Mining the Web for Structured Data, Web Advertising.

Students are expected to have the following background:

- Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program (e.g., CS107 or CS145 or equivalent are recommended).
- Good knowledge of Java and Python will be extremely helpful since most assignments will require the use of Spark.
- Familiarity with basic probability theory (CS109 or Stat116 or equivalent is sufficient but not necessary).
- Familiarity with writing rigorous proofs (at a minimum, at the level of CS 103).
- Familiarity with basic linear algebra (e.g., any of Math 51, Math 103, Math 113, CS 205, or EE 263 would be much more than necessary).
- Familiarity with algorithmic analysis (e.g., CS 161 would be much more than necessary).

### Reference Text

The following text is useful, but not required. It can be downloaded for free, or purchased from Cambridge University Press.

[Leskovec-Rajaraman-Ullman: Mining of Massive Datasets](http://www.mmds.org/)


## Other courses in Machine Learning

Machine learning focuses on the development of algorithms and statistical models that enable computers to learn from data and make predictions or decisions without being explicitly programmed. Common sub-topics include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

CS 224W: Machine Learning with Graphs

Stanford University, Winter 2023

The course focuses on the analysis of large graphs and uses machine learning to gain insights into social, technological, and biological systems. Topics include Graph Neural Networks, influence maximization, disease outbreak detection, and social network analysis.

## What is this course about?

Complex data can be represented as a graph of relationships between objects. Such networks are a fundamental tool for modeling social, technological, and biological systems. This course focuses on the computational, algorithmic, and modeling challenges specific to the analysis of massive graphs. By studying the underlying graph structure and its features, students are introduced to machine learning techniques and data mining tools that can reveal insights into a variety of networks.

Topics include: representation learning and Graph Neural Networks; algorithms for the World Wide Web; reasoning over Knowledge Graphs; influence maximization; disease outbreak detection; and social network analysis.

Students are expected to have the following background:

- Knowledge of basic computer science principles, sufficient to write a reasonably non-trivial computer program (e.g., CS107 or CS145 or equivalent are recommended)
- Familiarity with basic probability theory (CS109 or Stat116 are sufficient but not necessary)
- Familiarity with basic linear algebra (any one of Math 51, Math 103, Math 113, or CS 205 would be much more than necessary)

The recitation sessions in the first weeks of the class will give an overview of the expected background.

## Course Materials

Notes and reading assignments will be posted periodically on the course Web site. The following books are recommended as optional reading:

- [Graph Representation Learning](https://www.cs.mcgill.ca/~wlh/grl_book/) by William L. Hamilton
- [Networks, Crowds, and Markets: Reasoning About a Highly Connected World](http://www.cs.cornell.edu/home/kleinber/networks-book/) by David Easley and Jon Kleinberg
- [Network Science](http://networksciencebook.com) by Albert-László Barabási

## Previous Offerings

You can access slides and project reports of previous versions of the course on our archived websites: [CS224W: Fall 2021](http://snap.stanford.edu/class/cs224w-2021) / [CS224W: Winter 2021](http://snap.stanford.edu/class/cs224w-2020) / [CS224W: Fall 2019](http://snap.stanford.edu/class/cs224w-2019) / [CS224W: Fall 2018](http://snap.stanford.edu/class/cs224w-2018) / [CS224W: Fall 2017](http://snap.stanford.edu/class/cs224w-2017) / [CS224W: Fall 2016](http://snap.stanford.edu/class/cs224w-2016) / [CS224W: Fall 2015](http://snap.stanford.edu/class/cs224w-2015) / [CS224W: Fall 2014](http://snap.stanford.edu/class/cs224w-2014) / [CS224W: Fall 2013](http://snap.stanford.edu/class/cs224w-2013) / [CS224W: Fall 2012](http://snap.stanford.edu/class/cs224w-2012) / [CS224W: Fall 2011](http://snap.stanford.edu/class/cs224w-2011) / [CS224W: Fall 2010](http://snap.stanford.edu/class/cs224w-2010)


CS 229: Machine Learning

Stanford University, Winter 2023

This comprehensive course covers machine learning principles ranging from supervised and unsupervised learning to reinforcement learning. Topics also touch on neural networks, support vector machines, bias-variance tradeoffs, and many real-world applications. It requires a background in computer science, probability, multivariable calculus, and linear algebra.

This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs, practical advice); reinforcement learning and adaptive control. The course will also discuss recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.

Students are expected to have the following background:

- Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program in Python/NumPy. (CS106A, CS106B, or CS106X.)
- Familiarity with probability theory. (CS 109, MATH151, or STATS 116) 
- Familiarity with multivariable calculus and linear algebra (relevant classes include, but are not limited to, MATH 51, MATH 104, MATH 113, CS 205, CME 100.) Stanford Math 51 course text can be found [here](https://web.stanford.edu/class/math51/stanford/math51book.pdf).


COS 324: Introduction to Machine Learning

Princeton University, Spring 2019

This introductory course focuses on machine learning, probabilistic reasoning, and decision-making in uncertain environments. Blending theory and practice, the course asks how systems can learn from experience and cope with real-world uncertainty.

This course provides a broad introduction to machine learning, probabilistic reasoning and decision making in uncertain environments. The course should be of interest to undergraduate students in computer science, applied mathematics, sciences and engineering, and lower-level graduate students looking to gain an introduction to the tools of machine learning and probabilistic reasoning with applications to data-intensive problems in the applied sciences, natural sciences and social sciences.

For students with interests in the fundamentals of machine learning and probabilistic artificial intelligence, this course will address three central, related questions in the design and engineering of intelligent systems. How can a system process its perceptual inputs in order to obtain a reasonable picture of the world? How can we build programs that learn from experience? How can we design systems to deal with the inherent uncertainty in the real world?

Our approach to these questions will be both theoretical and practical. We will develop a mathematical underpinning for the methods of machine learning and probabilistic reasoning. We will look at a variety of successful algorithms and applications. We will also discuss the motivations behind the algorithms, and the properties that determine whether or not they will work well for a particular task.

Students should be comfortable writing non-trivial programs in Python. Students should have a background in basic probability theory, and some level of mathematical sophistication, including calculus and linear algebra.

There is no required textbook for the course. This course has its own notes that are considered the required reading. Nevertheless, people learn in different ways and seeing the material presented in different formats can be valuable. To that end, additional optional material is linked on the course website and several books provide useful additional reading:

- Kevin Murphy. *Machine Learning: A Probabilistic Perspective*. MIT Press. 2012.
- Christopher M. Bishop. *Pattern Recognition and Machine Learning*. Springer. 2011.
- David J.C. MacKay. *Information Theory, Inference, and Learning Algorithms*. Cambridge University Press. 2003. Freely available online at http://www.inference.org.uk/itila/book.html.
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman. *The Elements of Statistical Learning*. Springer. 2001. Freely available online at http://www-stat.stanford.edu/~tibs/ElemStatLearn/
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. *An Introduction to Statistical Learning*. Springer. 2013. Freely available online at http://www-bcf.usc.edu/~gareth/ISL/
- Richard S. Sutton and Andrew G. Barto. *Reinforcement Learning: An Introduction*. MIT Press. 1998. Freely available online at http://incompleteideas.net/book/the-book-2nd.html


CS 228 - Probabilistic Graphical Models

Stanford University, Winter 2023

An in-depth study of probabilistic graphical models, combining graph theory and probability theory. The course equips students with the skills to design, implement, and apply these models to solve real-world problems. It covers Bayesian networks, exact and approximate inference methods, and learning the parameters and structure of graphical models.

Probabilistic graphical models are a powerful framework for representing complex domains using probability distributions, with numerous applications in machine learning, computer vision, natural language processing and computational biology. Graphical models bring together graph theory and probability theory, and provide a flexible framework for modeling large collections of random variables with complex interactions. This course will provide a comprehensive survey of the topic, introducing the key formalisms and main techniques used to construct them, make predictions, and support decision-making under uncertainty.

The aim of this course is to develop the knowledge and skills necessary to design, implement and apply these models to solve real problems. The course will cover: (1) Bayesian networks, undirected graphical models and their temporal extensions; (2) exact and approximate inference methods; (3) estimation of the parameters and the structure of graphical models.

Students are expected to have background in basic probability theory, statistics, programming, algorithm design and analysis. If you are able to comfortably complete Homework 1, then you likely have all the relevant background knowledge.

## Recommended Readings

**Corresponding Textbook**: (“PGM”) *Probabilistic Graphical Models: Principles and Techniques* by Daphne Koller and Nir Friedman. MIT Press.

**Course Notes**: Available [here](https://ermongroup.github.io/cs228-notes/). Student contributions welcome!

**Lecture Videos**: [here](https://canvas.stanford.edu/courses/166637/external_tools/3367)

**Further Readings**:

- (“GEV”) *Graphical models, exponential families, and variational inference* by Martin J. Wainwright and Michael I. Jordan. Available [online](https://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf).
- *Modeling and Reasoning with Bayesian Networks* by Adnan Darwiche. Available [online](http://ebookcentral.proquest.com/lib/stanford-ebooks/detail.action?docID=424583) (through Stanford).
- *Pattern Recognition and Machine Learning* by Chris Bishop. Available [online](https://www.microsoft.com/en-us/research/people/cmbishop/#!prml-book).
- *Machine Learning: A Probabilistic Perspective* by Kevin P. Murphy. Available [online](http://ebookcentral.proquest.com/lib/stanford-ebooks/detail.action?docID=3339490) (through Stanford).
- *Information Theory, Inference, and Learning Algorithms* by David J. C. MacKay. Available [online](http://www.inference.org.uk/mackay/itila/book.html).
- *Bayesian Reasoning and Machine Learning* by David Barber. Available [online](http://www.cs.ucl.ac.uk/staff/d.barber/brml/).

