Attention (machine learning)

Machine learning-based attention is a mechanism that mimics cognitive attention. It computes "soft" weights for each word (more precisely, for its embedding) in the context window, either in parallel (as in transformers) or sequentially (as in recurrent neural networks). "Soft" weights change at each runtime, in contrast to "hard" weights, which are (pre-)trained, fine-tuned, and remain frozen afterwards. Transformer-based large language models use multiple attention heads.
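
For concreteness, the sketch below computes single-head scaled dot-product attention, the variant used in transformers, with NumPy. The function names, variable names, and toy data are illustrative assumptions for this page, not part of any particular library or model.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v).
    # The returned weights are the "soft" weights: they are recomputed
    # for every input, unlike the frozen "hard" projection matrices
    # learned during (pre-)training.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # one soft weight per token in the context window
    return weights @ V, weights

# Toy example (hypothetical data): 4 token embeddings of dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                                   # embeddings in the context window
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))   # frozen "hard" projection weights
output, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(attn.round(2))                     # each row sums to 1: soft weights over the context

A multi-head variant would simply run several such attention computations in parallel on separate learned projections and concatenate their outputs.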

Three courses cover this concept

CS 182/282A: Deep Neural Networks

UC Berkeley

Fall 2022

An advanced course on deep networks in computer vision, language technology, robotics, and control. It covers deep learning fundamentals, model families, and real-world applications. A strong mathematical background in calculus, linear algebra, probability, optimization, and statistical learning is necessary.

CS 230 Deep Learning

Stanford University

Fall 2022

An in-depth course focused on building neural networks and leading successful machine learning projects. It covers Convolutional Networks, RNNs, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization, and more. Students are expected to have basic computer science skills, knowledge of probability theory, and familiarity with linear algebra.

CSCI 1470/2470 Deep Learning

Brown University

Spring 2022

Brown University's Deep Learning course acquaints students with the transformative capabilities of deep neural networks in computer vision, NLP, and reinforcement learning. Using the TensorFlow framework, it addresses topics such as CNNs, RNNs, deepfakes, and reinforcement learning, with an emphasis on ethical applications and potential societal impacts.