Apache Spark

Apache Spark is an open-source analytics engine for large-scale data processing. It provides an interface for programming clusters with implicit data parallelism and fault tolerance. It was originally developed at the University of California, Berkeley's AMPLab and donated to the Apache Software Foundation.

6 courses cover this concept

CS 186 Introduction to Database Systems

UC Berkeley

Spring 2023

This project-heavy course covers access methods, data models, query languages, database services, and interfaces. It introduces transaction processing and lists CS 61A, CS 61B, and CS 61C as prerequisites or corequisites. Proficiency in Java is recommended for project work.

+ 23 more concepts

CS 149 Parallel Computing

Stanford University

Fall 2022

Focused on principles and trade-offs in designing modern parallel computing systems, this course also teaches parallel programming techniques. It is intended for students looking to understand both parallel hardware and software design. Prerequisite knowledge in computer systems is required.

+ 45 more concepts

CS 262a Advanced Topics in Computer Systems

UC Berkeley

Fall 2021

A graduate survey of systems managing computation and information. Topics include volatile and persistent memory management, system support for networking, security infrastructure, extensible systems, APIs, and large software system performance analysis. Students are expected to engage in quality systems research, culminating in a publishable group project.

+ 31 more concepts

CS246: Mining Massive Data Sets

Stanford University

Spring 2023

This course focuses on data mining and machine learning algorithms for large-scale data analysis. The emphasis is on parallel algorithms with tools like MapReduce and Spark. Topics include frequent itemsets, locality-sensitive hashing, clustering, link analysis, and large-scale supervised machine learning. Familiarity with Java, Python, basic probability theory, linear algebra, and algorithmic analysis is required.

+ 17 more concepts

15-440 Distributed Systems

Carnegie Mellon University

Fall 2020

A course offering both theoretical understanding and practical experience in distributed systems. Key themes include concurrency, scheduling, network communication, and security. Real-world protocols and paradigms such as distributed filesystems, RPC, and MapReduce are studied. The course uses the C and Go programming languages.

+ 34 more concepts

CS 61C Great Ideas in Computer Architecture (Machine Structures)

UC Berkeley

Fall 2022

This course deepens students' understanding of computer architecture and the translation of high-level programs into machine language. Emphasis is on C and assembly language programming, computer organization, parallelism, CPU design, and warehouse-scale computing. Prerequisites include CS 61A and CS 61B or equivalent C-based programming experience.

+ 51 more concepts