Apache Hadoop is an open-source software framework for distributed storage and processing of big data using the MapReduce programming model. It consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part (MapReduce) that exploits data locality to process data faster and more efficiently. A wide range of additional software packages can be installed on top of Hadoop, such as Apache Pig, Apache Hive, and Apache Spark.
Carnegie Mellon University
Fall 2020
A course offering both theoretical understanding and practical experience in distributed systems. Key themes include concurrency, scheduling, network communication, and security. Real-world protocols and paradigms such as distributed filesystems, RPC, and MapReduce are studied. The course uses the C and Go programming languages.