Fall 2022

UC Berkeley

UC Berkeley's course blends inferential thinking, computational thinking, and real-world relevance, offering students hands-on analysis of real-world datasets. It covers critical concepts in computer programming, statistical inference, privacy, and study design.

Foundations of Data Science combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It also delves into social issues surrounding data analysis such as privacy and study design.

This course does not have any prerequisites beyond high-school algebra. The curriculum and format are designed specifically for students who have not previously taken statistics or computer science courses. Students with some prior experience in either statistics or computing are welcome to enroll and will find much of interest due to the innovative nature of the course. Students who have taken several statistics or computer science courses should instead take a more advanced course.

Our primary text is an online book called Computational and Inferential Thinking: The Foundations of Data Science. This text was written for the course by the course instructors. A complete PDF of the textbook can be found in the Student Materials Google Drive.

The computing platform for the course is hosted at data8.datahub.berkeley.edu. Students find it convenient to use their own computer for the course. If you do not have adequate access to a personal computer, we can help you borrow a machine; please contact data8@berkeley.edu.

A/B Testing, Bayes' theorem, Causality, Census, Center and Spread, Chance, Charts, Classification, Classifiers, Comparing Distributions, Conditional (computer programming), Conditional probability, Confidence Intervals, Correlation, Data type, Decisions, Function (computer programming), Groups, Histograms, Iteration, Joins, Least Squares, Linear regression, Models, Normal distribution, P-Value, Pivots, Privacy, Regression Inference, Residuals, Sample Means, Sampling, Tables