Launch Your Career in Data Science. A nine-course introduction to data science, developed and taught by leading professors.
About This Specialization
Ask the right questions, manipulate data sets, and create visualizations to communicate results.
This Specialization covers the concepts and tools you’ll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.
10 courses
Follow the suggested order or choose your own.
Projects
Designed to help you practice and apply the skills you learn.
Certificates
Highlight your new skills on your resume or LinkedIn

COURSE 1
The Data Scientist’s Toolbox
 Commitment
 1-4 hours/week
 Subtitles
 English, French, Chinese (Simplified), Greek, Italian, Portuguese (Brazilian), Vietnamese, Russian, Turkish, Hebrew
About the Course
In this course you will get an introduction to the main tools and ideas in the data scientist’s toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.
You can choose to take this course only. Learn more.
WEEK 1
During Week 1, you’ll learn about the goals and objectives of the Data Science Specialization and each of its components. You’ll also get an overview of the field as well as instructions on how to install R.
Reading · Welcome to the Data Scientist’s Toolbox
Reading · Pre-Course Survey
Reading · Syllabus
Reading · Specialization Textbooks
Video · Specialization Motivation
Reading · The Elements of Data Analytic Style
Video · The Data Scientist’s Toolbox
Video · Getting Help
Video · Finding Answers
Video · R Programming Overview
Video · Getting Data Overview
Video · Exploratory Data Analysis Overview
Video · Reproducible Research Overview
Video · Statistical Inference Overview
Video · Regression Models Overview
Video · Practical Machine Learning Overview
Video · Building Data Products Overview
Video · Installing R on Windows {Roger Peng}
Video · Install R on a Mac {Roger Peng}
Video · Installing Rstudio {Roger Peng}
Video · Installing Outside Software on Mac (OS X Mavericks)
Quiz · Week 1 Quiz
WEEK 2
Installing the Toolbox
This is the most lecture-intensive week of the course. The primary goal is to get you set up with R, RStudio, GitHub, and the other tools we will use throughout the Data Science Specialization and your ongoing work as a data scientist.
Video · Tips from Coursera Users – Optional Video
Video · Command Line Interface
Video · Introduction to Git
Video · Introduction to Github
Video · Creating a Github Repository
Video · Basic Git Commands
Video · Basic Markdown
Video · Installing R Packages
Video · Installing Rtools
Quiz · Week 2 Quiz
WEEK 3
Conceptual Issues
The Week 3 lectures focus on conceptual issues behind study design and turning data into knowledge. If you have trouble or want to explore issues in more depth, please seek out answers on the forums. They are a great resource! If you happen to be a superstar who already gets it, please take the time to help your classmates by answering their questions as well. This is one of the best ways to practice using and explaining your skills to others. These are two of the key characteristics of excellent data scientists.
Video · Types of Questions
Video · What is Data?
Video · What About Big Data?
Video · Experimental Design
Quiz · Week 3 Quiz
WEEK 4
Course Project Submission & Evaluation
In Week 4, we’ll focus on the Course Project. This is your opportunity to install the tools and set up the accounts that you’ll need for the rest of the specialization and for work in data science.
Peer Review · Course Project
Reading · Post-Course Survey

COURSE 2
R Programming
 Subtitles
 English, French, Japanese, Chinese (Simplified)
About the Course
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language
WEEK 1
Background, Getting Started, and Nuts & Bolts
This week covers the basics to get you started with R. The Background Materials lesson contains information about course mechanics and some videos on installing R. The Week 1 videos cover the history of R and S, go over the basic data types in R, and describe the functions for reading and writing data. I recommend that you watch the videos in the listed order, but watching the videos out of order isn’t going to ruin the story.
Reading · Welcome to R Programming
Reading · About the Instructor
Reading · Pre-Course Survey
Reading · Syllabus
Reading · Course Textbook
Reading · Course Supplement: The Art of Data Science
Reading · Data Science Podcast: Not So Standard Deviations
Video · Installing R on a Mac
Video · Installing R on Windows
Video · Installing R Studio (Mac)
Video · Writing Code / Setting Your Working Directory (Windows)
Video · Writing Code / Setting Your Working Directory (Mac)
Reading · Getting Started and R Nuts and Bolts
Video · Introduction
Video · Overview and History of R
Video · Getting Help
Video · R Console Input and Evaluation
Video · Data Types – R Objects and Attributes
Video · Data Types – Vectors and Lists
Video · Data Types – Matrices
Video · Data Types – Factors
Video · Data Types – Missing Values
Video · Data Types – Data Frames
Video · Data Types – Names Attribute
Video · Data Types – Summary
Video · Reading Tabular Data
Video · Reading Large Tables
Video · Textual Data Formats
Video · Connections: Interfaces to the Outside World
Video · Subsetting – Basics
Video · Subsetting – Lists
Video · Subsetting – Matrices
Video · Subsetting – Partial Matching
Video · Subsetting – Removing Missing Values
Video · Vectorized Operations
Quiz · Week 1 Quiz
Video · Introduction to swirl
Reading · Practical R Exercises in swirl Part 1
Practice Programming Assignment · swirl Lesson 1: Basic Building Blocks
Practice Programming Assignment · swirl Lesson 2: Workspace and Files
Practice Programming Assignment · swirl Lesson 3: Sequences of Numbers
Practice Programming Assignment · swirl Lesson 4: Vectors
Practice Programming Assignment · swirl Lesson 5: Missing Values
Practice Programming Assignment · swirl Lesson 6: Subsetting Vectors
Practice Programming Assignment · swirl Lesson 7: Matrices and Data Frames
WEEK 2
Programming with R
Welcome to Week 2 of R Programming. This week, we take the gloves off, and the lectures cover key topics like control structures and functions. We also introduce the first programming assignment for the course, which is due at the end of the week.
Reading · Week 2: Programming with R
Video · Control Structures – Introduction
Video · Control Structures – If-else
Video · Control Structures – For loops
Video · Control Structures – While loops
Video · Control Structures – Repeat, Next, Break
Video · Your First R Function
Video · Functions (part 1)
Video · Functions (part 2)
Video · Scoping Rules – Symbol Binding
Video · Scoping Rules – R Scoping Rules
Video · Scoping Rules – Optimization Example (OPTIONAL)
Video · Coding Standards
Video · Dates and Times
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Logic
Practice Programming Assignment · swirl Lesson 2: Functions
Practice Programming Assignment · swirl Lesson 3: Dates and Times
Quiz · Week 2 Quiz
Reading · Programming Assignment 1 INSTRUCTIONS: Air Pollution
Quiz · Programming Assignment 1: Quiz
WEEK 3
Loop Functions and Debugging
We have now entered the third week of R Programming, which also marks the halfway point. The lectures this week cover loop functions and the debugging tools in R. These aspects of R make R useful for both interactive work and writing longer code, and so they are commonly used in practice.
Reading · Week 3: Loop Functions and Debugging
Video · Loop Functions – lapply
Video · Loop Functions – apply
Video · Loop Functions – mapply
Video · Loop Functions – tapply
Video · Loop Functions – split
Video · Debugging Tools – Diagnosing the Problem
Video · Debugging Tools – Basic Tools
Video · Debugging Tools – Using the Tools
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: lapply and sapply
Practice Programming Assignment · swirl Lesson 2: vapply and tapply
Quiz · Week 3 Quiz
Peer Review · Programming Assignment 2: Lexical Scoping
WEEK 4
Simulation & Profiling
This week covers how to simulate data in R, which serves as the basis for doing simulation studies. We also cover the profiler in R, which lets you collect detailed information on how your R functions are running and identify bottlenecks that can be addressed. The profiler is a key tool in helping you optimize your programs. Finally, we cover the str function, which I personally believe is the most useful function in R.
Reading · Week 4: Simulation & Profiling
Video · The str Function
Video · Simulation – Generating Random Numbers
Video · Simulation – Simulating a Linear Model
Video · Simulation – Random Sampling
Video · R Profiler (part 1)
Video · R Profiler (part 2)
Quiz · Week 4 Quiz
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Looking at Data
Practice Programming Assignment · swirl Lesson 2: Simulation
Practice Programming Assignment · swirl Lesson 3: Base Graphics
Reading · Programming Assignment 3 INSTRUCTIONS: Hospital Quality
Quiz · Programming Assignment 3: Quiz
Reading · Post-Course Survey
COURSE 3
Getting and Cleaning Data
 Subtitles
 English, Russian, French, Chinese (Simplified)
About the Course
Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.
WEEK 1
In this first week of the course, we look at finding data and reading different file types.
Reading · Welcome to Week 1
Reading · Syllabus
Reading · Pre-Course Survey
Video · Obtaining Data Motivation
Video · Raw and Processed Data
Video · Components of Tidy Data
Video · Downloading Files
Video · Reading Local Files
Video · Reading Excel Files
Video · Reading XML
Video · Reading JSON
Video · The data.table Package
Reading · Practical R Exercises in swirl Part 1
Quiz · Week 1 Quiz
WEEK 2
Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.
Video · Reading from MySQL
Video · Reading from HDF5
Video · Reading from The Web
Video · Reading From APIs
Video · Reading From Other Sources
Quiz · Week 2 Quiz
WEEK 3
Welcome to Week 3 of Getting and Cleaning Data! This week the lectures will focus on organizing, merging and managing the data you have collected using the lectures from Weeks 1 and 2.
Video · Subsetting and Sorting
Video · Summarizing Data
Video · Creating New Variables
Video · Reshaping Data
Video · Managing Data Frames with dplyr – Introduction
Video · Managing Data Frames with dplyr – Basic Tools
Video · Merging Data
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Manipulating Data with dplyr
Practice Programming Assignment · swirl Lesson 2: Grouping and Chaining with dplyr
Practice Programming Assignment · swirl Lesson 3: Tidying Data with tidyr
Quiz · Week 3 Quiz
WEEK 4
Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.
Video · Editing Text Variables
Video · Regular Expressions I
Video · Regular Expressions II
Video · Working with Dates
Video · Data Resources
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Dates and Times with lubridate
Quiz · Week 4 Quiz
Peer Review · Getting and Cleaning Data Course Project
Reading · Post-Course Survey

COURSE 4
Exploratory Data Analysis
 Subtitles
 English, Chinese (Simplified)
About the Course
This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
WEEK 1
This week covers the basics of analytic graphics and the base plotting system in R. We’ve also included some background material to help you install R if you haven’t done so already.
Reading · Welcome to Exploratory Data Analysis
Reading · Syllabus
Reading · Pre-Course Survey
Video · Introduction
Reading · Exploratory Data Analysis with R Book
Reading · The Art of Data Science
Video · Installing R on Windows (3.2.1)
Video · Installing R on a Mac (3.2.1)
Video · Installing R Studio (Mac)
Video · Setting Your Working Directory (Windows)
Video · Setting Your Working Directory (Mac)
Video · Principles of Analytic Graphics
Video · Exploratory Graphs (part 1)
Video · Exploratory Graphs (part 2)
Video · Plotting Systems in R
Video · Base Plotting System (part 1)
Video · Base Plotting System (part 2)
Video · Base Plotting Demonstration
Video · Graphics Devices in R (part 1)
Video · Graphics Devices in R (part 2)
Reading · Practical R Exercises in swirl Part 1
Practice Programming Assignment · swirl Lesson 1: Principles of Analytic Graphs
Practice Programming Assignment · swirl Lesson 2: Exploratory Graphs
Practice Programming Assignment · swirl Lesson 3: Graphics Devices in R
Practice Programming Assignment · swirl Lesson 4: Plotting Systems
Practice Programming Assignment · swirl Lesson 5: Base Plotting System
Quiz · Week 1 Quiz
Peer Review · Course Project 1
WEEK 2
Welcome to Week 2 of Exploratory Data Analysis. This week covers some of the more advanced graphing systems available in R: the Lattice system and the ggplot2 system. While the base graphics system provides many important tools for visualizing data, it was part of the original R system and lacks many features that may be desirable in a plotting system, particularly when visualizing high dimensional data. The Lattice and ggplot2 systems also simplify the laying out of plots, making it a much less tedious process.
Video · Lattice Plotting System (part 1)
Video · Lattice Plotting System (part 2)
Video · ggplot2 (part 1)
Video · ggplot2 (part 2)
Video · ggplot2 (part 3)
Video · ggplot2 (part 4)
Video · ggplot2 (part 5)
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Lattice Plotting System
Practice Programming Assignment · swirl Lesson 2: Working with Colors
Practice Programming Assignment · swirl Lesson 3: GGPlot2 Part1
Practice Programming Assignment · swirl Lesson 4: GGPlot2 Part2
Practice Programming Assignment · swirl Lesson 5: GGPlot2 Extras
Quiz · Week 2 Quiz
WEEK 3
Welcome to Week 3 of Exploratory Data Analysis. This week covers some of the workhorse statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques that allow you to make graphical displays of very high dimensional data (many, many variables). We also cover novel ways to specify colors in R so that you can use color as an important and useful dimension when making data graphics. All of this material is covered in chapters 9-12 of my book Exploratory Data Analysis with R.
Video · Hierarchical Clustering (part 1)
Video · Hierarchical Clustering (part 2)
Video · Hierarchical Clustering (part 3)
Video · K-Means Clustering (part 1)
Video · K-Means Clustering (part 2)
Video · Dimension Reduction (part 1)
Video · Dimension Reduction (part 2)
Video · Dimension Reduction (part 3)
Video · Working with Color in R Plots (part 1)
Video · Working with Color in R Plots (part 2)
Video · Working with Color in R Plots (part 3)
Video · Working with Color in R Plots (part 4)
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: Hierarchical Clustering
Practice Programming Assignment · swirl Lesson 2: K Means Clustering
Practice Programming Assignment · swirl Lesson 3: Dimension Reduction
Practice Programming Assignment · swirl Lesson 4: Clustering Example
WEEK 4
This week, we’ll look at two case studies in exploratory data analysis. The first involves the use of cluster analysis techniques, and the second is a more involved analysis of some air pollution data. How one goes about doing EDA is often personal, but I’m providing these videos to give you a sense of how you might proceed with a specific type of dataset.
Video · Clustering Case Study
Video · Air Pollution Case Study
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: CaseStudy
Peer Review · Course Project 2
Reading · Post-Course Survey

COURSE 5
Reproducible Research
 Commitment
 4-9 hours/week
 Subtitles
 English
About the Course
This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows for people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools which allow one to publish data analyses in a single document that allows others to easily execute the same analysis to obtain the same results.
WEEK 1
Concepts, Ideas, & Structure
This week will cover the basic ideas of reproducible research since they may be unfamiliar to some of you. We also cover structuring and organizing a data analysis to help make it more reproducible. I recommend that you watch the videos in the order that they are listed on the web page, but watching the videos out of order isn’t going to ruin the story.
Video · Introduction
Reading · Syllabus
Reading · Pre-course survey
Reading · Course Book: Report Writing for Data Science in R
Video · What is Reproducible Research About?
Video · Reproducible Research: Concepts and Ideas (part 1)
Video · Reproducible Research: Concepts and Ideas (part 2)
Video · Reproducible Research: Concepts and Ideas (part 3)
Video · Scripting Your Analysis
Video · Structure of a Data Analysis (part 1)
Video · Structure of a Data Analysis (part 2)
Video · Organizing Your Analysis
Quiz · Week 1 Quiz
WEEK 2
Markdown & knitr
This week we cover some of the core tools for developing reproducible documents. We cover the literate programming tool knitr and show how to integrate it with Markdown to publish reproducible web documents. We also introduce the first peer assessment, which will require you to write up a reproducible data analysis using knitr.
Video · Coding Standards in R
Video · Markdown
Video · R Markdown
Video · R Markdown Demonstration
Video · knitr (part 1)
Video · knitr (part 2)
Video · knitr (part 3)
Video · knitr (part 4)
Quiz · Week 2 Quiz
Video · Introduction to Course Project 1
Peer Review · Course Project 1
WEEK 3
Reproducible Research Checklist & Evidence-based Data Analysis
This week covers what one could call a basic checklist for ensuring that a data analysis is reproducible. While it’s not absolutely sufficient to follow the checklist, it provides a necessary minimum standard that would be applicable to almost any area of analysis.
Video · Communicating Results
Video · RPubs
Video · Reproducible Research Checklist (part 1)
Video · Reproducible Research Checklist (part 2)
Video · Reproducible Research Checklist (part 3)
Video · Evidence-based Data Analysis (part 1)
Video · Evidence-based Data Analysis (part 2)
Video · Evidence-based Data Analysis (part 3)
Video · Evidence-based Data Analysis (part 4)
Video · Evidence-based Data Analysis (part 5)
WEEK 4
Case Studies & Commentaries
This week there are two case studies involving the importance of reproducibility in science for you to watch.
Video · Caching Computations
Video · Case Study: Air Pollution
Video · Case Study: High Throughput Biology
Video · Commentaries on Data Analysis
Video · Introduction to Peer Assessment 2
Peer Review · Course Project 2
Reading · Post-Course Survey

COURSE 6
Statistical Inference
 Subtitles
 English
About the Course
Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data.
WEEK 1
Probability & Expected Values
This week, we’ll focus on the fundamentals including probability, random variables, expectations and more.
Video · Introductory video
Reading · Welcome to Statistical Inference
Reading · Some introductory comments
Reading · Pre-Course Survey
Reading · Syllabus
Reading · Course Book: Statistical Inference for Data Science
Reading · Data Science Specialization Community Site
Reading · Homework Problems
Reading · Probability
Video · 02 01 Introduction to probability
Video · 02 02 Probability mass functions
Video · 02 03 Probability density functions
Reading · Conditional probability
Video · 03 01 Conditional Probability
Video · 03 02 Bayes’ rule
Video · 03 03 Independence
Reading · Expected values
Video · 04 01 Expected values
Video · 04 02 Expected values, simple examples
Video · 04 03 Expected values for PDFs
Reading · Practical R Exercises in swirl 1
Practice Programming Assignment · swirl Lesson 1: Introduction
Practice Programming Assignment · swirl Lesson 2: Probability1
Practice Programming Assignment · swirl Lesson 3: Probability2
Practice Programming Assignment · swirl Lesson 4: ConditionalProbability
Practice Programming Assignment · swirl Lesson 5: Expectations
Quiz · Quiz 1
WEEK 2
Variability, Distribution, & Asymptotics
We’re going to tackle variability, distributions, limits, and confidence intervals.
Reading · Variability
Video · 05 01 Introduction to variability
Video · 05 02 Variance simulation examples
Video · 05 03 Standard error of the mean
Video · 05 04 Variance data example
Reading · Distributions
Video · 06 01 Binomial distribution
Video · 06 02 Normal distribution
Video · 06 03 Poisson
Reading · Asymptotics
Video · 07 01 Asymptotics and LLN
Video · 07 02 Asymptotics and the CLT
Video · 07 03 Asymptotics and confidence intervals
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Variance
Practice Programming Assignment · swirl Lesson 2: CommonDistros
Practice Programming Assignment · swirl Lesson 3: Asymptotics
Quiz · Quiz 2
WEEK 3
Intervals, Testing, & P-values
We will be taking a look at intervals, testing, and p-values in this lesson.
Reading · Confidence intervals
Video · 08 01 T confidence intervals
Video · 08 02 T confidence intervals example
Video · 08 03 Independent group T intervals
Video · 08 04 A note on unequal variance
Reading · Hypothesis testing
Video · 09 01 Hypothesis testing
Video · 09 02 Example of choosing a rejection region
Video · 09 03 T tests
Video · 09 04 Two group testing
Reading · P-values
Video · 10 01 P-values
Video · 10 02 P-value further examples
Reading · Knitr
Video · Just enough knitr to do the project
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: T Confidence Intervals
Practice Programming Assignment · swirl Lesson 2: Hypothesis Testing
Practice Programming Assignment · swirl Lesson 3: P Values
Quiz · Quiz 3
WEEK 4
Power, Bootstrapping, & Permutation Tests
We will begin looking into power, bootstrapping, and permutation tests.
Reading · Power
Video · 11 01 Power
Video · 11 02 Calculating Power
Video · 11 03 Notes on power
Video · 11 04 T test power
Video · 12 01 Multiple Comparisons
Reading · Resampling
Video · 13 01 Bootstrapping
Video · 13 02 Bootstrapping example
Video · 13 03 Notes on the bootstrap
Video · 13 04 Permutation tests
Quiz · Quiz 4
Peer Review · Statistical Inference Course Project
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Power
Practice Programming Assignment · swirl Lesson 2: Multiple Testing
Practice Programming Assignment · swirl Lesson 3: Resampling
Reading · Post-Course Survey

COURSE 7
Regression Models
 Subtitles
 English
About the Course
Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well. Analysis of residuals and variability will be investigated. The course will cover modern thinking on model selection and novel uses of regression models including scatterplot smoothing.
WEEK 1
Least Squares and Linear Regression
This week, we focus on least squares and linear regression.
Reading · Welcome to Regression Models
Reading · Book: Regression Models for Data Science in R
Reading · Syllabus
Reading · Pre-Course Survey
Reading · Data Science Specialization Community Site
Reading · Where to get more advanced material
Reading · Regression
Video · Introduction to Regression
Video · Introduction: Basic Least Squares
Reading · Technical details
Video · Technical Details (Skip if you’d like)
Video · Introductory Data Example
Reading · Least squares
Video · Notation and Background
Video · Linear Least Squares
Video · Linear Least Squares Coding Example
Video · Technical Details (Skip if you’d like)
Reading · Regression to the mean
Video · Regression to the Mean
Reading · Practical R Exercises in swirl Part 1
Practice Programming Assignment · swirl Lesson 1: Introduction
Practice Programming Assignment · swirl Lesson 2: Residuals
Practice Programming Assignment · swirl Lesson 3: Least Squares Estimation
Quiz · Quiz 1
WEEK 2
Linear Regression & Multivariable Regression
This week, we will work through the remainder of linear regression and then turn to the first part of multivariable regression.
Reading · *Statistical* linear regression models
Video · Statistical Linear Regression Models
Video · Interpreting Coefficients
Video · Linear Regression for Prediction
Reading · Residuals
Video · Residuals
Video · Residuals, Coding Example
Video · Residual Variance
Reading · Inference in regression
Video · Inference in Regression
Video · Coding Example
Video · Prediction
Reading · Looking ahead to the project
Video · Really, really quick intro to knitr
Reading · Practical R Exercises in swirl Part 2
Practice Programming Assignment · swirl Lesson 1: Residual Variation
Practice Programming Assignment · swirl Lesson 2: Introduction to Multivariable Regression
Practice Programming Assignment · swirl Lesson 3: MultiVar Examples
Quiz · Quiz 2
WEEK 3
Multivariable Regression, Residuals, & Diagnostics
This week, we’ll build on last week’s introduction to multivariable regression with some examples and then cover residuals, diagnostics, variance inflation, and model comparison.
Reading · Multivariable regression
Video · Multivariable Regression part I
Video · Multivariable Regression part II
Video · Multivariable Regression Continued
Video · Multivariable Regression Examples part I
Video · Multivariable Regression Examples part II
Video · Multivariable Regression Examples part III
Video · Multivariable Regression Examples part IV
Reading · Adjustment
Video · Adjustment Examples
Reading · Residuals
Video · Residuals and Diagnostics part I
Video · Residuals and Diagnostics part II
Video · Residuals and Diagnostics part III
Reading · Model selection
Video · Model Selection part I
Video · Model Selection part II
Video · Model Selection part III
Reading · Practical R Exercises in swirl Part 3
Practice Programming Assignment · swirl Lesson 1: MultiVar Examples2
Practice Programming Assignment · swirl Lesson 2: MultiVar Examples3
Practice Programming Assignment · swirl Lesson 3: Residuals Diagnostics and Variation
Quiz · Quiz 3
Practice Quiz · (OPTIONAL) Data analysis practice with immediate feedback (NEW! 10/18/2017)
WEEK 4
Logistic Regression and Poisson Regression
This week, we will work on generalized linear models, including binary outcomes and Poisson regression.
Reading · GLMs
Video · GLMs
Reading · Logistic regression
Video · Logistic Regression part I
Video · Logistic Regression part II
Video · Logistic Regression part III
Reading · Count Data
Video · Poisson Regression part I
Video · Poisson Regression part II
Reading · Mishmash
Video · Hodgepodge
Reading · Practical R Exercises in swirl Part 4
Practice Programming Assignment · swirl Lesson 1: Variance Inflation Factors
Practice Programming Assignment · swirl Lesson 2: Overfitting and Underfitting
Practice Programming Assignment · swirl Lesson 3: Binary Outcomes
Practice Programming Assignment · swirl Lesson 4: Count Outcomes
Quiz · Quiz 4
Peer Review · Regression Models Course Project
Reading · Post-Course Survey

COURSE 8
Practical Machine Learning
 Subtitles
 English
About the Course
Among the most common tasks performed by data scientists and data analysts are prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.
You can choose to take this course only. Learn more.
WEEK 1
Week 1: Prediction, Errors, and Cross Validation
This week will cover prediction, relative importance of steps, errors, and cross validation.
Reading · Welcome to Practical Machine Learning
Reading · Syllabus
Reading · Pre-Course Survey
Video · Prediction motivation
Video · What is prediction?
Video · Relative importance of steps
Video · In and out of sample errors
Video · Prediction study design
Video · Types of errors
Video · Receiver Operating Characteristic
Video · Cross validation
Video · What data should you use?
Quiz · Quiz 1
WEEK 2
Week 2: The Caret Package
This week will introduce the caret package, tools for creating features and preprocessing.
Video · Caret package
Video · Data slicing
Video · Training options
Video · Plotting predictors
Video · Basic preprocessing
Video · Covariate creation
Video · Preprocessing with principal components analysis
Video · Predicting with Regression
Video · Predicting with Regression Multiple Covariates
Quiz · Quiz 2
WEEK 3
Week 3: Predicting with trees, Random Forests, & Model Based Predictions
This week we introduce a number of machine learning algorithms you can use to complete your course project.
Video · Predicting with trees
Video · Bagging
Video · Random Forests
Video · Boosting
Video · Model Based Prediction
Quiz · Quiz 3
WEEK 4
Week 4: Regularized Regression and Combining Predictors
This week, we will cover regularized regression and combining predictors.
Video · Regularized regression
Video · Combining predictors
Video · Forecasting
Video · Unsupervised Prediction
Quiz · Quiz 4
Reading · Course Project Instructions (READ FIRST)
Peer Review · Prediction Assignment Writeup
Quiz · Course Project Prediction Quiz
Reading · Post-Course Survey

COURSE 9
Developing Data Products
 Subtitles
 English
About the Course
A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data-informed model, algorithm, or inference. This course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course will focus on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience.
You can choose to take this course only. Learn more.
WEEK 1
Course Overview
In this overview module, we’ll go over some information and resources to help you get started and succeed in the course.
Video · Welcome to Developing Data Products
Reading · Syllabus
Reading · Welcome
Reading · Book: Developing Data Products in R
Reading · Community Site
Reading · R and RStudio Links & Tutorials
Shiny, GoogleVis, and Plotly
Now we can turn to the first substantive lessons. In this module, you’ll learn how to develop basic applications and interactive graphics in shiny, compose interactive HTML graphics with GoogleVis, and prepare data visualizations with Plotly.
Reading · Shiny
Reading · Shinyapps.io Project
Video · Shiny 1.1
Video · Shiny 1.2
Video · Shiny 1.3
Video · Shiny 1.4
Video · Shiny 1.5
Video · Shiny 2.1
Video · Shiny 2.2
Video · Shiny 2.3
Video · Shiny 2.4
Video · Shiny 2.5
Video · Shiny 2.6
Video · Shiny Gadgets 1.1
Video · Shiny Gadgets 1.2
Video · Shiny Gadgets 1.3
Video · GoogleVis 1.1
Video · GoogleVis 1.2
Video · Plotly 1.1
Video · Plotly 1.2
Video · Plotly 1.3
Video · Plotly 1.4
Video · Plotly 1.5
Video · Plotly 1.6
Video · Plotly 1.7
Video · Plotly 1.8
Quiz · Quiz 1
WEEK 2
R Markdown and Leaflet
During this module, we’ll learn how to create R Markdown files and embed R code in an Rmd. We’ll also explore Leaflet and use it to create interactive annotated maps.
Video · R Markdown 1.1
Video · R Markdown 1.2
Video · R Markdown 1.3
Video · R Markdown 1.4
Video · R Markdown 1.5
Video · R Markdown 1.6
Reading · Three Ways to Share R Markdown Products
Video · Leaflet 1.1
Video · Leaflet 1.2
Video · Leaflet 1.3
Video · Leaflet 1.4
Video · Leaflet 1.5
Video · Leaflet 1.6
Quiz · Quiz 2
Peer Review · R Markdown and Leaflet
WEEK 3
R Packages
In this module, we’ll dive into the world of creating R packages and practice developing an R Markdown presentation that includes a data visualization built using Plotly.
Reading · R Packages
Video · R Packages (Part 1)
Video · R Packages (Part 2)
Video · Building R Packages Demo
Video · R Classes and Methods (Part 1)
Video · R Classes and Methods (Part 2)
Quiz · Quiz 3
Peer Review · R Markdown Presentation & Plotly
WEEK 4
Swirl and Course Project
Week 4 is all about the Course Project, producing a Shiny Application and reproducible pitch.
Video · Swirl 1.1
Video · Swirl 1.2
Video · Swirl 1.3
Peer Review · Course Project: Shiny Application and Reproducible Pitch
Reading · Post-Course Survey

COURSE 10
Data Science Capstone
 Commitment
 4-9 hours/week
 Subtitles
 English
About the Capstone Project
The capstone project class allows students to create a usable, public data product that they can use to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners.
You can choose to take this course only. Learn more.
WEEK 1
Overview, Understanding the Problem, and Getting the Data
This week, we introduce the project so you can get a clear grip on the problem at hand and begin working with the dataset.
Video · Welcome to the Capstone Project
Reading · Project Overview
Video · Welcome from SwiftKey
Video · You Are a Data Scientist Now
Reading · Syllabus
Video · Introduction to Task 0: Understanding the Problem
Reading · Task 0 – Understanding the problem
Reading · About the Corpora
Video · Introduction to Task 1: Getting and Cleaning the Data
Reading · Task 1 – Getting and cleaning the data
Video · Regular Expressions: Part 1 (Optional)
Video · Regular Expressions: Part 2 (Optional)
Quiz · Quiz 1: Getting Started
WEEK 2
Exploratory Data Analysis and Modeling
This week, we move on to the next tasks, exploratory data analysis and modeling. You’ll also submit your milestone report and review submissions from your classmates.
Video · Introduction to Task 2: Exploratory Data Analysis
Reading · Task 2 – Exploratory Data Analysis
Video · Introduction to Task 3: Modeling
Reading · Task 3 – Modeling
Peer Review · Milestone Report
WEEK 3
Prediction Model
This week, you’ll build and evaluate your prediction model. The goal is to make your model efficient and accurate.
Video · Introduction to Task 4: Prediction Model
Reading · Task 4 – Prediction Model
Quiz · Quiz 2: Natural language processing I
WEEK 4
Creative Exploration
This week’s goal is to improve the predictive accuracy while reducing computational runtime and model complexity.
Video · Introduction to Task 5: Creative Exploration
Reading · Task 5 – Creative Exploration
Quiz · Quiz 3: Natural language processing II
WEEK 5
Data Product
This week, you’ll work on developing the first component of your final project, your data product.
Video · Introduction to Task 6: Data Product
Reading · Task 6 – Data Product
WEEK 6
Slide Deck
This week, you’ll work on developing the second component of your final project, a slide deck to accompany your data product.
Video · Introduction to Task 7: Slide Deck
Reading · Task 7 – Slide Deck
WEEK 7
Final Project Submission and Evaluation
This week, you’ll submit your final project and review the work of your classmates.
Peer Review · Final Project Submission
Video · Congratulations!
Creators
Johns Hopkins University is recognized as a destination for excellent, ambitious scholars and a world leader in teaching and research. The mission of The Johns Hopkins University is to educate its students and cultivate their capacity for lifelong learning, to foster independent and original research, and to bring the benefits of discovery to the world.