Statistics for Data Science Online Course
This course equips you with essential statistical skills that will help you make sense of complex data and apply statistical analysis to real-world scenarios. While modern software and programming tools automate much of the computation, the true value of this course lies in developing your critical thinking and analytical reasoning.
You’ll gain a solid foundation in statistics, including how to handle different data types, calculate correlation and covariance, and interpret statistical results with confidence.
As data-driven decision-making becomes increasingly important across industries, demand for data science professionals is growing rapidly. With businesses recognizing the power of leveraging their data, the need for professionals with strong statistical insight will only continue to rise.
Course Curriculum
Introduction
- What does the course cover?
Sample or population data?
- Understanding the difference between a population and a sample
The fundamentals of descriptive statistics
- The various types of data we can work with
- Levels of measurement
- Categorical variables. Visualization techniques for categorical variables
- Numerical variables. Using a frequency distribution table
- Histogram charts
- Cross tables and scatter plots
Measures of central tendency, asymmetry, and variability
- The main measures of central tendency: mean, median, mode
- Measuring skewness
- Measuring how data is spread out: calculating variance
- Standard deviation and coefficient of variation
- Calculating and understanding covariance
- The correlation coefficient
Practical example: descriptive statistics
- Practical example
Distributions
- Introduction to inferential statistics
- What is a distribution?
- The normal distribution
- The standard normal distribution
- Understanding the central limit theorem
- Standard error
Estimators and estimates
- Working with estimators and estimates
- Confidence intervals - an invaluable tool for decision making
- Calculating confidence intervals within a population with a known variance
- Student’s t-distribution
- Calculating confidence intervals within a population with an unknown variance
- What is a margin of error and why is it important in statistics?
Confidence intervals: advanced topics
- Calculating confidence intervals for two means with dependent samples
- Calculating confidence intervals for two means with independent samples (part 1)
- Calculating confidence intervals for two means with independent samples (part 2)
- Calculating confidence intervals for two means with independent samples (part 3)
Practical example: inferential statistics
- Practical example: inferential statistics
Hypothesis testing: Introduction
- The null and the alternative hypothesis
- Establishing a rejection region and a significance level
- Type I error vs Type II error
Hypothesis testing: Let's start testing!
- Test for the mean. Population variance known
- What is the p-value and why is it one of the most useful tools for statisticians?
- Test for the mean. Population variance unknown
- Test for the mean. Dependent samples
- Test for the mean. Independent samples (part 1)
- Test for the mean. Independent samples (part 2)
Practical example: hypothesis testing
- Practical example: hypothesis testing
The fundamentals of regression analysis
- Introduction to regression analysis
- Correlation and causation
- The linear regression model made easy
- What is the difference between correlation and regression?
- A geometrical representation of the linear regression model
- A practical example - Reinforced learning
Subtleties of regression analysis
- Decomposing the linear regression model - understanding its nuts and bolts
- What is R-squared and how does it help us?
- The ordinary least squares setting and its practical applications
- Studying regression tables
- The multiple linear regression model
- Adjusted R-squared
- What does the F-statistic show us and why do we need to understand it?
Assumptions for linear regression analysis
- OLS assumptions
- A1. Linearity
- A2. No endogeneity
- A3. Normality and homoscedasticity
- A4. No autocorrelation
- A5. No multicollinearity
Dealing with categorical data
- Dummy variables
Practical example: regression analysis
- Practical example: regression analysis