Supervised Machine Learning: Regression | Coursera (IBM)
This course serves as an introduction to Regression, one of the core modeling families in supervised Machine Learning. You will explore how to train regression models to predict continuous outcomes and learn to use error metrics to evaluate and compare different models. Additionally, the course covers best practices such as the application of train-test splits and various regularization techniques.
Upon completion, you will be able to distinguish between the uses of classification and regression in supervised machine learning. You will also be able to describe and implement linear regression models, use a range of error metrics to choose the most appropriate linear regression model for your data, and explain the role of regularization in preventing overfitting, including the Ridge, LASSO, and Elastic Net techniques.
This course is designed for aspiring data scientists who seek practical experience with Supervised Machine Learning Regression techniques in a business context. To benefit fully from the course, you should have experience with programming in a Python development environment and a basic understanding of Data Cleaning, Exploratory Data Analysis, Calculus, Linear Algebra, Probability, and Statistics.
Notice!
Always refer to the modules in your course for the most accurate and up-to-date information.
Attention!
If you have any questions that are not covered in this post, please feel free to leave them in the comments section below. Thank you for your engagement.
WEEK 1 QUIZ
1. You can use supervised machine learning for all of the following examples, EXCEPT:
- Segment customers by their demographics.
- Supervised learning
- Supervised Machine Learning
- Develop multiple models.
- LR.predict(X_test)
- ROC index
- Observations
- Features
- Parameters
- Answer: None of the above
- Machine Learning is automated and requires no programming
- The Sum of Squared Errors measures the distance between the true and predicted values (see the sketch after this quiz).
10. When learning about regression, we saw that the outcome is a continuous number. Given the options below, which is an example of regression?
- Housing prices
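To make the Week 1 ideas concrete (fitting a linear regression, predicting with LR.predict(X_test), and measuring the Sum of Squared Errors), here is a minimal sketch using scikit-learn; the toy data and seed are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy data: a noisy linear relationship (invented for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

LR = LinearRegression()
LR.fit(X_train, y_train)
y_pred = LR.predict(X_test)

# Sum of Squared Errors: distance between true and predicted values
sse = np.sum((y_test - y_pred) ** 2)
print(f"SSE on the test set: {sse:.2f}")
```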
WEEK 2 QUIZ
1. The main purpose of splitting your data into training and test sets is:
- To avoid overfitting
- Measure error and performance of the model
- Data leakage
- Linear Regression
- Fit the actual model and learn the parameters
- Non-linear effects.
- Prediction and Interpretation.
- Unseen data
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
- polyFeat = PolynomialFeatures(degree=3)
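The two snippets above fit together naturally: train_test_split holds out unseen data, and PolynomialFeatures adds the non-linear effects mentioned earlier. A minimal sketch, assuming scikit-learn and synthetic data invented for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Toy data with a cubic trend (invented for illustration)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 3 - 2 * X.ravel() + rng.normal(scale=1.0, size=200)

# Hold out 40% of the data as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

# Expand the feature space to capture non-linear effects
polyFeat = PolynomialFeatures(degree=3)
X_train_poly = polyFeat.fit_transform(X_train)
X_test_poly = polyFeat.transform(X_test)

model = LinearRegression().fit(X_train_poly, y_train)
print(f"Test R^2: {model.score(X_test_poly, y_test):.3f}")
```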
WEEK 3 QUIZ
1. In K-fold cross-validation, how will increasing k affect the variance (across subsamples) of estimated model parameters?
- Increasing k will usually increase the variance of estimated model parameters.
- Each of the k subsamples in K-fold cross-validation is used as a test set
- K-fold cross-validation will still lead to underfitting, for any k
- A high variance of parameter estimates across cross-validation subsamples indicates likely overfitting.
5. Reviewing the graph below, how is the model characterized on the left side of the curve, before it hits the plateau?
- Underfitting
- Overfitting
- cross_val_predict (see the sketch after this quiz)
- Cross-validation is an essential step in hyperparameter tuning.
- We can manually generate folds by using the KFold function.
- Answer: All of the above
- GridSearchCV scans over a dictionary of parameters.
- GridSearchCV finds the hyperparameter set that has the best out-of-sample score.
- GridSearchCV retrains on all data with the "best" hyperparameters.
- Answer: All of the above
- KFold and StratifiedKFold
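To tie the Week 3 answers together (manual folds with KFold, out-of-sample predictions with cross_val_predict, and hyperparameter scanning with GridSearchCV), here is a minimal sketch using scikit-learn; the data and parameter grid are invented for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict

# Synthetic regression data (invented for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Manually generated folds with KFold
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-sample predictions for every observation via cross_val_predict
y_oos = cross_val_predict(Ridge(alpha=1.0), X, y, cv=kf)
print(f"Cross-validated MSE: {mean_squared_error(y, y_oos):.2f}")

# GridSearchCV scans over a dictionary of parameters, keeps the set with
# the best out-of-sample score, then (by default) retrains on all the data
params = {"alpha": [0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(Ridge(), param_grid=params, cv=kf)
grid.fit(X, y)
print("Best alpha:", grid.best_params_["alpha"])
```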
WEEK 4 QUIZ
1. Which of the following statements about model complexity is TRUE?
- Higher model complexity leads to a higher chance of overfitting.
- Underfitting is characterized by higher errors in both training and test samples.
- Regularization decreases the likelihood of overfitting relative to training data.
- The larger a feature’s scale, the more likely its estimated impact will be influenced by regularization.
- Ridge
- Elastic Net combines L1 and L2 regularization.
- Add a term to the loss function proportional to a regularization parameter.
- Less likely to set feature coefficients to zero.
- It forces the coefficients to be smaller, but not exactly 0
- It minimizes the impact of irrelevant features
- It penalizes the magnitude of the regression coefficients by adding a squared term
- Answer: All of the above
- Only LassoCV uses the L1 regularization function.
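To illustrate the Week 4 answers about Ridge, LASSO, Elastic Net, and LassoCV, here is a minimal sketch using scikit-learn; the synthetic data and penalty strengths are invented for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LassoCV, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data where only a few features matter (invented for illustration)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Scale features first: regularization penalizes coefficient size, so
# features on larger scales would otherwise be penalized unevenly
X = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks coefficients, rarely to 0
lasso = Lasso(alpha=1.0).fit(X, y)   # L1: can set coefficients exactly to 0
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # combines L1 and L2

print("Zero coefficients (Ridge):", int(np.sum(ridge.coef_ == 0)))
print("Zero coefficients (LASSO):", int(np.sum(lasso.coef_ == 0)))
print("Zero coefficients (Elastic Net):", int(np.sum(enet.coef_ == 0)))

# LassoCV picks the L1 penalty strength by cross-validation
lcv = LassoCV(cv=5).fit(X, y)
print("LassoCV chose alpha =", lcv.alpha_)
```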
WEEK 5 QUIZ
1. When working with regularization, what is the view that illuminates the actual optimization problem and shows why LASSO generally zeros out coefficients?
- Answer: Geometric view
- Answer: Probabilistic view
- Answer: Analytical view
- Answer: Features should rarely or never be scaled prior to implementing regularization.
- Answer: The cost function minimum
- Answer: Regularization imposes certain priors on the regression coefficients.
- Answer: Analytic View
- Answer: A higher lambda decreases variance, which means smaller coefficients (illustrated in the sketch after this quiz).
- Answer: All of the above:
  - We can derive the posterior probability by knowing the probability of the target and the prior distribution.
  - The prior distribution is derived from independent draws of a prior coefficient density function that we choose when regularizing.
  - L2 (ridge) regularization imposes a Gaussian prior on the coefficients, while L1 (lasso) regularization imposes a Laplacian prior.
- Answer: We reduce the complexity of the model by minimizing the error on our training set.
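To illustrate the lambda answer above, here is a minimal sketch showing coefficients shrinking as the penalty grows; it assumes scikit-learn's Ridge, where the regularization strength is called alpha rather than lambda, and uses synthetic data invented for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (invented for illustration)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

# As lambda (alpha here) grows, coefficients shrink toward zero,
# trading a little bias for lower variance
for alpha in [0.01, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: mean |coef| = {np.mean(np.abs(coefs)):.2f}")
```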