Data Analysis with R | Coursera IBM
The R programming language is specifically designed for data analysis. It serves as a crucial tool for bridging the gap between the data-related problems you aim to solve and the answers needed to achieve your goals. This course begins with a question and guides you through the process of answering it using data. Initially, you will learn essential techniques for preparing your data for analysis. Following this, you will explore your data through exploratory data analysis, which helps you summarize your data and identify significant relationships between variables that can provide insights. Once your data is ready, you will develop a model and learn how to evaluate and refine its performance. This structured approach ensures your data analysis meets your standards and gives you confidence in the results.
In this course, you will gain practical experience by acting as a data analyst working with airline departure and arrival data to predict flight delays. Utilizing the Airline Reporting Carrier On-Time Performance Dataset, you will practice reading data files, preprocessing data, creating and improving models, and evaluating them to select the best one.
Notice!
Always refer to the module on your course for the most accurate and up-to-date information.
Attention!
If you have any questions that are not covered in this post, please feel free to leave them in the comments section below. Thank you for your engagement.
Week 01 Quiz Answers
Graded Quiz Answers
1. What is the purpose of the Data Asset eXchange?
- Provides data that you can explore to conduct data analysis.
- Provides data that you can use for a small fee.
- Helps you exchange data with others.
- Provides data that is only useful for learning purposes.
- CarrierDelay
- Distance
- SecurityDelay
- ArrDelay
- Assigns a value to a variable.
- Assigns a value to a global variable.
- Combines two functions into a single operation.
- Combines multiple functions into a single operation.
- read_delim()
- read_tsv()
- read_csv()
- read_any()
- Both return a statistical summary of the data.
- Both group data by the specified variables.
- Both compute summary statistics.
- There is no similarity between the summarize() and group_by() functions.
Week 02 Quiz Answers
Graded Quiz Answers
1. You want to access the “Date” column of a data frame called sales_data so you can perform an operation on it. What is the correct way to refer to this column?
- sales_data%Date
- sales_data$Date
- sales_data.Date
- sales_data#Date
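In base R, the `$` operator extracts a data frame column by name. The sketch below illustrates this on a small made-up `sales_data` frame; the `Amount` and `DaysSinceFirst` columns and all values are assumptions for illustration, not course data.

```r
# Hypothetical sales data; column names other than Date are illustrative only.
sales_data <- data.frame(
  Date = as.Date(c("2024-01-05", "2024-01-06", "2024-01-07")),
  Amount = c(120.5, 98.0, 143.2)
)

# `$` extracts a single column as a vector.
dates <- sales_data$Date

# Perform an operation on the column: days elapsed since the earliest date.
sales_data$DaysSinceFirst <- as.numeric(sales_data$Date - min(sales_data$Date))

print(sales_data)
```

The same `$` syntax works on the left-hand side of an assignment, which is how the new column is added here.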
- drop_na()
- replace_na()
- is.na()
- drop_columns()
- dataframe %>% mutate_if(Status, sep = "-", into = c("error_type", "severity_level"))
- dataframe %>% separate(Status, sep = "-", into = c("error_type", "severity_level"))
- dataframe %>% mutate_all(Status, sep = "-", into = c("error_type", "severity_level"))
- dataframe %>% sapply(Status, sep = "-", into = c("error_type", "severity_level"))
4. What are two benefits of data normalization?
- Helps you better understand data distribution.
- Brings data into a common standard of expression that allows you to make meaningful comparisons.
- Minimizes the effects of outliers, which can otherwise influence the result more.
- Enables a fair comparison between the different features and makes sure they have the same impact.
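To make the idea of a common scale concrete, here is a minimal base R sketch of two common normalization methods, min-max scaling and z-score standardization. The `flights` data frame and its values are made up for illustration, and the helper names `min_max` and `z_score` are hypothetical.

```r
# Hypothetical feature values on very different scales.
flights <- data.frame(
  Distance = c(200, 500, 1200, 2500),
  ArrDelay = c(-5, 10, 45, 120)
)

# Min-max scaling maps each column onto the range [0, 1].
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Z-score standardization centers each column on 0 with unit standard deviation.
z_score <- function(x) (x - mean(x)) / sd(x)

flights_minmax <- as.data.frame(lapply(flights, min_max))
flights_z <- as.data.frame(lapply(flights, z_score))

print(flights_minmax)
print(round(flights_z, 3))
```

After either transformation, `Distance` and `ArrDelay` are expressed on comparable scales, so neither dominates simply because its raw numbers are larger.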
- Scatter plot
- Histogram
- Line chart
- Bar chart
- Reformat the categorical variable so that its contents are in two or more columns.
- Convert categorical variables to dummy variables.
- Convert categorical variables to dummy variables and assign the value of another variable to each category.
- Size down three variables into one.
Week 03 Quiz Answers
Graded Quiz Answers
1. Which of the following forms of exploratory data analysis generates short summaries about the sample and measures of the data?
- Correlation
- Pearson correlation
- Analysis of variance (ANOVA)
- Descriptive statistics
- Scatter plots
- Histograms
- Heatmaps
- Boxplots
- A large F-test score implies a strong correlation between variable categories and the target variable.
- A large F-test score implies a poor correlation between variable categories and the target variable.
- A small F-test score implies a strong correlation between variable categories and the target variable.
- A small F-test score implies a poor correlation between variable categories and the target variable.
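As a rough illustration of how an F-test score is obtained in practice, the sketch below runs a one-way ANOVA with base R's `aov()` on simulated data. The carrier labels, group means, and the `delays` data frame are all assumptions made up for this example; the groups are deliberately simulated with well-separated means so the F value comes out large.

```r
set.seed(42)

# Simulated data: three hypothetical carriers with clearly different mean delays.
delays <- data.frame(
  carrier = factor(rep(c("A", "B", "C"), each = 30)),
  arr_delay = c(rnorm(30, mean = 5, sd = 3),
                rnorm(30, mean = 15, sd = 3),
                rnorm(30, mean = 30, sd = 3))
)

# One-way ANOVA: does mean arrival delay differ across carrier categories?
fit <- aov(arr_delay ~ carrier, data = delays)
anova_table <- summary(fit)[[1]]

# The first row of the table holds the statistics for the carrier factor.
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

print(anova_table)
```

If the group means were nearly identical instead, the same code would report a small F value and a large P value.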
- Nothing. The scatter plot alone can show the correlation completely.
- Add a correlation line.
- You should not use a scatter plot for visualizing the correlation between two variables.
- Add a regression line.
- The P value is greater than 0.1.
- The P value is less than 0.05.
- The P value is less than 0.1.
- The P value is less than 0.001.
Week 04 Quiz Answers
Graded Quiz Answers
1. In model development, you can develop more accurate models when you have which of the following?
- Relevant data.
- Larger quantities of data.
- Fewer independent variables.
- More dependent variables.
- linear_model <- predict(Y ~ Z, data = new_dataset)
- linear_model <- lm(X ~ Y, data = new_dataset)
- linear_model <- lm(Y ~ X, data = new_dataset)
- linear_model <- predict(X ~ Y, data = new_dataset)
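For reference, R's formula syntax puts the response on the left of `~` and the predictor on the right, and `lm()` is the base R function that fits a linear model; `predict()` is applied afterwards to a fitted model. The sketch below shows this on simulated data. The contents of `new_dataset` are made up for illustration.

```r
set.seed(1)

# Hypothetical data: Y depends linearly on X, plus random noise.
new_dataset <- data.frame(X = 1:50)
new_dataset$Y <- 3 + 2 * new_dataset$X + rnorm(50, sd = 1)

# Fit Y as a function of X: response on the left of ~, predictor on the right.
linear_model <- lm(Y ~ X, data = new_dataset)
print(coef(linear_model))

# predict() takes the fitted model and new predictor values.
new_points <- data.frame(X = c(51, 52))
predictions <- predict(linear_model, newdata = new_points)
print(predictions)
```

The estimated slope should land close to the value of 2 used to simulate the data.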
- 95%
- 100%
- 85%
- 90%
- Q-Q plot
- Residual plot
- Scale-location plot
- Regression plots
- Quadratic, meaning that the predictor variable in the model is squared.
- Cubic, meaning that the predictor variable in the model is cubed.
- Squared, meaning that the predictor variable in the model is squared.
- Simple linear regression.
- The X variable causes the Y variable to positively change 89% of the time.
- 89% of the response variable variation is explained by a linear model.
- There is a strong negative correlation between the variables.
- 89% of the response variable variation is explained by a polynomial model.
- When using a simple linear regression (SLR) model.
- When using a polynomial regression model.
- When using a multiple linear regression (MLR) model.
- This depends on your data. The model that fits the data better has the smaller MSE.
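To see how MSE and R-squared can be compared across model types, here is a minimal sketch that fits a simple linear model and a quadratic model to the same simulated data. The data are deliberately generated with a curved relationship, and the helper names `mse` and `r2` are hypothetical. Both metrics are computed in-sample here, which is an assumption of this sketch and not a substitute for evaluating on held-out data.

```r
set.seed(7)

# Hypothetical data with a curved (quadratic) relationship.
x <- seq(-3, 3, length.out = 60)
curve_data <- data.frame(x = x, y = 1 + 0.5 * x + 2 * x^2 + rnorm(60, sd = 1))

# Simple linear regression and a second-order polynomial regression.
slr <- lm(y ~ x, data = curve_data)
quad <- lm(y ~ poly(x, 2, raw = TRUE), data = curve_data)

# Mean squared error of the residuals, and R-squared from the model summary.
mse <- function(model) mean(residuals(model)^2)
r2 <- function(model) summary(model)$r.squared

results <- data.frame(
  model = c("SLR", "Quadratic"),
  MSE = c(mse(slr), mse(quad)),
  R2 = c(r2(slr), r2(quad))
)
print(results)
```

On this data the quadratic model has the smaller MSE because it matches the shape of the data; on truly linear data the two models would score almost the same.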
Week 05 Quiz Answers
Graded Quiz Answers
1. Which situations are helped by using the cross-validation method to train your model? Select two answers.
- Working with models with small amounts of data.
- Determining if a model can be generalized for a broader group.
- Working with models with large amounts of data.
- Working with models that are underfit.
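The mechanics of cross-validation can be sketched in a few lines of base R. This is a hand-rolled 5-fold loop on simulated data, not the course's own workflow; the `cv_data` frame, the choice of `k`, and the linear model are all assumptions for illustration.

```r
set.seed(123)

# Hypothetical small dataset: y depends linearly on x, plus noise.
n <- 100
cv_data <- data.frame(x = runif(n, 0, 10))
cv_data$y <- 4 + 1.5 * cv_data$x + rnorm(n, sd = 2)

# Randomly assign each row to one of k folds of equal size.
k <- 5
fold_id <- sample(rep(1:k, length.out = n))

# For each fold: train on the other k - 1 folds, then test on the held-out fold.
fold_mse <- sapply(1:k, function(i) {
  train <- cv_data[fold_id != i, ]
  test <- cv_data[fold_id == i, ]
  fit <- lm(y ~ x, data = train)
  mean((test$y - predict(fit, newdata = test))^2)
})

# The average held-out MSE estimates how well the model generalizes.
cv_mse <- mean(fold_mse)
print(fold_mse)
print(cv_mse)
```

Every row is used for testing exactly once, which is why this approach makes good use of small datasets.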
- Reduce model complexity.
- Use regularization.
- Increase model complexity.
- Reduce the number of features in the training data.
- Ridge regression penalizes the sum of the absolute values of the coefficients while Lasso regression penalizes the sum of squared coefficients.
- There is no major difference between Ridge and Lasso regression.
- Lasso regression penalizes the sum of the absolute values of the coefficients while Ridge regression penalizes the sum of squared coefficients.
- Lasso regression increases or decreases the value of Lambda to penalize complex models more or less.
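The two penalty terms can be written out directly. The sketch below computes the Lasso (L1) and Ridge (L2) penalties for a made-up coefficient vector; the coefficient values and `lambda` are assumptions chosen only to make the arithmetic easy to follow.

```r
# Hypothetical model coefficients (intercept excluded) and penalty strength.
coefficients <- c(3, -1.5, 0, 0.5)
lambda <- 0.1

# Lasso penalty: lambda times the sum of absolute coefficient values.
lasso_penalty <- lambda * sum(abs(coefficients))

# Ridge penalty: lambda times the sum of squared coefficients.
ridge_penalty <- lambda * sum(coefficients^2)

print(lasso_penalty)
print(ridge_penalty)
```

Because the Ridge penalty squares each coefficient, the large coefficient (3) contributes far more to it than to the Lasso penalty.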
- tune()
- grid_regular()
- tune_grid()
- add_model()