
Data Analysis with Python | Coursera Quiz Answers

Coursera: IBM Data Analyst Professional Certificate.
Coursera: Data Analysis with Python

Graded Quiz: Importing Data Sets

1. What Python library is primarily used for machine learning?
  • NumPy
  • matplotlib
  • pandas
  • scikit-learn
2. We have the list headers_list:
headers_list=['A','B','C']
We also have the data frame df that contains three columns. What syntax should you use to replace the headers of the data frame df with values in the list headers_list?
  • df.tail() = headers_list
  • df.tail(headers_list)
  • df.columns = headers_list
  • df.head(headers_list)
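A minimal sketch of header replacement, assuming pandas is installed. The data frame contents here are made up for illustration:

```python
import pandas as pd

# Hypothetical three-column data frame; the values are made up.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
headers_list = ['A', 'B', 'C']

# Assigning a list to df.columns replaces every header at once.
# The list length must match the number of columns, or pandas raises a ValueError.
df.columns = headers_list
print(list(df.columns))  # ['A', 'B', 'C']
```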
3. What task does the following command perform?
df = pandas.read_csv("A.csv")
  • Changes the names of the columns in ‘df’ to the ones in "A.csv"
  • Loads the data from a CSV file called "A.csv" into a data frame ‘df’
  • Displays the contents of the CSV file
  • Saves the data frame df to a CSV file called "A.csv"
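To see what read_csv produces without a real "A.csv" on disk, the sketch below feeds it an in-memory buffer (an assumption for self-containment; the column names and values are hypothetical):

```python
import io
import pandas as pd

# Stand-in for the contents of "A.csv"; an in-memory buffer keeps the sketch self-contained.
csv_text = "make,price\naudi,13950\nbmw,16430\n"

# read_csv parses the comma-separated text into a DataFrame;
# by default the first line becomes the column headers.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
```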
4. Consider the segment of the following data frame:
(Image of the data frame segment not available.)
What is the type of attribute “make”?
  • string
  • object
  • int64
  • float64
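Since the image is missing, here is a hedged sketch with a hypothetical "make" column showing how to check column types yourself. In pandas 2.x, text columns are reported as object; newer pandas releases may infer a dedicated string dtype instead:

```python
import io
import pandas as pd

# Hypothetical two-column CSV standing in for the data frame in the image.
csv_text = "make,price\naudi,13950\nbmw,16430\n"
df = pd.read_csv(io.StringIO(csv_text))

# dtypes reports one data type per column: text columns come back as object
# in pandas 2.x, while whole numbers come back as int64.
print(df.dtypes)
```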
5. How do you generate descriptive statistics for all the columns for the data frame df?
  • df.describe()
  • df.info
  • df.describe(include = "all")
  • df.statistics(include = "all")
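A small sketch contrasting the two describe() calls on a made-up frame with one text column and one numeric column:

```python
import pandas as pd

# Hypothetical data: one text column, one numeric column.
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "price": [13950, 16430, 15250],
})

# By default describe() summarizes only numeric columns.
print(df.describe().columns.tolist())  # ['price']

# include="all" also summarizes non-numeric columns (count, unique, top, freq).
print(df.describe(include="all"))
```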

Graded Quiz: Data Wrangling

1. Which of the following methods should you use to replace a missing value of an attribute with continuous values?
  • Use the average of the other values in the column
  • Use an educated guess
  • Use the mean square error of the other data in the column
  • Use the difference between the minimum and maximum values of the other data in the column
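Mean imputation for a continuous column can be sketched as follows; the column name and values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical continuous column with one missing value.
df = pd.DataFrame({"normalized-losses": [100.0, np.nan, 150.0, 110.0]})

# Replace NaN with the mean of the non-missing values (mean() skips NaN by default).
mean_value = df["normalized-losses"].mean()
df["normalized-losses"] = df["normalized-losses"].fillna(mean_value)
print(df["normalized-losses"].tolist())  # [100.0, 120.0, 150.0, 110.0]
```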
2. Which of the following helps you decide on bin values when pre-processing data?
  • Convert objects to ints
  • Use the interquartile range
  • Divide the average by the standard deviation
  • Visualize the distribution using a histogram
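A sketch of the histogram-then-bin workflow, assuming made-up horsepower values. Instead of drawing a plot, np.histogram returns the same per-bin counts a histogram would show, and pd.cut then assigns labels using equal-width edges:

```python
import numpy as np
import pandas as pd

# Hypothetical horsepower values.
horsepower = pd.Series([48, 70, 95, 110, 150, 200, 262])

# Counts per bin, as a 3-bin histogram would display them.
counts, edges = np.histogram(horsepower, bins=3)

# Three equal-width bins need four edges spanning min to max.
bins = np.linspace(horsepower.min(), horsepower.max(), 4)
binned = pd.cut(horsepower, bins, labels=["Low", "Medium", "High"], include_lowest=True)
print(counts)
print(binned.value_counts().sort_index())
```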
3. Which data type should numbers with decimals have if you want to use them as input for training a statistical model?
666, 1.1, 232, 23.12
  • int
  • float
  • data frame
  • object
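When numbers like these arrive as text (for example, because the column held "?" placeholders), a cast to float is needed. A minimal sketch using the values from the question:

```python
import pandas as pd

# The values from the question, read in as strings.
s = pd.Series(["666", "1.1", "232", "23.12"])

# astype(float) converts the text to floating-point numbers.
s_float = s.astype(float)
print(s_float.dtype)  # float64
```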
4. Which of the following is the primary purpose of simple feature scaling?
  • It brings data into a common standard of expression
  • To get rid of “not a number” or NaN values
  • To make comparing and analyzing values easier.
  • So all the variables have a similar influence on the models you build
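Simple feature scaling divides each value by its column maximum. A sketch with hypothetical length and width columns on very different scales:

```python
import pandas as pd

# Hypothetical columns on different scales.
df = pd.DataFrame({"length": [150.0, 200.0, 300.0], "width": [60.0, 65.0, 75.0]})

# Simple feature scaling: x_new = x / x_max, applied column by column,
# so every (positive) column ends up in the range (0, 1].
df_scaled = df / df.max()
print(df_scaled)
```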
5. Which of the following is the primary purpose of the get_dummies() method?
  • To help you group your data into bins
  • Converts numerical values into categorical ones
  • Converts categorical values into numerical ones
  • Converts the data’s data type
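A sketch of get_dummies on a hypothetical categorical column; each category becomes its own 0/1 indicator column:

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"fuel-type": ["gas", "diesel", "gas"]})

# One indicator column per category (one-hot encoding); dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(df["fuel-type"], dtype=int)
print(dummies)
```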

Graded Quiz: Exploratory Data Analysis

1. What method provides summary statistics of a data frame?
  • describe()
  • head()
  • tail()
  • summary()
2. As the Pearson Correlation value nears zero, then ...
  • It indicates that two variables are not correlated
  • It indicates minimal deviation in a variable's values from the mean
  • It indicates the mean of the data is near zero
  • It indicates uncertainty about the correlation between two variables
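Pearson's r measures linear correlation only. The sketch below uses made-up data where y is fully determined by x, yet r is 0 because the relationship is symmetric rather than linear:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = (x - 3) ** 2  # y depends on x, but not linearly: [4, 1, 0, 1, 4]

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(x, y).
r = np.corrcoef(x, y)[0, 1]
print(r)
```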
3. What range of the p-value ‘p’ from a Pearson correlation test is considered too high to support any certainty about the correlation of variables?
  • p < 0.001
  • 0.001 < p < 0.05
  • 0.05 < p < 0.1
  • p > 0.1
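A sketch of getting both the coefficient and its p-value, assuming SciPy is installed and using hypothetical engine-size and price values:

```python
from scipy import stats

# Hypothetical engine-size and price values.
engine_size = [97, 109, 130, 136, 164, 209, 258]
price = [7295, 8845, 13950, 15250, 16500, 30760, 35550]

# pearsonr returns the correlation coefficient r and its p-value.
r, p_value = stats.pearsonr(engine_size, price)

# Certainty bands from the question: p < 0.001 strong, 0.001-0.05 moderate,
# 0.05-0.1 weak, p > 0.1 no certainty.
print(f"r = {r:.3f}, p = {p_value:.2g}")
```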
4. Consider the following data frame:
df_test = df[['body-style', 'price']]
The following operation is applied:
df_grp = df_test.groupby(['body-style'], as_index=False).mean()
What are the resulting values of df_grp['price']?
  • It averages the body-style variable data values.
  • The average price
  • It writes the mean value of each body style price to the data frame.
  • It averages the price for each body style
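A runnable version of the grouping step, assuming a small made-up data frame in place of the course's automobile data:

```python
import pandas as pd

# Hypothetical data standing in for the course's data frame.
df = pd.DataFrame({
    "body-style": ["sedan", "hatchback", "sedan", "hatchback", "wagon"],
    "price": [16000, 9000, 18000, 11000, 14000],
})

df_test = df[["body-style", "price"]]
# Group rows by body style and average the price within each group;
# as_index=False keeps "body-style" as a regular column.
df_grp = df_test.groupby(["body-style"], as_index=False).mean()
print(df_grp)
```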
5. What is the Pearson Correlation between two variables if the input variable is equal to the output variable?
  • Between -1 and 0
  • 1
  • Between 0 and 1
  • -1
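A quick check with made-up values: correlating a variable with an identical copy of itself gives r = 1, and with its negation gives r = -1:

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Output identical to input: perfect positive linear relationship.
r_same = np.corrcoef(x, x)[0, 1]
# Output is the negated input: perfect negative linear relationship.
r_neg = np.corrcoef(x, -x)[0, 1]
print(r_same, r_neg)
```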

Graded Quiz: Model Development

1. What does the following line of code do?
lm = LinearRegression()
  • Predicts output values of a linear regression object.
  • Creates a linear regression object and stores it in the lm variable.
  • Assigns a linear regression model to the lm variable.
  • Fits a regression object to the variable lm.
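A sketch, assuming scikit-learn is installed, showing that the constructor only creates an unfitted object; learned attributes such as coef_ appear after fit(). The training data is made up:

```python
from sklearn.linear_model import LinearRegression

# Instantiating the class only creates an (unfitted) model object.
lm = LinearRegression()
fitted_before = hasattr(lm, "coef_")
print(fitted_before)  # False

# Learned attributes such as coef_ and intercept_ exist only after fit().
lm.fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
print(hasattr(lm, "coef_"))  # True
```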
2. What steps do the following lines of code perform?
  Input = [('scale',StandardScaler()),('model',LinearRegression())]
  pipe = Pipeline(Input)
  pipe.fit(Z,y)
  ypipe = pipe.predict(Z)
  • Performs a polynomial transform on the features Z
  • Finds the correlation between Z and y
  • Calculates the Coefficient of Determination
  • Performs a prediction using a linear regression model
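A self-contained version of the pipeline code from the question, with added imports and hypothetical Z and y (y is an exact linear function of Z, so the predictions should match it closely):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features Z (two columns) and target y with an exact linear relation.
Z = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 35.0], [4.0, 40.0], [5.0, 55.0]])
y = 3.0 * Z[:, 0] + 0.5 * Z[:, 1] + 2.0

# Step "scale" standardizes each column; step "model" fits a linear regression
# on the scaled features. predict() runs both steps in order.
Input = [("scale", StandardScaler()), ("model", LinearRegression())]
pipe = Pipeline(Input)
pipe.fit(Z, y)
ypipe = pipe.predict(Z)
print(np.round(ypipe, 6))
```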
3. If X is a data frame with 100 rows and 5 columns, y is the target with 100 samples, and you have imported all the relevant libraries and data and executed the following lines of code:
  LR = LinearRegression()
  LR.fit(X, y)
  yhat = LR.predict(X)
How many samples does yhat contain?
  • 500
  • 20
  • 100
  • 5
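To check the shape directly, this sketch substitutes random numbers for X and y (an assumption; only the shapes matter here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 rows, 5 feature columns (random stand-in data)
y = rng.normal(size=100)       # 100 target samples

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)

# One prediction per input row, regardless of how many columns X has.
print(yhat.shape)  # (100,)
```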
4. Which statement about R2, the coefficient of determination, is true?
  • Its value can be either 0 or 1.
  • Its value can be in the range of -1 to 1, inclusive.
  • Its value can be any positive number.
  • Its value can be between 0 and 1 inclusive.
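A sketch of the two reference points of R2 using scikit-learn's r2_score on made-up values. (On held-out data, r2_score can also go below 0 for a model that does worse than predicting the mean.)

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Perfect predictions give R^2 = 1.
perfect = r2_score(y_true, y_true)
print(perfect)  # 1.0

# Always predicting the mean of y_true gives R^2 = 0.
baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))
print(baseline)  # 0.0
```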
5. Consider the following equation:
$y = b_0 + b_1 x$
The variable y is _________?
  • The predictor or independent variable
  • The target or dependent variable
  • The intercept
  • The degree of the polynomial
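A sketch that fits $y = b_0 + b_1 x$ to hypothetical data generated exactly from y = 1 + 2x, so the fitted intercept b0 and slope b1 can be checked against known values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated exactly from y = 1 + 2x, so b0 = 1 and b1 = 2.
x = np.array([[0.0], [1.0], [2.0], [3.0]])  # predictor (independent variable)
y = 1.0 + 2.0 * x.ravel()                   # target (dependent variable)

lm = LinearRegression().fit(x, y)
b0, b1 = lm.intercept_, lm.coef_[0]
print(round(b0, 6), round(b1, 6))  # 1.0 2.0
```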

Graded Quiz: Model Evaluation and Refinement

1. What is the result of the following code?
cross_val_predict(lr2e, x_data, y_data, cv=3)
  • The average R2 on the test data for each of the two folds
  • Calculates the free parameter alpha
  • Performs multiple out-of-sample evaluations
  • The predicted values of the test data using cross-validation
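A sketch of what cross_val_predict returns. The quiz does not define lr2e; here it is assumed to be a regression estimator, with LinearRegression standing in, and x_data/y_data are random stand-ins:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
x_data = rng.normal(size=(30, 2))
y_data = x_data @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=30)

# Assumption: lr2e is a regression estimator; LinearRegression stands in here.
lr2e = LinearRegression()

# Each sample's prediction comes from a model trained on the other folds,
# so the output has one out-of-fold prediction per row of x_data.
yhat = cross_val_predict(lr2e, x_data, y_data, cv=3)
print(yhat.shape)  # (30,)
```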
2. How would you organize the values 1, 10, and 100 as possible values of alpha for Grid Search?
  • parameter = [{'alpha': [1,10,100]}]
  • parameter = alpha(1,10,100)
  • parameter=[1,10,100]
  • parameter = Ridge(alpha=[1,10,100])
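Grid Search takes a dict (or list of dicts) mapping parameter names to candidate values. scikit-learn's ParameterGrid can expand such a grid to show the individual settings that would be tried:

```python
from sklearn.model_selection import ParameterGrid

# A list of dicts mapping parameter names to candidate values.
parameters = [{"alpha": [1, 10, 100]}]

# ParameterGrid expands the grid into the individual settings.
settings = list(ParameterGrid(parameters))
print(settings)  # [{'alpha': 1}, {'alpha': 10}, {'alpha': 100}]
```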
3. You do the following steps with a data set:
  1. Divide a data set into testing and training sets.
  2. Create a linear model with the training set.
  3. Find the average R2 value on your training data. It is found to be 0.5.
  4. Perform a 100th-order polynomial transform on your data.
  5. Use these transformed values to train another model.
  6. Find the new value for R2. It is found to be 0.99.
Which of the following statements is correct?
  • You should use the simpler model
  • Create another linear model with all of the data and compare results
  • A 100th-order polynomial will work better on the rest of your data
  • You should use your test data to test the model further
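A sketch of the scenario on made-up noisy sine data, using a degree-12 polynomial instead of degree 100 (an assumption to keep the fit numerically stable). A higher-order model always fits the training set at least as well, so only the test score tells you whether it generalizes. The exact test scores depend on the random data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, size=40)).reshape(-1, 1)
y = np.sin(2 * np.pi * x.ravel()) + rng.normal(scale=0.2, size=40)

# Alternate samples between training and test sets (a simple deterministic split).
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

train_r2, test_r2 = {}, {}
for degree in (1, 12):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x_train, y_train)
    # score() returns R^2 on the data passed in.
    train_r2[degree] = model.score(x_train, y_train)
    test_r2[degree] = model.score(x_test, y_test)
    print(degree, round(train_r2[degree], 3), round(test_r2[degree], 3))
```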
4. What is the purpose of “folding” your data sets?
  • Folds are used for cross-validation
  • To find R2 values on a training set and a test set of data
  • Folding is used primarily for polynomial transformations
  • To find the actual predicted values of the model before calculating R2
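A sketch of folding with scikit-learn's KFold on six hypothetical samples: each fold serves once as the validation set while the others are used for training:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # six hypothetical samples

# Split the rows into 3 folds; each fold is held out once for validation.
kf = KFold(n_splits=3)
test_indices = []
for train_idx, test_idx in kf.split(X):
    print("train:", train_idx, "test:", test_idx)
    test_indices.extend(test_idx.tolist())

# Every sample lands in exactly one validation fold.
print(sorted(test_indices))  # [0, 1, 2, 3, 4, 5]
```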
5. In the following image, the blue curve represents a model, the blue dots represent the data, and the orange curve represents the true function. Which of the following is true about the model?
(Image of the model, data, and true function not available.)
  • It displays overfitting
  • No conclusions can be drawn about the model
  • The model is a good fit
  • It displays underfitting

Final Exam

1. When looking at a CSV file, what character separates each value?
  • A tab
  • An apostrophe
  • An equal sign
  • A comma
2. What Python library is used for statistical modeling, including regression and classification?
  • Scikit-learn
  • Matplotlib
  • Jupyter
  • NumPy
3. In order to read data using the Python Pandas package, what are the two most important factors?
  • File types and encoding scheme
  • Format and file path
  • File types and format
  • Encoding scheme and file path
4. For a Pandas data frame, what does the attribute “dtypes” return?
  • It returns the data type of the object
  • It returns the data types of each column
  • It returns the first five rows of the data frame
  • It returns the last five rows of the data frame
5. In a data set, what term refers to the column name?
  • Title
  • Row
  • Type
  • Header
6. The Matplotlib library is mostly used for what?
  • Statistical modeling
  • Machine learning algorithms
  • Data analysis
  • Data visualization
7. What is the output of the following code segment of the data frame df?
df.tail(10)
  • It returns the first 10 rows of the data frame
  • It returns the header of the data frame
  • It returns the last 10 rows of the data frame
  • It returns all of the rows of the data frame
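A quick comparison of head() and tail() on a hypothetical 20-row frame:

```python
import pandas as pd

df = pd.DataFrame({"price": range(1, 21)})  # 20 hypothetical rows

first = df.head(10)["price"].tolist()
last = df.tail(10)["price"].tolist()
print(first)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(last)   # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```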
8. What is the dropna() method used for?
  • Dropping missing values 
  • Replacing missing values 
  • Dropping specified values 
  • Identifying missing values
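A sketch contrasting identifying missing values with dropping them, on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [13950.0, np.nan, 16430.0], "make": ["audi", "bmw", "bmw"]})

# isnull() only identifies missing values.
print(df["price"].isnull().sum())  # 1

# dropna() removes them; subset limits the check to specific columns,
# and axis=0 drops whole rows.
df_clean = df.dropna(subset=["price"], axis=0)
print(len(df_clean))  # 2
```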
9. Which type of plot is binning best suited to graph?
  • Scatter plot
  • Histogram
  • Line plot
  • Box plot
10. What is the primary purpose of standardizing a set of values?
  • So you can see the spread of the data set and identify outliers.
  • It places different variables on the same scale, allowing you to compare them more easily.
  • To see how many standard deviations each value is from the mean.
  • To find how well a data set fits a model.
11. What is it called when you subtract the mean from the values in a data set and divide by the standard deviation?
  • Data standardization
  • Min-max method
  • Binning
  • One-hot encoding
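A sketch of standardization (z-scores) on made-up values; afterwards the column has mean 0 and standard deviation 1. Note that pandas' std() uses the sample standard deviation (ddof=1) by default:

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Standardization (z-score): subtract the mean, then divide by the standard deviation.
z = (s - s.mean()) / s.std()
print(z.round(3).tolist())
```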
12. What function can you use to replace values in a column of a data frame?
  • exchange()
  • header()
  • rename()
  • replace()
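A sketch of replace() on a hypothetical column where "?" marks missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical column where "?" marks missing entries.
df = pd.DataFrame({"normalized-losses": ["?", "164", "?", "158"]})

# replace() swaps every occurrence of the first value for the second.
df = df.replace("?", np.nan)
print(df["normalized-losses"].isnull().sum())  # 2
```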
13. What does a negative linear relationship between an input variable and an output variable imply?
  • That as the input increases, the output decreases at an ever-increasing rate.
  • That as the input increases, the output decreases at about the same rate.
  • The output does not adequately explain the input.
  • That as the input increases, the output increases at about the same rate.
14. What is the interquartile range of a data set?
  • The data between the upper and lower quartiles represents the interquartile range.
  • The middle of the data
  • The difference in the range of values in the uppermost quartile with the range of values in the lower-most quartile
  • The range of the data, split into four equal-sized groups
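The interquartile range is the upper quartile minus the lower quartile. A sketch on the values 1 to 9, using pandas' default linear interpolation for quantiles:

```python
import pandas as pd

s = pd.Series(range(1, 10))  # 1, 2, ..., 9

q1 = s.quantile(0.25)  # lower quartile (25th percentile)
q3 = s.quantile(0.75)  # upper quartile (75th percentile)
iqr = q3 - q1
print(q1, q3, iqr)  # 3.0 7.0 4.0
```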
15. If the predicted function is:
$y = b_0 + b_1 x$
The method is:
  • Polynomial Regression
  • Exponential Regression
  • Linear regression
  • Multiple Linear Regression
16. What is a model estimator?
  • A mathematical equation that can be used to predict values not in the data set
  • The estimate of the output value of a data set given an input value
  • The mean, mode, median, and standard deviation of a data set
  • The descriptive statistics of a data set
17. How are residuals calculated?
  • x−x
  • y−y
  • $b_0 + b_1 x$
  • y−y
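A residual is the observed value minus the model's predicted value. A sketch on four made-up points (with an intercept in the model, least-squares residuals sum to roughly zero):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.5, 5.5, 8.0])

lm = LinearRegression().fit(x, y)
y_hat = lm.predict(x)

# Residual = observed value minus predicted value.
residuals = y - y_hat
print(np.round(residuals, 3))
```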
18. Which statement is true about overfitting?
  • The model is too flexible and fits the noise rather than the function.
  • If the model is noisy, you need a low-order polynomial so you don’t overfit the data.
  • The higher the order of the polynomial, the less overfitting occurs.
  • If a model is overfit with the training data it will also overfit the testing data.
19. Say you have several differently ordered polynomial models. Which of the following statistics will best help you decide which model to use?
  • Coefficient of determination
  • Mean-squared error
  • Alpha
  • Correlation coefficient
20. What does the GridSearchCV() method do?
  • It iterates over hyperparameters using cross-validation.
  • It gives you R2 values for different orders of polynomial models.
  • It’s another way to cross-validate your data set.
  • It selects the appropriate hyperparameters for your model.
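A sketch of GridSearchCV with Ridge on random stand-in data. For each alpha it runs cross-validation and records the mean R2 score; with refit=True (the default) it then retrains the best setting on all the data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
x_data = rng.normal(size=(60, 3))
y_data = x_data @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=60)

parameters = [{"alpha": [1, 10, 100]}]

# 4-fold cross-validation for every alpha; Ridge's default score is R^2.
grid = GridSearchCV(Ridge(), parameters, cv=4)
grid.fit(x_data, y_data)

print(grid.best_params_)
print(grid.cv_results_["mean_test_score"])
```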
