Data Analysis with Python | Coursera Quiz Answers


Graded Quiz: Importing Data Sets

1. What Python library is primarily used for machine learning?

NumPy
matplotlib
pandas
scikit-learn

2. We have the list headers_list:

headers_list=['A','B','C']

We also have the data frame df that contains three columns. What syntax should you use to replace the headers of the data frame df with values in the list headers_list?

df.tail() = headers_list
df.tail(headers_list)
df.columns = headers_list
df.head(headers_list)
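
A minimal sketch of renaming headers by assigning to the columns attribute. The data frame contents here are made up; only headers_list comes from the question.

```python
import pandas as pd

# Hypothetical three-column data frame with default integer headers (0, 1, 2).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

headers_list = ['A', 'B', 'C']

# Assigning a list to the columns attribute renames the columns in order.
# The list length must match the number of columns.
df.columns = headers_list

print(df.columns.tolist())
```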

3. What task does the following command perform?

df = pandas.read_csv("A.csv")

Changes the column names in ‘df’ to those in "A.csv"
Loads the data from a CSV file called "A.csv" into a data frame ‘df’
Displays the contents of the CSV file
Saves the data frame df to a CSV file called "A.csv"
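
As a rough illustration of what read_csv returns, the sketch below reads CSV text from an in-memory buffer instead of a real "A.csv" file, so that it runs anywhere. The column names and values are invented.

```python
import io
import pandas as pd

# Stand-in for the contents of "A.csv"; these values are made up.
csv_text = "make,price\naudi,13950\nbmw,16430\n"

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)
print(df.shape)
```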

4. Consider the segment of the following data frame:

What is the type of attribute “make”?

string
object
int64
float64

5. How do you generate descriptive statistics for all the columns for the data frame df?

df.describe()
df.info
df.describe(include = "all")
df.statistics(include = "all")
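
The difference between the two describe() calls can be seen on a small example. This is only a sketch with an invented data frame that has one text column and one numeric column.

```python
import pandas as pd

# Hypothetical data: one text column, one numeric column.
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "price": [13950.0, 16430.0, 17450.0],
})

# By default, describe() summarizes only the numeric columns.
numeric_summary = df.describe()

# include="all" adds the non-numeric columns (count, unique, top, freq).
full_summary = df.describe(include="all")

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```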

Graded Quiz: Data Wrangling

1. Which of the following methods should you use to replace a missing value of an attribute with continuous values?

Use the average of the other values in the column
Use an educated guess
Use the mean square error of the other data in the column
Use the difference between the minimum and maximum values of the other data in the column
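
Replacing a missing continuous value with the column average might look like the sketch below. The column name "losses" and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous column with one missing entry.
df = pd.DataFrame({"losses": [100.0, np.nan, 140.0, 120.0]})

# mean() skips NaN by default, so this is the average of the known values.
mean_value = df["losses"].mean()

# Fill the missing entry with that average.
df["losses"] = df["losses"].fillna(mean_value)

print(df["losses"].tolist())
```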

2. Which of the following helps you decide on bin values when pre-processing data?

Convert objects to ints
Use the interquartile range
Divide the average by the standard deviation
Visualize the distribution using a histogram
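
One way to explore bin values is to count how many observations fall into equal-width bins (the numbers a histogram would plot) and then label the bins. The sketch below uses np.histogram in place of a plot; the prices and the three labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical prices.
prices = pd.Series([5000, 12000, 18000, 25000, 45000])

# Counts per bin for three equal-width bins, as a histogram would display.
counts, edges = np.histogram(prices, bins=3)

# Four evenly spaced edges define the same three bins for pd.cut.
bins = np.linspace(prices.min(), prices.max(), 4)
labels = ["Low", "Medium", "High"]
binned = pd.cut(prices, bins=bins, labels=labels, include_lowest=True)

print(counts.tolist())
print(binned.tolist())
```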

3. Which of the following data types should numbers with decimals be if you want to use them as input for training a statistical model?

666, 1.1, 232, 23.12

int
float
data frame
object

4. Which of the following is the primary purpose of simple feature scaling?

It brings data into a common standard of expression
To get rid of “not a number” or NaN values
To make comparing and analyzing values easier.
So all the variables have a similar influence on the models you build
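
Simple feature scaling divides each value by the column maximum, so the results are relative to a largest value of 1. A minimal sketch with made-up lengths:

```python
import pandas as pd

# Hypothetical measurements on an arbitrary scale.
length = pd.Series([150.0, 175.0, 200.0])

# Simple feature scaling: divide every value by the maximum.
scaled = length / length.max()

print(scaled.tolist())
```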

5. Which of the following is the primary purpose of the get_dummies() method?

To help you group your data into bins
Converts numerical values into categorical ones
Converts categorical values into numerical ones
Converts the data’s data type
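
A short sketch of get_dummies() on an invented categorical column. Each category becomes its own indicator column; the "fuel" name and its values are hypothetical.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"]})

# One indicator column per category, in sorted order.
dummies = pd.get_dummies(df["fuel"])

print(list(dummies.columns))
print(dummies["gas"].astype(int).tolist())
```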

Graded Quiz: Exploratory Data Analysis

1. What method provides summary statistics of a data frame?

describe()
head()
tail()
summary()

2. As the Pearson Correlation value nears zero, then ...

It indicates that two variables are not correlated
It indicates minimal deviation in a variable's values from the mean
It indicates the mean of the data is near zero
It indicates uncertainty about the correlation between two variables

3. What range of the p-value ‘p’ is considered too high to support any certainty about the correlation of variables?

p < 0.001
0.001 < p < 0.05
0.05 < p < 0.1
p > 0.1

4. Consider the following data frame:

df_test = df[['body-style', 'price']]

The following operation is applied:

df_grp = df_test.groupby(['body-style'], as_index=False).mean()

What are the resulting values of df_grp['price']?

It averages the body-style variable data values.
The average price
It writes the mean value of each body style price to the data frame.
It averages the price for each body style
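
The grouping operation can be tried on a tiny data frame. This sketch assumes invented prices and passes the two column names as separate strings.

```python
import pandas as pd

# Hypothetical data with two body styles.
df = pd.DataFrame({
    "body-style": ["sedan", "sedan", "wagon", "wagon"],
    "price": [10000.0, 20000.0, 12000.0, 16000.0],
})

df_test = df[["body-style", "price"]]

# Group rows by body style and take the mean of the remaining column.
# as_index=False keeps 'body-style' as a regular column.
df_grp = df_test.groupby(["body-style"], as_index=False).mean()

print(df_grp)
```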

5. What is the Pearson Correlation between two variables if the input variable is equal to the output variable?

Between -1 and 0
1
Between 0 and 1
-1
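
A quick numerical check, using NumPy's corrcoef on made-up values, of the Pearson correlation between a variable and itself, and between a variable and its negation:

```python
import numpy as np

# Hypothetical values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# corrcoef returns the 2x2 correlation matrix; [0, 1] is the cross term.
r_same = np.corrcoef(x, x)[0, 1]       # variable against itself
r_negated = np.corrcoef(x, -x)[0, 1]   # variable against its negation

print(round(r_same, 6), round(r_negated, 6))
```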

Graded Quiz: Model Development

1. What does the following line of code do?

lm = LinearRegression()

Predicts output values of a linear regression object.
Creates a linear regression object and stores it in the lm variable.
Assigns a linear regression model to the lm variable.
Fits a regression object to the variable lm.

2. What steps do the following lines of code perform?

  Input = [('scale',StandardScaler()),('model',LinearRegression())]
  pipe = Pipeline(Input)
  pipe.fit(Z,y)
  ypipe = pipe.predict(Z)

Performs a polynomial transform on the features Z
Finds the correlation between Z and y
Calculates the Coefficient of Determination
Performs a prediction using a linear regression model
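
The pipeline code can be run end to end on synthetic data. In this sketch Z and y are generated, not taken from the course data set, and the step list is named steps instead of Input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and a noise-free linear target (assumed for illustration).
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 2))
y = 3.0 * Z[:, 0] - 2.0 * Z[:, 1] + 1.0

# Step 1 standardizes the features; step 2 fits a linear regression.
steps = [("scale", StandardScaler()), ("model", LinearRegression())]
pipe = Pipeline(steps)

pipe.fit(Z, y)            # scale, then fit the model
ypipe = pipe.predict(Z)   # scale, then predict

print(ypipe.shape)
```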

3. If X is a data frame with 100 rows and 5 columns, and y is the target with 100 samples, and assuming you have imported all the relevant libraries and data, and executed the following lines of code:

  LR = LinearRegression()
  LR.fit(X, y)
  yhat = LR.predict(X)

How many samples does yhat contain?

500
20
100
5
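
The shape of the prediction can be checked directly. This sketch builds a random 100-by-5 data frame as a stand-in for X; the column names and target are invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Random stand-in data: 100 rows, 5 feature columns.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("abcde"))
y = X.sum(axis=1)  # hypothetical target with 100 samples

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)

# predict returns one value per input row.
print(yhat.shape)
```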

4. Which statement about R², the coefficient of determination, is true?

Its value can be either 0 or 1.
Its value can be in the range of -1 to 1, inclusive.
Its value can be any positive number.
Its value can be between 0 and 1 inclusive.

5. Consider the following equation:

y=b0+b1x

The variable y is _________?

The predictor or independent variable
The target or dependent variable
The intercept
The degree of the polynomial

Graded Quiz: Model Evaluation and Refinement

1. What is the result of the following code?

cross_val_predict(lr2e, x_data, y_data, cv=3)

The average R² on the test data for each of the two folds
Calculates the free parameter alpha
Performs multiple out-of-sample evaluations
The predicted values of the test data using cross-validation
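
To see what cross_val_predict returns, the sketch below runs it on synthetic, noise-free data. The estimator is named lr here, and x_data and y_data are generated for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic data lying exactly on a line (assumed for illustration).
x_data = np.arange(12, dtype=float).reshape(-1, 1)
y_data = 2.0 * x_data.ravel() + 1.0

lr = LinearRegression()

# Each sample is predicted by a model trained on the other two folds.
yhat = cross_val_predict(lr, x_data, y_data, cv=3)

print(yhat.shape)
```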

2. How would you organize the values 1, 10, and 100 as possible values of alpha for Grid Search?

parameter = [{'alpha': [1,10,100]}]
parameter = alpha(1,10,100)
parameter=[1,10,100]
parameter = Ridge(alpha=[1,10,100])
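
A parameter grid for GridSearchCV is a dictionary (or list of dictionaries) mapping parameter names to candidate values. The sketch below searches over alpha for a Ridge model on synthetic data; the data and the name parameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic noisy linear data (assumed for illustration).
rng = np.random.default_rng(0)
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=30)

# Candidate values of alpha, keyed by the parameter name.
parameters = [{"alpha": [1, 10, 100]}]

# Each candidate is scored with 3-fold cross-validation.
grid = GridSearchCV(Ridge(), parameters, cv=3)
grid.fit(X, y)

best_alpha = grid.best_params_["alpha"]
print(best_alpha)
```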

3. You do the following steps with a data set:

Divide a data set into testing and training sets.
Create a linear model with the training set.
Find the average R² value on your training data. It is found to be 0.5.
Perform a 100th-order polynomial transform on your data.
Use these transformed values to train another model.
Find the new value for R². It is found to be 0.99.

Which of the following statements is correct?

You should use the simpler model
Create another linear model with all of the data and compare results
100-th order polynomial will work better on the rest of your data
You should use your test data to test the model further

4. What is the purpose of “folding” your data sets?

Folds are used for cross-validation
To find R² values on a training set and a test set of data
Folding is used primarily for polynomial transformations
To find the actual predicted values of the model before calculating R²

5. In the following image, the blue curve represents a model, the blue dots represent the data, and the orange curve represents the true function. Which of the following is true about the model?

It displays overfitting
No conclusions can be drawn about the model
The model is a good fit
It displays underfitting

Final Exam

1. When looking at a CSV file, what character separates each value?

A tab
An apostrophe
An equal sign
A comma

2. What Python library is used for statistical modeling, including regression and classification?

Scikit-learn
Matplotlib
Jupyter
NumPy

3. In order to read data using the Python Pandas package, what are the two most important factors?

File types and encoding scheme
Format and file path
File types and format
Encoding scheme and file path

4. For a Pandas data frame, what does the attribute “dtypes” return?

It returns the data type of the object
It returns the data types of each column
It returns the first five rows of the data frame
It returns the last five rows of the data frame
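
A small sketch of the dtypes attribute on an invented data frame with one integer column and one decimal column:

```python
import pandas as pd

# Hypothetical numeric columns.
df = pd.DataFrame({"doors": [2, 4], "price": [13950.5, 16430.0]})

# dtypes is a Series indexed by column name, one entry per column.
column_types = df.dtypes

print(column_types)
```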

5. In a data set, what term refers to the column name?

Title
Row
Type
Header

6. The Matplotlib library is mostly used for what?

Statistical modeling
Machine learning algorithms
Data analysis
Data visualization

7. What is the output of the following code segment of the data frame df?

df.tail(10)

It returns the first 10 rows of the data frame
It returns the header of the data frame
It returns the last 10 rows of the data frame
It returns all of the rows of the data frame
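
For a quick check of tail(), the sketch below uses a made-up 25-row data frame and inspects which rows come back.

```python
import pandas as pd

# Hypothetical data frame with 25 rows, indexed 0 through 24.
df = pd.DataFrame({"value": range(25)})

# tail(n) returns the last n rows; head(n) returns the first n.
last_rows = df.tail(10)

print(last_rows.index.tolist())
```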

8. What is the dropna() method used for?

Dropping missing values
Replacing missing values
Dropping specified values
Identifying missing values
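
A minimal dropna() sketch on invented data, removing the rows where a hypothetical "price" column is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({
    "make": ["audi", "bmw", "volvo"],
    "price": [13950.0, np.nan, 16845.0],
})

# axis=0 drops rows; subset limits the check to the 'price' column.
cleaned = df.dropna(subset=["price"], axis=0)

print(cleaned.index.tolist())
```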

9. Which type of plot is binning best suited to graph?

Scatter plot
Histogram
Line plot
Box plot

10. What is the primary purpose of standardizing a set of values?

So you can see the spread of the data set and identify outliers.
It places different variables on the same scale, allowing you to compare them more easily.
To see how many standard deviations each value is from the mean.
To find how well a data set fits a model.

11. What is it called when you subtract the mean from the values in a data set and divide by the standard deviation?

Data standardization
Min-max method
Binning
One-hot encoding
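
Subtracting the mean and dividing by the standard deviation can be sketched as follows. The values are made up, and pandas' std() uses the sample standard deviation by default.

```python
import numpy as np
import pandas as pd

# Hypothetical values.
x = pd.Series([10.0, 20.0, 30.0, 40.0])

# Subtract the mean, then divide by the standard deviation.
z = (x - x.mean()) / x.std()

# The result has mean 0 and standard deviation 1 (up to rounding).
print(round(z.mean(), 6), round(z.std(), 6))
```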

12. What function can you use to replace values in a column of a data frame?

exchange()
header()
rename()
replace()
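
As an illustration of replacing values in a column, this sketch swaps a "?" placeholder for NaN in a hypothetical "horsepower" column.

```python
import numpy as np
import pandas as pd

# Hypothetical column where "?" marks an unknown value.
df = pd.DataFrame({"horsepower": ["111", "?", "154"]})

# replace(old, new) returns a copy with matching values substituted.
df["horsepower"] = df["horsepower"].replace("?", np.nan)

print(df["horsepower"].isna().tolist())
```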

13. What does a negative linear relationship between an input variable and an output variable imply?

That as the input increases, the output decreases at an ever-increasing rate.
That as the input increases, the output decreases at about the same rate.
The output does not adequately explain the input.
That as the input increases, the output increases at about the same rate.

14. What is the interquartile range of a data set?

The data between the upper and lower quartiles represents the interquartile range.
The middle of the data
The difference in the range of values in the uppermost quartile with the range of values in the lower-most quartile
The range of the data, split into four equal-sized groups
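
The quartiles that bound the interquartile range can be computed with NumPy's percentile function. A minimal sketch on made-up data, using NumPy's default linear interpolation:

```python
import numpy as np

# Hypothetical sorted data.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Lower (25th percentile) and upper (75th percentile) quartiles.
q1, q3 = np.percentile(data, [25, 75])

# Width of the span between the two quartiles.
iqr = q3 - q1

print(q1, q3, iqr)
```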

15. If the predicted function is:

y = b0+b1x

The method is:

Polynomial Regression
Exponential Regression
Linear regression
Multiple Linear Regression

16. What is a model estimator?

A mathematical equation that can be used to predict values not in the data set
The estimate of the output value of a data set given an input value
The mean, mode, median, and standard deviation of a data set
The descriptive statistics of a data set

17. How are residuals calculated?

x−x
y−y
b0+b1x
y−y
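
As conventionally defined, a residual is the observed value minus the value the model predicts for the same input. The sketch below fits a line with np.polyfit on made-up points and computes the residuals.

```python
import numpy as np

# Hypothetical observations.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 8.0])

# Least-squares line; polyfit returns the slope first, then the intercept.
b1, b0 = np.polyfit(x, y, 1)

# Predicted values from the fitted line.
yhat = b0 + b1 * x

# Residuals: observed minus predicted.
residuals = y - yhat

print(np.round(residuals, 3))
```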

18. Which statement is true about overfitting?

The model is too flexible and fits the noise rather than the function.
If the model is noisy, you need a low-order polynomial so you don’t overfit the data.
The higher the order of the polynomial, the less overfitting occurs.
If a model is overfit with the training data it will also overfit the testing data.

19. Say you have several differently ordered polynomial models. Which of the following statistics will best help you decide which model to use?

Coefficient of determination
Mean-squared error
Alpha
Correlation coefficient

20. What does the GridSearchCV() method do?

It iterates over hyperparameters using cross-validation.
It gives you R² values for different orders of polynomial models.
It’s another way to cross-validate your data set.
It selects the appropriate hyperparameters for your model.
