Introduction to Data Engineering

Week 01 Quiz Answers

1. A modern data ecosystem includes a network of continually evolving entities. It includes:

Data sources, databases, and programming languages
Social media sources, data repositories, and APIs
Data providers, databases, and programming languages
Data sources, enterprise data repository, business stakeholders, and tools, applications, and infrastructure to manage data

2. Data Engineers work within the data ecosystem to:

Develop and maintain data architectures
Analyze data for actionable insights
Analyze data for deriving insights
Provide business intelligence solutions by monitoring data on different business functions

3. The goal of data engineering is to make quality data available for fact-finding and decision-making. Which one of these statements captures the process of data engineering?

Processing data and making it available to users securely
Collecting, processing, and storing data
Collecting, processing, and making data available to users securely
Collecting, processing, storing, and making data available to users securely

4. Data extracted from disparate sources can be stored in:

Data Lakes only
Databases only
Databases, data warehouses, data lakes, or any other type of data repository
Data Warehouses only

5. From the provided list, select the three emerging technologies that are shaping today’s data ecosystem.

Cloud Computing, Machine Learning, and Big Data
Big Data, Internet of Things, and Dashboarding
Machine Language, Cloud Computing, and Internet of Things
Cloud Computing, Internet of Things, and Dashboarding

Graded Quiz 02 Answers

1. Which one of these functional skills is essential to the role of a Data Engineer?

Inspect analytics-ready data for deriving insights
Proficiency in Mathematics
The ability to work with the software development lifecycle
Proficiency in working with ETL Tools

2. Oracle Exadata, IBM Db2 Warehouse on Cloud, IBM Netezza Performance Server, and Amazon RedShift are some of the popular __________________ in use today.

NoSQL Databases
Data Warehouses
Big Data Platforms
ETL Tools

3. Data Engineers manage the infrastructure required for the ingestion, processing, and storage of data.

True
False

Week 02 Quiz Answers

Graded Quiz 01 Answers

1. There are two main types of data repositories – Transactional and Analytical. For high-volume day-to-day operational data such as banking transactions, Transactional, or OLTP, systems are the ideal choice.

True
False

2. Which of the following is an example of unstructured data?

Zipped files
XML
Spreadsheets
Video and Audio files

3. Which one of these file formats is independent of software, hardware, and operating systems, and can be viewed the same way on any device?

XML
XLSX
Delimited text file
PDF

4. Which data source can return data in plain text, XML, HTML, or JSON among others?

APIs
XML
Delimited text file
PDF

5. In the data engineer’s ecosystem, languages are classified by type. What are shell and scripting languages most commonly used for?

Manipulating data
Automating repetitive operational tasks
Building apps
Querying data

Graded Quiz 02 Answers

1. Data Marts and Data Warehouses have typically been relational, but the emergence of what technology has helped to let these be used for non-relational data?

NoSQL
SQL
Data Lake
ETL

2. What is one of the most significant advantages of an RDBMS?

Enforces a limit on the length of data fields
Can store only structured data
Is ACID-Compliant
Requires source and destination tables to be identical for migrating data

3. Which one of the NoSQL database types uses a graphical model to represent and store data, and is particularly useful for visualizing, analyzing, and finding connections between different pieces of data?

Key value store
Document-based
Column-based
Graph-based

4. Which of the data repositories serves as a pool of raw data and stores large amounts of structured, semi-structured, and unstructured data in their native formats?

Data Warehouses
Data Marts
Relational Databases
Data Lakes

5. While data integration combines disparate data into a unified view of the data, a data pipeline covers the entire data movement journey from source to destination systems, and ETL is a process within data integration.

True
False

Graded Quiz 03 Answers

1. What does the attribute “Veracity” imply in the context of Big Data?

Scale of data
The speed at which data accumulates
Accuracy and conformity of data to facts
Diversity of the type and sources of data

2. ______________, in the context of Big Data, is the speed at which data accumulates.

Velocity
Volume
Value
Variety

3. What does the attribute “Value” imply in the context of Big Data?

The diversity of the type and sources of data
The accuracy and conformity of data to facts
Our ability and need to turn data into value
The speed at which data accumulates

4. Apache Spark is a general-purpose data processing engine designed to extract and process Big Data for a wide range of applications. What is one of its key use cases?

Consolidate data across the organization
Perform complex analytics in real-time
Scalable and reliable Big Data storage
Fast recovery from hardware failures

5. Which of the Big Data processing tools is used for reading, writing, and managing large data set files that are stored in either HDFS or Apache HBase?

Hive
ETL
Hadoop
Spark

Week 03 Quiz Answers

Graded Quiz 01 Answers

1. Which one of these steps is an intrinsic part of the “Data Processing Layer” of a data platform?

Transform and merge extracted data, either logically or physically
Read data in batch or streaming modes from storage and apply transformations
Deliver processed data to data consumers
Transfer data from data sources to the data platform in streaming, batch, or both modes

2. Systems that are used for capturing high-volume transactional data need to be designed for high-speed read, write, and update operations.

True
False

3. What is the role of “Network Access Control” systems in the area of network security?

To inspect incoming network traffic for intrusion attempts and vulnerabilities
To ensure attackers cannot tap into data while it is in transit
To ensure endpoint security by allowing only authorized devices to connect to the network
To create silos, or virtual local area networks, within a network so that you can segregate your assets

4. ____________ ensures that users access information based on their roles and the privileges assigned to their roles.

Firewalls
Authorization
Authentication
Security Monitoring

5. Security Monitoring and Intelligence systems:

Ensure users access information based on their role and privileges
Create virtual local area networks within a network so that you can segregate your assets
Create an audit history for triage and compliance purposes
Ensure only authorized devices can connect to a network

Graded Quiz 02 Answers

1. Web scraping is used to extract what type of data?

Data from news sites and NoSQL databases
Images, videos, and data from NoSQL databases
Text, videos, and images
Text, videos, and data from relational databases

2. ___________ focuses on cleaning the database of unused data and reducing redundancy and inconsistency.

Data Profiling
Normalization
Denormalization
Data Visualization

3. OpenRefine is an open-source tool that allows you to:

Transform data into a variety of formats such as TSV, CSV, XLS, XML, and JSON
Use add-ins such as Microsoft Power Query to identify issues and clean data
Enforce applicable data governance policies automatically
Automatically detect schemas, data types, and anomalies

4. When you’re combining rows of data from multiple source tables into a single table, what kind of data transformation are you performing?

Unions
Denormalization
Normalization
Joins
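To make the difference between unions and joins in question 4 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and rows are made up for illustration and do not come from the course. A union stacks rows from tables with the same columns, while a join matches rows across tables on a key and adds columns.

```python
import sqlite3

# In-memory database with hypothetical tables (illustrative names and values only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_east (order_id INTEGER, amount REAL);
    CREATE TABLE sales_west (order_id INTEGER, amount REAL);
    CREATE TABLE customers  (order_id INTEGER, customer TEXT);
    INSERT INTO sales_east VALUES (1, 10.0), (2, 20.0);
    INSERT INTO sales_west VALUES (3, 30.0);
    INSERT INTO customers  VALUES (1, 'Ana'), (3, 'Raj');
""")

# Union: rows from both source tables end up in a single result set.
union_rows = conn.execute(
    "SELECT order_id, amount FROM sales_east "
    "UNION ALL "
    "SELECT order_id, amount FROM sales_west "
    "ORDER BY order_id"
).fetchall()

# Join: columns from two tables are combined where the key matches.
join_rows = conn.execute(
    "SELECT s.order_id, s.amount, c.customer "
    "FROM sales_east AS s "
    "JOIN customers AS c ON s.order_id = c.order_id "
    "ORDER BY s.order_id"
).fetchall()

print(union_rows)  # three rows, one per order across both regions
print(join_rows)   # only orders in sales_east that have a matching customer
```
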

5. When you detect a value in your data set that is vastly different from other observations in the same data set, what would you report that as?

Syntax error
Outlier
Irrelevant data
Missing value
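One common way to flag a value that is vastly different from the rest of a data set is the interquartile range (IQR) rule. The sketch below is an assumption about how such a check might look, not a method prescribed by the course. The function name find_outliers, the multiplier k=1.5, and the sample readings are all hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value.
readings = [12, 13, 12, 14, 13, 15, 14, 13, 98]
outliers = find_outliers(readings)
print(outliers)
```

Other rules, such as a z-score threshold, are equally valid. Which one fits depends on how the data is distributed.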

Graded Quiz 03 Answers

1. What are some of the querying techniques you can apply to identify extreme values in a data column?

Performing partial matches of data values
Slicing a data set
Maximum and Minimum values in a data column
Aggregation

2. You can perform partial matches of data values in a data column using:

Count function
Average function
Filtering patterns
Slicing a data set
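Two of the querying techniques named in questions 1 and 2 can be sketched with Python's built-in sqlite3 module: reading the minimum and maximum of a column, and filtering rows with a pattern. The products table and its rows are hypothetical, and SQL's LIKE operator is used here as one example of a filtering pattern.

```python
import sqlite3

# Hypothetical product table (illustrative names and prices only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price INTEGER);
    INSERT INTO products VALUES
        ('Widget Basic', 220), ('Widget Pro', 250),
        ('Gadget Mini', 310), ('Gizmo', 390);
""")

# Extreme values: minimum and maximum of the price column.
lowest, highest = conn.execute(
    "SELECT MIN(price), MAX(price) FROM products"
).fetchone()

# Partial match: every product whose name starts with 'Widget'.
widgets = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM products WHERE name LIKE 'Widget%' ORDER BY name"
    )
]

print(lowest, highest)
print(widgets)
```
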

3. Tools for ______________ break up a job into a series of logical steps which are monitored for completion and time to completion.

Application Performance Monitoring
Monitoring Query Performance
Job-level Runtime Monitoring
Monitoring the amount of data being processed in a data pipeline

4. Database partitioning helps optimize databases for performance. It does this by:

Reducing inconsistencies and anomalies in data
Dividing large tables into smaller individual tables
Tracking request response time and error messages
Minimizing the number of times a disk needs to be accessed when a query is processed

5. Database normalization is a design technique that helps reduce inconsistencies and anomalies from data.

True
False

Graded Quiz 04 Answers

1. In which phase of the data lifecycle do you establish the data you need, the amount of data you need, and how you intend to use the data you are collecting?

Data Acquisition
Data Retention
Data Sharing
Data Processing

2. The process of _____________ abstracts the presentation layer without changing the data in the database physically.

Anonymization
Encryption
Pseudonymization
Data Profiling

Week 04 Quiz Answers

Graded Quiz Answers

1. Data Engineering is a highly technical field. While communication, collaboration, and project management skills are somewhat useful, you don’t need these skills in order to grow in your role as a data engineer.

True
False

2. As a Lead Data Engineer, what are some of the things you may be responsible for in addition to your hands-on skills?

Converting business requirements into technical specifications
Identify correlations, find patterns, and apply statistical methods to analyze and mine data
Visualize data to interpret and present the findings of data analysis
Provide business intelligence solutions by extracting insights from data

3. What are some of the factors that influence your growth on your journey from an Associate Data Engineer to a Principal Data Engineer role?

Domain specialization, such as in Healthcare, Banking, and Technology
If you spend enough time at one level, you are bound to grow into the next level role in a couple of years.
A Master’s degree in either Mathematics or Statistics
The amount of experience you gain within your chosen area of specialization and your understanding of other areas within data engineering

4. If you are an IT Support Specialist or a Software Tester, gaining entry into the field of data engineering will not be possible for you.

True
False

5. If you have basic familiarity with coding, you can develop some baseline technical skills that can get you started on your journey as a Data Engineer. What are some of these baseline skills?

Designing data pipelines
Familiarity with Big Data processing tools
Knowledge of operating systems, databases, and programming and query languages
Architecting data warehouses
