Week 01 Quiz Answers
1. A modern data ecosystem includes a network of continually evolving
entities. It includes:
- Data sources, databases, and programming languages
- Social media sources, data repositories, and APIs
- Data providers, databases, and programming languages
- Data sources, enterprise data repository, business stakeholders, and
tools, applications, and infrastructure to manage data
2. Data Engineers work within the data ecosystem to:
- Develop and maintain data architectures
- Analyze data for actionable insights
- Analyze data for deriving insights
- Provide business intelligence solutions by monitoring data on different
business functions
3. The goal of data engineering is to make quality data available for
fact-finding and decision-making. Which one of these statements captures the
process of data engineering?
- Processing data and making it available to users securely
- Collecting, processing, and storing data
- Collecting, processing, and making data available to users securely
- Collecting, processing, storing, and making data available to users
securely
4. Data extracted from disparate sources can be stored in:
- Data Lakes only
- Databases only
- Databases, data warehouses, data lakes, or any other type of data
repository
- Data Warehouses only
5. From the provided list, select the three emerging technologies that are
shaping today’s data ecosystem.
- Cloud Computing, Machine Learning, and Big Data
- Big Data, Internet of Things, and Dashboarding
- Machine Language, Cloud Computing, and Internet of Things
- Cloud Computing, Internet of Things, and Dashboarding
Graded Quiz 02 Answers
1. Which one of these functional skills is essential to the role of a Data
Engineer?
- Inspect analytics-ready data for deriving insights
- Proficiency in Mathematics
- The ability to work with the software development lifecycle
- Proficiency in working with ETL Tools
2. Oracle Exadata, IBM Db2 Warehouse on Cloud, IBM Netezza Performance
Server, and Amazon RedShift are some of the popular __________________ in use
today.
- NoSQL Databases
- Data Warehouses
- Big Data Platforms
- ETL Tools
3. Data Engineers manage the infrastructure required for the ingestion,
processing, and storage of data.
Week 02 Quiz Answers
Graded Quiz 01 Answers
1. There are two main types of data repositories – Transactional and
Analytical. For high-volume day-to-day operational data such as banking
transactions, Transactional, or OLTP, systems are the ideal choice.
2. Which of the following is an example of unstructured data?
- Zipped files
- XML
- Spreadsheets
- Video and Audio files
3. Which one of these file formats is independent of software, hardware, and
operating systems, and can be viewed the same way on any device?
- XML
- XLSX
- Delimited text file
- PDF
4. Which data source can return data in plain text, XML, HTML, or JSON among
others?
- APIs
- XML
- Delimited text file
- PDF
5. In the data engineer’s ecosystem, languages are classified by type. What
are shell and scripting languages most commonly used for?
- Manipulating data
- Automating repetitive operational tasks
- Building apps
- Querying data
Graded Quiz 02 Answers
1. Data Marts and Data Warehouses have typically been relational, but the
emergence of which technology has allowed them to be used for
non-relational data?
2. What is one of the most significant advantages of an RDBMS?
- Enforces a limit on the length of data fields
- Can store only structured data
- Is ACID-Compliant
- Requires source and destination tables to be identical for migrating data
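As an illustration of the ACID property named in question 2, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the names, and the transfer helper are hypothetical and not part of the quiz; the point is only that a transaction either applies all of its changes or none of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return False if the transfer is rejected."""
    try:
        # "with conn" commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Alice cannot afford 500, so the credit already applied to Bob is rolled back.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
conn.close()
```

After the failed transfer, both balances are unchanged, which is the "all or nothing" behavior that atomicity describes.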
3. Which one of the NoSQL database types uses a graphical model to represent
and store data, and is particularly useful for visualizing, analyzing, and
finding connections between different pieces of data?
- Key value store
- Document-based
- Column-based
- Graph-based
4. Which of the data repositories serves as a pool of raw data and stores
large amounts of structured, semi-structured, and unstructured data in their
native formats?
- Data Warehouses
- Data Marts
- Relational Databases
- Data Lakes
5. While data integration combines disparate data into a unified view of the
data, a data pipeline covers the entire data movement journey from source to
destination systems, and ETL is a process within data integration.
Graded Quiz 03 Answers
1. What does the attribute “Veracity” imply in the context of Big Data?
- Scale of data
- The speed at which data accumulates
- Accuracy and conformity of data to facts
- Diversity of the type and sources of data
2. ______________, in the context of Big Data, is the speed at which data
accumulates.
- Velocity
- Volume
- Value
- Variety
3. What does the attribute “Value” imply in the context of Big Data?
- The diversity of the type and sources of data
- The accuracy and conformity of data to facts
- Our ability and need to turn data into value
- The speed at which data accumulates
4. Apache Spark is a general-purpose data processing engine designed to
extract and process Big Data for a wide range of applications. What is one of
its key use cases?
- Consolidate data across the organization
- Perform complex analytics in real-time
- Scalable and reliable Big Data storage
- Fast recovery from hardware failures
5. Which of the Big Data processing tools is used for reading, writing, and
managing large data set files that are stored in either HDFS or Apache HBase?
Week 03 Quiz Answers
Graded Quiz 01 Answers
1. Which one of these steps is an intrinsic part of the “Data Processing
Layer” of a data platform?
- Transform and merge extracted data, either logically or physically
- Read data in batch or streaming modes from storage and apply
transformations
- Deliver processed data to data consumers
- Transfer data from data sources to the data platform in streaming, batch,
or both modes
2. Systems that are used for capturing high-volume transactional data need to
be designed for high-speed read, write, and update operations.
3. What is the role of “Network Access Control” systems in the area of network
security?
- To inspect incoming network traffic for intrusion attempts and
vulnerabilities
- To ensure attackers cannot tap into data while it is in transit
- To ensure endpoint security by allowing only authorized devices to
connect to the network
- To create silos, or virtual local area networks, within a network so that
you can segregate your assets
4. ____________ ensures that users access information based on their roles and
the privileges assigned to their roles.
- Firewalls
- Authorization
- Authentication
- Security Monitoring
5. Security Monitoring and Intelligence systems:
- Ensure users access information based on their role and privileges
- Create virtual local area networks within a network so that you can
segregate your assets
- Create an audit history for triage and compliance purposes
- Ensure only authorized devices can connect to a network
Graded Quiz 02 Answers
1. Web scraping is used to extract what type of data?
- Data from news sites and NoSQL databases
- Images, videos, and data from NoSQL databases
- Text, videos, and images
- Text, videos, and data from relational databases
2. ___________ focuses on cleaning the database of unused data and reducing
redundancy and inconsistency.
- Data Profiling
- Normalization
- Denormalization
- Data Visualization
3. OpenRefine is an open-source tool that allows you to:
- Transform data into a variety of formats such as TSV, CSV, XLS, XML,
and JSON
- Use add-ins such as Microsoft Power Query to identify issues and clean
data
- Enforce applicable data governance policies automatically
- Automatically detect schemas, data types, and anomalies
4. When you’re combining rows of data from multiple source tables into a
single table, what kind of data transformation are you performing?
- Unions
- Denormalization
- Normalization
- Joins
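To make the distinction in question 4 concrete, the sketch below contrasts a union, which stacks rows from tables that share the same columns, with a join, which matches rows across tables on a key. It uses Python's built-in sqlite3 module; the table names and sample rows are hypothetical and not taken from the quiz.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_q1 (customer TEXT, amount INTEGER);
    CREATE TABLE sales_q2 (customer TEXT, amount INTEGER);
    CREATE TABLE regions  (customer TEXT, region TEXT);
    INSERT INTO sales_q1 VALUES ('Ana', 100), ('Ben', 250);
    INSERT INTO sales_q2 VALUES ('Ana', 300), ('Cy', 75);
    INSERT INTO regions  VALUES ('Ana', 'East'), ('Ben', 'West');
""")

# Union: rows from both quarterly tables end up in one result set.
union_rows = conn.execute("""
    SELECT customer, amount FROM sales_q1
    UNION ALL
    SELECT customer, amount FROM sales_q2
""").fetchall()

# Join: columns from two tables are combined for rows with a matching key.
join_rows = conn.execute("""
    SELECT s.customer, s.amount, r.region
    FROM sales_q1 AS s
    JOIN regions AS r ON r.customer = s.customer
    ORDER BY s.customer
""").fetchall()
conn.close()
```

The union result has one row per source row (four here), while the join result keeps the row count of the matched rows and adds a column.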
5. When you detect a value in your data set that is vastly different from
other observations in the same data set, what would you report that as?
- Syntax error
- Outlier
- Irrelevant data
- Missing value
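One common way to flag a value that is vastly different from the rest of a data set is the interquartile range (IQR) rule. The sketch below is only an illustration under that assumption: the quiz does not prescribe a detection method, and the find_outliers helper, the 1.5 multiplier, and the sample readings are hypothetical.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one value far from the others.
readings = [12, 14, 13, 15, 14, 13, 98, 12, 15]
outliers = find_outliers(readings)
```

With these readings, only the value 98 falls outside the computed bounds.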
Graded Quiz 03 Answers
1. What are some of the querying techniques you can apply to identify extreme
values in a data column?
- Performing partial matches of data values
- Slicing a data set
- Maximum and Minimum values in a data column
- Aggregation
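As a small illustration of finding extreme values in a column, the sketch below runs SQL MIN and MAX through Python's built-in sqlite3 module. The orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.5), (2, 240.0), (3, 7.25), (4, 88.0)],
)

# A single query returns both ends of the column's range.
lowest, highest = conn.execute(
    "SELECT MIN(amount), MAX(amount) FROM orders"
).fetchone()
conn.close()
```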
2. You can perform partial matches of data values in a data column using:
- Count function
- Average function
- Filtering patterns
- Slicing a data set
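Partial matching can be sketched with the SQL LIKE operator, where % stands for any run of characters. The example uses Python's built-in sqlite3 module; the customers table and the search patterns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "New York"), ("Ben", "Newark"), ("Cy", "Boston"), ("Di", "Old Newton")],
)

# Cities that start with "New".
starts_with = conn.execute(
    "SELECT name FROM customers WHERE city LIKE ? ORDER BY name", ("New%",)
).fetchall()

# Cities that contain "New" anywhere.
contains = conn.execute(
    "SELECT name FROM customers WHERE city LIKE ? ORDER BY name", ("%New%",)
).fetchall()
conn.close()
```

Moving the wildcard changes the filter: the prefix pattern skips "Old Newton", while the contains pattern includes it.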
3. Tools for ______________ break up a job into a series of logical steps
which are monitored for completion and time to completion.
- Application Performance Monitoring
- Monitoring Query Performance
- Job-level Runtime Monitoring
- Monitoring the amount of data being processed in a data pipeline
4. Database partitioning helps optimize databases for performance. It does
this by:
- Reducing inconsistencies and anomalies in data
- Dividing large tables into smaller individual tables
- Tracking request response time and error messages
- Minimizing the number of times a disk needs to be accessed when a query is
processed
5. Database normalization is a design technique that helps reduce
inconsistencies and anomalies from data.
Graded Quiz 04 Answers
1. In which phase of the data lifecycle do you establish the data you need,
the amount of data you need, and how you intend to use the data you are
collecting?
- Data Acquisition
- Data Retention
- Data Sharing
- Data Processing
2. The process of _____________ abstracts the presentation layer without
changing the data in the database physically.
- Anonymization
- Encryption
- Pseudonymization
- Data Profiling
Week 04 Quiz Answers
Graded Quiz Answers
1. Data Engineering is a highly technical field. While communication,
collaboration, and project management skills are somewhat useful, you don’t
need these skills in order to grow in your role as a data engineer.
2. As a Lead Data Engineer, what are some of the things you may be responsible
for in addition to your hands-on skills?
- Converting business requirements into technical specifications
- Identify correlations, find patterns, and apply statistical methods to
analyze and mine data
- Visualize data to interpret and present the findings of data analysis
- Provide business intelligence solutions by extracting insights from data
3. What are some of the factors that influence your growth on your journey
from an Associate Data Engineer to a Principal Data Engineer role?
- Domain specialization, such as in Healthcare, Banking, and Technology
- If you spend enough time at one level, you are bound to grow into the next
level role in a couple of years.
- A Master’s degree in either Mathematics or Statistics
- The amount of experience you gain within your chosen area of
specialization and your understanding of other areas within data
engineering
4. If you are an IT Support Specialist or a Software Tester, gaining entry
into the field of data engineering will not be possible for you.
5. If you have basic familiarity with coding, you can develop some baseline
technical skills that can get you started on your journey as a Data Engineer.
What are some of these baseline skills?
- Designing data pipelines
- Familiarity with Big Data processing tools
- Knowledge of operating systems, databases, and programming and query
languages
- Architecting data warehouses