Introduction to Data Engineering

Answers for the Introduction to Data Engineering course, part of the IBM Data Engineering Professional Certificate.

Week 01 Quiz Answers

Graded Quiz 01 Answers

1. A modern data ecosystem includes a network of continually evolving entities. It includes:
  • Data sources, databases, and programming languages
  • Social media sources, data repositories, and APIs
  • Data providers, databases, and programming languages
  • Data sources, enterprise data repository, business stakeholders, and tools, applications, and infrastructure to manage data
2. Data Engineers work within the data ecosystem to:
  • Develop and maintain data architectures
  • Analyze data for actionable insights
  • Analyze data for deriving insights
  • Provide business intelligence solutions by monitoring data on different business functions
3. The goal of data engineering is to make quality data available for fact-finding and decision-making. Which one of these statements captures the process of data engineering?
  • Processing data and making it available to users securely
  • Collecting, processing, and storing data
  • Collecting, processing, and making data available to users securely
  • Collecting, processing, storing, and making data available to users securely
4. Data extracted from disparate sources can be stored in:
  • Data Lakes only
  • Databases only
  • Databases, data warehouses, data lakes, or any other type of data repository
  • Data Warehouses only
5. From the provided list, select the three emerging technologies that are shaping today’s data ecosystem.
  • Cloud Computing, Machine Learning, and Big Data
  • Big Data, Internet of Things, and Dashboarding
  • Machine Language, Cloud Computing, and Internet of Things
  • Cloud Computing, Internet of Things, and Dashboarding

Graded Quiz 02 Answers

1. Which one of these functional skills is essential to the role of a Data Engineer?
  • Inspect analytics-ready data for deriving insights
  • Proficiency in Mathematics
  • The ability to work with the software development lifecycle
  • Proficiency in working with ETL Tools
2. Oracle Exadata, IBM Db2 Warehouse on Cloud, IBM Netezza Performance Server, and Amazon Redshift are some of the popular __________________ in use today.
  • NoSQL Databases
  • Data Warehouses
  • Big Data Platforms
  • ETL Tools
3. Data Engineers manage the infrastructure required for the ingestion, processing, and storage of data.
  • True
  • False

Week 02 Quiz Answers

Graded Quiz 01 Answers

1. There are two main types of data repositories – Transactional and Analytical. For high-volume day-to-day operational data such as banking transactions, Transactional, or OLTP, systems are the ideal choice.
  • True
  • False
2. Which of the following is an example of unstructured data?
  • Zipped files
  • XML
  • Spreadsheets
  • Video and Audio files
3. Which one of these file formats is independent of software, hardware, and operating systems, and can be viewed the same way on any device?
  • XML
  • XLSX
  • Delimited text file
  • PDF
4. Which data source can return data in plain text, XML, HTML, or JSON among others?
  • APIs
  • XML
  • Delimited text file
  • PDF
5. In the data engineer’s ecosystem, languages are classified by type. What are shell and scripting languages most commonly used for?
  • Manipulating data
  • Automating repetitive operational tasks
  • Building apps
  • Querying data
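
As an illustration of the data formats covered in this quiz, the sketch below parses the same record from JSON, XML, and delimited text using only the Python standard library. The payloads and the helper names (from_json, from_xml, from_csv) are made up for this example and do not come from the course.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: the same record as an API or file might return it.
json_text = '{"id": 1, "name": "Ada"}'
xml_text = "<user><id>1</id><name>Ada</name></user>"
csv_text = "id,name\n1,Ada\n"


def from_json(text):
    """Parse a JSON object into a plain dict."""
    record = json.loads(text)
    return {"id": int(record["id"]), "name": record["name"]}


def from_xml(text):
    """Parse a small XML document into the same dict shape."""
    root = ET.fromstring(text)
    return {"id": int(root.findtext("id")), "name": root.findtext("name")}


def from_csv(text):
    """Parse the first data row of a comma-delimited text file."""
    row = next(csv.DictReader(io.StringIO(text)))
    return {"id": int(row["id"]), "name": row["name"]}


records = [from_json(json_text), from_xml(xml_text), from_csv(csv_text)]
print(records)
```

All three parsers yield the same dictionary, which is the point: the format changes, the underlying record does not.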

Graded Quiz 02 Answers

1. Data Marts and Data Warehouses have typically been relational, but the emergence of what technology has helped to let these be used for non-relational data?
  • NoSQL
  • SQL
  • Data Lake
  • ETL
2. What is one of the most significant advantages of an RDBMS?
  • Enforces a limit on the length of data fields
  • Can store only structured data
  • Is ACID-Compliant
  • Requires source and destination tables to be identical for migrating data
3. Which one of the NoSQL database types uses a graphical model to represent and store data, and is particularly useful for visualizing, analyzing, and finding connections between different pieces of data?
  • Key-value store
  • Document-based
  • Column-based
  • Graph-based
4. Which of the data repositories serves as a pool of raw data and stores large amounts of structured, semi-structured, and unstructured data in their native formats?
  • Data Warehouses
  • Data Marts
  • Relational Databases
  • Data Lakes
5. While data integration combines disparate data into a unified view of the data, a data pipeline covers the entire data movement journey from source to destination systems, and ETL is a process within data integration.
  • True
  • False
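
To make the ACID property mentioned in question 2 concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the CHECK constraint, and the transfer function are assumptions chosen for illustration; they are not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob first and then fails on the debit, yet bob's balance is unchanged afterwards because the whole transaction is rolled back.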

Graded Quiz 03 Answers

1. What does the attribute “Veracity” imply in the context of Big Data?
  • Scale of data
  • The speed at which data accumulates
  • Accuracy and conformity of data to facts
  • Diversity of the type and sources of data
2. ______________, in the context of Big Data, is the speed at which data accumulates.
  • Velocity
  • Volume
  • Value
  • Variety
3. What does the attribute “Value” imply in the context of Big Data?
  • The diversity of the type and sources of data
  • The accuracy and conformity of data to facts
  • Our ability and need to turn data into value
  • The speed at which data accumulates
4. Apache Spark is a general-purpose data processing engine designed to extract and process Big Data for a wide range of applications. What is one of its key use cases?
  • Consolidate data across the organization
  • Perform complex analytics in real-time
  • Scalable and reliable Big Data storage
  • Fast recovery from hardware failures
5. Which of the Big Data processing tools is used for reading, writing, and managing large data set files that are stored in either HDFS or Apache HBase?
  • Hive
  • ETL
  • Hadoop
  • Spark

Week 03 Quiz Answers

Graded Quiz 01 Answers

1. Which one of these steps is an intrinsic part of the “Data Processing Layer” of a data platform?
  • Transform and merge extracted data, either logically or physically
  • Read data in batch or streaming modes from storage and apply transformations
  • Deliver processed data to data consumers
  • Transfer data from data sources to the data platform in streaming, batch, or both modes
2. Systems that are used for capturing high-volume transactional data need to be designed for high-speed read, write, and update operations.
  • True
  • False
3. What is the role of “Network Access Control” systems in the area of network security?
  • To inspect incoming network traffic for intrusion attempts and vulnerabilities
  • To ensure attackers cannot tap into data while it is in transit
  • To ensure endpoint security by allowing only authorized devices to connect to the network
  • To create silos, or virtual local area networks, within a network so that you can segregate your assets
4. ____________ ensures that users access information based on their roles and the privileges assigned to their roles.
  • Firewalls
  • Authorization
  • Authentication
  • Security Monitoring
5. Security Monitoring and Intelligence systems:
  • Ensure users access information based on their role and privileges
  • Create virtual local area networks within a network so that you can segregate your assets
  • Create an audit history for triage and compliance purposes
  • Ensure only authorized devices can connect to a network

Graded Quiz 02 Answers

1. Web scraping is used to extract what type of data?
  • Data from news sites and NoSQL databases
  • Images, videos, and data from NoSQL databases
  • Text, videos, and images
  • Text, videos, and data from relational databases
2. ___________ focuses on cleaning the database of unused data and reducing redundancy and inconsistency.
  • Data Profiling
  • Normalization
  • Denormalization
  • Data Visualization
3. OpenRefine is an open-source tool that allows you to:
  • Transform data into a variety of formats such as TSV, CSV, XLS, XML, and JSON
  • Use add-ins such as Microsoft Power Query to identify issues and clean data
  • Enforce applicable data governance policies automatically
  • Automatically detect schemas, data types, and anomalies
4. When you’re combining rows of data from multiple source tables into a single table, what kind of data transformation are you performing?
  • Unions
  • Denormalization
  • Normalization
  • Joins
5. When you detect a value in your data set that is vastly different from other observations in the same data set, what would you report that as?
  • Syntax error
  • Outlier
  • Irrelevant data
  • Missing value
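
The difference between the union and join transformations named in question 4 may be easier to see side by side. The sketch below uses Python's sqlite3 module with made-up tables (sales_q1, sales_q2, customers): a union stacks rows from tables with the same columns, while a join adds columns by matching rows on a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_q1 (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE sales_q2 (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO sales_q1 VALUES (1, 10), (2, 11);
    INSERT INTO sales_q2 VALUES (3, 10);
    INSERT INTO customers VALUES (10, 'Ada'), (11, 'Grace');
    """
)

# Union: combine rows from two tables that share the same columns.
union_rows = conn.execute(
    """
    SELECT order_id, customer_id FROM sales_q1
    UNION ALL
    SELECT order_id, customer_id FROM sales_q2
    ORDER BY order_id
    """
).fetchall()

# Join: combine columns from two tables by matching on customer_id.
join_rows = conn.execute(
    """
    SELECT s.order_id, c.name
    FROM sales_q1 AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    ORDER BY s.order_id
    """
).fetchall()

print(union_rows)
print(join_rows)
```

The union result has more rows than either source table, whereas the join result has the same number of rows as sales_q1 but a column drawn from customers.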

Graded Quiz 03 Answers

1. What are some of the querying techniques you can apply to identify extreme values in a data column?
  • Performing partial matches of data values
  • Slicing a data set
  • Maximum and Minimum values in a data column
  • Aggregation
2. You can perform partial matches of data values in a data column using:
  • Count function
  • Average function
  • Filtering patterns
  • Slicing a data set
3. Tools for ______________ break up a job into a series of logical steps which are monitored for completion and time to completion.
  • Application Performance Monitoring
  • Monitoring Query Performance
  • Job-level Runtime Monitoring
  • Monitoring the amount of data being processed in a data pipeline
4. Database partitioning helps optimize databases for performance. It does this by:
  • Reducing inconsistencies and anomalies in data
  • Dividing large tables into smaller individual tables
  • Tracking request response time and error messages
  • Minimizing the number of times a disk needs to be accessed when a query is processed
5. Database normalization is a design technique that helps reduce inconsistencies and anomalies from data.
  • True
  • False
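
Two of the querying techniques that appear in this quiz, finding minimum and maximum values and matching partial values with a filter pattern, can be sketched with Python's sqlite3 module. The cars table and its contents are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (model TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [
        ("Civic LX", 22000),
        ("Civic Sport", 25000),
        ("Model S", 90000),
        ("Corolla", 21000),
    ],
)

# Extreme values: the smallest and largest price in the column.
lowest, highest = conn.execute(
    "SELECT MIN(price), MAX(price) FROM cars"
).fetchone()

# Partial match: LIKE with % matches any model starting with "Civic".
civics = [
    row[0]
    for row in conn.execute(
        "SELECT model FROM cars WHERE model LIKE ? ORDER BY model",
        ("Civic%",),
    )
]

print(lowest, highest, civics)
```

A price far from the rest, such as the 90000 here, is the kind of value that MIN and MAX surface quickly for further inspection.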

Graded Quiz 04 Answers

1. In which phase of the data lifecycle do you establish the data you need, the amount of data you need, and how you intend to use the data you are collecting?
  • Data Acquisition
  • Data Retention
  • Data Sharing
  • Data Processing
2. The process of _____________ abstracts the presentation layer without changing the data in the database physically.
  • Anonymization
  • Encryption
  • Pseudonymization
  • Data Profiling

Week 04 Quiz Answers

Graded Quiz Answers

1. Data Engineering is a highly technical field. While communication, collaboration, and project management skills are somewhat useful, you don’t need these skills in order to grow in your role as a data engineer.
  • True
  • False
2. As a Lead Data Engineer, what are some of the things you may be responsible for in addition to your hands-on skills?
  • Converting business requirements into technical specifications
  • Identify correlations, find patterns, and apply statistical methods to analyze and mine data
  • Visualize data to interpret and present the findings of data analysis
  • Provide business intelligence solutions by extracting insights from data
3. What are some of the factors that influence your growth on your journey from an Associate Data Engineer to a Principal Data Engineer role?
  • Domain specialization, such as in Healthcare, Banking, and Technology
  • If you spend enough time at one level, you are bound to grow into the next level role in a couple of years.
  • A Master’s degree in either Mathematics or Statistics
  • The amount of experience you gain within your chosen area of specialization and your understanding of other areas within data engineering
4. If you are an IT Support Specialist or a Software Tester, gaining entry into the field of data engineering will not be possible for you.
  • True
  • False
5. If you have basic familiarity with coding, you can develop some baseline technical skills that can get you started on your journey as a Data Engineer. What are some of these baseline skills?
  • Designing data pipelines
  • Familiarity with Big Data processing tools
  • Knowledge of operating systems, databases, and programming and query languages
  • Architecting data warehouses
