Deck 2: Data Management

1
A common adage used in data management and utilization circles about poor data quality is

A) "a bad workman always blames his tools."
B) "a cat has nine lives."
C) "garbage in, garbage out."
D) "adversity and loss make a man wise."
"garbage in, garbage out."
2
Achieving value through big data requires a thorough understanding of the goals and objectives of a business.
True
3
Which of the following characteristics of big data increases complexity and reduces confidence in the data?

A) variety
B) volume
C) veracity
D) velocity
veracity
4
Most traditional Extract, Transform, and Load (ETL) tools can process only relational datasets, whereas newer systems are flexible enough to handle unstructured, semi-structured, and machinery sensor data.
5
Describe and explain the four processes of data transformation.
6
In a dataset, values that are at a considerable distance from any of the other data clusters are treated as outliers.
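One common way to make "a considerable distance" concrete is the interquartile-range (IQR) rule. The sketch below is an illustration under that assumption, not the deck's own definition: the helper name `iqr_outliers`, the 1.5 multiplier, and the order values are all hypothetical choices.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical order values; 250 sits far from the rest of the data.
orders = [20, 22, 19, 24, 21, 23, 250]
print(iqr_outliers(orders))  # prints [250]
```

Other conventions (for example a z-score cutoff) would flag a different set of points, so the threshold should be treated as a judgment call.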
7
Data inconsistencies can lead to uncoordinated strategic decision making and embarrassing dilemmas for businesses.
8
Which of the following characteristics of big data means the data can be converted into meaningful, quality information that can be used to achieve tangible business benefits?

A) velocity
B) variety
C) value
D) volume
9
A major challenge of today's data management is that

A) inbound data continues to increase exponentially.
B) data is not held in vaults for viewing by business department heads.
C) converting structured data into a usable form is a complex process.
D) data tends to trickle in, making simultaneous processing difficult.
10
A ________ of data can contain as much information as half of the contents of all U.S. academic research libraries.

A) petabyte
B) terabyte
C) megabyte
D) gigabyte
11
Companies that adopt ________ typically achieve up to 6 percent higher productivity and output than their peer companies.

A) multi-channel interactions
B) static information and fixed rules
C) social media sales campaigns
D) data-driven decision making
12
A company combined monthly sales data into a single group to calculate the total sales by a quarter or a year. This is an example of the ________ step of data transformation.

A) feature construction
B) aggregation
C) dummy coding
D) normalization
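The roll-up described in this card can be sketched in a few lines. This is a minimal illustration, assuming monthly figures keyed by (year, month); the sales numbers and the helper name `quarterly_totals` are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly sales figures: (year, month) -> sales amount.
monthly_sales = {
    (2023, 1): 100, (2023, 2): 120, (2023, 3): 90,
    (2023, 4): 110, (2023, 5): 130, (2023, 6): 105,
}

def quarterly_totals(sales):
    """Combine monthly figures into (year, quarter) totals."""
    totals = defaultdict(int)
    for (year, month), amount in sales.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(year, quarter)] += amount
    return dict(totals)

print(quarterly_totals(monthly_sales))
```

Grouping by year alone instead of (year, quarter) would give the annual total in the same way.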
13
Compare and contrast relational and non-relational databases.
14
In data aggregation, the ________ function helps in obtaining the total values within a dataset.

A) SUM()
B) MAX()
C) MIN()
D) AVG()
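For reference, the four aggregate functions listed in the options can be tried side by side. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 30.0)],  # made-up rows
)

# Each aggregate collapses the whole column into a single value.
total, largest, smallest, average = conn.execute(
    "SELECT SUM(amount), MAX(amount), MIN(amount), AVG(amount) FROM sales"
).fetchone()
print(total, largest, smallest, average)
```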
15
Streaming data, the continuous transfer of data from numerous sources in different formats, cannot be included to capture customer data.
16
In data transformation, ________ involves subtracting the mean from a variable's value and then dividing the result by the standard deviation.

A) dummy coding
B) normalization
C) new column construction
D) overfitting
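The calculation this card describes, z = (x - mean) / standard deviation, can be sketched as follows. The use of the population standard deviation (`statistics.pstdev`) is an assumption made here; the card does not say whether the sample or population form is intended. The helper name `standardize` and the ages are hypothetical.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("all values are identical; cannot divide by zero")
    return [(v - mean) / sd for v in values]

ages = [20, 30, 40]  # hypothetical customer ages
z = standardize(ages)
print(z)
```

After the transformation the values are centered on zero, which puts variables measured on different scales on a comparable footing.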
17
Asking the question "Is the data correct, reliable, and precisely measured?" will help in determining the data quality of ________.

A) accuracy
B) format
C) consistency
D) timeliness
18
Unlike relational databases, non-relational databases

A) make drilling down to very specific types of data easy.
B) distribute all data into structured tables.
C) are used by companies dealing with fixed, well-defined data.
D) provide greater flexibility for storing ever-changing data.
19
Structured Query Language (SQL) is a language developed by IBM and used to access and update data stored in a database.
20
The ________ characteristic of big data enables real-time business responses and strategies.

A) volume
B) variety
C) velocity
D) value
21
Hadoop, a data warehouse built using HIVE, provides SQL-like query to access data stored in different file systems and databases that are used by HIVE.
22
The veracity characteristic of big data provides businesses a holistic understanding of their customers and market situations.
23
In data aggregation, employing the COUNT() function within a dataset returns the

A) average value within the dataset.
B) sum of values within the dataset.
C) number of rows corresponding to a certain feature.
D) maximum value within the dataset.
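As a rough illustration of how COUNT() behaves in SQL, the sketch below uses Python's built-in `sqlite3` module. The `customers` table and its rows are hypothetical. It also shows a detail worth knowing: `COUNT(*)` counts every row, while `COUNT(column)` skips rows where that column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, segment TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "loyal", "ana@example.com"),
        ("Ben", "loyal", None),  # email was never collected
        ("Cy", "new", "cy@example.com"),
    ],
)

# COUNT(*) counts rows; COUNT(email) counts only non-NULL emails.
all_rows, with_email = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM customers"
).fetchone()

# Counting rows per value of a feature uses GROUP BY.
per_segment = conn.execute(
    "SELECT segment, COUNT(*) FROM customers GROUP BY segment ORDER BY segment"
).fetchall()
print(all_rows, with_email, per_segment)
```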
24
In the context of data quality, ________ is the extent to which information is adequately presented or delivered for efficient and effective understanding.

A) accuracy
B) timeliness
C) consistency
D) format
25
Explain how the Extract, Transform, and Load (ETL) integration process uses Hadoop to capture, store, process, secure, and then analyze complex data.
26
Identify the second step the MapReduce platform performs to solve problems of big data computation.

A) Provide SQL-like query to access data stored in different file systems and databases.
B) Map the data by dividing it into manageable subsets.
C) Distribute data to a group of networked computers for storing and processing.
D) Combine the answers from the computer nodes into a single answer for the original problem handled.
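The general shape of a MapReduce computation can be imitated in a single process. The word-count sketch below is a toy simulation, not Hadoop's API: the list of chunks stands in for data spread across networked computers, and every function name is hypothetical.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(records, n_chunks):
    """Divide the records into subsets, one per simulated node."""
    return [records[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Map step: each node counts the words in its own subset."""
    return Counter(word for line in chunk for word in line.split())

def reduce_counts(partials):
    """Reduce step: merge the per-node answers into a single result."""
    return reduce(lambda a, b: a + b, partials, Counter())

lines = ["big data", "data quality", "big value"]  # made-up input
partials = [map_chunk(c) for c in split_into_chunks(lines, 2)]
word_counts = reduce_counts(partials)
print(word_counts)
```

In a real cluster the map calls would run in parallel on separate machines; here they run one after another, which changes the speed but not the result.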
27
In the context of database management systems, primary keys and foreign keys are important in relational databases because they help

A) database users collate data from different tables.
B) manage the volume and speed of incoming data.
C) customers search for products online effectively.
D) databases consistently maintain the same properties.
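To see primary and foreign keys at work, the sketch below builds two small tables with Python's built-in `sqlite3` module. The schema and rows are hypothetical: `customers.customer_id` is a primary key, and `orders.customer_id` is a foreign key that refers back to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id),"
    " total REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
)

# The shared key lets one query draw on both tables.
rows = conn.execute(
    "SELECT c.name, SUM(o.total) FROM customers c"
    " JOIN orders o ON o.customer_id = c.customer_id"
    " GROUP BY c.customer_id, c.name ORDER BY c.name"
).fetchall()
print(rows)
```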
28
Imagine a company wants to send a follow-up email to its best customers immediately after a purchase. However, email addresses were not collected from the customers at the time of purchase. This is an example of missing the data quality of ________.

A) accuracy
B) timeliness
C) consistency
D) variety
29
A gigabyte of data can contain ________.

A) a single character of text
B) all the X-rays in a large hospital
C) a small novel
D) Beethoven's 5th Symphony
30
In the context of data preparation, a power calculation helps determine

A) if any data observations have a considerable distance to other data clusters.
B) whether an existing, valuable customer is likely to leave the business.
C) the what, when, and who of the analysis.
D) what outcomes will be estimated from a sample with a sufficient level of precision.
31
In data transformation, ________ involves creating a dichotomous value from a categorical value.

A) feature construction
B) overfitting
C) dummy coding
D) data compression
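Turning a categorical value into dichotomous (0/1) values can be sketched without any libraries. The helper name `dummy_code`, the choice to drop one reference category, and the channel values below are assumptions made for illustration.

```python
def dummy_code(values, reference=None):
    """Turn a categorical column into 0/1 indicator columns.

    One category (the reference) gets no column of its own; a row in that
    category is represented by zeros in every indicator.
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumption: first category alphabetically
    kept = [c for c in categories if c != reference]
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

channels = ["email", "store", "web", "store"]  # hypothetical purchase channels
coded = dummy_code(channels)
print(coded)
```

Dropping the reference category avoids a redundant column, since its value is implied whenever all the other indicators are zero.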