Question 1

Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"?

A)000
B)100
C)010
D)001

Accepted Answer

Given the order of options (hear account information, billing questions, customer service) and using 0-1 dummy variables, "customer service" corresponds to the third option, which is encoded as 001.
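As a rough illustration of the encoding described above, the following sketch builds the 0-1 dummy string for a chosen option. The names `OPTIONS` and `encode_dummy` are hypothetical and introduced only for this example.

```python
# Hypothetical helper: one 0-1 dummy variable per option, in the given order.
OPTIONS = ["hear account information", "billing questions", "customer service"]

def encode_dummy(choice, options=OPTIONS):
    """Return a string of 0/1 flags with a 1 in the position of `choice`."""
    if choice not in options:
        raise ValueError(f"unknown option: {choice!r}")
    return "".join("1" if option == choice else "0" for option in options)

print(encode_dummy("customer service"))  # 001
```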

Question 2

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?

A)The short leg
B)The long leg
C)The hypotenuse
D)Euclidean distance is not related to right triangles.

Accepted Answer

The hypotenuse of a right triangle represents the Euclidean distance between two observations in a cluster. The other two sides represent the difference in values between the variables being compared.
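Written out for two observations with two variables each, the relationship is the Pythagorean theorem, with the legs given by the per-variable differences. The symbols u = (u_1, u_2) and v = (v_1, v_2) are introduced here for illustration only.

```latex
d(u, v) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2}
```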

Question 3

__________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation.

A)Single linkage
B)Ward's method
C)Average group linkage
D)Dendrogram

Accepted Answer

Ward's method partitions observations in a way that minimizes the variance within each cluster, resulting in clusters that retain the greatest amount of information. Single linkage and average group linkage use other merge criteria that do not target information loss, and a dendrogram is a diagram of the merge sequence rather than a clustering method.
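A minimal sketch of the merge criterion, assuming that "information loss" is measured as the increase in within-cluster sum of squared distances to the centroid (the usual formulation of Ward's method). The function names and sample points are made up for this example.

```python
def centroid(cluster):
    """Component-wise mean of the observations in a cluster."""
    n = len(cluster)
    return [sum(p[i] for p in cluster) / n for i in range(len(cluster[0]))]

def within_ss(cluster):
    """Sum of squared distances from each observation to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)

a = [(1.0, 1.0), (2.0, 1.0)]
near = [(2.0, 2.0)]
far = [(8.0, 9.0)]

# Ward's method would merge the pair with the smaller cost first.
print(ward_merge_cost(a, near) < ward_merge_cost(a, far))  # True
```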

Question 4

In preparing categorical variables for analysis, it is usually best to

A)convert the categories to numeric representations.
B)convert the categories to binary, dummy variables.
C)combine as many categories as possible.
D)let them remain categorical.

Accepted Answer

Converting categorical variables to binary, dummy variables allows for easier comparison between categories and avoids assigning numerical values that may not accurately reflect the underlying relationship between the categories. Additionally, keeping the categories separate allows for more nuanced analysis and the ability to investigate relationships between specific categories.

Question 5

Average linkage is a measure of calculating dissimilarity between two clusters by

A)finding the distance between the two most dissimilar observations in the two clusters.
B)computing the average distance between every pair of observations between two clusters.
C)finding the distance between the two closest observations in the two clusters.
D)computing the distance between the cluster centroids.

Accepted Answer

Average linkage calculates the dissimilarity between two clusters based on the average distance between every pair of observations between the two clusters.
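The calculation can be sketched as follows, assuming Euclidean distance as the pairwise measure. The function name and sample clusters are illustrative only.

```python
from itertools import product
from math import dist

def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over every cross-cluster pair of observations."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

a = [(0.0, 0.0)]
b = [(3.0, 4.0), (6.0, 8.0)]

# The two pairwise distances are 5 and 10, so the average is 7.5.
print(average_linkage(a, b))  # 7.5
```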

Question 6

Euclidean distance can be used to measure the distance between __________ in cluster analysis.

A)objects
B)clusters
C)observations
D)ward

Accepted Answer

The answer of Euclidean distance can be used to measure...

Question 7

__________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

A)Single linkage
B)Complete linkage
C)Average linkage
D)Average group linkage

Accepted Answer

The measure described in the question is known as complete linkage. In complete linkage, the dissimilarity between clusters is calculated by considering the two most dissimilar observations between the clusters. This method tends to produce more compact clusters than single linkage, which can chain observations together into long, stretched-out clusters.
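To make the contrast concrete, here is a small sketch of complete linkage next to single linkage, assuming Euclidean distance between observations. The function names and data are invented for the example.

```python
from itertools import product
from math import dist

def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar cross-cluster observations."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))

def single_linkage(cluster_a, cluster_b):
    """Distance between the two most similar cross-cluster observations."""
    return min(dist(p, q) for p, q in product(cluster_a, cluster_b))

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (9.0, 0.0)]

print(complete_linkage(a, b))  # 9.0
print(single_linkage(a, b))    # 3.0
```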

Question 8

__________ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters.

A)Single linkage
B)Complete linkage
C)Average linkage
D)Centroid linkage

Accepted Answer

Centroid linkage is the method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters. Because it compares cluster means rather than individual extreme observations, it is typically less sensitive to outliers than single or complete linkage.
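A brief sketch of the idea, assuming the centroid is the component-wise mean of a cluster and that distance is Euclidean. The helper names and points are hypothetical.

```python
from math import dist

def centroid(cluster):
    """Component-wise mean of the observations in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def centroid_linkage(cluster_a, cluster_b):
    """Euclidean distance between the two cluster centroids."""
    return dist(centroid(cluster_a), centroid(cluster_b))

a = [(0.0, 0.0), (2.0, 0.0)]   # centroid (1, 0)
b = [(3.0, 4.0), (5.0, 4.0)]   # centroid (4, 4)

print(centroid_linkage(a, b))  # 5.0
```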

Question 9

k-means clustering is the process of

A)agglomerating observations into a series of nested groups based on a measure of similarity.
B)organizing observations into distinct groups based on a measure of similarity.
C)reducing the number of variables to consider in data-mining.
D)estimating the value of a continuous outcome variable.

Accepted Answer

k-means clustering is a process of dividing observations into distinct groups based on a measure of similarity. It does not involve agglomerating observations into nested groups or reducing variables, nor does it estimate the value of a continuous outcome variable.
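The grouping process can be sketched with the standard assign-then-update loop (often called Lloyd's algorithm). This is an illustrative implementation under simple assumptions: Euclidean distance, caller-supplied starting centroids, and made-up sample data.

```python
from math import dist

def k_means(points, initial_centroids, max_iter=100):
    """Group points into len(initial_centroids) clusters."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no observation changed cluster, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters is fixed by the number of starting centroids, and every observation ends up in exactly one group, with no nesting.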

Question 10

__________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.

A)Data mining
B)Unsupervised learning
C)Dimension reduction
D)Data sampling

Accepted Answer

Unsupervised learning approaches are specifically designed to identify patterns and relationships in data sets without pre-existing labels, making them suitable for exploring large data sets with many observations and variables.

Question 11

Jaccard's coefficient is different from the matching coefficient in that the former

A)measures overlap while the latter measures dissimilarity.
B)does not count matching zero entries while the latter does.
C)deals with categorical variables while the latter deals with continuous variables.
D)is affected by the scale used to measure variables while the latter is not.

Accepted Answer

The answer of Jaccard's coefficient is different from the matching...
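For comparison, the two coefficients can be sketched side by side under their standard definitions for 0-1 variables: the matching coefficient counts every agreeing variable, while Jaccard's coefficient leaves 0-0 agreements out of both numerator and denominator. Function names and sample vectors are illustrative, and the Jaccard sketch assumes the two observations are not all zeros.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of agreeing variables after discarding 0-0 matches."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    return both_one / (len(u) - both_zero)

u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```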

Question 12

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

A)66.21
B)72.28
C)75.39
D)88.57

Accepted Answer

The answer of Euclidean distance can be used to calculate...
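As a quick check of the arithmetic, the calculation can be sketched in Python using the coordinates given in the vectors u and v. The variable names are chosen for this example.

```python
from math import dist

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

# sqrt((53 - 25)^2 + (420 - 350)^2) = sqrt(784 + 4900)
distance = dist(u, v)
print(round(distance, 2))  # 75.39
```

Note that the spending difference (70) dominates the age difference (28) only because of the units involved, which is why variables are often standardized before clustering.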

Question 13

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

A)matching coefficient.
B)Jaccard's coefficient.
C)Euclidean distance.
D)antecedent.

Accepted Answer

The answer of When clustering only by dummy variables that...

Question 14

Single linkage is a measure of calculating dissimilarity between clusters by

A)considering only the two most dissimilar observations in the two clusters.
B)computing the average dissimilarity between every pair of observations between the two clusters.
C)considering only the two most similar observations in the two clusters.
D)considering the distance between the cluster centroids.

Accepted Answer

The answer of Single linkage is a measure of calculating...

Question 15

Which of the following is true of Euclidean distances?

A)It is used to measure dissimilarity between categorical variable observations.
B)It is not affected by the scale on which variables are measured.
C)It increases with the increase in similarity between variable values.
D)It is commonly used as a method of measuring dissimilarity between quantitative observations.

Accepted Answer

The answer of Which of the following is true of...

Question 16

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called

A)data visualization.
B)cluster analysis.
C)market analysis.
D)supervised learning.

Accepted Answer

The answer of The data preparation technique used in market...

Question 17

Suppose the dissimilarity between clusters A and B has the value 24 and the dissimilarity between cluster B and C has the value 12. Use McQuitty's method to determine the dissimilarity of clusters A and B.

A)12
B)18
C)24
D)36

Accepted Answer

The answer of Suppose the dissimilarity between clusters A and...

Question 18

The goal of __________ is to use the variable values to identify relationships between observations.

A)unsupervised learning
B)data mining
C)McQuitty's method
D)Ward's method

Accepted Answer

The answer of The goal of __________ is to use...

Question 19

Observation refers to the

A)estimated continuous outcome variable.
B)set of recorded values of variables associated with a single entity.
C)goal of predicting a categorical outcome based on a set of variables.
D)mean of all variable values associated with one particular entity.

Accepted Answer

The answer of Observation refers to the A)estimated continuous outcome variable. B)set...

Question 20

A method for modifying variables that reduces bias prior to cluster analysis is

A)standardization.
B)weighting.
C)removing outliers.
D)randomizing.

Accepted Answer

The answer of A method for modifying variables that reduces...

Question 21

An analysis of items frequently co-occurring in transactions is known as

A)market segmentation.
B)market basket analysis.
C)regression analysis.
D)cluster analysis.

Accepted Answer

The answer of An analysis of items frequently co-occurring in...

Question 22

In k-means clustering, k represents the

A)number of variables.
B)number of clusters.
C)number of observations in a cluster.
D)mean of the cluster.

Accepted Answer

The answer of In k-means clustering, k represents the A)number of...

Question 23

The __________ the lift ratio, the __________ the association rule.

A)higher; stronger
B)higher; weaker
C)lower; stronger
D)lower; weaker

Accepted Answer

The answer of The __________ the lift ratio, the __________...

Question 24

A cluster's __________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.

A)dimension
B)affordability
C)durability
D)span

Accepted Answer

The answer of A cluster's __________ can be measured by...

Question 25

In which of the following scenarios would it be appropriate to use hierarchical clustering?

A)When the number of observations in the dataset is relatively high
B)When it is not necessary to know the nesting of clusters
C)When the number of clusters is known beforehand
D)When binary or ordinal data needs to be clustered

Accepted Answer

The answer of In which of the following scenarios would...

Question 26

Which statement is true of an association rule?

A)It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
B)It is a data reduction technique that reduces large information into smaller homogeneous groups.
C)It uses analytic models to describe the relationship between metrics that drive business performance.
D)It seeks to classify a categorical outcome into two or more categories.

Accepted Answer

The answer of Which statement is true of an association...

Question 27

Single linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.

A)most similar
B)most different
C)farthest apart
D)closest

Accepted Answer

The answer of Single linkage can be used to measure...

Question 28

Complete linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.

A)most similar
B)most different
C)farthest apart
D)closest

Accepted Answer

The answer of Complete linkage can be used to measure...

Question 29

To identify patterns across transactions, we can use

A)association rules.
B)complete linkage.
C)centroid linkage.
D)k-means.

Accepted Answer

The answer of To identify patterns across transactions, we can...

Question 30

The strength of the association rule is known as __________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.

A)lift
B)antecedent
C)support count
D)consequent

Accepted Answer

The answer of The strength of the association rule is...

Question 31

A __________ refers to the number of times a collection of items occurs together in a transaction data set.

A)consequent
B)validation count
C)support count
D)antecedent

Accepted Answer

The answer of A __________ refers to the number of...

Question 32

Hierarchical clustering using __________ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level.

A)McQuitty's method
B)centroid linkage
C)median linkage
D)Ward's method

Accepted Answer

The answer of Hierarchical clustering using __________ results in a...

Question 33

The lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is

A)1.40.
B)0.54.
C)1.00.
D)0.75.

Accepted Answer

The answer of The lift ratio of an association rule...
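The arithmetic for this question can be sketched as below, using the definition of lift as confidence divided by the benchmark confidence (the share of transactions containing the consequent). The function and parameter names are assumptions made for the example.

```python
def lift_ratio(confidence, consequent_support):
    """Confidence divided by the benchmark confidence (support of the consequent)."""
    return confidence / consequent_support

lift = lift_ratio(confidence=0.45, consequent_support=6 / 10)
print(round(lift, 2))  # 0.75
```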

Question 34

__________ is a measure that computes the dissimilarity between a cluster AB and a cluster C by averaging the distance between A and C and the distance between B and C.

A)Ward's method
B)Jaccard's coefficient
C)McQuitty's method
D)None of these are correct.

Accepted Answer

The answer of __________ is a measure that computes the...

Question 35

Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?

A)100
B)125
C)150
D)175

Accepted Answer

The answer of Suppose that the confidence of an association...
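Rearranging the lift definition gives the count directly: the benchmark confidence equals confidence divided by lift, and multiplying it by the number of transactions yields the number that contain the consequent. A short sketch, with variable names chosen for illustration:

```python
confidence = 0.75
lift = 1.875
total_transactions = 250

# lift = confidence / benchmark, so benchmark = confidence / lift = 0.4
benchmark_confidence = confidence / lift
consequent_count = round(benchmark_confidence * total_transactions)
print(consequent_count)  # 100
```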

Question 36

Complete linkage can be used to measure the distance between _________ in cluster analysis.

A)objects
B)clusters
C)observations
D)wards

Accepted Answer

The answer of Complete linkage can be used to measure...

Question 37

The strength of a cluster can be measured by comparing the average distance in a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?

A)0.5
B)1
C)1.5
D)2

Accepted Answer

The answer of The strength of a cluster can be...

Question 38

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a

A)dendrogram.
B)scatter chart.
C)decile-wise lift chart.
D)cumulative lift tree.

Accepted Answer

The answer of A tree diagram used to illustrate the...

Question 39

The process of extracting useful information from text data is known as __________.

A)text mining
B)tokenization
C)stemming
D)corpus

Accepted Answer

The answer of The process of extracting useful information from...

Question 40

The endpoint of a k-means clustering algorithm occurs when

A)Euclidean distance between clusters is minimized.
B)Euclidean distance between observations in a cluster is maximized.
C)no further changes are observed in cluster structure and number.
D)all of the observations are encompassed within a single large cluster with mean k.

Accepted Answer

The answer of The endpoint of a k-means clustering algorithm...

Question 41

The process of converting a word to its stem, or root word, is referred to as __________.

A)data cleaning
B)stemming
C)tokenization
D)stacking

Accepted Answer

The answer of The process of converting a word to...

Question 42

In the text mining process, the text is first preprocessed by deriving a smaller set of _________ from the larger set of words contained in a collection of documents.

A)tokens
B)stems
C)terms
D)stack

Accepted Answer

The answer of In the text mining process, the text...

Question 43

A collection of text documents to be analyzed is called a ___________.

A)book
B)corpus
C)library
D)consequent

Accepted Answer

The answer of A collection of text documents to be...

Question 44

The process of dividing text into separate terms is referred to as __________.

A)data cleaning
B)stemming
C)tokenization
D)stacking

Accepted Answer

The answer of The process of dividing text into separate...

Deck 4: Descriptive Data Mining