Question 1

Which of the following statements a), b) or c) is false?

A) Different database users are often interested in different data and different relationships among the data.
B) Most users require only subsets of a database table's rows and columns.
C) You use Structured Query Language (SQL) to define queries. Queries specify which subsets of the data to select from a table.
D) All of the above statements are true.

Accepted Answer

All the statements provided are true regarding database usage and the functionality of SQL in querying databases. Statement A is true because different users may need access to different parts of the database depending on their role or interest. Statement B is true as most users do not need access to all the data within a table and often only require specific rows and columns. Statement C is accurate in describing the use of SQL for defining queries to select subsets of data from a table.

Question 2

Which of the following statements a), b) or c) is false?

A) A primary key is a column (or group of columns) with a value that's unique for each row. This guarantees that each row can be identified by its primary key.
B) Examples of primary keys are social security numbers, employee ID numbers and part numbers in an inventory system; values in each of these are guaranteed to be unique.
C) The rows of a relational database table are always listed in ascending order by primary key.
D) All of the above statements are true.

Accepted Answer

The statement "The rows of a relational database table are always listed in ascending order by primary key" is false. The rows of a relational table have no guaranteed order; there is no requirement that they be stored or returned sorted by primary key unless a query requests an order with an ORDER BY clause.
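A minimal sketch using Python's built-in sqlite3 module and a hypothetical parts table: the rows are inserted out of primary-key order, and the ORDER BY clause is what guarantees that the query returns them sorted.

```python
import sqlite3

# Hypothetical parts table in an in-memory SQLite database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE parts (part_number TEXT PRIMARY KEY, name TEXT)')
conn.executemany('INSERT INTO parts VALUES (?, ?)',
                 [('P300', 'gear'), ('P100', 'bolt'), ('P200', 'nut')])

# Without ORDER BY, SQL makes no promise about the order of the returned rows.
# ORDER BY on the primary key is what guarantees ascending key order.
ordered = [part_number for (part_number,) in conn.execute(
    'SELECT part_number FROM parts ORDER BY part_number')]
print(ordered)  # ['P100', 'P200', 'P300']
```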

Question 3

Which of the following statements is false?

A) An UPDATE statement modifies existing values in a table.
B) The UPDATE keyword is followed by the table to update, the keyword SET and a comma-separated list of column_name : value pairs indicating the columns to change and their new values.
C) An UPDATE's change will be applied to every row if you do not specify a WHERE clause. To make a change to only one row, it's best to use the row's unique primary key in the WHERE clause.
D) For statements that modify the database, the Cursor object's rowcount attribute contains an integer value representing the number of rows that were modified. If this value is 0, no changes were made.

Accepted Answer

Statement B is false. The SET clause takes a comma-separated list of column_name = value pairs; the correct syntax uses an equal sign (=) to assign new values to columns, not a colon (:).
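The following sketch, assuming a hypothetical authors table in an in-memory SQLite database, shows the SET column = value syntax, the effect of a primary-key WHERE clause, and what Cursor.rowcount reports in each case.

```python
import sqlite3

# Hypothetical authors table in an in-memory SQLite database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, first TEXT, last TEXT)')
conn.executemany('INSERT INTO authors (first, last) VALUES (?, ?)',
                 [('Paul', 'Deitel'), ('Harvey', 'Deitel'), ('Sue', 'Red')])
cursor = conn.cursor()

# SET takes column = value pairs; WHERE on the primary key changes one row.
cursor.execute("UPDATE authors SET last = 'Black' WHERE id = 3")
one_row = cursor.rowcount  # 1

# A WHERE clause that matches nothing modifies nothing.
cursor.execute("UPDATE authors SET last = 'Black' WHERE id = 99")
no_rows = cursor.rowcount  # 0

# Without a WHERE clause, the change applies to every row.
cursor.execute('UPDATE authors SET first = upper(first)')
all_rows = cursor.rowcount  # 3
conn.commit()
```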

Question 4

Which of the following statements is false?

A) You'll often select only a subset of the rows in a database that satisfy certain selection criteria.
B) Only rows that satisfy the selection criteria mentioned in Part (a), formally called predicates, are selected.
C) SQL's WHERE clause specifies a query's selection criteria.
D) The following code selects from a titles table the title, edition and copyright for all books with the copyright year 2016:
    pd.read_sql("""SELECT title, edition, copyright
                   FROM titles
                   WHERE copyright > '2016'""", connection)

Accepted Answer

The code in option D selects books with a copyright year greater than 2016, which means it does not select books with the copyright year 2016. Therefore, the statement that the code selects all books with the copyright year 2016 is false.
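To see the difference between > and =, here is a small sketch with a hypothetical titles table. It uses sqlite3 directly rather than pd.read_sql so that it needs only the standard library; passing the same query and connection to pd.read_sql would return the rows as a DataFrame instead.

```python
import sqlite3

# Hypothetical titles table with one book per copyright year.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE titles (isbn TEXT PRIMARY KEY, title TEXT, '
             'edition INTEGER, copyright TEXT)')
conn.executemany('INSERT INTO titles VALUES (?, ?, ?, ?)', [
    ('0000000001', 'Book A', 1, '2015'),
    ('0000000002', 'Book B', 2, '2016'),
    ('0000000003', 'Book C', 3, '2017'),
])

# > '2016' keeps only later years, so the 2016 book is excluded.
after_2016 = conn.execute("""SELECT title, edition, copyright
                             FROM titles
                             WHERE copyright > '2016'""").fetchall()

# = '2016' is the predicate that actually selects the 2016 book.
in_2016 = conn.execute("""SELECT title, edition, copyright
                          FROM titles
                          WHERE copyright = '2016'""").fetchall()
print(after_2016)  # [('Book C', 3, '2017')]
print(in_2016)     # [('Book B', 2, '2016')]
```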

Question 5

In SQL, a foreign key is a column in one table that matches a ________ column in another table.

A) domestic key
B) candidate key
C) primary key
D) None of the above

Accepted Answer

A foreign key is a column in one table that matches the primary key column in another table.

Question 6

________ in a pattern string indicates a single wildcard character at that position.

A) at sign (@)
B) underscore (_)
C) hash sign (#)
D) None of the above.

Accepted Answer

In an SQL pattern string (as used with LIKE), the underscore (_) is a wildcard that matches exactly one character at that position.
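A short sketch with a hypothetical authors table in SQLite: in a LIKE pattern, _ matches exactly one character, while % (SQL's other wildcard) matches any sequence of zero or more characters.

```python
import sqlite3

# Hypothetical authors table; only first names matter here.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE authors (first TEXT, last TEXT)')
conn.executemany('INSERT INTO authors VALUES (?, ?)',
                 [('Dan', 'Quirk'), ('Don', 'Smith'), ('Dean', 'Jones')])

# '_' matches exactly one character, so 'D_n' matches Dan and Don but not Dean.
one_char = [first for (first,) in conn.execute(
    "SELECT first FROM authors WHERE first LIKE 'D_n' ORDER BY first")]

# '%' matches zero or more characters, so 'D%n' also matches Dean.
any_chars = [first for (first,) in conn.execute(
    "SELECT first FROM authors WHERE first LIKE 'D%n' ORDER BY first")]
print(one_char)   # ['Dan', 'Don']
print(any_chars)  # ['Dan', 'Dean', 'Don']
```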

Question 7

In a relational database, every foreign-key value must appear as the primary-key value in a row of another table so the DBMS can ensure that the foreign-key value is valid. This is known as the ________.

A) Rule of Entity Integrity
B) Rule of Referential Integrity
C) Rule of Guaranteed Access
D) None of the above

Accepted Answer

The Rule of Referential Integrity states that every foreign key value must appear as a primary key value in another table in order to ensure that the foreign key value is valid. The other options listed (Rule of Entity Integrity and Rule of Guaranteed Access) do not relate to this specific concept.
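The sketch below shows a DBMS enforcing referential integrity, using SQLite with hypothetical authors and author_ISBN tables. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been executed on the connection.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute('PRAGMA foreign_keys = ON')
conn.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, last TEXT)')
conn.execute('CREATE TABLE author_ISBN '
             '(id INTEGER REFERENCES authors(id), isbn TEXT)')
conn.execute("INSERT INTO authors VALUES (1, 'Deitel')")

# Valid: foreign-key value 1 appears as a primary-key value in authors.
conn.execute("INSERT INTO author_ISBN VALUES (1, '0000000001')")

# Invalid: no author has id 99, so the DBMS rejects the row.
try:
    conn.execute("INSERT INTO author_ISBN VALUES (99, '0000000002')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```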

Question 8

Which of the following statements a), b) or c) is false?

A) As data continues growing exponentially, we want to learn from that data and do so at blazing speed.
B) Learning from big data requires sophisticated algorithms, hardware, software and networking designs.
C) With more data, and especially with big data, machine learning can be even more effective.
D) All of the above statements are true.

Accepted Answer

All the statements provided highlight key aspects of handling and learning from big data, emphasizing the importance of sophisticated technology and the effectiveness of machine learning with larger datasets. None of the statements is false.

Question 9

Which of the following statements is false?

A) A relational database is a logical table-based representation of data that allows the data to be accessed without consideration of its physical structure.
B) The following diagram shows a sample Employee table that might be used in a personnel system:
    [Diagram: sample Employee table]
C) Part (b)'s Employee table's primary purpose is to store employees' attributes.
D) Tables are composed of columns, each describing a single entity. In Part (b)'s Employee table, each column represents one employee. Columns are composed of rows containing individual attribute values.

Accepted Answer

The answer of Which of the following statements is false? A)...

Question 10

Which of the following statements a), b) or c) is false?

A) As big-data processing needs grow, the information-technology community is continually looking for ways to increase performance.
B) Spark was developed to perform certain big-data tasks more efficiently by breaking them into pieces that do lots of disk I/O across many computers.
C) Spark streaming processes streaming data in mini-batches. Spark streaming gathers data for a short time interval you specify, then gives you that batch of data to process.
D) You can use Spark SQL to query data stored in a Spark DataFrame which, unlike pandas DataFrames, may contain data distributed over many computers in a cluster.

Accepted Answer

Statement B is false. Spark was developed to overcome a limitation of Hadoop MapReduce, which does lots of disk I/O, by performing computations in memory. This approach significantly speeds up many big-data processing tasks.

Question 11

Which of the following statements a), b) or c) is false?

A) In the context of big data, NoSQL means what its name implies.
B) NoSQL databases are meant for unstructured data, like photos, videos and the natural language found in e-mails, text messages and social-media posts, and semi-structured data like JSON and XML documents.
C) Semi-structured data often wraps unstructured data with additional information called metadata.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 12

Which of the following statements is false?

A) For decades, relational database management systems (RDBMSs) have been the standard in data processing.
B) RDBMSs require unstructured data that fits into neat rectangular tables.
C) As the size of the data and the number of tables and relationships increases, relational databases become more difficult to manipulate efficiently.
D) NoSQL and NewSQL databases have emerged to deal with the kinds of big data storage and processing demands that traditional relational databases cannot meet.

Accepted Answer

The answer of Which of the following statements is false? A)...

Question 13

Which of the following statements a), b) or c) is false?

A) The open-source SQLite database management system is included with Python.
B) Only the SQLite database management system has Python support.
C) Each database management system that has Python support typically provides a module that adheres to Python's Database Application Programming Interface (DB-API), which specifies common object and method names for manipulating any database.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 14

Which of the following statements a), b) or c) is false?

A) You can merge data from multiple tables, referred to as joining the tables, with INNER JOIN.
B) The INNER JOIN's ON clause uses a primary-key column in one table and a foreign-key column in the other table to determine which rows to merge from each table.
C) Qualified name syntax (tableName.columnName) is required if the columns have the same name in both tables.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...
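A minimal INNER JOIN sketch, assuming hypothetical authors and author_ISBN tables that both contain an id column. The ON clause matches the primary key authors.id with the foreign key author_ISBN.id, and because the column name is shared, each reference is qualified; SQLite rejects an unqualified id as ambiguous.

```python
import sqlite3

# Hypothetical tables: authors.id is the primary key, author_ISBN.id a foreign key.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, first TEXT, last TEXT)')
conn.execute('CREATE TABLE author_ISBN (id INTEGER, isbn TEXT)')
conn.executemany('INSERT INTO authors VALUES (?, ?, ?)',
                 [(1, 'Paul', 'Deitel'), (2, 'Harvey', 'Deitel')])
conn.executemany('INSERT INTO author_ISBN VALUES (?, ?)',
                 [(1, '111'), (1, '222'), (2, '111')])

# ON pairs authors.id with author_ISBN.id; both names are qualified.
rows = conn.execute('''SELECT first, last, isbn
                       FROM authors
                       INNER JOIN author_ISBN
                           ON authors.id = author_ISBN.id
                       ORDER BY last, first, isbn''').fetchall()
print(rows)

# An unqualified id is ambiguous in the joined result, and SQLite rejects it.
try:
    conn.execute('SELECT id FROM authors INNER JOIN author_ISBN '
                 'ON authors.id = author_ISBN.id')
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True
```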

Question 15

Which of the following statements is false?

A) SQL can be used only to retrieve data from a relational database.
B) The pandas method read_sql uses a Cursor behind the scenes to execute queries and access the rows of the results.
C) The INSERT INTO statement inserts a row into a table.
D) The SQL keywords INSERT INTO are followed by the table in which to insert the new row and a comma-separated list of column names in parentheses.

Accepted Answer

The answer of Which of the following statements is false? A)...
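A sketch of INSERT INTO with an explicit column list, using a hypothetical authors table in SQLite. The DELETE at the end is included to show that SQL modifies data as well as retrieving it.

```python
import sqlite3

# Hypothetical authors table; id is an INTEGER PRIMARY KEY, so SQLite assigns it.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, first TEXT, last TEXT)')
cursor = conn.cursor()

# INSERT INTO table (column, ...) VALUES (value, ...); id is omitted from the list.
cursor.execute("INSERT INTO authors (first, last) VALUES ('Sue', 'Red')")
new_id = cursor.lastrowid  # the id SQLite generated for the new row

# SQL also changes and removes data; here the new row is deleted by primary key.
cursor.execute('DELETE FROM authors WHERE id = ?', (new_id,))
deleted = cursor.rowcount
conn.commit()
print(new_id, deleted)  # 1 1
```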

Question 16

A goal when designing a relational database is to minimize data ________ among the tables.

A) dependency
B) binding
C) duplication
D) None of the above

Accepted Answer

The answer of A goal...

Question 17

Which of the following statements a), b) or c) is false?

A) Much of today's data is so large that it cannot fit on one system.
B) As big data grew, we needed distributed data storage and parallel processing capabilities to process vast amounts of data more efficiently. This led to complex technologies like Apache Hadoop for distributed data processing with massive parallelism among clusters of computers where the intricate details are handled for you automatically and correctly.
C) You can configure a multi-node Hadoop cluster using the Microsoft Azure HDInsight cloud service, then use it to execute a Hadoop MapReduce job implemented in Python.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 18

In the SQL query:

    SELECT * FROM authors

the asterisk (*) is a ________ indicating that the query should get all the columns from the authors table.

A) potpourri character
B) catchall
C) wildcard
D) None of the above

Accepted Answer

The answer of In the SQL query: SELECT * FROM...

Question 19

Which of the following statements is false?

A) A database is an integrated collection of data.
B) Database management systems allow for convenient access and storage of data without concern for the internal representation of databases.
C) Relational database management systems (RDBMSs) store data in tables and define relationships among the tables.
D) Table Query Language is used almost universally with relational database systems to manipulate data and perform queries, which request information that satisfies given criteria.

Accepted Answer

The answer of Which of the following statements is false? A)...

Question 20

Which of the following statements a), b) or c) is false?

A) Each column in a relational database table represents a different data attribute.
B) Columns are unique (by primary key) within a table, but particular row values may be duplicated between columns.
C) Several rows in an Employee table's Department column could contain the same department number.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 21

Which of the following statements a), b) or c) is false?

A) The types of applications that use NoSQL databases typically do not require the guarantees that ACID-compliant databases provide.
B) Many NoSQL databases typically adhere to the BASE (Basic Availability, Soft-state, Eventual consistency) model, which focuses more on the database's availability.
C) Whereas BASE databases guarantee consistency when you write to the database, ACID databases provide consistency at some later point in time.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 22

A ________ shades areas in a Folium map using the values you specify to determine color.

A) chromatic
B) choropleth
C) variagator
D) None of the above

Accepted Answer

The answer of A ________ shades areas in a Folium...

Question 23

Which of the following statements a), b) or c) about Google's initial search implementation is false?

A) Google developed a clustering system, tying together vast numbers of inexpensive "commodity computers," called nodes.
B) Because having more computers and more connections between them meant a greater chance of hardware failures, Google also built in high levels of redundancy to ensure that the system would continue functioning even if nodes within clusters failed.
C) The data was distributed across all the inexpensive "commodity computers." To satisfy a search request, all the computers in the cluster searched in parallel the portion of the web they stored locally. Then the results of those searches were gathered up and reported back to the user.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 24

Which of the following statements a), b) or c) is false?

A) When Google was launched in 1998, there were approximately 2.4 million websites, truly big data at the time. Today there are nearly two billion websites (almost a thousandfold increase since 1998).
B) When Google was developing their search engine, they knew that they needed to return search results quickly. The only practical way to do this was to store and index the entire Internet using a clever combination of secondary storage and main memory.
C) Popular computers of that time couldn't hold that amount of data and could not analyze that amount of data fast enough to guarantee prompt search-query responses.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 25

Which of the following statements a), b) or c) is false?

A) A document database stores semi-structured data, such as JSON or XML documents.
B) In document databases, you typically add indexes for specific attributes, so you can more efficiently locate and manipulate documents.
C) The most popular document database (and most popular overall NoSQL database) is Neo4j.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 26

Which of the following statements a), b) or c) about graph databases is false?

A) A graph database models relationships between objects.
B) The objects are called nodes (or vertices) and the relationships are called edges.
C) Edges are bidirectional.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 27

Relational databases typically use ACID (Atomicity, Consistency, Isolation, Durability) transactions. Which of the following ACID attributes is described by "ensures that the database is modified only if all of a transaction's steps are successful"?

A) Atomicity
B) Consistency
C) Isolation
D) Durability

Accepted Answer

The answer of Relational databases typically use ACID (Atomicity, xe...
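Atomicity can be observed with sqlite3. In this sketch a hypothetical transfer between two accounts runs inside the connection's context manager, which commits if every statement succeeds and rolls back if any statement raises. The accounts table, its CHECK constraint and the transfer helper are assumptions made for the demonstration.

```python
import sqlite3

# Hypothetical accounts table; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE accounts (name TEXT PRIMARY KEY, '
             'balance INTEGER NOT NULL CHECK (balance >= 0))')
conn.executemany('INSERT INTO accounts VALUES (?, ?)', [('A', 100), ('B', 0)])
conn.commit()

def transfer(amount):
    """Hypothetical helper: move amount from A to B as one transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'",
                     (amount,))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'",
                     (amount,))

transfer(30)  # both steps succeed, so both changes are committed

try:
    transfer(500)  # step 1 succeeds, step 2 violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the rollback also undid step 1's credit to B

balances = dict(conn.execute('SELECT name, balance FROM accounts ORDER BY name'))
print(balances)  # {'A': 70, 'B': 30}
```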

Question 28

Which of the following statements a), b) or c) is false?

A) The four NoSQL database categories are hierarchical, document, columnar (also called column-based) and graph.
B) NewSQL databases blend features of relational and NoSQL databases.
C) We presented a case study in which we stored and manipulated a large number of JSON tweet objects in a NoSQL document database, then summarized the data in an interactive visualization displayed on a Folium map of the United States.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 29

Which of the following statements a), b) or c) is false?

A) MongoDB is a document database capable of storing and retrieving JSON documents.
B) Twitter's APIs return tweets to you as JSON objects, which you can write directly into a MongoDB database.
C) MongoDB provides the free cloud-based MongoDB Atlas cluster for installation on your local computer.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 31

Which of the following statements a), b) or c) is false?

A) A graph database stores nodes, edges and their attributes. If you use social networks, like Instagram, Snapchat, Twitter and Facebook, consider your social graph, which consists of the people you know (nodes) and the relationships between them (edges). Every person has their own social graph, and these are interconnected.
B) The famous "six degrees of separation" problem says that any two people in the world are connected to one another by following a maximum of six edges in the worldwide social graph.
C) Facebook's algorithms use the social graphs of their billions of users to determine which stories should appear in each user's news feed.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...
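Counting the edges between two people in a social graph is a shortest-path problem, and on an unweighted graph breadth-first search solves it. The sketch below uses a small hypothetical social graph stored as a Python dictionary of adjacency lists rather than a real graph database.

```python
from collections import deque

# Hypothetical social graph: each person (node) maps to the people they know (edges).
social_graph = {
    'Ann': ['Bob', 'Cai'],
    'Bob': ['Ann', 'Dee'],
    'Cai': ['Ann'],
    'Dee': ['Bob', 'Eve'],
    'Eve': ['Dee'],
    'Fay': [],
}

def degrees_of_separation(graph, start, goal):
    """Return the fewest edges between start and goal, or None if unconnected."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            return distances[person]
        for friend in graph.get(person, []):
            if friend not in distances:  # first visit is the shortest path in BFS
                distances[friend] = distances[person] + 1
                queue.append(friend)
    return None

print(degrees_of_separation(social_graph, 'Cai', 'Eve'))  # 4 (Cai-Ann-Bob-Dee-Eve)
```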

Question 32

Which of the following statements is false?

A) Pandas DataFrame method groupby groups data by a specified column's values, as in:
    tweets_counts_by_state = tweet_counts_df.groupby(
        'State', as_index=False).sum()
B) The as_index=False keyword argument in Part (a) indicates that the values on which grouping was performed ('State' in this case) should be values in a row of the resulting GroupBy object, rather than the indices for the columns.
C) The GroupBy object's sum method, which is called at the end of the snippet in Part (a), totals the GroupBy object's numeric data by 'State'.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements is false? A)...
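A small pandas sketch with hypothetical tweet counts, showing what as_index=False changes: the grouping values stay in an ordinary 'State' column and the result gets a default integer row index, whereas the default as_index=True moves the 'State' values into the row index.

```python
import pandas as pd

# Hypothetical per-senator tweet counts; several rows share a state.
tweet_counts_df = pd.DataFrame({
    'State': ['NY', 'CA', 'NY', 'CA', 'TX'],
    'Tweets': [10, 4, 6, 1, 7],
})

# as_index=False keeps 'State' as a column, one state per row, with a 0..n-1 index.
tweets_counts_by_state = tweet_counts_df.groupby('State', as_index=False).sum()
print(tweets_counts_by_state)

# The default (as_index=True) makes the 'State' values the row index instead.
by_state_indexed = tweet_counts_df.groupby('State').sum()
print(by_state_indexed)
```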

Question 33

Which of the following statements a), b) or c) is false?

A) Like Python dictionaries, key-value databases store key-value pairs, but they're optimized for distributed systems and big-data processing.
B) For performance, key-value databases tend to replicate data in multiple cluster nodes.
C) Some key-value databases are implemented in memory for performance, and others store data on disk.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 34

Which of the following statements about text searching in MongoDB is false?

A) To text search in MongoDB, you must create a text index for the collection. This specifies which document field(s) to search.
B) Each text index is defined as a tuple containing the field name to search and the index type ('text_index').
C) MongoDB's wildcard specifier $** indicates that every text field in a document should be indexed for a full-text search.
D) Once an index is defined for a Collection, you can use its count_documents method to count the total number of documents in the collection that contain the specified text.

Accepted Answer

The answer of Which of the following statements about text...

Question 35

Which of the following statements a), b) or c) is false?

A) To store tweets' JSON as documents in a MongoDB database, you must first connect to your MongoDB Atlas cluster via a pymongo MongoClient, which receives your cluster's connection string as its argument, as in:
    from pymongo import MongoClient
    atlas_client = MongoClient(keys.mongo_connection_string)
B) The following code uses a pymongo MongoClient to get a pymongo Database object representing a senators database, creating the database if it does not exist:
    db = atlas_client.senators
C) Before storing JSON objects in a collection of a MongoDB database, you must explicitly create the collection.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 36

________ your IP address is a MongoDB security measure which ensures that only IP addresses you verify are allowed to interact with your MongoDB Atlas cluster.

A) Blocking
B) Blacklisting
C) Whitelisting
D) None of the above.

Accepted Answer

The answer of ________ your IP address is a MongoDB...

Question 37

The following code loads senators.csv into a ________.

    import pandas as pd
    senators_df = pd.read_csv('senators.csv')

A) NumPy two-dimensional array
B) two-dimensional list
C) pandas DataFrame
D) dictionary

Accepted Answer

The answer of The following code loads senators.csv into a...

Question 38

YouTube videos including the associated metadata are ________ data.

A) semi-structured
B) structured
C) unstructured
D) None of the above

Accepted Answer

The answer of YouTube videos including the associated metadata are...

Question 39

Which of the following statements about columnar databases a), b) or c) is false?

A) A columnar database is similar to a relational database, but it stores unstructured data in columns rather than rows.
B) Because all of a column's elements are stored together, selecting all the data for a given column is more efficient.
C) Consider our authors table in the books database:

    id  first      last
    1   Paul       Deitel
    2   Harvey     Deitel
    3   Abbey      Deitel
    4   Dan        Quirk
    5   Alexander  Wald

If we consider each row as a Python tuple, the rows would be represented as (1, 'Paul', 'Deitel'), (2, 'Harvey', 'Deitel'), etc. In a columnar database, all the values for a given column would be stored together, as in (1, 2, 3, 4, 5), ('Paul', 'Harvey', 'Abbey', 'Dan', 'Alexander') and ('Deitel', 'Deitel', 'Deitel', 'Quirk', 'Wald').
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements about columnar...
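The row-versus-column layout described in Part (c) can be mimicked in plain Python: zip(*rows) transposes the row tuples into one tuple per column. This only illustrates the storage layout; it is not how any particular columnar database is implemented.

```python
# Row-oriented layout: one tuple per row of the authors table.
rows = [
    (1, 'Paul', 'Deitel'),
    (2, 'Harvey', 'Deitel'),
    (3, 'Abbey', 'Deitel'),
    (4, 'Dan', 'Quirk'),
    (5, 'Alexander', 'Wald'),
]

# Column-oriented layout: zip(*rows) transposes so each column's values sit together.
id_col, first_col, last_col = zip(*rows)

# Reading one column now touches a single tuple instead of every row.
print(last_col)  # ('Deitel', 'Deitel', 'Deitel', 'Quirk', 'Wald')
```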

Question 40

A JSON dialect called ________ describes the boundaries of shapes such as countries, states, etc. for use in maps.

A) BoundaryJSON
B) GeoJSON
C) TopographyJSON
D) None of the above

Accepted Answer

The answer of A JSON dialect called ________ describes the...

Question 41

Consider the following reducer code:

     1  #!/usr/bin/env python3
     2  # length_reducer.py
     3  """Counts the number of words with each length."""
     4  import sys
     5  from itertools import groupby
     6  from operator import itemgetter
     7
     8  def tokenize_input():
     9      """Split each line of standard input into a key and a value."""
    10      for line in sys.stdin:
    11          yield line.strip().split('\t')
    12
    13  # produce key-value pairs of word lengths and counts separated by tabs
    14  for word_length, group in groupby(tokenize_input(), itemgetter(0)):
    15      try:
    16          total = sum(int(count) for word_length, count in group)
    17          print(word_length + '\t' + str(total))
    18      except ValueError:
    19          pass  # ignore word if its count was not an integer

Which of the following statements a), b) or c) is false?

A) When the MapReduce algorithm executes this reducer, lines 14-19 use the groupby function from the itertools module to group all word lengths of the same value. The first argument calls tokenize_input to get the lists representing the key-value pairs. The second argument indicates that the key-value pairs should be grouped based on the element at index 0 in each list, that is, the key.
B) Line 16 totals all the counts for a given key. Line 17 outputs a new key-value pair consisting of the word length and the total number of words of that length.
C) The MapReduce algorithm takes all the final word length and count outputs and writes them to a file in HDFS, the Hadoop file system.
D) All of the above statements are true.

Accepted Answer

The answer of Consider the following reducer code: 1 #!/usr/bin/env...
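The sketch below reproduces the reducer's grouping logic as a function over an in-memory list of lines (a hypothetical stand-in for sys.stdin), so it can be run without Hadoop. It also shows why the reducer relies on Hadoop sorting the mapper's output by key first: itertools.groupby merges only adjacent items with equal keys.

```python
from itertools import groupby
from operator import itemgetter

def reduce_lengths(lines):
    """Apply the reducer's logic to tab-separated key/value lines.

    lines is a hypothetical in-memory stand-in for sys.stdin.
    """
    pairs = (line.strip().split('\t') for line in lines)
    results = []
    for word_length, group in groupby(pairs, itemgetter(0)):
        try:
            total = sum(int(count) for _, count in group)
            results.append(word_length + '\t' + str(total))
        except ValueError:
            pass  # skip a key whose counts are not integers
    return results

# Input sorted by key, as Hadoop delivers it: each key forms one group.
sorted_lines = ['2\t1\n', '2\t1\n', '3\t1\n', '5\t1\n', '5\t1\n']
print(reduce_lengths(sorted_lines))    # ['2\t2', '3\t1', '5\t2']

# Unsorted input: groupby starts a new group whenever the key changes,
# so key '2' is reported twice.
unsorted_lines = ['2\t1\n', '3\t1\n', '2\t1\n']
print(reduce_lengths(unsorted_lines))  # ['2\t1', '3\t1', '2\t1']
```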

Question 42

Which of the following statements a), b) or c) is false?

A) For high performance, Spark distributes the operations you specify in Python to the cluster's nodes for parallel execution. Spark streaming enables you to process data as it's received.
B) Pandas DataFrames enable you to view RDDs as a collection of named columns. You can use pandas DataFrames with Spark SQL to perform queries on distributed data.
C) Spark also includes Spark MLlib (the Spark Machine Learning Library), which enables you to perform machine-learning algorithms.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 43

Which Hadoop ecosystem technology is described by "A service for managing cluster configurations and coordination between clusters"?

A) Sqoop
B) Storm
C) ZooKeeper
D) None of the above

Accepted Answer

The answer of Which Hadoop ecosystem technology is described by...

Question 44

Hadoop streaming uses the standard input and standard output streams as follows:

A) Hadoop supplies the input to the mapping script, called the mapper. This script reads its input from the standard input stream. The mapper writes its results to the standard output stream.
B) Hadoop supplies the mapper's output as the input to the reduction script, called the reducer, which reads from the standard input stream.
C) The reducer writes its results to the standard output stream. Hadoop writes the reducer's output to the Hadoop file system (HDFS).
D) All of the above statements are true.

Accepted Answer

The answer of Hadoop streaming uses the standard input and...
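Hadoop streaming's use of standard streams can be imitated locally with subprocess. This sketch runs a hypothetical one-line mapper in a separate Python process, writes the input to that process's standard input and captures what it prints to standard output. It is only an analogy for what Hadoop does on each node.

```python
import subprocess
import sys

# Hypothetical mapper: read words from stdin, write "length<TAB>1" lines to stdout.
MAPPER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    for word in line.split():\n"
    "        print(f'{len(word)}\\t1')\n"
)

# Run it as a separate process, redirecting its standard streams the way
# Hadoop streaming does: our string becomes its stdin, and we capture its stdout.
result = subprocess.run(
    [sys.executable, '-c', MAPPER_SOURCE],
    input='big data\n',
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.splitlines())  # ['3\t1', '4\t1']
```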

Question 45

Which of the following statements about MapReduce is false?

A) In the MapReduce step, Hadoop divides the data into batches that it distributes across the nodes in the cluster.
B) Hadoop also distributes the MapReduce task's code to the nodes in the cluster and executes the code on one node at a time sequentially. Each node processes only the batch of data stored on that node.
C) The reduction step combines the results from all the nodes to produce the final result.
D) To coordinate all this, Hadoop uses YARN ("yet another resource negotiator") to manage all the resources in the cluster and schedule tasks for execution.

Accepted Answer

The answer of Which of the following statements about MapReduce...

Question 46

Consider the following reducer code:

    #!/usr/bin/env python3
    # length_reducer.py
    """Counts the number of words with each length."""
    import sys
    from itertools import groupby
    from operator import itemgetter

    def tokenize_input():
        """Split each line of standard input into a key and a value."""
        for line in sys.stdin:
            yield line.strip().split('\t')

    # produce key-value pairs of word lengths and counts separated by tabs
    for word_length, group in groupby(tokenize_input(), itemgetter(0)):
        try:
            total = sum(int(count) for word_length, count in group)
            print(word_length + '\t' + str(total))
        except ValueError:
            pass  # ignore word if its count was not an integer

Which of the following statements a), b) or c) is false?

A) Function tokenize_input is a generator function that reads and splits the key-value pairs produced by the mapper.
B) The mapper script sends its output directly to the reducer script.
C) For each line, tokenize_input strips any leading or trailing whitespace (such as the terminating newline) and yields a list containing the key and a value.
D) All of the above statements are true.

Accepted Answer

The answer of Consider the following reducer code: 1 #!/usr/bin/env...
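Because tokenize_input is a generator function, it reads and splits one line at a time, only when the next key-value pair is requested. In this sketch an io.StringIO object stands in for sys.stdin (an assumption made so the code runs without Hadoop); on a real cluster, Hadoop sorts the mapper's output by key and feeds it to the reducer's standard input.

```python
import io

def tokenize_input(stream):
    """Split each line of stream into a [key, value] list.

    Same body as the reducer's generator, but the stream is a parameter
    (an assumption for testing) instead of sys.stdin.
    """
    for line in stream:
        yield line.strip().split('\t')

# Sorted mapper output, as Hadoop would deliver it to the reducer.
fake_stdin = io.StringIO('3\t1\n5\t1\n5\t1\n')

tokens = tokenize_input(fake_stdin)  # creates a generator; nothing is read yet
first = next(tokens)                 # reads exactly one line
rest = list(tokens)                  # consumes the remaining lines
print(first)  # ['3', '1']  (strip removed the trailing newline)
print(rest)   # [['5', '1'], ['5', '1']]
```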

Question 47

Which of the following statements a), b) or c) is false?

A) For languages like Python that are not natively supported in Hadoop, you must use Hadoop streaming to implement your tasks.
B) In Hadoop streaming, the Python scripts that implement the mapping and reduction steps use network sockets to communicate with Hadoop.
C) Usually, the standard input stream reads from the keyboard and the standard output stream writes to the command line. However, these can be redirected (as Hadoop does) to read from other sources and write to other destinations.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...
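A minimal sketch of the redirection idea in statement C, assuming in-memory io.StringIO objects as stand-ins for the streams Hadoop would supply. The mapper (word_count_mapper, a hypothetical word-count example) only reads sys.stdin and writes with print, so it runs unchanged whether the streams are attached to a terminal or redirected.

```python
import io
import sys
from contextlib import redirect_stdout


def word_count_mapper():
    """Hypothetical mapper: emit 'word<TAB>1' for each word on stdin."""
    for line in sys.stdin:
        for word in line.split():
            print(word.lower() + '\t1')


def run_with_redirected_streams(func, input_text):
    """Run func with stdin and stdout redirected to in-memory text streams."""
    captured = io.StringIO()
    original_stdin = sys.stdin
    sys.stdin = io.StringIO(input_text)  # stdin now reads from a string
    try:
        with redirect_stdout(captured):  # stdout now writes to a buffer
            func()
    finally:
        sys.stdin = original_stdin  # always restore the real stdin
    return captured.getvalue()


result = run_with_redirected_streams(word_count_mapper, 'Spark and Hadoop\n')
print(repr(result))  # 'spark\t1\nand\t1\nhadoop\t1\n'
```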

Question 48

Which of the following statements a), b) or c) is false?
A) Docker is a tool for packaging software into containers that bundle everything required to execute that software across platforms.
B) Some software packages require complicated setup and configuration. For many of these, there are preexisting Docker containers that you can download for free and execute locally on your desktop or notebook computers.
C) You can create custom Docker containers that are configured with the versions of every piece of software and every library you used in your study. This would enable others to recreate the environment you used, then reproduce your work, and will help you reproduce your results at a later time.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 49

Which of the following statements a), b) or c) is false?
A) When you process truly big data, performance is crucial.
B) Spark is geared to disk-based batch processing: reading the data from disk, processing the data and writing the results back to disk.
C) Many big-data applications demand better performance than is possible with disk-intensive operations. In particular, fast streaming applications that require either real-time or near-real-time processing won't work in a disk-based architecture.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 50

Which Hadoop ecosystem technology is described by "SQL querying of non-relational data in Hadoop and NoSQL databases"?
A) Ambari
B) Drill
C) Flume
D) HBase

Accepted Answer

The answer of Which Hadoop ecosystem technology is described by...

Question 51

Which of the following statements is false?
A) To develop its initial search implementation, Google needed to develop the clustering hardware and software, including distributed storage.
B) Google published its designs in the research paper "The Google File System," but did not open source its software.
C) Programmers at Yahoo!, working from Google's designs in the "Google File System" paper, then built their own system.
D) Yahoo! open-sourced their work and the Eclipse Foundation implemented the system as Hadoop.

Accepted Answer

The answer of Which of the following statements is false? A)...

Question 52

Which of the following Hadoop ecosystem technologies is described by "real-time messaging, stream processing and storage, typically to transform and process high-volume streaming data, such as website activity and streaming IoT data"?
A) Hive
B) Impala
C) Kafka
D) Pig

Accepted Answer

The answer of Which of the following Hadoop ecosystem technologies...

Question 53

Which of the following statements a), b) or c) is false?
A) By default, Hadoop expects the mapper's output and the reducer's input and output to be in the form of key-value pairs separated by a tab.
B) In a mapper script, the notation #!/usr/bin/env python3 tells Hadoop to execute the Python code using python3. This line must come before all other comments and code in the file.
C) At the time of this writing, Microsoft HDInsight clusters contain Python 2.7.12 and Python 3.5.2, so you can use f-strings in your code.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...
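A short sketch of the default record format described in statement A, written without f-strings so it would also run on Python 3.5 (f-strings were added in Python 3.6). format_record and parse_record are hypothetical helper names. Splitting at the first tab with str.partition keeps any later tabs in the value, and a line without a tab becomes a key with an empty value here, which approximates how Hadoop streaming treats such lines by default.

```python
def format_record(key, value):
    """Build one tab-separated key-value line (no f-string, so 3.5-safe)."""
    return '{}\t{}'.format(key, value)


def parse_record(line):
    """Split a line at its first tab; everything after it is the value."""
    key, _, value = line.rstrip('\n').partition('\t')
    return key, value  # with no tab, the whole line is the key


print(parse_record(format_record(7, 1) + '\n'))  # ('7', '1')
```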

Question 54

Which of the following statements is false?
A) Two key Hadoop components are HDFS (Hadoop Distributed File System) for storing massive amounts of data throughout a cluster, and MapReduce for implementing the tasks that process the data.
B) Hadoop MapReduce is similar in concept to functional-style programming, just on a massively parallel scale.
C) A MapReduce task performs two steps: mapping and reduction.
D) The mapping step processes the original data across the entire cluster and maps it into tuples of key-value pairs. The reduction step, which also may include filtering, then combines those tuples to produce the results of the MapReduce task.

Accepted Answer

The answer of Which of the following statements is false? A)...
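The functional-style analogy in statement B can be seen in a single Python process: map() produces key-value tuples and functools.reduce() combines them into final results. This is only a conceptual sketch (add_pair is a hypothetical helper); Hadoop performs the equivalent steps across a cluster.

```python
from functools import reduce


def add_pair(totals, pair):
    """Reducer: fold one (key, value) tuple into a dict of running totals."""
    key, value = pair
    new_totals = dict(totals)  # copy so the fold has no side effects
    new_totals[key] = new_totals.get(key, 0) + value
    return new_totals


words = 'to be or not to be'.split()
pairs = map(lambda word: (word, 1), words)  # mapping step: key-value tuples
counts = reduce(add_pair, pairs, {})        # reduction step: combine tuples
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```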

Question 55

Which of the following statements a), b) or c) is false?
A) Most major cloud vendors have support for Hadoop and Spark computing clusters that you can configure to meet your application's requirements.
B) Multi-node cloud-based clusters typically are free services.
C) Microsoft Azure's HDInsight service provides Hadoop capabilities.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 56

Which of the following statements is false?
A) Spark was initially developed in 2009 at U. C. Berkeley and funded by DARPA (the Defense Advanced Research Projects Agency).
B) Spark was created as a distributed execution engine for high-performance natural language processing.
C) Spark uses an in-memory architecture that "has been used to sort 100 TB of data 3X faster than Hadoop MapReduce on 1/10th of the machines" and runs some workloads up to 100 times faster than Hadoop.
D) Spark's significantly better performance on batch-processing tasks is leading many companies to replace Hadoop MapReduce with Spark.

Accepted Answer

The answer of Which of the following statements is false? A)...

Question 57

Consider the following mapper code:

1  #!/usr/bin/env python3
2  # length_mapper.py
3  """Maps lines of text to key-value pairs of word lengths and 1."""
4  import sys
5
6  def tokenize_input():
7      """Split each line of standard input into a list of strings."""
8      for line in sys.stdin:
9          yield line.split()
10
11 # read each line in the standard input and for every word
12 # produce a key-value pair containing the word's length, a tab and 1
13 for line in tokenize_input():
14     for word in line:
15         print(str(len(word)) + '\t1')

Which of the following statements a), b) or c) is false?
A) Generator function tokenize_input reads lines of text from the standard input stream and for each line yields a list of strings.
B) When Hadoop executes the script, lines 13-15 iterate through the lists of strings from tokenize_input. For each list (line) and for every string (word) in that list, the script outputs a key-value pair with the word's length as the key, a tab (\t) and the value 1, indicating that there is one word (so far) of that length. Of course, there probably are many words of that length.
C) The MapReduce algorithm's reduction step will summarize these key-value pairs.
D) All of the above statements are true.

Accepted Answer

The answer of Consider the following mapper code: 1 #!/usr/bin/env...
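One way to check a mapper like length_mapper.py without a cluster is to move its loop into a generator that accepts any iterable of lines. The sketch below assumes a hypothetical map_lengths function; under Hadoop you would pass it sys.stdin and print each record it yields.

```python
def map_lengths(lines):
    """Yield 'length<TAB>1' for every whitespace-separated word in lines."""
    for line in lines:
        for word in line.split():
            yield str(len(word)) + '\t1'


sample = ['Big data needs', 'big clusters']
for record in map_lengths(sample):
    print(record)  # keys 3, 4, 5, 3 and 8, each followed by a tab and 1
```

Keeping the logic in a function that takes an iterable makes it easy to test with a plain list while leaving the Hadoop entry point a one-line loop over sys.stdin.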

Question 58

Which of the following statements a), b) or c) is false?
A) Every time you start a container with docker run, Docker gives you a new instance that contains any libraries you installed previously.
B) The command
docker stop container_name
will shut down the specified container. The command
docker restart container_name
will restart the specified container.
C) Docker also provides a GUI app called Kitematic that you can use to manage your containers, including stopping and restarting them.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 59

Which of the following statements a), b) or c) is false?
A) Hadoop providers typically also provide Spark support.
B) Databricks is a Spark-specific vendor; they provide a "zero-management cloud platform built around Spark." Their website also is an excellent resource for learning Spark.
C) The paid Databricks platform runs on Amazon AWS or Microsoft Azure. Databricks also provides a free Databricks Community Edition, which is a great way to get started with both Spark and the Databricks environment.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 60

Which of the following statements a), b) or c) is false?
A) Numerous cloud vendors provide Hadoop as a service.
B) In addition, companies like Cloudera and Hortonworks (recently merged) offer integrated Hadoop-ecosystem components and tools via the major cloud vendors.
C) Cloudera and Hortonworks also offer free downloadable environments that you can run on the desktop for learning, development and testing before you commit to cloud-based hosting, which can incur significant costs.
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 61

Which of the following statements is false?
A) You can use SQL to query data in resilient distributed datasets (RDDs).
B) Spark SQL uses a Spark DataFrame to get a table view of the underlying RDDs.
C) A SparkSession (module pyspark.sql) is used to create a DataFrame from an RDD. There can be only one SparkSession object per Spark application.
D) In Spark streaming, a DStream is a sequence of RDDs, each representing a mini-batch of data to process.

Accepted Answer

The answer of Which of the following statements is false? A)...

Question 62

Which of the following statements a), b) or c) is false?
A) In the late 1960s, the Internet began as the ARPANET, which initially connected four universities and grew to 10 nodes by the end of 1970.
B) In the last 50 years, the Internet has grown to billions of computers, smartphones, tablets and an enormous range of other device types connected to the Internet worldwide.
C) Every device is a "thing" in the Internet of Things (IoT).
D) All of the above statements are true.

Accepted Answer

The answer of Which of the following statements a), b)...

Question 63

Every RDD has access to the current SparkContext via the attribute ________.
A) cluster
B) broadcast
C) context
D) connection

Accepted Answer

The answer of Every RDD has access to the current...

Question 64

The following dashboard visualizes simulated sensors from the PubNub simulated IoT sensors stream:
For each sensor, the visualization shows a Gauge (the semicircular visualizations) and a ________ (the jagged lines) to visualize the data.
A) Sparkleline
B) Glowline
C) Sparkline
D) None of the above

Accepted Answer

The answer of The following dashboard visualizes simulated sensors from...
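A sparkline is a small, word-sized line chart without axes that shows the shape of a series at a glance. As a rough text-mode analogy (not how the dashboard in this question draws its charts), the sketch maps each reading to one of eight Unicode block characters; sparkline is a hypothetical helper and the readings are made-up sample values.

```python
BARS = '▁▂▃▄▅▆▇█'  # eight heights, lowest to highest


def sparkline(values):
    """Scale values into the BARS range; a flat series uses the lowest bar."""
    if not values:
        return ''
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return BARS[0] * len(values)
    last = len(BARS) - 1
    return ''.join(BARS[round((v - low) / span * last)] for v in values)


print(sparkline([21.0, 21.5, 23.0, 22.1, 24.8, 20.9]))  # made-up sensor readings
```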

Deck 17: Big Data: Hadoop, Spark, NoSQL and IoT