Deck 1: Cloudera Certified Administrator for Apache Hadoop (CCAH)

Question 1
Exhibit (monitoring UI): Cluster Summary: 45 files and directories, 12 blocks = 57 total. Heap Size is 15.31 MB / 193.38 MB (7%). Refer to the above exhibit. You configure a Hadoop cluster with seven DataNodes, and one of your monitoring UIs displays the details shown in the exhibit. What does this tell you?

A) The DataNode JVM on one host is not active
B) Because your under-replicated blocks count matches the Live Nodes, one node is dead, and your DFS Used % equals 0%, you can't be certain that your cluster has all the data you've written to it.
C) Your cluster has lost all HDFS data which had blocks stored on the dead DataNode
D) The HDFS cluster is in safe mode
Answer: A
Question 2
You are working on a project where you need to chain together MapReduce and Pig jobs. You also need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to perform these actions?

A) Oozie
B) ZooKeeper
C) HBase
D) Sqoop
E) HUE
Answer: A
Question 3
You suspect that your NameNode is incorrectly configured, and is swapping memory to disk. Which Linux commands help you to identify whether swapping is occurring? (Select all that apply)

A) free
B) df
C) memcat
D) top
E) jps
F) vmstat
G) swapinfo
Answer: A, D, F
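For reference, a minimal shell sketch of how these commands reveal swapping (output formats vary by distribution; the comments note what to look for):

free -m     # the "Swap:" row shows total, used, and free swap in MB
vmstat 5    # sustained nonzero "si"/"so" columns mean pages are being swapped in/out
top         # the summary header includes a swap line; watch swap "used" while the NameNode JVM runs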
Question 4
Which two features does Kerberos security add to a Hadoop cluster? (Choose two)

A) User authentication on all remote procedure calls (RPCs)
B) Encryption for data during transfer between the Mappers and Reducers
C) Encryption for data on disk ("at rest")
D) Authentication for user access to the cluster against a central server
E) Root access to the cluster for users hdfs and mapred but non-root access for clients
Question 5
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to do the following: 1. Group the individual images into a set of larger files. 2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming. Which data serialization system gives you the flexibility to do this?

A) CSV
B) XML
C) HTML
D) Avro
E) SequenceFiles
F) JSON
Question 6
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster's master nodes? (Choose two)

A) HMaster
B) ResourceManager
C) TaskManager
D) JobTracker
E) NameNode
F) DataNode
Question 7
Which is the default scheduler in YARN?

A) YARN doesn't configure a default scheduler; you must first assign an appropriate scheduler class in yarn-site.xml
B) Capacity Scheduler
C) Fair Scheduler
D) FIFO Scheduler
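As background, the active scheduler is selected by the yarn.resourcemanager.scheduler.class property in yarn-site.xml; upstream Apache Hadoop 2.x defaults to the Capacity Scheduler, while CDH configures the Fair Scheduler by default. A minimal sketch selecting the Fair Scheduler explicitly:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>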
Question 8
Which two actions must you take if you are running a Hadoop cluster with a single NameNode and six DataNodes and you want to change a configuration parameter so that it affects all six DataNodes? (Choose two)

A) You must modify the configuration files on the NameNode only. DataNodes read their configuration from the master nodes
B) You must modify the configuration files on each of the DataNode machines
C) You don't need to restart any daemon, as they will pick up changes automatically
D) You must restart the NameNode daemon to apply the changes to the cluster
E) You must restart all six DataNode daemons to apply the changes to the cluster
Question 9
A slave node in your cluster has four 2 TB hard drives installed (4 x 2 TB). The DataNode is configured to store HDFS blocks on all disks. You set the value of the dfs.datanode.du.reserved parameter to 100 GB. How does this alter HDFS block storage?

A) 25GB on each hard drive may not be used to store HDFS blocks
B) 100GB on each hard drive may not be used to store HDFS blocks
C) All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node
D) A maximum of 100 GB on each hard drive may be used to store HDFS blocks
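For context, dfs.datanode.du.reserved is set in hdfs-site.xml and the reservation applies per volume, not per node. A minimal sketch reserving 100 GB of each disk for non-HDFS use:

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 100 GB expressed in bytes; applied to every DataNode data volume -->
  <value>107374182400</value>
</property>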
Question 10
Identify two features/issues that YARN is designed to address: (Choose two)

A) Standardize on a single MapReduce API
B) Single point of failure in the NameNode
C) Reduce complexity of the MapReduce APIs
D) Resource pressure on the JobTracker
E) Ability to run frameworks other than MapReduce, such as MPI
F) HDFS latency
Question 11
Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?

A) Complexity Fair Scheduler (CFS)
B) Capacity Scheduler
C) Fair Scheduler
D) FIFO Scheduler
Question 12
Which process instantiates user code, and executes map and reduce tasks on a cluster running MapReduce v2 (MRv2) on YARN?

A) NodeManager
B) ApplicationMaster
C) TaskTracker
D) JobTracker
E) NameNode
F) DataNode
G) ResourceManager
Question 13
What does CDH packaging do on install to facilitate Kerberos security setup?

A) Automatically configures permissions for log files at $MAPRED_LOG_DIR/userlogs
B) Creates users for hdfs and mapreduce to facilitate role assignment
C) Creates directories for temp, hdfs, and mapreduce with the correct permissions
D) Creates a set of pre-configured Kerberos keytab files and their permissions
E) Creates and configures your KDC with default cluster values
Question 14
Your cluster is running MapReduce version 2 (MRv2) on YARN. Your ResourceManager is configured to use the FairScheduler. Now you want to configure your scheduler such that a new user on the cluster can submit jobs into their own queue at application submission. Which configuration should you set?

A) You can specify a new queue name when the user submits a job, and the new queue can be created dynamically if the property yarn.scheduler.fair.allow-undeclared-pools = true
B) yarn.scheduler.fair.user-as-default-queue = false and yarn.scheduler.fair.allow-undeclared-pools = true
C) You can specify a new queue name when the user submits a job, and the new queue can be created dynamically if yarn.scheduler.fair.user-as-default-queue = false
D) You can specify a new queue name per application in the allocations.xml file and have new jobs automatically assigned to the application queue
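For reference, the two Fair Scheduler properties named in these options live in yarn-site.xml. A minimal sketch that lets a submitted job land in a dynamically created queue named after the user:

<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>true</value>  <!-- queues not declared in the allocations file may be created at submission -->
</property>
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>true</value>  <!-- jobs with no queue specified default to a queue named after the submitting user -->
</property>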
Question 15
You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from faster network fabric?

A) When your workload generates a large amount of output data, significantly larger than the amount of intermediate data
B) When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
C) When your workload consists of processor-intensive tasks
D) When your workload generates a large amount of intermediate data, on the order of the input data itself
Question 16
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300 MB file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when they look in the directory?

A) The directory will appear to be empty until the entire file write is completed on the cluster
B) They will see the file with a ._COPYING_ extension on its name. If they view the file, they will see the contents of the file up to the last completed block (as each 64 MB block is written, that block becomes available)
C) They will see the file with a ._COPYING_ extension on its name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
D) They will see the file with its original name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
Question 17
Which three basic configuration parameters must you set to migrate your cluster from MapReduce v1 (MRv1) to MapReduce v2 (MRv2)? (Choose three)

A) Configure the NodeManager to enable MapReduce services on YARN by setting the following property in yarn-site.xml: yarn.nodemanager.hostname = your_nodeManager_shuffle
B) Configure the NodeManager hostname and enable node services on YARN by setting the following property in yarn-site.xml: yarn.nodemanager.hostname = your_nodeManager_hostname
C) Configure a default scheduler to run on YARN by setting the following property in mapred-site.xml: mapreduce.jobtracker.taskScheduler = org.apache.hadoop.mapred.JobQueueTaskScheduler
D) Configure the number of map tasks per job on YARN by setting the following property in mapred-site.xml: mapreduce.job.maps = 2
E) Configure the ResourceManager hostname and enable node services on YARN by setting the following property in yarn-site.xml: yarn.resourcemanager.hostname = your_resourceManager_hostname
F) Configure MapReduce as a framework running on YARN by setting the following property in mapred-site.xml: mapreduce.framework.name = yarn
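As a worked sketch of the migration settings referenced above (the hostname value is the placeholder used in the options): mapred-site.xml declares MapReduce as a YARN framework, and yarn-site.xml identifies the ResourceManager host:

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>your_resourceManager_hostname</value>
</property>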
Question 18
Assuming a cluster running HDFS and MapReduce version 2 (MRv2) on YARN with all settings at their default, what do you need to do when adding a new slave node to the cluster?

A) Nothing, other than ensuring that the DNS (or /etc/hosts files on all machines) contains an entry for the new node.
B) Restart the NameNode and ResourceManager daemons and resubmit any running jobs.
C) Add a new entry to /etc/nodes on the NameNode host.
D) Restart the NameNode after setting dfs.number.of.nodes in hdfs-site.xml
Question 19
You are running a Hadoop cluster with a NameNode on host mynamenode, a secondary NameNode on host mysecondarynamenode and several DataNodes. Which best describes how you determine when the last checkpoint happened?

A) Execute hdfs namenode -report on the command line and look at the Last Checkpoint information
B) Execute hdfs dfsadmin -saveNamespace on the command line, which returns the last checkpoint value in the fstime file
C) Connect to the web UI of the Secondary NameNode (http://mysecondarynamenode:50090/) and look at the "Last Checkpoint" information
D) Connect to the web UI of the NameNode (http://mynamenode:50070) and look at the "Last Checkpoint" information
Question 20
Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar SampleJar MyClass on a client machine?

A) SampleJar.jar is sent to the ApplicationMaster, which allocates a container for SampleJar.jar
B) SampleJar.jar is placed in a temporary directory in HDFS
C) SampleJar.jar is sent directly to the ResourceManager
D) SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster
Question 21
Refer to the exhibit (a list of YARN applications and their states). You want to clean up this list by removing jobs where the State is KILLED. What command do you enter?

A) yarn application -refreshJobHistory
B) yarn application -kill application_1374638600275_0109
C) yarn rmadmin -refreshQueue
D) yarn rmadmin -kill application_1374638600275_0109
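For reference, a shell sketch of the relevant yarn CLI calls (the application ID is the one shown in the options and is illustrative):

yarn application -list -appStates KILLED                 # list applications currently in the KILLED state
yarn application -kill application_1374638600275_0109    # kill a submitted or running application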
Question 22
Assume you have a file named foo.txt in your local directory. You issue the following three commands:
hadoop fs -mkdir input
hadoop fs -put foo.txt input/foo.txt
hadoop fs -put foo.txt input
What happens when you issue the third command?

A) The write succeeds, overwriting foo.txt in HDFS with no warning
B) The file is uploaded and stored as a plain file named input
C) You get a warning that foo.txt is being overwritten
D) You get an error message telling you that foo.txt already exists, and asking you if you would like to overwrite it.
E) You get an error message telling you that foo.txt already exists. The file is not written to HDFS
F) You get an error message telling you that input is not a directory
G) The write silently fails
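For context, a shell sketch of the sequence, assuming the default behavior of hadoop fs -put (the third command resolves the target directory to input/foo.txt, which already exists):

hadoop fs -mkdir input
hadoop fs -put foo.txt input/foo.txt   # creates input/foo.txt
hadoop fs -put foo.txt input           # fails: input/foo.txt already exists; nothing is written
hadoop fs -put -f foo.txt input        # -f forces the overwrite if that is the intent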
Question 23
You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to implement?

A) MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of "tasks" into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.
B) In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory resources they need by executing -D mapreduce-reduces.memory-mb=2048
C) In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific launch. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.
D) Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify two reduce tasks.
E) In YARN, resource allocation is a function of virtual cores specified by the ApplicationManager making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -D yarn.nodemanager.cpu-vcores=2
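For reference, a hedged sketch of passing the property on the command line, assuming the job's driver class uses ToolRunner/GenericOptionsParser so that -D options are honored (the input and output paths are illustrative):

hadoop jar SampleJar MyClass -D mapreduce.job.reduces=2 inputDir outputDir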
Question 24
Why should you run the HDFS balancer periodically? (Choose three)

A) To ensure that there is capacity in HDFS for additional data
B) To ensure that all blocks in the cluster are 128MB in size
C) To help HDFS deliver consistent performance under heavy loads
D) To ensure that there is consistent disk utilization across the DataNodes
E) To improve data locality for MapReduce
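For context, a minimal shell sketch of invoking the balancer; the threshold is the allowed deviation, in percentage points, of each DataNode's utilization from the cluster average:

hdfs balancer -threshold 10   # move blocks until every DataNode is within 10% of average utilization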
Question 25
You have a cluster running with the FIFO Scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which you expect to run a couple of minutes only. You submit both jobs with the same priority. Which two best describe how the FIFO Scheduler arbitrates the cluster resources for these jobs and their tasks? (Choose two)

A) Because there is more than a single job on the cluster, the FIFO Scheduler will enforce a limit on the percentage of resources allocated to a particular job at any given time
B) Tasks are scheduled in the order of their job submission
C) The order of execution of jobs may vary
D) Given jobs A and B submitted in that order, all tasks from job A are guaranteed to finish before all tasks from job B
E) The FIFO Scheduler will give, on average, an equal share of the cluster resources over the job lifecycle
F) The FIFO Scheduler will pass an exception back to the client when job B is submitted, since all slots on the cluster are in use
Question 26
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?

A) fstime
B) VERSION
C) fsimage_N (where N reflects transactions up to transaction ID N)
D) edits_N-M (which contains transactions between transaction ID N and transaction ID M)
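For reference, a sketch of what a NameNode metadata directory typically holds (the path and transaction IDs are illustrative):

ls /dfs/nn/current   # path comes from dfs.namenode.name.dir
# VERSION
# seen_txid
# fsimage_0000000000000008040                      <- namespace checkpoint up to txid 8040
# edits_0000000000000008041-0000000000000009052    <- edit-log segment covering txids 8041-9052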
Question 27
You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just below your home directory in HDFS. Which command gathers these into a single file on your local file system?

A) hadoop fs -getmerge -R westUsers.txt
B) hadoop fs -getmerge westUsers westUsers.txt
C) hadoop fs -cp westUsers/* westUsers.txt
D) hadoop fs -get westUsers westUsers.txt
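For reference, a shell sketch of -getmerge, which concatenates every file under an HDFS directory into a single local file:

hadoop fs -getmerge westUsers westUsers.txt   # merges westUsers/part-* from HDFS into local westUsers.txt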
Question 28
You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?

A) HDFS is almost full
B) The NameNode goes down
C) A DataNode is disconnected from the cluster
D) Map or reduce tasks that are stuck in an infinite loop
E) MapReduce jobs are causing excessive memory swaps
Question 29
Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command: hdfs haadmin -failover nn01 nn02?

A) nn02 is fenced, and nn01 becomes the active NameNode
B) nn01 is fenced, and nn02 becomes the active NameNode
C) nn01 becomes the standby NameNode and nn02 becomes the active NameNode
D) nn02 becomes the standby NameNode and nn01 becomes the active NameNode
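For context, a shell sketch of the haadmin commands involved (nn01 and nn02 as named in the question):

hdfs haadmin -getServiceState nn01    # report whether nn01 is currently active or standby
hdfs haadmin -failover nn01 nn02      # fail over from nn01 (first argument) to nn02 (second argument)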
Question 30
You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entry(ies) in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node. What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?

A) Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin -refreshNodes on the NameNode
B) Restart the NameNode
C) Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command hadoop dfsadmin -refreshNodes on the NameNode
D) Nothing; the worker node will automatically join the cluster when the DataNode daemon is started
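For reference, a sketch of the include-file mechanism when you do choose to configure one (the file path is illustrative; the file lists one permitted hostname per line):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.hosts.allow</value>
</property>

hdfs dfsadmin -refreshNodes   # makes the NameNode re-read the include/exclude files without a restart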