Question 1

All but one set of GPU processor cores will be idle, while one SM is bearing the full processing load.

Accepted Answer

In a GPU, processing is done by multiple Streaming Multiprocessors (SMs), each containing multiple processor cores. A thread block always executes on a single SM. If a kernel is launched with only one block (for example, because the workload has not been broken up for parallel execution), only one SM is used and the other SMs sit idle. Therefore, all but one set of GPU processor cores will be idle, while one SM bears the full processing load.
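To make this concrete, here is a minimal Python sketch that assumes a simple round-robin policy for handing thread blocks to SMs (real hardware schedulers are more sophisticated) and a hypothetical 16-SM GPU. The names `assign_blocks_to_sms` and `idle_sms` are illustrative, not part of any GPU API. The one property the sketch relies on is real: a block runs entirely on one SM and is never split across SMs.

```python
def assign_blocks_to_sms(num_blocks: int, num_sms: int) -> list[int]:
    """Return how many thread blocks each SM receives (assumed round-robin policy)."""
    load = [0] * num_sms
    for block_id in range(num_blocks):
        # A whole block goes to exactly one SM.
        load[block_id % num_sms] += 1
    return load


def idle_sms(num_blocks: int, num_sms: int) -> int:
    """Count SMs that receive no block at all."""
    return sum(1 for n in assign_blocks_to_sms(num_blocks, num_sms) if n == 0)


if __name__ == "__main__":
    print(assign_blocks_to_sms(1, 16))  # one block: a single SM does all the work
    print(idle_sms(1, 16))              # 15 of 16 SMs idle
    print(idle_sms(64, 16))             # enough blocks: no SM idle
```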

Question 2

The total number of threads defined is typically in the thousands to maximize the utilization of the GPU processor cores as well as maximize the available speedup.

Accepted Answer

This statement is true. Defining thousands of threads gives the GPU enough work to occupy all of its SMs and lets the schedulers switch to ready threads while others wait on memory, which keeps the processor cores busy and maximizes the available speedup.
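As a rough illustration, the usual way to reach thousands of threads is to give each data element its own thread and round the block count up. The sketch below assumes 256 threads per block (a common choice, not a requirement); `launch_config` is a hypothetical helper.

```python
def launch_config(n_elements: int, threads_per_block: int = 256) -> tuple[int, int]:
    """Return (blocks, total_threads) so that every element gets its own thread."""
    # Ceiling division: round up so a partially filled last block covers the tail.
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, blocks * threads_per_block


if __name__ == "__main__":
    blocks, threads = launch_config(1_000_000)
    print(blocks, threads)  # 3907 1000192
```

Because 1,000,000 is not a multiple of 256, the last block has 192 surplus threads; in a real CUDA kernel these are usually masked off with a bounds check such as `if (i < n)`.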

Question 3

Because the GPU and the CPU are designed and optimized for two significantly different types of applications, their architectures differ significantly.

Accepted Answer

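GPUs are optimized for throughput: they apply the same operations to large amounts of data in parallel. CPUs are optimized for low-latency execution of largely sequential code with complex control flow. As a result, their architectures differ in the number and complexity of cores, the memory hierarchy, and how chip area is spent: a CPU devotes much of its area to control logic and cache, while a GPU devotes most of its area to many simple arithmetic units.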

Question 4

GPGPU is a computing platform and programming model created by NVIDIA.

Accepted Answer

This statement is false. GPGPU stands for General-Purpose computing on Graphics Processing Units; it is a concept, not a specific platform or programming model created by NVIDIA. NVIDIA's platform for GPGPU is CUDA.

Question 5

The GPU is most efficient when it is processing as many warps as possible to keep the CUDA cores maximally utilized.

Accepted Answer

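The GPU executes threads in groups of 32 called warps. When one warp stalls (for example, while waiting on a global memory access), the warp scheduler switches to another warp that is ready to run. Keeping as many warps resident as possible gives the scheduler more ready work to choose from, which keeps the CUDA cores busy and maximizes the GPU's efficiency.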
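A simplified way to reason about this is a latency-hiding estimate: if each warp can issue instructions for a few cycles and then waits on memory for much longer, the scheduler stays busy only when enough other warps are ready to fill the gap. The Python sketch below encodes that idealized model (one issue slot per cycle, identical warps, fixed stall length); the cycle counts are made-up illustrative numbers, not measurements of any GPU, and `issue_utilization` is a hypothetical name.

```python
def issue_utilization(resident_warps: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles the scheduler can issue work in an idealized model.

    Assumes every warp alternates between `compute_cycles` of issuable work
    and `stall_cycles` waiting on memory, and one warp can issue per cycle.
    """
    period = compute_cycles + stall_cycles
    # Each warp fills compute_cycles of every period; more warps fill more of
    # the stall gaps, until the issue slot is busy 100% of the time.
    return min(1.0, resident_warps * compute_cycles / period)


if __name__ == "__main__":
    for warps in (1, 8, 20, 32):
        # utilization: 0.05, 0.4, 1.0, 1.0
        print(warps, issue_utilization(warps, compute_cycles=20, stall_cycles=380))
```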

Question 6

For about $200 you can purchase a GPU with 960 parallel processor cores for a workstation.

Accepted Answer

It is possible to purchase a GPU with 960 parallel processor cores for a workstation for around $200.

Question 7

The Fermi architecture upgraded from the IEEE 754-1985 floating-point arithmetic standard to the IEEE 754-2008 standard.

Accepted Answer

This statement is true. Fermi moved from the IEEE 754-1985 floating-point standard to IEEE 754-2008, which added the fused multiply-add (FMA) operation; Fermi provides FMA in both single and double precision.
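The practical difference FMA makes can be shown on a CPU: a fused multiply-add rounds a*b + c once, while a separate multiply and add rounds the product first. Python's `float` arithmetic does not fuse the two operations, so the sketch below emulates a single rounding with exact `fractions.Fraction` arithmetic; `fma_exact` and `mul_then_add` are illustrative names.

```python
from fractions import Fraction


def fma_exact(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    # Fraction arithmetic is exact; converting back to float rounds to nearest.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c


if __name__ == "__main__":
    a = 1.0 + 2.0**-30
    b = 1.0 - 2.0**-30
    c = -1.0
    print(mul_then_add(a, b, c))  # 0.0: a*b = 1 - 2**-60 rounds to 1.0
    print(fma_exact(a, b, c))     # -2**-60, about -8.67e-19
```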

Question 8

In embedded systems, the GPU is composed of only a single-digit number of cores and is typically combined with a number of conventional cores, referred to as _________.
A) arithmetic logic units
B) control units
C) central processing units
D) graphic processing units

Accepted Answer

C) central processing units. In embedded systems, the GPU (Graphics Processing Unit) has only a few cores and is paired with conventional cores, the Central Processing Units (CPUs), which handle general-purpose computing tasks while the GPU handles graphics and parallel processing tasks.

Question 9

It is not important for the programmer to understand the nuances of the various GPU memories.

Accepted Answer

It is important for the programmer to understand the nuances of the various GPU memories in order to optimize the performance and functionality of their program on a GPU. Understanding the differences between shared memory, global memory, constant memory, and texture memory can have a significant impact on the efficiency and effectiveness of the program.

Question 10

The largest GPUs are found in embedded systems.

Accepted Answer

This statement is false. The largest GPUs are found in high-end workstations, gaming PCs, and data-center servers, not in embedded systems. Embedded systems typically have small GPUs with only a few cores and limited processing power.

Question 11

The equivalent GPU hardware component for a block is the CUDA multiprocessor (SM).

Accepted Answer

The answer of The equivalent GPU hardware component for a...

Question 12

CUDA was created by __________.
A) Amdahl
B) NVIDIA
C) the U.S. Government
D) Herbert Moore

Accepted Answer

The answer of CUDA was created by __________. A) Amdahl B) NVIDIA C) the U.S. Government D) Herbert...

Question 13

GPUs can be found in almost all of today's workstations, laptops, tablets, and smartphones.

Accepted Answer

The answer of GPUs can be found in almost all...

Question 14

A kernel typically will have few to no branching statements.

Accepted Answer

The answer of A kernel typically will have few to...
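The reason kernels avoid branches is that all threads of a warp share one instruction stream: if threads in the same warp take different sides of an `if`, the warp runs both sides one after the other, with the inactive threads masked off. The Python sketch below counts those serialized passes per warp for a two-way branch; it is a simplified model (nested branches and reconvergence details are ignored), and `divergent_passes` is a hypothetical helper.

```python
from collections.abc import Callable

WARP_SIZE = 32


def divergent_passes(num_threads: int, takes_branch: Callable[[int], bool]) -> list[int]:
    """For each warp, count the serialized passes a two-way branch needs.

    A warp whose threads all agree runs one path (1 pass); a warp whose
    threads disagree must run both paths one after the other (2 passes).
    """
    passes = []
    for start in range(0, num_threads, WARP_SIZE):
        warp = range(start, min(start + WARP_SIZE, num_threads))
        outcomes = {takes_branch(tid) for tid in warp}
        passes.append(len(outcomes))
    return passes


if __name__ == "__main__":
    # Branch on odd/even thread ID: every warp is split, so every warp pays twice.
    print(divergent_passes(128, lambda tid: tid % 2 == 0))  # [2, 2, 2, 2]
    # Branch on a warp-aligned boundary: each warp is uniform, so no divergence.
    print(divergent_passes(128, lambda tid: tid < 64))      # [1, 1, 1, 1]
```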

Question 15

A SoC product architect can create product families or a specific product within a family by placing a single slice or multiple slices on a SoC chip.

Accepted Answer

The answer of A SoC product architect can create product...
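Assuming the commonly cited Gen8 layout, in which a slice holds three subslices of eight execution units (EUs) each, the EU count of a product scales directly with the number of slices placed on the chip. The constants below encode that assumption; `eu_count` is a hypothetical helper.

```python
# Assumed Gen8 layout: 3 subslices per slice, 8 EUs per subslice.
SUBSLICES_PER_SLICE = 3
EUS_PER_SUBSLICE = 8


def eu_count(slices: int) -> int:
    """Total execution units on a chip built from `slices` identical slices."""
    return slices * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE


if __name__ == "__main__":
    for slices in (1, 2):
        print(slices, "slice(s):", eu_count(slices), "EUs")  # 24, then 48
```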

Question 16

The __________ is designed specifically to be optimized for fast three-dimensional (3D) graphics rendering and video processing.
A) CPU
B) GPU
C) CU
D) ALU

Accepted Answer

The answer of The __________ is designed specifically to be...

Question 17

A group of threads assigned to a particular SM is a __________.
A) block
B) grid
C) unit
D) kernel

Accepted Answer

The answer of A group of threads assigned to a...

Question 18

An instance of the kernel on the GPU is a ___________.
A) thread
B) warp
C) grid
D) block

Accepted Answer

The answer of An instance of the kernel on the...

Question 19

The grid and the block need to have the same dimensions.

Accepted Answer

The answer of The grid and the block need to...
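In CUDA, the grid and block shapes are chosen independently through two separate `dim3` launch parameters, so a grid can be two-dimensional while each block is one-dimensional. The Python sketch below emulates the index arithmetic for that case; `global_ids` is a hypothetical name and the sizes are arbitrary.

```python
from itertools import product


def global_ids(grid_dim: tuple[int, int], block_dim_x: int) -> list[tuple[int, int]]:
    """Enumerate (row, col) global coordinates for a 2D grid of 1D blocks.

    Mirrors CUDA's convention: col = blockIdx.x * blockDim.x + threadIdx.x,
    and row = blockIdx.y, because each block is only one thread tall.
    """
    grid_x, grid_y = grid_dim
    coords = []
    for block_y, block_x in product(range(grid_y), range(grid_x)):
        for thread_x in range(block_dim_x):
            coords.append((block_y, block_x * block_dim_x + thread_x))
    return coords


if __name__ == "__main__":
    ids = global_ids(grid_dim=(4, 2), block_dim_x=8)  # 2D grid, 1D blocks
    print(len(ids), ids[:3], ids[-1])  # 64 [(0, 0), (0, 1), (0, 2)] (1, 31)
```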

Question 20

CUDA C is a C / C++ based language.

Accepted Answer

The answer of CUDA C is a C / C++...

Question 21

The data-parallel code to be run on the GPU is called a ___________ .

Accepted Answer

The answer of The data-parallel code to be run on...

Question 22

The entire Gen8 compute architecture interfaces to the rest of the SoC components via a dedicated unit called the ____________ .

Accepted Answer

The answer of The entire Gen8 compute architecture interfaces to...

Question 23

A __________ program can be divided into three general sections: code to be run on the device, code to be run on the host, and the code related to the transfer of data between the host and the device.

Accepted Answer

The answer of A __________ program can be divided into...
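The three sections can be emulated on a CPU to show how they fit together. In real CUDA C, the transfers would use `cudaMemcpy`, the device code would be a `__global__` function, and the launch would use the `<<<blocks, threads>>>` syntax; the Python functions below (`copy_to_device`, `scale_kernel`, `launch`, `copy_to_host`) are hypothetical stand-ins that run every thread sequentially.

```python
def copy_to_device(host: list[float]) -> list[float]:
    """Data transfer (host -> device), emulated as a plain copy."""
    return list(host)


def copy_to_host(device: list[float]) -> list[float]:
    """Data transfer (device -> host), emulated as a plain copy."""
    return list(device)


def scale_kernel(tid: int, data: list[float], factor: float) -> None:
    """Device code: the work done by ONE thread, selected by its thread ID."""
    if tid < len(data):  # guard threads that fall past the end of the data
        data[tid] *= factor


def launch(kernel, blocks: int, threads_per_block: int, *args) -> None:
    """Emulate a kernel launch by running every thread sequentially on the CPU."""
    for block_id in range(blocks):
        for thread_id in range(threads_per_block):
            kernel(block_id * threads_per_block + thread_id, *args)


if __name__ == "__main__":
    host_in = [1.0, 2.0, 3.0, 4.0, 5.0]    # host code: prepare the input
    dev = copy_to_device(host_in)           # transfer to the device
    launch(scale_kernel, 2, 4, dev, 10.0)   # device code: 8 threads, 5 do work
    print(copy_to_host(dev))                # [10.0, 20.0, 30.0, 40.0, 50.0]
```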

Question 24

Threads are uniformly bundled in _________ .

Accepted Answer

The answer of Threads are uniformly bundled in _________ ....

Question 25

The dual warp scheduler will break up each thread block it is processing into _______.
A) kernels
B) warps
C) grids
D) all of the above

Accepted Answer

The answer of The dual warp scheduler will break up...

Question 26

The GPU has found its way into massively parallel programming environments for a wide range of applications, which is where the term __________ is derived from.

Accepted Answer

The answer of The GPU has found its way into...

Question 27

A _________ is a bundle of 32 threads that start at the same starting address and whose thread IDs are consecutive.
A) warp
B) grid
C) block
D) grouping

Accepted Answer

The answer of A _________ is a bundle of 32...
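Warp membership follows directly from the thread index: consecutive IDs are packed 32 at a time. The sketch below assumes a one-dimensional block (for multidimensional blocks, CUDA first linearizes the thread index with x varying fastest); `warp_and_lane` and `warps_per_block` are illustrative helpers.

```python
WARP_SIZE = 32


def warp_and_lane(thread_id: int) -> tuple[int, int]:
    """Map a thread's index within its block to (warp index, lane within the warp)."""
    return thread_id // WARP_SIZE, thread_id % WARP_SIZE


def warps_per_block(threads_per_block: int) -> int:
    """Number of warps a block is split into; a partial last warp still counts."""
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


if __name__ == "__main__":
    print(warp_and_lane(0), warp_and_lane(31), warp_and_lane(32))  # (0, 0) (0, 31) (1, 0)
    print(warps_per_block(100))  # 4; the last warp holds only 4 active threads
```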

Question 28

The first NVIDIA GPU with added GPGPU support hardware was the _________ .

Accepted Answer

The answer of The first NVIDIA GPU with added GPGPU...

Question 29

The parallel code in the form of a function to be run on the GPU is the ________.
A) grid
B) thread
C) kernel
D) none of the above

Accepted Answer

The answer of The parallel code in the form of...

Question 30

The _________ performs transcendental operations, such as cosine, sine, reciprocal, and square root, in a single clock cycle.
A) SM
B) SIMD
C) SFU
D) FMA

Accepted Answer

The answer of The _________ performs transcendental operations, such as cosine, sine, reciprocal, and...

Question 31

To enhance performance, a technique known as __________ is used for the shared L3 data cache.
A) cache banking
B) thread blocking
C) streaming
D) warping

Accepted Answer

The answer of To enhance performance, a technique known as __________...
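Cache banking splits a cache into independent banks that can each serve an access in the same cycle, so spreading addresses across banks raises bandwidth. The sketch below uses generic line interleaving with assumed parameters (64-byte lines, 4 banks); it is not Intel's actual Gen8 L3 bank mapping, and `bank_of` and `cycles_needed` are hypothetical names.

```python
from collections import Counter


def bank_of(address: int, line_bytes: int = 64, num_banks: int = 4) -> int:
    """Map a byte address to a bank by interleaving whole cache lines across banks."""
    return (address // line_bytes) % num_banks


def cycles_needed(addresses: list[int], line_bytes: int = 64, num_banks: int = 4) -> int:
    """Cycles to serve a batch of accesses if each bank handles one access per cycle.

    Banks work in parallel, so the busiest bank determines the total.
    """
    per_bank = Counter(bank_of(a, line_bytes, num_banks) for a in addresses)
    return max(per_bank.values(), default=0)


if __name__ == "__main__":
    sequential = [0, 64, 128, 192]   # consecutive lines land in different banks
    strided = [0, 256, 512, 768]     # a stride of num_banks lines hits one bank
    print(cycles_needed(sequential))  # 1
    print(cycles_needed(strided))     # 4
```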

Question 32

A subslice includes a unit called the _________, which is used for sampling texture and image surfaces.
A) stride
B) sampler
C) EU
D) floating-point

Accepted Answer

The answer of A subslice includes a unit called the...

Question 33

A _________ is a single instance of the kernel function.

Accepted Answer

The answer of A _________ is a single instance of...

Question 34

In 2006, NVIDIA facilitated the use of its new GPGPU language, ________.
A) GPU / GP
B) SIMD
C) CUDA
D) NVIDIA C

Accepted Answer

The answer of In 2006 NVIDIA facilitated the use of...

Question 35

The EU can issue up to ________ different instructions simultaneously from different threads.
A) four
B) five
C) six
D) seven

Accepted Answer

The answer of The EU can issue up to ________...

Question 36

__________ is a parallel computing platform and programming model created by NVIDIA and implemented by the GPUs that they produce.

Accepted Answer

The answer of __________ is a parallel computing platform and...

Question 37

In the CPU, the control logic and __________ make up the majority of the CPU's real estate.

Accepted Answer

The answer of In the CPU the control logic and...

Question 38

The number of blocks per kernel launch is called a __________ .

Accepted Answer

The answer of The number of blocks per kernel launch...

Question 39

__________ are caused by limited SFUs, double-precision multiplication, and branching.
A) Structural hazards
B) RAW data hazards
C) Vertical hazards
D) Latency hazards

Accepted Answer

The answer of __________ are caused by limited SFUs, double-precision multiplication, and...
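One way to see why a limited number of special function units (SFUs) slows execution: all 32 threads of a warp must pass through the available units, so fewer units means more cycles per warp instruction. The figures used below (groups of 16 cores and 4 SFUs per SM) follow NVIDIA's published description of Fermi; treat them as an assumption for other GPUs. `cycles_per_warp_instruction` is a hypothetical helper.

```python
WARP_SIZE = 32


def cycles_per_warp_instruction(units: int, warp_size: int = WARP_SIZE) -> int:
    """Cycles to issue one instruction for every thread of a warp through `units` lanes."""
    # Ceiling division: a partial final group still costs a full cycle.
    return (warp_size + units - 1) // units


if __name__ == "__main__":
    print(cycles_per_warp_instruction(16))  # 2 cycles on a group of 16 cores
    print(cycles_per_warp_instruction(4))   # 8 cycles on 4 SFUs
```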

Question 40

___________ is a GPU processing technology.
A) Fermi
B) Kepler
C) Maxwell
D) All of the above

Accepted Answer

The answer of ___________ is a GPU processing technology. A) Fermi B) Kepler C) Maxwell D) All of...

Question 41

The _________ GPU has a total of 16 SMs x 32 CUDA cores/SM, or 512 CUDA cores.

Accepted Answer

The answer of The _________ GPU has a total of...

Question 42

A GPU uses a massively parallel ________ architecture to perform mainly mathematical operations.

Accepted Answer

The answer of A GPU uses a massively parallel ________...

Question 43

The _________ scheduler breaks up each thread block it is processing into warps.

Accepted Answer

The answer of The _________ scheduler breaks up each thread...

Question 44

The _________ global scheduler unit on the GPU chip distributes the thread blocks to the SMs.

Accepted Answer

The answer of The _________ global scheduler unit on the...

Question 45

The fundamental building block of the Gen8 architecture is the ________ unit.

Accepted Answer

The answer of The fundamental building block of the Gen8...

Deck 19: General-Purpose Graphic Processing Units