Deck 2: Nvidia CUDA and GPU Programming

Question 1
NVIDIA CUDA Warp is made up of how many threads?

A) 512
B) 1024
C) 312
D) 32

Answer: 32

Question 2
Out-of-order instruction execution is not possible on GPUs. (True/False)

Answer: False

Question 3
CUDA supports programming in ....

A) C or C++ only
B) Java, Python, and more
C) C, C++, third-party wrappers for Java, Python, and more
D) Pascal

Answer: C, C++, third-party wrappers for Java, Python, and more

Question 4
FADD, FMAD, FMIN, FMAX are ----- supported by the Scalar Processors of an NVIDIA GPU.

A) 32-bit IEEE floating-point instructions
B) 32-bit integer instructions
C) Both
D) None of the above

Question 5
Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SPs).

A) 1024
B) 128
C) 512
D) 8

Question 6
Each NVIDIA GPU has ------ Streaming Multiprocessors.

A) 8
B) 1024
C) 512
D) 16

Question 7
CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.

A) "programming-overhead", 2 clocks
B) "zero-overhead", 1 clock
C) 64, 2 clocks
D) 32, 1 clock

Question 8
Each warp of a GPU receives a single instruction and "broadcasts" it to all of its threads. This is a ---- operation.

A) SIMD (single instruction, multiple data)
B) SIMT (single instruction, multiple thread)
C) SISD (single instruction, single data)
D) SIST (single instruction, single thread)

Question 9
Limitations of a CUDA kernel:

A) Recursion, call stack, static variable declarations
B) No recursion, no call stack, no static variable declarations
C) Recursion, no call stack, static variable declarations
D) No recursion, call stack, no static variable declarations

Question 10
What is a Unified Virtual Machine?

A) It is a technique that allows both the CPU and the GPU to read from a single virtual machine simultaneously.
B) It is a technique for managing separate host and device memory spaces.
C) It is a technique for executing device code on the host and host code on the device.
D) It is a technique for executing general-purpose programs on the device instead of the host.

Question 11
_______ became the first language specifically designed by a GPU company to facilitate general-purpose computing on ____.

A) Python, GPUs
B) C, CPUs
C) CUDA C, GPUs
D) Java, CPUs

Question 12
The CUDA architecture consists of --------- for parallel computing kernels and functions.

A) RISC instruction set architecture
B) CISC instruction set architecture
C) ZISC instruction set architecture
D) PTX instruction set architecture

Question 13
CUDA stands for --------, designed by NVIDIA.

A) Common Union Discrete Architecture
B) Complex Unidentified Device Architecture
C) Compute Unified Device Architecture
D) Complex Unstructured Distributed Architecture

Question 14
The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device.

Question 15
The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.

A) 128, 256, 512
B) 32, 64, 128
C) 64, 128, 256
D) 256, 512, 1024

Question 16
NVIDIA 8-series GPUs offer --------.

A) 50-200 GFLOPS
B) 200-400 GFLOPS
C) 400-800 GFLOPS
D) 800-1000 GFLOPS

Question 17
IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by the Scalar Processors of an NVIDIA GPU.

A) 32-bit IEEE floating-point instructions
B) 32-bit integer instructions
C) Both
D) None of the above

Question 18
The CUDA hardware programming model supports:
A. Fully general data-parallel architecture;
B. General thread launch;
C. Global load-store;
D. Parallel data cache;
E. Scalar architecture;
F. Integers, bit operations

A) a, c, d, f
B) b, c, d, e
C) a, d, e, f
D) a, b, c, d, e, f

Question 19
In the CUDA memory model, the following memory types are available:
A. Registers;
B. Local memory;
C. Shared memory;
D. Global memory;
E. Constant memory;
F. Texture memory.

A) a, b, d, f
B) a, c, d, e, f
C) a, b, c, d, e, f
D) b, c, e, f

Question 20
What is the equivalent of this general C program in CUDA C:

int main( void )
{
    printf( "Hello, World!\n" );
    return 0;
}

A) int main( void )
{
    kernel<<<1,1>>>();
    printf( "Hello, World!\n" );
    return 0;
}

B) __global__ void kernel( void ) { }
int main( void )
{
    kernel<<<1,1>>>();
    printf( "Hello, World!\n" );
    return 0;
}

C) __global__ void kernel( void )
{
    kernel<<<1,1>>>();
    printf( "Hello, World!\n" );
    return 0;
}

D) __global__ int main( void )
{
    kernel<<<1,1>>>();
    printf( "Hello, World!\n" );
    return 0;
}

Question 21
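As a reference for the launch syntax these options differ on, here is a minimal, self-contained CUDA C sketch of the define-a-kernel-then-launch-it pattern (it mirrors the shape of option B; assumes an nvcc toolchain and a CUDA-capable GPU, and is an illustration rather than the card's official answer):

```cuda
#include <stdio.h>

// __global__ marks a function as device code that is callable from host code.
__global__ void kernel( void ) { }

int main( void )
{
    // <<<1,1>>> launches one block containing one thread.
    kernel<<<1,1>>>();
    printf( "Hello, World!\n" );
    return 0;
}
```

Build with, e.g., `nvcc hello.cu -o hello`; the empty kernel runs on the device while printf() runs on the host.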
A simple kernel for adding two integers:

__global__ void add( int *a, int *b, int *c ) { *c = *a + *b; }

where __global__ is a CUDA C keyword indicating that:

A) add() will execute on the device; add() will be called from the host
B) add() will execute on the host; add() will be called from the device
C) add() will be called and executed on the host
D) add() will be called and executed on the device

Question 22
If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:

A) cudaMalloc( &dev_a, sizeof( int ) )
B) malloc( &dev_a, sizeof( int ) )
C) cudaMalloc( (void**) &dev_a, sizeof( int ) )
D) malloc( (void**) &dev_a, sizeof( int ) )

Question 23
If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy input from variable a to variable dev_a:

A) memcpy( dev_a, &a, size );
B) cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
C) memcpy( (void*) dev_a, &a, size );
D) cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost );

Question 24
The triple angle brackets in a statement inside the main function indicate what?

A) A call from host code to device code
B) A call from device code to host code
C) A less-than comparison
D) A greater-than comparison
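The last four cards (the add kernel, cudaMalloc, cudaMemcpy, and the triple-angle-bracket launch) fit together as one small program. The following is a minimal sketch combining those pieces, assuming an nvcc toolchain and a CUDA-capable device; error checking is omitted for brevity:

```cuda
#include <stdio.h>

// The kernel from question 21: executes on the device, called from the host.
__global__ void add( int *a, int *b, int *c ) { *c = *a + *b; }

int main( void )
{
    int a = 2, b = 7, c = 0;
    int *dev_a, *dev_b, *dev_c;

    // Allocate device memory (the cudaMalloc pattern from question 22).
    cudaMalloc( (void**)&dev_a, sizeof( int ) );
    cudaMalloc( (void**)&dev_b, sizeof( int ) );
    cudaMalloc( (void**)&dev_c, sizeof( int ) );

    // Copy inputs host -> device (the cudaMemcpy pattern from question 23).
    cudaMemcpy( dev_a, &a, sizeof( int ), cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, &b, sizeof( int ), cudaMemcpyHostToDevice );

    // Triple angle brackets: host code calling device code (question 24).
    add<<<1,1>>>( dev_a, dev_b, dev_c );

    // Copy the result device -> host and print it.
    cudaMemcpy( &c, dev_c, sizeof( int ), cudaMemcpyDeviceToHost );
    printf( "2 + 7 = %d\n", c );

    cudaFree( dev_a );
    cudaFree( dev_b );
    cudaFree( dev_c );
    return 0;
}
```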