Suppose we have a dual-core chip multiprocessor with a two-level cache hierarchy: each core has its own private first-level cache (L1), and the two cores share a second-level cache (L2). Each L1 cache is 2-way set associative with a cache line size of 2K bytes and an access latency of 30 ns per word; the shared L2 cache is direct mapped with a cache line size of 4K bytes and an access latency of 80 ns per word. Consider a process with two threads running on these cores as follows (assume an integer is 4 bytes, which is the same as the word size):
Thread 1:
int A[1024];
for (int i = 0; i < 1024; i++)
{
    A[i] = A[i] + 1;
}
Thread 2:
int B[1024];
for (int i = 0; i < 1024; i++)
{
    B[i] = B[i] + 1;
}
Initially, assume that both arrays A and B are in main memory, which has an access latency of 200 ns per word. Furthermore, assume that A and B, when mapped to L2, each start at the beginning (address 0) of a cache line. Assume a write-back policy for both the L1 and L2 caches.
(a) If the main memory blocks containing arrays A and B map to different L2 cache lines, how much time would the process take to complete its execution in the worst case? (Assume this is the only process running on the machine.)
(b) If the main memory blocks containing arrays A and B map to the same L2 cache line, how much time would the process take to complete its execution in the worst case? (Assume this is the only process running on the machine.)
In the worst case, thread 1 accesses A[0], thread 2 then accesses B[0], thread 1 accesses A[1], thread 2 accesses B[1], and so on. When A and B map to the same L2 cache line, every access to A[i] or B[i] evicts the other array from the L2 cache, so the next access to the other array must go to main memory again.