The classic 5-stage pipeline seen in Section 4.5 is IF, ID, EX, MEM, WB. This pipeline is designed specifically to execute the MIPS instruction set. MIPS is a load store architecture that performs one memory operation per instruction, hence a single MEM stage in the pipeline suffices. Also, its most common addressing mode is register displacement addressing. The EX stage is placed before the MEM stage to allow it to be used for address calculation. In this question we will consider a variation in the MIPS instruction set and the interactions of this variation with the pipeline structure.
The particular variation we are considering involves swapping the MEM and EX stages, creating a pipeline that looks like this: IF, ID, MEM, EX, WB. This change has two effects on the instruction set. First, it prevents us from using register displacement addressing (there is no longer an EX in front of MEM to accomplish this). However, in return we can use instructions with one memory input operand, i.e., register-memory instructions. For instance: multf_m f0,f2,(r2) multiplies the contents of register f2 and the value at memory location pointed to by r2, putting the result in f0.
(a) Dropping the register displacement addressing mode is potentially a big loss, since it is the mode most frequently used in MIPS. Why is it so frequent? Give two popular software constructs whose implementation uses register displacement addressing (i.e., uses displacement addressing with non- zero displacements).
(b) What is the difference between a dependence and a hazard?
(c) In this question we will work with the SAXPY loop.
do I = 0,N
Z[I] = A*X[I] + Y[I]
Here is the new assembly code.
0: slli r2,r1,#3 // I is in r1 1: addi r3,r2,#X
2: multf_m f2,f0,(r3) // A is in f0 3: addi r4,r2,#Y
4: addf_m f4,f2,(r4)
5: addi r4,r2,#Z
6: sf f4,(r4)
7: addi r1,r1,#1
8: slei r6,r1,r5 // N is in r5 9: bnez r6,#0
Using the instruction numbers, label the data and control dependences.
(d) Fill in the pipeline diagram for code for the new SAXPY loop. Label the stalls as d* for data-hazard stalls and s* for structural stalls. What is the latency of a single iteration? (The number of cycles between the completion of two successive #0 instructions). For this question, assume that FP addition takes 2 cycles, FP multiplication takes 3 cycles and that all other operations take a single cycle. The functional units are not pipelined. The FP adder, FP multiplier and integer ALU are all separate functional units, such that there are no structural hazards between them. The register file is written by the WB stage in the first half of a clock cycle and is read by the ID stage in the second half of a clock cycle. In addition, the processor has full forwarding. The processor stalls on branches until the outcome is available which is at the end of the EX stage. The processor has no provisions for maintaining "precise state". (e) In the pipeline for MIPS mentioned in the text, what is the reason for forcing non-memory operations to go through the MEM stage rather than proceeding directly to the WB stage?
(f) Aside from the direct loss of register displacement addressing and the subsequent instructions required to explicitly compute addresses, what are two other disadvantages of this sort of pipeline?
(g) Reduce the stalls by pipeline scheduling a single loop iteration. Show the resulting code and fill in the pipeline diagram. You do not need to show the optimal schedule for a correct response.
Correct Answer:
Verified
View Answer
Unlock this answer now
Get Access to more Verified Answers free of charge
Q2: Using any ILP optimization, double the performance
Q3: Consider the following assembly language code:
I0: ADD
Q4: Branch Prediction. Consider the following sequence of
Q5: For the MIPS datapath shown below,
Q6: This problem covers your knowledge of
Q8: A two-part question.
-(Part A) Dependence detection
This question
Q9: A two-part question.
-Given four instructions, how many
Q10: Pipelining is used because it improves instruction
Q11: You are given a 4-stage pipelined
Q12: Consider the datapath below. This machine does
Unlock this Answer For Free Now!
View this answer and more for free by performing one of the following actions
Scan the QR code to install the App and get 2 free unlocks
Unlock quizzes for free by uploading documents