Pipelined Datapaths

In addition to providing a datapath that performs the necessary register transfer μOP’s, an architect needs to be concerned about the speed at which the μOP’s are being performed. Slide 7-6a illustrates maximum delay values for each of the components in a typical datapath. Note that timing always starts with the beginning of a new clock cycle (e.g. a positive edge of the clock). The illustration shows that a maximum of 4 ns (3 ns + 1 ns) is required to place two operands at the inputs to the Function unit. A maximum of 4 ns is also required to execute an operation in the Function unit. Finally, a maximum of 4 ns (1 ns + 3 ns) is required to write the result back into a register in the register file. By adding all these delays, we find that 12 ns is required to perform a single μOP. Thus, the fastest rate for executing μOP’s is 83.3 MHz (1 MHz = 10⁶ clock cycles per second), since the inverse of 12 × 10⁻⁹ s is approximately 83.3 × 10⁶ Hz. This is the maximum frequency at which the clock can be operated, since 12 ns is the smallest clock period that will allow each μOP to be completed.
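The arithmetic above can be checked with a short Python sketch. The delay values are the ones quoted in the text; the constant names and the helper function are illustrative assumptions, not part of the slides.

```python
# Maximum delays (in ns) quoted in the text for the non-pipelined datapath.
OPERAND_FETCH_NS = 3 + 1   # place two operands at the Function unit inputs
EXECUTE_NS = 4             # execute an operation in the Function unit
WRITE_BACK_NS = 1 + 3      # write the result back into the register file


def max_clock_mhz(period_ns: float) -> float:
    """Return the highest clock frequency (MHz) for a minimum period (ns)."""
    # f [Hz] = 1 / (period_ns * 1e-9) and 1 MHz = 1e6 Hz,
    # so f [MHz] = 1000 / period_ns.
    return 1000.0 / period_ns


total_ns = OPERAND_FETCH_NS + EXECUTE_NS + WRITE_BACK_NS
print(f"minimum clock period: {total_ns} ns")
print(f"maximum clock frequency: {max_clock_mhz(total_ns):.1f} MHz")
```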
Now suppose that the rate at which the μOP’s are performed is not adequate for a particular application, and that there are no faster components available with which to reduce the 12 ns required to complete a μOP. It is possible to reduce the clock period, which in turn increases the clock frequency, by breaking up the 12 ns delay path with registers. The resultant datapath, shown in slide 7-6b, is referred to as a pipelined datapath, or just a pipeline.
Three registers (crosshatched in the diagram) break the delay of the original datapath into three parts. The register file contains the first register. Crosshatching covers only the top half of the register file, since the lower half is viewed as combinational logic that selects two registers to read. The next register in the pipeline is used to store the source operands to be provided to the Function unit. The third register stores the result of the Function unit.
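As a rough sketch of the effect on the clock, assume the three parts keep the 4 ns delays quoted for the non-pipelined datapath and that the pipeline registers add no delay of their own. Both are idealizing assumptions: real registers add setup and propagation time, so the achievable period is somewhat longer than the figure computed here. Under those assumptions the clock only needs to cover the slowest part:

```python
# Assumed per-stage delays in ns, taken from the non-pipelined breakdown.
# Pipeline register overhead is ignored (an idealizing assumption).
stage_delays_ns = {"operand fetch": 4, "execute": 4, "write back": 4}

# Every stage must finish within one clock period, so the slowest stage sets it.
pipelined_period_ns = max(stage_delays_ns.values())
unpipelined_period_ns = sum(stage_delays_ns.values())

print(f"non-pipelined period: {unpipelined_period_ns} ns")
print(f"idealized pipelined period: {pipelined_period_ns} ns")
print(f"idealized clock speedup: {unpipelined_period_ns / pipelined_period_ns:.1f}x")
```

If the stages were unequal, the same `max` would show that the speedup is limited by the slowest stage rather than by the number of stages.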
The term “pipeline” does not provide the best analogy for the corresponding datapath structure. A better analogy for the datapath structure is an assembly line. A custom product being built (i.e. the execution of an instruction) may pass down an assembly line, through many stages, before it is completed. A conveyor belt moves components from stage to stage by advancing periodically by the length of one stage (i.e. once each clock cycle). Components and partially completed assemblies are stored in bins (i.e. the pipeline registers) along the line. The person at the first stage of the assembly line takes one or more of the components to be included in the product from storage bins and places them on the conveyor. The person at the second stage assembles the components. The person at the last stage takes the assembly from the conveyor and puts it into the storage bins to be used for further assembly later on. Note that each person has a simple task to perform. Moreover, as soon as all the tasks in a particular stage are done, the conveyor can move forward so that the same tasks can be performed for the next items on the conveyor.
Observe that, once the line is full, one assembly operation is completed at each “advance of the belt” (i.e. each clock tick). At any given time, three assembly operations are in some stage of completion. How is this faster than having only one stage, as in the case of our conventional datapath? Suppose that a single-stage line with one person doing every step of the assembly takes one minute (20 seconds devoted to each of the three tasks). How many finished assemblies (instructions) come from the single-stage line in 60 seconds? Just one. How many assemblies come from the three-stage line in 60 seconds once it is full? Since an assembly operation is completed every 20 seconds at the last stage of the line, three assembled products come forth! Although both of the assembly lines take 60 seconds to complete any one assembly, the three-stage line completes three times as many assembled products in a given amount of time. So the throughput (the rate of producing assembled products) of the three-stage line is three times that of the single-stage line.
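The throughput comparison can be sketched in Python by counting finished assemblies over time. The function name and the assumption that each line starts empty are illustrative choices, not part of the slides. Starting empty makes the fill time visible: the three-stage line needs its first 60 seconds to deliver one product, and only after that does it deliver three per minute.

```python
def completed(num_stages: int, stage_seconds: int, elapsed_seconds: int) -> int:
    """Assemblies finished by elapsed_seconds on a line that starts empty.

    The belt advances once per stage_seconds. The first assembly needs
    num_stages advances; every later advance finishes one more.
    """
    advances = elapsed_seconds // stage_seconds
    return max(0, advances - (num_stages - 1))


# Single-stage line: one person does all three 20-second tasks (60 s per advance).
# Three-stage line: each person does one 20-second task (20 s per advance).
for label, stages, seconds in (("single-stage", 1, 60), ("three-stage", 3, 20)):
    first = completed(stages, seconds, 60)
    steady = completed(stages, seconds, 120) - completed(stages, seconds, 60)
    print(f"{label}: {first} in the first 60 s, {steady} in the next 60 s")
```

The time to finish any one assembly is 60 seconds on both lines; only the rate at which finished assemblies emerge differs.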
To relate the analogy back to the datapath: the “take from storage bins” stage corresponds to obtaining or fetching operands from the register file, with the storage bins corresponding to register file locations. The “assembly” stage corresponds to the execution of an operation in the Function unit. Finally, the “put into storage bins” stage corresponds to writing the result back into a register file location.