First, we denote the two 8-bit magnitudes to be multiplied as A7-0 and B7-0 and the 16-bit product that results as P15-0. We can partition A and B into two 4-bit groups, A7-4, A3-0, B7-4, B3-0, and form their 16-bit product as a sum of several 8-bit products:
A * B = (A7-4 * B7-4) * 2^8 + (A7-4 * B3-0 + A3-0 * B7-4) * 2^4 + (A3-0 * B3-0)
To see how this works, let's examine the multiplication of the 8-bit binary numbers 11110010₂ and 10001100₂. These correspond to the decimal numbers 242 and 140, respectively.
As a check, we see that 242 * 140 = 33880, which is equal to 1000010001011000₂.
(See Appendix A to review base conversions.)
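The decomposition can be checked with a short Python sketch. The function names `split_nibbles` and `multiply_8x8` are our own, chosen for illustration; the code simply forms the four 4-by-4 products and recombines them with the appropriate shifts.

```python
def split_nibbles(x):
    """Return (high 4 bits, low 4 bits) of an 8-bit value."""
    return (x >> 4) & 0xF, x & 0xF


def multiply_8x8(a, b):
    """Form an 8-by-8 product from four 4-by-4 partial products."""
    a_hi, a_lo = split_nibbles(a)
    b_hi, b_lo = split_nibbles(b)
    low = a_lo * b_lo      # weight 2^0
    cross1 = a_hi * b_lo   # weight 2^4
    cross2 = a_lo * b_hi   # weight 2^4
    high = a_hi * b_hi     # weight 2^8
    return (high << 8) + ((cross1 + cross2) << 4) + low


a, b = 0b11110010, 0b10001100   # 242 and 140
product = multiply_8x8(a, b)
print(product, format(product, "016b"))  # 33880 1000010001011000
```

For the example operands, the four partial products are 24, 180, 16, and 120, and 120 * 256 + (180 + 16) * 16 + 24 = 33880.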
The hardware implementation follows directly from this observation. It requires four 4-by-4 multipliers, implemented as in Figure 5.31, plus logic to sum the four-bit wide slices of the partial products.
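Let's call the four 8-bit partial products PP0, PP1, PP2, and PP3, where PP0 = A3-0 * B3-0, PP1 and PP2 are the two cross products, and PP3 = A7-4 * B7-4. Then the final product bits are computed as follows:
P3-0 = bits 3-0 of PP0
P7-4 = bits 7-4 of PP0 + bits 3-0 of PP1 + bits 3-0 of PP2
P11-8 = bits 7-4 of PP1 + bits 7-4 of PP2 + bits 3-0 of PP3
P15-12 = bits 7-4 of PP3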
Of course, any carry-out of the calculation of P7-4 must be added to the sum for P11-8, and likewise for the carry-out of P11-8 to P15-12.
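The slice-by-slice computation, including the carries between slices, can be sketched in Python as follows. This is an illustrative model, not a description of the hardware: the function name `product_slices` is hypothetical, and the assignment of the two cross products to PP1 and PP2 is an assumption (the sums are symmetric in them, so the result does not depend on it).

```python
def product_slices(a, b):
    """Return (P15-12, P11-8, P7-4, P3-0) for 8-bit operands a and b."""
    a_hi, a_lo = (a >> 4) & 0xF, a & 0xF
    b_hi, b_lo = (b >> 4) & 0xF, b & 0xF

    pp0 = a_lo * b_lo   # low product
    pp1 = a_hi * b_lo   # cross product (assumed to be PP1)
    pp2 = a_lo * b_hi   # cross product (assumed to be PP2)
    pp3 = a_hi * b_hi   # high product

    def lo(pp):
        return pp & 0xF   # bits 3-0 of a partial product

    def hi(pp):
        return pp >> 4    # bits 7-4 of a partial product

    p3_0 = lo(pp0)                              # passes straight through
    s1 = hi(pp0) + lo(pp1) + lo(pp2)            # three 4-bit quantities
    p7_4, carry1 = s1 & 0xF, s1 >> 4
    s2 = hi(pp1) + hi(pp2) + lo(pp3) + carry1   # three quantities plus carry
    p11_8, carry2 = s2 & 0xF, s2 >> 4
    p15_12 = (hi(pp3) + carry2) & 0xF           # top slice plus carry
    return p15_12, p11_8, p7_4, p3_0


print(product_slices(242, 140))  # (8, 4, 5, 8), i.e. 1000 0100 0101 1000
```

Note that the carry out of a slice can be as large as 2, since three 4-bit values sum to at most 45.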
The design has three parts: (1) the calculation of partial products, (2) the summing of the 4-bit product slices, and (3) the carry look-ahead unit. We examine each of these in turn.
Calculation of Sums The low-order 4 bits of the final product, P3-0, are the same as bits 3-0 of PP0 and do not participate in the sums. P7-4 and P11-8 are sums of three 4-bit quantities. How do we compute these?
Figure 5.32 shows a way to cascade full adders to implement a function that sums three 4-bit quantities, denoted A3-0, B3-0, and C3-0. (Watch that you don't confuse the variable Ci with adder carry-ins.) The first level of full adders sums 1 bit from each of the three numbers to be added. We accomplish this by using the carry input as a data input. The second-level adders combine the carry-out from the next lower order stage with the sum from the first-level adder. The carries simply propagate from right to left among the second-level adders. This is just like the carry propagations we needed in the 4-by-4 multiplier of Figure 5.28.
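The two-level arrangement can be modeled bit by bit. The Python sketch below follows the description in the text rather than the figure itself, so the handling of the two high-order carry-outs is an assumption; `full_adder` and `add_three_4bit` are names invented for this example.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def add_three_4bit(a, b, c):
    """Sum three 4-bit values with two levels of full adders.

    Returns (4-bit sum, carry count), where the carry count has weight 16.
    """
    # First level: the carry input is used as a third data input.
    first = [full_adder((a >> i) & 1, (b >> i) & 1, (c >> i) & 1)
             for i in range(4)]

    result = 0
    ripple = 0  # carry rippling among the second-level adders
    for i in range(4):
        s1 = first[i][0]
        # Carry-out of the next lower order first-level stage.
        c_prev = first[i - 1][1] if i > 0 else 0
        s2, ripple = full_adder(s1, c_prev, ripple)
        result |= s2 << i

    # Both the top first-level carry and the final ripple carry have weight 16.
    carries = first[3][1] + ripple
    return result, carries


print(add_three_4bit(0b1011, 0b0001, 0b1000))  # (4, 1): 11 + 1 + 8 = 20
```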
Figure 5.33 shows how the logic of Figure 5.32 can be implemented with TTL components. The first-level full adders are provided by 74183 dual binary adders. The second-level adders are implemented by a 74181 arithmetic logic unit, configured for the adder function. This has the extra performance advantage of internal carry look-ahead logic. Note that the ALU block is written in its positive logic form, with positive logic data inputs and outputs and negative logic carry-in and carry-out.
Figure 5.33 provides the basic building block we can use to implement bit slices P7-4 and P11-8 for the result products. This is shown in Figure 5.34.
The rightmost 74181 component and its two associated 74183s implement bit slice P7-4. The logic is cascaded with an identical block of components to implement bit slice P11-8.
Figure 5.34 also includes the implementation of slice P15-12. The final slice is formed from bits 7-4 of the partial product PP3, plus any carry-outs from lower-order sums. We implement this using a 74181 component configured as an adder, with the B data inputs set to 0, the A inputs set to the partial product, and the carry-in coming from the adjacent adder block.
Putting the Pieces Together The last step in the design combines the multiplier block with the accumulation block. To further improve the performance, the carries between the 74181s can be replaced with a 74182 carry look-ahead unit.
This is shown in Figure 5.35. The generate/propagate outputs of the three 74181s are wired to the corresponding inputs of the 74182 carry look-ahead unit. The component is drawn with positive logic generate and propagate inputs and negative logic carries. This matches the notation used for the ALUs. The Cn input is wired high, matching the carry-in to the lowest-order 74181. The generated Cn+x and Cn+y carries are routed to the carry-ins of the middle and high-order 74181s, respectively.
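The look-ahead computation can be illustrated with a positive-logic Python model. This is a sketch of the standard carry look-ahead equations, not a pin-accurate model of the 74181 or 74182: it ignores the active-low polarities discussed above, and the names `group_gp` and `lookahead_carries` are hypothetical.

```python
def group_gp(a, b):
    """Group generate and propagate for one 4-bit slice (positive logic)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    return group_g, group_p


def lookahead_carries(a, b, c_n):
    """Carries into the middle and high-order slices of a 12-bit addition."""
    g0, p0 = group_gp(a & 0xF, b & 0xF)
    g1, p1 = group_gp((a >> 4) & 0xF, (b >> 4) & 0xF)
    c_x = g0 | (p0 & c_n)                        # carry into bits 7-4
    c_y = g1 | (p1 & g0) | (p1 & p0 & c_n)       # carry into bits 11-8
    return c_x, c_y


print(lookahead_carries(0xFFF, 0x001, 0))  # (1, 1)
```

Both carries are computed directly from the group signals and the carry-in, so the high-order slice does not wait for a carry to ripple through the slices below it.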
Package Count and Performance In terms of package count, the complete implementation uses four 74284/74285 multipliers (eight packages, since each 4-by-4 multiplier is a 74284/74285 pair), four 74183 full adder packages, three 74181 arithmetic logic units, and one 74182 carry look-ahead unit. This is a total of 16 packages.
A circuit of this size is too complex to analyze by simply counting gate delays. We start by identifying the critical delay path. This is the sequence of propagated signals that limits the performance of the circuit. Once we have determined the critical path, the TTL catalog will provide us with signal delays associated with the individual packages in our implementation.
The first step in the critical path is the calculation of the partial products by the 74284/285 multipliers. Assuming standard TTL components, the typical delay from the arrival of the inputs to valid outputs is 40 ns (60 ns maximum).
The next step in the critical path is the formation of the intermediate sums by the 74183s. We assume LS TTL for these packages. The typical adder delay depends on the direction of the final transition of the sum output: 9 ns for a low-to-high transition and 20 ns for a high-to-low transition. It is reasonable to average these and round to 15 ns. For worst-case delay, we should use the worst-case maximum, which is 33 ns.
The final leg of the critical path is the calculation of the second-stage sums using the carry look-aheads. This consists of three pieces:
(1) calculation of the group propagates/generates in the 74181s,
(2) calculation of the carry-outs by the 74182 after the propagates and generates become valid, and
(3) calculation of the final sums in the 74181s once the carries are valid.
We assume LS TTL for the 74181 and standard TTL for the 74182. For the 74LS181, from inputs valid to group propagate/generate valid takes 20 ns typical (30 ns maximum). In this case, the propagate is slightly slower than the generate, so this is the signal that really determines the delay.
Using a standard TTL 74182, the delay from group propagate/generate in to valid carry-outs is 13 ns typical, 22 ns worst case. Returning to the 74LS181, the last piece of the critical path is the delay from carry-in valid to sums valid. This is 15 ns typical, 26 ns worst case.
So the typical delay is 40 + 15 + 20 + 13 + 15 = 103 ns. The worst-case delay is 60 + 33 + 30 + 22 + 26 = 171 ns. There is a significant difference between the worst case and the typical performance. Also, the delay can be significantly reduced by using a faster TTL family, such as S or AS logic.
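The critical-path totals can be tallied mechanically. The short Python sketch below uses only the delay figures quoted above; the stage labels and the name `STAGES` are our own.

```python
# (stage, typical ns, worst-case ns), using the figures quoted in the text
STAGES = [
    ("74284/74285 partial products", 40, 60),
    ("74LS183 first-level sums", 15, 33),
    ("74LS181 inputs to group propagate/generate", 20, 30),
    ("74182 propagate/generate to carry-outs", 13, 22),
    ("74LS181 carry-in to sums", 15, 26),
]

typical = sum(t for _, t, _ in STAGES)
worst = sum(w for _, _, w in STAGES)
print(f"typical: {typical} ns, worst case: {worst} ns")
# typical: 103 ns, worst case: 171 ns
```

Substituting the delays of a faster family for any stage shows directly how much that stage contributes to the total.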