In this section, we will see how to apply the principles and components of arithmetic circuits to implement a subsystem of moderate complexity. Our objective is to design a fast 8-by-8 bit multiplier using 4-by-4 bit multipliers as building blocks, along with adders, arithmetic logic, and carry look-ahead units.

First, we denote the two 8-bit magnitudes to be multiplied as $A_{7-0}$ and $B_{7-0}$ and the 16-bit product that results as $P_{15-0}$. We can partition $A$ and $B$ into two 4-bit groups, $A_{7-4}$, $A_{3-0}$, $B_{7-4}$, $B_{3-0}$, and form their 16-bit product as a sum of several 8-bit products:
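Treating each 4-bit group as an unsigned integer, one way to write this expansion is the following. It is the ordinary algebraic expansion of the two split operands; the grouping into four terms is how this sketch arranges it.

```latex
\begin{aligned}
A_{7-0} \times B_{7-0}
  &= \left(A_{7-4} \cdot 2^{4} + A_{3-0}\right)\left(B_{7-4} \cdot 2^{4} + B_{3-0}\right) \\
  &= \left(A_{7-4} \times B_{7-4}\right) 2^{8}
   + \left(A_{7-4} \times B_{3-0}\right) 2^{4}
   + \left(A_{3-0} \times B_{7-4}\right) 2^{4}
   + \left(A_{3-0} \times B_{3-0}\right)
\end{aligned}
```

Each parenthesized product of two 4-bit groups is at most 8 bits wide, which is why four 4-by-4 multipliers suffice.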

To see how this works, let's examine the multiplication of the 8-bit binary numbers $11110010_2$ and $10001100_2$. These correspond to the decimal numbers 242 and 140, respectively.
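The four 4-by-4 products for this example can be worked out with a few lines of Python. This is only an arithmetic sketch: the helper name `split_nibbles` and the variable names are illustrative, and ordinary integer multiplication stands in for the hardware multipliers.

```python
def split_nibbles(x):
    """Return (high 4 bits, low 4 bits) of an 8-bit magnitude."""
    return (x >> 4) & 0xF, x & 0xF


a, b = 0b11110010, 0b10001100          # 242 and 140
a_hi, a_lo = split_nibbles(a)          # 0b1111, 0b0010
b_hi, b_lo = split_nibbles(b)          # 0b1000, 0b1100

# Four 4-by-4 products, each at most 8 bits wide.
lo_lo = a_lo * b_lo
hi_lo = a_hi * b_lo
lo_hi = a_lo * b_hi
hi_hi = a_hi * b_hi

# Weight each product by the position of the groups it came from.
total = (hi_hi << 8) + (hi_lo << 4) + (lo_hi << 4) + lo_lo

print(lo_lo, hi_lo, lo_hi, hi_hi)      # 24 180 16 120
print(total, format(total, "016b"))    # 33880 1000010001011000
```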

As a check, we see that 242 * 140 = 33880, which is equal to $1000010001011000_2$. (See Appendix A to review base conversions.)

The hardware implementation follows directly from this observation. It requires four 4-by-4 multipliers, implemented as in Figure 5.31, plus logic to sum the four-bit wide slices of the partial products.

Let's call the four 8-bit partial products PP0, PP1, PP2, and PP3. Then the final product bits are computed as follows:
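Before carries are considered, the slices can be written as below. The numbering is an assumption of this sketch: $\mathrm{PP0} = A_{3-0} \times B_{3-0}$, $\mathrm{PP1} = A_{7-4} \times B_{3-0}$, $\mathrm{PP2} = A_{3-0} \times B_{7-4}$, and $\mathrm{PP3} = A_{7-4} \times B_{7-4}$. Swapping PP1 and PP2 leaves the sums unchanged.

```latex
\begin{aligned}
P_{3-0}   &= \mathrm{PP0}_{3-0} \\
P_{7-4}   &= \mathrm{PP0}_{7-4} + \mathrm{PP1}_{3-0} + \mathrm{PP2}_{3-0} \\
P_{11-8}  &= \mathrm{PP1}_{7-4} + \mathrm{PP2}_{7-4} + \mathrm{PP3}_{3-0} \\
P_{15-12} &= \mathrm{PP3}_{7-4}
\end{aligned}
```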

Of course, any carry-out of the calculation of $P_{7-4}$ must be added to the sum for $P_{11-8}$, and likewise for the carry-out of $P_{11-8}$ to $P_{15-12}$.
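A behavioural sketch of this slice-by-slice accumulation follows. The names `mul4`, `hi`, `lo`, and `mul8_by_slices` are hypothetical, `mul4` stands in for one 4-by-4 multiplier block using ordinary integer multiplication, and the assignment of PP1 and PP2 to particular operand groups is an assumption. Note that a sum of three 4-bit values can reach 45, so the carry into the next slice may be 0, 1, or 2; the sketch passes it along as a small integer rather than modelling the wiring.

```python
def mul4(x, y):
    """Stand-in for one 4-by-4 multiplier: 4-bit inputs, 8-bit product."""
    assert 0 <= x < 16 and 0 <= y < 16
    return x * y


def hi(v):
    """Bits 7-4 of an 8-bit value."""
    return (v >> 4) & 0xF


def lo(v):
    """Bits 3-0 of an 8-bit value."""
    return v & 0xF


def mul8_by_slices(a, b):
    """Form the 16-bit product of two 8-bit magnitudes from 4-bit slices."""
    pp0 = mul4(lo(a), lo(b))
    pp1 = mul4(hi(a), lo(b))   # assumed numbering
    pp2 = mul4(lo(a), hi(b))   # assumed numbering
    pp3 = mul4(hi(a), hi(b))

    p3_0 = lo(pp0)                            # no addition needed

    s1 = hi(pp0) + lo(pp1) + lo(pp2)          # three 4-bit quantities
    p7_4, c1 = s1 & 0xF, s1 >> 4              # c1 is 0, 1, or 2

    s2 = hi(pp1) + hi(pp2) + lo(pp3) + c1     # three quantities plus carry
    p11_8, c2 = s2 & 0xF, s2 >> 4

    p15_12 = (hi(pp3) + c2) & 0xF             # top slice: PP3 bits plus carry

    return (p15_12 << 12) | (p11_8 << 8) | (p7_4 << 4) | p3_0


print(mul8_by_slices(242, 140))   # 33880
```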

The implementation breaks down into three parts: (1) the calculation of partial products, (2) the summing of the 4-bit product slices, and (3) the carry look-ahead unit. We examine each of these in turn.

**Calculation of Sums** The low-order 4 bits of the final product, $P_{3-0}$, are the same as $\mathrm{PP0}_{3-0}$ and do not participate in the sums. $P_{7-4}$ and $P_{11-8}$ are sums of three 4-bit quantities. How do we compute these?

Figure 5.32 shows a way to cascade full adders to implement a function that sums three 4-bit quantities, denoted $A_{3-0}$, $B_{3-0}$, and $C_{3-0}$. (Watch that you don't confuse the variable $C_i$ with adder carry-ins.) The first level of full adders sums 1 bit from each of the three numbers to be added. We accomplish this by using the carry input as a data input. The second-level adders combine the carry-out from the next lower order stage with the sum from the first-level adder. The carries simply propagate from right to left among the second-level adders. This is just like the carry propagations we needed in the 4-by-4 multiplier of Figure 5.28.
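The two-level arrangement can be modelled bit by bit as follows. This is a sketch of the idea described above, not a transcription of Figure 5.32: the function names are hypothetical, and returning the two leftover weight-16 carries as a single count is a modelling choice made here.

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (sum, carry)."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


def add3_4bit(a, b, c):
    """Sum three 4-bit quantities with two levels of full adders.

    Returns (4-bit sum, number of carries of weight 16).
    """
    # First level: one bit from each operand; the carry input is used
    # as a third data input.
    sums, carries = [], []
    for i in range(4):
        s, k = full_adder((a >> i) & 1, (b >> i) & 1, (c >> i) & 1)
        sums.append(s)
        carries.append(k)

    # Second level: combine each first-level sum with the first-level
    # carry from the next lower stage; carries ripple right to left.
    result, ripple = 0, 0
    for i in range(4):
        from_below = carries[i - 1] if i > 0 else 0
        s, ripple = full_adder(sums[i], from_below, ripple)
        result |= s << i

    # The top first-level carry and the final ripple carry both have
    # weight 16.
    return result, carries[3] + ripple


print(add3_4bit(0b1111, 0b1111, 0b1111))   # (13, 2), since 45 = 13 + 2 * 16
```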

Figure 5.33 shows how the logic of Figure 5.32 can be implemented with TTL components. The first-level full adders are provided by 74183 dual binary adders. The second-level adders are implemented by a 74181 arithmetic logic unit, configured for the adder function. This has the extra performance advantage of internal carry look-ahead logic. Note that the ALU block is written in its positive logic form, with positive logic data inputs and outputs and negative logic carry-in and carry-out.

Figure 5.33 provides the basic building block we can use to implement bit slices $P_{7-4}$ and $P_{11-8}$ for the result products. This is shown in Figure 5.34.

The rightmost 74181 component and its two associated 74183s implement bit slice $P_{7-4}$. The logic is cascaded with an identical block of components to implement bit slice $P_{11-8}$.

Figure 5.34 also includes the implementation of slice $P_{15-12}$. The final slice is formed from the partial product $\mathrm{PP3}_{7-4}$, plus any carry-outs from lower-order sums. We implement this using a 74181 component configured as an adder, with the $B$ data inputs set to 0, the $A$ inputs set to the partial product, and the carry-in coming from the adjacent adder block.

**Putting the Pieces Together** The
last step in the design combines the multiplier block with the accumulation
block. To further improve the performance, the carries between the 74181s
can be replaced with a 74182 carry look-ahead unit.

The generate/propagate outputs of the three 74181s are wired to the corresponding inputs of the 74182 carry look-ahead unit. The component is drawn with positive logic generate and propagate inputs and negative logic carries. This matches the notation used for the ALUs. The $C_n$ input is wired high, matching the carry-in to the lowest order 74181. The generated $C_{n+x}$ and $C_{n+y}$ carries are routed to the carry-in of the middle and high-order 74181s, respectively.
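The look-ahead idea can be sketched with the standard textbook carry equations. Several things here are assumptions of the sketch rather than properties taken from a datasheet: all signals are active-high (the negative-logic carries of the actual parts are not modelled), the function names are hypothetical, and `c_x` and `c_y` are simply named after the $C_{n+x}$ and $C_{n+y}$ outputs mentioned above.

```python
def group_gp(a, b):
    """Active-high group generate and propagate for one 4-bit block."""
    generate = int(a + b > 0xF)        # block produces a carry by itself
    propagate = int((a | b) == 0xF)    # block passes an incoming carry on
    return generate, propagate


def lookahead(g, p, c_n):
    """Carries into the middle and high-order blocks.

    g and p hold the group generate/propagate signals of the low and
    middle blocks; c_n is the carry into the low-order block.
    """
    c_x = g[0] | (p[0] & c_n)
    c_y = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_n)
    return c_x, c_y


# Low block 0b1001 + 0b0111 generates a carry; middle block
# 0b1010 + 0b0101 only propagates it.
g0, p0 = group_gp(0b1001, 0b0111)
g1, p1 = group_gp(0b1010, 0b0101)
print(lookahead((g0, g1), (p0, p1), 0))   # (1, 1)
```

Both carries are computed directly from the group signals, so the high-order block does not have to wait for a carry to ripple through the middle one.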

**Package Count and Performance** In terms of package count, the complete implementation uses four 74284/74285 multipliers (eight packages), four 74183 full adder packages, three 74181 arithmetic logic units, and one 74182 carry look-ahead unit. This is a total of 16 packages.

A circuit this complex is far too complicated to analyze
by simply counting gate delays. We start by identifying the *critical
delay path*. This is the sequence of propagated signals that limits
the performance of the circuit. Once we have determined the critical path,
the TTL catalog will provide us with signal delays associated with the individual
packages in our implementation.

The first step in the critical path is the calculation of the partial products by the 74284/285 multipliers. Assuming standard TTL components, the typical delay from the arrival of the inputs to valid outputs is 40 ns (60 ns maximum).

The next step in the critical path is the formation of
the intermediate sums by the 74183s. We assume LS TTL for these packages.
Since the typical adder delay is sensitive to the final value of the sum
output, 9 ns for a low-to-high transition and 20 ns for a high-to-low output
transition, it is reasonable to average these to get 15 ns. For worst-case
delay, we should use the worst-case maximum, which is 33 ns.

The final leg of the critical path is the calculation of the second-stage sums using the carry look-aheads. This consists of three pieces: (1) calculation of the group propagates/generates in the 74181s, (2) calculations of the carry-outs by the 74182 after the propagates and generates become valid, and (3) calculation of the final sums in the 74181s once the carries are valid.

We assume LS TTL for the 74181 and standard TTL for the 74182. For the 74LS181, from inputs valid to group propagate/generate valid takes 20 ns typical (30 ns maximum). In this case, the propagate is slightly slower than the generate, so this is the signal that really determines the delay.

Using a standard TTL 74182, the delay from group propagate/generate
in to valid carry-outs is 13 ns typical, 22 ns worst case. Returning to
the 74LS181, the last piece of the critical path is the delay from carry-in
valid to sums valid. This is 15 ns typical, 26 ns worst case.

So the typical delay is 40 (multipliers) + 15 (full adders) + 20 (generate/propagate) + 13 (carry-outs) + 15 (sums) = 103 ns. The worst-case delay is 60 + 33 + 30 + 22 + 26 = 171 ns. There is a significant difference between the worst case and the typical performance. Also, the delay can be significantly reduced by using a faster TTL family, such as S or AS logic.
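The critical-path arithmetic can be tabulated in a few lines of Python. The delay figures are the ones quoted in the text; the stage labels and variable names are illustrative only.

```python
# (stage on the critical path, typical ns, worst-case ns)
stages = [
    ("74284/74285 multipliers",          40, 60),
    ("74LS183 first-level sums",         15, 33),
    ("74LS181 group propagate/generate", 20, 30),
    ("74182 carry-outs",                 13, 22),
    ("74LS181 carry-in to sums",         15, 26),
]

typical = sum(t for _, t, _ in stages)
worst = sum(w for _, _, w in stages)

for name, t, w in stages:
    print(f"{name:34s} {t:3d} ns typ {w:3d} ns max")
print(f"{'total':34s} {typical:3d} ns typ {worst:3d} ns max")   # 103 and 171
```

Laying the path out this way makes it easy to substitute the figures for another logic family and see which stage dominates.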
