Baugh-Wooley multipliers

\[ A = (a_{m-1}a_{m-2}\cdots a_1a_0)_2 \quad \text{Signed numbers} \]
\[ B = (b_{n-1}b_{n-2}\cdots b_1b_0)_2 \]

Then

\[
P = A \times B = a_{m-1} \cdot b_{n-1} \cdot 2^{m+n-2} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j \cdot 2^{i+j} + 2^{m-1} \cdot \left( -2^n + 2^{n-1} + a_{m-1} \cdot 2^{n-1} + a_{m-1} \cdot b_{n-1} \cdot 2^j \right) \]
\[ + \sum_{j=0}^{n-2} a_{m-1} \cdot b_j \cdot 2^j \]
\[ + 2^{n-1} \cdot \left( -2^m + 2^{m-1} + b_{n-1} \cdot 2^{m-1} + b_{n-1} \cdot \sum_{i=0}^{m-2} a_i \cdot 2^i \right) \]

(1)
The only negative summands in equation (1) are

\[ 2^{m-1}(-2^n+2^{n-1}) + 2^{n-1}(-2^m+2^{m-1}) = 2^{m-1}(-2^{n-1}) + 2^{n-1}(-2^{m-1}) = -2^{m+n-2} - 2^{m+n-2} = -2^{m+n-1} \] (2)

Since \( P = A \times B \) is an \( m \times n \) signed product, the length of \( P \) is \( m+n \) bits. The leftmost bit of the product is its sign bit which has a negative weight factor of \(-2^{m+n-1}\). Thus, the term \(-2^{m+n-1}\) of the right hand side of (2) corresponds to a \(+1\) at the location of the sign bit (or the column-location \( m+n-1 \)).
Example: Construct a 6-by-4 Baugh-Wooley multiplier.

Solution: Here $m = 6$, $n = 4$

\[
\begin{array}{cccccc}
& a_5 & a_4 & a_3 & a_2 & a_1 & a_0 = A \\
\times & b_3 & b_2 & b_1 & b_0 = B \\
\hline
a_4 b_0 & a_3 b_0 & a_2 b_0 & a_1 b_0 & a_0 b_0 \\
a_4 b_1 & a_3 b_1 & a_2 b_1 & a_1 b_1 & a_0 b_1 \\
a_5 b_3 & a_4 b_2 & a_3 b_2 & a_2 b_2 & a_1 b_2 & a_0 b_2 \\
\overline{a_5} & \overline{a_5 b_2} & \overline{a_5 b_1} & a_5 b_0 \\
a_5 & b_3 \\
1 & b_3 & a_4 b_3 & a_3 b_3 & a_2 b_3 & a_1 b_3 & a_0 b_3
\end{array}
\]
Figure 6.16  The schematic logic circuit diagram of a 6-by-4 Baugh–Wooley two's complement array multiplier.
Minimum delay unsigned multipliers based on counters.

\[ A = (a_{n-1} \cdots a_0)_2 \quad B = (b_{n-1} \cdots b_0)_2 \]

\[ A \uparrow n \quad B \downarrow n \]

Minimum delay CSA tree reducing the \( n \) partial products (\( n \) rows or \( n \) numbers) down to two #s (or two rows).

\[ \text{CPA} \]
\[ +2n \]
\[ \rightarrow P = \text{product} \]

The CSA tree can consist of counters like (2,2) counters or Half Adders (HA), (3,2) counters or Full Adders (FA), or in general (k,m) counters.
(2,2) counter \( \circ \) \( \rightarrow \) reduces the column height by one.

(3,2) counter \( \circ \circ \) \( \rightarrow \) reduces the column height by two.

In general

\( (k,m) \) counter \( \cdots \) \( \{ \) \( k \) bits \( \rightarrow \) reduces the column height by \( k-1 \).

\[ m \text{ bits} \]

Question: What is the minimum number of levels of counters that need to be involved in the reduction down to two rows (two numbers)? In other words, how many levels will the minimum delay CSA tree have?

Answer: The answer to the above question can be obtained from the equations relating the maximum \# of operands \( \Theta(v) \) and the \# of levels \( v \) for a minimum delay CSA\((k,m)\) trees.
For example, if a CSA(3,2) tree is to be used consisting of (3,2) counters, (sometimes (2,2) counters can be used if (3,2) counters are not needed), then

\[
\Theta(v) = \left\lfloor \frac{\Theta(v-1)}{2} \right\rfloor \times 3 + (\Theta(v-1) \mod 2) \right\} \quad (1)
\]

\[
\Theta(1) = 3
\]

The sequence of \( \Theta(v) \) numbers obtained from (1) is

\[3, 4, 6, 9, 13, 19, \ldots\]

It is the above sequence of numbers that dictates the number of levels of counters involved in the reduction.
Problem 1: Reduce the matrix of summands of a 16-by-16 unsigned multiplier down to two rows using (3,2) and (2,2) counters. Minimum number of levels and minimum amount of hardware should be used. The sequence of numbers used for such reductions is 3, 4, 6, 9, 13, 19, ...

Solution: The matrix of summands of a 16-by-16 unsigned multiplier consists of 16 partial products (16 rows or 16 numbers). Since $13 < 16 < 19$ six levels of reduction involving (3,2) and (2,2) will be needed.

1st level is 16 down to 13 reduction
2nd level is 13 down to 9 reduction
3rd level is 9 down to 6 reduction
4th level is 6 down to 4 reduction
5th level is 4 down to 3 reduction
6th level is 3 down to 2 reduction

The reductions follow on the next pages.
Total Hayward = 195 Fa and 15 HA

Delay = 6 x Dea
• The cost and delay for the previous 16 down to 2 reduction is:

\[
\begin{align*}
\text{cost} &= 195 \text{ FAs} + 15 \text{ HAs} \\
\text{delay} &= 6 \times D_{\text{FA}}
\end{align*}
\]

where \(D_{\text{FA}} = \text{delay through full adder}\)

• If an unsigned cellular array multiplier had been involved to perform the same task, the cost and delay would have been:

\[
\begin{align*}
\text{cost} &= 15 \text{ HA} + 15 \times 14 \text{ FA} \\
&= 15 \text{ HA} + 210 \text{ FA} \\
\text{delay} &= D_{\text{HA}} + 14 \times D_{\text{FA}}
\end{align*}
\]

where

\[
\begin{align*}
D_{\text{HA}} &= \text{delay through half adder} \\
D_{\text{FA}} &= \text{delay through full adder}.
\end{align*}
\]
Problem 2: Reduce the matrix of summands of a 15-by-15 unsigned multiplier down to two rows using counters with largest counter being the (7, 3) counter. This means that you cannot use (m, k) counters with m > 7 but you can use (m, k) counters with m ≤ 7.

Solution: The counters allowed to be used are:

\[(7, 3) \quad (6, 3) \quad (5, 3) \quad (4, 3) \quad (3, 2) \quad (2, 2)\]

The reduction will involve a 2-level CSA(7, 3) tree performing the 15 down to 3 reduction followed by a CSA (3, 2) performing the 3 down to 2 reduction.
For minimum delay CSA(7,3) trees, the maximum # of operands $\Theta(v)$ to be added and reduced down to three (3) by a v-level tree are given by:

$$\Theta(v) = \left\lfloor \frac{\Theta(v-1)}{3} \right\rfloor \times 7 + (\Theta(v-1)) \mod 3$$

$$\Theta(1) = 7$$

The sequence of $\Theta(v)$ #s obtained from (2) is 7, 15, 35, ...

Thus

1st level is 15 down to 7 reduction \$\text{CSA}(7,3)$

2nd level is 7 down to 3 reduction \$\text{CSA}(3,2)$

3rd level is 3 down to 2 reduction \$\text{CSA}(3,2)$.

The reductions follow on the next page.
Problem 1: Construct a 16-by-16 unsigned multiplier using 4-by-4 unsigned multipliers, counters and one CPA. The largest counter allowed to be used is the (7,3) counter.

Solution: Let the two numbers to be multiplied be

\[ A = a_{15}a_{14}a_{13}a_{12}a_{11}a_{10}a_9a_8a_7a_6a_5a_4a_3a_2a_1a_0 \quad \text{and} \quad a_i \in \{0,1\} \]

\[ B = b_{15}b_{14}b_{13}b_{12}b_{11}b_{10}b_9b_8b_7b_6b_5b_4b_3b_2b_1b_0 \quad \text{and} \quad b_i \in \{0,1\} \]

Then the product \( P = A \times B \) is

\[ P = A_3 \cdot 2^{12} + A_2 \cdot 2^8 + A_1 \cdot 2^4 + A_0 = A \]

\[ \times B_3 \cdot 2^{12} + B_2 \cdot 2^8 + B_1 \cdot 2^4 + B_0 = B \]

\[ A_3 \cdot B_3 \cdot 2^{16} + A_2 \cdot B_2 \cdot 2^{12} + A_1 \cdot B_1 \cdot 2^8 + A_0 \cdot B_0 \cdot 2^4 \]

\[ + A_3 \cdot B_2 \cdot 2^{20} + A_2 \cdot B_1 \cdot 2^{16} + A_1 \cdot B_0 \cdot 2^{12} + A_0 \cdot B_0 \cdot 2^8 \]

\[ + A_3 \cdot B_1 \cdot 2^{24} + A_2 \cdot B_0 \cdot 2^{20} + A_1 \cdot B_1 \cdot 2^{16} + A_0 \cdot B_0 \cdot 2^{12} \]

\[ P_{31} \quad P_{30} \quad \cdots \quad \cdots \quad \cdots \quad \cdots \quad \cdots \quad \cdots \quad \cdots \quad \cdots \quad P_0 \]
16-to-16 unsigned multiplier

Arrangement of the sixteen 8-bit products for the
counters were used. The first level of reduction was a 7 down to 3 reduction.

The largest counter used was the (7,3) counter. The second level was a 3 down to 2 reduction.
Problem 2: Using nine 3-by-2 unsigned multipliers, (3,2) and (2,2) counters and one CPA, construct a 9-by-6 unsigned multiplier. Minimum number of counters should be used. The useful sequence of numbers is 3, 4, 6, 9, 13, 19, ...

Solution: Let \( A \) and \( B \) be 9-bit and 6-bit unsigned numbers respectively.

\[
A = a_8 a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0 = A_2 2^6 + A_1 2^3 + A_0
\]
\[
B = b_5 b_4 b_3 b_2 b_1 b_0 = B_2 2^4 + B_1 2^2 + B_0
\]

\[
A = A_2 2^6 + A_1 2^3 + A_0
\]
\[
B = B_2 2^4 + B_1 2^2 + B_0
\]

\[
A \times B = A_2 B_0 2^6 + A_1 B_0 2^3 + A_0 B_0
\]
\[
A_2 B_1 2^8 + A_1 B_1 2^5 + A_0 B_1 2^2
\]
\[
A_2 B_2 2^{10} + A_1 B_2 2^7 + A_0 B_2 2^4
\]