Title of Invention

"DSP WITH DUAL-MAC PROCESSOR AND DUAL-MAC COPROCESSOR"

Abstract
This invention is a digital signal processor architecture that is designed to speed up frequently-used signal processing computations, such as FIR filters, correlations, FFTs, and DFTs. The architecture uses a coupled dual-MAC architecture and attaches a dual-MAC coprocessor (MAC3, MAC4) onto it in a unique way to achieve a significant increase in processing capability.
Full Text
BACKGROUND OF THE INVENTION
Technical Field
[1001] This invention relates to digital signal processors, and has particular relation to multiply-accumulate (MAC) units.
Background Art
[1002] Digital Signal Processors (DSPs) are specialized types of microprocessors that are specifically tailored to execute mathematical computations very rapidly. DSPs can be found in a variety of applications including compact disk players, PC disk drives, telecommunication modem banks, and cellular telephones.
[1003] In the cellular telephone context, the demand for DSP computation capability continues to grow, driven by the increasing needs of applications such as GPS position location, voice recognition, low-bit rate speech and audio coding, image and video processing, and 3G cellular modem processing. To meet these processing demands, there is a need for improved digital signal processor architectures that can process computations more efficiently.
[1004] Considerable work has been done in these areas. Applicant Sih is also an applicant in the following applications for United States patents:

"Signal Processor With Coupled Multiply-Accumulate Units", filed concurrently herewith;
"Multiple Bus Architecture in a Digital Signal Processor", Serial No. 09/044,087, filed March 18, 1998;
"Digital Signal Processor Having Multiple Access Register", Serial No. 09/044,088, filed March 18, 1998;
"Memory Efficient Instruction Storage", Serial No. 09/044,089, filed March 18, 1998;
"Highly Parallel Variable Length Instructions for Controlling a Digital Signal Processor", Serial No. 09/044,104, filed March 18, 1998;
"Variable Length Instruction Decoder", Serial No. 09/044,086, filed March 18, 1998; and
"Digital Signal Processor with Shiftable Multiply Accumulate Unit", Serial No. 09/044,108, filed March 18, 1998.
The disclosure of these applications is incorporated herein by reference.
BRIEF DISCLOSURE OF THE INVENTION
[1005] The invention is a digital signal processor architecture that is designed to speed up frequently-used signal processing computations, such as FIR filters, correlations, FFTs, and DFTs. The architecture uses a coupled dual-MAC architecture and attaches a dual-MAC coprocessor onto it in a unique way to achieve a significant increase in processing capability.
BRIEF DESCRIPTION OF THE DRAWINGS

[1006] FIG. 1 is a block diagram of the new architecture.
[1007] FIG. 2 shows the first configuration of the invention, in FIR Filter and Correlation Mode.
[1008] FIG. 3 is a logical diagram of the FIR Filter and Correlation Acceleration Mode.
[1009] FIG. 4 shows another configuration, the Single-Cycle Complex Multiply Mode.
[1010] FIG. 5 shows yet another configuration, the Single-Cycle Complex Multiply-Accumulate Mode.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[1011] FIG. 1 is a block diagram of the new architecture. In a narrow embodiment of the invention, an electronic circuit includes a register file (100) having first through third inputs (PI1-PI3) and first through sixth outputs (PO1-PO6). A first shifter (102) receives the first output (PO1) of the register file, a first multiplier (104) receives the second (PO2) and third (PO3) outputs of the register file, and a second multiplier (106) receives the fourth (PO4) and fifth (PO5) outputs of the register file. A second shifter (108) receives the output of the first multiplier (104), and a third shifter (110) receives the output of the second multiplier (106). A rounding multiplexer (112) receives the output of the first shifter (102), and a first adder (114) receives, at a first input, the output of the second shifter (108). A first multiplexer (116) receives either a zero or the output of the third shifter (110), and applies an output to a second input of the first adder (114). A second adder (118) receives the outputs of the rounding multiplexer (112) and the first adder (114), and the output of the second adder (118) is fed back to the first input (PI1) of the register file. A third adder (120) receives the outputs of the third shifter (110) and the sixth register output (PO6), and the output of the third adder (120) is fed back to the second input (PI2) of the register file.
[1012] A first input storage element (122) receives the third output (PO3) of the register file. A second multiplexer (124) receives the output of the first input storage element (122) and the third output (PO3) of the register file, and a third multiplexer (126) receives the second (PO2) and fifth (PO5) outputs of the register file. A third multiplier (128) receives the outputs of the second (124) and third (126) multiplexers, and a fourth shifter (130) receives the output of the third multiplier (128). A fourth adder (132) receives, at a first input, the output of the fourth shifter (130), and a first output storage element (134) receives the output of the fourth adder (132). The output of the first output storage element (134) is applied to a second input of the fourth adder (132).
[1013] A fourth multiplexer (136) receives the outputs of the first input storage element (122) and the fourth output (PO4) of the register file, and a second input storage element (138) receives the output of the fourth multiplexer (136). A fifth multiplexer (140) receives the output of the second input storage element (138) and the fourth output (PO4) of the register file, and a fourth multiplier (142) receives the output of the fifth multiplexer (140) and the second output (PO2) of the register file. A fifth shifter (144) receives the output of the fourth multiplier (142), and a fifth adder (146) receives, at a first input, the output of the fifth shifter (144). A second output storage element (148) receives the output of the fifth adder (146). The output of the second output storage element (148) is applied to a first input of a sixth multiplexer (150). The sixth multiplexer (150) receives the output of the fourth shifter (130) at a second input, and the output of the sixth multiplexer (150) is applied to a second input of the fifth adder (146). The output of the fifth adder (146) is also fed back to the third input (PI3) of the register file. The multiplexers are externally controlled.
[1014] The present invention, in its broadest embodiment, does not require all of the above components. Indeed, it is sufficient that the electronic circuit merely include a register file (100) including at least one input and at least four outputs (PO2-PO5); that the electronic circuit further include first (104), second (106), third (128), and fourth (142) multipliers, each having at least two inputs; that it also include first (118), second (120), third (132), and fourth (146) adders, each adder having, as a first input, an output of the corresponding multiplier (note that these first through fourth adders are the second through fifth adders of the more detailed device); and that the electronic circuit also include means (124), (126), (136), (140) for associating the outputs of the register file with the inputs of at least some of the multipliers, and means (112), (116), (150) for associating another input of at least some of the adders with an output of another multiplier, or with an output of the register file. It is this feature which causes the multipliers, adders, and register file to operate, together, in a single clock cycle.
[1015] Preferably, the number of register file outputs to the multipliers is four.
[1016] It is also preferred that the electronic circuit further include at least one input storage element (122), (138). The input of the input storage element is connected to an output (PO3) of the register file or to an output of another input storage element (122). The output of the input storage element is connected to an input of at least one of the multipliers (128), (142) or to an input of another input storage element (138). The multipliers, adders, input storage elements, and register file operate, together, in a single clock cycle.
[1017] While the invention will work with only a single input storage element, it is preferred that there be a plurality of input storage elements (122), (138).
[1018] The electronic circuit further preferably includes at least one output storage element (134), (148), connected to an output of at least one of the adders (132), (146). The multipliers, adders, output storage elements, and register file operate, together, in a single clock cycle.
[1019] It is preferred that the output storage element or elements (134), (148) be external to the register file (100).
[1020] FIG. 1 is, as noted above, a block diagram of the new architecture. The core architecture contains a coupled dual-MAC structure composed of MAC units MAC1 and MAC2. MAC1 fetches its multiplier operands from output ports PO2 and PO3 of the register file. The output of the multiplier (104) is passed to a shifter (108) that can shift the result left by 0, 1, 2, or 3 bits. The output of the shifter (108) is passed to an adder (114) that takes its other input from a multiplexer, MUX1 (116), that has zero and the shifted product from MAC2 as its inputs. The output of the adder (114) is passed into a 40-bit adder (118) that can add another 40-bit operand fetched from output port PO1 of the register file. The output of the 40-bit adder is stored into the register file via input port PI1. MAC2 fetches multiplier operands from register file output ports PO4 and PO5, multiplies them (106), and shifts (110) the result left by 0, 1, 2, or 3 bits. The shifter output is passed to a 40-bit adder (120) that can add an additional register file operand fetched from output port PO6. The shifter output is also sent to the multiplexer, MUX1 (116), that feeds the first adder (114) in MAC1. The output of the 40-bit adder (120) is stored into the register file via register file input port PI2.
[1021] The coprocessor consists of multiply-accumulate units MAC3 and MAC4 that have been connected to the core dual-MAC structure and the register file in a unique way. The inputs to MAC3 and MAC4 can be configured (via multiplexers MUX2 (124), MUX3 (126), and MUX5 (140)) to be taken from register file output ports PO2, PO3, PO4, PO5, or the delay line composed of 16-bit registers IS1 (122) and IS2 (138). The shifted product in MAC3 can be fed into MAC4 via MUX6 (150). Alternatively, the 40-bit adders in MAC3 and MAC4 can take an input from their local 40-bit accumulator registers OS1 (134) and OS2 (148), respectively. The output of MAC4 can be written to the register file via input port PI3. The programmer can set up the multiplexers in the diagram to any desired configuration by executing certain program instructions, which allows the four MAC units to be flexibly configured to speed up several different types of computations. Several of these configurable modes are described below.
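The coprocessor wiring described in paragraphs [1012], [1013], and [1021] can be sketched as a small state machine. This is a reading of the text, not a definitive model: the names `CoprocessorState`, `MuxConfig`, and `coprocessor_cycle` are hypothetical, the registers are assumed to load at the end of each cycle, and word widths and overflow are not modelled.

```python
from dataclasses import dataclass


@dataclass
class CoprocessorState:
    is1: int = 0  # input storage element 122
    is2: int = 0  # input storage element 138
    os1: int = 0  # output storage element 134
    os2: int = 0  # output storage element 148


@dataclass
class MuxConfig:
    mux2_use_is1: bool   # MUX2 (124): IS1 output, else PO3
    mux3_use_po5: bool   # MUX3 (126): PO5, else PO2
    mux4_use_po4: bool   # MUX4 (136): PO4, else IS1 output
    mux5_use_po4: bool   # MUX5 (140): PO4, else IS2 output
    mux6_use_mac3: bool  # MUX6 (150): MAC3 shifted product, else OS2 output


def coprocessor_cycle(state, cfg, po2, po3, po4, po5, shift3=0, shift4=0):
    """Advance MAC3/MAC4 by one cycle; return (next_state, value presented on PI3)."""
    a3 = state.is1 if cfg.mux2_use_is1 else po3
    b3 = po5 if cfg.mux3_use_po5 else po2
    p3 = (a3 * b3) << shift3                 # multiplier 128, shifter 130
    a4 = po4 if cfg.mux5_use_po4 else state.is2
    p4 = (a4 * po2) << shift4                # multiplier 142, shifter 144
    sum3 = p3 + state.os1                    # adder 132
    sum4 = p4 + (p3 if cfg.mux6_use_mac3 else state.os2)  # adder 146
    next_state = CoprocessorState(
        is1=po3,                             # IS1 always loads PO3
        is2=po4 if cfg.mux4_use_po4 else state.is1,
        os1=sum3,
        os2=sum4,
    )
    return next_state, sum4
```

Selecting the delay-line inputs and the local accumulators gives an accumulate-over-time behaviour, while selecting the register file ports and routing the MAC3 product through MUX6 sums two products in a single cycle.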

[1022] FIG. 2 shows the first configuration of the invention, in FIR Filter and Correlation Mode. It can be used to speed up FIR filters and correlation operations. To see why this configuration speeds up FIR filtering, we examine the equation for implementing an FIR filter:

$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k)$$

If we write out the equations for four consecutive outputs, we have

$$\begin{aligned}
y(n) &= h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + \cdots + h(N-1)x(n-N+1) \\
y(n+1) &= h(0)x(n+1) + h(1)x(n) + h(2)x(n-1) + \cdots + h(N-1)x(n-N+2) \\
y(n+2) &= h(0)x(n+2) + h(1)x(n+1) + h(2)x(n) + \cdots + h(N-1)x(n-N+3) \\
y(n+3) &= h(0)x(n+3) + h(1)x(n+2) + h(2)x(n+1) + \cdots + h(N-1)x(n-N+4)
\end{aligned}$$
To compute these four quantities simultaneously, the same coefficient h(k) is fed to all four multipliers, while the other input is fed through a delay line to each of the multipliers. The logical implementation is shown in FIG. 3.
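The idea of sharing one coefficient across four multipliers while the samples move through a delay line can be checked with a short software model. This is a logical sketch only: the function names are hypothetical, samples outside the input array are assumed to be zero, and the mapping of the four accumulators onto particular MAC units is not modelled.

```python
def sample(x, i):
    """Return x[i], treating indices outside the array as zero (assumption)."""
    return x[i] if 0 <= i < len(x) else 0


def fir_direct(h, x, n):
    """Reference: y(n) = sum over k of h(k) * x(n - k)."""
    return sum(h[k] * sample(x, n - k) for k in range(len(h)))


def fir_four_outputs(h, x, n):
    """Compute [y(n), y(n+1), y(n+2), y(n+3)] with one coefficient per step."""
    taps = [sample(x, n + m) for m in range(4)]  # taps[m] holds x(n + m - k)
    acc = [0, 0, 0, 0]
    for k, coeff in enumerate(h):
        for m in range(4):
            acc[m] += coeff * taps[m]            # same h(k) on all four multipliers
        # Shift the delay line: each tap takes its neighbour's older sample.
        taps = [sample(x, n - k - 1)] + taps[:3]
    return acc
```

For example, with `h = [1, 2, 3]` and `x = [1, 2, 3, 4, 5, 6, 7, 8]`, `fir_four_outputs(h, x, 2)` agrees with four separate calls to `fir_direct` while fetching each coefficient only once.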
[1023] FIG. 3 is a logical diagram of the FIR Filter and Correlation Acceleration Mode. To achieve this configuration using the hardware setup shown in FIG. 2, the programmer must execute the proper instructions for the two core MACs to ensure correct operation. The same register fetched from register file output port PO2, and used as an input to each of the coprocessor MAC units, must also be fetched from output port PO4. The programmer must also ensure that the register fetched out of PO5 is delayed by one cycle (152) using a parallel register move before being sent out on PO3 and propagated through the hardware delay line composed of registers IS1 (122) and IS2 (138). The two core MACs perform accumulation by fetching and storing results to the register file (100), while the coprocessor MACs perform accumulation using their local OS1 (134) and OS2 (148) accumulators. This configuration can also be used to speed up correlation operations in a similar manner.
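The remark about correlation can be illustrated the same way. The specification does not define the correlation, so the definition r(l) = sum over k of a(k) b(k + l) used here is an assumption, as are the function name and the zero padding; the sketch only shows that four adjacent lags can share one reference sample per step while the second sequence slides through a four-tap delay line.

```python
def correlate_four_lags(a, b, lag):
    """Return [r(lag), r(lag+1), r(lag+2), r(lag+3)], r(l) = sum_k a[k] * b[k + l]."""

    def sample(i):
        # Indices outside b are treated as zero (assumption).
        return b[i] if 0 <= i < len(b) else 0

    taps = [sample(lag + m) for m in range(4)]  # taps[m] holds b(k + lag + m)
    acc = [0, 0, 0, 0]
    for k, ref in enumerate(a):
        for m in range(4):
            acc[m] += ref * taps[m]             # same a(k) on all four multipliers
        # Slide the window forward by one sample.
        taps = taps[1:] + [sample(lag + k + 4)]
    return acc
```

Compared with the FIR case, the only difference in this model is the direction in which the delay line advances.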

[1024] FIG. 4 shows another configuration, the Single-Cycle Complex Multiply Mode. To speed up FFTs and CDMA symbol demodulation, the coprocessor can also be configured to perform single-cycle complex multiplies. As the inputs to each of the core MAC units are sent to the coprocessor to perform the cross-term multiplies, the outputs of the core MAC units are added together and sent to the register file (100) via input port PI1, while the outputs of the coprocessor MAC units are added together and stored into the register file (100) via input port PI3.
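A complex product (a + jb)(c + jd) = (ac - bd) + j(ad + bc) needs four real multiplies, which is why four MAC units can finish it in one cycle. The sketch below shows that split under stated assumptions: the function name is hypothetical, the assignment of each product to a particular MAC unit is a guess, and the text says only that the core outputs are "added together", so the negation of the second direct term is assumed here to obtain the real part.

```python
def complex_multiply_mode(a_re, a_im, b_re, b_im):
    """Return (real, imag) of (a_re + j*a_im) * (b_re + j*b_im) from four products."""
    direct1 = a_re * b_re    # core MAC (assumed MAC1)
    direct2 = a_im * b_im    # core MAC (assumed MAC2)
    cross1 = a_im * b_re     # coprocessor MAC (assumed MAC3)
    cross2 = a_re * b_im     # coprocessor MAC (assumed MAC4)
    real = direct1 - direct2  # combined core result, written via PI1
    imag = cross1 + cross2    # combined coprocessor result, written via PI3
    return real, imag
```

The result can be compared against Python's built-in complex arithmetic for integer inputs, for example `complex(1, 2) * complex(3, 4)`.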
[1025] FIG. 5 shows yet another configuration, the Single-Cycle Complex Multiply-Accumulate Mode. The complex multiply-accumulate configuration is useful for speeding up DFTs and accumulations of 32 x 32 multiplies. As shown in FIG. 5, the multiplier input connections are set up like the single-cycle complex multiply, but the accumulations are set up like the FIR filter acceleration mode. In the 32 x 32 MAC case, the core MACs are performing signed-signed and unsigned-unsigned multiplies, while the coprocessor MACs are performing signed-unsigned multiplies.
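The signed-signed, unsigned-unsigned, and signed-unsigned products mentioned for the 32 x 32 case correspond to the usual decomposition of a 32-bit operand into a signed high half and an unsigned low half. The following sketch shows the arithmetic only; the function names are hypothetical, and how the hardware aligns and accumulates the partial products is not described in the text and is not modelled.

```python
def split32(value):
    """Split a signed 32-bit integer into (signed high 16 bits, unsigned low 16 bits)."""
    low = value & 0xFFFF   # unsigned low half, 0..65535
    high = value >> 16     # arithmetic shift keeps the sign in the high half
    return high, low


def mul32x32(a, b):
    """Multiply two signed 32-bit integers using four 16-bit partial products."""
    a_hi, a_lo = split32(a)
    b_hi, b_lo = split32(b)
    signed_signed = a_hi * b_hi                  # core MAC in the text
    unsigned_unsigned = a_lo * b_lo              # core MAC in the text
    signed_unsigned = a_hi * b_lo + a_lo * b_hi  # the two coprocessor MAC products
    return (signed_signed << 32) + (signed_unsigned << 16) + unsigned_unsigned
```

The weights 2^32, 2^16, and 1 follow directly from writing each operand as high * 2^16 + low.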
[1026] The use of 40-bit adders and 17x17 bit multipliers is shown. This is conventional, but any convenient number of bits may be used.
Industrial Application
[1027] This invention is capable of exploitation in industry, and can be made and used, whenever it is desired to speed up signal processing computations. The individual components of the apparatus and method shown herein, taken separate and apart from one another, may be entirely conventional, it being their combination that is claimed as the invention.
[1028] While various modes of apparatus and method have been described, the true spirit and scope of the invention are not limited thereto, but are limited only by the following claims and their equivalents, and such are claimed as the invention.


WE CLAIM :
1. An electronic circuit, characterized in that it comprises :
(a) a register file having first through third inputs and first through sixth outputs;
(b) a first shifter receiving the first output of the register file;
(c) a first multiplier receiving the second and third outputs of the register file and having an output;
(d) a second multiplier receiving the fourth and fifth outputs of the register file and having an output;
(e) a second shifter receiving the output of the first multiplier and having an output;
(f) a third shifter receiving the output of the second multiplier and having an output;
(g) a rounding multiplexer receiving the output of the first shifter and having an output;
(h) a first adder receiving, at a first input, the output of the second shifter
and having an output;
(i) a first externally-controlled multiplexer receiving either a zero or the
output of the third shifter, and applying an output to a second input of the first
adder;
(j) a second adder receiving the outputs of the rounding multiplexer and the
first adder, and having an output which is fed back to the first input of the
register file;
(k) a third adder receiving the outputs of the third shifter and the sixth
register output, and having an output which is fed back to the second input of
the register file;
(l) a first input storage element receiving the third output of the register
file;

(m) a second externally-controlled multiplexer receiving the output of the
first input storage element, and the third output of the register file;
(n) a third externally-controlled multiplexer receiving the second and fifth
outputs of the register file;
(o) a third multiplier receiving the outputs of the second and third
externally-controlled multiplexers;
(p) a fourth shifter receiving the output of the third multiplier;
(q) a fourth adder receiving, at a first input, the output of the fourth shifter;
(r) a first output storage element receiving the output of the fourth adder, the
output of the first output storage element being applied to a second input of the
fourth adder;
(s) a fourth externally-controlled multiplexer receiving the output
of the first input storage element and the fourth output of the register file;
(t) a second input storage element receiving the output of the fourth
externally-controlled multiplexer;
(u) a fifth externally-controlled multiplexer receiving the output of the
second input storage element and the fourth output of the register file;
(v) a fourth multiplier receiving the output of the fifth externally-controlled
multiplexer and the second output of the register file;
(w) a fifth shifter receiving the output of the fourth multiplier;
(x) a fifth adder receiving, at a first input, the output of the fifth shifter;
(y) a second output storage element receiving the output of the fifth adder, the
output of the second output storage element being applied to a first input of a sixth
externally-controlled multiplexer, the sixth externally-controlled multiplexer
receiving the output of the fourth shifter at a second input, and the output of the
sixth externally-controlled multiplexer being applied to a second input of the
fifth adder.

2. An electronic circuit wherein:
(a) the electronic circuit comprises a register file having at least one input and at least four outputs; and
(b) the electronic circuit is characterized in that it comprises:

(1) first, second, third, and fourth multipliers, each having at least two inputs;
(2) first, second, third, and fourth adders, each adder having as a first input an output of the corresponding multiplier;
(3) means for associating the outputs of the register file with the inputs of at least some of the multipliers;
(4) means for associating another input of at least some of the adders with an output of another multiplier, or with an output of the register file; and
(5) at least one input storage element:
(a) whose input is connected to:
(1) an output of the register file; or
(2) an output of another input storage element; and
(b) whose output is connected to:
(1) an input of at least one of the multipliers; or
(2) an input of another input storage element;
whereby the multipliers, adders, input storage elements and register file operate, together, in a single clock cycle.

3. The electronic circuit as claimed in claim 2, wherein the number of register file outputs to the multipliers is four.
4. The electronic circuit as claimed in claim 2, wherein there is a plurality of input storage elements.

5. An electronic circuit wherein:
the electronic circuit comprises a register file having at least one input and at least four outputs; and
the electronic circuit is characterized in that it comprises:
first, second, third, and fourth multipliers, each having at least
two inputs;
first, second, third, and fourth adders, each adder having as a first
input an output of the corresponding multiplier;
means for associating the outputs of the register file with the inputs of at least some of the multipliers;
means for associating another input of at least some of the adders with an output of another multiplier, or with an output of the register file; and
at least one output storage element, connected to an output of at least one of the adders, whereby the multipliers, adders, output storage elements, and register file operate, together, in a single clock cycle.
6. The electronic circuit as claimed in claim 5, wherein the output storage element
is external to the register file.

Patent Number 212861
Indian Patent Application Number IN/PCT/2002/1326/CHE
PG Journal Number 13/2008
Publication Date 31-Mar-2008
Grant Date 17-Dec-2007
Date of Filing 22-Aug-2002
Name of Patentee QUALCOMM INCORPORATED
Applicant Address 5775 MOREHOUSE DRIVE, SAN DIEGO, CALIFORNIA 92121-1714,
Inventors:
# Inventor's Name Inventor's Address
1 LEE, Way-Shing 8555 Foucaud Way, San Diego, California 92129-4123,
2 SIH, Gilbert, C 7804 Pipit Place, San Diego, California 92129,
3 KUMAR, Hemant 10318 Water Ridge Circle #278, San Diego, California 92121,
PCT International Classification Number G06F 7/00
PCT International Application Number PCT/US2001/005871
PCT International Filing date 2001-02-23
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 09/513,979 2000-02-26 U.S.A.