Title of Invention

"A METHOD FOR CONTROLLING POWER CONSUMPTION IN A MICROPROCESSOR"

Abstract A system for controlling power consumption in a microprocessor. The microprocessor fetches an instruction from memory. The instruction is decoded, producing an operation flow of at least one operation. Then, power micro-operations are introduced into the operation flow. These power micro-operations provide power consumption control functions for those functional units which are required to execute the various operations which have been decoded from the fetched instruction. The operations and power micro-operations are then scheduled for dispatch to the appropriate execution units. The scheduling is based on the availability of the appropriate execution units and the validity of operation data. The operations and power micro-operations are dispatched to the appropriate execution units, where the operations and power micro-operations are executed. The execution results are subsequently committed to the processor state in the original program order.
Full Text The present invention pertains to the field of integrated circuits. More particularly, this invention relates to a method for controlling power consumption in a microprocessor.
BACKGROUND OF THE RELATED ART
Advances in silicon process technology and microprocessor architecture have led to very complex microprocessors containing millions of transistors. As the complexity of these microprocessors increases, the power consumed by these devices increases as well. Increased power consumption results in increased operating costs and computer system over-heating problems, which lead to reliability problems. Also, the increased power consumption leads to shorter battery life in battery operated computers.
Methods for controlling power consumption include stopping the clock signal coupled to the microprocessor when the microprocessor is idle, or reducing the frequency of the microprocessor clock. The problem with reducing the microprocessor clock frequency or stopping the microprocessor clock is that both methods negatively affect system performance. More particularly, software control is required to monitor system events, which increases system overhead. Also, reduced clock frequency leads directly to reduced microprocessor performance. Further, the system overhead requirements increase dramatically in multi-processor systems, as the operating system needs to track the activity of each microprocessor.
Another method for controlling power consumption in a microprocessor includes placing circuitry throughout the microprocessor that tracks events that occur within the microprocessor. The circuitry is designed to detect the occurrence of certain events within the microprocessor, and to take steps to perform power management functions, typically shutting down a clock signal feeding a particular functional unit within the microprocessor.
However, this approach has some drawbacks. The power management circuitry in prior microprocessors is spread throughout the microprocessor. There is no single control unit that takes care of all power management functions. Because the power management circuitry is spread throughout the microprocessor, power management circuit design becomes very difficult, and the circuitry becomes difficult to debug. In particular, microprocessors which perform out-of-order execution of operations cause power management circuit design and debug difficulties because the order of operation execution cannot be predetermined. Also, the power saving abilities of the power management circuitry are limited by the amount of intelligence designed into the circuitry. As more intelligence is designed into these circuits, the circuits become larger and require more die real estate, which increases manufacturing costs.
The previously discussed limitations of prior systems for controlling power consumption in microprocessors, including software overhead, reduced performance, and circuit design and debug difficulties, result in computer systems that suffer from reduced performance, inefficient power management, and increased cost.
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for controlling power consumption in a microprocessor.
In the present invention, a microprocessor fetches an instruction from memory. The instruction is then decoded, and at least one operation is decoded from the instruction. Power micro-operations are introduced into the operation flow that provide control functions for those functional units which are required to execute the various operations which have been decoded from the fetched instruction. The operations and power micro-operations are scheduled for dispatch to the appropriate execution units. Once scheduled, the operations and power micro-operations are dispatched to the appropriate execution units, where the operations and power micro-operations are executed.
The present invention relates to a method for controlling power consumption in a microprocessor, comprising the steps of: fetching an instruction; decoding the instruction to produce at least one operation, thereby creating an operation flow; inserting at least one power micro-operation into the operation flow, the power micro-operation providing power consumption control for at least one functional unit required to execute the operation; scheduling the operation and power micro-operation for dispatch; and dispatching the operation and power micro-operation to appropriate execution units for execution.
The present invention also relates to a method for dynamically controlling functions in a microprocessor comprising the steps of: fetching an instruction; decoding the instruction to produce at least one operation, thereby creating an operation flow; inserting at least one dynamic control micro-operation into the operation flow, the dynamic control micro-operation providing additional control for at least one functional unit required to execute the operation; scheduling the operation and dynamic control micro-operation for dispatch; and dispatching the operation and dynamic control micro-operation to appropriate execution units for execution.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
FIG. 1 depicts a method for fetching, decoding, executing, and writing results in one embodiment of a microprocessor implemented in accordance with the teachings of the present invention.
FIG. 2 is a flow diagram illustrating the operation of one embodiment of a microprocessor performing out-of-order dispatch and execution of operations configured in accordance with the teachings of the present invention.
FIG. 3 is a block diagram of one embodiment of a microprocessor configured in accordance with the teachings of the present invention.
FIG. 4 is a block diagram of one embodiment of an instruction decoder configured in accordance with the teachings of the present invention.

DETAILED DESCRIPTION
Methods and apparatus for controlling power consumption in a microprocessor are disclosed. In the following description, for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required to practice the present invention. In other instances, well known circuits and devices are shown in block diagram form to avoid obscuring the present invention unnecessarily.
For the purposes of illustration, the present invention is described in the context of an architecture and instructions compatible with the Intel® microprocessor architecture (Intel is a registered trademark of Intel Corporation). However, it is contemplated that the present invention may be practiced with other instruction sets and architectures as well, including reduced instruction set computers (RISC). In the following description, the exemplary instructions supplied to the decoder (termed "macroinstructions") have the well-known format of the Intel instruction set which is described, for example, in the i486™ Programmers Reference Manual 1990, available from Intel Corporation, and the Pentium™ Processor User's Manual, Volume 3, 1994, also available from Intel Corporation.
Although the following embodiment is described using a floating point addition instruction and a floating point execution unit, the present invention may be practiced with other types of instructions and functional units as well. For example, the present invention may be practiced to enable a functional unit that accesses memory when an instruction is decoded that requires a memory access. In this manner, a microprocessor implemented in accordance with the present invention can efficiently control power consumption, since only the required functional units are enabled at a given time. Further, the present invention may be practiced to offer other types of power consumption control to the functional units other than turning the units on or off.
Further, although the following embodiment is described in the context of power management control, the present invention may be practiced to provide other forms of dynamic control to the microprocessor by way of dynamic control micro-operations. These dynamic control micro-operations perform additional control functions not necessarily dictated by the instructions. For example, the present invention may be practiced to dynamically change cache size or to adjust bus protocols depending on the instruction to be executed.
Fig. 1 depicts one embodiment of a method for fetching, decoding, executing, and writing results in accordance with the present invention. The method of Fig. 1 includes the insertion of power micro-operations, or POps, into the operation flow. The POps provide system-transparent power management control functions which enable and disable various functional units depending on the requirements of the operation flow. Referring again to Fig. 1, in the Fetch Instruction step 110, a macroinstruction is fetched from memory. In this example, the macroinstruction fetched is FADD m32real, which adds a 32 bit floating point number stored in memory to a 32 bit floating point number stored on top of the stack, and stores the result on top of the stack. Once the macroinstruction is fetched from memory, it is decoded at the Decode Instruction/Insert POps step 120. At this step, the FADD m32real macroinstruction is broken down into a number of micro-operations (Ops). In this example, the Ops specify that the contents of the top of the stack be placed in a floating point register FPR1, that the contents of memory location m32real be moved to another floating point register FPR2, that the contents of FPR1 and FPR2 be added, and that the result of the addition be stored on top of the stack. In addition, two POps are inserted into the operation flow. To determine which POps to insert, at the Decode Instruction/Insert POps step 120, the macroinstruction is examined to determine which functional units are required in order to execute the decoded macroinstruction. For example, the POps specify that the Floating Point Execution Unit (FEU) is turned on before the floating point addition occurs and turned off again after the floating point addition has completed.
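The decode step for the FADD example can be illustrated with a minimal sketch. It is not the decoder of the described embodiment: the Op names, the decode table, the assignment of the load and store Ops to the MIU, the "PWR" unit label, and the choice of the FEU as the only gated unit are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str               # hypothetical mnemonic for the micro-operation
    unit: str               # functional unit assumed to execute it
    is_power: bool = False  # True for a power micro-operation (POp)


# Hypothetical decode table; the unit assignments are assumptions.
DECODE_TABLE = {
    "FADD m32real": [
        ("load stack_top -> FPR1", "MIU"),
        ("load m32real -> FPR2", "MIU"),
        ("fadd FPR1, FPR2", "FEU"),
        ("store result -> stack_top", "MIU"),
    ],
}

# Units assumed to be switched off unless a POp enables them.
GATED_UNITS = {"FEU"}


def decode_with_pops(macro):
    """Decode a macroinstruction and wrap each gated unit's Ops in on/off POps."""
    flow = [Op(name, unit) for name, unit in DECODE_TABLE[macro]]
    for unit in sorted(GATED_UNITS):
        positions = [i for i, op in enumerate(flow) if op.unit == unit]
        if not positions:
            continue  # this macroinstruction does not need the unit
        first, last = positions[0], positions[-1]
        # Insert the 'off' POp first so the index of 'first' stays valid.
        flow.insert(last + 1, Op(f"power_off {unit}", "PWR", True))
        flow.insert(first, Op(f"power_on {unit}", "PWR", True))
    return flow


for op in decode_with_pops("FADD m32real"):
    print(op.name)
```

In this sketch the two POps bracket only the floating point addition, so the FEU is enabled for the shortest span that the program-order flow allows.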
One feature of the system and method of the present invention is that the method for controlling power consumption is transparent, meaning that no software or operating system support is required. The operating system is unaware of the POps, and the power consumption control activities provided by the POps occur independent of any operating system or other software program. Since the operating system need not monitor the power consumption control activities provided by the POps, the system software overhead is reduced, providing performance advantages over prior systems and methods.
The system and method of the present invention may be implemented in a microprocessor which performs out-of-order dispatch and execution of operations. Fig. 2 is a flow diagram illustrating the operation of one embodiment of a method for performing out-of-order dispatch and execution of operations. At step 210, instructions are fetched. After the instructions are fetched, the instructions are decoded at step 215 and the appropriate POps are inserted into the operation flow. At step 220, the decoded operations and POps are issued in their original program order. At step 225, the operations are scheduled. The operations may be scheduled to be dispatched and executed out-of-order, depending on operation data dependencies and execution unit availability. The POps are scheduled in a way that ensures that the appropriate execution units are enabled when the corresponding operations are executed. At step 230, the operations and POps are dispatched to the appropriate execution units. The operations and POps are executed at step 235, and at step 240 the results of the executed operations are committed to the processor state in the original program order.
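The scheduling constraint stated above, that a unit must be enabled whenever one of its operations executes, can be expressed as a small check over a candidate dispatch order. This is a sketch only; the tuple encoding of the dispatch order and the function name are assumptions, not part of the described embodiment.

```python
def units_enabled_when_needed(dispatch_order, gated_units):
    """Check that every operation on a gated unit runs while that unit is on.

    dispatch_order is a list of (kind, unit) tuples, where kind is
    "on" or "off" for a POp and "op" for an ordinary operation.
    """
    enabled = set()
    for kind, unit in dispatch_order:
        if kind == "on":
            enabled.add(unit)
        elif kind == "off":
            enabled.discard(unit)
        elif unit in gated_units and unit not in enabled:
            return False  # an operation reached a unit whose clock is off
    return True


# The integer operation may be reordered freely; the FEU operation may not
# move outside its on/off bracket.
good = [("op", "IEU"), ("on", "FEU"), ("op", "FEU"), ("off", "FEU")]
bad = [("op", "FEU"), ("on", "FEU"), ("off", "FEU")]
print(units_enabled_when_needed(good, {"FEU"}))
print(units_enabled_when_needed(bad, {"FEU"}))
```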
Fig. 3 shows a block diagram of one embodiment of a microprocessor implemented in accordance with the present invention. In the present embodiment, the Instruction Fetch Unit (IFU) 320 fetches macroinstructions. The macroinstructions may reside in main memory, which is accessed over the External Bus 301 via the Bus Unit 310, or may reside in the L2 Cache 360, which is accessed by way of the Backside Bus 302. Alternatively, the macroinstructions may reside in an instruction cache located within the microprocessor.
Once fetched, the macroinstructions are sent to the Issue Cluster 330, which includes an Instruction Decoder (ID) 332, a Micro-operation Sequencer (MS) 334, a Register Alias Table (RAT) 336, and an Allocator (ALLOC) 338. The ID 332 converts a stream of macroinstructions into Ops, and inserts appropriate POps into the operation flow.
For macroinstructions requiring long Op flows for decoding, the MS 334 is used to sequence the macroinstructions. The RAT 336 performs register renaming by mapping logical registers to physical registers in the Re-Order Buffer (ROB) 364, and the ALLOC 338 assigns and tracks resources for operation for the Out-of-Order Unit 360 and the Memory Unit 350.

The Out-of-Order Unit 360 contains the Reservation Station (RS) 362 and ROB 364. The ROB 364 includes a Real Register File (RRF) 365 that defines the architectural register set for the microprocessor. The Out-of-Order Unit 360 receives Ops and POps from the Issue Cluster 330. The Out-of-Order Unit 360 merges the in-order stream of Ops and POps with corresponding source data provided by the ROB 364 and captured in the RS 362. The Out-of-Order Unit 360 also performs schedule, dispatch, and retirement functions. In order to perform the schedule function, the RS 362 identifies all ready-to-execute Ops and POps, and selects certain Ops and POps for dispatch to the Execution Cluster 340. The Ops and POps are executed in an execution unit in the Execution Cluster 340, and result data is written back to the Out-of-Order Unit 360. The ROB 364 retires the Ops and POps by transferring the result data to the RRF 365 in the original program order.
In order to perform the issue function implemented in the Issue Cluster 330, control information is written in ROB 364 entries and an associated ready bit is cleared. For each Op and POp, an op code, source data, source/destination addresses, and other control information are written into the RS 362 entries. This control information may include information that binds one Op or POp to another, thus forcing the bound pair to be executed serially. Source data for the execution units originates from either the ROB 364 or a real register contained in the RRF 365. Consequently, source data entries in the RS 362 contain a bit to identify whether the source data is stored in the ROB 364 or in the RRF 365. The valid bit in the ROB 364 indicates whether the corresponding source data entry is valid.
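The reservation station entry described above can be pictured with a small data structure. The following is a sketch under stated assumptions: the field names, the use of an entry index to represent binding, and the rule that the bound-to entry must complete first are all hypothetical readings of "executed serially", not details given in the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    valid: bool   # data valid bit: True once the operand value is available
    in_rrf: bool  # True if the operand is held in the RRF, False if in the ROB


@dataclass
class RsEntry:
    opcode: str
    sources: List[Source] = field(default_factory=list)
    dest: Optional[int] = None      # ROB entry that will receive the result
    bound_to: Optional[int] = None  # index of an RS entry that must execute first


def is_data_ready(entry):
    """An entry is data ready when every source has its valid bit set."""
    return all(src.valid for src in entry.sources)


def may_dispatch(entries, index, done):
    """Data ready, and any entry this one is bound to has already executed."""
    entry = entries[index]
    if entry.bound_to is not None and entry.bound_to not in done:
        return False
    return is_data_ready(entry)


entries = [
    RsEntry("power_on FEU"),
    RsEntry("fadd", [Source(True, False), Source(True, True)], dest=7, bound_to=0),
]
print(may_dispatch(entries, 1, done=set()))  # the bound POp has not executed yet
print(may_dispatch(entries, 1, done={0}))
```

Binding the floating point addition to the POp that enables the FEU is one way such control information could keep the pair in order even when other entries are dispatched out of order.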
In order to schedule the Ops and POps for the execution units in the Execution Cluster 340, the RS 362 ascertains which Ops and POps are data ready by evaluating a corresponding data valid bit for each source data. The RS 362 then determines availability of execution units for data ready Ops and POps, and schedules the Ops and POps based on a priority pointer. The POps are scheduled in a way that ensures that the various functional units required to execute the other Ops stored in the RS 362 are enabled during the required periods of time. The RS 362 may also be implemented to look for situations where a particular functional unit is scheduled to be turned on and off repeatedly, for example where a series of floating point operations are scheduled to be executed in the Floating Point Execution Unit (FEU) 343. In such a case, the RS 362 would not schedule the POps to turn on and off the FEU for each floating point operation, but would rather schedule the POps so that the FEU 343 was turned on before executing the first floating point operation in the series and then turned off after the last floating point operation in the series was executed. Also, the POps are scheduled to be executed in parallel with other Ops or POps whenever possible, thus ensuring that any degradation in the otherwise available performance will be minimized. For the scheduled Ops and POps, the RS 362 dispatches the Ops and POps and associated source data to the appropriate execution unit.
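The handling of a series of floating point operations can be sketched as a pass that drops each redundant "off, then on again" pair of POps for the same unit. The string encoding of the flow, the function name, and the max_gap parameter (how many unrelated operations may sit between the "off" and the following "on") are assumptions introduced for illustration; the specification does not say how the RS 362 detects such series.

```python
def coalesce_pops(flow, unit, max_gap=0):
    """Remove 'off' POps that are followed by an 'on' POp for the same unit.

    flow is a list of strings: 'on:<unit>', 'off:<unit>', or any other
    operation name. An off/on pair is dropped when at most max_gap other
    operations lie between the two POps.
    """
    on, off = f"on:{unit}", f"off:{unit}"
    out = []
    i = 0
    while i < len(flow):
        if flow[i] == off:
            window = flow[i + 1 : i + 2 + max_gap]
            if on in window:
                k = window.index(on)
                out.extend(window[:k])  # keep the operations inside the gap
                i += k + 2              # skip the 'off', the gap, and the 'on'
                continue
        out.append(flow[i])
        i += 1
    return out


series = ["on:FEU", "fadd1", "off:FEU",
          "on:FEU", "fadd2", "off:FEU",
          "on:FEU", "fadd3", "off:FEU"]
print(coalesce_pops(series, "FEU"))
```

With max_gap left at zero only back-to-back toggles are merged; a larger value trades a longer enabled period for fewer POps.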
Upon completion of execution of Ops and POps in the Execution Cluster 340, the execution units transmit pointer addresses to the ROB 364 prior to writing the actual result data. The pointer addresses identify ROB 364 entries that are the destinations for the result data. Subsequently, the execution unit writes result data to the specified ROB 364 entry. The RS 362 monitors the writing of result data to the ROB 364 in order to capture data required for other Ops.
In order to perform the retirement function, in which the result data is committed to the processor state, a number of consecutive entries are read out of the ROB 364 based on the physical destination identifiers. The entries read from the ROB 364 are candidates for retirement. An Op or a POp is a candidate for retirement if a corresponding ready bit is set, the Op or POp does not cause an exception, and all preceding Op candidates, in the original program order, are eligible for retirement. When an Op or POp is eligible for retirement, the RAT 336 is notified to update its look-up table, which maintains logical and physical register mapping information, and data are transferred from the ROB 364 to the RRF 365. In addition, a retirement pointer is incremented in the ROB 364 to indicate that the ROB entry has retired.
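The retirement rule can be sketched as a scan that starts at the retirement pointer and stops at the first entry that is not eligible. The entry fields, the function name, and the retirement width of three entries per call are assumptions; the specification says only that "a number of consecutive entries" are read. The RAT update and the transfer to the RRF are omitted from this sketch.

```python
from dataclasses import dataclass


@dataclass
class RobEntry:
    name: str
    ready: bool = False      # set once result data has been written back
    exception: bool = False  # set if the Op or POp raised an exception


def retire(rob, retire_ptr, width=3):
    """Retire up to `width` consecutive eligible entries in program order.

    Returns the names retired and the new retirement pointer.
    """
    retired = []
    end = min(retire_ptr + width, len(rob))
    while retire_ptr < end:
        entry = rob[retire_ptr]
        if not entry.ready or entry.exception:
            break  # a younger entry cannot retire before this one
        retired.append(entry.name)
        retire_ptr += 1
    return retired, retire_ptr


rob = [RobEntry("power_on FEU", ready=True),
       RobEntry("fadd"),                      # still executing
       RobEntry("power_off FEU", ready=True)]
print(retire(rob, 0))
```

Even though the last entry finished early, it stays in the ROB until the addition ahead of it is ready, which preserves the original program order.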
In one embodiment, the Execution Cluster 340 contains five semi-autonomous units: an address generation unit (AGU) 341, an integer execution unit (IEU) 342, a floating point execution unit (FEU) 343, a memory interface unit (MIU) 344, and a Power Unit 345, which executes the POps. Although the Execution Cluster 340 is described in conjunction with five execution units, the Execution Cluster 340 may include any number and type of execution units without deviating from the spirit and scope of the present invention.
In one embodiment, the power unit 345 executes the POps by enabling or disabling clock signals that are coupled to other functional units throughout the microprocessor. These other functional units may include but are not limited to the FEU, IEU, MIU, and AGU. The power unit 345 may also be implemented to control clock signals which are coupled to registers, queues, caches, etc., thereby allowing the Power Unit to control power consumption in any unused or unneeded functional units. By disabling clock signals, the power consumed in the disabled functional units is dramatically reduced.
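The clock gating performed by the power unit can be modeled in a few lines. This is a behavioral sketch, not a description of the circuit: the class and method names, the tuple form of a POp, and the convention that every unit starts with its clock disabled are assumptions. Counting clocked cycles stands in for power consumed.

```python
class PowerUnit:
    """Toy model of a unit that gates the clocks of other functional units."""

    def __init__(self, units):
        self.clock_enabled = {u: False for u in units}  # assumed: all gated off
        self.active_cycles = {u: 0 for u in units}

    def execute(self, pop):
        """Execute a POp given as (action, unit), action 'enable' or 'disable'."""
        action, unit = pop
        if action not in ("enable", "disable"):
            raise ValueError(f"unknown POp action: {action}")
        self.clock_enabled[unit] = (action == "enable")

    def tick(self):
        """Advance one clock; only units with an enabled clock are counted."""
        for unit, enabled in self.clock_enabled.items():
            if enabled:
                self.active_cycles[unit] += 1


pu = PowerUnit(["AGU", "IEU", "FEU", "MIU"])
pu.execute(("enable", "FEU"))
for _ in range(3):
    pu.tick()
pu.execute(("disable", "FEU"))
for _ in range(2):
    pu.tick()
print(pu.active_cycles)
```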
As shown in Fig. 3, the IFU 320 is coupled to the ID 332. In the present embodiment, the ID 332 provides multiple decoders so as to decode multiple macroinstructions simultaneously. At each clock cycle, the ID 332 receives macroinstructions from the IFU 320. In turn, the ID 332 translates the macroinstructions into Ops each clock cycle. Also, at each clock cycle appropriate POps are inserted into the operation flow. In addition, the ID 332 is coupled to the MS 334. The ID 332 requests microcode operation from the MS 334 for macroinstructions requiring the decoding of long microcode sequences, as is well known to those skilled in the art.
Fig. 4 is a block diagram of the ID 332 configured in accordance with one embodiment of the present invention. As explained earlier, the ID 332 converts a stream of macroinstructions into Ops, and inserts POps into the operation flow. The ID 332 contains an ID Input Buffer 410. A number of macroinstructions are stored in the ID Input Buffer 410 to generate a queue of macroinstructions for decoding. This queue allows the ID Input Buffer 410 to provide a steady stream of macroinstructions to the Instruction Steering Logic 420, also included in the ID 332. The Instruction Steering Logic 420 directs each macroinstruction to a decoder located within the issue template 430.
The issue template 430 configuration permits parallel decoding of macroinstructions. The issue template 430 specifies the number of decoders and the capabilities of each decoder. For the embodiment illustrated in Fig. 4, the ID 332 contains four decoder blocks, 440, 445, 450, and 455. The decoders 440, 445, 450, and 455 are coupled to an ID Output Queue 460. In the present embodiment, certain decoders decode all types of instructions while other decoders decode only particular instructions. The issue template 430 is configured such that decoder 440 issues up to four Ops or POps, and decoders 445, 450, and 455 issue up to two Ops or POps per clock period. Consequently, up to ten Ops or POps per clock period may be generated in the issue template 430. Although ID 332 is described in conjunction with four decoder blocks, any number of decoder blocks may be implemented without deviating from the spirit and scope of the invention.
The operation of each decoder 440, 445, 450, and 455 is dependent upon the particular macroinstruction set utilized by the processor. In general, each decoder block extracts operand and opcode fields from the macroinstruction in a Field Locator 465, and stores data in alias registers (not shown). Each decoder also contains at least one Translate Programmable Logic Array (XLAT PLA) 470; preferably one PLA for each Op or POp that the decoder is capable of producing. The XLAT PLA 470 operates in parallel with the Field Locator 465, and contains microcode for generating control Ops and POps. An Alias Multiplexor (Alias Mux) 475 merges, as is well known in the art, the control Ops with data extracted by the Field Locator 465 to generate the Ops.
Preferably, decoder 440 decodes instructions requiring longer microcode sequencing. Furthermore, macroinstructions having greater than four Ops or POps summon the MS 334 to sequence the ID 332 during long microcode routines. Once the MS 334 completes the sequencing of the long microcode routines, control is returned to the ID 332. The Ops and POps are issued in the original program order.
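The issue template figures can be checked with a short worked example. The decoder widths and the four-operation limit come from the description of Fig. 4; the function names are hypothetical, and selecting a decoder purely by operation count is a simplification, since the text notes that some decoders handle only particular instructions.

```python
# Maximum Ops or POps each decoder block can issue per clock period.
DECODER_WIDTHS = {440: 4, 445: 2, 450: 2, 455: 2}


def peak_ops_per_clock(widths=DECODER_WIDTHS):
    """Upper bound on Ops and POps generated by the issue template per clock."""
    return sum(widths.values())


def eligible_decoders(n_ops, widths=DECODER_WIDTHS):
    """Decoders wide enough to emit all n_ops of one macroinstruction."""
    return sorted(d for d, w in widths.items() if w >= n_ops)


def needs_sequencer(n_ops, limit=4):
    """Macroinstructions with more than `limit` operations go to the MS 334."""
    return n_ops > limit


print(peak_ops_per_clock())   # 4 + 2 + 2 + 2
print(eligible_decoders(3))   # only the widest decoder qualifies
print(needs_sequencer(5))
```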
The ID Output Queue 460 decouples the decode pipeline from the Out-of-Order Unit 360 pipeline by buffering the decoded Ops and POps. The ID Output Queue 460 attempts to provide a steady flow of Ops and POps each clock cycle. The ID Output Queue 460 permits decoding of instructions even when the Out-of-Order Unit (360, Fig. 3) is stalled. The ID Output Queue 460 compensates for the variable number of Ops and POps produced per macroinstruction.
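The decoupling role of the output queue can be sketched as a bounded first-in, first-out buffer. The capacity of eight entries, the all-or-nothing acceptance of a decoded group, and the method names are assumptions chosen for illustration; the specification does not give the queue depth or its admission policy.

```python
from collections import deque


class OutputQueue:
    """Bounded FIFO between the decoders and the out-of-order unit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def free_slots(self):
        return self.capacity - len(self.entries)

    def push(self, ops):
        """Accept a decoded group only if it fits entirely; report success."""
        if len(ops) > self.free_slots():
            return False  # the decoder must hold this group and retry
        self.entries.extend(ops)
        return True

    def pop(self, n):
        """Hand up to n buffered operations downstream, oldest first."""
        out = []
        while self.entries and len(out) < n:
            out.append(self.entries.popleft())
        return out


q = OutputQueue(capacity=8)
# Decoding continues while the consumer is stalled and nothing is popped.
print(q.push(["on:FEU", "ld1", "ld2", "fadd"]))
print(q.push(["off:FEU", "st"]))
print(q.push(["a", "b", "c", "d"]))  # only two slots remain
print(q.pop(3))
```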
In the foregoing specification the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are accordingly to be regarded in an illustrative rather than a restrictive sense.



WE CLAIM:
1. A method for controlling power consumption in a microprocessor,
comprising the steps of:
fetching (320) an instruction;
decoding (330) the instruction to produce at least one operation, thereby creating an operation flow;
inserting (340) at least one power micro-operation into the operation flow, the power micro-operation providing power consumption control for at least one functional unit required to execute the operation;
scheduling (360) the operation and power micro-operation for dispatch; and
dispatching (360) the operation and power micro-operation to appropriate execution units for execution.
2. The method as claimed in claim 1, wherein the step of scheduling
the operation and power micro-operation optionally includes scheduling
the operation and power micro-operation according to availability of
appropriate execution units and operation data.
3. The method as claimed in claim 2, optionally comprising the
steps of:
executing (340) the operation and power micro-operation to generate result data; and
committing result data to the processor state in the original program order when the result data are valid.
4. The method as claimed in claim 3, wherein the step of scheduling
the operation and power micro-operation for dispatch optionally
includes scheduling the power micro-operation such that the
appropriate functional units are enabled before the corresponding
operation is executed and are disabled after the corresponding
operation has been executed.

5. The method as claimed in claim 4, wherein the step of
dispatching the operation and power micro-operation includes
dispatching the power micro-operation to a power unit for execution.
6. The method as claimed in claim 5, wherein the power unit
executes the power micro-operation through manipulation of a plurality
of clock signals coupled to the functional units.
7. A method for controlling power consumption in a
microprocessor, wherein the said microprocessor optionally comprises:
an out-of-order unit, the out-of-order unit scheduling the operation and power micro-operation for dispatch according to availability of appropriate execution units and operation data and the out-of-order unit committing result data to the processor state in the original program order when result data are valid after the operation and power micro-operation have been executed.
8. The method as claimed in claim 1, wherein the dynamic control
micro-operation provides control to change cache size.
9. The method as claimed in claim 1, wherein the dynamic control
micro-operation provides control to change bus protocol.
10. The method as claimed in claim 1, wherein the dynamic control
micro-operation provides power consumption control.
11. A method for controlling power consumption in a microprocessor
substantially as herein described with reference to and as illustrated in
the accompanying drawings.


Patent Number 221600
Indian Patent Application Number 478/DEL/1997
PG Journal Number 31/2008
Publication Date 01-Aug-2008
Grant Date 26-Jun-2008
Date of Filing 25-Feb-1997
Name of Patentee INTEL CORPORATION
Applicant Address 2200 MISSION COLLEGE BOULEVARD, SANTA CLARA, CALIFORNIA 95052, U.S.A.
Inventors:
# Inventor's Name Inventor's Address
1 JOHN WILLIAM BENSON MATES 8945 N.W. OAK STREET, PORTLAND, OREGON 97229, U.S.A.
PCT International Classification Number G06F 1/32
PCT International Application Number N/A
PCT International Filing date
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 08/623,978 1996-03-29 U.S.A.