Title of Invention

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING METHODS

Abstract
A 3D graphics pipeline includes a prefetch mechanism that feeds a cache of depth tiles. The prefetch mechanism may be predictive, using triangle geometry information from previous pipeline stages to pre-charge the cache, thereby allowing for an increase in memory bandwidth efficiency. A z-value compression technique may optionally be utilized to further reduce power consumption and memory bandwidth.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
“TILED PREFETCHED AND CACHED DEPTH BUFFER”
QUALCOMM Incorporated, a corporation under the laws of state Delaware of United States of America of 5775 Morehouse Drive, San Diego, California 92121-1714, United States of America
The following specification particularly describes the invention and the manner in which it is to be performed.

WO 2006/102380 PCT/US2006/010340

TILED PREFETCHED AND CACHED DEPTH BUFFER
BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The present disclosure generally relates to graphics processors, and more
particularly, the present disclosure relates to a 3D graphics pipeline which is contained in a graphics processor.
2. Description of the Related Art
[0002] Graphics engines have been utilized to display three-dimensional (3D)
images on fixed display devices, such as computer and television screens. These engines are typically contained in desktop systems powered by conventional AC power outlets, and thus are not significantly constrained by power-consumption limitations. A recent trend, however, is to incorporate 3D graphics engines into battery-powered handheld devices. Examples of such devices include mobile phones and personal digital assistants (PDAs). Unfortunately, conventional graphics engines consume large quantities of power and are thus not well suited to these low-power operating environments.
[0003] FIG. 1 is a schematic block diagram of a basic OpenGL rasterization
pipeline contained in a conventional 3D graphics engine. As shown, the rasterization
pipeline of this example includes a triangle setup stage 101, a pixel shading stage 102, a
texture mapping stage 103, a texture blending stage 104, a scissor test stage 105, an
alpha test stage 106, a stencil test stage 107, a hidden surface removal (HSR) stage 108,
an alpha blending stage 109, and a logical operations stage 110.
[0004] In 3D graphics systems, each object to be displayed is typically divided
into surface triangles defined by vertex information, although other primitive shapes can
be utilized. Also typically, the graphics pipeline is designed to process sequential batches of triangles of an object or image. The triangles of any given batch may visually overlap triangles of another batch, and it is also possible for triangles within a given batch to overlap one another.
[0005] Referring to FIG. 1, the triangle setup stage 101 "sets up" each triangle
by computing setup coefficients to be used in computations executed by later pipeline stages.
[0006] The pixel shading stage 102 uses the setup coefficients to compute which
pixels are encompassed by each triangle. Since the triangles may overlap one another, multiple pixels of differing depths may be located at the same point on a screen display. In particular, the pixel shading stage 102 interpolates color, fog, depth values, texture coordinates, alpha values, etc., for each pixel using the vertex information. Any of a variety of shading techniques can be adopted for this purpose, and shading operations can take place on a per-triangle or per-pixel basis.
[0007] The texture mapping stage 103 and texture blending stage 104 function
to add and blend texture into each pixel of the processed batch of triangles. Very generally, this is done by mapping pre-defined textures onto the pixels according to texture coordinates contained within the vertex information. As with shading, a variety of techniques may be adopted to achieve texturing. A technique known as fog processing may also be implemented.
[0008] The scissor test stage 105 functions to discard pixels contained in
portions (fragments) of triangles which fall outside the field of view of the displayed scene. Generally, this is done by determining whether pixels lie within a so-called scissor rectangle.
[0009] The alpha test stage 106 conditionally discards a fragment of a triangle
(more precisely, pixels contained in the fragment) based on a comparison between an
alpha value (transparency value) associated with the fragment and a reference alpha
value. Similarly, the stencil test stage 107 conditionally discards fragments based on a
comparison between each fragment and a stored stencil value.
[0010] The HSR stage 108 (also called a depth test stage) discards pixels
contained in triangle fragments based on the depth values of other pixels having the same display location. Generally, this is done by comparing a z-axis value (depth value) of a pixel undergoing the depth test with a z-axis value stored in a corresponding location of a so-called z-buffer (or depth buffer). The tested pixel is discarded if its z-axis value indicates that the pixel would be blocked from view by another pixel whose z-axis value is stored in the z-buffer. On the other hand, the z-buffer value is overwritten with the z-axis value of the tested pixel in the case where the tested pixel would not be blocked from view. In this manner, underlying pixels which are blocked from view are discarded in favor of overlying pixels.
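The per-pixel depth test described above can be sketched in a few lines. This is a minimal illustration rather than the pipeline's actual logic: it assumes a "less-than" depth function (a smaller z-value is nearer to the viewer) and models the z-buffer as a hypothetical two-dimensional list `zbuf`; the function name `depth_test` is likewise illustrative.

```python
from typing import List


def depth_test(zbuf: List[List[float]], x: int, y: int, z_new: float) -> bool:
    """Return True if the pixel at (x, y) with depth z_new survives the depth test.

    Assumes smaller z means nearer to the viewer. On a pass, the stored z-value
    is overwritten; on a fail, the z-buffer is left unchanged and the caller is
    expected to discard the pixel.
    """
    if z_new < zbuf[y][x]:
        zbuf[y][x] = z_new  # new pixel is in front: keep it and record its depth
        return True
    return False  # new pixel is hidden behind the stored pixel: discard it


if __name__ == "__main__":
    FAR = 1.0
    zbuf = [[FAR] * 4 for _ in range(4)]  # z-buffer cleared to the far plane
    print(depth_test(zbuf, 1, 2, 0.5))    # True: nearer than the far plane
    print(depth_test(zbuf, 1, 2, 0.7))    # False: behind the pixel just written
    print(zbuf[2][1])                     # 0.5
```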
[0011] The alpha blending stage 109 combines rendered pixels with previously
stored pixels in a color buffer based on alpha values to achieve transparency of an object.
[0012] The logical operations stage 110 generically denotes miscellaneous
remaining processes of the pipeline for ultimately obtaining pixel display data.
[0013] In any graphics system, it is desired to conserve processor and memory
bandwidth to the extent possible while maintaining satisfactory performance. This is especially true in the case of portable or hand-held devices where bandwidths may be limited. Also, as suggested previously, there is a particular demand in the industry to minimize power consumption and enhance bandwidth efficiency when processing 3D graphics for display on portable or hand-held devices.
SUMMARY OF THE INVENTION
[0014] According to one aspect of embodiments of the present disclosure, a
graphics processor is provided which includes a rasterization pipeline including a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data. The processor further includes a memory which stores data utilized by at least one of the processing stages of the rasterization pipeline, and a pre-fetch mechanism which retrieves the data utilized by the at least one processing stage with respect to a processed pixel in advance of the processed pixel arriving at the at least one processing stage.
[0015] According to another aspect of embodiments of the present
disclosure, a graphics processor is provided which includes a rasterization pipeline including a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, where the processing stages include a hidden surface removal (HSR) stage. The processor further includes a depth buffer which stores a depth value of a previously rendered pixel, a memory controller which retrieves the depth value of the previously rendered pixel, and a cache memory which is coupled to the HSR stage of the pipeline and which stores the depth value retrieved by the memory controller.
[0016] According to still another aspect of embodiments of the present
disclosure, a graphics processor is provided which includes a rasterization pipeline including a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, where the processing stages include a hidden surface removal (HSR) stage. The processor further includes a depth buffer which stores depth values of a two-dimensional block of pixels, a block address generator which generates a block address of the two-dimensional block of pixels which includes a processed pixel, a cache memory coupled to the HSR stage of the rasterization pipeline, and a memory controller which is responsive to the block address to retrieve the depth values of the two-dimensional block of pixels from the depth buffer and to store the depth values in the cache memory.
[0017] According to still another aspect of embodiments of the present
disclosure, a graphics processor is provided which includes a rasterization pipeline
including a plurality of sequentially arranged processing stages which render display
pixel data from input primitive object data, and means for pre-fetching data from a main
memory and supplying the data to at least one of the processing stages in advance of
pixel data arriving at the at least one processing stage through the rasterization pipeline.
[0018] According to still another aspect of embodiments of the present
disclosure, a graphics processor is provided which includes a rasterization pipeline including a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, where the processing stages include a hidden surface removal (HSR) stage. The processor further includes a hierarchical depth buffer which stores depth values of a two-dimensional block of pixels, a random access memory which is coupled to the HSR stage and which stores a maximum depth value and a minimum depth value of the depth values of the two-dimensional block of pixels, a block address generator which generates a block address of the two-dimensional block of pixels which includes a processed pixel, a cache memory coupled to the HSR stage of the rasterization pipeline, and a memory controller which is responsive to the block address to retrieve the depth values of the two-dimensional block of pixels from the depth buffer and to store the depth values in the cache memory.
[0019] According to still another aspect of embodiments of the present
disclosure, a graphics processor is provided which includes a rasterization pipeline including a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, where the processing stages include a hidden surface removal (HSR) stage. The processor further includes a depth buffer including two-dimensional blocks of depth value data associated with the pixel data rendered by the rasterization pipeline, wherein the primitive object data is indicative of a primitive shape, and wherein the depth value data of a two-dimensional block is compressed in the case where the two-dimensional block is contained completely within the primitive shape containing a processed pixel.
[0020] According to still another aspect of embodiments of the present
disclosure, a graphics processing method is provided which includes supplying
primitive object data to a rasterization pipeline which includes a plurality of sequentially
arranged processing stages which render display pixel data from input primitive object
data, storing data utilized by at least one of the processing stages of the rasterization
pipeline in a memory, and pre-fetching from the memory the data utilized by the at
least one processing stage with respect to a processed pixel in advance of the processed
pixel arriving at the at least one processing stage.
[0021] According to still another aspect of embodiments of the present
disclosure, a graphics processing method is provided which includes supplying primitive object data to a rasterization pipeline which includes a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, where the processing stages include a hidden surface removal (HSR) stage, and selectively compressing two-dimensional blocks of depth value data in a depth buffer. The primitive object data is indicative of a primitive shape, and the depth value data of a two-dimensional block is compressed when the two-dimensional block is contained completely within the primitive shape containing a processed pixel.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The above and other aspects of the disclosed embodiments will become
readily apparent from the detailed description that follows, with reference to the
accompanying drawings, in which:
[0023] FIG. 1 is a schematic block diagram of an example of a basic OpenGL
rasterization pipeline contained in a 3D graphics engine;
[0024] FIG. 2 illustrates a simplified example of a circuit block configuration of
a graphics pipeline according to an embodiment of the present disclosure;
[0025] FIG. 3 is a view for explaining the predictive pre-fetching of pixel tiles
according to another embodiment of the present disclosure;
[0026] FIG. 4 illustrates a simplified example of a circuit block configuration of
a graphics pipeline according to another embodiment of the present disclosure;
[0027] FIG. 5 illustrates a block diagram of another embodiment of the present
disclosure in which the z-values of tiles of pixels are predictively pre-fetched and
stored in a cache;
[0028] FIG. 6 illustrates a block diagram for explaining the operation of the
depth cache illustrated in FIG. 5; and
[0029] FIG. 7 is a view for explaining pixel tiles that are candidates for z-
compression according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0030] Some embodiments herein are at least partially characterized by a 3D
graphics pipeline which includes a prefetch mechanism that feeds a cache of depth tiles.
The prefetch mechanism may be predictive, using triangle geometry information from
previous pipeline stages to pre-charge the cache, thereby allowing for an increase in
memory bandwidth efficiency.
[0031] Other embodiments are at least partially characterized by a z-value
compression technique which allows for a reduction in power consumption and memory
bandwidth.
[0032] Several preferred but non-limiting embodiments will now be described.
[0033] The triangle setup block of a 3D graphics pipeline may be preceded by
what is referred to herein as a command block. The command block contains all
relevant data as to each triangle, including pixel screen location information. According
to embodiments of the present disclosure, pixel screen location data is fed forward in the
pipeline and used by later pipeline stages to compute addresses of data needed for pixel
processing. By the time the pixels arrive at a given stage, the values associated with the
stage will already be in the cache, thus allowing for an improvement in bandwidth
efficiency.
[0034] FIG. 2 is a simplified block diagram illustrating an embodiment of the
present disclosure. A 3D graphics pipeline is depicted having a command block 200
and first through nth pipeline blocks 201a ... 201n. At least one of the pipeline blocks is
operatively equipped with a cache memory 202a ... 202d. By forwarding address
information to the pipeline stage(s) 1, 2, n-1 and/or n in advance, it becomes possible to
retrieve relevant data from a main memory in advance of the arrival of a processed pixel
at the pipeline stage(s). In this manner, memory throughput is increased.
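Why forwarding addresses ahead of the pixels raises throughput can be seen with a small timing model. Everything in it is an assumption for illustration and not taken from the text: the function `drain_cycles`, a fixed memory `latency`, pipelined memory, a stage that handles at most one pixel per cycle and stalls until its data is present, and a `lead` (in pixels) by which the address is forwarded ahead of the pixel.

```python
from typing import List


def drain_cycles(num_pixels: int, latency: int, lead: int) -> int:
    """Cycles for a blocking pipeline stage to process num_pixels pixels.

    Hypothetical model: the stage handles at most one pixel per cycle, upstream
    stages always have the next pixel ready, and memory is pipelined with a fixed
    latency. With lead == 0 the read for pixel i is issued only when pixel i
    arrives; with lead > 0 the address was forwarded earlier, so the read was
    issued when pixel i - lead arrived (or at cycle 0 for the first lead pixels).
    """
    arrive: List[int] = []
    finish = -1  # cycle in which the previous pixel left the stage
    for i in range(num_pixels):
        arrive.append(finish + 1)  # back-pressure: pixel i enters when the stage frees up
        issue = arrive[i - lead] if i >= lead else 0
        ready = issue + latency    # cycle at which pixel i's data is in the cache
        finish = max(arrive[i], ready)
    return finish + 1


if __name__ == "__main__":
    print(drain_cycles(100, 10, 0))   # 1100: every pixel waits out the full latency
    print(drain_cycles(100, 10, 10))  # 110: only the first fetch is exposed
```

Under these assumptions, once the lead covers the memory latency, each fetch overlaps the processing of earlier pixels and the stage runs at one pixel per cycle.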
[0035] Also, in an alternative embodiment, the pre-fetching mechanism is
accompanied by a predictive mechanism to further enhance memory efficiency. This is described later with reference to FIG. 3, which is directed to the example of predictive pre-fetching of z-values (depth values) from a depth buffer.
[0036] Three-dimensional (3D) rasterization pipelines utilize a "depth test" to
determine whether a newly processed pixel is obscured by a previously rendered pixel. The mechanism involves accessing a "depth buffer" (also called a "z-buffer") into which depth values (i.e., z values) are stored and checked during rasterization. Essentially, the distance of each visible pixel from the viewer is stored as a depth value in the depth buffer. Subsequently, another processed pixel may attempt to occupy the same position on the screen. The depth value of the previously rendered pixel (i.e., the depth value stored in the depth buffer at the pixel position) is read and compared with the value of the newly processed pixel. If the comparison result indicates that the new pixel is closer to the viewer, then it is deemed to be visible, and the previous depth value of the depth buffer may be overwritten with the depth value of the new pixel. The new pixel is further processed by the pipeline, and eventually rendered in a frame buffer. On the other hand, if the comparison result indicates that the new pixel is farther from the viewer, then it is deemed to be invisible, and the new pixel may be discarded and the previous depth value of the depth buffer is maintained. This process is referred to herein as Hidden Surface Removal (HSR).
[0037] FIG. 3 illustrates an example of how a triangle strip might map onto z-value pixel tiles. The triangles are labeled A through E, and appear in the pipeline in that order. The tiles are numbered 1-13. In order to process triangle A, tiles 1, 2, 3, 4, 5 and 8 are needed. Accordingly, the z-values of pixel tiles 1, 2, 3, 4, 5 and 8 are pre-fetched from the depth buffer and stored in cache memory. Next, in order to process triangle B, pixel tiles 4, 5, 8 and 9 are needed. However, since pixel tiles 4, 5 and 8 are already stored in cache memory, it is only necessary to pre-fetch pixel tile 9 from the depth buffer. Similarly, with triangle C, only tile 6 must be pre-fetched. Memory bandwidth efficiency is enhanced by predictively caching tiles in this manner.
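The tile bookkeeping in the FIG. 3 example amounts to a set difference: a triangle only costs depth-buffer bandwidth for tiles that are not already cached. The sketch below is hypothetical: `prefetch_tiles` is an illustrative name, it uses only the tile sets given for triangles A and B, and it assumes a cache large enough that nothing is evicted.

```python
from typing import Iterable, List, Set


def prefetch_tiles(triangles: Iterable[Set[int]]) -> List[List[int]]:
    """For each triangle, given as the set of tile numbers it touches, return the
    tiles that must actually be fetched from the depth buffer, i.e. those not
    already resident in the depth cache. Assumes an unbounded cache (no evictions).
    """
    cached: Set[int] = set()
    fetches: List[List[int]] = []
    for tiles in triangles:
        missing = sorted(tiles - cached)  # only uncached tiles cost bandwidth
        fetches.append(missing)
        cached |= tiles                   # everything touched is now resident
    return fetches


# Triangles A and B of FIG. 3
print(prefetch_tiles([{1, 2, 3, 4, 5, 8}, {4, 5, 8, 9}]))
# [[1, 2, 3, 4, 5, 8], [9]]
```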
[0038] FIG. 4 is a block diagram of an example of a 3D graphics pipeline which is
configured for pre-fetching of z-values to be utilized in a Hidden Surface Removal
(HSR) block of the pipeline. In the figure, the pipeline includes a command block 400,
a triangle setup block 401, a pixel shading block 402, an HSR block 403, a texture
mapping block 404 and a texture blending block 405. In addition, the HSR block 403 is
equipped with a depth cache 406 and has access to a depth buffer 407.
[0039] In operation, address information of depth pixel tiles is forwarded from
the command block 400 directly to the HSR block 403. The HSR block 403 is configured to pre-fetch depth values from the depth buffer 407 according to the address information, and to then store the depth values in the depth cache 406. As such, when the processed pixel arrives through the pipeline to the HSR block 403, the depth values of a previously rendered pixel may be rapidly retrieved from the cache 406 for HSR processing.
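A minimal model of the FIG. 4 arrangement follows. It assumes 16 z-values per tile, an unbounded depth cache, no write-back to the depth buffer, and that a smaller z is nearer; the class `HSRBlock` and its method names are hypothetical. The point it illustrates is that a tile address forwarded from the command block turns the later pixel access into a cache hit.

```python
from typing import Dict, List


class HSRBlock:
    """Hypothetical model of HSR block 403 with depth cache 406 over depth buffer 407.

    The cache is unbounded and updates are not written back to the depth buffer;
    both are simplifications. Smaller z is assumed to be nearer.
    """

    def __init__(self, depth_buffer: Dict[int, List[float]]) -> None:
        self.depth_buffer = depth_buffer          # tile index -> list of z-values
        self.cache: Dict[int, List[float]] = {}   # depth cache contents
        self.buffer_reads = 0                     # tiles read from the depth buffer
        self.demand_misses = 0                    # pixels that arrived before their tile

    def prefetch(self, tile: int) -> None:
        """Handle a tile address forwarded from the command block."""
        if tile not in self.cache:
            self.cache[tile] = list(self.depth_buffer[tile])
            self.buffer_reads += 1

    def test_pixel(self, tile: int, offset: int, z_in: float) -> bool:
        """Depth-test an arriving pixel against the cached z-value."""
        if tile not in self.cache:                # prefetch did not cover this tile
            self.demand_misses += 1
            self.prefetch(tile)
        stored = self.cache[tile]
        if z_in < stored[offset]:
            stored[offset] = z_in                 # visible: record the new depth
            return True
        return False                              # hidden: pixel is discarded


hsr = HSRBlock({0: [1.0] * 16, 1: [1.0] * 16})
hsr.prefetch(0)                   # address forwarded ahead of the pixel
print(hsr.test_pixel(0, 5, 0.4))  # True, served from the cache
print(hsr.demand_misses)          # 0
```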
[0040] The predictive pre-fetching technique of depth buffer management of an
embodiment of the present disclosure lends itself extremely well to the use of a so-called
hierarchical z-buffer, an example of which is described next.
[0041] FIGS. 5 and 6 are functional block diagrams illustrating another
embodiment of the present disclosure, in which FIG. 6 is a functional block diagram for explaining the operation of the depth test block 504 illustrated in FIG. 5.
[0042] Illustrated in FIG. 5 are a command engine 501, a triangle setup block
502, a pixel shading block 503, a depth test block 504 (containing a hierarchical z-
buffer, not shown), a memory system 505 (containing a depth buffer), and remaining
pipeline blocks 506.
[0043] In operation, triangle data from the command engine 501 is applied to
the triangle setup block 502. The triangle setup block outputs corresponding depth
coefficients, geometry data and attribute coefficients, which are all applied to the pixel
shading block 503. Then, the pixel attributes and pixel address are supplied by the pixel
shading block 503 to the depth test block 504, together with triangle bounding box data
from the command engine 501, and the depth coefficients from the triangle setup block
502. The depth test block 504 then executes a depth test with respect to the processed
pixel and depth values stored in cache memory (not shown). Preferably, the depth
values are predictively retrieved from the memory system 505 and stored in cache
memory in advance of actual execution of the depth test. The processed pixel is then
either discarded as a result of the depth test, or transmitted to the remaining pipeline
blocks 506 in the form of the pixel address and the pixel attributes.
[0044] As already mentioned, FIG. 6 is a functional block diagram for
explaining the operation of the depth test block 504 illustrated in FIG. 5. As shown in
FIG. 6, the depth test block of this example generally includes a tile index predictor 601,
a tile index generator 602, a depth interpolator 603, a tile test block 604, a pixel test
block 607, an attribute buffer 608, and a depth cache 609.
[0045] The attribute buffer 608 is used to store the pixel attributes of incoming
pixels as they travel down the pipeline. The depth test block is itself a pipeline, and the attribute
buffer 608 is matched to that pipeline. As will be explained below, the discard_pixel signals
are effectively erase (clear) signals for pixels flowing through the pipeline 621.
[0046] The tile index predictor 601 utilizes bounding box information
bounding_box to predictively generate a series of tile indexes indicative of tiles occupied by the processed triangle. As discussed previously in connection with FIG. 3, memory bandwidth efficiency is enhanced by predictive caching of tiles with respect to the processed triangles. Prefetch logic 610 utilizes the tile indexes from the tile index predictor 601 to control a cache read block 612 of the depth cache 609. The operation of the cache read block 612 will be explained later. In short, the prefetch logic 610 makes early tile requests of the cache read block 612 such that the pixels later requested by the pixel test block 607 (explained below) are more likely to be present in the cache RAM.
[0047] The tile index generator 602 generates a tile index signal tile_index_in
from the incoming pixel address pixel_address_in. Note that since the same tile index
would have been predicted earlier by the tile index predictor 601, logic may be shared
between the tile index predictor 601 and tile index generator 602.
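Because the predictor and the generator compute the same mapping from screen position to tile, the shared logic can be illustrated with one helper. This sketch assumes 4x4 tiles, row-major tile numbering and a screen 80 tiles wide (none of which the text specifies); `tile_index` and `predict_tiles` are hypothetical names, and the predictor conservatively returns every tile overlapped by the bounding box, some of which may contain no pixels of the triangle.

```python
from typing import List, Tuple

TILE_SIZE = 4       # assumed tile edge in pixels
TILES_PER_ROW = 80  # assumed screen width in tiles (320 pixels)


def tile_index(x: int, y: int) -> int:
    """Row-major index of the tile containing pixel (x, y).

    Used by the generator for each incoming pixel and by the predictor below.
    """
    return (y // TILE_SIZE) * TILES_PER_ROW + (x // TILE_SIZE)


def predict_tiles(bbox: Tuple[int, int, int, int]) -> List[int]:
    """Tile indexes overlapped by an inclusive pixel bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    return [tile_index(tx * TILE_SIZE, ty * TILE_SIZE)
            for ty in range(y0 // TILE_SIZE, y1 // TILE_SIZE + 1)
            for tx in range(x0 // TILE_SIZE, x1 // TILE_SIZE + 1)]


print(predict_tiles((2, 3, 9, 5)))  # [0, 1, 2, 80, 81, 82]
print(tile_index(5, 4))             # 81: a pixel inside that box maps to a predicted tile
```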
[0048] The depth interpolator 603 uses depth coefficients z_coefficients and
bounding box information bounding_box to actually rasterize the depth value z_in for an incoming pixel address pixel_address_in. It is also possible to include the depth interpolator 603 as part of the shading block (see FIG. 5). However, in this example, the depth interpolator 603 has been implemented in the depth test block because the same interpolator may be used to decompress z in the event that only the coefficients are stored for any given tile. In this regard, note that a depth interpolator 616 also appears inside the depth cache block 609.
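The depth interpolator evaluates a planar depth function. The sketch below assumes that z_coefficients are the (A, B, C) of Z(x, y) = A*x + B*y + C in screen space and, for completeness, shows one way triangle setup could derive them from three vertices via a cross product. `plane_from_vertices` and `interpolate_depth` are illustrative names, not the disclosed hardware.

```python
from typing import Tuple

Vertex = Tuple[float, float, float]  # (x, y, z) in screen space


def plane_from_vertices(v0: Vertex, v1: Vertex, v2: Vertex) -> Tuple[float, float, float]:
    """Fit Z(x, y) = A*x + B*y + C through three triangle vertices.

    Uses the cross product of two edge vectors; raises for a degenerate
    (zero-area) triangle, whose depth plane is undefined.
    """
    e1 = (v1[0] - v0[0], v1[1] - v0[1], v1[2] - v0[2])
    e2 = (v2[0] - v0[0], v2[1] - v0[1], v2[2] - v0[2])
    nx = e1[1] * e2[2] - e1[2] * e2[1]
    ny = e1[2] * e2[0] - e1[0] * e2[2]
    nz = e1[0] * e2[1] - e1[1] * e2[0]  # twice the signed screen-space area
    if nz == 0:
        raise ValueError("degenerate triangle")
    a, b = -nx / nz, -ny / nz
    return a, b, v0[2] - a * v0[0] - b * v0[1]


def interpolate_depth(coeffs: Tuple[float, float, float], x: float, y: float) -> float:
    """Rasterize z_in for a pixel address by evaluating the depth plane."""
    a, b, c = coeffs
    return a * x + b * y + c


coeffs = plane_from_vertices((0, 0, 1), (4, 0, 3), (0, 4, 5))
print(coeffs)                           # (0.5, 1.0, 1.0)
print(interpolate_depth(coeffs, 2, 2))  # 4.0
```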
[0049] The tile test block 604 is essentially a hierarchical z test block and is
configured with a limit table 605 and a visibility check block 606. The limit table 605 contains the maximum far depth value (z-value) z_max_far and minimum near depth value (z-value) z_min_near for each screen tile. The tile_index from the tile index
generator 602 is utilized as an address into the limit table 605, and as a result, the limit table
605 produces the minimum depth value z_min_near and the maximum depth value
z_max_far for the tile containing the processed pixel. The tile's minimum depth value
z_min_near and maximum depth value z_max_far are then applied with z_in to the
visibility check block 606. The visibility check block 606 compares z_in with
z_min_near and z_max_far, with the comparison result having three possible outcomes,
namely, z_in is farther than z_max_far for the tile, z_in is nearer than z_min_near for
the tile, or z_in is nearer than z_max_far but farther than z_min_near for the tile.
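The three outcomes of the visibility check map directly onto a comparison function. This is a sketch assuming a smaller z is nearer; the enum `TileResult` and the function `visibility_check` are hypothetical, the enum values reuse the signal names from the text, and ties are conservatively sent to the per-pixel test.

```python
from enum import Enum


class TileResult(Enum):
    DISCARD = "discard_pixel"         # farther than everything in the tile
    UPDATE = "update_pixel"           # nearer than everything in the tile
    PIXEL_TEST = "pixel_test_enable"  # in between: needs the stored per-pixel z


def visibility_check(z_in: float, z_min_near: float, z_max_far: float) -> TileResult:
    """Hierarchical-z tile test; assumes smaller z is nearer to the viewer."""
    if z_in > z_max_far:
        return TileResult.DISCARD
    if z_in < z_min_near:
        return TileResult.UPDATE
    return TileResult.PIXEL_TEST  # includes ties with either limit


print(visibility_check(0.9, 0.2, 0.8))  # TileResult.DISCARD
print(visibility_check(0.5, 0.2, 0.8))  # TileResult.PIXEL_TEST
```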
[0050] In the case where z_in is farther than z_max_far for the tile, the pixel is
discarded by operation of the discard_pixel signal to the attribute buffer 608.
[0051] In the case where z_in is nearer than z_min_near for the tile, the pixel is
visible and must be updated by enablement of the update_pixel signal and transmission of the signals designated in FIG. 6 as update_pixel_tile_index, update_pixel_address, update_pixel_z, and update_pixel_z_coefficients to the cache write block 617. The cache write block 617 includes cache tag management. When a pixel is updated, the cache write block 617 functions to update the cache RAM 619 and maintain data coherency with the external memory system 620. Also, when a tile is stored back into the cache RAM 619 or the external memory system 620, the cache write block 617 streams the depth information pixel_z of the tile to the limit generator 618.
[0052] The limit generator 618 computes the z_max_far and z_min_near of a
tile as it is being stored into the memory system 620. Then, the update_tile signal is enabled, and the update_tile_index, z_max_far and z_min_near signals are transmitted to the tile test block 604 so as to update the limit table 605.
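The limit generator's job amounts to a running minimum and maximum over the depth values streamed out of a tile. In this hypothetical sketch, the limit table is a dict keyed by tile index, the function names are illustrative, and a smaller z is assumed to be nearer (so z_min_near is the minimum and z_max_far the maximum).

```python
from typing import Dict, Iterable, Tuple


def limit_generator(pixel_z: Iterable[float]) -> Tuple[float, float]:
    """Compute (z_min_near, z_max_far) one value at a time as a tile streams out."""
    it = iter(pixel_z)
    z_min_near = z_max_far = next(it)  # assumes a non-empty tile
    for z in it:
        z_min_near = min(z_min_near, z)
        z_max_far = max(z_max_far, z)
    return z_min_near, z_max_far


def update_tile(limit_table: Dict[int, Tuple[float, float]],
                update_tile_index: int, pixel_z: Iterable[float]) -> None:
    """Refresh the limit table entry later read by the tile test block."""
    limit_table[update_tile_index] = limit_generator(pixel_z)


limit_table: Dict[int, Tuple[float, float]] = {}
update_tile(limit_table, 7, [0.5, 0.2, 0.9, 0.4])
print(limit_table[7])  # (0.2, 0.9)
```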
[0053] As mentioned previously, the cache write block 617 receives the signals
update_pixel_tile_index, update_pixel_address, update_pixel_z, and update_pixel_z_coefficients. The update_pixel_tile_index signal is essentially the cache block index (or cache line index). The update_pixel_address is a cache address utilized to address an individual pixel. The update_pixel_z is the individual depth value (z-value) for the individual pixel. The update_pixel_z_coefficients signal contains coefficients used as part of a z-compression technique. That is, the compression table 611 of the depth cache 609 keeps track of which tiles have only their coefficients stored. When such a tile is encountered by the cache read block 612, the coefficients are read from cache RAM 619 and then run through the depth interpolator 616 to recover the individual depth values.
[0054] In the case where z_in is nearer than z_max_far but farther than
z_min_near, the pixel lies between the minimum and maximum depth values of the tile. As such, an individual pixel test is executed by enablement of the pixel_test_enable signal. In response, the signals request_pixel, request_pixel_tile_index and request_pixel_address are sent to the depth cache 609 by the pixel test block 607 to request the depth value of a previously processed pixel. The request_pixel signal is essentially a cache read command, and the request_pixel_tile_index and request_pixel_address are tile and pixel addresses, respectively. In response to these signals, the cache read block 612 retrieves the requested z-value of the previously processed pixel from the cache RAM 619 via the memory interface 613. The cache read block 612 includes cache tag checking and management. The requested z-value is supplied as the request_pixel_z signal to the pixel test block 607, which then determines whether the processed pixel is visible. If the pixel is determined to be not visible, then the discard_pixel signal is enabled as described previously with respect to the tile test block 604. If the pixel is determined to
be visible, then the update_pixel signal is enabled, and the update_pixel_tile_index,
update_pixel_address, update_pixel_z, and update_pixel_z_coefficients signals are
utilized in the same manner as described previously in connection with the tile test
block 604.
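The read and write paths of the depth cache 609 can be modelled as a small write-back cache with tag checking. The sketch is hypothetical: the direct-mapped organization, the four-line default, and the class and method names are assumptions chosen for illustration, and the compression table 611 and depth interpolator 616 are left out.

```python
from typing import Dict, List, Optional


class DepthCache:
    """Hypothetical direct-mapped, write-back cache of depth tiles.

    request_pixel models the cache read block (tag check, fill on miss);
    update_pixel models the cache write block (update, write-back on eviction).
    """

    def __init__(self, memory: Dict[int, List[float]], num_lines: int = 4) -> None:
        self.memory = memory                                  # external memory: tile -> z-values
        self.num_lines = num_lines
        self.tags: List[Optional[int]] = [None] * num_lines   # tile held by each line
        self.data: List[List[float]] = [[] for _ in range(num_lines)]
        self.dirty = [False] * num_lines
        self.misses = 0

    def _line_for(self, tile: int) -> int:
        """Tag check; on a miss, write back the victim if dirty, then fill the line."""
        line = tile % self.num_lines
        if self.tags[line] != tile:
            self.misses += 1
            victim = self.tags[line]
            if victim is not None and self.dirty[line]:
                self.memory[victim] = list(self.data[line])  # keep memory coherent
            self.tags[line] = tile
            self.data[line] = list(self.memory[tile])
            self.dirty[line] = False
        return line

    def request_pixel(self, tile: int, pixel: int) -> float:
        """Read path: return request_pixel_z for (tile, pixel address)."""
        return self.data[self._line_for(tile)][pixel]

    def update_pixel(self, tile: int, pixel: int, z: float) -> None:
        """Write path: store update_pixel_z and mark the line dirty."""
        line = self._line_for(tile)
        self.data[line][pixel] = z
        self.dirty[line] = True

    def flush(self) -> None:
        """Write every dirty line back to external memory."""
        for line, tile in enumerate(self.tags):
            if tile is not None and self.dirty[line]:
                self.memory[tile] = list(self.data[line])
                self.dirty[line] = False
```

A direct-mapped organization keeps the tag check to a single comparison; a set-associative cache would reduce conflicts between tiles that map to the same line at the cost of more tag logic.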
[0055] It is noted that another level of hierarchical z-buffer can be implemented
in which complete triangles are discarded based on maximum and minimum values for a
tile, if the triangle is completely within the tile.
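The triangle-level test mentioned above can be sketched as follows, assuming a smaller z is nearer and axis-aligned bounding boxes in inclusive pixel coordinates. Because depth varies linearly across a triangle, its nearest point is one of its vertices, so comparing the nearest vertex against the tile's z_max_far is enough. The function name and argument layout are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]  # inclusive (x0, y0, x1, y1)


def discard_triangle(vertex_z: Sequence[float], triangle_bbox: Box,
                     tile_bbox: Box, z_max_far: float) -> bool:
    """True if a triangle lying entirely inside one tile is wholly behind that tile.

    Assumes smaller z is nearer. Depth is linear over the triangle, so its
    nearest point is a vertex; if even that vertex is farther than z_max_far,
    no pixel of the triangle can pass the depth test.
    """
    tx0, ty0, tx1, ty1 = triangle_bbox
    bx0, by0, bx1, by1 = tile_bbox
    within_tile = bx0 <= tx0 and by0 <= ty0 and tx1 <= bx1 and ty1 <= by1
    return within_tile and min(vertex_z) > z_max_far


print(discard_triangle([0.9, 0.95, 0.92], (1, 1, 2, 3), (0, 0, 3, 3), 0.8))  # True
```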
[0056] The embodiment of FIGS. 5 and 6 utilizes a tile mode of operation in
which depth values of pixel tiles are stored and retrieved from the depth buffer. To
further enhance bandwidth efficiency, it may be desirable to compress the data
representative of the pixel tiles. One such z-compression technique according to an
embodiment of the present disclosure is described below.
[0057] In the description of this embodiment, it is assumed that the depth buffer
is divided into tiles (e.g., 4x4 pixels) and triangles are rendered in tile mode.
[0058] Early in the pipeline process, depth values of the pixels of each triangle
are computed from vertex information associated with the triangle. Typically, a linear
interpolation is utilized for this purpose.
[0059] As such, if a tile corresponds to a place in the z-buffer that was updated
by rendering a triangle, then the depth values in the tile can be represented as a linear
function:
$$Z(x, y) = A_z x + B_z y + C_z$$
[0060] Here, x and y denote the horizontal and vertical coordinates of each pixel
within the 4 x 4 tile. Given the depth value Z00 of the upper-left pixel of the tile, together with Az and Bz, the depth values of the remaining pixels of the tile can be obtained by interpolating the following equation:
$$Z(x, y) = Z_{00} + A_z x + B_z y, \qquad x, y \in \{0, 1, 2, 3\}$$
[0061] Thus, if a tile is compressible, instead of writing all 16 of its pixels' depth
values to the depth buffer, it is only necessary to write Z00, Az and Bz. This is just
3/16 of a regular tile's information, assuming Az and Bz have the same data precision as
Z00. When the same compressed tile is read back from the z-buffer, it is only necessary to
read Z00, Az and Bz and to execute a decompression function based on the above formula
to obtain the depth values of the entire tile.
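Compression and decompression both reduce to evaluating the depth plane. This sketch assumes the triangle's depth plane is available in screen coordinates, so Z00 is that plane evaluated at the tile's upper-left pixel, and it uses the tile-local form Z(x, y) = Z00 + Az*x + Bz*y for decompression; `compress_tile` and `decompress_tile` are hypothetical names.

```python
from typing import List, Tuple

TILE = 4  # 4x4 tiles, as assumed in the text


def compress_tile(a_z: float, b_z: float, c_z: float,
                  x0: int, y0: int) -> Tuple[float, float, float]:
    """Return (Z00, Az, Bz) for a tile at screen position (x0, y0) lying fully
    inside a triangle whose depth plane is Z = Az*x + Bz*y + Cz in screen space."""
    z00 = a_z * x0 + b_z * y0 + c_z  # depth of the tile's upper-left pixel
    return z00, a_z, b_z


def decompress_tile(z00: float, a_z: float, b_z: float) -> List[List[float]]:
    """Recover all 16 depth values via Z(x, y) = Z00 + Az*x + Bz*y (tile-local x, y)."""
    return [[z00 + a_z * x + b_z * y for x in range(TILE)] for y in range(TILE)]


packed = compress_tile(0.5, 0.25, 2.0, 8, 4)
print(packed)                          # (7.0, 0.5, 0.25)
print(decompress_tile(*packed)[1][2])  # 8.25
print(f"{len(packed)}/{TILE * TILE}")  # 3/16 of the values are stored
```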
[0062] A tile can be compressed only if it is fully contained in a triangle, as
illustrated in FIG. 7. As shown, tile A is compressible, while tiles B and C are not, since
they cross over a triangle boundary. To determine whether a tile falls completely within
a triangle, it is usually sufficient to examine whether all four corner pixels of the tile
are inside the triangle.
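The corner test can be sketched with edge functions, a common way of testing whether a point lies inside a triangle (the text does not specify which inside test is used). Pixels are assumed to be sampled at integer coordinates, and either winding order is accepted. Since both the tile and the triangle are convex, four inside corners imply that every sample of the tile is inside under this assumption. The names `edge`, `inside` and `tile_compressible` are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: > 0 if p is left of a->b, < 0 if right, 0 if on the line."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri: Triangle, p: Point) -> bool:
    """True if p is inside or on the boundary of tri, for either winding order."""
    v0, v1, v2 = tri
    e0, e1, e2 = edge(v0, v1, p), edge(v1, v2, p), edge(v2, v0, p)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def tile_compressible(tri: Triangle, x0: int, y0: int, size: int = 4) -> bool:
    """Treat the tile at (x0, y0) as compressible if its four corner pixels are inside."""
    last = size - 1
    corners = [(x0, y0), (x0 + last, y0), (x0, y0 + last), (x0 + last, y0 + last)]
    return all(inside(tri, c) for c in corners)


tri = ((0, 0), (16, 0), (0, 16))
print(tile_compressible(tri, 0, 0))   # True: like tile A in FIG. 7
print(tile_compressible(tri, 12, 0))  # False: crosses the hypotenuse
```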
[0063] Since not every tile is compressible, an on-chip memory may be utilized
to store an array of flags (1-bit per tile) that could indicate if a particular tile block is
compressed in the depth buffer. When a tile is read from the depth buffer, its
corresponding compression flag is examined to determine whether decompression of the
data is needed. When a tile is being updated to the depth buffer, if it is compressible,
the compressed data is written to the depth buffer and the corresponding compression
flag is set.
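The flag bookkeeping on the read and write paths might look like the following. It is a hypothetical sketch: a Python int stands in for the on-chip flag array (bit i is the flag for tile i), tiles are 4x4, the class and method names are illustrative, and clearing the flag when an uncompressed tile is written is an assumption the text implies but does not state.

```python
from typing import Dict, List, Sequence, Tuple, Union


class CompressedDepthBuffer:
    """Hypothetical depth buffer plus a 1-bit-per-tile compression flag array.

    Compressed tiles hold (Z00, Az, Bz); other tiles hold 16 z-values.
    """

    def __init__(self) -> None:
        self.flags = 0  # bit i set => tile i is stored compressed
        self.store: Dict[int, Union[Tuple[float, float, float], List[float]]] = {}

    def write_compressed(self, tile: int, z00: float, a_z: float, b_z: float) -> None:
        self.store[tile] = (z00, a_z, b_z)  # 3 values instead of 16
        self.flags |= 1 << tile             # set the compression flag

    def write_uncompressed(self, tile: int, z_values: Sequence[float]) -> None:
        self.store[tile] = list(z_values)
        self.flags &= ~(1 << tile)          # tile is no longer compressed

    def read_tile(self, tile: int) -> List[float]:
        """Return 16 z-values in row-major order, decompressing if the flag is set."""
        data = self.store[tile]
        if (self.flags >> tile) & 1:
            z00, a_z, b_z = data
            return [z00 + a_z * x + b_z * y for y in range(4) for x in range(4)]
        return list(data)
```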
[0064] In the drawings and specification, there have been disclosed typical
preferred embodiments and, although specific examples are set forth, they are used in a
generic and descriptive sense only and not for purposes of limitation. It should therefore be understood that the scope of the present disclosure is to be construed by the appended claims, and not by the exemplary embodiments.
WHAT IS CLAIMED IS:
1. A graphics processor, comprising:
a rasterization pipeline comprising a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data;
a memory which stores data utilized by at least one of the processing stages of the rasterization pipeline; and
a pre-fetch mechanism which retrieves the data utilized by the at least one processing stage with respect to a processed pixel in advance of the processed pixel arriving at the at least one processing stage.
2. The graphics processor of claim 1, wherein the retrieved data is stored in a cache memory of the at least one of the processing stages of the rasterization pipeline.
3. A graphics processor, comprising:
a rasterization pipeline comprising a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, wherein the processing stages include a hidden surface removal (HSR) stage;
a depth buffer which stores data utilized by the HSR stage of the rasterization pipeline; and
a pre-fetch mechanism which retrieves the data utilized by the HSR stage from the depth buffer with respect to a processed pixel in advance of the processed pixel arriving at the HSR stage through the rasterization pipeline.
4. The graphics processor of claim 3, wherein the retrieved data is stored in a
cache memory of the HSR stage of the rasterization pipeline.


5. A graphics processor, comprising:
a rasterization pipeline comprising a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, wherein the processing stages include a hidden surface removal (HSR) stage;
a depth buffer which stores depth values of a two-dimensional tile of pixels;
a tile address generator which generates a tile address of the two-dimensional tile of pixels which includes a processed pixel;
a cache memory coupled to the HSR stage of the rasterization pipeline; and
a memory controller which is responsive to the tile address to retrieve the depth values of the two-dimensional tile of pixels from the depth buffer and to store the depth values in the cache memory.
6. The graphics processor of claim 5, wherein the depth buffer is a hierarchical depth buffer.
7. A graphics processor, comprising:
a rasterization pipeline comprising a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data; and
means for pre-fetching data from a main memory and supplying the data to at least one of the processing stages in advance of pixel data arriving at the at least one processing stage through the rasterization pipeline.
8. The graphics processor as claimed in claim 7, wherein the at least one
processing stage is a hidden surface removal (HSR) stage.


9. The graphics processor of claim 8, wherein said means comprises a cache
memory which stores the data from the main memory and which is coupled to the HSR
stage.
10. A graphics processor, comprising:
a rasterization pipeline comprising a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, wherein the processing stages include a hidden surface removal (HSR) stage;
a hierarchical depth buffer which stores depth values of two-dimensional tiles of pixels;
a random access cache memory which is coupled to the HSR stage and which stores a maximum depth value and a minimum depth value of the depth values of the two-dimensional tile of pixels;
a tile address generator which generates a tile address of the two-dimensional tile of pixels which includes a processed pixel;
a memory controller which is responsive to the tile address to retrieve the depth values of the two-dimensional tile of pixels from the depth buffer and to store the depth values in the cache memory.
11. The graphics processor of claim 10, further comprising a tile test block
which compares a depth value of a processed pixel with minimum and maximum depth
values of a tile containing the processed pixel.

12. The graphics processor of claim 11, wherein the tile test block is operative to discard the processed pixel in the case where the depth value of the processed pixel is less than the minimum depth value of the tile containing the processed pixel.
13. The graphics processor of claim 11, wherein the tile test block is operative to update the cache memory in the case where the depth value of the processed pixel is greater than the maximum depth value of the tile containing the processed pixel.
14. The graphics processor of claim 13, further comprising a pixel test block which compares the depth value of the processed pixel with a depth value previously stored in the cache memory.
15. The graphics processor of claim 14, wherein the tile test block is operative to enable the pixel test block in the case where the depth value of the processed pixel is between the minimum and maximum depth values of the tile containing the processed pixel.
16. The graphics processor of claim 10, further comprising a tile index predictor block which generates tile information based on primitive object data associated with the processed pixel, and a prefetch logic block which retrieves depth values of tiles based on the tile information generated by the tile index predictor block.


17. A graphics processor, comprising:
a rasterization pipeline comprising a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, wherein the processing stages include a hidden surface removal (HSR) stage;
a depth buffer comprising two-dimensional tiles of depth values data associated with the pixel data rendered by the rasterization pipeline, wherein the primitive object data is indicative of a primitive shape, and wherein the depth values data of a two-dimensional tile are compressed in the case where the two-dimensional tile is contained completely within the primitive shape containing a processed pixel.
18. The graphics processor of claim 17, wherein the primitive shape is a triangle.
19. The graphics processor of claim 18, wherein the two-dimensional tile is a 4 x 4 tile of pixels.
20. The graphics processor of claim 17, wherein the depth values data is compressed by storing coefficients of an equation describing relative values of the depth values of the two-dimensional tile.
21. The graphics processor of claim 20, wherein the equation is a linear equation.


22. A graphics processing method, comprising:
supplying primitive object data to a rasterization pipeline which includes a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data;
storing data utilized by at least one of the processing stages of the rasterization pipeline in a memory; and
pre-fetching from the memory the data utilized by the at least one processing stage with respect to a processed pixel in advance of the processed pixel arriving at the at least one processing stage.
23. The method of claim 22, further comprising storing the retrieved data in a cache memory of the at least one of the processing stages of the rasterization pipeline.
24. The method of claim 23, wherein the at least one processing stage is a hidden surface removal (HSR) stage.
25. The method of claim 24, further comprising executing a tile test which compares a depth value of the processed pixel with minimum and maximum depth values of a two-dimensional tile containing the processed pixel.
26. The method of claim 25, wherein the tile test includes updating the cache memory in the case where the depth value of the processed pixel is greater than the maximum depth value of the tile containing the processed pixel.


27. The method of claim 26, further comprising selectively executing a pixel test which compares the depth value of the processed pixel with a depth value previously stored in the cache memory.
28. The method of claim 27, wherein the tile test includes enabling the pixel test in the case where the depth value of the processed pixel is between the minimum and maximum depth values of the tile containing the processed pixel.
29. The method of claim 22, further comprising generating tile information based on primitive object data associated with the processed pixel, and prefetching depth values of tiles based on the tile information.
30. A graphics processing method, comprising:
supplying primitive object data to a rasterization pipeline which includes a plurality of sequentially arranged processing stages which render display pixel data from input primitive object data, wherein the processing stages include a hidden surface removal (HSR) stage;
selectively compressing two-dimensional tiles of depth values data in a depth buffer, wherein the primitive object data is indicative of a primitive shape, and wherein the depth values data of a two-dimensional tile is compressed when the two-dimensional tile is contained completely within the primitive shape containing a processed pixel.
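The tile test and pixel test recited in claims 11 to 15 (and 25 to 28) can be sketched as follows. The comparison sense follows the claims as written (a depth value below the tile minimum is discarded, one above the tile maximum updates the cache), so a larger depth value is treated as closer. The strict greater-than in the per-pixel test, the flattened 16-entry tile layout and all names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CachedTile:
    """Cached depth tile: per-tile bounds plus 16 per-pixel depths (row-major, assumed)."""
    zmin: float
    zmax: float
    depths: list = field(default_factory=list)


def write_pixel(tile, i, z):
    # Store the new depth and keep the tile's min/max bounds exact.
    tile.depths[i] = z
    tile.zmin = min(tile.depths)
    tile.zmax = max(tile.depths)


def hsr_tile_test(tile, i, z):
    """Classify pixel i of a tile with incoming depth z (larger z treated as closer)."""
    if z < tile.zmin:
        return "discard"          # behind every stored depth (claim 12)
    if z > tile.zmax:
        write_pixel(tile, i, z)   # in front of every stored depth (claim 13)
        return "update"
    # Between the bounds: fall back to the per-pixel test (claims 14 and 15).
    if z > tile.depths[i]:
        write_pixel(tile, i, z)
        return "pixel_pass"
    return "pixel_fail"
```

The benefit of the bounds check is that only pixels whose depth falls between the tile's minimum and maximum need the stored per-pixel value at all.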



ABSTRACT
TILED PREFETCHED AND CACHED DEPTH BUFFER
A 3D graphics pipeline includes a prefetch mechanism that feeds a cache of depth tiles. The prefetch mechanism may be predictive, using triangle geometry information from previous pipeline stages to pre-charge the cache, thereby allowing for an increase in memory bandwidth efficiency. A z-value compression technique may be optionally utilized to allow for a further reduction in power consumption and memory bandwidth.
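A minimal sketch of the predictive prefetch, assuming the tile index predictor conservatively selects every 4 x 4 tile overlapped by the triangle's screen-space bounding box, and that the cache is a small LRU structure filled from a dictionary standing in for the depth buffer. A hardware predictor could use a tighter footprint; the names, the capacity and the eviction policy (with write-back omitted) are assumptions.

```python
import math
from collections import OrderedDict

TILE = 4  # assumed 4 x 4 depth tiles


def predict_tiles(v0, v1, v2, width, height):
    """Conservative prediction: every tile touched by the triangle's bounding box."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    tiles_x = (width + TILE - 1) // TILE
    tx0 = max(math.floor(min(xs)) // TILE, 0)
    tx1 = min(math.floor(max(xs)) // TILE, (width - 1) // TILE)
    ty0 = max(math.floor(min(ys)) // TILE, 0)
    ty1 = min(math.floor(max(ys)) // TILE, (height - 1) // TILE)
    return [ty * tiles_x + tx
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)]


class DepthTileCache:
    """Small LRU cache of depth tiles, pre-charged before pixels reach the HSR stage."""

    def __init__(self, memory, capacity=8):
        self.memory = memory        # tile index -> depth data (stand-in for the depth buffer)
        self.capacity = capacity
        self.lines = OrderedDict()  # order tracks recency, least recent first
        self.fetches = 0            # number of tiles read from memory

    def prefetch(self, tiles):
        for t in tiles:
            if t in self.lines:
                self.lines.move_to_end(t)       # already resident: refresh recency
                continue
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used (no write-back)
            self.lines[t] = self.memory[t]
            self.fetches += 1

    def lookup(self, t):
        """Return the cached tile on a hit, or None on a miss."""
        if t in self.lines:
            self.lines.move_to_end(t)
            return self.lines[t]
        return None
```

In this sketch, `predict_tiles` would run once triangle setup has produced the vertices, its result would be passed to `prefetch`, and the HSR stage would later call `lookup` for each pixel's tile, so that most lookups hit data already fetched.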



Patent Number: 247058
Indian Patent Application Number: 1579/MUMNP/2007
PG Journal Number: 13/2011
Publication Date: 01-Apr-2011
Grant Date: 28-Mar-2011
Date of Filing: 01-Oct-2007
Name of Patentee: QUALCOMM INCORPORATED
Applicant Address: 5775 MOREHOUSE DRIVE, SAN DIEGO, CALIFORNIA 92121-1714
Inventors:
1. DHAWAN RAJAT RAJINDERKUMAR, 9928 KIKA COURT, #2818, SAN DIEGO, CA 92129
2. ANDERSON MICHAEL HUGH, 1091 HYMETTUS AVENUE, LEUCADIA, CA 92024
3. CHUANG DAN MINGLUN, 13113 SIERRA MESA ROAD, SAN DIEGO, CA 92129
4. SHIPPEE GEOFFREY, 2035 AVILA COURT, LA JOLLA, CA 92037
PCT International Classification Number: G06T15/00
PCT International Application Number: PCT/US2006/010340
PCT International Filing Date: 2006-03-21
PCT Conventions:
1. PCT Application Number 11/086,474; Date of Convention Priority 2005-03-21; Country U.S.A.