Title of Invention

METHOD OF ENCODING PICTURES OF AN UNCOMPRESSED DIGITAL VIDEO SEQUENCE, METHOD OF TRANSCODING PICTURES OF A COMPRESSED DIGITAL VIDEO SEQUENCE AND APPARATUSES THEREFOR .

Abstract A method of encoding pictures of an uncompressed digital video sequence includes analyzing (12 or 14) a picture of the uncompressed digital video sequence with a first algorithm (MPEG2) to measure a first value of the picture's complexity (XI2, XP2 or XB2) and to assign the first picture's picture type, and estimating (18) a second value of the picture's complexity (XI4, XP4 or XB4) using the first measured value as a parameter. The picture is then encoded (24) with a distinct second algorithm (H.264) employing the second value of the first picture's complexity and the first picture's picture type as parameters.
Full Text METHOD OF ENCODING PICTURES OF AN UNCOMPRESSED DIGITAL VIDEO
SEQUENCE, METHOD OF TRANSCODING PICTURES OF A COMPRESSED DIGITAL
VIDEO SEQUENCE AND APPARATUSES THEREFOR
BACKGROUND OF THE INVENTION
[0001] The present invention relates to compression coding of
video signals, and more particularly to rate control with a picture-
based lookahead window for dual-pass compression encoding /
transcoding, and a method of encoding pictures of an uncompressed
digital video sequence, a method of transcoding pictures of a
compressed digital video sequence and apparatuses for the same.
[0002] Two of the international standards for digital video
compression are known as MPEG2 (H.262) and H.264 (MPEG4 part 10).
There are several other standards to which the invention might be
applied, such as H.261, MPEG1 and H.263, but the following
description of an embodiment of the invention refers mainly to MPEG2
and H.264, so we will not discuss the other standards.
[0003] An uncompressed video stream can be described as a
consecutive series of pictures, or frames. An individual frame
describes a particular setting at a particular instant in time. A
scene is a series of frames describing the same setting at
consecutive moments in time. The second frame of a scene shows the
same setting as the first frame, slightly further in the future. MPEG
standards take advantage of that repetition of information using what
is known as temporal encoding. In accordance with an MPEG video
compression standard, such as MPEG2, the encoder divides a video
stream into sets of related pictures, known as groups of pictures
(GOPs) . Each frame within a GOP is labeled by the encoder as an
intraframe, a predicted frame or a bi-directional frame. An
intraframe (I type frame) is encoded using only information from
within that frame. No temporal encoding is used to compress
the frame. A predicted (P type) frame is encoded using
information from within the frame and uses an earlier I
frame or P frame as a reference for temporal compression. I
and P frames are referred to as anchor frames. A bidirectional
(B type) frame is encoded using information
from within the frame and can also use information from at
least one earlier anchor frame and at least one later anchor
frame. Within a GOP, the I frame is generally the most
complex, followed by the P frames, and the B frames typically
have the least amount of complexity.
[0004] In one conventional implementation of MPEG2, each
GOP (which is referred to herein as a standard GOP) has a
period (N) of 15 pictures, or frames, and includes only one I
type picture, which is the first picture in the GOP. The
fourth, seventh, tenth and thirteenth pictures are P type
pictures and the remaining ten pictures are B type pictures.
Thus, each standard GOP is composed of 5 sub-groups of 3
pictures each. Each sub-group is made up of an anchor picture
and two B type pictures. The spacing between anchor pictures,
in this case 3, is known as the intra-period (M) of the GOP.
Thus a standard GOP, in display order, will appear as:
IBBPBBPBBPBBPBB
[0005] In this conventional implementation of MPEG2, the
standard GOP is closed, i.e., it does not make any
predictions based on frames outside of the GOP.
[0006] A video stream is further sub-divided in MPEG
standards by defining a frame as being made up of a series of
macro-blocks (MBs) . A macro-block contains all the
information required to display an area of the picture
representing 16x16 luminance pixels.
[0007] MPEG2 and H.264 specify the syntax of a valid bit
stream and the ways in which a decoder must interpret such a
bit stream to arrive at the intended output, the output being
uncompressed digital video. The MPEG standards do not,
however, specify an encoder. An encoder is defined as any
device, hardware or software, capable of outputting a bit
stream that will produce the desired output when input to an
MPEG compliant decoder.
[0008] In a typical application of an encoder, an
uncompressed video signal is input to the encoder and encoded
in accordance with the applicable compression standard and
the newly encoded signal is output from the encoder, received
by a decoder and decoded for viewing. In order to accommodate
variation in the rate at which data is received by the
decoder, the decoder may include a buffer that receives the
encoded data and provides data to the decoding process. The
encoder must ensure that the encoded signal is being output
at such a rate that the decoder may be continuously decoding
the encoded signal and transmitting the decoded signal. If
the encoder transmits the signal too slowly, there will be
gaps in the data transmitted by the decoder as the decoder
waits for data from the encoder. If the encoder transmits the
signal too quickly, the decoder may not be able to keep up,
causing buffer overflow at the decoder and unacceptable
information loss. The process of managing an encoder's rate
of transmission is known as rate control. The encoder keeps
track of decoder buffer fullness by use of a virtual buffer.
[0009] A simplistic method of rate control would be to
assign a set number of bits to each picture, or frame, of the
video signal to be encoded. However this is not efficient, as
the number of bits used to encode every picture in the video
stream must then be large enough to accommodate the most
complex frame possible, when in fact a simple frame, such as
a picture of a blue sky, will require fewer bits to encode
than a complex frame (a picture of a cloudy sunrise at the
horizon). A measure of a picture's complexity, prior to the
picture being encoded, allows the encoder to make a better
decision regarding how many bits should be used to encode the
picture. This method can be further improved if the encoder
has knowledge of the complexity of frames that it will be
encoding in the future. Consider a video stream that begins
showing a clear blue sky, and then pans down to show a
sunset. The initial frames will have a low measure of
complexity and thus will be encoded using a relatively small
number of bits. The later frames, however, contain much more
complex information and will require a larger number of bits
to encode. If, while deciding how many bits to allocate to
the initial, simple, video frames, the encoder is made aware
that it will soon need to allocate a large number of bits to
encoding more complex frames, the encoder may further reduce
the number of bits used to encode the simple frames and avoid
or reduce the risk of overflowing a downstream decoder.
[0010] Another effective method of controlling the number
of bits used to encode a picture is by modifying the
quantization step size dynamically for each MB within the
frame. For an MB of generally uniform color and intensity
only a small number of possible pixel values are necessary
and thus fewer bits will be required to describe it. The
inverse is true for an MB containing a wide variety of color
and intensity values since the encoder will have to describe a
wider range of pixel values. In accordance with this approach
each MB is assigned a quantization scale factor (Mquant) that
is used to modify the quantization step size.
[0011] During the development of the MPEG2 standard it
became necessary to design a generic rate control and
quantization methodology with which to test bit-stream
syntax, decoder designs and other aspects of the standard.
This methodology was known as the Test Model and it evolved
as the development of MPEG2 continued. The fifth and final
version of the model (TM-5) was produced with the freezing of
the MPEG2 standard. TM-5 is divided into three primary steps:
(a) target bit allocation; (b) rate control; and (c) adaptive
quantization.
a) Target Bit Allocation
[0012] As discussed above, the amount of bits allocated to
a certain picture for encoding is a function of that
picture's complexity, relative to other pictures. For a
particular GOP, a complexity weight factor is assigned for
each picture type (XI2, XP2 and XB2 for I, P and B type
pictures respectively). XI2, XP2 and XB2 represent the
complexity measures for I, P and B pictures and may be calculated
as:
XI2 = SI2 * QI2
XP2 = SP2 * QP2
XB2 = SB2 * QB2
where SI2, SP2 and SB2 are the number of bits for each picture
and QI2, QP2 and QB2 are the average quantization parameters
for all MBs in each picture (see below).
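As an illustrative sketch only, the complexity measure above reduces to a single multiplication per picture; the function name and the example bit and quantizer values below are hypothetical, not taken from the specification.

# Sketch (hypothetical names and values): TM-5 style complexity X = S * Q.
def picture_complexity(bits_used: int, avg_quant: float) -> float:
    # S is the number of bits the encoded picture consumed and Q is the
    # average macro-block quantization parameter for that picture.
    return bits_used * avg_quant

XI2 = picture_complexity(400_000, 12.0)  # I type picture
XP2 = picture_complexity(180_000, 14.0)  # P type picture
XB2 = picture_complexity(60_000, 16.0)   # B type picture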
[0013] In TM-5, bit allocation for a picture is targeted
based on how much of the bit space allocated for the GOP is
remaining, the type of picture being encoded, and the
complexity statistics of recently encoded pictures of the
same type. The target bit allocation is the number of bits
TM-5 anticipates will be necessary to encode the frame.
b) Rate Control
[0014] If there is a difference between the target bit
allocation (Btar) and the actual number of bits required to
encode a picture (Bact), then there is a risk of under or over
filling TM-5's virtual buffer. The virtual buffer tracks the
fullness of the decoder's buffer on an MB by MB basis while a
picture is being encoded. Encoding all the MBs up to, but not
including, the jth MB should use a certain portion of the
total targeted bits. This portion is equal to Btar multiplied
by the number of MBs already encoded (j-1) and divided by the
total number of MBs in the picture (MB_cnt). The number of
bits actually generated by encoding up to, but not including,
the jth MB is equal to B(j-1). The delta between the targeted
and generated number of bits represents the change in the
fullness of the virtual buffer after each MB is encoded (dj)
and is calculated prior to encoding the jth MB.
dj = d0 + B(j-1) - Btar * (j-1) / MB_cnt
where d0 equals the fullness of the virtual buffer at the
beginning of the current picture.
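A minimal sketch of the virtual-buffer update above, with hypothetical names; it simply restates the dj formula.

# Sketch (hypothetical names): virtual buffer fullness before encoding the
# jth macro-block of a picture, dj = d0 + B(j-1) - Btar*(j-1)/MB_cnt.
def buffer_fullness(d0: float, bits_generated: float,
                    b_target: float, j: int, mb_cnt: int) -> float:
    # d0             -- fullness at the start of the current picture
    # bits_generated -- bits actually produced by macro-blocks 1 .. j-1
    # b_target       -- target bit allocation Btar for the whole picture
    # j              -- 1-based index of the macro-block about to be encoded
    # mb_cnt         -- total number of macro-blocks in the picture
    return d0 + bits_generated - b_target * (j - 1) / mb_cnt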
[0015] If the virtual buffer begins to overflow, the
quantization step size increases, leading to a smaller yield
of bits for the subsequent MBs. Similarly, if the virtual
buffer begins to underflow, the quantization step size is
decreased, leading to a larger yield of bits for subsequent
MBs. This measure of the virtual buffer fullness is used to
generate the MB's reference quantization parameter (Qj).
c) Adaptive Quantization
[0016] The macro-block quantization step size is further
modulated as a function of spatial activity (actj). The
macro-block is divided into four 8x8 sub-blocks and the
spatial activity is measured for each sub-block. The smallest
of the four measurements is then normalized (N_actj) against
the average spatial activity (avg_act) of the previously
encoded picture. The minimum spatial activity measurement is
used because the quality of a macro-block is no better than
its sub-block of highest visible distortion.
N_actj = (2*actj + avg_act)/(actj + 2*avg_act)
[0017] The product of a MB's normalized spatial activity
and its reference quantization parameter gives the MB's
quantization scale factor (Mquantj):
Mquantj = Qj * N_actj
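The two formulas above translate directly into code; the following sketch uses hypothetical names.

# Sketch (hypothetical names): TM-5 adaptive quantization for one macro-block.
def normalized_activity(act_j: float, avg_act: float) -> float:
    # N_actj = (2*actj + avg_act)/(actj + 2*avg_act); the value lies between 0.5 and 2.
    return (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)

def mquant(q_j: float, act_j: float, avg_act: float) -> float:
    # Mquantj = Qj * N_actj, the quantization scale factor for the macro-block.
    return q_j * normalized_activity(act_j, avg_act)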
Generally, the encoding algorithms recommended by newer
compression standards are more efficient, but they are
usually more complicated to implement. With the fast growth
in computation speeds of central processing units (CPUs) and
digital signal processing (DSP) chips, implementation of more
and more sophisticated algorithms has become practically
feasible. Video encoders/decoders (codecs) built based on
newer standards eventually replace those built based on older
standards in applications where specifications overlap, such
as for bit-rate, resolution, etc. This replacement procedure
takes a long period of time, since it is expensive to replace
older video codecs with newer ones. Another reason older
codecs continue to be used is that many video streams have
already been compressed with the older algorithms, and may
easily be decompressed by the older codecs. However, where
high coding efficiency is desired, there arises the mixed use
of both older and newer codecs. In some applications it is
desirable to re-transmit video streams compressed with an
older codec at a new bit-rate that is lower than the older
codec can achieve for the same video quality. Therefore to
obtain higher compression efficiency a transcoder having
mixed codecs (an older decoder and a newer encoder) is used.
One good example is a transcoder that converts MPEG2
compressed video streams to H.264 compressed video streams.
[0018] It is recognized by the digital compression
industry that dual-pass encoding with a lookahead window
provides higher coding efficiency than single-pass encoding.
For the emerging, more sophisticated, compression
technologies, even single-pass encoding is expensive and the
cost of dual-pass encoding is much higher than that of
single-pass encoding. Using two sophisticated codecs for
encoding/transcoding in a dual-pass architecture raises the
cost of the encoder/transcoder by almost an order of
magnitude over the older technology codecs.
In Klein Gunnewiek (U.S. Pat. No. 5,757,434), a picture's
complexity is measured by an encoder / preanalyzer 8 using, for
example, the MPEG2 video coding standard. The measured complexity is
then used by an encoder 2 to encode the picture, also using the MPEG2
standard. The measured complexity is manipulated, but there is no
description of using the measured complexity value to estimate a
second complexity value or of the encoder 2 using a different
encoding algorithm than the encoder 8.
Keesman (U.S. Pat. No. 5,805,224) describes using a decoding
sub-assembly 201 to decode an encoded picture using a conventional
MPEG decoder (Col. 5, line 34 and col. 6, line 59) and to measure the
decoded picture's complexity, and then the measured complexity is used
in encoding the decoded picture using an MPEG based (Col. 9, line 41)
encoding sub-assembly 202. As in Klein Gunnewiek, the measured
complexity is manipulated, but there is no description of using the
measured value to estimate a second value of the decoded picture's
complexity or of the encoding sub-assembly 202 using a different
encoding algorithm than the decoding sub-assembly 201.
[0019] What is desired is the ability to achieve higher coding
efficiency for minimal cost in an encoder / transcoder using mixed
codecs.
BRIEF SUMMARY OF THE INVENTION
[0020] In accordance with a first aspect of the invention there
is provided a method of encoding pictures of an uncompressed digital
video sequence, each picture having a level of complexity, said
method comprising: analyzing a first picture of the uncompressed
digital video sequence with a first algorithm to measure a first
value of the first picture's complexity and to assign the first
picture's picture type, estimating a second value of the first
picture's complexity using the first measured value as a parameter,
and encoding the first picture with a distinct second algorithm
employing the second value of the first picture's complexity and the
first picture's picture type as parameters.
[0021] In accordance with a second aspect of the invention there
is a method of transcoding pictures of a compressed digital video
stream each picture being encoded according to a first encoding
algorithm and having a level of complexity, said method comprising:
decoding a first picture of the compressed digital video stream with
a first decoding algorithm to produce a decoded version of the first
picture and to measure a first value of the first picture's
complexity and to determine the first picture's picture type,
estimating a second value of the first picture's complexity using the
first value as a parameter, and encoding the decoded version of the
first picture with a distinct second encoding algorithm employing the
second value of the first picture's complexity and the first
picture's picture type as parameters.
[0022] In accordance with a third aspect of the invention there
is an apparatus for encoding an uncompressed digital video input
sequence composed of a succession of pictures, each picture having a
plurality of characteristics associated therewith, said apparatus
comprising: an extraction means for receiving the succession of
pictures of the uncompressed digital video input sequence and
employing a first method to obtain measured values for the plurality
of characteristics of a picture of the input sequence and to assign a
picture type to the picture, a delay means for receiving the
succession of pictures of the input sequence and outputting the
pictures in delayed fashion relative to the pictures of the input
sequence, a value storage means for storing the measured values and
the picture type of a picture in the delay means, and a first
encoding means for receiving a picture from the delay means and
encoding the picture, the first encoding means being responsive to a
measured value stored in the value storage means for adjusting the
size of the encoded version of the picture, wherein the first
encoding means encodes the picture by a first encoding method and the
extraction means includes a second encoding means for encoding the
pictures by a second encoding method, distinct from the first
encoding method.
[0023] In accordance with a fourth aspect of the invention there
is an apparatus for transcoding a compressed digital video input
stream composed of a succession of encoded pictures, each encoded
picture having a plurality of characteristics associated therewith,
said apparatus comprising : a decoding means for receiving the
succession of encoded pictures of the compressed digital video input
stream and employing a first method having a corresponding first
encoding method to obtain a succession of decoded pictures and
measured values for the plurality of characteristics of a decoded
picture and to assign a picture type to each of the decoded
pictures, a delay means for receiving the succession of decoded
pictures of the input stream and outputting the decoded pictures in
delayed fashion relative to the encoded pictures of the input stream,
a value storage means for storing the measured values and the picture
type of a decoded picture in the delay means, and an encoding means
for receiving a decoded picture from the delay means and encoding the
picture using a second video coding method distinct from the encoding
method corresponding to said first method, the encoding means being
responsive to a measured value stored in the value storage means for
adjusting the size of the encoded version of the picture, wherein the
value storage means includes a means for manipulating a measured
value of a first characteristic of a decoded picture to derive
estimated values for a second characteristic of the decoded picture.
[0024] An embodiment of the present invention provides
rate control with a picture-based lookahead window for
encoders/transcoders having mixed codecs in a dual-pass
compressed video architecture. In a transcoder, where the
input video signal is a compressed video signal, statistics
are extracted by using a simple compression decoder to
produce the statistics from the compressed video signal; and
in an encoder, where the input video signal is an
uncompressed video signal, statistics are extracted by using
a simple compression encoder to generate the statistics from
the uncompressed video signal. A trans-factor is calculated
for a current picture based on previous pictures in a sliding
"past" window to predict the complexity of the current
picture, the trans-factor being a ratio of global complexity
measures for the simple compression standard versus a
sophisticated compression standard. Bits for the current
picture are then allocated based on the complexity of future
pictures in the lookahead or "future" window. If future
pictures are difficult to encode, then less bits are
allocated to the current picture, and vice versa. This is
effective for a scene change. Because the lookahead window
takes into account the statistics of future pictures, i.e.,
pictures that have not yet been compressed according to the
sophisticated compression standard, a more reasonable bit
allocation and better quality is achieved. After encoding
the current picture according to the sophisticated
compression standard, the actual bits, the picture complexity
and the trans-factor for the encoded picture are updated as
the past and lookahead windows are shifted by one picture,
i.e., the encoded picture moves into the past window and out
of the lookahead window as a new picture is loaded into the
lookahead window.
[0025] The objects, advantages and other novel features of
the present invention are apparent from the following
detailed description when read in conjunction with the
appended claims and attached drawings.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE ACCOMPANYING DRAWINGS
[0026] Fig. 1 is a block diagram view of a dual-pass
encoder/transcoder architecture implementing rate control
with a picture-based lookahead window according to the
present invention.
[0027] Fig. 2 is a flow chart view of a rate control
algorithm according to the present invention.
[0028] Fig. 3 is a conceptual view of virtual sliding
windows according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0029] Fig. 1 illustrates an encoder/transcoder having a
simple compression decoder 12 for receiving and decoding a
compressed video stream encoded according to a simple
compression standard, such as MPEG2, to produce an
uncompressed video signal and related statistics.
Alternatively a simple compression encoder 14 receives an
uncompressed video stream to generate related statistics.
The statistics are input to a lookahead window module 18 for
processing by a rate control algorithm, described below,
while the uncompressed video signal in either configuration
(transcoder or encoder) is input to a storage and delay
module 16. The storage and delay module is a buffer memory
that receives, delays and outputs the uncompressed video
stream. The lookahead window module contains the statistics
for each picture in the storage and delay module 16, for
example, the number of bits for the picture, the picture type
and the average quantization step size over all the macro-
blocks for the picture. The lookahead window module 18
generates bit allocation data from the statistics for use by
a sophisticated compression encoder 24, such as an H.264
encoder, in determining rate control for the sophisticated
encoding process. The storage and delay module compensates
for the time required for the lookahead window module 18 to
generate bit allocation data.
[0030] The delayed uncompressed video stream from the
storage and delay module 16 is input to an adaptive pre-
filter 20 to produce a filtered uncompressed video stream.
The filter may be a low-pass filter that attenuates high
spatial frequencies in the images represented by the
uncompressed video stream and thus serves to "blur" the
uncompressed video stream so that it is easier to compress,
i.e., is less complex and so requires fewer bits to compress.
The strength of the filtering may depend on a threshold or
cut-off frequency above which spatial frequency components
are attenuated and on the degree to which high spatial
frequencies are attenuated.
[0031] Both the delayed uncompressed video stream from the
storage and delay module 16 and the filtered, uncompressed
video stream are input to a switch 22 which selects one of
the streams. The selected uncompressed video stream from the
switch 22 and the bit allocation data from the look ahead
window module 18 are input to the sophisticated compression
encoder 24 to produce a compressed video stream according to
a sophisticated compressed video standard, such as H.264
(MPEG4, Part 10). The sophisticated compression encoder 24
also provides a control signal to the adaptive pre-filter 20
and to the switch 22 that determines the "strength" of the
filtering and which uncompressed video stream is to be
encoded. The strength of the filtering may be implemented as
different filtering levels or may be continuous. The
adaptive pre-filter 20 may be switched off or set to a low
strength for minimum filtering when the filtered uncompressed
video stream is not selected for encoding by the
sophisticated compression encoder 24.
[0032] By using a simple encoder/decoder 12/14 instead of
a sophisticated encoder/decoder at the input, the
implementation cost is reduced close to that of a single-pass
sophisticated codec. However the information on complexity
estimation for pictures in the lookahead window module 18 is
not exactly the desired information for the sophisticated
compression encoder 24. For example, a P-type picture needs
high bit-rate for motion compensation in simple (MPEG2)
compression encoding if its corresponding original picture
was recorded during light off/on/off transition time. On the
other hand this P-type picture may be a simple picture for
the sophisticated (H.264) encoder. Despite this deficiency,
a correlation of picture complexity estimation can be made,
based on both the simple and sophisticated compression
standards. In most cases a picture or a group of pictures
(GOP) that is relatively complicated/simple for the simple
compression standard is also relatively complicated/simple
for the sophisticated compression encoder 24. The complexity
statistics still indicate important relationships among
pictures and macro-blocks (MBs), with the error being
tolerable. Therefore compared to single-pass sophisticated
coding, the pseudo dual-pass sophisticated coding is superior
in video coding efficiency with only a slightly higher
implementation cost.
[0033] The statistics of picture complexity are used for:
- estimation of a bit-rate target and selection of
quantizer step sizes of macro-blocks for a current picture
before second pass encoding; and
- control of the strength of the adaptive pre-filter 20
for a current GOP which includes the current picture before
second pass encoding.
[0034] The larger the volume of statistics that are
available to compute the bit allocation provided to the
compression encoder 24, the better the video quality
performance of the encoder/transcoder. Therefore the storage
and delay module 16 stores multiple uncompressed images. Each
image of the uncompressed video stream will ultimately be
encoded by the sophisticated compression encoder 24 as an I,
P or B type picture. The type of picture (I, P or B) that a
given uncompressed image will be encoded as is based on the
statistics provided to the lookahead window module 18.
Therefore, even though the images stored in the module 16 are
not encoded, it will be convenient to refer to these images
as I, P, or B type pictures. The number of images stored by
the module 16 is limited by the size of the memory and an
allowed maximum delay. A storage length corresponding to at
least two GOPs of the input video signal is desired. For the
purpose of this description, it is assumed that the storage
and delay module is designed to contain two standard GOPs of
fifteen pictures each.
[0035] The lookahead window module 18 sets a bit-rate
target for the current picture being encoded based on the
received statistics, which include picture types (I, P or B),
picture size (in bytes), and average quantizer step sizes at
picture levels.
[0036] The picture complexity in the sense of encoding is
not the same for the two different compression standards. A
P-type picture may be complicated and need high bit-rate for
motion compensation in MPEG2 encoding if its corresponding
original picture was recorded during flash light off/on/off
transition time. On the other hand this P-type picture may
be a simple picture to an H.264 encoder which is able to select one out of up to six reference pictures for motion
prediction, and one of the references may be strongly
correlated with this P-type picture, as indicated above.
[0037] In addition to setting the bit-rate target, the
statistics of picture complexity obtained by the lookahead
window module 18 may also be used for generating the control
signal for the adaptive pre-filter 20 to control the strength
of the low-pass filtering. If the rate control information
indicates that the current picture is a difficult picture
which needs more bit-rate to encode, the strength of the
adaptive pre-filter 20 may be increased so that the picture
is heavily low-pass filtered, i.e., becomes softer and easier
to encode. The sophisticated compression encoder 24 employs
the switch 22 to select either the delayed uncompressed video
signal output from the storage and delay module 16 or the
filtered video signal output by the adaptive pre-filter 20
based on the rate control information and on the virtual
buffer fullness of the sophisticated compression encoder 24.
For example, if the virtual buffer is approaching full and
the rate information indicates that the current picture for
encoding requires more bits than are available in the virtual
buffer, then the amount of pre-filtering is increased so that
the virtual buffer does not overflow and the filtered
uncompressed video is the video signal that is encoded. If
there is no danger of virtual buffer overflow, then the
current picture is slightly filtered or not filtered at all.
In the latter event the uncompressed video signal from the
storage and delay module 16 is used as the input for
encoding. However, frequently and abruptly changing the
filter strength and/or switching between the uncompressed
video signal and the filtered uncompressed video signal
within a GOP may lead to a motion compensating residue signal
for P and B pictures. This is avoided by smoothly
controlling the pre-filter 20 within a GOP. If a picture is
filtered, any other pictures which use it as reference should
also be filtered with at least the same filter strength.
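One way the control policy of this paragraph might be realized is sketched below; the linear mapping, the 0-to-1 strength scale and all names are assumptions rather than anything specified here.

# Illustrative sketch only: derive a pre-filter strength from rate pressure.
def prefilter_strength(picture_target_bits: float,
                       virtual_buffer_headroom: float) -> float:
    # Returns 0.0 (no filtering, select the unfiltered stream) when the
    # virtual buffer has ample room, and rises toward 1.0 (strong low-pass
    # filtering) as the picture's bit demand approaches or exceeds the
    # remaining headroom.
    if virtual_buffer_headroom <= 0.0:
        return 1.0
    pressure = picture_target_bits / virtual_buffer_headroom
    return min(max(pressure - 1.0, 0.0), 1.0)

A real implementation would also keep the chosen strength from dropping within a GOP, so that any picture used as a reference is filtered at least as strongly as the pictures predicted from it, as required above.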
[0038] The rate control algorithm used, by way of
illustration, is based on the Test Model 5 (TM5)
specification. TM5 takes a complexity measure to allocate
target bits for each picture and then sets a quantization
parameter for each MB based on the fullness of the virtual
buffer. In the transcoder configuration all of the
information about the input video signal is available from
the encoded compressed video stream via the decoder 12,
especially the statistics about the complexity of the input
content. In the encoder configuration all the information
about the input video signal is available from the
uncompressed video stream via the simple encoder 14,
especially the statistics about the complexity of the input
contents. The rate control algorithm includes two parts:
1. Take "past" statistics for complexity prediction.
2. Take "future" statistics for bit allocation.
Both processes are adaptive and a past sliding window and a
future sliding window are maintained to update the statistics
after each picture is encoded. Note that the past sliding
window is located in the lookahead window 18 and the future
sliding window is located in the sophisticated compression
encoder 24. Contrary to prior applications that used sliding
windows which increment in terms of GOPs, the sliding windows
of the present invention are picture-based, and move forward
after encoding each picture.
[0039] The rate control algorithm has four steps: (a)
statistics extraction; (b) complexity prediction; (c) bit
allocation; and (d) statistics update.
(a) Statistics Extraction
[0040] When transcoding from an MPEG2 variable bit-rate
(VBR) stream to an H.264 constant bit-rate (CBR) stream or
encoding an uncompressed video stream to an H.264 CBR stream
the following information is collected:
1. Average quantization parameters (quantization step
size) for each picture.
2. Output bits for each picture.
3. Picture type (I, P, B) for each picture.
Items 1 and 2 are used for calculation of the input video's
complexity, while item 3 records the picture type that is
used by the sophisticated compression encoder 24.
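A compact record for the three items listed above might look like the following sketch; the class and field names are hypothetical.

# Sketch (hypothetical names): per-picture statistics gathered by the
# simple first-pass codec (decoder 12 or encoder 14).
from dataclasses import dataclass

@dataclass
class PictureStats:
    picture_type: str   # "I", "P" or "B"                 (item 3)
    bits: int           # output bits for the picture      (item 2)
    avg_quant: float    # average quantization step size   (item 1)

    def complexity(self) -> float:
        # Simple-standard complexity X2 = bits * average quantizer.
        return self.bits * self.avg_quant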
(b) Complexity Prediction
[0041] Complexity prediction is to predict the complexity
of a current picture from a prior simple/sophisticated
(MPEG2/H.264) complexity ratio and the input complexity of
the current picture. In TM5 the current picture's complexity
is predicted by that of the previous picture of the same
type. In an embodiment of the invention, the current
picture's complexity is predicted on the basis of the
complexity of all pictures of the same type in the past
window. However since the statistics are based on a simple
encoding format an adjustment to the algorithm in the form of
a scaling factor, referenced here as trans-factor, is
introduced to take into account the difference between the
sophistication of the two standards and/or the two bit rates.
The trans-factor is calculated as the average of previous
simple/sophisticated ratios and is updated after the encoding
of each picture. Because of the different properties of
different picture types, the trans-factor is calculated
independently for each picture type.
[0042] The complexity prediction algorithm has two steps:
[0043] 1. Calculate a current trans-factor for a current
picture by averaging previous trans-factors
[0044] At the beginning of a video sequence to be
encoded/transcoded there are three initial values for the
trans-factor, corresponding to the three picture types (I, P,
B) respectively. The average trans-factor over a past sliding
window is generally better than that of only one picture and
takes into account pictures that have already been encoded by
the sophisticated encoder 24 and are within the past window.
[0045] A GOP can be described as containing NI I type
pictures, NP P type pictures and NB B type pictures. For a
standard GOP, as described above:
N = 15
M = 3
NI = 1
NP = (N/M) - NI = (15/3) - 1 = 4
NB = N - NI - NP = 15 - 1 - 4 = 10
[0046] The storage and delay module 16 contains W GOPs.
For this discussion, it will be assumed that the module 16 is
designed to store 2 standard GOPs. WI, WP and WB represent the
total number of I, P and B type pictures respectively in the
storage and delay module 16.
WI = W * NI = 2 * (1) = 2
WP = W * NP = 2 * (4) = 8
WB = W * NB = 2 * (10) = 20
[0047] The trans-factor Tcur of a picture in the
uncompressed video stream is calculated by averaging the
trans-factors of the previous Wtype pictures of the same type
(I, P or B), the number of previous trans-factors that are
averaged being equal to the total number of pictures of that
type in the storage and delay module 16 (WI, WP or WB):
Tcur(j) = [T(1) + T(2) + ... + T(Wtype)] / Wtype
where j is the picture number of the current picture and T(1) ... T(Wtype) are the trans-factors of the previous Wtype pictures of the same type.
[0048] For a storage and delay module 16 that contains two
standard GOPs, as described above, two, eight or twenty
previous trans-factors are averaged for an I, P or B type
picture respectively.
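The averaging can be sketched with one bounded window per picture type; the deque lengths below follow the two-standard-GOP example (WI = 2, WP = 8, WB = 20), and the names and the initial value are assumptions.

# Sketch (hypothetical names): per-type past windows of trans-factors and
# the current trans-factor Tcur as their average.
from collections import deque

past_trans = {
    "I": deque(maxlen=2),    # WI same-type pictures held in the delay module
    "P": deque(maxlen=8),    # WP
    "B": deque(maxlen=20),   # WB
}

def current_trans_factor(picture_type: str, initial: float = 1.0) -> float:
    # Tcur: average of the stored trans-factors of previously encoded
    # pictures of this type, or an initial value at the start of a sequence.
    window = past_trans[picture_type]
    if not window:
        return initial
    return sum(window) / len(window)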
[0049] 2. Predict current picture's complexity
[0050] For I and P type pictures, the current picture's
sophisticated, or MPEG4, complexity (XI4, XP4 or XB4) is then
predicted from the picture's simple, or MPEG2, complexity
(XI2, XP2 or XB2) using the updated trans-factor (Tcur) to
appropriately scale the simple complexity factor. The trans-
factors for B type pictures are further adjusted by a weight
factor (KB4) to take into account the different quality
requirements for different picture types. This weight factor
has been empirically determined and is a function of the
ratio of the current GOP's I type simple complexity and the
average simple complexity of the GOP's B type pictures.
[0051] KB4 is larger for a well-predicted sequence, i.e. a
sequence without fast motion, and is smaller for a sequence
with fast motion. KB4 is set adaptively after encoding each
GOP in accordance with the ratio XI/XB where XI and XB are the
average simple complexity of all I and B pictures in the
current GOP.
[0052] In principle, the sophisticated complexity XP4 for a
P type picture may also be adjusted by a weight factor (KP4), but
it has been found that this is not necessary in practice.
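A sketch of the prediction step follows. Dividing the simple complexity by the current trans-factor matches the trans-factor definition T = X2/X4 used later; the exact way KB4 enters the B type case is not reproduced here, so its placement as an extra divisor is an assumption.

# Sketch: predict the sophisticated (second-pass) complexity X4 from the
# simple (first-pass) complexity X2 and the current trans-factor Tcur.
def predict_complexity(x2: float, t_cur: float,
                       picture_type: str, kb4: float = 1.0) -> float:
    if picture_type in ("I", "P"):
        return x2 / t_cur
    return x2 / (t_cur * kb4)   # B type; KB4 placement is assumed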
(c) Bit Allocation
[0053] Bit allocation may be based on GOP-layer and
picture-layer. Picture-layer breaks the GOP boundary and
performs better than GOP-layer. This is particularly
effective for scene changes in the video signal. Bit
allocation has two steps.
[0054] 1. Allocate target bits for current (kth) picture
[0055] The target size (TW), in bits, for all the pictures
currently referenced in the sliding lookahead window is
calculated based on the number of pictures in the window
(WF), the constant bit rate (R), in bits per second, and the
picture rate (F), in pictures per second.
TW = WF * (R/F)
[0056] Then the targeted number of bits (B4_tar(k)) to be
allocated for the kth picture is calculated by multiplying TW
by the ratio of the current picture's complexity factor to
the complexity factor of all the pictures in the sliding
lookahead window.
B4_tar(k) = TW * X4(k) / [sum of X4(i) over all pictures i in the lookahead window]
[0057] This calculation essentially identifies the
proportion of the target size (TW) that should be used for
the current picture. The size of the current picture when
encoded by the complex encoding algorithm is not permitted to
be larger than the size of the current picture when encoded
by the simple compression algorithm (B2(k)). Thus, the size
of the current picture when encoded is clamped to B2(k). If
B4_tar(k) does exceed B2(k), the number of bits targeted for
the kth picture is still B4_tar(k). However it is already known
that the smaller number B2(k) will be the upper limit of bits
used when the encoding actually takes place. Thus there is a
known surplus of bits in the target window. The target window
size is then modified to take the extra bits into account.
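The allocation and clamping logic might be sketched as below; the exact bookkeeping by which the surplus re-enters the target window is not reproduced here, so returning it separately is an assumption, and all names are hypothetical.

# Sketch (hypothetical names): picture-layer bit allocation over the
# lookahead window, with clamping against the simple-codec size B2(k).
def allocate_bits(window_complexities: list, k: int,
                  bit_rate: float, picture_rate: float,
                  simple_bits_k: float):
    # window_complexities -- predicted X4 for every picture in the lookahead window
    # k                   -- index of the current picture within that list
    # simple_bits_k       -- B2(k), the picture's size under the simple codec
    wf = len(window_complexities)
    tw = wf * bit_rate / picture_rate                  # TW = WF * (R/F)
    b4_tar = tw * window_complexities[k] / sum(window_complexities)
    surplus = max(b4_tar - simple_bits_k, 0.0)         # bits that cannot be used
    return b4_tar, surplus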

[0058] 2. Adaptive quantization and encoding (TM5)
[0059] Before encoding MBj the fullness of the virtual
buffer is computed for I, P, B independently:
dj = d0 + Bj - T * j / MB_cnt
where Bj is the number of bits generated by encoding all MBs
in the picture up to and including j, MB_cnt is the number of
MBs in the picture, T is the constant bit rate (CBR) per
picture, d0 is the initial fullness of the virtual buffer and
dj is the fullness of the virtual buffer at MBj. The
reference quantization parameter Qj is then computed for MBj:
Qj = dj * 51 / r
where the reaction parameter, r, is
r = 2 * R / F
Adaptive quantization:
[0060] The spatial activity for MBj from four luminance
picture-organized sub-blocks (n = 1 ... 4) and four
luminance field-organized sub-blocks (n = 5 ... 8) is
computed using the original pixel values:
actj = 1 + min(vblk1, vblk2, ..., vblk8)
where vblkn = (1/64) * [sum over the 64 pixels of sub-block n of (P - P_mean(n))^2] and P_mean(n) is the mean of the 64 pixel values of sub-block n,
where P is the pixel gray level.
Then normalize actj:
N_actj = (2*actj + avg_act)/(actj + 2*avg_act)
where avg_act is the average value of actj for the last
picture to be encoded.
Then adjust mquantj as:
mquantj = Qj * N_actj
The final value of mquantj is clipped to a range [1 ... 51]
and used for the quantization. Delta QP should be clipped to
[-26,26], as defined by H.264 semantics. Then encode one MB
with mquantj and repeat this step until all MBs of the
current picture are encoded.
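Putting the per-macro-block rate control and adaptive quantization together for one picture gives a loop like the following; a sketch with hypothetical names that follows the formulas above (Qj = dj*51/r, r = 2*R/F, mquantj clipped to [1 ... 51]).

# Sketch (hypothetical names): choose a quantizer for every macro-block of
# one picture from virtual-buffer fullness and spatial activity.
def mb_quantizers(mb_activities: list, avg_act: float,
                  d0: float, t_picture: float,
                  bit_rate: float, picture_rate: float,
                  bits_after_mb):
    # bits_after_mb(j) must return the bits generated by macro-blocks 1 .. j.
    r = 2.0 * bit_rate / picture_rate                 # reaction parameter
    mb_cnt = len(mb_activities)
    quants = []
    for j in range(1, mb_cnt + 1):
        prev_bits = bits_after_mb(j - 1) if j > 1 else 0.0
        dj = d0 + prev_bits - t_picture * (j - 1) / mb_cnt
        qj = dj * 51.0 / r                            # reference quantizer
        act = mb_activities[j - 1]
        n_act = (2 * act + avg_act) / (act + 2 * avg_act)
        quants.append(int(min(max(qj * n_act, 1), 51)))   # clip to [1, 51]
    return quants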
[0061] (d) Update picture complexity and trans-factor for the
just encoded picture
[0062] The picture complexity and trans-factor for the
just encoded picture are updated and stored in the sliding
past window for use with future pictures.
[0063] 1. Trans-factor is defined as the ratio of the
"global complexity measure" of corresponding simple and
sophisticated compression standards pictures:
TI[current_picture_SN] = XI2/XI4
TP[current_picture_SN] = XP2/XP4
TB[current_picture_SN] = XB2/XB4
2. The just encoded picture's sophisticated compression
standard complexity is determined and entered in the past
window while the oldest one is shifted out. A new picture's
statistics are loaded into the lookahead window to determine
a new complexity for the window as the next picture to be
encoded becomes the current picture.
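The update step can be sketched as a single append into a bounded past window; names are hypothetical.

# Sketch (hypothetical names): after the sophisticated codec encodes a
# picture, record its trans-factor and slide the past window forward.
from collections import deque

def update_after_encoding(past_window: deque, x2: float,
                          actual_bits: float, avg_quant: float) -> None:
    # Sophisticated complexity X4 = actual bits * average quantizer, then
    # trans-factor T = X2 / X4; appending to a deque with maxlen drops the
    # oldest entry, i.e. the window slides forward by one picture.
    x4 = actual_bits * avg_quant
    past_window.append(x2 / x4)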
[0066] Thus the present invention provides rate control
with a picture-based sliding window to simplify
transcoding/encoding from a simple compression standard to a
sophisticated compression standard by extracting statistics
for a video signal using the simple compression standard, by
using the extracted statistics and virtual buffer fullness to
control a lowpass pre-filter for the uncompressed video
signal, and by encoding the filtered or unfiltered
uncompressed video signal using a trans-factor which is the
ratio of global complexity measures for pictures generated by the
simple and sophisticated compression standards, with a
sliding window on a picture-by-picture basis, updating the
trans-factor and sliding window for each picture.
[0067] It will be appreciated that the invention is not
restricted to the particular embodiment that has been
described, and that variations may be made therein without
departing from the scope of the invention as defined in the
appended claims and equivalents thereof. Unless the context
indicates otherwise, a reference in a claim to the number of
instances of an element, be it a reference to one instance or
more than one instance, requires at least the stated number
of instances of the element but is not intended to exclude
from the scope of the claim a structure or method having more
instances of that element than stated.
WE CLAIM :
1. A method of encoding pictures of an uncompressed digital video sequence, each picture having
a level of complexity, said method comprising :
analyzing a first picture of the uncompressed digital video sequence with a first algorithm to
measure a first value of the first picture's complexity and to assign the first picture's picture type,
estimating a second value of the first picture's complexity using the first measured value as a
parameter, and
encoding the first picture with a distinct second algorithm employing the second value of the
first picture's complexity and the first picture's picture type as parameters.
2. A method as claimed in claim 1, comprising :
analyzing the first picture with the second encoding algorithm to measure a third value of the
first picture's complexity, and
storing the picture type and a complexity ratio of the first picture.
3. A method as claimed in claim 1 or 2, comprising, prior to encoding the first picture with the
second algorithm,
generating an encoded version of a past picture with the second algorithm, said encoded version
of the past picture having a size, and
estimating a value of a future picture's complexity,
and wherein the step of encoding the first picture with the second algorithm further comprises :
determining a target size for the first picture using the size of the past picture and the
estimated value of the future picture's complexity as parameters, and
encoding the first picture with the second algorithm employing the first picture's target
size as a parameter.
4. A method as claimed in claim 3, wherein the step of encoding the first picture with the second
algorithm produces an encoded version of the first picture, and the step of determining the target size
of the first picture comprises predicting the size of said encoded version of the first picture.
5. A method as claimed in claim 2, 3 or 4, wherein the first picture's complexity ratio is calculated
by dividing the first value of the picture's complexity by the third value of the first picture's complexity.
6. A method as claimed in any one of claims 1 to 5, wherein the first picture is followed by a
plurality of pictures of the uncompressed video sequence, and the method comprises :
analyzing each of the plurality of pictures of the uncompressed digital video sequence with the
first algorithm to measure first values of each picture's complexity and to assign each picture's picture
type,
estimating a second value of each picture's complexity, using each picture's first value as a
parameter, and
encoding each of the plurality of pictures with the second algorithm employing the second value
and picture type of each picture as parameters.
7. A method as claimed in claim 6, comprising :
analyzing each of the plurality of pictures with the second encoding algorithm to measure a third
value of the complexity for each picture, and
storing the picture type and complexity ratio for each of the plurality of pictures.
8. A method as claimed in claim 6 or 7, wherein the plurality of pictures is followed by a second
picture, and the method comprises :
analyzing the second picture with the first algorithm to measure a first value of the second
picture's complexity and to assign the second picture's picture type,
estimating a second value of the second picture's complexity using the first value as a parameter,
and
encoding the second picture with the second algorithm, employing the second value of the
second picture's complexity and the second picture's picture type as parameters.
9. A method as claimed in claim 8, comprising estimating the second picture's second value by
dividing the second picture's first value by a translation factor.
10. A method as claimed in claim 9, comprising calculating the second picture's translation factor
by averaging the stored complexity ratios associated with a sub-set of the plurality of pictures, the subset
being the pictures that have been encoded by the second algorithm and have the same picture type
as the second picture.
11. A method as claimed in claim 10, comprising :
analyzing the second picture with the second encoding algorithm to measure a third value of the
second picture's complexity, and
storing the picture type and complexity ratio of the second picture.
12. A method as claimed in any one of claims 1 to 8, wherein the step of estimating the first
picture's second value comprises dividing the first value by a default value based on the first picture's
picture type.
13. A method as claimed in any one of claims 1 to 12, comprising :
receiving an unfiltered version of the first picture,
creating a filtered version of the first picture prior to encoding the first picture with the second
algorithm, and
selecting either the unfiltered or filtered version of the first picture for encoding by the second
algorithm.
14. A method of transcoding pictures of a compressed digital video stream, each picture being
encoded according to a first encoding algorithm and having a level of complexity, said method
comprising :
decoding a first picture of the compressed digital video stream with a first decoding algorithm to
produce a decoded version of the first picture and to measure a first value of the first picture's
complexity and to determine the first picture's picture type,
estimating a second value of the first picture's complexity using the first value as a parameter,
and
encoding the decoded version of the first picture with a distinct second encoding algorithm
employing the second value of the first picture's complexity and the first picture's picture type as
parameters.
15. A method as claimed in claim 14, comprising :
analyzing the decoded version of the first picture with the second encoding algorithm to
measure a third value of the first picture's complexity, and
storing the picture type and a complexity ratio of the first picture.
16. A method as claimed in claim 14 or 15, comprising, prior to encoding the decoded version of
the first picture with the second algorithm,
generating an encoded version of a past picture with the second algorithm, said encoded version
of the past picture having a size, and
estimating a value of a future picture's complexity,
and wherein the step of encoding the decoded version of the first picture with the second
algorithm comprises :
determining a target size for the decoded version of the first picture using the size of the
past picture and the second value of the future picture's complexity as parameters, and
encoding the decoded version of the first picture with the second algorithm employing
the first picture's target size as a parameter.
17. A method as claimed in claim 16, wherein the step of encoding the first picture with the second
algorithm produces an encoded version of the first picture, and the step of determining the target size of
the decoded version of the first picture comprises predicting the size of said encoded version of the first
picture.
18. A method as claimed in claim 15, 16 or 17, wherein the first picture's complexity ratio is
calculated by dividing the first value of the picture's complexity by the third value of the first picture's
complexity.
19. A method as claimed in any one of claims 14 to 18, wherein the first picture is followed by a
plurality of pictures of the compressed video stream, and the method comprises :
analyzing each of the plurality of pictures of the compressed digital video stream with the first
algorithm to produce a decoded version of each picture and to measure a first value of each picture's
complexity and to determine each picture's picture type,
estimating a second value of each picture's complexity using each picture's first value as a
parameter, and
encoding the decoded version of each of the plurality of pictures with the second algorithm,
employing the second value and picture type of each picture as parameters.
20. A method as claimed in claim 19, comprising :
analyzing the decoded version of each of the plurality of pictures with the second algorithm to
measure a third value of the complexity of each picture and
storing the picture type and complexity ratio for each of the plurality of pictures.
21. A method as claimed in claim 19 or 20, wherein the plurality of pictures is followed by a second
picture, and the method comprises :
analyzing the second picture with the first algorithm to produce a decoded version of the second
picture and to measure a first value of the second picture's complexity and to determine the second
picture's picture type,
estimating a second value of the second picture's complexity using the first value as a parameter,
and
encoding the decoded version of the second picture with the second algorithm employing the
second value of the second picture's complexity and the second picture's picture type as parameters.
22. A method as claimed in claim 21, comprising estimating the second picture's second value by
dividing the second picture's first value by a translation factor.
23. A method as claimed in claim 22, comprising calculating the second picture's translation factor
by averaging the stored complexity ratios associated with a sub-set of the plurality of pictures, the sub-
set being the pictures that have been encoded by the second algorithm and have the same picture type as
the second picture.
24. A method as claimed in claim 23, comprising :
analyzing the decoded version of the second encoded picture with the second algorithm to
measure a third value of the second picture's complexity, and
storing the picture type and the complexity ratio of the second picture.
25. A method as claimed in any one of claims 14 to 21, wherein the step of estimating the first
picture's second value is carried out by dividing the first value by a default value based on the first
picture's picture type.
26. A method as claimed in any one of claims 14 to 25, comprising :
creating a filtered version of the first picture from an unfiltered, uncompressed version of the
first picture prior to encoding the first picture with the second algorithm, and
selecting either the unfiltered or filtered version of the first picture for encoding by the second
algorithm .
27. An apparatus for encoding an uncompressed digital video input sequence composed of a
succession of pictures, each picture having a plurality of characteristics associated therewith, said
apparatus comprising :
an extraction means for receiving the succession of pictures of the uncompressed digital video
input sequence and employing a first method to obtain measured values for the plurality of
characteristics of a picture of the input sequence and to assign a picture type to the picture,
a delay means for receiving the succession of pictures of the input sequence and outputting the
pictures in delayed fashion relative to the pictures of the input sequence,
a value storage means for storing the measured values and the picture type of a picture in the
delay means, and
a first encoding means for receiving a picture from the delay means and encoding the picture,
the first encoding means being responsive to a measured value stored in the value storage means for
adjusting the size of the encoded version of the picture,
wherein the first encoding means encodes the picture by a first encoding method and the
extraction means includes a second encoding means for encoding the pictures by a second encoding
method, distinct from the first encoding method.
28. An apparatus as claimed in claim 27, wherein the value storage means comprises a means for
manipulating a measured value of a first characteristic of a picture to derive estimated values for a
second characteristic of the picture.
29. An apparatus as claimed in claim 27 or 28, wherein the first encoding means receives the
picture type, a measured value of a characteristic and an estimated value of a characteristic associated
with said picture and uses said picture type, said measured value and said estimated value as parameters
for creating the encoded version of said picture.
30. An apparatus as claimed in claim 27, 28 or 29, wherein the value storage means determines a
predicted value of the size of an encoded version of a picture in the delay means and wherein the first
encoding means adjusts the size of the encoded version of the picture received from the delay means in
response to the predicted value.
31. An apparatus as claimed in any one of claims 27 to 30, wherein the first encoding means
comprises a means for measuring a value of a picture's complexity, a means for storing the measured
complexity value, and a means for storing the size of an encoded version of a picture.
32. An apparatus as claimed in any one of claims 27 to 31, comprising :
a filtering means for receiving an unfiltered picture from the delay means and creating a filtered
picture and
a switching means for receiving the unfiltered picture from the delay means, receiving the
filtered picture from the filtering means and selectively transmitting either the unfiltered picture or the
filtered picture to the first encoding means.
33. An apparatus as claimed in claim 32, wherein the switching means is responsive to the first
encoding means for selecting the unfiltered picture or the filtered picture for transmission to the first
encoding means.
34. An apparatus for transcoding a compressed digital video input stream composed of a succession
of encoded pictures, each encoded picture having a plurality of characteristics associated therewith, said
apparatus comprising :
a decoding means for receiving the succession of encoded pictures of the compressed digital
video input stream and employing a first method having a corresponding first encoding method to
obtain a succession of decoded pictures and measured values for the plurality of characteristics of a
decoded picture and to assign a picture type to each of the decoded pictures,
a delay means for receiving the succession of decoded pictures of the input stream and
outputting the decoded pictures in delayed fashion relative to the encoded pictures of the input stream,
a value storage means for storing the measured values and the picture type of a decoded picture
in the delay means, and
an encoding means for receiving a decoded picture from the delay means and encoding the
picture using a second video coding method distinct from the encoding method corresponding to said
first method, the encoding means being responsive to a measured value stored in the value storage
means for adjusting the size of the encoded version of the picture,
wherein the value storage means includes a means for manipulating a measured value of a first
characteristic of a decoded picture to derive estimated values for a second characteristic of the decoded
picture.
35. An apparatus as claimed in claim 34, wherein the encoding means receives a decoded picture,
the picture type, a measured value of a characteristic and an estimated value of a characteristic
associated with said decoded picture and uses said picture type, said measured value and said estimated
value as parameters for creating the encoded version of said decoded picture.
36. An apparatus as claimed in claim 34 or 35, wherein the value storage means determines a
predicted value of the size of an encoded version of a decoded picture in the delay means and wherein
the encoding means adjusts the size of the encoded version of the decoded picture received from the
delay means in response to the predicted value.
37. An apparatus as claimed in claim 34, 35 or 36, wherein the encoding means comprises a means
for measuring a value of a decoded picture's complexity, a means for storing the measured complexity
value, and a means for storing the size of an encoded version of a decoded picture.
38. An apparatus as claimed in claim 34, 35, 36 or 37 comprising :
a filtering means for receiving an unfiltered decoded picture from the delay means and creating
a filtered picture and
a switching means for receiving the unfiltered picture from the delay means, receiving the
filtered picture from the filtering means and selectively transmitting either the unfiltered decoded
picture or the filtered picture to the encoding means.
39. An apparatus as claimed in claim 38, wherein the switching means is responsive to the encoding
means for selecting the unfiltered decoded picture or the filtered picture for transmission to the
encoding means.
A method of encoding pictures of an uncompressed digital video sequence
includes analyzing (12 or 14) a picture of the uncompressed digital video sequence with
a first algorithm (MPEG2) to measure a first value of the picture's complexity (XI2, XP2
or XB2) and to assign the first picture's picture type, and estimating (18) a second value
of the picture's complexity (XI4, XP4 or XB4) using the first measured value as a
parameter. The picture is then encoded (24) with a distinct second algorithm (H.264)
employing the second value of the first picture's complexity and the first picture's
picture type as parameters.



Patent Number 226717
Indian Patent Application Number 1225/KOLNP/2005
PG Journal Number 52/2008
Publication Date 26-Dec-2008
Grant Date 24-Dec-2008
Date of Filing 23-Jun-2005
Name of Patentee TUT SYSTEMS, INC. ,
Applicant Address 6000 S.W. MEADOWS ROAD, SUITE 200, LAKE OSWEGO, OREGON 97035
Inventors:
# Inventor's Name Inventor's Address
1 YU GUOYAO 12715 NW MILFORD STREET, PORTLAND, OR 97229
2 ZHOU ZHI 3801 BROOKLYN AVENUE NE, #M305, SEATTLE, WA 98105
3 VAN DUSEN CHARLES H P.O. BOX 868, BEAVERTON, OR 97075
PCT International Classification Number H04N 7/12
PCT International Application Number PCT/US2003/039184
PCT International Filing date 2003-12-09
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 10/316,483 2002-12-10 U.S.A.