Title of Invention

APPARATUS AND METHOD FOR ENVELOPE SHAPING A DECORRELATED SIGNAL

Abstract

The invention relates to an apparatus (200) for processing a decorrelated signal (202) derived from an original signal (204) or a combination signal derived by combining the original signal (204) and the decorrelated signal (202), comprising: a spectral flattener (206, 208, 210, 220, 222, 224) for spectral flattening the decorrelated signal (202), a signal derived from the decorrelated signal (202) or the combination signal to obtain a first flattened signal, and for spectral flattening the original signal (204) or a signal derived from the original signal (204) to obtain a second flattened signal, the spectral flattener (206, 208, 210, 220, 222, 224) being operative such that a flattened signal has a flatter spectrum than a corresponding signal before flattening; a time envelope shaper (232) for time envelope shaping the decorrelated signal or the combination signal and using information on the first flattened signal and the second flattened signal.
Full Text

ENVELOPE SHAPING OF DECORRELATED SIGNALS
Field of the Invention
The present invention relates to temporal envelope shaping
of signals and in particular to the temporal envelope shap-
ing of a decorrelated signal derived from a downmix signal
and additional control data during the reconstruction of a
stereo or multi-channel audio signal.
Background of the Invention and Prior Art
Recent developments in audio coding enable one to recreate
a multi-channel representation of an audio signal based on
a stereo (or mono) signal and corresponding control data.
These methods differ substantially from older matrix based
solutions, such as Dolby Pro Logic, since additional control
data is transmitted to control the recreation, also re-
ferred to as up-mix, of the surround channels based on the
transmitted mono or stereo channels. Such parametric multi-
channel audio decoders reconstruct N channels based on M
transmitted channels, where N > M, and the additional con-
trol data. Using the additional control data causes a sig-
nificantly lower data rate than transmitting all N chan-
nels, making the coding very efficient, while at the same
time ensuring compatibility with both M channel devices and
N channel devices. The M channels can either be a single
mono channel, a stereo channel, or a 5.1 channel representation.
Hence, it is possible to have a 7.2 channel original signal,
downmixed to a 5.1 channel backwards compatible
signal, and spatial audio parameters enabling a spatial au-
dio decoder to reproduce a closely resembling version of
the original 7.2 channels, at a small additional bit rate
overhead.
These parametric surround coding methods usually comprise a
parameterisation of the surround signal based on time and
frequency variant ILD (Inter Channel Level Difference) and

ICC (Inter Channel Coherence) quantities. These parameters
describe e.g. power ratios and correlations between channel
pairs of the original multi-channel signal. In the decoding
process, the re-created multi-channel signal is obtained by
distributing the energy of the received downmix channels
between all the channel pairs described by the transmitted
ILD parameters. However, since a multi-channel signal can
have equal power distribution between all channels, while
the signals in the different channels are very different,
thus giving the listening impression of a very wide sound,
the correct wideness is obtained by mixing signals with
decorrelated versions of the same, as described by the ICC
parameter.
The decorrelated version of the signal, often referred to
as wet signal, is obtained by passing the signal (also
called dry signal) through a reverberator, such as an all-pass
filter. The output from the decorrelator has a time response
that is usually very flat. Hence, a Dirac input signal
produces a decaying noise burst at the output. When mixing the
decorrelated and the original signal it is for some tran-
sient signal types, like applause signals, important to
shape the time envelope of the decorrelated signal to better
match that of the dry signal. Failing to do so will
result in a perception of larger room size and unnatural
sounding transients due to pre-echo type of artefacts.
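As an illustration of the behaviour described above, the following sketch passes a Dirac impulse through a single Schroeder all-pass section. The choice of this particular filter, the function name schroeder_allpass and the values delay = 7 and g = 0.6 are assumptions made for the example only; practical decorrelators typically use longer and denser structures.

```python
def schroeder_allpass(x, delay=7, g=0.6):
    """One all-pass section: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = []
    for n, xn in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0   # delayed input
        y_d = y[n - delay] if n >= delay else 0.0   # delayed output (feedback)
        y.append(-g * xn + x_d + g * y_d)
    return y

# Dry signal: a single Dirac impulse followed by silence.
dirac = [1.0] + [0.0] * 199
wet = schroeder_allpass(dirac)

# The wet signal is spread over time, but its total energy is preserved.
wet_energy = sum(v * v for v in wet)
```

The response consists of a first tap followed by a geometrically decaying tail, while the total energy stays equal to that of the input. The energy of the dry signal is thus smeared over a much longer time span, which is why the envelope of the wet signal needs shaping for transient input.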
In systems where the multi-channel reconstruction is done
in a frequency transform domain having a low time resolu-
tion, temporal envelope shaping techniques can be employed,
similarly to those used for shaping quantization noise such
as Temporal Noise Shaping [J. Herre and J. D. Johnston,
"Enhancing the performance of perceptual audio coding by
using temporal noise shaping (TNS)," in 101st AES Conven-
tion, Los Angeles, November 1996] of perceptual audio co-
decs like MPEG-4 AAC. This is accomplished by means of pre-
diction across frequency bins, where the temporal envelope
is estimated by linear prediction in the frequency direc-

tion on the dry signal, and the filter obtained is applied,
again in the frequency direction, on the wet signal.
One may for example consider a delay line as decorrelator
and a strongly transient signal, such as applause or a gunshot,
as signal to be up-mixed. If no envelope shaping
were performed, a delayed version of the signal would
be combined with the original signal to reconstruct a stereo
or multi-channel signal. Thus, the transient signal
would be present twice in the up-mixed signal, separated by
the delay time, causing an unwanted echo type effect.
In order to achieve good results on highly critical signals,
the time envelope of the decorrelated signal needs to
be shaped with a very high time resolution, thus cancelling
out a delayed echo of a transient signal or masking it by
reducing its energy to the energy contained in the carrier
channel at the time.
This broad band gain adjustment of the decorrelated signal
can be done over windows as short as 1ms [US Patent Appli-
cation, "Diffuse Sound Shaping for BCC Schemes and the
Like", No. 11/006492, 12/7/2004]. Such high time resolutions
of the gain adjustment for the decorrelated
signal inevitably lead to additional distortion. In order
to minimise the added distortion for non-critical signals,
i.e. where the temporal shaping of the decorrelated signal
is not crucial, detection mechanisms are incorporated in the
encoder or decoder that switch the temporal shaping algorithm
on and off according to some sort of pre-defined
criteria. The drawback is that the system can become ex-
tremely sensitive to detector tuning.
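The short-window broad band gain adjustment discussed above can be sketched as follows. This is a minimal illustration, not the method of the cited application: the function names, the window length of 10 samples, the 40-sample delay-line decorrelator, the constant eps and the limitation of the gain to at most 1 are all assumptions made for the example.

```python
import math

def short_time_gains(dry, wet, win, eps=1e-12):
    """One broad band gain per window: sqrt(E_dry / E_wet), limited to <= 1."""
    gains = []
    for start in range(0, len(dry), win):
        e_dry = sum(v * v for v in dry[start:start + win])
        e_wet = sum(v * v for v in wet[start:start + win])
        gains.append(min(1.0, math.sqrt((e_dry + eps) / (e_wet + eps))))
    return gains

def apply_gains(wet, gains, win):
    """Scale every wet sample by the gain of the window it falls into."""
    return [v * gains[n // win] for n, v in enumerate(wet)]

# Dry signal: a single click. Decorrelator: a plain 40-sample delay line.
dry = [0.0] * 8 + [1.0] + [0.0] * 91
delay = 40
wet = [0.0] * delay + dry[:-delay]

gains = short_time_gains(dry, wet, win=10)
shaped = apply_gains(wet, gains, win=10)
```

In the window that contains the delayed click the dry signal carries no energy, so the gain collapses and the echo is removed from the wet signal. For a stationary signal the same per-window gains would fluctuate from window to window, which is the source of the added distortion mentioned above.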
Throughout the following description the term decorrelated
signal or wet signal is used for the, possibly gain ad-
justed (according to the ILD and ICC parameters) decorre-
lated version of a downmix signal, and the term downmix

signal, direct signal or dry signal is used for the, possi-
bly gain adjusted downmix signal.
In prior art implementations, a high time-resolution gain
adjustment, i.e. a gain adjustment based on samples of the
dry signal as short as milliseconds, leads to an additional
significant distortion for non-critical signals. These are
non-transient signals having a smooth temporal evolution, for
example music signals. The prior art approach of switching
the gain adjustment off for such non-critical signals in-
troduces a new and strong dependency of the quality of au-
dio perception on the detection mechanism, which is, of
course, mostly disadvantageous and may even introduce addi-
tional distortion, when the detection fails.
Summary of the Invention
It is the object of the present invention to provide a con-
cept to shape the envelope of a decorrelated signal more
efficiently, avoiding the introduction of additional signal
distortion.
In accordance with a first aspect of the present invention
this object is achieved by an apparatus for processing a
decorrelated signal derived from an original signal or a
combination signal derived by combining the original signal
and the decorrelated signal, comprising: a spectral flat-
tener for spectral flattening of the decorrelated signal, a
signal derived from the decorrelated signal, the original
signal, a signal derived from the original signal or the
combination signal to obtain a flattened signal, the spec-
tral flattener being operative such that the flattened sig-
nal has a flatter spectrum than a corresponding signal be-
fore flattening; and a time envelope shaper for time enve-
lope shaping the decorrelated signal or the combination
signal using information on the flattened signal.

In accordance with a second aspect of the present inven-
tion this object is achieved by a spatial audio decoder,
comprising: an input interface for receiving an original
signal derived from a multi channel signal having at least
two channels and for receiving spatial parameters describ-
ing an interrelation between a first channel and a second
channel of the multi channel signal; a decorrelator for
deriving a decorrelated signal from the original signal
using the spatial parameters; a spectral flattener for
spectral flattening of the decorrelated signal, a signal
derived from the decorrelated signal, the original signal,
a signal derived from the original signal or a combination
signal derived by combining the original signal and the
decorrelated signal to obtain a flattened signal, the
spectral flattener being operative such that the flattened
signal has a flatter spectrum than a corresponding signal
before flattening; and a time envelope shaper for time en-
velope shaping the decorrelated signal or the combination
signal using information on the flattened signal.
In accordance with a third aspect of the present invention
this object is achieved by a receiver or audio player,
having an apparatus for processing a decorrelated signal
derived from an original signal or a combination signal
derived by combining the original signal and the decorre-
lated signal, comprising: a spectral flattener for spec-
tral flattening of the decorrelated signal, a signal de-
rived from the decorrelated signal, the original signal, a
signal derived from the original signal or the combination
signal to obtain a flattened signal, the spectral flat-
tener being operative such that the flattened signal has a
flatter spectrum than a corresponding signal before flat-
tening; and a time envelope shaper for time envelope shap-
ing the decorrelated signal or the combination signal us-
ing information on the flattened signal.

In accordance with a fourth aspect of the present inven-
tion this object is achieved by a method for processing a
decorrelated signal derived from an original signal or a
combination signal derived by combining the original sig-
nal and the decorrelated signal, the method comprising:
spectrally flattening the decorrelated signal, a signal
derived from the decorrelated signal, the original signal,
a signal derived from the original signal or the combina-
tion signal to obtain a flattened signal, the flattened
signal having a flatter spectrum than a corresponding sig-
nal before flattening; and time envelope shaping the
decorrelated signal or the combination signal using infor-
mation on the flattened signal.
In accordance with a fifth aspect of the present invention
this object is achieved by a method of receiving or audio
playing, the method having a method for processing a
decorrelated signal derived from an original signal or a
combination signal derived by combining the original sig-
nal and the decorrelated signal, the method comprising:
spectrally flattening the decorrelated signal, a signal
derived from the decorrelated signal, the original signal,
a signal derived from the original signal or the combina-
tion signal to obtain a flattened signal, the flattened
signal having a flatter spectrum than a corresponding sig-
nal before flattening; and time envelope shaping the
decorrelated signal or the combination signal using infor-
mation on the flattened signal.
In accordance with a sixth aspect of the present invention
this object is achieved by a computer program for perform-
ing, when running on a computer, a method in accordance
with any of the above method claims.
The present invention is based on the finding that the en-
velope of a decorrelated signal derived from an original
signal or of a combination signal derived by combining the
original signal and the decorrelated signal can be shaped

without introducing additional distortion, when a spectral
flattener is used to spectrally flatten the spectrum of the
decorrelated signal or the combination signal and the
original signal to use the flattened spectra for deriving a
gain factor describing the energy distribution between the
flattened spectra, and when the so derived gain factor is
used by an envelope shaper to shape the time envelope of
the decorrelated signal or of the combination signal.
Flattening the spectrum has the advantage that transient
signals are hardly affected by flattening, since these sig-
nals already have a rather flat spectrum. Moreover, the
gain factors derived for non-transient signals are
brought closer to unity. Therefore both demands, shaping
transient signals and not altering non-transient signals,
can be met at the same time, without having to switch envelope
shaping on and off during a decoding process.
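One way to write such a gain factor is as the ratio of the short-time energies of the two flattened signals. The notation below is introduced here only for illustration and is an assumption, not a formula taken from this description: x~ and q~ denote the spectrally flattened original and decorrelated signals, S_m is the set of sample indices of the m-th short segment, and epsilon is a small constant that avoids division by zero.

```latex
g_m = \sqrt{\frac{\sum_{n \in S_m} \tilde{x}[n]^2}
                 {\sum_{n \in S_m} \tilde{q}[n]^2 + \varepsilon}},
\qquad
q_{\mathrm{shaped}}[n] = g_m \, q[n] \quad \text{for } n \in S_m
```

In this form the gain is estimated from the flattened signals but applied to the unflattened decorrelated signal q[n], so that the flattening influences only the gain estimate and not the spectrum of the output.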
The same advantages hold for shaping of combination signals
that are a combination of an original signal and a decorre-
lated signal which is derived from said original signal.
Such a combination may be derived by first deriving a
decorrelated signal from the original signal and by then
simply adding the two signals. For example, possible pre-
echo type of artefacts can be advantageously suppressed in
the combination signal by shaping the combination signal
using the flattened spectrum of the combination signal and
the flattened spectrum of the original signal to derive
gain factors used for shaping.
The present invention relates to the problem of shaping the
temporal envelope of decorrelated signals that are fre-
quently used in reconstruction of multi-channel audio sig-
nals. The invention proposes a new method that retains the
high time resolution for applause signals, while minimising
the introduced distortion for other signal types. The pre-
sent invention teaches a new way to perform the short time
energy adjustment that significantly reduces the amount of

distortion introduced, making the algorithm much more ro-
bust and less dependent on a very accurate detector con-
trolling the operation of a temporal envelope shaping algo-
rithm.
The present invention comprises the following features:
performing spectral flattening of the direct sound
signal or a signal derived from the direct sound sig-
nal, over a time segment significantly longer than the
time segment used for temporal envelope shaping;
performing spectral flattening of the decorrelated
signal, over a time segment significantly longer than
the time segment used for temporal envelope shaping;
calculating the gain factor for the short time segment
used for envelope shaping based on the long time spec-
trally flattened signals;
performing the spectral flattening in the time domain
by means of LPC (Linear Predictive Coding);
performing the spectral flattening in the subband do-
main of a filterbank;
performing spectral flattening prior to frequency di-
rection based prediction of temporal envelope;
performing energy correction for frequency direction
based prediction of temporal envelope.
The following problems are completely or significantly re-
duced by the present invention, that would otherwise arise
when attempting very short time broad band energy correc-
tion of a decorrelated signal:

the problem of introducing a significant amount of
distortion especially for signal segments where the
temporal shaping is not required;
the problem of introducing high dependency on a detec-
tor indicating when the short time energy correction
should be operated, due to the distortion introduced
for arbitrary signals.
The present invention outlines a novel method for calculat-
ing the required gain adjustment that retains the high
time-resolution but minimises the added distortion. This
means that a spatial audio system utilising the present in-
vention is not as dependent on a detection mechanism that
switches the temporal shaping algorithm off for non-
critical items, since the added distortion for items where
the temporal shaping is not required is kept to a minimum.
The novel invention also outlines how to get an improved
estimate of the temporal envelope of the dry signal to be
applied to the wet signal when estimating it by means of
linear prediction in the frequency direction within the
transform domain.
In one embodiment of the present invention an inventive ap-
paratus for processing a decorrelated signal is applied
within the signal processing path of a 1 to 2 upmixer after
the derivation of the wet signal from the dry signal.
Firstly, a spectrally flattened representation of the wet
signal and of the dry signal is computed for a large number
of consecutive time domain samples (a frame). Based on
those spectrally flattened representations of the wet and
the dry signal, gain factors to adjust the energy of a
smaller number of samples of the wet signal are then computed.
By spectrally flattening, the
spectrum of a transient signal, which is rather flat by na-

ture, is hardly altered, whereas the spectrum of periodic
signals is strongly modified. Using a signal representation
with flattened spectra therefore achieves both: shaping the
envelope of the decorrelated wet signal heavily when a
transient signal is predominant, and shaping the envelope of
the wet signal only slightly when smooth or periodic signals
carry the most energy in the dry channel. Thus, the present
invention significantly reduces the amount of distortion
added to the signal especially for signal segments where
the temporal envelope shaping is basically not required.
Furthermore, the high dependency on a prior art detector
indicating when short time energy corrections should be ap-
plied, is avoided.
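The different effect of spectral flattening on transient and on periodic signals can be illustrated with a small LPC-based whitening sketch. The prediction order of 2, the frame length of 400 samples, the test signals and all function names are assumptions chosen for the example; they are not values prescribed by this description.

```python
import math

def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion; returns inverse-filter taps a[0..order], a[0] = 1."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:          # silent or already perfectly predicted frame
            break
        k = -sum(a[j] * r[i - j] for j in range(i)) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def flatten(x, order=2):
    """Whiten a frame by filtering it with its own LPC inverse filter."""
    a = lpc(x, order)
    return [sum(a[k] * x[n - k] for k in range(order + 1) if n >= k)
            for n in range(len(x))]

def energy(x):
    return sum(v * v for v in x)

frame = 400
click = [0.0] * frame
click[10] = 1.0                                    # transient: flat spectrum
tone = [math.sin(0.3 * n) for n in range(frame)]   # periodic: peaky spectrum

click_ratio = energy(flatten(click)) / energy(click)
tone_ratio = energy(flatten(tone)) / energy(tone)
```

The click passes through the flattener essentially unchanged, whereas most of the energy of the sinusoid is removed. Gain factors derived from the flattened signals therefore remain sensitive to transients, while the dominant periodic components contribute far less to the short-time energy estimates.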
In a further embodiment of the present invention an inven-
tive apparatus operates on an upmixed (combined) monophonic
signal which is derived by an upmixer that combines an
original signal and a decorrelated signal derived from the
original signal to compute the upmixed monophonic signal.
Such upmixing is a standard strategy during reconstruction
of multi-channel signals for deriving individual channels
that have acoustic properties of the corresponding original
channel of the multi-channel signal. Since the inventive
apparatus can be applied after such upmixing, already existing
set-ups can easily be extended.
In a further embodiment of the present invention, the tem-
poral envelope shaping of a decorrelated signal is imple-
mented within the subband domain of a filterbank. There,
flattened spectral representations of the various subband
signals are derived for each subband individually for a
high number of consecutive samples. Based on the spectrally
flattened long-term spectra, the gain factor to shape the
envelope of the wet signal according to the dry signal is
computed for a sample representing a much shorter time period
of the original signal. The advantages with respect to the
perceptual quality of the reconstructed audio signal are
the same as for the example described above. Furthermore,

the possibility to implement the inventive concept within a
filterbank representation has the advantage, that already
existing multi-channel audio decoders using filterbank rep-
resentations can be modified to implement the inventive
concept without major structural and computational efforts.
In a further embodiment of the present invention, the tem-
poral envelope shaping of the wet signal is performed
within the subband domain using linear prediction. To this
end, linear prediction is applied in the frequency direction
of the filterbank, allowing the signal to be shaped with
higher time resolution than natively available in the filterbank.
Again, the final energy correction is computed by
estimating gain curves for a number of consecutive subband
samples of the filterbank.
In a modification of the previously described embodiment of
the present invention, the estimates of the parameters describing
the whitening of the spectrum are smoothed over a
number of neighbouring time samples of the filterbank.
Therefore, the risk of applying wrongly derived inverse
filters to whiten the spectrum when transient signals are
present is further reduced.
Brief Description of the Drawings
Fig. 1a shows the application of an inventive apparatus
within a 1 to 2 upmixer stage;
Fig. 1b shows a further example of an application of an
inventive apparatus;
Fig. 2a shows an alternative placement possibility of the
inventive apparatus;
Fig. 2b shows a further example for the placement of an
inventive apparatus;

Fig. 3a shows the use of an inventive apparatus within a
multi-channel audio decoder;
Fig. 3b shows an inventive apparatus within a further
multi-channel audio decoder;
Fig. 4a shows a preferred embodiment of an inventive ap-
paratus;
Fig. 4b shows a modification of the inventive apparatus
of Fig. 4a;
Fig. 4c shows an example of linear predictive coding;
Fig. 4d shows the application of a bandwidth expansion
factor at linear predictive coding;
Fig. 5a shows an inventive spectral flattener;
Fig. 5b shows an application scheme of long-term energy
correction;
Fig. 6 shows an application scheme for short-term energy
correction;
Fig. 7a shows an inventive apparatus within a QMF-
filterbank design;
Fig. 7b shows details of the inventive apparatus of Fig.
7a;
Fig. 8 shows the use of an inventive apparatus within a
multi-channel audio decoder;
Fig. 9 shows the application of an inventive apparatus
after the inverse filtering in a QMF based de-
sign;

Fig. 10 shows the time-versus-frequency representation of
a signal with a filterbank representation;
Fig. 11 shows a transmission system having an inventive
decoder.
Detailed Description of Preferred Embodiments
Fig. 1 shows a 1 to 2 channel parametric upmixing device
100 to upmix a transmitted mono channel 105 into two
stereo channels 107 and 108, additionally using spatial parameters.
The parametric upmixing device 100 has a parametric
stereo upmixer 110, a decorrelator 112 and an inventive
apparatus 114 for processing a decorrelated signal.
The transmitted monophonic signal 105 is input into the pa-
rametric stereo upmixer 110 as well as into the decorrela-
tor 112, that derives a decorrelated signal from the trans-
mitted signal 105 using a decorrelation rule, that could,
for example, be implemented by simply delaying the signal
for a given time. The decorrelated signal produced by the
decorrelator 112 is input into the inventive apparatus
(shaper) 114, that additionally receives the transmitted
monophonic signal as input. The transmitted monophonic sig-
nal is needed to derive the shaping rules used to shape the
envelope of the decorrelated signal, as elaborated in more
detail in the coming paragraphs.
Finally, an envelope shaped representation of the decorrelated
signal is input into the parametric stereo upmixer,
which derives the left channel 107 and the right channel
108 of a stereo signal from the transmitted monophonic sig-
nal 105 and from the envelope shaped representation of the
decorrelated signal.
To better understand the inventive concept and the differ-
ent presented embodiments of the present invention, the up-
mixing process of a transferred monophonic signal into a

stereo signal using the additionally transmitted spatial parameters
is explained within the following paragraphs:
It is known from prior art that two audio channels can be
reconstructed based on a downmix channel and a set of spa-
tial parameters carrying information on the energy distri-
bution of the two original channels upon which the downmix
was made as well as information on the correlation between
the two original channels. The embodiment in Fig. 1 exemplifies a framework for the present invention.
In Fig. 1, the downmixed mono signal 105 is fed into a
decorrelator unit 112 as well as an up-mix module 110. The
decorrelator unit 112 creates a decorrelated version of
the input signal 105, having the same frequency character-
istics and the same long term energy. The upmix module
calculates an upmix matrix based on the spatial parameters
and the output channels 107 and 108 are synthesised. The
upmix module 110 can be explained according to:

with the parameters c_l, c_r, α and β being derived from
the ILD parameters and the ICC parameters transmitted in
the bitstream. The signal X[k] is the received downmix sig-
nal 105, the signal Q[k] is the de-correlated signal, being
a decorrelated version of the input signal 105. The output
signals 107 and 108 are denoted Y1[k] and Y2[k].
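The upmix equation itself is not reproduced in the text above. A form that is consistent with the parameters named here, and that is commonly used for parametric stereo upmixing, is given below as an assumption rather than as the authoritative equation of this description:

```latex
\begin{bmatrix} Y_1[k] \\ Y_2[k] \end{bmatrix}
=
\begin{bmatrix} c_l & 0 \\ 0 & c_r \end{bmatrix}
\begin{bmatrix}
\cos(\alpha + \beta) & \sin(\alpha + \beta) \\
\cos(-\alpha + \beta) & \sin(-\alpha + \beta)
\end{bmatrix}
\begin{bmatrix} X[k] \\ Q[k] \end{bmatrix}
```

In this assumed form, c_l and c_r set the levels of the two output channels, while the angles α and β determine how much of the decorrelated signal Q[k] is mixed into each channel and with which sign.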
The new module 114 is devised to shape the time envelope
of the signal being output of the decorrelator module 112
so that the temporal envelope matches that of the input
signal 105. The details of module 100 will be elaborated on
extensively in a later section.

It is evident from the above and from Fig. 1 that the up-
mix module generates a linear combination of the downmix
signal and the decorrelated version of the same. It is
thus evident that the summation of the decorrelated signal
and the downmix signal can be done within the upmix as
outlined above or in a subsequent stage. Hence, the two
output channels above 107 and 108 can be replaced by four
output channels, where two are holding the decorrelated
version and the direct-signal version of the first chan-
nel, and two are holding the decorrelated version and the
direct-signal version of the second channel. This is
achieved by replacing the above upmix equation by:

The reconstructed output channels are subsequently ob-
tained by:

Given the above, it is clear that an inventive apparatus
can be implemented into a decoding scheme both before
the final up-mixing, as shown in Fig. 1, and after the
up-mixing. Moreover, the inventive apparatus can be used to
shape the envelope of a decorrelated signal both in the
time domain and in a QMF subband domain.
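That the summation of dry and wet parts can be moved out of the upmix into a subsequent stage can be checked with a short sketch. The 2x2 matrix h, its numerical entries, the sample values and all function names are hypothetical and serve only to illustrate the linear structure; they are not taken from this description.

```python
def upmix(x, q, h):
    """Direct upmix: each output is a linear combination of dry x and wet q."""
    y1 = [h[0][0] * xv + h[0][1] * qv for xv, qv in zip(x, q)]
    y2 = [h[1][0] * xv + h[1][1] * qv for xv, qv in zip(x, q)]
    return y1, y2

def upmix_split(x, q, h):
    """Four-output variant: dry and wet parts of each channel kept separate."""
    y1_dry = [h[0][0] * xv for xv in x]
    y1_wet = [h[0][1] * qv for qv in q]
    y2_dry = [h[1][0] * xv for xv in x]
    y2_wet = [h[1][1] * qv for qv in q]
    return y1_dry, y1_wet, y2_dry, y2_wet

def recombine(y1_dry, y1_wet, y2_dry, y2_wet):
    """Subsequent stage: add the dry and wet part of each channel."""
    y1 = [d + w for d, w in zip(y1_dry, y1_wet)]
    y2 = [d + w for d, w in zip(y2_dry, y2_wet)]
    return y1, y2

x = [1.0, 0.5, -0.25, 0.0]        # downmix (dry) samples
q = [0.0, 1.0, 0.5, -0.25]        # decorrelated (wet) samples
h = [[0.8, 0.6], [0.8, -0.6]]     # hypothetical upmix matrix

direct = upmix(x, q, h)
summed = recombine(*upmix_split(x, q, h))
```

Because the wet parts are available separately in the four-output variant, an envelope shaper can be inserted between upmix_split and recombine without changing the rest of the signal path.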
Fig. 1b shows a further preferred embodiment of the present
invention where an inventive shaper 114 is used to shape a
combination signal 118 derived from the transmitted mono-
phonic signal 105 and a decorrelated signal 116 derived
from the transmitted monophonic signal 105. The embodiment
of Fig. 1b is based on the embodiment of Fig. 1. Therefore,
components having the same functionality have the same
marks.
A decorrelator 112 derives the decorrelated signal 116 from
the transmitted monophonic signal 105. A mixer 117 receives
the decorrelated signal 116 and the transmitted monophonic
signal 105 as an input and derives the combination signal
118 by combining the transmitted signal 105 and the decor-
related signal 116.
Combination may in that context mean any suitable method to
derive one single signal from two or more input signals. In
the simplest example the combination signal 118 is derived
by simply adding the transmitted monophonic signal 105 and
the decorrelated signal 116.
The shaper 114 receives as an input the combination signal
118 that is to be shaped. To derive the gain factors for
shaping, the transmitted monophonic signal 105 is also in-
put into the shaper 114. A partly decorrelated signal 119
is derived at the output of the shaper 114 that has a
decorrelated signal component and an original signal compo-
nent without introducing additional audible artefacts.
Fig. 2 shows a configuration, where the envelope shaping of
the wet signal part can be applied after the upmix.
Fig. 2 shows an inventive parametric stereo upmixer 120 and
a decorrelator 112. The monophonic signal 105 is input into
the decorrelator 112 and into the parametric stereo upmixer
120. The decorrelator 112 derives a decorrelated signal
from the monophonic signal 105 and inputs the decorrelated
signal into the parametric stereo upmixer 120. The paramet-
ric stereo upmixer 120 is based on the parametric stereo
upmixer 110 already described in Fig. 1. The parametric
stereo upmixer 120 differs from the parametric stereo
upmixer 110 in that the parametric stereo upmixer 120
derives a dry part 122a and a wet part 122b of the left

channel and a dry part 124a and a wet part 124b of the
right channel. In other words, the parametric stereo up-
mixer 120 up-mixes the dry signal parts and the wet signal
parts for both channels separately. This might be imple-
mented in accordance with the formulas given above.
As the wet signal parts 122b and 124b have been up-mixed
but not shaped, a first shaper 126a and a second shaper
126b are additionally present in the inventive up-mixing
set-up shown in Fig. 2. The first shaper 126a receives at its
input the wet signal 122b to be shaped and as a reference
signal a copy of the left dry signal 122a. At the output of the
first shaper 126a, a shaped wet signal 128a is provided.
The second shaper 126b receives the right dry signal 124a
and the right wet signal 124b at its input and derives the
shaped wet signal 128b of the right channel as its output.
To finally derive the desired left signal 107 and right
signal 108, a first mixer 129a and a second mixer 129b are
present in the inventive setup. The first mixer 129a receives
at its input a copy of the left up-mixed signal 122a
and the shaped wet signal 128a to derive (at its output)
the left signal 107. The second mixer 129b derives the
right channel 108 in an analogous way, receiving the dry
right signal 124a and the shaped wet right signal 128b at
its inputs. As can be seen from Fig. 2, this setup can be
operated as an alternative to the embodiment shown in Fig.
1.
Fig. 2b shows a preferred embodiment of the present inven-
tion being a modification of the embodiment previously
shown in Fig. 2 and therefore the same components share the
same marks.
In the embodiment shown in Fig. 2b, the wet signal 122b is
first mixed with its dry counterpart 122a to derive a left
intermediate channel L* and the wet signal 124b is mixed
with its dry counterpart 124a to derive a right intermediate
channel R*. Thus, a channel comprising left-side information
and a channel comprising right-side information are
generated. There is, however, still the possibility of having
introduced audible artefacts by the wet signal components
122b and 124b. Therefore, the intermediate signals L*
and R* are shaped by corresponding shapers 126a and 126b
that additionally receive as an input the dry signal parts
122a and 124a. Thus, finally a left channel 107 and a right
channel 108 can be derived having the desired spatial prop-
erties.
To summarize briefly, the embodiment shown in Fig. 2b differs
from the embodiment shown in Fig. 2 in that the wet
and dry signals are upmixed first and the shaping is done
on the so derived combination signals (L* and R*). Thus,
Fig. 2b shows an alternative set-up to solve the common
problem of having to derive two channels without introducing
audible distortions by the used decorrelated signal
parts. Other ways of combining two signal parts to derive a
combination signal to be shaped, such as for example multiplying
or convolving signals, are also suited to implement the
inventive concept of shaping using also spectrally flat-
tened representations of the signals.
As shown in Fig. 3a, two channel reconstruction modules can
be cascaded into a tree-structured system that iteratively
recreates, for example, 5.1 channels from a mono downmix
channel 130. This is outlined in Fig. 3a, where several in-
ventive upmixing modules 100 are cascaded to recreate 5.1
channels from the monophonic downmix channel 130.
The 5.1 channel audio decoder 132 shown in Fig. 3a com-
prises several 1 to 2 upmixers 100, that are arranged in a
tree-like structure. The upmix is done iteratively, by sub-
sequent upmixing of mono channels to stereo channels, as
already known in the art, however using inventive 1 to 2
upmixer blocks 100 that comprise an inventive apparatus for
processing a decorrelated signal to enhance the perceptual
quality of the reconstructed 5.1 audio signal.

The present invention teaches that the signal from the
decorrelator must undergo accurate shaping of its temporal
envelope in order to not cause unwanted artefacts when the
signal is mixed with the dry counterpart. The shaping of
the temporal envelope can take place directly after the
decorrelator unit as shown in Fig. 1 or, alternatively, up-
mixing can be performed after the decorrelator for both,
the dry signal and the wet signal separately, and the final
summation of the two is done in the time domain after the
synthesis filtering, as sketched in Fig. 2. This can alter-
natively be performed in the filterbank domain also.
To support the above mentioned separate generation of dry
signals and wet signals, a hierarchical structure as shown
in Fig. 3b is used in a further embodiment of the present
invention. Fig. 3b shows a first hierarchical decoder
150 comprising several cascaded modified upmixing modules
152 and a second hierarchical decoder 154 comprising sev-
eral cascaded modified upmixing modules 156.
To achieve the separate generation of the dry and the wet
signal paths, the monophonic downmix signal 130 is split
and input into the first hierarchical decoder 150 as well
as into the second hierarchical decoder 154. The modified
upmixing modules 152 of the first hierarchical decoder 150
differ from the upmixing modules 100 of the 5.1 channel
audio decoder 132 in that they provide only the dry signal
parts at their outputs. Correspondingly, the modified
upmixing modules 156 of the second hierarchical decoder 154
provide only the wet signal parts at their outputs.
Therefore, by implementing the same hierarchical structure
as in Fig. 3a, the dry signal parts of the 5.1 channel
signal are generated by the first hierarchical decoder 150,
whereas the wet signal parts of the 5.1 channel signal are
generated by the second hierarchical decoder 154. Hence,
the generation of the wet and dry signals can, for example,
be performed within the filterbank domain, whereas the
combination of the two signal parts can be performed in the
time domain.
The present invention further teaches that the signals
used for extraction of the estimated envelopes that are
subsequently used for the shaping of the temporal envelope
of the wet signal shall undergo a long term spectral flat-
tening or whitening operation prior to the estimation
process in order to minimise the distortion introduced
when modifying the decorrelated signal using very short
time segments, i.e. time segments in the 1 ms range. The
shaping of the temporal envelope of the decorrelated sig-
nal can be done by means of short term energy adjustment
in the subband domain or in the time domain. The whitening
step as introduced by the present invention ensures that
the energy estimates are calculated on as large a
time-frequency tile as possible. Stated differently, since
the duration of the signal segment is extremely short, it
is important to estimate the short term energy over as
large a frequency range as possible, in order to maximise
the "number of data-points" used for energy calculation.
However, if one part of the frequency range strongly
dominates the rest, as is the case for a steep spectral
slope, the number of valid data points becomes too small,
and the estimate obtained will be prone to vary from
estimate to estimate, imposing unnecessary fluctuations on
the applied gain values.
The present invention further teaches that when the tempo-
ral envelope of the decorrelated signal is shaped by means
of prediction in the frequency direction [J. Herre and J.
D. Johnston, "Enhancing the performance of perceptual au-
dio coding by using temporal noise shaping (TNS)," in
101st AES Convention, Los Angeles, November 1996.], the
frequency spectrum used to estimate the predictor should
undergo a whitening stage, in order to achieve a good es-
timate of the temporal envelope that shall be applied to
the decorrelated signal. Again, it is not desirable to
base the estimate on a small part of the spectrum, as would
be the case for a steeply sloping spectrum without spectral
whitening.
Fig. 4a shows a preferred embodiment of the present inven-
tion operative in the time domain. The inventive apparatus
for processing a decorrelated signal 200 receives the wet
signal 202 to be shaped and the dry signal 204 as input,
wherein the wet signal 202 is derived from the dry signal
204 in a previous step that is not shown in Fig. 4a.
The apparatus 200 for processing a decorrelated signal 202
has a first high-pass filter 206, a first linear
prediction device 208, a first inverse filter 210 and a
first delay 212 in the signal path of the dry signal, and a
second high-pass filter 220, a second linear prediction
device 222, a second inverse filter 224, a low-pass filter
226 and a second delay 228 in the signal path of the wet
signal. The apparatus further comprises a gain calculator
230, a multiplier (envelope shaper) 232 and an adder
(upmixer) 234.
On the dry signal side, the dry signal 204 is split and
input into the first high-pass filter 206 and into the
first delay 212. An output of the high-pass filter 206 is
connected to an input of the first linear prediction
device 208 and to a first input of the first inverse
filter 210. An output of the first linear prediction
device 208 is connected to a second input of the inverse
filter 210, and an output of the inverse filter 210 is
connected to a first input of the gain calculator 230. In
the wet signal path, the wet signal 202 is split and input
into an input of the second high-pass filter 220 and into an
input of the low-pass filter 226. An output of the
low-pass filter 226 is connected to the second delay 228. An
output of the second high-pass filter 220 is connected to
an input of the second linear prediction device 222 and to
a first input of the second inverse filter 224. An output
of the second linear prediction device 222 is connected to
a second input of the second inverse filter 224, an output
of which is connected to a second input of the gain
calculator 230. The envelope shaper 232 receives at a first
input the high-pass filtered wet signal 202 as supplied at
the output of the second high-pass filter 220. A second
input of the envelope shaper 232 is connected to an output
of the gain calculator 230. An output of the envelope
shaper 232 is connected to a first input of the adder 234,
which receives at a second input a delayed dry signal, as
supplied from an output of the first delay 212, and which
further receives at a third input a delayed low frequency
portion of the wet signal, as supplied by an output of the
second delay 228. At an output of the adder 234, the
completely processed signal is supplied.
In the preferred embodiment of the present invention shown
in Fig. 4a, the signal coming from the decorrelator (the
wet signal 202) and the corresponding dry signal 204 are
input into the second high-pass filter 220 and the first
high-pass filter 206, respectively, where both signals are
high-pass filtered with a cut-off frequency of approximately
2 kHz. The wet signal 202 is also low-pass filtered by
the low-pass filter 226, which has a pass band similar
to the stop band of the second high-pass filter 220.
The temporal envelope shaping of the decorrelated (wet)
signal 202 is thus only performed in the frequency range
above 2 kHz. The low-pass part of the wet signal 202 (not
subject to temporal envelope shaping) is delayed by the
second delay 228 to compensate for the delay introduced
when shaping the temporal envelope of the high-pass part
of the decorrelated signal 202. The same is true for the
dry signal part 204, which receives the same delay time by
the first delay 212, so that at the adder 234, the
processed high-pass filtered part of the wet signal 202, the
delayed low-pass part of the wet signal 202 and the
delayed dry signal 204 can be added or upmixed to yield a
finally processed upmixed signal.
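The band split and delay compensation described above may be illustrated by the following minimal Python sketch. It does not reproduce the filters of the embodiment: the one-pole low-pass, the complementary high band obtained by subtraction, the smoothing constant alpha and all function names are assumptions chosen only so that the two bands sum back to the wet signal. The shaping stage is passed in as a function that is assumed to introduce a delay of d samples.

```python
def one_pole_lowpass(x, alpha):
    """First-order low-pass; alpha in (0, 1] is a hypothetical smoothing constant."""
    y, state = [], 0.0
    for s in x:
        state += alpha * (s - state)
        y.append(state)
    return y


def delay(x, d):
    """Delay x by d samples while keeping its length."""
    return ([0.0] * d + list(x))[:len(x)]


def combine(dry, wet, shape_highband, d, alpha=0.25):
    """Shape only the high band of the wet signal, then add all three parts."""
    low = one_pole_lowpass(wet, alpha)            # stand-in for low-pass filter 226
    high = [w - l for w, l in zip(wet, low)]      # complementary high band
    shaped = shape_highband(high)                 # assumed to delay by d samples
    low_d = delay(low, d)                         # stand-in for second delay 228
    dry_d = delay(dry, d)                         # stand-in for first delay 212
    return [s + l + q for s, l, q in zip(shaped, low_d, dry_d)]
```

With a shaping stage that only delays (unit gain), the output reduces to the delayed sum of the dry and wet signals, which is a convenient check that the three paths are time-aligned.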

According to the present invention, after the high-pass
filtering, the long-term spectral envelope is to be esti-
mated. It is important to note, that the time segment used
for the long-term spectral envelope estimation is signifi-
cantly longer than the time segments used to do the actual
temporal envelope shaping. The spectral envelope estima-
tion and subsequent inverse filtering typically operates
on time segments in the range of 20 ms while the temporal
envelope shaping aims at shaping the temporal envelope
with an accuracy in the 1 ms range. In the preferred em-
bodiment of the present invention shown in Fig. 4a, the
spectral whitening is performed by inverse filtering with
the first inverse filter 210 operating on the dry signal
and the second inverse filter 224 operating on the wet
signal 202. To obtain the required filter coefficients for
the first inverse filter 210 and the second inverse filter
224, the spectral envelopes of the signals are estimated
by means of linear prediction by the first linear predic-
tion device 208 and the second linear prediction device
222. The spectral envelope H(z) of a signal can be
obtained using linear prediction, as described by the
following formulas:

$$H(z) = \frac{G}{A(z)}$$

where

$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$$

is the polynomial obtained using the autocorrelation
method or the covariance method [Digital Processing of
Speech Signals, Rabiner & Schafer, Prentice Hall, Inc.,
Englewood Cliffs, New Jersey 07632, ISBN 0-13-213603-1,
Chapter 8], and G is a gain factor. The order p of the
above polynomial is called the predictor order.

As shown in Fig. 4a, the linear prediction of the spectral
envelope of the signal is done in parallel for the dry sig-
nal part 204 and for the wet signal part 202. With these
estimates of the spectral envelope of the signals, inverse
filtering of the high-pass filtered dry signal 204 and the
wet signal 202 can be performed, i.e. the flattening of the
spectrum (spectral whitening) can be done while the energy
within the signals has to be preserved. The degree of spec-
tral whitening, i.e. the extent to which the flattened
spectrum becomes flat, can be controlled by varying the
predictor order p, i.e. by limiting the order of the
polynomial A(z), thus limiting the amount of fine structure
that can be described by H(z). Alternatively, a bandwidth
expansion factor can be applied to the polynomial A(z). The
bandwidth expansion factor ρ is defined according to the
following formula, based on the polynomial A(z):

$$A(z/\rho) = 1 - \sum_{k=1}^{p} a_k \rho^{k} z^{-k}$$

The temporal envelope shaping and the effect of the
bandwidth expansion factor ρ are illustrated in Figs. 4c
and 4d.
Fig. 4c gives an example for the estimation of the spectral
envelope of a signal, as it could be done by the first lin-
ear prediction device 208 and the second linear prediction
device 222. For the spectral representation of Fig. 4c, the
frequency in Hz is plotted on the x-axis versus the energy
transported in the given frequency in units of dB on the y-
axis.
The solid line 240 describes the original spectral envelope
of the processed signal, whereas the dashed line 242 gives
the result obtained by linear predictive coding (LPC) using
the values of the spectral envelope at the marked equidis-
tant frequency values. For the example shown in Fig. 4c,
the predictor order p is 30, the comparatively high
predictor order explaining the close match of the predicted
spectral envelope 242 and the real spectral envelope 240.
This is due to the fact that the higher the predictor
order, the more fine structure the predictor is able to
describe.
Fig. 4d shows the effect of lowering the predictor order p
or of applying a bandwidth expansion factor ρ. Fig. 4d
shows two examples of estimated envelopes in the same
representation as in Fig. 4c, i.e. the frequency on the x-axis
and the energy on the y-axis. An estimated envelope 244
represents a spectral envelope obtained from linear
predictive coding with a given predictor order. The filtered
envelope 246 shows the result of linear predictive coding on
the same signal with reduced predictor order p or,
alternatively, with a bandwidth expansion factor ρ applied. As
can be seen, the filtered envelope 246 is much smoother
than the estimated envelope 244. This means that at the
frequencies where the estimated envelope 244 and the
filtered envelope 246 differ most, the filtered envelope
246 describes the real envelope less precisely than the
estimated envelope 244. Hence, an inverse filtering based on
the filtered envelope 246 yields a spectrum that is
flattened less than when using the parameters from the
estimated envelope 244 in the inverse filtering process. The
inverse filtering is described in the following paragraph.
The parameters or coefficients a_k estimated by the linear
prediction devices are used by the inverse filters 210 and
224 to do the spectral flattening of the signals, i.e. the
inverse filtering, by using the following inverse filter
function:

$$H_{inv}(z) = \frac{1}{G}\left(1 - \sum_{k=1}^{p} a_k \rho^{k} z^{-k}\right)$$

where p is the predictor order and ρ is the optional
bandwidth expansion factor.

The coefficients a_k can be obtained in different manners,
e.g. by the autocorrelation method or the covariance method.
It is common practice to add some sort of relaxation to
the estimate in order to ensure stability of the system.
When using the autocorrelation method this is easily ac-
complished by offsetting the zero-lag value of the corre-
lation vector. This is equivalent to addition of white
noise at a constant level to the signal used to estimate
A(z).
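The estimation and inverse filtering steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the autocorrelation method is solved with the Levinson-Durbin recursion, the relaxation is modelled as a small relative offset of the zero-lag value (the constant relaxation is hypothetical), the sign convention A(z) = 1 - sum of a_k z^-k is assumed, the bandwidth expansion is assumed to scale coefficient a_k by rho to the power k, and G is set to 1. The function names are not taken from the embodiment.

```python
import random


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis segment."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def lpc(x, order, relaxation=1e-9):
    """Return a_1..a_p of A(z) = 1 - sum_k a_k z^-k (autocorrelation method)."""
    r = autocorrelation(x, order)
    # Relaxation: offset the zero-lag value (like adding low-level white noise).
    r[0] = r[0] * (1.0 + relaxation) + 1e-12
    a = [0.0] * (order + 1)           # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):     # Levinson-Durbin recursion
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def inverse_filter(x, a, rho=1.0):
    """Flatten x with taps a_k * rho**k and G = 1 (samples before the segment are zero)."""
    taps = [a_k * rho ** (k + 1) for k, a_k in enumerate(a)]
    return [x[n] - sum(t * x[n - 1 - k] for k, t in enumerate(taps) if n - 1 - k >= 0)
            for n in range(len(x))]


# Usage: a test signal with a steep spectral slope (first-order recursion on noise).
rng = random.Random(0)
x, state = [], 0.0
for _ in range(4096):
    state = 0.9 * state + rng.gauss(0.0, 1.0)
    x.append(state)
a = lpc(x, order=1)
flat = inverse_filter(x, a)
```

A lower order or a value of rho below one leaves more of the original spectral shape in the output, which corresponds to the weaker flattening discussed with reference to Fig. 4d.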
The gain calculator 230 calculates the short time target
energies, i.e. the energies needed within the single short
segments of the wet signal to fulfil the requirement of an
envelope of the wet signal that is shaped to the envelope of
the dry signal. These energies are calculated based on the
spectrally flattened dry signal and based on the spectrally
flattened wet signal. A derived gain adjustment value can
then be applied to the wet signal by the envelope shaper
232.
Before describing the gain calculator 230 in more detail,
it may be noted that during the inverse filtering the gain
factor G of the inverse filters 210 and 224 needs to be
taken care of. Since the dry and wet signals operated on
are output signals from an upmix process that has produced
two output signals for every channel, wherein the first
channel has a specific energy ratio with respect to the
second channel according to the ILD and ICC parameters used
for the upmix process, it is essential that this relation
is maintained on average over the time segment for which
the ILD and ICC parameters are valid in the course of the
temporal envelope shaping. Stated differently, the
apparatus for processing a decorrelated signal 200 shall only
modify the temporal envelope of the decorrelated signal,
while maintaining the same average energy of the signal
over the segment being processed.

The gain calculator 230 operates on the two spectrally
flattened signals and calculates a short-time gain function
for application on the wet-signal over time segments much
shorter than the segments used for inverse filtering. For
example, when the segment length for inverse filtering is
2048 samples, the short-term gain factors may be computed
for segments of 64 samples. This means that on the basis
of spectra that are flattened over a length of 2048
samples, gain factors are derived for temporal energy shaping
using much shorter segments of the signal, for example
64 samples.
The application of the calculated gain factors to the wet
signal is done by the envelope shaper 232, which multiplies
the samples by the calculated gain factors. Finally, the
high-pass filtered, envelope shaped wet signal is
added to its low frequency part by the adder (upmixer) 234,
yielding the finally processed and envelope shaped wet
signal at the output of the adder 234.
As energy preservation and smooth transition between
different gain factors are an issue both during the inverse
filtering and during the application of the gain factors,
windowing functions may additionally be applied to the
calculated gain factors to guarantee a smooth transition
between gain factors of neighbouring segments. Therefore, the
inverse filtering step and the application of the
calculated short-term gain factors to the wet signals are
described in more detail with reference to Figs. 5a, 5b and
6 in later paragraphs, assuming the example mentioned above
with a segment length of 2048 for inverse filtering and with
a segment length of 64 for calculation of the short-term
gain factors.
Fig. 4b shows a modification of the inventive apparatus for
processing a decorrelated signal 200, where the envelope
shaped wet signal is supplied to a high-pass filter 240
after the envelope shaping. In a preferred embodiment, the
high-pass filter 240 has the same characteristics as the
high-pass filter 220 deriving the part of the wet signal
202 that is filtered. The high-pass filter 240 then
ensures that any distortion introduced in the decorrelated
signal does not alter the high-pass character of the
signal, which would introduce a mismatch in the summation of
the unprocessed low-pass part of the decorrelated signal and
the processed high-pass part of the signal.
Several important features of the above-outlined
implementation of the present invention should again be
emphasized:
- the spectral flattening is done by calculating a
spectral envelope representation (in this particular
example by means of LPC) of a time segment significantly
longer than a time segment used for short-time energy
adjustment;
- the spectrally flattened signal is only used to
calculate the energy estimates upon which the gain values
are calculated that are used to estimate and apply the
correct temporal envelope of the decorrelated (wet)
signal;
- the mean energy ratio between the wet signal and the
dry signal is maintained; it is only the temporal
envelope that is modified. Hence, the average of the
gain values G over the signal segment being processed
(i.e. a frame comprising typically 1024 or 2048
samples) is approximately equal to one for a majority of
signals.
Fig. 5a shows a more detailed description of an inverse
filter used as first inverse filter 210 and as second
inverse filter 224 within the inventive apparatus for
processing a decorrelated signal 200. The inverse filter 300
comprises an inverse transformer 302, a first energy
calculator 304, a second energy calculator 306, a gain
calculator 308 and a gain applier 310. The inverse
transformer 302 receives filter coefficients 312 (as derived
by linear predictive coding) and the signal X(k) 314 as
input. A copy of the signal 314 is input into the first
energy calculator 304. The inverse transformer applies the
inverse transformation based on the filter coefficients 312
to the signal 314 for a signal segment of length 2048. The
gain factor G is set to 1; therefore, a flattened signal 316
(X_flat(z)) is derived from the input signal 314 according
to the following formula:

$$X_{flat}(z) = X(z)\left(1 - \sum_{k=1}^{p} a_k \rho^{k} z^{-k}\right)$$
As this inverse filtering does not necessarily preserve the
energy, the long-term energy of the flattened signal has to
be preserved by means of a long-term gain factor g_long.
Therefore, the signal 314 is input into the first energy
calculator 304 and the flattened signal 316 is input into
the second energy calculator 306, where the energies of the
signal E and of the flattened signal E_flat are computed as
follows:

$$E = \sum_{k=0}^{2047} x^{2}(k), \qquad E_{flat} = \sum_{k=0}^{2047} x_{flat}^{2}(k)$$

where the current segment length for spectral envelope
estimation and inverse filtering is 2048 samples.
Hence, the gain factor g_long can be computed by the gain
calculator 308 using the following equation:

$$g_{long} = \sqrt{\frac{E}{E_{flat}}}$$
By multiplying the flattened signal 316 by the derived
gain factor g_long, energy preservation can be assured by
the gain applier 310. To ensure a smooth transition
between neighbouring signal segments, in a preferred
embodiment, the gain factor g_long is applied to the flattened
signal 316 using a window function. Thus, a jump in the
loudness of the signal, which would heavily disturb the
perceptual quality of the audio signal, can be avoided.
The long-term gain factor g_long can for example be applied
according to Fig. 5b. Fig. 5b shows a possible window
function in a graph, where the number of samples is drawn on
the x-axis, whereas the gain factor g is plotted on the
y-axis. A window spanning the entire frame of 2048 samples
is used, fading out the gain value 319 of the previous frame
and fading in the gain value 320 of the present frame.
Applying inverse filters 300 within the inventive
apparatus for processing a decorrelated signal 200 assures that
the signals after the inverse filters are spectrally
flattened while the energy of the input signals is
preserved.
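The energy-preserving gain and its windowed application can be sketched in Python as below. The sketch assumes that the long-term gain is the square root of the energy ratio between the input and the flattened frame, and it uses a linear ramp across the frame as the cross-fade; the actual window shape of Fig. 5b is not specified here, so the ramp and the function names are assumptions.

```python
import math


def long_term_gain(x, x_flat):
    """g_long = sqrt(E / E_flat) over one frame; 1.0 for a silent frame."""
    e = sum(s * s for s in x)
    e_flat = sum(s * s for s in x_flat)
    return math.sqrt(e / e_flat) if e_flat > 0.0 else 1.0


def apply_with_crossfade(x_flat, g_prev, g_cur):
    """Fade from the previous frame's gain to the current one across the frame.

    A linear ramp is an assumption; the window of Fig. 5b may differ.
    """
    n = len(x_flat)
    out = []
    for k, s in enumerate(x_flat):
        w = (k + 1) / n                       # weight of the current gain
        out.append(s * ((1.0 - w) * g_prev + w * g_cur))
    return out


# Usage: a frame whose flattened version lost energy by a factor of four.
frame = [math.sin(0.05 * k) for k in range(2048)]
frame_flat = [0.5 * s for s in frame]
g_long = long_term_gain(frame, frame_flat)
restored = apply_with_crossfade(frame_flat, g_long, g_long)
```

When the previous and the current gain coincide, the cross-fade has no effect and the energy of the frame is restored exactly; when they differ, the gain moves gradually from one value to the other instead of jumping at the frame boundary.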
Based on the flattened wet and dry signals, the gain
factor calculation can be performed by the gain calculator
230. This shall be explained in more detail within the
following paragraphs, where a windowing function is
additionally introduced to assure a smooth transition of
the gain factors used to scale neighbouring signal
segments. In the example shown in Fig. 6, the gain factors
calculated for neighbouring segments are valid for 64
samples each, wherein they are additionally scaled by a
windowing function win(k). The energies within the single
segments are calculated according to the following formulas,
where N denotes the segment number within the long-term
segment used for spectral flattening, i.e. a segment
having 2048 samples:

$$E_{wet}(N) = \sum_{k=0}^{63} \left( x_{wet,flat}(k_N + k)\, win(k) \right)^{2}$$

$$E_{dry}(N) = \sum_{k=0}^{63} \left( x_{dry,flat}(k_N + k)\, win(k) \right)^{2}$$

where k_N denotes the first sample of segment N.
Here, win(k) is a window function 322, as shown in Fig. 6,
that has, in this example, a length of 64 samples. In
other words, the short-time gain function is calculated
similarly to the gain calculation of the long-term gain
factor g_long, albeit over much shorter time segments. The
single gain values G_N to be applied to the single
short-time segments are then calculated by the gain calculator
230 according to:

$$G_N = \sqrt{\frac{E_{dry}(N)}{E_{wet}(N)}}$$
The gain values calculated above are applied to the wet
signal using windowed overlap-add segments as outlined in
Fig. 6. In one preferred embodiment of the present
invention the overlap-add windows are 32 samples long at a
44.1 kHz sampling rate. In another embodiment a 64 sample
window is used. As previously stated, one of the
advantageous features of implementing the present invention in
the time domain is the freedom of choice of time
resolution of the temporal envelope shaping. The windows
outlined in Fig. 6 can also be used in module 230 where the
gain values g_{n-1}, g_n, ..., g_N are being calculated.
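The short-term gain calculation and its overlap-add application can be sketched in Python as follows. Several details are assumptions made for the sake of a runnable example: segments of 64 samples overlap by half, the window is a squared sine so that overlapping windows sum to one, each gain is the square root of the windowed dry-to-wet energy ratio, and the gain curve is normalised by the window sum so that the frame edges are handled. The constant eps and the function names are hypothetical.

```python
import math


def short_term_gains(dry_flat, wet_flat, seg=64, eps=1e-12):
    """One gain per half-overlapping segment, from windowed energies."""
    hop = seg // 2
    win = [math.sin(math.pi * (k + 0.5) / seg) ** 2 for k in range(seg)]
    gains = []
    for start in range(0, len(wet_flat) - seg + 1, hop):
        e_dry = sum((win[k] * dry_flat[start + k]) ** 2 for k in range(seg))
        e_wet = sum((win[k] * wet_flat[start + k]) ** 2 for k in range(seg))
        gains.append(math.sqrt(e_dry / (e_wet + eps)))
    return gains, win, hop


def shape_wet(wet, gains, win, hop):
    """Apply the gains as a smooth, window-weighted gain curve."""
    seg = len(win)
    num = [0.0] * len(wet)
    den = [0.0] * len(wet)
    for i, g in enumerate(gains):
        for k in range(seg):
            num[i * hop + k] += g * win[k]
            den[i * hop + k] += win[k]
    return [w * (n / d if d > 0.0 else 1.0) for w, n, d in zip(wet, num, den)]


# Usage: the dry signal has twice the amplitude of the wet signal everywhere.
# For brevity the same signal serves as flattened and unflattened wet signal.
wet_flat = [math.sin(0.3 * n) + 0.1 for n in range(256)]
dry_flat = [2.0 * s for s in wet_flat]
gains, win, hop = short_term_gains(dry_flat, wet_flat)
shaped = shape_wet(wet_flat, gains, win, hop)
```

In an apparatus as in Fig. 4a the gains would be computed from the flattened signals but applied to the high-pass filtered wet signal, so the two arguments of the two functions would generally differ.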
It may be noted, that given the requirement that the en-
ergy relation between the wet and dry signals should be
maintained over the processed segment as calculated by the
upmix based on the ILD and ICC parameters, it is evident
that an average gain value averaged over the gain values
g_{n-1}, g_n, ..., g_N shall be approximately equal to one for a majority
of signals. Hence, returning to the calculation of the
long term gain adjustment, in a different embodiment of
the present invention the gain factor can be calculated as

$$g_{long} = \sqrt{\frac{1}{E_{flat}}}$$
Hence, the wet and dry signals are normalised, and the
long term energy ratio between the two is approximately
maintained.
Although the examples of the present invention detailed in
the paragraphs above are performing temporal envelope
shaping of a decorrelated signal in the time domain, it is
evident from the derivation of the wet and dry signals
above, that the temporal shaping module can be made to
operate as well on the QMF subband signal output of a
decorrelator unit prior to using the decorrelated signal for
the final upmix stage.
This is sketched in Fig. 7a. There, an incoming mono signal
400 is input into a QMF filter bank 402, deriving a
subband representation of the monophonic signal 400. Then, in a
signal processing block 404, the upmix is performed for
each subband individually. Hence, a final reconstructed
left signal 406 can be provided by a QMF synthesis block
408, and a final reconstructed right channel 410 can be
provided by a QMF synthesis block 412.
An example for a signal processing block 404 is given in
Fig. 7b. The signal processing block 404 has a
decorrelator 413, an inventive apparatus for processing a
decorrelated signal 414 and an upmixer 415.
A single subband sample 416 is input into the signal
processing block 404. The decorrelator 413 derives a
decorrelated sample from the subband sample 416, which is
input into the apparatus for processing a decorrelated
signal 414 (shaper). The shaper 414 receives a copy of
the subband sample 416 as a second input. The inventive
shaper 414 performs the temporal envelope shaping
according to the present invention and provides a shaped
decorrelated signal to a first input of the upmixer 415,
which additionally receives the subband sample 416 at a
second input. The upmixer 415 derives a left subband
sample 417 and a right subband sample 418 from both the
subband sample 416 and the shaped decorrelated sample.
By integrating multiple signal processing blocks 404 for
different subband samples, left and right subband samples
can be calculated for each subband of a filterbank domain.
In multi-channel implementations, signal processing is
normally done in the QMF domain. It is also clear, given
the above, that the final summation of the decorrelated
signal and the direct version of the signal can be done as
a final stage just prior to forming the actual recon-
structed output signal. Hence, the shaping module can also
be moved to be performed just prior to the addition of the
two signal components, provided that the shaping module
does not change the energy of the decorrelated signal as
stipulated by the ICC and ILD parameters, but only modi-
fies the short-term energies giving the decorrelated sig-
nal a temporal envelope closely matching the direct sig-
nal.
Operating the present invention in the QMF subband domain
prior to upmix and synthesis and operating the present
invention in the time domain after upmix and synthesis are
two different approaches, each having distinct
advantages and disadvantages. The former is the simplest and
requires the least amount of computation, albeit limited
to the time resolution of the filterbank it is operating
in. While the latter requires additional synthesis
filterbanks and therefore additional computational complexity,
it has complete freedom in the choice of time
resolution.

As already mentioned above, multi-channel decoders mostly
perform the signal processing in the subband domain as
shown in Fig. 8. There, a monophonic downmix signal 420,
which is a downmix of an original 5.1 channel audio signal,
is input into a QMF filterbank 421 that derives the
subband representations of the monophonic signal 420. The
actual upmix and signal reconstruction is then performed by
a signal processing block 422 in the subband domain. As a
final step, the original 5.1 channel signal, comprising a
left-front channel 424a, a right-front channel 424b, a
left-surround channel 424c, a right-surround channel 424d,
a center channel 424e and a low-frequency enhancement
channel 424f, is derived by QMF synthesis.
Fig. 9 shows a further embodiment of the present inven-
tion, where the signal shaping is shifted to the time do-
main, after the processing and the upmixing of a stereo-
phonic signal has been done within the subband domain.
A monophonic input signal 430 is input into a filterbank
432, to derive the multiple subband representations of the
monophonic signal 430. The signal processing and upmixing
of the monophonic signal into 4 signals is done by a sig-
nal processing block 434, deriving subband representations
of a left dry signal 436a, a left wet signal 436b, a right
dry signal 438a and a right wet signal 438b. After a QMF
synthesis 440, a final left signal 442 can be derived from
the left dry signal 436a and the left wet signal 436b us-
ing an inventive apparatus for processing a decorrelated
signal 200, operative in the time domain. In the same way,
a final right signal 444 can be derived from the right dry
signal 438a and the right wet signal 438b.
As mentioned before, the present invention is not limited
to operation on a time domain signal. The inventive
feature of long-term spectral flattening in combination with
the short-term energy estimation and adjustment can also
be implemented in a subband filterbank. In the previously
shown examples, a QMF filterbank is used; however, it
should be understood that the invention is by no means
limited to this particular filterbank representation.
As in the time domain implementation of the present
invention, the signals used for estimation of the temporal
envelope, i.e. the dry signal and the decorrelated signal
going into the processing unit, are high-pass filtered, in
the case of a QMF filterbank representation by means of
setting QMF subbands to 0 in the lower-frequency range.
The following paragraphs exemplify the use of the
inventive concept in a QMF subband domain, where m denotes the
subband, i.e. a frequency range of the original signal,
and n denotes the sample number within the subband
representation, and where the signal subband used for the
long-term spectral flattening comprises N samples.
Now assuming that

$$E_{dry}(m,n) = Q_{dry}(m,n)\,Q_{dry}^{*}(m,n), \qquad E_{wet}(m,n) = Q_{wet}(m,n)\,Q_{wet}^{*}(m,n)$$

where Q_dry(m,n) and Q_wet(m,n) are the QMF subband matrices
holding the dry and the wet signal, and where E_dry(m,n) and
E_wet(m,n) are the corresponding energies for all subband
samples. Here, m denotes the subband, starting at m_start,
which is chosen to correspond to approximately 2 kHz, and n is
the subband sample index running from zero to N - 1, where N
is the number of subband samples within a frame, which is 32
in one preferred embodiment, corresponding to approximately
20 ms.
For both energy matrices above the spectral envelope is
calculated as an average over all subband samples in the
frame. This corresponds to the long term spectral enve-
lope.



Furthermore, the mean total energy over the frame is cal-
culated according to:

Based on the equations above, a flattening gain curve can
be calculated for the two matrices:

By applying the gain curve calculated above to the energy
matrices for the wet and dry signal, long term spectrally
flat energy matrices are obtained according to:

The above energy matrices are used to calculate and apply
the temporal envelope of the wet signal using the highest
time resolution available in the QMF domain.
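The subband-domain flattening steps described above can be read as the following Python sketch. It is one plausible reading rather than the formulas of the embodiment: the envelope is taken as the per-subband mean of the energies over the frame, the mean total energy as the mean of that envelope over the subbands, and the flattening gain as their ratio. The names envelope, mean_energy and flatten are hypothetical, and only subbands from m_start upwards are assumed to be passed in.

```python
def energies(Q):
    """E(m, n) = Q(m, n) * conj(Q(m, n)) for complex subband samples."""
    return [[abs(q) ** 2 for q in row] for row in Q]


def flatten(E):
    """Return the flattening gain curve g(m) and the flattened energy matrix."""
    M, N = len(E), len(E[0])
    envelope = [sum(row) / N for row in E]        # long-term spectral envelope
    mean_energy = sum(envelope) / M               # mean total energy of the frame
    gain = [mean_energy / e if e > 0.0 else 1.0 for e in envelope]
    E_flat = [[gain[m] * E[m][n] for n in range(N)] for m in range(M)]
    return gain, E_flat


# Usage: two subbands with the same temporal shape but a steep spectral slope.
Q_dry = [[10 + 0j, 0 + 20j, 30 + 0j],
         [1 + 0j, 0 + 2j, 3 + 0j]]
E_dry = energies(Q_dry)
g_dry, E_dry_flat = flatten(E_dry)
```

After flattening, every subband contributes the same long-term energy, while the variation over the slots within each subband, which carries the temporal envelope, is left untouched. Summing the flattened energies over m for each slot n would then give short-term energies for the dry and wet signals from which per-slot gains can be derived.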

From the above description of the present invention
implemented in the subband domain, it is clear that the
inventive step of doing the long term spectral whitening in
combination with short term time envelope estimation, or
short time energy estimation/adjustment, is not limited to
usage of LPC in the time domain.
In a further embodiment of the present invention, temporal
envelope shaping is used in the subband domain in the fre-
quency direction, to perform the inventive spectral flat-
tening, before applying temporal envelope shaping to the
wet signal.
It is known from prior art that a signal represented in the
frequency domain with low time resolution can be time en-
velope shaped by filtering in the frequency direction of
the frequency representation of the signal. This is used
in perceptual audio codecs to shape introduced quantiza-
tion noise of a signal represented in a long transform [J.
Herre and J. D. Johnston, "Enhancing the performance of
perceptual audio coding by using temporal noise shaping
(TNS)," in 101st AES Convention, Los Angeles, November
1996.].
Assuming a QMF filterbank with 64 channels and a prototype
filter of 640 samples, it is evident that the time resolu-
tion of the QMF subband representation is not as high as
when the temporal shaping is done in the time domain on
windows in the ms range. One way of shaping a signal in
the QMF domain with higher time resolution than natively
available in the QMF, is to do linear prediction in the
frequency direction. Hence, observing the dry signal in
the QMF domain above for a certain QMF slot, i.e. for a
subband sample n,



is the polynomial obtained using the autocorrelation
method or the covariance method. Again, it is important to
note that, contrary to LPC in the time domain as outlined
earlier, the linear predictor estimated here is devised to
predict the complex QMF subband samples in the frequency
direction.
In Fig. 10, the time/frequency matrix of the QMF is
displayed. Every column corresponds to a QMF time slot, i.e.
a subband sample. The rows correspond to the subbands. As
is indicated in the figure, the estimation and application
of the linear predictor takes place independently within
every column. Furthermore, one column outlined in Fig. 10
corresponds to one frame being processed. The frame size
over which the whitening gain curves g_wet(m) and g_dry(m)
are estimated is also indicated in the figure. A frame size
of 12 would, for example, mean processing 12 columns
simultaneously.
In the previously described embodiment of the present in-
vention, the linear prediction in the frequency direction
is done in a complex QMF representation of the signal.
Again, assuming a QMF filterbank with 64 channels and a
prototype filter of 640 samples, and keeping in mind that
the predictor operates on a complex signal, a very low
order complex predictor is sufficient to track the tempo-
ral envelope of the signal within the QMF slot where the
predictor is applied. A preferred choice is predictor or-
der 1.
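By way of illustration only, a first-order complex predictor operating in the frequency direction within a single QMF slot may be sketched in Python as follows. The function names, the use of the autocorrelation method with the convention that subband m is predicted as a times subband m-1, and the synthetic test column are assumptions of this sketch.

```python
import cmath

def predictor_order1(column):
    """First-order complex predictor across frequency for one QMF slot.

    column[m] is the complex subband sample of subband m in the slot.
    Autocorrelation method: a = r(1) / r(0), so that column[m] is
    predicted by a * column[m - 1].
    """
    r0 = sum(abs(x) ** 2 for x in column)
    r1 = sum(column[m] * column[m - 1].conjugate()
             for m in range(1, len(column)))
    return r1 / r0 if r0 > 0.0 else 0j

def prediction_error(column, a):
    """Residual column[m] - a * column[m - 1], i.e. inverse filtering
    in the frequency direction. The first subband has no predecessor."""
    out = [column[0]]
    out.extend(column[m] - a * column[m - 1] for m in range(1, len(column)))
    return out

# Synthetic 64-subband column that follows an exact first-order model.
c = 0.5 * cmath.exp(0.3j)
column = [c ** m for m in range(64)]
a = predictor_order1(column)
residual = prediction_error(column, a)
```

For this synthetic column the estimated coefficient is essentially equal to c and the residual vanishes for all subbands but the first, which is the behaviour expected of a predictor that captures the structure of the slot along the frequency axis.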
The estimated filter H_n corresponds to the temporal
envelope of a QMF signal for the specific subband sample,
i.e. a temporal envelope not available by just observing
the subband sample (since only one sample is available).
This sub-sample temporal envelope can be applied to the
Q_wet signal by filtering the signal in the frequency
direction through the estimated filter, according to:

where m is the QMF slot, or subband sample, used for pre-
dictor estimation, and undergoing temporal shaping.
Although the wet signal produced by the decorrelator has a
very flat temporal envelope, it is recommended to first
remove any temporal envelope from the wet signal prior to
applying that of the dry signal. This can be achieved by
doing the same temporal envelope estimation using linear
prediction in the frequency direction as outlined above,
albeit on the wet signal, and using the filter obtained to
inverse filter the wet signal, thus removing any temporal
envelope, prior to applying the temporal envelope of the
dry signal.
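By way of illustration only, the two-stage processing of one wet column (removal of its own envelope, then application of the dry envelope) may be sketched in Python for the first-order case. The function names, the coefficient values, and the choice of an all-pole filter in the frequency direction for applying the dry envelope are assumptions of this sketch.

```python
def inverse_filter(column, a):
    """Remove the envelope described by predictor coefficient a:
    e[m] = x[m] - a * x[m - 1], running over the subband index m."""
    out = []
    prev = 0j
    for x in column:
        out.append(x - a * prev)
        prev = x
    return out

def synthesis_filter(column, a):
    """Impose the envelope described by a (assumed all-pole form):
    y[m] = e[m] + a * y[m - 1], running over the subband index m."""
    out = []
    prev = 0j
    for e in column:
        y = e + a * prev
        out.append(y)
        prev = y
    return out

def shape_wet_column(wet_column, a_wet, a_dry):
    """Flatten the wet column with its own predictor, then apply the
    predictor estimated on the dry signal."""
    return synthesis_filter(inverse_filter(wet_column, a_wet), a_dry)

# Hypothetical wet column following a first-order model with a = 0.5j.
wet = [1 + 0j, 0.5j, -0.25 + 0j, -0.125j]
flat = inverse_filter(wet, 0.5j)
# With identical wet and dry coefficients the column is unchanged.
same = shape_wet_column(wet, 0.3 + 0.1j, 0.3 + 0.1j)
```

When the wet and dry coefficients coincide, the two filters cancel and the wet column is returned unchanged; the shaping therefore only acts where the envelopes of the two signals differ.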
For the temporal envelope of the wet signal to match that
of the dry signal as closely as possible, it is important
that the estimate of the temporal envelope of the dry
signal, derived by means of the linear predictor in the
frequency direction, is as good as possible. The present
invention teaches that the dry signal should undergo long
term spectral flattening prior to the estimation of its
temporal envelope by means of linear prediction. Hence, the
previously calculated gain curve

should be applied to the dry signal used for temporal en-
velope estimation according to:


where n denotes the QMF slots, and m denotes the subband
index. It is evident that the gain correction curve is the
same for all subband samples within the present frame being
processed. This is obvious since the gain curve corresponds
to the required frequency selective gain adjustment in
order to remove the long term spectral envelope. The
obtained complex QMF representation is used for estimating
the temporal envelope filter using linear prediction as
outlined above.
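By way of illustration only, applying a per-subband gain curve to the complex dry QMF matrix of one frame may be sketched in Python as follows. The function name and the example values are hypothetical, and the sketch assumes that the gain curve is given in the energy domain, so that its square root is applied to the complex samples.

```python
import math

def flatten_dry_qmf(Q_dry, energy_gain):
    """Scale each subband of the complex QMF matrix by the square root
    of its energy-domain flattening gain.

    Q_dry[m][n] is the complex sample of subband m at slot n; the same
    gain is used for all slots n of the frame.
    """
    return [[math.sqrt(g) * q for q in row]
            for g, row in zip(energy_gain, Q_dry)]

# Two subbands, two slots; hypothetical gain curve for this example.
Q_dry = [[2 + 0j, 2j], [1 + 0j, -1 + 0j]]
energy_gain = [0.25, 1.0]
Q_flat = flatten_dry_qmf(Q_dry, energy_gain)
```

In this example both subbands have the same energy in every slot after scaling, so a predictor estimated along the frequency axis is no longer biased by the stronger first subband.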
The additional time resolution offered by the LPC filter-
ing aims at shaping the wet signal for transient dry sig-
nals. However, due to the use of a limited dataset of one
QMF slot for the LPC estimation there is still a risk that
fine temporal shaping is applied in a chaotic fashion. To
reduce this risk while keeping the performance for tran-
sient dry signals, the LPC estimation can be smoothed over
a few time slots. This smoothing has to take into consid-
eration the evolution over time of the frequency direction
covariance structure of the applied filter bank's analysis
of an isolated transient event. Specifically, in the case
of first order prediction and an oddly stacked complex
modulated filter bank with a total oversampling factor of
two, the smoothing taught by this invention consists of
the following modification of the prediction coefficient
a_n used in time slot n,

where d≥1 defines the prediction block size in the time
direction.
Fig. 11 shows a transmission system for a 5.1 input channel
configuration, having a 5.1 channel encoder 600 that
downmixes the 6 original channels into a downmix 602, which
can be monophonic or comprise several discrete channels,
and additional spatial parameters 604. The downmix 602 is
transmitted to the audio decoder 610 together with the
spatial parameters 604.
The decoder 610 has one or more inventive apparatuses for
processing a decorrelated signal to perform an upmix of the
downmix signal 602, including the inventive temporal
shaping of the decorrelated signals. Thus, in such a
transmission system, application of the inventive concept
on the decoder side leads to an improved perceptual quality
of the reconstructed 5.1 channel signal.
The above-described embodiments of the present invention
are merely illustrative for the principles of the present
invention and for methods for improved temporal shaping of
decorrelated signals. It is understood that modifications
and variations of the arrangements and the details de-
scribed herein will be apparent to others skilled in the
art. It is the intent, therefore, to be limited only by the
scope of the appended patent claims and not by the specific
details presented by way of description and explanation of
the embodiments herein. It is also understood that the
explanation of the present invention is carried out by
means of two-channel and 5.1 channel examples, while it
is obvious to others skilled in the art that the same
principles apply for arbitrary channel configurations and,
hence, the present invention is not limited to a specific
channel configuration or embodiment with a specific number
of in-/output channels. The present invention is applica-
ble to any multi-channel reconstruction that utilises a
decorrelated version of a signal and, hence, it is fur-
thermore evident to those skilled in the art that the in-
vention is not limited to the particular way of doing
multi-channel reconstruction used in the exemplifications
above.
In short, the present invention primarily relates to
multi-channel reconstruction of audio signals based on an
available downmix signal and additional control data.
Spatial parameters representing the multi-channel
characteristics, given a downmix of the original channels,
are extracted on the encoder side. The downmix signal and
the spatial representation are used in a decoder to
recreate a closely resembling representation of the
original multi-channel signal, by means of distributing a
combination of the downmix signal and a decorrelated
version of the same to the channels being reconstructed.
The invention is applicable in systems where a backwards
compatible downmix signal is desirable, such as stereo
digital radio transmission (DAB, XM satellite radio, etc.),
but also in systems that require a very compact
representation of the multi-channel signal.
The flattening of the spectrum was performed by inverse
filtering based on filter coefficients derived by LPC
analysis in the examples described above. It is understood
that any further operation yielding a signal with a flat-
tened spectrum is suited to be implemented to build a fur-
ther embodiment of the present invention. The application
would result in a reconstructed signal having the same ad-
vantageous properties.
Within a multi-channel audio decoder the place in the sig-
nal path, where the present invention is applied, is ir-
relevant for the inventive concept of improving the per-
ceptual quality of a reconstructed audio signal using an
inventive apparatus for processing a decorrelated signal.
Although, in a preferred embodiment, only a high-pass fil-
tered part of the wet signal is envelope-shaped according
to the present invention, the present invention may also
be applied on a wet signal having the full spectrum.
The windowing functions used to apply gain corrections to
the long-term spectrally flattened signals, as well as to
the short-term envelope shaping gain factors, are to be
understood as examples only. It is evident that other
window functions may be used that allow for a smooth
transition of gain functions between neighbouring segments
of the signal to be processed.
Depending on certain implementation requirements of the
inventive methods, the inventive methods can be imple-
mented in hardware or in software. The implementation can
be performed using a digital storage medium, in particular
a disk, DVD or a CD having electronically readable control
signals stored thereon, which cooperate with a programma-
ble computer system such that the inventive methods are
performed. Generally, the present invention is, therefore,
a computer program product with a program code stored on a
machine readable carrier, the program code being operative
for performing the inventive methods when the computer
program product runs on a computer. In other words, the
inventive methods are, therefore, a computer program hav-
ing a program code for performing at least one of the in-
ventive methods when the computer program runs on a com-
puter.
While the foregoing has been particularly shown and de-
scribed with reference to particular embodiments thereof,
it will be understood by those skilled in the art that
various other changes in the form and details may be made
without departing from the spirit and scope thereof. It is
to be understood that various changes may be made in adapt-
ing to different embodiments without departing from the
broader concepts disclosed herein and comprehended by the
claims that follow.

We Claim:
1. Apparatus (200) for processing a decorrelated signal (202) derived from
an original signal (204) or a combination signal derived by combining the
original signal (204) and the decorrelated signal (202), comprising:
a spectral flattener (206, 208, 210, 220, 222, 224) for spectral flattening
the decorrelated signal (202), a signal derived from the decorrelated
signal (202) or the combination signal to obtain a first flattened signal,
and for spectral flattening the original signal (204) or a signal derived
from the original signal (204) to obtain a second flattened signal, the
spectral flattener (206, 208, 210, 220, 222, 224) being operative such
that a flattened signal has a flatter spectrum than a corresponding signal
before flattening;
characterized by
a time envelope shaper (232) for time envelope shaping the decorrelated
signal or the combination signal and using information on the first
flattened signal and the second flattened signal.

2. Apparatus in accordance with claim 1, in which the time envelope shaper
(232) is operative to shape the time envelope of the decorrelated signal
or the combination signal using a gain factor.
3. Apparatus in accordance with claim 1 or 2, in which the time envelope
shaper (232) is operative to shape the time envelope of the decorrelated
signal or the combination signal using a gain factor derived by comparing
(230) the energies comprised within corresponding portions of the first
flattened signal and the second flattened signal.
4. Apparatus in accordance with claims 1 to 3, in which the spectral flattener
(206, 208, 210, 220, 222, 224) is operative to derive the second flattened
signal from the original signal (204).
5. Apparatus in accordance with claims 1 to 3, in which the spectral
flattener (206, 208, 210, 220, 222, 224) is operative to derive the second
flattened signal from the signal derived from the original signal.
6. Apparatus in accordance with claim 1, in which the spectral flattener
(206, 208, 210, 220, 222, 224) is operative to flatten a first portion of the
decorrelated signal (202) or the combination signal; and

in which the time envelope shaper is operative to shape a second portion
of the decorrelated signal (202) or the combined signal, wherein the
second portion is included in the first portion.
7. Apparatus in accordance with claim 6, in which the size of the first portion
is more than 10 times the size of the second portion.
8. Apparatus in accordance with claims 1 to 7, in which the spectral flattener
(206, 208, 210, 220, 222, 224) is operative to flatten the spectrum by
means of filtering (210, 224) using filter coefficients derived by linear
predictive coding (208, 222).
9. Apparatus in accordance with claim 8, in which the spectral flattener
(206, 208, 210, 220, 222, 224) is operative to flatten the spectrum by
means of filtering using filtering coefficients derived using linear
prediction in the time direction (208, 222).
10. Apparatus in accordance with claim 1, in which the spectral flattener
(206, 208, 210, 220, 222, 224) is operative to obtain a spectrally
flattened representation of a signal in the time domain.

11. Apparatus in accordance with claim 1, in which the spectral flattener
(206, 208, 210, 220, 222, 224) is operative to obtain a spectrally
flattened representation of a signal in a subband domain.
12. Apparatus in accordance with claim 1, in which the spectral flattener
(206, 208, 210, 220, 222, 224) and the time envelope shaper (232) are
operative to process all frequencies of a full spectrum decorrelated signal
that are above a given frequency threshold.
13. Method for processing a decorrelated signal (202) derived from an
original signal (204) or a combination signal derived by combining the
original signal (204) and the decorrelated signal (202), the method
comprising:
spectrally flattening (206, 208, 210, 220, 222, 224) the decorrelated
signal (202), a signal derived from the decorrelated signal (202) or the
combination signal to obtain a first flattened signal, and spectrally
flattening the original signal (204) or a signal derived from the original
signal (204) to obtain a second flattened signal, a flattened signal having
a flatter spectrum than a corresponding signal before flattening;

characterized by
time envelope shaping (232) the decorrelated signal (202) or the
combination signal using information on the first flattened signal and
second flattened signal.
14. Spatial audio decoder, characterized by:
an input interface for receiving an original signal derived from a multi
channel signal having at least two channels and for receiving spatial
parameters describing an interrelation between a first channel and a
second channel of the multi channel signal;
a decorrelator (112) for deriving a decorrelated signal (202) from the
original signal (204) using the spatial parameters;
a spectral flattener (206, 208, 210, 220, 222, 224) for spectral flattening
the decorrelated signal (202), a signal derived from the decorrelated
signal (202) or a combination signal derived by combining the original
signal (204) and the decorrelated signal (202) to obtain a first flattened
signal, and for spectral flattening the original signal (204) or a signal
derived from the original signal (204) to obtain a second flattened signal,
the spectral flattener being operative such that a flattened signal has a
flatter spectrum than a corresponding signal before flattening; and

a time envelope shaper (232) for time envelope shaping the decorrelated
signal (202) or the combination signal and using information on the first
flattened signal and the second flattened signal.
15. Receiver or audio player, having an apparatus for processing a
decorrelated signal in accordance with claim 1.
16. Method of receiving or audio playing, the method having a method for
processing a decorrelated signal in accordance with claim 13.






Patent Number: 257035
Indian Patent Application Number: 2857/KOLNP/2007
PG Journal Number: 35/2013
Publication Date: 30-Aug-2013
Grant Date: 28-Aug-2013
Date of Filing: 06-Aug-2007
Name of Patentee: FRAUNHOFER-GESELLSCHAFT ZUR FORDERUNG DER ANGEWANDTEN FORSCHUNG E.V
Applicant Address: HANSASTRASSE 27C 80686 MUNCHEN

Inventors:
#  Inventor's Name      Inventor's Address
1  KRISTOFER KJOERLING  LOSTIGEN 10, 170 75 SOLNA, SWEDEN
2  JUERGEN HERRE        HALLERSTRASSE 24 91054 BUCKENHOF GERMANY
3  SASCHA DISCH         TURNSTRASSE 7 90763 FUERTH, GERMANY
4  LARS VILLEMOES       MANDOLINVAEGEN 22 17556 JAERFAELLA SWEDEN

PCT International Classification Number: H04S 3/00
PCT International Application Number: PCT/EP2006/003097
PCT International Filing date: 2006-04-05

PCT Conventions:
#  PCT Application Number  Date of Convention  Priority Country
1  60/671,583              2005-04-15          U.S.A.