Title of Invention	" AN ENCODER FOR GENERATING A PARAMETRIC REPRESENTATION OF AN AUDIO SIGNAL"
Abstract	The invention relates to an encoder for generating a parametric representation (314, 316, 318, 320) of an audio signal having at least two original left channels (224a, 224b) on a left side and two original right channels (224c, 224d) on a right side with respect to a listening position. The encoder comprises a generator (220) for generating parametric information, the generator being operative to separately process several pairs of channels to derive level information (230a, 230b) for the processed channel pairs, and to derive coherence information (232a, 232b) for a channel pair including a first channel (228a) only having information from the left side and a second channel (228b) only having information from the right side. The encoder further comprises a provider (222) for providing the parametric representation (238; 314, 316, 318, 320) by selecting the level information (230a, 230b) for channel pairs, by determining a left/right coherence measure (236) using the coherence information (232a, 232b), and by introducing the left/right coherence measure (236) into an output datastream as the only coherence information of the audio signal within the parametric representation (238; 314, 316, 318, 320).

Full Text	Multi-Channel Hierarchical Audio Coding with Compact Side Information
Field of the invention
The present invention relates to multi-channel audio processing and, in particular, to the generation and the use of compact parametric side information to describe the spatial properties of a multi-channel audio signal.
Background of the invention and prior art
In recent times, multi-channel audio reproduction techniques have become more and more important. This may be due to the fact that audio compression/encoding techniques such as the well-known mp3 technique have made it possible to distribute audio records via the Internet or other transmission channels having a limited bandwidth. The mp3 coding technique has become so famous because it allows distribution of all the records in a stereo format, i.e., a digital representation of the audio record including a first or left stereo channel and a second or right stereo channel.
Nevertheless, there are basic shortcomings of conventional two-channel sound systems. Therefore, the surround technique has been developed. A recommended multi-channel surround presentation format includes, in addition to the two stereo channels L and R, an additional center channel C and two surround channels Ls, Rs. This reference sound format is also referred to as three/two-stereo, which means three front channels and two surround channels. In a playback environment, at least five speakers at five appropriate locations are needed to get an optimum sweet spot at a certain distance from the five well-placed loudspeakers.
Recent approaches for the parametric coding of multi-channel audio signals (parametric stereo (PS), "spatial audio coding", "binaural cue coding" (BCC) etc.) represent a multi-channel audio signal by means of a downmix signal (which could be monophonic or comprise several channels) and parametric side information ("spatial cues") characterizing its perceived spatial sound stage.
The different approaches and techniques shall be reviewed briefly in the following paragraphs.
A related technique, also known as parametric stereo, is described in J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, "High-Quality Parametric Spatial Audio Coding at Low Bitrates", AES 116th Convention, Berlin, Preprint 6072, May 2004, and E. Schuijers, J. Breebaart, H. Purnhagen, J. Engdegard, "Low Complexity Parametric Stereo Coding", AES 116th Convention, Berlin, Preprint 6073, May 2004.
Several techniques are known in the art for reducing the amount of data required for transmission of a multi-channel audio signal. To this end, reference is made to Fig. 11, which shows a joint stereo device 60. This device can be a device implementing e.g. intensity stereo (IS) or binaural cue coding (BCC). Such a device generally receives - as an input - at least two channels (CH1, CH2, ... CHn), and outputs a single carrier channel and parametric data. The parametric data are defined such that, in a decoder, an approximation of an original channel (CH1, CH2, ... CHn) can be calculated.
Normally, the carrier channel will include subband samples, spectral coefficients, time domain samples etc., which provide a comparatively fine representation of the underlying signal, while the parametric data do not include such samples of spectral coefficients but include control parameters for controlling a certain reconstruction algorithm such as weighting by multiplication, time shifting, frequency shifting, phase shifting, etc. The parametric data, therefore, include only a comparatively coarse representation of the signal or the associated channel. Stated in numbers, the amount of data required by a carrier channel can be in the range of 60 - 70 kbit/s in an MPEG coding scheme, while the amount of data required by parametric side information for one channel may be in the range of about 10 kbit/s for a 5.1 channel signal.
Examples of parametric data are the well-known scale factors, intensity stereo information or binaural cue parameters, as will be described below.
The BCC technique is for example described in the AES convention paper 5574, "Binaural Cue Coding applied to Stereo and Multi-Channel Audio Compression", C. Faller, F. Baumgarte, May 2002, Munich, in the IEEE WASPAA paper "Efficient representation of spatial audio using perceptual parametrization", October 2001, Mohonk, NY, and in the two ICASSP papers "Estimation of auditory spatial cues for binaural cue coding" and "Binaural cue coding: a novel and efficient representation of spatial audio", both authored by C. Faller and F. Baumgarte, Orlando, FL, May 2002.
In BCC encoding, a number of audio input channels are converted to a spectral representation using a DFT (Discrete Fourier Transform) based transform with overlapping windows. The resulting spectrum is divided into non-overlapping partitions. Each partition has a bandwidth proportional to the equivalent rectangular bandwidth (ERB). The inter-channel level differences (ICLD) and the inter-channel time differences (ICTD) are estimated for each partition. The ICLD and ICTD values are normally given for each channel with respect to a reference channel and are furthermore quantized. The transmitted parameters are finally calculated (encoded) in accordance with prescribed formulae, which may depend on the specific partitions of the signal to be processed.
At the decoder side, the decoder receives a mono signal and the BCC bit stream. The mono signal is transformed into the frequency domain and input into a spatial synthesis block, which also receives decoded ICLD and ICTD values.
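As a rough illustration of the per-partition level analysis used in BCC encoding, the following Python sketch computes ICLD values against a reference channel. It assumes that the partition energies have already been obtained from the DFT spectrum and that a level difference is expressed as a power ratio in dB; the function name `icld_db`, the data layout and the `eps` guard are hypothetical and are not taken from the cited publications.

```python
import math

def icld_db(partition_energies, ref=0, eps=1e-12):
    """Level difference of each non-reference channel against the reference.

    partition_energies: one list per channel, holding the energy of that
    channel in every spectral partition (assumed to be precomputed).
    Returns {channel_index: [ICLD in dB for each partition]}.
    """
    ref_energies = partition_energies[ref]
    result = {}
    for ch, energies in enumerate(partition_energies):
        if ch == ref:
            continue  # the reference channel carries no level difference
        result[ch] = [
            10.0 * math.log10((e + eps) / (r + eps))  # eps avoids log(0)
            for e, r in zip(energies, ref_energies)
        ]
    return result

# Two channels, two partitions: channel 1 is 10 dB above the reference in
# the first partition and level with it in the second one.
example = icld_db([[1.0, 4.0], [10.0, 4.0]])
```

Quantization and the ICTD estimation mentioned in the text are deliberately left out of this sketch.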
In the spatial synthesis block, the BCC parameters (ICLD and ICTD values) are used to perform a weighting operation on the mono signal in order to synthesize the multi-channel signals, which, after a frequency/time conversion, represent a reconstruction of the original multi-channel audio signal.
In the case of BCC, the joint stereo module 60 is operative to output the channel side information such that the parametric channel data are quantized and encoded, resulting in ICLD or ICTD parameters, wherein one of the original channels is used as the reference channel while coding the channel side information.
Normally, the carrier channel is formed of the sum of the participating original channels. Therefore, the above techniques additionally provide a suitable mono representation for playback equipment that can only process the carrier channel and is not able to process the parametric data for generating one or more approximations of more than one input channel.
The audio coding technique known as binaural cue coding (BCC) is also well described in the United States patent application publications US 2003/0219130 A1, 2003/0026441 A1 and 2003/0035553 A1. Additional reference is also made to "Binaural Cue Coding. Part II: Schemes and Applications", C. Faller and F. Baumgarte, IEEE Trans. on Audio and Speech Proc., Vol. 11, No. 6, Nov. 2003 and to "Binaural cue coding applied to audio compression with flexible rendering", C. Faller and F. Baumgarte, AES 113th Convention, Los Angeles, October 2002. The cited United States patent application publications and the two cited technical publications on the BCC technique authored by Faller and Baumgarte are incorporated herein by reference in their entireties.
Although ICLD and ICTD parameters represent the most important sound source localization parameters, a spatial representation using these parameters only limits the maximum quality that can be achieved.
To overcome this limitation, and hence to enable high-quality parametric coding, parametric stereo (as described in J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers (2005) "Parametric coding of stereo audio", Eurasip J. Applied Signal Proc. 9, 1305-1322) applies three types of spatial parameters, referred to as Interchannel Intensity Differences (IIDs), Interchannel Phase Differences (IPDs), and Interchannel Coherence (IC). The extension of the spatial parameter set with coherence parameters enables a parameterization of the perceived spatial 'diffuseness' or spatial 'compactness' of the sound stage.
In the following, a typical generic BCC scheme for multi-channel audio coding is elaborated in more detail with reference to Figures 12 to 14. Figure 12 shows such a generic binaural cue coding scheme for coding/transmission of multi-channel audio signals. The multi-channel audio input signal at an input 110 of a BCC encoder 112 is downmixed in a downmix block 114. In the present example, the original multi-channel signal at the input 110 is a 5-channel surround signal having a front left channel, a front right channel, a left surround channel, a right surround channel and a center channel. In a preferred embodiment of the present invention, the downmix block 114 produces a sum signal by a simple addition of these five channels into a mono signal. Other downmixing schemes are known in the art such that, using a multi-channel input signal, a downmix signal having a single channel can be obtained. This single channel is output at a sum signal line 115. Side information obtained by a BCC analysis block 116 is output at a side information line 117. In the BCC analysis block, inter-channel level differences (ICLD) and inter-channel time differences (ICTD) are calculated as has been outlined above. The BCC analysis block 116 is formed to also calculate inter-channel correlation values (ICC values).
The sum signal and the side information are transmitted, preferably in a quantized and encoded form, to a BCC decoder 120. The BCC decoder decomposes the transmitted sum signal into a number of subbands and applies scaling, delays and other processing to generate the subbands of the output multi-channel audio signals. This processing is performed such that the ICLD, ICTD and ICC parameters (cues) of a reconstructed multi-channel signal at an output 121 are similar to the respective cues for the original multi-channel signal at the input 110 of the BCC encoder 112. To this end, the BCC decoder 120 includes a BCC synthesis block 122 and a side information processing block 123.
In the following, the internal construction of the BCC synthesis block 122 is explained with reference to Fig. 13. The sum signal on line 115 is input into a time/frequency conversion unit or filter bank FB 125. At the output of block 125, a number N of subband signals is present or, in an extreme case, a block of spectral coefficients, when the audio filter bank 125 performs a 1:1 transform, i.e., a transform which produces N spectral coefficients from N time domain samples (critical subsampling).
The BCC synthesis block 122 further comprises a delay stage 126, a level modification stage 127, a correlation processing stage 128 and an inverse filter bank stage IFB 129. At the output of stage 129, the reconstructed multi-channel audio signal, having for example five channels in the case of a 5-channel surround system, can be output to a set of loudspeakers 124 as illustrated in Fig. 12.
As shown in Fig. 13, the input signal s(n) is converted into the frequency domain or filter bank domain by means of element 125. The signal output by element 125 is multiplied such that several versions of the same signal are obtained, as illustrated by branching node 130. The number of versions of the original signal is equal to the number of output channels in the output signal to be reconstructed.
In general, each version of the original signal at node 130 is subjected to a certain delay d1, d2, ..., di, ..., dN. The delay parameters are computed by the side information processing block 123 in Fig. 12 and are derived from the inter-channel time differences as determined by the BCC analysis block 116.
The same is true for the multiplication parameters a1, a2, ..., ai, ..., aN, which are also calculated by the side information processing block 123 based on the inter-channel level differences as calculated by the BCC analysis block 116.
The ICC parameters calculated by the BCC analysis block 116 are used for controlling the functionality of block 128 such that certain correlations between the delayed and level-manipulated signals are obtained at the outputs of block 128. It is to be noted here that the ordering of the stages 126, 127, 128 may be different from the case shown in Fig. 13.
One should be aware that, in a frame-wise processing of an audio signal, the BCC analysis is also performed frame-wise, i.e. time-varying, and also frequency-wise. This means that, for each spectral band, the BCC parameters are obtained individually. This further means that, in case the audio filter bank 125 decomposes the input signal into, for example, 32 band pass signals, the BCC analysis block obtains a set of BCC parameters for each of the 32 bands. Naturally, the BCC synthesis block 122 from Fig. 12, which is shown in detail in Fig. 13, performs a reconstruction which is also based on the 32 bands in the example.
In the following, reference is made to Fig. 14 showing a setup to determine certain BCC parameters. Normally, ICLD, ICTD and ICC parameters can be defined between arbitrary pairs of channels. One method, which will be outlined here, consists of defining ICLD and ICTD parameters between a reference channel and each other channel. This is illustrated in Fig. 14A.
ICC parameters can be defined in different ways.
Most generally, one could estimate ICC parameters in the encoder between all possible channel pairs, as indicated in Fig. 14B. In this case, a decoder would synthesize ICC such that it is approximately the same as in the original multi-channel signal between all possible channel pairs. It was, however, proposed to estimate only ICC parameters between the strongest two channels at a time. This scheme is illustrated in Fig. 14C, where an example is shown in which, at one time instance, an ICC parameter is estimated between channels 1 and 2 and, at another time instance, an ICC parameter is calculated between channels 1 and 5. The decoder then synthesizes the inter-channel correlation between the strongest channels in the decoder and applies some heuristic rule for computing and synthesizing the inter-channel coherence for the remaining channel pairs.
Regarding the calculation of, for example, the multiplication parameters a1, ..., aN based on transmitted ICLD parameters, reference is made to AES convention paper 5574 cited above. The ICLD parameters represent an energy distribution in an original multi-channel signal. Without loss of generality, it is shown in Fig. 14A that there are four ICLD parameters showing the energy difference between all other channels and the front left channel. In the side information processing block 123, the multiplication parameters a1, ..., aN are derived from the ICLD parameters such that the total energy of all reconstructed output channels is the same as (or proportional to) the energy of the transmitted sum signal. A simple way of determining these parameters is a 2-stage process in which, in a first stage, the multiplication factor for the left front channel is set to unity, while the multiplication factors for the other channels in Fig. 14A are determined from the transmitted ICLD values.
Then, in a second stage, the energy of all five channels is calculated and compared to the energy of the transmitted sum signal. Then, all channels are downscaled using a downscaling factor which is equal for all channels, wherein the downscaling factor is selected such that the total energy of all reconstructed output channels is, after downscaling, equal to the total energy of the transmitted sum signal.
Naturally, there are also other methods for calculating the multiplication factors, which do not rely on the 2-stage process but need only a 1-stage process.
Regarding the delay parameters, it is to be noted that the delay parameters ICTD, which are transmitted from a BCC encoder, can be used directly when the delay parameter d1 for the left front channel is set to zero. No rescaling has to be done here, since a delay does not alter the energy of the signal.
As has been outlined above with respect to Fig. 14, the parametric side information, i.e., the interchannel level differences (ICLD), the interchannel time differences (ICTD) or the interchannel coherence parameter (ICC), can be calculated and transmitted for each of the five channels. This means that one normally transmits four sets of interchannel level differences for a five-channel signal. The same is true for the interchannel time differences. With respect to the interchannel coherence parameter, it can also be sufficient to transmit only, for example, two sets of these parameters.
As has been outlined above with respect to Fig. 13, there is not a single level difference parameter, time difference parameter or coherence parameter for one frame or time portion of a signal. Instead, these parameters are determined for several different frequency bands so that a frequency-dependent parametrization is obtained.
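The 2-stage derivation of the multiplication parameters a1, ..., aN described above can be sketched in Python as follows. This is a minimal sketch for a single frequency band under stated assumptions: each ICLD is taken as the level of a channel relative to the left front channel, expressed as a power ratio in dB, and every output channel is assumed to be a scaled copy of the sum signal, so that the total output energy is the sum of the squared gains times the sum-signal energy. The function name `gains_from_icld` is hypothetical.

```python
import math

def gains_from_icld(icld_db):
    """Derive per-channel gains from ICLDs given relative to channel 1.

    icld_db: ICLD values in dB for channels 2..N (assumed convention:
    level of the channel relative to the left front reference channel).
    Returns N gains whose squares sum to one, so the reconstructed
    channels together carry the energy of the transmitted sum signal.
    """
    # Stage 1: reference gain is unity, the others follow from the ICLDs.
    raw = [1.0] + [10.0 ** (d / 20.0) for d in icld_db]
    # Stage 2: one common downscaling factor normalizes the total energy.
    scale = 1.0 / math.sqrt(sum(a * a for a in raw))
    return [a * scale for a in raw]

# Five channels with equal levels: every gain becomes 1/sqrt(5).
gains = gains_from_icld([0.0, 0.0, 0.0, 0.0])
```

The common scale factor leaves the level ratios between the channels untouched, which is why the ICLDs are preserved after normalization.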
Since it is preferred to use, for example, 32 frequency channels, i.e., a filter bank having 32 frequency bands for BCC analysis and BCC synthesis, the parameters can occupy quite a lot of data. Although - compared to other multi-channel transmissions - the parametric representation results in a quite low data rate, there is a continuing need for further reduction of the data rate necessary to represent a signal having more than two channels, such as a multi-channel surround signal.
The encoding of a multi-channel audio signal can be advantageously implemented using several existing modules, each of which performs a parametric stereo coding into a single mono channel. The international patent application WO2004008805 A1 teaches how parametric stereo coders can be ordered in a hierarchical set-up such that a given number of input audio channels are successively downmixed into one single mono channel. The parametric side information describing the spatial properties of the downmix mono channel finally consists of all the parametric information successively produced during the iterative downmixing process. This means that if there are, for example, three stereo-to-mono downmixing processes involved in building the final mono signal, the final set of parameters building the parametric representation of the multi-channel audio signal consists of the three sets of parameters derived during every single stereo-to-mono downmixing process.
A hierarchical downmixing encoder is shown in Fig. 15 to explain the method of the prior art in more detail. Fig. 15 shows six original audio channels 200a to 200f that are transformed into a single monophonic audio channel 202 plus parametric side information.
For this purpose, the six original audio channels 200a to 200f have to be transformed from the time domain into the frequency domain, which is performed by transforming units 204, transforming the audio channels 200a to 200f into the corresponding channels 206a to 206f in the frequency domain. Following the hierarchical approach, the channels 206a to 206f are pair-wise downmixed into three monophonic channels L, R and C (208a, 208b and 208c, respectively). During the downmixing of the three pairs of channels, a parameter set is derived for each channel pair, describing the spatial properties of the original stereophonic signal that is downmixed into a monophonic signal. Thus, in this first downmixing step, three parameter sets 210a to 210c are generated to preserve the spatial information of the signals 206a to 206f.
In the next step of the hierarchical downmixing, channels 208a and 208b are downmixed into a channel 212 (LR), generating a parameter set 210d (parameter set 4). To finally derive only one single monophonic channel, a downmixing of the channels 208c and 212 is necessary, resulting in channel 214 (M). This generates a fifth parameter set 210e (parameter set 5). Finally, the downmixed monophonic audio signal 214 is inversely transformed into the time domain to derive an audio signal 202 that can be played by standard equipment.
As described above, a parametric representation of the downmix audio signal 202 according to the prior art consists of all the parameter sets 210a to 210e, which means that if one wants to rebuild the original multi-channel audio signal (channels 200a to 200f) from the monophonic audio signal 202, all the parameter sets 210a to 210e are required as side information of the monophonic downmix signal 202.
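The prior-art hierarchy of Fig. 15 can be illustrated with the following Python sketch, which performs the five stereo-to-mono downmixing steps and collects one parameter set per step. Several simplifications are assumptions made for illustration only: the sketch works on short time-domain sample lists instead of frequency-domain bands, the downmix is a plain average, each parameter set holds just a level difference and a normalized correlation, and the pairing of input channels (front/rear left, front/rear right, center/low frequency) is assumed. All function names are hypothetical.

```python
import math

def pair_params(a, b, eps=1e-12):
    """Level difference (dB) and normalized correlation of one channel pair."""
    ea = sum(x * x for x in a)
    eb = sum(x * x for x in b)
    cross = sum(x * y for x, y in zip(a, b))
    return {
        "icld_db": 10.0 * math.log10((ea + eps) / (eb + eps)),
        "icc": cross / math.sqrt((ea + eps) * (eb + eps)),
    }

def downmix_pair(a, b):
    """Average two channels into one and return it with its parameter set."""
    mono = [0.5 * (x + y) for x, y in zip(a, b)]
    return mono, pair_params(a, b)

def hierarchical_downmix(lf, lr, rf, rr, c, lfe):
    """Five pair-wise downmixes; every step contributes a parameter set."""
    left, p1 = downmix_pair(lf, lr)           # channel L, parameter set 1
    right, p2 = downmix_pair(rf, rr)          # channel R, parameter set 2
    center, p3 = downmix_pair(c, lfe)         # channel C, parameter set 3
    lr_mix, p4 = downmix_pair(left, right)    # channel LR, parameter set 4
    mono, p5 = downmix_pair(center, lr_mix)   # channel M, parameter set 5
    return mono, [p1, p2, p3, p4, p5]

mono, params = hierarchical_downmix(
    [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]
)
```

In this arrangement all five parameter sets have to accompany the mono signal, which is the side-information cost that the remainder of the text sets out to reduce.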
The US patent application 11/032,689 (from here on referred to as "prior art cue combination") describes a process for combining several cue values into a single transmitted one in order to save side information in a non-hierarchical coding scheme. To do so, all the channels are downmixed first, and the cue codes are later combined to form transmitted cue values (possibly one single value), the combination depending on a predefined mathematical function into which the spatial parameters, which are derived directly from the input signals, are entered as variables.
State-of-the-art techniques for the parametric coding of two ("stereo") or more ("multi-channel") audio input channels derive the spatial parameters directly from the input signals. Examples of such parameters are inter-channel level differences (ICLD) or inter-channel intensity differences (IID), inter-channel time delays (ICTD) or inter-channel phase differences (IPD), and inter-channel correlation/coherence (ICC), each of which is transmitted in a frequency-selective fashion, i.e. per frequency band. The application on prior art cue combination teaches that several cue values can be combined into a single value that is transmitted from the encoder to the decoder side. The decoding process uses the transmitted single value instead of the originally individually transmitted cue values to reconstruct the multi-channel output signal. In a preferred embodiment, this scheme has been applied to the ICC parameters. It has been shown that this leads to a considerable reduction in the size of the cue side information while preserving the spatial quality of the vast majority of signals. It is, however, not clear how this can be exploited in a hierarchical coding scheme.
The patent application on prior art cue combination has detailed the principle of the invention by an example for a system based on two transmitted downmix channels.
In the proposed method, with reference to Fig. 15, the ICC values of the Lf/Lr and Rf/Rr channel pairs are combined into a single transmitted ICC parameter. The two combined ICC values have been obtained during the downmixing of a front-left channel Lf and a rear-left channel Lr into the channel L and during the downmixing of a front-right channel Rf and a rear-right channel Rr into the channel R. Therefore, the two ICC values that are finally combined into the single transmitted ICC parameter both carry information about the front/back correlation of the original channels, and a combination of these two ICC values will generally preserve most of this information.
If one had to further downmix the L and R channels into one single mono channel, one would get a third ICC value, carrying information about the left/right correlation of the downmix channels L and R. According to the prior art cue combination, one would now have to combine the three ICC values by applying a given function that transforms the three ICC values into one transmitted ICC parameter. One then has the problem that front/back information mixes with left/right information, which is obviously disadvantageous for a reproduction of the original multi-channel audio signal. In the US application 11/032,689, this is avoided by transmitting two downmix channels, the L and R channels, which hold the left/right information, and additionally transmitting one single ICC value, holding front/back information. This preserves the spatial properties of the original channels at the cost of a substantially increased data rate, resulting from the full additional downmix channel to be transmitted.
Summary of the invention
It is the object of the present invention to provide an improved concept to generate and to use a parametric representation of a multi-channel audio signal with compact side information in the context of a hierarchical coding scheme.
In accordance with the first aspect of the present invention, this object is achieved by an encoder for generating a parametric representation of an audio signal having at least two original left channels on a left side and two original right channels on a right side with respect to a listening position, comprising: a generator for generating parametric information, the generator being operative to separately process several pairs of channels to derive a level information for processed channel pairs, and to derive coherence information for a channel pair including a first channel only having information from the left side and a second channel only having information from the right side; and a provider for providing the parametric representation by selecting the level information for channel pairs and determining a left/right coherence measure using the coherence information.
In accordance with a second aspect of the present invention, this object is achieved by a decoder for processing a parametric representation of an original audio signal, the original audio signal having at least two original left channels on a left side and at least two original right channels on a right side with respect to a listening position, comprising: a receiver for providing the parametric representation of the audio signal, the receiver being operative to provide level information for channel pairs and to provide a left/right coherence measure for a channel pair including a left channel and a right channel, the left/right coherence measure representing a coherence information between at least one channel pair including a first channel only having information from the left side and a second channel only having information from the right side; and a processor for supplying parametric information for channel pairs, the processor being operative to select level information from the parametric representation and to derive coherence information for at least one channel pair using the left/right coherence measure, the at least one channel pair including a first channel only having information from the left side and a second channel only having information from the right side.
In accordance with a third aspect of the present invention, this object is achieved by a method for generating a parametric representation of an audio signal.
In accordance with a fourth aspect of the present invention, this object is achieved by a computer program implementing the above method, when running on a computer.
In accordance with a fifth aspect of the present invention, this object is achieved by a method for processing a parametric representation of an original audio signal.
In accordance with a sixth aspect of the present invention, this object is achieved by a computer program implementing the above method, when running on a computer.
In accordance with a seventh aspect of the present invention, this object is achieved by encoded audio data generated by building a parametric representation of an audio signal having at least two original left channels on a left side and two original right channels on a right side with respect to a listening position, wherein the parametric representation comprises level differences for channel pairs and a left/right coherence measure derived from coherence information from a channel pair including a first channel only having information from the left side and a second channel only having information from the right side.
The present invention is based on the finding that a parametric representation of a multi-channel audio signal describes the spatial properties of the audio signal well using compact side information when the coherence information, describing the coherence between a first and a second channel, is derived within a hierarchical encoding process only for channel pairs including a first channel having only information from a left side with respect to a listening position and a second channel having only information from a right side with respect to the listening position. Since, in the hierarchical process, the multiple audio channels of the original audio signal are downmixed iteratively, preferably into a monophonic channel, one has the chance to pick the relevant side-information parameters during the encoding process at a step involving only channel pairs that bear the information needed to describe the spatial properties of the original audio signal as well as possible. This allows building a parametric representation of the original audio signal on the basis of those picked parameters or of a combination of those parameters, allowing a significant reduction of the size of the side information that holds the spatial information of the downmix signal.
The proposed concept allows combining cue values to reduce the side information rate of a downmix audio signal even for the case where only a single (monophonic) transmission channel is feasible. The inventive concept even allows different hierarchical topologies of the encoder. It is specifically clarified how a suitable single ICC value can be derived, which can be applied in a spatial audio decoder using the hierarchical encoding/decoding approach to reproduce the original sound image faithfully.

One embodiment of the present invention implements a hierarchical encoding structure that combines the left front and the left rear audio channel of a 5.1 channel audio signal into a left master channel and that simultaneously combines the right front and the right rear channel into a right master channel. By combining the left channels and the right channels separately, the important left/right coherence information is mainly preserved and is, according to the invention, derived in the second encoding step, in which the left master and the right master channels are downmixed into a stereo master channel. During this downmixing process the ICC parameter for the whole system is derived, since this is the ICC parameter that most accurately reflects the left/right coherence. Within this embodiment of the present invention, one obtains an ICC parameter describing the most important left/right coherence of the six audio channels simply by arranging the hierarchical encoding steps in an appropriate way, and not by applying some artificial function to a set of ICC parameters describing arbitrary pairs of channels, as is the case in the prior art techniques.
In a modification of the described embodiment of the present invention, the center channel and the low frequency channel of the 5.1 audio signal are downmixed into a center master channel, this channel holding mainly information about the center channel, since the low frequency channel contains only signals with such a low frequency that the origin of the signals can hardly be localized by humans. It can be advantageous to additionally steer the ICC value, derived as described above, by parameters describing the center master channel. This can be done, for example, by weighting the ICC value with energy information, the energy information telling how much energy is transmitted via the center master channel with respect to the stereo master channel.

In a further embodiment of the present invention, the hierarchical encoding process is performed such that in a first step the left-front and right-front channels of a 5.1 audio signal are downmixed into a front master channel, whereas the left-rear and the right-rear channels are downmixed into a rear master channel. Therefore, in each of the downmixing processes an ICC value is generated, containing information about the important left/right coherence. The combined and transmitted ICC parameter is then derived from a combination of the two separate ICC values. An advantageous way of deriving the transmitted ICC parameter is to build the weighted sum of the ICC values, using the level parameters of the channels as weights.

In a modification of the invention, the center channel and the low frequency channel are downmixed into a center master channel and afterwards the center master channel and the front master channel are downmixed into a stereo master channel. In the latter downmixing process, a correlation between the center and the stereo channels is obtained, which is used to steer or modify a transmitted ICC parameter, thus also taking into account the center contribution to the front audio signal.
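The weighted sum described for the front/rear embodiment can be illustrated with a minimal sketch. The function name `combine_icc`, the use of pair energies as weights, and the fallback for the zero-energy case are assumptions made for illustration only; the text merely states that level parameters serve as weights.

```python
def combine_icc(icc_values, energies):
    """Energy-weighted mean of per-pair ICC values.

    icc_values: ICC of each left/right channel pair (e.g. front, rear).
    energies:   non-negative energy of each pair, used as its weight (assumed).
    """
    if len(icc_values) != len(energies) or not icc_values:
        raise ValueError("need one energy per ICC value")
    total = sum(energies)
    if total <= 0.0:
        # Assumption: with no energy anywhere, fall back to the plain mean.
        return sum(icc_values) / len(icc_values)
    return sum(e * c for e, c in zip(energies, icc_values)) / total


# A strong front pair pulls the combined value towards the front ICC.
combined = combine_icc([0.9, 0.3], [4.0, 1.0])
```

With these weights, a pair that carries most of the signal energy dominates the single transmitted value, which matches the stated intent of giving more weight to stronger channel pairs.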
A major advantage of the previously described system is that one can build the coherence information such that the channels that contribute most to the audio signal mainly define the transmitted ICC value. These will normally be the front channels, but, for example, in a multi-channel representation of a music concert, the signal of the applauding audience could be emphasized by mainly using the ICC value of the rear channels. It is a further advantage that the weighting between the front and the back channels can be varied dynamically, depending on the spatial properties of the multi-channel audio signal.

In one embodiment of the present invention an inventive hierarchical decoder is operative to receive fewer ICC parameters than required by the number of existing decoding steps. The decoder is operational to derive the ICC parameters required for each decoding step from the received ICC parameters. This might be done by deriving the additional ICC parameters using a deriving rule that is based on the received ICC parameters and the received ICLD values, or by using predefined values instead.

In a preferred embodiment, however, the decoder is operational to use a single transmitted ICC parameter for each individual decoding step. This is advantageous as the most important correlation, the left/right correlation, is preserved in a transmitted ICC parameter within the inventive concept. As this is the case, a listener will experience a reproduction of the signal that resembles the original signal very well. It is to be remembered that the ICC parameter defines the perceptual wideness of a reconstructed signal. If the decoder modified a transmitted ICC parameter after transmission, the ICC parameters describing the perceptual wideness of the reconstructed signal could become rather different for the left/right and for the front/back correlation within the hierarchical reproduction.
This would be most disadvantageous, since a listener who moves or rotates his head would then experience a signal that becomes perceptually wider or narrower, which is of course most disturbing. This can be avoided by distributing a single received ICC parameter to the decoding units of a hierarchical decoder.

In another preferred embodiment, an inventive decoder is operational to receive a full set of ICC values or alternatively a single ICC value, wherein the decoder recognizes the decoding strategy to apply by receiving a strategy indication within the bitstream. Such a backwards-compatible decoder is also operational in prior art environments, decoding prior art signals transmitting a full set of ICC data.

Brief Description of the Accompanying Drawings

Preferred embodiments of the present invention are subsequently described by referring to the enclosed drawings, wherein:

Fig. 1 shows a block diagram of an embodiment of the inventive hierarchical audio encoder;
Fig. 2 shows an embodiment of an inventive audio encoder;
Fig. 2a shows a possible steering scheme of an ICC parameter of an inventive audio encoder;
Fig. 3a,b shows graphical representations of side channel information;
Fig. 4 shows a second embodiment of an inventive audio encoder;
Fig. 5 shows a block diagram of a preferred embodiment of an inventive audio decoder;
Fig. 6 shows an embodiment of an inventive audio decoder;
Fig. 7 shows another embodiment of an inventive audio decoder;
Fig. 8 shows an inventive transmitter or audio recorder;
Fig. 9 shows an inventive receiver or audio player;
Fig. 10 shows an inventive transmission system;
Fig. 11 shows a prior art joint stereo encoder;
Fig. 12 shows a block diagram representation of a prior art BCC encoder/decoder chain;
Fig. 13 shows a block diagram of a prior art implementation of a BCC synthesis block;
Fig. 14 shows a representation of a scheme for determining BCC parameters; and
Fig.
15 shows a prior art hierarchical encoder.

Detailed Description of Preferred Embodiments

Fig. 1 shows a block diagram of an inventive encoder to generate a parametric representation of an audio signal.

Fig. 1 shows a generator 220 to subsequently combine audio channels and generate spatial parameters describing spatial properties of pairs of channels that are combined into a single channel. Fig. 1 further shows a provider 222 to provide a parametric representation of a multi-channel audio signal by selecting level difference information between channel pairs and by determining a left/right coherence measure using coherence information generated by the generator 220.

To demonstrate the principle of the inventive concept of hierarchical multi-channel audio coding, Fig. 1 shows a case where four original audio channels 224a to 224d are iteratively combined, resulting in a single channel 226. The original audio channels 224a and 224b represent the left-front and the left-rear channel of an original four-channel audio signal, and the channels 224c and 224d represent the right-front and the right-rear channel, respectively. Without loss of generality, only two of various spatial parameters are shown in Fig. 1 (ICLD and ICC). According to the invention, the generator 220 combines the audio channels 224a to 224d in such a way that during the combination process an ICC parameter can be derived that carries the important left/right coherence information.

In a first step, the channels containing only left side information 224a and 224b are combined into a left master channel 228a (L) and the two channels containing only right side information 224c and 224d are combined into a right master channel 228b (R). During this combination the generator generates two ICLD parameters 230a and 230b, both being spatial parameters containing information about the level difference of two original channels being combined into one single channel.
The generator also generates two ICC parameters 232a and 232b, describing the correlation between the two channels being combined into a single channel. The ICLD and ICC parameters 230a, 230b, 232a, and 232b are transferred to the provider 222.

In the next step of the hierarchical generation process, the left master channel 228a is combined with the right master channel 228b into the resulting audio channel 226, wherein the generator provides an ICLD parameter 234 and an ICC parameter 236, both of them being transmitted to the provider 222. It is important to note that the ICC parameter 236 generated in this combination step mainly represents the important left/right coherence information of the original four-channel audio signal represented by the audio channels 224a to 224d.

Therefore, the provider 222 builds a parametric representation 238 from the available spatial parameters 230a,b, 232a,b, 234 and 236 such that the parametric representation comprises the parameters 230a, 230b, 234, and 236.

Fig. 2 shows a preferred embodiment of an inventive audio encoder that encodes a 5.1 multi-channel signal into a single monophonic signal.

Fig. 2 shows three transformation units 240a to 240c, five 2-to-1 downmixers 242a to 242e, a parameter combination unit 244 and an inverse transformation unit 246. The original 5.1 channel audio signal is given by the left-front channel 248a, the left-rear channel 248b, the right-front channel 248c, the right-rear channel 248d, the center channel 248e, and the low-frequency channel 248f. It is important to note that the original channels are grouped in such a way that the channels containing only left side information 248a and 248b form one channel pair, the channels containing only right side information 248c and 248d form another channel pair, and the center channel 248e and the low-frequency channel 248f form a third channel pair.
The transformation units 240a to 240c convert the channels 248a to 248f from the time domain into their spectral representation 250a to 250f in the frequency subband domain. In the first hierarchical encoding step 252, the left channels 250a and 250b are encoded into a left master channel 254a, the right channels 250c and 250d are encoded into a right master channel 254b, and the center channel 250e and the low frequency channel 250f are encoded into a center master channel 256. During this first hierarchical encoding step 252, the three involved 2-to-1 encoders 242a to 242c generate the downmixed channels 254a, 254b, and 256, and in addition the important spatial parameter sets 260a, 260b, and 260c, wherein the parameter set 260a (parameter set 1) describes the spatial relation between channels 250a and 250b, the parameter set 260b (parameter set 2) describes the spatial relation between channels 250c and 250d, and the parameter set 260c (parameter set 3) describes the spatial relation between channels 250e and 250f.

In a second hierarchical step 262, the left master channel 254a and the right master channel 254b are downmixed into a stereo master channel 264, generating a spatial parameter set 266 (parameter set 4), wherein the ICC parameter of this parameter set 266 contains the important left/right correlation information. To build a combined ICC value from parameter set 266, the parameter set 266 can be transferred to the parameter combination unit 244 via a data connection 268. In the third hierarchical encoding step 272, the stereo master channel 264 is combined with the center master channel 256 to form a monophonic result channel 274. The parameter set 276 that is derived during this downmixing process can be transferred via a data connection 278 to the parameter combination unit 244.
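The three hierarchical steps of Fig. 2 can be sketched as follows. This is an illustrative sketch under several assumptions that the text does not specify: the 2-to-1 stage downmixes by plain summation, ICLD is taken as an energy ratio in dB, ICC is taken as the normalized zero-lag cross-correlation, and processing is shown on whole sample lists rather than per time/frequency tile. All function names are hypothetical.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(s * s for s in x)


def two_to_one(a, b, eps=1e-12):
    """Assumed 2-to-1 stage: returns (downmix, ICLD in dB, ICC)."""
    ea, eb = energy(a), energy(b)
    icld = 10.0 * math.log10((ea + eps) / (eb + eps))
    icc = sum(x * y for x, y in zip(a, b)) / math.sqrt((ea + eps) * (eb + eps))
    downmix = [x + y for x, y in zip(a, b)]  # assumption: simple sum
    return downmix, icld, icc


def encode_5_1(lf, lr, rf, rr, c, lfe):
    """Tree of Fig. 2: (Lf,Lr)->L, (Rf,Rr)->R, (C,LFE)->C, (L,R)->LR, (LR,C)->M."""
    left, icld1, _ = two_to_one(lf, lr)            # parameter set 1
    right, icld2, _ = two_to_one(rf, rr)           # parameter set 2
    center, icld3, _ = two_to_one(c, lfe)          # parameter set 3
    stereo, icld4, icc4 = two_to_one(left, right)  # parameter set 4: left/right ICC
    mono, icld5, _ = two_to_one(stereo, center)    # parameter set 5
    # Only the ICLDs and the single left/right ICC are kept as side information.
    return mono, {"icld": [icld1, icld2, icld3, icld4, icld5], "icc": icc4}
```

The point of the sketch is the selection at the end: every stage contributes a level difference, while the only coherence value retained is the one from the stage that combines a purely left channel with a purely right channel.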
Finally, the result channel 274 is transformed into the time domain by the inverse transformation unit 246 to build the monophonic downmix audio signal 280, which is the final monophonic representation of the original 5.1 channel signal represented by the audio channels 248a to 248f.

To reconstruct the original 5.1 channel audio signal from the monophonic downmix audio channel 280, the parametric representation of the 5.1 channel audio signal is additionally needed.

For the tree structure shown in Fig. 2, it can be seen that the left front and back channels are combined into an L-signal 254a. Similarly, the right front and back channels are combined into an R-signal 254b. Subsequently, the combination of the L and R-signals is carried out, which delivers parameter set number 4 (266). In the case of this hierarchical structure, a simple way of deriving a combined ICC value is to pick the ICC value of parameter set number 4 and take this as the combined ICC value, which is then incorporated into the parametric representation of the 5.1 channel signal by the parameter combination unit 244. More sophisticated methods can also take into account the influence of the center channel (e.g. by using parameters from parameter set number 5), as shown in Fig. 2a.

As an example, the energy ratio E(LR)/E(C) of the energy contained in the LR channel (264) and in the C channel (256) from parameter set number 5 can be used to steer the ICC value. In case most of the energy comes from the LR path, the transmitted ICC value should become close to the ICC value ICC(LR) of parameter set number 4. In case most of the energy comes from the C path 256, the transmitted ICC value should become close to 1, as indicated in Fig. 2a.
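The steering of the ICC value by the energy ratio E(LR)/E(C) can be sketched in two variants, corresponding to the hard switch at a threshold and the smooth transition attributed to Fig. 2a. The threshold default, the linear blend, and the handling of zero energy are assumptions chosen for illustration; the actual curves of Fig. 2a are not reproduced here.

```python
def steer_icc_hard(icc_lr, e_lr, e_c, threshold=1.0):
    """Switch between ICC(LR) and 1 when E(LR)/E(C) crosses a threshold.

    The default threshold of 1.0 is a placeholder, not a value from the text.
    """
    if e_c <= 0.0:
        return icc_lr  # no center energy: keep the left/right value
    return icc_lr if e_lr / e_c >= threshold else 1.0


def steer_icc_smooth(icc_lr, e_lr, e_c):
    """Blend linearly between ICC(LR) and 1 by the LR share of the energy (assumed form)."""
    total = e_lr + e_c
    if total <= 0.0:
        return icc_lr
    w = e_lr / total
    return w * icc_lr + (1.0 - w) * 1.0
```

Both variants satisfy the two limiting cases stated in the text: the result equals ICC(LR) when all energy is in the LR path and equals 1 when all energy is in the C path.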
Fig. 2a shows two possible ways to implement this steering of the ICC parameter: either by switching between two extreme values when the energy ratio crosses a given threshold 286 (steering function 288a) or by a smooth transition between the extreme values (steering function 288b).

Figures 3a and 3b show a comparison of a possible parametric representation of a 5.1 audio channel signal delivered from a hierarchical encoder structure using a prior art technique (Fig. 3a) and using the inventive concept for audio coding (Fig. 3b).

Fig. 3a shows a parametric representation of a single time frame and a discrete frequency interval, as it would be provided by the prior art technique. Each of the 2-to-1 encoders 242a to 242e from Fig. 2 delivers one pair of ICLD and ICC parameters; the origin of the parameter pairs is indicated within Fig. 3a. Following the prior art approach, all parameter sets provided by the 2-to-1 encoders 242a to 242e have to be transmitted together with the downmix monophonic audio signal 280 as side information to rebuild a 5.1 channel audio signal.

Fig. 3b shows parameters derived following the inventive concept. Each of the 2-to-1 encoders 242a to 242e contributes only one parameter directly, the ICLD parameter. The single transmitted ICC parameter ICCC is derived by the parameter combination unit 244, and not provided directly by the 2-to-1 encoders 242a to 242e. As can clearly be seen in Figs. 3a and 3b, the inventive concept for a hierarchical encoder can reduce the amount of side information data significantly compared to prior art techniques: ten parameters per time/frequency tile (five ICLD and five ICC values) become six (five ICLD values and one ICC value).

Fig. 4 shows another preferred embodiment of the current invention, allowing a 5.1 channel audio signal to be encoded into a monophonic audio signal in a hierarchical encoding process and compact side information to be supplied. As the principal hardware structure is equal to the one described in Fig. 2, the same items in the two figures are labeled with the same numbers.
The difference is due to the different grouping of the input channels 248a to 248f; hence the order in which the single channels are downmixed into the monophonic channel 274 differs from the downmixing order in Fig. 2. Therefore, only the aspects differing from the description of Fig. 2 that are vital for the understanding of the embodiment of the current invention shown in Fig. 4 are described in the following.

The left-front channel 248a and the right-front channel 248c are grouped together to form a channel pair, the center channel 248e and the low-frequency channel 248f form another input channel pair, and the third input channel pair of the 5.1 audio signal is formed by the left-rear channel 248b and the right-rear channel 248d.

In a first hierarchical encoding step 252, the left-front channel 250a and the right-front channel 250c are downmixed into a front master channel 290 (F), the center channel 250e and the low-frequency channel 250f are downmixed into a center master channel 292 (C), and the left-rear channel 250b and the right-rear channel 250d are downmixed into a rear master channel 294 (S). A parameter set 300a (parameter set 1) describes the front master channel 290, a parameter set 300b (parameter set 2) describes the center master channel 292, and a parameter set 300c (parameter set 3) describes the rear master channel 294.

It is important to note that the parameter set 300a as well as the parameter set 300c hold information that describes the important left/right correlation between the original channels 248a to 248f. Therefore, parameter set 300a and parameter set 300c are made available to the parameter combination unit 244 via data links 302a and 302b.

In a second encoding step 262, the front master channel 290 and the center master channel 292 are downmixed into a pure front channel 304, generating a parameter set 300d (parameter set 4).
This parameter set 300d is also made available to the parameter combination unit 244 via a data link 306.

In a third hierarchical encoding step 272, the pure front channel 304 is downmixed with the rear master channel 294 into the result channel 274 (M), which is then transformed into the time domain by the inverse transformation unit 246 to form the final monophonic downmix audio channel 280. The parameter set 300e (parameter set 5), originating from the downmixing of the pure front channel 304 and the rear master channel 294, is also made available to the parameter combination unit 244 via a data link 310.

The tree structure in Fig. 4 first performs a combination of the left and right channels separately for front and rear. Thus, basic left/right correlation/coherence is present in the parameter sets 1 and 3 (300a, 300c). A combined ICC value could be built by the parameter combination unit 244 by building the weighted average of the ICC values of parameter sets 1 and 3. This means that more weight will be given to stronger channel pairs (Lf/Rf versus Lr/Rr). One can achieve the same by deriving a combined ICC parameter ICCC as the weighted sum:

ICCC = (A · ICC1 + B · ICC2) / (A + B)

wherein ICC1 and ICC2 are the two ICC values being combined, A denotes the energy within the pair of channels corresponding to ICC1, and B denotes the energy within the pair of channels corresponding to ICC2.

In an alternative embodiment, more sophisticated methods can also take into account the influence of the center channel (e.g. by taking into account parameters of the parameter set number 4).

Fig. 5 shows an inventive decoder to process received compact side information, being a parametric representation of an original four-channel audio signal. Fig. 5 comprises a receiver 310 to provide a compact parametric representation of the four-channel audio signal and a processor 312 to process the
compact parametric representation such that a full parametric representation of the four-channel audio signal is supplied, which enables one to reconstruct the four-channel audio signal from a received monophonic audio signal.

The receiver 310 receives the spatial parameters ICLD (B) 314, ICLD (F) 316, ICLD (R) 318 and ICC 320. The provided parametric representation, consisting of the parameters 314 to 320, describes the spatial properties of the original audio channels 324a to 324d.

As a first up-mixing step, the processor 312 supplies the spatial parameters describing a first channel pair 326a, being a combination of two channels 324a and 324b (Rf and Lf), and a second channel pair 326b, being a combination of two channels 324c and 324d (Rr and Lr). To do so, the level difference 314 of the channel pairs is required. Since both channel pairs 326a and 326b contain a left channel as well as a right channel, the difference between the channel pairs describes mainly a front/back correlation. Therefore, the received ICC parameter 320, carrying mainly information about the left/right coherence, is provided by the processor 312 such that the left/right coherence information is preferably used to supply the individual ICC parameters for the channel pairs 326a and 326b.

In the next step, the processor 312 supplies appropriate spatial parameters to be able to reconstruct the single audio channels 324a and 324b from channel 326a, and the channels 324c and 324d from channel 326b. To do so, the processor 312 supplies the level differences 316 and 318, and the processor 312 has to supply appropriate ICC values for the two channel pairs, since each of the channel pairs 326a and 326b contains important left/right coherence information.

In one example, the processor 312 could simply provide the combined received ICC value 320 to up-mix channel pairs 326a and 326b.
Alternatively, the received combined ICC value 320 could be weighted to derive individual ICC values for the two channel pairs, the weights being, for example, based on the level difference 314 of the two channel pairs. In a preferred embodiment of the present invention, the processor provides the received ICC parameter 320 for every single upmixing step to avoid the introduction of additional artefacts during the reproduction of the channels 324a to 324d.

Fig. 6 shows a preferred embodiment of a decoder incorporating a hierarchical decoding procedure according to the current invention, to decode a monophonic audio signal to a 5.1 multi-channel audio signal, making use of a compact parametric representation of an original 5.1 audio signal.

Fig. 6 shows a transforming unit 350, a parameter-processing unit 352, five 1-to-2 decoders 354a to 354e and three inverse transforming units 356a to 356c.

It should be noted that the embodiment of an inventive decoder according to Fig. 6 is the counterpart of the encoder described in Fig. 2 and is designed to receive a monophonic downmix audio channel 358, which shall finally be up-mixed into a 5.1 audio signal consisting of audio channels 360a (lf), 360b (lr), 360c (rf), 360d (rr), 360e (co) and 360f (lfe). The downmix channel 358 (m) is received and transformed from the time domain to the frequency domain into its frequency representation 362 using the transforming unit 350. The parameter-processing unit 352 receives a combined and compact set of spatial parameters 364 in parallel with the downmix channel 358.

In a first step 363 of the hierarchical decoding process, the monophonic downmix channel 362 is up-mixed into a stereo master channel 364 (LR) and a center master channel 366 (C).

In a second step 368 of the hierarchical decoding process, the stereo master channel 364 is up-mixed into a left master channel 370 (L) and a right master channel 372 (R).
In a third step of the decoding process, the left master channel 370 is up-mixed into a left-front channel 374a and a left-rear channel 374b, the right master channel 372 is up-mixed into a right-front channel 374c and a right-rear channel 374d, and the center master channel 366 is up-mixed into a center channel 374e and a low-frequency channel 374f.

Finally, the six single audio channels 374a to 374f are transformed by the inverse transforming units 356a to 356c into their representation in the time domain and thus build the reconstructed 5.1 audio signal, having six audio channels 360a to 360f. To retain the original spatial property of the 5.1 audio signal, the way the parameter processing unit 352 derives and provides the individual parameter sets 380a to 380e is vital.

The received combined ICC parameter describes the important left/right coherence of the original six-channel audio signal. Therefore, the parameter processing unit 352 builds the ICC value of parameter set 4 (380d) such that it resembles the left/right correlation information of the originally received spatial value, being transmitted within the parameter set 364. In the simplest possible implementation the parameter processing unit 352 simply uses the received combined ICC parameter.

Another preferred embodiment of a decoder according to the current invention is shown in Fig. 7, the decoder in Fig. 7 being the counterpart of the encoder from Fig. 4.

As the decoder in Fig. 7 comprises the same functional blocks as the decoder in Fig. 6, the following discussion is limited to the steps in which the hierarchical decoding process differs from the one in Fig. 6.
This is mainly due to the fact that the monophonic signal 362 is up-mixed in a different order and with a different channel combination, since the original 5.1 audio signal had been downmixed differently than the one received in Fig. 6.

In the first step 363 of the hierarchical decoding process, the monophonic signal 362 is up-mixed into a rear master channel 400 (S) and a pure front channel 402 (CF).

In a second step 368, the pure front channel 402 is up-mixed into a front master channel 404 and a center master channel 406.

In a third decoding step 372, the front master channel is up-mixed into a left-front channel 374a and a right-front channel 374c, the center master channel 406 is up-mixed into a center channel 374e and a low-frequency channel 374f, and the rear master channel 400 is up-mixed into a left-rear channel 374b and a right-rear channel 374d. Finally, the six audio channels 374a to 374f are transformed from the frequency domain into their time-domain representations 360a to 360f, building the reconstructed 5.1 audio signal.

To preserve the spatial properties of the original 5.1 signal, which have been coded as side information by the encoder, the parameter processing unit 352 supplies the parameter sets 410a to 410e for the 1-to-2 decoders 354a to 354e. As the important left/right correlation information is needed in the third up-mixing process 372 to build the Lf, Rf, Lr, and Rr channels, the parameter-processing unit 352 may supply an appropriate ICC value in the parameter sets 410a and 410c, in the simplest implementation simply taking the transmitted ICC parameter to build the parameter sets 410a and 410c. In a possible alternative, the received ICC parameter could be transformed into individual parameters for parameter sets 410a and 410c by applying a suitable weighting function to the received ICC parameter, the weights being, for example, dependent on the energy transmitted in the front master channel 404 and in the rear master channel 400.
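The simplest strategy described for the parameter-processing unit, reusing the one transmitted ICC value in every 1-to-2 decoding step, can be sketched as below. The dictionary layout and both function names are hypothetical. The gain computation is likewise an assumption: it shows one common energy-preserving way to split a channel according to a level difference in dB and is not taken from the text.

```python
import math


def expand_parameters(iclds, icc):
    """Build one full parameter set per 1-to-2 decoder from the compact form.

    iclds: one transmitted level difference (dB) per decoding step.
    icc:   the single transmitted left/right coherence value, reused everywhere.
    """
    return [{"icld": icld, "icc": icc} for icld in iclds]


def upmix_gains(icld_db):
    """Assumed energy-preserving gains (g1, g2) with 20*log10(g1/g2) == icld_db."""
    ratio = 10.0 ** (icld_db / 10.0)  # power ratio of the two output channels
    g1 = math.sqrt(ratio / (1.0 + ratio))
    g2 = math.sqrt(1.0 / (1.0 + ratio))
    return g1, g2


# Five decoding steps, one shared ICC value.
sets = expand_parameters([3.0, -1.5, 0.0, 6.0, 2.0], 0.4)
```

Because every step receives the same coherence value, the reconstructed signal has the same perceptual width in the left/right and front/back directions, which is the behavior the preferred embodiment aims for.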
In an even more sophisticated implementation, the parameter-processing unit 352 could also take into account center channel information to supply an individual ICC value for parameter set 5 and parameter set 4 (410a, 410b).

Fig. 8 shows an inventive audio transmitter or recorder 500 that has an encoder 220, an input interface 502 and an output interface 504.

An audio signal can be supplied at the input interface 502 of the transmitter/recorder 500. The audio signal is encoded using an inventive encoder 220 within the transmitter/recorder and the encoded representation is output at the output interface 504 of the transmitter/recorder 500. The encoded representation may then be transmitted or stored on a storage medium.

Fig. 9 shows an inventive receiver or audio player 520, having an inventive decoder 312, a bit stream input 522, and an audio output 524.

A bit stream can be input at the input 522 of the inventive receiver/audio player 520. The bit stream is then decoded using the decoder 312 and the decoded signal is output or played at the output 524 of the inventive receiver/audio player 520.

Fig. 10 shows a transmission system comprising an inventive transmitter 500 and an inventive receiver 520.

The audio signal input at the input interface 502 of the transmitter 500 is encoded and transferred from the output 504 of the transmitter 500 to the input 522 of the receiver 520. The receiver decodes the audio signal and plays back or outputs the audio signal on its output 524.

The discussed examples of inventive encoders downmix a multi-channel audio signal into a monophonic audio signal. It is of course alternatively possible to downmix a multi-channel signal into a stereophonic signal, which would, for example, mean for the embodiments discussed in Figs. 2 and 4 that one step in the hierarchical encoding process could be bypassed. All other numbers of resulting channels are also possible.
The proposed method to hierarchically encode or decode multi-channel audio information, providing/using a compact parametric representation of the spatial properties of the audio signal, has been described mainly in terms of shrinking the side information by combining multiple ICC values into one single transmitted ICC value. It is to be noted here that the described invention is in no way limited to the use of just one combined ICC value. Instead, e.g., two combined values can be generated, one describing the important left/right correlation, the other one describing a front/back correlation.

This can advantageously be implemented, for example, in the embodiment of the current invention shown in Fig. 2, where on the one hand a left front channel 250a and a left rear channel 250b are combined into a left master channel 254a, and where a right front channel 250c and a right rear channel 250d are combined into a right master channel 254b. These two encoding steps therefore yield information about the front/back correlation of the original audio signal, which can easily be processed to provide an additional ICC value holding front/back correlation information.

Furthermore, in a preferred modification of the current invention, it is advantageous to have encoding/decoding processes which can do both: use the prior art individually transmitted parameters and, depending on signaling side information that is sent from encoder to decoder, also use combined transmitted parameters. Such a system can advantageously achieve both higher representation accuracy (using individually transmitted parameters) and, alternatively, a low side information bit rate (using combined parameters).

Typically, the choice of this setting is made by the user depending on the application requirements, such as the amount of side information that can be accommodated by the transmission system used.
This allows the same unified encoder/decoder architecture to be used while being able to operate within a wide range of side information bit rate/precision trade-offs. This is an important capability in order to cover a wide range of possible applications with differing requirements and transmission capacity.
In another modification of such an advantageous embodiment, the choice of the operating mode could also be made automatically by the encoder, which analyses, for example, the deviation of the decoded values from the ideal result in case the combined transmission mode was used. If no significant deviation is found, then combined parameter transmission is employed. A decoder could even decide itself, based on an analysis of the provided side information, which mode is the appropriate one to use. For example, if there were just one spatial parameter provided, the decoder would automatically switch into the decoding mode using combined transmitted parameters.
In another advantageous modification of the current invention, the encoder/decoder switches automatically from the mode using combined transmitted parameters to the mode using individually transmitted parameters, to ensure the best possible compromise between audio reproduction quality and a desired low side information bit rate.
As can be seen from the described preferred embodiments of the encoders/decoders in Figs. 2, 4, 6, and 7, these units make use of the same functional blocks. Therefore, another preferred embodiment builds an encoder and a decoder using the same hardware within one housing.
In an alternative embodiment of the current invention it is possible to dynamically switch between the different encoding schemes by grouping different channels together as channel pairs, making it possible to dynamically use the encoding scheme that provides the best possible audio quality for the given multi-channel audio signal.
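The automatic mode choice described above can be illustrated with a minimal sketch. It assumes that the "deviation of the decoded values from the ideal result" is measured as the largest absolute difference between each individually computed ICC value and the combined ICC value; the function names, the mode labels and the threshold of 0.1 are hypothetical and are not taken from the specification.

```python
from typing import Dict, List

COMBINED = "combined"      # one combined ICC value is transmitted
INDIVIDUAL = "individual"  # one ICC value per channel pair is transmitted


def choose_mode(individual_icc: Dict[str, float],
                combined_icc: float,
                max_deviation: float = 0.1) -> str:
    """Encoder side: pick the transmission mode.

    The combined mode is used only if replacing every individual ICC
    value by the combined one stays within max_deviation (an assumed
    threshold, not a value from the specification).
    """
    if not individual_icc:
        return COMBINED
    worst = max(abs(icc - combined_icc) for icc in individual_icc.values())
    return COMBINED if worst <= max_deviation else INDIVIDUAL


def infer_mode(received_icc: List[float]) -> str:
    """Decoder side: a single received ICC value implies the combined mode."""
    return COMBINED if len(received_icc) == 1 else INDIVIDUAL


if __name__ == "__main__":
    print(choose_mode({"front": 0.62, "rear": 0.58}, combined_icc=0.60))
    print(choose_mode({"front": 0.90, "rear": 0.20}, combined_icc=0.55))
    print(infer_mode([0.60]))
```

In this sketch the encoder falls back to individually transmitted parameters whenever a single pair deviates too much; a per-tile or energy-weighted criterion would be an equally plausible choice.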
It is not necessary to transmit the monophonic downmix channel alongside the parametric representation of a multi-channel audio signal. It is also possible to transmit the parametric representation alone, to enable a listener who already owns a monophonic downmix of the multi-channel audio signal, for example as a record, to reproduce a multi-channel signal using his existing multi-channel equipment and the parametric side information.
To summarize, the present invention allows these combined parameters to be determined advantageously from known prior art parameters. Applying the inventive concept of combining parameters in a hierarchical encoder/decoder structure, one can downmix a multi-channel audio signal into a mono-based parametric representation, obtaining a precise parametrization of the original signal at a low side information rate (= bit-rate reduction).
It is one objective of the present invention that the encoder combines certain parameters with the objective of reducing the number of parameters that have to be transmitted. Then, the decoder derives the missing parameters from parameters that have been transmitted, instead of using default parameter values, as is the case in systems of prior art, for example the one shown in Fig. 15.
This advantage becomes evident when reviewing again the embodiment of a hierarchical parametric multi-channel audio coder using prior art techniques, an example of which is shown in Fig. 15. There, the input signals (Lf, Rf, Lr, Rr, C and LFE, corresponding to the left front, right front, left rear, right rear, center and low frequency enhancement channels, respectively) are segmented and transformed to the frequency domain to obtain the required time/frequency tiles. The resulting signals are subsequently combined in a pair-wise fashion. For example, the signals Lf and Lr are combined to form signal "L".
A corresponding spatial parameter set (1) is generated to model the spatial properties between the signals Lf and Lr (i.e. consisting of one or more of IIDs, ICCs, IPDs). In the embodiment according to the prior art shown in Fig. 15, this process is repeated until a single output channel (M) is obtained, the output channel being accompanied by five parameter sets. The application of prior art hierarchical coding techniques would then imply the transmission of all parameter sets.
It should be noted, however, that not all parameter sets have to contain values for all possible spatial parameters. For example, parameter set 1 in Fig. 15 may consist of IID and ICC parameters, while parameter set 3 may consist of IID parameters only. If certain parameters are not transmitted for specific sets, the prior art hierarchical decoder will apply a default value for these parameters (for example ICC = +1, IPD = 0, etc.). Thus, each parameter set represents a specific signal combination only and does not describe spatial properties of the remaining channel pairs.
This loss of knowledge about the spatial properties of signals whose parameters are not being transmitted can be avoided using the inventive concept, in which the encoder combines specific parameters such that the most important spatial properties of the original signal are preserved.
When, for example, ICC parameters are combined into a single value, the combined parameters can be used in the decoder as a substitute for all individual parameters (or the individual parameters used in the decoder can be derived from the transmitted ones). It is an important feature that the encoder parameter combination process is carried out such that the sound image of the original multi-channel signal is preserved as closely as possible after reconstruction by the decoder. When transmitting ICC parameters, this means that the width (decorrelation) of the original sound field should be retained.
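The difference between the prior-art default and the combined parameter can be sketched as follows. The sketch assumes a front pair and a rear pair whose ICC values are merged by an energy-weighted sum, in the spirit of the weighted sum using level information that the claims mention; the exact weighting, the function names and the pair labels are illustrative assumptions rather than the specified method.

```python
from typing import Dict, List, Optional

DEFAULT_ICC = 1.0  # prior-art fallback named in the text (ICC = +1)


def combine_icc(icc_front: float, icc_rear: float,
                energy_front: float, energy_rear: float) -> float:
    """Encoder side: merge two left/right ICC values into one.

    Each value is weighted by the energy of its channel pair, so the
    louder pair dominates the combined value (assumed weighting).
    """
    total = energy_front + energy_rear
    if total == 0.0:
        return DEFAULT_ICC  # silent tile: arbitrary but harmless choice
    return (energy_front * icc_front + energy_rear * icc_rear) / total


def decoder_icc(pairs: List[str],
                transmitted: Dict[str, float],
                combined: Optional[float]) -> Dict[str, float]:
    """Decoder side: ICC value actually used for each channel pair.

    Order of preference: individually transmitted value, then the
    combined value, then the prior-art default.
    """
    fallback = DEFAULT_ICC if combined is None else combined
    return {pair: transmitted.get(pair, fallback) for pair in pairs}


if __name__ == "__main__":
    icc = combine_icc(icc_front=0.8, icc_rear=0.4,
                      energy_front=3.0, energy_rear=1.0)
    print(decoder_icc(["lf/rf", "lr/rr"], {}, icc))   # combined value reused
    print(decoder_icc(["lf/rf", "lr/rr"], {}, None))  # prior-art default
```

With the default of +1 the decoder would reproduce fully correlated channel pairs, whereas reusing the combined value keeps the decorrelation, and hence the width, of the reconstructed sound field close to that of the original.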
It is to be noted here that the most important ICC value is the one along the left/right axis, since the listener usually is facing forward in the listening set-up. This can be taken into account advantageously to build the hierarchical encoding structure such that a suitable parametric representation of the audio signal can be obtained during the iterative encoding process, wherein the resulting combined ICC value represents mainly the left/right decorrelation. This will be explained in more detail later when discussing preferred embodiments of the current invention.
The inventive encoding/decoding scheme allows the number of parameters transmitted from an encoder to a decoder to be reduced, using a hierarchical structure of a spatial audio system, by means of the two following measures:
• combining the individual encoder parameters to form a combined parameter, which is transmitted to the decoder instead of the individual ones. The combination of the parameters is carried out such that the signal sound image (including L/R correlation/coherence) is preserved as far as possible.
• using the transmitted combined parameter in the decoder instead of several transmitted individual parameters (or deriving the actually used parameters from the combined one).
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk, DVD or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer.
In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.
While the foregoing has been particularly shown and described with reference to particular embodiments thereof, it will be understood by those skilled in the art that various other changes in the form and details may be made without departing from the spirit and scope thereof. It is to be understood that various changes may be made in adapting to different embodiments without departing from the broader concepts disclosed herein and comprehended by the claims that follow.
We Claim:
1. An encoder for generating a parametric representation (314,316,318,320) of an audio signal having at least two original left channels (224a,224b) on a left side and two original right channels (224c,224d) on a right side with respect to a listening position, comprising: a generator (220) for generating parametric information, the generator being operative to separately process several pairs of channels to derive a level information (230a,230b) for processed channel pairs, and to derive coherence information (232a,232b) for a channel pair including a first channel (228a) only having information from the left side and a second channel (228b) only having information from the right side; and a provider (222) for providing the parametric representation (238,314,316,318,320) by selecting the level information (230a,230b) for channel pairs and by determining a left/right coherence measure (236) using the coherence information (232a,232b) and to introduce the left/right coherence measure (236) into an output datastream as the only coherence information of the audio signal within the parametric representation (238;314,316,318,320).
2.
The encoder as claimed in claim 1, wherein the generator (220) is operative to process a left-front channel (lf) and a left-rear channel (lr) to derive a lf/lr level information (230a), wherein a combination of the left-front channel (lf) and the left-rear (lr) channel forms a left master channel (LM), and to process a right-front channel (rf) and a right-rear channel (rr) to derive a rf/rr level information (230b), wherein a combination of the right-front channel (rf) and the right-rear (rr) channel forms a right master channel (RM); and to process the left master channel (LM) and the right master channel (RM) to derive a lm/rm level information (234) and to derive the coherence information (236), wherein a combination of the left master channel (LM) and the right master channel (RM) forms a stereo master channel (SM).
3. The encoder as claimed in claim 2, wherein the generator (220) is operative to process a center channel (ce) and a low-frequency channel (lo) to derive a ce/lo level information, wherein a combination of the center channel (ce) and the low-frequency channel (lo) forms a center master channel (CM).
4. The encoder as claimed in claim 3, wherein the generator (220) is operative to process the stereo master channel (SM) and the center master channel (CM) to derive a sm/cm level information, wherein a combination of the stereo master channel (SM) and the center master (CM) channel forms a downmix channel; and in which the provider (222) is operative to determine the left/right coherence measure using the coherence information (232a,232b) and the sm/cm level information.
5.
The encoder as claimed in claim 4, wherein the provider (222) is operative to calculate the left/right coherence measure depending on the sm/cm level information such that, in a case, in which the sm/cm level information indicates, that more energy is in the stereo master channel (SM) than in the center master channel (CM), the left/right coherence measure is more close to the coherence information (232a, 232b) compared to a situation, and wherein the sm/cm level information indicates, that more energy is in the center master channel (CM), in which case the left/right coherence measure is more close to unity.
6. The encoder as claimed in claim 4, wherein the provider (222) is operative to calculate the left/right coherence measure depending on the sm/cm level information such that, in a case, wherein the sm/cm level information indicates, that a ratio of the energy in the stereo master channel (SM) and the energy in the center master channel (CM) exceeds a predefined value, the left/right coherence measure is set to the coherence information (232a,232b) compared to a situation, wherein the sm/cm level information indicates, that the ratio of the energy in the stereo master channel SM to the energy in the center master channel (CM) stays below or equals the predefined value, and wherein the left/right coherence measure is set to unity.
7.
The encoder as claimed in claim 1, wherein the generator (220) is operative to process a left-front channel (lf) and a right-front channel (rf) to derive a lf/rf level information and a first coherence information (232a,232b), wherein a combination of the left-front channel (lf) and the right-front channel (rf) forms a front master channel (FM), and to process a left-rear channel (lr) and a right-rear channel (rr) to derive a lr/rr level information and to derive a second coherence information (232a,232b), wherein a combination of the left-rear channel (lr) and the right-rear channel (rr) forms a rear master channel (RM), and wherein the provider (222) is operative to determine the left/right coherence measure combining the first coherence information (232a) and the second coherence information (232b).
8. The encoder as claimed in claim 7, wherein the provider (222) is operative to determine the left/right coherence measure based on a weighted sum of the first and the second coherence information (232a,232b), using level information of the front master channel (FM) and level information of the rear master channel (RM) as weights.
9. The encoder as claimed in claim 7, wherein the generator (220) is operative to process a center channel (ce) and a low-frequency channel (lo) to derive a ce/lo level information, and wherein a combination of the center channel (ce) and the low-frequency channel (lo) forms a center master channel (CM).
10. The encoder as claimed in claim 9, wherein the generator (220) is operative to process the front master channel (FM) and the center master channel (CM) to derive a fm/cm level information, wherein a combination of the front master channel (FM) and the center master channel (CM) forms a pure front channel (PF); and wherein the provider (222) is operative to determine the left/right coherence measure combining the first and the second coherence information (232a,232b) additionally using the fm/cm level information.
11.
The encoder as claimed in claim 10, wherein the generator (220) is operative to process the pure front channel (PF) and the rear master channel (RM) to derive a pf/rm level information, and wherein a combination of the pure front channel (PF) and the rear master channel (RM) forms a downmix channel.
12. The encoder as claimed in claim 1, wherein the generator (220) is operative to process the pairs of channels in discrete time frames of a given length.
13. The encoder as claimed in claim 1, wherein the generator (220) is operative to process the pairs of channels in discrete frequency intervals of a given bandwidth.
14. A decoder for processing a parametric representation (314,316,318,320) of an original audio signal, the original audio signal having at least two original left channels (224a,224b) on a left side and at least two original right channels (224c,224d) on a right side with respect to a listening position, comprising: a receiver (310) for providing the parametric representation (314,316,318,320) of the audio signal, the receiver (310) being operative to provide level information (314,316,318) for channel pairs and to provide a left/right coherence measure (320) for a channel pair including a left channel and a right channel as the only coherence information of the original audio signal within the parametric representation (314,316,318,320), the left/right coherence measure representing a coherence information between at least one channel pair including a first channel only having information from the left side and a second channel only having information from the right side; and a processor (312) for supplying parametric information for channel pairs, the processor (312) being operative to select level information (314,316,318) from the parametric representation (314,316,318,320) and to derive coherence information for at least one channel pair using the left/right coherence measure (320), the at least one channel pair including a first channel only
having information from the left side and a second channel only having information from the right side.
15. The decoder as claimed in claim 14, wherein the receiver (310) is operative to provide a lf/lr level information (316) for a channel pair of an original left-front channel (lf) and an original left-rear channel (lr), wherein a combination of the original left-front channel (lf) and the original left-rear channel (lr) forms a left master channel (LM); provide a rf/rr level information (318) for a channel pair of an original right-front channel (rf) and an original right-rear channel (rr), wherein a combination of the original right-front channel (rf) and the original right-rear channel (rr) forms a right master channel (RM); provide a lm/rm level information (314) for a channel pair of the left master channel (LM) and the right master channel (RM), wherein a combination of the left master channel (LM) and the right master channel (RM) forms a stereo master channel (SM); and wherein the processor (312) is operative to provide coherence information for the left master channel (LM) and the right master channel (RM) using the left/right coherence measure (320); the decoder comprising an upmixer, the upmixer having: a first 1-to-2 upmixer (354b) for generation of the left master channel (LM) and the right master channel (RM) from the stereo master channel (SM) using the lm/rm level information and the left/right coherence measure; a second 1-to-2 upmixer (354e) for generation of the original left-front channel (lf) and the original left-rear channel (lr) from the left master channel (LM) using the lf/lr level information and a predefined coherence information; and a third 1-to-2 upmixer (354d) for generation of the original right-front (rf) channel and the original right-rear channel (rr) from the right master channel (RM) using the rf/rr level information and a predefined coherence information.
16.
The decoder as claimed in claim 15, wherein the receiver (310) is operative to provide a ce/lo level information for a channel pair of an original center channel (ce) and of an original low-frequency channel (lo), wherein a combination of the original center channel (ce) and of the original low-frequency channel (lo) forms a center master channel (CM); and wherein the upmixer is comprising a fourth 1-to-2 upmixer (354a) for generation of the original center channel (ce) and the original low-frequency channel (lo) from the center master channel (CM) using the ce/lo level information and a predefined coherence information.
17. The decoder as claimed in claim 16, wherein the receiver (310) is operative to provide a sm/cm level information for a channel pair of the stereo master channel (SM) and of the center master channel (CM), wherein a combination of the stereo master channel (SM) and of the center master channel (CM) forms a downmix channel; and wherein the upmixer is comprising a fifth 1-to-2 upmixer (254b) for generation of the stereo master channel (SM) and the center master channel (CM) from the downmix channel using the sm/cm level information and a predefined coherence information.
18.
The decoder as claimed in claim 14, wherein the receiver (310) is operative to provide a lf/rf level information (316) for a channel pair of an original left-front channel (lf) and of an original right-front channel (rf), wherein a combination of the original left-front channel (lf) and of the original right-front channel (rf) forms a front master channel (FM); provide a lr/rr level information (318) for a channel pair of an original left-rear channel (lr) and an original right-rear channel (rr), wherein a combination of the original left-rear channel (lr) and the original right-rear channel (rr) forms a rear master channel (RM); and wherein the processor (312) is operative to supply a first coherence information for the original left-front channel (lf) and the original right-front channel (rf) and to supply a second coherence information for the original left-rear channel (lr) and the original right-rear channel (rr) using the left/right coherence measure; the decoder comprising an upmixer, the upmixer having: a first 1-to-2 upmixer (354e) for generation of the original left-front channel (lf) and the original right-front channel (rf) from the front master channel (FM) using the lf/rf level information and the left/right coherence measure; a second 1-to-2 upmixer (354c) for generation of the original left-rear channel (lr) and the original right-rear channel (rr) from the rear master (RM) channel using the lr/rr level information and the left/right coherence measure.
19.
The decoder as claimed in claim 18, wherein the receiver (310) is operative to provide a ce/lo level information for a channel pair of an original center channel (ce) and of an original low-frequency channel (lo), wherein a combination of the original center channel (ce) and of the original low-frequency channel (lo) forms a center master channel (CM); and wherein the upmixer is comprising a third 1-to-2 upmixer (354d) for generation of the original center channel (ce) and the original low-frequency channel (lo) from the center master channel (CM) using the ce/lo level information and a predefined coherence information.
20. The decoder as claimed in claim 19, wherein the receiver (310) is operative to provide a fm/cm level information for a channel pair of the front master channel (FM) and the center master channel (CM), wherein a combination of the front master channel (FM) and the center master channel (CM) forms a pure front channel (PF); and wherein the upmixer is comprising a fourth 1-to-2 upmixer (354b) for generation of the front master channel (FM) and the center master channel (CM) from the pure front channel (PF) using the fm/cm level information and a predefined coherence information.
21. The decoder as claimed in claim 20, wherein the receiver (310) is operative to provide a pf/rm level information for a channel pair of the pure front channel (PF) and the rear master channel (RM), wherein a combination of the pure front channel (PF) and the rear master channel (RM) forms a downmix channel; and wherein the upmixer is comprising a fifth 1-to-2 upmixer (354a) for generation of the pure front channel (PF) and the rear master channel (RM) from the downmix channel using the pf/rm level information and a predefined coherence information.
22. The decoder as claimed in claim 14, wherein the processor (312) is operative to derive coherence measures for all channel pairs by distributing the received left/right coherence as the coherence measures.
23.
The decoder as claimed in claim 14, wherein the receiver (310) is operative to operate in a first mode, providing level information for channel pairs and providing a left/right coherence measure for a channel pair comprising a left channel and a right channel as the only coherence information of the audio signal within the parametric representation (314,316,318,320), the left/right coherence measure representing a coherence information between at least one channel pair including a first channel only having information from the left side and a second channel only having information from the right side with respect to a listening position; or to operate in a second mode, providing the level information for channel pairs and the coherence information for the same channel pairs; and wherein the processor (312) is operative to supply parametric information for channel pairs in the first mode, the processor (312) being operative to select the level information from the parametric representation (314,316,318,320) and to derive the coherence information for at least one channel pair using the left/right coherence measure, the at least one channel pair including a first channel only having information from the left side and a second channel only having information from the right side; or in the second mode, the processor (312) being operative to select the level information from the parametric representation (314,316,318,320) and to select the coherence information from the parametric representation (314,316,318,320).
24. The decoder as claimed in claim 23, wherein the receiver (310) comprises a mode receiver for selecting an operating mode using received mode information, the mode information indicating the first or the second mode to be used.
25.
A method for generating a parametric representation (314,316,318,320) of an audio signal having at least two original left channels (224a,224b) and at least two original right channels (224c,224d) with respect to a listening position, the method comprising: generating parametric information by separately processing several pairs of channels to derive a level information for processed channel pairs and by deriving coherence information for a channel pair including a first channel only having information from the left side and a second channel only having information from the right side, and providing the parametric representation (314,316,318,320) by selecting level information for channel pairs and by determining a left/right coherence measure using the coherence information and introducing the left/right coherence measure into an output datastream as the only coherence information of the audio signal within the parametric representation (314,316,318,320).
26. A method for processing a parametric representation (314,316,318,320) of an original audio signal, the original audio signal having at least two original left channels (224a,224b) on the left side and at least two original right channels (224c,224d) on the right side with respect to a listening position, the method comprising: providing the parametric representation (314,316,318,320) of the audio signal by providing a level information for channel pairs and by providing a left/right coherence measure for a channel pair including a left channel and a right channel as the only coherence information of the audio signal within the parametric representation (314,316,318,320), the left/right coherence measure representing a coherence information between at least one channel pair including a first channel only having information from the left side and a second channel only having information from the right side; and supplying parametric information for channel pairs by selecting level information from the parametric
representation (314,316,318,320) and by deriving coherence information for at least one channel pair using the left/right coherence measure, the at least one channel pair including a first channel only having information from the left side and a second channel only having information from the right side.
27. A receiver or audio player having a decoder as claimed in claim 14.
28. A transmitter or audio recorder having an encoder as claimed in claim 1.
29. A method of receiving or audio playing, the method having a method as claimed in claim 26.
30. A method of transmitting or audio recording, the method having a method as claimed in claim 25.
31. A transmission system including a transmitter as claimed in claim 28 and a receiver as claimed in claim 27.
32. A method of transmitting and receiving, the method of transmitting having a method as claimed in claim 25 and the method of receiving having a method as claimed in claim 26.


Multi-Channel Hierarchical Audio Coding with Compact Side-
Information
Field of the invention
The present invention relates to multi-channel audio processing
and, in particular, to the generation and the use of compact
parametric side information to describe the spatial properties
of a multi-channel audio signal.
Background of the invention and prior art
In recent times, the multi-channel audio reproduction technique
is becoming more and more important. This may be due to the fact
that audio compression/encoding techniques such as the well-known
mp3 technique have made it possible to distribute audio records
via the Internet or other transmission channels having a limited
bandwidth. The mp3 coding technique has become so famous because
of the fact that it allows distribution of all the records in a
stereo format, i.e., a digital representation of the audio record
including a first or left stereo channel and a second or right
stereo channel.
Nevertheless, there are basic shortcomings of conventional
two-channel sound systems. Therefore, the surround technique has
been developed. A recommended multi-channel surround presentation
format includes, in addition to two stereo channels L and R, an
additional center channel C and two surround channels Ls, Rs.
This reference sound format is also referred to as
three/two-stereo, which means three front channels and two
surround channels. In a playback environment, at least five
speakers at five appropriate locations are needed to get an
optimum sweet spot at a certain distance from the five
well-placed loudspeakers.

Recent approaches for the parametric coding of multi-channel
audio signals (parametric stereo (PS), "spatial audio coding",
"binaural cue coding" (BCC) etc.) represent a multi-channel audio
signal by means of a downmix signal (which could be monophonic or
comprise several channels) and parametric side information
("spatial cues") characterizing its perceived spatial sound
stage. The different approaches and techniques shall be reviewed
briefly in the following paragraphs.
A related technique, also known as parametric stereo, is
described in J. Breebaart, S. van de Par, A. Kohlrausch, E.
Schuijers, "High-Quality Parametric Spatial Audio Coding at
Low Bitrates", AES 116th Convention, Berlin, Preprint 6072,
May 2004, and E. Schuijers, J. Breebaart, H. Purnhagen, J.
Engdegard, "Low Complexity Parametric Stereo Coding", AES
116th Convention, Berlin, Preprint 6073, May 2004.
Several techniques are known in the art for reducing the amount
of data required for transmission of a multi-channel audio
signal. To this end, reference is made to Fig. 11, which shows a
joint stereo device 60. This device can be a device implementing
e.g. intensity stereo (IS) or binaural cue coding (BCC). Such a
device generally receives - as an input - at least two channels
(CH1, CH2, ... CHn), and outputs a single carrier channel and
parametric data. The parametric data are defined such that, in a
decoder, an approximation of an original channel (CH1, CH2, ...
CHn) can be calculated.
Normally, the carrier channel will include subband samples,
spectral coefficients, time domain samples etc., which pro-
vide a comparatively fine representation of the underlying
signal, while the parametric data does not include such
samples or spectral coefficients but includes control
parameters for controlling a certain reconstruction algorithm
such as weighting by multiplication, time shifting, fre-
quency shifting, phase shifting, etc. The parametric data,

therefore, includes only a comparatively coarse representa-
tion of the signal or the associated channel. Stated in
numbers, the amount of data required by a carrier channel
can be in the range of 60 - 70 kbit/s in an MPEG coding
scheme, while the amount of data required by parametric
side information for one channel may be in the range of
about 10 kbit/s for a 5.1 channel signal. Examples of
parametric data are the well-known scale factors, intensity
stereo information or binaural cue parameters as will be
described below.
The BCC technique is described, for example, in the AES
convention paper 5574, "Binaural Cue Coding applied to Stereo
and Multi-Channel Audio Compression", C. Faller,
F. Baumgarte, May 2002, Munich, in the IEEE WASPAA paper
"Efficient representation of spatial audio using perceptual
parametrization", October 2001, Mohonk, NY, and in the
two ICASSP papers "Estimation of auditory spatial cues for
binaural cue coding" and "Binaural cue coding: a novel and
efficient representation of spatial audio", both authored
by C. Faller and F. Baumgarte, Orlando, FL, May 2002.
In BCC encoding, a number of audio input channels are con-
verted to a spectral representation using a DFT (Discrete
Fourier Transform) based transform with overlapping win-
dows. The resulting spectrum is divided into non-
overlapping partitions. Each partition has a bandwidth pro-
portional to the equivalent rectangular bandwidth (ERB).
The inter-channel level differences (ICLD) and the inter-
channel time differences (ICTD) are estimated for each par-
tition. The inter-channel level differences ICLD and inter-
channel time differences ICTD are normally given for each
channel with respect to a reference channel and furthermore
quantized. The transmitted parameters are finally calculated
in accordance with prescribed formulae (encoded),
which may depend on the specific partitions of the signal
to be processed.
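The level-difference estimation per partition can be illustrated by the following minimal sketch. It is not taken from the cited publications: the function name icld_db, the use of a decibel ratio of partition energies, and the small constant eps guarding against empty partitions are assumptions made for illustration only.

```python
import math

def icld_db(band_energies, ref_energies, eps=1e-12):
    """Level difference in dB of one channel relative to the reference
    channel, one value per spectral partition (assumed dB definition)."""
    return [10.0 * math.log10((e + eps) / (r + eps))
            for e, r in zip(band_energies, ref_energies)]

# Hypothetical partition energies for a reference channel and one other channel.
reference = [1.0, 4.0, 2.0]
channel = [10.0, 4.0, 0.2]
print(icld_db(channel, reference))
```

In such a sketch, one list of values would be produced for each non-reference channel and each frame before quantization.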

At a decoder-side, the decoder receives a mono signal and
the BCC bit stream. The mono signal is transformed into the
frequency domain and input into a spatial synthesis block,
which also receives decoded ICLD and ICTD values. In the
spatial synthesis block, the BCC parameters (ICLD and ICTD
values) are used to perform a weighting operation on the
mono signal in order to synthesize the multi-channel sig-
nals, which, after a frequency/time conversion, represent a
reconstruction of the original multi-channel audio signal.
In case of BCC, the joint stereo module 60 is operative to
output the channel side information such that the paramet-
ric channel data are quantized and encoded resulting in
ICLD or ICTD parameters, wherein one of the original chan-
nels is used as the reference channel while coding the
channel side information.
Normally, the carrier channel is formed of the sum of the
participating original channels.
Therefore, the above techniques additionally provide a
suitable mono representation for playback equipment that
can only process the carrier channel and is not able to
process the parametric data for generating one or more ap-
proximations of more than one input channel.
The audio coding technique known as binaural cue coding
(BCC) is also well described in the United States patent
application publications US 2003/0219130 A1, 2003/0026441
A1 and 2003/0035553 A1. Additional reference is also made
to "Binaural Cue Coding. Part II: Schemes and
Applications", C. Faller and F. Baumgarte, IEEE Trans. on Audio
and Speech Proc., Vol. 11, No. 6, Nov. 2003, and to
"Binaural cue coding applied to audio compression with flexible
rendering", C. Faller and F. Baumgarte, AES 113th
Convention, Los Angeles, October 2002. The cited United States
patent application publications and the two cited technical
publications on the BCC technique authored by Faller and
Baumgarte are incorporated herein by reference in their
entireties.
Although ICLD and ICTD parameters represent the most impor-
tant sound source localization parameters, a spatial repre-
sentation using these parameters only limits the maximum
quality that can be achieved. To overcome this limitation,
and hence to enable high-quality parametric coding,
parametric stereo (as described in J. Breebaart, S. van de
Par, A. Kohlrausch, E. Schuijers (2005) "Parametric coding
of stereo audio", Eurasip J. Applied Signal Proc. 9,
1305-1322) applies three types of spatial parameters, referred
to as Interchannel Intensity Differences (IIDs),
Interchannel Phase Differences (IPDs), and Interchannel Coherence
(IC). The extension of the spatial parameter set with
coherence parameters enables a parameterization of the
perceived spatial 'diffuseness' or spatial 'compactness' of
the sound stage.
In the following, a typical generic BCC scheme for
multi-channel audio coding is elaborated in more detail with
reference to Figures 12 to 14. Figure 12 shows such a generic
binaural cue coding scheme for coding/transmission of
multi-channel audio signals. The multi-channel audio input
signal at an input 110 of a BCC encoder 112 is downmixed in
a downmix block 114. In the present example, the original
multi-channel signal at the input 110 is a 5-channel sur-
round signal having a front left channel, a front right
channel, a left surround channel, a right surround channel
and a center channel. In a preferred embodiment of the pre-
sent invention, the downmix block 114 produces a sum signal
by a simple addition of these five channels into a mono
signal. Other downmixing schemes are known in the art such
that, using a multi-channel input signal, a downmix signal
having a single channel can be obtained. This single chan-
nel is output at a sum signal line 115. A side information
obtained by a BCC analysis block 116 is output at a side

information line 117. In the BCC analysis block,
inter-channel level differences (ICLD), and inter-channel time
differences (ICTD) are calculated as has been outlined
above. The BCC analysis block 116 is formed to also calcu-
late inter-channel correlation values (ICC values). The sum
signal and the side information are transmitted, preferably
in a quantized and encoded form, to a BCC decoder 120. The
BCC decoder decomposes the transmitted sum signal into a
number of subbands and applies scaling, delays and other
processing to generate the subbands of the output multi-
channel audio signals. This processing is performed such
that ICLD, ICTD and ICC parameters (cues) of a recon-
structed multi-channel signal at an output 121 are similar
to the respective cues for the original multi-channel sig-
nal at the input 110 of the BCC encoder 112. To this end,
the BCC decoder 120 includes a BCC synthesis block 122 and
a side information processing block 123.
In the following, the internal construction of the BCC syn-
thesis block 122 is explained with reference to Fig. 13.
The sum signal on line 115 is input into a time/frequency
conversion unit or filter bank FB 125. At the output of
block 125, a number N of subband signals are present, or,
in an extreme case, a block of spectral coefficients, when
the audio filter bank 125 performs a 1:1 transform, i.e., a
transform which produces N spectral coefficients from N
time domain samples (critical subsampling).
The BCC synthesis block 122 further comprises a delay stage
126, a level modification stage 127, a correlation process-
ing stage 128 and an inverse filter bank stage IFB 129. At
the output of stage 129, the reconstructed multi-channel
audio signal having for example five channels in case of a
5-channel surround system, can be output to a set of loud-
speakers 124 as illustrated in Fig. 12.
As shown in Fig. 13, the input signal s(n) is converted
into the frequency domain or filter bank domain by means of

element 125. The signal output by element 125 is multiplied
such that several versions of the same signal are obtained
as illustrated by branching node 130. The number of ver-
sions of the original signal is equal to the number of out-
put channels in the output signal to be reconstructed.
In general, each version of the original signal at
node 130 is subjected to a certain delay d1, d2, ..., di, ...,
dN. The delay parameters are computed by the side
information processing block 123 in Fig. 12 and are derived from
the inter-channel time differences as determined by the BCC
analysis block 116.
The same is true for the multiplication parameters a1, a2,
..., ai, ..., aN, which are also calculated by the side
information processing block 123 based on the inter-channel
level differences as calculated by the BCC analysis block
116.
The ICC parameters calculated by the BCC analysis block 116
are used for controlling the functionality of block 128
such that certain correlations between the delayed and
level-manipulated signals are obtained at the outputs of
block 128. It is to be noted here that the ordering of the
stages 126, 127, 128 may be different from the case shown
in Fig. 13.
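The branching at node 130 followed by the delay stage 126 and the level modification stage 127 may be sketched as follows for a single subband. This is a simplified illustration under stated assumptions: the function name synthesize_band is hypothetical, delays are taken as whole samples, and the correlation processing stage 128 as well as the filter banks 125 and 129 are omitted.

```python
def synthesize_band(sum_band, delays, gains):
    """Create one delayed and scaled copy of a subband signal per output
    channel (delay d_i in samples, multiplication parameter a_i)."""
    n = len(sum_band)
    outputs = []
    for d, a in zip(delays, gains):
        d = max(0, min(d, n))                      # clamp delay to block length
        delayed = [0.0] * d + list(sum_band[:n - d])
        outputs.append([a * x for x in delayed])
    return outputs

# Two output channels: the first undelayed with unity gain,
# the second delayed by one sample and attenuated.
channels = synthesize_band([1.0, 2.0, 3.0], delays=[0, 1], gains=[1.0, 0.5])
print(channels)
```

A complete synthesis would repeat this for every subband and then apply the correlation processing before the inverse filter bank.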
One should be aware that, in a frame-wise processing of an
audio signal, the BCC analysis is also performed frame-
wise, i.e. time-varying, and also frequency-wise. This
means that, for each spectral band, the BCC parameters are
obtained individually. This further means that, in case the
audio filter bank 125 decomposes the input signal into for
example 32 band pass signals, the BCC analysis block ob-
tains a set of BCC parameters for each of the 32 bands.
Naturally the BCC synthesis block 122 from Fig. 12, which
is shown in detail in Fig. 13, performs a reconstruction,
which is also based on the 32 bands in the example.

In the following, reference is made to Fig. 14 showing a
setup to determine certain BCC parameters. Normally, ICLD,
ICTD and ICC parameters can be defined between arbitrary
pairs of channels. One method, that will be outlined here,
consists of ICLD and ICTD parameters between a reference
channel and each other channel. This is illustrated in Fig.
14A.
ICC parameters can be defined in different ways. Most gen-
erally, one could estimate ICC parameters in the encoder
between all possible channel pairs as indicated in Fig.
14B. In this case, a decoder would synthesize ICC such that
it is approximately the same as in the original multi-
channel signal between all possible channel pairs. It was,
however, proposed to estimate only ICC parameters between
the strongest two channels at a time. This scheme is illus-
trated in Fig. 14C, where an example is shown, in which at
one time instance, an ICC parameter is estimated between
channels 1 and 2, and, at another time instance, an ICC pa-
rameter is calculated between channels 1 and 5. The decoder
then synthesizes the inter-channel correlation between the
strongest channels in the decoder and applies some heuris-
tic rule for computing and synthesizing the inter-channel
coherence for the remaining channel pairs.
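The selection of the strongest two channels at a given time instance can be sketched as below. The function name strongest_pair and the use of per-channel energies as the measure of strength are assumptions for illustration; channel numbers are 1-based to match the numbering used with Fig. 14C.

```python
def strongest_pair(energies):
    """Return the 1-based numbers of the two channels with the highest
    energy; the ICC would then be estimated only between these two."""
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    first, second = sorted(order[:2])
    return first + 1, second + 1

# Hypothetical channel energies at two different time instances.
print(strongest_pair([5.0, 4.0, 0.1, 0.2, 0.3]))
print(strongest_pair([5.0, 0.1, 0.1, 0.2, 4.0]))
```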
Regarding the calculation of, for example, the
multiplication parameters a1, ..., aN based on transmitted ICLD
parameters, reference is made to AES convention paper 5574 cited
above. The ICLD parameters represent an energy distribution
in an original multi-channel signal. Without loss of gener-
ality, it is shown in Fig. 14A that there are four ICLD pa-
rameters showing the energy difference between all other
channels and the front left channel. In the side informa-
tion processing block 123, the multiplication parameters
a1, ..., aN are derived from the ICLD parameters such that the
total energy of all reconstructed output channels is the
same as (or proportional to) the energy of the transmitted
sum signal. A simple way for determining these parameters

is a 2-stage process, in which, in a first stage, the mul-
tiplication factor for the left front channel is set to
unity, while multiplication factors for the other channels
in Fig. 14A are determined from the transmitted ICLD val-
ues. Then, in a second stage, the energy of all five chan-
nels is calculated and compared to the energy of the trans-
mitted sum signal. Then, all channels are downscaled using
a downscaling factor which is equal for all channels,
wherein the downscaling factor is selected such that the
total energy of all reconstructed output channels is, after
downscaling, equal to the total energy of the transmitted
sum signal.
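The 2-stage process described above can be sketched as follows. The sketch assumes that the ICLD values are given in dB relative to the left front channel and that they map to amplitude factors via 10^(ICLD/20); it further assumes that each output channel is a scaled copy of the sum signal, so that its energy is the squared factor times the energy of the sum signal. The function name gains_from_icld is hypothetical.

```python
import math

def gains_from_icld(icld_db):
    """Derive multiplication parameters a1..aN from N-1 ICLD values (dB,
    relative to the left front channel) in two stages."""
    # Stage 1: unity factor for the reference channel, the other factors
    # follow from the transmitted level differences (assumed dB mapping).
    raw = [1.0] + [10.0 ** (d / 20.0) for d in icld_db]
    # Stage 2: one common downscaling factor, chosen such that the total
    # energy of all output channels equals the energy of the sum signal.
    scale = 1.0 / math.sqrt(sum(a * a for a in raw))
    return [a * scale for a in raw]

gains = gains_from_icld([-6.0, 0.0, -20.0, 3.0])
print(gains, sum(a * a for a in gains))
```

After the second stage the squared factors sum to one, while the ratios between the factors, and hence the level differences, are unchanged.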
Naturally, there are also other methods for calculating the
multiplication factors, which do not rely on the 2-stage
process but which only need a 1-stage process.
Regarding the delay parameters, it is to be noted that the
delay parameters ICTD, which are transmitted from a BCC en-
coder can be used directly, when the delay parameter d1 for
the left front channel is set to zero. No rescaling has to
be done here, since a delay does not alter the energy of
the signal.
As has been outlined above with respect to Fig. 14, the pa-
rametric side information, i.e., the interchannel level
differences (ICLD), the interchannel time differences
(ICTD) or the interchannel coherence parameter (ICC) can be
calculated and transmitted for each of the five channels.
This means that one, normally, transmits four sets of in-
terchannel level differences for a five channel signal. The
same is true for the interchannel time differences. With
respect to the interchannel coherence parameter, it can
also be sufficient to only transmit for example two sets of
these parameters.

As has been outlined above with respect to Fig. 13, there
is not a single level difference parameter, time difference
parameter or coherence parameter for one frame or time por-
tion of a signal. Instead, these parameters are determined
for several different frequency bands so that a frequency-
dependent parametrization is obtained. Since it is pre-
ferred to use for example 32 frequency channels, i.e., a
filter bank having 32 frequency bands for BCC analysis and
BCC synthesis, the parameters can occupy quite a lot of
data. Although - compared to other multi-channel transmis-
sions - the parametric representation results in a quite
low data rate, there is a continuing need for further re-
duction of the necessary data rate to represent a signal
having more than two channels such as a multi-channel sur-
round signal.
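The amount of side information implied by the figures given above can be made explicit with a small calculation. The sketch assumes four ICLD sets, four ICTD sets and two ICC sets per frame for a five channel signal, each set holding one value per frequency band; the function name values_per_frame is hypothetical, and quantization and entropy coding are not considered.

```python
def values_per_frame(num_bands=32, icld_sets=4, ictd_sets=4, icc_sets=2):
    """Number of parameter values per frame when every parameter set
    carries one value per frequency band."""
    return num_bands * (icld_sets + ictd_sets + icc_sets)

print(values_per_frame())  # 32 * (4 + 4 + 2) = 320
```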
The encoding of a multi-channel audio signal can be advan-
tageously implemented using several existing modules, which
perform a parametric stereo coding into a single mono-
channel. The international patent application
WO2004008805 A1 teaches how parametric stereo coders can be
ordered in a hierarchical set-up such that a given number
of input audio channels are successively downmixed into one
single mono-channel. The parametric side information,
describing the spatial properties of the downmix
mono-channel, finally consists of all the parametric information
successively produced during the iterative downmixing
process. This means that, if there are, for example, three
stereo-to-mono downmixing processes involved in building
the final mono signal, the final set of parameters building
the parametric representation of the multi-channel audio
signal consists of the three sets of the parameters derived
during every single stereo-to-mono downmixing process.
A hierarchical downmixing encoder is shown in Fig. 15, to
explain the method of the prior art in more detail. Fig. 15
shows six original audio channels 200a to 200f that are
transformed into a single monophonic audio channel 202 plus

parametric side information. Therefore, the six original
audio channels 200a to 200f have to be transformed from the
time domain into the frequency domain, which is performed
by transforming units 204, transforming the audio chan-
nels 200a to 200f into the corresponding channels 206a to
206f in the frequency domain. Following the hierarchical
approach, the channels 206a to 206f are pair-wise downmixed
into three monophonic channels L, R and C (208a, 208b and
208c, respectively). During the downmixing of the three
pairs of channels a parameter set is derived for each chan-
nel pair, describing the spatial properties of the original
stereophonic signal, downmixed into a monophonic signal.
Thus, in this first downmixing step, three parameter sets
210a to 210c are generated to preserve the spatial informa-
tion of the signals 206a to 206f.
In the next step of the hierarchical downmixing,
channels 208a and 208b are downmixed into a channel 212 (LR),
generating a parameter set 210d (parameter set 4). To
finally derive only one single monophonic channel, a
downmixing of the channels 208c and 212 is necessary, resulting in
channel 214 (M). This generates a fifth parameter set 210e
(parameter set 5). Finally, the downmixed monophonic audio
signal 214 is inversely transformed into the time domain to
derive an audio signal 202 that can be played by standard
equipment.
As described above, a parametric representation of the
downmix audio signal 202 according to the prior art con-
sists of all the parameter sets 210a to 210e, which means
that if one wants to rebuild the original multi-channel au-
dio signal (channels 200a to 200f) from the monophonic au-
dio signal 202, all the parameter sets 210a to 210e are re-
quired as side information of the monophonic downmix signal
202.
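The hierarchical structure of Fig. 15 and the resulting five parameter sets can be sketched as follows. Several points are assumptions for illustration only: the pairing order of the six input channels, the downmix by averaging, and the reduction of each parameter set to a single level difference in dB. The time/frequency transforms are omitted and the function names are hypothetical.

```python
import math

def downmix_pair(a, b, name, params):
    """Downmix two channels into one and record one parameter set."""
    ea = sum(x * x for x in a)
    eb = sum(x * x for x in b)
    # Assumed parameter set: only a level difference in dB is stored.
    params.append((name, 10.0 * math.log10((ea + 1e-12) / (eb + 1e-12))))
    return [0.5 * (x + y) for x, y in zip(a, b)]

def hierarchical_downmix(ch):
    """ch: six equal-length channels (206a to 206f, assumed pair order)."""
    params = []
    left = downmix_pair(ch[0], ch[1], "210a", params)    # channel L (208a)
    right = downmix_pair(ch[2], ch[3], "210b", params)   # channel R (208b)
    center = downmix_pair(ch[4], ch[5], "210c", params)  # channel C (208c)
    lr = downmix_pair(left, right, "210d", params)       # channel LR (212)
    mono = downmix_pair(center, lr, "210e", params)      # channel M (214)
    return mono, params

mono, params = hierarchical_downmix([[1.0, 0.0]] * 6)
print(len(params), [name for name, _ in params])
```

The sketch makes visible that every stereo-to-mono step contributes its own parameter set, so that five sets accompany the single downmix channel.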
The US patent application 11/032,689 (hereinafter
referred to as "prior art cue combination") describes a
process for combining several cue values into a single
transmitted one in order to save side information in a
non-hierarchical coding scheme. To do so, all the channels are
downmixed first and the cue codes are later on combined to
form transmitted cue values (could also be one single
value), the combination being dependent on a predefined
mathematical function, in which the spatial parameters,
that are derived directly from the input signals, are put
in as variables.
State-of-the-art techniques for the parametric coding of
two ("stereo") or more ("multi-channel") audio input chan-
nels derive the spatial parameters directly from the input
signals. Examples of such parameters are inter-channel
level differences (ICLD) or inter-channel intensity
differences (IID), inter-channel time delay (ICTD) or
inter-channel phase differences (IPD), and inter-channel
correlation/coherence (ICC), each of which is transmitted in a
frequency-selective fashion, i.e. per frequency band. The
application of the prior art cue combination teaches that
several cue values can be combined to a single value that
is transmitted from the encoder to the decoder side. The
decoding process uses the transmitted single value instead
of the originally individually transmitted cue values to
reconstruct the multi-channel output signal. In a preferred
embodiment, this scheme has been applied to the ICC parame-
ters. It has been shown that this leads to a considerable
reduction in the size of the cue side information while
preserving the spatial quality of the vast majority of sig-
nals. It is, however, not clear how this can be exploited
in a hierarchical coding scheme.
The patent application on prior art cue combination has de-
tailed the principle of the invention by an example for a
system based on two transmitted downmix channels. In the
proposed method, with reference to figure 15, ICC values of
Lf/Lr and Rf/Rr channel pairs are combined into a single
transmitted ICC parameter. The two combined ICC values have

been obtained during the downmixing of a front-left channel
Lf and a rear-left channel Lr into the channel L and during
the downmixing of a front-right Rf and a rear-right channel
Rr into the channel R. Therefore, the two combined ICC val-
ues that are finally being combined into the single trans-
mitted ICC parameter, both carry information about the
front/back correlation of the original channels and a com-
bination of these two ICC values will generally preserve
most of this information. If one had to further
downmix the L and R channels into one single mono channel,
one would get a third ICC value, carrying information about
the left/right correlation of the downmix channels L and R.
According to the cue combination of prior art, one would
now have to combine the three ICC values applying a given
function transforming the three ICC values into one
transmitted ICC parameter.
One has the problem then that front/back information mixes
with left/right information, which is obviously disadvanta-
geous for a reproduction of the original multi-channel au-
dio signal. In the US-application 11/032,689, this is
avoided by transmitting two downmix channels, the L and R
channels, that hold the left/right information, and addi-
tionally transmitting one single ICC value, holding
front/back information. This preserves the spatial proper-
ties of the original channels at the cost of a substan-
tially increased data rate, resulting from the full addi-
tional downmix channel to be transmitted.
Summary of the invention
It is the object of the present invention to provide an im-
proved concept to generate and to use a parametric repre-
sentation of a multi-channel audio signal with compact side
information in the context of a hierarchical coding scheme.
In accordance with the first aspect of the present inven-
tion, this object is achieved by an encoder for generating

a parametric representation of an audio signal having at
least two original left channels on a left side and two
original right channels on a right side with respect to a
listening position, comprising: a generator for generating
parametric information, the generator being operative to
separately process several pairs of channels to derive a
level information for processed channel pairs, and to de-
rive coherence information for a channel pair including a
first channel only having information from the left side
and a second channel only having information from the right
side, and a provider for providing the parametric represen-
tation by selecting the level information for channel pairs
and determining a left/right coherence measure using the
coherence information.
In accordance with a second aspect of the present inven-
tion, this object is achieved by a decoder for processing a
parametric representation of an original audio signal, the
original audio signal having at least two original left
channels on a left side and at least two original right
channels on a right side with respect to a listening posi-
tion, comprising: a receiver for providing the parametric
representation of the audio signal, the receiver being op-
erative to provide level information for channel pairs and
to provide a left/right coherence measure for a channel
pair including a left channel and a right channel, the
left/right coherence measure representing a coherence in-
formation between at least one channel pair including a
first channel only having information from the left side
and a second channel only having information from the right
side; and a processor for supplying parametric information
for channel pairs, the processor being operative to select
level information from the parametric representation and to
derive coherence information for at least one channel pair
using the left/right coherence measure, the at least one
channel pair including a first channel only having informa-
tion from the left side and a second channel only having
information from the right side.

In accordance with a third aspect of the present invention,
this object is achieved by a method for generating a para-
metric representation of an audio signal.
In accordance with a fourth aspect of the present inven-
tion, this object is achieved by a computer program imple-
menting the above method, when running on a computer.
In accordance with a fifth aspect of the present invention,
this object is achieved by a method for processing a para-
metric representation of an original audio signal.
In accordance with a sixth aspect of the present invention,
this object is achieved by a computer program implementing
the above method, when running on a computer.
In accordance with a seventh aspect of the present inven-
tion, this object is achieved by encoded audio data gene-
rated by building a parametric representation of an audio
signal having at least two original left channels on a left
side and two original right channels on a right side with
respect to a listening position, wherein the parametric
representation comprises level differences for channel
pairs and a left/right coherence measure derived from co-
herence information from a channel pair including a first
channel only having information from the left side and a
second channel only having information from the right side.
The present invention is based on the finding that a
parametric representation of a multi-channel audio signal
describes the spatial properties of the audio signal well
using compact side information, when the coherence
information, describing the coherence between a first and a second
channel, is derived within a hierarchical encoding process
only for channel pairs including a first channel having
only information of a left side with respect to a listening
position and including a second channel having only infor-

mation from a right side with respect to a listening posi-
tion. As in the hierarchical process the multiple audio
channels of the original audio signal are downmixed itera-
tively preferably into a monophonic channel, one has the
chance to pick the relevant side-information parameters
during the encoding process for a step involving only chan-
nel pairs that bear the desired information needed to de-
scribe the spatial properties of the original audio signal
as well as possible. This allows building a parametric
representation of the original audio signal on the basis of
those picked parameters or on a combination of those
parameters, allowing a significant reduction of the size of
the side information that holds the spatial
information of the downmix signal.
The proposed concept allows combining cue values to reduce
the side information rate of a downmix audio signal even
for the case where only a single (monophonic) transmission
channel is feasible. The inventive concept even allows
different hierarchical topologies of the encoder. It is
specifically clarified, how a suitable single ICC value can
be derived, which can be applied in a spatial audio decoder
using the hierarchical encoding/decoding approach to repro-
duce the original sound image faithfully.
One embodiment of the present invention implements a hier-
archical encoding structure that combines the left front
and the left rear audio channel of a 5.1 channel audio sig-
nal into a left master channel and that simultaneously com-
bines the right front and the right rear channel into a
right master channel. Combining the left channels and the
right channels separately, the important left/right coher-
ence information is mainly preserved and is, according to
the invention, derived in the second encoding step, in
which the left master and the right master channels are
downmixed into a stereo master channel. During this down-
mixing process the ICC parameter for the whole system is
derived, since this ICC parameter will be the ICC parameter
that most accurately reflects the left/right coherence.
Within this embodiment of the present invention, one gets
an ICC parameter, describing the most important left/right
coherence of the six audio channels by simply arranging the
hierarchical encoding steps in an appropriate way and not
by applying some artificial function to a set of ICC pa-
rameters, describing arbitrary pairs of channels, as it is
the case in the prior art techniques.
In a modification of the described embodiment of the pre-
sent invention, the center channel and the low frequency
channel of the 5.1 audio signal are downmixed into a center
master channel, this channel holding mainly information
about the center channel, since the low frequency channel
contains only signals with such a low frequency that the
origin of the signals can hardly be localized by humans. It
can be advantageous to additionally steer the ICC value,
derived as described above, by parameters describing the
center master channel. This can be done, for example, by
weighting the ICC value with energy information, the energy
information telling how much energy is transmitted via the
center master channel with respect to the stereo master
channel.
In a further embodiment of the present invention, the
hierarchical encoding process is performed such that in a
first step the left-front and right-front channels of a 5.1
audio signal are downmixed into a front master channel,
whereas the left-rear and the right-rear channels are down-
mixed into a rear master channel. Therefore, in each of the
downmixing processes an ICC value is generated, containing
information about the important left/right coherence. The
combined and transmitted ICC parameter is then derived from
a combination of the two separate ICC values. An
advantageous way of deriving the transmitted ICC parameter is to
build the weighted sum of the ICC values, using the level
parameters of the channels as weights.
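The weighted sum mentioned above may be sketched as follows. The sketch assumes that the weights are the energies of the front and rear channel pairs and that the sum is normalized by the total weight, so that the result stays within the range of the two input values; neither the normalization nor the function name combine_icc is prescribed by the text.

```python
def combine_icc(icc_front, icc_rear, energy_front, energy_rear):
    """Combine two left/right ICC values into one transmitted value,
    weighting each by the energy of its channel pair."""
    total = energy_front + energy_rear
    if total <= 0.0:
        # No energy in either pair: fall back to the plain average.
        return 0.5 * (icc_front + icc_rear)
    return (energy_front * icc_front + energy_rear * icc_rear) / total

# Front pair carries three times the energy of the rear pair.
print(combine_icc(0.9, 0.1, energy_front=3.0, energy_rear=1.0))
```

With such a weighting, the channel pair that contributes most to the signal dominates the transmitted value, and the balance shifts automatically when the energy distribution changes.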

In a modification of the invention, the center channel and
the low frequency channel are downmixed into a center mas-
ter channel and afterwards the center master channel and
the front master channel are downmixed into a stereo master
channel. In the latter downmixing process, a correlation
between the center and the stereo channels is obtained,
which is used to steer or modify a transmitted ICC parame-
ter, thus also taking into account the center contribution
to the front audio signal. A major advantage of the previ-
ously described system is that one can build the coherence
information such that channels, that contribute most to the
audio signal, mainly define the transmitted ICC value. This
will normally be the front channels, but for example in a
multi-channel representation of a music concert, the signal
of the applauding audience could be emphasized by mainly
using the ICC value of the rear channels. It is a further
advantage that the weighting between the front and the back
channels can be varied dynamically, depending on the spa-
tial properties of the multi-channel audio signal.
In one embodiment of the present invention an inventive
hierarchical decoder is operative to receive fewer ICC
parameters than required by the number of existing decoding
steps. The decoder is operational to derive the ICC parame-
ters required for each decoding step from the received ICC
parameters.
This might be done by deriving the additional ICC parameters
using a deriving rule that is based on the received ICC pa-
rameters and the received ICLD values or by using prede-
fined values instead.
In a preferred embodiment, however, the decoder is opera-
tional to use a single transmitted ICC parameter for each
individual decoding step. This is advantageous as the most
important correlation, the left/right correlation is pre-
served in a transmitted ICC parameter within the inventive
concept. As this is the case, a listener will experience a
reproduction of the signal that resembles the original
signal very well. It is to be remembered that the ICC
parameter defines the perceptual wideness of a
reconstructed signal. If the decoder modified a transmitted
ICC parameter after transmission, the ICC parameters
describing the perceptual wideness of the reconstructed
signal could become rather different for the left/right and for
the front/back correlation within the hierarchical
reproduction. This would be most disadvantageous since then, a
listener that moves or rotates his head will experience a
signal that becomes perceptually wider or narrower, which
is of course most disturbing. This can be avoided by dis-
tributing a single received ICC parameter to the decoding
units of a hierarchical decoder.
In another preferred embodiment, an inventive decoder is
operational to receive a full set of ICC values or alterna-
tively a single ICC value, wherein the decoder recognizes
the decoding strategy to apply by receiving a strategy
indication within the bitstream. Such a backwards compatible
decoder is also operational in prior art environments,
decoding prior art signals transmitting a full set of ICC
data.
Brief Description of the Accompanying Drawings
Preferred embodiments of the present invention are subse-
quently described by referring to the enclosed drawings,
wherein:
Fig. 1 shows a block diagram of an embodiment of the in-
ventive hierarchical audio encoder;
Fig. 2 shows an embodiment of an inventive audio en-
coder;

Fig. 2a shows a possible steering scheme of an ICC
parameter of an inventive audio encoder;
Figs. 3a,b show graphical representations of side channel
information;
Fig. 4 shows a second embodiment of an inventive audio
encoder;
Fig. 5 shows a block diagram of a preferred embodiment
of an inventive audio decoder;
Fig. 6 shows an embodiment of an inventive audio de-
coder;
Fig. 7 shows another embodiment of an inventive audio
decoder;
Fig. 8 shows an inventive transmitter or audio recorder;
Fig. 9 shows an inventive receiver or audio player;
Fig. 10 shows an inventive transmission system;
Fig. 11 shows a prior art joint stereo encoder;
Fig. 12 shows a block diagram representation of a prior
art BCC encoder/decoder chain;
Fig. 13 shows a block diagram of a prior art implementa-
tion of a BCC synthesis block;
Fig. 14 shows a representation of a scheme for determin-
ing BCC parameters; and
Fig. 15 shows a prior art hierarchical encoder.

Detailed Description of Preferred Embodiments
Fig. 1 shows a block diagram of an inventive encoder to
generate a parametric representation of an audio signal.
Fig. 1 shows a generator 220 to subsequently combine audio
channels and generate spatial parameters describing spatial
properties of pairs of channels that are combined into a
single channel. Fig. 1 further shows a provider 222 to pro-
vide a parametric representation of a multi-channel audio
signal by selecting level difference information between
channel pairs and by determining a left/right coherence
measure using coherence information generated by the gen-
erator 220.
To demonstrate the principle of the inventive concept of
hierarchical multi-channel audio coding, Fig. 1 shows a
case, where four original audio channels 224a to 224d are
iteratively combined, resulting in a single channel 226.
The original audio channels 224a and 224b represent the
left-front and the left-rear channel of an original four-
channel audio signal, the channels 224c and 224d represent
the right-front and the right-rear channel, respectively.
Without loss of generality, only two of various spatial pa-
rameters are shown in Fig. 1 (ICLD and ICC) . According to
the invention, the generator 220 combines the audio chan-
nels 224a to 224d in such a way that during the combination
process an ICC parameter can be derived that carries the
important left/right coherence information.
In a first step, the channels containing only left side in-
formation 224a and 224b are combined into a left master
channel 228a (L) and the two channels containing only right
side information 224c and 224d are combined into a right
master channel 228b (R) . During this combination the gen-
erator generates two ICLD parameters 230a and 230b, both
being spatial parameters containing information about the
level difference of two original channels being combined
into one single channel. The generator also generates two

ICC parameters 232a and 232b, describing the correlation
between the two channels being combined into a single chan-
nel. The ICLD and ICC parameters 230a, 230b, 232a, and 232b
are transferred to the provider 222.
In the next step of the hierarchical generation process,
the left master channel 228a is combined with the right
master channel 228b into the resulting audio channel 226,
wherein the generator provides an ICLD parameter 234 and an
ICC parameter 236, both of them being transmitted to the
provider 222. It is important to note that the ICC parame-
ter 236 generated in this combination step mainly repre-
sents the important left/right coherence information of the
original four-channel audio signal represented by the audio
channels 224a to 224d.
Therefore, the provider 222 builds a parametrical represen-
tation 238 from the available spatial parameters 230a,b,
232a,b, 234 and 236 such, that the parametrical representa-
tion comprises the parameters 230a, 230b, 234, and 236.
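The two-stage combination of Fig. 1 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the claimed implementation: the plain sum downmix, the dB formula used for the ICLD, the normalized cross-correlation used for the ICC, and all function names are hypothetical choices made for brevity.

```python
import math

def energy(x):
    """Sum of squared samples of one channel."""
    return sum(s * s for s in x)

def icld_db(a, b, eps=1e-12):
    """Level difference of a relative to b in dB (assumed definition)."""
    return 10.0 * math.log10((energy(a) + eps) / (energy(b) + eps))

def icc(a, b, eps=1e-12):
    """Normalized cross-correlation at lag zero (assumed definition)."""
    num = sum(p * q for p, q in zip(a, b))
    return num / math.sqrt((energy(a) + eps) * (energy(b) + eps))

def combine(a, b):
    """2-to-1 combination: downmix (plain sum, an assumption), ICLD, ICC."""
    return [p + q for p, q in zip(a, b)], icld_db(a, b), icc(a, b)

def encode_four_channels(lf, lr, rf, rr):
    left, icld_230a, icc_232a = combine(lf, lr)     # left master channel 228a
    right, icld_230b, icc_232b = combine(rf, rr)    # right master channel 228b
    mono, icld_234, icc_236 = combine(left, right)  # resulting channel 226
    # Provider 222: keep every ICLD, but only the left/right ICC 236.
    # ICC 232a and 232b are computed above and deliberately not transmitted.
    representation = {"icld": [icld_230a, icld_230b, icld_234], "icc": icc_236}
    return mono, representation
```

In this sketch the representation holds three level differences and a single coherence value, mirroring the parameters 230a, 230b, 234, and 236.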
Fig. 2 shows a preferred embodiment of an inventive audio
encoder that encodes a 5.1 multi-channel signal into a sin-
gle monophonic signal.
Fig. 2 shows three transformation units 240a to 240c, five
2-to-1-downmixers 242a to 242e, a parameter combination
unit 244 and an inverse transformation unit 246. The origi-
nal 5.1 channel audio signal is given by the left-front
channel 248a, the left-rear channel 248b, the right-front
channel 248c, the right-rear channel 248d, the center chan-
nel 248e, and the low-frequency channel 248f. It is impor-
tant to note that the original channels are grouped in such
a way that the channels containing only left side
information 248a and 248b form one channel pair, the channels
containing only right side information 248c and 248d form
another channel pair, and the center channel 248e and the
low-frequency channel 248f form a third channel pair.

The transformation units 240a to 240c convert the chan-
nels 248a to 248f from the time domain into their spectral
representation 250a to 250f in the frequency subband do-
main. In the first hierarchical encoding step 252, the left
channels 250a and 250b are encoded into a left master chan-
nel 254a, the right channels 250c and 250d are encoded into
a right master channel 254b and the center channel 250e and
the low frequency channel 250f are encoded into a center
master channel 256. During this first hierarchic encoding
step 252, the three involved 2-to-1-encoders 242a to 242c
generate the downmixed channels 254a, 254b, and 256, and in
addition the important spatial parameter sets 260a, 260b,
and 260c, wherein the parameter set 260a (parameter set 1)
describes the spatial information between channels 250a and
250b, the parameter set 260b (parameter set 2) describes
the spatial relation between channels 250c and 250d and the
parameter set 260c (parameter set 3) describes the spatial
relation between channels 250e and 250f.
In a second hierarchical step 262, the left master chan-
nel 254a and the right master channel 254b are downmixed
into a stereo master channel 264, generating a spatial
parameter set 266 (parameter set 4), wherein the ICC
parameter of this parameter set 266 contains the important
left/right correlation information. To build a combined ICC
value from parameter set 266, the parameter set 266 can be
transferred to the parameter combination unit 244 via a
data connection 268. In the third hierarchical encoding
step 272, the stereo master channel 264 is combined with
the center master channel 256 to form a monophonic result
channel 274. The parameter set 276, that is derived during
this downmixing process, can be transferred via a data con-
nection 278 to the parameter combination unit 244. Finally,
the result channel 274 is transformed into the time domain
by the inverse transformation unit 246, to build the
monophonic downmix audio signal 280, which is the final
monophonic representation of the original 5.1 channel signal
represented by the audio channels 248a to 248f.
To reconstruct the original 5.1 channel audio signal from
the monophonic downmix audio channel 280, the parametric
representation of the 5.1 channel audio signal is addition-
ally needed. For the tree structure shown in Fig. 2, it can
be seen that the left front and back channels are combined
into an L-signal 254a. Similarly, the right front and back
channels are combined into an R-signal 254b. Subsequently,
the combination of the L and R-signals is carried out,
which delivers parameter set number 4 (266). In the case of
this hierarchical structure, a simple way of deriving a
combined ICC value is to pick the ICC value of parameter
set number 4 and take this as combined ICC value, which is
then incorporated into the parametric representation of the
5.1 channel signal by the parameter combination unit 244.
More sophisticated methods can also take into account the
influence of the center channel (e.g. by using parameters
from parameter set number 5), as shown in Fig. 2a.
As an example, the energy ratio E(LR)/E(C) of the energy
contained in the LR channel (264) and in the C channel
(256) from parameter set number 5 can be used to steer the
ICC value. In case most of the energy comes from the LR
path, the transmitted ICC value should become close to the
ICC value ICC(LR) of parameter set number 4. In case most
of the energy comes from the C path 256, the transmitted
ICC value should become close to 1, as indicated in
Fig. 2a. The figure shows two possible ways to implement
this steering of the ICC parameter: either by switching
between two extreme values when the energy ratio crosses a
given threshold 286 (steering function 288a), or by a smooth
transition between the extreme values (steering
function 288b).
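The two steering variants described above can be illustrated as follows. This is a sketch under assumptions: the threshold value, the linear blend used for the smooth variant, and the function names are hypothetical, since the text fixes only the two extreme values ICC(LR) and 1.

```python
def steer_icc_switch(icc_lr, e_lr, e_c, threshold=1.0):
    """Hard switch, in the spirit of steering function 288a.

    Returns ICC(LR) when the energy ratio E(LR)/E(C) reaches the
    threshold (an assumed value), and 1.0 otherwise.
    """
    ratio = e_lr / e_c if e_c > 0 else float("inf")
    return icc_lr if ratio >= threshold else 1.0

def steer_icc_smooth(icc_lr, e_lr, e_c):
    """Smooth transition, in the spirit of steering function 288b.

    Blends linearly between ICC(LR) and 1.0 using the share of the
    energy carried by the LR path (the linear shape is an assumption).
    """
    total = e_lr + e_c
    if total <= 0:
        return 1.0
    w = e_lr / total
    return w * icc_lr + (1.0 - w) * 1.0
```

Both functions return ICC(LR) when the LR path dominates and approach 1 when the center path dominates; they differ only in how the transition between the two extremes is made.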
Figures 3a and 3b show a comparison of a possible paramet-
ric representation of a 5.1 audio channel delivered from a

hierarchical encoder structure using a prior art technique
(Fig. 3a) and using the inventive concept for audio coding
(Fig. 3b).
Fig. 3a shows a parametric representation of a single time
frame and a discrete frequency interval, as it would be
provided by the prior art technique. Each of the 2-to-1
encoders 242a to 242e from Fig. 2 delivers one pair of ICLD
and ICC parameters; the origin of the parameter pairs is
indicated within Fig. 3a. Following the prior art approach,
all parameter sets, as provided by the 2-to-1 encoders 242a
to 242e, have to be transmitted together with the downmix
monophonic audio signal 280 as side information to rebuild
a 5.1 channel audio signal.
Fig. 3b shows parameters derived following the inventive
concept. Each of the 2-to-1 encoders 242a to 242e
contributes only one parameter directly, the ICLD parameter.
The single transmitted ICC parameter ICCC is derived by the
parameter combination unit 244, and not provided directly by
the 2-to-1 encoders 242a to 242e. As is clearly seen in
Figs. 3a and 3b, the inventive concept for a hierarchical
encoder can reduce the amount of side information
data significantly compared to prior art techniques.
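The saving can be made concrete by counting parameters per time frame and frequency interval. The helper below is a hypothetical illustration that counts only ICLD and ICC values, as in Figs. 3a and 3b.

```python
def side_info_count(num_downmixers, combined_icc):
    """Number of ICLD plus ICC values for one time/frequency tile.

    Every 2-to-1 encoder contributes one ICLD. With a combined ICC,
    a single coherence value is sent instead of one per encoder.
    """
    icld_values = num_downmixers
    icc_values = 1 if combined_icc else num_downmixers
    return icld_values + icc_values

prior_art = side_info_count(5, combined_icc=False)  # five ICLD and five ICC
inventive = side_info_count(5, combined_icc=True)   # five ICLD and one ICC
```

For the five 2-to-1 encoders 242a to 242e this gives ten values in the prior art case and six values with a single combined ICC.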
Fig. 4 shows another preferred embodiment of the current
invention, allowing to encode a 5.1 channel audio signal
into a monophonic audio signal in a hierarchical encoding
process and to supply compact side information. As the
principal hardware structure is equal to the one described
in Fig. 2, the same items in the two figures are labeled
with the same numbers. The difference lies in the different
grouping of the input channels 248a to 248f; hence the
order in which the single channels are downmixed into
the monophonic channel 274 differs from the downmixing
order in Fig. 2. Therefore, only the aspects differing from
the description of Fig. 2, which are vital for the under-

standing of the embodiment of the current invention shown
in Fig. 4, are described in the following.
The left-front channel 248a and the right-front chan-
nel 248c are grouped together to form a channel pair, the
center channel 248e and the low-frequency channel 248f form
another input channel pair and the third input channel pair
of the 5.1 audio signal is formed by the left-rear chan-
nel 248b and the right-rear channel 248d.
In a first hierarchical encoding step 252, the left-front
channel 250a and the right-front channel 250c are downmixed
into a front master channel 290 (F), the center chan-
nel 250e and the low-frequency channel 250f are downmixed
into a center master channel 292 (C) and the left-rear
channel 250b and the right-rear channel 250d are downmixed
into a rear master channel 294 (S) . A parameter set 300a
(parameter set 1) describes the front master channel 290, a
parameter set 300b (parameter set 2) describes the center
master channel 292, and a parameter set 300c (parameter
set 3) describes the rear master channel 294.
It is important to note that the parameter set 300a as well
as the parameter set 300c hold information that describes
the important left/right correlation between the original
channels 248a to 248f. Therefore, parameter set 300a and
parameter set 300c are made available to the parameter
combination unit 244 via data links 302a and 302b.
In a second encoding step 262, the front master channel 290
and the center master channel 292 are downmixed into a pure
front channel 304, generating a parameter set 300d (parame-
ter set 4). This parameter set 300d is also made available
to the parameter combination unit 244 via a data link 306.
In a third hierarchical encoding step 272, the pure front
channel 304 is downmixed with the rear master channel 294
into the result channel 274 (M), which is then transformed

into the time domain by the inverse transformation unit 246
to form the final monophonic downmix audio channel 280. The
parameter set 300e (Parameter Set 5), originating from the
downmixing of the pure front channel 304 and the rear mas-
ter channel 294 is also made available to the parameter
combination unit 244 via a data link 310.
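Since the encoders of Fig. 2 and Fig. 4 share the same building blocks and differ only in the pairing order, the two trees can be written down as data. The following sketch assumes a plain sum as the 2-to-1 downmix and uses illustrative signal names; neither is prescribed by the description.

```python
# Each entry: (output name, first input, second input). Names are illustrative.
FIG2_SCHEDULE = [
    ("L", "lf", "lr"),    # parameter set 1
    ("R", "rf", "rr"),    # parameter set 2
    ("C", "c", "lfe"),    # parameter set 3
    ("LR", "L", "R"),     # parameter set 4 (left/right information)
    ("M", "LR", "C"),     # parameter set 5
]
FIG4_SCHEDULE = [
    ("F", "lf", "rf"),    # parameter set 1 (left/right information)
    ("C", "c", "lfe"),    # parameter set 2
    ("S", "lr", "rr"),    # parameter set 3 (left/right information)
    ("CF", "F", "C"),     # parameter set 4
    ("M", "CF", "S"),     # parameter set 5
]

def run_tree(channels, schedule):
    """Apply the 2-to-1 stages in order; return the mono result and the pairs."""
    signals = dict(channels)
    pairs = []
    for out, a, b in schedule:
        # Plain sum downmix, an assumption made for this sketch.
        signals[out] = [x + y for x, y in zip(signals[a], signals[b])]
        pairs.append((a, b))
    return signals["M"], pairs
```

With a plain sum, both schedules yield the same monophonic result; what changes is which channel pairs are compared, and therefore which parameter sets carry the left/right information.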
The tree structure in Fig. 4 first performs a combination
of the left and right channels separately for front and
rear. Thus, basic left/right correlation/coherence is pre-
sent in the parameter sets 1 and 3 (300a, 300c) . A combined
ICC value could be built by the parameter combination
unit 244 by building the weighted average between the ICC
values of parameter sets 1 and 3. This means that more
weight will be given to stronger channel pairs (Lf/Rf
versus Lr/Rr). One can achieve this by deriving a combined
ICC parameter ICCC as the weighted sum:
ICCC = (A*ICC1 + B*ICC2)/(A+B)
wherein A denotes the energy within the pair of channels
corresponding to ICC1 and B denotes the energy within the
pair of channels corresponding to ICC2.
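As a small worked example of the weighted sum above, the following sketch evaluates the formula directly. The function name and the sample values are hypothetical.

```python
def combined_icc(icc_1, icc_2, a, b):
    """Energy-weighted combination ICCC = (A*ICC1 + B*ICC2)/(A+B).

    a and b are the energies of the channel pairs belonging to
    icc_1 and icc_2, respectively.
    """
    if a + b <= 0:
        raise ValueError("total energy must be positive")
    return (a * icc_1 + b * icc_2) / (a + b)

# The first pair carries three times the energy of the second one,
# so the result lies closer to icc_1 than to icc_2.
example = combined_icc(0.2, 0.8, a=3.0, b=1.0)
```

With these sample values the combined ICC is approximately 0.35, that is, closer to the ICC of the stronger pair.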
In an alternative embodiment, more sophisticated methods
can also take into account the influence of the center
channel (e.g. by taking into account parameters of the pa-
rameter set number 4).
Fig. 5 shows an inventive decoder, to process received com-
pact side information, being a parametric representation of
an original four-channel audio signal. Fig. 5 comprises a
receiver 310 to provide a compact parametric representation
of the four-channel audio signal and a processor 312 to
process the compact parametric representation such that a
full parametric representation of the four-channel audio
signal is supplied, which enables one to reconstruct the

four-channel audio signal from a received monophonic audio
signal.
The receiver 310 receives the spatial parameters ICLD
(B) 314, ICLD (F) 316, ICLD (R) 318 and ICC 320. The pro-
vided parametric representation, consisting of the parame-
ters 314 to 320, describes the spatial properties of the
original audio channels 324a to 324d.
As a first up-mixing step, the processor 312 supplies the
spatial parameters describing a first channel pair 326a,
being a combination of two channels 324a and 324b (Rf and
Lf) and a second channel pair 326b, being a combination of
two channels 324c and 324d (Rr and Lr). To do so, the level
difference 314 of the channel pairs is required. Since both
channel pairs 326a and 326b contain a left channel as well
as a right channel, the difference between the channel
pairs describes mainly a front/back correlation. Therefore,
the received ICC parameter 320, carrying mainly information
about the left/right coherence, is provided by the proces-
sor 312 such that the left/right coherence information is
preferably used to supply the individual ICC parameters for
the channel pairs 326a and 326b.
In the next step, the processor 312 supplies appropriate
spatial parameters to be able to reconstruct the single au-
dio channels 324a and 324b from channel 326a, and the chan-
nels 324c and 324d from channel 326b. To do so, the proces-
sor 312 supplies the level differences 316 and 318, and
the processor 312 has to supply appropriate ICC values for
the two channel pairs, since each of the channel pairs 326a
and 326b contains important left/right coherence informa-
tion.
In one example, the processor 312 could simply provide the
combined received ICC value 320 to up-mix channel
pairs 326a and 326b. Alternatively, the received combined
ICC value 320 could be weighted to derive individual ICC

values for the two channel pairs, the weights being for ex-
ample based on the level difference 314 of the two channel
pairs.
In a preferred embodiment of the present invention, the
processor provides the received ICC parameter 320 for every
single upmixing step to avoid the introduction of addi-
tional artefacts during the reproduction of the channels
324a to 324d.
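The preferred behaviour of the processor 312, reusing the one received ICC parameter for every up-mixing step, can be sketched as follows. The dictionary layout and the key names are assumptions introduced for illustration only.

```python
def expand_parameters(compact):
    """Expand the compact representation into one parameter set per step.

    compact holds the three level differences (keys 'icld_b', 'icld_f',
    'icld_r', corresponding to 314, 316 and 318) and the single ICC 320
    (key 'icc'). The same ICC is supplied to every up-mixing step.
    """
    icc = compact["icc"]
    return [
        {"step": "pair 326a / pair 326b", "icld": compact["icld_b"], "icc": icc},
        {"step": "split pair 326a", "icld": compact["icld_f"], "icc": icc},
        {"step": "split pair 326b", "icld": compact["icld_r"], "icc": icc},
    ]

full = expand_parameters(
    {"icld_b": 2.0, "icld_f": -1.0, "icld_r": 0.5, "icc": 0.4}
)
```

The alternative mentioned above, weighting the received ICC per channel pair, would replace the shared value by a pair-specific one; the weighting rule itself is not specified here.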
Fig. 6 shows a preferred embodiment of a decoder incorpo-
rating a hierarchical decoding procedure according to the
current invention, to decode a monophonic audio signal to a
5.1 multi-channel audio signal, making use of a compact pa-
rametric representation of an original 5.1 audio signal.
Fig. 6 shows a transforming unit 350, a parameter-
processing unit 352, five 1-to-2 decoders 354a to 354e and
three inverse transforming units 356a to 356c.
It should be noted that the embodiment of an inventive de-
coder according to Fig. 6 is the counterpart of the encoder
described in Fig. 2 and designed to receive a monophonic
downmix audio channel 358, which shall finally be up-mixed
into a 5.1 audio signal consisting of audio channels 360a
(lf), 360b (lr), 360c (rf), 360d (rr), 360e (co) and
360f (lfe). The downmix channel 358 (m) is received and
transformed from the time domain to the frequency domain
into its frequency representation 362 using the transform-
ing unit 350. The parameter-processing unit 352 receives a
combined and compact set of spatial parameters 364 in par-
allel with the downmix channel 358.
In a first step 363 of the hierarchical decoding process,
the monophonic downmix channel 362 is up-mixed into a ste-
reo master channel 364 (LR) and a center master channel 366
(C) .

In a second step 368 of the hierarchical decoding process,
the stereo master channel 364 is up-mixed into a left mas-
ter channel 370 (L) and a right master channel 372 (R).
In a third step of the decoding process, the left master
channel 370 is up-mixed into a left-front channel 374a and
a left-rear channel 374b, the right master channel 372 is
up-mixed into a right-front channel 374c and right-rear
channel 374d, and the center master channel 366 is up-mixed
to a center channel 374e and a low-frequency channel 374f.
Finally, the six single audio channels 374a to 374f are
transformed by the inverse transforming units 356a to 356c
into their representation in the time domain and thus build
the reconstructed 5.1 audio signal, having six audio
channels 360a to 360f. To retain the original spatial property
of the 5.1 audio signal, the way the parameter processing
unit 352 derives and provides the individual parameter
sets 380a to 380e is vital.
The received combined ICC parameter describes the important
left/right coherence of the original six channel audio sig-
nal. Therefore, the parameter processing unit 352 builds
the ICC value of parameter set 4 (380d) such that it resem-
bles the left/right correlation information of the origi-
nally received spatial value, being transmitted within the
parameter set 364. In the simplest possible implementation
the parameter processing unit 352 simply uses the received
combined ICC parameter.
Another preferred embodiment of a decoder according to the
current invention is shown in Fig. 7, the decoder in Fig. 7
being the counterpart of the encoder from Fig. 4.
As the decoder in Fig. 7 comprises the same functional
blocks as the decoder in Fig. 6, the following discussion

is limited to the steps in which the hierarchical decoding
process differs from the one in Fig. 6. This is mainly due
to the fact that the monophonic signal 362 is up-mixed in a
different order and a different channel combination, since
the original 5.1 audio signal had been downmixed differ-
ently than the one received in Fig. 6.
In the first step 363 of the hierarchical decoding process,
the monophonic signal 362 is up-mixed into a rear master
channel 400 (S) and a pure front channel 402 (CF).
In a second step 368, the pure front channel 402 is up-
mixed into a front master channel 404 and a center master
channel 406.
In a third decoding step 372, the front master channel is
up-mixed into a left-front channel 374a and a right-front
channel 374c, the center master channel 406 is up-mixed
into a center channel 374e and a low-frequency channel 374f
and the rear master channel 400 is up-mixed into a left-
rear channel 374b and a right-rear channel 374d. Finally,
the six audio channels 374a to 374f are transformed from
the frequency domain into their time-domain representa-
tions 360a to 360f, building the reconstructed 5.1 audio
signal.
To preserve the spatial properties of the original
5.1 signal, having been coded as side information by the
encoder, the parameter processing unit 352 supplies the pa-
rameter sets 410a to 410e for the 1-to-2 decoders 354a to
354e. As the important left/right correlation information
is needed in the third up-mixing process 372 to build the
Lf, Rf, Lr, and Rr channels, the parameter-processing
unit 352 may supply an appropriate ICC value in the parame-
ter sets 410a and 410c, in the simplest implementation sim-
ply taking the transmitted ICC parameter to build the pa-
rameter sets 410a and 410c. In a possible alternative, the
received ICC parameter could be transformed into individual

parameters for parameter sets 410a and 410c by applying a
suitable weighting function to the received ICC parameter,
the weights being for example dependent on the energy
transmitted in the front master channel 404 and in the rear
master channel 400. In an even more sophisticated implemen-
tation, the parameter-processing unit 352 could also take
into account center channel information to supply an indi-
vidual ICC value for parameter set 5 and parameter set 4
(410a, 410b).
Fig. 8 shows an inventive audio transmitter or
recorder 500 that has an encoder 220, an input
interface 502 and an output interface 504.
An audio signal can be supplied at the input interface 502
of the transmitter/recorder 500. The audio signal is en-
coded using an inventive encoder 220 within the transmit-
ter/recorder and the encoded representation is output at
the output interface 504 of the transmitter/recorder 500.
The encoded representation may then be transmitted or
stored on a storage medium.
Fig. 9 shows an inventive receiver or audio player 520,
having an inventive decoder 312, a bit stream input 522,
and an audio output 524.
A bit stream can be input at the input 522 of the inventive
receiver/audio player 520. The bit stream then is decoded
using the decoder 312 and the decoded signal is output or
played at the output 524 of the inventive receiver/audio
player 520.
Fig. 10 shows a transmission system comprising an inventive
transmitter 500, and an inventive receiver 520.
The audio signal input at the input interface 502 of the
transmitter 500 is encoded and transferred from the out-
put 504 of the transmitter 500 to the input 522 of the re-

ceiver 520. The receiver decodes the audio signal and plays
back or outputs the audio signal on its output 524.
The discussed examples of inventive encoders downmix a
multi-channel audio signal into a monophonic audio signal.
It is of course alternatively possible to downmix a multi-
channel signal into a stereophonic signal, which would for
example mean for the embodiments discussed in Figs. 2 and
4, that one step in the hierarchical encoding process could
be by-passed. All other numbers of resulting channels are
also possible.
The proposed method to hierarchically encode or decode
multi-channel audio information providing/using a compact
parametric representation of the spatial properties of the
audio signal is described mainly by shrinking the side in-
formation by combining multiple ICC values into one single
transmitted ICC value. It is to be noted here that the
described invention is in no way limited to the use of just
one combined ICC value. Instead, e.g., two combined values
can be generated, one describing the important left/right
correlation, the other one describing a front/back correla-
tion.
This can advantageously be implemented, for example, in the
embodiment of the current invention shown in Fig. 2, where
on the one hand a left front channel 250a and a left rear
channel 250b are combined into a left master channel 254a,
and where a right front channel 250c and a right rear
channel 250d are combined into a right master channel 254b. These
two encoding steps therefore yield information about the
front back correlation of the original audio signal, which
can easily be processed to provide an additional ICC value,
holding front/back correlation information.
Furthermore, in a preferred modification of the current in-
vention, it is advantageous to have encoding/decoding proc-
esses, which can do both, use the prior art individually

transmitted parameters, and, depending on a signaling side
information that is sent from encoder to decoder, also use
combined transmitted parameters. Such a system can advanta-
geously achieve both, higher representation accuracy (using
individually transmitted parameters) and, alternatively, a
low side information bit rate (using combined parameters).
Typically, the choice of this setting is made by the user
depending on the application requirements, such as the
amount of side information that can be accommodated by the
transmission system used. This allows using the same
unified encoder/decoder architecture while being able to
operate within a wide range of side information bit
rate/precision trade-offs. This is an important capability
in order to cover a wide range of possible applications
with differing requirements and transmission capacity.
In another modification of such an advantageous embodiment,
the choice of the operating mode could also be made auto-
matically by the encoder, which analyses for example the
deviation of the decoded values from the ideal result in
case the combined transmission mode was used. If no
significant deviation is found, then combined parameter
transmission is employed. A decoder could even decide itself,
based on an analysis of the provided side information,
which mode is the appropriate one to use. For example, if
there were just one spatial parameter provided, the decoder
would automatically switch into the decoding mode using
combined transmitted parameters.
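The automatic mode choice described above might look as follows. This is a sketch under assumptions: the deviation is approximated here by the largest difference between the individual ICC values and the combined one, whereas the text refers to the deviation of the decoded values from the ideal result; the tolerance value and all names are hypothetical.

```python
def encoder_mode(individual_icc, combined_icc, tolerance=0.05):
    """Choose 'combined' when replacing all ICC values by the combined
    one changes none of them by more than the tolerance (assumed proxy
    for the deviation analysis performed by the encoder)."""
    deviation = max(abs(v - combined_icc) for v in individual_icc)
    return "combined" if deviation <= tolerance else "individual"

def decoder_mode(received_icc):
    """Infer the mode from the side information alone: a single
    received ICC value implies the combined mode."""
    return "combined" if len(received_icc) == 1 else "individual"
```

A decoder using such an inference needs no explicit signalling for this decision, while an explicit strategy indication in the bitstream remains the more robust option.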
In another advantageous modification of the current inven-
tion, the encoder/decoder switches automatically from the
mode using combined transmitted parameters to the mode us-
ing individually transmitted parameters, to ensure the best
possible compromise between an audio reproduction quality
and a desired low side information bit rate.

As can be seen from the described preferred embodiments of
the encoders/decoders in Figs. 2, 4, 6, and 7, these units
make use of the same functional blocks. Therefore, another
preferred embodiment builds an encoder and a decoder using
the same hardware within one housing.
In an alternative embodiment of the current invention it is
possible to dynamically switch between the different encod-
ing schemes by grouping different channels together as
channel pairs, making it possible to dynamically use the
encoding scheme that provides the best possible audio qual-
ity for the given multi-channel audio signal.
It is not necessary to transmit the monophonic downmix
channel alongside the parametric representation of a multi-
channel audio signal. It is also possible to transmit the
parametric representation alone, to enable a listener, who
already owns a monophonic downmix of the multi-channel au-
dio signal, for example as a record, to reproduce a multi-
channel signal using his existing multi-channel equipment
and a parametric side information.
To summarize, the present invention allows these combined
parameters to be determined advantageously from known prior
art parameters. Applying the inventive concept of combining
parameters in a hierarchical encoder/decoder structure, one
can downmix a multi-channel audio signal into a mono-based
parametric representation, obtaining a precise parametriza-
tion of the original signal at a low side information rate
(= bit-rate reduction).
It is one objective of the present invention that the en-
coder combines certain parameters with the objective of re-
ducing the number of parameters that have to be transmit-
ted. Then, the decoder derives the missing parameters from
parameters that have been transmitted, instead of using de-
fault parameter values, as it is the case in systems of
prior art, for example the one being shown in Fig. 15.

This advantage becomes evident reviewing again the embodi-
ment of a hierarchical parametric multi-channel audio coder
using prior art techniques, an example shown in Fig. 15.
There, the input signals (Lf, Rf, Lr, Rr, C and LFE, corre-
sponding to the left front, right front, left rear, right
rear, center and low frequency enhancement channels, re-
spectively) are segmented and transformed to the frequency
domain to obtain the required time/frequency tiles. The re-
sulting signals are subsequently combined in a pair-wise
fashion. For example, the signals Lf and Lr are combined to
form signal "L". A corresponding spatial parameter set (1)
is generated to model the spatial properties between the
signals Lf and Lr (i.e. consisting of one or more of IIDs,
ICCs, IPDs). In the embodiment according to the prior art
shown in Fig. 15, this process is repeated until a single
output channel (M) is obtained, the output channel being
accompanied by five parameter sets. The application of
prior art hierarchical coding techniques would then imply
the transmission of all parameter sets.
It should be noted, however, that not all parameter sets
have to contain values for all possible spatial parameters.
For example, parameter set 1 in Fig. 15 may consist of IID
and ICC parameters, while parameter set 3 may consist of
IID parameters only. If certain parameters are not
transmitted for specific sets, the prior art hierarchical
decoder will apply a default value for these parameters (for
example ICC = +1, IPD = 0, etc.). Thus, each parameter set
represents a specific signal combination only and does not
describe spatial properties of the remaining channel pairs.
This loss of knowledge about the spatial properties of
signals whose parameters are not being transmitted can be
avoided using the inventive concept, in which the encoder
is combining specific parameters such that the most impor-
tant spatial properties of the original signal are pre-
served.
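The default handling of the prior art decoder described above may be sketched as follows. The defaults ICC = +1 and IPD = 0 are those named in the text; the IID default of 0 dB, the dictionary layout and the function name are assumptions made only for this illustration.

```python
# Defaults applied by the (prior art) decoder for parameters that were
# not transmitted. ICC and IPD values follow the text; IID = 0 dB is an
# assumption of this sketch.
DEFAULTS = {"IID": 0.0, "ICC": 1.0, "IPD": 0.0}

def complete_parameter_set(transmitted):
    """Fill every parameter missing from a transmitted set with its default."""
    params = dict(DEFAULTS)
    params.update(transmitted)
    return params

# A set carrying IID only is decoded as fully coherent (ICC = +1), so any
# decorrelation that existed between that channel pair is lost.
iid_only_set = complete_parameter_set({"IID": 3.0})
```

The example makes the loss of spatial knowledge visible: whatever coherence the original channel pair had, the decoder can only assume the default.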

When, for example, ICC parameters are combined into a
single value, the combined parameter can be used in the
decoder as a substitute for all individual parameters (or the
individual parameters used in the decoder can be derived
from the transmitted ones). It is an important feature that
the encoder parameter combination process is carried out
such that the sound image of the original multi-channel
signal is preserved as closely as possible after
reconstruction by the decoder. For ICC parameters, this
means that the width (decorrelation) of the original sound
field should be retained.
It is to be noted here that the most important ICC value is
the one along the left/right axis, since the listener usually
faces forward in the listening set-up. This can advantageously
be taken into account by building the hierarchical encoding
structure such that a suitable parametric representation
of the audio signal is obtained during the iterative
encoding process, wherein the resulting combined
ICC value represents mainly the left/right decorrelation.
This will be explained in more detail later when discussing
preferred embodiments of the current invention.
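One possible way of making the single transmitted ICC value represent mainly the left/right decorrelation is the threshold rule recited in claim 6: the measured left/right coherence is kept when the stereo master channel dominates the center master channel, and unity is used otherwise. The following is a minimal sketch under that reading; the function name, the argument names and the threshold value of 1.0 are assumptions, not values given in the text.

```python
def left_right_coherence(icc_lr, energy_stereo, energy_center,
                         ratio_threshold=1.0):
    """Select the single coherence value placed in the output datastream.

    icc_lr          measured coherence between left and right master channel
    energy_stereo   energy of the stereo master channel (SM)
    energy_center   energy of the center master channel (CM)
    ratio_threshold predefined value for the SM/CM energy ratio (assumed)
    """
    # Comparing products instead of dividing avoids a division by zero
    # when the center master channel is silent.
    if energy_stereo > ratio_threshold * energy_center:
        return icc_lr
    return 1.0
```

With a dominant center channel the reproduced sound field is narrow anyway, so signalling full coherence costs little, while a dominant stereo part keeps its measured width.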
The inventive encoding/decoding scheme allows the number of
parameters transmitted from an encoder to a decoder to be
reduced, using a hierarchical structure of a spatial audio
system, by means of the following two measures:
• combining the individual encoder parameters to form a
combined parameter, which is transmitted to the decoder
instead of the individual ones. The combination of
the parameters is carried out such that the signal
sound image (including L/R correlation/coherence) is
preserved as far as possible.
• using the transmitted combined parameter in the decoder
instead of several transmitted individual parameters
(or deriving the actually used parameters from the
combined one).
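The two measures can be sketched together as follows. The encoder side uses an energy-weighted sum of a front and a rear coherence value, in the manner of the weighted sum recited in claim 8; the decoder side simply reuses the combined value for every left/right channel pair, as in claim 22. All names, the normalisation by the total energy and the fallback of 1.0 for silent input are assumptions of this sketch.

```python
def combine_icc(icc_front, icc_rear, energy_front, energy_rear):
    """Encoder side: merge two coherence values into one combined value.

    The energies of the front and rear master channels act as weights,
    so the louder pair dominates the transmitted coherence.
    """
    total = energy_front + energy_rear
    if total <= 0.0:
        # Silent input: fall back to full coherence (assumed default).
        return 1.0
    return (energy_front * icc_front + energy_rear * icc_rear) / total

def distribute_icc(combined_icc, channel_pairs):
    """Decoder side: use the combined value for every listed channel pair."""
    return {pair: combined_icc for pair in channel_pairs}

# Example: a wide (weakly coherent) front pair carrying most of the energy.
combined = combine_icc(icc_front=0.2, icc_rear=0.8,
                       energy_front=3.0, energy_rear=1.0)
decoder_iccs = distribute_icc(combined, ["lf/rf", "lr/rr"])
```

Only one coherence value is transmitted in this sketch, yet both left/right pairs are reconstructed with a decorrelation close to that of the dominant pair.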
Depending on certain implementation requirements of the in-
ventive methods, the inventive methods can be implemented
in hardware or in software. The implementation can be per-
formed using a digital storage medium, in particular a
disk, DVD or a CD having electronically readable control
signals stored thereon, which cooperate with a programmable
computer system such that the inventive methods are per-
formed. Generally, the present invention is, therefore, a
computer program product with a program code stored on a
machine readable carrier, the program code being operative
for performing the inventive methods when the computer pro-
gram product runs on a computer. In other words, the inven-
tive methods are, therefore, a computer program having a
program code for performing at least one of the inventive
methods when the computer program runs on a computer.
While the foregoing has been particularly shown and de-
scribed with reference to particular embodiments thereof,
it will be understood by those skilled in the art that
various other changes in the form and details may be made
without departing from the spirit and scope thereof. It is
to be understood that various changes may be made in adapt-
ing to different embodiments without departing from the
broader concepts disclosed herein and comprehended by the
claims that follow.

We Claim:
1. An encoder for generating a parametric representation (314,316,318,320) of
an audio signal having at least two original left channels (224a,224b) on a left
side and two original right channels (224c,224d) on a right side with respect to a
listening position, comprising:
a generator (220) for generating parametric information, the generator being
operative to separately process several pairs of channels to derive a level
information (230a,230b) for processed channel pairs, and to derive coherence
information (232a,232b) for a channel pair including a first channel (228a) only
having information from the left side and a second channel (228b) only having
information from the right side; and
a provider (222) for providing the parametric representation
(238,314,316,318,320) by selecting the level information (230a,230b) for
channel pairs and by determining a left/right coherence measure (236) using the
coherence information (232a,232b) and to introduce the left/right coherence
measure (236) into an output datastream as the only coherence information of
the audio signal within the parametric representation (238;314,316,318,320).

2. The encoder as claimed in claim 1, wherein the generator (220) is operative
to process a left-front channel (lf) and a left-rear channel (lr) to derive a lf/lr
level information (230a), wherein a combination of the left-front channel (lf) and
the left-rear (lr) channel forms a left master channel (LM), and to process a
right-front channel (rf) and a right-rear channel (rr) to derive a rf/rr level
information (230b), wherein a combination of the right-front channel (rf) and the
right-rear (rr) channel forms a right master channel (RM); and
to process the left master channel (LM) and the right master channel (RM) to
derive a lm/rm level information (234) and to derive the coherence information
(236), wherein a combination of the left master channel (LM) and the right
master channel (RM) forms a stereo master channel (SM).
3. The encoder as claimed in claim 2, wherein the generator (220) is operative to
process a center channel (ce) and a low-frequency channel (lo) to derive a ce/lo
level information, wherein a combination of the center channel (ce) and the low-
frequency channel (lo) forms a center master channel (CM).
4. The encoder as claimed in claim 3, wherein the generator (220) is operative
to process the stereo master channel (SM) and the center master channel (CM)
to derive a sm/cm level information, wherein a combination of the stereo master
channel (SM) and the center master (CM) channel forms a downmix channel;
and
in which the provider (222) is operative to determine the left/right coherence
measure using the coherence information (232a,232b) and the sm/cm level
information.

5. The encoder as claimed in claim 4, wherein the provider (222) is operative to
calculate the left/right coherence measure depending on the sm/cm level
information such that, in a case in which the sm/cm level information indicates
that more energy is in the stereo master channel (SM) than in the center master
channel (CM), the left/right coherence measure is closer to the coherence
information (232a,232b) compared to a situation in which the sm/cm level
information indicates that more energy is in the center master channel (CM), in
which case the left/right coherence measure is closer to unity.
6. The encoder as claimed in claim 4, wherein the provider (222) is operative to
calculate the left/right coherence measure depending on the sm/cm level
information such that, in a case wherein the sm/cm level information indicates
that a ratio of the energy in the stereo master channel (SM) to the energy in
the center master channel (CM) exceeds a predefined value, the left/right
coherence measure is set to the coherence information (232a,232b), and, in
a situation wherein the sm/cm level information indicates that the ratio of the
energy in the stereo master channel (SM) to the energy in the center master
channel (CM) stays below or equals the predefined value, the
left/right coherence measure is set to unity.

7. The encoder as claimed in claim 1, wherein the generator (220) is operative to
process a left-front channel (lf) and a right-front channel (rf) to derive a lf/rf
level information and a first coherence information (232a,232b), wherein a
combination of the left-front channel (lf) and the right-front channel (rf) forms a
front master channel (FM), and to process a left-rear channel (lr) and a right-rear
channel (rr) to derive a lr/rr level information and to derive a second coherence
information (232a,232b), wherein a combination of the left-rear channel (lr) and
the right-rear channel (rr) forms a rear master channel (RM), and wherein the
provider (222) is operative to determine the left/right coherence measure
combining the first coherence information (232a) and the second coherence
information (232b).
8. The encoder as claimed in claim 7, wherein the provider (222) is operative to
determine the left/right coherence measure based on a weighted sum of the first
and the second coherence information (232a,232b), using level information of
the front master channel (FM) and level information of the rear master channel
(RM) as weights.
9. The encoder as claimed in claim 7, wherein the generator (220) is operative
to process a center channel (ce) and a low-frequency channel (lo) to derive a
ce/lo level information, and wherein a combination of the center channel (ce)
and the low-frequency channel (lo) forms a center master channel (CM).

10. The encoder as claimed in claim 9, wherein the generator (220) is operative
to process the front master channel (FM) and the center master channel (CM) to
derive a fm/cm level information, wherein a combination of the front master
channel (FM) and the center master channel (CM) forms a pure front channel
(PF); and
wherein the provider (222) is operative to determine the left/right coherence
measure combining the first and the second coherence information (232a,232b)
additionally using the fm/cm level information.
11. The encoder as claimed in claim 10, wherein the generator (220) is operative
to process the pure front channel (PF) and the rear master channel (RM) to
derive a pf/rm level information, and wherein a combination of the pure front
channel (PF) and the rear master channel (RM) forms a downmix channel.
12. The encoder as claimed in claim 1, wherein the generator (220) is operative
to process the pairs of channels in discrete time frames of a given length.
13. The encoder as claimed in claim 1, wherein the generator (220) is operative
to process the pairs of channels in discrete frequency intervals of a given
bandwidth.

14. A decoder for processing a parametric representation (314,316,318,320) of
an original audio signal, the original audio signal having at least two original left
channels (224a,224b) on a left side and at least two original right channels
(224c,224d) on a right side with respect to a listening position, comprising:
a receiver (310) for providing the parametric representation (314,316,318,320)
of the audio signal, the receiver (310) being operative to provide level
information (314,316,318) for channel pairs and to provide a left/right coherence
measure (320) for a channel pair including a left channel and a right channel as
the only coherence information of the original audio signal within the parametric
representation (314,316,318,320), the left/right coherence measure representing
a coherence information between at least one channel pair including a first
channel only having information from the left side and a second channel only
having information from the right side; and
a processor (312) for supplying parametric information for channel pairs, the
processor (312) being operative to select level information (314,316,318) from
the parametric representation (314,316,318,320) and to derive coherence
information for at least one channel pair using the left/right coherence measure
(320), the at least one channel pair including a first channel only having
information from the left side and a second channel only having information from
the right side.

15. The decoder as claimed in claim 14, wherein the receiver (310) is operative
to provide a lf/lr level information (316) for a channel pair of an original left-front
channel (lf) and an original left-rear channel (lr), wherein a combination of the
original left-front channel (lf) and the original left-rear channel (lr) forms a left
master channel (LM);
provide a rf/rr level information (318) for a channel pair of an original right-front
channel (rf) and an original right-rear channel (rr), wherein a combination of the
original right-front channel (rf) and the original right-rear channel (rr) forms a
right master channel (RM);
provide a lm/rm level information (314) for a channel pair of the left master
channel (LM) and the right master channel (RM), wherein a combination of the
left master channel (LM) and the right master channel (RM) forms a stereo
master channel (SM); and
wherein the processor (312) is operative to provide coherence information for
the left master channel (LM) and the right master channel (RM) using the
left/right coherence measure (320);
the decoder comprising an upmixer, the upmixer having :
a first 1-to-2 upmixer (354b) for generation of the left master channel (LM) and
the right master channel (RM) from the stereo master channel (SM) using the
lm/rm level information and the left/right coherence measure;

a second 1-to-2 upmixer (354e) for generation of the original left-front channel
(lf) and the original left-rear channel (lr) from the left master channel (LM) using
the lf/lr level information and a predefined coherence information; and
a third 1-to-2 upmixer (354d) for generation of the original right-front (rf)
channel and the original right-rear channel (rr) from the right master channel
(RM) using the rf/rr level information and a predefined coherence information.
16. The decoder as claimed in claim 15, wherein the receiver (310) is operative
to provide a ce/lo level information for a channel pair of an original center
channel (ce) and of an original low-frequency channel (lo), wherein a
combination of the original center channel (ce) and of the original low-frequency
channel (lo) forms a center master channel (CM); and wherein the upmixer is
comprising a fourth 1-to-2 upmixer (354a) for generation of the original center
channel (ce) and the original low-frequency channel (lo) from the center master
channel (CM) using the ce/lo level information and a predefined coherence
information.
17. The decoder as claimed in claim 16, wherein the receiver (310) is operative
to provide a sm/cm level information for a channel pair of the stereo master
channel (SM) and of the center master channel (CM), wherein a combination of
the stereo master channel (SM) and of the center master channel (CM) forms a
downmix channel; and

wherein the upmixer is comprising a fifth 1-to-2 upmixer (254b) for generation
of the stereo master channel (SM) and the center master channel (CM) from the
downmix channel using the sm/cm level information and a predefined coherence
information.
18. The decoder as claimed in claim 14, wherein the receiver (310) is operative
to
provide a lf/rf level information (316) for a channel pair of an original left-front
channel (lf) and of an original right-front channel (rf), wherein a combination of
the original left-front channel (lf) and of the original right-front channel (rf)
forms a front master channel (FM);
provide a lr/rr level information (318) for a channel pair of an original left-rear
channel (lr) and an original right-rear channel (rr), wherein a combination of the
original left-rear channel (lr) and the original right-rear channel (rr) forms a rear
master channel (RM); and
wherein the processor (312) is operative to supply a first coherence information
for the original left-front channel (lf) and the original right-front channel (rf) and
to supply a second coherence information for the original left-rear channel (lr)
and the original right-rear channel (rr) using the left/right coherence measure;
the decoder comprising an upmixer, the upmixer having :

a first 1-to-2 upmixer (354e) for generation of the original left-front channel (lf)
and the original right-front channel (rf) from the front master channel (FM) using
the lf/rf level information and the left/right coherence measure;
a second 1-to-2 upmixer (354c) for generation of the original left-rear channel
(lr) and the original right-rear channel (rr) from the rear master (RM) channel
using the lr/rr level information and the left/right coherence measure.
19. The decoder as claimed in claim 18, wherein the receiver (310) is operative
to provide a ce/lo level information for a channel pair of an original center
channel (ce) and of an original low-frequency channel (lo), wherein a
combination of the original center channel (ce) and of the original low-frequency
channel (lo) forms a center master channel (CM); and
wherein the upmixer is comprising a third 1-to-2 upmixer (354d) for
generation of the original center channel (ce) and the original low-frequency
channel (lo) from the center master channel (CM) using the ce/lo level
information and a predefined coherence information.
20. The decoder as claimed in claim 19, wherein the receiver (310) is operative
to provide a fm/cm level information for a channel pair of the front master
channel (FM) and the center master channel (CM), wherein a combination of the
front master channel (FM) and the center master channel (CM) forms a pure
front channel (PF); and

wherein the upmixer is comprising a fourth 1-to-2 upmixer (354b) for
generation of the front master channel (FM) and the center master channel (CM)
from the pure front channel (PF) using the fm/cm level information and a
predefined coherence information.
21. The decoder as claimed in claim 20, wherein the receiver (310) is operative
to provide a pf/rm level information for a channel pair of the pure front channel
(PF) and the rear master channel (RM), wherein a combination of the pure front
channel (PF) and the rear master channel (RM) forms a downmix channel; and
wherein the upmixer is comprising a fifth 1-to-2 upmixer (354a) for generation
of the pure front channel (PF) and the rear master channel (RM) from the
downmix channel using the pf/rm level information and a predefined coherence
information.
22. The decoder as claimed in claim 14, wherein the processor (312) is operative
to derive coherence measures for all channel pairs by distributing the received
left/right coherence as the coherence measures.

23. The decoder as claimed in claim 14, wherein the receiver (310) is operative
to
operate in a first mode, providing level information for channel pairs and
providing a left/right coherence measure for a channel pair comprising a left
channel and a right channel as the only coherence information of the audio
signal within the parametric representation (314,316,318,320), the left/right
coherence measure representing a coherence information between at least one
channel pair including a first channel only having information from the left side
and a second channel only having information from the right side with respect to
a listening position; or
to operate in a second mode, providing the level information for channel pairs
and the coherence information for the same channel pairs; and
wherein the processor (312) is operative to supply parametric information for
channel pairs in the first mode, the processor (312) being operative to select the
level information from the parametric representation (314,316,318,320) and to
derive the coherence information for at least one channel pair using the left/right
coherence measure, the at least one channel pair including a first channel only
having information from the left side and a second channel only having
information from the right side; or
in the second mode, the processor (312) being operative to select the level
information from the parametric representation (314,316,318,320) and to select
the coherence information from the parametric representation
(314,316,318,320).

24. The decoder as claimed in claim 23, wherein the receiver (310) comprises a mode
receiver for selecting an operating mode using received mode information, the
mode information indicating the first or the second mode to be used.
25. A method for generating a parametric representation (314,316,318,320) of
an audio signal having at least two original left channels (224a,224b) and at
least two original right channels (224c,224d) with respect to a listening position,
the method comprising:
generating parametric information by separately processing several pairs of
channels to derive a level information for processed channel pairs and by
deriving coherence information for a channel pair including a first channel only
having information from the left side and a second channel only having
information from the right side, and
providing the parametric representation (314,316,318,320) by selecting level
information for channel pairs and by determining a left/right coherence measure
using the coherence information and introducing the left/right coherence
measure into an output datastream as the only coherence information of the
audio signal within the parametric representation (314,316,318,320).

26. A method for processing a parametric representation (314,316,318,320) of
an original audio signal, the original audio signal having at least two original left
channels (224a,224b) on the left side and at least two original right channels
(224c,224d) on the right side with respect to a listening position, the method
comprising:
providing the parametric representation (314,316,318,320) of the audio signal by
providing a level information for channel pairs and by providing a left/right
coherence measure for a channel pair including a left channel and a right
channel as the only coherence information of the audio signal within the
parametric representation (314,316,318,320), the left/right coherence measure
representing a coherence information between at least one channel pair
including a first channel only having information from the left side and a second
channel only having information from the right side; and
supplying parametric information for channel pairs by selecting level information
from the parametric representation (314,316,318,320) and by deriving
coherence information for at least one channel pair using the left/right coherence
measure, the at least one channel pair including a first channel only having
information from the left side and a second channel only having information from
the right side.
27. A receiver or audio player having a decoder as claimed in claim 14.

28. A transmitter or audio recorder having an encoder as claimed in claim 1.
29. A method of receiving or audio playing, the method having a method as
claimed in claim 26.
30. A method of transmitting or audio recording, the method having a method
as claimed in claim 25.
31. A transmission system including a transmitter as claimed in claim 28 and a
receiver as claimed in claim 27.
32. A method of transmitting and receiving, the method of transmitting having a
method as claimed in claim 25 and the method of receiving having a method as
claimed in claim 26.
