Title of Invention

METHOD FOR ENCODING, METHOD FOR DECODING, AUDIO ENCODER, AUDIO PLAYER AND AUDIO SYSTEM FOR ENCODING/DECODING A SIGNAL

Abstract

In a sinusoidal audio encoder a number of sinusoids are estimated per audio segment. A sinusoid is represented by frequency, amplitude and phase. Normally, phase is quantised independently of frequency. The invention uses a frequency dependent quantisation of phase; in particular, the low frequencies are quantised using smaller quantisation intervals than the higher frequencies. Thus, the unwrapped phases of the lower frequencies are quantised more accurately, possibly with a smaller quantisation range, than the phases of the higher frequencies. The invention gives a significant improvement in decoded signal quality, especially for low bit-rate quantisers.
Full Text

Low bit-rate audio encoding
The present invention relates to encoding and decoding of broadband signals, in particular audio signals such as speech signals.
When transmitting broadband signals, e.g. audio signals such as speech, compression or encoding techniques are used to reduce the bandwidth or bit rate of the signal.
Figure 1 shows a known parametric encoding scheme, in particular a sinusoidal encoder, which is used in the present invention, and which is described in WO 01/69593. In this encoder, an input audio signal x(t) is split into several (possibly overlapping) time segments or frames, typically of duration 20 ms each. Each segment is decomposed into transient, sinusoidal and noise components. It is also possible to derive other components of the input audio signal such as harmonic complexes, although these are not relevant for the purposes of the present invention.
In the sinusoidal analyser 130, the signal x2 for each segment is modelled using a number of sinusoids represented by amplitude, frequency and phase parameters. This information is usually extracted for an analysis time interval by performing a Fourier transform (FT) which provides a spectral representation of the interval including: frequencies, amplitudes for each frequency, and phases for each frequency, where each phase is "wrapped", i.e. in the range {-π, π}. Once the sinusoidal information for a segment is estimated, a tracking algorithm is initiated. This algorithm uses a cost function to link sinusoids in different segments with each other on a segment-to-segment basis to obtain so-called tracks. The tracking algorithm thus results in sinusoidal codes CS comprising sinusoidal tracks that start at a specific time instance, evolve for a certain duration of time over a plurality of time segments and then stop.
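The per-segment extraction of frequency, amplitude and wrapped-phase triplets can be sketched as follows. This is a minimal illustration only, a windowed FFT peak pick with hypothetical names such as `analyse_segment`; a real sinusoidal analyser uses more refined estimators:

```python
import numpy as np

def analyse_segment(x, fs, n_sines=3):
    """Estimate (frequency, amplitude, wrapped phase) triplets for one
    segment by picking the largest peaks of a windowed FFT spectrum."""
    n = len(x)
    win = np.hanning(n)
    spec = np.fft.rfft(x * win)
    mags = np.abs(spec)
    # pick the n_sines largest spectral peaks
    idx = np.argsort(mags)[::-1][:n_sines]
    freqs = idx * fs / n                  # Hz, bin-centre resolution only
    amps = 2.0 * mags[idx] / np.sum(win)  # undo window gain
    phases = np.angle(spec[idx])          # wrapped to (-pi, pi]
    return freqs, amps, phases

fs = 8000
t = np.arange(160) / fs                   # one 20 ms segment
x = np.sin(2 * np.pi * 440 * t + 0.5)
f, a, p = analyse_segment(x, fs, n_sines=1)
```

The estimated frequency is only accurate to the FFT bin spacing here (50 Hz for a 20 ms segment); the phase is delivered wrapped, exactly as described above.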
In such sinusoidal encoding, it is usual to transmit frequency information for the tracks formed in the encoder. This can be done in a simple manner and with relatively low costs, since tracks only have slowly varying frequency. Frequency information can

PHNL030921
therefore be transmitted efficiently by time differential encoding. In general, amplitude can also be encoded differentially over time.
In contrast to frequency, phase changes more rapidly with time. If the frequency is constant, the phase will change linearly with time, and frequency changes will result in corresponding phase deviations from the linear course. As a function of the track segment index, phase will have an approximately linear behaviour. Transmission of encoded phase is therefore more complicated. However, when transmitted, phase is limited to the range {-π, π}, i.e. the phase is "wrapped", as provided by the Fourier transform. Because of this modulo 2π representation of phase, the structural inter-frame relation of the phase is lost and, at first sight, the phase appears to be a random variable.
However, since the phase is the integral of the frequency, the phase is redundant and, in principle, need not be transmitted. This is called phase continuation and reduces the bit rate significantly.
In phase continuation, only the phase of the first sinusoid of each track is transmitted in order to save bit rate. Each subsequent phase is calculated from the initial phase and the frequencies of the track. Since the frequencies are quantised and not always very accurately estimated, the continued phase will deviate from the measured phase. Experiments show that phase continuation degrades the quality of an audio signal.
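The phase-continuation reconstruction described above amounts to discrete integration of the transmitted segment frequencies. A minimal sketch, with illustrative names (`continue_phase`) and a 20 ms segment hop:

```python
import numpy as np

def continue_phase(phi0, freqs_hz, seg_len_s):
    """Reconstruct the phase at the start of each segment from the
    initial phase and the (quantised) segment frequencies by discrete
    integration: phi(n) = phi(0) + sum of 2*pi*f(k)*T for k < n."""
    increments = 2 * np.pi * np.asarray(freqs_hz) * seg_len_s
    # phase of segment n = initial phase + accumulated advance so far
    return phi0 + np.concatenate(([0.0], np.cumsum(increments[:-1])))

phases = continue_phase(0.5, [440.0, 440.0, 441.0], 0.020)
```

Any error in a quantised frequency enters the cumulative sum and is never corrected, which is precisely the drift mechanism discussed below.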
Transmitting the phase for every sinusoid increases the quality of the decoded signal at the receiver end, but it also results in a significant increase in bit rate/bandwidth. Therefore, a joint frequency/phase quantiser has been proposed, in which the measured phases of a sinusoidal track, having values between -π and π, are unwrapped using the measured frequencies and linking information, resulting in monotonically increasing unwrapped phases along a track. In that encoder the unwrapped phases are quantised using an Adaptive Differential Pulse Code Modulation (ADPCM) quantiser and transmitted to the decoder. The decoder derives the frequencies and the phases of a sinusoidal track from the unwrapped phase trajectory.
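The unwrapping step can be illustrated as follows: each measured wrapped phase is shifted by the multiple of 2π that brings it closest to the phase predicted from the previous segment via the measured frequency. This is a simplified sketch of the idea, not the patent's exact procedure:

```python
import numpy as np

def unwrap_track(wrapped, freqs_hz, seg_len_s):
    """Unwrap the measured phases of a track using the measured
    frequencies: add the multiple of 2*pi that brings each wrapped
    phase closest to the phase predicted from the previous segment."""
    out = [wrapped[0]]
    for k in range(1, len(wrapped)):
        predicted = out[-1] + 2 * np.pi * freqs_hz[k - 1] * seg_len_s
        m = np.round((predicted - wrapped[k]) / (2 * np.pi))
        out.append(wrapped[k] + 2 * np.pi * m)
    return np.array(out)
```

For a track with slowly varying frequency the result is the monotonically increasing unwrapped phase trajectory described above.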
In phase continuation, only the encoded frequency is transmitted, and the phase is recovered at the decoder from the frequency data by exploiting the integral relation between phase and frequency. It is known, however, that when phase continuation is used, the phase cannot be perfectly recovered. If frequency errors occur, e.g. due to measurement errors in the frequency or due to quantisation noise, the phase, being reconstructed using the integral relation, will typically show an error having the character of drift. This is because frequency errors have an approximately random character. Low-frequency errors are

amplified by integration, and consequently the recovered phase will tend to drift away from the actually measured phase. This leads to audible artifacts.
This is illustrated in Figure 2a where Ω and ψ are the real frequency and real phase, respectively, for a track. In both the encoder and decoder, frequency and phase have an integral relationship as represented by the letter "I". The quantisation process in the encoder is modelled as an added noise n. In the decoder, the recovered phase thus includes two components: the real phase ψ and a noise component ε2, where both the spectrum of the recovered phase and the power spectral density function of the noise ε2 have a pronounced low-frequency character.
Thus, it can be seen that in phase continuation, since the recovered phase is the integral of a low-frequency signal, the recovered phase is a low-frequency signal itself. However, the noise introduced in the reconstruction process is also dominant in this low-frequency range. It is therefore difficult to separate these sources with a view to filtering the noise n introduced during encoding.
In conventional quantisation methods, frequency and phase are quantised independently of each other. In general, a uniform scalar quantiser is applied to the phase parameter. For perceptual reasons the lower frequencies should be quantised more accurately than the higher frequencies. Therefore the frequencies are converted to a non-uniform representation using the ERB or Bark function and then quantised uniformly, resulting in a non-uniform quantiser. Physical reasons can also be found: in harmonic complexes, higher harmonic frequencies tend to have larger frequency variations than the lower frequencies.
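The conversion-then-uniform-quantisation scheme described above can be sketched with one common form of the ERB-rate function (Glasberg and Moore); the step size of 0.5 ERB is illustrative, not a value taken from the patent:

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate scale (one common choice of ERB function)."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def quantise_freq(f_hz, step_erb=0.5):
    """Non-uniform frequency quantiser built from a uniform quantiser
    on the ERB scale: quantise in ERB-rate, then map back to Hz."""
    e = hz_to_erb_rate(f_hz)
    return erb_rate_to_hz(np.round(e / step_erb) * step_erb)
```

Because the ERB scale is roughly logarithmic, a uniform step in ERB-rate yields narrow quantisation cells at low frequencies and wide cells at high frequencies, exactly the non-uniform behaviour motivated above.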
When the frequency and phase are quantised jointly, frequency dependent quantisation accuracy is not straightforward. The use of a uniform quantisation approach results in a low quality sound reconstruction. Furthermore, for the high frequencies, where the quantisation accuracy can be lowered, a quantiser can be developed that needs fewer bits. For the unwrapped phases, a similar mechanism would be desirable.
The invention provides a method of encoding a broadband signal, in particular an audio signal such as a speech signal, using a low bit rate. In the sinusoidal encoder a number of sinusoids are estimated per audio segment. A sinusoid is represented by frequency, amplitude and phase. Normally, phase is quantised independently of frequency. The invention uses a frequency dependent quantisation of phase; in particular, the low frequencies are quantised using smaller quantisation intervals than the higher frequencies.

Thus, the unwrapped phases of the lower frequencies are quantised more accurately, possibly with a smaller quantisation range, than the phases of the higher frequencies. The invention gives a significant improvement in decoded signal quality, especially for low bit-rate quantisers.
The invention enables the use of joint quantisation of frequency and phase while having a non-uniform frequency quantisation as well. This results in the advantage of transmitting phase information with a low bit rate while still maintaining good phase accuracy and signal quality at all frequencies, in particular also at low frequencies.
The advantage of this method is improved phase accuracy, in particular at the lower frequencies, where a phase error corresponds to a larger time error than at higher frequencies. This is important, since the human ear is not only sensitive to frequency and phase but also to absolute timing as in transients, and the method of the invention results in improved sound quality, especially when only a small number of bits is used for quantising the phase and frequency values. On the other hand, a required sound quality can be obtained using fewer bits. Since the low frequencies are slowly varying, the quantisation range can be more limited and a more accurate quantisation is obtained. Furthermore, the adaptation to a finer quantisation is much faster.
The invention can be used in an audio encoder where sinusoids are used. The invention relates both to the encoder and the decoder.
Fig. 1 shows a prior art audio encoder in which an embodiment of the invention is implemented;
Fig. 2a illustrates the relationship between phase and frequency in prior art systems;
Fig. 2b illustrates the relationship between phase and frequency in audio systems according to the present invention;
Figs. 3a and 3b show a preferred embodiment of a sinusoidal encoder component of the audio encoder of Figure 1;
Fig. 4 shows an audio player in which an embodiment of the invention is implemented; and
Figs. 5a and 5b show a preferred embodiment of a sinusoidal synthesizer component of the audio player of Figure 4; and

Fig. 6 shows a system comprising an audio encoder and an audio player according to the invention.
Preferred embodiments of the invention will now be described with reference to the accompanying drawings wherein like components have been accorded like reference numerals and, unless otherwise stated, perform like functions. In a preferred embodiment of the present invention, the encoder 1 is a sinusoidal encoder of the type described in WO 01/69593, Figure 1. The operation of this prior art encoder and its corresponding decoder has been well described and description is only provided here where relevant to the present invention.
In both the prior art and the preferred embodiment of the present invention, the audio encoder 1 samples an input audio signal at a certain sampling frequency resulting in a digital representation x(t) of the audio signal. The encoder 1 then separates the sampled input signal into three components: transient signal components, sustained deterministic components, and sustained stochastic components. The audio encoder 1 comprises a transient encoder 11, a sinusoidal encoder 13 and a noise encoder 14.
The transient encoder 11 comprises a transient detector (TD) 110, a transient analyzer (TA) 111 and a transient synthesizer (TS) 112. First, the signal x(t) enters the transient detector 110. This detector 110 estimates whether a transient signal component is present and, if so, its position. This information is fed to the transient analyzer 111. If the position of a transient signal component is determined, the transient analyzer 111 tries to extract (the main part of) the transient signal component. It matches a shape function to a signal segment, preferably starting at an estimated start position, and determines content underneath the shape function, by employing for example a (small) number of sinusoidal components. This information is contained in the transient code CT, and more detailed information on generating the transient code CT is provided in WO 01/69593.
The transient code CT is furnished to the transient synthesizer 112. The synthesized transient signal component is subtracted from the input signal x(t) in subtractor 16, resulting in a signal xl. A gain control mechanism GC (12) is used to produce x2 from xl.
The signal x2 is furnished to the sinusoidal encoder 13 where it is analyzed in a sinusoidal analyzer (SA) 130, which determines the (deterministic) sinusoidal components. It will therefore be seen that while the presence of the transient analyser is desirable, it is not













Table 3 shows an example of frequency dependent scale factors and corresponding initial tables Q and R for a 2-bit ADPCM quantiser. The audio frequency range 0-22050 Hz is divided into four frequency sub-ranges. It is seen that the phase accuracy is improved in the lower frequency ranges relative to the higher frequency ranges.
The number of frequency sub-ranges and the frequency dependent scale factors may vary and can be chosen to fit the individual purpose and requirements. As described above, the frequency dependent initial tables Q and R in Table 3 may be up-scaled and down-scaled dynamically to adapt to the evolution in phase from one time segment to the next.
In e.g. a 3-bit ADPCM quantiser, the initial boundaries of the eight quantisation intervals defined by the 3 bits can be defined as follows: Q = {-∞, -1.41, -0.707, -0.35, 0, 0.35, 0.707, 1.41, ∞}, with a minimum grid size of π/64 and a maximum grid size of π/2. The representation table R may look like: R = {-2.117, -1.0585, -0.5285, -0.1750, 0.1750, 0.5285, 1.0585, 2.117}. A similar frequency dependent initialisation of the tables Q and R as shown in Table 3 may be used in this case.
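A frequency dependent scaling of the initial tables can be sketched as follows, using the 3-bit tables Q and R given above. The sub-range boundaries and scale factors in `SCALE` are illustrative stand-ins for Table 3, which is not reproduced in this text:

```python
import numpy as np

# 3-bit initial tables from the text above (boundaries and levels).
Q = np.array([-np.inf, -1.41, -0.707, -0.35, 0.0, 0.35, 0.707, 1.41, np.inf])
R = np.array([-2.117, -1.0585, -0.5285, -0.1750,
              0.1750, 0.5285, 1.0585, 2.117])

# Hypothetical frequency sub-ranges (Hz) and grid scale factors:
# smaller scale factor -> finer phase grid at low frequencies.
SCALE = {(0, 500): np.pi / 64, (500, 2000): np.pi / 32,
         (2000, 8000): np.pi / 16, (8000, 22050): np.pi / 4}

def scale_for(f_hz):
    """Look up the grid scale factor for a sinusoid's frequency."""
    for (lo, hi), s in SCALE.items():
        if lo <= f_hz < hi:
            return s
    return np.pi / 4

def quantise_phase_error(err, f_hz):
    """Quantise one phase prediction error with a grid scaled by the
    sinusoid's frequency sub-range; returns (level index, value)."""
    s = scale_for(f_hz)
    level = int(np.searchsorted(Q * s, err, side='right')) - 1
    return level, R[level] * s   # 3-bit index sent, reconstructed value
```

For the same prediction error, a low-frequency sinusoid is reconstructed with a much smaller phase error than a high-frequency one, which is the frequency dependent accuracy the text describes.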
From the sinusoidal code CS generated with the sinusoidal encoder, the sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131 in the same manner as will be described for the sinusoidal synthesizer (SS) 32 of the decoder. This signal is subtracted in subtractor 17 from the input x2 to the sinusoidal encoder 13, resulting in a remaining signal x3. The residual signal x3 produced by the sinusoidal encoder 13 is passed to the noise analyzer 14 of the preferred embodiment which produces a noise code CN representative of this noise, as described in, for example, international patent application No. PCT/EP00/04599.
Finally, in a multiplexer 15, an audio stream AS is constituted which includes the codes CT, CS and CN. The audio stream AS is furnished to e.g. a data bus, an antenna system, a storage medium etc.
Fig. 4 shows an audio player 3 suitable for decoding an audio stream AS', e.g. generated by an encoder 1 of Fig. 1, obtained from a data bus, antenna system, storage medium etc. The audio stream AS' is de-multiplexed in a de-multiplexer 30 to obtain the codes CT, CS and CN. These codes are furnished to a transient synthesizer 31, a sinusoidal synthesizer 32 and a noise synthesizer 33 respectively. From the transient code CT, the transient signal components are calculated in the transient synthesizer 31. In case the transient code indicates a shape function, the shape is calculated based on the received parameters. Further, the shape content is calculated based on the frequencies and amplitudes

of the sinusoidal components. If the transient code CT indicates a step, then no transient is calculated. The total transient signal yT is a sum of all transients.
The sinusoidal code CS including the information encoded by the analyser 130 is used by the sinusoidal synthesizer 32 to generate signal yS. Referring now to Figures 5a and 5b, the sinusoidal synthesizer 32 comprises a phase decoder (PD) 56 compatible with the phase encoder 46. Here, a de-quantiser (DQ) 60 in conjunction with a second-order prediction filter (PF) 64 produces (an estimate of) the unwrapped phase ψ from: the representation levels r; the initial information ψ(0) and ω(0) provided to the prediction filter (PF) 64; and the initial quantisation step for the quantisation controller (QC) 62.
As illustrated in Figure 2b, the frequency can be recovered from the unwrapped phase ψ by differentiation. Assuming that the phase error at the decoder is approximately white, and since differentiation amplifies the high frequencies, the differentiation can be combined with a low-pass filter to reduce the noise and, thus, to obtain an accurate estimate of the frequency at the decoder.
In the preferred embodiment, a filtering unit (FR) 58 approximates the differentiation which is necessary to obtain the frequency ω from the unwrapped phase by procedures such as forward, backward or central differences. This enables the decoder to produce as output the phases ψ and frequencies ω usable in a conventional manner to synthesize the sinusoidal component of the encoded signal.
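The differentiation-plus-low-pass recovery described above can be sketched as follows; the three-tap moving average is a crude illustrative stand-in for the decoder's low-pass filter:

```python
import numpy as np

def recover_freq(unwrapped, seg_len_s):
    """Recover segment frequencies (Hz) from the decoded unwrapped
    phase by central differences, then smooth with a short moving
    average acting as a simple low-pass filter on the noisy estimate."""
    omega = np.gradient(unwrapped, seg_len_s)   # rad/s, central diffs
    f_hz = omega / (2 * np.pi)
    kernel = np.ones(3) / 3.0                   # illustrative low-pass
    return np.convolve(f_hz, kernel, mode='same')
```

Since the quantisation noise on the unwrapped phase is amplified by differentiation mostly at high frequencies, the smoothing removes much of it while leaving the slowly varying track frequency intact.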
At the same time as the sinusoidal components of the signal are being synthesized, the noise code CN is fed to a noise synthesizer (NS) 33, which is mainly a filter having a frequency response approximating the spectrum of the noise. The NS 33 generates reconstructed noise yN by filtering a white noise signal with the noise code CN. The total signal y(t) comprises the sum of the transient signal yT and the product of any amplitude decompression (g) and the sum of the sinusoidal signal yS and the noise signal yN. The audio player comprises two adders 36 and 37 to sum respective signals. The total signal is furnished to an output unit 35, which is e.g. a speaker.
Fig. 6 shows an audio system according to the invention comprising an audio encoder 1 as shown in Fig. 1 and an audio player 3 as shown in Fig. 4. Such a system offers playing and recording features. The audio stream AS is furnished from the audio encoder to the audio player over a communication channel 2, which may be a wireless connection, a data bus or a storage medium. In case the communication channel 2 is a storage medium, the storage medium may be fixed in the system or may also be a removable disc, memory

stick etc. The communication channel 2 may be part of the audio system, but will however often be outside the audio system.
The coded data from several consecutive segments are linked. This is done as follows. For each segment a number of sinusoids are determined (for example using an FFT). A sinusoid consists of a frequency, amplitude and phase. The number of sinusoids is variable per segment. Once the sinusoids are determined for a segment, an analysis is done to connect them to sinusoids from the previous segment. This is called 'linking' or 'tracking'. The analysis is based on the difference between a sinusoid of the current segment and all sinusoids from the previous segment. A link/track is made with the sinusoid in the previous segment that has the smallest difference. If even the smallest difference is larger than a certain threshold value, no connection to sinusoids of the previous segment is made. In this way a new sinusoid is created or "born".
The difference between sinusoids is determined using a 'cost function', which uses the frequency, amplitude and phase of the sinusoids. This analysis is performed for each segment. The result is a large number of tracks for an audio signal. A track has a birth, which is a sinusoid that has no connection with sinusoids from the previous segment. A birth sinusoid is encoded non-differentially. Sinusoids that are connected to sinusoids from previous segments are called continuations and they are encoded differentially with respect to the sinusoids from the previous segment. This saves a lot of bits, since only differences are encoded and not absolute values.
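The linking step above can be sketched as a greedy nearest-match search. For simplicity this illustrative cost uses only frequency distance, whereas the cost function described above also weighs amplitude and phase; the threshold value is hypothetical:

```python
def link_sinusoids(prev, curr, threshold=50.0):
    """Link each current-segment sinusoid (f, a, p) to the
    previous-segment sinusoid with the smallest cost, or mark it as a
    'birth' (None) when even the smallest cost exceeds the threshold."""
    links = []
    for j, (f, a, p) in enumerate(curr):
        costs = [abs(f - pf) for pf, pa, pp in prev]
        if costs and min(costs) <= threshold:
            links.append((j, min(range(len(prev)), key=costs.__getitem__)))
        else:
            links.append((j, None))   # no match: a new track is born
    return links
```

Repeating this for every segment yields the tracks described above: chains of linked sinusoids starting at a birth and ending when no continuation is found.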
If f(n-1) is the frequency of a sinusoid from the previous segment and f(n) is a connected sinusoid from the current segment, then f(n) - f(n-1) is transmitted to the decoder. The number n represents the number in the track: n = 1 is the birth, n = 2 is the first continuation etc. The same is true for the amplitudes. The phase value of the initial sinusoid (= birth sinusoid) is transmitted, whereas for a continuation, no phase is transmitted, but the phase can be retrieved from the frequencies. If a track has no continuation in the next segment, the track ends or "dies".
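The differential track coding just described can be sketched as a simple round trip; the dictionary field names are illustrative, and quantisation is omitted:

```python
def encode_track(freqs, amps, phases):
    """Differential track encoding sketch: the birth (n = 1) is coded
    absolutely, including its phase; each continuation carries only
    the frequency and amplitude differences to the previous segment,
    and no phase (the decoder re-derives it from the frequencies)."""
    codes = [{'birth': True, 'f': freqs[0], 'a': amps[0], 'p': phases[0]}]
    for n in range(1, len(freqs)):
        codes.append({'birth': False,
                      'df': freqs[n] - freqs[n - 1],
                      'da': amps[n] - amps[n - 1]})
    return codes

def decode_track(codes):
    """Rebuild absolute frequencies and amplitudes from the birth
    values and the transmitted differences."""
    freqs, amps = [codes[0]['f']], [codes[0]['a']]
    for c in codes[1:]:
        freqs.append(freqs[-1] + c['df'])
        amps.append(amps[-1] + c['da'])
    return freqs, amps
```

Because tracks vary slowly, the differences are small and can be coded with far fewer bits than the absolute values, which is the bit saving the text refers to.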

Documents:


Patent Number 223954
Indian Patent Application Number 571/CHENP/2006
PG Journal Number 47/2008
Publication Date 21-Nov-2008
Grant Date 24-Sep-2008
Date of Filing 15-Feb-2006
Name of Patentee KONINKLIJKE PHILIPS ELECTRONICS N.V.
Applicant Address Groenewoudseweg 1, NL-5621 BA Eindhoven,
Inventors:
# Inventor's Name Inventor's Address
1 GERRITS, Andreas, J. c/o Prof. Holstlaan 6, NL-5656 AA, Eindhoven,
2 DEN BRINKER, Albertus, C C/o Prof. Holstlaan 6, NL-5656 AA, Eindhoven,
PCT International Classification Number G10L19/08
PCT International Application Number PCT/IB2004/051172
PCT International Filing date 2004-07-08
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 03102225.4 2003-07-18 EUROPEAN UNION