Title of Invention

AN APPARATUS FOR DETERMINING AN ENCODING RATE FOR AN INPUT SIGNAL IN A VARIABLE RATE VOCODER

Abstract

It is a first objective of the present invention to provide a method by which to reduce the probability of coding low energy unvoiced speech as background noise. The present invention determines an encoding rate by examining subbands of the input signal; by this method, unvoiced speech can be distinguished from background noise. A second objective of the present invention is to provide a means by which to set the threshold levels that takes into account signal energy as well as background noise energy. In the present invention, the background noise alone is not used to determine threshold values; rather, the signal to noise ratio of an input signal is used to determine the threshold values. A third objective of the present invention is to provide a method for coding music passing through a variable rate vocoder. The present invention examines the periodicity of the input signal to distinguish music from background noise.
Full Text



The present invention relates to an apparatus for determining an encoding rate for an input signal in a variable rate vocoder. More particularly, the present invention relates to a novel and improved apparatus for determining speech encoding rate in a variable rate vocoder.
Variable rate speech compression systems typically use some form of rate determination algorithm before encoding begins. The rate determination algorithm assigns a higher bit rate encoding scheme to segments of the audio signal in which speech is present and a lower rate encoding scheme for silent segments. In this way a lower average bit rate will be achieved while the voice quality of the reconstructed speech will remain high. Thus to operate efficiently a variable rate speech coder requires a robust rate determination algorithm that can distinguish speech from silence in a variety of background noise environments.
One such variable rate speech compression system or variable rate vocoder is disclosed in copending US Patent Application Serial No. 07/713,661 filed June 11, 1991 entitled "Variable Rate Vocoder" and assigned to the assignee of the present invention, the disclosure of which is incorporated by reference. In this particular implementation of a variable rate vocoder, input speech is encoded using Code Excited Linear Predictive Coding (CELP) techniques at one of several rates as determined by the level of speech activity. The level of speech activity is determined from the energy in the input audio samples which may contain background noise in addition to voiced speech. In order for the vocoder to provide high quality voice encoding over varying levels of background noise, an adaptively adjusting threshold technique is required to compensate for the effect of background noise on the rate decision algorithm.
Vocoders are typically used in communication devices such as cellular telephones or personal communication devices to provide digital signal compression of an analog audio signal that is converted to digital form for transmission. In a mobile environment in which a cellular telephone or personal communication device may be used, high levels of background noise energy make it difficult for the rate determination algorithm to distinguish low energy unvoiced sounds from background noise silence using a signal energy based rate determination algorithm. Thus unvoiced sounds frequently get encoded at lower bit rates and the voice quality becomes degraded as consonants such as "s", "x", "ch", "sh", "t", etc. are lost in the reconstructed speech.
Vocoders that base rate decisions solely on the energy of background noise fail to take into account the signal strength relative to the background noise in setting threshold values. A vocoder that bases its threshold levels solely on background noise tends to compress the threshold levels together when the background noise rises. If the signal level were to remain fixed, this would be the correct approach to setting the threshold levels; however, were the signal level to rise with the background noise level, then compressing the threshold levels is not an optimal solution. An alternative method for setting threshold levels that takes into account signal strength is needed in variable rate vocoders.
A final problem that remains arises during the playing of music through background noise energy based rate decision vocoders. When people speak, they must pause to breathe, which allows the threshold levels to reset to the proper background noise level. However, in transmission of music through a vocoder, such as arises in music-on-hold conditions, no pauses occur and the threshold levels will continue rising until the music starts to be coded at a rate less than full rate. In such a condition the variable rate coder has confused music with background noise.
SUMMARY OF THE INVENTION
The present invention is a novel and improved method and apparatus for determining an encoding rate in a variable rate vocoder. It is a first objective of the present invention to provide a method by which to reduce the probability of coding low energy unvoiced speech as background noise. In the present invention, the input signal is filtered into a high frequency component and a low frequency component. The filtered components of the input signal are then individually analyzed to detect the presence of speech. Because unvoiced speech has a high frequency component, its strength relative to a high frequency band is more distinct from the background noise in that band than it is compared to the background noise over the entire frequency band.

A second objective of the present invention is to provide a means by which to set the threshold levels that takes into account signal energy as well as background noise energy. In the present invention, the setting of voice detection thresholds is based upon an estimate of the signal to noise ratio (SNR) of the input signal. In the exemplary embodiment, the signal energy is estimated as the maximum signal energy during times of active speech and the background noise energy is estimated as the minimum signal energy during times of silence.
A third objective of the present invention is to provide a method for coding music passing through a variable rate vocoder. In the exemplary embodiment, the rate selection apparatus detects a number of consecutive frames over which the threshold levels have risen and checks for periodicity over that number of frames. If the input signal is periodic, this indicates the presence of music. If the presence of music is detected, then the thresholds are set at levels such that the signal is coded at full rate.
Accordingly the present invention provides an apparatus for determining an encoding rate for an input signal in a variable rate vocoder comprising subband energy computation means for receiving said input signal and determining a plurality of subband energy values in accordance with a predetermined subband energy computation format; a plurality of subband rate determination means wherein each of said plurality of subband rate determination means is for receiving a corresponding one of said plurality of subband energy values and determining a subband encoding rate in accordance with said corresponding one of said plurality of subband energy values to provide a plurality of subband encoding rates; and encoding rate selection means for receiving said plurality of said subband encoding rates and for selecting said encoding rate for said input signal in accordance with said plurality of subband encoding rates.
The features, objects and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify correspondingly throughout and wherein:
Figure 1 is a block diagram of the present invention.
Referring to Figure 1 the input signal, S(n), is provided to subband energy computation element 4 and subband energy computation element 6. The input signal S(n) is comprised of an audio signal and background noise. The audio signal is typically speech, but it may also be music. In the exemplary embodiment, S(n), is provided in twenty millisecond frames of 160 samples each. In the exemplary embodiment, input signal S(n) has frequency components from 0 kHz to 4 kHz, which is approximately the bandwidth of a human speech signal.
In the exemplary embodiment, the 4 kHz input signal, S(n), is filtered into two separate subbands. The two separate subbands lie between 0 and 2 kHz and between 2 kHz and 4 kHz respectively. In an exemplary embodiment, the input signal may be divided into subbands by subband filters, the design of which is well known in the art and detailed in U.S. Patent Application Serial No. 08/189,819 filed February 1, 1994, entitled "Frequency Selective Adaptive Filtering", and assigned to the assignee of the present invention, incorporated by reference herein.
The impulse responses of the subband filters are denoted hL(n), for the lowpass filter, and hH(n), for the highpass filter. The energy of the resulting subband components of the signal can be computed to give the values RL(0) and RH(0), simply by summing the squares of the subband filter output samples, as is well known in the art.
In a preferred embodiment, when input signal S(n) is provided to subband energy computation element 4, the energy value of the low frequency component of the input frame, RL(0), is computed as:

$$R_L(0) = R_s(0)\,R_{hL}(0) + 2\sum_{i=1}^{L-1} R_s(i)\,R_{hL}(i) \tag{1}$$

where L is the number of taps in the lowpass filter with impulse response hL(n),
where Rs(i) is the autocorrelation function of the input signal, S(n), given by the equation:

$$R_s(i) = \sum_{n=1}^{N} S(n)\,S(n-i), \quad \text{for } i \in [0, L-1] \tag{2}$$

where N is the number of samples in the frame,
and where RhL is the autocorrelation function of the lowpass filter hL(n) given by:

$$R_{hL}(i) = \begin{cases} \displaystyle\sum_{n=0}^{L-1} h_L(n)\,h_L(n-i), & \text{for } i \in [0, L-1] \\ 0, & \text{else} \end{cases} \tag{3}$$

The high frequency energy, RH(0), is computed in a similar fashion in subband energy computation element 6.
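As a rough illustration of equations 1 to 3, the following Python sketch computes a subband energy from the two autocorrelation functions and compares it with the energy obtained by filtering directly and summing squares. The function names, the 3-tap filter, and the 5-sample frame are hypothetical values chosen for the example (the exemplary embodiment uses 17-tap filters and 160-sample frames), and samples outside the frame are assumed to be zero.

```python
def autocorr(x, max_lag):
    """R(i) = sum over n of x[n] * x[n - i], for i = 0..max_lag.

    Samples outside the sequence are treated as zero (an assumption).
    """
    n = len(x)
    return [sum(x[k] * x[k - i] for k in range(i, n)) for i in range(max_lag + 1)]


def subband_energy(frame, taps):
    """Equation 1: R(0) of the filtered frame from signal and filter autocorrelations."""
    num_taps = len(taps)
    rs = autocorr(frame, num_taps - 1)   # equation 2
    rh = autocorr(taps, num_taps - 1)    # equation 3 (can be precomputed per filter)
    return rs[0] * rh[0] + 2.0 * sum(rs[i] * rh[i] for i in range(1, num_taps))


def direct_energy(frame, taps):
    """Reference: filter the zero-padded frame and sum the squared outputs."""
    out = [0.0] * (len(frame) + len(taps) - 1)
    for n, s in enumerate(frame):
        for k, h in enumerate(taps):
            out[n + k] += s * h
    return sum(v * v for v in out)


frame = [1.0, -2.0, 3.0, 0.5, -1.5]   # hypothetical input samples
taps = [0.25, 0.5, 0.25]              # hypothetical lowpass filter
print(subband_energy(frame, taps), direct_energy(frame, taps))
```

Under the zero-padding assumption the two results agree up to floating point rounding, which is why the filter output itself never has to be formed.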
The values of the autocorrelation function of the subband filters can be computed ahead of time to reduce the computational load. In addition, some of the computed values of Rs(i) are used in other computations in the coding of the input signal, S(n), which further reduces the net computational burden of the encoding rate selection method of the present invention. For example, the derivation of LPC filter tap values requires the computation of a set of input signal autocorrelation coefficients.
The computation of LPC filter tap values is well known in the art and is detailed in the abovementioned U.S. Patent Application 08/004,484. If one were to code the speech with a method requiring a ten tap LPC filter, only the values of Rs(i) for i values from 11 to L-1 need to be computed in addition to those that are used in the coding of the signal, because Rs(i) for i values from 0 to 10 are used in computing the LPC filter tap values. In the exemplary embodiment, the subband filters have 17 taps, L=17.
Subband energy computation element 4 provides the computed value of RL(0) to subband rate decision element 12, and subband energy computation element 6 provides the computed value of RH(0) to subband rate decision element 14. Rate decision element 12 compares the value of RL(0) against two predetermined threshold values TL1/2 and TLfull and assigns a suggested encoding rate, RATEL, in accordance with the comparison. The rate assignment is conducted as follows:

$$\mathrm{RATE}_L = \text{eighth rate}, \quad R_L(0) \le T_{L1/2} \tag{4}$$

$$\mathrm{RATE}_L = \text{half rate}, \quad T_{L1/2} < R_L(0) \le T_{Lfull} \tag{5}$$

$$\mathrm{RATE}_L = \text{full rate}, \quad R_L(0) > T_{Lfull} \tag{6}$$

Subband rate decision element 14 operates in a similar fashion and selects a suggested encoding rate, RATEH, in accordance with the high frequency energy value RH(0) and based upon a different set of threshold values TH1/2 and THfull. Subband rate decision element 12 provides its suggested encoding rate, RATEL, to encoding rate selection element 16, and subband rate decision element 14 provides its suggested encoding rate, RATEH, to encoding rate selection element 16. In the exemplary embodiment, encoding rate selection element 16 selects the higher of the two suggested rates and provides the higher rate as the selected ENCODING RATE.
Subband energy computation element 4 also provides the low frequency energy value, RL(0), to threshold adaptation element 8, where the threshold values TL1/2 and TLfull for the next input frame are computed. Similarly, subband energy computation element 6 provides the high frequency energy value, RH(0), to threshold adaptation element 10, where the threshold values TH1/2 and THfull for the next input frame are computed.
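The per-subband rate decision and the final rate selection described above can be sketched in Python as follows. This is only an illustrative sketch: the `Rate` enumeration, its numeric values, and the function names are hypothetical, and the handling of an energy exactly equal to a threshold follows the reading that the lower rate applies at equality.

```python
from enum import IntEnum


class Rate(IntEnum):
    # Numeric values are arbitrary; only their ordering matters here.
    EIGHTH = 1
    HALF = 2
    FULL = 3


def subband_rate(energy, t_half, t_full):
    """Suggest a rate for one subband from its energy and its two thresholds."""
    if energy > t_full:
        return Rate.FULL
    if energy > t_half:
        return Rate.HALF
    return Rate.EIGHTH


def select_encoding_rate(rate_low, rate_high):
    """Pick the higher of the two suggested subband rates."""
    return max(rate_low, rate_high)


# Hypothetical energies and thresholds for one frame.
rate_l = subband_rate(energy=50.0, t_half=70.0, t_full=90.0)
rate_h = subband_rate(energy=30.0, t_half=10.0, t_full=20.0)
print(select_encoding_rate(rate_l, rate_h).name)
```

Taking the maximum means a frame with strong high-band energy, as unvoiced consonants tend to have, is not pulled down to a low rate by a quiet low band.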

Threshold adaptation element 8 receives the low frequency energy value, RL(0), and determines whether S(n) contains background noise or audio signal. In an exemplary implementation, the method by which threshold adaptation element 8 determines if an audio signal is present is by examining the normalized autocorrelation function NACF, which is given by the equation:
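$$\mathrm{NACF} = \max_{T}\left( \frac{\displaystyle\sum_{n=0}^{N-1} e(n)\,e(n-T)}{\displaystyle\frac{1}{2}\left[\sum_{n=0}^{N-1} e^{2}(n) + \sum_{n=0}^{N-1} e^{2}(n-T)\right]} \right) \tag{7}$$

where e(n) is the formant residual signal that results from filtering the input signal, S(n), by an LPC filter.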
The design of and filtering of a signal by an LPC filter is well known in the art and is detailed in aforementioned U.S. Patent Application 08/004,484.
The input signal, S(n), is filtered by the LPC filter to remove interaction of the formants. NACF is compared against a threshold value to determine if an audio signal is present. If NACF is greater than a predetermined threshold value, it indicates that the input frame has a periodic characteristic indicative of the presence of an audio signal such as speech or music. Note that while parts of speech and music are not periodic and will exhibit low values of NACF, background noise typically never displays any periodicity and nearly always exhibits low values of NACF.
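A minimal Python sketch of the normalized autocorrelation of equation 7 is given below. It assumes the caller supplies a residual buffer that already contains enough past samples before the current frame, and the lag search range (`min_lag`, `max_lag`) is a hypothetical parameter, since the source does not state which values of T are searched.

```python
def nacf(residual, frame_len, min_lag, max_lag):
    """Maximum normalized autocorrelation of the last frame_len samples.

    residual: formant residual e(n), history followed by the current frame.
    The lag range is an assumption; the text only says the maximum is over T.
    """
    start = len(residual) - frame_len
    if start < max_lag:
        raise ValueError("need at least max_lag samples of history before the frame")
    cur = residual[start:start + frame_len]
    best = None
    for lag in range(min_lag, max_lag + 1):
        past = residual[start - lag:start - lag + frame_len]
        num = sum(a * b for a, b in zip(cur, past))
        den = 0.5 * (sum(a * a for a in cur) + sum(b * b for b in past))
        if den > 0.0:
            value = num / den
            if best is None or value > best:
                best = value
    return 0.0 if best is None else best


# A residual with an exact period of 4 samples (hypothetical data).
periodic = [0.0, 1.0, 0.0, -1.0] * 16
print(nacf(periodic, frame_len=16, min_lag=2, max_lag=6))
```

For a perfectly periodic residual the numerator and denominator coincide at the true period, so the value reaches 1; for noise-like input the numerator stays small relative to the energies.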
If it is determined that S(n) contains background noise, that is, the value of NACF is less than a threshold value TH1, then the value RL(0) is used to update the value of the current background noise estimate BGNL. In the exemplary embodiment, TH1 is 0.35. RL(0) is compared against the current value of background noise estimate BGNL. If RL(0) is less than BGNL, then the background noise estimate BGNL is set equal to RL(0) regardless of the value of NACF.
The background noise estimate BGNL is only increased when NACF is less than threshold value TH1. If RL(0) is greater than BGNL and NACF is less than TH1, then the background noise energy BGNL is set to a1·BGNL, where a1 is a number greater than 1. In the exemplary embodiment, a1 is equal to 1.03. BGNL will continue to increase as long as NACF is less than threshold value TH1 and RL(0) is greater than the current value of BGNL, until BGNL reaches a predetermined maximum value BGNmax, at which point the background noise estimate BGNL is set to BGNmax.
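The background noise update rule can be sketched in Python as below. The function name and argument layout are hypothetical, the value of BGNmax is not given in the text and is therefore left as a required argument, and the case where the frame energy exactly equals the estimate is assumed to leave the estimate unchanged.

```python
def update_background_noise(bgn, energy, nacf, bgn_max, th1=0.35, a1=1.03):
    """Return the next background noise estimate for one subband.

    bgn:     current estimate
    energy:  subband energy of the current frame, e.g. RL(0)
    nacf:    normalized autocorrelation of the current frame
    bgn_max: cap on the estimate (value not stated in the text)
    """
    if energy < bgn:
        # Drop immediately to a quieter frame, regardless of NACF.
        return energy
    if energy > bgn and nacf < th1:
        # Creep upward by 3 percent per frame, only for non-periodic frames.
        return min(a1 * bgn, bgn_max)
    return bgn


print(update_background_noise(100.0, 200.0, 0.1, bgn_max=1000.0))
```

The asymmetry is deliberate in the described scheme: the estimate falls at once but rises slowly, so a burst of speech raises it far less than a genuine change in the noise floor would.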
If an audio signal is detected, signified by the value of NACF exceeding a second threshold value TH2, then the signal energy estimate, SL, is updated. In the exemplary embodiment, TH2 is set to 0.5. The value of RL(0) is compared against a current lowpass signal energy estimate, SL. If RL(0) is greater than the current value of SL, then SL is set equal to RL(0). If RL(0) is less than the current value of SL, then SL is set equal to a2·SL, again only if NACF is greater than TH2. In the exemplary embodiment, a2 is set to 0.96.
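The signal energy update mirrors the background noise update with the directions reversed, and can be sketched in Python as follows. The function name is hypothetical, and leaving the estimate unchanged when the frame energy equals it is an assumption, since the text covers only the greater-than and less-than cases.

```python
def update_signal_energy(s, energy, nacf, th2=0.5, a2=0.96):
    """Return the next signal energy estimate for one subband.

    s:      current estimate, e.g. SL
    energy: subband energy of the current frame, e.g. RL(0)
    nacf:   normalized autocorrelation of the current frame
    """
    if nacf <= th2:
        # No audio signal detected in this frame: keep the estimate.
        return s
    if energy > s:
        # Jump up immediately to a louder periodic frame.
        return energy
    if energy < s:
        # Decay slowly by 4 percent per frame.
        return a2 * s
    return s


print(update_signal_energy(1000.0, 400.0, 0.8))
```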
Threshold adaptation element 8 then computes a signal to noise ratio estimate in accordance with equation 8 below:

$$SNR_L = 10 \log_{10}\left[\frac{S_L}{BGN_L}\right] \tag{8}$$


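Threshold adaptation element 8 then determines an index of the quantized signal to noise ratio, ISNRL, in accordance with equations 9 and 10 below:

$$I_{SNRL} = \mathrm{nint}\left[\frac{SNR_L - 20}{5}\right], \quad \text{for } 20 < SNR_L < 55 \tag{9}$$

$$I_{SNRL} = 0 \ \text{ for } SNR_L \le 20, \qquad I_{SNRL} = 7 \ \text{ for } SNR_L \ge 55 \tag{10}$$

where nint is a function that rounds the fractional value to the nearest integer.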
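The SNR estimate and its quantization into an index can be sketched in Python as below. The 20 dB offset appears in the source; the 5 dB step is an assumption chosen so that the index spans the eight rows (0 to 7) of the scaling table, and the function names are hypothetical. Python's built-in `round` rounds halves to the nearest even integer, so a separate `nint` helper is used.

```python
import math


def snr_db(signal_energy, noise_energy):
    """Signal to noise ratio in dB from the two energy estimates."""
    return 10.0 * math.log10(signal_energy / noise_energy)


def nint(x):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def snr_index(snr, floor_db=20.0, step_db=5.0, max_index=7):
    """Quantize an SNR in dB to an index in 0..max_index.

    step_db = 5.0 is an assumption consistent with an eight-entry table.
    """
    if snr <= floor_db:
        return 0
    return min(max_index, nint((snr - floor_db) / step_db))


snr = snr_db(signal_energy=2000.0, noise_energy=1.0)   # hypothetical estimates
print(snr, snr_index(snr))
```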
Threshold adaptation element 8 then selects or computes two scaling factors, KL1/2 and KLfull, in accordance with the signal to noise ratio index, ISNRL. An exemplary scaling value lookup table is provided in Table 1 below:

TABLE 1

ISNRL   KL1/2   KLfull
0       7.0     9.0
1       7.0     12.6
2       8.0     17.0
3       8.6     18.5
4       8.9     19.4
5       9.4     20.9
6       11.0    25.5
7       15.8    39.8
These two values are used to compute the threshold values for rate selection in accordance with the equations below:

$$T_{L1/2} = K_{L1/2} \cdot BGN_L \tag{11}$$

$$T_{Lfull} = K_{Lfull} \cdot BGN_L \tag{12}$$

where TL1/2 is the low frequency half rate threshold value and
TLfull is the low frequency full rate threshold value.
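The table lookup and the threshold computation of equations 11 and 12 amount to the following Python sketch. The tuple and function names are hypothetical; the scaling values are those of the exemplary lookup table.

```python
# Scaling factors indexed by the quantized SNR index (0..7), from Table 1.
K_HALF = (7.0, 7.0, 8.0, 8.6, 8.9, 9.4, 11.0, 15.8)
K_FULL = (9.0, 12.6, 17.0, 18.5, 19.4, 20.9, 25.5, 39.8)


def rate_thresholds(index, bgn):
    """Return (half rate threshold, full rate threshold) for one subband.

    index: quantized SNR index, 0..7
    bgn:   current background noise estimate for the subband
    """
    return K_HALF[index] * bgn, K_FULL[index] * bgn


t_half, t_full = rate_thresholds(index=3, bgn=10.0)   # hypothetical inputs
print(t_half, t_full)
```

Because the multipliers grow with the SNR index, a loud talker over quiet noise gets widely spaced thresholds, while a low SNR keeps the thresholds closer to the noise floor.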
Threshold adaptation element 8 provides the adapted threshold values TL1/2 and TLfull to rate decision element 12. Threshold adaptation element 10 operates in a similar fashion and provides the threshold values TH1/2 and THfull to subband rate decision element 14.
The initial value of the audio signal energy estimate S, where S can be SL or SH, is set as follows. The initial signal energy estimate, SINIT, is set to -18.0 dBm0, where 3.17 dBm0 denotes the signal strength of a full sine wave, which in the exemplary embodiment is a digital sine wave with an amplitude range from -8031 to 8031. SINIT is used until it is determined that an acoustic signal is present.
The method by which an acoustic signal is initially detected is to compare the NACF value against a threshold. When the NACF exceeds the threshold for a predetermined number of consecutive frames, an acoustic signal is determined to be present. In the exemplary embodiment, NACF must exceed the threshold for ten consecutive frames. After this condition is met, the signal energy estimate, S, is set to the maximum signal energy in the preceding ten frames.
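One way to sketch this initial detection step in Python is shown below. The class name, its attributes, and the NACF threshold argument are hypothetical (the text does not give the threshold used for this step), and energies are treated as plain linear values rather than dBm0.

```python
from collections import deque


class SignalEnergyInit:
    """Hold the initial estimate until NACF is high for N consecutive frames."""

    def __init__(self, s_init, nacf_threshold, frames_required=10):
        self.estimate = s_init
        self.detected = False
        self._threshold = nacf_threshold
        self._recent = deque(maxlen=frames_required)  # last N frame energies
        self._run = 0                                 # consecutive high-NACF frames

    def update(self, energy, nacf):
        """Feed one frame; return the current signal energy estimate."""
        if self.detected:
            return self.estimate
        self._recent.append(energy)
        self._run = self._run + 1 if nacf > self._threshold else 0
        if self._run >= self._recent.maxlen:
            self.detected = True
            self.estimate = max(self._recent)
        return self.estimate


init = SignalEnergyInit(s_init=1.0, nacf_threshold=0.5)
for frame_energy in [3.0, 4.0, 9.0, 5.0, 6.0, 2.0, 7.0, 8.0, 1.5, 5.0]:
    current = init.update(frame_energy, nacf=0.9)
print(init.detected, current)
```

After detection, the regular per-frame update of the signal energy estimate would take over; this sketch covers only the start-up phase.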
The background noise estimate BGNL is initially set to BGNmax. As soon as a subband frame energy is received that is less than BGNmax, the background noise estimate is reset to the value of the received subband energy level, and generation of the background noise estimate BGNL proceeds as described earlier.
In a preferred embodiment a hangover condition is actuated when, following a series of full rate speech frames, a frame of a lower rate is detected. In the exemplary embodiment, when four consecutive speech frames are encoded at full rate followed by a frame where ENCODING RATE is set to a rate less than full rate and the computed signal to noise ratios are less than a predetermined minimum SNR, the ENCODING RATE for that frame is set to full rate. In the exemplary embodiment the predetermined minimum SNR is 27.5 dB as defined in equation 8.
In the preferred embodiment, the number of hangover frames is a function of the signal to noise ratio. In the exemplary embodiment, the number of hangover frames is determined as follows:

$$\#\text{hangover frames} = 1, \quad 22.5 < SNR \le 27.5 \tag{13}$$

$$\#\text{hangover frames} = 2, \quad SNR \le 22.5 \tag{14}$$

$$\#\text{hangover frames} = 0, \quad SNR > 27.5 \tag{15}$$

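The mapping from SNR to the number of hangover frames can be written as a small Python function. The function name is hypothetical, and the sketch covers only the mapping, not the bookkeeping of the four preceding full rate frames.

```python
def hangover_frames(snr_db):
    """Number of extra full rate frames to hold after speech, by SNR in dB."""
    if snr_db <= 22.5:
        return 2
    if snr_db <= 27.5:
        return 1
    return 0


for value in (20.0, 25.0, 30.0):   # hypothetical SNR values
    print(value, hangover_frames(value))
```

A lower SNR gives a longer hangover, which fits the motivation stated earlier: weak trailing sounds are hardest to tell apart from noise exactly when the noise is high.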
The present invention also provides a method with which to detect the presence of music, which as described before lacks the pauses that allow the background noise measures to reset. The method for detecting the presence of music assumes that music is not present at the start of the call. This allows the encoding rate selection apparatus of the present invention to properly estimate an initial background noise energy, BGNinit. Because music, unlike background noise, has a periodic characteristic, the present invention examines the value of NACF to distinguish music from background noise. The music detection method of the present invention computes an average NACF in accordance with the equation below:

$$\mathrm{NACF}_{AVE} = \frac{1}{T}\sum_{i=1}^{T} \mathrm{NACF}(i) \tag{16}$$

where NACF is defined in equation 7, and
where T is the number of consecutive frames in which the estimated value of the background noise has been increasing from an initial background noise estimate BGNinit.

If the background noise BGN has been increasing for the predetermined number of frames T and NACFAVE exceeds a predetermined threshold, then music is detected and the background noise BGN is reset to BGNinit. It should be noted that to be effective the value T must be set low enough that the encoding rate doesn't drop below full rate. Therefore the value of T should be set as a function of the acoustic signal and BGNinit.
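The music check can be sketched in Python as follows. The function names and argument layout are hypothetical, and both the frame count T and the NACF average threshold are left as arguments because the text does not give values for them.

```python
def average_nacf(nacf_values):
    """Mean NACF over the given frames (equation 16)."""
    return sum(nacf_values) / len(nacf_values)


def check_for_music(bgn, bgn_init, nacf_history, rising_frames, t_frames, nacf_ave_threshold):
    """Return (music_detected, next_bgn).

    rising_frames: consecutive frames over which the noise estimate has risen
    t_frames:      the predetermined frame count T (must be positive)
    nacf_history:  per-frame NACF values, most recent last
    """
    if rising_frames >= t_frames and len(nacf_history) >= t_frames:
        if average_nacf(nacf_history[-t_frames:]) > nacf_ave_threshold:
            # Periodic input while the noise estimate keeps rising: treat as music
            # and restore the estimate measured at the start of the call.
            return True, bgn_init
    return False, bgn


history = [0.7, 0.8, 0.6, 0.9]   # hypothetical NACF values
print(check_for_music(bgn=500.0, bgn_init=50.0, nacf_history=history,
                      rising_frames=4, t_frames=4, nacf_ave_threshold=0.5))
```

Resetting the noise estimate pulls the rate thresholds back down, so the music continues to be coded at full rate.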
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.



WE CLAIM:
1. An apparatus for determining an encoding rate for an input signal in a
variable rate vocoder comprising subband energy computation means
(4, 6) for receiving said input signal and determining a plurality of
subband energy values in accordance with a predetermined subband
energy computation format; a plurality of subband rate determination
means (12, 14) wherein each of said plurality of subband rate
determination means is for receiving a corresponding one of said plurality
of subband energy values and determining a subband encoding rate in
accordance with said corresponding one of said plurality of subband
energy values to provide a plurality of subband encoding rates; and
encoding rate selection means (16) for receiving said plurality of said
subband encoding rates and for selecting said encoding rate for said input
signal in accordance with said plurality of subband encoding rates.
2. The apparatus as claimed in claim 1 wherein a threshold computation
means (8, 10) is disposed between said subband energy computation
means and said rate determination means for receiving said subband
energy values and for determining a set of encoding rate threshold values
in accordance with said plurality of subband energy values.
3. An apparatus for determining an encoding rate for an input signal in a
variable rate vocoder, substantially as herein described, with reference to
the accompanying drawings.




Patent Number 192681
Indian Patent Application Number 849/MAS/1995
PG Journal Number 30/2009
Publication Date 24-Jul-2009
Grant Date 03-Feb-2005
Date of Filing 07-Jul-1995
Name of Patentee M/S. QUALCOMM INCORPORATED
Applicant Address 6455 LUSK BOULEVARD SAN DIEGO, CALIFORNIA 92121
Inventors:
# Inventor's Name Inventor's Address
1 NE NE
PCT International Classification Number N/A
PCT International Application Number N/A
PCT International Filing date
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 NA