Indian Patents. 207734: "RESIDUAL ECHO ESTIMATION FOR ECHO CANCELLATION"

Title of Invention	"RESIDUAL ECHO ESTIMATION FOR ECHO CANCELLATION"
Abstract	An acoustic echo cancellation system comprising : an adaptive filter to create an echo estimate and an error term that includes a residual echo ; and a residual echo estimator to independently calculate estimates of a real portion and an imaginary portion of a discrete Fourier transform (DFT) of the residual echo.

Full Text	RESIDUAL ECHO ESTIMATION FOR ECHO CANCELLATION

Field
The present invention relates generally to echo cancellation systems, and more specifically to systems for reducing residual echo in echo cancellation systems.

Background of the Invention
In "speakerphone" applications, the "near-talker" is the person using the speakerphone, and the "far-talker" is the person on the far end of the telephone line. The far-talker's speech is broadcast (played through a speaker) into the room (or other acoustic enclosure) that houses the speakerphone and the near-talker. An echo is produced by the far-talker's speech propagating through the room and being subsequently received by the microphone. Acoustic echo cancelers (AEC) are used to cancel the echo received at the microphone. The acoustic echo canceler is typically an adaptive filter that models the various echo paths in the room. Extreme cases exist where the acoustic echo canceler briefly fails to cancel the echo, such as prolonged double-talk or rapid near-talker movement. In these cases, a far-talker might hear a short burst of his/her own voice as an echo. Residual echo suppression techniques attempt to remove the echo that remains after acoustic echo cancellation, thereby preventing the far-talker from hearing such bursts while still allowing the near-talker's voice to pass through undisturbed. Residual echo suppression techniques have been the subject of research, and papers have been published describing the work. Examples include: S. Gustafsson et al., "Combined Acoustic Echo Control and Noise Reduction for Hands-Free Telephony," Signal Processing, vol. 64, pp. 21-32, 1998, hereinafter referred to as "Gustafsson"; and V. Turbin et al., "Using Psychoacoustic Criteria in Acoustic Echo Cancellation Algorithms," Proc. IWAENC'97, London, pp. 53-56, Sept. 1997, hereinafter referred to as "Turbin."
Despite ongoing research, current residual echo suppression techniques are not completely effective in removing residual echo.

Brief Description of the Accompanying Drawings
Figure 1 shows an acoustic echo cancellation system with residual echo estimation; Figure 2 shows a first portion of a cross-term calculator; Figure 3 shows a second portion of a cross-term calculator; and Figure 4 shows speech activity detectors in a speakerphone context.

Description of Embodiments
In the following detailed description of the embodiments, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized, and structural, logical, and electrical changes may be made without departing from the scope of the present invention. Moreover, it is to be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described in one embodiment may be included within other embodiments. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims, along with the full scope of equivalents to which such claims are entitled.

The method and apparatus of the present invention provide a mechanism to estimate residual echo in an acoustic echo cancellation system. An adaptive filter produces an estimate of the echo and subtracts the estimate of the echo from the microphone input signal. When the echo and the estimate of the echo do not closely match, an undesirable residual echo remains in the signal.
The method and apparatus of the present invention provide a residual echo estimator that separately estimates the real and imaginary components of the discrete Fourier transform (DFT) of the residual echo. It does this without assuming statistical independence between the various components of the microphone input signal. The residual echo estimator calculates the real and imaginary components using an approximation of the microphone input signal without the echo. A different approximation is used depending on whether the near-talker and/or the far-talker are talking.

Figure 1 shows a speakerphone with echo cancellation and residual echo estimation. Speakerphone 102 includes adaptive filter 112, residual echo estimator (REE) 116, summer 124, noise reduction (NR) circuit 120, speaker 106, and microphone 128. Speakerphone 102 can include many other circuits such as processors, amplifiers, and the like. Many of these other circuits are omitted from Figure 1 to more clearly show novel portions of the present invention. Speakerphone 102 includes speaker 106, which plays speech from the far-talker. Far-talker speech is received from the far end on node 104 and is played through speaker 106. Speech data on node 104 is also delayed by delay circuit 108 before being operated upon by adaptive filter 112. In some embodiments, speech data on node 104 is digital data. In these embodiments, a digital-to-analog (D/A) converter converts the digital data to an analog waveform for playback by speaker 106. Also in these embodiments, delay circuit 108 can be a digital delay element such as a register file or shift register. In other embodiments, data on node 104 is analog data. In these embodiments, delay circuit 108 can be an analog delay element, such as a lumped or distributed analog delay line. Delay circuit 108 can be a fixed delay or a variable delay. In general, when the echo path is longer, the delay in delay circuit 108 is larger.
For example, when speakerphone 102 is used in a car, the echo path is short and the delay of delay circuit 108 is short. Also for example, when speakerphone 102 is used in a large conference room, the echo path is longer, and the delay of delay circuit 108 is correspondingly longer. In operation, speaker 106 plays the far-end speech into an acoustic environment. An echo signal yn from the acoustic enclosure is received at microphone 128. The effect of the acoustic enclosure on the echo is represented by block 130, which contains transfer function H1. That is to say, speech played from speaker 106 is operated upon by transfer function H1 and is then received at microphone 128. In addition to echo signal yn, microphone 128 also receives near-end speech signal sn and near-end noise signal vn. Near-end speech signal sn represents speech from near-talker 136 as modeled by transfer function H2, shown in block 132. Noise signal vn represents noise 138 as modeled by the transfer function shown in block 134. Microphone 128, therefore, receives the sum of yn, sn, and vn. This is shown at node 126 as zn. The signals yn and vn represent, at least to some degree, undesirable components of zn. For example, yn is the echo caused by the coupling of speaker 106 and microphone 128. When a large component of yn is not canceled and becomes part of the signal sent back to the far end, the far-talker can hear his/her own voice after a delay, and this can cause confusion during a phone call. Also for example, if vn is unduly large, then the far-talker hears a large amount of noise, and this can interrupt a phone conversation. In contrast, the near-end speech signal sn is, in general, a desirable component of zn. When the near-end speech is faithfully transmitted to the far end, effective communication can take place. Adaptive filter 112 approximates the echo transfer function H1 and generates an estimate ŷn of the echo signal from the delayed far-end speech signal xn.
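The following minimal sketch illustrates the Figure 1 signal path. It assumes a time-domain normalized LMS (NLMS) filter in place of adaptive filter 112. NLMS is a swapped-in technique chosen only for brevity; the text cites fast affine projection and least-squares algorithms as examples. The FIFO stands in for delay circuit 108, and the subtraction stands in for summer 124. The function name, step size `mu`, tap count, and synthetic echo path `h` are hypothetical. In the demo, the microphone signal contains only echo (sn = vn = 0), so the error en equals the residual echo rn = yn - ŷn.

```python
import random
from collections import deque


def nlms_echo_canceller(far_end, mic, num_taps=32, delay=0, mu=0.5, eps=1e-8):
    """Time-domain NLMS echo canceller (hypothetical stand-in for adaptive filter 112).

    far_end: far-end samples x_n sent to the speaker.
    mic:     microphone samples z_n = y_n + s_n + v_n.
    delay:   bulk delay applied to x_n first (stand-in for delay circuit 108).
    Returns (echo_estimate, error), where error e_n = z_n - yhat_n.
    """
    weights = [0.0] * num_taps                       # running estimate of echo path H1
    taps = deque([0.0] * num_taps, maxlen=num_taps)  # newest delayed sample at index 0
    fifo = deque([0.0] * delay)                      # bulk delay line
    echo_estimate, error = [], []
    for x_n, z_n in zip(far_end, mic):
        fifo.append(x_n)
        taps.appendleft(fifo.popleft())              # oldest sample leaves the delay line
        yhat_n = sum(w * x for w, x in zip(weights, taps))
        e_n = z_n - yhat_n                           # summer 124: e = z - yhat
        step = mu * e_n / (eps + sum(x * x for x in taps))
        for i, x in enumerate(taps):                 # normalized LMS weight update
            weights[i] += step * x
        echo_estimate.append(yhat_n)
        error.append(e_n)
    return echo_estimate, error


# Usage: echo-only microphone signal (s_n = v_n = 0), so e_n equals r_n.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(4000)]
h = [0.0, 0.0, 0.6, -0.3, 0.1]                       # hypothetical room response
y = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]
y_hat, e = nlms_echo_canceller(x, y, num_taps=8)
residual = [a - b for a, b in zip(y, y_hat)]         # r_n = y_n - yhat_n
```

Once the filter converges, the residual is tiny in this noise-free toy case. The rest of the description addresses what remains when it is not.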
Summer 124 subtracts ŷn from zn to create the error signal en on node 122. Adaptive filter 112 can be any filter suitable for use in an adaptive echo cancellation system. Examples include, but are not limited to: S. Gay, S. Tavathia, "The fast affine projection algorithm," Proc. IEEE ICASSP, Detroit, USA, 1995, pp. 3023-3026; and G. Glentis, K. Berberidis, S. Theodoridis, "A unified view: Efficient least squares adaptive algorithms for FIR transversal filtering," IEEE Signal Processing Magazine, vol. 16, pp. 13-41, July 1999. The signals ŷn, en, and zn are input to residual echo estimator 116 to generate an estimate r̂n of the residual echo. The residual echo estimate r̂n and the error signal en are input to noise reduction (NR) circuit 120 to reduce the residual echo in en. Ideally, ŷn would match yn exactly, and en would contain only the near-end speech signal sn and perhaps some background noise vn. However, this is practically impossible, and en always contains some amount of residual echo. When the far-talker is silent (xn = 0), the input to the noise reduction algorithm is simply speech plus background noise. When the far-talker is active, the "noise" (or undesirable components) includes the background noise vn along with the residual echo. Noise reduction circuit 120 uses a noise spectral estimator to track the near-stationary background noise. The quasi-stationary residual echo is estimated separately by the residual echo estimator (REE). NR is then applied using a composite "noise" estimate. In some embodiments, noise reduction circuit 120 leaves a small amount of near-stationary background noise to mask residual echo. This can be accomplished using algorithms discussed in Turbin (see Background, above). Any of a large class of speech spectral estimators could be used, with the noise power spectral estimate modified to include residual echo. Examples of suitable noise reduction algorithms and circuits can be found in: S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust. Speech Signal Process.
ASSP-27 (April 1979) 113-120; and Y. Ephraim, D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process. 32(6) (December 1984) 1109-1121.

Residual Echo Estimation
Prior work by Gustafsson (see Background, above) assumes that the a priori residual echo statistics may be estimated via a real-valued deterministic mapping function Fn applied to the adaptive filter output. The mapping function is given by equations (1) and (2). In these equations, capital letters represent the discrete Fourier transform (DFT) components of the corresponding variables represented by small letters. For example, En(k) represents the DFT components of en. Gustafsson omits details on how E{|Zn(k)|²}, E{|En(k)|²}, and E{|Yn(k)|²} are estimated, but these quantities are typically estimated using a "leaky average" of sample values. Experimental results used the Euclidean distance between the true and estimated residual echo power spectra to measure the success of the estimator. These results showed that instantaneous values give very good results. Also in Gustafsson, Fn was divided into subbands, and each Fn(k) was replaced by the average value in its subband, although the number and width of the subbands were not discussed. Where E{|Yn(k)|²} was too small, Fn(k) was not included in the average. Experimental results indicate that low estimation error is achieved when the subband size is one DFT bin. Equations (1) and (2), above, make two assumptions: that Fn is real-valued, and that sn, vn, and yn are statistically independent. Using equations (1) and (2) for residual echo estimation, experiments yielded signal-to-noise ratios of the estimated residual echo power spectrum of, at best, about 2.0 dB. The derivation of equation (2) is now discussed, with the two assumptions in mind.
The two assumptions are then removed, and the results are presented. An SNR-like measurement was used to evaluate the estimates. The residual echo spectral components are related to the echo Yn(k) by defining a real-valued transfer function Fn such that Rn(k) = Fn(k)Yn(k) (4). Assuming that the mapping function Fn is real-valued, substituting (4) into (3) yields (5), which leads to (6). Using the relationship of (4) in (6) yields (7). Assuming that sn, vn, and yn are statistically independent and zero mean, relationships (8a) and (8b) hold true. Subtracting (8b) from (8a) and substituting from (6) and (7) yields the quadratic equation (9). Solving the quadratic equation for Fn(k) yields two solutions: Fn(k) = 1 and the more meaningful solution (10). Experimental results with sn = 0 showed that the error in the deterministic mapping is due to the two assumptions in the derivation: that F is real-valued, and that sn, vn, and yn are statistically independent. Experimental results also showed that a more accurate residual echo estimate was achieved with a non-real-valued mapping function F that had been derived without the assumption of statistical independence.

Removal of Assumption that F is Real-Valued
Let Fn(k) have a real part Fr,n(k) and an imaginary part Fi,n(k). Carrying this definition through (4) and (5), into (6) and (7), and finally into (9), leads to a family of solutions to the quadratic equation. When Fi,n(k) is forced to zero, the solution of (10) is realized. When Fr,n(k) is forced to zero, a different solution results. Experimental results showed that this choice was not necessarily better than the solution already proposed. Restructuring the problem, however, leads to an unambiguous mapping. Instead of (4), separate mapping functions can be used for the real and imaginary parts of Rn(k). This leads to separate solutions for the real and imaginary parts, and equation (10) is replaced by two solutions, one for each part. This approach leads to a significant improvement in the experimental results: the residual echo estimate improved to 45 dB SNR (noise-free case with no near-talker).
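The real-valued, statistically independent derivation above can be made concrete under one self-consistent reading. The assumptions are rn = yn - ŷn and Rn(k) = Fn(k)Yn(k) with real Fn(k), so that Ŷn(k) = (1 - Fn(k))Yn(k). This is our reading, not the patent's printed equations. Independence and zero mean give E{|Z|²} = E{|S + V|²} + E{|Y|²} and E{|E|²} = E{|S + V|²} + F²E{|Y|²}. Writing D = E{|Z|²} - E{|E|²} and P = E{|Ŷ|²} = (1 - F)²E{|Y|²}, this gives (1 - F)²D = (1 - F²)P, that is, (1 - F)[(1 - F)D - (1 + F)P] = 0. The roots are F = 1 and F = (D - P)/(D + P). The residual then follows as R = F/(1 - F)·Ŷ, which is ill-conditioned as F approaches one. The sketch below computes this estimate. The leaky-average factor `lam` and the guard `min_gap` are hypothetical parameters.

```python
def leaky_average(previous, sample, lam=0.9):
    """One step of a leaky (exponential) average of per-bin power values."""
    return lam * previous + (1.0 - lam) * sample


def real_mapping(d, p):
    """Non-trivial root F = (D - P) / (D + P) of (1 - F)[(1 - F) D - (1 + F) P] = 0."""
    return (d - p) / (d + p)


def residual_echo_power(z_pow, e_pow, yhat_pow, min_gap=1e-3):
    """Estimate |R(k)|^2 for one bin from powers of Z(k), E(k) and Yhat(k).

    Uses R = F / (1 - F) * Yhat and returns 0.0 where 1 - F is too small,
    because that ratio is numerically ill-conditioned as F approaches 1.
    """
    d = z_pow - e_pow
    if d + yhat_pow <= 0.0:
        return 0.0
    f = real_mapping(d, yhat_pow)
    gap = 1.0 - f
    if abs(gap) < min_gap:
        return 0.0
    return (f / gap) ** 2 * yhat_pow


# Usage: one bin, far-end single talk (S = V = 0), residual R = F * Y with F = 0.3.
Y = complex(1.2, -1.6)
R = 0.3 * Y
Y_hat = Y - R
Z, E = Y, R
f_est = real_mapping(abs(Z) ** 2 - abs(E) ** 2, abs(Y_hat) ** 2)
r_pow = residual_echo_power(abs(Z) ** 2, abs(E) ** 2, abs(Y_hat) ** 2)

# Leaky averaging of a steady power value converges toward that value.
p = 0.0
for _ in range(100):
    p = leaky_average(p, 4.0)
```

In this noise-free toy bin, the non-trivial root recovers F = 0.3 and |R|² exactly. With near-end speech present, the independence assumption no longer matches instantaneous values, which is the weakness the text goes on to remove.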
The numerically well-behaved formulation shown in (16), below, was also used to attain this result.

Removal of Assumption of Statistical Independence
Removing the statistical independence assumption turns equation (10) into an expression that contains a cross-term. When combined with (13), this becomes two equations, one for the real part and one for the imaginary part. The cross-term depends on two unknowns: the near-end speech and noise spectral components. The uncertainty in the calculation of the mapping function has thereby been isolated to the cross-term estimate. Several choices were investigated for approximating the cross-term. Distortion of the near end was measured as the Itakura-Saito distortion between ŝn and sn. Far-end suppression was measured using the echo return loss enhancement (ERLE). Including a cross-term was found to measurably improve residual echo estimation during double-talk. However, the near end was not free of distortion, and some residual echo was still audible. Results (discussed below) are shown in Table 1.

Avoiding Numerical Sensitivity
Equation (7) is sensitive to values of Fn(k) near one. The alternative formulation above also suffers from this problem. In one embodiment, this situation is avoided by using the formulation of equation (16) and setting Rn(k) to zero when Fn(k) is close to one. This substantially improved the experimental results. With this modification, the residual echo estimate as calculated in (1) improved from approximately 2 dB to approximately 11.5 dB SNR (noise-free case with no near-talker).

ISD - Near: 1.219 | 0.124 | 0.456 | 0.156 | 0.220
ERLE - Far: 53.6 dB | 46.9 dB | -22.7 dB | -1.9 dB | 43.0 dB
Table 1. Sample results for simulated near, far, and double talk scenarios with and without noise for various cross-terms.

Table 1 shows experimental results for simulations with and without near-end noise. Speech distortion during near-end single-talk was measured as the Itakura-Saito distortion (ISD) between ŝn and sn. These results are shown in rows labeled "ISD - Near."
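The following minimal sketch shows the Itakura-Saito distortion used for the "ISD" rows. It assumes the common spectral form averaged over DFT bins: for reference power P(k) and estimated power Q(k), d = mean over k of P/Q - ln(P/Q) - 1. The text does not say whether the ISD was computed from DFT power spectra or from LPC models, so that choice is an assumption. The clamping floor and the example spectra are also assumptions.

```python
import math


def itakura_saito(ref_power, est_power, floor=1e-12):
    """Mean Itakura-Saito divergence between a reference and an estimated power spectrum.

    Per bin: P/Q - ln(P/Q) - 1, which is zero when the spectra match and positive otherwise.
    Values below `floor` are clamped to avoid division by zero and log(0).
    """
    if not ref_power or len(ref_power) != len(est_power):
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(ref_power, est_power):
        ratio = max(p, floor) / max(q, floor)
        total += ratio - math.log(ratio) - 1.0
    return total / len(ref_power)


clean = [1.0, 0.5, 0.25, 0.125]      # hypothetical |S(k)|^2 of clean near-end speech
processed = [0.9, 0.6, 0.2, 0.125]   # hypothetical |Shat(k)|^2 after suppression
d_same = itakura_saito(clean, clean)       # identical spectra give 0.0
d_proc = itakura_saito(clean, processed)   # any spectral mismatch gives a positive value
```

The measure is asymmetric in its two arguments, so the order (clean reference first, processed estimate second) should be kept fixed when comparing cross-term choices.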
The clean signal sn was available because a room simulator was used to generate the microphone input zn. The Itakura-Saito distortion was also used to measure double-talk performance. These results are shown in rows labeled "ISD - DT." An average ERLE was used to measure performance during far-end single-talk. These results are shown in rows labeled "ERLE - Far." Instead of normalizing by the microphone input, the ERLE was normalized by the near-end signal sn. In both noisy and noise-free cases, the residual echo was well suppressed (-23 dB ERLE) during far-end single-talk when the cross-term was set to zero. The cross-terms appear to be important when the near end is active. In these cases, better performance was obtained using a guess at the cross-term. During double-talk, substituting a guess based on the microphone input and the previous output for the cross-term provided very good results (10-14x improvement in ISD). During near-end single-talk, using the microphone input in place of the near-end speech and noise in the cross-term gave very good results when background noise was present (8x improvement in ISD). In both of these cases, residual echo suppression was mild and distortion of the far end was audible. The results shown above in table form are now given as separate equations for each of the near-end single-talk, far-end single-talk, and double-talk situations. For near-end single-talk, the real and imaginary parts of the residual echo are estimated using the following two equations:

A residual echo estimator capable of calculating the cross-terms as shown above is shown in, and described with reference to, the remaining figures.

Figure 2 shows a block diagram of a first portion of a residual echo estimator. Circuit 200 generates the real parts of the spectral components of ŷn, en, and zn. Signals ŷn, en, and zn are shown on nodes 114, 122, and 126, respectively, and are input to window functions 202, 204, and 206, respectively. The index "n" is the block index. Any suitable windowing function can be applied, including rectangular, Hamming, or Hann windows.
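The per-block front end of circuit 200 can be sketched as follows: window a block, transform it, and split each bin into its real and imaginary parts and their squares. For the real path these correspond to the values on nodes 220 to 232; the imaginary parts feed the companion circuit. The naive DFT stands in for FFT blocks 208, 210, and 212 purely to keep the example dependency-free. The helper names are hypothetical. The periodic Hann window is one of the window choices the text lists.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window of length n_points."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_points) for i in range(n_points)]


def dft(frame):
    """Naive O(N^2) DFT; a real implementation would use an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectral_parts(block, window):
    """Window one block and return real parts, imaginary parts, and their squares."""
    spectrum = dft([w * s for w, s in zip(window, block)])
    real = [c.real for c in spectrum]
    imag = [c.imag for c in spectrum]
    return real, imag, [r * r for r in real], [i * i for i in imag]


# Usage: a rectangular window (all ones) and a cosine/sine that fall exactly in bin 2.
N = 16
rect = [1.0] * N
cosine = [math.cos(2.0 * math.pi * 2 * t / N) for t in range(N)]
sine = [math.sin(2.0 * math.pi * 2 * t / N) for t in range(N)]
re_c, im_c, re_c_sq, _ = spectral_parts(cosine, rect)
re_s, im_s, _, im_s_sq = spectral_parts(sine, rect)
```

A cosine lands entirely in the real part (N/2 in bin 2) and a sine entirely in the imaginary part (-N/2 in bin 2). This is why the real-part and imaginary-part circuits can carry different information about the same bin.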
In some embodiments, window functions 202, 204, and 206 are omitted. In some embodiments, AEC and NR are performed in the time domain. In these embodiments, Fast Fourier Transform (FFT) blocks 208, 210, and 212 are used to generate frequency-domain representations of the signals ŷn, en, and zn for use in the residual echo estimator. In other embodiments, AEC and NR are performed in the frequency domain, and window blocks 202, 204, and 206, as well as FFT blocks 208, 210, and 212, are not present. FFT block 208 generates the real part of the DFT of ŷn on node 220 and provides it to squaring function 214, which generates its square on node 222. FFT block 210 generates the real part of the DFT of en, and squaring function 216 generates its square on node 224. FFT block 212 generates the real part of the DFT of zn on node 232 and provides it to squaring function 218, which generates its square on node 230. Summer 226 receives the squared values on nodes 222, 224, and 230, respectively, and produces their combination on node 228. Circuit 200 generates the real parts of these spectral components; a complete residual echo estimator includes another circuit corresponding to circuit 200 that generates the imaginary parts of the same spectral components.

Figure 3 shows a block diagram of a second portion of a residual echo estimator. Circuit 300 receives signals on nodes 232, 220, 228, and 222, which correspond to like-numbered nodes in Figure 2. Circuit 300 also receives a signal on node 302, which corresponds to the spectral components of a previous output from a noise reduction circuit, such as noise reduction circuit 120 (Figure 1). Summer 304 and multipliers 303 and 305 produce the cross-term used when a double-talk event is present. Multiplier 308 produces the cross-term used when there is near-end speech only. Multipliers 306 and 312 switch between these cross-terms according to which speech activity is present. In some embodiments (soft decision embodiments), the cross-terms on nodes 310 and 314 are produced with varying amplitudes and are summed by summer 316.
As a result, the numerator produced on node 318 includes components of both types of cross-terms. In soft decision embodiments, multipliers 306 and 312 produce variable output values that are a function of both the spectral components and α and β, and the outputs are summed by summer 316. In other embodiments (hard decision embodiments), α and β take on values of "1" or "0," and multipliers 306 and 312 act as switches that either pass the input through or do not. For example, when α is "1," multiplier 306 acts as a switch and passes the double-talk cross-term to node 310. In this case, multiplier 306 acts as a closed switch and multiplier 312 acts as an open switch. Also for example, when β is "1," multiplier 312 passes the near-end-only cross-term to node 314. In this case, multiplier 312 acts as a closed switch and multiplier 306 acts as an open switch. The generation of α and β is discussed with reference to Figure 4. Divider 320 divides the numerator on node 318 by the squared value on node 222. The result, on node 322, represents the estimate of the square of the real part of the spectral component of the residual echo. Summers 304 and 316 and multipliers 306, 308, and 312 form a cross-term calculator that calculates the various cross-terms shown above in Table 1. One skilled in the art will understand that other embodiments can be used for cross-term calculators without departing from the scope of the present invention. For example, all or part of circuit 300 can be implemented in software. The software can be executed on a general purpose computer, a digital signal processor (DSP), or another processor. Also for example, all or part of circuit 300 can be implemented in special purpose hardware, such as an application specific integrated circuit (ASIC).

Figure 4 shows a block diagram of speech activity detectors in a speakerphone context. Double-talk detector 402 receives the far-end audio on node 104 and also receives the near-end audio on node 126 captured by microphone 128.
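The cross-term switching of Figure 3, together with the α and β generation of Figure 4 described in the surrounding text, can be sketched per DFT bin for the real path. The formula here is our own, derived from the signal model zn = sn + vn + yn, en = zn - ŷn, rn = yn - ŷn and the linearity of the DFT. It is not claimed to be the patent's equation, which is not reproduced in this text. For real parts of one bin, that model gives the exact identity Zr² - Er² - Ŷr² = 2(Rr + Sr + Vr)Ŷr. The unknown Sr + Vr is the cross-term uncertainty. Following the claims, it is approximated as follows:

- by 0 when only the far end is active;
- by Zr during near-end-only speech;
- by the average of Zr and the previous output during double-talk.

The weights are α and β = (1 - α) × (speech activity). Function names and the `eps` guard are hypothetical. The imaginary path is identical with imaginary parts substituted throughout.

```python
def activity_weights(double_talk, near_speech):
    """alpha from the double-talk detector; beta = (1 - alpha) * near-end speech activity.

    With 0/1 inputs this behaves like the AND gate of Figure 4; with values in [0, 1]
    it is the soft-decision multiplier form.
    """
    alpha = float(double_talk)
    beta = (1.0 - alpha) * float(near_speech)
    return alpha, beta


def near_end_guess(z_re, prev_out_re, alpha, beta):
    """Approximation of S_r + V_r: average of Z_r and the previous output during double talk,
    Z_r during near-end-only speech, and 0 otherwise."""
    return alpha * 0.5 * (z_re + prev_out_re) + beta * z_re


def residual_real_squared(z_re, e_re, yhat_re, prev_out_re, alpha, beta, eps=1e-9):
    """Estimate R_r(k)^2 using Z_r^2 - E_r^2 - Yhat_r^2 = 2 (R_r + S_r + V_r) Yhat_r."""
    yhat_sq = yhat_re * yhat_re                          # analogous to node 222
    if yhat_sq < eps:
        return 0.0                                       # ill-conditioned bin: no estimate
    base = 0.5 * (z_re * z_re - e_re * e_re - yhat_sq)   # equals (R_r + S_r + V_r) * Yhat_r
    cross = near_end_guess(z_re, prev_out_re, alpha, beta) * yhat_re
    return (base - cross) ** 2 / yhat_sq                 # divide by the squared Yhat_r


# Far-end single talk: s = v = 0, echo y_r = 0.8, estimate yhat_r = 0.7, residual 0.1.
a, b = activity_weights(0, 0)
far_only = residual_real_squared(z_re=0.8, e_re=0.1, yhat_re=0.7, prev_out_re=0.0, alpha=a, beta=b)

# Near-end single talk: s_r + v_r = 0.55, no echo (y_r = 0), stale estimate yhat_r = 0.02.
a, b = activity_weights(0, 1)
near_only = residual_real_squared(z_re=0.55, e_re=0.53, yhat_re=0.02, prev_out_re=0.0, alpha=a, beta=b)
```

Algebraically, this estimate reduces to (Er - guess)². The squares-based form is kept only to mirror the signal flow from node 222 through node 322. In both single-talk demos, the guess equals the true Sr + Vr, so the residual power is recovered exactly (0.01 and 0.0004). During double-talk, the estimate is only as good as the previous-output approximation.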
In hard decision embodiments, double-talk detector 402 sets α equal to one when double-talk is active and sets α equal to zero otherwise. Speech activity detector 404 receives the near-end audio on node 126 and determines whether speech signals are present at the microphone. AND gate 406 receives the output of speech activity detector 404 and the logical inverse of α on node 408. Gate 406 generates variable β on node 410. In hard decision embodiments, β is set to one if there is no double-talk but there is near-end speech activity; otherwise, β is set to zero. In soft decision embodiments, α and β take on real values between zero and one, inclusive. For example, in one embodiment, α takes on a value between zero and one, and β is generated by a multiplier rather than gate 406. In such an embodiment, the multiplier generates β as β = (1 - α) × (speech activity detector output). It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. WE CLAIM: 1.
An acoustic echo cancellation system comprising: an adaptive filter to create an echo estimate and an error term that comprises a residual echo; a residual echo estimator to independently calculate estimates of a real portion and an imaginary portion of a discrete Fourier transform (DFT) of the residual echo, wherein the residual echo estimator has: a first input node to receive a composite signal comprising near-end speech, noise, and an echo of far-end speech; a second input node to receive the echo estimate; and a third input node to receive a difference between the echo estimate and the composite signal; and wherein the residual echo estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the residual echo using an average of the composite signal and a previous output sample as an estimate of the near-end speech and the noise when both the near-end speech and the far-end speech are active. 2. The acoustic echo cancellation system as claimed in claim 1, wherein the residual echo estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the residual echo using the composite signal as an estimate of the near-end speech and the noise when the near-end speech is active and far-end speech is quiet. 3. The acoustic echo cancellation system as claimed in claim 1, wherein there is provided a noise reduction circuit responsive to the real and imaginary portions of the DFT of the residual echo and the difference between the echo estimate and the composite signal. 4.
An acoustic echo cancellation system, comprising: an adaptive filter to create an echo estimate and an error term that has a residual echo; a residual echo estimator to independently calculate estimates of a real portion and an imaginary portion of a discrete Fourier transform (DFT) of the residual echo, wherein the residual echo estimator has: a first input node to receive a composite signal comprising near-end speech, noise, and an echo of far-end speech; a second input node to receive the echo estimate; and a third input node to receive a difference between the echo estimate and the composite signal; and wherein the residual echo estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the residual echo using a value of substantially zero as an estimate of the near-end speech and the noise when the near-end speech is quiet and the far-end speech is active. 5. The acoustic echo cancellation system as claimed in claim 4, wherein the residual echo estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the residual echo using the composite signal as an estimate of the near-end speech and the noise when the near-end speech is active and far-end speech is quiet. 6. The acoustic echo cancellation system as claimed in claim 4, wherein there is provided a noise reduction circuit responsive to the real and imaginary portions of the DFT of the residual echo and the difference between the echo estimate and the composite signal. 7.
A speakerphone comprising: an output port to be coupled to a speaker to project far-end speech into an acoustic environment; an input port to be coupled to a microphone to receive from the acoustic environment a composite signal comprising near-end speech, noise, and an echo of the far-end speech; an adaptive filter to produce an estimate of the echo of the far-end speech; and a residual echo estimator to independently estimate real and imaginary parts of the DFT of the difference between the echo of the far-end speech and the estimate of the echo of the far-end speech, wherein the residual echo estimator has a cross-term calculator to calculate an estimate of a cross-term resulting from a lack of an assumption that the near-end speech, the noise, and the echo of the far-end speech are statistically independent and to estimate a sum of the near-end speech and the noise differently based on whether the far-end speech or the near-end speech have energy above a threshold. 8. The speakerphone as claimed in claim 7, wherein the cross-term calculator is adapted / configured to estimate the sum as the composite signal when a near-end speech activity detector indicates that the near-end speech has energy above the threshold and a double-talk detector indicates the far-end speech has energy below a threshold. 9.
A speakerphone comprising: an output port to be coupled to a speaker to project far-end speech into an acoustic environment; an input port to be coupled to a microphone to receive from the acoustic environment a composite signal comprising near-end speech, noise, and an echo of the far-end speech; an adaptive filter to produce an estimate of the echo of the far-end speech; a residual echo estimator to independently estimate real and imaginary parts of the DFT of the difference between the echo of the far-end speech and the estimate of the echo of the far-end speech, wherein the residual echo estimator has a cross-term calculator to calculate an estimate of a cross-term resulting from a lack of an assumption that the near-end speech, the noise, and the echo of the far-end speech are statistically independent; a noise reduction circuit responsive to the real and imaginary parts of the residual echo to produce estimates of the near-end speech; and a circuit to estimate a sum of the near-end speech and the noise, the circuit being responsive to a double-talk indicator, and being adapted / configured to estimate the sum as an average of the composite signal and a previous estimate of the near-end speech. 10. The speakerphone as claimed in claim 7, which is implemented in a computer. 11. A method of estimating residual echo in an acoustic echo cancellation system, comprising separately estimating real and imaginary parts of the DFT of the residual echo, by the steps of: estimating the real part of the DFT of the residual echo from a real part of a DFT of a near-end composite signal, a real part of a DFT of an adaptive filter output signal, a real part of a DFT of an error signal generated by a difference between the near-end composite signal and the adaptive filter output signal, and an approximation of a sum of a real part of a DFT of a near-end speech signal and a real part of a DFT of a near-end noise signal. 12.
The method as claimed in claim 11, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation: wherein: is the real part of the DFT of the residual echo; is the real part of the DFT of the error signal; is the real part of the DFT of the near-end composite signal; is the real part of the DFT of the adaptive filter output signal; and represents the approximation of the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal. 13. The method as claimed in claim 12, which involves: using the real part of the DFT of the near-end composite signal to approximate the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal when a near-talker is quiet. 14. The method as claimed in claim 13, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation: wherein: and fm(k) are as defined in claim 12. 15. The method as claimed in claim 12, which involves: using a value of substantially zero to approximate the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal only when a near-end talker is quiet. 16. The method as claimed in claim 15, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation: wherein: and are as defined in claim 12. 17. The method as claimed in claim 11, which involves: using a sum of the real part of the DFT of the near-end composite signal and a DFT of a previous output from the acoustic echo cancellation system to approximate the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal when a double-talk event is present. 18.
The method as claimed in claim 17, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation: wherein: and are as defined in claim 12, and represents the real part of the DFT of the previous output from the acoustic echo cancellation system. 19. The method as claimed in claim 11, wherein the step of separately estimating involves: estimating the imaginary part of the DFT of the residual echo using an equation that consists a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent. 20. A method of estimating residual echo in an acoustic echo cancellation system, comprising separately estimating real and imaginary parts of the DFT of the residual echo, wherein the step of separately estimating involves estimating the imaginary part of the DFT of the residual echo using an equation that consists a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent, wherein the cross-term comprises a product of an imaginary part of a DFT of an echo estimate and a sum of an imaginary part of a DFT of a near-end speech signal and an imaginary part of a DFT of a near-end noise signal, and wherein estimating the imaginary part of the DFT of the residual echo involves: substituting a different product for the cross-term as a function of whether the near-end speech signal contains sufficient energy and whether the far-end speech signal contains sufficient energy. 21.
A method of estimating residual echo in an acoustic echo cancellation system, comprising separately estimating real and imaginary parts of the DFT of the residual echo, wherein the step of separately estimating involves estimating the imaginary part of the DFT of the residual echo using an equation that consists a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent, wherein the square of the imaginary part of the DFT of the residual echo is estimated according to an equation wherein: is the imaginary part of the DFT of the residual echo; is the imaginary part of the DFT of the error signal; is the imaginary part of the DFT of the near-end composite signal; is the imaginary part of the DFT of the adaptive filter output signal; and represents the approximation of the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal. 22. The method as claimed in claim 21, which involves : using the imaginary part of the DFT of the near-end composite signal to approximate the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal when a near-talker is quiet. 23. The method as claimed in claim 22, wherein the square of the imaginary part of the DFT of -22- the residual echo is estimated according to an equation : wherein ; are as defined in claim 21. 24. The method as claimed in claim 21, which involves : using a value of substantially zero to approximate the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal only when a near-end talker is quiet, 25. The method of claim 24, wherein the square of the imaginary part of the DFT of the residual echo is estimated according to an equation: wherein : are as defined in claim 21. 26. 
A method of estimating residual echo in an acoustic echo cancellation system, comprising : separately estimating real and imaginary parts of the DFT of the residual echo, wherein the step of separately estimating involves estimating the imaginary part of the DFT of the residual echo using an equation that consists a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent, and using a sum of the imaginary part of the DFT of the near-end composite signal and a DFT of a previous output from the acoustic echo cancellation system to approximate the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal when a double-talk event is present. 27. The method as claimed in claim 26, wherein the square of the imaginary part of the DFT of the residual echo is estimated according to an equation: -23- wherein: are as defined in claim 21, and Si(n-1) (k)represents the imaginary part of the DFT of the previous output from the acoustic echo cancellation system,. 28. An acoustic echo cancellation system, substantially as herein described, particularly with reference to the accompanying drawings. 29. A speakerphone, substantially as herein described, particularly with reference to the accompanying drawings. 30. A method of estimating residual echo in an acoustic echo cancellation system, substantially as herein described, particularly with reference to the accompanying drawings. -24- An acoustic echo cancellation system comprising : an adaptive filter to create an echo estimate and an error term that includes a residual echo ; and a residual echo estimator to independently calculate estimates of a real portion and an imaginary portion of a discrete Fourier transform (DFT) of the residual echo.

Full Text

RESIDUAL ECHO ESTIMATION FOR ECHO CANCELLATION
Field
The present invention relates generally to echo cancellation systems, and more specifically to systems for reducing residual echo in echo cancellation systems.
Background of the Invention
In speakerphone applications, the "near-talker" is the person using the speakerphone, and the "far-talker" is the person on the far end of the telephone line. The far-talker's speech is broadcast (played through a speaker) into the room (or other acoustic enclosure) that houses the speakerphone and the near-talker. An echo is produced by the far-talker's speech propagating through the room and subsequently being received by the microphone. Acoustic echo cancelers (AECs) are used to cancel the echo received at the microphone. The acoustic echo canceler is typically an adaptive filter that models the various echo paths in the room. Extreme cases exist where the acoustic echo canceler briefly fails to cancel the echo, such as prolonged double-talk or rapid near-talker movement. In these cases, a far-talker might hear a short burst of his/her own voice as an echo.
Residual echo suppression techniques attempt to remove the echo that remains after acoustic echo cancellation, thereby preventing the far-talker from hearing such bursts while still allowing the near-talker's voice to pass through undisturbed. Residual echo suppression techniques have been the subject of research, and papers have been published describing the work. Examples include: S. Gustafsson et al., "Combined Acoustic Echo Control and Noise Reduction for Hands-Free Telephony," Signal Processing, Vol. 64, pp. 21-32, 1998, hereinafter referred to as "Gustafsson"; and V. Turbin et al., "Using Psychoacoustic Criteria in Acoustic Echo Cancellation Algorithms," Proc. IWAENC'97, London, pp. 53-56, Sept. 1997, hereinafter referred to as "Turbin." Despite ongoing research, current residual echo suppression techniques are not completely effective in removing residual echo.
Brief Description of the Accompanying Drawings
Figure 1 shows an acoustic echo cancellation system with residual echo estimation;
Figure 2 shows a first portion of a cross-term calculator;
Figure 3 shows a second portion of a cross-term calculator; and
Figure 4 shows speech activity detectors in a speakerphone context.
Description of Embodiments
In the following detailed description of the embodiments, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized, and structural, logical, and electrical changes may be made without departing from the scope of the present invention. Moreover, it is to be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described in one embodiment may be included within other embodiments. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims, along with the full scope of equivalents to which such claims are entitled.
The method and apparatus of the present invention provide a mechanism to estimate residual echo in an acoustic echo cancellation system. An adaptive filter produces an estimate of the echo and subtracts the estimate of the echo from the microphone input signal. When the echo and the estimate of the echo do not closely match, an undesirable residual echo remains in the signal. The method and apparatus of the present invention provide a residual echo estimator that separately estimates the real and imaginary components of the discrete Fourier transform (DFT) of the residual echo without assuming statistical independence between the various components of the microphone input signal. The residual echo estimator calculates the real and imaginary components using an approximation of the microphone input signal without the echo. A different approximation is used depending on whether the near-talker and/or the far-talker are talking.
Figure 1 shows a speakerphone with echo cancellation and residual echo estimation. Speakerphone 102 includes adaptive filter 112, residual echo estimator (REE) 116, summer 124, noise reduction (NR) circuit 120, speaker 106, and microphone 128. Speakerphone 102 can include many other circuits such as processors, amplifiers, and the like. Many of these other circuits are omitted from Figure 1 to more clearly show novel portions of the present invention. Speakerphone 102 includes speaker 106 that plays speech from the far-talker. Far-talker speech is received from the far end on node 104 and is played through speaker 106. Speech data on node 104 is also delayed by delay circuit 108 prior to being operated upon by adaptive filter 112.
In some embodiments, speech data on node 104 is digital data. In these embodiments, a digital-to-analog (D/A) converter converts digital data to an analog waveform for playback by speaker 106. Also in these embodiments, delay circuit 108 can be a digital delay element such as a register file or shift register. In other embodiments, data on node 104 is analog data. In these embodiments, delay circuit 108 can be an analog delay element, such as a lumped or distributed analog delay line. Delay circuit 108 can be a fixed delay or a variable delay. In general, the longer the echo path, the longer the delay in delay circuit 108. For example, when speakerphone 102 is used in a car, the echo path is short and the delay of delay circuit 108 is short. Also for example, when speakerphone 102 is used in a large conference room, the echo path is longer, and the delay of delay circuit 108 is correspondingly longer.
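In the digital embodiments, delay circuit 108 behaves like a fixed-length shift register. The following is a minimal sketch, assuming sample-by-sample processing and an integer delay measured in samples; the class name `DelayLine` and its parameter `delay_samples` are hypothetical and serve only as illustration.

```python
from collections import deque


class DelayLine:
    """Fixed integer delay, modeled as a shift register of past samples."""

    def __init__(self, delay_samples: int):
        if delay_samples < 0:
            raise ValueError("delay must be non-negative")
        # Pre-fill with zeros so the first outputs are silence.
        self._buf = deque([0.0] * delay_samples)

    def process(self, sample: float) -> float:
        """Push one input sample and return the sample from delay_samples ago."""
        self._buf.append(sample)
        return self._buf.popleft()


d = DelayLine(2)
print([d.process(x) for x in [1.0, 2.0, 3.0, 4.0]])  # [0.0, 0.0, 1.0, 2.0]
```

A longer echo path, such as the conference-room case above, would call for a larger `delay_samples`.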
In operation, speaker 106 plays the far-end speech into an acoustic environment. An echo signal $y_n$ from the acoustic enclosure is received at microphone 128. The effect of the acoustic enclosure on the echo is represented by block 130, which contains transfer function H1. That is, speech played from speaker 106 is operated upon by transfer function H1 and is then received at microphone 128. In addition to echo signal $y_n$, microphone 128 also receives near-end speech signal $s_n$ and near-end noise signal $v_n$. Near-end speech signal $s_n$ represents speech from near-talker 136 as modeled by transfer function H2 shown in block 132, and noise signal $v_n$ represents noise 138 as modeled by the transfer function shown in block 134. Microphone 128, therefore, receives the sum of $y_n$, $s_n$, and $v_n$.
This is shown at node 126 as $z_n$.
$y_n$ and $v_n$, at least to some degree, represent undesirable components of $z_n$. For example, $y_n$ is the echo caused by the coupling of speaker 106 and microphone 128. When a large component of $y_n$ does not get canceled and becomes part of the signal sent back to the far end, the far-talker can hear his/her own voice after a delay, which can cause confusion during a phone call. Also for example, if $v_n$ is unduly large, then the far-talker hears a large amount of noise, which can interrupt a phone conversation. In contrast, the near-end speech signal $s_n$ is, in general, a desirable component of $z_n$. When the near-end speech is faithfully transmitted to the far end, effective communication can take place.
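One way to simulate the microphone signal is to filter each source through its own impulse response and sum the results. The sketch below assumes finite impulse responses `h1`, `h2`, and `h3` stand in for H1 (block 130), H2 (block 132), and the transfer function of block 134. The function name and any numeric responses passed in are placeholders, not measured room responses.

```python
import numpy as np


def microphone_signal(x, speech, noise, h1, h2, h3):
    """Return (z, y, s, v): the composite mic signal and its three components.

    y: echo of far-end signal x through h1 (models block 130)
    s: near-end speech through h2 (models block 132)
    v: near-end noise through h3 (models block 134)
    All outputs are truncated to len(x) samples.
    """
    n = len(x)
    y = np.convolve(x, h1)[:n]
    s = np.convolve(speech, h2)[:n]
    v = np.convolve(noise, h3)[:n]
    return y + s + v, y, s, v   # z_n = y_n + s_n + v_n at node 126
```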
Adaptive filter 112 approximates the transfer function of the echo with an estimate of H1 and generates an estimate $\hat{y}_n$ of the echo signal from the delayed far-end speech signal $x_n$. Summer 124 subtracts $\hat{y}_n$ from $z_n$ to create $e_n$ on node 122. Adaptive filter 112 can be any filter suitable for use in an adaptive echo cancellation system. Examples include, but are not limited to: S. Gay, S. Tavathia, "The fast affine projection algorithm," Proc. IEEE ICASSP, Detroit, USA, 1995, pp. 3023-3026; and G. Glentis, K. Berberidis, S. Theodoridis, "A unified view: Efficient least squares adaptive algorithms for FIR transversal filtering," IEEE Signal Processing Magazine, vol. 16, pp. 13-41, July 1999.
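The cited fast affine projection and fast least-squares algorithms are too long for a short example. The sketch below uses the simpler normalized LMS (NLMS) algorithm as a stand-in. NLMS is a different technique from the ones cited, but it shows how adaptive filter 112 and summer 124 interact: the filter forms $\hat{y}_n$ from the delayed far-end signal $x_n$, and the error $e_n = z_n - \hat{y}_n$ drives adaptation. The tap count, step size `mu`, and regularizer `eps` are assumptions.

```python
import numpy as np


def nlms_echo_canceller(x, z, num_taps=64, mu=0.5, eps=1e-6):
    """Normalized LMS echo canceller (a stand-in for the cited algorithms).

    x: far-end reference after delay circuit 108 (len(x) >= len(z))
    z: microphone signal
    Returns (y_hat, e): echo estimate and error signal e = z - y_hat.
    """
    w = np.zeros(num_taps)            # filter taps (echo-path estimate)
    buf = np.zeros(num_taps)          # recent reference samples, newest first
    y_hat = np.zeros(len(z))
    e = np.zeros(len(z))
    for n in range(len(z)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y_hat[n] = w @ buf            # echo estimate (adaptive filter 112)
        e[n] = z[n] - y_hat[n]        # subtraction (summer 124)
        w += mu * e[n] * buf / (buf @ buf + eps)   # NLMS tap update
    return y_hat, e
```

In a noiseless, echo-only test in which the true echo path is shorter than `num_taps`, the error converges toward zero. With near-end speech present, a practical canceller would also need double-talk control, which this sketch omits.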
The signals $\hat{y}_n$, $z_n$, and $e_n$ are input to residual echo estimator 116 to generate an estimate $r_n$ of the residual echo. The residual echo estimate and $e_n$ are input to noise reduction (NR) circuit 120 to reduce the residual echo in $e_n$. Ideally, $\hat{y}_n$ and $y_n$ would match exactly, and $e_n$ would contain only the near-end speech signal $s_n$ and perhaps some background noise $v_n$. In practice, this is impossible, and $e_n$ always contains some amount of residual echo. When the far-talker is silent ($x_n = 0$), the input to the noise reduction algorithm is simply speech plus background noise. When the far-talker is active, the "noise" (that is, the undesirable components) includes the background noise $v_n$ along with residual echo. Noise reduction circuit 120 uses a noise spectral estimator to track the near-stationary background noise. The quasi-stationary residual echo is estimated separately by the residual echo estimator (REE). NR is then applied using a composite "noise" estimate.
In some embodiments, noise reduction circuit 120 leaves a small amount of near-stationary background noise to mask residual echo. This can be accomplished using algorithms discussed in Turbin (see background, above).
Any of a large class of speech spectral estimators could be used to modify the noise power spectral estimate to include residual echo. Examples of suitable noise reduction algorithms and circuits can be found in: S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust. Speech Signal Process., ASSP-27 (April 1979) 113-120; and Y. Ephraim, D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process. 32(6) (December 1984) 1109-1121.
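Power spectral subtraction, in the spirit of Boll, is one simple way to apply NR with a composite "noise" estimate: add the background-noise power estimate and the residual echo power estimate bin by bin. The sketch below assumes both per-bin power estimates are already available. The spectral floor `floor` is a hypothetical parameter that leaves some signal in every bin. It is a crude stand-in for the masking idea mentioned above, not Turbin's psychoacoustic method.

```python
import numpy as np


def suppress(E, noise_psd, residual_echo_psd, floor=0.1):
    """Apply a spectral-subtraction gain to one frame of the error spectrum.

    E: complex DFT of the AEC error e_n for one frame
    noise_psd: background-noise power estimate per bin
    residual_echo_psd: residual-echo power estimate per bin
    floor: minimum gain, so some background noise remains
    """
    power = np.abs(E) ** 2
    composite = noise_psd + residual_echo_psd        # composite "noise" estimate
    gain = 1.0 - composite / np.maximum(power, 1e-12)
    gain = np.clip(gain, floor, 1.0)                 # keep gains in [floor, 1]
    return gain * E


out = suppress(np.array([2.0 + 0j, 1.0 + 0j, 1.0j]),
               np.array([1.0, 0.5, 0.0]), np.array([1.0, 0.6, 0.0]))
# Bin gains are 0.5, 0.1 (floored), and 1.0.
```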

Residual Echo Estimation
Prior work by Gustafsson (see background, above) assumes that the a priori residual echo statistics may be estimated via a real-valued deterministic mapping function $F_n$ applied to the adaptive filter output.

The mapping function is

where capital letters represent the discrete Fourier transform (DFT) components of the corresponding variables represented by small letters. For example, $E_n(k)$ represents the DFT components of $e_n$.

Gustafsson omits details on the estimation of $E\{|Z_n(k)|^2\}$, $E\{|E_n(k)|^2\}$, and $E\{|Y_n(k)|^2\}$ from the work cited above, but these quantities are typically estimated using a "leaky average" of sample values. Experimental results, which used the Euclidean distance between the true and estimated residual echo power spectra to measure the success of the estimator, showed that instantaneous values give very good results. Also in Gustafsson, $F_n$ was divided into subbands, and each $F_n(k)$ was replaced by the average value in its subband, although the number and width of the subbands were not discussed. Where $E\{|Y_n(k)|^2\}$ was too small, $F_n(k)$ was not included in the average. Experimental results indicate that low estimation error is achieved when the subband size is one DFT bin.
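A "leaky average" is commonly implemented as first-order recursive (exponential) smoothing of instantaneous squared magnitudes. The sketch below shows that smoothing step. It also replaces each $F_n(k)$ with the average over its subband, skipping bins where the corresponding echo power term is too small. The smoothing constant `lam`, the threshold `min_power`, the equal-width subband layout, and both function names are assumptions.

```python
import numpy as np


def leaky_average(prev, current_power, lam=0.9):
    """One step of exponential smoothing: lam * prev + (1 - lam) * current."""
    return lam * prev + (1.0 - lam) * current_power


def subband_average(F, echo_power, band_size, min_power=1e-8):
    """Replace each F[k] by the mean of F over its subband.

    Bins whose echo power is below min_power are excluded from the mean.
    A subband with no usable bins keeps its original values.
    """
    out = np.asarray(F, dtype=float).copy()
    for start in range(0, len(out), band_size):
        sl = slice(start, start + band_size)
        usable = echo_power[sl] >= min_power
        if np.any(usable):
            out[sl] = out[sl][usable].mean()
    return out
```

With `band_size=1`, which the text reports gave low estimation error, the averaging step leaves every usable bin unchanged.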
Equations (1) and (2), above, make two assumptions: that $F_n$ is real-valued, and that $s_n$, $v_n$, and $y_n$ are statistically independent. Experimental results using equations (1) and (2) for residual echo estimation yielded signal-to-noise ratios of the estimated residual echo power spectrum of, at best, about 2.0 dB. The derivation of equation (2) is now discussed with the two assumptions in mind. The two assumptions are then removed, and the results are presented.
The SNR-like measurement was

The residual echo spectral components

are related to the echo $Y_n(k)$ by defining a real-valued transfer function $F_n$ such that

Assuming that the mapping function $F_n$ is real-valued, substituting (4) into (3) yields

which leads to

Using the relationship of (4) in (6) yields

Assuming that $s_n$, $v_n$, and $y_n$ are statistically independent and zero mean, the following relationships hold:

Subtracting (8b) from (8a) and substituting from (6) and (7) yields

Solving the quadratic equation for $F_n(k)$ yields two solutions, $F_n(k) = 1$ and the more meaningful solution:

Experimental results with $s_n = 0$ showed that the error in the deterministic mapping is due to the two assumptions in the derivation: that $F$ is real-valued, and that $s_n$, $v_n$, and $y_n$ are statistically independent. Experimental results showed that a more accurate residual echo estimate was achieved with a non-real-valued mapping function $F$ that had been derived without the assumption of statistical independence.
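Equations (3) through (10) are not reproduced here, so the following is only a reconstruction of how a quadratic with roots $F_n(k) = 1$ and a second, meaningful root can arise. It rests on several assumptions: that (4) reads $R_n(k) = F_n(k)Y_n(k)$; that $\hat{Y}_n(k) = Y_n(k) - R_n(k)$, since the residual echo is the echo minus its estimate; and the shorthand $P_Z = E\{|Z_n(k)|^2\}$, $P_E = E\{|E_n(k)|^2\}$, $P_{\hat{Y}} = E\{|\hat{Y}_n(k)|^2\}$. Independence and zero means are assumed, as in the text.

```latex
$$
\begin{aligned}
\hat{Y}_n(k) &= \bigl(1 - F_n(k)\bigr)\,Y_n(k),
\qquad
R_n(k) = \frac{F_n(k)}{1 - F_n(k)}\,\hat{Y}_n(k),\\[4pt]
P_Z - P_E &= E\{|Y_n(k)|^2\} - E\{|R_n(k)|^2\}
= \bigl(1 - F_n^2(k)\bigr)\,\frac{P_{\hat{Y}}}{\bigl(1 - F_n(k)\bigr)^2},\\[4pt]
0 &= \bigl(1 - F_n(k)\bigr)\Bigl[\bigl(1 - F_n(k)\bigr)(P_Z - P_E) - \bigl(1 + F_n(k)\bigr)P_{\hat{Y}}\Bigr],\\[4pt]
F_n(k) &= 1
\quad\text{or}\quad
F_n(k) = \frac{P_Z - P_E - P_{\hat{Y}}}{P_Z - P_E + P_{\hat{Y}}},\\[4pt]
E\{|R_n(k)|^2\} &= \left(\frac{F_n(k)}{1 - F_n(k)}\right)^{2} P_{\hat{Y}}
= \frac{\bigl(P_Z - P_E - P_{\hat{Y}}\bigr)^2}{4\,P_{\hat{Y}}}.
\end{aligned}
$$
```

Under these assumptions, the factor $1/(1 - F_n(k))$ explains why an expression for $R_n(k)$ in terms of $F_n(k)$ is sensitive to values of $F_n(k)$ near one.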
Removal of Assumption that F is Real-Valued
Letting $F_n(k) = F_{r,n}(k) + jF_{i,n}(k)$ and carrying this definition through (4) and (5), into (6) and (7), and finally into (9) leads to a family of solutions to the quadratic equation. When $F_{i,n}(k)$ is forced to zero, the solution of (10) is realized. When $F_{r,n}(k)$ is forced to zero, the solution is

Experimental results showed that this choice was not necessarily better than the one already proposed. Restructuring the problem, however, leads to an unambiguous mapping. Instead of (4), separate mapping functions can be used for the real and imaginary parts of $R_n(k)$.

This leads to separate solutions for the real and imaginary parts. Equation (10) is replaced by two solutions of the form

This approach leads to a significant improvement in the experimental results. The residual echo estimate improved to 45 dB SNR (noise-free case with no near-talker). The numerically well-behaved formulation shown in (16), below, was also used to attain this result.
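The sketch below applies separate real and imaginary mappings per bin, with instantaneous squared values in place of expectations and the cross-term ignored, as under the independence assumption. It assumes that each component's mapping has the form $R_{r,n}(k) = F_{r,n}(k)Y_{r,n}(k)$ (and likewise for the imaginary part) and that $\hat{Y} = Y - R$ per component. The function name is hypothetical, and the formulas are a reconstruction, not the patent's own equations.

```python
import numpy as np


def per_component_mapping(Z, E, Y_hat):
    """Estimate real and imaginary parts of the residual-echo DFT separately.

    Z, E, Y_hat: complex DFT frames of z_n, e_n and y_hat_n.
    The cross-term is ignored, and instantaneous squared values
    stand in for expectations.
    Returns (F_r, F_i, R): per-component mappings and the complex estimate.
    """
    def one_component(z, e, y):
        d = z ** 2 - e ** 2
        p = y ** 2
        F = (d - p) / (d + p)              # real-valued mapping for this component
        R = F / (1.0 - F) * y              # ill-conditioned as F -> 1 (y -> 0)
        return F, R

    F_r, R_r = one_component(Z.real, E.real, Y_hat.real)
    F_i, R_i = one_component(Z.imag, E.imag, Y_hat.imag)
    return F_r, F_i, R_r + 1j * R_i
```

When there is no near-end signal ($Z = Y$, $E = R$), the sketch recovers each component of $R$ exactly, provided no denominator vanishes.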
Removal of Assumption of Statistical Independence
Removing the statistical independence assumption turns equation (10) into the following:

When combined with (13), this becomes two equations of the form

The cross-term depends on two unknowns: the speech and noise spectral components. The uncertainty in the calculation of the mapping function has thus been isolated to the cross-term estimate. Several choices were investigated for approximating the cross-term. Distortion of the near end was measured as the Itakura-Saito distortion between $\hat{s}_n$ and $s_n$. Far-end suppression was measured using the echo return loss enhancement (ERLE). Including a cross-term was found to measurably improve residual echo estimation during double-talk. However, the near end was not free of distortion, and some residual echo was still audible. Results (discussed below) are shown in Table 1.
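Making the cross-term explicit for one component clarifies where it comes from. This is a sketch under stated assumptions. It uses instantaneous values and writes $N = S + V$ for the real part of the near-end speech plus noise. It also takes $Z = N + Y$, $E = N + R$, and $Y = \hat{Y} + R$, which follow from $z_n = s_n + v_n + y_n$ and $e_n = z_n - \hat{y}_n$. The real-part subscripts are dropped for brevity.

```latex
$$
\begin{aligned}
Z^2 - E^2 &= (N + \hat{Y} + R)^2 - (N + R)^2
= \hat{Y}^2 + 2\hat{Y}R + 2\hat{Y}N,\\[4pt]
R &= \frac{Z^2 - E^2 - \hat{Y}^2 - 2\hat{Y}N}{2\hat{Y}},
\qquad
R^2 = \frac{\bigl(Z^2 - E^2 - \hat{Y}^2 - 2\hat{Y}N\bigr)^2}{4\hat{Y}^2}.
\end{aligned}
$$
```

Here the only unknown is the product $\hat{Y}N$, the cross-term between the echo estimate and the near-end speech plus noise, which is consistent with the description above. The imaginary part follows the same pattern. Setting $N = 0$ reduces the expression to $R = E$. This reconstruction is for illustration only and may differ in form from the patent's own equations.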
Avoiding Numerical Sensitivity
Equation (7) is sensitive to values of $F_n(k)$ near one. The alternative formulation above also suffers from this problem. In one embodiment, this situation is avoided using

and setting $R_n(k)$ to zero when  is small. This modification substantially improved the experimental results: the residual echo estimate as calculated in (1) improved from approximately 2 dB to approximately 11.5 dB SNR (noise-free case with no near-talker).
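One way to avoid dividing by $1 - F_n(k)$ is to solve directly for the residual echo component. From $z_n = s_n + v_n + y_n$, $e_n = z_n - \hat{y}_n$, and $r_n = y_n - \hat{y}_n$, each real (or imaginary) DFT component satisfies $R = (Z^2 - E^2 - \hat{Y}^2 - 2\hat{Y}N)/(2\hat{Y})$, where $N$ is the near-end speech-plus-noise component. The sketch below computes this expression and returns zero wherever $|\hat{Y}|$ falls below a hypothetical threshold `eps`. It shows one possible guard under these assumptions and is not necessarily the patent's equation (16).

```python
import numpy as np


def residual_component(z, e, y_hat, n_approx, eps=1e-6):
    """Residual-echo estimate for one real-valued DFT component per bin.

    z, e, y_hat: real (or imaginary) parts of Z_n(k), E_n(k), Y_hat_n(k)
    n_approx: approximation of the near-end speech-plus-noise component
    Bins with |y_hat| < eps are set to zero instead of dividing by ~0.
    """
    z, e, y_hat, n_approx = (np.asarray(a, dtype=float) for a in (z, e, y_hat, n_approx))
    numerator = z ** 2 - e ** 2 - y_hat ** 2 - 2.0 * y_hat * n_approx
    safe = np.abs(y_hat) >= eps
    denom = np.where(safe, 2.0 * y_hat, 1.0)   # dummy denominator where unsafe
    return np.where(safe, numerator / denom, 0.0)
```

If `n_approx` is exact, the estimate equals the true residual component. In practice, $N$ must be approximated, which is what the cross-term choices below address.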

ISD - Near    1.219     0.124     0.456      0.156     0.220
ERLE - Far    53.6 dB   46.9 dB   -22.7 dB   -1.9 dB   43.0 dB
Table 1.
Sample results for simulated near, far, and double talk scenarios with and without noise for various cross-terms.
Table 1 shows experimental results for simulations with and without near-end noise. Speech distortion during near-end single-talk was measured as the Itakura-Saito distortion (ISD) between $\hat{s}_n$ and $s_n$. These results are shown in rows labeled "ISD - Near." The clean signal $s_n$ was available because a room simulator was used to generate the microphone input $z_n$. The Itakura-Saito distortion was also used to measure double-talk performance; these results are shown in rows labeled "ISD - DT." An average ERLE was used to measure performance during far-end single-talk; these results are shown in rows labeled "ERLE - Far." Instead of normalizing by the microphone input, the ERLE was normalized by the near-end signal $s_n$. In both noisy and noise-free cases, the residual echo was well suppressed (-23 dB ERLE) during far-end single-talk when the cross-term was set to zero. The cross-terms appear to be important when the near end is active; in these cases, better performance was obtained using a guess at the cross-term. During double-talk, substituting the double-talk approximation (based on the composite signal and the previous output) for the cross-term provided very good results
(10-14x improvement in ISD). During near-end single-talk, using the near-end approximation (based on the composite signal) in
place of the cross-term gave very good results when background noise was present (8x improvement in ISD). In both these cases, residual echo suppression was mild, and distortion of the far end was audible.
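The two performance measures can be sketched as follows. ERLE is computed here as the ratio, in dB, of a reference signal's power to the output's power. Per the text, Table 1 used the near-end signal rather than the microphone input as the reference, so the reference is a parameter. The distortion function is the spectral Itakura-Saito divergence between two power spectra. That choice is an assumption, because the text does not say whether its ISD was computed from LPC models or from spectra. Under this ERLE convention, larger values mean more suppression. The convention behind the values in Table 1 is not stated.

```python
import numpy as np


def erle_db(reference, output, eps=1e-12):
    """Echo return loss enhancement in dB: 10*log10(P_reference / P_output)."""
    p_ref = np.mean(np.asarray(reference, dtype=float) ** 2)
    p_out = np.mean(np.asarray(output, dtype=float) ** 2)
    return 10.0 * np.log10((p_ref + eps) / (p_out + eps))


def itakura_saito(p_clean, p_processed, eps=1e-12):
    """Mean spectral Itakura-Saito divergence between two power spectra."""
    ratio = (np.asarray(p_clean) + eps) / (np.asarray(p_processed) + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))
```

The divergence is zero for identical spectra and positive otherwise, so lower ISD values in Table 1 indicate less near-end distortion.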
The results shown above in table form are now expressed as separate equations for each of the near-end single-talk, far-end single-talk, and double-talk situations. For near-end single-talk, the real and imaginary parts of the residual echo are estimated by a system using the following two equations.

A residual echo estimator capable of calculating the cross-terms as shown above is shown in, and described with reference to, the remaining figures.
Figure 2 shows a block diagram of a first portion of a residual echo estimator. Circuit 200 generates the real parts of the spectral components of $\hat{y}_n$, $e_n$, and $z_n$. Signals $\hat{y}_n$, $e_n$, and $z_n$ are shown on nodes 114, 122, and 126, respectively, which are input to window functions 202, 204, and 206, respectively. The index "n" is the block index. Any suitable windowing function can be applied, including rectangular, Hamming, or Hann windows. In some embodiments, window functions 202, 204, and 206 are omitted.
In some embodiments, AEC and NR are performed in the time domain. In these embodiments, Fast Fourier Transform (FFT) blocks 208, 210, and 212 are used to generate frequency domain representations of the signals $\hat{y}_n$, $e_n$, and $z_n$ for use in the residual echo estimator. In other embodiments, AEC and NR are performed in the frequency domain, and window blocks 202, 204, and 206, as well as FFT blocks 208, 210, and 212, are not present.
FFT block 208 generates the real part of the DFT of $\hat{y}_n$ on node 220 and provides it to squaring function 214, which generates its square on node 222. FFT block 210 generates the real part of the DFT of $e_n$, and squaring function 216 generates its square on node 224. FFT block 212 generates the real part of the DFT of $z_n$ on node 232 and provides it to squaring function 218, which generates its square on node 230. Summer 226 receives the three squared terms on nodes 222, 224, and 230, respectively, and produces their combination on node 228.
Circuit 200 generates the real parts of the spectral components of $\hat{y}_n$, $e_n$, and $z_n$. A complete residual echo estimator includes another circuit corresponding to circuit 200 that generates the imaginary parts of the same spectral components.
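The following software sketch of circuit 200 processes one block under several assumptions: time-domain AEC (so the window and FFT stages are present), a Hann window, and NumPy's real FFT. The function name is hypothetical. The text does not spell out the combination that summer 226 forms, so the `combined` output, $Z_r^2 - E_r^2 - \hat{Y}_r^2$, is an assumption chosen to be consistent with the signal model $z_n = e_n + \hat{y}_n$. The imaginary-part circuit would be the same, with `.imag` in place of `.real`.

```python
import numpy as np


def circuit_200_real(y_hat_block, e_block, z_block):
    """Window, transform and square one block of each input (real parts).

    Returns per-bin arrays. 'combined' is an assumed summer output,
    Z_r^2 - E_r^2 - Yhat_r^2, not a formula given in the text.
    """
    window = np.hanning(len(z_block))             # windows 202/204/206
    Y = np.fft.rfft(window * y_hat_block)         # FFT 208
    E = np.fft.rfft(window * e_block)             # FFT 210
    Z = np.fft.rfft(window * z_block)             # FFT 212
    y_r, e_r, z_r = Y.real, E.real, Z.real
    y_sq, e_sq, z_sq = y_r ** 2, e_r ** 2, z_r ** 2   # squarers 214/216/218
    return {
        "y_r": y_r,                     # node 220
        "z_r": z_r,                     # node 232
        "y_sq": y_sq,                   # node 222
        "combined": z_sq - e_sq - y_sq, # summer 226 -> node 228 (assumed form)
    }
```

Because $z_n = e_n + \hat{y}_n$ and the DFT is linear, `combined` equals $2E_r\hat{Y}_r$ in each bin.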
Figure 3 shows a block diagram of a second portion of a residual echo estimator. Circuit 300 receives signals on nodes 232, 220, 228, and 222, which correspond to like-numbered nodes in Figure 2. Circuit 300 also receives a signal on node 302, which corresponds to the spectral components of a previous output from a noise reduction circuit, such as noise reduction circuit 120 (Figure 1). Summer 304 and multipliers 303 and 305 produce a cross-term estimate from the composite signal and the previous output, which is used as the cross-term when a double-talk event is present. Multiplier 308 produces a cross-term estimate from the composite signal, which is used when there is near-end speech only. Multipliers 306 and 312 implement the switching between cross-terms when different speech activity is present.
In some embodiments (soft-decision embodiments), the cross-terms produced on nodes 310 and 314 are produced in varying amplitudes and are summed by summer 316. As a result, the numerator produced on node 318 includes components of both types of cross-terms. In soft-decision embodiments, multipliers 306 and 312 produce variable output values that are a function of both the spectral components and α and β, and the outputs are summed by summer 316.
In other embodiments (hard-decision embodiments), α and β take on values of "1" or "0," and multipliers 306 and 312 act as switches that either pass the input through or do not. For example, when α is "1," multiplier 306 acts as a switch and passes the double-talk cross-term to node 310. In this case, multiplier 306 acts as a closed switch and multiplier 312 acts as an open switch. Also for example, when β is "1," multiplier 312 passes the near-end-only cross-term to node 314. In this case, multiplier 312 acts as a closed switch and multiplier 306 acts as an open switch. The generation of α and β is discussed with reference to Figure 4.
Divider 320 divides the numerator on node 318 by the squared real part of the echo estimate on node 222 to produce a result on node 322. The result on node 322 represents the estimate of the square of the real part of the spectral component of the residual echo.
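The following per-bin sketch covers the cross-term calculator and divider for the real part, under several assumptions. During double-talk, the near-end speech-plus-noise term is approximated by the average of the composite signal and the previous output (as in claim 1). During near-end-only speech, it is approximated by the composite signal itself (as in claim 2). α and β weight the two choices, acting as hard switches at 0 or 1 and as soft weights otherwise. The numerator follows the relation $R = (Z^2 - E^2 - \hat{Y}^2 - 2\hat{Y}N)/(2\hat{Y})$ implied by $z_n = s_n + v_n + y_n$, $e_n = z_n - \hat{y}_n$, and $r_n = y_n - \hat{y}_n$. It is squared and divided by four times the squared echo estimate. The function name and the threshold `eps` are hypothetical.

```python
import numpy as np


def circuit_300_real(z_r, y_r, e_r, s_prev_r, alpha, beta, eps=1e-6):
    """Estimate the squared real part of the residual-echo DFT per bin."""
    z_r, y_r, e_r, s_prev_r = (np.asarray(a, dtype=float) for a in (z_r, y_r, e_r, s_prev_r))
    n_double_talk = 0.5 * (z_r + s_prev_r)   # double-talk approximation
    n_near_only = z_r                        # near-end-only approximation
    # alpha/beta act as switches (0 or 1) or as soft weights in [0, 1];
    # both zero corresponds to far-end single-talk (approximation = 0).
    n_approx = alpha * n_double_talk + beta * n_near_only
    numerator = z_r ** 2 - e_r ** 2 - y_r ** 2 - 2.0 * y_r * n_approx
    y_sq = y_r ** 2                          # divisor (node 222 in Figure 3)
    safe = y_sq >= eps ** 2
    return np.where(safe, numerator ** 2 / (4.0 * np.where(safe, y_sq, 1.0)), 0.0)
```

During far-end single-talk (α = β = 0), the sketch returns the squared true residual component whenever the model holds exactly.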
Summers 304 and 316, and multipliers 306, 308, and 312, form a cross-term calculator that calculates the various cross-terms discussed above in connection with Table 1. One skilled in the art will understand that other embodiments can be used for cross-term calculators without departing from the scope of the present invention. For example, all or part of circuit 300 can be implemented in software. The software can be executed on a general-purpose computer, a digital signal processor (DSP), or other processor. Also for example, all or part of circuit 300 can be implemented in special-purpose hardware, such as an application specific integrated circuit (ASIC).

Figure 4 shows a block diagram of speech activity detectors in a speakerphone context. Double-talk detector 402 receives the far-end audio on node 104 and also receives the near-end audio on node 126 captured by microphone 128. In hard-decision embodiments, double-talk detector 402 sets α equal to one when double-talk is active and sets α equal to zero otherwise. Speech activity detector 404 receives the near-end audio on node 126 and determines whether speech signals are present at the microphone. AND gate 406 receives the output of speech activity detector 404 and the logical inverse of α on node 408. Gate 406 generates variable β on node 410. In hard-decision embodiments, β is set to one if there is no double-talk but there is near-end speech activity; otherwise, β is set to zero. In soft-decision embodiments, α and β take on real values between zero and one, inclusive. For example, in one embodiment, α takes on a value between zero and one, and β is generated by a multiplier rather than gate 406. In such an embodiment, the multiplier generates β as β = (1 - α) × (speech activity detector output).
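The logic of Figure 4 can be sketched as follows. The sketch assumes the outputs of the double-talk detector and the speech activity detector are already available: booleans for hard decisions, and values in [0, 1] for soft decisions. The detectors themselves are not modeled, and the function name is hypothetical.

```python
def activity_weights(double_talk, near_speech, soft=False):
    """Return (alpha, beta) from detector outputs, following Figure 4.

    Hard decisions: beta = (NOT alpha) AND near_speech, as in AND gate 406.
    Soft decisions: beta = (1 - alpha) * near_speech, the multiplier variant.
    """
    if soft:
        alpha = min(max(float(double_talk), 0.0), 1.0)   # clamp to [0, 1]
        return alpha, (1.0 - alpha) * float(near_speech)
    alpha = 1 if double_talk else 0
    beta = 1 if (not alpha and near_speech) else 0
    return alpha, beta


print(activity_weights(False, True))   # (0, 1): near-end speech only
```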
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

WE CLAIM:
1. An acoustic echo cancellation system comprising :
an adaptive filter to create an echo estimate and an error term that comprises a residual echo;
a residual echo estimator to independently calculate estimates of a real portion and an imaginary portion of a discrete Fourier transform (DFT) of the residual echo, wherein the residual echo estimator has:
a first input node to receive a composite signal comprising near-end speech, noise, and an echo of far-end speech;
a second input node to receive the echo estimate; and
a third input node to receive a difference between the echo estimate and the composite signal; and
wherein the residual echo estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the residual echo using an average of the composite signal and a previous output sample as an estimate of the near-end speech and the noise when both the near-end speech and the far-end speech are active.
2. The acoustic echo cancellation system as claimed in claim 1, wherein the residual echo
estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the
residual echo using the composite signal as an estimate of the near-end speech and the noise when
the near-end speech is active and far-end speech is quiet.
3. The acoustic echo cancellation system as claimed in claim 1, wherein there is provided a noise
reduction circuit responsive to the real and imaginary portions of the DFT of the residual echo and
the difference between the echo estimate and the composite signal.
4. An acoustic echo cancellation system, comprising :
an adaptive filter to create an echo estimate and an error term that has a residual echo ; a residual echo estimator to independently calculate estimates of a real portion and an imaginary portion of a discrete Fourier transform (DFT) of the residual echo, wherein the residual echo estimator has:

a first input node to receive a composite signal comprising near-end speech, noise, and an echo of far-end speech;
a second input node to receive the echo estimate; and
a third input node to receive a difference between the echo estimate and the composite signal;
and
wherein the residual echo estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the residual echo using a value of substantially zero as an estimate of the near-end speech and the noise when the near-end speech is quiet and the far-end speech is active.
5. The acoustic echo cancellation system as claimed in claim 4, wherein the residual echo
estimator is adapted / configured to calculate the real and imaginary portions of the DFT of the
residual echo using the composite signal as an estimate of the near-end speech and the noise when
the near-end speech is active and far-end speech is quiet.
6. The acoustic echo cancellation system as claimed in claim 4, wherein there is provided a noise
reduction circuit responsive to the real and imaginary portions of the DFT of the residual echo and the
difference between the echo estimate and the composite signal.
7. A speakerphone comprising :
an output port to be coupled to a speaker to project far-end speech into an acoustic environment;
an input port to be coupled to a microphone to receive from the acoustic environment a composite signal comprising near-end speech, noise, and an echo of the far-end speech;
an adaptive filter to produce an estimate of the echo of the far-end speech; and
a residual echo estimator to independently estimate real and imaginary parts of the DFT of the difference between the echo of the far-end speech and the estimate of the echo of the far-end speech, wherein the residual echo estimator has a cross-term calculator to calculate an estimate of a cross-term resulting from a lack of an assumption that the near-end speech, the noise, and the echo of the far-end speech are statistically independent and to estimate a sum of the near-end speech and the noise differently based on whether the far-end speech or the near-end speech have energy above a threshold.

8. The speakerphone as claimed in claim 7, wherein the cross-term calculator is adapted /
configured to estimate the sum as the composite signal when a near-end speech activity detector
indicates that the near-end speech has energy above the threshold and a double-talk detector
indicates the far-end speech has energy below a threshold.
9. A speakerphone comprising :
an output port to be coupled to a speaker to project far-end speech into an acoustic environment;
an input port to be coupled to a microphone to receive from the acoustic environment a composite signal comprising near-end speech, noise, and an echo of the far-end speech;
an adaptive filter to produce an estimate of the echo of the far-end speech;
a residual echo estimator to independently estimate real and imaginary parts of the DFT of the difference between the echo of the far-end speech and the estimate of the echo of the far-end speech, wherein the residual echo estimator has a cross-term calculator to calculate an estimate of a cross-term resulting from a lack of an assumption that the near-end speech, the noise, and the echo of the far-end speech are statistically independent;
a noise reduction circuit responsive to the real and imaginary parts of the residual echo to produce estimates of the near-end speech ; and
a circuit to estimate a sum of the near-end speech and the noise, the circuit being responsive to a double-talk indicator, and being adapted / configured to estimate the sum as an average of the composite signal and a previous estimate of the near-end speech.
10. The speakerphone as claimed in claim 7, which is implemented in a computer.
11. A method of estimating residual echo in an acoustic echo cancellation system, comprising separately estimating real and imaginary parts of the DFT of the residual echo, by the steps of:
estimating the real part of the DFT of the residual echo from a real part of a DFT of a near-end composite signal, a real part of a DFT of an adaptive filter output signal, a real part of a DFT of an error signal generated by a difference between the near-end composite signal and the adaptive filter output signal, and an approximation of a sum of a real part of a DFT of a near-end speech signal and a real part of a DFT of a near-end noise signal.
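The separate estimation of claim 11 rests on a simple signal model. The following sketch uses notation chosen here for illustration, not the patent's own symbols. Y(k), E(k) and D̂(k) are the DFTs at bin k of the near-end composite signal, the error signal and the adaptive filter output. S(k), N(k) and D(k) are the DFTs of the near-end speech, the near-end noise and the true echo, and R(k) is the residual echo. Subscripts r and i mark real and imaginary parts. The sketch assumes that the composite signal is the plain sum of speech, noise and echo:

```latex
\begin{aligned}
Y(k) &= S(k) + N(k) + D(k), \qquad E(k) = Y(k) - \hat{D}(k), \qquad R(k) = D(k) - \hat{D}(k) \\
\Rightarrow\; E(k) &= \bigl(S(k) + N(k)\bigr) + R(k) \\
\Rightarrow\; R_r(k) &= E_r(k) - \bigl(S_r(k) + N_r(k)\bigr), \qquad
R_i(k) = E_i(k) - \bigl(S_i(k) + N_i(k)\bigr)
\end{aligned}
```

Taking the real or imaginary part is a linear operation. Each part of the residual therefore depends only on the same part of the error and of the near-end sum, which is why the two parts can be estimated separately.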

12. The method as claimed in claim 11, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation :

wherein:
is the real part of the DFT of the residual echo;
is the real part of the DFT of the error signal;
is the real part of the DFT of the near-end composite signal;
is the real part of the DFT of the adaptive filter output signal; and
represents the approximation of the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal.
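One expression that uses all five quantities listed in claim 12 can be derived as follows. It is not necessarily the exact form of the granted equation. In illustrative notation, E_r, Y_r and D̂_r are the real parts at bin k of the DFTs of the error, composite and adaptive-filter output signals, and Â_r approximates the real part of near-end speech plus noise. The error equals near-end speech plus noise plus residual echo, so squaring R_r = E_r − Â_r and substituting E_r = Y_r − D̂_r into the linear term gives:

```latex
\begin{aligned}
R_r(k) &= E_r(k) - \bigl(S_r(k) + N_r(k)\bigr) \approx E_r(k) - \hat{A}_r(k) \\
\hat{R}_r^2(k) &= E_r^2(k) - 2\,E_r(k)\,\hat{A}_r(k) + \hat{A}_r^2(k) \\
&= E_r^2(k) - 2\,Y_r(k)\,\hat{A}_r(k) + 2\,\hat{D}_r(k)\,\hat{A}_r(k) + \hat{A}_r^2(k)
\end{aligned}
```

The product 2·D̂_r·Â_r multiplies the echo estimate by the near-end sum. Keeping such products, rather than subtracting powers as if speech, noise and echo were statistically independent, makes the expression exact whenever Â_r equals S_r + N_r.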
13. The method as claimed in claim 12, which involves :
using the real part of the DFT of the near-end composite signal to approximate the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal when a near-talker is quiet.
14. The method as claimed in claim 13, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation:

wherein : and fm(k) are as defined in claim 12.
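The estimate of claim 14 follows from substituting the composite signal for the near-end sum, as in claim 13. The derivation below uses illustrative notation and does not reproduce the granted equation. E_r, Y_r and D̂_r are the real parts at bin k of the DFTs of the error, composite and adaptive-filter output signals, with E_r = Y_r − D̂_r, and Â_r approximates the real part of near-end speech plus noise. Under this substitution, the squared error-minus-sum estimate collapses to the squared filter output:

```latex
\begin{aligned}
\hat{R}_r^2(k) &= \bigl(E_r(k) - \hat{A}_r(k)\bigr)^2, \qquad \hat{A}_r(k) = Y_r(k) \\
&= \bigl(Y_r(k) - \hat{D}_r(k) - Y_r(k)\bigr)^2 = \hat{D}_r^2(k)
\end{aligned}
```

The substitution is exact when the microphone frame contains no echo. Claim 8 associates that situation with using the composite signal as the near-end sum.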
15. The method as claimed in claim 12, which involves :
using a value of substantially zero to approximate the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal only when a near-end talker is quiet.

16. The method as claimed in claim 15, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation :
wherein : and are as defined in claim 12.
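For claim 16, setting the near-end approximation to zero leaves only the error term. In illustrative notation, E_r is the real part at bin k of the DFT of the error signal, and Â_r approximates the real part of near-end speech plus noise. The following is a sketch derived from the definitions rather than the granted equation:

```latex
\hat{R}_r^2(k) = \bigl(E_r(k) - \hat{A}_r(k)\bigr)^2 \Big|_{\hat{A}_r(k) = 0} = E_r^2(k)
```

The result is exact when both near-end speech and noise vanish, because the error signal then consists of residual echo alone.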
17. The method as claimed in claim 11, which involves :
using a sum of the real part of the DFT of the near-end composite signal and a DFT of a previous output from the acoustic echo cancellation system to approximate the sum of the real part of the DFT of the near-end speech signal and the real part of the DFT of the near-end noise signal when a double-talk event is present.
18. The method as claimed in claim 17, wherein the square of the real part of the DFT of the residual echo is estimated according to an equation:

wherein : and are as defined in claim 12, and
represents the real part of the DFT of the previous output from the acoustic echo cancellation system.
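For the double-talk case of claims 17 and 18, the near-end sum is approximated from the composite signal and the previous canceller output. The sketch below assumes the equal-weight average described in claim 9, so the factor 1/2 is an assumption. The notation is illustrative: E_r, Y_r and D̂_r are the real parts at bin k of the error, composite and adaptive-filter output DFTs, Ŝ_r^(n−1) is the real part of the previous output, and Â_r is the resulting approximation.

```latex
\begin{aligned}
\hat{A}_r(k) &= \tfrac{1}{2}\bigl(Y_r(k) + \hat{S}_r^{(n-1)}(k)\bigr) \\
\hat{R}_r^2(k) &= \bigl(E_r(k) - \hat{A}_r(k)\bigr)^2
= E_r^2(k) - 2\,Y_r(k)\,\hat{A}_r(k) + 2\,\hat{D}_r(k)\,\hat{A}_r(k) + \hat{A}_r^2(k)
\end{aligned}
```

Unlike the two single-talker cases, this substitution is only an approximation, since the previous output is itself an estimate of the near-end speech.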
19. The method as claimed in claim 11, wherein the step of separately estimating involves :
estimating the imaginary part of the DFT of the residual echo using an equation that includes a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent.
20. A method of estimating residual echo in an acoustic echo cancellation system, comprising separately estimating real and imaginary parts of the DFT of the residual echo, wherein the step of separately estimating involves estimating the imaginary part of the DFT of the residual echo using an equation that includes a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent, wherein the cross-term comprises a product of an imaginary part of a DFT of an echo estimate and a sum of an imaginary part of a DFT of a near-end speech signal and an imaginary part of a DFT of a near-end noise signal, and wherein estimating the imaginary part of the DFT of the residual echo involves :
substituting a different product for the cross-term as a function of whether the near-end speech signal contains sufficient energy and whether the far-end speech signal contains sufficient energy.
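The following minimal Python sketch shows why claims 19 and 20 keep the cross-term. It compares a cross-term-inclusive estimate of the squared imaginary part of the residual echo with plain power subtraction on one synthetic frame. All signals are white-noise stand-ins, the adaptive filter is modelled as a scaled copy of the true echo, and the function names are hypothetical.

```python
import numpy as np


def residual_part_sq(E_p, Y_p, D_hat_p, A_p, keep_cross_terms=True):
    """Estimate the squared real (or imaginary) part of the residual echo DFT.

    E_p, Y_p, D_hat_p -- same part (real or imaginary) of the DFTs of the error,
                         composite and adaptive-filter output signals
    A_p               -- approximation of the same part of S(k) + N(k)
    """
    if keep_cross_terms:
        # (E_p - A_p)**2 expanded, with E_p = Y_p - D_hat_p substituted in the linear term.
        return E_p**2 - 2.0 * Y_p * A_p + 2.0 * D_hat_p * A_p + A_p**2
    # Power subtraction: only valid on average for independent zero-mean components.
    return E_p**2 - A_p**2


def synthetic_frame(n=256, echo_misadjustment=0.1, seed=0):
    """Hypothetical test frame; returns one-sided DFTs of the signals involved."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n)                # near-end speech stand-in
    noise = 0.1 * rng.standard_normal(n)      # near-end noise
    d = 0.5 * rng.standard_normal(n)          # true echo of far-end speech
    d_hat = (1.0 - echo_misadjustment) * d    # imperfect adaptive-filter output
    y = s + noise + d                         # composite microphone signal
    e = y - d_hat                             # error signal
    F = np.fft.rfft
    return {"Y": F(y), "D_hat": F(d_hat), "E": F(e), "A": F(s + noise), "R": F(d - d_hat)}
```

With the exact near-end sum supplied, `residual_part_sq(f["E"].imag, f["Y"].imag, f["D_hat"].imag, f["A"].imag)` reproduces `f["R"].imag ** 2` to rounding error for `f = synthetic_frame()`. The power-subtraction variant does not, because a single frame's product of near-end sum and residual is generally far from zero even when the components are independent on average.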
21. A method of estimating residual echo in an acoustic echo cancellation system, comprising separately estimating real and imaginary parts of the DFT of the residual echo, wherein the step of separately estimating involves estimating the imaginary part of the DFT of the residual echo using an equation that includes a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent, wherein the square of the imaginary part of the DFT of the residual echo is estimated according to an equation:

wherein:
is the imaginary part of the DFT of the residual echo;
is the imaginary part of the DFT of the error signal;
is the imaginary part of the DFT of the near-end composite signal;
is the imaginary part of the DFT of the adaptive filter output signal; and
represents the approximation of the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal.
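Applying the real-part derivation to imaginary parts gives one candidate form for claim 21, derived from the listed definitions rather than copied from the granted equation. In illustrative notation, E_i, Y_i and D̂_i are the imaginary parts at bin k of the error, composite and adaptive-filter output DFTs, and Â_i approximates the imaginary part of near-end speech plus noise. The underbraced product is the cross-term that claim 20 describes. Its form under the substitutions of claims 22, 24 and 26 follows, with Ŝ_i^(n−1) as the imaginary part of the previous output and an equal-weight average assumed for the double-talk case:

```latex
\begin{aligned}
\hat{R}_i^2(k) &= E_i^2(k) - 2\,Y_i(k)\,\hat{A}_i(k)
+ \underbrace{2\,\hat{D}_i(k)\,\hat{A}_i(k)}_{\text{cross-term}} + \hat{A}_i^2(k) \\
2\,\hat{D}_i(k)\,\hat{A}_i(k) &=
\begin{cases}
2\,\hat{D}_i(k)\,Y_i(k) & \text{claim 22: } \hat{A}_i = Y_i \\
0 & \text{claim 24: } \hat{A}_i = 0 \\
\hat{D}_i(k)\,\bigl(Y_i(k) + \hat{S}_i^{(n-1)}(k)\bigr) & \text{claim 26: } \hat{A}_i = \tfrac{1}{2}\bigl(Y_i + \hat{S}_i^{(n-1)}\bigr)
\end{cases}
\end{aligned}
```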
22. The method as claimed in claim 21, which involves :
using the imaginary part of the DFT of the near-end composite signal to approximate the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal when a near-talker is quiet.
23. The method as claimed in claim 22, wherein the square of the imaginary part of the DFT of the residual echo is estimated according to an equation :

wherein : are as defined in claim 21.
24. The method as claimed in claim 21, which involves :
using a value of substantially zero to approximate the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal only when a near-end talker is quiet.
25. The method of claim 24, wherein the square of the imaginary part of the DFT of the residual echo is estimated according to an equation:

wherein : are as defined in claim 21.
26. A method of estimating residual echo in an acoustic echo cancellation system, comprising :
separately estimating real and imaginary parts of the DFT of the residual echo, wherein the step of separately estimating involves estimating the imaginary part of the DFT of the residual echo using an equation that includes a cross-term resulting from a lack of an assumption that signal components in a near-end composite signal received at a microphone are statistically independent, and
using a sum of the imaginary part of the DFT of the near-end composite signal and a DFT of a previous output from the acoustic echo cancellation system to approximate the sum of the imaginary part of the DFT of the near-end speech signal and the imaginary part of the DFT of the near-end noise signal when a double-talk event is present.
27. The method as claimed in claim 26, wherein the square of the imaginary part of the DFT of the residual echo is estimated according to an equation:

wherein: are as defined in claim 21, and
Si(n-1)(k) represents the imaginary part of the DFT of the previous output from the acoustic echo cancellation system.
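The imaginary-part special cases of claims 23 and 25 can be checked numerically. The Python sketch below assumes a cross-term-inclusive estimator of the form E_i² − 2·Y_i·Â_i + 2·D̂_i·Â_i + Â_i², where E_i, Y_i and D̂_i are the imaginary parts of the error, composite and filter-output DFTs and Â_i is the near-end approximation. It uses synthetic white-noise frames, and the function names are hypothetical. It confirms that substituting Â_i = Y_i is exact when the microphone frame contains no echo, and that Â_i = 0 is exact when the near end is silent. The double-talk substitution of claims 26 and 27 is an approximation and is not checked here.

```python
import numpy as np


def imag_residual_sq(E_i, Y_i, D_hat_i, A_i):
    """Cross-term-inclusive estimate of the squared imaginary part of the residual echo."""
    return E_i**2 - 2.0 * Y_i * A_i + 2.0 * D_hat_i * A_i + A_i**2


def im(x):
    """Imaginary part of the one-sided DFT of a real time-domain frame."""
    return np.fft.rfft(x).imag


def special_cases(n=128, seed=1):
    """Return (estimate, truth) pairs for the far-end-silent and near-end-quiet cases."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n)            # near-end speech stand-in
    d = 0.5 * rng.standard_normal(n)      # echo of far-end speech
    d_hat = 0.3 * rng.standard_normal(n)  # adaptive-filter output

    # No echo reaches the microphone (y = s): substitute A_i = Y_i.
    y = s
    est_a = imag_residual_sq(im(y - d_hat), im(y), im(d_hat), im(y))
    truth_a = im(-d_hat) ** 2             # residual D - D_hat with D = 0

    # Near-end talker quiet and no noise (y = d): substitute A_i = 0.
    y = d
    est_b = imag_residual_sq(im(y - d_hat), im(y), im(d_hat), np.zeros(n // 2 + 1))
    truth_b = im(d - d_hat) ** 2
    return (est_a, truth_a), (est_b, truth_b)
```

Both pairs returned by `special_cases()` agree to rounding error. In practice the detectors only approximate these conditions, so the estimates are correspondingly approximate.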
28. An acoustic echo cancellation system, substantially as herein described, particularly with reference to the accompanying drawings.
29. A speakerphone, substantially as herein described, particularly with reference to the accompanying drawings.
30. A method of estimating residual echo in an acoustic echo cancellation system, substantially as herein described, particularly with reference to the accompanying drawings.

Patent Number

207734

Indian Patent Application Number

00304/KOLNP/2003

PG Journal Number

25/2007

Publication Date

22-Jun-2007

Grant Date

21-Jun-2007

Date of Filing

12-Mar-2003

Name of Patentee

INTEL CORPORATION

Applicant Address

2200 MISSION COLLEGE BOULEVARD, SANTA CLARA, CA-95052

Inventors:

#	Inventor's Name	Inventor's Address
1	DEISHER MICHAEL E	2665 N.E.ANNA AVENUE HILLSBORO OR 97124

PCT International Classification Number

H 04 M 9/08

PCT International Application Number

PCT/US01/28825

PCT International Filing date

2001-09-14

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	09/663,748	2000-09-15	U.S.A.