Title of Invention

PACKET BASED ECHO CANCELLATION AND SUPPRESSION

Abstract

In a method for echo suppression or cancellation, a reference voice packet is selected from a plurality of reference voice packets based on at least one encoded voice parameter associated with each of the plurality of reference voice packets and a targeted voice packet. Echo in the targeted voice packet is suppressed or cancelled based on the selected reference voice packet.
Full Text

PACKET BASED ECHO CANCELLATION AND SUPPRESSION
BACKGROUND OF THE INVENTION
In conventional communication systems, an encoder generates a stream of information bits representing voice or data traffic. This stream of bits is subdivided and grouped, concatenated with various control bits, and packed into a suitable format for transmission. Voice and data traffic may be transmitted in various formats according to the appropriate communication mechanism, such as, for example, frames, packets, subpackets, etc. For the sake of clarity, the term "transmission frame" will be used herein to describe the transmission format in which traffic is actually transmitted. The term "packet" will be used herein to describe the output of a speech coder. Speech coders are also referred to as voice coders, or "vocoders," and the terms will be used interchangeably herein.
A vocoder extracts parameters relating to a model of voice information (such as human speech) generation and uses the extracted parameters to compress the voice information for transmission. Vocoders typically comprise an encoder and a decoder. A vocoder segments incoming voice information (e.g., an analog voice signal) into blocks, analyzes the incoming speech block to extract certain relevant parameters, and quantizes the parameters into binary or bit representation. The bit representation is packed into a packet, the packets are formatted into transmission frames and the

transmission frames are transmitted over a communication channel to a receiver with a decoder. At the receiver, the packets are extracted from the transmission frames, and the decoder unquantizes the bit representations carried in the packets to produce a set of coding parameters. The decoder then re-synthesizes the voice segments, and subsequently, the original voice information using the unquantized parameters.
Different types of vocoders are deployed in various existing wireless and wireline communication systems, often using various compression techniques. Moreover, transmission frame formats and processing defined by one particular standard may be rather significantly different from those of other standards. For example, CDMA standards support the use of variable-rate vocoder frames in a spread spectrum environment while GSM standards support the use of fixed-rate vocoder frames and multi-rate vocoder frames. Similarly, Universal Mobile Telecommunications Systems (UMTS) standards also support fixed-rate and multi-rate vocoders, but not variable-rate vocoders. For compatibility and interoperability between these communication systems, it may be desirable to enable the support of variable-rate vocoder frames within GSM and UMTS systems, and the support of non-variable rate vocoder frames within CDMA systems. One common occurrence throughout all communications systems is the occurrence of echo. Acoustic echo and electrical echo are example types of echo.

Acoustic echo is produced by poor voice coupling between an earpiece and a microphone in handsets and/or hands-free devices. Electrical echo results from 4-to-2 wire coupling within PSTN networks. Voice-compressing vocoders process voice including echo within the handsets and in wireless networks, which results in returned echo signals with highly variable properties. The echoed signals degrade voice call quality.
In one example of acoustic echo, sound from a loudspeaker is heard by a listener at a near end, as intended. However, this same sound at the near end is also picked up by the microphone, both directly and indirectly, after being reflected. The result of this reflection is the creation of echo, which, unless eliminated, is transmitted back to the far end and heard by the talker at the far end as echo.
FIG. 1 illustrates a voice over packet network diagram including a conventional echo canceller/suppressor used to cancel echoed signals.
If the conventional echo canceller/suppressor 100 is used in a packet switched network, the conventional echo canceller must completely decode the vocoder packets associated with voice signals transmitted in both directions to obtain echo cancellation parameters because all conventional echo cancellation operations work with linear, uncompressed speech. That is, the conventional echo canceller/suppressor 100 must extract packets from the transmission frames, unquantize the bit representations carried in the packets to

produce a set of coding parameters, and re-synthesize the voice segments before canceling echo. The conventional echo canceller/suppressor then cancels echo using the re-synthesized voice segments.
Because transmitted voice information is encoded into parameters (e.g., in the parametric domain) before transmission and conventional echo suppressors/cancellers operate in the linear speech domain, conventional echo cancellation/suppression in a packet switched network becomes relatively difficult and complex, and may add encoding and/or decoding delay and/or degrade voice quality because of, for example, the additional tandem coding involved.
SUMMARY OF THE INVENTION
Example embodiments are directed to methods and apparatuses for packet-based echo suppression/cancellation. One example embodiment provides a method for suppressing/cancelling echo. In this example embodiment, a reference voice packet is selected from a plurality of reference voice packets based on at least one encoded voice parameter associated with each of the plurality of reference voice packets and a targeted voice packet. Echo in the targeted voice packet is suppressed/cancelled based on the selected reference voice packet.

BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will become more fully understood from the detailed description given herein below and the accompanying drawings, wherein like elements are represented by like reference numerals, which are given by way of illustration only and thus are not limiting of the present invention and wherein:
FIG. 1 is a diagram of a voice over packet network including a conventional echo canceller/suppressor;
FIG. 2 illustrates an echo canceller/suppressor, according to an example embodiment; and
FIG. 3 illustrates a method for echo cancellation/suppression, according to an example embodiment.
DETAILED DESCRIPTION OF THE EXAMPLE EMBODIMENTS
Methods and apparatuses, according to example embodiments, may perform echo cancellation and/or echo suppression depending on, for example, the particular application within a packet switched communication system. Example embodiments will be described herein as echo cancellation/suppression, an echo canceller/suppressor, etc.
Hereinafter, for example purposes, vocoder packets suspected of carrying echoed voice information (e.g., voice information received at the near end and echoed back to the far end) will be referred to as targeted packets, and coding parameters associated with these targeted packets will be referred to as targeted packet parameters. Vocoder or parameter packets associated with originally transmitted voice information (e.g., potentially echoed voice information) from the far end used to determine whether targeted packets include echoed voice information will be referred to as reference packets. The coding parameters associated with the reference packets will be referred to as reference packet parameters.
As discussed above, FIG. 1 illustrates a voice over packet network diagram including a conventional echo canceller/suppressor. Methods according to example embodiments may be implemented at existing echo cancellers/suppressors, such as the echo canceller/suppressor 100 shown in FIG. 1. For example, example embodiments may be implemented on existing Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), etc. In addition, example embodiments may be used in conjunction with any type of terrestrial or wireless packet switched network, such as a VoIP network, a VoATM network, TrFO networks, etc.
One example vocoder used to encode voice information is a Code Excited Linear Prediction (CELP) based vocoder. CELP-based vocoders encode digital voice information into a set of coding parameters. These parameters include, for example, adaptive codebook and fixed codebook gains, pitch/adaptive codebook, linear spectrum pairs (LSPs) and fixed codebooks. Each of these parameters may be

]"epresented by a number of bils. For example, for a full-rate packet of Enhanced Variable Rate CODEC (EVRC) vocoder, w/hich is a well-known vocoder, the LSP is i-epresented by 28 bits, the pitch and its corresponding delta are represented by 12 bits, the adaptive codebook gain is represented by 9 bits and the fixed codebook gain is represented by 15 bits. The fixed codebook is represented by 120 bits.
Referring still to FIG. 1, if echoed speech signals are present during encoding of voice information by the CELP vocoder at the near end, at least a portion of the transmitted vocoder packets may include echoed voice information. The echoed voice information may be the same as or similar to originally transmitted voice information, and thus, vocoder packets carrying the transmitted voice information from the near end to the far end may be similar, substantially similar to or the same as vocoder packets carrying originally encoded voice information from the far end to the near end. That is, for example, the bits in the original vocoder packet may be similar, substantially similar, or the same as the bits in the corresponding vocoder packet carrying the echoed voice information.
Packet domain echo cancellers/suppressors and/or methods for the same, according to example embodiments, utilize this similarity in cancelling/suppressing echo in transmitted signals by adaptively adjusting coding parameters associated with transmitted packets.
For example purposes, example embodiments will be described with regard to a CELP-based vocoder such as an EVRC vocoder. However, methods and/or apparatuses, according to example

embodiments, may be used and/or adapted to be used in conjunction with any suitable vocoder.
FIG. 2 illustrates an echo canceller/suppressor, according to an example embodiment. As shown, the echo canceller/suppressor of FIG. 2 may buffer received original vocoder packets (reference packets) from the far end in a reference packet buffer memory 202. The echo canceller/suppressor may buffer targeted packets from the near end in a targeted packet buffer memory 204. The echo canceller/suppressor of FIG. 2 may further include an echo cancellation/suppression module 206 and a memory 208.
The echo cancellation/suppression module 206 may cancel/suppress echo from a signal (e.g., a transmitted and/or received signal) based on at least one encoded voice parameter associated with at least one reference packet stored in the reference packet buffer memory 202 and at least one targeted packet stored in the targeted packet buffer 204. The echo cancellation/suppression module 206, and methods performed therein, will be discussed in more detail below.
The memory 208 may store intermediate values and/or voice packets such as voice packet similarity metrics, corresponding reference voice packets, targeted voice packets, etc. In at least one example embodiment, the memory 208 may store individual similarity metrics and/or overall similarity metrics. The memory 208 will be described in more detail below.

Returning to FIG. 2, the length of the buffer memory 204 may be determined based on a trajectory match length for a trajectory searching/matching operation, which will be described in more detail below. For example, if each vocoder packet carries a 20 ms voice segment and the trajectory match length is 120 ms, the buffer memory 204 may hold 6 targeted packets.
The length of the buffer memory 202 may be determined based on the length of the echo tail, network delay and the trajectory match length. For example, if each vocoder packet carries a 20 ms voice segment, the echo tail length is equal to 180 ms and the trajectory match length is 120 ms (e.g., 6 packets), the buffer memory 202 may hold 15 reference packets. The maximum number of packets that may be stored in buffer 202 for reference packets may be represented by m.
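A small sketch of the buffer-sizing arithmetic described above; the helper name and the assumption that buffer depth is simply the ceiling of a duration over the packet length are illustrative (network delay is omitted for simplicity).

    import math

    def packets_needed(duration_ms: float, packet_ms: float = 20.0) -> int:
        """Number of vocoder packets needed to cover a given duration."""
        return math.ceil(duration_ms / packet_ms)

    # Targeted packet buffer 204: sized from the trajectory match length.
    targeted_depth = packets_needed(120.0)              # -> 6 packets

    # Reference packet buffer 202: echo tail plus trajectory match length.
    reference_depth = packets_needed(180.0 + 120.0)     # -> 15 packets

    print(targeted_depth, reference_depth)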
Although FIG. 2 illustrates two buffers 202 and 204, these buffers may be combined into a single memory.
In at least one example, the echo tail length may be determined and/or defined by known network parameters of the echo path or obtained using an actual searching process. Methods for determining echo tail length are well-known in the art. After having determined the echo tail length, methods according to at least some example embodiments may be performed within a time window equal to the echo tail length. The time window width may be equivalent to, for example, one or several transmission frames in length, or one or several packets in length. For example purposes, example

embodiments will be described assuming that the echo tail length is equivalent to the length of a speech signal transmitted in a single transmission frame.
Example embodiments may be applicable to any echo tail length by matching reference packets stored in buffer 202 with targeted packets carrying echoed voice information. Whether a targeted packet contains echoed voice information may be determined by comparing a targeted packet with each of m reference packets stored in the buffer 202.
FIG. 3 is a flow chart illustrating a method for echo cancellation/suppression, according to an example embodiment. The method shown in FIG. 3 may be performed by the echo cancellation/suppression module 206 shown in FIG. 2.
Referring to FIG. 3, at S302, a counter value j may be initialized to 1. At S304, a reference packet Rj may be retrieved from the buffer 202. At S306, the echo cancellation/suppression module 206 may compare the counter value j to a threshold value m. As discussed above, m may be equal to the number of reference packets stored in the buffer 202. In this example, because the number of reference packets m stored in the buffer 202 is equal to the number of reference packets transmitted in a single transmission frame, the threshold value m may be equal to the number of packets transmitted in a single transmission frame. In this case, the value m may be extracted from the transmission frame header included in the transmission frame as is well-known in the art.

At S306, if the counter value j is less than or equal to threshold value m, the echo cancellation/suppression module 206 extracts the encoded parameters from reference packet Rj at S308. Concurrently, at S308, the echo cancellation/suppression module 206 extracts encoded coding parameters from the targeted packet T. Methods for extracting these parameters are well-known in the art. Thus, a detailed discussion has been omitted for the sake of brevity. As discussed above, example embodiments are described herein with regard to a CELP-based vocoder. For a CELP-based encoder, the reference packet parameters and the targeted packet parameters may include fixed codebook gains Gf, adaptive codebook gains Ga, pitch P and an LSP.
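The extraction at S308 is described above as conventional. Purely as an illustration, the sketch below unpacks four parameter fields from a packed integer, assuming a hypothetical fixed field layout; the real EVRC bit layout differs and is not reproduced here.

    # Hypothetical field layout (names and widths chosen for illustration only).
    FIELDS = (("lsp", 28), ("pitch", 12), ("g_a", 9), ("g_f", 15))

    def extract_parameters(packed: int) -> dict:
        """Unpack parameter fields from a packed integer, most-significant field first."""
        total = sum(width for _, width in FIELDS)
        params, consumed = {}, 0
        for name, width in FIELDS:
            shift = total - consumed - width
            params[name] = (packed >> shift) & ((1 << width) - 1)
            consumed += width
        return params

    print(extract_parameters(0x123456789ABCDEF0))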
Still referring to FIG. 3, at S309, the echo cancellation/suppression module 206 may perform double talk detection based on a portion of the encoded coding parameters extracted from the targeted packet T and the reference packet Rj to determine whether double talk is present in the reference packet Rj. During voice segments including double talk, echo cancellation/suppression need not be performed because echoed far end voice information is buried in the near end voice information, and thus, is imperceptible at the far end.
Double talk detection may be used to determine whether a reference packet Rj includes double talk. In an example embodiment, double talk may be detected by comparing encoded parameters extracted from the targeted packet T and encoded parameters

extracted from the reference packet Rj. In the above-discussed CELP vocoder example, the encoded parameters may be fixed codebook gains Gf and adaptive codebook gains Ga.
The echo cancellation/suppression module 206 may determine whether double talk is present according to the conditions shown in Equation (1):

    DT = 1, if (GfR - GfT) < Δf  or  (GaR - GaT) < Δa
    DT = 0, otherwise                                        (1)
According to Equation (1), if the difference between the fixed codebook gain GfR for the reference packet Rj and the fixed codebook gain GfT for the targeted packet T is less than a fixed codebook gain threshold value Δf, double talk is present in the reference packet Rj and the double talk detection flag DT may be set to 1 (e.g., DT = 1). Similarly, if the difference between the adaptive codebook gain GaR for the reference packet Rj and the adaptive codebook gain GaT for the targeted packet T is less than an adaptive codebook gain threshold value Δa, double talk is present in the reference packet Rj and the double talk detection flag DT may be set to 1 (e.g., DT = 1). Otherwise, double talk is not present in the reference packet Rj and the double talk detection flag may not be set (e.g., DT = 0).
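A minimal sketch of the double talk test of Equation (1); the threshold values and the assumption that the gains are available as plain floats are illustrative, not taken from the patent.

    def double_talk(g_f_ref: float, g_f_tgt: float,
                    g_a_ref: float, g_a_tgt: float,
                    delta_f: float = 0.1, delta_a: float = 0.1) -> int:
        """Return DT = 1 if Equation (1) flags double talk, else DT = 0."""
        if (g_f_ref - g_f_tgt) < delta_f or (g_a_ref - g_a_tgt) < delta_a:
            return 1
        return 0

    # Targeted gains close to the reference gains -> double talk flagged.
    print(double_talk(0.8, 0.78, 0.6, 0.59))   # 1
    # Targeted gains strongly attenuated relative to the reference -> no double talk.
    print(double_talk(0.8, 0.30, 0.6, 0.20))   # 0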
Referring back to FIG. 3, if the double talk detection flag DT is not set (e.g., DT = 0) at S310, a similarity evaluation between the encoded parameters extracted from the targeted packet T and the encoded parameters extracted from the reference packet Rj may be performed.











The bandwidth similarity SBi for each of the i formants may be calculated according to Equation (8):

As shown in Equation (8) and as discussed above, BTi is the bandwidth of the i-th formant for targeted packet T, and BRi is the bandwidth of the i-th formant for reference packet Rj.
Similarly, the center frequency similarity SFi for each of the i formants may be calculated according to Equation (9):
(9)
As shown in Equation (9) and as discussed above, FTi is the center frequency of the i-th formant for the targeted packet T and FRi is the center frequency of the i-th formant for the reference packet Rj.
After obtaining the plurality of individual similarity metrics, the overall similarity matching metric Sj may be calculated according to Equation (10):
(10)
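The bodies of Equations (2) through (10) are not reproduced in this text, so the sketch below is only one plausible illustration of how per-parameter metrics might be combined: it assumes each individual metric is an absolute difference and that the overall metric Sj is their weighted sum. Both assumptions, and the weights, are illustrative rather than the patented formulas.

    def individual_metric(param_tgt: float, param_ref: float) -> float:
        """Assumed per-parameter similarity metric: absolute difference."""
        return abs(param_tgt - param_ref)

    def overall_metric(targeted: dict, reference: dict, weights: dict) -> float:
        """Assumed overall metric Sj: weighted sum of the individual metrics."""
        return sum(weights[name] * individual_metric(targeted[name], reference[name])
                   for name in weights)

    weights = {"g_f": 1.0, "g_a": 1.0, "pitch": 0.5}
    t = {"g_f": 0.80, "g_a": 0.60, "pitch": 55}
    r = {"g_f": 0.78, "g_a": 0.61, "pitch": 55}
    print(overall_metric(t, r, weights))   # small value -> packets are similar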




memory, such as a buffer memory. The counter value j is incremented (j = j+1) at S320, and the method returns to S304.
Returning to S314 of FIG. 3, if any of the parameter similarity flags are not set, the echo cancellation/suppression module 206 determines that the reference packet Rj is not similar to the targeted packet T, and thus, the targeted packet T is not carrying echoed voice information corresponding to the original voice information carried by reference packet Rj. In this case, the counter value j may be incremented (j = j+1), and the method proceeds as discussed above.
Returning to S310 of FIG. 3, if double talk is detected in the reference packet Rj, the reference packet Rj may be discarded at S311, the counter value j may be incremented (j = j+1) at S320 and the echo cancellation/suppression module 206 retrieves the next reference packet Rj from buffer 202, at S304. After retrieving the next reference packet Rj from the buffer 202, the process may proceed to S306 and repeat.
Returning to S306, if the counter value j is greater than threshold m, a vector trajectory matching operation may be performed at S321. Trajectory matching may be used to locate a correlation between a fixed codebook gain for the targeted packet and each fixed codebook gain for the stored reference packets. Trajectory matching may also be used to locate a correlation between the adaptive codebook gain for the targeted packet and the adaptive codebook gain for each reference packet vector. According to at least one example embodiment, vector trajectory matching may be performed using a

Least Mean Square (LMS) and/or cross-correlation algorithm to determine a correlation between the targeted packet and each similar reference packet. Because LMS and cross-correlation algorithms are well-known in the art, a detailed discussion thereof has been omitted for the sake of brevity.
In at least one example embodiment, the vector trajectory matching may be used to verify the similarity between the targeted packet and each of the stored similar reference packets. In at least one example embodiment, the trajectory vector matching at S321 may be used to filter out similar reference packets failing a correlation threshold. Overall similarity metrics Sj associated with stored similar reference packets failing the correlation threshold may be removed from the memory 208. The correlation threshold may be determined based on experimental data as is well-known in the art.
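One illustrative way to realize the trajectory matching at S321: compute a normalized cross-correlation between the targeted packet's gain trajectory and each candidate reference trajectory, and drop candidates that fall below a threshold. The threshold value, the packet model, and the pure-Python correlation helper are assumptions, not taken from the patent.

    import math

    def normalized_xcorr(x, y):
        """Zero-lag normalized cross-correlation of two equal-length gain trajectories."""
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
        return num / den if den else 0.0

    def filter_by_trajectory(target_gains, candidates, threshold=0.7):
        """Keep only candidate reference trajectories that correlate with the target."""
        return {j: gains for j, gains in candidates.items()
                if normalized_xcorr(target_gains, gains) >= threshold}

    target = [0.8, 0.7, 0.75, 0.6, 0.65, 0.7]
    candidates = {1: [0.78, 0.69, 0.74, 0.61, 0.66, 0.71],   # follows the target's shape
                  2: [0.10, 0.90, 0.20, 0.85, 0.15, 0.95]}   # unrelated trajectory
    print(sorted(filter_by_trajectory(target, candidates)))   # -> [1]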
Although the method of FIG. 3 illustrates a vector trajectory matching step at S321, this step may be omitted as desired by one of ordinary skill in the art.
At S322, the remaining stored overall similarity metrics Sj in the memory 208 may be searched to determine which of the similar reference packets includes echoed voice information. In other words, the similar reference packets may be searched to determine which reference packet matches the targeted packet. In example embodiments, the reference packet matching the targeted packet may be the reference packet with the minimum associated overall similarity metric Sj.

If the similarity metrics Sj are indexed in the memory (methods for doing so are well-known and omitted for the sake of brevity) by targeted packet T and reference packet Rj, the overall similarity metrics may be expressed as S(T, Rj), for j = 1, 2, 3, ..., m.
Representing the overall similarity metrics as S(T, Rj), for j = 1, 2, 3, ..., m, the minimum overall similarity metric Smin may be obtained using Equation (13):

    Smin = min{ S(T, Rj) : j = 1, 2, ..., m }                (13)
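A one-step sketch of Equation (13): pick the reference packet whose overall metric S(T, Rj) is smallest. The dictionary representation of the indexed metrics and the example values are assumptions.

    # Overall similarity metrics indexed by reference packet number j (illustrative values).
    S = {1: 0.42, 2: 0.07, 3: 0.63}

    j_match = min(S, key=S.get)      # reference packet with the minimum S(T, Rj)
    s_min = S[j_match]
    print(j_match, s_min)            # -> 2 0.07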

Returning again to FIG. 3, after locating the matching reference packet, the echo cancellation/suppression module 206 may cancel/suppress echo based on a portion of the encoded parameters extracted from the matching reference packet at S324. For example, echo may be cancelled/suppressed by adjusting (e.g., attenuating) gains associated with the targeted packet T. The gain adjustment may be performed based on gains associated with the matched reference packet, a gain weighting constant and the overall similarity metric associated with the matching reference packet.
For example, echo may be cancelled/suppressed by attenuating fixed codebook gains as shown in Equation (14):

and/or adaptive codebook gains as shown in Equation (15):


As shown in Equation (14), GfR' is an adjusted gain for a fixed codebook associated with a reference packet, and Wf is the gain weighting for the fixed codebook.
As shown in Equation (15), GaR' is the adjusted gain for the adaptive codebook associated with the reference packet and Wa is the gain weighting for the adaptive codebook. Initially, both Wf and Wa may be equal to 1. However, these values may be adaptively adjusted according to, for example, speech characteristics (e.g., voiced or unvoiced) and/or the proportion of echo in targeted packets relative to reference packets.
According to example embodiments, adaptive codebook gains and fixed codebook gains of targeted packets are attenuated. For example, based on the similarity of a reference and targeted packet, gains of adaptive and fixed codebooks in targeted packets may be adjusted.
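Equations (14) and (15) are not reproduced in this text, so the attenuation rule in the sketch below is only one plausible reading of the surrounding description: the targeted packet's codebook gains are scaled down using the gain weightings and the overall similarity metric of the matched reference packet. The specific formula, the mapping of the metric into a scale factor, and the example values are assumptions, not the patented equations.

    def attenuate_gains(g_f_tgt, g_a_tgt, s_min, w_f=1.0, w_a=1.0):
        """Illustrative echo suppression: scale the targeted packet's gains down,
        attenuating more strongly when the match is closer (smaller s_min)."""
        factor = min(1.0, max(0.0, s_min))   # map the similarity metric into [0, 1]
        return (g_f_tgt * (1.0 - w_f * (1.0 - factor)),
                g_a_tgt * (1.0 - w_a * (1.0 - factor)))

    # A close match (small s_min) attenuates the targeted gains heavily.
    print(attenuate_gains(0.8, 0.6, s_min=0.05))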
According to example embodiments, echo may be canceled/suppressed using extracted parameters in the parametric domain without decoding and re-encoding the targeted voice signal.
Although only a single iteration of the method shown in FIG. 3 is discussed above, the method of FIG. 3 may be performed for each reference packet Rj stored in the buffer 202 and each targeted packet T stored in the buffer 204. That is, for example, the plurality of

reference packets stored in the buffer 202 may be searched to find a reference packet matching each of the targeted packets in the buffer 204.
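Tying the steps of FIG. 3 together, the sketch below runs the whole loop for one targeted packet over a buffer of reference packets. Packets are reduced to dictionaries of two gains, and the double talk test, similarity metric, and attenuation rule are the simplified stand-ins used in the earlier sketches; none of these stand-ins are the patented formulas.

    def suppress_echo(targeted, references, d_f=0.1, d_a=0.1, w=1.0):
        """targeted / references: dicts with 'g_f' and 'g_a' entries (illustrative packet model)."""
        best_j, s_min = None, float("inf")
        for j, ref in enumerate(references, start=1):
            # Double talk test (Equation (1)): skip reference packets flagged as double talk.
            if (ref["g_f"] - targeted["g_f"]) < d_f or (ref["g_a"] - targeted["g_a"]) < d_a:
                continue
            # Assumed overall similarity metric: sum of absolute gain differences.
            s = abs(targeted["g_f"] - ref["g_f"]) + abs(targeted["g_a"] - ref["g_a"])
            if s < s_min:
                best_j, s_min = j, s
        if best_j is None:
            return dict(targeted)                      # no matching reference: leave packet untouched
        scale = min(1.0, s_min)                        # assumed attenuation rule
        return {"g_f": targeted["g_f"] * (1.0 - w * (1.0 - scale)),
                "g_a": targeted["g_a"] * (1.0 - w * (1.0 - scale))}

    refs = [{"g_f": 0.9, "g_a": 0.8}, {"g_f": 0.5, "g_a": 0.45}]
    print(suppress_echo({"g_f": 0.3, "g_a": 0.25}, refs))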
The invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the invention, and all such modifications are intended to be included within the scope of the invention.

WE CLAIM:
1. A method for suppressing echo, the method comprising:
selecting, from a plurality of reference voice packets, a reference
voice packet based on at least one encoded voice parameter associated with each of the plurality of reference voice packets and a targeted voice packet; and
suppressing echo in the targeted voice packet based on the selected reference voice packet.
2. The method of claim 1, wherein the echo is suppressed by adjusting the at least one encoded voice parameter associated with the targeted voice packet based on the at least one encoded voice parameter associated with the selected reference voice packet.
3. The method of claim 2, wherein the echo is suppressed by adjusting a plurality of encoded voice parameters associated with the targeted voice packet based on a corresponding plurality of encoded voice parameters associated with the selected reference voice packet.
4. The method of claim 1, wherein the echo is suppressed by adjusting a gain of the at least one encoded voice parameter associated with the targeted voice packet based on a corresponding at least one encoded voice parameter associated with the selected reference voice packet.

5. The method of claim 1, wherein the selecting step comprises:
extracting at least one encoded voice parameter from the
targeted packet and each of the plurality of reference voice packets;
calculating, for each of a number of reference voice packets within the plurality of reference voice packets, at least one voice packet similarity metric based on the encoded voice parameter extracted from the reference voice packet and the targeted voice packet; and
selecting the reference voice packet based on the calculated voice packet similarity metric.
6. The method of claim 5, further comprising:
determining which of the plurality of reference voice packets are similar to the targeted voice packet based on the encoded voice parameter associated with each reference voice packet and the targeted voice packet to generate the number of reference voice packets for which to calculate the at least one voice packet similarity metric.
7. The method of claim 1, wherein the selecting step comprises:
determining which of the plurality of reference voice packets are
similar to the targeted voice packet based on the at least one encoded voice parameter associated with each of the plurality of reference voice packets and the targeted voice packet to generate a set of reference voice packets; and

selecting the reference voice packet from the set of reference voice packets.
8. The method of claim 7, wherein the determining step comprises:
for each reference voice packet,
setting at least one similarity indicator based on the at least one encoded voice parameter associated with the targeted voice packet and the at least one encoded voice parameter associated with the reference voice packet; and
determining whether the reference voice packet is similar to the targeted voice packet based on the similarity indicator.
9. The method of claim 1, wherein the selecting step comprises:
extracting a plurality of encoded voice parameters from the
targeted voice packet and each of the reference voice packets;
for each encoded voice parameter associated with each reference voice packet,
determining an individual similarity metric based on the
encoded voice parameter for the reference voice packet and the
targeted voice packet;
for each reference voice packet,
determining an overall similarity metric based on the
individual similarity metrics associated with the reference voice
packet; and

selecting the reference voice packet based on the overall similarity metric associated with each reference voice packet.
10. The method of claim 9, wherein the selecting step further comprises:
comparing the overall similarity metrics to determine the minimum overall similarity metric; and
selecting the reference voice packet associated with the minimum overall similarity metric.





Patent Number 280079
Indian Patent Application Number 1389/CHENP/2009
PG Journal Number 06/2017
Publication Date 10-Feb-2017
Grant Date 09-Feb-2017
Date of Filing 11-Mar-2009
Name of Patentee LUCENT TECHNOLOGIES INC.
Applicant Address 600-700 MOUNTAIN AVENUE, MURRAY HILL, NEW JERSEY 07974-0636
Inventors:
# Inventor's Name Inventor's Address
1 CAO, BINSHI 273 MARCIA WAY, BRIDGEWATER, NEW JERSEY 08807
2 KIM, DOH-SUK 42 HUNTINGTON ROAD, BASKING RIDGE, NEW JERSEY 07920
3 TARRAF, AHMED, A., 60 ISABELLA AVENUE, BAYONNE, NEW JERSEY 07002
4 YOUTKUS, DONALD, JOSEPH 82 JAMESTOWN ROAD, BASKING RIDGE, NEW JERSEY 07920
PCT International Classification Number G10L21/02
PCT International Application Number PCT/US07/20162
PCT International Filing date 2007-09-18
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 11/523,051 2006-09-19 U.S.A.