Title of Invention

AUDIO PROCESSING USING AUDITORY SCENE ANALYSIS AND SPECTRAL SKEWNESS

Abstract A method for controlling the loudness of auditory events in an audio signal. In an embodiment, the method includes weighting the auditory events (an auditory event having a spectrum and a loudness), using skewness in the spectra and controlling loudness of the auditory events, using the weights. Various embodiments of the invention are as follows: The weighting being proportionate to the measure of skewness in the spectra; the measure of skewness is a measure of smoothed skewness; the weighting is insensitive to amplitude of the audio signal; the weighting is insensitive to power; the weighting is insensitive to loudness; and any relationship between signal measure and absolute reproduction level is not known at the time of weighting; the weighting includes weighting auditory-event-boundary importance, using skewness in the spectra.
Full Text Description
Audio Processing using Auditory Scene Analysis
and Spectral Skewness
Technical Field
The invention relates to audio processing, in general, and to auditory scene
analysis and spectral skewness, in particular.
References and Incorporation by Reference
The following documents are hereby incorporated by reference in their entirety:
Crockett and Seefeldt, International Application under the Patent Cooperation
Treaty, S.N. PCT/US2007/008313, entitled, "Controlling Dynamic Gain Parameters of
Audio using Auditory Scene Analysis and Specific-Loudness-Based Detection of
Auditory Events," naming Brett Graham Crockett and Alan Jeffrey Seefeldt as inventors,
filed March 30, 2007, with Attorney Docket DOL186 PCT, and published on November
8, 2007 as WO 2007/127023;
Seefeldt et al., International Application under the Patent Cooperation Treaty,
S.N. PCT/US 2004/016964, entitled, "Method, Apparatus and Computer Program for
Calculating and Adjusting the Perceived Loudness of an Audio Signal," naming Alan
Jeffrey Seefeldt et al. as inventors, filed May 27, 2004, with Attorney Docket No.
DOL119 PCT, and published on December 23, 2004 as WO 2004/111994 A2;
Seefeldt, International Application under the Patent Cooperation Treaty, S.N.
PCT/US2005/038579, entitled "Calculating and Adjusting the Perceived Loudness and/or
the Perceived Spectral Balance of an Audio Signal," naming Alan Jeffrey Seefeldt as the
inventor, filed October 25, 2005, with Attorney Docket No. DOL15202 PCT, and
published on May 4, 2006 as WO 2006/047600;
Crockett, U.S. Patent Application S.N. 10/474,387, entitled, "High Quality Time-
Scaling and Pitch-Scaling of Audio Signals," naming Brett Graham Crockett as the
inventor, filed October 10, 2003, with Attorney Docket No. DOL07503, and published on
June 24, 2004 as US 2004/0122662 A1;
Crockett et al., U.S. Patent Application S.N. 10/478,398, entitled, "Method for
Time Aligning Audio Signals Using Characterizations Based on Auditory Events,"
naming Brett G. Crockett et al. as inventors, filed November 20, 2003, with Attorney
Docket No. DOL0920I, and published July 29, 2004 as US 2004/0148159 A1;
Crockett, U.S. Patent Application S.N. 10/478,538, entitled, "Segmenting Audio
Signals Into Auditory Events," naming Brett G. Crockett as the inventor, filed November
20, 2003, with Attorney Docket No. DOL098, and published August 26, 2004 as US
2004/0165730 A1;
Crockett et al., U.S. Patent Application S.N. 10/478,397, entitled, "Comparing
Audio Using Characterizations Based on Auditory Events," naming Brett G. Crockett et
al. as inventors, filed November 20, 2003, with Attorney Docket No. DOL092, and
published September 2, 2004 as US 2004/0172240 A1;
Smithers, International Application under the Patent Cooperation Treaty S.N.
PCT/US 05/24630, entitled, "Method for Combining Audio Signals Using Auditory
Scene Analysis," naming Michael John Smithers as the inventor, filed July 13, 2005, with
Attorney Docket No. DOL148 PCT, and published March 9, 2006 as WO 2006/026161;
Crockett, B. and Smithers, M., "A Method for Characterizing and Identifying
Audio Based on Auditory Scene Analysis," Audio Engineering Society Convention Paper
6416, 118th Convention, Barcelona, May 28-31, 2005;
Crockett, B., "High Quality Multichannel Time Scaling and Pitch-Shifting using
Auditory Scene Analysis," Audio Engineering Society Convention Paper 5948, New
York, October 2003; and
Seefeldt et al., "A New Objective Measure of Perceived Loudness," Audio
Engineering Society Convention Paper 6236, San Francisco, October 28, 2004.
Background Art
Auditory Events and Auditory Event Detection
The division of sounds into units or segments perceived as separate and distinct is
sometimes referred to as "auditory event analysis" or "auditory scene analysis" ("ASA").
The segments are sometimes referred to as "auditory events" or "audio events." Albert S.
Bregman, "Auditory Scene Analysis—The Perceptual Organization of Sound"
(Massachusetts Institute of Technology, 1991, Fourth printing, 2001, Second MIT Press
paperback edition) extensively discusses auditory scene analysis. In addition,
Bhadkamkar et al., U.S. Pat. No. 6,002,776 (Dec. 14, 1999) cites publications dating back
to 1976 as "prior art work related to sound separation by auditory scene analysis."
However, Bhadkamkar et al. discourage the practical use of auditory scene analysis,
concluding that "[t]echniques involving auditory scene analysis, although interesting from
a scientific point of view as models of human auditory processing, are currently far too
computationally demanding and specialized to be considered practical techniques for
sound separation until fundamental progress is made."
Crockett and Crockett et al., in the various patent applications and papers listed
above identify auditory events. Those documents teach dividing an audio signal into
auditory events (each tending to be perceived as separate and distinct) by detecting
changes in spectral composition (amplitude as a function of frequency) with respect to
time. This may be done, for example, by calculating the spectral content of successive
time blocks of the audio signal, comparing the spectral content between successive time
blocks and identifying an auditory event boundary as the boundary between blocks where
the difference in the spectral content exceeds a threshold. Alternatively, changes in
amplitude with respect to time may be calculated instead of or in addition to changes in
spectral composition with respect to time.
The auditory event boundary markers are often arranged into a temporal control
signal whose value, typically ranging from zero to one, indicates the strength of the event
boundary. Furthermore, this control signal is often filtered so that the event boundary
strength is retained at each boundary, and the intervals between event boundaries take
decaying values of the preceding event boundary. This filtered auditory event strength is
then used by other audio processing methods, including automatic gain control and
dynamic range control.
Dynamics Processing of Audio
The techniques of automatic gain control (AGC) and dynamic range control
(DRC) are well known and common in many audio signal paths. In an abstract sense,
both techniques measure the level of an audio signal and then gain-modify the signal by
an amount that is a function of the measured level. In a linear, 1:1 dynamics processing
system, the input audio is not processed and the output audio signal ideally matches the
input audio signal. Additionally, imagine an audio dynamics processing system that
automatically measures the input signal and controls the output signal with that
measurement. If the input signal rises in level by 6 dB and the processed output signal
rises in level by only 3 dB, then the output signal has been compressed by a ratio of 2:1
with respect to the input signal.
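The 2:1 arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the threshold value are hypothetical and do not come from the references.

```python
def compression_ratio(input_rise_db: float, output_rise_db: float) -> float:
    """Ratio of the input level change to the output level change, both in dB."""
    if output_rise_db == 0:
        raise ValueError("output level does not change; the ratio is unbounded")
    return input_rise_db / output_rise_db


def compressed_level_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Static compressor curve: above the threshold, level rises 1/ratio as fast."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio


print(compression_ratio(6.0, 3.0))             # 2.0, i.e. a ratio of 2:1
print(compressed_level_db(-14.0, -20.0, 2.0))  # -17.0 (6 dB over threshold becomes 3 dB)
```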
In Crockett and Seefeldt, auditory scene analysis improves the performance of
AGC and DRC methods by minimizing the change in gain between auditory event
boundaries, and confining much of the gain change to the neighborhood of an event
boundary. It does this by modifying the dynamics-processing release behavior. In this
way, auditory events sound consistent and natural.
Notes played on a piano are an example. With conventional AGC or DRC
methods, the gain applied to the audio signal increases during the tail of each note,
causing each note to swell unnaturally. With auditory scene analysis, the AGC or DRC
gain is held constant within each note and changes only near the onset of each note where
an auditory event boundary is detected. The resulting gain-adjusted audio signal sounds
natural as the tail of each note dies away.
Typical implementations of auditory scene analysis (as in the references above)
are deliberately level invariant. That is, they detect auditory event boundaries regardless
of absolute signal level. While level invariance is useful in many applications, some
auditory scene analyses benefit from some level dependence.
One such case is the method described in Crockett and Seefeldt. There, ASA
control of AGC and DRC prevents large gain changes between auditory event boundaries.
However, longer-term gain changes can still be undesirable on some types of audio
signals. When an audio signal goes from a louder to a quieter section, the AGC or DRC
gain, constrained to change only near event boundaries, may allow the level of the
processed audio signal to rise undesirably and unnaturally during the quiet section. This
situation occurs frequently in films where sporadic dialog alternates with quiet
background sounds. Because the quiet background audio signal also contains auditory
events, the AGC or DRC gain is changed near those event boundaries, and the overall
audio signal level rises.
Simply weighting the importance of auditory events by a measure of the audio
signal level, power or loudness is undesirable. In many situations the relationship
between the signal measure and absolute reproduction level is not known. Ideally, a
measure discriminating or detecting perceptually quieter audio signals independent of the
absolute level of the audio signal would be useful.
Here, "perceptually quieter" refers not to quieter on an objective loudness measure
(as in Seefeldt et al. and Seefeldt) but rather quieter based on the expected loudness of the
content. For example, human experience indicates that a whisper is a quiet sound. If a
dynamics processing system measures this to be quiet and consequently increases the
AGC gain to achieve some nominal output loudness or level, the resulting gain-adjusted
whisper would be louder than experience says it should be.
Disclosure of the Invention
Herein are taught methods and apparatus for controlling the loudness of auditory
events in an audio signal. In an embodiment, the method includes weighting the auditory
events (an auditory event having a spectrum and a loudness), using skewness in the
spectra and controlling loudness of the auditory events, using the weights. Various
embodiments of the invention are as follows: The weighting being proportionate to the
measure of skewness in the spectra; the measure of skewness is a measure of smoothed
skewness; the weighting is insensitive to amplitude of the audio signal; the weighting is
insensitive to power; the weighting is insensitive to loudness; any relationship between
signal measure and absolute reproduction level is not known at the time of weighting; the
weighting includes weighting auditory-event-boundary importance, using skewness in the
spectra; and reducing swelling of AGC or DRC processing level during perceptibly
quieter segments of the audio signal as compared to methods not performing the claimed
weighting.
In other embodiments, the invention is a computer-readable memory containing a
computer program for performing any one of the above methods.
In still other embodiments, the invention is a computer system including a CPU,
one of the above-mentioned memories and a bus communicatively coupling the CPU and
the memory.
In still another embodiment, the invention is an audio-signal processor including a
spectral-skewness calculator for calculating the spectral skewness in an audio signal, an
auditory-events identifier for identifying and weighting auditory events in the audio
signal, using the calculated spectral skewness, a parameters modifier for modifying
parameters for controlling the loudness of auditory events in the audio signal and a
controller for controlling the loudness of auditory events in the audio signal.
In still another embodiment, the invention is a method for controlling the loudness
of auditory events in an audio signal, including calculating measures of skewness of
spectra of successive auditory events of an audio signal, generating weights for the
auditory events based on the measures of skewness, deriving a control signal from the
weights and controlling the loudness of the auditory events using the control signal.
The various features of the present invention and its preferred embodiments may
be better understood by referring to the following discussion and the accompanying
drawings in which like reference numerals refer to like elements.
Description of the Drawings
FIG. 1 illustrates a device for performing two Crockett and Seefeldt methods of
analyzing auditory scenes and controlling dynamics-gain parameters.
FIG. 2 illustrates an audio processor for identifying auditory events and
calculating skewness for modifying the auditory events, which are themselves used for
modifying the dynamics-processing parameters, according to an embodiment of the invention.
FIG. 3 is a series of graphs illustrating the use of auditory events to control the
release time in a digital implementation of a Dynamic Range Controller (DRC),
according to one embodiment of the invention.
FIG. 4 is an idealized characteristic response of a linear filter suitable as a
transmission filter according to an embodiment of the invention.
FIG. 5 shows a set of idealized auditory-filter characteristic responses that
approximate critical banding on the ERB scale.
Best Mode for Carrying Out the Invention
FIG. 1 illustrates a device 1 for analyzing auditory scenes and controlling
dynamics-gain parameters according to Crockett and Seefeldt. The device includes an
auditory-events identifier 10, an optional auditory-events-characteristics identifier 11 and
a dynamics-parameters modifier 12. The auditory events identifier 10 receives audio as
input and produces an input for the dynamics-parameters modifier 12 (and an input for
the auditory-events-characteristics identifier 11, if present). The dynamics-parameters
modifier 12 receives output of the auditory-events identifier 10 (and auditory-events-
characteristics identifier 11, if present) and produces an output.
The auditory-events identifier 10 analyzes the spectrum and from the results
identifies the location of perceptible audio events that are to control the dynamics-gain
parameters. Alternatively, the auditory-events identifier 10 transforms the audio into a
perceptual-loudness domain (that may provide more psychoacoustically relevant
information than the first method) and in the perceptual-loudness domain identifies the
location of auditory events that are to control the dynamics-gain parameters. (In this
alternative, the audio processing is aware of absolute acoustic-reproduction levels.)
The dynamics-parameters modifier 12 modifies the dynamics parameters based on
the output of the auditory-events identifier 10 (and auditory-events-characteristics
identifier 11, if present).
In both alternatives, a digital audio signal x[n] is segmented into blocks, and for
each block t, D[t] represents the spectral difference between the current block and the
previous block.
For the first alternative, D[t] is the sum, across all spectral coefficients, of the
magnitude of the difference between normalized log spectral coefficients (in dB) for the
current block t and the previous block t-1. In this alternative, D[t] is proportional to
absolute differences in spectra (itself in dB). For the second alternative, D[t] is the sum,
across all specific-loudness coefficients, of the magnitude of the difference between
normalized specific-loudness coefficients for the current block t and the previous block
t-1. In this alternative, D[t] is proportional to absolute differences in specific loudness
(in sone).
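A minimal Python sketch of the first alternative follows. It assumes that "normalized" means each block's log spectrum is expressed in dB relative to that block's largest coefficient, and it adds a hypothetical -60 dB floor; neither detail is stated in this text. The example shows why such a measure is level invariant: a pure level change gives a difference of zero, while a change in spectral shape gives a large value.

```python
import math


def normalized_log_spectrum(magnitudes, floor_db=-60.0):
    """Magnitudes in dB relative to the block's largest coefficient.

    The peak normalization and the floor are assumptions made for this sketch.
    """
    peak = max(magnitudes)
    if peak <= 0.0:
        return [floor_db] * len(magnitudes)
    result = []
    for m in magnitudes:
        db = 20.0 * math.log10(m / peak) if m > 0.0 else floor_db
        result.append(max(db, floor_db))
    return result


def spectral_difference(prev_mags, cur_mags):
    """D[t]: sum over coefficients of |difference| between normalized log spectra."""
    prev_db = normalized_log_spectrum(prev_mags)
    cur_db = normalized_log_spectrum(cur_mags)
    return sum(abs(c - p) for c, p in zip(cur_db, prev_db))


block_a = [1.0, 0.5, 0.25, 0.125]
block_b = [2.0, 1.0, 0.5, 0.25]     # same spectral shape as block_a, only louder
block_c = [0.125, 0.25, 0.5, 1.0]   # different spectral shape

print(spectral_difference(block_a, block_b))  # 0.0: a level change alone is ignored
print(spectral_difference(block_a, block_c))  # about 48.2 dB summed over four coefficients
```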
In both alternatives, if D[t] exceeds a threshold D_min, then an event is considered
to have occurred. The event may have a strength, between zero and one, based on the
ratio of D[t] minus D_min to the difference between D_max and D_min. The strength A[t]
may be computed as:

$$
A[t] =
\begin{cases}
0, & D[t] \le D_{\min} \\
\dfrac{D[t] - D_{\min}}{D_{\max} - D_{\min}}, & D_{\min} < D[t] < D_{\max} \\
1, & D[t] \ge D_{\max}
\end{cases}
\qquad (1)
$$

The maximum and minimum limits are different for each alternative, due to their
different units. The result, however, from both is an event strength in the range 0 to 1.
Other alternatives may calculate an event strength, but the alternative expressed in
equation (1) has proved itself in a number of areas, including controlling dynamics
processing. Assigning a strength (proportional to the amount of spectral change
associated with that event) to the auditory event allows greater control over the dynamics
processing, compared to a binary event decision. Larger gain changes are acceptable
during stronger events, and the signal in equation (1) allows such variable control.
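The mapping from spectral difference to event strength described above can be sketched in Python as follows. The limits used in the example (10 and 50) are hypothetical; as noted, suitable limits depend on the units of D[t].

```python
def event_strength(d: float, d_min: float, d_max: float) -> float:
    """Map a spectral difference D[t] to an event strength between 0 and 1."""
    if d_max <= d_min:
        raise ValueError("d_max must exceed d_min")
    if d <= d_min:
        return 0.0          # below the threshold: no event
    if d >= d_max:
        return 1.0          # saturates at full strength
    return (d - d_min) / (d_max - d_min)


print(event_strength(5.0, 10.0, 50.0))   # 0.0
print(event_strength(30.0, 10.0, 50.0))  # 0.5
print(event_strength(80.0, 10.0, 50.0))  # 1.0
```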
The signal A[t] is an impulsive signal with an impulse occurring at the location of
an event boundary. For the purposes of controlling the release time, one may further
smooth the signal A[t] so that it decays smoothly to zero after the detection of an event
boundary. The smoothed event control signal Ā[t] may be computed from A[t]
according to:

$$
\bar{A}[t] =
\begin{cases}
A[t], & A[t] > \alpha_{event}\,\bar{A}[t-1] \\
\alpha_{event}\,\bar{A}[t-1], & \text{otherwise}
\end{cases}
\qquad (2)
$$

Here α_event controls the decay time of the event control signal.
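As a sketch of how the decay might be realized, the Python below assumes that the smoother keeps the larger of the new strength and the decayed previous value, and that the decay coefficient is derived from a half-decay time and the block period. The helper names are hypothetical, and the block period of 1024 samples at 48 kHz is an assumption for the usage example only.

```python
def decay_coefficient(half_decay_s: float, block_period_s: float) -> float:
    """Per-block multiplier for which a value halves after half_decay_s seconds."""
    return 0.5 ** (block_period_s / half_decay_s)


def smooth_event_signal(strengths, alpha_event):
    """Hold each new event strength, then decay it geometrically toward zero."""
    smoothed = []
    prev = 0.0
    for a in strengths:
        decayed = alpha_event * prev
        prev = a if a > decayed else decayed
        smoothed.append(prev)
    return smoothed


# Coefficient for a 250 ms half-decay time at an assumed 1024-sample hop, 48 kHz.
alpha = decay_coefficient(0.250, 1024 / 48000)
print(0.9 < alpha < 1.0)  # True: a slow decay over many blocks

# With alpha = 0.5 the decay is easy to follow by hand.
print(smooth_event_signal([1.0, 0.0, 0.0, 0.2, 0.0], 0.5))
# [1.0, 0.5, 0.25, 0.2, 0.1]
```

Note that the weaker event of strength 0.2 replaces the decayed value 0.125 because it is larger, after which the decay continues from 0.2.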
FIG. 3 is a sequence of graphs illustrating the operation and effect of the
invention, according to one embodiment. In FIG. 3, "b)" depicts the event control signal
Ā[t] for the corresponding audio signal of "a)" in FIG. 3, with the half-decay time of the
smoother set to 250 ms. The audio signal contains three bursts of dialog, interspersed with
quiet background campfire crackling sounds. The event control signal shows many
auditory events in both the dialog and the background sounds.
In FIG. 3, "c)" shows the DRC gain signal where the event control signal Ā[t] is
used to vary the release time constant for the DRC gain smoothing. As Crockett and
Seefeldt describe, when the control signal is equal to one, the release smoothing
coefficient is unaffected, and the smoothed gain changes according to the value of the
time constant. When the control signal is equal to zero, the smoothed gain is prevented
from changing. When the control signal is between zero and one, the smoothed gain is
allowed to change — but at a reduced rate in proportion to the control signal.
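One way to obtain the three behaviors just described is to blend the release coefficient toward 1.0 as the control signal falls. The Python sketch below is an assumption made for illustration; this text does not give the exact formula used by Crockett and Seefeldt, and the function names are hypothetical.

```python
def event_controlled_release(alpha_release: float, control: float) -> float:
    """Blend the release coefficient toward 1.0 (gain frozen) as control falls to 0."""
    return control * alpha_release + (1.0 - control)


def smooth_gain(prev_gain_db: float, target_gain_db: float, alpha: float) -> float:
    """One step of single-pole gain smoothing toward the target gain."""
    return alpha * prev_gain_db + (1.0 - alpha) * target_gain_db


ALPHA_RELEASE = 0.9  # hypothetical per-block release coefficient

for control in (1.0, 0.5, 0.0):
    alpha = event_controlled_release(ALPHA_RELEASE, control)
    step = smooth_gain(0.0, 10.0, alpha)
    # control 1.0 moves about 1 dB, 0.5 about 0.5 dB, 0.0 does not move at all
    print(control, round(alpha, 3), round(step, 3))
```

With this blend, the per-block gain step equals the control signal times the unmodified step, which matches the "reduced rate in proportion to the control signal" behavior.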
In "c)" of FIG. 3, the DRC gain rises during the quiet background sounds due to
the number of events detected in the background. The resulting DRC-modified audio
signal in "d)" of FIG. 3 has audible and undesirable swelling of the background noise
between the bursts of dialog.
To reduce the gain change during quiet background sounds, an embodiment of the
invention modifies or weights the auditory strength A[t] using a measure of the
asymmetry of the audio signal spectrum. An embodiment of the invention calculates the
spectral skewness of the excitation of the audio signal.
Skewness is a statistical measure of the asymmetry of a probability distribution. A
distribution symmetrical about the mean has zero skew. A distribution with its bulk or
mass concentrated above the mean and with a long tail tending lower than the mean has a
negative skew. A distribution concentrated below the mean and with a long tail tending
higher than the mean has a positive skew. The magnitude or power spectrum of a typical
audio signal has positive skew. That is, the bulk of the energy in the spectrum is
concentrated lower in the spectrum, and the spectrum has a long tail toward the upper part
of the spectrum.
FIG. 2 illustrates an audio processor 2 according to an embodiment of the
invention. The audio processor 2 includes the dynamics-parameters modifier 12 and the
optional auditory-events-characteristics identifier 11 of FIG. 1, as well as an auditory-
events identifier 20 and a skewness calculator 21. The skewness calculator 21 and
auditory-events identifier 20 both receive the audio signal 13, and the skewness calculator
21 produces input for the auditory-events identifier 20. The auditory-events identifier 20,
auditory-events-characteristics identifier 11 and dynamics-parameters modifier 12 are
otherwise connected as are their counterparts in FIG. 1.
In FIG. 2, the skewness calculator 21 calculates the skewness from a spectral
representation of the audio signal 13, and the auditory-events identifier 20 calculates the
auditory scene analysis from the same spectral representation. The audio signal 13 may
be grouped into 50 percent overlapping blocks of M samples, and the Discrete Fourier
Transform may be computed as follows:

$$
X[k,t] = \sum_{n=0}^{M-1} x[n,t]\, e^{-j \frac{2\pi k n}{M}} \qquad (3)
$$

where M = 2N samples and x[n,t] denotes a block of samples.
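The blocking and transform can be sketched in plain Python as below. This is a direct, unoptimized DFT for illustration; a practical implementation would use an FFT and would likely apply an analysis window, which is omitted here as a simplifying assumption. The function names are hypothetical.

```python
import cmath
import math


def overlapping_blocks(x, m):
    """Split x into blocks of m samples with 50 percent overlap (hop of m // 2)."""
    hop = m // 2
    return [x[start:start + m] for start in range(0, len(x) - m + 1, hop)]


def dft(block):
    """Direct Discrete Fourier Transform of one block."""
    m = len(block)
    return [
        sum(block[n] * cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m))
        for k in range(m)
    ]


# A cosine with exactly two cycles per 8-sample block.
signal = [math.cos(2.0 * math.pi * 2.0 * n / 8.0) for n in range(16)]
blocks = overlapping_blocks(signal, 8)
spectrum = dft(blocks[0])

print(len(blocks))                                  # 3 blocks, starting at samples 0, 4, 8
print([round(abs(c), 6) + 0.0 for c in spectrum])   # energy only in bins 2 and 6
```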
The block size for the transform is assumed to be the same as that for calculating
the auditory event signal. This need not be the case, however. Where different block rates
exist, signals on one block rate may be interpolated or rate converted onto the same
timescale as signals on the other block rate.
The excitation signal E[b,t] approximating the distribution of energy along the
basilar membrane of the inner ear at critical band b during time block t is computed:

$$
E[b,t] = \sum_{k} \left|T[k]\right|^{2} \left|C_b[k]\right|^{2} \left|X[k,t]\right|^{2} \qquad (4)
$$

where T[k] represents the frequency response of a filter simulating the transmission of
audio through the outer and middle ear and C_b[k] represents the frequency response of
the basilar membrane at a location corresponding to critical band b.
FIG. 4 depicts the frequency response of a suitable transmission filter T[k]. FIG.
5 depicts a suitable set of critical band filter responses, corresponding to C_b[k], in which
40 bands are spaced uniformly along the Moore and Glasberg Equivalent Rectangular
Bandwidth (ERB) scale, for a sample rate of 48 kHz and transform size of M = 2048. A
rounded exponential function describes each filter shape, and 1 ERB separates the bands.
If the auditory event boundaries are computed from the specific loudness
spectrum, per Crockett and Seefeldt, then the excitation signal E[b,t] already exists as
part of the specific-loudness calculation.
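Finally the spectral skewness is computed from the excitation signal E[b,t] as:

$$
SK[t] = \frac{1}{B-1} \sum_{b=0}^{B-1} \frac{\left(E[b,t] - \mu\right)^{3}}{\sigma^{3}} \qquad (5)
$$

where B is the number of critical bands and μ is the arithmetic mean of the excitation:

$$
\mu = \frac{1}{B} \sum_{b=0}^{B-1} E[b,t]
$$

and σ² is the variance of the excitation signal:

$$
\sigma^{2} = \frac{1}{B-1} \sum_{b=0}^{B-1} \left(E[b,t] - \mu\right)^{2}
$$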
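The Python sketch below computes a skewness of the excitation across bands for a single block. The 1/(B-1) normalization and the zero-variance guard are assumptions made for this sketch. The example illustrates the property that matters for the weighting: scaling the excitation by a constant, as a change in signal level would, leaves the skewness unchanged.

```python
def spectral_skewness(excitation):
    """Skewness of the excitation values across bands for one time block."""
    b = len(excitation)
    if b < 2:
        raise ValueError("need at least two bands")
    mean = sum(excitation) / b
    variance = sum((e - mean) ** 2 for e in excitation) / (b - 1)
    if variance == 0.0:
        return 0.0  # flat excitation: no asymmetry to measure (assumed convention)
    sigma = variance ** 0.5
    return sum((e - mean) ** 3 for e in excitation) / ((b - 1) * sigma ** 3)


tilted = [8.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # a few strong bands, many weak ones
louder = [100.0 * e for e in tilted]                # same shape at a higher level
flat = [1.0] * 8

print(spectral_skewness(tilted) > 0.0)                                      # True
print(abs(spectral_skewness(tilted) - spectral_skewness(louder)) < 1e-9)    # True
print(spectral_skewness(flat))                                              # 0.0
```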
The skewness signal SK[t] of equation (5) fluctuates considerably and requires
smoothing to avoid artifacts when modifying the event control signal and
subsequent dynamics processing parameters. One embodiment uses a single-pole
smoother with a decay constant α_SK having a half-decay time of approximately 6.5 ms:

$$
SK'[t] = \alpha_{SK}\, SK'[t-1] + \left(1 - \alpha_{SK}\right) SK[t] \qquad (6)
$$
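Limiting the skewness to maximum and minimum SK_max and SK_min, respectively,
may be useful. A constrained skewness SK''[t] may be computed as:

$$
SK''[t] =
\begin{cases}
0, & SK'[t] \le SK_{\min} \\
\dfrac{SK'[t] - SK_{\min}}{SK_{\max} - SK_{\min}}, & SK_{\min} < SK'[t] < SK_{\max} \\
1, & SK'[t] \ge SK_{\max}
\end{cases}
\qquad (7)
$$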
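The smoothing and limiting steps can be sketched together in Python. The sketch assumes a conventional one-pole low-pass form for the smoother and a linear mapping of the limited range onto 0 to 1; the coefficient and limits in the example are hypothetical values chosen so the arithmetic is easy to follow.

```python
def single_pole_smooth(values, alpha):
    """One-pole low-pass: y[t] = alpha * y[t-1] + (1 - alpha) * x[t]."""
    smoothed = []
    prev = values[0] if values else 0.0  # start from the first sample (an assumption)
    for v in values:
        prev = alpha * prev + (1.0 - alpha) * v
        smoothed.append(prev)
    return smoothed


def constrain(value, low, high):
    """Limit value to [low, high] and rescale that range onto [0, 1]."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


raw = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0]           # a strongly fluctuating skewness signal
smoothed = single_pole_smooth(raw, alpha=0.5)
constrained = [constrain(s, low=1.0, high=3.0) for s in smoothed]

print(smoothed)      # [0.0, 2.0, 1.0, 2.5, 1.25, 2.625]
print(constrained)   # [0.0, 0.5, 0.0, 0.75, 0.125, 0.8125]
```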
Low values (values close to 0.0) of the skewness signal SK''[t] typically
correspond to characteristically quieter signals, while high skewness values (values close
to 1.0) typically correspond to characteristically louder signals. In FIG. 3, the "e)" graph
shows the skewness signal that corresponds to the audio signal in "a)" of FIG. 3. The
skewness is high for the louder dialog bursts and low for the background sounds.
The skewness signal SK''[t] passes to the auditory-events identifier 20 of FIG. 2
that weights the spectral difference measure D[t] as:

$$
D_{SK}[t] = SK''[t]\, D[t] \qquad (8)
$$
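The skewness-modified auditory strength signal A_SK[t] is computed in the same
way as A[t] in equation (1):

$$
A_{SK}[t] =
\begin{cases}
0, & D_{SK}[t] \le D_{\min} \\
\dfrac{D_{SK}[t] - D_{\min}}{D_{\max} - D_{\min}}, & D_{\min} < D_{SK}[t] < D_{\max} \\
1, & D_{SK}[t] \ge D_{\max}
\end{cases}
\qquad (9)
$$

The skewness-modified auditory strength signal A_SK[t] is smoothed in the same
way as A[t] in equation (2):

$$
\bar{A}_{SK}[t] =
\begin{cases}
A_{SK}[t], & A_{SK}[t] > \alpha_{event}\,\bar{A}_{SK}[t-1] \\
\alpha_{event}\,\bar{A}_{SK}[t-1], & \text{otherwise}
\end{cases}
\qquad (10)
$$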
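The effect of the weighting on event strength can be illustrated with a short Python sketch. It assumes the weighted spectral difference is the product of the constrained skewness and D[t], followed by the same threshold mapping as the unweighted case. All numeric values are hypothetical and chosen only to show three situations: a background event that is suppressed, a dialog event that is unaffected, and a loud event that survives despite low skewness.

```python
def skewness_weighted_strength(d, sk_constrained, d_min, d_max):
    """Weight D[t] by the constrained skewness, then map it to a strength in [0, 1]."""
    d_sk = sk_constrained * d
    return min(1.0, max(0.0, (d_sk - d_min) / (d_max - d_min)))


D_MIN, D_MAX = 10.0, 50.0  # hypothetical limits

background = skewness_weighted_strength(30.0, 0.2, D_MIN, D_MAX)
dialog = skewness_weighted_strength(30.0, 1.0, D_MIN, D_MAX)
loud_low_skew = skewness_weighted_strength(120.0, 0.5, D_MIN, D_MAX)

print(background)     # 0.0: the weighted difference of about 6 falls below D_MIN
print(dialog)         # 0.5: full skewness leaves the strength unchanged
print(loud_low_skew)  # 1.0: the weighted difference of 60 still exceeds D_MAX
```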
In FIG. 3, "f)" depicts the skewness-modified event control signal Ā_SK[t] for the
corresponding audio signal in "a)" of FIG. 3. Fewer auditory events appear during the
background sounds while events corresponding to the louder dialog remain.
In FIG. 3, "g)" shows the skewness-modified event-controlled DRC signal. With
fewer auditory events in the background sounds, the DRC gain stays relatively constant
and moves only for the louder dialog sections. In FIG. 3, "h)" shows the resulting DRC-
modified audio signal.
The DRC-modified audio signal has none of the undesirable swelling in level
during the background sounds.
The skewness signal SK''[t] sometimes goes low for perceptually louder signals.
For these loud signals, the value of spectral difference measure D[t] is large enough that
even after weighting by the skewness signal SK''[t] in equation 8, the weighted spectral
difference measure D_SK[t] is typically still large enough to indicate an auditory event
boundary. The event control signal Ā_SK[t] is not adversely affected.
Claims
1. A method for controlling the loudness of auditory events in an audio signal, the
method comprising:
weighting the auditory events (an auditory event having a spectrum and a
loudness), using skewness in the spectra; and
controlling loudness of the auditory events, using the weights.
2. The method of claim 1 wherein the weighting comprises
weighting the auditory events, the weighting proportionate to the measure
of skewness in the spectra.
3. The method of claim 2 wherein
the measure of skewness is a measure of smoothed skewness.
4. The method of claim 1 wherein the weighting is insensitive to amplitude of the
audio signal.
5. The method of claim 1 wherein the weighting is insensitive to power.
6. The method of claim 1 wherein the weighting is insensitive to loudness.
7. The method of claim 1 wherein any relationship between signal measure and
absolute reproduction level is not known at the time of weighting.
8. The method of claim 1 wherein the weighting comprises
weighting auditory-event-boundary importance, using skewness in the
spectra.
9. The method of claim 1 further comprising
reducing swelling of AGC or DRC processing level during perceptibly
quieter segments of the audio signal as compared to methods not performing the
claimed weighting.
10. A computer-readable memory containing a computer program for performing
any one of the methods of claims 1-9.
11. A computer system comprising:
a CPU;
the memory of claim 10; and
a bus communicatively coupling the CPU and the memory.
12. An audio-signal processor comprising:
a spectral-skewness calculator for calculating the spectral skewness in an
audio signal;
an auditory-events identifier for identifying and weighting auditory events
in the audio signal, using the calculated spectral skewness;
a parameters modifier for modifying parameters for controlling the
loudness of auditory events in the audio signal; and
a controller for controlling the loudness of auditory events in the audio
signal.
13. A method for controlling the loudness of auditory events in an audio signal,
comprising:
calculating measures of skewness of spectra of successive auditory events
of an audio signal;
generating weights for the auditory events based on the measures of
skewness;
deriving a control signal from the weights; and
controlling the loudness of the auditory events using the control signal.




Patent Number 272040
Indian Patent Application Number 4521/KOLNP/2009
PG Journal Number 12/2016
Publication Date 18-Mar-2016
Grant Date 15-Mar-2016
Date of Filing 29-Dec-2009
Name of Patentee DOLBY LABORATORIES LICENSING CORPORATION
Applicant Address 100 POTRERO AVENUE, SAN FRANCISCO, CA 94103-4813 UNITED STATES OF AMERICA
Inventors:
# Inventor's Name Inventor's Address
1 SMITHERS, MICHAEL, JOHN C/O DOLBY LABORATORIES LICENSING CORPORATION, 100 POTRERO AVENUE, SAN FRANCISCO, CA 94103-4813 UNITED STATES OF AMERICA
2 SEEFELDT, ALAN, JEFFREY C/O DOLBY LABORATORIES LICENSING CORPORATION, 100 POTRERO AVENUE, SAN FRANCISCO, CA 94103-4813 UNITED STATES OF AMERICA
PCT International Classification Number G10L21/02; H03G3/20; G10L21/00
PCT International Application Number PCT/US2008/008592
PCT International Filing date 2008-07-11
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 60/959,463 2007-07-13 U.S.A.