|Title of Invention||
AN APPARATUS AND METHOD FOR DETERMINING A POSITION IN A FILM (110)
|Abstract||The invention relates to an apparatus for determining a position in a film (110) having advance perforations (116), images (112) and sound information (114) applied in a time sequence, the sound information being applied on an analog or digital sound track (114) on the film, comprising: a memory (320) for storing a reference audio fingerprint representation of the sound information (114), wherein the reference audio fingerprint representation is an audio fingerprint representation of the sound information, wherein the reference audio fingerprint representation is generated based on methods of feature extraction and is formed so that a temporal development of the audio fingerprint representation depends on a temporal development of the sound information, and wherein a time scale is associated with a stored reference audio fingerprint representation; a means (340) for receiving a portion of the sound information from the analog or digital sound track of the film, wherein the portion of the sound information is read from the analog or digital sound track (114) on the film (110); a means (350) for extracting a test audio fingerprint representation from the read portion; a means (360) for comparing the test audio fingerprint representation with the reference audio fingerprint representation, to determine the position in the film (110) on the basis of the comparison and the time scale; and a means (180) for determining a control signal for a film event system for a film event at the position of the film.|
Description
The present invention relates to an apparatus and a method for determining a position in a film having film information applied in a time sequence, to synchronize, for example, film events with image reproduction.
Audio video data are stored on data carriers, i.e. film or tape, or transmission channels, i.e. radio or telephone, in a fixed format, which does not allow an extension by novel audio formats or other synchronous or image-synchronous, respectively, supplementary services, such as subtitles. Thus, with the introduction of, for example, new audio formats, new data carriers or film copies, respectively, have to be produced, which have the new audio formats.
Fig. 8 shows an exemplary film 110. Film information, such as video information or images 112, respectively, which are also referred to as "frames" or "video frames", and audio information or a plurality of analog or digital soundtracks 114, which have "audio frames" in the digital case, is applied in spatial sequence or, during replay, in time sequence, respectively. Further, the film 110 has, for example, advance perforations 116, with the help of which the film is played.
Basically, two methods are known for synchronizing supplements.
The first method comprises storing a time code on the data carrier, such as with DTS (digital theatre system) for cinema sound or in an additional channel connected to the audio signal. Examples thereof are ancillary data in DAB and MP3. The time code is used to replay sound or additional information, respectively, synchronously from an external data carrier, for example a CD with DTS. However, it is a disadvantage of this method that every additional format requires further space on the data carrier or transmission channel, respectively, which might not be available. With film, these are for example the tracks for analog sound, Dolby Digital, DTS and SDDS (Sony Dynamic Digital Sound). However, proprietary formats prevent the utilization of the time code of one extension by other extensions. Mutual interferences of the extensions cannot always be avoided; one example is the usage of ancillary data in MP3 for additional information and bandwidth extension from different manufacturers.
The second method is based on the improper use of analog soundtracks for storing time code, such as it is used for example in a prototype cinema equipped with an IOSONO system. However, it is a disadvantage of this method that the analog track exists in all systems and is often used as fallback solution during interferences of the other systems, which means a misuse of the analog track prevents the fallback possibility. Automatic switching to the analog track, which is installed in most cinemas, leads to the fact that the time code is replayed as analog signal when no signal is present on the "modern" tracks for Dolby Digital or DTS, respectively. Thus, in the prototype cinema, during a pure wave-field synthesis reproduction, which will be discussed below, the redundant analog reproduction has to be switched off manually, because otherwise the time code can be heard via the redundant further loudspeakers.
The acoustic wave-field synthesis, WFS for short, goes beyond the surround approaches of the formats Dolby, SDDS or DTS. In WFS, an attempt is made to reproduce the air vibrations of a real situation, which constitute sound, across a whole room. In contrast to conventional reproduction across two or more loudspeakers, where the mapping of the position of the original sound source is limited to a line between the loudspeakers, the wave-field synthesis is to transmit the whole sound field true to the original to the room. This means that the virtual sound sources can be exactly spatially localized, and even seem to exist within the room, and thus can be encircled. Systems with up to /100 loudspeakers in cinema systems and up to 900 loudspeakers in theater sound systems have already been realized.
Wave-field synthesis is based on Huygens' principle, which says that every point on a wave front can be seen as starting point for an elementary spherical wave. By interference of all elementary waves, a new wave front occurs, which is identical to the original wave.
Such a sound system has been developed by Fraunhofer Institute for Digital Media Technology under the name IOSONO and is used in the cinema of Ilmenau.
Thus, the cinema of Ilmenau is mentioned as a practical example, where the wave-field synthesis is operated in two modes.
In the first mode, the cinema is operated as "real" wave-field synthesis system, wherein the time code is stored on the analog track of the 35 mm film, such as has been discussed above with regard to the second "improper" method, where the WFS sound is played from an external medium, such as hard disk or DVD.
In the second mode, "compatible reproduction", the sound stored on every 35 mm film is read out and decoded by a Dolby processor (alternatively, DTS or SDDS, respectively, could be used), wherein the Dolby processor, if necessary, switches automatically to the analog track and maps the occurring multi-channel signal via WFS to virtual
Since different signal paths are required for both modes, a division of the signal coming from the read head for the analog signal is required, which causes additional technological effort.
Thus, in summary, it can be said that there is no room on current spools of cinema film to attach a further synchronization track, such as for external sound systems or subtitle systems. All cinema sound systems available up to now, analog and digital, obtain their soundtrack either directly via one or a plurality of soundtracks on the spool of film or by a manufacturer-specific time code signal on the spool of film. This means that for both known approaches, as explained above, new copies of the films have to be produced, usually with significant costs. Audio formats like Dolby Digital and SDDS do allow modern audio experiences, but still have no time code for the synchronization of, for example, subtitles or foreign-language versions of the film sound recording.
Hence, Frank Jordan and Jesper Dannow, in their publication "Generating Timecode Information from Analog Sources", 118th Convention, Audio Engineering Society, of May 28 to 31, 2005, in Barcelona, Spain, Convention Paper 6473, propose generating a time code on the basis of the analog sound track. The publication describes a system with the designation "Soundtitles", which is attached to the analog sound track of the projector. Based on an edited, digital copy of the sound track and the analog signal of the film projector, time information or a time code is determined by cross-correlation. The system "Soundtitles" consists of three components. The core module "Sync Tracker" generates the time code signal. The second module, the "Sync Player", generates subtitles projected with a beamer, for example. The third module, the "Clip Player", plays synchronized audio clips transmitted to the cinema visitor via wireless headphones.
A disadvantage of the previously described prior art is that the synchronization and time determination within the film, as described in the publication, is limited to a certain window of 1 minute, for example. It is especially in the initial phase of the film that it is difficult, however, to define or determine the right window for successful synchronization. If the portion read or sampled from the film is not within the portion of the stored film information used for the synchronization, the synchronization remains unsuccessful or a wrong synchronization takes place. In this case, the cinema visitor will hear no sound or the wrong sound for the film.
It is the object of the present invention to provide an efficient concept to determine a position in a film.
This object is achieved by an apparatus for determining a position in a film according to claim 1, a method for determining a position in a film according to claim 20, and by a computer program according to claim 21.
The present invention is based on the finding that each position of a film generally comprises film information specific for this position, so that in a feature extraction different positions of a film comprise different, specific manifestations of the features. In other words, different positions in a film comprise different "fingerprints". These fingerprints may in turn be used to determine a position in a film.
According to the invention, there is provided an apparatus for determining a position in a film having film information applied in a time sequence, comprising: a memory for storing a reference fingerprint representation (FAD, FAD = Fingerabdruckdarstellung = fingerprint representation) of the film information, wherein the fingerprint representation is formed so that a temporal course of the fingerprint representation depends on a temporal course of the film information, wherein a time scale is associated with a stored reference fingerprint representation, a means for receiving a portion read from the film, a means for extracting a test fingerprint representation from the read portion, and a means for comparing the test fingerprint representation with the reference fingerprint representation, to determine the position in the film on the basis of the comparison and the time scale.
The apparatus and the method for determining a position in a film enable determining any position In a film at any time, without having to prepare or change the film itself. The relevant time information, the time scale, is stored together with a stored version of the film. Here, the film is stored in form of a reference fingerprint representation, which corresponds to a feature extraction. Hence, the memory space required and also the computation power and/or the duration for determining the position can be reduced. Preferred embodiments further have the advantage of enabling unique determination of the position with suitable choice of the fingerprint representation.
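The fingerprint idea can be illustrated with a minimal sketch. It assumes a deliberately simple feature (mean absolute amplitude per frame), which the description does not prescribe; the function names `fingerprint` and `locate`, the frame length, the sample rate and the synthetic signal are all hypothetical and chosen only for illustration.

```python
import math

def fingerprint(samples, frame_len):
    """Reduce a signal to one feature per frame.

    Assumed feature: mean absolute amplitude of the frame.
    """
    n_frames = len(samples) // frame_len
    return [
        sum(abs(x) for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def locate(test_fp, ref_fp):
    """Return the frame offset in ref_fp where test_fp fits best (least squared error)."""
    best_offset, best_err = 0, float("inf")
    for offset in range(len(ref_fp) - len(test_fp) + 1):
        err = sum((t - r) ** 2 for t, r in zip(test_fp, ref_fp[offset:]))
        if err < best_err:
            best_offset, best_err = offset, err
    return best_offset

# Synthetic "film sound" whose loudness changes non-periodically over time.
rate = 1000        # samples per second (assumed)
frame_len = 100    # 0.1 s per fingerprint frame (assumed)
reference = [math.sin(2e-6 * n * n) * math.sin(0.3 * n) for n in range(20 * rate)]

# Reference fingerprint representation with its associated time scale (seconds).
ref_fp = fingerprint(reference, frame_len)
time_scale = [i * frame_len / rate for i in range(len(ref_fp))]

# Portion read from the film, starting 7.3 s into the reference.
start = 7300
test_fp = fingerprint(reference[start:start + 2 * rate], frame_len)
position_s = time_scale[locate(test_fp, ref_fp)]
```

The reference fingerprint here holds 200 values instead of 20,000 samples, which is the kind of memory and computation saving the paragraph above refers to. A practical feature would have to be robust against noise and replay-speed variation, which this toy feature is not.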
The apparatus and method for determining a position in a film may, for example, be employed in an apparatus for generating a control signal for a film event system, which synchronizes film events with image reproduction. Examples for film events are the audio sound, subtitles and special effects, wherein special effects may include e.g. airflow, shaking the cinema chairs, smells or light effects on side and back walls. Here, with regard to the audio result, both different languages, such as simultaneous playing of the original version and translations into other languages, as well as different audio techniques are possible, such as the synchronization of digital surround systems, like the wave-field synthesis. Here, the apparatus or the method for determining a position especially serve for synchronization of a starting phase of the film, but also effect higher tolerance, for example, to jumps during the film, so as to guarantee optimum synchronization and/or determination of a position in a film even under adverse conditions.
Even when the above-described and following examples talk of a cinemagoer or a film, the invention is not limited to cinema films for cinemagoers, but also relates generally to films or audio-visual signals, respectively, regardless whether these are film information stored on films or other data carriers and memory media, such as magnetic tapes or hard drives. Additionally, the invention can also be used for pure sound systems without video, or, for example, it can be used for the synchronization of pure video material, i.e. without sound, via video-ID, with arbitrary events.
Preferred embodiments of the present invention will be discussed below with reference to the accompanying drawings. They show:
Fig. 1 a basic block diagram of a preferred embodiment of an apparatus for generating a control signal for a film event system;
Fig. 2a a basic block diagram of an embodiment of an apparatus for performing a correlation;
Fig. 2b a basic block diagram of a preferred embodiment of an apparatus for performing a correlation;
Fig. 2c.1 an exemplary section of a film;
Fig. 2c.2 an exemplary curve of a sound signal of the section of the film illustrated in Fig. 2c.1 with a variable first replay speed and a constant test sample rate;
Fig. 2c.3 an exemplary curve of a sound signal of the section of the film illustrated in Fig. 2c.1 with a variable second replay speed and a constant test sample rate;
Fig. 2c.4 an exemplary curve of a sound signal of the section of the film illustrated in Fig. 2c.1 with a variable third replay speed and a constant test sample rate;
Fig. 2d.1 two exemplary sections of a film;
Fig. 2d.2 an exemplary curve of a reference sound signal of the film;
Fig. 2d.3 an exemplary curve of a test sound signal based on a first replay speed and a constant test sample rate for a section of the film;
Fig. 2d.4 an exemplary first correlation result from the correlation of the reference sound signal according to Fig. 2d.2 and the test sound signal according to Fig. 2d.3;
Fig. 2d.5 two exemplary sections of a film according to Fig. 2d.1;
Fig. 2d.6 an exemplary curve of a reference sound signal of the film according to Fig. 2d.2;
Fig. 2d.7 an exemplary curve of the test sound signal based on a second replay speed and a constant test sample rate for a section of the film;
Fig. 2d.8 an exemplary second correlation result from the correlation of the reference sound signal according to Fig. 2d.6 and the test sound signal according to Fig. 2d.7;
Fig. 3a a basic block diagram of a preferred embodiment of an apparatus for determining a position in the film based on a fingerprint representation;
Fig. 3b.1 two sections of a film;
Fig. 3b.2 an exemplary curve of the reference sound signal for the two sections according to Fig. 3b.1;
Fig. 4 a basic block diagram of a preferred embodiment of an apparatus for determining a position in the film based on a coarse and a subsequent fine determination of the position;
Fig. 5a a basic block diagram of a preferred embodiment of an apparatus for generating a control signal for a film event system;
Fig. 5b.1 two sections of a film;
Fig. 5b.2 an exemplary curve of a reference sound signal for a first section of the film;
Fig. 5b.3 an exemplary curve of a test sound signal for a second section of the film;
Fig. 5b.4 an exemplary correlation result from the correlation of the reference sound signal according to Fig. 5b.2 and the test sound signal according to Fig. 5b.3;
Fig. 6a a basic block diagram of an exemplary film projection system with an apparatus for generating a control signal for a film event system and a film event system;
Fig. 6b a basic block diagram of an exemplary film projection system with an apparatus for generating a control signal with an exemplary audio film event system;
Fig. 7 a schematic representation of an exemplary association of a time scale to a piece of film information;
Fig. 8 a schematic representation of an exemplary film with applied film information.
In the following description of the invention or the preferred embodiments, respectively, the same reference numbers are used for similar or equal elements.
In the following, the invention will be discussed in more detail with regard to embodiments which use the sound signal applied to the film as film information. However, this is not to limit the invention but only serves for illustration purposes.
Fig. 1 shows a basic block diagram of an apparatus for generating a control signal for a film event system and an exemplary film 110, as has been explained above with regard to Fig. 8, wherein the apparatus for generating a control signal comprises a means 120 for storing the film information, a means 140 for receiving a section read from the film, a means 160 for comparing the read section with the stored film information 112, 114 and a means 180 for determining the control signal based on the comparison and the time scale.
The stored film information 112, 114 comprises, for example, the sound or audio signals, respectively, the images or video signals, respectively, or also labels that can currently be found on films, and which determine, for example, where the aperture opens or from when on sound is played or when the film stops, respectively. The stored audio and/or video signals are, for example, in digitized form, preferably in compressed form to reduce memory requirements.
An advantage of the digitized storage is the simple and particularly error-free reproducibility of the stored image of the film information.
In contrast to conventional systems, the film remains unchanged, as described above; a stored image of the film information is generated only once, e.g. when producing the film.
When replaying the film via a film replay device, such as a film projector, for example, the sound signal contained on the soundtrack 114 is received by the means 140 for receiving and edited for the means 160 for comparing, i.e. sampled, for example, with a given sample rate and passed on as a section of a given length or a given number of samples, respectively.
The means 160 is formed to compare this section read from the film with the stored film information, wherein the means 160 for comparing can be formed to compare the read section with the entire stored information, preferably, however, to compare the read section with a section of the stored film information to minimize the computing effort. The comparison can be made, for example, via cross-correlation but also via calculating the difference, e.g. by calculating a compressed hash sum and searching the same in a database. The comparison can be based on the sound signal alone, the video signal alone, a comparison of the sound signal and the video signal as well as a combination with an evaluation of the above-mentioned features. Based on the result of the comparison of the means 160 for comparing and the time scale, the means 180 for determining determines the control signal 190. A film event system is controlled via the control signal 190, which generates, for example, WFS sound signals or subtitles based on the control signal 190 time-synchronously to the replayed film 110. Thereby, the apparatus for generating a control signal or specifically the means 180 for determining the control signal can be formed such that the control signal is any time code format, proprietary or standardized, such as the LTC time code format (LTC = longitudinal time code) standardized according to SMPTE (Society of Motion Picture and Television Engineers).
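As an illustration of the comparison by cross-correlation and the subsequent derivation of a time value, the following minimal sketch slides a noisy test section over a stored reference and converts the best offset into a time label. All names, the sample rate of 200 Hz, the 24 frames per second and the random test data are assumptions; the `HH:MM:SS:FF` string is only a human-readable label and not an SMPTE LTC bit stream.

```python
import math
import random

def normalized_correlation(a, b):
    """Correlation coefficient of two equally long sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

def find_offset(test, reference):
    """Slide test over reference; return (best sample offset, correlation value there)."""
    best = (0, -1.0)
    for offset in range(len(reference) - len(test) + 1):
        c = normalized_correlation(test, reference[offset:offset + len(test)])
        if c > best[1]:
            best = (offset, c)
    return best

def to_timecode(seconds, fps=24):
    """Format a time as HH:MM:SS:FF (the frame rate of 24 is an assumption)."""
    total_frames = int(round(seconds * fps))
    ff = total_frames % fps
    s = total_frames // fps
    return f"{s // 3600:02d}:{s // 60 % 60:02d}:{s % 60:02d}:{ff:02d}"

random.seed(0)
sample_rate = 200                                          # Hz (assumed)
reference = [random.uniform(-1, 1) for _ in range(2000)]   # stored film sound, 10 s
true_offset = 1250
# Section read from the film: a noisy copy of 0.5 s of the reference.
test = [x + random.gauss(0, 0.1) for x in reference[true_offset:true_offset + 100]]

offset, peak = find_offset(test, reference)
timecode = to_timecode(offset / sample_rate)   # time on the time scale as a label
```

Restricting `reference` to a section of the stored film information, as the text suggests, shortens the loop in `find_offset` proportionally.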
Time-synchronous means that the film event system generates, based on the control signal 190, a simultaneous event, corresponding to the time on the time scale of a position of the film just replayed, to which a time on the time scale is associated in the stored film information.
Thereby, differing from the explained embodiment, instead of the film projector, any film replay device can be used; any film formats, such as silent films (e.g. with synchronization based on video information), films with analog or digital soundtrack, one soundtrack or several parallel soundtracks can be used; or, as an alternative to a film, any other memory media can be used, such as tapes or hard drives, whose format cannot or must not be changed, for example to remain compatible with the film replay device in future, to which, however, other film events are to be synchronized at the same time.
In a preferred embodiment, the sound signal is used as film information for the synchronization. Thereby, the section read from the film is sampled with a given sample rate, which will be referred to below as test sample rate, to generate a test sound signal, and the stored film information is stored in digital form, wherein the stored film information will be referred to below as reference sound signal, and the test sound signal and the reference sound signal are compared in the means 160 for comparing via cross-correlation.
In one embodiment, the test signal sample rate and the reference signal sample rate are invariable, i.e. constant. The means 160 for comparing can, for example, be formed to generate a first correlation result at a first time based on a first test sound signal and a first reference sound signal, to determine a first time of the time scale, and to generate a second correlation result at a second time based on a second test sound signal and a second reference sound signal to determine a second time of the time scale, for determining, for example, a time difference or replay speed, respectively, or for determining a speed difference in comparison with a target or reference replay speed. Based thereon, the means 180 for determining determines the control signal for synchronizing, for example, the film event system.
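How two successive position determinations yield a replay speed can be sketched as follows. The function name, the use of wall-clock timestamps for the two measurements and the example numbers are assumptions made for illustration.

```python
def estimate_replay_speed(t_scale_1, t_scale_2, t_wall_1, t_wall_2):
    """Relative replay speed: elapsed time on the time scale per elapsed real time."""
    elapsed_wall = t_wall_2 - t_wall_1
    if elapsed_wall <= 0:
        raise ValueError("second measurement must be later than the first")
    return (t_scale_2 - t_scale_1) / elapsed_wall

# First correlation result: time 120.0 s on the time scale, measured at 0.0 s.
# Second correlation result: time 130.4 s on the time scale, measured at 10.0 s.
speed = estimate_replay_speed(120.0, 130.4, 0.0, 10.0)
speed_difference = speed - 1.0   # deviation from a nominal relative speed of 1.0
```

In this example the film advances 10.4 s of content in 10 s of real time, so it runs about 4 % faster than nominal; a film event system could use this difference to adjust its own playback rate.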
However, it is a disadvantage of a constant sample rate that the correlation result decreases with varying test replay speed, so that the accuracy of determining the time or position in the film, and thus the quality of the synchronization, decreases. This disadvantage can be compensated by varying the sample rates, which means the test sample rate and/or the reference sample rate.
Fig. 2a shows a basic block diagram of an apparatus for performing a correlation between a test sound signal that can be played with a variable replay speed, and a reference sound signal, which is a digitally stored version of the test sound signal, wherein the apparatus for performing a correlation comprises a means 210 for determining a measure for a test replay speed, a means 230 for varying a test sample rate or a reference sample rate and a means 250 for comparing. The means 230 is formed to vary a test sample rate, by which the test sound signal 270 is sampled, to generate the modified test sound signal 272, or to vary a reference sample rate to generate a modified reference sound signal 276 based on the reference sound signal 274.
Further, the means 230 for varying is formed to vary the test sample rate or a reference sample rate such that a deviation between a test replay speed associated to the test sound signal 270 and a reference replay speed associated to the modified reference sound signal 276 is reduced, or that a deviation between a test replay speed associated to the modified test sound signal 272 and a reference replay speed associated to the reference sound signal 274 is reduced, or that a deviation between a test replay speed associated to the modified test sound signal 272 and a reference replay speed associated to the modified reference sound signal 276 is reduced, wherein the term replay speed or the problem of a variable replay speed, respectively, will be discussed below in more detail.
The means 250 for comparing the modified test sound signal 272 and the reference sound signal 274, or the test sound signal 270 and the modified reference sound signal 276, or the modified test sound signal 272 and the modified reference sound signal 276, is formed to determine a result 278 of the correlation.
The embodiment of the apparatus for performing a correlation shown in Fig. 2a can, for example, be used as a means 160 for comparing in an apparatus for generating a control signal for a film event system, such as shown, for example, in Fig. 1.
Fig. 2b shows a basic block diagram of a preferred embodiment of an apparatus for performing a correlation between a test sound signal and a reference sound signal.
Fig. 2b shows a means 280 for storing a reference sound signal 274, which is a digital version of the test sound signal 270, wherein the reference sound signal 274 has been generated once based on a given memory reference replay speed and memory reference sample rate.
The test sound signal is replayed with a variable test replay speed and sampled with a test sample rate to generate the test sound signal 270.
The means 210 for determining the measure for the test replay speed of the test sound signal 270 controls the means 230 for varying based on the measure for the test replay speed. The means 230 for varying controls a reference sample rate converter 232 and a variable sampler 234, wherein the sample rate converter 232 is formed to convert a reference sound signal based on the memory reference replay speed and the memory reference sample rate into a modified reference sound signal 276 corresponding to a reference sound signal based on a different memory reference replay speed and/or memory reference sample rate, and wherein the variable sampler 234 is formed to sample the test sound signal with a varied sample rate differing from the standard or basic sample rate, to generate a modified test sound signal 272.
Differing from Fig. 2b, the apparatus for performing a correlation can also be formed such that the test sound signal 270 is always supplied to the means 250 for comparing via the variable sampler 234, wherein the variable sampler 234 is then formed such that one of the variable test sample rates corresponds to the standard or basic sample rate, and is further formed such that the reference sound signal 274 is always supplied to the means 250 for comparing via the reference sample rate converter 232, wherein the reference sample rate converter 232 is formed such that it passes the reference sound signal 274 in an unmodified way to the means 250 for comparing with respective control by the means 230 for varying.
The representation, selected in Fig. 2b, of the separate supply of the test sound signal 270 compared to the modified test sound signal 272 and of the reference sound signal 274 compared to the modified reference sound signal 276 to the means 250 for comparing serves to illustrate the alternative embodiments or realization possibilities.
Thus, for example, in one embodiment, where the means 250 for comparing is formed to compare the modified test sound signal 272 with the non-modified reference sound signal 274, no reference sample rate converter 232 is required, or the apparatus for performing a correlation according to Fig. 2b has no reference sample rate converter 232, respectively. In the same way, an apparatus with a means 250 for comparing which is formed to compare the unmodified test sound signal 270 to the modified reference sound signal 276 has no variable sampler 234.
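The effect of adapting a sample rate to the replay speed can be sketched in a few lines. The example below assumes that the test replay speed is already known (here 5 % too fast) and uses plain linear interpolation as a stand-in for the variable sampler 234; the signal model `film_sound`, the function names and all numeric values are hypothetical.

```python
import math

def resample_linear(samples, ratio):
    """Sample-rate conversion by linear interpolation.

    ratio = new_rate / old_rate; ratio > 1 yields more samples per film section.
    """
    n_out = int((len(samples) - 1) * ratio) + 1
    out = []
    for k in range(n_out):
        pos = k / ratio                          # position in input sample units
        i = min(int(pos), len(samples) - 2)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out

def correlation(a, b):
    """Correlation coefficient over the common length of a and b."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def film_sound(position):
    """Hypothetical sound amplitude as a function of the position on the film."""
    return math.sin(0.05 * position) + 0.5 * math.sin(0.173 * position)

v_ref, v_test = 1.0, 1.05      # film length units per sample period (assumed)
reference = [film_sound(k * v_ref) for k in range(2000)]
test = [film_sound(k * v_test) for k in range(2000)]    # film ran 5 % fast

before = correlation(test, reference)
# Raising the test sample rate by v_test / v_ref makes f / v equal for both signals.
modified_test = resample_linear(test, v_test / v_ref)
after = correlation(modified_test, reference)
```

With the mismatched speed the two signals drift apart and the correlation collapses, while the modified test sound signal lines up with the reference almost perfectly. Converting the reference instead of the test signal, as done by the sample rate converter 232, would be the mirror-image operation with the inverse ratio.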
In a further embodiment, the means 280 for storing is a means for storing film information, wherein a time scale is associated to the stored film information, and the test sound signal 270 is, for example, a film sound signal. The apparatus for performing a correlation according to Fig. 2b can then, for example, be used as means 160 for comparing according to Fig. 1.
Fig. 2c.1 shows a section of an exemplary film 110 with a soundtrack 114 as described above in Fig. 1. In Fig. 2c.1, two positions of the film 110 are indicated, a first position, further referred to as position L1, and a second position, further referred to as position L2. The two positions L1 and L2 define a section on the film 110 having a length of ΔL = L2 - L1.
Fig. 2c.2 shows an exemplary curve of the test sound signal associated to the section between the positions L1 and L2 described in Fig. 2c.1, wherein further the time when the position L1 of the film is played is referred to as time T1, and the time when the position L2 of the film is played is referred to as time T2. The time period ΔT = T2 - T1 depends on the length of the respective section and the replay speed v of the film. The following applies:
ΔT = ΔL / v or
T2 - T1 = (L2 - L1) / v, respectively.
When sampling the test sound signal with the sample rate f = 1 / Δt, wherein Δt is the sample period and ΔT = n · Δt, the test sound signal can be illustrated as a sequence of n + 1 samples, as indicated exemplarily in Fig. 2c.2 with n = 10.
When replaying the film with a replay speed v and a sample rate f = 1 / Δt, the section of the film between L1 and L2 or T1 and T2, respectively, is divided, for example, into n time periods, or represented by n + 1 samples, respectively. The following applies:
n = ΔL / (Δt · v) or
n = ΔL · f / v, respectively.
This means the number of sample periods or samples, respectively, for a given section of the film ΔL is proportional to the sample rate f or inversely proportional to the sample period Δt, respectively, and inversely proportional to the replay speed v. In other words, in a section of constant length ΔL, the quotient "f / v" or the product "Δt · v", respectively, has to be constant when n or the number of samples n + 1 is to be constant.
In that case, if the first sample is equal, the individual samples are also equal under the above-mentioned condition.
Correspondingly, when generating the stored film information or the reference sound signal, respectively, with a memory sample rate f_memory and a memory replay speed v_memory, the stored section of the film information or the reference sound signal, respectively, is represented, for example, by n_memory + 1 reference samples and stored.
For illustrating the facts, Figs. 2c.2 to 2c.4 show exemplary samplings or storages of the film section between the positions L1 and L2 for a constant sample rate f or a constant sample period Δt, respectively, and a variable replay speed, wherein Fig. 2c.2 shows an exemplary sampling or storing for a first replay speed v1, Fig. 2c.3 shows a sampling or storing of the same section of the film with a second replay speed v2, and Fig. 2c.4 shows sampling of the same section of the film for a third replay speed v3. Thereby, in this example, v1 is half the size of v2 and twice the size of v3: v1 = v2 / 2 and v1 = 2 · v3.
All three sound signals illustrated in Figs. 2c.2 to 2c.4 have the same sample at the position L1 or at the corresponding time T1, respectively. Thus, correspondingly, as illustrated exemplarily in Figs. 2c.2 to 2c.4, the stored image information or the reference sound signal in Fig. 2c.2 is represented by n1 + 1 = 11 samples, in Fig. 2c.3 the same section of the film is represented by n2 + 1 = 6 samples and in Fig. 2c.4 the same section of film is represented by n3 + 1 = 21 samples.
As can be seen in Figs. 2c.2 to 2c.4, with a constant sample rate, an increase of the replay speed v corresponds to a time compression of the sound signal, i.e. a doubling of the replay speed v1 of Fig. 2c.2 leads, as indicated in Fig. 2c.3, to halving T2 − T1 and n, and a reduction of the replay speed v corresponds to a time extension of the sound signal, i.e. halving the replay speed v1 of Fig. 2c.2 leads to doubling T2 − T1 and n, as indicated in Fig. 2c.4.
Figs. 2d.1 and 2d.2 correspond essentially to Figs. 2c.1 and 2c.2. Compared to Fig. 2c.1, Fig. 2d.1 shows two additional positions defining a search section or a search window with regard to the film and the film information applied thereon, wherein a first position of the search window is indicated by L0 and a second position of the search window is indicated by L3, wherein the section between the position L0 and the position L3 is greater than the section defined by positions L1 and L2, or ΔL_window > ΔL with ΔL_window = L3 − L0 and ΔL = L2 − L1 applies. Correspondingly, in Fig. 2d.2, additionally to Fig. 2c.2, the time T0 representing the time associated to the position L0 based on the given replay speed, and the time T3 representing the time associated to the position L3 based on the given sample replay speed were added.
In relation to the generation of the stored film information or the reference sound signal and additionally stored time scale, respectively, this means that T0 defines, for example, the time on the time scale associated to the position L0, the time T1 defines the time on the time scale associated to the position L1, the time T2 defines the time on the time scale associated to the position L2, and the time T3 defines the time on the time scale associated to the position L3 on the film.
Fig. 2d. 3 corresponds to Fig. 2c. 2.
In the following, with regard to Figs. 2d.2 to 2d.4, a
basic curve of a comparison of two signals via correlation
or the problem of a variable replay speed when
comparing two signals, respectively, will be exemplarily
represented and discussed.
Thereby, Fig. 2d.3 illustrates currently read film information applied to the film or the test sound signal 270, respectively, and Fig. 2d.2 stored film information, or a reference sound signal, respectively, wherein in an optimum case, which is represented by Fig. 2d.2 and Fig. 2d.3, the memory replay speed and the memory sample rate with which the reference sound signal has been generated correspond to the replay speed of the test sound signal and the sample rate of the test sound signal or, as mentioned above, the quotient of memory sample rate f_memory and memory replay speed v_memory corresponds to the quotient of sample rate f of the test sound signal and replay speed v of the test sound signal, respectively. In that case, the reference sound signal or a section of the reference sound signal defined by T1 and T2, respectively, can correspond exactly to the test sound signal representing the section between T1 and T2, more precisely, their sample sequences, and a definite local maximum or a correlation peak can be gained via correlation, as illustrated exemplarily in Fig.
The position of the peak indicates the time shift of the test sound signal in relation to the reference sound signal or the search window, respectively. Based thereon, the current time can be determined with regard to the stored time scale.
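The determination of the time shift from the correlation peak can be sketched as follows. This is a minimal Python illustration under stated assumptions: a plain, unnormalized correlation sum is used, and the function names, the sample values, the window start time and the sample period are hypothetical.

```python
def correlate_in_window(test, window):
    """Slide `test` across the search `window`; return (peak_offset, scores)."""
    scores = []
    for offset in range(len(window) - len(test) + 1):
        segment = window[offset:offset + len(test)]
        scores.append(sum(a * b for a, b in zip(test, segment)))
    peak_offset = max(range(len(scores)), key=scores.__getitem__)
    return peak_offset, scores


def time_on_scale(window_start_time, peak_offset, sample_period):
    """Map the peak offset (in samples) to a time on the stored time scale."""
    return window_start_time + peak_offset * sample_period


# Hypothetical reference samples of the search window (T0 to T3).
search_window = [0, 1, -2, 3, 5, -4, 2, 0, 1, -1]
# Hypothetical test sound signal: the section starting three samples after T0.
test_signal = search_window[3:6]

peak_offset, scores = correlate_in_window(test_signal, search_window)
current_time = time_on_scale(100.0, peak_offset, 0.5)  # assumed T0 = 100.0, Δt = 0.5
```

In this assumed example the peak lies three sample periods after the start of the search window, so the current time on the time scale is T0 plus three sample periods.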
In contrast to Figs. 2d.1 to 2d.4, Figs. 2d.5 to 2d.8 show an example where the replay speed of the test sound signal indicated in Fig. 2d.7 is reduced compared to the replay speed of the test sound signal as indicated in Fig. 2d.2.
Fig. 2d.5 corresponds to Fig. 2d.1. Fig. 2d.6 corresponds to Fig. 2d.2, that means Fig. 2d.6 represents an exemplary curve of a reference sound signal based on a memory sample rate f_memory and a memory sample speed v_memory. Fig. 2d.7 shows an exemplary curve or an exemplary sample of the test sound signal, based on a test sample rate f unaltered in relation to Fig. 2d.3 or Fig. 2d.6, respectively, but an altered, reduced replay speed v' of the test sound signal.
Relating to a time period ΔT under consideration, this means that in the same time period ΔT with reduced speed v' only a smaller section or a section of less length ΔL' according to ΔL' = v' · ΔT of the film is replayed, so that relating to the just played film after the time period ΔT only one position L'2 prior to the position L2 is reached, as illustrated in Fig. 2d.5. Relating to the reference sound signal and the time scale associated thereto, the time T'2 of the time scale is associated to the position L'2, as indicated in Fig. 2d.7.
Relating to the individual samples of the test sound signal, this means that the "spatial" curve of the test sound signal predetermined by the soundtrack of the film is invariable, so that with lower replay speed v' a sample period Δt corresponds to a spatial sample section Δl', which is smaller than Δl, so that, as indicated in Fig. 2d.7 compared to Fig. 2d.6, the samples of the test sound signal "migrate" towards the left with regard to the "spatial" signal curve.
In the opposite case, where the altered replay speed v' is greater than the memory replay speed v_memory, the opposite occurs: in the same time period Δt a longer spatial section Δl is played, so that the samples of the test sound signal "migrate" towards the "right" on the "spatial" curve of the test sound signal.
Thus, with an altered replay speed, regardless of whether it is higher or lower than the memory replay speed, the result of the comparison deteriorates, since even with otherwise optimum conditions, the test sound signal and the reference sound signal reproduce two different spatial sections of the film. The result of the comparison becomes worse the more the memory replay speed deviates from the test replay speed. When comparing by correlation, the amount of the local maximum or peak decreases and the maximum itself becomes broader and flatter, so that the time determination with regard to the time scale becomes more and more inexact until it is no longer possible.
Under real conditions, the replay speed of the test sound signal varies, for example, not only between different film projectors but can also vary during a film. Thus, accurate retuning is essential to ensure synchronism during the whole film.
Thus, the apparatus for performing a correlation varies the sample rate of the test sound signal or the sample rate of the reference sound signal to minimize the adverse effect of a variable replay speed of the test sound signal as described above, according to the above-described condition that the quotient of sample rate and replay speed of the test sound signal and of the reference sound signal have to be the same in order to represent the same section of the film with the same samples.
In a digital reference sound signal that has been generated before with a memory sample rate, the change of replay speed is effected by sample rate conversion, wherein the stored reference sound signal 274 is, for example, correspondingly interpolated to generate a reference sound signal with the sample rate corresponding to the altered replay speed.
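The interpolation of the stored reference sound signal can be illustrated with a short Python sketch. Linear interpolation is assumed here only for simplicity; the text does not specify the interpolation method, and the function name and sample values are hypothetical.

```python
def resample_linear(samples, ratio):
    """Resample by `ratio` = new sample rate / old sample rate.

    Uses linear interpolation between neighbouring samples (an assumed,
    deliberately simple choice) and needs at least two input samples.
    """
    n_out = int((len(samples) - 1) * ratio) + 1
    out = []
    for k in range(n_out):
        pos = k / ratio                       # position in input samples
        i = min(int(pos), len(samples) - 2)   # left neighbour index
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


reference = list(range(11))                  # hypothetical 11 reference samples
stretched = resample_linear(reference, 2)    # matches a halved replay speed
compressed = resample_linear(reference, 0.5)  # matches a doubled replay speed
```

With the assumed 11 input samples, the two conversions yield 21 and 6 samples, which corresponds to the sample counts of Figs. 2c.4 and 2c.3.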
Figs. 2d.1 to 2d.8 represent simplified examples, where it has been assumed for clarity reasons that the memory replay speed v_memory corresponds to a normal or common replay speed of a player for generating a test sound signal. As explained above, however, the quotient of sample rate f and replay speed v is the quantity that has to be the same for the reference sound signal and the test sound signal, in order to be able to represent the same section of the film with the same samples. For example, when generating the reference sound signal, double replay speed can be used when the sample rate is doubled at the same time.
In an embodiment according to Fig. 2b the means 210 for determining can determine a measure for the test replay speed based on the result 278 of the correlation.
One approach is to use a single correlation result for the determination of a measure of the replay speed by comparing, for example, an amplitude of a peak with a given threshold to determine whether a deviation between a replay speed of a test sound signal and a reference sound signal lies within a given range.
In a preferred embodiment, at least two different reference sound signals based on different reference sample rates or corresponding to different reference replay speeds, respectively, are compared to the test sound signal, to compare the results of the correlation, for example, via quality evaluation, which is discussed in more detail with reference to Fig. 5, in order to determine from the same a most similar reference sound signal and thus a measure for the replay speed of the test sound signal based on the known sample rate and the known memory replay speed. Thereby, the different reference sound signals can be formed successively and compared to the test sound signal or can be formed and compared simultaneously.
A particularly preferred embodiment of the apparatus for performing a correlation generates three reference sound signals based on different reference sample rates, wherein the reference sound signal of the medium of the three sample rates is based on the reference sample rate of the
reference sound signal which had the best quality or
maximum match with the test sound signal, respectively, in
a previous comparison and wherein the two other reference
sound signals have each a reference sample rate, which is
higher or lower than the reference sample rate of the
medium reference sound signal or reference sample rate,
respectively. This is controlled by the means 230 for
varying based on an output signal of the means 210 for
determining the measure for the test replay speed. Thus, it
is ensured that the reference sample rate or the reference
replay speed of the reference sound signal, respectively,
is adapted to the replay speed or reference sample rate of the test sound signal, respectively.
Fig. 3a shows an exemplary film as illustrated in Fig. 8 and a basic block diagram of an apparatus for determining a position in the film.
The embodiment of the apparatus for determining a position in a film shown in Fig. 3a can, for example, be used in an apparatus for generating a control signal for a film event system, as shown, for example, in Fig. 1, as means 180 for determining the control signal.
The apparatus for determining a position in a film comprises a memory 320 for storing a reference fingerprint representation of the film information, wherein the fingerprint representation is formed such that a time curve of the fingerprint representation depends on a time curve of the film information, and wherein a time scale is associated to a stored reference fingerprint representation, a means 340 for receiving a section read from the film, a means 350 for extracting a test fingerprint representation from the read-in section and a means 360 for comparing the test fingerprint representation to the reference fingerprint representation to determine the position in the film based on the comparison and the time scale.
In a preferred embodiment, the fingerprint representation comprises a representation in form of a spectral flatness, wherein a time curve of the fingerprint representation comprises a time curve of the spectral flatness.
Fig. 3b.1 shows an exemplary film 110, as illustrated in Fig. 8. Thus, during playing the film with a given replay speed, for example, the time T100 of the time scale corresponds to a position L100 of the film, the time T103 of the time scale to a position L103, the time T110 of the time scale to a position L110 and the time T113 of the time scale to a position L113.
In the step of generating the reference fingerprint representation of the film information, in one embodiment, a fingerprint is determined for certain spatial or time portions of the film, respectively.
Fig. 3b.2 shows, for example, a first section comprising the section from the position L100 to L110, or T100 to T110, respectively, and a second section comprising the section from the position L103 to the position L113 or from the time T103 to the time T113, respectively. Based on these sections, a fingerprint associated to the section is generated based on, for example, spectral analysis, Fourier transformation or other methods of feature extraction. In a particularly preferred embodiment, the fingerprint comprises the spectral flatness, which is calculated from the curve of the power density spectrum, so that the value of the spectral flatness is determined for every section, and a sequence of spectral flatnesses results in dependence on the time curve of the film information, for example the sound signal, which is stored in the memory 320 with the associated time scale.
Sample rate, length or duration of the section, respectively, or the distance between two subsequent sections are determined according to the requirements, for example, with regard to uniqueness or accuracy of the determination of the position in the film. The longer the section, the clearer the specification of the feature in general; the higher the sample rate and/or the smaller the distance between two sections, the more accurately the position in the film can be determined. The higher the sample rate, the longer the sections and the lower the distances between the sections, the higher the memory requirement for the reference signal or the requirements of computing power for signal processing.
A significant advantage of the fingerprint representation in form of spectral flatness is its lower memory requirement compared to, for example, a complete storage of the power density spectrum for an equal section. Preferably, a curve or sequence of spectral flatnesses, respectively, is used as fingerprint for a section.
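A sequence of spectral flatness values for overlapping sections can be sketched in Python as follows. The sketch assumes the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum; the text itself only states that the value is calculated from the power density spectrum. The function names, the small constant `eps` and the test signals are hypothetical.

```python
import cmath
import math


def power_spectrum(block):
    """Power of the DFT bins 1 .. n/2 - 1 (DC and Nyquist bins are skipped)."""
    n = len(block)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(block))) ** 2
            for k in range(1, n // 2)]


def spectral_flatness(block, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (assumed definition)."""
    p = [v + eps for v in power_spectrum(block)]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic


def flatness_fingerprint(signal, section_length, hop):
    """One flatness value per section; sections overlap when hop < section_length."""
    return [spectral_flatness(signal[s:s + section_length])
            for s in range(0, len(signal) - section_length + 1, hop)]


impulse = [1.0] + [0.0] * 15                                  # flat spectrum
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]  # single spectral line
fingerprint = flatness_fingerprint(impulse + tone, section_length=16, hop=16)
```

Each section is reduced to a single number here, which illustrates the memory advantage over storing the complete power density spectrum of the section.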
Fig. 4a shows an exemplary film 110, as indicated in Fig. 8, as well as an apparatus for determining a position in a film having film information applied in a time sequence.
The embodiment of the apparatus for determining a position in a film shown in Fig. 4a can, for example, be used in an apparatus for generating a control signal for a film event system such as shown in Fig. 1, as a means 180 for determining the control signal.
The apparatus for determining a position has a memory 420 for storing film information applied to a film in time sequence, wherein a time scale is associated to the stored film information, a means 440 for receiving a section read from the film and a synchronization means 460, which is formed to compare a sequence of samples of the read section based on a first sample rate and a first search window of the stored film information to obtain a coarse result and to compare a sequence of samples of the read section based on a second sample rate and a second search window of the stored film information to obtain a fine result pointing to the position of the film, wherein a position of the second search window in the stored film information depends on the coarse result, and wherein the first search window is longer in time than the second search window and wherein further the first sample rate is lower than the second sample rate.
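The two-stage comparison can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not taken from the text: the lower first sample rate is obtained by keeping every `factor`-th sample without low-pass filtering, a plain correlation sum is used as comparison, and all names and sample values are hypothetical.

```python
def best_offset(test, reference, start, stop):
    """Offset in [start, stop) with the largest plain correlation sum."""
    def score(offset):
        segment = reference[offset:offset + len(test)]
        return sum(a * b for a, b in zip(test, segment))
    return max(range(start, stop), key=score)


def coarse_to_fine(test, reference, factor, margin):
    """Coarse search at a reduced rate over the long window, then fine search."""
    coarse_test = test[::factor]       # assumed decimation: every factor-th sample
    coarse_ref = reference[::factor]
    coarse = best_offset(coarse_test, coarse_ref,
                         0, len(coarse_ref) - len(coarse_test) + 1)
    # The short second search window is placed around the coarse result.
    low = max(0, coarse * factor - margin)
    high = min(len(reference) - len(test), coarse * factor + margin) + 1
    return best_offset(test, reference, low, high)


# Hypothetical stored film information: silence with one distinctive burst.
stored = [0] * 64
stored[40:48] = [1, 2, 3, 4, 4, 3, 2, 1]
read_section = stored[38:54]           # section read from the film at offset 38

position = coarse_to_fine(read_section, stored, factor=4, margin=3)
```

In this assumed example the coarse stage lands on offset 40 (a multiple of the decimation factor) and the fine stage corrects it to the true offset 38.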
Fig. 5a shows an exemplary film 110, as indicated in Fig. 8, as well as a preferred embodiment of an apparatus for generating a control signal for a film event system, which is formed to determine the control signal based on an analog soundtrack applied to the film or a section of the audio signal or test sound signal, respectively, read from the film, and a stored digital version of the test sound signal, referred to as reference sound signal below, to which a time scale is associated, by comparing the test sound signal and the reference sound signal via the time
Fig. 5a shows a preferred embodiment of an apparatus for generating a control signal for a film event system having a first film sound sampler 542, which is connected to a first A/D converter 544 (A/D = analog/digital), wherein the first A/D converter 544 is connected to a first feature extractor 552, to a first means 562 for correlation with a first reference sound signal based on a first sample rate, to a second means 564 for correlation with a second reference sound signal based on a second sample rate, and to a third means 566 for correlation with a third reference sound signal based on a third sample rate. An input of the first means 562 for correlation, an input of the second means 564 for correlation, and an input of the third means 566 for correlation are connected to an output of a sample rate converter (SRC) 232.
An output of the first means 562 for correlation, an output of the second means 564 for correlation and an output of the third means 566 for correlation are connected to an input of a first means 568 for quality evaluation. The means 568 for quality evaluation in turn is coupled to the sample rate converter 232 and a means 570 for sampler selection, wherein an output of the means 570 for sampler selection is connected to an input of a timer 582. The timer 582 in turn is connected to the stored soundtrack or a means 522 for storing the soundtrack, respectively, wherein an output of the means 522 for storing the soundtrack is connected to an input of the sample rate converter 232.
An output of the first feature extractor 552 is connected to an input of means 554 for comparing a feature having, for example, a feature classificator and a database of features, wherein an output of the means 554 for comparing a feature is connected to an input of the timer 582.
An output of the timer 582 is coupled to an input of a means 584 for time code generation, which has a time code database or is coupled to a time code database, wherein further an output of the means 584 for time code generation is connected to an input of means 586 for time code smoothing, wherein the means 586 for time code smoothing is further formed to output a time code 592, wherein further an output of the means 586 for time code smoothing is connected to an input of a word clock generator 588, which is further formed to output a word clock signal 594.
Optionally, the apparatus for generating a control signal for a film event system further has a second film sound sampler 542', which is connected to a second A/D converter 544', wherein the second A/D converter 544' is connected to a second feature extractor 552', to a fourth means 562' for correlation with a fourth reference sound signal based on a first sample rate, to a fifth means 564' for correlation with a fifth reference sound signal based on a second sample rate and to a sixth means 566' for correlation with a sixth reference sound signal based on the third sample rate.
An output of the fourth means 562' for correlation, an output of the fifth means 564' for correlation and an output of the sixth means 566' for correlation are connected to an input of a second means 568' for quality evaluation, wherein an output of the second means 568' for quality evaluation is connected to an offset compensation 569 and a further output to an input of the sample rate converter 232, and wherein further the means for offset compensation 569 is connected to the means 570 for sampler selection.
Thereby, the first film sound sampler 542, also referred to as main sampler, is positioned such that the apparatus for generating a control signal has enough time to synchronize. Thus, the first film sound sampler 542 provides a predelayed signal. At the time of synchronization, the correlation window width or width of the section of the test sound signal is added. Based on the perforations on the spool of film, the time difference for the predelay can be adjusted accurately. Three seconds are recommended as first basis.
Below, the mode of operation of the embodiment of the apparatus for generating a control signal for the film event system will be discussed in more detail, wherein the principle will be discussed based on the test sound signal generated by the first film sound sampler 542 or its signal processing chain, respectively, since the second optional signal processing chain or signal processing of the test sound signal generated by the second film sound sampler 542', respectively, corresponds to the first and thus merely the means 569 for offset compensation will be discussed in detail.
The first film sound sampler 542 reads the sound signal from the soundtrack of the film or samples the sound signal from the soundtrack of the film, respectively, and passes this signal on to the first A/D converter 544, wherein the first A/D converter 544 is formed to generate a digital audio signal or test sound signal based on the sample rate of the first film sound sampler 542 and the replay speed of the film from which the soundtrack or film information, respectively, is read.
Based on the test sound signal 270, one or a plurality of features is extracted or a test fingerprint representation is formed, respectively. For the feature extraction or fingerprint representation, respectively, for example the spectral flatness is used as feature or fingerprint, respectively. The test fingerprint representation is then compared to a reference fingerprint representation by the means 554 for comparing a feature or a fingerprint representation, respectively, wherein, as mentioned above, the fingerprint representation is formed such that a time curve of the fingerprint representation depends on a time curve of the film information, and wherein a time scale is associated to a reference fingerprint representation stored in the means 554 for comparing a feature, and the means 554 for comparing is formed to determine a position in the film or to generate a time code signal 554Z, respectively, based on the comparison of the test fingerprint representation with the reference fingerprint representation and the time scale.
Based on the stored reference sound signal 274, the sample rate converter generates the same signal with slightly different sample rates, i.e. modified reference sound signals for the correlations to be calculated in parallel. Thereby, the case that a modified reference sound signal has the same sample rate as the original reference sound signal is included, so that for the discussion of Fig. 5 below generally the term reference sound signals is used.
In other words, the sample rate converter 232 generates three reference sound signals 276 or modified reference sound signals 276, respectively, wherein a first reference sound signal is based on a first sample rate and supplied to the first means 562 for correlation, wherein a second reference sound signal 276 is based on a second sample rate and supplied to the second means 564 for correlation, and the third reference sound signal 276 is based on a third sample rate and supplied to a third means 566 for correlation. The sample rate converter 232 provides slightly stepped signals, with different sample rates, to the correlation or the means 562, 564, 566 for correlation, respectively, wherein the sample rate is always adjusted in dependence on the previously measured maximum peak to noise value from the correlation. One correlation receives the modified reference sound signal with this sample rate, a further correlation receives a slightly lower one, which is one step lower, and a further correlation receives a slightly higher stepped sample rate. Thereby, it is ensured that the sample rate converter can tune or synchronize, respectively, to a speed change of the analog sound signal.
The means 522 for storing the soundtrack and the sample rate converter 232 are preferably formed to use a window width of 2^n, to calculate large calculation windows via fast Fourier transformation (FFT) with little effort. More than three correlations can be calculated in parallel to compensate for sudden jumps in the soundtrack. The correlation window is selected large to obtain a significant correlation peak. To obtain the detection accuracy of the correlation peak in a sample or a sample period, respectively, oversampling of the input signal or test sound signal, respectively, can be used.
The means 522 for storing the soundtrack outputs the reference sound signal in the length of the correlation window in dependence on the supplied time code signals 582Z of the timer 582, wherein the correlation window is the search window wherein the test sound signal is searched.
The first means 568 for quality evaluation is formed to perform a maximum value search in the result of the crosscorrelation of the signals or the amounts of the signals, respectively, and to weight the quality of the result of the crosscorrelation depending on the height of the correlation peak compared to other peaks of the crosscorrelation or to determine the quality of every individual correlation with regard to the peak to noise distance, respectively.
Based on the quality evaluation, the reference sound signal with the best quality factor or quality, respectively, is determined, and based on the position of the peak of the reference sound signal with the best quality or quality factor, the shift of the peak in relation to the search window is determined, and, for example, output as time code difference between measured and actually valid time code or as relative time code.
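The quality evaluation can be illustrated with a short Python sketch. The peak to noise measure used here (absolute peak value divided by the mean absolute value of all other correlation values) is an assumption made for illustration; the text does not define the measure numerically. Function names and values are hypothetical.

```python
def peak_to_noise(scores):
    """Return (peak_position, quality) for one correlation result.

    Quality is the absolute peak divided by the mean absolute value of the
    remaining correlation values (an assumed definition).
    """
    peak_position = max(range(len(scores)), key=lambda i: abs(scores[i]))
    peak = abs(scores[peak_position])
    rest = [abs(s) for i, s in enumerate(scores) if i != peak_position]
    noise = sum(rest) / len(rest) if rest else 0.0
    quality = peak / noise if noise > 0 else float("inf")
    return peak_position, quality


def best_reference(results):
    """Pick the correlation result with the highest quality.

    `results` holds one list of correlation values per reference sound signal.
    Returns (index of the best reference sound signal, its peak position).
    """
    evaluated = [peak_to_noise(scores) for scores in results]
    best = max(range(len(evaluated)), key=lambda i: evaluated[i][1])
    return best, evaluated[best][0]


# Hypothetical correlation results for three reference sample rates.
results = [[1, 2, 1], [1, -1, 10, 1, -1], [3, 3, 3]]
best_index, peak_position = best_reference(results)
```

The peak position of the selected result is the shift in relation to the search window, from which the time code difference would then be derived.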
Depending on the result of the quality evaluation, the first means 568 for quality evaluation sends a control signal 568A to the sample rate converter 232, which, for example, differentiates only the three signal values "0", "+1", and "-1", wherein, for example, with "0", the sample rates of the last sample rate conversion or correlation, respectively, are maintained, because the correlation result from the modified reference sound signal with the medium sample rate has been determined as the one with the highest quality, wherein with "+1", the sample rates are increased by one step in relation to the last sample rate conversion or correlation, respectively, because the correlation result from the modified reference sound signal with the highest sample rate has been determined as the one with the highest quality, and with "-1", the sample rates are reduced by one step in relation to the previous sample rate conversion or correlation, respectively, since the correlation from the test sound signal and the modified reference sound signal with the lowest reference sample rate had the best correlation result or the best peak to noise distance, respectively.
In other words, depending on with which sample rate (first, second or third) the best correlation peak has been obtained, the sample rate of the sample rate converter is increased or decreased, e.g. by a sample rate delta value, or the converter is controlled such that it performs no sample rate conversion.
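The three-valued control of the sample rate converter can be sketched in Python as follows. The function names, the tie handling in favor of the medium sample rate and the numeric values are assumptions made for illustration.

```python
def control_signal(qualities):
    """Map the qualities (lowest, medium, highest sample rate) to -1, 0 or +1."""
    low, medium, high = qualities
    if medium >= low and medium >= high:
        return 0      # keep the current sample rates (assumed tie handling)
    return 1 if high > low else -1


def next_sample_rates(center_rate, step, signal):
    """Shift all three reference sample rates by `signal` steps."""
    center = center_rate + signal * step
    return (center - step, center, center + step)


# Hypothetical values: the highest sample rate gave the best correlation.
signal = control_signal((0.2, 0.5, 0.9))
rates = next_sample_rates(48000, 10, signal)
```

Under these assumptions the previously highest sample rate becomes the new medium sample rate for the next set of correlations.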
Thereby, the correlation serves for addressing two main aspects. First, the determination of the position in the film or determination of the time in the film, respectively, based on the time code difference from the correlation. Second, the determination of the measure for the replay speed to determine the optimum reference sample rate or optimum sample rate conversion of the reference sample rate, respectively. Here, the adaptation of the sample rates or the generation of adapted sample replay speeds, respectively, again allows improved correlation results and thus improves the time determination or determination of the position in the film, respectively, and thus improves synchronization and prediction.
A preferred embodiment according to Fig. 5 is formed to detect signal parts with certain characteristics via signal analysis to suppress them during synchronization and thus avoid wrong detections or synchronizations, respectively, or to avoid random variations of the time axis.
Such characteristics can, for example, be the loudness of the signal component or the "problems" of a signal, and the signal analysis or detection of problematic components can be based on SNR (signal to noise ratio), PNR (peak to noise), spectral power or power density spectrum, spectral flatness or averaging of a time sequence.
Below a threshold of the peak noise value or the peak noise
distance, the time code difference can, for example, be detected as invalid. Or if several peaks with similar peak noise distance are determined, the time code difference can also be detected as invalid.
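The two invalidity conditions can be sketched in Python as follows. The threshold and the similarity factor are hypothetical parameters; the text does not give numeric values.

```python
def time_code_difference_valid(peak_ratios, min_ratio, similarity=0.9):
    """Check the peak to noise values of the strongest correlation peaks.

    Invalid if the best peak is below `min_ratio`, or if a second peak reaches
    at least `similarity` times the best peak (both parameters are assumed).
    """
    ordered = sorted(peak_ratios, reverse=True)
    if not ordered or ordered[0] < min_ratio:
        return False      # peak too weak
    if len(ordered) > 1 and ordered[1] >= similarity * ordered[0]:
        return False      # ambiguous: several similar peaks
    return True


clear_peak = time_code_difference_valid([12.0, 3.0], min_ratio=5.0)
weak_peak = time_code_difference_valid([4.0], min_ratio=5.0)
ambiguous = time_code_difference_valid([12.0, 11.5], min_ratio=5.0)
```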
Further, the quality of correlations with quiet signal components, i.e. signal components with low amplitudes, is lower than the one of correlations with loud signals due to the comparatively higher quantization noise during digital sampling. Thus, quiet signal components are suppressed based on thresholds or adaptively, to avoid random variations of the time axis. Additionally, the signal energy can be a further quality feature.
A further example is suppression of signal components that are problematic because they are repetitive, to avoid ambiguities and thus, for example, wrong synchronization.
Problematic signal components or portions, respectively, can further be signalized as metadata, for example, to suppress these signal components independent of the quality of the current correlation.
The means 584 for time code generation is formed to convert the time code signal 582Z of the timer 582, which can, for example, be based on an internal or proprietary time code, for example into a standardized time code or a time code signal based on a standardized time code.
The timer 582 is controlled by an internal clock (interval or frequency of the correlations), a coarse audio ID fingerprint or fingerprint representation, for example the time code signal 554Z from the feature determination or fingerprint representation, and the determined correlation difference, for example the time code difference signal 570Z determined from the correlation of the means 570 for sampler selection. The timer has to perform a prioritization of correlation signal (highest priority), time code from feature determination and internal clock (lowest priority).
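The prioritization performed by the timer can be sketched in Python as follows. The function name, the use of `None` for a missing source and the example values are assumptions made for illustration.

```python
def select_time_code(correlation_code=None, feature_code=None, clock_code=None):
    """Return (source, time code) of the highest-priority available source.

    Priority: correlation, then feature determination, then internal clock.
    A source that currently delivers no value is passed as None (assumed).
    """
    candidates = (
        ("correlation", correlation_code),
        ("feature", feature_code),
        ("clock", clock_code),
    )
    for source, code in candidates:
        if code is not None:
            return source, code
    return None, None


with_correlation = select_time_code(12.48, 12.5, 12.6)
without_correlation = select_time_code(None, 12.5, 12.6)
clock_only = select_time_code(clock_code=12.6)
```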
The means 586 for time code smoothing is formed to smooth the time code signal 584Z, for example, to avoid a strongly jumping time code or to find useful intermediate values if there are no time codes from the correlation, to compensate, for example, breaks in the analog sound. The time code signal 592 generated by the means 586 for time code smoothing is preferably a standardized time code, by which the film event system is synchronized or controlled, respectively. However, the time code signal 592 can also be used to generate the corresponding sample clock via a slowly regulating phase locked loop (PLL), if the included sound replay system is digital. Such phase locked loops are available as complete devices and are thus no subject matter of the patent.
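One possible smoothing scheme can be sketched in Python as follows. The text does not specify the smoothing method; the sketch assumes a simple predictor that advances the time code by a fixed interval, blends it with each new measurement, and extrapolates when a measurement is missing. The function name and the parameters `interval` and `alpha` are hypothetical.

```python
def smooth_time_codes(measured, interval, alpha=0.5):
    """Smooth a sequence of time codes; None marks a missing measurement.

    Each step predicts `last + interval`; a measurement pulls the prediction
    towards it by the factor `alpha`, a missing one keeps the prediction.
    """
    smoothed = []
    last = None
    for value in measured:
        if last is None:
            last = value                      # wait for the first measurement
        else:
            predicted = last + interval
            if value is None:
                last = predicted              # bridge a break in the analog sound
            else:
                last = predicted + alpha * (value - predicted)
        smoothed.append(last)
    return smoothed


# Hypothetical time codes with one missing value and one jump.
smoothed = smooth_time_codes([0.0, 1.0, None, 3.5], interval=1.0)
```

In this assumed example the missing third value is replaced by the extrapolated value, and the jump in the fourth value is only followed halfway.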
Optionally, more than one film sampler, each with a different time offset from the projection lens, can be used for improving the robustness against damage to the film or for the synchronization of unsuitable portions, respectively.
A second film sound sampler 542' can then, for example, be used, since the second film sound sampler 542' already exists in conventional cinema systems. Breaks in the analog sound can here be bridged by the film sound samplers 542, 542' applied at different positions on the cinema film, since with short breaks in the film sound the probability increases that at least one sampler, either the first film sound sampler 542 or the second film sound sampler 542', provides enough signal for a correlation and the associated synchronization.
Further, optionally, different samplers, e.g. for analog sound, Dolby digital sound (including decoder), DTS digital sound (including DTS decoder) or a different sound, as well as a combination of the above-mentioned, can be used as reference soundtrack and/or test soundtrack.
Here, individual tracks can be used for the comparison by using averaging, majority decision or prioritization, automatically or via metadata of the generated time information, as well as a downmix to mono.
Generally, different samplers can be used for different sound formats and/or different film samplers with offsets differing in time.
The usage of a downmix to mono has the advantage that, when the mono track is used as stored soundtrack, less data needs to be stored compared to storing, for example, five channels.
The storage of several, i.e. more than one soundtrack, i.e. no downmix, means that all channels are stored independently of each other and that, then, for example, as discussed above, corresponding comparisons or majority decisions have to be performed to perform the synchronization by using a certain channel of the actual soundtrack and a corresponding channel of the stored soundtrack.
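A majority decision over independently correlated channels could, for example, look like the following sketch. The function name `majority_lag`, the representation of each channel result as an integer lag, and the rule that more than half of the delivering channels must agree are assumptions made for illustration only.

```python
from collections import Counter


def majority_lag(channel_lags):
    """Return the lag most channels agree on, or None without a clear majority.

    channel_lags: one integer lag per channel, None for channels that
    delivered no usable correlation result.
    """
    counts = Counter(lag for lag in channel_lags if lag is not None)
    if not counts:
        return None
    lag, votes = counts.most_common(1)[0]
    # Require agreement of more than half of the channels that delivered a result.
    if votes * 2 <= sum(counts.values()):
        return None
    return lag
```

Returning `None` when the channels disagree leaves it to the timer to fall back on a lower-priority time source.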
The initialization phase or first synchronization and the synchronization after a sound break, respectively, form two critical phases during film projection or during the synchronization of a film event system, respectively.
Thus, preferred embodiments calculate more than three parallel correlations in the beginning, since no synchronization has been performed yet. This means that more than three reference sound signals of different sample rates are compared or correlated, respectively, with the test sound signal to determine the correct sample rate or sample speed of the test sound signal as fast as possible. Here, different sample rates can be tested one after the other until one of the correlations has the best signal-to-noise ratio.
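The search over several reference sample rates can be sketched as follows. This is a simplified illustration, not the described apparatus: linear interpolation stands in for a proper sample rate converter, the normalized correlation peak stands in for the signal-to-noise criterion, and the names `resample`, `peak_correlation` and `best_speed` as well as the speed factors are hypothetical.

```python
import math
import random


def resample(signal, factor):
    """Linear-interpolation resampling: output sample k reads input position k * factor."""
    out = []
    last = len(signal) - 1
    k = 0
    while k * factor <= last:
        pos = k * factor
        i = int(pos)
        frac = pos - i
        nxt = signal[min(i + 1, last)]
        out.append(signal[i] * (1.0 - frac) + nxt * frac)
        k += 1
    return out


def peak_correlation(reference, test):
    """Return (best lag, normalized correlation peak) of test within reference."""
    best_lag, best_value = 0, -2.0
    norm_t = math.sqrt(sum(t * t for t in test))
    for lag in range(len(reference) - len(test) + 1):
        window = reference[lag:lag + len(test)]
        norm_w = math.sqrt(sum(w * w for w in window))
        if norm_w == 0.0 or norm_t == 0.0:
            continue
        value = sum(w * t for w, t in zip(window, test)) / (norm_w * norm_t)
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag, best_value


def best_speed(reference, test, factors):
    """Correlate the test signal against one modified reference per speed factor."""
    results = {f: peak_correlation(resample(reference, f), test) for f in factors}
    return max(results, key=lambda f: results[f][1])


# Synthetic usage: the "film" runs 4 % fast relative to the stored reference.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
test_signal = resample(reference, 1.04)[50:114]
chosen = best_speed(reference, test_signal, [0.96, 1.0, 1.04])
```

The loop over `factors` runs serially here; the parallel variant described in the text would evaluate the same correlations concurrently.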
Alternatively or additionally, the first feature extractor 552 and the means 554 for feature classification provide, together with the database, a coarse absolute time code value defining a coarse position in the film, in order to perform in a second step, for example by the correlation, a fine determination of the position of the film or a fine time code determination, respectively. As soon as the synchronization has been made, three correlations can be used to synchronize changes of the replay speed of the test sound signal during film projection.
The accuracy with which a position in a film or the time associated to the position, respectively, can be associated to the time scale (time code) depends on the sample rate of the reference sound signal and the sample rate of the test sound signal: the higher the sample rate, the more exactly the position in the film can be determined. However, a lower sample rate has the advantage that with the same number of samples a longer section of the reference sound signal or the test sound signal can be represented. Thus, a preferred embodiment is formed to perform in a first step a coarse determination of a position in a film by representing a longer section of the film by a reference sound signal with a lower sample rate, wherein the test sound signal is also gained by sampling with a lower sample rate. Then, in a second step, based on the coarse position in the film, a reference sound signal of higher sample rate and a test sound signal of higher sample rate are used for a fine determination of the position in the film.
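The two-step determination can be illustrated with the following sketch. It assumes, purely for illustration, that both signals are available as Python lists at the full sample rate, that block averaging is an acceptable stand-in for sampling at a lower rate, and that the fine step only needs a short window of `fine_length` samples; the helper names are hypothetical.

```python
import math
import random


def best_lag(reference, test, first, last):
    """Lag in [first, last] with the highest normalized correlation."""
    norm_t = math.sqrt(sum(t * t for t in test))
    best, best_value = first, -2.0
    for lag in range(max(first, 0), min(last, len(reference) - len(test)) + 1):
        window = reference[lag:lag + len(test)]
        norm_w = math.sqrt(sum(w * w for w in window))
        if norm_w == 0.0 or norm_t == 0.0:
            continue
        value = sum(w * t for w, t in zip(window, test)) / (norm_w * norm_t)
        if value > best_value:
            best, best_value = lag, value
    return best


def decimate(signal, factor):
    """Average blocks of `factor` samples (crude low-pass plus downsampling)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def coarse_to_fine(reference, test, factor, fine_length):
    """Coarse search at reduced rate, then fine search around the coarse hit."""
    coarse = best_lag(decimate(reference, factor), decimate(test, factor),
                      0, len(reference))
    centre = coarse * factor
    return best_lag(reference, test[:fine_length], centre - factor, centre + factor)


# Synthetic usage: a smoothed noise "soundtrack" and a test section cut at sample 203.
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(2016)]
reference = [sum(noise[i:i + 16]) / 16 for i in range(2000)]
test_signal = reference[203:203 + 512]
position = coarse_to_fine(reference, test_signal, 4, 64)
```

The coarse step only has to land within one decimation block of the true position, because the fine step searches `factor` samples to either side of it.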
In other words, the window length is adapted during correlation. At the beginning of the search, windows with a long duration but a reduced sample rate of the signals are used, but when the time has been found approximately and only has to be followed, short windows are used, even with oversampling of the signals to obtain a higher time resolution.
In the initialization phase, for example, a "compatible replay" of the "old" audio format can be performed until the exact position is determined.
In the same way, a "compatible replay" of the "old" audio format can be performed when the synchronization has been clearly lost, until the exact position is determined again.
The means 570 for sampler selection and the means 569 for offset compensation are only required in embodiments with more than one film sound sampler. Thus, for example, the means 570 for sampler selection decides whether the result or the time code difference 568Z of the first means 568 for quality evaluation, respectively, or the result or the time code difference 568Z' of the second means 568' for quality evaluation, respectively, is passed on to the timer 582 for determining a position in the film or a time code 582Z, respectively. Since the second film sound sampler 542' samples the test sound signal at a different position of the film, the difference (offset) between the position where the first film sound sampler 542 samples the film and the position where the second film sound sampler 542' samples the film is compensated by the means 569 for offset compensation, so that the timer 582 obtains the correct time code difference 570Z with regard to the last stored time or the last stored position of the film, respectively, stored in the timer, regardless of whether the time code difference 568Z or the time code difference 568Z' is selected.
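The interplay of sampler selection and offset compensation might be sketched as follows. The data structure `CorrelationResult`, the quality threshold `min_quality`, and the sign convention (the second sampler reads a film position that lies `second_offset` later than the first) are assumptions of this sketch and are not specified in the description.

```python
from typing import NamedTuple, Optional


class CorrelationResult(NamedTuple):
    time_code_difference: float  # relative to the timer's last stored time
    quality: float               # e.g. normalized correlation peak, higher is better


def select_difference(first: Optional[CorrelationResult],
                      second: Optional[CorrelationResult],
                      second_offset: float,
                      min_quality: float = 0.5) -> Optional[float]:
    """Return the offset-compensated time code difference of the better sampler."""
    candidates = []
    if first is not None and first.quality >= min_quality:
        candidates.append((first.quality, first.time_code_difference))
    if second is not None and second.quality >= min_quality:
        # Offset compensation: refer the second sampler's result to the
        # film position read by the first sampler.
        candidates.append((second.quality, second.time_code_difference - second_offset))
    if not candidates:
        return None
    return max(candidates)[1]
```

With this compensation the timer receives the same difference regardless of which sampler delivered it, which matches the intent stated above.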
Differing from the embodiment illustrated in Fig. 5a, the different reference sound signals of different reference sample rates can also be generated successively and compared or correlated, respectively, to the test sound signal to determine the measure for the replay speed of the test sound signal or the optimum reference sample rate, respectively. Alternatively, more than three modified reference sound signals can be compared to the test sound signal, in parallel or serially, to allow a fast synchronization not only in the initial phase but also to synchronize the film event system during film projection more quickly to the current position in the film after larger jumps in the film, e.g. caused by cuts or portions missing in the film.
Differing from the embodiment illustrated in Fig. 5, a synchronization of a film event system can also be performed based on the pictures applied to the film, both for an evaluation of features or fingerprints, respectively, and for a correlation of a test image signal with one or a plurality of reference image signals.
Thereby, as illustrated above, the correlation of audio and/or video signals can be used for determining the time base in an audio and/or video stream, and synchronous replay can be controlled due to this time determination.
Alternatively, the determination of an audio and/or video signature from the raw material, in the form of an audio ID/video ID (ID = identification), can be used for coarsely determining the time in a long AV stream to enable synchronization at any position.
The basic approach of the invention is to store the already existing analog sound again in digital form in order to synchronize onto the cinema film with the analog soundtrack via correlation and other feature determination. The output signal or control signal, respectively, of the apparatus for generating a control signal or the synchronization device, respectively, can be any time code format. Preferably, the SMPTE standardized LTC time code format is used. For every cinema film, during production, a dataset has to be generated for the apparatus for generating a control signal or for the synchronization device, respectively.
During production, a separate data carrier is generated for every cinema film for the above-described means for generating a control signal or a synchronization device, respectively. The data carrier contains the digitized analog soundtrack, e.g. in Dolby stereo format, as can be found on the spool of film, feature data to the soundtrack and matching time codes.
In the following, an exemplary determination of a time code difference is described with reference to Figs. 5b.1 to 5b.4.
Fig. 5b.1 shows an exemplary film 110 with a soundtrack 114 as already described in Fig. 8.
Based on the time code signal 582Z of the timer 582, a reference sound signal 274 is read out from the means 522 for storing a soundtrack, and a modified reference sound signal is generated according to Fig. 5b.2 via the apparatus for sample rate conversion 232, which represents a film section from the position L0 to the position L3 or from the time T0 associated to the position L0 or a corresponding time code to the time T3 or time code, respectively, associated to the position L3.
Fig. 5b.3 shows an exemplary test sound signal or section of a test sound signal, respectively, which is defined by the starting time T1 and the end time T2 and has been generated based on the sample rate f = 1/Δt.
Fig. 5b.4 shows the result of the correlation of the modified reference sound signal according to Fig. 5b.2 and the section of the test sound signal of Fig. 5b.3. The time difference ΔT = T1 - T0 between the starting time T0 of the search window or modified reference sound signal of Fig. 5b.2 and the time T1 of the search window or reference sound signal, respectively, is the time shift based on which the time code difference or the relative time code, respectively, is formed. Thereby, the time T1 is the time or the time shift of the test sound signal where a section of the reference sound signal, which is N = 11 samples long, maximally matches the test sound signal, or where a correlation of the reference sound signal and the test sound signal, which is N = 11 samples long, has a maximum as correlation result, respectively.
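The determination of the time shift from the correlation maximum can be sketched as below. The sample values are invented for illustration (a sequence of +1 and -1 so that every window has the same energy), the function name `correlation_lag` and the rate of 44100 Hz are assumptions, and a plain sliding dot product is used where a practical system would likely normalize the correlation.

```python
def correlation_lag(reference, test):
    """Return the lag n at which the sliding dot product of test and reference peaks."""
    scores = [sum(r * t for r, t in zip(reference[n:n + len(test)], test))
              for n in range(len(reference) - len(test) + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


# Search window (modified reference sound signal) of 20 samples.
reference = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1,
             1, -1, 1, 1, -1, -1, -1, 1, 1, -1]
# Test section of N = 11 samples, taken 3 samples after the window start.
test_signal = reference[3:14]

sample_rate = 44100                 # assumed sample rate f in Hz
lag = correlation_lag(reference, test_signal)
time_shift = lag / sample_rate      # time shift = lag * sample period, in seconds
```

Here `lag` plays the role of the value n and `time_shift` the role of ΔT in the text; only this difference, not the absolute time T0, has to be passed on.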
Thereby, knowledge of the absolute time T0 or the time T1 is not required for quality evaluation 568, since, for example, the timer 582 knows the last absolute time or absolute time code, respectively, and only requires the time code difference 570Z to determine the updated absolute time or time code, respectively. The difference can for example be illustrated from the position of the peak in relation to the time of the beginning of the search window. In Fig. 5b.4, the peak is, for example, the first sample, i.e. the test sound signal of Fig. 5b.3 is shifted by "3 · Δt" in relation to the reference sound signal of Fig. 5b.2, wherein Δt is the sample period corresponding to the modified sample rate.
Thus, the time code difference 570Z can consist, for example, of the value n = 3. Here, it is advantageous that the sample rate or replay speed of the reference sound signal, respectively, is adapted to the variable replay speed of the test sound signal: since Δt is also adapted to the replay speed, a more exact determination of the position of the film or of the offset in relation to the search window is possible compared to a fixed sample rate of the reference sound signal, since with a fixed rate only multiples of this sample rate are generated for a determination of the position in the film.
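How the timer might turn the value n into an updated absolute time is sketched below. The interpretation that the adapted sample period equals the replay speed factor divided by the sample rate of the test signal is an assumption of this sketch, as are the function name `updated_time` and the use of exact fractions.

```python
from fractions import Fraction


def updated_time(last_time, lag_samples, test_rate_hz, speed=Fraction(1)):
    """Film time after a correlation: last stored time plus lag times adapted period.

    speed: assumed replay speed relative to the nominal speed (1 = nominal).
    """
    adapted_period = Fraction(speed) / test_rate_hz   # film seconds per sample
    return last_time + lag_samples * adapted_period
```

For example, `updated_time(Fraction(10), 3, 1000)` adds three sample periods of one millisecond to a stored time of ten seconds, whereas a speed factor other than 1 scales each period accordingly instead of rounding to the fixed grid.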
Thereby, for example, the time T0 of the search window or reference sound signal, respectively, can be equal to T1 of the previous correlation since the film is only played
Fig. 6a shows an embodiment of a film system, wherein an apparatus 100 for generating a control signal 190 is coupled to a film event system 600. Thereby, the apparatus 100 for generating a control signal based on the film 110, shown in Fig. 8, generates the control signal 190, for example a time code, with which the film event system 600 is synchronized.
Fig. 6b shows a film system having an apparatus 100 for generating a control signal 190 and a wave-field synthesis system 610 as exemplary film event system, wherein the embodiment of the wave-field synthesis system 610 comprises a means 620 for controlling the wave-field synthesis system, a digital memory 622 for the wave-field synthesis audio signal and a plurality of loudspeakers 624 for the wave-field synthesis system. Based on the film 110 or an analog film soundtrack 114, respectively, the means 100 for generating a control signal generates the control signal 190 to enable a wave-field synthesis audio experience with an originally analogously soundtracked film in a lip synchronous way.
As an alternative to the wave-field synthesis system 610, other audio systems, for example digital audio systems or digital surround audio systems, can naturally be synchronized via the apparatus 100 for generating a control signal in a lip synchronous way.
Fig. 7 shows an exemplary film as illustrated in Fig. 8, an exemplary digitally stored reference sound signal 720 and an association of a time scale.
When generating the stored film information or the reference sound signal, respectively, the analog sound signal is sampled at a given replay speed and a given sample rate, for example 44.1 kHz, and sound portions of, for example, 10 ms are stored as a so-called audio frame, i.e. the digital reference sound signal is present as a sequence of audio frames on the memory. The association of a time scale can then, for example, consist in numbering the audio frames from 0 or 1 in an ascending way as time code or time scale, respectively (time code TC1 corresponds to audio frame AF1 in Fig. 7), or, for example, in using the starting time or end time of an audio frame as time code, such as for the first audio frame either 0 ms or 10 ms when an audio frame has a length of 10 ms.
Usually, time codes have formats like hour:minute:second:frame, wherein frame usually relates to video frames with, for example, 24 frames per second (cinema film).
Thus, the time scale or time code can associate several audio frames to one video frame or define an audio frame as smallest time scale unit.
Correspondingly, the time code or the time scale can, for example, associate four audio frames to one time code, see TC1' in Fig. 7, which comprises four audio frames AF1 to AF4, or associate a single audio frame to a time code, see TC1 in Fig. 7, to which one audio frame AF1 is associated. Thereby, depending on the audio format, the audio frames can also represent portions of the audio signal overlapping in time.
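The association between audio frames and a time code of the form hour:minute:second:frame can be illustrated as follows. The sketch assumes zero-based audio frame numbering, non-overlapping audio frames of 10 ms and 24 video frames per second, and the function names are hypothetical.

```python
def audio_frame_to_time_code(frame_index, frame_ms=10, fps=24):
    """Map a zero-based audio frame index to (hours, minutes, seconds, video frame)."""
    start_ms = frame_index * frame_ms          # starting time of the audio frame
    total_seconds, ms = divmod(start_ms, 1000)
    video_frame = ms * fps // 1000             # video frame within the second
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes, seconds, video_frame


def format_time_code(time_code):
    """Render a time code tuple as hour:minute:second:frame."""
    return "%02d:%02d:%02d:%02d" % time_code
```

With these assumed values one video frame lasts about 41.7 ms, so roughly four audio frames fall into each video frame, which corresponds to the case in which several audio frames are associated to one time code.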
The control signal 190 can, for example, be formed as time code, but also as a sequence of pulses, wherein, for example, every pulse corresponds to a time scale unit, and the film event system accumulates the pulses similar to a relative time code to synchronize with the film.
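The pulse-based variant of the control signal could be modelled on the receiving side roughly as follows. The class name `PulseCounter` and the time scale unit of 10 ms per pulse are assumptions chosen only for this sketch.

```python
class PulseCounter:
    """Accumulates control pulses into a relative position, one unit per pulse."""

    def __init__(self, start_units=0, unit_ms=10):
        self.units = start_units   # accumulated time scale units
        self.unit_ms = unit_ms     # assumed duration of one unit in milliseconds

    def pulse(self, count=1):
        """Register one or more received pulses."""
        self.units += count

    def position_ms(self):
        """Current relative position in milliseconds."""
        return self.units * self.unit_ms
```

Because such a counter is only relative, the film event system would still need a known starting point, for example an initial absolute time code, before the accumulated pulses describe a position in the film.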
A further embodiment offers the approach to embed a watermark into the audio and/or video signal in order to still have, for example, an analog sound signal as fallback, but to realize at the same time a time code for synchronous additional services. It is an advantage of this approach that, even with "difficult" audio signals, e.g. very quiet sequences or even similar "monotonous" sounds, a clean clock recovery is possible. For this variation, basically, the full set of relevant watermarked claims is useful, particularly in the area of searching for the correct clock rate or the readjustment of the sample rate, respectively. The decisive disadvantage of this approach is, however, that the actual film is altered or a new version or copy of the film has to be made, respectively, in order to be able to embed the watermarks into the audio and/or video signal.
Depending on the circumstances, the inventive method can be implemented in hardware or in software. The implementation can be made on a digital storage medium, particularly a disc or CD with electronically readable control signals, which can interact with a programmable computer system such that the method is performed. Generally, the invention thus also consists in a computer program product with a program code, stored on a machine-readable carrier, for performing the inventive method when the computer program product runs on a computer. In other words, the invention can be realized as a computer program with a program code for performing the method when the computer program runs on a computer.
1. Apparatus for determining a position in a film (110) having advance perforations (116), images (112) and sound information (114) applied in a time sequence, the sound information being applied on an analog or digital sound track (114) on the film, comprising: a memory (320) for storing a reference audio fingerprint representation of the sound information (114), wherein the reference audio fingerprint representation is an audio fingerprint representation of the sound information, wherein the reference audio fingerprint representation is generated based on methods of the feature extraction and is formed so that a temporal development of the audio fingerprint representation depends on a temporal development of the sound information, and wherein a time scale is associated with a stored reference audio fingerprint representation;
a means (340) for receiving a portion of the sound information from the analog or digital sound track of the film, wherein the portion of the sound information is read from the analog or digital sound track (114) on the film (110);
a means (350) for extracting a test audio fingerprint representation from the read portion;
a means (360) for comparing the test audio fingerprint representation with the reference audio fingerprint representation, to determine the position in the film (110) on the basis of the comparison and the time scale; and
means (180) for determining a control signal for a film event system for a film event at the position of the film.
2. Apparatus as claimed in claim 1, wherein the means (350) for extracting is formed to calculate successive spectral flatness values for successive sections of audio data as the audio fingerprint representation, so that a temporal development of the audio fingerprint representation includes a temporal development of the spectral flatness.
3. Apparatus as claimed in claim 1 or 2, comprising a further means for receiving a portion read from the film, wherein the portion is different from the portion received by the means (340) for receiving a portion of the sound information.
4. Method for determining a position in a film (110) having advance perforations (116), images (112) and sound information (114) applied in a time sequence, the sound information (114) being applied on an analog or digital sound track (114) on the film, comprising:
receiving a portion of the sound information from the analog or digital sound track, which portion is read from the film (110);
extracting a test audio fingerprint representation from the read portion; and
comparing the test audio fingerprint representation with a stored reference audio fingerprint representation, wherein the reference audio
fingerprint representation is an audio fingerprint representation of the sound information, wherein the audio fingerprint representation is generated based on methods of the feature extraction and is formed so that a temporal development of the audio fingerprint representation depends on a temporal source of the sound information (114), and wherein a time scale is associated with the stored reference audio fingerprint representation, to determine the position in the film (110) on the basis of the comparison and the time scale; and
determining a control signal for a film event system for a film event at the position of the film.
|Indian Patent Application Number||4951/KOLNP/2007|
|PG Journal Number||49/2013|
|Date of Filing||20-Dec-2007|
|Name of Patentee||FRAUNHOFER-GESELLSCHAFT ZUR FORDERUNG DER ANGEWANDTEN FORSCHUNG E.V.|
|Applicant Address||HANSASTRASSE 27C, 80686 MUNICH|
|PCT International Classification Number||G11B 27/10|
|PCT International Application Number||PCT/EP2006/005553|
|PCT International Filing date||2006-06-09|