Title of Invention	"METHOD AND SYSTEM FOR PERFORMING EVENT DETECTION AND OBJECT TRACKING IN IMAGE STREAMS"
Abstract	Method and system for performing event detection and object tracking in image streams by installing in field a set of image acquisition devices, where each device includes a local programmable processor for converting the acquired image stream, which consists of one or more images, to a digital format, and a local encoder for generating features from the image stream. These features are parameters that are related to attributes of objects in the image stream. The encoder also transmits a feature stream whenever the motion features exceed a corresponding threshold. Each image acquisition device is connected to a data network through a corresponding data communication channel. An image processing server that determines the threshold and processes the feature stream is also connected to the data network. Whenever the server receives features from a local encoder through its corresponding data communication channel and the data network, the server provides indications regarding events in the image streams by processing the feature stream and transmitting these indications to an operator.

Full Text	The present invention relates to a method and a system for performing event detection and object tracking in image streams.

Field of the Invention

The present invention relates to the field of video processing. More particularly, the invention relates to a method and system for obtaining meaningful knowledge, in real time, from a plurality of concurrent compressed image sequences, by effective processing of a large number of concurrent incoming image sequences and/or features derived from the acquired images.

Background of the Invention

Many efforts have been made to improve the ability to extract meaningful data from images captured by video and still cameras. Such abilities are used in several applications, such as consumer, industrial, medical, and business applications. Many cameras are deployed in streets, airports, schools, banks, offices, and residences as standard security measures. These cameras are used either for allowing an operator to remotely view security events in real time, or for recording and analyzing a security event at some later time.

The introduction of new technologies is shifting the video surveillance industry in new directions that significantly enhance the functionality of such systems. Several processing algorithms are used both for real-time and offline applications. These algorithms are implemented on a range of platforms, from pure software to pure hardware, depending on the application. However, these platforms are usually designed to simultaneously process a relatively small number of incoming image sequences, due to the substantial computational resources required for image processing. In addition, most of the common image processing systems are designed to process only uncompressed image data, such as the system disclosed in U.S. Patent 6,188,381.
Modern networked video environments require efficient processing of a large number of compressed video streams, collected from a plurality of image sources. Increasing operational demands, as well as cost constraints, have created the need for automation of event detection. Such event detection solutions provide a higher detection level, save manpower, replace other types of sensors and lower false alarm rates. Although conventional solutions are available for automatic intruder detection, license plate identification, facial recognition, traffic violation detection and other image-based applications, they usually support few simultaneous video sources, using expensive hardware platforms that require field installation, which implies high installation, maintenance and upgrade costs.

Conventional surveillance systems employ digital video networking technology and automatic event detection. Digital video networking is enabled by the development of digital video compression technology and the availability of IP-based networks. Compression standards such as MPEG-4 and similar formats allow transmitting high quality images over a relatively narrow bandwidth.

A major limiting factor when using digital video networking is the bandwidth requirement. Because it is too expensive to transmit from all the cameras all the time, networks are designed to concurrently transmit data from only a few cameras. The transmission of data only from cameras that are capturing important events at any given moment is crucial for establishing an efficient and cost-effective digital video network. Automatic video-based event detection technology becomes effective for this purpose. This technology consists of a series of algorithms that are able to analyze the camera image in real time and provide notification of a special event, if it occurs. Currently available event-detection solutions use conventional image processing methods, which require heavy processing resources.
Furthermore, they allocate a fixed processing power (usually one processor) to each camera input. Therefore, such systems either provide poor performance due to resource limitations or are extremely expensive. As a result, the needs of large digital surveillance installations, namely reliable detection, effective bandwidth usage, flexible event definition, large-scale design and cost, cannot be met by any of the current automatic event detection solutions.

Video Motion Detection (VMD) methods are disclosed, for example, in U.S. Patent 6,349,114, in WO 02/37429, in U.S. Patent Application Publication 2002,041,626, in U.S. Patent Application Publication No. 2002,054,210, in WO 01/63937, in EP1107609, in EP3020, in U.S. Patent 6,384,862, in U.S. Patent 6,188,381, in U.S. Patent 6,130,707, and in U.S. Patent 6,069,655. However, none of the methods described above has yet provided a satisfactory solution to the problem of effectively obtaining meaningful knowledge, in real time, from a plurality of concurrent image sequences.

It is an object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, in real time. It is another object of the present invention to provide a method and system for obtaining meaningful knowledge from a plurality of concurrent image sequences, which are cost effective. It is a further object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, with a reduced amount of bandwidth resources. It is still another object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, which are reliable, having high sensitivity in noisy environments.
It is yet another object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, with reduced installation and maintenance costs. Other objects and advantages of the invention will become apparent as the description proceeds.

Summary of the Invention

While these specifications discuss primarily video cameras, a person skilled in the art will recognize that the invention extends to any appropriate image source, such as still cameras, computer generated images, prerecorded video data, and the like, and that image sources should be equivalently considered. Similarly, the terms video and video stream should be construed broadly to include video sequences, still pictures, computer generated graphics, or any other sequence of images provided or converted to an electronic format that may be processed by a computer.

The present invention is directed to a method for performing event detection and object tracking in image streams. A set of image acquisition devices is installed in field, such that each device comprises a local programmable processor for converting the acquired image stream, which consists of one or more images, to a digital format, and a local encoder for generating features from the image stream. The features are parameters that are related to attributes of objects in the image stream. Each device transmits a feature stream whenever the number and type of features exceed a corresponding threshold. Each image acquisition device is connected to a data network through a corresponding data communication channel. An image processing server connected to the data network determines the threshold and processes the feature stream.
Whenever the server receives features from a local encoder through its corresponding data communication channel and the data network, the server obtains indications regarding events in the image streams by processing the feature stream and transmitting the indications to an operator.

The local encoder may be a composite encoder, which is a local encoder that further comprises circuitry for compressing the image stream. The composite encoder may operate in a first mode, during which it generates and transmits the features to the server, and in a second mode, during which it transmits to the server, in addition to the features, at least a portion of the image stream at a desired compression level, according to commands sent from the server. Preferably, each composite encoder is controlled by a command sent from the server to operate in its first mode. As long as the server receives features from a composite encoder, that composite encoder is controlled by a command sent from the server to operate in its second mode. The server obtains indications regarding events in the image streams by processing the feature stream, and transmitting the indications and/or their corresponding image streams to an operator.

Whenever desired, one or more compressed image streams containing events are decoded by the operator station, and the decoded image streams are transmitted to the display of an operator for viewing. Compressed image streams obtained while their local encoder operates in its second mode may be recorded.

Preferably, additional image processing resources in the server are dynamically allocated to data communication channels that receive image streams. Feature streams obtained while operating in the first mode may comprise only a portion of the image.

A graphical polygon that encompasses an object of interest, being within the frame of an image or an AOI (Area Of Interest) in the image, may be generated by the server and displayed to an operator for viewing.
In addition, the server may generate and display a graphical trace indicating the history of movement of an object of interest, being within the frame of an image or an AOI in the image.

The image stream may be selected from the group of images that comprises video streams, still images, computer generated images, pre-recorded digital or analog video data, and video streams compressed using MPEG format. The encoder may use a different resolution and frame rate during operation in each mode.

Preferably, the features may include motion features, color, portions of the image, edge data and frequency related information.

The server may perform, using a feature stream received from the local encoder of at least one image acquisition device, one or more of the following operations and/or any combination thereof:
- License Plate Recognition (LPR);
- Facial Recognition (FR);
- detection of traffic rules violations;
- behavior recognition;
- fire detection;
- traffic flow detection;
- smoke detection.

The present invention is also directed to a system for performing event detection and object tracking in image streams, that comprises:
a) a set of image acquisition devices, installed in field, each of which includes:
a.1) a local programmable processor for converting the acquired image stream to a digital format;
a.2) a local encoder, for generating, from the image stream, features, being parameters related to attributes of objects in the image stream, and for transmitting a feature stream, whenever the motion features exceed a corresponding threshold;
b) a data network, to which each image acquisition device is connected through a corresponding data communication channel; and
c) an image processing server connected to the data network, the server being capable of determining the threshold, of obtaining indications regarding events in the image streams by processing the feature stream, and of transmitting the indications to an operator.
The system may further comprise an operator display, for receiving and displaying one or more image streams that contain events, as well as a network video recorder for recording one or more image streams, obtained while their local encoder operates in its first mode.

Statement of the Invention

Accordingly, the present invention relates to a method for performing event detection and object tracking in image streams, comprising:
a) installing a set of image acquisition devices (CAM 1 - CAM n), each in field where an event is to be detected, each comprising a local image processor programmed to convert at least one image stream into a digital image format;
b) connecting each image acquisition device to a data network (101) through a corresponding data communication channel;
c) installing and connecting a central image processing server (102) to said data network (101);
the method being characterized by:
d) providing an encoder (Encoder 1 - Encoder n) to each image acquisition device for analyzing said digital image stream so as to generate features, the features being parameters related to attributes of objects in said image stream; and transmitting a feature stream to the central server, whenever said features exceed a predetermined threshold;
e) whenever said central image processing server (102) receives features from the local encoder (Encoder 1 - Encoder n) through its corresponding data communication channel and said data network (101), obtaining indications regarding events in said image streams by processing, by the server (102), said feature stream, and transmitting said indications to an operator.

A system for performing event detection and object tracking in image streams, comprising:
(a) a set of image acquisition devices, installed in field, each of which includes:
a.1) a local programmable processor for converting the acquired image stream to a digital format;
a.2) a local encoder, for generating, from said image stream, features, being parameters
related to attributes of objects in said image stream, and for transmitting a feature stream, whenever said motion features exceed a corresponding threshold;
(b) a data network, to which each image acquisition device is connected through a corresponding data communication channel; and
(c) an image processing server connected to said data network, said server being capable of determining said threshold, of obtaining indications regarding events in said image streams by processing said feature stream, and of transmitting said indications to an operator.

Brief Description of the Drawings

The above and other characteristics and advantages of the invention will be better understood through the following illustrative and non-limitative detailed description of preferred embodiments thereof, with reference to the appended drawings, wherein:
Fig. 1 schematically illustrates the structure of a surveillance system that comprises a plurality of cameras connected to a data network, according to a preferred embodiment of the invention;
Fig. 2 illustrates the use of AOIs (Areas of Interest) for designating areas where event detection will be performed and for reducing the usage of system resources, according to a preferred embodiment of the invention; and
Figs. 3A to 3C illustrate the generation of an object of interest and its motion trace, according to a preferred embodiment of the invention.

Detailed Description of Preferred Embodiments

A significant saving in system resources can be achieved by applying the novel data reduction techniques proposed by the present invention. In a situation where thousands of cameras are connected to a single server, only a small number of the cameras actually acquire important events that should be analyzed. A large-scale system can function properly only if it has the capability of identifying the inputs that may contain useful information and performing further processing only on such inputs.
Such a filtering mechanism requires minimal processing and bandwidth resources, so that it is possible to apply it concurrently on a large number of image streams. The present invention proposes such a filtering mechanism, called Massively Concurrent Image Processing (MCIP) technology, that is based on the analysis of incoming image sequences and/or feature streams derived from the acquired images, so as to fulfill the need for automatic image detection capabilities in a large-scale digital video network environment.

MCIP technology combines diverse technologies such as large scale data reduction, effective server design and optimized image processing algorithms, thereby offering a platform that is mainly directed to the security market and is not rivaled by conventional solutions, particularly with vast numbers of potential users. MCIP is a networked solution for event detection in distributed installations, which is designed for large scale digital video surveillance networks that concurrently support thousands of camera inputs, distributed in an arbitrarily large geographical area and with real time performance. MCIP employs a unique feature transmission method that consumes narrow bandwidth, while maintaining high sensitivity and probability of detection. MCIP is a server-based solution that is compatible with modern monitoring and digital video recording systems and carries out complex detection algorithms, reduces field maintenance and provides improved scalability, high availability, low cost per channel and backup utilities. The same system concurrently provides multiple applications such as VMD, LPR and FR. In addition, different detection applications may be associated with the same camera.

MCIP is composed of a server platform with various applications, camera encoders (either internal or external to the camera), a Network Video Recorder (NVR) and an operator station. The server contains a computer that includes proprietary hardware and software components.
MCIP is based on the distribution of image processing algorithms between low-level feature extraction, which is performed by the encoders located in field (i.e., in the vicinity of a camera), and high-level processing applications, which are performed by a remote central server that collects and analyzes these features. The MCIP system described hereafter not only solves the bandwidth problem but also reduces the load on the server, uses a unique type of data stream (not a digital video stream), and performs an effective process for detecting events in real time in a large scale video surveillance environment.

A major element in MCIP is data reduction, which is achieved by the distribution of the image processing algorithms. Since all the video sources that require event detection transmit concurrently, the required network bandwidth is reduced by generating a reduced bandwidth feature stream in the vicinity of each camera. In order to detect and track moving objects in digitally transmitted video sources by analyzing the transmitted reduced bandwidth feature stream, there is no need to transmit full video streams, but only partial data, which contains information regarding moving objects. By doing so, a significantly smaller data bandwidth is used, which reduces the demands for both the network bandwidth and the event detection processing power. Furthermore, if only the shape, size, direction of movement and velocity should be detected, there is no need to transmit data regarding their intensity or color, and thus a further bandwidth reduction is achieved.

Another bandwidth optimization may be achieved if the encoder on the transmitting side filters out all motions which are under a motion threshold determined by the remote central server. Such a threshold may be the AC level of a moving object, motion distance or any combination thereof, and may be determined and changed dynamically, according to the attributes of the image, such as resolution, AOI, compression level, etc.
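
The encoder-side filtering described above can be sketched as follows. This is a minimal illustration and not the encoder's actual implementation: the `MotionFeature` record, its fields, and the `filter_features` function are hypothetical names, and the threshold is assumed here to be a minimum motion distance in pixels supplied by the central server.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass(frozen=True)
class MotionFeature:
    """Hypothetical per-block motion feature produced by an encoder."""
    x: int      # block position in the frame (pixels)
    y: int
    dx: float   # motion vector components (pixels per frame)
    dy: float


def filter_features(features: List[MotionFeature],
                    min_distance: float) -> List[MotionFeature]:
    """Keep only features whose motion distance reaches the threshold.

    `min_distance` stands in for the server-determined motion threshold
    (an assumption); motion below it is not transmitted.
    """
    return [f for f in features if hypot(f.dx, f.dy) >= min_distance]


# Usage: one small jitter and one clear movement.
features = [MotionFeature(10, 10, 0.2, 0.1), MotionFeature(40, 32, 3.0, 4.0)]
kept = filter_features(features, min_distance=1.0)
```

Because the threshold is a plain parameter, the server could change it at any time, which matches the dynamic adjustment described in the text.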
Moving objects which are under the threshold are considered either as noise or as non-interesting motions.

One method for extracting features at the encoder side is by slightly modifying and degrading existing [...]-based video compressors which were originally designed to [...] video. The features may also be generated by a specific feature extraction algorithm (such as any motion vector generating algorithm) that is not related to the video compression algorithm. When working in this reduced bandwidth mode, the output streams of these encoders are definitely not a video stream, and therefore cannot be used by any receiving party to produce video images.

Fig. 1 schematically illustrates the structure of a surveillance system that comprises a plurality of cameras connected to a data network, according to a preferred embodiment of the invention. The system 100 comprises n image sources (in this example, n cameras, CAM1, ..., CAMn), each of which is connected to a digital encoder ENCj, for converting the images acquired by CAMj to a compressed digital format. Each digital encoder ENCj is connected to a digital data network 101 at point pj and is capable of transmitting data, which may be a reduced bandwidth feature stream or a full compressed video stream, through its corresponding channel Cj. The data network 101 collects the data transmitted from all channels and forwards them to the MCIP server 102, through data-bus 103. MCIP server 102 processes the data received from each channel and controls one or more cameras, which transmit any combination of the reduced bandwidth feature stream and the full compressed video stream, which can be analyzed by MCIP server 102 in real time, or recorded by NVR 104 and analyzed by MCIP server 102 later. An operator station 105 is also connected to MCIP server 102, for real time monitoring of selected full compressed video streams. Operator station 105 can manually control the operation of MCIP server 102, whenever desired.

The MCIP
(Massively Concurrent Image Processing) server is connected to the image sources (depicted as cameras in the drawing, but which may also be any image source, such as taped video, still cameras, video cameras, computer generated images or graphics, and the like) through data-bus 103 and network 101, and receives features or images in a compressed format. In the broadest sense, the network is any type of network, wired or wireless. The images can be compressed using any type of compression. Practically, IP-based networks are used, as well as compression schemes that use DCT, VideoLAN Client (VLC, which is a highly portable multimedia player for various audio and video formats as well as Digital Versatile Discs (DVDs), Video Compact Discs (VCDs), and various streaming protocols, disclosed in WO 01/63937) and motion estimation techniques such as MPEG.

The system 100 uses an optional load-balancing module that allows it to easily scale the number of inputs that can be processed and also creates the ability to remove a single point of failure, by creating backup MCIP servers. The system 100 also has a configuration component that is used for defining the type of processing that should be performed for each input and the destination of the processing results. The destination can be another computer, an email address, a monitoring application, or any other device that is able to receive textual and/or visual messages.

The system can optionally be connected to an external database to assist image processing. For example, a database of suspect or stolen cars, or of license plate numbers, can be used for identifying vehicles.

Fig. 2 illustrates the use of AOIs (Areas of Interest) for reducing the usage of system resources, according to a preferred embodiment of the invention. An AOI is a polygon (in this figure, a hexagon) that encloses the area where detection will occur. The rectangles indicate the estimated object size at various distances from the camera.
In this example, the scene of interest comprises detecting the movement of a person in a field (shown in the first rectangle). The AOI may be used in the filtering unit to decide if further processing is required. In this case, the filtering unit examines the feature data. The feature stream is analyzed to determine if enough significant features lie within the AOI. If the number of features that are located inside the AOI and comprise changes exceeds the threshold, then this frame is designated as possibly containing an event and is transferred for further processing. Otherwise, the frame is dropped and no further processing is performed.

The MCIP server receives the reduced bandwidth feature stream (such a feature stream is not a video stream at all, and hence no viewable image can be reconstructed from it) from the video sources which require event detection. When an event is detected within a reduced bandwidth stream that is transmitted from a specific video source, the central server may instruct this video source to change its operation mode to a video stream mode, in which that video source may operate as a regular video encoder and transmit a standard video stream, which may be decoded by the server or by any receiving party for observation, recording, further processing or any other purpose. Optionally, the video encoder also continues transmitting the feature stream at the same time.

Working according to this scheme, most of the video sources remain in the reduced bandwidth mode, transmitting a narrow bandwidth data stream that is nevertheless sufficient to detect events with high resolution and frame rate at the MCIP server. Only a very small portion of the sources (those in which an event is detected) are controlled to work concurrently in the video stream mode. This results in a total network bandwidth which is significantly lower than the network bandwidth required for concurrently transmitting from all the video sources.
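
The AOI filtering step and the resulting mode decision described above can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how membership in the AOI is decided, so a standard ray-casting point-in-polygon test is used here, and the function names, the mode labels and the square AOI in the usage lines are all hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if p lies inside the polygon (assumed method)."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def frame_needs_processing(feature_positions: Sequence[Point],
                           aoi: Sequence[Point],
                           threshold: int) -> bool:
    """Flag a frame when the number of features inside the AOI exceeds
    the threshold; otherwise the frame would be dropped."""
    count = sum(1 for p in feature_positions if point_in_polygon(p, aoi))
    return count > threshold


def select_mode(event_detected: bool) -> str:
    """Hypothetical mode labels for the command sent to a video source."""
    return "video_stream" if event_detected else "reduced_bandwidth"


# Usage with a hypothetical square AOI and four feature positions.
aoi = [(0, 0), (10, 0), (10, 10), (0, 10)]
positions = [(1, 1), (5, 5), (20, 20), (9, 2)]
flagged = frame_needs_processing(positions, aoi, threshold=2)
```

In this sketch the frame is only flagged as a candidate; whether an event is actually present, and therefore whether `select_mode` would return the video stream mode, is left to the further processing on the server.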
For example, in a conventional video surveillance installation that uses 1000 cameras, a bandwidth of about 500 Kbps is needed by each camera in order to transmit at an adequate quality. In the reduced bandwidth mode, only about 5 Kbps is required by each camera for the transmission of information regarding moving objects at the same resolution and frame rate. Therefore, all the cameras working in this mode use a total bandwidth of 5 Kbps times 1000 = 5 Mbps. Assuming that at steady state suspected objects appear in 1% of the cameras (10 cameras) and that these cameras are working in video stream mode, an extra bandwidth of 10 times 500 Kbps = 5 Mbps is required. Thus, the total required network bandwidth using the solution proposed by the present invention is 10 Mbps. A total network bandwidth of 500 Mbps would be consumed by conventional systems if all 1000 cameras concurrently transmitted video streams.

The proposed solution may be applicable not only for high-level moving object detection and tracking in live cameras but also in recorded video. Huge amounts of video footage are recorded by many surveillance systems. In order to detect interesting events in this recorded video, massive processing capabilities are needed. By converting recorded video, either digital or analog, to a reduced bandwidth stream according to the techniques described above, event detection becomes much easier, with lower processing requirements and faster operation.

The system proposed in the present invention comprises the following components:
1. One or more MCIP servers.
2. One or more dual mode video encoders, which may be operated in reduced bandwidth mode or in video stream mode, according to remote instructions.
3. A digital network, LAN or WAN, IP or other, which establishes communication between the system components.
4. One or more operator stations, by which operators may define event criteria and other system parameters and manage events in real time.
5.
An optional Network Video Recorder (NVR), which is able to record and play, on demand, any selected video source which is available on the network.

Implementation for security applications

Following is a partial list of types of image processing applications which can be implemented very effectively using the method proposed by the present invention:

Video Motion Detection - for both indoor and outdoor applications. Such an application is commonly used to detect intruders into protected zones. It is desired to ignore nuisances such as moving trees, dust and animals. In this embodiment, the present invention manipulates input images at the stream level in order to filter out certain images and image changes. Examples of such filtering are motion below a predetermined threshold, and size or speed related filtering, all preferably applied within the AOIs, thus significantly reducing the amount of system resources required for further processing. Since the system is server-based and there is no need for installation of equipment in the field (except the camera), this solution is very attractive for low budget applications such as in the residential market.

Exceptional static objects detection - this application is used to detect static objects where such objects may require an alarm. By way of example, such objects may comprise an unattended bag at the airport, a stopped car on a highway, a person stopped at a protected location and the like. In this embodiment, the present invention manipulates the input images at the stream level and examines the motion vectors at the AOIs. Objects that stopped moving are further processed.

License Plate Recognition - this application is used for vehicle access control, stolen or suspected car detection and parking automation. In this embodiment, it is possible to detect wanted cars using hundreds or more cameras installed in the field, thus providing a practical detection solution.
Facial Recognition - this application is desired as a biometric verification or detection device, for tasks such as locating criminals or terrorists and for personal access control purposes. Using this embodiment offers facial recognition capability to many cameras in the field. This is a very useful tool for large installations such as airports and public surveillance.

Smoke and flames detection - this application is used for fire detection. Using this embodiment of the invention, all the sites equipped with cameras may receive this service in addition to other applications, without any installation of smoke or flame detectors.

Traffic violations - this application detects a variety of traffic violations such as red light crossing, separation line crossing, parking or stopping in a forbidden zone and the like. Using this embodiment, this functionality may be applied to many cameras located along roads and intersections, thus significantly optimizing police work.

Traffic flow analysis - this application is useful for traffic centers by automatically detecting any irregular traffic events such as traffic obstacles, accidents, too slow, too fast or too crowded traffic and the like. Using this embodiment, traffic centers may use many cameras located as desired in the covered area in order to provide a significantly better control level.

Suspicious vehicle or person tracking - this application is used to track objects of interest. This is needed to link a burglar to an escape car, locate a running suspect and more. Using this embodiment, this functionality may be associated with any selected camera or cameras in the field.

It should be noted that each of those applications, or their combination, may be considered as a separate embodiment of the invention, all while using the basic structure contemplated herein, while specific embodiments may utilize specialized components.
Selection of such components and the combination of features and applications provided herein is a matter of technical choice that will be clear to those skilled in the art.

Figs. 3A to 3C illustrate the generation of an object of interest and its motion trace, according to a preferred embodiment of the invention. Fig. 3A is an image of a selected AOI (in this example, an elongated zone, in which the presence of any person is forbidden), on which the MCIP server 102 generates an object, which is determined according to predefined size and motion parameters, received from the corresponding encoder. The object encompasses the body of a person, penetrating into the forbidden zone and walking from right to left. The motion parameters are continuously updated, such that the center of the object is tracked. The MCIP server 102 generates a trace (solid line) that provides a graphical indication regarding his motion within the forbidden zone. Fig. 3B is an image of the same selected AOI, on which the MCIP server 102 generates the object and the trace (solid line) that provides a graphical indication regarding his motion within the forbidden zone from left to right and closer to the camera. Fig. 3C is an image of the same selected AOI, on which the MCIP server 102 generates the object and the trace (solid line) that provides a graphical indication regarding his motion within the forbidden zone, again from right to left and closer to the camera. The filtration performed by the corresponding encoder prevents the generation of background movements, such as tree-tops and lower vegetation, which are considered as background noise.

The above examples and description have of course been provided only for the purpose of illustration, and are not intended to limit the invention in any way.
As will be appreciated by the skilled person, the invention can be carried out in a great variety of ways employing more than one technique from those described above, all without exceeding the scope of the invention.

WE CLAIM:

1. A method for performing event detection and object tracking in image streams, comprising: a) installing a set of image acquisition devices (CAM 1 - CAM n), each in field where an event is to be detected, each comprising a local image processor programmed to convert at least one image stream into a digital image format; b) connecting each image acquisition device to a data network (101) through a corresponding data communication channel; c) installing and connecting a central image processing server (102) to said data network (101); the method being characterized by: d) providing an encoder (Encoder 1 - Encoder n) to each image acquisition device for analyzing said digital image stream so as to generate features, the features being parameters related to attributes of objects in said image stream; and transmitting a feature stream to the central server, whenever said features exceed a predetermined threshold; e) whenever said central image processing server (102) receives features from the local encoder (Encoder 1 - Encoder n) through its corresponding data communication channel and said data network (101), obtaining indications regarding events in said image streams by processing, by the server (102), said feature stream, and transmitting said indications to an operator.

2. The method as claimed in Claim 1, wherein the local encoder is a composite encoder, being the local encoder that comprises circuitry for compressing the image stream, said composite encoder being capable of operating in a first mode, during which it generates and transmits the features to the server, and in a second mode, during which it transmits to said server, in addition to said features, at least a portion of said image stream in a desired compression level, according to commands sent from said server.

3. The method as claimed in Claim 2, comprising controlling each composite encoder, by a command sent from said server (102), to operate in its first mode; as long as the server receives features from a composite encoder: (a) controlling that composite encoder, by a command sent from said server, to operate in its second mode; and (b) obtaining indications regarding events in said image streams by processing, by said server, said feature stream, and transmitting said indications and/or their corresponding image streams to an operator.

4. The method as claimed in Claims 1 or 2, comprising decoding one or more compressed image streams containing events by said server, and transmitting the decoded image streams to the display of an operator, for viewing.

5. The method as claimed in Claims 1 or 2, comprising recording one or more compressed image streams obtained while their local encoder operates in its second mode.

6. The method as claimed in Claims 1 or 2, comprising dynamically allocating additional image processing resources, in the server, to data communication channels that receive image streams.

7. The method as claimed in Claims 1 or 2, wherein one or more feature streams obtained while operating in the first mode comprise only a portion of the image.

8. The method as claimed in Claim 6, comprising generating and displaying a graphical polygon that encompasses an object of interest, being within the frame of an image or an area of interest in said image.

9. The method as claimed in Claim 8, comprising generating and displaying a graphical trace indicating the history of movement of an object of interest, being within the frame of an image or an area of interest in said image.

10. The method as claimed in Claims 1 or 2, wherein the image stream is selected from the group of images that comprises video streams, still images, computer generated images, pre-recorded digital or analog video data, and video streams compressed using MPEG format.

11. The method as claimed in Claims 1 or 2, wherein during each mode, the encoder uses different resolution and frame rate.

12. The method as claimed in Claims 1 or 2, wherein the features are selected from the following group: - motion features; - color; - portion of the image; - edge data; and - frequency related information.

13. The method as claimed in Claim 1 or 2, comprising performing, by the server, one or more of the following operations and/or any combination thereof: - License Plate Recognition (LPR); - Facial Recognition (FR); - detection of traffic rules violations; - behavior recognition; - fire detection; - traffic flow detection; - smoke detection, using a feature stream, received from the local encoder of at least one image acquisition device, through its data communication channel.

14. A system for performing the method of event detection and object tracking in image streams as claimed in claim 1, comprising: (a) a set of image acquisition devices, installed in field, each of which comprises: a.1) a local programmable processor for converting the acquired image stream to a digital format; a.2) a local encoder, for generating, from said image stream, features, being parameters related to attributes of objects in said image stream, and for transmitting a feature stream, whenever said motion features exceed a corresponding threshold; (b) a data network, to which each image acquisition device is connected through a corresponding data communication channel; and (c) an image processing server connected to said data network, said server being capable of determining said threshold, of obtaining indications regarding events in said image streams by processing said feature stream, and of transmitting said indications to an operator.

15. The system as claimed in Claim 14, in which the local encoder is a composite encoder, being the local encoder that comprises circuitry for compressing the image stream, said composite encoder being capable of operating in a first mode, during which it generates and transmits the features to the server, and in a second mode, during which it transmits to said server, in addition to said features, at least a portion of said image stream in a desired compression level according to commands sent from said server.

16. The system as claimed in Claims 14 or 15, comprising an operator display, for receiving one or more image streams that are decoded by the server and contain events.

17. The system as claimed in Claims 14 or 15, comprising a network video recorder for recording one or more image streams, obtained while their local encoder operates in its second mode.

18. The system as claimed in Claims 14 or 15, in which the server is capable of dynamically allocating additional image processing resources to data communication channels that receive image streams.

19. The system as claimed in Claims 14 or 15, in which one or more image streams obtained while operating in the first mode comprise only a portion of the image that corresponds to a desired area of interest (AOI).

20. The system as claimed in Claims 14 or 15, in which the server comprises processing means for generating and displaying a graphical polygon that encompasses an object of interest, being within the frame of an image or an area of interest in said image.

21. The system as claimed in Claim 20, in which the server comprises processing means for generating and displaying a graphical trace indicating the history of movement of an object of interest, being within the frame of an image or an area of interest in said image.

22. A method for performing event detection and object tracking in image streams, substantially as herein described with reference to the accompanying drawings.

23. A system for performing event detection and object tracking in image streams, substantially as herein described with reference to the accompanying drawings.

Full Text

The present invention relates to a method and a system for performing event detection and object tracking in image streams.
Field of the Invention
The present invention relates to the field of video processing. More particularly, the invention relates to a method and system for obtaining meaningful knowledge, in real time, from a plurality of concurrent compressed image sequences, by effective processing of a large number of concurrent incoming image sequences and/or features derived from the acquired images.
Background of the Invention
Many efforts have been spent to improve the ability to extract meaningful data out of images captured by video and still cameras. Such abilities are being used in several applications, such as consumer, industrial, medical, and business applications. Many cameras are deployed in streets, airports, schools, banks, offices and residences as standard security measures. These cameras are used either for allowing an operator to remotely view security events in real time, or for recording and analyzing a security event at some later time.
The introduction of new technologies is shifting the video surveillance industry into new directions that significantly enhance the functionality of such systems. Several processing algorithms are used both for real-time and offline applications. These algorithms are implemented on a range of platforms from pure software to pure hardware, depending on the application. However, these platforms are usually designed to simultaneously process a relatively small number of incoming image sequences, due to the substantial computational resources required for image processing. In addition, most of the common image processing systems are designed to process only uncompressed image data, such as the system disclosed in U.S. Patent 6,188,381. Modern networked video environments require efficient processing capability of a large number of compressed video streams, collected from a plurality of image sources.
Increasing operational demands, as well as cost constraints, created the need for automation of event detection. Such event detection solutions provide a higher detection level, save manpower, replace other types of sensors and lower false alarm rates.
Although conventional solutions are available for automatic intruder detection, license plate identification, facial recognition, traffic violations detection and other image based applications, they usually support few simultaneous video sources, using expensive hardware platforms that require field installation, which implies high installation, maintenance and upgrade costs.
Conventional surveillance systems employ digital video networking technology and automatic event detection. Digital video networking is implemented by the development of Digital Video Compression technology and the availability of IP based networks. Compression standards, such as MPEG-4 and similar formats, allow transmitting high quality images with a relatively narrow bandwidth.
A major limiting factor when using digital video networking is bandwidth requirements. Because it is too expensive to transmit from all the cameras all the time, networks are designed to concurrently transmit data only from a few cameras. The transmission of data only from cameras that are capturing important events at any given moment is crucial for establishing an efficient and cost-effective digital video network.
Automatic video-based event detection technology becomes effective for this purpose. This technology consists of a series of algorithms that are able to analyze the camera image in real time and provide notification of a special event, if it occurs. Currently available event-detection solutions use conventional image processing methods, which require heavy processing resources. Furthermore, they allocate a fixed processing power (usually one processor) per camera input. Therefore, such systems either provide poor performance due to resource limitations or are extremely expensive.
As a result, the needs of large digital surveillance installations, namely reliable detection, effective bandwidth usage, flexible event definition, large-scale design and cost, cannot be met by any of the current automatic event detection solutions.
Video Motion Detection (VMD) methods are disclosed, for example, in U.S. Patent 6,349,114, in WO 02/37429, in U.S. Patent Application Publication 2002,041,626, in U.S. Patent Application Publication No. 2002,054,210, in WO 01/63937, in EP1107609, in EP3020, in U.S. Patent 6,384,862, in U.S. Patent 6,188,381, in U.S. Patent 6,130,707, and in U.S. Patent 6,069,655. However, none of the methods described above has yet provided a satisfactory solution to the problem of effectively obtaining meaningful knowledge, in real time, from a plurality of concurrent image sequences.
It is an object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, in real time.
It is another object of the present invention to provide a method and system for obtaining meaningful knowledge from a plurality of concurrent image sequences, which is cost effective.
It is a further object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, with a reduced amount of bandwidth resources.
It is still another object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, which is reliable and has high sensitivity in noisy environments.
It is yet another object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, with reduced installation and maintenance costs.
Other objects and advantages of the invention will become apparent as the description proceeds.
Summary of the Invention
While these specifications discuss primarily video cameras, a person skilled in the art will recognize that the invention extends to any appropriate image source, such as still cameras, computer generated images, prerecorded video data, and the like, and that image sources should be equivalently considered. Similarly, the terms video and video stream should be construed broadly to include video sequences, still pictures, computer generated graphics, or any other sequence of images provided or converted to an electronic format that may be processed by a computer.
The present invention is directed to a method for performing event detection and object tracking in image streams. A set of image acquisition devices is installed in field, such that each device comprises a local programmable processor for converting the acquired image stream, that consists of one or more images, to a digital format, and a local encoder, for generating features from the image stream. The features are parameters that are related to attributes of objects in the image stream. Each device transmits a feature stream, whenever the number and type of features exceed a corresponding threshold. Each image acquisition device is connected to a data network through a corresponding data communication channel. An image processing server connected to the data network determines the threshold and processes the feature stream. Whenever the server receives features from a local encoder through its corresponding data communication channel and the data network, the server obtains indications regarding events in the image streams by processing the feature stream and transmitting the indications to an operator.
The local encoder may be a composite encoder, which is a local encoder that further comprises circuitry for compressing the image stream. The composite encoder may operate in a first mode, during which it generates and transmits the features to the server, and in a second mode, during which it transmits to the server, in addition to the features, at least a portion of the image stream in a desired compression level, according to
commands sent from the server. Preferably, each composite encoder is controlled by a command sent from the server, to operate in its first mode. As long as the server receives features from a composite encoder, that composite encoder is controlled by a command sent from the server, to operate in its second mode. The server obtains indications regarding events in the image streams by processing the feature stream, and transmitting the indications and/or their corresponding image streams to an operator.
Whenever desired, one or more compressed image streams containing events are decoded by the operator station, and the decoded image streams are transmitted to the display of an operator, for viewing. Compressed image streams obtained while their local encoder operates in its second mode may be recorded.
Preferably, additional image processing resources, in the server, are dynamically allocated to data communication channels that receive image streams. Feature streams obtained while operating in the first mode may comprise only a portion of the image.
A graphical polygon that encompasses an object of interest, being within the frame of an image or an AOI (Area Of Interest) in the image, may be generated by the server and displayed to an operator for viewing. In addition, the server may generate and display a graphical trace indicating the history of movement of an object of interest, being within the frame of an image or an AOI in the image.
The image stream may be selected from the group of images that comprises video streams, still images, computer generated images, pre-recorded digital or analog video data, and video streams compressed using MPEG format. The encoder may use different resolution and frame rate during operation in each mode.
Preferably, the features may include motion features, color, portions of the image, edge data and frequency related information.
The server may perform, using a feature stream, received from the local encoder of at least one image acquisition device, one or more of the following operations and/or any combination thereof:
- License Plate Recognition (LPR);
- Facial Recognition (FR);
- detection of traffic rules violations;
- behavior recognition;
- fire detection;
- traffic flow detection;
- smoke detection.
The present invention is also directed to a system for performing event detection and object tracking in image streams, that comprises:
a) a set of image acquisition devices, installed in field, each of which includes:
a.1) a local programmable processor for converting the acquired image stream to a digital format;
a.2) a local encoder, for generating, from the image stream, features, being parameters related to attributes of objects in the image stream, and for transmitting a feature stream, whenever the motion features exceed a corresponding threshold;
b) a data network, to which each image acquisition device is connected through a corresponding data communication channel; and
c) an image processing server connected to the data network, the server being capable of determining the threshold, of obtaining indications regarding events in the image streams by processing the feature stream, and of transmitting the indications to an operator.
The system may further comprise an operator display, for receiving and displaying one or more image streams that contain events, as well as a network video recorder for recording one or more image streams, obtained while their local encoder operates in its second mode.
Statement of the Invention
Accordingly, the present invention relates to a method for performing event detection and object tracking in image streams, comprising: a) installing a set of image acquisition devices (CAM 1 - CAM n), each in field where an event is to be detected, each comprising a local image processor programmed to convert at least one image stream into a digital image format; b) connecting each image acquisition device to a data network (101) through a corresponding data communication channel; c) installing and connecting a central image processing server (102) to said data network (101); the method being characterized by: d) providing an encoder (Encoder 1 - Encoder n) to each image acquisition device for analyzing said digital image stream so as to generate features, the features being parameters related to attributes of objects in said image stream; and transmitting a feature stream to the central server, whenever said features exceed a predetermined threshold; e) whenever said central image processing server (102) receives features from the local encoder (Encoder 1 - Encoder n) through its corresponding data communication channel and said data network (101), obtaining indications regarding events in said image streams by processing, by the server (102), said feature stream, and transmitting said indications to an operator.
A system for performing event detection and object tracking in image streams, comprising: (a) a set of image acquisition devices, installed in field, each of which includes: a.1) a local programmable processor for converting the acquired image stream to a digital format; a.2) a local encoder, for generating, from said image stream, features, being parameters related to attributes of objects in said image stream, and for transmitting a feature stream, whenever said motion features exceed a corresponding threshold; (b) a data network, to which each image acquisition device is connected through a corresponding data communication channel; and (c) an image processing server connected to said data network, said server being capable of determining said threshold, of obtaining indications regarding events in said image streams by processing said feature stream, and of transmitting said indications to an operator.
Brief Description of the Drawings
The above and other characteristics and advantages of the invention will be better understood through the following illustrative and non-limitative detailed description of preferred embodiments thereof, with reference to the appended drawings, wherein:
Fig. 1 schematically illustrates the structure of a surveillance system that comprises a plurality of cameras connected to a data network, according to a preferred embodiment of the invention;
Fig. 2 illustrates the use of AOIs (Areas of Interest) for designating areas where event detection will be performed and for reducing the usage of system resources, according to a preferred embodiment of the invention; and
Figs. 3A to 3C illustrate the generation of an object of interest and its motion trace, according to a preferred embodiment of the invention.
Detailed Description of Preferred Embodiments
A significant saving in system resources can be achieved by applying novel data reduction techniques, proposed by the present invention. In a situation where thousands of cameras are connected to a single server, only a small number of the cameras actually acquire important events that should be analyzed. A large-scale system can function properly only if it has the capability of identifying the inputs that may contain useful information and performing further processing only on such inputs. Such a filtering mechanism requires minimal processing and bandwidth resources, so that it is possible to apply it concurrently on a large number of image streams. The present invention proposes such a filtering mechanism, called Massively Concurrent Image Processing (MCIP) technology, that is based on the analysis of incoming image sequences and/or feature streams, derived from the acquired images, so as to fulfill the need for automatic image detection capabilities in a large-scale digital video network environment.
MCIP technology combines diverse technologies such as large scale data reduction, effective server design and optimized image processing algorithms, thereby offering a platform that is mainly directed to the security market and is not rivaled by conventional solutions, particularly with vast numbers of potential users. MCIP is a networked solution for event detection in distributed installations, which is designed for large scale digital video surveillance networks that concurrently support thousands of camera inputs, distributed in an arbitrarily large geographical area and with real time performance. MCIP employs a unique feature transmission method that consumes narrow bandwidth, while maintaining high sensitivity and probability of detection. MCIP is a server-based solution that is compatible with modern monitoring and digital video recording systems and carries out complex detection algorithms, reduces field maintenance and provides improved scalability, high availability, low cost per channel and backup utilities. The same system concurrently provides multiple applications such as VMD, LPR and FR. In addition, different detection applications may be associated with the same camera.
MCIP is composed of a server platform with various applications, camera encoders (either internal or external to the camera), a Network Video Recorder (NVR) and an operator station. The server contains a computer that includes proprietary hardware and software components. MCIP is based on the distribution of image processing algorithms between low-level feature extraction, which is performed by the encoders which are located in field (i.e., in the vicinity of a camera), and high-level processing applications, which are performed by a remote central server that collects and analyzes these features.
The MCIP system described hereafter not only solves the bandwidth problem but also reduces the load on the server, uses a unique type of data stream (not a digital video stream), and performs an effective process for detecting events in real time in a large scale video surveillance environment.
A major element in MCIP is data reduction, which is achieved by the distribution of the image processing algorithms. Since all the video sources which require event detection transmit concurrently, the required network bandwidth is reduced by generating a reduced bandwidth feature stream in the vicinity of each camera. In order to detect and track moving objects in digitally transmitted video sources by analyzing the transmitted reduced bandwidth feature stream, there is no need to transmit full video streams, but only partial data, which contains information regarding moving objects.
By doing so, a significantly smaller data bandwidth is used, which reduces the demands for both the network bandwidth and the event detection processing power. Furthermore, if only the shape, size, direction of movement and velocity should be detected, there is no need to transmit data regarding their intensity or color, and thus, a further bandwidth reduction is achieved. Another bandwidth optimization may be achieved if the encoder on the transmitting side filters out all motions which are under a motion threshold, determined by the remote central server. Such a threshold may be the AC level of a moving object, motion distance or any combination thereof, and may be determined and changed dynamically, according to the attributes of the image, such as resolution, AOI, compression level, etc. Moving objects which are under the threshold are considered either as noise or as non-interesting motions.
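The encoder-side filtering described above can be illustrated by a minimal sketch. It assumes, purely for illustration, that each feature carries a block position, a motion vector and an AC level, and that the server-determined threshold combines a minimum motion distance with a minimum AC level; the names `MotionFeature`, `MotionThreshold` and `filter_features` are hypothetical and do not appear in this specification.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass(frozen=True)
class MotionFeature:
    """One hypothetical feature: block position, motion vector and AC level."""
    x: int
    y: int
    dx: float
    dy: float
    ac_level: float


@dataclass(frozen=True)
class MotionThreshold:
    """Threshold sent by the central server; it may be changed dynamically."""
    min_distance: float
    min_ac_level: float


def filter_features(features: List[MotionFeature],
                    threshold: MotionThreshold) -> List[MotionFeature]:
    """Keep only features at or above the threshold.

    Combining the two criteria with a logical AND is an assumption; the text
    only says the threshold may be the AC level, the motion distance or any
    combination thereof.
    """
    return [
        f for f in features
        if hypot(f.dx, f.dy) >= threshold.min_distance
        and f.ac_level >= threshold.min_ac_level
    ]
```

Features rejected by such a filter would be treated as noise or non-interesting motion and would not be placed in the transmitted feature stream.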
One method for extracting features at the encoder side is by slightly modifying and degrading existing temporal-based video compressors which were originally designed to transmit digital video. The features may also be generated by a specific feature extraction algorithm (such as any motion vector generating algorithm) that is not related to the video compression algorithm. When working in this reduced bandwidth mode, the output streams of these encoders are definitely not a video stream, and therefore cannot be used by any receiving party to produce video images.
Fig. 1 schematically illustrates the structure of a surveillance system that comprises a plurality of cameras connected to a data network, according to a preferred embodiment of the invention. The system 100 comprises n image sources (in this example, n cameras, CAM1, ..., CAMn), each of which is connected to a digital encoder ENCj, for converting the images acquired by CAMj to a compressed digital format. Each digital encoder ENCj is connected to a digital data network 101 at point pj and is capable of transmitting data, which may be a reduced bandwidth feature stream or a full compressed video stream, through its corresponding channel Cj. The data network 101 collects the data transmitted from all channels and forwards them to the MCIP server 102, through data-bus 103. MCIP server 102 processes the data received from each channel and controls one or more cameras which transmit any combination of the reduced bandwidth feature stream and the full compressed video stream, which can be analyzed by MCIP server 102 in real time, or recorded by NVR 104 and analyzed by MCIP server 102 later. An operator station 105 is also connected to MCIP server 102, for real time monitoring of selected full compressed video streams. Operator station 105 can manually control the operation of MCIP server 102, whenever desired.
The MCIP (Massively Concurrent Image Processing) server is connected to the image sources (depicted as cameras in the drawing, but which may also be any image source, such as taped video, still cameras, video cameras, computer generated images or graphics, and the like) through data-bus 103 and network 101, and receives features or images in a compressed format. In the broadest sense, the network is any type of network, wired or wireless. The images can be compressed using any type of compression. Practically, IP based networks are used, as well as compression schemes that use DCT, VideoLAN Client (VLC, which is a highly portable multimedia player for various audio and video formats as well as Digital Versatile Discs (DVDs), Video Compact Discs (VCDs), and various streaming protocols, disclosed in WO 01/63937) and motion estimation techniques such as MPEG.

The system 100 uses an optional load-balancing module that allows it to easily scale the number of inputs that can be processed and also creates the ability to remove a single point of failure, by creating backup MCIP servers. The system 100 also has a configuration component that is used for defining the type of processing that should be performed for each input and the destination of the processing results. The destination can be another computer, an email address, a monitoring application, or any other device that is able to receive textual and/or visual messages.
The system can optionally be connected to an external database to assist image processing. For example, a database of suspects, stolen cars, or license plate numbers can be used for identifying vehicles.
Fig. 2 illustrates the use of AOIs (Areas of Interest) for reducing the usage of system resources, according to a preferred embodiment of the invention. An AOI is a polygon (in this Fig., a hexagon) that encloses the area where detection will occur. The rectangles indicate the estimated object size at various distances from the camera. In this example, the scene of interest comprises detecting movement of a person in a field (shown in the first rectangle). It may be used in the filtering unit to decide if further processing is required. In this case, the filtering unit examines the feature data. The feature stream is analyzed to determine if enough significant features lie within the AOI. If the number of features that are located inside the AOI and comprise changes exceeds the threshold, then this frame is designated as possibly containing an event and is transferred for further processing. Otherwise, the frame is dropped and no further processing is performed.
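The AOI filtering decision described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the feature representation (an x, y position plus a "changed" flag), the ray-casting containment test, and the names point_in_polygon and frame_passes_filter are all assumptions introduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[float, float, bool]  # (x, y, changed) - hypothetical layout


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if p lies inside the polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def frame_passes_filter(features: List[Feature], aoi: List[Point], threshold: int) -> bool:
    """True if the number of changed features inside the AOI exceeds the threshold."""
    count = sum(
        1 for (x, y, changed) in features
        if changed and point_in_polygon((x, y), aoi)
    )
    return count > threshold


# A hexagonal AOI and a handful of made-up features
hexagon = [(2, 0), (4, 0), (6, 3), (4, 6), (2, 6), (0, 3)]
frame = [(3, 3, True), (3, 2, True), (4, 4, True), (0.5, 0.5, True), (3, 3.5, False)]
keep_frame = frame_passes_filter(frame, hexagon, threshold=2)
```

In this example three changed features fall inside the hexagon, so with a threshold of 2 the frame would be passed on for further processing; with a threshold of 3 it would be dropped.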
The MCIP server receives the reduced bandwidth feature stream (such a feature stream is not a video stream at all, and hence, no viewable image can be reconstructed thereof) from the video sources which require event detection. When an event is detected within a reduced bandwidth stream that is transmitted from a specific video source, the central server may instruct this video source to change its operation mode to a video stream mode, in which that video source may operate as a regular video encoder and transmit a standard video stream, which may be decoded by the server or by any receiving party for observation, recording, further processing or any other purpose. Optionally, the video encoder also continues transmitting the feature stream at the same time.
Working according to this scheme, most of the video sources remain in the reduced bandwidth mode, while transmitting a narrow bandwidth data stream, yet sufficient to detect events with high resolution and frame rate at the MCIP server. Only a very small portion of the sources (in which an event is detected) are controlled to work concurrently in the video stream mode. This results in a total network bandwidth, which is significantly lower than the network bandwidth required for concurrently transmitting from all the video sources.
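The server-driven mode change described above may be pictured with the following minimal sketch. The class and function names (Mode, Encoder, handle_feature_report) are hypothetical and are not taken from the specification; the text does not say when a source returns to the reduced bandwidth mode, so the sketch deliberately leaves that decision out.

```python
from enum import Enum


class Mode(Enum):
    REDUCED_BANDWIDTH = "feature stream only"
    VIDEO_STREAM = "standard video stream"


class Encoder:
    """Stand-in for a dual mode video source (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.mode = Mode.REDUCED_BANDWIDTH  # sources start by sending features only

    def set_mode(self, mode: Mode) -> None:
        # In a real system this would be a remote instruction sent over the network.
        self.mode = mode


def handle_feature_report(encoder: Encoder, event_detected: bool) -> Mode:
    """Server-side reaction to one processed feature report from an encoder."""
    if event_detected:
        encoder.set_mode(Mode.VIDEO_STREAM)
    return encoder.mode


cam = Encoder("CAM1")
handle_feature_report(cam, event_detected=False)  # stays in reduced bandwidth mode
handle_feature_report(cam, event_detected=True)   # switched to video stream mode
```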
For example, in a conventional video surveillance installation that uses 1000 cameras, a bandwidth of about 500 Kbps is needed by each camera, in order to transmit at an adequate quality. In the reduced bandwidth mode, only about 5 Kbps is required by each camera for the transmission of information regarding moving objects at the same resolution and frame rate. Therefore, all the cameras working in this mode are using a total bandwidth of 5 Kbps times 1000 = 5 Mbps. Assuming that at steady state suspected objects appear in 1% of the cameras (10 cameras) and they are working in video stream mode, extra bandwidth of 10 times 500 Kbps = 5 Mbps is required. Thus, the total required network bandwidth using the solution proposed by the present invention is 10 Mbps. A total required network bandwidth of 500 Mbps would be consumed by conventional systems, if all the 1000 cameras would concurrently transmit video streams.
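The arithmetic of this example can be reproduced with a few lines of code. The function name and parameters below are illustrative assumptions; the figures are those given in the text, and 1 Mbps is taken as 1000 Kbps, as the text's own totals imply.

```python
def total_bandwidth_kbps(n_cameras: int, feature_kbps: int,
                         video_kbps: int, cameras_in_video_mode: int) -> int:
    """Total network bandwidth when every camera sends features and a few also send video."""
    feature_total = n_cameras * feature_kbps           # 1000 * 5   = 5000 Kbps
    video_extra = cameras_in_video_mode * video_kbps   # 10   * 500 = 5000 Kbps
    return feature_total + video_extra


proposed_kbps = total_bandwidth_kbps(1000, 5, 500, 10)  # 10000 Kbps, i.e. 10 Mbps
conventional_kbps = 1000 * 500                          # 500000 Kbps, i.e. 500 Mbps
ratio = conventional_kbps / proposed_kbps               # 50.0
```

Under the stated assumptions, the proposed scheme needs about one fiftieth of the bandwidth of a conventional installation.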
The proposed solution may be applicable not only for high-level moving objects detection and tracking in live cameras but also in recorded video. Huge amounts of video footage are recorded by many surveillance systems. In order to detect interesting events in this recorded video, massive processing capabilities are needed. By converting recorded video, either digital or analog, to a reduced bandwidth stream according to the techniques described above, event detection becomes much easier, with lower processing requirements and faster operation.
The system proposed in the present invention comprises the following components:
1. One or more MCIP servers.
2. One or more dual mode video encoders, which may be operated at reduced bandwidth mode or at video stream mode, according to remote instructions.
3. Digital network, LAN or WAN, IP or other, which establishes communication between the system components.
4. One or more operator stations, by which operators may define events criteria and other system parameters and manage events in real time.
5. An optional Network Video Recorder (NVR), which is able to record and play, on demand, any selected video source which is available on the network.
Implementation for security applications
Following is a partial list of types of image processing applications which can be implemented very effectively using the method proposed by the present invention:
Video Motion Detection - for both indoor and outdoor applications. Such application is commonly used to detect intruders to protected zones. It is desired to ignore nuisances such as moving trees, dust and animals. In this embodiment, the present invention manipulates input images at the stream level in order to filter out certain images and image changes. Examples of such filtering are motion below a predetermined threshold, size or speed related filtering, all preferably applied within the AOIs, thus reducing significantly the amount of required system resources for further processing. Since the system is server-based and there is no need for installation of equipment in the field (except the camera), this solution is very attractive for low budget applications such as in the residential market.
Exceptional static objects detection - this application is used to detect static objects where such objects may require an alarm. By way of example, such objects may comprise an unattended bag at the airport, a stopped car on a highway, a person stopped at a protected location and the like. In this embodiment the present invention manipulates the input images at the stream level and examines the motion vectors at the AOIs. Objects that stopped moving are further processed.
License Plate Recognition - this application is used for vehicles access control, stolen or suspected car detection and parking automation. In this embodiment, it is possible to detect wanted cars using hundreds or more cameras installed in the field, thus providing a practical detection solution.
Facial Recognition - this application is desired for biometric verification or detection device, for tasks such as locating criminals or terrorists and for personal access control purposes. Using this embodiment offers facial recognition capability to many cameras in the field. This is a very useful tool for large installations such as airports and public surveillance.
Smoke and flames detection - this application is used for fire detection. Using this embodiment of the invention, all the sites equipped with cameras may receive this service in addition to other applications without any installation of smoke or flame detectors.
Traffic violations - this application detects a variety of traffic violations such as red light crossing, separation line crossing, parking or stopping at a forbidden zone and the like. Using this embodiment, this functionality may be applied for many cameras located along roads and intersections, thus significantly optimizing police work.
Traffic flow analysis - this application is useful for traffic centers by automatically detecting any irregular traffic events such as traffic obstacles, accidents, too slow or too fast or too crowded traffic and the like. Using this embodiment, traffic centers may use many cameras located as desired at the covered area in order to provide a significantly better control level.
Suspicious vehicle or person tracking - this application is used to track objects of interest. This is needed to link a burglar to an escape car, locate a running suspect and more. Using this embodiment, this functionality may be associated with any selected camera or cameras in the field.
It should be noted that each of those applications or their combination may each be considered as a separate embodiment of the invention, all while using the basic structure contemplated herein, while specific embodiments may utilize specialized components. Selection of such components and the combination of features and applications provided herein is a matter of technical choice that will be clear to those skilled in the art.
Figs. 3A to 3C illustrate the generation of an object of interest and its motion trace, according to a preferred embodiment of the invention. Fig. 3A is an image of a selected AOI (in this example, an elongated zone, in which the presence of any person is forbidden), on which the MCIP server 102 generates an object, which is determined according to predefined size and motion parameters, received from the corresponding encoder. The object encompasses the body of a person, penetrating into the forbidden zone and walking from right to left. The motion parameters are continuously updated, such that the center of the object is tracked. The MCIP server 102 generates a trace (solid line) that provides a graphical indication regarding his motion within the forbidden zone. Fig. 3B is an image of the same selected AOI, on which the MCIP server 102 generates the object and the trace (solid line) that provides a graphical indication regarding his motion within the forbidden zone from left to right and more closely to the camera. Fig. 3C is an image of the same selected AOI, on which the MCIP server 102 generates the object and the trace (solid line) that provides a graphical indication regarding his motion within the forbidden zone again from right to left more closely to the camera. The filtration performed by the corresponding encoder prevents the generation of background movements, such as tree-tops and lower vegetation, which are considered as background noise.
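One simple way to picture the trace generation is to record the center of the object's bounding rectangle in each frame and join successive centers. The sketch below assumes the object is available as an axis-aligned box (x_min, y_min, x_max, y_max) per frame; that representation and the helper names are assumptions made for illustration and are not specified in the text.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) - assumed layout
Point = Tuple[float, float]


def box_center(box: Box) -> Point:
    """Center of an axis-aligned bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def build_trace(boxes: List[Box]) -> List[Point]:
    """Sequence of object centers, one per frame; drawn as a polyline it forms the trace."""
    return [box_center(b) for b in boxes]


def horizontal_direction(trace: List[Point]) -> str:
    """Coarse direction of travel, taken from the first and last trace points."""
    dx = trace[-1][0] - trace[0][0]
    if dx < 0:
        return "right to left"
    if dx > 0:
        return "left to right"
    return "no horizontal motion"


# Made-up boxes for a person walking from right to left (x decreases frame by frame)
boxes = [(80, 40, 100, 80), (60, 40, 80, 80), (40, 42, 60, 82)]
trace = build_trace(boxes)
direction = horizontal_direction(trace)
```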
The above examples and description have of course been provided only for the purpose of illustration, and are not intended to limit the invention in any way. As will be appreciated by the skilled person, the invention can be carried out in a great variety of ways employing more than one technique from those described above, all without exceeding the scope of the invention.

WE CLAIM:
1. A method for performing event detection and object tracking in image streams,
comprising:
a) installing a set of image acquisition devices (CAM 1 - CAM n), each in field where an event is to be detected, each comprising a local image processor programmed to convert at least one image stream into a digital image format;
b) connecting each image acquisition device to a data network (101) through a corresponding data communication channel;
c) installing and connecting a central image processing server (102) to said data network (101);
the method being characterized by:
d) providing an encoder (Encoder 1 - Encoder n) to each image acquisition device for
analyzing said digital image stream so as to generate features, the features being parameters related to attributes of objects in said image stream; and
transmitting a feature stream to the central server, whenever said features exceed a predetermined threshold;
e) whenever said central image processing server (102) receives features from the local encoder (Encoder 1 - Encoder n) through its corresponding data communication channel and said data network (101), obtaining indications regarding events in said image streams by processing, by the server (102), said feature stream, and transmitting said indications to an operator.
2. The method as claimed in Claim 1, wherein the local encoder is a composite encoder, being the local encoder that comprises circuitry for compressing the image stream, said composite encoder being capable of operating in a first mode, during which it generates and transmits the features to the server, and in a second mode, during which it transmits to said server, in addition to said features, at least a portion of said image stream in a desired compression level, according to commands sent from said server.
3. The method as claimed in Claim 2, comprising, controlling each composite encoder, by a command sent from said server (102), to operate in its first mode; as long as the server receives features from a composite encoder:
(a) controlling that composite encoder, by a command sent from said server, to operate in its second mode; and
(b) obtaining indications regarding events in said image streams by processing, by said server, said feature stream, and transmitting said indications and/or their corresponding image streams to an operator.

4. The method as claimed in Claims 1 or 2, comprising decoding one or more compressed image streams containing events by said server, and transmitting the decoded image streams to the display of an operator, for viewing.
5. The method as claimed in Claims 1 or 2, comprising recording one or more compressed image streams obtained while their local encoder operates in its second mode.
6. The method as claimed in Claims 1 or 2, comprising dynamically allocating additional image processing resources, in the server, to data communication channels that receive image streams.
7. The method as claimed in Claims 1 or 2, wherein one or more feature streams obtained while operating in the first mode, comprises only a portion of the image.
8. The method as claimed in Claim 6, comprising generating and displaying a graphical polygon that encompasses an object of interest, being within the frame of an image or an area of interest in said image.
9. The method as claimed in Claim 8, comprising generating and displaying a graphical trace indicating the history of movement of an object of interest, being within the frame of an image or an area of interest in said image.
10. The method as claimed in Claims 1 or 2, wherein the image stream is selected from the group of images that comprises video streams, still images, computer generated images, pre-recorded digital or analog video data, and video streams compressed using MPEG format.
11. The method as claimed in Claims 1 or 2, wherein during each mode, the encoder uses different resolution and frame rate.
12. The method as claimed in Claims 1 or 2, wherein the features are selected from the following group:
- motion features;
- color;
- portion of the image;
- edge data; and
- frequency related information.
13. The method as claimed in Claim 1 or 2, comprising performing, by the server, one or more of the following operations and/or any combination thereof:
- License Plate Recognition (LPR);
- Facial Recognition (FR);
- detection of traffic rules violations;
- behavior recognition;
- fire detection;
- traffic flow detection;
- smoke detection,
using a feature stream, received from the local encoder of at least one image acquisition device, through its data communication channel.
14. A system for performing the method of event detection and object tracking in image streams as claimed in claim 1, comprising:
(a) a set of image acquisition devices, installed in field, each of which comprises:
a.1) a local programmable processor for converting the acquired image stream to a digital format;
a.2) a local encoder, for generating, from said image stream, features, being parameters related to attributes of objects in said image stream, and for transmitting a feature stream, whenever said motion features exceed a corresponding threshold;
(b) a data network, to which each image acquisition device is connected through a corresponding data communication channel; and
(c) an image processing server connected to said data network, said server being capable of determining said threshold, of obtaining indications regarding events in said image streams by processing said feature stream, and of transmitting said indications to an operator.
15. The system as claimed in Claim 14, in which the local encoder is a composite encoder, being the local encoder that comprises circuitry for compressing the image stream, said composite encoder being capable of operating in a first mode, during which it generates and transmits the features to the server, and in a second mode, during which it transmits to said server, in addition to said features, at least a portion of said image stream in a desired compression level according to commands sent from said server.
16. The system as claimed in Claims 14 or 15, comprising an operator display, for receiving one or more image streams that are decoded by the server and contain events.
17. The system as claimed in Claims 14 or 15, comprising a network video recorder for recording one or more image streams, obtained while their local encoder operates in its second mode.
18. The system as claimed in Claims 14 or 15, in which the server is capable of dynamically allocating additional image processing resources to data communication channels that receive image streams.
19. The system as claimed in Claims 14 or 15, in which one or more image streams obtained while operating in the first mode, comprises only a portion of the image that corresponds to a desired area of interest (AOI).
20. The system as claimed in Claims 14 or 15, in which the server comprises processing means for generating and displaying a graphical polygon that encompasses an object of interest, being within the frame of an image or an area of interest in said image.
21. The system as claimed in Claim 20, in which the server comprises processing means for generating and displaying a graphical trace indicating the history of movement of an object of interest, being within the frame of an image or an area of interest in said image.
22. A method for performing event detection and object tracking in image streams, substantially as herein described with reference to the accompanying drawings.
23. A system for performing event detection and object tracking in image streams, substantially as herein described with reference to the accompanying drawings.

Patent Number	255403
Indian Patent Application Number	437/DELNP/2005
PG Journal Number	08/2013
Publication Date	22-Feb-2013
Grant Date	18-Feb-2013
Date of Filing	04-Feb-2005
Name of Patentee	ASPECTUS LTD.
Applicant Address	94 DERECH EM HAMOSHAVOT ALON BUILDING, PARK AZORIM, P.O. BOX 3142, KIRYAT ARIE, 49130 PETACH TIKVA ISRAEL

Inventors:

#	Inventor's Name	Inventor's Address
1	TALMON, GAD	9/18 YIGAL ALON STREET, 55030 KIRYAT ONO ISRAEL
2	ASHANI, ZVI	16, HACARMEL STREET, 55900 GANEI TIKVA ISRAEL

PCT International Classification Number	G06T 7/00
PCT International Application Number	PCT/IL2003/000555
PCT International Filing date	2003-07-03

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	60/394,205	2002-07-05	U.S.A.