Title of Invention

METHOD FOR TRACKING AN OBJECT IN AN AVATAR-BASED VIDEO CONFERENCING SYSTEM

Abstract

The invention describes a system and a method to capture a pre-defined object from a camera-generated video in a mobile environment. The object to be captured is first superimposed on a cue. A user then analyses whether the cue fits the object precisely or not. The key features to be tracked from the captured object are selected through a user interface and matched with the features of the cue. Once the user is satisfied with the arrangement of the cue and the actual object, the user sends feedback to capture the image. The object is further tracked by reading the expressions of the object from the selected cue outline. If the tracking of the object stops functioning properly, a re-initialization of the object for tracking can be done.
FIELD OF INVENTION
This invention relates, in general, to the field of mobile communication. More particularly, this invention relates to a method and system for user-assisted object tracking in an avatar-based video-conferencing system.
DESCRIPTION OF RELATED PRIOR ART
The basic concept of an Avatar-based video conferencing and messaging system is as follows:
Avatars are virtual representations of actual objects which are meant to replicate the object's behavior in a given environment, without the actual object coming into view. For example, a cartoon face could be used by a person to represent himself.
Such systems are already in place in popular chatting portals like Yahoo!, etc. The level of representation varies in different applications. In some cases, the avatars are static images. In others, they are displayed as emoticons which can perform some pre-defined actions based on user selection.
Recently, a company called Sensory Inc. has developed text-to-3D and voice-to-3D avatars to be deployed in mobile environments.
LIMITATIONS
All the existing avatar-based systems mentioned in the related art are limited in their capabilities of replicating their source's behavior in real time. Camera-based systems allow the possibility of tracking the source's behavior (for example, a user's face) in real time. The tracked motion can be interpreted suitably to extract source behavior, and this behavior can be mapped onto the avatar.
One of the key practical problems which arise in implementing a camera-based avatar is the issue of identifying the source object in the camera image.
SUMMARY OF THE INVENTION
The present invention relates to a system and method for capturing a pre-defined object from a camera generated video in a mobile environment.
The present invention relates to a method of using a customizable, parameterized mask (depending on the object) to achieve user-assisted capture from camera images and arrive at an initial estimate of the location of the object.
The present invention also relates to a method of allowing a customizable mask to achieve the best possible mapping of the mask to the desired object in the camera image.
The present invention further relates to a method of applying a re-sync mechanism in case the tracking goes wrong.
Accordingly, this invention explains a method for a user assisted object tracking in an avatar-based video conferencing system comprising the steps of:
(a) capturing an image of an object in a camera;
(b) superimposing the object captured in the camera to a cue;
(c) adjusting the parameters associated with the cue to make the object fit in the said cue;
(d) locking the image and assigning a plurality of cue points if the image and the cue fit;
(e) storing the cue points in a memory and mapping the cue points to a plurality of feature points on the object image to form a parameterized cue model; and
(f) deriving the motion and the location of the object from the mapped cue points and the feature points.

The cue is adapted to be configured before tracking or while the said tracking is in progress. The shape and size of the cue is adapted to be controlled through parameters associated with the cue. The camera is moved accordingly to make the object fit in the said cue. The feature points comprise feature point coordinates located at fixed positions in the parameterized cue model. On changing the cue's shape or size, the feature point coordinates also get changed accordingly. A re-initialization of the object for a fresh object location for tracking is carried out if the tracking of the object stops functioning. Re-initialization involves the process of capturing and superimposing the cue with the fresh object location.
Accordingly, this invention further explains a system for a user assisted object tracking in an avatar-based video conferencing system comprising:
(a) a camera unit for capturing an image of an object;
(b) a display unit for superimposing the object captured in the camera unit to a cue;
(c) a memory module to store a plurality of cue points which are mapped to a plurality of feature points;
(d) an object tracking module for tracking the feature points of the object; and
(e) a re-initialization module for tracking a fresh object location if the tracking of the object stops functioning.
The tracking module outputs the average or regularized motion of the object derived from the motion of the feature points.
These and other objects, features and advantages of the present invention will become more apparent from the ensuing detailed description of the invention taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
Figure 1 illustrates an example system where a camera is used to capture an image and the object to be tracked is a human face in the camera image.
Figure 2 illustrates the use of a cue/mask which resembles a human face, superimposed onto the camera image.
Figure 3 illustrates the flowchart of operation of the invention.
Figure 4 illustrates an example coordinate system to classify points in the cue/mask.
Figure 5 illustrates an example of an initial cue being modified by the user to fit the object in a better fashion.
DETAILED DESCRIPTION OF THE INVENTION
The preferred embodiments of the present invention will now be explained with reference to the accompanying drawings. It should be understood however that the disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms. The following description and drawings are not to be construed as limiting the invention and numerous specific details are described to provide a thorough understanding of the present invention, as the basis for the claims and as a basis for teaching one skilled in the art how to make and/or use the invention. However in certain instances, well-known or conventional details are not described in order not to unnecessarily obscure the present invention in detail.
The present invention improves upon existing methods of object tracking by offering a robust, low-complexity mechanism to acquire the object location to be used for tracking purposes. Due to recent advances in computer vision technology, object tracking systems are now coming into the mainstream. The success of object tracking rests heavily on obtaining a good initial estimate of the object's location, and then using motion and other parameters to track the object.

Obtaining a good initial estimate is a tricky problem in mobile scenarios such as a hand-held camera phone system. Our present invention lays down a simple and robust system to get an initial estimate of the object location. We use human-assisted feedback and customizable cues to allow the user to help the system get an accurate initial estimate of the object location. A mechanism for resynchronization is also provided, in case the tracking starts becoming erroneous, so that a new location estimate can be provided to the tracker.
The invention relates to the field of avatar-based video conferencing systems for mobile phones. The invention describes a system and a method to capture a predefined object from a camera-generated video in a mobile environment. The object to be captured is first superimposed on a cue. A user then analyses whether the cue fits the object precisely or not. The key features to be tracked from the captured object are selected through a user interface and matched with the features of the cue. Once the user is satisfied with the arrangement of the cue and the actual object, the user sends feedback to capture the image. The object is further tracked by reading the expressions of the object from the selected cue outline. If the tracking of the object stops functioning properly, a re-initialization of the object for tracking can be done.
The present invention involves a camera system and a display system which displays the camera-captured image. The objective is to track the spatial motion (in time) of an object in the camera image. To start the tracking, an estimate of the initial location of the object is required. An example system is shown in Figure 1. Figure 1 describes an example system where a camera is used to capture an image and the object to be tracked is a human face in the camera image. The figure shows an avatar-based duplex communication scenario where two users, User1 and User2, are communicating using avatars. At User1's end, the camera captures images at periodic intervals which are given to the "Camera Frame Capture" block, which buffers the images. These images are then given to the "Face Analysis and Abstraction Engine". This block is responsible for locating/extracting facial features and tracking the facial movement. The facial features are in the form of a set of parameters which define salient points on the head. These parameters could be raw 2D coordinates of facial feature points (like the center of each eye, tip of the nose, corners of the mouth, etc.) or they could be higher-level semantic interpretations of graded facial expressions (like smile, big smile, frown, angry, wide-eyed, etc.). These parameters are sent to the Tx engine to be transmitted to User2. Similarly, User2 sends his parameters to User1, which are received by the Rx engine. The parameters are sent to the mapping engine, which uses these parameters to modify the facial expressions/head orientation of the avatar representation of User2. After the mapping, the display frame containing the updated representation of User2 is sent to User1's display engine, which renders it onto the display device (e.g. LCD) of User1. A similar process happens at User2's end. In this manner, the avatar representations of both users get updated in real-time and are displayed to each other.
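As an illustration only, the set of facial-feature parameters passed between the Face Analysis and Abstraction Engine and the Tx/Rx engines could be sketched as a small data structure such as the following (written in Python; the names and fields are hypothetical assumptions, not part of the specification):

from dataclasses import dataclass, field
from typing import Dict, Tuple

Point = Tuple[int, int]  # raw 2D display coordinates (x, y)

@dataclass
class FaceParameters:
    # Raw 2D coordinates of salient facial feature points on the head
    feature_points: Dict[str, Point] = field(default_factory=dict)
    # Optional higher-level semantic interpretation of the expression
    expression: str = "neutral"  # e.g. "smile", "big smile", "frown", "angry", "wide-eyed"

# Example parameter set produced at User1's end and consumed by User2's mapping engine
params = FaceParameters(
    feature_points={
        "left_eye": (110, 140),
        "right_eye": (170, 140),
        "nose_tip": (140, 180),
        "mouth_left": (115, 220),
        "mouth_right": (165, 220),
    },
    expression="smile",
)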
The present invention is structured around the concept of using user feedback to get a reliable initial estimate of the location of the object in the camera image. The user is provided (or can select from an available database) a cue whose shape and size can be controlled through parameters. Figure 2 illustrates the use of a cue/mask which resembles a human face, superimposed onto the camera image.
The user customizes the cue according to the object he wishes to track by modifying these parameters. Figure 5 illustrates an example of an initial cue being modified by the user to fit the object in a better fashion. The first image shows the initial cue superimposed on the camera image. This cue does not fit the face in the camera image properly. The second image shows the cue after being modified by the user. This modified cue fits the face in a better manner. This cue-designing activity can be done beforehand (wherein the customized cue can be stored into a database), or done while using the application.
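Purely as a sketch (the parameter names follow the conventions of Figure 4, while the storage scheme and the function below are assumptions made for illustration), a customized cue could be kept as a named record of shape/size parameters in a cue database and adjusted before or during use:

# Hypothetical cue database: each cue is a named set of shape/size parameters.
cue_database = {
    "generic_face": {"Ew": 60, "Eh": 40, "Mw": 0, "Mh": 70},
}

def customize_cue(name, **adjustments):
    """Copy a stored cue and apply user adjustments to its parameters."""
    cue = dict(cue_database[name])
    cue.update(adjustments)  # e.g. widen the eye spacing for a broader face
    return cue

# The user enlarges the mask so that it fits the face in the camera image better,
# then stores the customized cue for later sessions.
my_cue = customize_cue("generic_face", Ew=72, Mh=78)
cue_database["my_face"] = my_cue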
Once the user selects the cue, he aligns the object with the cue using camera movements. The process continues until the user is satisfied with the best fit possible under the circumstances. Following this, the user signals a "lock" event to the application.
Upon receiving the "lock" event from the user, the application derives the initial estimate of the object's location through a set of feature points in the cue.
These feature points are at fixed positions in the parameterized cue model. When the cue's shape or size is changed, the feature point coordinates change accordingly. The coordinates are defined with reference to the display coordinates. Figure 4 illustrates an example coordinate system to classify points in the cue/mask. After superimposition, the location of the feature points in the cue (e.g. eyes, mouth, nose, etc.) will be input to the tracker. The tracker will start its tracking algorithm centered on these feature point coordinates. The origin of the coordinate system is located at the top-left corner of the displayable area on the screen/display device. The convention followed is that the coordinates advance in a positive manner while going down or right from the origin. Following are the definitions of the terms used in Figure 4.
Sh = Height of the screen
Sw = Width of the screen
Ew = distance between both eyes (measured from center of pupil of each eye)
Eh = vertical distance of eye from center of display area (Sh/2, Sw/2)
Mw = horizontal distance of mouth from center of display area
Mh = vertical distance of mouth from center of display area
The positions of the cues are defined in terms of the above coordinate system.
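A minimal sketch, assuming the parameters above and an (x, y) convention with x increasing to the right and y increasing downward from the top-left origin, of how the cue's feature-point coordinates could be computed in display coordinates (the function name and the returned labels are illustrative only):

def cue_feature_points(Sh, Sw, Ew, Eh, Mw, Mh):
    """Map the parameterized cue model of Figure 4 to display coordinates."""
    cx, cy = Sw / 2.0, Sh / 2.0  # center of the displayable area
    return {
        "left_eye":  (cx - Ew / 2.0, cy - Eh),   # eyes lie above the center
        "right_eye": (cx + Ew / 2.0, cy - Eh),
        "mouth":     (cx + Mw,       cy + Mh),   # mouth lies below the center
    }

# Example: a 240 x 320 display with one possible choice of cue parameters.
points = cue_feature_points(Sh=320, Sw=240, Ew=72, Eh=40, Mw=0, Mh=78)
# If the user changes the cue's shape or size (Ew, Eh, Mw, Mh), the feature-point
# coordinates returned here change accordingly, as described above.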
The user first describes the object in terms of a cue or a mask outline. On start of the application, the cue is presented on the display unit (for example, an LCD) superimposed on the camera-generated image.
One can think of the cue and the camera-generated image as two image planes superimposed and shown on the LCD. As the camera moves, the camera-generated image on the LCD moves, while the cue image stays static on the LCD. The user then aligns the camera so that the desired object in the camera image fits the cue as well as possible. Once the user is satisfied with the fit, a signal is sent by the user to the system indicating that the cue has been synchronized with the object.
Once the synchronization is achieved, the system, which is already aware of the location of the cue on the display, captures the points or region of interest from the camera image upon receiving the signal from the user. These points can now be used to track the object in future frames in the camera-captured video.
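The following sketch illustrates one possible way of capturing the points/regions of interest when the "lock" signal is received; the use of NumPy, the fixed patch size and the assumption that cue coordinates are aligned with the camera frame are implementation choices for illustration, not requirements of the invention:

import numpy as np

def capture_regions_of_interest(frame, cue_points, patch=16):
    """On the user's 'lock' signal, sample small image patches around the cue's
    feature-point coordinates; these points/patches seed the object tracker."""
    h, w = frame.shape[:2]
    regions = {}
    for name, (x, y) in cue_points.items():
        x, y = int(round(x)), int(round(y))
        x0, x1 = max(0, x - patch), min(w, x + patch)
        y0, y1 = max(0, y - patch), min(h, y + patch)
        regions[name] = frame[y0:y1, x0:x1].copy()
    return regions

# Example with a synthetic grey frame and illustrative eye/mouth locations.
frame = np.full((320, 240, 3), 128, dtype=np.uint8)
rois = capture_regions_of_interest(
    frame, {"left_eye": (84, 120), "right_eye": (156, 120), "mouth": (120, 238)})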
For example, the object to be captured could be a human face in a camera image from a mobile camera. The cue could be a generic face mask. The user will fit his face in the mask and send a signal to the system. The system will then capture the salient points of the face after receiving the signal (e.g. the location of the eyes, mouth, nose, etc.).
Once the feature points are obtained, they are tracked in the camera image by the object-tracking module. The tracking module can output the average or regularized motion of the object derived from the motion of the feature points.
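The specification does not prescribe a particular tracking algorithm. As one hedged example, pyramidal Lucas-Kanade optical flow (here via OpenCV, which is merely one possible implementation) could track the feature points from frame to frame, with the object motion reported as the average of the per-point motions:

import numpy as np
import cv2  # OpenCV is one possible implementation choice, not a requirement

def track_step(prev_gray, gray, prev_points):
    """Track feature points between two consecutive greyscale frames and return
    the updated points plus the average (regularized) motion of the object.
    prev_points is an (N, 1, 2) float32 array of feature-point coordinates."""
    new_points, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, prev_points, None, winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    if not good.any():
        return None, None  # tracking lost; the caller may trigger a "resync"
    motion = (new_points[good] - prev_points[good]).reshape(-1, 2)
    avg_motion = motion.mean(axis=0)  # simple average of per-point motion vectors
    return new_points, avg_motion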
There can be situations where the object tracking stops functioning properly. For example, the object goes out of the camera view and then reappears after some time. In such cases, a re-initialization of the object for tracking needs to be done. The user can signal such an event to the application when he notices, through some display-based feedback mechanism, that the tracking is not going well.
When a "resync" is signaled to the application, the tracking module stops operation. The application gives control to the user to go through the process of aligning the cue with the fresh object location again. In the "resync" mode, the cue used at the beginning can be reused.

A block flowchart of the full system operation is shown in Figure 3 of the drawings; a code sketch of the same flow follows the list below.
1. The user invokes the application using the application screen.
2. The cue/mask pops up on the screen superimposed on the camera image.
3. The user then signals to align the object with the cue/mask.
4. The cue/mask is adjusted by changing its parameters.
5. Using the cue and the camera image, the user aligns the relevant object with the cue/mask.
6. Once the user is satisfied with the alignment, he signals the application to lock.
7. On receiving the lock signal from the user, the application stores the location of the cue points in the database.
8. The cue points are treated as feature points on the object's image and are used for tracking the object using an object tracking system.
9. If the user indicates a need for resync, the system goes into resync mode.
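The following structural sketch, in Python, mirrors the nine steps above; the ui, camera and tracker objects stand in for platform-specific camera, user-interface and tracking components and would need concrete implementations, so every name here is a placeholder rather than a real API:

def run_application(ui, camera, tracker, cue_database):
    cue = cue_database["generic_face"]              # step 2: cue/mask pops up on screen
    cue_points = None
    while ui.application_running():                 # step 1: user has invoked the application
        frame = camera.get_frame()
        ui.show(frame, cue)                         # cue superimposed on the camera image
        if cue_points is None:
            cue = ui.apply_user_adjustments(cue)            # steps 3-5: adjust cue, align object
            if ui.user_signals_lock():                      # step 6: user satisfied, signals lock
                cue_points = ui.current_cue_points(cue)
                cue_database["last_lock"] = cue_points      # step 7: store cue point locations
                tracker.initialize(frame, cue_points)       # step 8: cue points become feature points
        else:
            ok = tracker.update(frame)              # track the feature points in new frames
            if ui.user_signals_resync() or not ok:
                cue_points = None                   # step 9: resync, realign the cue afresh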
It will also be obvious to those skilled in the art that other control methods and apparatuses can be derived from the combinations of the various methods and apparatuses of the present invention as taught by the description and the accompanying drawings, and these shall also be considered within the scope of the present invention. Further, description of such combinations and variations is therefore omitted above. It should also be noted that the host for storing the applications includes, but is not limited to, a microchip, a microprocessor or a handheld communication device.
Although the present invention has been fully described in connection with the preferred embodiments thereof with reference to the accompanying drawings, it is to be noted that various changes and modifications are possible and are apparent to those skilled in the art. Such changes and modifications are to be understood as included within the scope of the present invention as defined by the appended claims unless they depart therefrom.

ADVANTAGES
The above system allows the following advantages in a practical mobile camera environment:
1. Use of custom designed and customizable cues suited to the object
2. Using User feedback to enable a good initial location estimate of the object
3. Easy to implement and use
4. Greatly reduces the time to achieve "Lock"
5. Allows re-synchronization by user if object tracking starts going off the mark
WE CLAIM
1. A method for a user assisted object tracking in an avatar-based video conferencing system comprising:
(a) capturing an image of an object in a camera;
(b) superimposing the object captured in the camera to a cue;
(c) adjusting the parameters associated with the cue to make the object fit in the said cue;
(d) locking the image and assigning a plurality of cue points if the image and the cue fit;
(e) storing the cue points in a memory and mapping the cue points to a plurality of feature points on the object image to form a parameterized cue model; and
(f) deriving the motion and the location of the object from the mapped cue points and the feature points.

2. A method as claimed in claim 1 wherein the cue is adapted to be configured before the tracking or while the said tracking is in progress.
3. A method as claimed in claim 1 wherein the shape and size of the cue is adapted to be controlled through parameters associated with the cue.
4. A method as claimed in claim 1 wherein the camera is moved accordingly to make the object fit in the said cue.
5. A method as claimed in claim 1 wherein the feature points comprise feature point coordinates located at fixed positions in the parameterized cue model.
6. A method as claimed in claim 1 wherein on changing the cue's shape or size, the feature point coordinates also get changed accordingly.

7. A method as claimed in claim 1 wherein a re-initialization of the object for a fresh object location for tracking is carried out if the tracking of the object stops functioning.
8. A method as claimed in claim 1 wherein re-initialization involves the process of capturing and superimposing the cue with the fresh object location.
9. A system for a user assisted object tracking in an avatar-based video conferencing system comprising:

(a) a camera unit for capturing an image of an object;
(b) a display unit for superimposing the object captured in the camera unit to a cue;
(c) a memory module to store a plurality of cue points which are mapped to a plurality of feature points;
(d) an object tracking module for tracking the feature-points of the object; and
(e) a re-initialization module for tracking a fresh object location if the tracking of the object stops functioning.

10. A system as claimed in claim 9 wherein the tracking module outputs the average or regularized motion of the object derived from the motion of the feature points.
11. A method for a user assisted object tracking in an avatar-based video conferencing system substantially as herein described particularly with reference to the drawings.

12. A system for a user assisted object tracking in an avatar-based video conferencing system substantially as herein described particularly with reference to the drawings.
Dated this the 17th day of November 2006

Documents:

2136-CHE-2006 AMENDED PAGES OF SPECIFICATION 14-10-2013.pdf

2136-CHE-2006 AMENDED CLAIMS 14-10-2013.pdf

2136-CHE-2006 EXAMINATION REPORT REPLY RECEIVED 14-10-2013.pdf

2136-CHE-2006 FORM-1 14-10-2013.pdf

2136-CHE-2006 FORM-13 14-10-2013.pdf

2136-CHE-2006 POWER OF ATTORNEY 14-10-2013.pdf

2136-che-2006-abstract.pdf

2136-che-2006-claims.pdf

2136-che-2006-correspondnece-others.pdf

2136-che-2006-description(complete).pdf

2136-che-2006-drawings.pdf

2136-che-2006-form 1.pdf

2136-che-2006-form 26.pdf


Patent Number 257747
Indian Patent Application Number 2136/CHE/2006
PG Journal Number 44/2013
Publication Date 01-Nov-2013
Grant Date 31-Oct-2013
Date of Filing 17-Nov-2006
Name of Patentee SAMSUNG INDIA SOFTWARE OPERATIONS PRIVATE LIMITED
Applicant Address BAGEMANE LAKEVIEW, BLOCK B, NO. 66/1, BAGMANE TECH PARK, C. V. RAMAN NAGAR, BYRASANDRA, BANGALORE- 560 093, KARNATAKA, INDIA
Inventors:
# Inventor's Name Inventor's Address
1 ANSHUL SHARMA EMPLOYED AT SAMSUNG INDIA SOFTWARE OPERATIONS PVT LTD, HAVING ITS OFFICE AT, BAGEMANE LAKEVIEW, BLOCK B, NO. 66/1, BAGMANE TECH PARK, C. V. RAMAN NAGAR, BYRASANDRA, BANGALORE- 560 093, KARNATAKA, INDIA
2 RAVINDRA SHET EMPLOYED AT SAMSUNG INDIA SOFTWARE OPERATIONS PVT LTD, HAVING ITS OFFICE AT, BAGEMANE LAKEVIEW, BLOCK B, NO. 66/1, BAGMANE TECH PARK, C. V. RAMAN NAGAR, BYRASANDRA, BANGALORE- 560 093, KARNATAKA, INDIA
PCT International Classification Number G06F01/00
PCT International Application Number N/A
PCT International Filing date
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 NA