Title of Invention

VOICE RECOGNITION BASED SECURITY SYSTEM FOR MOBILE PHONES

Abstract
The present invention relates to mobile phone security applications, and more particularly to a voice recognition based security system for mobile phones. The system comprises: a voice processing module, which generates a voice model based on the spoken words; a neuro-fuzzy pattern recognition module, which accommodates small variations in tone and other factors; a speaker verification module, which verifies the voice model obtained from the speaker against the one available in the database; and a database update module, which fine-tunes the existing data with the help of new data and updates the database accordingly.
FIELD OF INVENTION
The present invention relates in general to the field of mobile communication devices, and further to mobile phone security applications. More particularly, the present invention relates to a voice recognition based security system for mobile phones.
DESCRIPTION OF THE RELATED ART
The contemporary security measure in cell phones is based on a unique number system. Every handset (cell phone) has a unique 15-digit number embedded in it. This number is used to uniquely identify the handset and disable the handset permanently whenever it gets lost or is stolen. The handset is rendered useless forever. But this is done only when the owner reports the number to the service provider.
The following problems are identified:
1. The very concept is not known to the layman.
2. The number is not easily remembered.
3. The process of obtaining the number from the cell (by punching the keys *#06#) is unknown to many.
4. Action is not taken immediately, and by the time it is taken the phone may already have been misused for calls or sold to a third party.
5. If the SIM card is removed immediately, no harm can be done to the instrument.
6. The process is not easily reversible, i.e., if the owner gets back the instrument, he too will be unable to use it if the cell has been disabled.
7. The barring process is more often governed by the law of the land.
SUMMARY OF THE INVENTION
The proposed invention overcomes all the weaknesses of the existing system.
1. The most important advantage of this invention is that it provides a standalone security to the mobile device as against the contemporary method which requires a third party (the service provider) to take the necessary steps to provide security and to revert back.
2. The security is instantaneous. The phone cannot be misused even for a second.
3. The user does not have to remember any passwords. The process is very simple and reversible.
4. Since voice recognition is used, no additional external hardware such as sensors is required. Hence cost and maintenance are lower.
5. Since voice is used, the processing overhead is lower than that of image-based recognition.
6. User safety is ensured since no scanning is needed.
7. Even when the phone is with the user, an unwanted person cannot use the phone without the owner's permission.
The object of the invention is to provide an easy, quick, efficient and foolproof security system for a mobile phone against theft or misuse.
This invention is aimed at providing a means for automatically safeguarding mobile devices (cell phones and smart phones) against misuse in case of loss or theft, by means of a voice-based owner authentication system.
The authentication is based on a text-prompted speaker verification system in a user-selected language, whereby access to the phone is granted only to the owner or a registered user after ascertaining that the obtained voice model matches the reference model in the database. If the device is used by more than one user, it can be configured to grant permission to more than one user. The unique features of this invention are the steps taken to overcome some of the classical limitations posed by text-prompted speaker recognition techniques and to provide foolproof security to the device against theft or misuse.
Accordingly, the present invention provides a voice recognition based security system for handheld communication devices comprising:
(a) a voice processing module which generates the voice model based on the spoken words;
(b) a neuro-fuzzy pattern recognition module which takes care of the little variations in the tone and other factors;
(c) a speaker verification module which verifies the voice model obtained from the speaker with that available in the database; and
(d) a database update module which fine-tunes the existing data with the help of the new data and updates the data accordingly.
The other objects, features and advantages will become apparent from the ensuing description of the invention given as a non-limiting example, taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
Figure 1 shows the block diagram of the training process.
Figure 2 shows the block diagram of the unlocking process.
Figure 3 shows the block diagram of unlocking process.
Figure 4 shows the block diagram of database update process.
DETAILED DESCRIPTION OF THE INVENTION
A preferred embodiment of the present invention will now be explained with reference to the accompanying drawings. It should be understood however that the disclosed embodiment is merely exemplary of the invention, which may be embodied in various forms. The following description and drawings are not to be construed as limiting the invention and numerous specific details are described to provide a thorough understanding of the present invention, as the basis for the claims and as a basis for teaching one skilled in the art how to make and/or use the invention. However in certain instances, well-known or conventional details are not described in order not to unnecessarily obscure the present invention in detail.
The basic operation
Whenever the user wants to use the phone, he needs to unlock it. (However, an incoming call can be received without unlocking the instrument.) To unlock, the user speaks into the microphone a few words prompted by the handset. The voice processing module generates a voice model based on the spoken words. The speaker verification module verifies the voice model obtained from the speaker against the one available in the database. If a match is found (within acceptable limits), the handset is unlocked and is free to be used. Otherwise, access is denied.
The security system described above consists of the following main blocks:
1. Voice processing module
2. Neuro-fuzzy pattern recognition module
3. Speaker verification module
4. Database update module (Neuro-fuzzy tuning module)
Initially, when the handset is first activated, it does not contain any database. Upon activation, the handset prompts the user to train the voice recognition system. Until the training is carried out the phone cannot be used further. The voice recognition system proposed in this claim is a speaker dependent, language based, text prompted voice recognition system with subsequent text independent authentication.
The underlying principle of voice recognition system is based on the fact that speech of a person contains certain parameters, which are unique to every individual. These parameters, like frequency components, are independent of the spoken word and the spoken language. A voice model is built based on these extracted parameters.
Once the initial database is generated, voice based locking gets activated. The locking is both manual as well as time dependent, i.e., if no activity is detected for a fixed amount of time, the handset gets automatically locked.
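The combination of manual and time-dependent locking described above can be illustrated with a minimal sketch. The class name `HandsetLock`, the 30-second default timeout and the injectable clock are all assumptions made for illustration; the specification does not state a timeout value or an implementation.

```python
import time


class HandsetLock:
    """Hypothetical lock that engages manually or after a period of inactivity."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s        # assumed inactivity limit, not from the source
        self._clock = clock               # injectable clock so the logic can be tested
        self._locked = True               # the handset starts in the locked state
        self._last_activity = clock()

    def unlock(self):
        """Called after a successful voice verification."""
        self._locked = False
        self._last_activity = self._clock()

    def lock(self):
        """Manual locking requested by the user."""
        self._locked = True

    def touch(self):
        """Record user activity (key press, call, etc.) while unlocked."""
        if not self.is_locked():
            self._last_activity = self._clock()

    def is_locked(self):
        """Apply the time-dependent lock lazily, whenever the state is queried."""
        idle = self._clock() - self._last_activity
        if not self._locked and idle >= self.timeout_s:
            self._locked = True
        return self._locked
```

Checking the timeout lazily in `is_locked` avoids a background timer; a real handset would more likely use a platform timer or power-management event.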
The number of users who can be granted permission to use the instrument can be configured at any time by the master user. Master user is the primary user who first undergoes the training. The master user has the permission to add any user or delete any user who has been added to the user list. A separate database will be created for each user in the list. However the maximum number of users in the list can be set based on the physical resource constraints of the phone.
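The master-user rules above can be sketched as a small registry. The name `UserRegistry`, the default limit of four users and the rule that the master user cannot delete himself are assumptions for illustration only; the specification says merely that the limit depends on the phone's physical resources.

```python
class UserRegistry:
    """Hypothetical per-handset user list; the first enrolled user becomes master."""

    def __init__(self, max_users=4):
        self.max_users = max_users    # assumed limit standing in for resource constraints
        self.master = None
        self.databases = {}           # user name -> that user's separate voice database

    def enroll_master(self, name):
        """The first user to undergo training becomes the master user."""
        if self.master is not None:
            raise ValueError("master user already enrolled")
        self.master = name
        self.databases[name] = {}

    def add_user(self, requester, name):
        """Only the master may add users, up to the configured maximum."""
        self._require_master(requester)
        if name in self.databases:
            raise ValueError("user already exists")
        if len(self.databases) >= self.max_users:
            raise ValueError("user list is full")
        self.databases[name] = {}

    def delete_user(self, requester, name):
        """Only the master may delete users (assumption: never the master itself)."""
        self._require_master(requester)
        if name == self.master:
            raise ValueError("the master user cannot be deleted")
        del self.databases[name]

    def _require_master(self, requester):
        if requester != self.master:
            raise PermissionError("only the master user may change the user list")
```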
Training
The initial training is text-prompted training. The block diagram of the training process is shown in Figure 1. The handset prompts certain pre-defined words, which the user has to repeat. In this process, the speaker who trains the system inputs the spoken words. The voice processing module extracts the unique parameters from the spoken words and builds a voice model for the user. In addition, individual models specific to each of the spoken words are generated. This is equivalent to saving the recorded words of the speaker. The training needs to be given more than once for the given set of words.
The more training is given, the more accurate the model becomes. The training can be repeated at any time in the future to fine-tune the existing model and make it immune to the various conditions mentioned below.
1. Variations in the tone due to cold or throat infection.
2. Mimicking
3. Background noise
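As a rough illustration of building a model from repeated utterances, the sketch below averages simple frequency-band energies over several repetitions. This is a toy stand-in, not the voice model of the invention: the functions `band_energies` and `train_voice_model`, the naive DFT and the choice of four bands are all assumptions, and a practical system would use richer features.

```python
import cmath
import math


def band_energies(samples, n_bands=4):
    """Normalised spectral energy in n_bands equal-width bands (naive DFT)."""
    n = len(samples)
    half = n // 2
    power = []
    for k in range(half):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        power.append(abs(coeff) ** 2)
    width = half // n_bands           # bins beyond n_bands * width are ignored
    bands = [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]
    total = sum(bands) or 1.0         # avoid division by zero on silence
    return [e / total for e in bands]


def train_voice_model(utterances, n_bands=4):
    """Average the feature vectors of several repetitions into one reference model."""
    features = [band_energies(u, n_bands) for u in utterances]
    return [sum(col) / len(features) for col in zip(*features)]
```

Averaging over repetitions is one simple way to reflect the statement that more training yields a more accurate model, since each extra repetition reduces the influence of any single noisy recording.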
For better security, an option of choosing the language is provided at the beginning of the training session. The handset prompts the words in the chosen language. This feature has the following advantages.
1. It is difficult for a hacker to understand an unfamiliar language and pronounce the words properly with the right accent.
2. It is easy for the user to speak words in a language with which he is comfortable.
Once the first round of training is over, the words are subsequently prompted in the speaker's own recorded voice, and both the model and the recorded voice are updated.
Unlocking process
Whenever the user wants to use the cell, he needs to unlock it. The block diagram of the unlocking process is shown in Figure 2.
The handset prompts the user with certain words, which the user has to repeat. These words are played in the user's own voice. This has the following advantages:
1. Better security, since the accent may not be easily understood by others.
2. Processing is easier, since simple pattern recognition can narrow down the process by eliminating an unauthorized user without having to generate the voice model.
3. All the environmental and other conditions are taken into account during verification.
The words repeated by the speaker are subjected to a neuro-fuzzy pattern recognition module. The neuro-fuzzy nature takes care of the little variations due to jitter, noise, slight change in the tone and several other minute factors which are likely to occur when a person speaks at different times. This is a very simple technique which is not computationally intensive, and it can easily and quickly distinguish between a valid and an invalid user. The error generated by the neuro-fuzzy pattern recognition module is compared against a pre-defined threshold. This threshold takes care of the variations in the environment and the valid speaker's speech variations. If the error is greater than the threshold, the speaker is treated as an unauthorized user and access is denied. If the error is within the threshold, further processing takes place to validate the user. This is shown in Figure 3.
Once the error is within the acceptable limits, the speech is processed further. A voice model is built based on the available speech. This model is compared against the reference voice model in the database in a voice verification module. If they agree within the acceptable limits, access to the cell is granted to the user; otherwise access is denied.
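The two-stage check described above (a cheap pattern match first, then a full model comparison) can be sketched as follows. A plain Euclidean distance stands in for the error of the neuro-fuzzy module, and both threshold values are arbitrary; the function names and parameters are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify_speaker(word_features, word_template, build_voice_model,
                   reference_model, pattern_threshold=0.2, model_threshold=0.1):
    """Return (granted, stage). Both thresholds are illustrative assumptions."""
    # Stage 1: compare the repeated word with the stored word pattern.
    # A plain distance stands in here for the neuro-fuzzy module's error.
    if distance(word_features, word_template) > pattern_threshold:
        return False, "pattern"
    # Stage 2: only now build the voice model, then compare it with the reference.
    candidate = build_voice_model()
    if distance(candidate, reference_model) > model_threshold:
        return False, "model"
    return True, "granted"
```

Passing `build_voice_model` as a callable keeps the more expensive model generation out of the path taken by speakers rejected at the first stage, which mirrors the stated advantage of eliminating an unauthorized user without generating the voice model.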
Online database update
One of the drawbacks of conventional voice recognition systems is that they are not completely foolproof. This is because the conditions at the time of training and those at the time of use are not the same. These conditions are of two types:
1. Environmental conditions, such as background noise.
2. User-specific conditions, such as change in tone, drag, throat infection, mimicking, etc.
These conditions are not taken into account during the training process, and it is difficult to simulate them during training. The proposed implementation addresses these drawbacks as follows.
Whenever the user is speaking on the line (making or receiving a call), the voice processing module continuously generates a voice model based on the frequency components of the user's speech. The neuro-fuzzy module then fine-tunes the existing data with the help of the new data (Figure 4).
Thus training is a continuous background process. With each session, the database becomes stronger and more immune to the common errors found in voice recognition systems stated above. However, if the module encounters a very large difference between the obtained model and the existing model, it does not update the reference model. Such a difference could arise either because of heavy disturbance or because the user has temporarily handed the phone to a different person.
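The guarded background update can be illustrated with a short sketch. An exponential moving average is used here in place of the neuro-fuzzy tuning, which the specification does not detail; the function `update_reference`, the blending `rate` and the `max_distance` cut-off are assumptions for illustration.

```python
import math


def update_reference(reference, observed, rate=0.1, max_distance=0.3):
    """Blend a newly observed model into the reference unless it differs too much.

    rate and max_distance are illustrative assumptions, not values from the source.
    Returns the (possibly unchanged) reference model and whether it was updated.
    """
    gap = math.sqrt(sum((r - o) ** 2 for r, o in zip(reference, observed)))
    if gap > max_distance:
        # Very large difference: heavy disturbance or a different speaker, so skip.
        return list(reference), False
    blended = [(1.0 - rate) * r + rate * o for r, o in zip(reference, observed)]
    return blended, True
```

A small `rate` lets the reference drift slowly towards the user's current voice, while the cut-off keeps a borrowed phone or a noisy call from corrupting the stored model.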
User options
The phone can be configured for single user or multi-user mode. The first person to train/use the system is called the master user. The master user has the right to add or delete any user. In a multi-user mode, while unlocking, the cell displays a list of users. The user has to select his name and proceed. The authenticated user has the following options once he unlocks the cell.
1. Re-train the system
2. Format the voice recognition system. This wipes out the entire recorded database for that user. The master user, however, has the right to selectively format any or all of the users' databases and start the entire process afresh.
3. Add new user (master user only).
4. Disable the online database update feature temporarily if the user wants to give the phone to some third person for use temporarily.
It will also be obvious to those skilled in the art that other control methods and apparatuses can be derived from the combinations of the various methods and apparatuses of the present invention as taught by the description and the accompanying drawings and these shall also be considered within the scope of the present invention. Further description of such combinations and variations is therefore omitted above. It should also be noted that the host for storing the applications includes, but is not limited to, a computer, mobile communication device, mobile server or a multi-function device.
Although the present invention has been fully described in connection with the preferred embodiments thereof with reference to the accompanying drawings, it is to be noted that various changes and modifications are possible and are apparent to those skilled in the art. Such changes and modifications are to be understood as included within the scope of the present invention as defined by the appended claims unless they depart therefrom.

GLOSSARY OF TERMS AND THEIR DEFINITIONS
SIM: Subscriber Identity Module (cell phone usage)
Voice Model: It is a phoneme model built on certain parameters which are unique to an individual (speech). For speaker recognition, features that exhibit high speaker discrimination power, high inter speaker variability, and low intra speaker variability are desired. These phoneme models can be template models (like in dynamic time warping (DTW)), statistical models (as in hidden Markov model (HMM)) or codebook models (as in vector quantization (VQ)).


WE CLAIM
1. A voice recognition based security system for handheld communication devices comprising:
(a) voice processing module which generates the voice model based on the spoken words;
(b) neuro-fuzzy pattern recognition module which takes care of the little variations in the tone and other factors;
(c) speaker verification module which verifies the voice model obtained from the speaker with that available in the database; and
(d) database update module which fine tunes the existing data with the help of the new data and updating the data accordingly.

2. A system as claimed in claim 1 wherein it involves authentication which is based on a user-selected-language based text prompted speaker verification system whereby access to use the phone is granted only to the owner or registered user after ascertaining that the obtained voice model matches the reference model in the database.
3. A system as claimed in claim 1 wherein if the device is used by more than one user then it can be configured to grant permission to more than one user.
4. A system as claimed in claim 1 wherein the said system is a speaker dependent, language based, text prompted voice recognition system with subsequent text independent authentication.
5. A system as claimed in claim 1 wherein voice model is built based on extracted parameters like frequency components which are unique to every individual.
6. A system as claimed in claim 1 wherein voice based locking gets activated when initial database is generated which is both manual as well as time dependent.
7. A system as claimed in claim 1 wherein the number of users who are granted permission to use the instrument is configured at any time by the master user where the said master user is the primary user who first undergoes the training.
8. A system as claimed in claim 1 wherein the master user has the permission to add any user or delete any user who has been added to the user list.
9. A system as claimed in claim 7 wherein the said training is a text prompted training and the speaker who trains the system inputs the spoken words thereafter voice processing module extracts the unique parameters from the spoken words and builds a voice model for the user.
10. A system as claimed in claim 7 wherein the said training generates individual models for each of the spoken words which are specific to those words.
11. A system as claimed in claim 7 wherein the said training is given more than once for the given set of words.
12. A system as claimed in claim 7 wherein the said training can be carried out at any time to fine-tune the existing model and make it immune to various conditions such as variations in the tone due to cold or throat infection, mimicking or background noise.
13. A system as claimed in claim 7 wherein the said training provides a choice of a language for better security.
14. A system as claimed in claim 12 wherein the training given after the first round of training involves the words being prompted in the speaker's own recorded voice and both the model and the recorded voice being updated.
15. A system as claimed in claim 1 wherein the user unlocks the cell for use and the said unlocking involves the user repeating certain words prompted in the selected user's voice, the words repeated by the speaker being subjected to a neuro-fuzzy pattern recognition module.
16. A system as claimed in claim 1 wherein the neuro-fuzzy pattern recognition module takes care of the little variations due to jitter, noise, slight change in the tone and several other minute factors which are likely to happen when a person speaks at different times.
17. A system as claimed in claim 16 wherein the neuro-fuzzy pattern recognition module generates an error which is compared against a pre-defined threshold.
18. A system as claimed in claim 17 wherein if the error is greater than the threshold, the user is an unauthorized user and hence access is denied and if the error is within the threshold, then further processing takes place to validate the user.
19. A system as claimed in claim 18 wherein the said further processing involves building a voice model based on the available speech and the said model is compared against the reference voice model in the database in a voice verification module.
20. A system as claimed in claim 19 wherein if the results of the comparison are within the acceptable limits, access is granted to the user to use the cell else access is denied.
21. A system as claimed in claim 20 wherein authenticated user once unlocks the cell gets the options to re-train the system, formatting the voice recognition system, add new user (master user only) and disable the online database update feature temporarily.
22. A system as claimed in claim 1 wherein when the user is speaking on the line, the voice processing module continuously generates the voice model which is based on the frequency components of the user's speech and the database update module then fine-tunes the existing data with the help of the new data.
23. A system as claimed in claim 1 wherein, if the database update module encounters a very large difference between the obtained model and the existing model, the reference model is not updated.
24. A voice recognition based security system for handheld communication devices substantially as herein described, particularly with reference to the accompanying drawings.


Patent Number 215448
Indian Patent Application Number 1481/CHE/2004
PG Journal Number 13/2008
Publication Date 31-Mar-2008
Grant Date 26-Feb-2008
Date of Filing 31-Dec-2004
Name of Patentee SAMSUNG INDIA SOFTWARE OPERATIONS PRIVATE LIMITED
Applicant Address BAGMANE LAKEVIEW, BLOCK B', NO. 66/1, BAGMANE TECH PARK, C V RAMAN NAGAR, BYRASANDRA, BANGALORE-560093,
Inventors:
# Inventor's Name Inventor's Address
1 SACHIN PURANDARDAS KAMAT BAGMANE LAKEVIEW, BLOCK B', NO. 66/1, BAGMANE TECH PARK, C V RAMAN NAGAR, BYRASANDRA, BANGALORE-560093,
PCT International Classification Number G10L 17/00
PCT International Application Number N/A
PCT International Filing date
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 NA