|Title of Invention||A SPEECH SYNTHESIZER FOR VOCALLY HANDICAP PERSON|
|Abstract||A Speech Synthesizer for vocally handicap person which consists of component parts namely the main synthesizer chip ADSP-2101, Buffers 74LS373, Multivibrators 556 and 74LS76, Counters 74LS160, two LED displays TIL311, Crystal 16 MHz, Memory EPROM 27C020, Transistor 2N2369, Converter DAC AD7545 which are interconnected as shown in Figure 1 of the accompanying drawings to form an integral circuit to constitute the Speech Synthesizer. The said integral circuit may be connected to an Audio-amplifier and a speaker. A set of foot-operated or chin-operated switches may be provided for the spastic persons.|
|Full Text||The present invention relates to a Speech Synthesizer, herein designated as 'VANEESHREE SPEECH SYNTHESIZER', for vocally handicap person.
The speech synthesizer is an electronic gadget which is used to synthesize intelligible speech for a vocally handicap person. The speech synthesizer may be hooked to an audio-amplifier so that the synthesized speech is audible to everybody in the house/premises.
Different types of speech synthesizers are available in U.S.A., Europe and Japan. A large number of these devices are phoneme-based and use the chips SC01, SP0256, and the like. The phonemes of English (or other languages) may be concatenated to make sentences in Indian languages, but these give a poor quality of speech which is hardly intelligible. Sentences in English using English phonemes have relatively better intelligibility, but they have a mechanical and robotic voice which the patients do not like.
The Vaneeshree Speech Synthesizer, according to the present invention, synthesizes the sentences from the reflection coefficients (RCs), pitch and gain obtained from frame to frame, and obviates the drawbacks of the conventional phoneme-based synthesizers.
Another type of speech synthesizer available in the international market is a CMOS speech synthesizer (TMS50C20) manufactured by Texas Instruments. This 40-pin chip is specially designed and fabricated by Texas Instruments for stand-alone operation with an EPROM, or the chip may be operated by a Personal Computer. It can directly use the EPROM 27C512 having 8 pages. After synthesis, the speech is sent through a pulse-width D/A converter. Outside the chip, a pulse-width demodulator and an audio-amplifier are required.
The speech synthesizer based on TMS50C20, made by Texas Instruments, has the following limitations :
1. The chip synthesizes the speech using the reflection coefficients (RCs) in floating point mode. Since the chip is specially designed for this purpose (it does not have utilization in any other place), it is costly. This speech synthesizer unit costs about Rs. 1.50 Lakhs.
2. The chip can directly address the EPROM 27C512 which contains eight pages. Since two sentences are accommodated per page, it may hold a maximum of fourteen sentences (page-0 is reserved for the synthesis program).
3. The chip reads the sentence number either through a key-board or through an 8-bit parallel/serial port.
4. It generates 8 kHz or 10 kHz speech.
The Vaneeshree Speech synthesizer, according to this invention, uses basically a General purpose DSP (Digital Signal Processing) chip ADSP-2101 for synthesis of speech. It works in integer mode and this obviates the limitations of the known speech synthesizer based on TMS50C20 as stated above.
The Speech Synthesizer, according to the present invention, consists of component parts namely the main synthesizer chip ADSP-2101, three Buffers 74LS373(1), 74LS373(2) and 74LS373(3), Multivibrators 556 and 74LS76, two Counters 74LS160(1) and 74LS160(2), two LED displays TIL311, Crystal 16 MHz, Memory EPROM 27C020, Transistor 2N2369, Converter DAC AD7545 which are interconnected as shown in Figure 1 of the accompanying drawings as herein described, to constitute the Speech Synthesizer.
According to a preferred embodiment of the present invention, the said integral circuit is connected to an Audio-amplifier and a speaker so that the synthesized speech is audible to everybody in the house/premises. According to another embodiment, a set of foot-operated or chin-operated switches may be provided especially for the spastic persons whose hands do not work.
As shown in the accompanying drawings, the connections between the electronic chips, indicated by the broad arrows, are conventional as supplied by the manufacturers of these chips. For example, the LED display of sentence number by TIL311 is connected to the Counters 74LS160 through broad arrows. This means that the specific pin-to-pin connections between TIL311 and 74LS160 will be according to the manufacturer's sheet.
The sub-block connections are detailed below :
1. Multivibrator 556 and 74LS76.
The 556 has two multivibrators, one on each side. The lower side (pin-1 to pin-7) is an astable connection giving a square-wave output at approximately 1 Hz. This output goes to the two counters 74LS160(1) and 74LS160(2), and each pulse increments the counter. The upper side (pin-8 to pin-14) is a monostable connection. With each clocking of switch SW3, the output state (at pin-9) is changed. This is in turn used for clocking the T-type J-K flip-flop 74LS76. The output of 74LS76 is taken from pin-15, which puts the ADSP-2101 either in the RESET or in the RUN condition.
2. Counters 74LS160 and LED display TIL311.
There are two counters and two displays. Each is operated by a separate switch, SW1 or SW2. The number in each LED display will change from 0 to 9 and back to 0 again. The two displays together constitute the sentence number.
3. The buffer 74LS373(1).
ADSP-2101 reads the sentence number through this buffer.
4. The crystal 16 MHz.
It is connected to the pins XTAL and CLOCKIN of ADSP-2101 through two 22 pF capacitors. This is the main clock of the DSP chip.
5. EPROM 27C020.
This is the main EPROM memory where the synthesis program and the coded sentences are kept. The data lines O0...O7 are connected to D8...D15 of ADSP-2101. The address lines A0...A13 are connected to the similar pins of ADSP-2101. Then A14 and A15 are connected to D22 and D23 of ADSP-2101. Lastly A16 and A17 are connected to QB and QC of 74LS160(1).
6. Buffers 74LS373(2) and 74LS373(3).
The synthesized speech, in digital form, passes through these. Their input lines are connected to D8...D19 of ADSP-2101 and their output lines to D0...D11 of AD7545.
7. Transistor 2N2369.
Its base is connected to PMS of ADSP-2101 through a 22 kΩ resistor. The collector is connected to pin-11 of buffers 74LS373(2) and 74LS373(3). Its purpose is to open the buffers (i.e., make them transparent), pass the speech data to the DAC side and then close the buffers.
8. DAC AD7545.
This D-to-A converter converts the digital data to analog. The two resistors, 750 Ω and 220 Ω, set about 1 V at pin-1. The output voltage is obtained from pin-19 in the range 0-1 volt (all positive).
9. ADSP-2101.
This is the main synthesizer chip. The pin MMAP is to be grounded. Also BR and IRQ1 are to be connected to +5 volt for stable operation. BMS and RD pins are to be connected to the ~CE and ~RD pins of EPROM 27C020. The pin DMS is connected to pin-1 of buffer 74LS373(1). Other pin connections are described above. The supply voltage required is +5 V for best operation. All the chips used here have the same supply. Also note that there is a capacitor of value 0.1 μF between the supply and ground pin of each chip. This is to avoid the effect of transients.
Initially a sentence number is set in the two LED displays using the switches SW1 and SW2. Whenever switch SW3 is pressed, the program starts running. The DSP chip reads the sentence number from the counters (74LS160(1) and 74LS160(2)) controlling the LED display. The sentence number is then converted to the address where the RCs, pitch and gain of all frames (of the sentence) are stored. After the synthesis, the digital output passes through the two buffers (74LS373(2) and 74LS373(3)). The analog output is connected to the audio-amplifier through a band-pass filter.
DETAILED FUNCTIONING OF THE VANEESHREE SPEECH SYNTHESIZER
The user first has to set the sentence number. This is done in a way that is easy for spastic persons. While switch SW1 (or SW2) is kept pressed, the numbers go on changing automatically. The user has to stop pressing when the appropriate number appears, and then presses SW3 to run. During the run the single LED lamp (L1 in Figure 1) will glow. The switch SW3 actually makes the RESET of ADSP-2101 high for execution. Since ADSP-2101 is permanently set for EPROM operation (MMAP=0), page 0 is loaded first. There are two programs on this page and 256 random data. The first program reads the sentence number through the buffer 74LS373(1), decodes it, and stores the separated page number and location on the page in two registers. The second program is the actual synthesis program. The latter and the random data are temporarily kept in the data memory (DM) side, and then the required page (where the needed sentence is stored) is brought in using the software rebooting scheme of ADSP-2101. This is precisely done in the following way and is one of the novel features of the present invention.
The sentence numbers used here are 02...15, 22...35, 42...55 and 62...75 for the memory chip 27C020 (a 2-Mbit memory which contains 32 pages; the address lines A16 and A17 are to be connected to QB and QC respectively of counter 74LS160(1) as shown in Figure 1). It will have a total of 56 sentences. Since QB and QC may each be 0 or 1, the combinations 00, 01, 10 and 11 expose eight pages in each level, giving a total of 32 pages. Page 0 in each level is used for the different types of synthesis programs. Hence only 7 pages in each of the (00, 01, 10 and 11) levels are available to store the coded sentences. The total of 28 pages gives 56 sentences at the rate of two sentences per page. This also gives the additional advantage of different sampling frequencies in each level. Some of the sentences needed by spastic persons are short enough to be sampled at the higher frequency of 15 kHz. Indeed, sentence numbers 02...15 are all sampled at 15 kHz. Sentence numbers 22...35, 42...55 and 62...75 are sampled at 12 kHz, 10 kHz and 8 kHz respectively. During synthesis, the delays (between bytes synthesized) must be different for each sampling frequency; these are initially stored in different registers and are carried to the synthesis program. This is very helpful in reducing noise.
The two bytes of the sentence number are actually the page number and the location on the page; e.g., sentence number 07 will have the coded sentence in the second half of the third page (3 × 2 + 1 = 7). Once the ten RCs, pitch and gain (for each frame) are obtained, synthesis is started according to the two-multiplier lattice model.
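The numbering scheme can be illustrated with a short Python sketch. It is not the DSP program of the invention. It assumes that the level is given by QB and QC of the tens-digit BCD counter (QB as the lower bit), that within a level the remaining number equals 2 × page + half, and that each of the 32 pages occupies 8 KB of the 256 KB EPROM. The function name `decode_sentence` and the returned field names are hypothetical.

```python
# Illustrative sketch only: decode a two-digit sentence number into the
# level, page, half-page and sampling rate described in the text.
# The bit arithmetic and all names are assumptions, not the patented code.

SAMPLING_HZ = {0: 15000, 1: 12000, 2: 10000, 3: 8000}  # per level, from the text
PAGE_BYTES = (256 * 1024) // 32  # 2-Mbit EPROM split into 32 pages -> 8192 bytes


def decode_sentence(number):
    """Return level, page, half and sampling rate for a sentence number."""
    if not 0 <= number <= 79:
        raise ValueError("sentence numbers above 79 are not used")
    tens, units = divmod(number, 10)
    # BCD tens digit: QA is bit 0, QB is bit 1, QC is bit 2.
    level = (tens >> 1) & 0b11            # assumed: QB -> A16, QC -> A17
    within = (tens & 1) * 10 + units      # 02...15 inside each level
    if not 2 <= within <= 15:
        raise ValueError("page 0 of each level is reserved for programs")
    page, half = divmod(within, 2)        # two sentences per page
    return {
        "level": level,
        "page": page,
        "half": half,
        "sampling_hz": SAMPLING_HZ[level],
        "page_base": (level * 8 + page) * PAGE_BYTES,  # assumed EPROM offset
    }


if __name__ == "__main__":
    print(decode_sentence(7))   # the example from the text: page 3, second half
```

Under these assumptions, exactly 56 of the numbers 00...79 decode successfully, which matches the sentence count stated in the text.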
In the scheme according to the present invention, all computations are in integers; hence the RCs, which are fractional numbers greater than -1 and less than +1, are truncated after either the second or the third decimal place. Precisely, the integers formed are either (RC × 128 + 128) or (RC × 1024 + 1024). This converts all RCs, positive or negative, into positive integers which can be stored in the program memory (PM) side as data. Since each PM location has 24 bits, each location may store 3 RCs in the first scheme and 2 RCs in the second. This helps very much in data compression and is pivotal for the high compression ratio of 50:1.
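The two integer codings can be sketched in Python as follows. The formulas (RC × 128 + 128) and (RC × 1024 + 1024) are from the text; the bit layout inside the 24-bit word (three 8-bit fields, or two 12-bit fields) is an assumption made for illustration, as are all function names.

```python
# Illustrative sketch: quantize reflection coefficients to positive integers
# and pack them into 24-bit words. Field layout and names are assumed.

def quantize_rc(rc, scale):
    """Map an RC in (-1, 1) to a non-negative integer: int(rc*scale + scale)."""
    if not -1.0 < rc < 1.0:
        raise ValueError("reflection coefficient must lie strictly in (-1, 1)")
    return int(rc * scale + scale)  # truncation; result is in [0, 2*scale)


def dequantize_rc(code, scale):
    """Recover the (truncated) RC value from its integer code."""
    return (code - scale) / scale


def pack3(codes):
    """Pack three 8-bit codes (scale 128) into one 24-bit word."""
    a, b, c = codes
    return (a << 16) | (b << 8) | c


def unpack3(word):
    return [(word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF]


def pack2(codes):
    """Pack two codes (scale 1024, each below 2048) into one 24-bit word."""
    a, b = codes
    return (a << 12) | b


def unpack2(word):
    return [(word >> 12) & 0xFFF, word & 0xFFF]


if __name__ == "__main__":
    coarse = [quantize_rc(r, 128) for r in (0.5, -0.5, 0.0)]
    fine = [quantize_rc(r, 1024) for r in (-0.25, 0.75)]
    print(coarse, hex(pack3(coarse)))
    print(fine, hex(pack2(fine)))
```

The coarse coding resolves an RC to about 1/128 and the fine coding to about 1/1024, which corresponds to the "second or third decimal" trade-off between memory use and clarity described in the text.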
The synthesized speech data are fed to the two buffers (74LS373(2) and 74LS373(3)). The 12-bit DAC AD7545 works with +5V supply only. Hence the entire device needs only one supply voltage +5V. The analog output is fed to the audio-amplifier.
The synthesis program will be different if 2 RCs, instead of 3, are in the same memory location of PM. Page-0 of each QBQc = 00,01,10 and 11 levels contain these different synthesis programs.
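The two-multiplier lattice model named earlier as the basis of synthesis can be sketched in Python as below. This is a textbook all-pole lattice written in floating point for readability, whereas the device itself computes in integers; the sign convention, the handling of gain and the function name `lattice_synthesize` are assumptions, not details taken from the invention.

```python
# Illustrative sketch of a two-multiplier all-pole lattice synthesis filter.
# Each stage uses two multiplications by the same reflection coefficient k_m:
#   f_{m-1}(n) = f_m(n)     - k_m * b_{m-1}(n-1)
#   b_m(n)     = b_{m-1}(n-1) + k_m * f_{m-1}(n)

def lattice_synthesize(excitation, rcs, gain=1.0):
    """Filter an excitation sequence through the lattice defined by rcs."""
    order = len(rcs)
    delayed = [0.0] * (order + 1)   # delayed[m] holds b_m(n-1)
    output = []
    for x in excitation:
        f = gain * x                # f_M(n): scaled excitation enters the top
        current = [0.0] * (order + 1)
        for m in range(order, 0, -1):
            k = rcs[m - 1]
            f = f - k * delayed[m - 1]            # first multiplier
            current[m] = delayed[m - 1] + k * f   # second multiplier
        current[0] = f              # b_0(n) = f_0(n)
        delayed = current
        output.append(f)            # f_0(n) is the synthesized sample
    return output


if __name__ == "__main__":
    # Impulse response of a one-stage lattice with k = 0.5.
    print(lattice_synthesize([1, 0, 0, 0], [0.5]))
```

In a frame-based synthesizer of the kind described, the ten RCs and the gain would be updated once per frame, and the excitation would be a pulse train at the pitch period for voiced frames.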
Another important aspect of noise reduction is setting the silence level. If the minimum gain in the entire sentence is Gmin (say), then a threshold is taken as (Gmin + 4). All gains below this threshold are reduced to zero. This makes the gaps between words near-perfectly silent, but a little distortion is created in the synthesized words. This threshold was experimentally determined with extensive trial and error.
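The silence-level rule can be written as a few lines of Python. The rule itself (threshold = Gmin + 4, gains below it set to zero) is from the text; the function name and the sample gain values are made up for illustration.

```python
# Illustrative sketch of the silence threshold: gains below (Gmin + 4)
# are forced to zero so that pauses between words become silent.

def apply_silence_threshold(gains, margin=4):
    """Zero every per-frame gain that falls below min(gains) + margin."""
    threshold = min(gains) + margin
    return [0 if g < threshold else g for g in gains]


if __name__ == "__main__":
    frame_gains = [3, 5, 6, 7, 20, 40]  # hypothetical per-frame gain codes
    print(apply_silence_threshold(frame_gains))
```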
For the audio-amplifier, BEL chip 1895 or 2895 is used for different audio power.
The Speech Synthesizer, according to the present invention, has the following advantages :
1. High reduction in cost.
Since the general purpose DSP chip ADSP-2101 is quite cheap, the entire cost of the Vaneeshree Speech Synthesizer including the power supply (+5V) and audio-amplifier would be about Rs. 2500/=.
2. Since the execution of the synthesis program is in integers, the consequent noise level is higher. This is eliminated by using several techniques as already explained herein. One important technique is the fixing of the threshold (Gmin + 4). All noise between words is eliminated, thereby giving the impression of a noiseless audio output.
3. Another important step of noise reduction is the use of RCs in the mode (RC × 1024 + 1024) or (RC × 128 + 128). The former takes the RCs up to the third decimal place whereas the latter takes them up to the second. Though the memory requirement in the former case is greater (2 RCs in one memory location of PM), the clarity of output is much better. All small sentences are synthesized in this way.
4. The third step of noise reduction is the use of a higher sampling frequency. Its benefit is not apparent at first. A higher sampling frequency reduces the duration of each frame and improves the stationarity of the signal. This results in better quality of synthesis.
5. Using the LED display of sentence number and the direct connection between the counter (74LS160(1)) and the EPROM (27C020) makes possible different levels of 8 pages each. This results in a considerable increase in the number of sentences, 56 in this model and up to about 250 in the extended model.
6. Most 12-bit DACs require ±12 V in addition to the +5 V supply. The special choice of AD7545 has made it possible to use +5 V only for the entire operation. Hence the device may be battery operated with a small speaker, giving a portable Vaneeshree Speech Synthesizer.
7. Another important aspect is the high precision of the pitch value. The conventional algorithm for the detection of pitch is SIFT. This gives the value of pitch in steps of four, e.g., 36, 40, 44 and so on. In the Speech Synthesizer according to the present invention, an interpolation program (Lagrange method) is used along with SIFT. This gives the value of pitch in steps of one, which improves the clarity of synthesis.
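The text does not say how the Lagrange interpolation is combined with SIFT. One plausible reading, sketched below in Python, is that SIFT supplies a peak measure (for example an autocorrelation value) at coarse lags four samples apart, and a three-point Lagrange polynomial through the peak and its two neighbours is evaluated at every integer lag in between to locate the maximum. The function names and the sample values are hypothetical.

```python
# Illustrative sketch: refine a coarse pitch lag (steps of 4) to a step of 1
# using three-point Lagrange interpolation. This is an assumed procedure,
# not the interpolation program of the invention.

def lagrange3(xs, ys, x):
    """Evaluate the quadratic Lagrange polynomial through 3 points at x."""
    total = 0.0
    for i in range(3):
        term = ys[i]
        for j in range(3):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total


def refine_pitch(coarse_lags, values):
    """Return the integer lag in [first, last] where the polynomial peaks."""
    candidates = range(coarse_lags[0], coarse_lags[2] + 1)
    return max(candidates, key=lambda lag: lagrange3(coarse_lags, values, lag))


if __name__ == "__main__":
    # Hypothetical peak measure at the coarse lags 36, 40 and 44.
    print(refine_pitch((36, 40, 44), (-25.0, -1.0, -9.0)))
```

With these made-up values the coarse peak sits at lag 40, while the interpolated maximum falls at lag 41, illustrating the step-of-one resolution described in the text.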
1. A Speech Synthesizer for vocally handicap person which
consists of the component parts namely the main synthesizer
chip ADSP-2101, three Buffers 74LS373(1), 74LS373(2) and
74LS373(3), Multivibrators 556 and 74LS76, two Counters
74LS160(1) and 74LS160(2), two LED displays TIL311, Crystal
16 MHz, Memory EPROM 27C020, Transistor 2N2369, Converter
DAC AD7545 which are interconnected as shown in Figure 1
of the accompanying drawings, as herein described, to
constitute the Speech Synthesizer.
2. The Speech Synthesizer as claimed in claim 1 wherein the
Speech Synthesizer is connected to an Audio-amplifier
and a speaker as herein described so that the synthesized
speech is audible loudly in the house or premises.
3. The Speech Synthesizer as claimed in claims 1 and 2
wherein a set of foot-operated or chin-operated switches
4. The Speech Synthesizer substantially as herein described
and illustrated in the accompanying drawings.
|Indian Patent Application Number||784/DEL/1997|
|PG Journal Number||13/2009|
|Date of Filing||26-Mar-1997|
|Name of Patentee||THE DIRECTOR, INDIAN INSTITUTE OF TECHNOLOGY.|
|Applicant Address||INDIAN INSTITUTE OF TECHNOLOGY, KANPUR- 208016|
|PCT International Classification Number||G10L 31/02|
|PCT International Application Number||N/A|
|PCT International Filing date|