Title of Invention	A SYSTEM AND METHOD FOR MEASURING AN ABILITY
Abstract	A system for measuring an ability of a subject is provided. The system includes a set of tasks (10) that require the subject to provide one or more spoken responses. A speech recognition system (20), which is coupled to receive the spoken responses, provides an estimate (50) of the spoken responses. The estimate (50) may be an estimate of the linguistic content and/or other characteristics of the response. The speech recognition system (20) has an associated operating characteristic relating to its ability to recognize and estimate the content of responses and elements of responses. A scoring device (30) converts the response estimate (50) into one or more item scores (60). A computation device (40) provides a subject score (70) using a scoring computation model that depends upon the expected item-dependent operating characteristics of the speech recognition system (20).

Full Text	A SYSTEM AND METHOD FOR MEASURING AN ABILITY OF A SUBJECT AND AN APPARATUS AND METHOD FOR DETERMINING A DIFFICULTY VALUE OF ITEMS IN A TEST
Copyright Notice
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the records of the United States Patent and Trademark Office, but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
The present invention relates to a system and method for measuring an ability of a subject and an apparatus and method for determining a difficulty value of items in a test, and particularly to automated assessment of human abilities. A method and apparatus are provided for automated language assessment using speech recognition and a scoring computation model that accounts for the expected accuracy of the speech recognition. In a preferred embodiment, the model is based on Item Response Theory.
Background of the Invention
Interactive language proficiency testing systems using speech recognition are known. For example, U.S. Patent No. 5,870,709, issued to Ordinate Corporation, describes such a system. In U.S. Patent No. 5,870,709, the contents of which are incorporated herein by reference, an interactive computer-based system is shown in which spoken responses are elicited from a subject by prompting the subject. The prompts may be, for example, requests for information; a request to read or repeat a word, phrase, sentence, or larger linguistic unit; a request to complete, fill in, or identify missing elements in graphic or verbal aggregates; or any similar presentation that conventionally serves as a prompt to speak.
The system then extracts linguistic content, speaker state, speaker identity, vocal reaction time, rate of speech, fluency, pronunciation skill, native language, and other linguistic, indexical, or paralinguistic information from the incoming speech signal. The subject's spoken responses may be received by the interactive computer-based system via telephone or another telecommunication or data information network, or directly through a transducer peripheral to the computer system. It is then desirable to evaluate the subject's spoken responses and draw inferences about the subject's abilities or states. A prior art approach to automatic pronunciation evaluation is discussed in Bernstein et al., "Automatic Evaluation and Training in English Pronunciation," Int'l. Conf. on Spoken Language Processing, Kobe, Japan (1990), the contents of which are incorporated herein by reference. This approach evaluates each utterance from subjects who are reading a preselected set of scripts for which training data has been collected from native speakers. In this system, a pronunciation grade may be assigned to a subject's performance by comparing the subject's responses to a model of the responses from the native speakers. One disadvantage of such an evaluation system is that it may not properly weigh the importance of different items with regard to their relevance to the assessment. A further disadvantage of this evaluation technique is that it typically does not account for the accuracy, or more importantly the inaccuracy, of the speech recognition system. Known speech recognition systems may interpret a response incorrectly. For example, speech recognition systems typically are implemented with a predetermined vocabulary. Such a system is likely to react inaccurately to a response that falls outside of the vocabulary. Speech recognition systems also may make errors in recognizing responses that fall within the vocabulary, particularly short words.
As used herein, "recognizing" a response means recognizing the linguistic content and/or other characteristics of the response. The accuracy of the speech recognition system may be thought of as a measure of the character and quantity of errors made by the speech recognition system. It would therefore be desirable to have an improved automated language assessment method and apparatus.
Accordingly, the present invention provides a system for measuring an ability of a subject, comprising: a first set of task items that require the subject to provide one or more spoken responses; a speech recognition system coupled to receive the spoken response and to provide an estimate of the spoken response, the speech recognition system having an associated accuracy; a scoring device, the scoring device being operable to convert the estimate into an item score; and a computation device, the computation device providing a subject score based on a combination of item scores using a scoring computation model that depends upon an expected item-dependent operating characteristic of the speech recognition system. Accordingly, the present invention also provides a method for measuring an ability of a subject, comprising: providing a set of task items; generating a difficulty value for each task item in the set, the difficulty value being based upon the task item and a performance measurement associated with an automatic device that measures task performance; obtaining a response to each task item from the subject; and combining the difficulty values and the responses to form a subject score.
Accordingly, the present invention further provides an apparatus for determining a difficulty value of items in a test, comprising: a set of responses to the items from a number of individuals; an automated grader, wherein the automated grader receives the set of responses and provides graded responses; and means for reducing the graded responses to a set of item difficulties, said item difficulties reflecting an ability of the automated grader to accurately grade the set of responses.
Accordingly, the present invention further provides a method for determining a difficulty value of items in a test, comprising: obtaining a set of responses to the items from a number of individuals; automatically grading the set of responses, thereby generating graded responses; and reducing the graded responses to a set of item difficulties, said item difficulties including a measurement of accuracy for the act of automatically grading the set of responses.
Brief Description of the Drawings
The preferred embodiments of the present invention are illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which:
Figure 1 illustrates a functional diagram of an apparatus for automated language assessment;
Figures 2A and 2B illustrate a set of instructions and a set of tasks, respectively, that may be provided to a subject of the system shown in Figure 1;
Figure 3 illustrates a speech recognition system that may be used in the apparatus shown in Figure 1; and
Figure 4 is a flow chart illustrating a method for measuring an ability of a subject in accordance with a preferred embodiment of the present invention.
Brief Description of the Appendices
The preferred embodiments are further illustrated by way of example, and not limitation, in the appended pseudo-code segments, in which:
Appendix 1 illustrates a software implementation that is capable of reducing an estimate of the words of each response to an item score; and
Appendix 2 illustrates a software implementation of a computation of a subject score for the subject by combining item scores using Item Response Theory.
Detailed Description of the Presently Preferred Embodiment(s)
Described herein with reference to the above-mentioned figures and appendices, wherein like numerals designate like parts and components, is a method and apparatus for automated language assessment. More particularly, a method and apparatus are provided for automated language assessment using speech recognition and a scoring computation model that accounts, either implicitly or explicitly, for the accuracy of a speech recognition system. As described further below, the preferred embodiments provide the advantage of allowing a subject's ability to be assessed more accurately than would otherwise be possible with an imperfect automatic speech recognition system. In accordance with one preferred embodiment, the scoring computation model is a statistical model based upon Item Response Theory.
Figure 1 illustrates a functional block diagram of an interactive system for measuring the ability of a subject. The term "subject" may be used herein to refer to an individual who is taking a test, i.e., the examinee. For reasons that will become evident below, the term "subject," as used herein, shall not mean an individual who provides sample responses to assist in the construction of a scoring computation model. The interactive system includes a set of tasks 10 that require the subject to provide a spoken response.
Either alternatively or in addition to a spoken response, other types of responses may be taken as input to the system. A speech recognition system 20 is coupled to receive the spoken response. The speech recognition system 20 provides an estimate of the words in the spoken response to a scoring device 30, where the estimate is converted into an item score. Either alternatively or in addition to the scoring device 30, other analysis devices may be used as part of the process of reducing subject responses to item scores. Each task in the set of tasks 10 includes one or more items. A computation device 40 receives the item scores for the set of tasks 10 from the scoring device 30. The computation device 40 then provides a subject score based on a combination of the item scores using a scoring computation model that accounts, either implicitly or explicitly, for the accuracy of the speech recognition system 20. In accordance with a preferred embodiment of the present invention, the scoring computation model is constructed and applied using Item Response Theory. Other techniques may alternatively be utilized, as long as the scoring computation model depends upon the expected item-dependent operating characteristics of the speech recognition system 20.
In addition, the set of tasks 10 preferably is provided to the subject in the form of an instruction sheet, verbal instructions, visual instructions, and/or any combination of the foregoing. For cases in which the set of tasks 10 is provided to the subject in written form, the set of tasks 10 may be printed in the form of a booklet or brochure, for example. Alternatively, the set of tasks 10 may be presented to the subject on a monitor, video display, or the like. The subject preferably communicates his or her responses via a telephone, although a microphone or other voice transducer may alternatively be used.
The speech recognition system 20 may be a commercially available software product that is run on a general purpose computing platform 42. For example, the speech recognition system 20 may be the Entropic HTK software product, which is available from Entropic, Inc., located in Washington, D.C. and Cambridge, United Kingdom. The Entropic HTK software running on the general purpose computing platform provides an interactive computer-based system, which the subject may access by a peripheral transducer, or by telephone or another telecommunication or data information network using known communication techniques. Like the speech recognition system 20, the scoring device 30 and the computation device 40 are preferably functional modules, as further described below, associated with the general purpose computing platform.
Figures 2A and 2B illustrate a set of instructions for a subject and a set of tasks that require the subject to provide spoken responses, respectively. The set of tasks shown in Figure 2B is designed to measure facility in spoken English or other aspects of oral language proficiency. A test is administered over a telephone connection by an interactive system, such as the system shown in Figure 1. Figure 2A sets forth the test instructions, while Figure 2B sets forth the test structure and example questions. For this embodiment, the subject dials into the interactive system using a predetermined telephone number in order to take the test. Once a connection is established, the interactive system provides directions to the subject over the telephone connection and the subject provides responses. For the embodiment shown in Figures 2A and 2B, the set of tasks has five sections, and corresponding instructions are provided. In part A, the subject is instructed to read selected sentences from among those printed in part A of Figure 2B. In part B, the subject is instructed to repeat sentences played by the interactive system.
In part C, the subject is instructed to say the opposite of a word provided by the interactive system. In part D, the interactive system generates a series of questions and the subject responds with a single word or a short phrase. Finally, in part E, the subject is asked two essay-type questions and is asked to respond within a predetermined period of time, such as 30 seconds. The set of tasks shown in Figures 2A and 2B is designed for English language learners with at least basic reading skills. Alternative sets of tasks may be devised by those skilled in the art after reviewing this patent specification. For example, other languages or skill levels may be tested by providing an alternative set of tasks 10. For the illustrations shown in Figures 2A and 2B, every item requires the subject to understand a spoken utterance and to speak in response to it. Alternative tests may be devised for testing the ability of the subject to comprehend written or graphically displayed items. These and other alternatives are expressly intended to fall within the scope of the present invention as long as at least a portion of the test requires the subject to provide a spoken response.
The subject's language skills are then assessed by the interactive system based on the exact words used in spoken responses and a scoring computation model relating to the set of tasks 10. The system may also consider the latency, pace, fluency, and pronunciation of the words in phrases and sentences. The scoring computation model may be constructed in numerous ways. In accordance with a preferred embodiment of the present invention, however, the scoring computation model is constructed as follows. Sets of task items are presented to appropriate samples of native and non-native speakers, and the responses of these sample speakers to these task items are recorded and analyzed by speech processing and recognition and/or by human transcription and linguistic description.
Native speaker samples include individuals selected with reference to the range and incidence of demographic, linguistic, physical, or social variables that can have a salient effect on the form or content of the speech as received at the speech recognition system. These demographic, linguistic, physical, or social variables include a speaker's age, size, gender, sensory acuity, race, dialect, education, geographic origin or current location, employment, or professional training. Speech samples are also selected according to the time of day at the individual's location, the type and condition of the signal transducer, and the type and operation of the communication channel. Native response samples are used in the development of the scoring computation model to define or verify the linguistic and extra-linguistic content that is expected or that is scored as correct, and to quantify and ensure equity in test scoring.
Non-native speaker samples include individuals selected with reference to the range and incidence of demographic, linguistic, physical, or social variables that can have a salient effect on the form or content of the speech as received at the speech recognition system. For the non-native speakers, these demographic, linguistic, physical, or social variables include the identities of a speaker's first, second, or other languages; the level of skill in the target language of the test or any other language or dialect; and the speaker's age, size, gender, race, dialect, education, geographic origin or current location, employment, or professional training. Speech samples are also selected according to the time of day at the individual's location, the type and condition of the signal transducer, and the type and operation of the communication channel.
Non-native response samples are used in the development of the scoring computation model to define or verify the linguistic and extra-linguistic content of the responses that is expected or that is scored as correct, to define or calibrate the range of scoring, and to quantify and ensure equity in test scoring. In accordance with this embodiment, the scoring computation model is therefore constructed based upon the responses of the sample speakers. Statistical analysis of the responses of the sample speakers allows a scoring computation model to be constructed that accounts for inaccuracies in the automated speech recognition system 20. In addition, the responses of the sample speakers may be used to generate a difficulty value for each item in the set of task items. By applying a statistical model, such as one consistent with Item Response Theory, to the responses of the sample speakers, the measure of item difficulty may implicitly take into account the inaccuracy of the speech recognition system 20 as well as the sample speakers' difficulty with the given item. Preferably, the information regarding item difficulty is included in the scoring computation model.
Returning again to an automated assessment of the subject's language skills, the subject's spoken responses are digitized and passed to the interactive grading system 42. Figure 3 is a functional block diagram of this portion of the grading system. The functional elements of the interactive system include the speech recognition system 20, the scoring device 30, and the computation device 40. As noted above, the speech recognition system 20 may be a commercially available system, such as the Entropic HTK system. The scoring device 30 and the computation device 40 may be implemented in software. Pseudo-code implementations for the scoring device and the computation device are provided in Appendix 1 and Appendix 2 hereto, respectively.
The speech recognition system 20 receives the spoken response from the subject and provides to the scoring device 30 an estimate 50 of the words in the spoken response. The scoring device 30 converts the estimate 50 into an item score 60. An item is a single task to be performed by the subject, calling for a word, phrase, or sentence response. The computation device 40 provides a subject score 70 based on a combination of item scores using a scoring computation model that depends upon the expected item-dependent operating characteristics of the speech recognition system 20, such as the scoring computation model described above. In accordance with a preferred embodiment, the scoring computation model is consistent with Item Response Theory.
A pseudo-code implementation of the scoring device 30 is provided in Appendix 1 hereto. For this embodiment, the scoring device 30 converts the estimate into an item score by counting the number of insertions, deletions, and substitutions needed to convert the spoken response into one of the correct responses. The purpose of the scoring device module 30 is to compare two phrases and compute how many differences there are between the two at the word level. The total number of differences is the smallest number of insertions, substitutions, and deletions of words required to transform the first phrase into the second phrase. This total number of differences may also be referred to herein as the "item score." Note, however, that in accordance with a preferred embodiment, insertions of words at the beginning or end of the phrase may not be counted. For example, the number of word differences between "Ralph was a small mouse" and "Well Ralph was was a house" may be computed as follows:
Insertion of "well": not counted (leading insertion)
Insertion of second "was": 1 insertion
Deletion of "small": 1 deletion
Substitution of "house" for "mouse": 1 substitution
for a total of 3 differences.
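The word-level count described above is a standard minimum edit distance computed by dynamic programming. The following sketch assumes space-separated words and exact string matching; `diff_count`, `split_words`, and `MAX_WORDS` are hypothetical names, and this is not a reconstruction of the DiffCount() listing in Appendix 1. Leading insertions are made free by initializing the first row of the table to zero, and trailing insertions by taking the minimum over the last row.

```c
#include <string.h>

#define MAX_WORDS 64

/* Splits buf in place on spaces; returns the word count, or -1 if the
 * phrase has more than MAX_WORDS words. */
static int split_words(char *buf, char **words)
{
    int n = 0;
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (n == MAX_WORDS) return -1;
        words[n++] = tok;
    }
    return n;
}

/* Smallest number of word insertions, deletions and substitutions that
 * turn `correct` into `spoken`, not counting words inserted before or
 * after the correct phrase. Returns -1 if a phrase is too long. */
int diff_count(const char *correct, const char *spoken)
{
    char rbuf[1024], hbuf[1024];
    char *ref[MAX_WORDS], *hyp[MAX_WORDS];
    int d[MAX_WORDS + 1][MAX_WORDS + 1];

    if (strlen(correct) >= sizeof rbuf || strlen(spoken) >= sizeof hbuf)
        return -1;
    strcpy(rbuf, correct);
    strcpy(hbuf, spoken);
    int m = split_words(rbuf, ref);
    int n = split_words(hbuf, hyp);
    if (m < 0 || n < 0) return -1;

    /* d[i][j]: cost of aligning the first i correct words with the
     * first j spoken words. */
    for (int j = 0; j <= n; j++) d[0][j] = 0;        /* leading insertions free */
    for (int i = 1; i <= m; i++) {
        d[i][0] = i;                                 /* i deletions */
        for (int j = 1; j <= n; j++) {
            int cost = d[i - 1][j - 1] + (strcmp(ref[i - 1], hyp[j - 1]) != 0);
            if (d[i - 1][j] + 1 < cost) cost = d[i - 1][j] + 1;  /* deletion */
            if (d[i][j - 1] + 1 < cost) cost = d[i][j - 1] + 1;  /* insertion */
            d[i][j] = cost;
        }
    }
    int best = d[m][0];
    for (int j = 1; j <= n; j++)                     /* trailing insertions free */
        if (d[m][j] < best) best = d[m][j];
    return best;
}
```

Called as `diff_count("Ralph was a small mouse", "Well Ralph was was a house")`, it yields the 3 differences of the worked example. Where an item has several acceptable answers, the item score would be the smallest count over all of them, matching "one of the correct responses" above.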
For any given pair of phrases, there are multiple possible sets of transformations. For example, in the above, we could have interpreted the transformations as a deletion of "mouse" and an insertion of "house" rather than a substitution of "house" for "mouse". However, the scoring device, as implemented in Appendix 1 hereto, returns a set of transformations that gives the smallest total number of differences. Thus, the substitution alternative (1 difference) would have been chosen over the deletion/insertion alternative (2 differences). The count of errors (3 for the example item set forth above) may then be weighted on an item-by-item basis by the computation device, as described below. For purposes of efficiency, the first step in the scoring device module 30 set forth in Appendix 1 hereto is to convert the list of words in each phrase into a list of integers, where each integer represents a single word. This conversion is performed by phraseToWordHash(). An advantage of doing this is that it is faster to compare two integers than to compare each letter of two words.
It is to be understood that an item score may be computed in any other way without departing from the present invention. The DiffCount() procedure described in Appendix 1 hereto is an example of one way in which the item score may be obtained. For purposes of the preferred embodiments described herein, the item score may be thought of as any measure or set of measures derived from a subject's response to a single item. Alternative approaches for obtaining an item score are well within the capabilities of those skilled in the art. For example, an item score may be obtained by determining whether the response as a whole is correct or incorrect, i.e., no errors (score is 0) versus any number of errors (score is 1), or whether the spoken response includes, as a portion thereof, the correct response.
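The word-to-integer step can be sketched as follows. This version assigns each distinct word the index of its entry in a small table, so equal integers always mean identical words (a plain hash function could let two different words collide). `phrase_to_codes`, `word_code`, and the fixed table sizes are assumptions for illustration, not the phraseToWordHash() routine itself.

```c
#include <string.h>

#define MAX_VOCAB 256
#define MAX_WORD_LEN 32

/* Hypothetical word table: a word's index in this table is its integer
 * code, so two words share a code only if they are spelled identically. */
static char vocab[MAX_VOCAB][MAX_WORD_LEN];
static int vocab_size = 0;

/* Returns the code for the len-character word at `word`, adding it to
 * the table if it is new; returns -1 if a limit is exceeded. */
static int word_code(const char *word, size_t len)
{
    if (len >= MAX_WORD_LEN) return -1;
    for (int k = 0; k < vocab_size; k++)
        if (strlen(vocab[k]) == len && strncmp(vocab[k], word, len) == 0)
            return k;
    if (vocab_size == MAX_VOCAB) return -1;
    memcpy(vocab[vocab_size], word, len);
    vocab[vocab_size][len] = '\0';
    return vocab_size++;
}

/* Converts a space-separated phrase into word codes.
 * Returns the number of words, or -1 if a limit is exceeded. */
int phrase_to_codes(const char *p, int *codes, int max_words)
{
    int n = 0;
    while (*p) {
        while (*p == ' ') p++;               /* skip to the next word */
        if (*p == '\0') break;
        const char *start = p;
        while (*p && *p != ' ') p++;         /* find the end of the word */
        if (n == max_words) return -1;
        int c = word_code(start, (size_t)(p - start));
        if (c < 0) return -1;
        codes[n++] = c;
    }
    return n;
}
```

The edit-distance comparison can then test `codes_a[i] == codes_b[j]` instead of comparing strings. The linear table scan keeps the sketch short; a hash table would make each lookup close to constant time.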
An item score may also include non-numeric elements such as words, phrases, or structural descriptions that are estimated by analysis of item responses. A pseudo-code implementation of the computation device 40 is provided in Appendix 2 hereto. As noted above, the computation device 40 provides a subject score 70, which is indicative of aspects of the subject's language proficiency. In accordance with a preferred embodiment of the present invention as described in Appendix 2 hereto, the subject score 70 is based on a combination of a series of item scores 60 using Item Response Theory. Item Response Theory provides one approach by which the contribution of item scores on individual items to an underlying measure can be established. Specifically, it provides a tool by which the difficulties of items can be mapped onto linear scales in a consistent way. The application of Item Response Theory analysis to an exam scored by automatic speech recognition provides the advantage of combining not only the difficulty the subject is expected to experience on a given item, but also the difficulty that the automatic speech recognition system 20 is expected to have in correctly recognizing the response, or any part thereof. As a result, the individual's ability is more accurately assessed than would otherwise be possible with an imperfect automatic speech recognition system. Further details on Item Response Theory may be found in "Introduction to Classical and Modern Test Theory," by Linda Crocker and James Algina, Harcourt Brace Jovanovich College Publishers (1986), Chapter 15; and "Best Test Design: Rasch Measurement," by Benjamin D. Wright and Mark H. Stone, Mesa Press, Chicago, Illinois (1979), the contents of both of which are incorporated herein by reference.
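For reference, the dichotomous Rasch model, the simplest Item Response Theory model, places subject ability θ and item difficulty b_i on a common logit scale. With x_i = 1 when item i is scored correct and x_i = 0 otherwise, the probability of a correct score and the likelihood of a response pattern over I items are:

```latex
P(x_i = 1 \mid \theta, b_i) = \frac{e^{\theta - b_i}}{1 + e^{\theta - b_i}},
\qquad
L(\theta) = \prod_{i=1}^{I} P(x_i = 1 \mid \theta, b_i)^{x_i}
            \bigl[1 - P(x_i = 1 \mid \theta, b_i)\bigr]^{1 - x_i}
```

Because only the difference θ - b_i enters the model, a one-logit increase in difficulty multiplies the odds of success by the same factor (e^-1) anywhere on the scale, which is what allows item difficulties to be placed on a linear scale.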
Once a subject's responses have been graded on an item level (for example, each response has been graded as right or wrong, or the number of errors in each response has been determined), the item scores 60 need to be combined into a subject score 70 for the individual. One way of doing this is simply to total the item scores 60 to give a total number correct or a total number of errors. However, this captures neither the differing difficulty among the items nor the item-dependent operating characteristics of the speech recognition system 20. A subject who received more difficult items (or items that are poorly recognized) would end up with a lower subject score 70 than another subject of equal ability who received, by chance, easier items. A better way of combining the item scores 60 is to use a scoring computation model that depends upon the item difficulty and the expected item-dependent operating characteristics of the speech recognition system 20, such as a scoring computation model based on Item Response Theory. In particular, the computation device 40 imposes a statistical model on the subject's responses and arrives at a "best" estimate of the subject's ability given the items and the pattern of the responses. As well as properly handling the difficulties of the items given to the subject, this approach offers another important advantage with respect to speech proficiency testing. Because speech recognition systems, such as the speech recognition system 20 in Figures 1 and 3, may be imperfect, items will at times be incorrectly graded (i.e., incorrect responses may be graded as correct or vice versa). The error behavior of the speech recognition system 20 is item-dependent: different items exhibit different recognizer error patterns. By applying the scoring computation model to the item scores 60, the computation device 40 implicitly captures and accounts for this error.
In accordance with this embodiment, items that are often misrecognized by the speech recognition system 20, which make it appear that the subject committed more errors than the subject actually did, end up being assigned a high difficulty value. As a result, misrecognized items do not penalize the subject as severely in terms of subject score 70. Items that are more accurately recognized by the speech recognition system 20 affect the examinee's subject score 70 more significantly. Thus, the statistical operations associated with the application of the scoring computation model to the item scores 60 serve to de-emphasize the effects of speech recognizer errors.
The computation device 40 module, as set forth in Appendix 2 hereto, applies a scoring computation model to a set of item scores 60 to compute an ability measure along with a confidence interval for that measure. As input, the RaschMeasure() routine takes a set of non-negative integers indicating the item scores for a particular set of items. Also input is an array of item difficulties that are used in the Item Response Theory computation. The RaschMeasure() routine then uses these inputs to estimate the ability of the test-taker using the Rasch model from Item Response Theory (which is well known to those skilled in the art). It does this by computing the likelihood of the given set of responses for a range of assumed values for the caller's ability (in accordance with one embodiment, the range is fixed from -8.0 to +8.0 in steps of 0.01). Other ranges and step sizes may alternatively be used. These likelihoods are then normalized to give a probability density function (PDF) for the subject's ability. The expected value of the subject's ability can then be computed by integrating the ability values weighted by the PDF. This is the value returned. The confidence interval is defined by the 0.1 and 0.9 points on the cumulative distribution function (the integral of the PDF), and these two values are also returned.
The computation device 40 may alternatively apply statistical combination techniques other than the RaschMeasure() routine. For example, the UCON model, the PAIR model, and the PROX model, all of which are well-known techniques for applying Item Response Theory, may be used. Other statistical techniques may also be used, such as using an explicit measure of speech recognition accuracy to weight the item scores. The subject score 70, in which the item scores 60 are combined using a scoring computation model that depends upon the expected item-dependent operating characteristics of the speech recognition system 20, provides a better measure of the subject's ability than do the item scores 60 by themselves. In particular, the subject score 70 includes item scores 60 that are properly weighted with regard to both the item's relevance to the assessment and the accuracy with which the speech recognition system operates on the item or its elements. Using Item Response Theory, for example, the difficulty of the items can be mapped onto a linear scale in a consistent way. By normalizing for the problem of the speech recognition system 20 incorrectly recognizing the items in the subject's response, the subject's ability can be more accurately assessed. Furthermore, the Item Response Theory methods assume the underlying parametric model and derive from the data the single most representative dimension to explain the observed item scores 60 by including the expected characteristics of both the subject performance and the speech recognition performance.
As described above, the scoring computation model operates on an item-by-item basis. In accordance with a further alternative embodiment of the present invention, the scoring computation model is even more finely tuned to operate on elements of the response to an item. For example, an item may be, "Repeat the sentence 'Ralph went to the store.'" One subject responds, "Ralph went the store." A second subject responds, "Ralph went to the."
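One simple form of the explicit-accuracy alternative mentioned above is a weighted proportion correct in which each item counts in proportion to how reliably the recognizer grades it. This is an illustrative assumption rather than a method stated in the text; `accuracy_weighted_score` and the meaning of `accuracy[i]` (for example, agreement with human graders on sample responses) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical combiner: correct[i] is the 0/1 score for item i and
 * accuracy[i] in [0, 1] is an explicit estimate of how often the
 * recognizer grades item i correctly. Returns the accuracy-weighted
 * proportion correct, or -1.0 if every weight is zero. */
double accuracy_weighted_score(const int *correct, const double *accuracy,
                               size_t n_items)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n_items; i++) {
        if (correct[i]) num += accuracy[i];
        den += accuracy[i];
    }
    return den > 0.0 ? num / den : -1.0;
}
```

For two items, one answered correctly with recognizer accuracy 0.75 and one missed with accuracy 0.25, the score is 0.75 rather than the unweighted 0.5, so the miss on the poorly recognized item counts for less.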
If the item score is determined by counting deletions, as described above with reference to Appendix 1, then both subjects would receive an item score of one error for this item. The word deleted by the first subject can be said to have been weighted equally to the word deleted by the second subject. In accordance with the alternative embodiment, however, the elements within the item may be weighted differently. For the example above, the deletion of "store" by the second subject would be scored differently, or weighted more heavily, than the deletion of "to" by the first subject. This may be particularly appropriate in situations where the speech recognition system 20 has difficulty recognizing short words, such as the word "to."
Figure 4 is a flow chart illustrating a method for measuring an ability of a subject in accordance with a preferred embodiment of the present invention. At step 80, a set of tasks is provided to a subject, and at step 90 a device that automatically measures performance of the tasks is connected to the subject. A difficulty value was previously determined for each task item at step 100. In accordance with a preferred embodiment of the present invention, the difficulty value is based upon both the task item and a performance measure associated with an ability of the automated device to accurately assess performance of the task. For this embodiment, the automated device is the speech recognition system 20, as shown in Figures 1 and 3, for example. At step 110, verbal responses to the tasks are obtained from the subject. Step 100 is typically performed in advance of steps 80, 90, 110, and 120, such as by collecting sample responses from native and non-native speakers as described above. The verbal responses and the difficulty values are combined at step 120 to form a subject score 70.
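Element-level weighting, as in the "Ralph went to the store" example, can be sketched by giving each word of the correct response its own cost in the edit-distance recurrence. The weights used below (0.25 for "to" and "the", 1.0 for the other words) are invented for illustration, and `weighted_diff` is a hypothetical name.

```c
#include <string.h>

#define MAX_WORDS 32

/* Weighted word-level edit distance: deleting or substituting correct
 * word i costs weight[i]; an inserted spoken word costs ins_cost, except
 * that insertions before the first or after the last correct word are
 * free, as in the unweighted count. Returns -1.0 if a phrase is too long. */
double weighted_diff(const char *const *ref, const double *weight, int m,
                     const char *const *hyp, int n, double ins_cost)
{
    double d[MAX_WORDS + 1][MAX_WORDS + 1];
    if (m > MAX_WORDS || n > MAX_WORDS) return -1.0;

    for (int j = 0; j <= n; j++) d[0][j] = 0.0;          /* leading insertions free */
    for (int i = 1; i <= m; i++) {
        d[i][0] = d[i - 1][0] + weight[i - 1];           /* deletions */
        for (int j = 1; j <= n; j++) {
            double sub = d[i - 1][j - 1] +
                (strcmp(ref[i - 1], hyp[j - 1]) != 0 ? weight[i - 1] : 0.0);
            double del = d[i - 1][j] + weight[i - 1];
            double ins = d[i][j - 1] + ins_cost;
            double best = sub < del ? sub : del;
            d[i][j] = best < ins ? best : ins;
        }
    }
    double best = d[m][0];
    for (int j = 1; j <= n; j++)                         /* trailing insertions free */
        if (d[m][j] < best) best = d[m][j];
    return best;
}
```

With these weights, "Ralph went the store" scores 0.25 and "Ralph went to the" scores 1.0, whereas the unweighted count gives both responses one error.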
The present embodiments preferably encompass logic to implement the described methods in software modules as a set of computer executable software instructions. A Central Processing Unit ("CPU") or general purpose microprocessor implements the logic that controls the operation of the interactive system. The microprocessor executes software that can be programmed by those of skill in the art to provide the described functionality. The software can be represented as a sequence of binary bits maintained on a computer readable medium including magnetic disks, optical disks, organic disks, and any other volatile (e.g., Random Access Memory ("RAM")) or non-volatile firmware (e.g., Read Only Memory ("ROM")) storage system readable by the CPU. The memory locations where data bits are maintained also include physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to the stored data bits. The software instructions are executed as data bits by the CPU with a memory system causing a transformation of the electrical signal representation, and the maintenance of data bits at memory locations in the memory system to thereby reconfigure or otherwise alter the unit's operation. The executable software code may implement, for example, the methods described above. It should be understood that the programs, processes, methods and apparatus described herein are not related or limited to any particular type of computer or network apparatus (hardware or software), unless indicated otherwise. Various types of general purpose or specialized computer apparatus may be used with or perform operations in accordance with the teachings described herein.
In view of the wide variety of embodiments to which the principles of the present invention can be applied, it should be understood that the illustrated embodiments are exemplary only, and should not be taken as limiting the scope of the present invention. For example, the steps of the flow diagrams may be taken in sequences other than those described, and more or fewer elements may be used than are shown in the block diagrams.
It should be understood that a hardware embodiment may take a variety of different forms. The hardware may be implemented as a digital signal processor or general purpose microprocessor with associated memory and bus structures, an integrated circuit with custom gate arrays, or an application specific integrated circuit ("ASIC"). Of course, the embodiment may also be implemented with discrete hardware components and circuitry. The claims should not be read as limited to the described order of elements unless stated to that effect. In addition, use of the term "means" in any claim is intended to invoke 35 U.S.C. §112, paragraph 6, and any claim without the word "means" is not so intended. Therefore, all embodiments that come within the scope and spirit of the following claims and equivalents thereto are claimed as the invention.
Appendix 1
[Pseudo-code listing not legible in this copy. Recoverable fragments show a C++ module (including "DiffCount.h") containing: a routine that splits a phrase into words and reports "Too many words in phrase" when a word limit is exceeded; a recursive matching routine, commented "Return number of ins/del/subs needed to convert w1 to w2", using the edit weights DiffS::SUBWT = 1, DiffS::INSWT = 1, DiffS::DELWT = 1, DiffS::LDINSWT = 0 and DiffS::TRINSWT = 0, so that leading and trailing insertions are not counted; a function DiffCount(), commented "Count the difference between two phrases in terms of deletions, insertions, substitutions"; and a TEST-only main() that compares two phrases given on the command line and prints the result.]
Appendix 2
[Pseudo-code listing not legible in this copy. Recoverable fragments show a C++ routine RaschMeasure() that computes a Rasch measure of proficiency from binary item observations (category 0 or 1) and per-item difficulty values, using the logistic expression alpha/(1 + alpha), a numerical calculation of the probability density over a range of ability values, and a confidence interval; and a TEST-only main() that applies it to a sample set of item difficulties and 0/1 categories and prints the measure with its confidence interval bounds.]
I claim:
1.
A system for measuring an ability of a subject, comprising: a first set of task items that require the subject to provide one or more spoken responses; a speech recognition system coupled to receive the spoken response and to provide an estimate of the spoken response, the speech recognition system having an associated accuracy; a scoring device, the scoring device being operable to convert the estimate into an item score; and a computation device, the computation device providing a subject score based on a combination of item scores using a scoring computation model that depends upon an expected item-dependent operating characteristic of the speech recognition system. 2. A system as claimed in claim 1, wherein the scoring computation model is based on Item Response Theory. 3. A system as claimed in claim 1, wherein the speech recognition system, the scoring device and the computation device comprise software modules running on a general purpose computing platform. 4. A system as claimed in claim 1, wherein the scoring computation model is constructed from a plurality of responses provided by a number of native and non-native speakers, the plurality of responses being prompted by a second set of task items. 5. A system as claimed in claim 1, wherein the estimate provided by the speech recognition system comprises an estimate of the linguistic content of the spoken response. 6. A system as claimed in claim 1, wherein at least one task in the first set of tasks is an item selected from the group consisting of a prompt to read a sentence aloud, a prompt to repeat a word, a prompt to repeat a phrase, a prompt to provide an opposite, and a prompt to answer a question. 7.
A method of measuring an ability of a subject, comprising the steps of: providing a set of task items; generating a difficulty value for each task item in the set, the difficulty value being based upon the task item and a performance measurement associated with an automatic device that measures task performance; obtaining a response to each task item from the subject; and combining the difficulty values and the responses to form a subject score. 8. A method as claimed in claim 7, wherein the performance measurement is a measure of an ability of the automatic device to accurately recognize the responses. 9. A method as claimed in claim 7, wherein the step of generating a difficulty value comprises the step of obtaining a plurality of sample responses from a group of sample speakers. 10. A method as claimed in claim 9, wherein the step of generating a difficulty value comprises the step of applying a statistical model to the plurality of sample responses. 11. A method as claimed in claim 7, wherein the step of combining the difficulty values and the responses comprises the step of applying a statistical model to the plurality of sample responses. 12. A method as claimed in claim 7, wherein the performance measurement associated with the automatic device is based upon an operating characteristic of a speech recognition system. 13. A method of measuring an ability of a subject, comprising the steps of: providing a set of tasks and a device that automatically measures performance of the tasks; determining a difficulty value for each task, wherein the difficulty value is based upon the task and upon a performance measure associated with an ability of the automatic device to accurately assess performance of the task; obtaining verbal responses to the tasks from the subject; and combining the verbal responses and the difficulty values to form a subject score. 14. A method as claimed in claim 13, wherein the device comprises an automated speech recognition system.
15. An apparatus for determining a difficulty value of items in a test, comprising: a set of responses to the items from a number of individuals; an automated grader, wherein the automatic grader receives the set of responses and provides graded responses; and means for reducing the graded responses to a set of item difficulties, said item difficulties reflecting an ability of the automatic grader to accurately grade the set of responses. 16. A method of determining a difficulty value of items in a test, comprising the steps of: obtaining a set of responses to the items from a number of individuals; automatically grading the set of responses, thereby generating graded responses; and reducing the graded responses to a set of item difficulties, said item difficulties having a measurement of accuracy for the act of automatically grading the set of responses.

Full Text

A SYSTEM AND METHOD FOR MEASURING AN ABILITY OF A SUBJECT AND AN APPARATUS AND METHOD FOR DETERMINING A DIFFICULTY VALUE OF ITEMS IN A TEST
Copyright Notice
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in records of the United States Patent and Trademark Office, but otherwise reserves all copyright rights whatsoever.

FIELD OF THE INVENTION
The present invention relates to a system and method for measuring an ability of a subject and an apparatus and method for determining a difficulty value of items in a test, and particularly to automated assessment of human abilities. A method and apparatus are provided for automated language assessment using speech recognition and a scoring computation model that accounts for the expected accuracy of the speech recognition. In a preferred embodiment, the model is based on Item Response Theory.
Background of the Invention
Interactive language proficiency testing systems using speech recognition are known. For example, U.S. Patent No. 5,870,709, issued to Ordinate Corporation, describes such a system. In U.S. Patent No. 5,870,709, the contents of which are incorporated herein by reference, an interactive computer-based system is shown in which spoken responses are elicited from a subject by prompting the subject. The prompts may be, for example, requests for information, a request to read or repeat a word, phrase, sentence, or larger linguistic unit, a request to complete, fill in, or identify missing elements in graphic or verbal aggregates, or any similar presentation that conventionally serves as a prompt to speak. The system then extracts linguistic content, speaker state, speaker identity, vocal reaction time, rate of speech, fluency, pronunciation skill, native language, and other linguistic, indexical, or paralinguistic information from the incoming speech signal.

The subject's spoken responses may be received at the interactive computer-based system via telephone or other telecommunication or data information network, or directly through a transducer peripheral to the computer system. It is then desirable to evaluate the subject's spoken responses and draw inferences about the subject's abilities or states.
A prior art approach to automatic pronunciation evaluation is discussed in Bernstein et al., "Automatic Evaluation and Training in English Pronunciation," Int'l. Conf. on Spoken Language Processing, Kobe, Japan (1990), the contents of which are incorporated herein by reference. This approach includes evaluating each utterance from subjects who are reading a preselected set of scripts for which training data has been collected from native speakers. In this system, a pronunciation grade may be assigned to a subject performance by comparing the subject's responses to a model of the responses from the native speakers.
One disadvantage of such an evaluation system is that it may not properly weigh the importance of different items with regard to their relevance to the assessment. A further disadvantage of this evaluation technique is that it typically does not account for the accuracy, or more importantly the inaccuracy, of the speech recognition system. Known speech recognition systems may interpret a response incorrectly. For example, speech recognition systems typically are implemented with a predetermined vocabulary. Such a system is likely to react inaccurately to a response that falls outside of the vocabulary. Speech recognition systems also may make errors in recognizing responses to items that are in the vocabulary, particularly short words. As used herein, "recognizing" a response means recognizing the linguistic content and/or other characteristics of the response. The accuracy of the speech recognition system may be thought of as a measure of the character and quantity of errors made by the speech recognition system. It would therefore be desirable to have an improved automated language assessment method and apparatus.

Accordingly, the present invention provides a system for measuring an ability of a subject, comprising:
a first set of task items that require the subject to provide one or more spoken responses;
a speech recognition system coupled to receive the spoken response and to provide an estimate of the spoken response, the speech recognition system having an associated accuracy;
a scoring device, the scoring device being operable to convert the estimate into an item score; and
a computation device, the computation device providing a subject score based on a combination of item scores using a scoring computation model that depends upon an expected item-dependent operating characteristic of the speech recognition system.
Accordingly, the present invention also provides a method for measuring an ability of a subject, comprising:
providing a set of task items;
generating a difficulty value for each task item in the set, the difficulty value being based upon the task item and a performance measurement associated with an automatic device that measures task performance;
obtaining a response to each task item from the subject; and
combining the difficulty values and the responses to form a subject score.
Accordingly, the present invention further provides an apparatus for determining a difficulty value of items in a test, comprising:
a set of responses to the items from a number of individuals;
an automated grader, wherein the automatic grader receives the set of responses and provides graded responses; and
means for reducing the graded responses to a set of item difficulties, said item difficulties reflecting an ability of the automatic grader to accurately grade the set of responses.

Accordingly, the present invention further provides a method of determining a difficulty value of items in a test, comprising:
obtaining a set of responses to the items from a number of individuals;
automatically grading the set of responses, thereby generating graded responses; and
reducing the graded responses to a set of item difficulties, said item difficulties including a measurement of accuracy for the act of automatically grading the set of responses.
Brief Description of the Drawings
The preferred embodiments of the present invention are illustrated by way of example, and not limitation, in the figures of the accompanying drawings in which:
Figure 1 illustrates a functional diagram of an apparatus for automated language assessment;
Figures 2A and 2B illustrate a set of instructions and a set of tasks, respectively, that may be provided to a subject of the system shown in Figure 1;
Figure 3 illustrates a speech recognition system that may be used in the apparatus shown in Figure 1; and
Figure 4 is a flow chart illustrating a method for measuring an ability of a subject in accordance with a preferred embodiment of the present invention.
Brief Description of the Appendices
The preferred embodiments are further illustrated by way of examples, and not limitation, in the appended pseudo-code segments in which:
Appendix 1 illustrates a software implementation that is capable of reducing an estimate of the words of each response to an item score; and
Appendix 2 illustrates a software implementation of a computation of a subject score for the subject by combining item scores using Item Response Theory.

Detailed Description of the Presently Preferred Embodiment(s)
Described herein with reference to the above mentioned figures and appendices, wherein like numerals designate like parts and components, is a method and apparatus for automated language assessment. More particularly, a method and apparatus are provided for automated language assessment using speech recognition and a scoring computation model that accounts, either implicitly or explicitly, for the accuracy of a speech recognition system. As described further below, the preferred embodiments provide the advantage of allowing a subject's ability to be more accurately assessed than would be otherwise possible with an imperfect automatic speech recognition system. In accordance with one preferred embodiment, the scoring computation model is a statistical model based upon Item Response Theory.
Figure 1 illustrates a functional block diagram of an interactive system for measuring the ability of a subject. The term "subject" may be used herein to refer to an individual who is taking a test, the examinee. For reasons that will become evident below, the term subject, as used herein, shall not mean an individual who provides sample responses to assist in the construction of a scoring computation model.
The interactive system includes a set of tasks 10 that require the subject to provide a spoken response. Either alternatively or in addition to a spoken response, other types of responses may be taken as input to the system. A speech recognition system 20 is coupled to receive the spoken response. The speech recognition system 20 provides an estimate of the words in the spoken response to a scoring device 30, where the estimate is converted into an item score. Either alternatively or in addition to the scoring device 30, other analysis devices may be used as part of the process of reducing subject responses to item scores. Each task in the set of tasks 10 includes one or more items. A computation device 40 receives the item scores for the set of tasks 10 from the scoring device 30. The computation device 40 then provides a subject score based on a combination of the item scores using a scoring computation model that accounts, either implicitly or explicitly, for the accuracy of a speech recognition system 20.
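The data flow of Figure 1 can be summarized as a minimal Python sketch. The three callables stand in for the speech recognition system 20, the scoring device 30 and the computation device 40; the Item class, the function name and all signatures are hypothetical and serve only to show how the estimates become item scores and then a single subject score.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Item:
    prompt: str
    answers: List[str]  # responses accepted as correct for this item


def subject_score(items: Sequence[Item],
                  recordings: Sequence[bytes],
                  recognize: Callable[[bytes], str],
                  score_item: Callable[[str, List[str]], int],
                  combine: Callable[[List[int]], float]) -> float:
    """Pass each recorded response through recognition, item scoring and combination."""
    if len(items) != len(recordings):
        raise ValueError("expected one recorded response per item")
    # Speech recognition system 20: audio -> estimate of the words spoken.
    estimates = [recognize(clip) for clip in recordings]
    # Scoring device 30: estimate -> item score (here, an error count).
    item_scores = [score_item(est, item.answers) for est, item in zip(estimates, items)]
    # Computation device 40: item scores -> subject score.
    return combine(item_scores)


# Usage with stand-in components: text bytes in place of audio, exact-match scoring.
items = [Item("Say the opposite of 'hot'.", ["cold"]),
         Item("Say the opposite of 'up'.", ["down"])]
score = subject_score(items, [b"cold", b"up"],
                      recognize=lambda clip: clip.decode(),
                      score_item=lambda est, answers: 0 if est in answers else 1,
                      combine=lambda scores: float(sum(scores)))
print(score)  # 1.0
```

Keeping the three stages as separate callables mirrors the separate functional modules described here, so a real recognizer or an Item Response Theory combination step could be substituted without changing the flow.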
In accordance with a preferred embodiment of the present invention, the scoring computation model is constructed and applied using Item Response Theory. Other techniques may alternatively be utilized, as long as the scoring computation model depends upon the expected item-dependent operating characteristics of the speech recognition system 20. In addition, the set of tasks 10 preferably is provided to the subject in the form of an instruction sheet, verbal instructions, visual instructions and/or any combination of the foregoing. For cases in which the set of tasks 10 is provided to the subject in written form, the set of tasks 10 may be printed in the form of a booklet or brochure, for example. Alternatively, the set of tasks 10 may be presented to the subject on a monitor, video display or the like. The subject preferably communicates his or her responses via a telephone, although a microphone or other voice transducer may alternatively be used.
The speech recognition system 20 may be a commercially available software product that is run on a general purpose computing platform 42. For example, the speech recognition system 20 may be the Entropic HTK software product, which is available from Entropic, Inc., located in Washington, D.C. and Cambridge, United Kingdom. The Entropic HTK software running on the general purpose computing platform provides an interactive computer-based system, which the subject may access by peripheral transducer, or by telephone or other telecommunication or data information network using known communication techniques. Like the speech recognition system 20, the scoring device 30 and the computation device 40 are preferably functional modules, as further described below, associated with the general purpose computing platform.

Figures 2A and 2B illustrate a set of instructions for a subject and a set of tasks that require the subject to provide spoken responses, respectively. The set of tasks shown in Figure 2B is designed to measure facility in spoken English or other aspects of oral language proficiency. A test is administered over a telephone connection by an interactive system, such as the system shown in Figure 1. Figure 2A sets forth the test instructions, while Figure 2B sets forth the test structure and example questions.
For this embodiment, the subject dials into the interactive system using a predetermined telephone number in order to take the test. Once a connection is established, the interactive system provides directions to the subject over the telephone connection and the subject provides responses. For the embodiments shown in Figures 2A and 2B, the set of tasks has five sections and corresponding instructions are provided. In part A of the set, the subject will be instructed to read selected sentences from among those printed in part A of Figure 2B. In part B, the subject will be instructed to repeat sentences played by the interactive system. In part C, the subject is instructed to say the opposite word for a word provided by the interactive system. In part D, the interactive system generates a series of questions and the subject responds with a single word or a short phrase. Finally, in part E, the subject will be asked two essay-type questions and will be asked to respond within a predetermined period of time, such as 30 seconds.
The set of tasks shown in Figures 2A and 2B is designed for English language learners with at least basic reading skills. Alternative sets of tasks may be devised by those skilled in the art after reviewing this patent specification. For example, other languages or skill levels may be tested by providing an alternative set of tasks 10. For the illustrations shown in Figures 2A and 2B, every item requires the subject to understand a spoken utterance and to speak in response to it. Alternative tests may be devised for testing the ability of the subject to comprehend written or graphically displayed items. These and other alternatives are expressly intended to fall within the scope of the present invention as long as at least a portion of the test requires the subject to provide a spoken response.
The subject's language skills are then assessed by the interactive system based on the exact words used in spoken responses and a scoring computation model relating to the set of tasks 10. The system may also consider the latency, pace, fluency, and pronunciation of the words in phrases and sentences.
The scoring computation model may be constructed in numerous ways. Nonetheless, in accordance with a preferred embodiment of the present invention, the scoring computation model is constructed as follows.
Sets of task items are presented to appropriate samples of native and non-native speakers and the responses of these sample speakers to these task items are recorded and analyzed by speech processing and recognition and/or by human transcription and linguistic description.
Native speaker samples include individuals selected with reference to the range and incidence of demographic, linguistic, physical or social variables that can have a salient effect on the form or content of the speech as received at the speech recognition system. These demographic, linguistic, physical or social variables include a speaker's age, size, gender, sensory acuity, race, dialect, education, geographic origin or current location, employment, or professional training. Speech samples are also selected according to the time of day at the individual's location, the type and condition of the signal transducer, and the type and operation of the communication channel. Native response samples are used in the development of the scoring computation model to define or verify the linguistic and extra-linguistic content that is expected or that is scored as correct, and to quantify and ensure equity in test scoring.

Non-native speaker samples include individuals selected with reference to the range and incidence of demographic, linguistic, physical or social variables that can have a salient effect on the form or content of the speech as received at the speech recognition system. For the non-native speakers, these demographic, linguistic, physical or social variables include the identities of a speaker's first, second, or other languages, level of skills in the target language of the test or any other language or dialect, the age, size, gender, race, dialect, education, geographic origin or current location, employment, or professional training. Speech samples are also selected according to the time of day at the individual's location, the type and condition of the signal transducer, and the type and operation of the communication channel. Non-native response samples are used in the development of the scoring computation model to define or verify the linguistic and extra-linguistic content of the responses that is expected or that is scored as correct, to define or calibrate the range of scoring, and to quantify and ensure equity in test scoring.
In accordance with this embodiment, the scoring computation model is therefore constructed based upon the responses of the sample speakers. Statistical analysis of the responses of the sample speakers allows a scoring computation model to be constructed that accounts for inaccuracies in the automated speech recognition system 20. In addition, the responses of the sample speakers may be used to generate a difficulty value for each item in the set of task items. By applying a statistical model, such as one of those consistent with Item Response Theory for example, to the responses of the sample speakers, the measure of item difficulty may implicitly take into account the inaccuracy of the speech recognition system 20 as well as the subjects' difficulty with the given item. Preferably, the information regarding item difficulty is included in the scoring computation model.
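One simple way to see how an item difficulty can absorb recognizer errors is a centered log-odds calibration, sketched below in Python. This is an illustrative approximation, not the calibration procedure of this specification: each item's difficulty is the log of the ratio of incorrect to correct grades among the sample speakers, shifted so that the difficulties average zero. Because the grades come from the automated grader, an item that the recognizer often misrecognizes, even for native speakers, collects fewer correct grades and therefore a higher difficulty. The matrix layout and the function name are assumptions.

```python
import math
from typing import List, Sequence


def calibrate_difficulties(graded: Sequence[Sequence[int]]) -> List[float]:
    """Centered log-odds item difficulties from automatically graded sample responses.

    graded[s][i] is 1 if the automated grader scored sample speaker s's
    response to item i as correct, else 0.
    """
    n_speakers = len(graded)
    n_items = len(graded[0])
    logits = []
    for i in range(n_items):
        correct = sum(row[i] for row in graded)
        if correct in (0, n_speakers):
            # All-correct or all-wrong items have no finite log-odds.
            raise ValueError(f"item {i} cannot be calibrated from these responses")
        logits.append(math.log((n_speakers - correct) / correct))
    mean = sum(logits) / n_items
    return [b - mean for b in logits]


# Four sample speakers, three items; item 2 is graded correct least often.
graded = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
print([round(b, 3) for b in calibrate_difficulties(graded)])
```

A fuller Item Response Theory calibration would iterate item and person estimates jointly; this single pass only shows the direction of the effect.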
Returning again to an automated assessment of the subject's language skills, the subject's spoken responses are digitized and passed to the interactive grading system 42.

Figure 3 is functional block diagram of this portion of the grading system. The functional elements of the interactive system include the speech recognition system 20, the scoring device 30, and the compulation device 40, As noted above, the speech recognition sysesi 20 may be a commercially available system, such as the Entropic HTK. system. The scoring device 30 and the computation device 40 may be implemented in software. Pseudocode implementations for ihe scoring device and the compulation device are provided to Appendix 1 and Appendix 2 hereto, respectively.
The speech recognition system 20 receives the spoken response from the subject and provides to the scoring device 30 an estimate 50 of the words in the spoken response. The scoring device 30 converts the estimate 50 into an item score 60. An item is a single task to be performed by the subject, calling for a word, phrase or sentence response. The computation device 40 provides a subject score 70 based on a combination of item scores using a scoring computation model that depends upon the expected item-dependent operating characteristics of the speech recognition system 20, such as the scoring computation model described above. In accordance with a preferred embodiment, the scoring computation model is consistent with Item Response Theory.
A pseudo-code implementation of the scoring device 30 is provided in Appendix 1 hereto. For this embodiment, the scoring device 30 converts the estimate into an item score by counting the number of insertions, deletions and substitutions needed to convert the spoken response into one of the correct responses.
The purpose of the scoring device module 30 is to compare two phrases and compute how many differences there are between the two at the word level. The total number of differences is the smallest number of insertions, substitutions, and deletions of words required to transform the first phrase into the second phrase. This total number of differences may also be referred to herein as the "item score." Note, however, that in accordance with a preferred embodiment, insertions of words at the beginning or end of the phrase may not be counted. For example, the number of word differences between:
"Ralph was a small mouse" and "Well Ralph was was a house" may be computed as follows:
Insertion of "Well": not counted (leading insertion)
Insertion of second "was": 1 insertion
Deletion of "small": 1 deletion
Substitution of "house" for "mouse": 1 substitution
for a total of 3 differences.
For any given pair of phrases, there are multiple possible sets of transformations. For example, in the above, we could have interpreted the transformations as a deletion of "mouse" and an insertion of "house" rather than a substitution of "house" for "mouse". However, the scoring device, as implemented in Appendix 1 hereto, returns a set of transformations that gives the smallest total number of differences. Thus, the substitution alternative would have been chosen over the deletion/insertion alternative. The count of errors (3 for the example item set forth above) may then be weighted on an item-by-item basis by the computation device, as described below.
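The minimum count described above is a word-level edit distance in which insertions before the first word or after the last word of the correct phrase cost nothing. The following is a minimal dynamic-programming sketch under that assumption; it is not the Appendix 1 listing, and the names `wordDiffCount` and `splitWords`, as well as splitting on whitespace with case left unchanged, are illustrative choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a phrase into words on whitespace (case is left unchanged).
static std::vector<std::string> splitWords(const std::string &phrase) {
    std::istringstream in(phrase);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Hypothetical helper: smallest number of word insertions, deletions and
// substitutions that turns `correct` into `response`. Insertions made before
// the first or after the last word of `correct` are free (leading/trailing).
int wordDiffCount(const std::string &correct, const std::string &response) {
    const std::vector<std::string> a = splitWords(correct), b = splitWords(response);
    const std::size_t n = a.size(), m = b.size();
    // d[i][j]: cost of turning the first i words of a into the first j words of b.
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) d[i][0] = static_cast<int>(i);  // deletions only
    for (std::size_t i = 0; i <= n; ++i) {
        const int ins = (i == 0 || i == n) ? 0 : 1;  // leading/trailing insertions are free
        for (std::size_t j = 1; j <= m; ++j) {
            int best = d[i][j - 1] + ins;                  // insert b[j-1]
            if (i > 0) {
                best = std::min(best, d[i - 1][j] + 1);    // delete a[i-1]
                best = std::min(best, d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1));  // match or substitute
            }
            d[i][j] = best;
        }
    }
    return d[n][m];
}
```

For the example above, `wordDiffCount("Ralph was a small mouse", "Well Ralph was was a house")` returns 3, matching the hand count.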
For purposes of efficiency, the first step in the scoring device module 30 set forth in Appendix 1 hereto is to convert the list of words in each phrase into a list of integers, where each integer represents a single word. This conversion is performed by "phraseToWordHash()". An advantage of doing this is that it is faster to compare two integers than to compare two words letter by letter.
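The word-to-integer conversion can be sketched with a dictionary that gives each distinct word the next unused integer and is shared between the two phrases being compared. Unlike a hash function, this assignment never maps two different words to the same integer. The name `phraseToIds` is hypothetical and is not the `phraseToWordHash()` routine of Appendix 1.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: map each word of `phrase` to an integer id. `dict` is
// shared between the phrases being compared so that equal words get equal
// ids; a word not yet in `dict` receives the next unused id.
std::vector<int> phraseToIds(const std::string &phrase,
                             std::unordered_map<std::string, int> &dict) {
    std::vector<int> ids;
    std::istringstream in(phrase);
    for (std::string word; in >> word;) {
        // emplace leaves an existing entry unchanged and returns an iterator to it.
        auto entry = dict.emplace(word, static_cast<int>(dict.size())).first;
        ids.push_back(entry->second);
    }
    return ids;
}
```

After both phrases are converted, every later word comparison is a single integer comparison.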
It is to be understood that an item score may be computed in any other way without departing from the present invention. The DiffCount() procedure described in Appendix 1 hereto is an example of one way in which the item score may be obtained. For purposes of the preferred embodiments described herein, the item score may be thought of as any measure or set of measures derived from a subject's response to a single item. Alternative approaches for obtaining an item score are well within the capabilities of those skilled in the art. For example, an item score may be obtained by determining whether the response as a whole is correct or incorrect, i.e. no errors (score is 0) versus any number of errors (score is 1), or whether the spoken response includes, as a portion thereof, the correct response. An item score may also include non-numeric elements such as words, phrases or structural descriptions that are estimated by analysis of item responses.
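Two of the alternative item scores mentioned above can be sketched as follows, assuming the correct response and the recognized response are compared as whitespace-separated words: an all-or-nothing score (0 for an exact match, 1 for any error) and a test of whether the correct response appears as a contiguous run of words inside the spoken response. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

static std::vector<std::string> words(const std::string &s) {
    std::istringstream in(s);
    std::vector<std::string> out;
    for (std::string w; in >> w;) out.push_back(w);
    return out;
}

// Hypothetical all-or-nothing item score: 0 if the word sequences are
// identical, 1 if there is any error.
int binaryItemScore(const std::string &correct, const std::string &response) {
    return words(correct) == words(response) ? 0 : 1;
}

// Hypothetical check: does the response contain the correct answer as a
// contiguous run of words (e.g. "well Ralph went home" contains "Ralph went home")?
bool containsCorrect(const std::string &correct, const std::string &response) {
    const std::vector<std::string> c = words(correct), r = words(response);
    return c.empty() || std::search(r.begin(), r.end(), c.begin(), c.end()) != r.end();
}
```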
A pseudo-code implementation of the computation device 40 is provided in Appendix 2 hereto. As noted above, the computation device 40 provides a subject score 70, which is indicative of aspects of the subject's language proficiency. In accordance with a preferred embodiment of the present invention as described in Appendix 2 hereto, the subject score 70 is based on a combination of a series of item scores 60 using Item Response Theory.
Item Response Theory provides one approach by which the contribution of item scores on individual items to an underlying measure can be established. Specifically, it provides a tool by which the difficulties of items can be mapped onto linear scales in a consistent way. The application of Item Response Theory analysis to an exam scored by automatic speech recognition provides the advantage of combining not only the difficulty the subject is expected to experience on a given item, but also the difficulty that the automatic speech recognition system 20 is expected to have in correctly recognizing the response, or any part thereof. As a result, the individual's ability is more accurately assessed than would otherwise be possible with an imperfect automatic speech recognition system. Further details on Item Response Theory may be found in "Introduction to Classical and Modern Test Theory", authored by Linda Crocker and James Algina, Harcourt Brace Jovanovich College Publishers (1986), Chapter 15; and "Best Test Design: Rasch Measurement", by Benjamin D. Wright and Mark H. Stone, Mesa Press, Chicago, Illinois (1979), the contents of both of which are incorporated herein by reference.
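For items graded right or wrong, the Rasch model gives the probability of a correct grade as a logistic function of the difference between the subject's ability θ and the item difficulty b, both in logits: P = exp(θ - b) / (1 + exp(θ - b)). Because b is calibrated from recognizer-graded responses, it reflects both sources of difficulty discussed above. A direct C++ rendering (the function name is assumed):

```cpp
#include <cmath>

// Rasch (one-parameter logistic) probability that a subject of ability
// `theta` is graded correct on an item of difficulty `b`, both in logits.
// 1 / (1 + exp(b - theta)) is algebraically equal to exp(theta - b) / (1 + exp(theta - b)).
double raschProbability(double theta, double b) {
    return 1.0 / (1.0 + std::exp(b - theta));
}
```

When θ equals b the probability is 0.5, and each additional logit of ability multiplies the odds of a correct grade by e.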
Once a subject's responses have been graded on an item level, for example each response has been graded as right or wrong, or the number of errors in each response has been determined, then the item scores 60 need to be combined into a subject score 70 for the individual. One way of doing this is to simply total the item scores 60 to give a total number correct or a total number of errors. However, this does not capture the differing difficulty among the items, nor does it capture the item-dependent operating characteristics of the speech recognition system 20. A subject who received more difficult items (or items that are poorly recognized) would end up with a lower subject score 70 than another subject of equal ability who received, by chance, easier items. A better way of combining the item scores 60 is to use a scoring computation model that depends upon the expected item-dependent operating characteristics of the speech recognition system 20 and the item difficulty, such as a scoring computation model based on Item Response Theory. In particular, the computation device 40 imposes a statistical model on the subject's responses and comes up with a "best" estimate of the subject's ability given the items and the pattern of the responses.
As well as properly handling the difficulties of the items given to the subject, this approach offers another important advantage with respect to speech proficiency testing. Because speech recognition systems, such as the speech recognition system 20 in Figures 1 and 3, may be imperfect, items will at times be incorrectly graded (i.e. incorrect responses may be graded as correct or vice versa). The error behavior of the speech recognition system 20 is item-dependent: different items exhibit different recognizer error patterns. By applying the scoring computation model to the item scores 60, the computation device 40 implicitly captures and accounts for this error. In accordance with this embodiment, items that are often misrecognized by the speech recognition system 20, which make it appear that the subject committed more errors than the subject actually did, end up being assigned a high difficulty value. As a result, misrecognized items do not penalize the subject as severely in terms of subject score 70. Items that are more accurately recognized by the speech recognition system 20 affect the examinee's subject score 70 more significantly. Thus, the statistical operations associated with the application of the scoring computation model to the item scores 60 serve to de-emphasize the effects of speech recognizer errors.
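One way to see why a high difficulty value softens the effect of misrecognition is to look at what a single wrong grade contributes to the Rasch log-likelihood of an ability θ for a right/wrong item of difficulty b: log(1 - P) = -log(1 + exp(θ - b)). Its derivative with respect to θ is -P, which is small when b is well above θ, so a wrong grade on an item calibrated as hard pulls the ability estimate down only slightly. The sketch below (hypothetical function name, dichotomous items assumed) evaluates this term.

```cpp
#include <cmath>

// Log-likelihood contribution of one wrong grade on an item of difficulty b
// for a subject of ability theta under the dichotomous Rasch model:
// log(1 - P) = -log(1 + exp(theta - b)). log1p keeps precision when exp(...) is small.
double wrongGradeLogLik(double theta, double b) {
    return -std::log1p(std::exp(theta - b));
}
```

At θ = 1, a wrong grade on an item of difficulty 3 costs far less log-likelihood than a wrong grade on an item of difficulty -1.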
The computation device 40 module, as set forth in Appendix 2 hereto, applies a scoring computation model to a set of item scores 60 to compute an ability measure along with a confidence interval for that measure. As input, the RaschMeasure() routine takes a set of non-negative integers indicating the item scores for a particular set of items. Also input is an array of item difficulties that are used in the Item Response Theory computation.
The RaschMeasure() routine then uses these inputs to estimate the ability of the test-taker using the Rasch model from Item Response Theory (which is well known to those skilled in the art). It does this by computing the likelihood of the given set of responses for a range of assumed values for the caller's ability (in accordance with one embodiment, the range is fixed from -8.0 to +8.0 in steps of 0.01). Other ranges and step sizes may alternatively be used. These likelihoods are then normalized to give a probability density function (PDF) for the subject's ability. The expected value of the subject's ability can then be computed by integrating the ability weighted by the PDF. This is the value returned. The confidence interval is defined as the 0.1 and 0.9 points on the cumulative distribution function (the integral of the PDF), and these two values are also returned.
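The grid computation just described can be sketched for right/wrong items as follows. This is an illustration under stated assumptions (dichotomous scores, the likelihood normalized directly over the grid, and the names `raschMeasure` and `AbilityEstimate`), not the Appendix 2 routine, which accepts any non-negative integer item score.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct AbilityEstimate {
    double mean;    // expected ability under the normalized likelihood
    double ciLow;   // 0.1 point of the cumulative distribution
    double ciHigh;  // 0.9 point of the cumulative distribution
};

// Hypothetical sketch: correct[i] is 1 or 0 for item i and difficulty[i] is its
// difficulty in logits. Evaluates the Rasch likelihood on a grid from -8 to +8
// in steps of 0.01, normalizes it to a PDF, and reads off the mean and the
// 0.1 / 0.9 points of the CDF.
AbilityEstimate raschMeasure(const std::vector<int> &correct,
                             const std::vector<double> &difficulty) {
    assert(correct.size() == difficulty.size());
    const double lo = -8.0, step = 0.01;
    const int n = 1601;  // grid points -8.00, -7.99, ..., +8.00
    std::vector<double> logLik(n, 0.0);
    for (int s = 0; s < n; ++s) {
        const double theta = lo + s * step;
        for (std::size_t i = 0; i < correct.size(); ++i) {
            const double x = theta - difficulty[i];
            // log P = -log(1 + e^-x); log(1 - P) = -log(1 + e^x)
            logLik[s] -= std::log1p(std::exp(correct[i] ? -x : x));
        }
    }
    // Subtract the peak before exponentiating to avoid underflow; the vector
    // now holds unnormalized likelihoods.
    const double peak = *std::max_element(logLik.begin(), logLik.end());
    double total = 0.0;
    for (double &v : logLik) { v = std::exp(v - peak); total += v; }
    AbilityEstimate est{0.0, lo, lo + (n - 1) * step};
    double cum = 0.0;
    bool haveLow = false, haveHigh = false;
    for (int s = 0; s < n; ++s) {
        const double theta = lo + s * step, p = logLik[s] / total;
        est.mean += theta * p;
        cum += p;
        if (!haveLow && cum >= 0.1) { est.ciLow = theta; haveLow = true; }
        if (!haveHigh && cum >= 0.9) { est.ciHigh = theta; haveHigh = true; }
    }
    return est;
}
```

With one right and one wrong grade, the returned mean is higher when the wrong grade falls on an item calibrated as hard than when it falls on an easy item, which is the de-emphasis of misrecognized items discussed earlier.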
The computation device 40 may alternatively apply statistical combination techniques other than the RaschMeasure() routine. For example, the UCON model, the PAIR model, and the PROX model, all of which are well known techniques for applying Item Response Theory, may be used. Other statistical techniques may also be used, such as using an explicit measure of speech recognition accuracy to weight the item scores.
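As an illustration of the last alternative, the sketch below weights each item's right/wrong score by an explicitly measured recognition accuracy for that item, so items the recognizer handles poorly move the result less. The weighting rule (weight equal to accuracy), the accuracy inputs and the function name are assumptions; the text does not specify a formula.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical weighting rule: itemScore[i] is 1 (correct) or 0 (wrong);
// accuracy[i] in [0, 1] is the measured rate at which the recognizer
// transcribes item i correctly. Returns the accuracy-weighted proportion
// correct, so a poorly recognized item moves the result less.
double accuracyWeightedScore(const std::vector<int> &itemScore,
                             const std::vector<double> &accuracy) {
    assert(itemScore.size() == accuracy.size());
    double weighted = 0.0, totalWeight = 0.0;
    for (std::size_t i = 0; i < itemScore.size(); ++i) {
        weighted += accuracy[i] * itemScore[i];
        totalWeight += accuracy[i];
    }
    return totalWeight > 0.0 ? weighted / totalWeight : 0.0;
}
```

For example, with item scores {1, 0} and accuracies {0.9, 0.3}, the weighted proportion correct is 0.9 / 1.2 = 0.75 rather than the unweighted 0.5.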
The subject score 70, in which the item scores 60 are combined using a scoring computation model that depends upon the expected item-dependent operating characteristics of the speech recognition system 20, provides a better measure of the subject's ability than does an individual item score 60. In particular, the subject score 70 includes item scores 60 that are properly weighted with regard to both the item's relevance to the assessment and the accuracy with which the speech recognition system operates on the item or its elements. Using Item Response Theory, for example, the difficulty of the items can be mapped onto a linear scale in a consistent way. By normalizing for the problem of the speech recognition system 20 incorrectly recognizing the items in the subject's response, the subject's ability can be more accurately assessed. Furthermore, the Item Response Theory methods assume the underlying parametric model and derive from the data the single most representative dimension to explain the observed item scores 60, by including the expected characteristics of both the subject performance and the speech recognition performance.
As described above, the scoring computation model operates on an item-by-item basis. In accordance with a further alternative embodiment of the present invention, the scoring computation model is even more finely tuned to operate on elements of the response to an item. For example, an item may be, "Repeat the sentence 'Ralph went to the store.'" One subject responds, "Ralph went the store." A second subject responds, "Ralph went to the." If the item score is determined by counting deletions, as described above with reference to Appendix 1, then both subjects would receive an item score of one error for this item. The word deleted by the first subject can be said to have been weighted equally to the word deleted by the second subject. In accordance with the alternative embodiment, however, the elements within the item may be weighted differently. For the example above, the deletion of "store" by the second subject would be scored differently, or weighted more heavily, than the deletion of "to" by the first subject. This may be particularly appropriate in situations where the speech recognition system 20 has difficulty recognizing short words, such as the word "to."
Figure 4 is a flow chart illustrating a method for measuring an ability of a subject in accordance with a preferred embodiment of the present invention. At step 80, a set of tasks is provided to a subject, and at step 90 a device that automatically measures performance of the tasks is connected to the subject. A difficulty value was previously determined for each task item at step 100. In accordance with a preferred embodiment of the present invention, the difficulty value is based upon both the task item and upon a performance measure associated with an ability of the automated device to accurately assess performance of the task. For this embodiment, the automated device is the speech recognition system 20, as shown in Figures 1 and 3 for example. At step 110, verbal responses to the tasks are obtained from the subject. Step 100 is typically performed in advance of steps 80, 90, 110, and 120, such as by collecting sample responses from native and non-native speakers as described above. The verbal responses and the difficulty values are combined at step 120 to form a subject score 70.
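The element-level weighting described above for the sentence "Ralph went to the store" can be sketched as a weighted word-level edit distance in which each word of the expected sentence has its own deletion cost. The weight table (for example 0.5 for "to"), the unit costs for insertions and substitutions, and the function names are hypothetical; the text does not fix particular weights.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

static std::vector<std::string> toWords(const std::string &s) {
    std::istringstream in(s);
    std::vector<std::string> out;
    for (std::string w; in >> w;) out.push_back(w);
    return out;
}

// Hypothetical element-weighted item score: deleting word w of the expected
// sentence costs deleteWeight[w] (1.0 if w is not listed); insertions and
// substitutions cost 1.0. Returns the minimum total weighted cost.
double weightedDiff(const std::string &expected, const std::string &response,
                    const std::map<std::string, double> &deleteWeight) {
    const std::vector<std::string> a = toWords(expected), b = toWords(response);
    auto delCost = [&](const std::string &w) {
        auto it = deleteWeight.find(w);
        return it == deleteWeight.end() ? 1.0 : it->second;
    };
    // d[i][j]: cheapest conversion of the first i expected words into the first j response words.
    std::vector<std::vector<double>> d(a.size() + 1, std::vector<double>(b.size() + 1, 0.0));
    for (std::size_t j = 1; j <= b.size(); ++j) d[0][j] = static_cast<double>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        d[i][0] = d[i - 1][0] + delCost(a[i - 1]);
        for (std::size_t j = 1; j <= b.size(); ++j)
            d[i][j] = std::min({d[i - 1][j] + delCost(a[i - 1]),                        // deletion
                                d[i][j - 1] + 1.0,                                      // insertion
                                d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0.0 : 1.0)}); // match or substitution
    }
    return d[a.size()][b.size()];
}
```

With a weight of 0.5 for "to", the first subject's response ("Ralph went the store") scores 0.5 and the second subject's response ("Ralph went to the") scores 1.0, so the missing content word counts more heavily.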
The present embodiments preferably encompass logic to implement the described methods in software modules as a set of computer executable software instructions. A Central Processing Unit ("CPU") or general purpose microprocessor implements the logic that controls the operation of the interactive system. The microprocessor executes software that can be programmed by those of skill in the art to provide the described functionality. The software can be represented as a sequence of binary bits maintained on a computer readable medium including magnetic disks, optical disks, organic disks, and any other volatile (e.g., Random Access Memory ("RAM")) or non-volatile firmware (e.g., Read Only Memory ("ROM")) storage system readable by the CPU. The memory locations where data bits are maintained also include physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to the stored data bits. The software instructions are executed as data bits by the CPU with a memory system causing a transformation of the electrical signal representation, and the maintenance of data bits at memory locations in the memory system to thereby reconfigure or otherwise alter the unit's operation. The executable software code may implement, for example, the methods described above.
It should be understood that the programs, processes, methods and apparatus described herein are not related or limited to any particular type of computer or network apparatus (hardware or software), unless indicated otherwise. Various types of general purpose or specialized computer apparatus may be used with or perform operations in accordance with the teachings described herein.
In view of the wide variety of embodiments to which the principles of the present invention can be applied, it should be understood that the illustrated embodiments are exemplary only, and should not be taken as limiting the scope of the present invention. For example, the steps of the flow diagrams may be taken in sequences other than those described, and more or fewer elements may be used than are shown in the block diagrams.
It should be understood that a hardware embodiment may take a variety of different forms. The hardware may be implemented as a digital signal processor or general purpose microprocessor with associated memory and bus structures, an integrated circuit with custom gate arrays or an application specific integrated circuit ("ASIC"). Of course, the embodiment may also be implemented with discrete hardware components and circuitry.
The claims should not be read as limited to the described order of elements unless stated to that effect. In addition, use of the term "means" in any claim is intended to invoke 35 U.S.C. §112, paragraph 6, and any claim without the word "means" is not so intended.
Therefore, all embodiments that come within the scope and spirit of the following claims and equivalents thereto are claimed as the invention.

Appendix 1
// DiffCount: word-level differences between two phrases.
#include <algorithm>
#include <climits>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Counts of each kind of edit in a cheapest transformation of one phrase
// into another, with the weight given to each kind of edit.
struct Diffs {
    static constexpr int SUBWT = 1;    // substitution
    static constexpr int DELWT = 1;    // deletion
    static constexpr int INSWT = 1;    // insertion inside the phrase
    static constexpr int LDINSWT = 0;  // leading insertion (not counted)
    static constexpr int TRINSWT = 0;  // trailing insertion (not counted)
    int score = 0, nins = 0, ndel = 0, nsub = 0, ldins = 0, trins = 0;
};

// Convert the words of a phrase into integers so that words can be compared
// as integers; words already in `ids` keep their number.
static std::vector<int> phraseToWordHash(const std::string &phrase,
                                         std::unordered_map<std::string, int> &ids)
{
    std::vector<int> w;
    std::istringstream in(phrase);
    for (std::string word; in >> word;)
        w.push_back(ids.emplace(word, (int)ids.size()).first->second);
    return w;
}

// Count the differences between two phrases in terms of deletions,
// insertions and substitutions needed to convert p1 into p2. Returns the
// weighted total and fills in the per-kind counts in best.
int DiffCount(const std::string &p1, const std::string &p2, Diffs &best)
{
    std::unordered_map<std::string, int> ids;
    const std::vector<int> w1 = phraseToWordHash(p1, ids);
    const std::vector<int> w2 = phraseToWordHash(p2, ids);
    const std::size_t n1 = w1.size(), n2 = w2.size();
    // Weight of inserting a word when i words of p1 have been consumed.
    auto insWeight = [n1](std::size_t i) -> int {
        if (i == 0) return Diffs::LDINSWT;
        if (i == n1) return Diffs::TRINSWT;
        return Diffs::INSWT;
    };
    auto subWeight = [&](std::size_t i, std::size_t j) -> int {
        return w1[i] == w2[j] ? 0 : Diffs::SUBWT;
    };
    // cost[i][j]: cheapest conversion of the first i words of p1 into the
    // first j words of p2.
    std::vector<std::vector<int>> cost(n1 + 1, std::vector<int>(n2 + 1, 0));
    for (std::size_t i = 0; i <= n1; i++)
        for (std::size_t j = 0; j <= n2; j++) {
            if (i == 0 && j == 0) continue;
            int c = INT_MAX;
            if (i > 0) c = cost[i - 1][j] + Diffs::DELWT;
            if (j > 0) c = std::min(c, cost[i][j - 1] + insWeight(i));
            if (i > 0 && j > 0) c = std::min(c, cost[i - 1][j - 1] + subWeight(i - 1, j - 1));
            cost[i][j] = c;
        }
    // Trace back one cheapest path, counting each kind of edit.
    best = Diffs();
    best.score = cost[n1][n2];
    std::size_t i = n1, j = n2;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 && cost[i][j] == cost[i - 1][j - 1] + subWeight(i - 1, j - 1)) {
            if (w1[i - 1] != w2[j - 1]) best.nsub++;
            i--; j--;
        } else if (j > 0 && cost[i][j] == cost[i][j - 1] + insWeight(i)) {
            if (i == 0) best.ldins++;
            else if (i == n1) best.trins++;
            else best.nins++;
            j--;
        } else {
            best.ndel++;
            i--;
        }
    }
    return best.score;
}

#ifdef TEST
int main(int argc, char *argv[])
{
    if (argc < 3) {
        std::cerr << "usage: " << argv[0] << " phrase1 phrase2\n";
        return 1;
    }
    Diffs d;
    DiffCount(argv[1], argv[2], d);
    std::cout << "Diffs(" << argv[1] << ", " << argv[2] << ") = " << d.score
              << " [ins " << d.nins << ", del " << d.ndel << ", sub " << d.nsub
              << ", leading ins " << d.ldins << ", trailing ins " << d.trins << "]\n";
    return 0;
}
#endif

Appendix 2
// RaschMeasure: Rasch measure of proficiency from a set of item scores.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Compute a Rasch measure of proficiency from the observed scores on a set of
// items. category[i] is the score (0..steps.size()) observed on item i, and
// difficulty[i] is the calibrated difficulty of item i. steps[k] is the step
// calibration for going from category k to k+1; a single step of 0.0 gives
// the ordinary right/wrong Rasch model. Returns the expected ability; cimin
// and cimax receive the 0.1 and 0.9 points of the cumulative distribution.
double RaschMeasure(const std::vector<int> &category,
                    const std::vector<double> &difficulty,
                    const std::vector<double> &steps,
                    double &cimin, double &cimax)
{
    const double minscore = -8.0, maxscore = 8.0, scorestep = 0.01;
    const int nscores = (int)std::lround((maxscore - minscore) / scorestep) + 1;

    // Do numerical calculation of the PDF: log-likelihood at each grid point.
    std::vector<double> pdf(nscores, 0.0);
    for (int s = 0; s < nscores; s++) {
        const double ability = minscore + s * scorestep;
        for (std::size_t i = 0; i < category.size(); i++) {
            // logNum[k] = sum over h < k of (ability - difficulty[i] - steps[h])
            std::vector<double> logNum(steps.size() + 1, 0.0);
            double denom = 1.0;  // exp(logNum[0]) = 1
            for (std::size_t k = 0; k < steps.size(); k++) {
                logNum[k + 1] = logNum[k] + ability - difficulty[i] - steps[k];
                denom += std::exp(logNum[k + 1]);
            }
            pdf[s] += logNum[category[i]] - std::log(denom);
        }
    }

    // Convert to a normalized PDF (subtracting the peak avoids underflow).
    const double peak = *std::max_element(pdf.begin(), pdf.end());
    double total = 0.0;
    for (double &p : pdf) { p = std::exp(p - peak); total += p; }

    // Expected value and the 0.1 / 0.9 points of the cumulative distribution.
    double value = 0.0, cum = 0.0;
    bool haveMin = false, haveMax = false;
    cimin = minscore;
    cimax = maxscore;
    for (int s = 0; s < nscores; s++) {
        const double ability = minscore + s * scorestep;
        const double p = pdf[s] / total;
        value += ability * p;
        cum += p;
        if (!haveMin && cum >= 0.1) { cimin = ability; haveMin = true; }
        if (!haveMax && cum >= 0.9) { cimax = ability; haveMax = true; }
    }
    return value;
}

I claim:
1. A system for measuring an ability of a subject, comprising:
a first set of task items that require the subject to provide one or more spoken responses;
a speech recognition system coupled to receive the spoken response and to provide an estimate of the spoken response, the speech recognition system having an associated accuracy;
a scoring device, the scoring device being operable to convert the estimate into an item score; and
a computation device, the computation device providing a subject score based on a combination of item scores using a scoring computation model that depends upon an expected item-dependent operating characteristic of the speech recognition system.
2. A system as claimed in claim 1, wherein the scoring computation model is
based on Item Response Theory.
3. A system as claimed in claim 1, wherein the speech recognition system, the
scoring device and the computation device comprise software modules running on a general
purpose computing platform.
4. A system as claimed in claim 1, wherein the scoring computation model is
constructed from a plurality of responses provided by a number of native and non-native
speakers, the plurality of responses being prompted by a second set of task items.
5. A system as claimed in claim 1, wherein the estimate provided by the speech
recognition system comprises an estimate of the linguistic content of the spoken response.
6. A system as claimed in claim 1, wherein at least one task in the first set of
tasks is an item selected from the group consisting of a prompt to read a
sentence aloud, a prompt to repeat a word, a prompt to repeat a phrase, a
prompt to provide an opposite, and a prompt to answer a question.
7. A method of measuring an ability of a subject, comprising the steps of:
providing a set of task items;
generating a difficulty value for each task item in the set, the difficulty value being based upon the task item and a performance measurement associated with an automatic device that measures task performance;
obtaining a response to each task item from the subject; and
combining the difficulty values and the responses to form a subject score.
8. A method as claimed in claim 7, wherein the performance measurement is
a measure of an ability of the automatic device to accurately recognize the
responses.
9. A method as claimed in claim 7, wherein the step of generating a difficulty
value comprises the step of obtaining a plurality of sample responses from a
group of sample speakers.
10. A method as claimed in claim 9, wherein the step of generating a difficulty
value comprises the step of applying a statistical model to the plurality of sample
responses.
11. A method as claimed in claim 7, wherein the step of combining the
difficulty values and the responses comprises the step of applying a statistical
model to the plurality of sample responses.
12. A method as claimed in claim 7, wherein the performance measurement
associated with the automatic device is based upon an operating characteristic
of a speech recognition system.
13. A method of measuring an ability of a subject, comprising the steps of:
providing a set of tasks and a device that automatically measures
performance of the tasks;
determining a difficulty value for each task, wherein the difficulty value is
based upon the task and upon a performance measure associated with an ability
of the automatic device to accurately assess performance of the task;
obtaining verbal responses to the tasks from the subject; and
combining the verbal responses and the difficulty values to form a subject score.
14. A method as claimed in claim 13, wherein the device comprises an
automated speech recognition system.
15. An apparatus for determining a difficulty value of items in a test,
comprising:
a set of responses to the items from a number of individuals;
an automated grader, wherein the automatic grader receives the set of responses and provides graded responses; and
means for reducing the graded responses to a set of item difficulties, said item difficulties reflecting an ability of the automatic grader to accurately grade the set of responses.
16. A method of determining a difficulty value of items in a test, comprising
the steps of:
obtaining a set of responses to the items from a number of individuals;
automatically grading the set of responses, thereby generating graded responses; and
reducing the graded responses to a set of item difficulties, said item difficulties having a measurement of accuracy for the act of automatically grading the set of responses.

Patent Number: 207198
Indian Patent Application Number: IN/PCT/2001/00049/KOL
PG Journal Number: 22/2007
Publication Date: 01-Jun-2007
Grant Date: 31-May-2007
Date of Filing: 12-Jan-2001
Name of Patentee: ORDINATE CORPORATION
Applicant Address: 1040, NOEL DRIVE, SUITE 102, MENLO PARK, CALIFORNIA 94025

Inventors:
#	Inventor's Name	Inventor's Address
1	TOWNSHEND BRENT	156, UNIVERSITY DRIVE, MENLO PARK, CALIFORNIA 94025

PCT International Classification Number: G 01L 15/08
PCT International Application Number: PCT/US00/13115
PCT International Filing Date: 2000-05-12

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	09/311,617	1999-05-13	U.S.A.