Title of Invention	APPARATUS AND A METHOD FOR INFORMATION RETRIEVAL
Abstract	An apparatus for information retrieval comprising: an input unit, a database D storing the nodes and labels, a thesaurus storage unit for storing thesaurus T for defining a degree of similarity among the labels of nodes, a display unit, and a processing unit, wherein the processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes; the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using similarity among the labels defined by a subset R of thesaurus T according to links that are input, and making reference to the database D storing the nodes and labels that are input; the processing unit displays the set F of solution candidates that are found on the display unit; the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions; the processing unit deletes some elements in the set F of solution candidates from the set F according to the input information; the processing unit deletes, adds or changes the contents of the subset R and/or the retrieval query Q based on input information related to deleting, adding or changing the subset R of the thesaurus T and/or the retrieval query Q input through the input unit; and the processing unit returns to the step of retrieval to repeat the processing if there is a request for re-retrieval from the user, or ends the processing if there is no request.

Full Text	Form-2 THE PATENTS ACT, 1970 (39 of 1970) As amended by the Patents (Amendment) Act, 2002 COMPLETE SPECIFICATION (See Section 10; Rule 13) TITLE Apparatus and a method for information retrieval JAPAN SCIENCE AND TECHNOLOGY AGENCY, 4-1-8 Hon-cho, Kawaguchi-shi, Saitama 332-0012, Japan, a Japanese Corporation INVENTOR Under Section 28(2) HASIDA Koiti, 5-3-311, Hinobe, Urayasu-shi, Chiba 279-0013, Japan

The following specification particularly describes the nature of this invention and the manner in which it is to be performed:

This invention relates to an apparatus and a method for information retrieval.

Technical Field

This invention relates to an information retrieval method, an information retrieval program and a computer-readable recording medium on which the information retrieval program is recorded. More specifically, the invention relates to an interactive information retrieval method for labeled graphs, an information retrieval program and a computer-readable recording medium on which the information retrieval program is recorded.

Background of the Invention

In traditional information retrieval, a query consists of keywords and ID numbers combined with logical connectives such as AND and OR. Character-string matching and statistical methods have been the basic technologies. For interaction with the user, keywords, words and phrases characterizing several subsets of the set of solution candidates are found by statistical methods and presented as hints, so that the user can select some of them to augment the query. Related art has been disclosed in the following documents: Yoshihiko Hayashi and Yoshitsugu Obashi, "Technical Trend of Retrieval Service on WWW", Information Processing, Vol. 39, No. 9, 1998; and Sumio Fujita, "Approach to Retrieving / Classifying Information by Utilizing Natural Language Processing", Information Processing, Vol. 40, No. 4, 1999.
Summary of the Invention

Difficulty in information retrieval usually stems from the difficulty of filling the gap in expression between the retrieval query and the solution, that is, the difficulty of predicting the expression of the solution from the retrieval query. Suppose a candidate "President Tanaka was hit by a car in the U.S.A." is detected for the retrieval request "a Japanese businessman involved in an accident while on a business trip overseas". In this case a complex inference is necessary, but automating such an inference is technically impossible for the time being. Therefore, there is no way to conduct such an inference other than relying on interaction between the human user and the machine. To realize this interaction, the machine must give the user a hint about what to do at each stage of the interaction. The above conventional method of giving hints based on statistical methods can deal with the general nature of a set of candidates, but cannot deal with the structure specific to a particular retrieval query. Further, to give the user an effective hint for the interaction, the structure specific to the retrieval query must be reflected in the retrieval. For example, the retrieval query "a Japanese businessman involved in an accident while on a business trip overseas" has a semantic structure containing relations between "a Japanese" and "businessman", "businessman" and "on business trip", "overseas" and "on business trip", and "on business trip" and "accident". However, such a structure has hardly been employed in conventional information retrieval; in particular, it has never been used systematically as a clue for the interaction.

An objective of this invention is to improve the efficiency and accuracy of retrieval by conducting an effective interaction in which proper information is given to the user during information retrieval.
Another objective of this invention is to conduct information retrieval with high efficiency and high pinpoint accuracy by utilizing the semantic structure specific to the retrieval query, and by interactively revising the retrieval query and the retrieval space while automatically narrowing down the retrieval space. A further objective of this invention is to treat the retrieval query and the database to be searched as graphs without a formal structure, like sentences of a natural language, and to improve the efficiency and accuracy of retrieval by enabling the user to interact suitably with the retrieval engine using the structure as a clue.

According to the first means for solution of the invention, there are provided an information retrieval method, an information retrieval program and a computer-readable recording medium on which the information retrieval program is recorded, including: a step in which the processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes; a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using similarity among the labels defined by a subset R of thesaurus T, and making reference to a database D storing the nodes and labels that are input; a step in which the processing unit displays the set F of solution candidates that are found on a display unit; a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions; a step in which the processing unit deletes some elements in the set F of solution candidates from the set F according to the input information; a step in which the processing unit deletes, adds or changes the content of the subset R and/or the retrieval query Q based on input information related to deleting, adding or changing the subset R of the thesaurus T and/or the retrieval query Q input through the input unit; and a step of returning to the step of retrieval if there is a request for re-retrieval from the user, or ending the processing if there is no such request.

According to the second means for solution of the invention, there are provided an information retrieval method, an information retrieval program and a computer-readable recording medium on which the information retrieval program is recorded, including: a step in which a processing unit receives, through an input unit, input for a retrieval query Q containing information related to nodes, labels of nodes and links among the nodes; a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using the degree of similarity among the labels defined in a portion of the thesaurus T determined to be usable according to the links that are input, and making reference to a database D storing the nodes and labels that are input; a step in which the processing unit displays the set F of solution candidates that are found on the display unit; a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions; a step in which the processing unit deletes some elements in the set F of solution candidates from the set F according to the input information; a step in which the processing unit deletes, adds or changes the content of the thesaurus T and/or the retrieval query Q based on input information related to deleting, adding or changing the thesaurus T and/or the retrieval query Q input through the input unit; and a step of returning to the step of retrieval to repeat the processing if there is a request for re-retrieval from the user, or ending the processing if there is no request.

In this invention, the processing unit can further execute the steps of: displaying the retrieval query Q on a display unit; receiving, through an input unit, input information which, when there is no link connecting two nodes of the retrieval query Q, instructs insertion of a link; inserting the link according to the input information; receiving, through the input unit, input information instructing the deletion of a link in the retrieval query Q; deleting the link according to the input information; receiving, through the input unit, input information instructing the addition of a new node to the retrieval query Q; adding the node to the retrieval query Q according to the input information; receiving, through the input unit, input information instructing the deletion of a node in the retrieval query Q; and deleting the node from the retrieval query Q according to the input information.
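The processing cycle of the first means can be written as a plain control loop. It receives Q, retrieves F using R, displays F, deletes rejected candidates, revises R and/or Q, and repeats while the user asks for it. The Python below is only a sketch under stated assumptions. The callbacks `search`, `review`, `revise` and `wants_rerun` are hypothetical stand-ins for the retrieval method and for the user's input through the input unit. A solution candidate is assumed to be a dict from query nodes to database nodes, and R is assumed to be a set of usable (L, M) pairs. Rejected candidates are collected in a list playing the role of the set G of deleted candidates that the flowchart initializes at step S1.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Candidate = Dict[str, str]   # assumed: solution candidate f as {query node: database node}
Pair = Tuple[str, str]       # assumed: (L, M) label pair usable through R


def interactive_retrieval(
    query: Any,
    usable: Set[Pair],
    search: Callable[[Any, Set[Pair], List[Candidate]], List[Candidate]],
    review: Callable[[List[Candidate]], List[Candidate]],
    revise: Callable[[Any, Set[Pair]], Tuple[Any, Set[Pair]]],
    wants_rerun: Callable[[], bool],
) -> List[Candidate]:
    """Retrieve, let the user prune and revise, and repeat until no re-retrieval."""
    deleted: List[Candidate] = []              # set G of deleted candidates, empty at the start
    while True:
        found = search(query, usable, deleted)  # retrieval: find F using R, skipping G
        rejected = review(found)                # user marks elements of F that are not solutions
        deleted.extend(rejected)                # rejected candidates move from F into G
        found = [f for f in found if f not in rejected]
        query, usable = revise(query, usable)   # user deletes/adds/changes R and/or Q
        if not wants_rerun():                   # re-retrieval requested? otherwise stop
            return found


# Usage with scripted stand-ins for the user (all hypothetical).
answers = iter([True, False])                   # re-retrieve once, then stop
result = interactive_retrieval(
    query="Q",
    usable=set(),
    search=lambda q, r, g: [f for f in ({"x": "a"}, {"x": "b"}) if f not in g],
    review=lambda F: [f for f in F if f["x"] == "a"],
    revise=lambda q, r: (q, r),
    wants_rerun=lambda: next(answers),
)
print(result)  # [{'x': 'b'}]
```

Passing the user's decisions in as callbacks keeps the loop independent of any particular display or matching method, both of which the specification leaves open.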
In this invention, the processing unit can execute the steps of: displaying, on the display unit, a list of labels M of nodes in the database D such that the value T(L, M), representing the degree of similarity between M and the label L of a node in the retrieval query Q, is defined by the thesaurus T in the thesaurus storage unit; receiving, through the input unit, input information instructing that some labels M be selected or not selected, input information instructing a change in the value T(L, M) for some labels M, and input information specifying some arbitrary labels; and permitting the definition of T(L, M) in the thesaurus T to be used for each selected label M, inhibiting the definition of T(L, M) from being used for each unselected label M, changing the value of T(L, M) to a specified value for each specified label M, or setting the value of T(L, N) to 1 while permitting the definition of T(L, N) to be used for each specified label N.

In this invention, the processing unit can execute the steps of: displaying on the display unit, for each node x in the retrieval query Q, the list

$\{\, L \mid L \text{ is the label of a node } y \text{ such that, for some node } z \in F(x), \text{ the link } y\text{-}z \text{ is contained in the database } D \,\}$;

receiving, through an input unit, input information instructing that some labels be selected; and adding to the retrieval query Q, for each selected label L, a node Y with L as its label and a link x-Y according to the input information. Further, for each label M in the above list, when the size of the set

$\{\, y \mid \text{the label of } y \text{ is } M \text{ and, for some node } z \in F(x), \text{ the link } y\text{-}z \text{ is contained in } D \,\}$

is smaller than a predetermined value, the processing unit can display on the display unit, for every element y of this set, labels of some nodes around y in addition to the label M as elements of the above list.

In this invention, the processing unit can execute the steps of: displaying on a display unit, for each link x-y in the retrieval query Q, a list of labels of nodes z that are included in the shortest paths connecting the node f(x) to the node f(y) for a solution candidate f and are not included in the range f(Q) of that candidate; receiving, through the input unit, input information instructing that some of these labels be selected; and adding to the retrieval query Q, according to the input information, a node z with the selected element of the list as its label, together with links x-z and z-y.

Brief Description of the Drawings

Fig 1 is a diagram illustrating nodes, links and a retrieval query Q. Fig 2 is a diagram illustrating thesaurus expansion of labels included in the retrieval query Q. Fig 3 is a diagram illustrating solution candidates and a set F of solution candidates for the retrieval query Q. Fig 4 is a diagram illustrating the architecture of the retrieval system. Fig 5 is a flowchart of the information retrieval processing. Fig 6 is a diagram illustrating a display screen.

Detailed Description of Preferred Embodiments of the Invention

This embodiment considers a graph (network) with labels at the nodes as the above-mentioned semantic structure. Both the retrieval query Q and the database D to be searched are presumed to be such graphs. Based on approximate matching or the like between graphs, the retrieval query Q and the retrieval space can be modified interactively and effectively. In the case of retrieving sentences, for example, the nodes are objects referred to by words, a link is a semantic relationship between them, and a label is a word. In this embodiment, "retrieval" means finding a subgraph of the database D that resembles the retrieval query Q.
Each node of the retrieval query Q is considered to correspond to some node of this subgraph. Such a correspondence is expressed by a function mapping each node in the retrieval query Q to a node in the database D, and this function is called a solution candidate. Scores of the candidates (e.g., degree of similarity, degree of relationship, values related to probability) are further presumed to be defined. A set of several solution candidates having high scores is referred to as a set F of solution candidates, and the following are defined:

$F(x) = \{\, f(x) \mid f \in F \,\}$, where x is a node in the retrieval query Q and f(x) is the node in the database corresponding to x; and

$f(Q) = \{\, f(x) \mid x \text{ is a node in the retrieval query } Q \,\}$ for each $f \in F$.

The retrieval query Q, the set F of solution candidates and related notions will now be described concretely.

Fig 1 is a diagram illustrating the nodes, links and retrieval query Q.
* The nodes x in the retrieval query Q have labels such as "function", "analysis", "meaning" and "automatic".
* The links in the retrieval query Q are "function - analysis", "analysis - meaning" and "analysis - automatic".
* The retrieval query Q, as shown, is constituted by these nodes, labels and links.

Fig 2 is a diagram illustrating the nodes f(x) in the database that correspond, under solution candidates f, to the nodes x in the retrieval query Q, and the sets F(x) of database nodes corresponding to x over the set F of candidates.
* When, for example, x is the node labeled "function", f(x) is written f(function) (f1(function), f2(function), ...), and its label is one of "function", "program", "functor", "relation", "subroutine", "projection" and "surjection".
* When, for example, x is the node labeled "function", F(x) is written F(function) and stands for the set of f(function) over all f ∈ F, labeled "function", "program", "functor", "relation", "subroutine", "projection" and "surjection".
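With these definitions, F(x), f(Q) and the two structure-based hint lists from the summary reduce to a few set operations. The first list holds labels of database nodes linked to some node of F(x). The second holds labels of nodes that lie on shortest paths between f(x) and f(y) but outside f(Q). The Python below is a minimal sketch under assumed representations. Links are taken to be undirected and unweighted, and a graph is a dict of node labels plus an adjacency map. A candidate is a dict from query nodes to database nodes. The class and function names and the small database (node ids `d1` to `d5`, the label "use") are hypothetical. A node z is treated as lying on a shortest path from f(x) to f(y) exactly when d(f(x), z) + d(z, f(y)) = d(f(x), f(y)), with distances computed by breadth-first search.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Candidate = Dict[str, str]   # assumed: solution candidate f as {query node: database node}


class LabeledGraph:
    """Undirected, unweighted graph with one label per node (assumed representation)."""

    def __init__(self, labels: Dict[str, str], links: List[Tuple[str, str]]):
        self.labels = dict(labels)                                  # node -> label
        self.adj: Dict[str, Set[str]] = {n: set() for n in self.labels}
        for a, b in links:
            self.adj[a].add(b)
            self.adj[b].add(a)

    def distances(self, src: str) -> Dict[str, int]:
        """Breadth-first search: number of links from src to each reachable node."""
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in self.adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist


def F_of(F: List[Candidate], x: str) -> Set[str]:
    """F(x) = {f(x) | f in F}."""
    return {f[x] for f in F}


def f_of_Q(f: Candidate) -> Set[str]:
    """f(Q) = {f(x) | x is a node of Q}, i.e. the range of candidate f."""
    return set(f.values())


def neighbour_labels(D: LabeledGraph, F: List[Candidate], x: str) -> Set[str]:
    """Labels of nodes y linked in D to some z in F(x) (nodes addable next to x)."""
    return {D.labels[y] for z in F_of(F, x) for y in D.adj[z]}


def insertion_labels(D: LabeledGraph, F: List[Candidate], x: str, y: str) -> Set[str]:
    """Labels of nodes z outside f(Q) on a shortest f(x)-f(y) path, for some f in F."""
    found = set()
    for f in F:
        from_x, from_y = D.distances(f[x]), D.distances(f[y])
        if f[y] not in from_x:               # f(x) and f(y) are not connected in D
            continue
        length = from_x[f[y]]
        for z, label in D.labels.items():
            # z lies on some shortest path iff its two distances add up to the length.
            on_path = z in from_x and z in from_y and from_x[z] + from_y[z] == length
            if on_path and z not in f_of_Q(f):
                found.add(label)
    return found


# Hypothetical database; q1 and q2 stand for the query nodes "function" and "analysis".
D = LabeledGraph(
    {"d1": "program", "d2": "use", "d3": "analyze", "d4": "language", "d5": "subroutine"},
    [("d1", "d2"), ("d2", "d3"), ("d3", "d4"), ("d5", "d3")],
)
F = [{"q1": "d1", "q2": "d3"}, {"q1": "d5", "q2": "d3"}]
print(sorted(F_of(F, "q1")))                       # ['d1', 'd5']
print(sorted(f_of_Q(F[0])))                        # ['d1', 'd3']
print(sorted(neighbour_labels(D, F, "q2")))        # ['language', 'subroutine', 'use']
print(sorted(insertion_labels(D, F, "q1", "q2")))  # ['use']
```

In this toy data, "use" is offered for insertion between "function" and "analysis" because node d2 sits on the only shortest path from d1 to d3 and is not itself the image of a query node.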
Fig 3 is a diagram illustrating the ranges f(Q) of solution candidates f for the retrieval query Q, and the set F of solution candidates; f'(Q), f''(Q) and f'''(Q) are the ranges of solution candidates f', f'' and f'''.
* The ranges f(Q) correspond to "analyze ... the language ... with a program", "a function representing ... an intended investment", "wish to automatically rearrange ... the contents", "presuming ... implicit will", "program ... the meaning of data ... that cannot be comprehended", and "stands for ... a method used for the analysis", respectively.
* F is the set of these f. Expressed as the set of their ranges f(Q), it stands for ("analyze ... the language ... with a program", "a function representing ... an intended investment", "wish to automatically rearrange ... the contents", "presuming ... implicit will", "program ... the meaning of data ... that cannot be comprehended", "stands for ... a method used for the analysis").

In the embodiment described below, the thesaurus T is, for example, a partial function from a pair of node labels L and M in the graph to a numerical value T(L, M) representing the degree of similarity between the two labels, and is used to calculate scores. When the set F of solution candidates is found, a subset R of the thesaurus T is used instead of the whole thesaurus T. For example, the thesaurus T includes a usable portion R, determined in advance by the user through the input unit or the storage unit, and another portion that cannot be used. The set F of solution candidates is then found using only the usable portion R rather than the whole thesaurus T.
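One way to hold the thesaurus T as a partial function, together with its usable portion R, is a dictionary of defined pairs plus a set of pairs whose use is permitted. The sketch below assumes that representation. The class `Thesaurus`, its methods (including `R`, named after the specification's notation) and the similarity values 0.6, 0.7 and 0.4 are hypothetical. The methods mirror the operations on the thesaurus-expansion list described in the summary: selecting or deselecting M, changing T(L, M), and adding an arbitrary label N with T(L, N) = 1. Scoring identical labels as 1.0 in `label_score` is also an assumption, since the specification leaves the score open.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]   # (L, M): label of a query node, label of a database node


class Thesaurus:
    """Thesaurus T as a partial function (L, M) -> similarity, plus the usable subset R."""

    def __init__(self, values: Dict[Pair, float]):
        self.T: Dict[Pair, float] = dict(values)   # pairs missing here are undefined in T
        self.usable: Set[Pair] = set()              # pairs whose T value R may use

    def R(self, L: str, M: str) -> Optional[float]:
        """R(L, M): defined only if T(L, M) is defined and its use is permitted."""
        pair = (L, M)
        return self.T[pair] if pair in self.usable and pair in self.T else None

    def expansion_list(self, L: str) -> List[Tuple[str, float, bool]]:
        """Every M with T(L, M) defined, most similar first, flagged True if usable."""
        rows = [(m, s, (l, m) in self.usable) for (l, m), s in self.T.items() if l == L]
        return sorted(rows, key=lambda row: -row[1])

    def select(self, L: str, M: str) -> None:
        """Expand R so that R(L, M) = T(L, M)."""
        if (L, M) in self.T:
            self.usable.add((L, M))

    def deselect(self, L: str, M: str) -> None:
        """Reduce R so that R(L, M) is undefined."""
        self.usable.discard((L, M))

    def set_value(self, L: str, M: str, value: float) -> None:
        """Change T(L, M) to a user-specified value."""
        self.T[(L, M)] = value

    def add_label(self, L: str, N: str) -> None:
        """Arbitrary user-specified label N: set T(L, N) = 1 and permit its use."""
        self.T[(L, N)] = 1.0
        self.usable.add((L, N))


def label_score(th: Thesaurus, L: str, M: str) -> Optional[float]:
    """Assumed node score: 1.0 for identical labels, else R(L, M); None means no match."""
    return 1.0 if L == M else th.R(L, M)


# Hypothetical similarity values for the query label "function".
th = Thesaurus({("function", "program"): 0.6,
                ("function", "subroutine"): 0.7,
                ("function", "relation"): 0.4})
th.select("function", "subroutine")
th.select("function", "program")
th.deselect("function", "program")
th.add_label("function", "procedure")
print(th.expansion_list("function"))
# [('procedure', 1.0, True), ('subroutine', 0.7, True), ('program', 0.6, False), ('relation', 0.4, False)]
print(label_score(th, "function", "program"))     # None
print(label_score(th, "function", "subroutine"))  # 0.7
```

Keeping T intact and tracking R as a separate permission set means that deselecting a label never loses its similarity value, so the user can re-select it in a later round.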
Several methods are known for finding the set F of solution candidates from the definition of the score, the representation of the graphs, the database D, the thesaurus T or a subset R of T, and the retrieval query Q (see "execution of retrieval" in the flowchart of Fig 5, described later, and the description of step S2). Any of them can be suitably employed, and they are not described here in detail. For example, a score representing the similarity between the labels "function" and "analysis" is given as the numerical value T(function, analysis) by the thesaurus T stored in the thesaurus storage unit 5.

Fig 4 is a diagram illustrating the constitution of a retrieval apparatus. The retrieval system includes a display unit 1, an input unit 2, a processing unit (CPU) 3, a main storage unit 4, a thesaurus storage unit 5, a database (object to be retrieved) 6 and a bus 7. The processing unit 3 is connected through the bus 7 to the input unit 2, display unit 1, main storage unit 4, thesaurus storage unit 5 and database (search space) 6, and receives and outputs various kinds of information. The display unit 1 is a display device that shows on a screen, for example, retrieval input, retrieval output and interim retrieval results. The input unit 2 is a means for receiving the various kinds of data needed for retrieval, such as the retrieval query, instructions and retrieval conditions; a suitable device such as a keyboard, a mouse or another pointing device is used. The input unit 2 may further be provided with an output unit for sending data to other units, storage media and the like. The main storage unit 4 stores various data such as the retrieval program, initial settings and parameters, as well as data related to the retrieval, such as retrieval conditions, final retrieval results and interim results.
The thesaurus storage unit 5 stores the thesaurus T, which includes data representing the relationships among nodes that are needed for retrieval: degree of relation or non-relation, degree of similarity or difference, probability, certainty and the like. The database 6 stores the data to be searched (database D), i.e., the nodes, labels, links and the like.

Fig 5 is a flowchart illustrating the retrieval processing. The retrieval is conducted according to the following procedure. As the initial input, the database D is stored in advance in the database storage unit 6, and the thesaurus T, or its subset R, is stored in advance in the thesaurus storage unit 5.

At step S1, the CPU 3 initializes the set G of deleted solution candidates to the empty set, and receives from the user the input of a retrieval query Q containing information related to the nodes, labels of nodes and links among the nodes. The CPU 3 stores the data related to the retrieval query Q in a suitable area of the main storage unit 4, and reads them from there as required.

At step S2, when the user clicks the "RETRIEVAL EXECUTION BUTTON" displayed on the display unit 1, the CPU 3 initiates the retrieval (or re-retrieval) requested by the user. The CPU 3 refers to the thesaurus storage unit 5 and the database storage unit 6 according to the input retrieval query Q. It finds a set F of solution candidates by searching the database D according to the retrieval query Q, using the degree of similarity among labels defined in the usable portion R of the thesaurus T. This method is known, as mentioned above, and is not described here.
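The specification deliberately leaves the search method of step S2 open. For very small graphs, one naive stand-in is exhaustive enumeration. Every injective mapping of the query nodes into the database is tried, and only mappings whose node labels are identical or similar under R are kept. Any mapping that equals or includes a candidate in G is skipped; because candidates are sets of ordered pairs, this is a subset test. The remaining mappings are then ranked. The score used here is an assumption chosen for illustration, not the invention's score: label similarity plus 1/d for each query link whose endpoints map to database nodes d links apart. All names and data are hypothetical.

```python
from collections import deque
from itertools import permutations
from typing import Dict, List, Set, Tuple

Candidate = Dict[str, str]   # assumed: solution candidate f as {query node: database node}
Pair = Tuple[str, str]       # (query label L, database label M)


def hops(adj: Dict[str, Set[str]], src: str) -> Dict[str, int]:
    """Breadth-first search distances (number of links) from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def search(q_labels: Dict[str, str], q_links: List[Tuple[str, str]],
           d_labels: Dict[str, str], d_adj: Dict[str, Set[str]],
           R: Dict[Pair, float], G: List[Candidate], k: int = 5) -> List[Candidate]:
    """Exhaustive stand-in for the retrieval step: best k injective mappings Q -> D."""
    q_nodes = sorted(q_labels)
    dist = {n: hops(d_adj, n) for n in d_labels}     # all-pairs hop counts in D
    scored = []
    for image in permutations(sorted(d_labels), len(q_nodes)):
        f = dict(zip(q_nodes, image))
        pairs = set(f.items())
        if any(set(g.items()) <= pairs for g in G):
            continue                                   # f is, or includes, a deleted candidate
        sims = [1.0 if q_labels[x] == d_labels[f[x]]
                else R.get((q_labels[x], d_labels[f[x]])) for x in q_nodes]
        if any(s is None for s in sims):
            continue                                   # a label pair is not similar under R
        score = sum(sims)
        for x, y in q_links:                           # assumed bonus: nearby images score higher
            d = dist[f[x]].get(f[y])
            if d is not None:
                score += 1.0 / d
        scored.append((score, f))
    scored.sort(key=lambda item: -item[0])             # stable: ties keep enumeration order
    return [f for _, f in scored[:k]]


# Hypothetical two-node query "function - analysis" and a four-node database.
q_labels = {"q1": "function", "q2": "analysis"}
q_links = [("q1", "q2")]
d_labels = {"d1": "program", "d2": "use", "d3": "analysis", "d4": "subroutine"}
d_adj = {"d1": {"d2"}, "d2": {"d1", "d3"}, "d3": {"d2", "d4"}, "d4": {"d3"}}
R = {("function", "program"): 0.6, ("function", "subroutine"): 0.7}

print(search(q_labels, q_links, d_labels, d_adj, R, G=[]))
# [{'q1': 'd4', 'q2': 'd3'}, {'q1': 'd1', 'q2': 'd3'}]
print(search(q_labels, q_links, d_labels, d_adj, R, G=[{"q1": "d4", "q2": "d3"}]))
# [{'q1': 'd1', 'q2': 'd3'}]
```

Enumeration grows factorially with graph size, so a practical system would substitute one of the known approximate graph-matching methods. The sketch only makes the roles of R and G concrete.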
In the search at step S2, the set F of solution candidates excludes both the candidates that are elements of the set G of deleted solution candidates and any candidates that include an element of G. A solution candidate is a function, i.e., a set of ordered pairs, so an inclusion relation holds among solution candidates.

At step S3, the CPU 3 presents the following information (1) to (5) to the user through the display unit 1 as clues for the interaction. The lists (2), (4) and (5) are displayed, for example, in decreasing order of the maximum score of the solution candidates that include nodes having the labels in the list. Based on this information, the user can check whether the solution candidates in the set F are solutions. The user can also change, item by item, the set F of solution candidates, the set G of deleted solution candidates, the thesaurus T or its subset R, and the retrieval query Q. The CPU 3 displays information related to each list element on the display unit 1. It deletes, adds or changes list elements according to information input by the user through the input unit 2, and stores the data in the main storage unit 4. As appropriate, it reads the related thesaurus data and search-space data from the thesaurus storage unit 5 and the database 6.

Fig 6 is a diagram illustrating a display screen. It shows the display at step S3 for the retrieval of natural-language sentences, together with an interface supporting the interaction. (1) to (5) in the drawing correspond to (1) to (5) described below.

(1) Set F of solution candidates. Displayed here is a list of solution candidates having high scores.
In the drawing, bold characters represent words corresponding to the thesaurus expansion of words in the retrieval query. The user can operate on the display as described below.
* Check whether some elements in the set F of solution candidates are solutions. This can be done, for example, using only the data displayed in the list. When this cannot be decided from those data alone, the user clicks an individual solution candidate to display a wider surrounding range.
* Delete some elements from the set F of solution candidates and register them as elements of the set G of deleted solution candidates. In Fig 6, solution candidates included in F are represented by black circles, and excluding a candidate from F is represented by an open circle.

(2) Retrieval query Q. The retrieval query is displayed here. The user can add or delete nodes and insert or delete links as described below.
* Where there is no link connecting two nodes in the retrieval query Q, insert such a link.
* Delete a link from the retrieval query Q.
* Add a new node to the retrieval query Q.
* Delete a node from the retrieval query Q.

(3) Displayed here are the high-scoring results obtained by expanding the labels of nodes in the retrieval query Q (such as "functions" in Fig 6) using the thesaurus. More specifically, for each node x of the retrieval query Q with label L, this is a list of labels (elements) M of nodes of the database D for which T(L, M) is defined in the thesaurus T. The user can specify whether each element is included in the retrieval range (represented by black circles in Fig 6) or not (represented by open circles).
* For some elements M of the list for which R(L, M) is not defined in the subset R of the thesaurus T, R is expanded so that R(L, M) = T(L, M).
Equivalently, for these elements M, the definition of T(L, M) in the thesaurus T is permitted to be used.
* For some elements M of the list for which R(L, M) is defined, the definition of R is reduced so that R(L, M) is not defined. Equivalently, for these elements M, use of the definition of T(L, M) is inhibited.

(4) Displayed here are labels of nodes that can be added to the retrieval query by being connected directly to nodes of the retrieval query Q (such as "functions" in Fig 6). In more detail, for each node x of the retrieval query Q, this is the list of labels L for which there exist a node z ∈ F(x) and a node y such that the link y-z is included in the database D and the label of y is L. When a label L has only a small number of such nodes y (nodes y labeled L for which some z ∈ F(x) exists with the link y-z in D), the labels of some nodes around each such y may be added to L as an element of the list. Using the list elements, the user can specify whether the retrieval query Q is expanded (black circles) or not (open circles), as described below.
* For some element M of the list, a node Y with M as its label and a link x-Y are added to the retrieval query Q; that is, the retrieval query Q is expanded by M. M can also be input directly instead of being selected from the list.

(5) Displayed here are labels of nodes to be inserted between two nodes of the retrieval query Q (such as between "function" and "analysis" in Fig 6). In more detail, for each link x-y of the retrieval query Q, this is a list of labels of nodes z that are included in the shortest paths connecting the node f(x) to the node f(y) for some solution candidate f and are not contained in the range f(Q) of that candidate.
The user can specify whether each element of the list is inserted into the retrieval query Q (represented by black circles in Fig 6) or not (represented by open circles), as described below.
* A node z having the specified element of the list as its label, a link x-z and a link z-y are added to the retrieval query Q; that is, the element is inserted into the retrieval query Q.

At step S4, the processing returns to step S2 if the user requests re-retrieval by clicking the "RETRIEVAL EXECUTION BUTTON"; otherwise, the processing ends.

The information retrieval method and the information retrieval apparatus / system of the invention can be provided in several forms. These are an information retrieval program that causes a computer to execute the procedures, and a computer-readable recording medium on which the information retrieval program is recorded. They also include a program product that includes the information retrieval program and can be loaded into the internal memory of a computer, and a computer, such as a server, that includes the program.

Industrial Applicability

As described above, the invention makes it possible to improve the efficiency and accuracy of information retrieval by conducting an effective interaction that gives appropriate information to the user. The invention also allows information retrieval with high efficiency and high pinpoint accuracy. The user interactively inputs or revises the retrieval query and the retrieval range while the retrieval space is automatically narrowed down by utilizing the database and the graph structure specific to the retrieval query. The invention further makes it possible to treat the retrieval query and the database as graphs having an indefinite structure, like natural-language sentences, and to improve the efficiency and accuracy of retrieval by enabling the user to interact with the retrieval engine using the structure as a clue.

We Claim:

1.
An apparatus for information retrieval comprising: an input unit, a database D storing the nodes and labels, a thesaurus storage unit for storing thesaurus T for defining a degree of similarity among the labels of nodes, a display unit, and a processing unit, wherein the processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes; the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using similarity among the labels defined by a subset R of thesaurus T according to links that are input, and making reference to the database D storing the nodes and labels that are input; the processing unit displays the set F of solution candidates that are found on the display unit; the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions; the processing unit deletes some elements in the set F of solution candidates from the set F of solution candidates according to the input information; the processing unit deletes, adds or changes the contents of the subset R and/or the retrieval query Q based on input information related to deleting, adding or changing the subset R of the thesaurus T and/or the retrieval query Q input through the input unit; and the processing unit returns to the step of retrieval to repeat the processing if there is a request for re-retrieval from the user, or ends the processing if there is no request.

2.
An apparatus for information retrieval according to claim 1, wherein the processing unit displays a retrieval query Q on the display unit; the processing unit receives, through the input unit, input information which, when there is no link connecting two nodes of the retrieval query Q, instructs to insert a link; the processing unit inserts the link according to input information; the processing unit receives, through the input unit, input information for instructing the deletion of the link in the retrieval query Q; the processing unit deletes the link according to input information; the processing unit receives, through the input unit, input information for instructing the addition of a new node to the retrieval query Q; the processing unit adds the node to the retrieval query Q according to input information; the processing unit receives, through the input unit, input information for instructing the deletion of the node of the retrieval query Q without an end point of the link; and the processing unit deletes the node from the retrieval query Q according to input information. 3. An apparatus for information retrieval according to claim 1 or 2, wherein the processing unit displays, on a display unit, a list of labels M of nodes of a database D, such that the value T(L, M) representing the degree of similarity between a label L of a node x and a label M of a node in the database D is defined in the thesaurus T in the thesaurus storage unit, for every node x of the retrieval query Q; the processing unit receives, through an input unit, input information for so instructing that each such label M be selected or not selected; and the processing unit, according to the input information, expands the definition of the subset R to include R(L, M) = T(L, M) concerning those M for which R(L, M) has not been defined in the subset R of the thesaurus T, or reduces the definition of R so as not to define R(L, M) concerning those M for which R(L, M) has been defined. 4. 
An apparatus for information retrieval according to any one of claims 1 to 3, wherein the processing unit displays, on a display unit, a list of labels of nodes y, such that there exists a node z ∈ F(x) for which y-z is a link in the database D, for each of the nodes x in the retrieval query Q; the processing unit receives, through the input unit, input information for so instructing that some of such labels be selected; and the processing unit adds a node y with L as a label and a link x-y to the retrieval query Q for each of the selected labels L according to the input information. 5. An apparatus for information retrieval according to any one of claims 1 to 4, wherein the processing unit displays, on a display unit, a list of labels of nodes z which are included in the shortest paths connecting a node f(x) to a node f(y) in the range of a solution candidate f and not contained in the range f(Q) of some solution candidate f, for each of the links x-y of the retrieval query Q; the processing unit receives, through an input unit, input information for so instructing that some of such labels be selected; and the processing unit adds a node z with the selected element of the list as a label, and links x-z and z-y to the retrieval query Q according to the input information. 6. 
An apparatus for information retrieval comprising: an input unit, a database D storing the nodes and labels, a thesaurus storage unit for storing thesaurus T for defining a degree of similarity among the labels of nodes, a display unit, and a processing unit, wherein the processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes; the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using the degree of similarity among the labels defined in a portion of thesaurus T determined to be usable according to the retrieval query Q that is input, and making reference to the database D; the processing unit displays the set F of solution candidates that are found on the display unit; the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent solutions; the processing unit deletes some elements from the set F of solution candidates according to input information; the processing unit deletes, adds or changes the content of the thesaurus T and/or the retrieval query Q based on input information related to deleting, adding or changing the thesaurus T and/or the retrieval query Q input through the input unit; and the processing unit returns to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ends the processing if there is no request. 7. 
An apparatus for information retrieval according to claim 6, wherein the processing unit displays a retrieval query Q on a display unit; the processing unit receives, through an input unit, input information which, when there is no link connecting two nodes of the retrieval query Q, instructs to insert a link; the processing unit inserts the link according to input information; the processing unit receives, through the input unit, input information for instructing the deletion of the link of the retrieval query Q; the processing unit deletes the link according to input information; the processing unit receives, through the input unit, input information for instructing the addition of a new node to the retrieval query Q; the processing unit adds the node to the retrieval query Q according to input information; the processing unit receives, through the input unit, input information for instructing the deletion of the node of the retrieval query Q; and the processing unit deletes the node from the retrieval query Q according to input information. 8. 
An apparatus for information retrieval according to claim 6 or 7, wherein the processing unit displays, on a display unit, a list of labels M of nodes of the database D, such that the value T(L, M) representing the degree of similarity between a label L of a node x and a label M of a node in the database D is defined in the thesaurus T in the thesaurus storage unit, for every node x in the retrieval query Q; the processing unit receives, through the input unit, input information instructing that some such labels M be selected or not selected, input information instructing a change in the value T(L, M) for several such labels M, and input information for specifying several such arbitrary new labels N; and the processing unit permits the definition of T(L, M) to be used in the thesaurus T for each selected label M, inhibits the definition of T(L, M) from being used for each unselected label M, changes the value of T(L, M) into a specified value for each specified label M, or sets the value of T(L, N) to 1 while permitting the definition of T(L, N) to be used for each specified label N. 9. An apparatus for information retrieval according to any one of claims 6 to 8, wherein: the processing unit displays the following list on the display unit, for each node x in the retrieval query Q: { L | for a node y and a node z ∈ F(x), L is the label of y and the link y-z is contained in the database D }; the processing unit receives, through an input unit, input information for so instructing that some labels be selected; and the processing unit adds a node Y with L as a label and a link x-Y to the retrieval query Q for each of the selected labels L according to input information. 10. 
An apparatus for information retrieval according to any one of claims 6 to 9, wherein, for each label M in the above list, when the size of the following set is smaller than a predetermined value, the processing unit can display, on the display unit, labels of some nodes around y in addition to the label M as elements of the above list, for every element y of the following set: { y | the label of y is M, and the link y-z is contained in the database D for a node z ∈ F(x) }. 11. An apparatus for information retrieval according to any one of claims 6 to 10, wherein the processing unit displays, on a display unit, a list of labels of nodes z which are included in the shortest paths connecting a node f(x) to a node f(y) in the range of a solution candidate f and not contained in the range f(Q) of some solution candidate f, for each of the links x-y of the retrieval query Q; the processing unit receives, through the input unit, input information for so instructing that some of such labels be selected; and the processing unit adds a node z with the selected element of the list as a label, and links x-z and z-y to the retrieval query Q according to the input information. 12. 
A method for retrieval of information recorded on a computer-readable recording medium comprising: a step in which a processing unit receives, through an input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes; a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to a thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using similarity among the labels defined by a subset R of thesaurus T, and makes reference to the database D storing the nodes and labels that are input; a step in which the processing unit displays a set F of solution candidates that are found on a display unit; a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent solutions; a step in which the processing unit deletes some elements from the set F of solution candidates according to input information; a step in which the processing unit deletes, adds or changes the content of the subset R and/or the retrieval query Q based on input information related to deleting, adding or changing the subset R of the thesaurus T and/or the retrieval query Q input through the input unit; and a step of returning to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ending the processing if there is no request. 13. 
A method for retrieval of information recorded on a computer-readable recording medium comprising: a step in which a processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes; a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using the degree of similarity among the labels defined in a portion of thesaurus T determined to be usable, and making reference to a database D storing nodes and labels that are input; a step in which the processing unit displays a set F of solution candidates that are found on a display unit; a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions; a step in which the processing unit deletes some elements from the set F of solution candidates according to input information; a step in which the processing unit deletes, adds or changes the content of the thesaurus T and/or the retrieval query Q based on input information related to deleting, adding or changing the thesaurus T and/or the retrieval query Q input through the input unit; and a step of returning to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ending the processing if there is no such request.
Dated this 10th day of May 2004
(Jose M A)
of Khaitan & Co
Agent for the Applicants


Form-2
THE PATENTS-ACT 1970 (39 of 1970)
As amended by the Patents (Amendment) Act, 2002
COMPLETE SPECIFICATION
(See Section 10; Rule 13)
TITLE
Apparatus and a method for information retrieval
JAPAN SCIENCE AND TECHNOLOGY AGENCY,
4-1-8 Hon-cho, Kawaguchi-shi Saitama 332-0012,
Japan, a Japanese Corporation
INVENTOR Under Section 28(2)
HASIDA Koiti, 5-3-311, Hinobe Urayasu-shi, Chiba 279-0013, Japan
The following specification particularly describes the nature of this invention and the manner in which it is to be performed:

This invention relates to an apparatus and a method for information retrieval.
Technical Field
This invention relates to an information retrieval method, an information retrieval program and to a computer-readable recording medium on which the information retrieval program is recorded. More specifically, the invention relates to an interactive information retrieval method related to labeled graphs, an information retrieval program and to a computer-readable recording medium on which the information retrieval program is recorded.
Background of the Invention
In traditional information retrieval, a query consists of keywords and ID numbers combined with logical connectives such as AND and OR, and character-string matching and statistical methods have been the basic technologies. For interaction with the user, keywords, words and phrases characterizing several subsets of the set of solution candidates are found by statistical methods and presented as hints, letting the user select some of them to augment the query. Related art has been disclosed in the following documents:

Yoshihiko Hayashi, Yoshitsugu Obashi, "Technical Trend of Retrieval Service on WWW", Information Processing, Vol. 39, No. 9, 1998, and
Sumio Fujita, "Approach to Retrieving / Classifying Information by utilizing Natural Language Processing", Information Processing, Vol. 40, No. 4, 1999.
Summary of the Invention
Difficulty in information retrieval usually stems from the difficulty of bridging the gap in expression between the retrieval query and the solution (that is, the difficulty of predicting how a solution is expressed from the retrieval query). Suppose a candidate "President Tanaka was hit by a car in the U.S.A." is detected for a retrieval request "a Japanese businessman involved in an accident while he is on business trip overseas". In this case, complex inference is necessary, but automating such inference is technically impossible for the time being.

Therefore, there is no other way to conduct such inference than to rely upon interaction between the human user and the machine. To realize this interaction, the machine must give the user a hint about what to do at each stage of the interaction. The conventional method described above, which gives hints based on statistical methods, can deal with the general nature of a set of candidates but cannot deal with the structure specific to a particular retrieval query.
Further, to give the user an effective hint for the interaction, the structure specific to the retrieval query must be reflected in the retrieval. For example, the retrieval query "a Japanese businessman involved in an accident while he is on business trip overseas" has a semantic structure containing relations between "a Japanese" and "businessman", "businessman" and "on business trip", "overseas" and "on business trip", and "on business trip" and "accident". However, such structure has hardly been employed in conventional information retrieval, and in particular it has never been used systematically as a clue to the interaction.

An objective of this invention is to improve the efficiency and accuracy of retrieval by supporting effective interaction in which appropriate information is given to the user during information retrieval.
Another objective of this invention is to conduct information retrieval with high efficiency and high pin-point accuracy by utilizing the semantic structure specific to the retrieval query, and by interactively revising the retrieval query and the retrieval space while automatically narrowing down the retrieval space.
A further objective of this invention is to treat the retrieval query and the database to be searched as graphs without formal structure, like sentences of a natural language, and to improve the efficiency and accuracy of retrieval by enabling the user to interact suitably with the retrieval engine using the structure as a clue.
According to the first means for solution of the invention, there are provided an information retrieval method, an information retrieval program and a computer-readable recording medium on which the information retrieval program is recorded, including:

a step in which the processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes;
a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using similarity among the labels defined by a subset R of thesaurus T, and making reference to a database D storing the nodes and labels that are input;
a step in which the processing unit displays a set F of solution candidates that are found on a display unit;
a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions;

a step in which the processing unit deletes some elements from the set F of solution candidates according to input information;
a step in which the processing unit deletes, adds or changes the content of the subset R and/or the retrieval query Q based on input information related to deleting, adding or changing the subset R of the thesaurus T and/or the retrieval query Q input through the input unit; and
a step of returning to the step of retrieval if there is a request for re-retrieval from the user, or ending the processing if there is no such request.
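The steps of the first means form a retrieve, review and revise loop. The Python sketch below shows only that control flow. The callables `search` and `review`, and the tuple representation of a solution candidate, are hypothetical stand-ins for the matching method and the user interface, which the specification leaves open.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical representations (not defined by the specification):
# a candidate is a tuple of (query node, database node) pairs, and the
# subset R maps (label L, label M) pairs to similarity values.
Candidate = Tuple[Tuple[str, str], ...]
SubsetR = Dict[Tuple[str, str], float]
Search = Callable[[object, SubsetR], List[Candidate]]
Review = Callable[[List[Candidate]], Tuple[Set[Candidate], object, SubsetR, bool]]


def interactive_retrieval(query: object, subset_r: SubsetR,
                          search: Search, review: Review) -> List[Candidate]:
    """Retrieve F, let the user review it, revise Q and R, and repeat
    until the user stops requesting re-retrieval."""
    while True:
        # Retrieval step: find the set F using only the subset R of T.
        f_set = search(query, subset_r)
        # Display and feedback step: the user marks candidates that are not
        # solutions and may return a revised query Q and subset R.
        rejected, query, subset_r, retrieve_again = review(f_set)
        # Delete the rejected elements from F.
        f_set = [f for f in f_set if f not in rejected]
        if not retrieve_again:
            return f_set
```

Passing the search and the user feedback in as functions keeps the loop independent of any particular matching algorithm or screen layout.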
According to the second means for solution of the invention, there are provided an information retrieval method, an information retrieval program and a computer-readable recording medium on which the information retrieval program is recorded, including:

a step in which a processing unit receives, through an input unit, input for a retrieval query Q containing information related to nodes, labels of nodes and links among the nodes;
a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using the degree of similarity among the labels defined in a portion of the thesaurus T determined to be usable according to the links that are input, and making reference to a database D storing the nodes and labels that are input;
a step in which the processing unit displays a set F of solution candidates that are found on the display unit;
a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions;

a step in which the processing unit deletes some elements from the set F of solution candidates according to input information;
a step in which the processing unit deletes, adds or changes the content of the thesaurus T and/or the retrieval query Q based on input information related to deleting, adding or changing the thesaurus T and/or the retrieval query Q input through the input unit; and
a step of returning to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ending the processing if there is no request.
In this invention, the processing unit can further execute the steps of:
displaying the retrieval query Q on a display unit;
receiving, through an input unit, input information which, when there is no link connecting two nodes of the retrieval query Q, instructs to insert a link;

inserting the link according to input information;
receiving, through the input unit, input information for instructing the deletion of a link in the retrieval query Q;
deleting the link according to the input information;
receiving, through the input unit, input information for instructing the addition of a new node to the retrieval query Q;
adding the node to the retrieval query Q according to the input information;
receiving, through the input unit, input information for instructing the deletion of a node in the retrieval query Q; and
deleting the node from the retrieval query Q according to the input information.
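The query-editing steps above amount to four operations on a labeled graph. The following is a minimal Python sketch, assuming node identifiers are strings and links are undirected; the class and method names are illustrative only. When a node is deleted, this sketch also removes the links that end at it, which is one reasonable reading; a stricter variant could refuse to delete a node that is still the end point of a link.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class QueryGraph:
    """Retrieval query Q as labeled nodes plus undirected links (illustrative)."""
    labels: Dict[str, str] = field(default_factory=dict)      # node id -> label
    links: Set[FrozenSet[str]] = field(default_factory=set)   # each link is {x, y}

    def insert_link(self, x: str, y: str) -> None:
        # Insert a link between two existing, distinct nodes; adding a link
        # that already exists has no effect because links form a set.
        if x != y and x in self.labels and y in self.labels:
            self.links.add(frozenset((x, y)))

    def delete_link(self, x: str, y: str) -> None:
        self.links.discard(frozenset((x, y)))

    def add_node(self, node: str, label: str) -> None:
        self.labels[node] = label

    def delete_node(self, node: str) -> None:
        # Remove the node and any links that end at it.
        self.labels.pop(node, None)
        self.links = {link for link in self.links if node not in link}
```

For example, the query of Fig 1 can be built by adding nodes labeled "function", "analysis", "meaning" and "automatic" and inserting the three links shown there.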

In this invention, the processing unit can execute the steps of: displaying, on the display unit, a list of labels M of nodes in the database D such that the value T(L, M), representing the degree of similarity between M and the label L of a node in the retrieval query Q, is defined by the thesaurus T in the thesaurus storage unit;
receiving, through the input unit, input information instructing that some labels M be selected or not selected, input information instructing a change in the value T(L, M) for some labels M, and input information specifying some arbitrary new labels N; and
permitting the definition of T(L, M) to be used in the thesaurus T for each selected label M, inhibiting the definition of T(L, M) from being used for each unselected label M, changing the value of T(L, M) into a specified value for each specified label M, or setting the value of T(L, N) to 1 while permitting the definition of T(L, N) to be used for each specified label N.
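One way to picture these thesaurus operations is to keep the definitions T(L, M) separate from a record of which pairs may currently be used. The Python sketch below assumes that representation; the class name, the method names and the choice to start with no usable pairs are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]


class Thesaurus:
    """Partial function T(L, M) plus the set of pairs whose definition may be used."""

    def __init__(self, values: Dict[Pair, float]) -> None:
        self.values: Dict[Pair, float] = dict(values)  # definitions of T(L, M)
        self.usable: Set[Pair] = set()                 # pairs enabled for retrieval

    def lookup(self, l: str, m: str) -> Optional[float]:
        # Return T(L, M) only if it is defined and currently usable.
        return self.values.get((l, m)) if (l, m) in self.usable else None

    def labels_to_display(self, l: str, db_labels: Set[str]) -> List[str]:
        # Labels M occurring in D for which T(L, M) is defined.
        return sorted(m for (a, m) in self.values if a == l and m in db_labels)

    def apply_feedback(self, l: str, selected: Iterable[str], unselected: Iterable[str],
                       changed: Dict[str, float], new_labels: Iterable[str]) -> None:
        for m in selected:                 # permit use of T(L, M)
            if (l, m) in self.values:
                self.usable.add((l, m))
        for m in unselected:               # inhibit use of T(L, M)
            self.usable.discard((l, m))
        for m, value in changed.items():   # change the value of T(L, M)
            self.values[(l, m)] = value
        for n in new_labels:               # set T(L, N) = 1 and permit its use
            self.values[(l, n)] = 1.0
            self.usable.add((l, n))
```

Keeping "defined" and "usable" apart means that inhibiting a pair never loses its value, so the user can re-enable it later.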

In this invention, the processing unit can execute the steps of: displaying the following list on the display unit, for each node x in the retrieval query Q:
{ L | for a node y and a node z ∈ F(x), L is the label of y and the link y-z is contained in the database D };
receiving, through an input unit, input information for instructing that some labels be selected; and
adding a node Y with L as a label and a link x-Y to the retrieval query Q for each of the selected labels L according to the input information.
In the invention, further, for each label M in the above list, when the size of the following set is smaller than a predetermined value, the processing unit can display, on the display unit, labels of some nodes around y in addition to the label M as elements of the above list, for every element y of the following set
{ y | the label of y is M, and the link y-z is contained in the database D for a node z ∈ F(x) }
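The two sets above can be computed directly from the adjacency of D. In the Python sketch below, D is assumed to be given as a label map and a symmetric adjacency map, "labels of some nodes around y" is assumed to mean the labels of y's direct neighbours outside F(x), and the threshold value and the fresh node names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def suggest_labels(db_labels: Dict[str, str], db_adj: Dict[str, Set[str]],
                   f_of_x: Set[str], threshold: int = 2) -> Dict[str, List[str]]:
    """Map each label L = label(y), for y linked to some z in F(x), to a list of
    extra labels shown with it when few nodes y carry that label."""
    # Group the nodes y that are linked to some z in F(x) by their label.
    nodes_by_label: Dict[str, Set[str]] = {}
    for z in f_of_x:
        for y in db_adj.get(z, set()):
            nodes_by_label.setdefault(db_labels[y], set()).add(y)
    suggestions: Dict[str, List[str]] = {}
    for label, ys in nodes_by_label.items():
        extras: Set[str] = set()
        if len(ys) < threshold:
            # Small set: also offer labels of nodes around each y.
            for y in ys:
                extras |= {db_labels[w] for w in db_adj.get(y, set()) if w not in f_of_x}
        suggestions[label] = sorted(extras)
    return suggestions


def add_selected(q_labels: Dict[str, str], q_links: Set[FrozenSet[str]],
                 x: str, selected: List[str]) -> None:
    """For each selected label L, add a node Y labeled L and a link x-Y to Q."""
    for i, label in enumerate(selected):
        new_node = f"{x}_new{i}"  # hypothetical naming scheme for fresh nodes
        q_labels[new_node] = label
        q_links.add(frozenset((x, new_node)))
```

The extra labels give the user context for rare neighbours, where the bare label alone may be too little to decide whether to add it.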

In this invention, the processing unit can execute the steps of: displaying, on a display unit, for each link x-y in the retrieval query Q, a list of labels of nodes z which are included in the shortest paths connecting a node f(x) to a node f(y) in the range of a solution candidate f, for a solution candidate f whose range f(Q) does not include the node z;
receiving, through the input unit, input information for so instructing that some of such labels be selected; and
adding a node z with the element of the list as a label, and links x-z and z-y to the retrieval query Q according to the input information.
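Nodes lying on some shortest path between f(x) and f(y) can be found with two breadth-first searches: z lies on a shortest path exactly when dist(f(x), z) + dist(z, f(y)) = dist(f(x), f(y)). The Python sketch below assumes D is an unweighted, undirected graph given as an adjacency map and that each candidate is a dictionary from query nodes to database nodes. It reads the exclusion condition per candidate, that is, z must lie outside the range f(Q) of the same candidate f.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def bfs_distances(adj: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Hop distances from source to every reachable node of D."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, set()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_labels(db_labels: Dict[str, str], db_adj: Dict[str, Set[str]],
                         q_links: Iterable[Tuple[str, str]],
                         candidates: List[Dict[str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """For each query link x-y, collect labels of nodes z on a shortest path
    from f(x) to f(y) in D, with z outside the range f(Q) of that candidate f."""
    result: Dict[Tuple[str, str], Set[str]] = {}
    for x, y in q_links:
        labels: Set[str] = set()
        for f in candidates:
            from_s = bfs_distances(db_adj, f[x])
            from_t = bfs_distances(db_adj, f[y])
            if f[y] not in from_s:
                continue  # f(x) and f(y) are not connected in D
            length = from_s[f[y]]
            image = set(f.values())  # the range f(Q)
            for z, d in from_s.items():
                if z not in image and z in from_t and d + from_t[z] == length:
                    labels.add(db_labels[z])
        result[(x, y)] = labels
    return result
```

Each selected label would then become a new node z with links x-z and z-y, bridging a query link that the database realizes only indirectly.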
Brief Description of the Drawings
Fig 1 is a diagram illustrating nodes, links and a retrieval query Q.
Fig 2 is a diagram illustrating thesaurus expansion of labels included in the retrieval query Q.

Fig 3 is a diagram illustrating the solution candidates and a set F of solution candidates for the retrieval query Q.
Fig 4 is a diagram illustrating the architecture of the retrieval system.
Fig 5 is a flowchart of an information retrieval processing.
Fig 6 is a diagram illustrating a display screen.
Detailed Description of Preferred Embodiments of the Invention
This embodiment considers a graph (network) with labels at the nodes as the semantic structure mentioned above. Both the retrieval query Q and the database D to be searched are presumed to be such graphs. Further, based on approximate matching or the like between the graphs, the retrieval query Q and the retrieval space can be modified interactively and effectively. When retrieving sentences, for example, the nodes are the objects referred to by words, a link is a semantic relationship between them, and a label is a word.

In this embodiment, "retrieve / retrieval" means finding a subgraph of the database D resembling the retrieval query Q. Each node of the retrieval query Q is considered to correspond to one of the nodes of the subgraph. Such a correspondence is expressed by a function mapping each node in the retrieval query Q to a node in the database D, and this function is called a solution candidate. It is further presumed that scores of the candidates (e.g., degree of similarity, degree of relationship, values related to probability) are defined. A set of several solution candidates having high scores is referred to as a set F of solution candidates, and the following are defined:
F(x) = {f(x) | f ∈ F} (x is a node in the retrieval query Q, and f(x) is the node in the database corresponding to the node x), and
f(Q) = {f(x) | x is a node in the retrieval query Q} (f ∈ F)
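With a solution candidate represented as a dictionary from query nodes to database nodes (an assumption made here for illustration), both definitions are one-line set comprehensions in Python:

```python
from typing import Dict, List, Set

Candidate = Dict[str, str]  # query node x -> database node f(x)


def f_of_x(candidates: List[Candidate], x: str) -> Set[str]:
    """F(x) = { f(x) | f in F }: every database node that x is mapped to."""
    return {f[x] for f in candidates}


def range_of(f: Candidate) -> Set[str]:
    """f(Q) = { f(x) | x is a node in Q }: the nodes covered by one candidate."""
    return set(f.values())
```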
The retrieval query Q, the set F of solution candidates and the like will now be concretely described.

Fig 1 is a diagram illustrating the nodes, links and retrieval query Q.
* The nodes x in the retrieval query Q and labels thereof are, for example, "function", "analysis", "meaning" and "automatic".
* The links in the retrieval query Q are "function - analysis", "analysis - meaning" and "analysis - automatic".
* The retrieval query Q, as shown, is constituted by these nodes, their labels and the links.
Fig 2 is a diagram illustrating nodes f(x) in the database corresponding to the nodes x in retrieval query Q in the solution candidates f, and sets F(x) of nodes in the database corresponding to x in the set F of candidates.
* When, for example, x is the node (labeled with) "function", f(x) is expressed as f(function) (f1(function), f2(function), ...) and is a node having one of "function", "program", "functor", "relation", "subroutine", "projection" and "surjection" as a label.
* When, for example, x is the node (labeled with) "function", F(x) is expressed as F(function) and stands for the set {"function", "program", "functor", "relation", "subroutine", "projection", "surjection"} of f(function) over all f ∈ F.
Fig 3 is a diagram illustrating the ranges f(Q) of solution candidates f for the retrieval query Q and the set F of solution candidates; f'(Q), f''(Q) and f'''(Q) are the ranges of solution candidates f', f'' and f'''.
* The ranges f(Q) correspond, respectively, to "analyze ... the language ... with a program", "a function representing ... an intended investment", "wish to automatically rearrange ... the contents", "presuming ... implicit will", "program ... the meaning of data ... that cannot be comprehended", and "stands for ... a method used for the analysis".
* F is the set of such f, expressed as the set of the ranges f(Q), and stands for {"analyze ... the language ... with a program", "a function representing ... an intended investment", "wish to automatically rearrange ... the contents", "presuming ... implicit will", "program ... the meaning of data ... that cannot be comprehended", "stands for ... a method used for the analysis"}.
In the embodiment described below, the thesaurus T is, for example, a partial function from a pair of a label L and a label M of nodes in the graph to a numerical value T(L, M) representing the degree of similarity between the two labels, and it is used for the calculation of scores. When finding the set F of solution candidates, a subset R of the thesaurus T is used instead of the whole thesaurus T. For example, the thesaurus T includes a usable portion R, determined in advance by the user through the input unit or the storage unit, and another portion that cannot be used. The set F of solution candidates is found not by using the whole thesaurus T but by using the usable portion R of the thesaurus T. Several methods are known for finding a set F of solution candidates from the definition of the score, the expression of the graph, the database D, the thesaurus T or a subset R of T, and the retrieval query Q (see "execution of retrieval" in the flowchart of Fig 5 described later and the description of step S2). Any of them can be suitably employed, and they are not described here in detail.
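The specification deliberately leaves the matching method open. Purely to make the role of R concrete, the Python sketch below uses a naive exhaustive matcher, which is not the method of the invention. It assumes that each query node may map only to a database node whose label is identical (similarity 1.0, an assumption) or similar under R, that each query link must map onto a link of D, that distinct query nodes map to distinct database nodes, and that a candidate's score is the sum of its label similarities. Its cost grows exponentially with the size of Q, so it only suits toy examples.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def find_candidates(q_labels: Dict[str, str], q_links: List[Tuple[str, str]],
                    db_labels: Dict[str, str], db_links: Set[FrozenSet[str]],
                    subset_r: Dict[Tuple[str, str], float],
                    top_k: int = 5) -> List[Dict[str, str]]:
    """Return up to top_k candidates f (query node -> database node), best first."""
    def sim(l: str, m: str) -> Optional[float]:
        # Identical labels are assumed to match with 1.0; otherwise use R only.
        return 1.0 if l == m else subset_r.get((l, m))

    q_nodes = sorted(q_labels)
    # For each query node, the database nodes whose labels are usable matches.
    options = [[d for d in db_labels if sim(q_labels[x], db_labels[d]) is not None]
               for x in q_nodes]
    scored: List[Tuple[float, Dict[str, str]]] = []
    for combo in product(*options):
        if len(set(combo)) < len(combo):
            continue  # keep the mapping injective
        f = dict(zip(q_nodes, combo))
        # Every query link x-y must correspond to a link f(x)-f(y) in D.
        if all(frozenset((f[x], f[y])) in db_links for x, y in q_links):
            score = sum(sim(q_labels[x], db_labels[f[x]]) for x in q_nodes)
            scored.append((score, f))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in scored[:top_k]]
```

Because only pairs present in R can contribute, enabling or inhibiting thesaurus entries directly widens or narrows the candidates this matcher can find.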
For example, a score representing the similarity between the labels "function" and "analysis" is given as a numerical value T(function, analysis) by the thesaurus T stored in the thesaurus storage unit 5.
Fig 4 is a diagram illustrating the constitution of a retrieval apparatus.
The retrieval system includes a display unit 1, an input unit 2, a processing unit (CPU) 3, a main storage unit 4, a thesaurus storage unit 5, a database (object to be retrieved) 6 and a bus 7.
The processing unit 3 is connected to the input unit 2, display unit 1, main storage unit 4, thesaurus storage unit 5 and database (search space) 6 through the bus 7, and receives and outputs various kinds of information. The display unit 1 is a display device for displaying, for example, retrieval input, retrieval output, interim results of retrieval and the like on a screen. The input unit 2 is a means for receiving various kinds of data necessary for retrieval, for example, the retrieval query, instructions and retrieval conditions, and a suitable device such as a keyboard, a mouse or another pointing device is used. The input unit 2 may further be provided with an output unit for sending data to other units, storage media and the like. The main storage unit 4 stores various data such as the retrieval program, initial settings and parameters, as well as data related to the retrieval conditions, such as the final and interim results of retrieval. The thesaurus storage unit 5 stores the thesaurus T, which includes data representing relationships among the nodes necessary for the retrieval, such as degree of relation or non-relation, degree of similarity or difference, probability, certainty and the like. The database 6 stores the data to be retrieved (database D), i.e., the nodes, labels, links and the like.
Fig 5 is a flowchart illustrating a retrieval processing. The retrieval is conducted according to the following procedure.

As the initial input, first, the database D is stored in advance in the database storage unit 6, and the thesaurus T or a subset R, that is, part of the thesaurus T is stored in advance in the thesaurus storage unit 5.
At step S1, the CPU 3 initializes the set G of deleted solution candidates to the empty set and receives from the user the input of the retrieval query Q containing information related to the nodes, labels of nodes and links among the nodes. The CPU 3 stores the data related to the retrieval query Q in a suitable storage area of the main storage unit 4 and reads them from there as required.
At step S2, when the user clicks the "RETRIEVAL EXECUTION BUTTON" displayed on the display unit 1, the CPU 3 initiates the retrieval (or re-retrieval) requested by the user. The CPU 3 makes reference to the thesaurus storage unit 5 and the database storage unit 6 according to the input retrieval query Q, and finds a set F of solution candidates by searching the database D according to the retrieval query Q, using the degree of similarity among the labels defined in the usable portion R of the thesaurus T (this method is known, as mentioned above, and is not described here). Here, the set F of solution candidates includes neither the elements of the set G of deleted solution candidates nor any solution candidate that includes an element of G (a solution candidate is a function, i.e., a set of ordered pairs, so an inclusion relation holds among solution candidates).
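The exclusion rule of step S2 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a solution candidate f is stored as a frozenset of (query node, database node) pairs, so that the inclusion relation mentioned above becomes a subset test. The names `Candidate` and `filter_deleted` are hypothetical and not part of the described apparatus.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a solution candidate f is the set of ordered
# pairs (query node, database node), i.e. the graph of a function.
Candidate = FrozenSet[Tuple[str, str]]


def filter_deleted(candidates: Iterable[Candidate],
                   deleted: Iterable[Candidate]) -> List[Candidate]:
    """Drop every candidate that equals or includes some deleted candidate g in G."""
    deleted_list = list(deleted)
    # For frozensets, g <= f is the subset test, which is also true when g == f.
    return [f for f in candidates if not any(g <= f for g in deleted_list)]


g = frozenset({("x", "d1")})                  # a previously deleted candidate
f1 = frozenset({("x", "d1"), ("y", "d2")})    # includes g, so it is excluded
f2 = frozenset({("x", "d3"), ("y", "d2")})    # does not include g, so it is kept
print(filter_deleted([f1, f2], [g]) == [f2])  # True
```

Modelling candidates as sets of pairs keeps the inclusion check a one-line subset comparison, which mirrors the remark that an inclusion relation holds among solution candidates.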
At step S3, the CPU 3 offers the following information (1) to (5) to the user through the display unit 1 as clues for the interaction (the elements of the lists in (2), (4) and (5) are displayed, for example, in decreasing order of the maximum score of the solution candidates that include nodes having those elements as labels). Based on this information, the user can check item by item whether the solution candidates in the set F are solutions, and can change the set F of solution candidates, the set G of deleted solution candidates, the thesaurus T or its subset R, and the retrieval query Q. The CPU 3 displays the information related to each list element on the display unit 1, deletes, adds or changes list elements according to the information input by the user through the input unit 2, stores the data in the main storage unit 4, and reads the related thesaurus data and search-space data from the thesaurus storage unit 5 and the database 6 as needed.
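The ordering of the lists in step S3 could be computed as in the following sketch, which takes "maximum score" to mean the highest score among the candidates whose range contains a database node carrying the listed label. The score values, the `db_label` mapping and the function name `order_by_max_score` are hypothetical; the scoring used by the retrieval itself is not specified here.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Candidate = FrozenSet[Tuple[str, str]]  # (query node, database node) pairs


def order_by_max_score(labels: Sequence[str],
                       scored: Sequence[Tuple[Candidate, float]],
                       db_label: Dict[str, str]) -> List[str]:
    """Sort labels by decreasing maximum score of a candidate whose range
    contains a database node with that label; unmatched labels go last."""
    best: Dict[str, float] = {}
    for f, score in scored:
        for _, node in f:                      # range of f: its database nodes
            label = db_label[node]
            best[label] = max(score, best.get(label, float("-inf")))
    return sorted(labels, key=lambda lab: best.get(lab, float("-inf")),
                  reverse=True)


db_label = {"d1": "function", "d2": "analysis", "d3": "method"}
scored = [(frozenset({("x", "d1"), ("y", "d2")}), 0.9),
          (frozenset({("x", "d3")}), 0.4)]
print(order_by_max_score(["method", "analysis", "other"], scored, db_label))
# ['analysis', 'method', 'other']
```

Python's `sorted` is stable even with `reverse=True`, so labels with equal maximum scores keep their original relative order.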
Fig 6 is a diagram illustrating a display screen. It shows the display at step S3 for the retrieval of natural-language sentences, together with an interface for supporting the interaction. Items (1) to (5) in the drawing correspond to (1) to (5) described below.
(1) Set F of solution candidates.
Displayed here is a list of solution candidates having high scores. In the drawing, bold characters represent words corresponding to the thesaurus expansion of words in the retrieval query. The user can carry out the following operations on the display.
* Check whether some elements in the set F of solution candidates are solutions. This can often be done using only the data displayed in the list; when it cannot, the individual solution candidates are clicked to display a wider surrounding context.
* Delete some elements from the set F of solution candidates and register them as elements of the set G of deleted solution candidates. In Fig 6, a solution candidate in F (marked with a black circle in the drawing) is excluded from F by changing its mark to an open circle.
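The deletion operation in (1) can be sketched as follows, again under the illustrative assumption that candidates are frozensets of (query node, database node) pairs. `reject` is a hypothetical helper that moves the candidates the user rejects from F into G.

```python
from typing import FrozenSet, List, Set, Tuple

Candidate = FrozenSet[Tuple[str, str]]  # (query node, database node) pairs


def reject(F: List[Candidate], G: Set[Candidate],
           rejected: Set[Candidate]) -> List[Candidate]:
    """Move the candidates the user rejected from F into G.

    Returns the reduced F; G is updated in place."""
    G.update(f for f in F if f in rejected)
    return [f for f in F if f not in rejected]


f1 = frozenset({("x", "d1")})
f2 = frozenset({("x", "d2")})
G: Set[Candidate] = set()
F = reject([f1, f2], G, {f1})
print(F == [f2], G == {f1})  # True True
```

Keeping G as a separate set is what allows a later re-retrieval to exclude the rejected candidates and every candidate that includes them.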
(2) Retrieval query Q.
Here, the retrieval query Q is displayed. The user can add or delete nodes and insert or delete links as described below.
* Where there is no link connecting two nodes in the retrieval query Q, insert such a link.
* Delete some link from the retrieval query Q.
* Add some new node to the retrieval query Q.
* Delete some node from the retrieval query Q.
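The query edits in (2) can be sketched with a small graph structure. This is an illustration under stated assumptions: links are treated as undirected, and deleting a node also removes the links that end at it (claim 2 may instead be read as permitting deletion only of nodes that are not link end points). The class `Query` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class Query:
    """Hypothetical model of a retrieval query Q: labelled nodes and links."""
    labels: Dict[str, str] = field(default_factory=dict)     # node -> label
    links: Set[FrozenSet[str]] = field(default_factory=set)  # undirected {x, y}

    def insert_link(self, x: str, y: str) -> None:
        # Only two distinct existing nodes can be linked; re-inserting is a no-op.
        if x != y and x in self.labels and y in self.labels:
            self.links.add(frozenset((x, y)))

    def delete_link(self, x: str, y: str) -> None:
        self.links.discard(frozenset((x, y)))

    def add_node(self, node: str, label: str) -> None:
        self.labels[node] = label

    def delete_node(self, node: str) -> None:
        # Assumption: links that end at the deleted node are removed as well.
        self.labels.pop(node, None)
        self.links = {link for link in self.links if node not in link}


q = Query()
q.add_node("x", "function")
q.add_node("y", "analysis")
q.insert_link("x", "y")
q.delete_node("y")
print(q.labels, q.links)  # {'x': 'function'} set()
```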
(3) Displayed here are the high-scoring results obtained by expanding the labels of the nodes in the retrieval query Q (such as "functions" in Fig 6) with the thesaurus. More specifically, for each node x of the retrieval query Q with label L, this is a list of the labels M of nodes of the database D for which T(L, M) is defined in the thesaurus T. The user can specify whether each element is included in the retrieval range (black circles in Fig 6) or not (open circles), as described below.
* For some elements M of the list for which R(L, M) is not defined, the subset R of the thesaurus T is expanded so that R(L, M) = T(L, M). Equivalently, for those elements M, use of the definition T(L, M) in the thesaurus T is permitted.
* For some elements M of the list for which R(L, M) is defined, the definition of R is reduced so that R(L, M) is no longer defined. Equivalently, for those elements M, use of the definition T(L, M) is inhibited.
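The thesaurus operations in (3) amount to adding entries to R or removing them. In this sketch T and R are assumed to be Python dictionaries that map a label pair (L, M) to a similarity value, with R holding a subset of the entries of T. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (L, M) -> similarity value; R holds a subset of T.
Thesaurus = Dict[Tuple[str, str], float]


def expansion_list(T: Thesaurus, L: str, db_labels: Set[str]) -> List[str]:
    """Labels M of database nodes for which T(L, M) is defined, as in list (3)."""
    return sorted(M for (L2, M) in T if L2 == L and M in db_labels)


def expand_R(R: Thesaurus, T: Thesaurus, L: str, M: str) -> None:
    """Black circle: define R(L, M) = T(L, M) if R(L, M) is not yet defined."""
    if (L, M) not in R and (L, M) in T:
        R[(L, M)] = T[(L, M)]


def reduce_R(R: Thesaurus, L: str, M: str) -> None:
    """Open circle: leave R(L, M) undefined, inhibiting use of T(L, M)."""
    R.pop((L, M), None)


T = {("function", "feature"): 0.8, ("function", "role"): 0.6}
R = {("function", "role"): 0.6}
print(expansion_list(T, "function", {"feature", "role"}))  # ['feature', 'role']
expand_R(R, T, "function", "feature")
reduce_R(R, "function", "role")
print(R)  # {('function', 'feature'): 0.8}
```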
(4) Displayed here are labels of nodes that can be added to the retrieval query Q by connecting them directly to nodes of the retrieval query Q (such as "functions" in Fig 6). In more detail, for each node x of the retrieval query Q, this is the list of labels L for which there exist a node z ∈ F(x) and a node y such that the link y-z is included in the database D and the label of y is L. When only a small number of nodes y correspond to a label L (that is, nodes y labelled L for which there exists a node z ∈ F(x) such that the link y-z is in the database D), the labels of some nodes around each such y may also be added as elements of the list. The user can specify, for each element of the list, whether the retrieval query Q is expanded by it (black circles) or not (open circles), as described below.
* For some element M of the list, a node Y with M as its label and a link x-Y are added to the retrieval query Q; that is, the retrieval query Q is expanded by M. M can also be input directly instead of being selected from the list.
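List (4) and the corresponding query expansion can be sketched as follows. The sketch assumes that F(x) denotes the set {f(x) | f ∈ F} of database nodes assigned to x by the current candidates, that database links are undirected pairs, and that candidates are frozensets of (query node, database node) pairs. The extension for labels supported by only a few nodes y is not modelled. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Candidate = FrozenSet[Tuple[str, str]]  # (query node, database node) pairs


def image(F: Iterable[Candidate], x: str) -> Set[str]:
    """F(x): database nodes that some candidate f in F assigns to query node x."""
    return {d for f in F for (q, d) in f if q == x}


def adjacent_labels(F: Iterable[Candidate], x: str,
                    db_links: Set[Tuple[str, str]],
                    db_label: Dict[str, str]) -> List[str]:
    """Labels L of nodes y such that some link y-z in D has z in F(x)."""
    targets = image(F, x)
    found: Set[str] = set()
    for a, b in db_links:            # database links treated as undirected
        if b in targets:
            found.add(db_label[a])
        if a in targets:
            found.add(db_label[b])
    return sorted(found)


def expand_query(q_labels: Dict[str, str], q_links: Set[FrozenSet[str]],
                 x: str, new_node: str, M: str) -> None:
    """Expand Q by M: add a node labelled M and a link x-new_node."""
    q_labels[new_node] = M
    q_links.add(frozenset((x, new_node)))


db_label = {"d1": "function", "d2": "analysis", "d3": "design", "d4": "system"}
db_links = {("d2", "d1"), ("d3", "d1"), ("d4", "d3")}
F = [frozenset({("x", "d1")})]
print(adjacent_labels(F, "x", db_links, db_label))  # ['analysis', 'design']

q_labels, q_links = {"x": "function"}, set()
expand_query(q_labels, q_links, "x", "Y", "analysis")  # Q expanded by "analysis"
```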
(5) Displayed here are labels of nodes to be inserted between two nodes of the retrieval query Q (such as between "function" and "analysis" in Fig 6). In more detail, for each link x-y of the retrieval query Q, this is a list of the labels of nodes z that, for some solution candidate f, lie on the shortest paths connecting the node f(x) to the node f(y) and are not contained in the range f(Q). The user can specify whether each element of the list is inserted in the retrieval query Q (black circles in Fig 6) or not (open circles), as described below.
* A node z having the specified element of this list as a label, a link x-z and a link z-y are added to the retrieval query Q. Namely, this element is inserted in the retrieval query Q.
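List (5) requires the nodes lying on shortest paths between f(x) and f(y). The sketch below assumes an unweighted, undirected database graph and runs a breadth-first search from both end points: a node z lies on some shortest path exactly when d(f(x), z) + d(z, f(y)) = d(f(x), f(y)). The representation and the function names are hypothetical illustrations, not the patented implementation.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Candidate = FrozenSet[Tuple[str, str]]  # (query node, database node) pairs


def bfs(adj: Dict[str, Set[str]], src: str) -> Dict[str, int]:
    """Hop distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def insertion_labels(F: Iterable[Candidate], x: str, y: str,
                     db_links: Set[Tuple[str, str]],
                     db_label: Dict[str, str]) -> List[str]:
    """Labels of nodes z on shortest f(x)-f(y) paths with z outside f(Q), for some f."""
    adj: Dict[str, Set[str]] = {}
    for a, b in db_links:                    # undirected adjacency lists
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    found: Set[str] = set()
    for f in F:
        m = dict(f)                          # query node -> database node
        if x not in m or y not in m:
            continue
        s, t = m[x], m[y]
        ds, dt = bfs(adj, s), bfs(adj, t)
        if t not in ds:                      # f(x) and f(y) are disconnected
            continue
        f_range = set(m.values())            # the range f(Q)
        for z, d in ds.items():
            # z is on some shortest s-t path iff d(s, z) + d(z, t) == d(s, t)
            if z in dt and d + dt[z] == ds[t] and z not in f_range:
                found.add(db_label[z])
    return sorted(found)


db_label = {"d1": "function", "d2": "analysis", "d5": "method",
            "d6": "tool", "d7": "data"}
db_links = {("d1", "d5"), ("d5", "d2"), ("d1", "d6"), ("d6", "d7"), ("d7", "d2")}
F = [frozenset({("x", "d1"), ("y", "d2")})]
print(insertion_labels(F, "x", "y", db_links, db_label))  # ['method']
```

In the usage example, the path d1-d5-d2 (length 2) is shortest, while d1-d6-d7-d2 (length 3) is not, so only the label of d5 is offered for insertion.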
At step S4, the processing returns to step S2 if the user requests re-retrieval by clicking the "RETRIEVAL EXECUTION BUTTON"; otherwise, the processing ends.
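The overall control flow of steps S1 to S4 can be summarised in a short sketch. Here `search` and `interact` are hypothetical callbacks standing in for the retrieval of step S2 (which the text leaves to known methods) and the user interaction of step S3. `interact` is assumed to record rejected candidates in G, edit Q and R in place, and return whether re-retrieval was requested.

```python
from typing import Any, Callable, FrozenSet, List, Set, Tuple

Candidate = FrozenSet[Tuple[str, str]]  # (query node, database node) pairs

# Hypothetical callback types for steps S2 and S3.
SearchFn = Callable[[Any, Any], List[Candidate]]
InteractFn = Callable[[List[Candidate], Set[Candidate], Any, Any], bool]


def interactive_retrieval(search: SearchFn, interact: InteractFn,
                          query: Any, R: Any) -> List[Candidate]:
    G: Set[Candidate] = set()                     # S1: G starts empty
    while True:
        # S2: retrieve with Q and R, excluding members of G and their supersets
        F = [f for f in search(query, R) if not any(g <= f for g in G)]
        # S3: the user may add rejected candidates to G and edit Q and R in place
        if not interact(F, G, query, R):          # S4: re-retrieval requested?
            return [f for f in F if f not in G]


# Usage with stand-in callbacks: the first round rejects f1 and asks to retry.
f1, f2 = frozenset({("x", "d1")}), frozenset({("x", "d2")})
rounds: List[List[Candidate]] = []


def fake_search(query: Any, R: Any) -> List[Candidate]:
    return [f1, f2]


def fake_interact(F: List[Candidate], G: Set[Candidate], query: Any, R: Any) -> bool:
    rounds.append(list(F))
    if len(rounds) == 1:
        G.add(f1)
        return True
    return False


print(interactive_retrieval(fake_search, fake_interact, None, None) == [f2])  # True
```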
The information retrieval method and the information retrieval apparatus/system of the invention can be provided as an information retrieval program for causing a computer to execute the procedures, as a computer-readable recording medium on which the information retrieval program is recorded, as a program product that includes the information retrieval program and can be loaded into the internal memory of a computer, and as a computer, such as a server, that includes the program.
Industrial Applicability
The invention makes it possible to improve the efficiency and accuracy of information retrieval by supporting effective interaction, giving the user appropriate information during retrieval as described above. The invention makes it possible to conduct information retrieval with high efficiency and pinpoint accuracy by interactively inputting or revising the retrieval query and retrieval range while automatically narrowing down the retrieval space by utilizing the database and the graph structure specific to the retrieval query.
The invention further makes it possible to treat the retrieval query and the database as graphs having an indefinite structure, like natural-language sentences, and to improve the efficiency and accuracy of retrieval by enabling the user to interact with the retrieval engine using that structure as a clue.
We Claim:
1. An apparatus for information retrieval comprising:
an input unit, a database D storing the nodes and labels, a thesaurus storage unit for storing thesaurus T for defining a degree of similarity among the labels of nodes, a display unit, and a processing unit, wherein
the processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes;
the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using similarity among the labels defined by a subset R of thesaurus T according to the links that are input, and making reference to the database D storing the nodes and labels that are input;
the processing unit displays a set F of solution candidates that are found on the display unit;
the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions;
the processing unit deletes some elements in the set F of solution candidates from the set F of solution candidates according to input information;
the processing unit deletes, adds or changes the contents of the subset R and/or the retrieval query Q based on input information related to deleting, adding or changing the subset R of the thesaurus T and/or the retrieval query Q input through the input unit; and
the processing unit returns to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ends the processing if there is no request.
2. An apparatus for information retrieval according to claim 1, wherein
the processing unit displays a retrieval query Q on the display unit;
the processing unit receives, through the input unit, input information which, when there is no link connecting two nodes of the retrieval query Q, instructs to insert a link;
the processing unit inserts the link according to input information;
the processing unit receives, through the input unit, input information for instructing the deletion of the link in the retrieval query Q;
the processing unit deletes the link according to input information;
the processing unit receives, through the input unit, input information for instructing the addition of a new node to the retrieval query Q;
the processing unit adds the node to the retrieval query Q according to input information;
the processing unit receives, through the input unit, input information for instructing the deletion of a node of the retrieval query Q that is not an end point of a link; and
the processing unit deletes the node from the retrieval query Q according to input information.
3. An apparatus for information retrieval according to claim 1 or 2, wherein
the processing unit displays, on a display unit, a list of labels M of nodes of a database D, such that the value T(L, M) representing the degree of similarity between a label L of a node x and a label M of a node in the database D is defined in the thesaurus T in the thesaurus storage unit, for every node x of the retrieval query Q;
the processing unit receives, through an input unit, input information for so instructing that each such label M be selected or not selected; and
the processing unit, according to the input information, expands the definition of the subset R to include R(L, M) = T(L, M) concerning those M for which R(L, M) has not been defined in the subset R of the thesaurus T, or reduces the definition of R so as not to define R(L, M) concerning those M for which R(L, M) has been defined.
4. An apparatus for information retrieval according to any one of claims 1 to 3, wherein
the processing unit displays, on a display unit, a list of labels of nodes y such that there exists a node z ∈ F(x) for which the link y-z is a link in the database D, for each of the nodes x in the retrieval query Q;
the processing unit receives, through the input unit, input information for so instructing that some of such labels be selected; and
the processing unit adds a node y with L as a label and a link x-y to the retrieval query Q for each of the selected labels L according to the input information.
5. An apparatus for information retrieval according to any one of claims 1 to 4, wherein
the processing unit displays, on a display unit, a list of labels of nodes z which are included in the shortest paths connecting a node f(x) to a node f(y) in the range of a solution candidate f and not contained in the range f(Q) of some solution candidate f, for each of the links x-y of the retrieval query Q;
the processing unit receives, through an input unit, input information for so instructing that some of such labels be selected; and
the processing unit adds a node z with the selected element of the list as a label, and links x-z and z-y to the retrieval query Q according to the input information.
6. An apparatus for information retrieval comprising:
an input unit, a database D storing the nodes and labels, a thesaurus storage unit for storing thesaurus T for defining a degree of similarity among the labels of nodes, a display unit, and a processing unit, wherein
the processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes;
the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using the degree of similarity among the labels defined in a portion of thesaurus T determined to be usable according to the retrieval query Q that is input, and making reference to the database D;
the processing unit displays the set F of solution candidates that are found on the display unit;
the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent solutions;
the processing unit deletes some elements from the set F of solution candidates according to input information;
the processing unit deletes, adds or changes the content of the thesaurus T and/or the retrieval query Q based on input information related to deleting, adding or changing the thesaurus T and/or the retrieval query Q input through the input unit; and
the processing unit returns to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ends the processing if there is no request.
7. An apparatus for information retrieval according to claim 6, wherein
the processing unit displays a retrieval query Q on a display unit;
the processing unit receives, through an input unit, input information which, when there is no link connecting two nodes of the retrieval query Q, instructs to insert a link;
the processing unit inserts the link according to input information;
the processing unit receives, through the input unit, input information for instructing the deletion of the link of the retrieval query Q;
the processing unit deletes the link according to input information;
the processing unit receives, through the input unit, input information for instructing the addition of a new node to the retrieval query Q;
the processing unit adds the node to the retrieval query Q according to input information;
the processing unit receives, through the input unit, input information for instructing the deletion of the node of the retrieval query Q; and
the processing unit deletes the node from the retrieval query Q according to input information.
8. An apparatus for information retrieval according to claim 6 or 7, wherein
the processing unit displays, on a display unit, a list of labels M of nodes of the database D, such that the value T(L, M) representing the degree of similarity between a label L of a node x and a label M of a node in the database D is defined in the thesaurus T in the thesaurus storage unit, for every node x in the retrieval query Q;
the processing unit receives, through the input unit, input information instructing that some such labels M be selected or not selected, input information instructing a change in the value T(L, M) for several such labels M, and input information for specifying several such arbitrary new labels N; and
the processing unit permits the definition of T(L, M) to be used in the thesaurus T for each selected label M, inhibits the definition of T(L, M) from being used for each unselected label M, changes the value of T(L, M) into the specified value for each specified label M, or sets the value of T(L, N) to 1 while permitting the definition of T(L, N) to be used for each specified label N.
9. An apparatus for information retrieval according to any one of claims 6 to 8, wherein:
the processing unit displays the following list on the display unit, for each node x in the retrieval query Q
{ L | there exist a node y and a node z ∈ F(x) such that L is the label of y and the link y-z is contained in the database D }
the processing unit receives, through an input unit, input information for so instructing that some labels be selected; and
the processing unit adds a node Y with L as a label and a link x-Y to the retrieval query Q for each of the selected labels L according to input information.
10. An apparatus for information retrieval according to any one of claims 6 to 9, wherein, for each label M in the above list, when the size of the following set is smaller than a predetermined value, the processing unit can display, on the display unit, labels of some nodes around y in addition to the label M as elements of the above list, for every element y of the following set.
{ y | the label of y is M, and there exists a node z ∈ F(x) such that the link y-z is contained in the database D }
11. An apparatus for information retrieval according to any one of claims 6 to 10, wherein
the processing unit displays, on a display unit, a list of labels of nodes z which are included in the shortest paths connecting a node f(x) to a node f(y) in the range of a solution candidate f and not contained in the range f(Q) of some solution candidate f, for each of the links x-y of the retrieval query Q;
the processing unit receives, through the input unit, input information for so instructing that some of such labels be selected; and
the processing unit adds a node z with the selected element of the list as a label, and links x-z and z-y to the retrieval query Q according to the input information.
12. A method for retrieval of information recorded on a computer-readable recording medium comprising:
a step in which a processing unit receives, through an input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes;
a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to a thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using similarity among the labels defined by a subset R of thesaurus T, and making reference to the database D storing the nodes and labels that are input;
a step in which the processing unit displays a set F of solution candidates that are found on a display unit;
a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent solutions;
a step in which the processing unit deletes some elements from the set F of solution candidates according to input information;
a step in which the processing unit deletes, adds or changes the content of the subset R and/or the retrieval query Q based on input information related to deleting, adding or changing the subset R of the thesaurus T and/or the retrieval query Q input through the input unit; and
a step of returning to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ending the processing if there is no request.
13. A method for retrieval of information recorded on a computer-readable recording medium comprising:
a step in which a processing unit receives, through the input unit, input for a retrieval query Q including information related to nodes, labels of nodes and links among the nodes;
a step in which the processing unit finds a set F of solution candidates as a result of searching the database in response to the retrieval query Q by making reference to the thesaurus storage unit storing thesaurus T for defining a degree of similarity among the labels of nodes, using the degree of similarity among the labels defined in a portion of thesaurus T determined to be usable, and making reference to a database D storing nodes and labels that are input;
a step in which the processing unit displays a set F of solution candidates that are found on a display unit;
a step in which the processing unit receives, through the input unit, input information related to whether some elements in the set F of solution candidates represent the solutions;
a step in which the processing unit deletes some elements from the set F of solution candidates according to input information;
a step in which the processing unit deletes, adds or changes the content of the thesaurus T and/or the retrieval query Q based on input information related to deleting, adding or changing the thesaurus T and/or the retrieval query Q input through the input unit; and
a step of returning to the step of retrieval if there is a request for re-retrieval from the user to repeat the processing, or ending the processing if there is no such request.
Dated this 10th day of May 2004
(Jose M A)
of Khaitan & Co
Agent for the Applicants

Patent Number	213556
Indian Patent Application Number	278/MUMNP/2004
PG Journal Number	09/2008
Publication Date	29-Feb-2008
Grant Date	08-Jan-2008
Date of Filing	13-May-2004
Name of Patentee	JAPAN SCIENCE AND TECHNOLOGY AGENCY
Applicant Address	4-1-8 HON-CHO, KAWAGUCHI-SHI SAITAMA 332-0012
Inventors:
#	Inventor's Name	Inventor's Address
1	HASIDA KOITI	5-3-311, HINOBE URAYASU-SHI, CHIBA 279-0013
PCT International Classification Number	G06F 17/30
PCT International Application Number	PCT/JP02/04945
PCT International Filing date	2002-05-22
PCT Conventions:
#	PCT Application Number	Date of Convention	Priority Country
1	2001-319290	2001-10-17	Japan