|Title of Invention||
EFFICIENT EXTRACTION OF XML CONTENT STORED IN A DATABASE
|Abstract||A method and system are provided for extracting a valid, self-contained fragment for a node in an XML document stored in a database management system. An XML index is used to identify a location in which XML fragment data corresponding to the node is located. Ancestors of the node are identified and examined for any information needed for the proper interpretation of the fragment. If an ancestor node contains such needed information, this information is patched into the XML fragment to ensure that the fragment is a valid, self-contained XML fragment.|
|Full Text||EFFICIENT EXTRACTION OF XML CONTENT STORED IN A LOB
FIELD OF THE INVENTION
 The present invention relates to managing information and, more specifically, to extracting valid, self-contained XML fragments identified by XPath path expressions from stored XML data.
BACKGROUND
 In recent years, database systems that allow storage and querying of Extensible Markup Language data ("XML data") have been developed. Though there are many evolving standards for querying XML, all of them include some variation of XPath. XPath is a language that describes a way to locate and process items in XML documents by using an addressing syntax based on a path through the document's logical structure or hierarchy. The portion of an XML document identified by an XPath "path expression" is the portion that resides, within the structure of the XML document, at the end of any path that matches the path expression.
 XML documents that are managed by a relational database server are typically stored as unstructured serialized data in some form of a LOB (Large Object) datatype. For example, an XML document may be stored in unstructured storage, such as a CLOB (Character LOB) or a BLOB (Binary LOB), or the document may be stored in O-R form (an object-relational structure that uses an XML schema).
 No matter how the XML document is stored, in order to fulfill many XPath queries, a method of identifying and extracting a fragment of a stored XML document matching an XPath path expression is needed.
 Unfortunately, even database systems that have built-in support for storing XML data are usually not optimized to handle path-based queries, and the query performance of these database systems leaves much to be desired. In specific cases where an XML schema definition may be available, the structure and data types used in XML instance documents may be used to optimize XPath queries. However, in cases where an XML schema definition is not available, and the documents to be searched do not conform to any schema, there are no efficient techniques for path-based querying.
 Ad-hoc mechanisms, like a full scan of all documents, or text keyword-based indexes, may be used to increase the performance of querying documents when no XML schema definition is available. However, these mechanisms do not fulfill the need for an efficient method of quickly identifying and extracting a fragment of a stored XML document that matches an XPath path expression.
50277-2764 (OID 2004-100-0i-PCT)
 Even if a method of quickly identifying a location for a fragment of stored XML data were available, a method of efficiently extracting the fragment from the identified location is still needed. The fragment, as it exists at the identified location, may not be a valid, self-contained XML document. For example, namespace prefixes used within a fragment may be declared outside of that fragment, and therefore the fragment retrieved from the identified location will not have all the needed declarations.
 Based on the foregoing, there is a clear need for a system and method for identifying and extracting valid, self-contained XML fragments that match an XPath path expression.
 The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
 The present invention is illustrated by way of example, and not by way of limitation,
in the figure of the accompanying drawing and in which like reference numerals refer to similar
elements and in which:
 FIG. 1 is a block diagram of a system upon which the techniques described herein
may be implemented; and
 FIG. 2 is a flowchart illustrating steps for efficiently providing a self-contained XML
fragment in response to a request.
 In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
EXAMPLE XML DOCUMENTS
 For the purpose of explanation, examples shall be given hereafter with reference to the following two XML documents:
 As indicated above, po1.xml and po2.xml are merely two examples of XML documents. The techniques described herein are not limited to XML documents having any particular types, structure or content. Examples shall be given hereafter of how such documents could be indexed and accessed according to various embodiments of the invention.
THE XML INDEX
 U.S. Patent Application Serial No. 10/884,311, entitled INDEX FOR ACCESSING XML DATA, filed on July 2, 2004, (hereinafter the "XML Index application"), describes various embodiments of an index that may be used to efficiently access XML documents managed by a relational database server, based on XPath queries. Such an index shall be referred to herein as an XML index.
 An XML index as described in the XML Index application may be used to process XPath queries regardless of the format and data structures used to store the actual XML data (the "base structures"). For example, the actual XML data can reside in structures within or outside of a database, in any form, such as CLOB (character LOB storing the actual XML text), O-R (object relational structured form in the presence of an XML schema), or BLOB (binary LOB storing some binary form of the XML data).
 According to one embodiment, an XML index is a domain index that improves the performance of queries that include XPath-based predicates and/or XPath-based fragment extraction. An XML index can be built, for example, over both XML Schema-based as well as schema-less XMLType columns which are stored either as CLOB or structured storage. In one embodiment, an XML index is a logical index that results from the cooperative use of a path index, a value index, and an order index.
 The path index provides the mechanism to look up nodes based on simple (navigational) path expressions. The value index provides the lookup based on value equality or range. There could be multiple secondary value indexes, one per datatype. The order index associates hierarchical ordering information with indexed nodes. The order index is used to determine parent-child, ancestor-descendant and sibling relationships between XML nodes.
 When the user submits a query involving XPaths (as predicate or fragment identifier), the XPath statement is decomposed into a SQL query that accesses the XML index table. The generated query typically performs a set of path, value and order-constrained lookups and merges their results appropriately.
 For the purpose of explanation, the techniques described herein are described in a context in which an XML index, as described in the XML Index application, is used to index the XML documents. However, the techniques described herein are not limited to any specific index structure or mechanism, and can be used to identify and extract valid self-contained XML fragments regardless of what method of querying is used.
THE PATH TABLE
 According to one embodiment, a logical XML index includes a PATH table, and a set of secondary indexes. As mentioned above, each indexed XML document may include many indexed nodes. The PATH table contains one row per indexed node. For each indexed node, the row in the PATH table for the node contains various pieces of information associated with the node.
 According to one embodiment, the information contained in the PATH table includes (1) a PATHID that indicates the path to the node, (2) "location data" for locating the fragment data for the node within the base structures, and (3) "hierarchy data" that indicates the position of the node within the structural hierarchy of the XML document that contains the node. Optionally, the PATH table may also contain value information for those nodes that are associated with values. Each of these types of information shall be described in greater detail below.
 The structure of an XML document establishes parent-child relationships between the nodes within the XML document. The "path" for a node in an XML document reflects the series of parent-child links, starting from a "root" node, to arrive at the particular node. For example, the path to the "User" node in po2.xml is /PurchaseOrder/Actions/Action/User, since the "User" node is a child of the "Action" node, the "Action" node is a child of the "Actions" node, and the "Actions" node is a child of the "PurchaseOrder" node.
 The set of XML documents that an XML index indexes is referred to herein as the "indexed XML documents". According to one embodiment, an XML index may be built on all of the paths within all of the indexed XML documents, or a subset of the paths within the indexed XML documents. Techniques for specifying which paths are indexed are described hereafter. The set of paths that are indexed by a particular XML index are referred to herein as the "indexed XML paths".
 According to one embodiment, each of the indexed XML paths is assigned a unique path identifier ("PATHID"). For example, the paths that exist in po1.xml and po2.xml may be assigned PATHIDs as illustrated in the following table:
 Various techniques may be used to identify paths and assign PATHIDs to paths. For example, a user may explicitly enumerate paths, and specify corresponding PATHIDs for the paths thus identified. Alternatively, the database server may parse each XML document as the document is added to the set of indexed XML documents. During the parsing operation, the database server identifies any paths that have not already been assigned a PATHID, and automatically assigns new PATHIDs to those paths. The PATHID-to-path mapping may be stored within the database in a variety of ways. According to one embodiment, the PATHID-to-path mapping is stored as metadata separate from the XML indexes themselves.
 According to one embodiment, the same access structures are used for XML documents that conform to different schemas. Because the indexed XML documents may conform to different schemas, each XML document will typically only contain a subset of the paths to which PATHIDs have been assigned.
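 The automatic assignment of PATHIDs during parsing can be sketched as follows. This is a minimal illustration in Python, not the database server's implementation: the function name assign_pathids, the use of sequential integers as PATHIDs, and the sample document are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def assign_pathids(xml_text, path_to_id):
    """Walk one document and give each not-yet-seen path the next free PATHID.

    path_to_id is the shared PATHID-to-path mapping (here a plain dict);
    it is updated in place and returned for convenience.
    """
    root = ET.fromstring(xml_text)

    def visit(element, parent_path):
        path = parent_path + "/" + element.tag
        if path not in path_to_id:
            # Assumption: PATHIDs are simply handed out in order of first appearance.
            path_to_id[path] = len(path_to_id) + 1
        for child in element:
            visit(child, path)

    visit(root, "")
    return path_to_id


# Hypothetical stand-in document; it is not the po1.xml or po2.xml of the text.
sample = (
    "<PurchaseOrder><Reference>X-1</Reference>"
    "<Actions><Action><User>U1</User></Action>"
    "<Action><User>U2</User></Action></Actions></PurchaseOrder>"
)
pathids = assign_pathids(sample, {})
```

 Passing the same mapping to later calls lets documents with different structures share PATHIDs for the paths they have in common, while each document contributes only the paths it actually contains.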
 The location data associated with a node indicates (1) where the XML document that contains the node resides within the base structures, and (2) where the XML fragment that corresponds to the node is located within the stored XML document. Thus, the nature of the location data will vary from implementation to implementation based on the nature of the base structures. Location information is typically added to the PATH table as XML documents are parsed.
 For the purpose of explanation, it shall be assumed that (1) the base structures are tables within a relational database, and (2) each indexed XML document is stored in a corresponding row of a base table. In such a context, the location data for a node may include, for example, (1) the identifier of the row ("RID") in the base table in which the XML document containing the node is stored, and (2) a locator that provides fast access within the stored XML document, to the fragment data that corresponds to the node.
 A locator is conceptually a piece of information that "points" into the original document, and is typically used to retrieve fragment data starting from that point. The locator is dependent on the actual storage used for the XML documents, and can be different for CLOB, O-R or BLOB forms of storage. For example, the locator for a node in an XML document that is stored in a CLOB could be the starting character offset within the CLOB at which the node starts. In addition, a byte length for the node may be stored as part of the locator. Together, this information provides starting and ending locations within a stored XML document, and can be used to efficiently extract an XML fragment. For example, a locator may be used to retrieve an XML fragment containing a node that matches a specified XPath query by extracting data, beginning at the character offset specified by the locator, and reading the data for the number of bytes indicated by the locator.
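 A locator of the offset-plus-length kind can be illustrated with a short Python sketch. The Locator type, the extract_fragment function, the 1-based offset convention, and the sample document are assumptions for illustration; a Python string stands in for the CLOB, so lengths are counted in characters.

```python
from typing import NamedTuple


class Locator(NamedTuple):
    offset: int  # assumed 1-based character offset of the node's start tag
    length: int  # number of characters occupied by the node


def extract_fragment(clob_text, locator):
    """Return the slice of the stored document that the locator points at."""
    start = locator.offset - 1  # convert the 1-based offset to a 0-based index
    return clob_text[start:start + locator.length]


# Hypothetical stored document and a locator computed for its Reference node.
doc = "<PurchaseOrder><Reference>X-1</Reference></PurchaseOrder>"
reference_text = "<Reference>X-1</Reference>"
reference_locator = Locator(doc.index(reference_text) + 1, len(reference_text))
fragment = extract_fragment(doc, reference_locator)
```

 The point of the design is that extraction needs no parsing: once the index supplies the locator, the fragment is obtained by a single positioned read.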
 Locators can be more complex than character or byte offsets, however. For example, a locator could include certain flags. As another example, if the XML document is stored shredded into relational table(s), the locator could contain appropriate table and/or row identifier(s), etc.
 The PATH table row for a node also includes information that indicates where the node resides within the hierarchical structure of the XML document containing the node. Such hierarchical information is referred to herein as the "OrderKey" of the node.
 According to one embodiment, the hierarchical order information is represented using a Dewey-type value. Specifically, in one embodiment, the OrderKey of a node is created by appending a value to the OrderKey of the node's immediate parent, where the appended value indicates the position, among the children of the parent node, of that particular child node.
 For example, assume that a particular node D is the child of a node C, which itself is a child of a node B that is a child of a node A. Assume further that node D has the OrderKey 1.2.4.3. The final "3" in the OrderKey indicates that the node D is the third child of its parent node C. Similarly, the 4 indicates that node C is the fourth child of node B. The 2 indicates that node B is the second child of node A. The leading 1 indicates that node A is the root node (i.e. has no parent).
 As mentioned above, the OrderKey of a child may be easily created by appending to the OrderKey of the parent a value that corresponds to the number of the child. Similarly, the OrderKey of the parent is easily derived from the OrderKey of the child by removing the last number in the OrderKey of the child.
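 Both derivations are simple string operations on the dotted form of the key, as the following Python sketch shows. The function names are illustrative, and the dotted string is only the logical form of the OrderKey, not a claim about how it is stored.

```python
def child_orderkey(parent_key, position):
    """OrderKey of the child at the given 1-based position under parent_key."""
    return f"{parent_key}.{position}"


def parent_orderkey(key):
    """OrderKey of the immediate parent, or None for the root node."""
    head, separator, _last = key.rpartition(".")
    return head if separator else None


# A node that is the third child of the node with OrderKey 1.2.4.
node_key = child_orderkey("1.2.4", 3)
```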
 According to one embodiment, the composite numbers represented by each OrderKey are converted into byte-comparable values, so that a mathematical comparison between two OrderKeys indicates the relative position, within the structural hierarchy of an XML document, of the nodes to which the OrderKeys correspond.
 For example, the node associated with the OrderKey 1.2.7.7 precedes the node associated with the OrderKey 1.3.1 in the hierarchical structure of an XML document. Thus, the database server uses a conversion mechanism that converts OrderKey 1.2.7.7 to a first value, and converts OrderKey 1.3.1 to a second value, where the first value is less than the second value. By comparing the second value to the first value, the database server can easily determine that the node associated with the first value precedes the node associated with the second value. Various conversion techniques may be used to achieve this result, and the invention is not limited to any particular conversion technique.
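 One possible conversion, shown here purely as an illustration, encodes every component of the OrderKey as a fixed-width big-endian integer and concatenates the results. The text does not prescribe this encoding; the function name orderkey_to_bytes and the 4-byte width are assumptions. With this encoding, a plain byte comparison orders nodes in document order, and a parent always sorts before its descendants because its encoding is a prefix of theirs.

```python
import struct


def orderkey_to_bytes(key):
    """Encode a dotted OrderKey as concatenated 4-byte big-endian integers.

    Assumption: every component fits in an unsigned 32-bit integer.
    """
    components = [int(part) for part in key.split(".")]
    return b"".join(struct.pack(">I", component) for component in components)


# Hypothetical keys: the first node lies in the subtree of 1.2, the second under 1.3.
first = orderkey_to_bytes("1.2.10")
second = orderkey_to_bytes("1.3.1")
first_precedes_second = first < second
```

 Comparing the dotted strings directly would not work: as text, "1.10" sorts before "1.2", even though the tenth child follows the second.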
 Some nodes within an indexed document may be attribute nodes or nodes that correspond to simple elements. As used herein, a "simple element" is an element that does not have any attributes or children elements, and whose value is a single text string. For example, in "po1.xml", the "Reference" element is a simple element with a single text value of "SBELL-2002100912333601PDT".
 According to one embodiment, for attribute nodes and simple elements, the PATH table row also stores the actual value of the attributes and simple elements. Such values may be stored, for example, in a "value column" of the PATH table. The secondary "value indexes", which shall be described in greater detail hereafter, are built on the value column.
PATH TABLE EXAMPLE
 According to one embodiment, the PATH table includes columns defined as specified in the following table:
 As explained above, the PATHID is an identifier assigned to the node, and uniquely represents a fully expanded path to the node. The ORDERKEY is a system representation of the Dewey ordering number associated with the node. According to one embodiment, the internal representation of the OrderKey also preserves document ordering.
 The VALUE column stores the effective text value for simple element (i.e. no element children) nodes and attribute nodes. According to one embodiment, adjacent text nodes are coalesced by concatenation. As described in the XML Index application, a mechanism is provided to allow a user to customize the effective text value that gets stored in the VALUE column by specifying options during index creation; for example, the handling of mixed text, whitespace, and case sensitivity can be customized. The user can store the VALUE column in any number of formats, including a bounded RAW column or a BLOB. If the user chooses bounded storage, then any overflow during index creation is flagged as an error.
 The following table is an example of a PATH table that (1) has the columns described above, and (2) is populated with entries for po1.xml and po2.xml. Specifically, each row of the PATH table corresponds to an indexed node of either po1.xml or po2.xml. In this example, it is assumed that po1.xml and po2.xml are respectively stored at rows R1 and R2 of a base table.
 In this example, the rowid column stores a unique identifier for each row of the PATH table. Depending on the database system in which the PATH table is created, the rowid column may be an implicit column. For example, the disk location of a row may be used as the unique identifier for the row. As shall be described in greater detail hereafter, the secondary Order and Value indexes use the rowid values of the PATH table to locate rows within the PATH table.
 In the embodiment illustrated above, the PATHID, ORDERKEY and VALUE of a node are all contained in a single table. In an alternative embodiment, separate tables may be used to map the PATHID, ORDERKEY and VALUE information to corresponding location data (e.g. the base table RID and LOCATOR).
 In the embodiment illustrated above, the information in the "RID" and the "LOCATOR" columns of the PATH table is used to identify a location where the indexed node is stored. In this example, each row in a base table corresponds to an indexed XML document. Each row in the base table uses a CLOB to store the associated XML document. The RID column in the PATH table identifies the row in the base table where the XML document is stored as a CLOB, and the LOCATOR column stores a character offset into the CLOB where the indexed node starts and a character length for the node.
 For example, the above-mentioned sample XML documents po1.xml and po2.xml are stored in unstructured serialized form in rows R1 and R2 of the base table as CLOB data structures. The node identified by rowid "1" in the PATH table is located starting at character 1 of the CLOB stored in base table row R1, and has a length of 350 characters. As another example, the node identified by rowid "9" is located in row R2 of the base table, and starts at character 72 with a length of 36 characters. This row of the PATH table corresponds to the first
 The example shown in the populated PATH table above illustrates an embodiment in which locator information is not stored for simple elements and attribute nodes. In other embodiments, locator information could be stored and maintained for all nodes, including simple elements. In addition, the example shown in the populated PATH table illustrates an embodiment in which the LOCATOR column stores both offset and length information. In alternative embodiments, only offset information may be stored. Alternatively, as discussed
above, other types of locator information may be stored in the LOCATOR column. The techniques described herein are not dependent on any particular type of location data.
 The PATH table includes the information required to locate the XML documents, and/or XML fragments, that satisfy a wide range of queries. However, without secondary access structures, using the PATH table to satisfy such queries will often require full scans of the PATH table. Therefore, according to one embodiment, a variety of secondary indexes are created by the database server to accelerate the queries that (1) perform path lookups and/or (2) identify order-based relationships. According to one embodiment, the following secondary indexes are created on the PATH table.
• PATHID_INDEX on (PATHID, RID)
• ORDERKEY_INDEX on (RID, ORDER_KEY)
• VALUE INDEXES
• PARENT_ORDERKEY_INDEX on (RID,
 The PATHID_INDEX is built on the PATHID, RID columns of the PATH table. Thus, entries in the PATHID_INDEX are in the form (keyvalue, rowid), where keyvalue is a composite value representing a particular PATHID/RID combination, and rowid identifies a particular row of the PATH table.
 When (1) the base table row and (2) the PATHID of a node are known, the PATHID_INDEX may be used to quickly locate the row, within the PATH table, for the node. For example, based on the key value "3.R1", the PATHID_INDEX may be traversed to find the entry that is associated with the key value "3.R1". Assuming that the PATH table is populated as illustrated above, the index entry would have a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with the PATHID 3 and the RID R1.
THE ORDERKEY_INDEX
 The ORDERKEY_INDEX is built on the RID and ORDER_KEY columns of the PATH table. Thus, entries in the ORDERKEY_INDEX are in the form (keyvalue, rowid), where keyvalue is a composite value representing a particular RID/ORDERKEY combination, and rowid identifies a particular row of the PATH table.
 When (1) the base table row and (2) the ORDERKEY of a node are known, the ORDERKEY_INDEX may be used to quickly locate the row, within the PATH table, for the node. For example, based on the key value "R1.'1.2'", the ORDERKEY_INDEX may be traversed to find the entry that is associated with the key value "R1.'1.2'". Assuming that the PATH table is populated as illustrated above, the index entry would have a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with the ORDERKEY 1.2 and the RID R1.
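 The two composite-key lookups described above can be mimicked with in-memory dictionaries. The sketch below is only an analogy for the database indexes: build_index is a hypothetical helper, and the three PATH table rows contain only the values stated in the text plus a few fields that are marked as assumed.

```python
# Rows for document R1. Fields marked "assumed" are illustrative guesses,
# not values taken from the populated PATH table.
PATH_TABLE = {
    1: {"pathid": 1, "rid": "R1", "order_key": "1"},    # pathid assumed
    2: {"pathid": 2, "rid": "R1", "order_key": "1.1"},  # order_key assumed
    3: {"pathid": 3, "rid": "R1", "order_key": "1.2"},
}


def build_index(path_table, *columns):
    """Map each composite key value to the rowids of the PATH table rows having it."""
    index = {}
    for rowid, row in path_table.items():
        keyvalue = tuple(row[column] for column in columns)
        index.setdefault(keyvalue, []).append(rowid)
    return index


pathid_index = build_index(PATH_TABLE, "pathid", "rid")
orderkey_index = build_index(PATH_TABLE, "rid", "order_key")

rows_for_path = pathid_index[(3, "R1")]           # key value "3.R1"
rows_for_orderkey = orderkey_index[("R1", "1.2")]  # key value "R1.'1.2'"
```

 Both lookups arrive at rowid 3, matching the two worked examples in the text.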
THE VALUE INDEXES
 Just as queries based on path lookups can be accelerated using the PATHID_INDEX, queries based on value lookups can be accelerated by indexes built on the VALUE column of the PATH table. However, the VALUE column of the PATH table can hold values for a variety of data types. Therefore, according to one embodiment, a separate value index is built for each data type stored in the VALUE column. Thus, in an implementation in which the VALUE column holds strings, numbers and timestamps, the following value (secondary) indexes are also created:
• STRING_INDEX on SYS_XMLVALUE_TO_STRING(value)
• NUMBER_INDEX on SYS_XMLVALUE_TO_NUMBER(value)
• TIMESTAMP_INDEX on SYS_XMLVALUE_TO_TIMESTAMP(value)
 These value indexes are used to perform datatype based comparisons (equality and range). For example, the NUMBER value index is used to handle number-based comparisons within user XPaths. Entries in the NUMBER_INDEX may, for example, be in the form (number, rowid), where the rowid points to a row, within the PATH table, for a node associated with the value of "number". Similarly, entries within the STRING_INDEX may have the form (string, rowid), and entries within the TIMESTAMP_INDEX may have the form (timestamp, rowid).
 The format of the values in the PATH table may not correspond to the native format of the data type. Therefore, when using the value indexes, the database server may call conversion functions to convert the value bytes from stored format to the specified datatype. In addition, the database server applies any necessary transformations, as shall be described hereafter. According to one embodiment, the conversion functions operate on both RAW and BLOB values and return NULL if the conversion is not possible.
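 The behavior of such conversion functions can be sketched in Python, with None standing in for NULL. These functions are illustrative stand-ins and are not the SYS_XMLVALUE_TO_* operators themselves; the UTF-8 encoding of the stored bytes and the ISO 8601 timestamp format are assumptions.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def value_to_number(raw):
    """Convert stored value bytes to a number, or None if that is not possible."""
    try:
        number = Decimal(raw.decode("utf-8").strip())
    except (InvalidOperation, UnicodeDecodeError):
        return None
    return number if number.is_finite() else None


def value_to_timestamp(raw):
    """Convert stored value bytes to a timestamp, or None if that is not possible."""
    try:
        return datetime.fromisoformat(raw.decode("utf-8").strip())
    except (ValueError, UnicodeDecodeError):
        return None


def build_number_index(values_by_rowid):
    """Sorted (number, rowid) entries for the rows whose value converts to a number."""
    entries = []
    for rowid, raw in values_by_rowid.items():
        number = value_to_number(raw)
        if number is not None:
            entries.append((number, rowid))
    return sorted(entries)


# Hypothetical VALUE column contents keyed by PATH table rowid.
number_index = build_number_index({2: b"SBELL-2002100912333601PDT", 5: b"42.5", 6: b"7"})
```

 Because non-convertible values yield None, a row such as the Reference node simply does not appear in the number index, which keeps each per-datatype index limited to the values it can compare.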
 By default, the value indexes are created when the XML index is created. However, users can suppress the creation of one or more of the value indexes based on their knowledge of the query workload. For example, if all XPath predicates involve string comparisons only, the NUMBER and TIMESTAMP value indexes can be avoided.
 According to one embodiment, the set of secondary indexes built on the PATH table includes a PARENT_ORDERKEY_INDEX. Similar to the ORDERKEY_INDEX, the PARENT_ORDERKEY_INDEX is built on the RID and ORDER_KEY columns of the PATH table. Consequently, the index entries of the PARENT_ORDERKEY_INDEX have the form (keyvalue, rowid), where keyvalue is a composite value that corresponds to a particular RID/ORDER_KEY combination. However, unlike the ORDERKEY_INDEX, the rowid in a PARENT_ORDERKEY_INDEX entry does not point to the PATH table row that has the particular RID/ORDER_KEY combination. Rather, the rowid of each PARENT_ORDERKEY_INDEX entry points to the PATH table row of the node that is the immediate parent of the node associated with the RID/ORDER_KEY combination.
 For example, in the populated PATH table illustrated above, the RID/ORDER_KEY combination "R1.'1.2'" corresponds to the node in row 3 of the PATH table. The immediate parent of the node in row 3 of the PATH table is the node represented by row 1 of the PATH table. Consequently, the PARENT_ORDERKEY_INDEX entry associated with the "R1.'1.2'" key value would have a rowid that points to row 1 of the PATH table (i.e. rowid = 1).
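 A parent-pointing index of this kind can be approximated in memory as shown below. This is a sketch under stated assumptions, not the index's actual construction: the helper names are hypothetical, rows 1 and 3 use the values given in the text, and row 4 is an invented grandchild added only to show a two-step walk to the root.

```python
def parent_of(order_key):
    """Drop the last Dewey component; the root has no parent."""
    head, separator, _last = order_key.rpartition(".")
    return head if separator else None


def build_parent_index(path_table):
    """Map (rid, order_key) of each node to the PATH table rowid of its parent."""
    rowid_by_key = {
        (row["rid"], row["order_key"]): rowid for rowid, row in path_table.items()
    }
    parent_index = {}
    for rid, order_key in rowid_by_key:
        parent_key = parent_of(order_key)
        if parent_key is not None and (rid, parent_key) in rowid_by_key:
            parent_index[(rid, order_key)] = rowid_by_key[(rid, parent_key)]
    return parent_index


def ancestor_rowids(parent_index, path_table, rowid):
    """Follow parent pointers upward, nearest ancestor first."""
    chain = []
    row = path_table[rowid]
    key = (row["rid"], row["order_key"])
    while key in parent_index:
        parent_rowid = parent_index[key]
        chain.append(parent_rowid)
        parent = path_table[parent_rowid]
        key = (parent["rid"], parent["order_key"])
    return chain


path_table = {
    1: {"rid": "R1", "order_key": "1"},
    3: {"rid": "R1", "order_key": "1.2"},
    4: {"rid": "R1", "order_key": "1.2.1"},  # hypothetical row for illustration
}
parent_index = build_parent_index(path_table)
```

 Repeated lookups in such an index give the chain of ancestors of a node, which is the information needed when ancestors must be examined for declarations that a fragment depends on.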
USING THE XML INDEX TO PROCESS XPATH QUERIES
 As described above, an XML index improves the performance of XPath-based queries and fragment extraction by capturing the essential parts of an XML document - tags, values and nesting information - in PATH, VALUE and ORDER indexes. The PATH index is used to index the tags and provides a mechanism to identify fragments based on simple path expressions. The VALUE index allows the XML values to be indexed. The ORDER index associates hierarchical ordering information with indexed nodes, and is used to determine parent-child, ancestor-descendant and sibling relationships between XML nodes.
 When a user submits a query involving XPaths, the XPath expressions can be decomposed into SQL queries accessing the XML index table. The generated queries typically perform a set of path, value and order-constrained lookups and merge the results appropriately.
 In particular, co-pending application U.S. Patent Application Serial No. 10/944,170, entitled "EFFICIENT QUERY PROCESSING OF XML DATA USING XML INDEX", filed September 16, 2004, (hereinafter the "Query Processing" application), describes various embodiments of a method for performing an "index-enabled" query that uses the XML index to identify the XML data corresponding to a specified path. In particular, the Query Processing application describes techniques for using the XML Index to evaluate the XPath operators.
 More specifically, the Query Processing application describes techniques for (1) decomposing a generic path expression into simpler components such as simple paths, predicates and structural joins; (2) generating a SQL query against tables of the XML index, which may involve expressing the structural joins using SQL predicates on Dewey order keys of the indexed path components; and (3) fragment extraction using locators that point to the original data.
 Index-enabled queries are generated based on path expressions, and access the PATH table of the XML index. The path expression of a path-based query, or a fragment of that expression, is matched against templates. Each template is associated with a rule. When a fragment of a specified path is in a format that matches a template, the corresponding rule is then used to generate SQL for an index-enabled query. This process is described in detail in the Query Processing application.
USING THE XML INDEX TO PROCESS EXTRACT() OPERATOR
 One XPath operator that may be evaluated using the techniques described in the Query Processing application is the extract() operator. The result of an XPath extract() operator is an XMLType containing the XML fragment(s) of the XML document(s) that satisfy the specified XPath expression.
 As described in the Query Processing application, the extract() operator can be rewritten as an SQL query on the XML Index tables. For example, the extract() operator for an XPath query on the /PurchaseOrder/Actions nodes may be translated into an SQL query as follows:
Select extract(value(p), '/PurchaseOrder/Actions') from po_tab p;
select xmlagg(select SYS_XMLINDEX_MKXML(rid, order_key, locator, value)
where pathid = :B1 and rid = p.rowid) from po_tab p
where :B1 = pathid('/PurchaseOrder/Actions') (pathid() is an internal function used to look up
the PATHID associated with the concerned path) and po_tab is the base table that contains the
stored XML documents.
 The SYS_XMLINDEX_MKXML() operator builds an XMLType image based on
the index column values. In one embodiment, this lookup may be implemented using the
SYS_XMLINDEX_GETFRAG() operator. Given a row identifier and a locator, the
SYS_XMLINDEX_GETFRAG() operator constructs an XMLType image consisting of an
XML fragment corresponding to the row identifier and locator.
 XMLAGG() is an operator that concatenates the fragments generated by the
SYS_XMLINDEX_MKXML() operator. Using the example above, for each row that contains
the node '/PurchaseOrder/Actions', a fragment is retrieved from the base table and aggregated
into a single XMLType image.
 For example, using the populated PATH table above, the output of:
select extract(value(T), '/PurchaseOrder/Reference') from xmltab T
would result in:
In one embodiment, the output returned is a single long string created by concatenating the above results, including start and end tags.
 The techniques described herein are used to implement the
SYS_XMLINDEX_GETFRAG() operator that obtains the actual text fragment corresponding to a node.
EFFICIENT EXTRACTION PROCESS
 Process 200 shown in FIG. 2 illustrates the steps of one technique for extracting an XML fragment, according to an embodiment of the invention. As shown, a node is first identified at step 210. Any technique, such as those described in the XML Index and Query Processing applications, can be used to identify a node that matches a path expression.
 Next, the node is examined at step 215 to determine if it is a simple element or a complex element. As mentioned above, simple elements are elements having no children or attributes, and whose value is a single text value. A complex element is an element that either has attributes or has element children.
 If the node is a simple element, then the fragment can be constructed without consulting the original XML document, using information stored in the XML index, as shown by step 220. If the node is a complex element, the original XML document stored in a base table is consulted to extract the fragment, as shown by step 230, and the extracted fragment is patched as needed for proper interpretation. Each process is described in more detail below.
 Although the embodiment of the process shown in FIG. 2 takes advantage of the information stored in the XML index to construct the fragment without consulting the original XML document, it is not a requirement that simple and complex elements be treated differently. Fragments matching any type of element, simple or complex, can be extracted from the stored XML data.
SIMPLE ELEMENT FRAGMENTS
 When stored XML documents are indexed with an XML index, the values of simple elements are present in the VALUE column of the PATH table. Therefore, the XML fragment for simple elements can be constructed without consulting the base table that stores the original XML document. The fragment is built by adding appropriate start and end tags to the value obtained from the VALUE column of the PATH table for the identified node.  For example, the node '/PurchaseOrder/Reference' is a simple element in the XML documents po1.xml and po2.xml above. The PATHID for the expression '/PurchaseOrder/Reference' is first determined. In this example, the PATHID is "2". The PATH table is examined to determine if any nodes correspond to this PATHID (step 210). In this example, nodes with rowids of "2" and "7" are a match for PATHID=2. The process of FIG. 2 is executed for each matching node.
 At step 215, for both node 2 and node 7, it can be determined that each is a simple element by examining the LOCATOR and VALUE columns for these rows, as there is no locator information and the VALUE column contains a simple text string. For each of these simple element nodes, the process continues to step 220. In step 220, a fragment for the node can be built by creating a string that contains a start tag, a value and an end tag. The start tag is created by extracting the last component of the path associated with this PATHID (in this example, "Reference"). The VALUE corresponding to this node in the PATH table is put in the fragment after the start tag. For example, the VALUE component of the fragment for node 2 is "SBELL-2002100912333601PDT". A close tag consisting of the close character '/' and the component string determined above (e.g. "Reference") completes the fragment string. By following this process, the fragment for node 2 is determined to be "<Reference>SBELL-2002100912333601PDT</Reference>".
 Queries that extract only attributes may be treated like simple elements. However, elements containing attributes are treated as complex elements, discussed in more detail below.  Because the system can add the namespace and a generated prefix, simple elements do not need patching for proper interpretation, and the process continues to step 290 for simple elements.
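 The construction of a simple element fragment at step 220 can be sketched as follows. The function name build_simple_fragment is hypothetical, and the sketch assumes the path string and the VALUE text have already been read from the index. It does not escape markup characters in the value and does not add a namespace or generated prefix.

```python
def build_simple_fragment(path: str, value: str) -> str:
    """Wrap a VALUE in start and end tags named after the last path component."""
    # The tag name is the last component of the path for this PATHID.
    tag = path.rstrip("/").split("/")[-1]
    # Start tag, then the value, then the close tag ('/' plus the same name).
    return f"<{tag}>{value}</{tag}>"


fragment = build_simple_fragment("/PurchaseOrder/Reference",
                                 "SBELL-2002100912333601PDT")
```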
EXTRACTING COMPLEX ELEMENTS USING THE XML INDEX
 For complex element nodes, the fragment must be parsed from the base table that stores the XML document associated with the complex element. As discussed above, each row in the PATH table corresponds to a node in an XML document, and includes a RID of the row in the base table that contains the original XML document and a locator for finding the node within the XML document stored in the base table.
 For example, an XPath extract() on the node /PurchaseOrder/Reference/Actions should result in the aggregated fragment:
 Unlike the simple elements described above, however, these fragments are extracted from the stored XML documents. For example, the path expression
"/PurchaseOrder/Reference/Actions" corresponds to PATHID 3. From the PATH table, nodes with rowids 3 and 8 match this PATHID. The VALUE column for these rows is empty, and the LOCATOR column provides offset and length information for extracting the fragments. Therefore at step 215, it is determined that each of these nodes corresponds to a complex element, and the process continues to step 230.
 At step 230, fragment text corresponding to the node is located and read. For example, for node 3, the RID column indicates that the stored XML data is located at row R1 of the base table, and the LOCATOR field indicates that the fragment starts at character 64 and has a length of 56. The fragment text corresponding to node 3 can thus be created by extracting characters 64-120 from the CLOB in row R1 of the base table that contains "po1.xml". The XML fragment corresponding to node 8 can likewise be created by extracting characters 63-152 from the CLOB in row R2 of the base table that contains "po2.xml".
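 Reading fragment text by offset and length, as in step 230, can be sketched as follows. The base table is modeled here as a hypothetical dictionary from RID to document text, the document shown is a small invented example rather than the contents of the stored purchase orders, and offsets are assumed to be zero-based, which may differ from the convention used by the LOCATOR column.

```python
from typing import Dict, Tuple


def read_fragment(base_table: Dict[str, str], rid: str,
                  locator: Tuple[int, int]) -> str:
    """Return the text of one node, given its base table row and its locator."""
    offset, length = locator
    document = base_table[rid]
    # Take exactly `length` characters starting at `offset`.
    return document[offset:offset + length]


# Invented sample document; the locator is computed here only for the demo.
doc = "<Order><Ref>ABC</Ref><Actions><Action/></Actions></Order>"
start = doc.index("<Actions>")
length = doc.index("</Actions>") + len("</Actions>") - start
base_table = {"R1": doc}

fragment = read_fragment(base_table, "R1", (start, length))
```

 In practice the locator would come from the PATH table row for the node, so the document is never scanned to find the fragment boundaries.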
 In these examples, the extracted XML fragments happen to be valid. However, in many cases, the XML fragment extracted using these methods may not be self-contained. For example, the extracted fragment may contain or use references that are not defined within the fragment. The methods described herein allow for "patching" the fragments created using the above techniques to ensure that the resulting fragments are valid and self-contained.
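 One kind of undefined reference is a namespace prefix that is declared on an ancestor element rather than within the fragment. The following is a minimal sketch of patching such a fragment, not the method of any particular implementation. The function names are hypothetical, the ancestor declarations are assumed to have been collected already into a dictionary from prefix to namespace URI, and only prefixes on element names are considered.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict


def is_well_formed(fragment: str) -> bool:
    """Report whether the fragment parses on its own."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def patch_namespaces(fragment: str, ancestor_decls: Dict[str, str]) -> str:
    """Add xmlns declarations for prefixes used but not declared in the fragment."""
    # Prefixes appearing on element names (attribute prefixes are ignored here).
    used = set(re.findall(r"</?([A-Za-z_][\w.-]*):", fragment))
    declared = set(re.findall(r"xmlns:([A-Za-z_][\w.-]*)\s*=", fragment))
    missing = sorted(p for p in used - declared if p in ancestor_decls)
    if not missing:
        return fragment
    extra = "".join(f' xmlns:{p}="{ancestor_decls[p]}"' for p in missing)
    # Insert the declarations right after the name in the first start tag.
    return re.sub(r"^(\s*<[^\s/>]+)", lambda m: m.group(1) + extra,
                  fragment, count=1)


raw = "<h:table><h:td>x</h:td></h:table>"
patched = patch_namespaces(raw, {"h": "http://example.com/h"})
```

 In this sketch the raw fragment fails to parse by itself because the prefix "h" is unbound, while the patched fragment carries its own declaration and parses independently of the original document.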
PREFIXES AND NAMESPACES
 Since element names in XML are not fixed, a name conflict can occur when two different documents use the same name to describe two different types of elements. One standard method of avoiding name conflicts is to use a prefix with the name.  For example, Tables 1 and 2 illustrate XML documents that both use a "table" element.
 If these two XML documents were both stored in a database, there could potentially be an element name conflict because both documents contain a
|Indian Patent Application Number||4596/CHENP/2006|
|PG Journal Number||46/2013|
|Date of Filing||15-Dec-2006|
|Name of Patentee||ORACLE INTERNATIONAL CORPORATION|
|Applicant Address||500 ORACLE PARKWAY REDWOOD SHORES CA 94065 USA|
|PCT International Classification Number||G06F 17/30|
|PCT International Application Number||PCT/US05/20795|
|PCT International Filing date||2005-06-13|