Title of Invention

A METHOD FOR CHECKING AND RETRIEVING AN ITEM OF DATA AND A DATABASE FILESERVER APPARATUS

Abstract

In a client/server computer environment having a fileserver 100 running a master database 126 and clients 130 supporting cache databases 136, inconsistent data write accesses are prevented by using a data locking technique, which locks data during the course of an up-date transaction requested by one client 130. This prevents access to the same data by another client. Data consistency is checked, prior to the write access, by comparing a time stamp associated with a respective cache database entry and a time stamp associated with the index to the corresponding data entry in the master database. Time stamp equivalence obviates the need to access the master database 126 or to transfer data across the client/server communications network 140.
Full Text

The present invention relates in general to database access techniques. In particular, aspects of the invention relate to fileservers which support caching and to methods and apparatus for ensuring cache consistency.
In today's information age, the ability to access rapidly data which is held in databases is of utmost importance to companies and organisations whose business may
The speed with which remote equipment, or a client, is able to access data on a database relies on two major factors. The first factor is the speed with which a database fileserver can read data, typically from a co-located external storage device such as a hard disk drive. This factor relies on the speed of the central processing unit (CPU) and on the disk access speed of the fileserver. The second factor is the capacity, or the bandwidth, of the link between the client and the fileserver.
A third factor affecting database access speed, which is extremely significant to the performance of the overall system, is the loading on the system. The loading is typically proportional to the number of clients which require database access. Both the number of operations performed by the CPU and the volume of data which needs to be transmitted across communications links increase as more clients require database access. Obviously, there is a point where the demands placed on the fileserver and communications links exceed optimum capability, at which point system performance will degrade.
System performance can be improved by increasing the speed of the fileserver and the bandwidth of the communications links. However, the related costs cannot always be justified. Also, there is always a limit to the speed at which current technologies can operate.
Another way of achieving better system performance, by reducing the database CPU and communications link loading, is by using cached copies of the data. The cached copies are typically located physically nearer to the clients, or even on the client systems themselves. Indeed, this technique has been widely adopted on the Internet by using Internet fileservers containing cached copies of data located at 'mirror sites'. For example, master data accessible from a master Internet fileserver at a site in the USA might be copied, or cached, to a fileserver at a mirror site in the UK, from where a majority of European users might prefer to read the data. Thus, the data transfer overhead for transatlantic links and the demands placed on the master Internet fileserver are reduced, and the overall, perceived Internet performance is improved. In particular, the European users would expect to obtain far better data access performance by accessing the UK mirror site.
The use of cached data does raise important issues concerning data consistency. That is to say, it is sometimes difficult to know whether cached data is the same as the original, master data: the master data may change in some way after the cached copy is generated. In the case of the Internet, for example, at present a lack of consistency between master data and cached data may not be of great significance and, if it is significant, it is usually possible to choose to retrieve the master data, albeit at a slower rate, from the master database.
Cache consistency, or coherency, is however extremely important in commercial environments where the data, whether cached or master, forms the basis for making business decisions. Inconsistent data, or data which takes too long to access, inevitably results in reduced revenue. A simple example of such an environment is one for making flight bookings, where, for example, the master flight data is held in a database in the UK and travel agencies around Europe book seats on flights on the basis of cached data. In this example, under most circumstances, it is essential that the cached data is consistent with the master data.
Similar considerations are important in general in multiple-user systems, for example based on client/server or distributed database, networked environments, in which the users wherever possible rely on cached data to minimise CPU, data access, data transfer overheads imposed on master database fileservers, and overall network traffic.
In, for example, a client/server environment having multiple clients accessing a single, master database fileserver, there is typically considerable opportunity to maintain with each client a large cache of recently read data. A cache may be used during an active transaction to avoid the need to re-read data from the master database. However, if data is cached on a client and the client needs to re-access that data at a later time, unless the data in the master database cannot change, it must be assumed that the data on the cache is inconsistent, because there is no way of knowing differently. Thus, further master database access is needed to at least compare the current state of the data in the master database with the data in the cache database.
A decision whether or not to use data cacheing typically depends on the type of data to be accessed. Three common categories of data are described below.

Static data, that is data which rarely changes, are prime candidates for holding in a client cache database: the data can always be assumed to be current. However, a task must be provided to ensure that changes to the static data are propagated to all client caches. This is typically done by an overnight batch process. For static data, there is no need for a real-time process to maintain consistency.
Highly dynamic data is extremely difficult to maintain in a consistent state. The basic reason is that if the data changes very often, the network and processor impact, in a client/server environment, of up-dating many client caches can be considerable. In some cases, the cost in terms of processing overhead and network bandwidth of maintaining the caches might exceed the cost of each client accessing the master database directly each time data is required. Thus, this category of data would typically not be cached.
In between static and highly dynamic data is a type of data which is not static, but which changes relatively infrequently compared with highly dynamic data. Typically, in this case, data might only be cached onto a client cache database, for example at a travel agency, during the period of an aircraft flight enquiry and seat booking operation. Then, there would be a high degree of certainty that the data remains consistent during the operation. However, there would never be total certainty because, coincidentally, another client, or travel agency, might book the only remaining seats on the respective flight between the times at which the first client started and completed its operation.
The simplest way to prevent data inconsistency, in for example the flight booking operation described above, would be to 'lock' any data on the master database which is being accessed by one client, thus making that data inaccessible, or at least only readable, to other clients for the whole period of the operation. This is called 'pessimistic locking'. Typically, a lock table, which holds the identities of locked data which cannot be accessed or written to, is generated by the master fileserver. Such a system requires that all access requests made by clients invoke a lock table search on the fileserver before data access or writing is allowed, or denied.
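By way of illustration only, the lock table behaviour described above might be sketched as follows. The class name, the method names and the row and client identifiers are hypothetical and are not taken from any particular DBMS; the sketch simply shows that every access request involves a lock table search and that a second client is refused until the first releases the lock.

```python
class LockTable:
    """Hypothetical lock table: maps a row identifier to the client holding it."""

    def __init__(self):
        self._locks = {}

    def acquire(self, row_id, client_id):
        """Lock row_id for client_id; return False if another client holds it."""
        holder = self._locks.get(row_id)
        if holder is not None and holder != client_id:
            return False
        self._locks[row_id] = client_id
        return True

    def release(self, row_id, client_id):
        """Remove the lock entry, but only if client_id is the holder."""
        if self._locks.get(row_id) == client_id:
            del self._locks[row_id]


locks = LockTable()
first = locks.acquire("PAR-wk25", "agency-A")    # agency A starts its operation
second = locks.acquire("PAR-wk25", "agency-B")   # agency B is refused meanwhile
locks.release("PAR-wk25", "agency-A")
third = locks.acquire("PAR-wk25", "agency-B")    # succeeds once A has finished
```

In this sketch agency B is shut out for as long as agency A holds the entry, which is the overhead that makes the pessimistic approach unattractive for shared data.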
Obviously, however, in an environment where the same data might need to be accessed or up-dated by several clients, for example for flight-booking purposes, pessimistic locking represents an unworkable solution with intolerable locking overheads.
Another method for dealing with possible inconsistency between cached and master data is discussed in the book "Transaction Processing: Concepts and Techniques" by Gray J and Reuter A, published by Morgan Kaufmann 1993, on pages 434 to 435. The method involves 'optimistic locking'.
Optimistic locking allows clients connected to a fileserver to use cached data at any time for reading purposes, but as soon as a transaction is initiated, for example making a flight seat booking, the cached data is compared with the master data to ensure data consistency and the master data is locked only for the period of the actual transaction (in this example, the transaction is the actual database access and write procedure). This prevents another client from changing the master data during the transaction. If, after the transaction has been initiated, the data in the cache database is found to be inconsistent with the corresponding data in the master database, the cache database is updated with the latest master data, and the client is notified and left to resolve any problems the inconsistency might have caused.
The advantage of optimistic locking is that the master data is only locked for a very short period of time, for maybe less than one second, to carry out the actual transaction. In contrast, pessimistic locking requires the data accessed to be locked for the whole period of an operation, for example for the whole period of an enquiry and a transaction, which might take many minutes.
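A minimal sketch of optimistic locking, under stated assumptions, is given below. The class `MasterDatabase`, its `transact` method and the row identifier are hypothetical, and a single in-process lock stands in for a lock table entry. The point illustrated is that the lock is held only for the compare-and-write step, and that a client holding stale cached data is handed the latest master data rather than being allowed to write.

```python
import threading


class MasterDatabase:
    """Hypothetical master database holding one value per row identifier."""

    def __init__(self, rows):
        self._rows = dict(rows)
        # A single lock stands in for a per-row lock table entry.
        self._lock = threading.Lock()

    def read(self, row_id):
        """Unlocked read, used to fill a client cache."""
        return self._rows[row_id]

    def transact(self, row_id, cached_value, new_value):
        """Lock only for the compare-and-write; return (ok, master_value)."""
        with self._lock:
            master_value = self._rows[row_id]
            if master_value != cached_value:
                # The cache was stale: report the latest master data instead.
                return False, master_value
            self._rows[row_id] = new_value
            return True, new_value


master = MasterDatabase({"PAR-0900": 2})       # two seats left
cache_a = master.read("PAR-0900")              # agency A caches the row
cache_b = master.read("PAR-0900")              # agency B caches the same row

ok_a, seats_a = master.transact("PAR-0900", cache_a, cache_a - 2)
ok_b, seats_b = master.transact("PAR-0900", cache_b, cache_b - 1)
if not ok_b:
    cache_b = seats_b                          # refresh B's cache; B is notified
```

Here agency A's booking succeeds, while agency B's later attempt is rejected and its cache is brought up to date, leaving B to decide how to proceed.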
Although optimistic locking may therefore require extra processing by both a client and a fileserver, it can be seen that the technique is far better suited for dealing with relatively dynamic data which may need to be accessed by multiple clients.
When implementing optimistic locking, it is known sometimes to use time stamping to mark data rows in the master database with the time the data was last updated. In this way, if the time stamp for a particular row of data held in a cache is the same as that of the row in the master database, the fileserver accepts that the cached data is current and there is no need to re-send the data row in question across the network to the client. Thus, network bandwidth is conserved whenever a cached copy is found to be current.
In accordance with a first aspect, the present invention provides a method for checking the consistency of an item of data in a cache database with a respective item of data in a master database by comparing a first key stored in association with the item of data in the cache database with a second key stored in association with an index entry for the respective item of data in the master database.
In accordance with a second aspect, the present invention provides a method for retrieving an item of data from one of a cache or a master database, the master database comprising a plurality of items of master data and an index containing entries corresponding to one or more of the items of master data, the cache database containing a cached copy of at least one item of the master data, the method comprising the steps of:
reading a first key stored in association with a cached copy of a required item of data from the cache database;
reading a second key stored in association with an index entry for a respective item of master data from the master database;
comparing the first key with the second key; and
retrieving, in the event the first and second keys are the same, the cached copy of the item of data or, in the event the first and second keys are different, the respective item of master data.
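The four steps of the method above can be sketched as follows. This is an illustration under assumed data layouts only: the dictionaries `cache_db`, `master_index` and `master_data`, the function name `retrieve` and the sample keys and rows are all hypothetical.

```python
def retrieve(item_id, cache_db, master_index, master_data):
    """Return (item, source) following the four steps of the method."""
    first_key, cached_item = cache_db[item_id]       # key held with the cached copy
    second_key, location = master_index[item_id]     # key held with the index entry
    if first_key == second_key:
        return cached_item, "cache"
    return master_data[location], "master"


# Invented sample data: index entries are (key, location) pairs.
master_data = {0: {"flight": "PAR-0900", "seats": 4},
               1: {"flight": "ROM-1100", "seats": 9}}
master_index = {"PAR-0900": (17, 0), "ROM-1100": (23, 1)}
cache_db = {"PAR-0900": (17, {"flight": "PAR-0900", "seats": 4}),
            "ROM-1100": (21, {"flight": "ROM-1100", "seats": 12})}

paris = retrieve("PAR-0900", cache_db, master_index, master_data)
rome = retrieve("ROM-1100", cache_db, master_index, master_data)
```

In this sketch the master data itself is consulted only for the Rome entry, whose keys differ; the Paris entry is served from the cache after an index lookup alone.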
In accordance with a third aspect, the present invention provides a database fileserver apparatus comprising:
input means for receiving a conditional read request for an item of data stored in the database, the request including a first key from a previously retrieved copy of the item of data;
means for accessing an index of the database and reading an index entry for the requested item of data, the index entry including a second key for the stored item of information;
means for comparing the first and second keys; and
means, if the keys are the same, for returning an indication that the previously retrieved copy of the item of data is consistent or, if the keys are different, for reading from the database and returning a copy of the item of data.
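Purely as a sketch, the conditional read request and the two possible replies might be modelled as simple message types. The names `ConditionalRead`, `Reply` and `Fileserver` are hypothetical, and returning the current key together with the data is an assumption made here so that a client could refresh its cached key; the text above does not specify the reply format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionalRead:
    item_id: str
    first_key: int               # key from the previously retrieved copy


@dataclass
class Reply:
    consistent: bool
    item: Optional[dict] = None
    key: Optional[int] = None    # assumption: the current key travels with the data


class Fileserver:
    def __init__(self, index, data):
        self.index = index       # item_id -> (second_key, location)
        self.data = data         # location -> item
        self.data_reads = 0      # counts reads of the data table itself

    def handle(self, request: ConditionalRead) -> Reply:
        second_key, location = self.index[request.item_id]
        if request.first_key == second_key:
            return Reply(consistent=True)         # data table is not touched
        self.data_reads += 1
        return Reply(False, self.data[location], second_key)


server = Fileserver({"PAR-0900": (17, 0)}, {0: {"seats": 4}})
fresh = server.handle(ConditionalRead("PAR-0900", 17))
stale = server.handle(ConditionalRead("PAR-0900", 12))
```

Only the second request, carrying an out-of-date key, causes the data table to be read in this sketch.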
In accordance with a fourth aspect, the present invention provides a database index, wherein at least one index entry in the index includes at least:
identity information for identifying an item of data in the database;
location information for indicating the location in the database of the item of data; and
version information which changes each time the respective data in the database changes.
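An index entry of this kind could be represented, as a rough sketch, by a record with three fields. The field and class names below are hypothetical, and a simple integer is assumed for the version information; the sketch shows the version being changed whenever the corresponding data is changed.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    identity: tuple      # e.g. (destination, week commencing)
    location: int        # position of the row in the data table
    version: int = 0     # changes each time the row changes


class Database:
    def __init__(self):
        self.rows = []
        self.index = {}

    def insert(self, identity, row):
        self.rows.append(row)
        self.index[identity] = IndexEntry(identity, len(self.rows) - 1)

    def update(self, identity, row):
        entry = self.index[identity]
        self.rows[entry.location] = row
        entry.version += 1   # keep the index entry in step with the data


db = Database()
db.insert(("PAR", 25), {"seats": 4})
before = db.index[("PAR", 25)].version
db.update(("PAR", 25), {"seats": 3})
after = db.index[("PAR", 25)].version
```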
Embodiments of the invention are particularly suited to use in a multiple-user computing environment, for example a client/server environment.
A key, or version information, preferably comprises a time stamp which relates to the last time its respective data was up-dated. Alternatively, however, a key might be some other indicator which changes each time the respective data changes. For example, a key might comprise a version number, generated by an incremental counter, which is incremented whenever the respective data is amended.
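The two kinds of key mentioned above might be generated as in the following sketch. The function and class names are hypothetical; the time stamp here is taken from the system clock in nanoseconds, which is only one possible choice.

```python
import itertools
import time


def time_stamp_key():
    """Key derived from the clock, in nanoseconds since the epoch."""
    return time.time_ns()


class CounterKey:
    """Key derived from an incremental counter, one per item of data."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.value = 0

    def bump(self):
        """Advance the counter; called whenever the item is amended."""
        self.value = next(self._counter)
        return self.value


key = CounterKey()
first = key.bump()
second = key.bump()
stamp = time_stamp_key()
```

One design consideration is that a clock-based key may, in principle, repeat if two amendments fall within the resolution of the clock, whereas a counter changes on every amendment by construction.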
Advantageously, having keys associated with entries in the index of the master database obviates the need to read the actual master data. Hitherto, use of keys such as time stamps has been limited to associating keys with actual master data, and not with the index entries therefor. Regardless of whether the keys were the same, therefore, the data would have needed to be read from the master database for the purposes of comparison.
In some embodiments, the master database is stored both in an external storage device and in main memory of a
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, of which:
Figure 1 is a block diagram of an exemplary, two-tier client/server computing environment suitable for implementing an embodiment of the present invention;
Figure 2 is a diagram which represents a typical data structure for a database system;
Figures 3a to 3c are diagrammatic representations of the row and index structures of a database for two prior art systems and for an embodiment of the present invention respectively;
Figure 4 is a flow chart representing a typical database transaction;
Figure 5 is a flow chart which represents a typical database read process;
Figure 6 is a flow chart which represents a database read process modified in line with an embodiment of the present invention;
Figure 7 is a flow chart which represents a database write process;
Figure 8 is a diagram representing a three-tier architecture on which an embodiment of the present invention can be implemented;
Figure 9 is an alternative arrangement to Figure 8;
Figure 10a is a diagram which illustrates a system for testing the performance of an embodiment of the present invention;
Figure 10b is a diagram which illustrates a system for establishing a set of base performance results;
Figure 11 is a graph which shows the CPU overhead for different caching or non-caching scenarios;
Figure 12 is a graph which shows the CPU overhead for communications processing only for different caching or non-caching scenarios;
Figure 13 is a graph which shows the CPU overhead for database access only for different caching or non-caching scenarios;
Figure 14 is a graph which illustrates the number of read accesses required to retrieve data from a database for different caching or non-caching scenarios; and
Figure 15 is a graph which illustrates how database access performance can be improved by maintaining index data in main memory rather than on an external storage device.
Figure 1 illustrates an exemplary, two-tier client/server computer environment in which an embodiment of the present invention may be implemented.
A fileserver 100, for example a computing platform running the UNIX operating system, runs network and database management system (DBMS) software suitable for providing remote database access to a plurality of clients 130 over a network 140. In this description, unless otherwise stated, the term "client" will be used to describe both the physical computer 130 connected to the network 140 and an operator of the computer. In a local area network environment, a suitable network might be an Ethernet network running TCP/IP software, and suitable DBMS software might be Oracle version 7. DBMS software allows creation and control of complex database systems and access to the data therein via appropriate structured query language (SQL) calls. For further information on such systems, the reader is referred to the numerous texts and reference manuals on database systems which are widely available.
The fileserver 100 comprises standard computer components such as a processor 102, main memory 104 and input/output (I/O) hardware 106, all connected by suitable data and address buses 108. The fileserver supports, typically, a large capacity (for example 10 Gbytes) external storage device such as a hard disk 120, on which substantially all the information in a master database is stored.
The main memory 104 of the fileserver is split into areas for standard program and data storage. Also, for the example of running Oracle database software, the main memory includes a large area of memory known as a system global area (SGA) 105, the purpose of which will be explained below. In this description, the terms "fileserver", "database" and "DBMS" may be used interchangeably and all refer, in general, to a master data storage system to which clients have data access.



As illustrated in Figure 3a, an index 225 includes a first index field 310, for example containing a destination, a second index field 311, for example containing a week commencing value, and a third index field 312 which is a pointer to where the corresponding data row 210 is held in the data table 204. In this example, therefore, flights can be searched by destination and in a particular week of the year. An example search query might specify all flights to Paris in the 25th week of the current year. The DBMS, in response, would search the index for all flights to Paris in the week commencing 10 December 1995 and return the corresponding rows to the requesting client. Obviously, other search criteria, for example by specific date, would require an index to have a specific date field instead of a week commencing field. In general, there typically needs to be a different index for each different potential search criterion.
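The index arrangement just described can be illustrated with a short sketch in which each index entry maps a destination and a week value to pointers into the data table. The sample rows, flight labels and the `search` function are invented for illustration, and a Python list position is assumed to play the role of the pointer field 312.

```python
from collections import defaultdict

# Invented sample data table; list positions act as row pointers.
data_table = [
    {"destination": "PAR", "week": 25, "flight": "F1"},
    {"destination": "ROM", "week": 25, "flight": "F2"},
    {"destination": "PAR", "week": 25, "flight": "F3"},
    {"destination": "PAR", "week": 26, "flight": "F4"},
]

# Build the index: (destination, week) -> list of pointers into data_table.
index = defaultdict(list)
for pointer, row in enumerate(data_table):
    index[(row["destination"], row["week"])].append(pointer)


def search(destination, week):
    """Look up the index first, then follow the pointers to the rows."""
    return [data_table[p] for p in index.get((destination, week), [])]


paris_wk25 = [row["flight"] for row in search("PAR", 25)]
no_match = search("NYC", 25)
```

A search by some other criterion, such as a specific date, would need a second index keyed on that field, which is the point made in the text above.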
The cache database 136 comprises a storage area 230 for cached data. The cached data typically comprises data rows 235 copied from the master database 126. The rows 235 themselves are typically exact copies of the rows 210 in the data table 204, although this need not be the case. Thus, the row configuration illustrated in Figure 3a is common to both the cache and the master database data rows.
The cache database 136 may or may not have a corresponding index, depending on the size and acceptable access speed of the cache database. For the present purposes, it will be assumed that the cache database has no index and is therefore searched by sequential access.
The SGA memory 105 on the fileserver 100 is located predominantly, or preferably wholly in a well-tuned system, in main memory 104. The SGA 105 is a form of internal cache used by Oracle to speed up master database access for clients 130. The existence and operation of the SGA 105 is typically invisible to users or clients of the system. DBMSs other than Oracle implement similar schemes.
Whenever a client requests an item of data, the fileserver 100 looks initially in the SGA 105, then in the master database 126, for the respective index entry and then looks in the SGA, followed by the master database, for a copy of the row or rows. If either the index entry or the respective master data is not present in the SGA 105, the fileserver accesses the master database 126 and copies the index entry or the data to the SGA 105. When a row needs to be copied from the master database 126 to the SGA 105, in fact, the whole page, or block, containing that row is copied to the SGA 105. From there, any access of the required row or rows is made via the SGA 105. In this way, over time, the most commonly accessed and/or dynamic data tends to reside in the SGA 105 and, since access to main memory 104 can be in excess of an order of magnitude faster than hard disk 120 access, the overall average access time to the database is reduced. In the present example, for ease of description only, it will be assumed that only specific rows are cached, rather than pages.
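The look-up order described above, first the in-memory area and then the external storage device, with a copy retained in memory after a miss, can be sketched as follows. This is a simplified, hypothetical model and not a description of how Oracle implements the SGA; the class name, key strings and counter are assumptions.

```python
class Storage:
    """Hypothetical model of main-memory copies in front of a disk."""

    def __init__(self, disk_pages):
        self.disk = dict(disk_pages)   # external storage device
        self.sga = {}                  # in-memory copies
        self.disk_reads = 0

    def read(self, key):
        if key not in self.sga:
            # Miss: fetch from disk and keep a copy in memory.
            self.disk_reads += 1
            self.sga[key] = self.disk[key]
        return self.sga[key]


storage = Storage({"index:PAR": 7, "row:7": {"seats": 4}})
for _ in range(3):
    pointer = storage.read("index:PAR")    # index entry first
    row = storage.read(f"row:{pointer}")   # then the row it points to
```

After the first pass both the index entry and the row are held in memory, so the two later passes in this sketch cause no further disk reads.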
Presently, main memory is limited in capacity by far more than external storage capacity. Therefore, the SGA 105 tends only to contain the most recently-used data. Also, since main memory 104 tends to comprise RAM, which is volatile, the database needs to be updated regularly (for example on a daily basis) with any changes to data held in the SGA 105. In effect, the SGA 105 forms an integral part of the master database and acts in a similar way to a 'write back' cache insofar as up-dated data is periodically written back to the master database on the disk drive 120.
As has already been stated, typically, a cache database 136 has a cache table arranged to hold data rows 235 which are copies of the respective data rows from the master database 126.
For ease of understanding, in the following description, data access or write denial, due to the existence of a lock (which is described in brief above), are not considered. It should be remembered, however, that in practice a lock might prevent a desired data access on the fileserver and that the DBMS usually deals with such a situation in an appropriate manner, for example by freezing the operation of the denied client until the lock is removed.
The two main transactions carried out by a client in a database environment are 'read' data and 'up-date', or 'write', data: create or write data will for the present purposes be treated in the same way as up-date data. A combination of read and write typically forms the basis for client/server database transactions.
An example of a basic database operation, ignoring data caching, is illustrated in the flow chart in Figure 4.
In Figure 4, a client 130 transmits a query, in step 400, to read data from the master database 126. In step 410, the fileserver 100 receives the query. The fileserver 100 returns the information, in step 420, to the client 130. The client then, in step 430, makes a decision on the basis of the data that has been returned. An example of a decision might be which flight to book, from a number of available flights to Paris, on the basis of price. In step 440, a transaction is initiated by the client 130 to book a seat on a specific flight by transmitting the appropriate data to the fileserver 100. The data might include, for example, customer details of the person who wishes to book the seat on the flight to Paris and a write request to reduce seat availability on that flight accordingly. On receipt of the data, in step 450, the fileserver 100 searches for and locks the data to be up-dated, by creating an entry in a lock table. Then, in step 460, the fileserver 100, if possible, up-dates the flight data, which is read from a data table. In step 470, the fileserver 100 commits the up-dated flight data and the new customer data to respective data tables in the database. The data is unlocked by removing the lock table entry, in step 480, to allow access by other clients to the data. Finally, in step 490, the fileserver 100 transmits a status response to the client that the transaction has been successful.
Figures 5 and 6 illustrate in flow chart form respective database read procedures, taking account of database cacheing, stimulated by a suitable read query for systems operating without and with a time stamp in the index entries respectively. In both examples, only one data row is requested but it will be appreciated that queries for multiple rows are a possibility. Also, it is assumed that a cache copy of the data already exists in the cache database 136 and that both cache copies of data and master copies of data include time stamps which reflect the last up-date time for the respective data.
The following description of the process illustrated in Figure 5 assumes that the data rows and index entries have the structure shown in Figure 3b.
In Figure 5, in step 500, the client 130 transmits a query, across the network 140 to the fileserver 100, to read data from the fileserver. The query includes an identifier for the data row required and a respective time stamp. In step 505, the fileserver 100 receives the query. The fileserver 100, in step 510, accesses the SGA 105 in search of the index 225 for the data in question and retrieves the index if it is present. In step 515, if the index 225 is present in the SGA 105 then, on the basis of the index value, the fileserver 100 accesses the data table 204 of the master database 126 to retrieve the data page 215 (which may be in the SGA or on the external storage device) containing the row 210, in step 523. On the other hand, if the index 225 is not present in the SGA 105, then, in step 520, the fileserver 100 accesses the index 220 on the external storage device and copies the index entry to the SGA 105. Then, in step 523, the fileserver 100 accesses the respective data page 215 of the master database 126 to retrieve the data. At every stage, any data accessed which is not in the SGA 105 is copied to the SGA. In step 525, the fileserver 100 compares the time stamps of the cached data row with the master data row requested. If the time stamps are the same, the fileserver 100, in step 540, sends a reply to the client 130 that the cached data is still valid, or, in step 535, if the time stamps are different, the fileserver 100 transmits the entire data row 210 (or only the requested columns of the data row) to the client 130 to update the cache database 136. In step 545, the client receives the response and acts accordingly.
The following description of the process illustrated in Figure 6 assumes that the data rows and index entries have the structure shown in Figure 3c.
In Figure 6, steps 600 to 620 are equivalent to steps 500 to 520 in Figure 5. In step 625, the fileserver 100 compares the time stamp in the index entry with that of the query. If the time stamps are the same, in step 640 the fileserver 100 transmits a message to the client 130 that the cache copy is still valid. On the other hand, if the time stamps are not the same, the fileserver 100 accesses the master database 126, in step 630, to retrieve the current data row. The current data row is then returned to the client 130 to update the cache database 136 in step 635. In step 645, the client 130 receives whatever data is returned by the fileserver 100 and acts accordingly.
The advantages and disadvantages of the method and database arrangement of the invention can be appreciated by comparing the flow charts in Figures 5 and 6, and by considering the field lengths shown in Figures 3a to 3c.
Obviously, using the row configurations illustrated in Figure 3a has the disadvantage that potentially a whole row of data would need to be passed from the client 130 to the fileserver 100 to check for data consistency. However, using the configuration of Figure 3b, there is only a requirement to transmit a reference indicating the required row and a time stamp. Thus, in theory, the transmission overhead between Figures 3a and 3b is cut by 107 bytes (that is the difference between a whole row of 120 bytes and a reference, comprising a destination field of 3 bytes, a week commencing field of 2 bytes and a time stamp field of 8 bytes, i.e. 120 - 13 = 107 bytes). With reference to Figures 5 and 6, which use the data configurations of Figures 3b and 3c respectively, initially the transmission overhead from the client 130 to the fileserver is the same. That is, the steps 500 to 520 and steps 600 to 620 are equivalent. The first difference is between steps 525 and 625. In step 525, the fileserver 100 reads the whole data page containing the required data row from the data table 204 to enable comparison of the data row 210 time stamp with the time stamp in the query. Therefore, in Figure 5, the fileserver 100 must read the whole page containing the required row to enable a comparison of the time stamps.
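The byte arithmetic quoted above can be checked with a few lines. The field sizes are those stated in the text; the variable names are illustrative only.

```python
ROW_BYTES = 120   # whole data row, as stated in the text

# Fields making up the reference sent instead of the whole row.
reference_fields = {"destination": 3, "week_commencing": 2, "time_stamp": 8}

reference_bytes = sum(reference_fields.values())
saving_bytes = ROW_BYTES - reference_bytes
```

This gives a 13 byte reference and a saving of 107 bytes per consistency check.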
In contrast, in step 625 the fileserver 100 compares the query time stamp with the index entry time stamp. The fileserver 100 only needs to read the whole data page containing the row in the event the time stamps are different. Consequently, external storage device 120 access is generally not required if the time stamps are the same and the index entries in question are in the SGA 105. Therefore, in theory, the database access times and the processing overhead of the fileserver 100 can be reduced and the response times for clients 130 can be reduced.
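The difference between the two read procedures can be sketched by counting data page reads. The model below is a deliberate simplification with hypothetical names: the index is assumed always to be in memory, each row stands in for its page, and a return value of `None` stands for the "cache copy is still valid" reply.

```python
class Fileserver:
    def __init__(self, rows):
        # rows: row_id -> (time_stamp, data)
        self.data_table = dict(rows)
        # Figure 3c style index: the time stamp is repeated in the index entry.
        self.index = {row_id: stamp for row_id, (stamp, _) in rows.items()}
        self.page_reads = 0

    def _read_page(self, row_id):
        self.page_reads += 1
        return self.data_table[row_id]

    def read_row_stamp(self, row_id, query_stamp):
        """Figure 5 style: the page is read before the stamps are compared."""
        stamp, data = self._read_page(row_id)
        return None if stamp == query_stamp else data

    def read_index_stamp(self, row_id, query_stamp):
        """Figure 6 style: the page is read only if the stamps differ."""
        if self.index[row_id] == query_stamp:
            return None
        return self._read_page(row_id)[1]


rows = {"PAR-0900": (17, {"seats": 4})}
fig5, fig6 = Fileserver(rows), Fileserver(rows)
for query_stamp in (17, 17, 17, 12):         # three valid caches, one stale
    fig5.read_row_stamp("PAR-0900", query_stamp)
    fig6.read_index_stamp("PAR-0900", query_stamp)
```

With these four invented queries the Figure 5 style procedure reads the data page every time, whereas the Figure 6 style procedure reads it only for the one stale query.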
A disadvantage of the proposed method is the increased size required by the index. By considering the index arrangements of Figures 3b and 3c, it can be seen that there is about a 70% increase in index size. Therefore, theoretically, the index size will increase by nearly 70% and the search time might increase proportionally. However, in practice, indexes are more complex and typically include many more index entries to allow more flexible query options. For example, a typical index might allow rows to be searched on date and flight number as well as destination and week commencing. Thus, the addition of a time stamp might not incur such a large overhead. In general, it can be seen that the overhead of adding a time stamp to the index depends on the size and complexity of the index.
In some embodiments, it is only necessary to add time stamps to index entries for data which is classed as dynamic, such as, for example, flight seat availability, whereas it would not be necessary to add time stamps to index entries for static data such as, for example, flight number and departure date data. In this way, the time stamp overhead in the index may be reduced. Obviously, the SQL software and the DBMS would then need to deal with both possibilities.
While the example of using time stamps has been explained in detail, it will be appreciated that other methods for marking data rows to enable data consistency comparisons are possible. For example, an incremental counter field might be used instead of a time stamp. Each time a data row is amended, the count value in the field would be incremented by a process on the fileserver. Thus a comparison between the increment value of a cache data row and the increment value for an index entry for the master data row would highlight any changes to the master data row, and thus any inconsistencies with the cache database row. Other comparable methods of identifying inconsistent data, for example using coding to produce unique codes representative of the specific state of a row, will be apparent to the skilled addressee on reading this description.
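One possible reading of "coding to produce unique codes representative of the specific state of a row" is a hash of the row contents. The sketch below assumes that reading; the choice of SHA-256, the canonical field ordering and the function name `row_code` are all assumptions rather than anything specified in the text. Strictly, a hash code is not guaranteed unique, although collisions are extremely unlikely with a hash of this size.

```python
import hashlib


def row_code(row):
    """Derive a code from the row's state; field order does not matter."""
    canonical = "|".join(f"{name}={row[name]}" for name in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


cached = {"flight": "F1", "seats": 4}
master_same = {"seats": 4, "flight": "F1"}       # same state, different order
master_changed = {"flight": "F1", "seats": 3}    # one seat has been sold

same = row_code(cached) == row_code(master_same)
changed = row_code(cached) != row_code(master_changed)
```

Such a code could be stored in the index entry in place of the time stamp and compared in the same way.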
In some embodiments of the invention, the SGA 105 is large enough to contain, at least, the whole index 206 permanently. This is highly desirable since all index accesses for reading or writing use only main memory 104 access and never have to access the index 220 in the external storage device 120. It is envisaged that this will become common practice in future when memory becomes
In any database system which allows data to be amended by a client, it is important for other clients to be able to access the updated data, if necessary, for carrying out a transaction. This would certainly be the case in the flight-booking scenario described above. In some scenarios, however, obtaining a current data value is not so important. This might be the case, for example, in a supermarket using point-of-sale (POS) terminals, or tills, which log all product sales to a database in the supermarket to facilitate automatic stock monitoring and ordering. In this case, each time a can of beans is sold the till logs the sale to the database, which acts as a cache database to a master database at the supermarket chain's headquarters. It would not be important for each till to have knowledge of the number of cans of beans remaining in the supermarket or indeed the number of cans of beans stocked by the supermarket chain. Therefore, up-to-date data consistency between the cache database and the headquarters master database would be unnecessary. All data could be written back (from the cache database to the master database) on a daily basis, which would be adequate to accommodate stock control for the supermarket chain.
For the present purposes, however, substantially only systems requiring consistent data, for example flight booking systems, will be considered in connection with up-dating data.
One solution for ensuring data consistency is to pass all changes back to the master database immediately they are made. The data up-dating process is explained below in conjunction with the flow chart in Figure 7.
According to Figure 7, in step 700 the client 130 transmits a write request to the fileserver 100 to up-date a specific data row. The request includes a copy of the new, up-dated row, and the time stamp of a previously-cached copy of the row. The fileserver 100 receives the request in step 705. In step 710, the fileserver 100 locks the data row in question from access by other clients by placing an entry for the row in a lock table. Then, in step 715, the fileserver 100 compares the index entry time stamp for the specified row with the time stamp in the request. If the time stamps are the same then the master data row can be over-written with the up-dated version of the row, since clearly no changes have occurred to the master data row since the original cached copy was made. If, on the other hand, the time stamps are different, then usually the master data row cannot be over-written because clearly the master data row has changed since the original cached copy was made and, accordingly, the up-dates to the cached data row were made on the basis of inconsistent data.
In some cases, however, it is still valid to over-write the master data with the up-dated data even if the time stamps are different. One example of when an up-date would be possible, even if the cached data was not current, is if a client wants to book four seats on a flight, the cache database 136 reports that 100 seats are available but the master database 126 reports that only 90 seats are available: even though the cached data is inconsistent, the transaction is still possible. If, however, a client wants to book ten seats on a flight, the cache database 136 reports that twelve are available but the master database 126 reports that only three seats remain, then the transaction cannot be completed. The DBMS software would in practice carry out a test to see whether a transaction is possible or not.
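The kind of test the DBMS software might carry out can be sketched as follows. This is a hedged illustration in Python: the function name can_book and the rule that at least one seat must be requested are assumptions made for the example, while the seat figures are those of the two cases just described.

```python
def can_book(seats_requested, master_seats_available):
    """Return True if the booking is still possible against the master row.

    The comparison uses the master value, not the (possibly stale) cached one.
    """
    return seats_requested >= 1 and master_seats_available >= seats_requested


# Cache reported 100 seats, master reports 90: four seats can still be booked.
stale_but_possible = can_book(4, 90)

# Cache reported twelve seats, master reports three: ten seats cannot be booked.
stale_and_impossible = can_book(10, 3)
```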
If in step 715, the fileserver 100 determines that the master data row can validly be over-written, in step 720 the row is over-written and a new time stamp for the row is generated and incorporated. Then, in step 725, the time stamp in the corresponding index is amended to the same value. In step 730, the amended row and index are committed to the SGA 105 of the master database (periodically, or when the SGA 105 becomes full, the new master data rows and index entries are written back to the external storage device 120). In step 735, the data row is unlocked by removing the lock table entry to make it available for access by other clients. The fileserver 100 then, in step 740, transmits a response to the client to acknowledge that the transaction has been completed. The response includes a time stamp value which matches the duly assigned time stamp of the amended row and index. The client, in step 745, receives the response, and in step 750 amends the row and updates the row time stamp with the new time stamp. In step 755, the client 130 writes the row to the cache database 136.
If in step 715 the required amendment cannot be made, in step 760, the fileserver 100 unlocks the row and transmits, in step 765, a suitable response to the client 130. A copy of the current master row is included in the response. In step 770, the client receives the response and, in step 775, updates the cache database 136 with the current row. In step 780, the client is left with the task of handling the discrepancy.
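The write procedure of Figure 7 can be sketched in Python as shown below. This is a simplified illustration under stated assumptions, not the described implementation: the class Fileserver, the row identifier "row-1" and the use of a simple counter in place of a real time stamp source are all hypothetical; the commit to the SGA in step 730 is omitted; and the sketch always refuses a write when the time stamps differ, without the further validity test that may allow some such writes.

```python
import itertools


class Fileserver:
    """Toy fileserver: master rows, an index of time stamps and a lock table."""

    def __init__(self):
        self.rows = {}        # row id -> row data
        self.index = {}       # row id -> time stamp held in the index entry
        self.locks = set()    # lock table: ids of rows currently locked
        self._clock = itertools.count(1)  # stand-in for a time stamp source

    def insert(self, row_id, data):
        self.rows[row_id] = data
        self.index[row_id] = next(self._clock)
        return self.index[row_id]

    def write(self, row_id, new_data, cached_stamp):
        """Steps 705 to 765: returns (ok, time stamp, current master row)."""
        if row_id in self.locks:
            raise RuntimeError("row is locked by another transaction")
        self.locks.add(row_id)                            # step 710
        try:
            if self.index[row_id] == cached_stamp:        # step 715
                self.rows[row_id] = new_data              # step 720
                self.index[row_id] = next(self._clock)    # step 725
                return True, self.index[row_id], new_data       # step 740
            return False, self.index[row_id], self.rows[row_id]  # step 765
        finally:
            self.locks.discard(row_id)                    # steps 735 and 760


server = Fileserver()
stamp = server.insert("row-1", {"seats": 100})
cache = {"row-1": ({"seats": 100}, stamp)}

# First client: its cached time stamp matches, so the write is accepted.
ok_first, new_stamp, row = server.write("row-1", {"seats": 96}, stamp)
cache["row-1"] = (row, new_stamp)                         # steps 745 to 755

# Second client still holds the old time stamp: refused, current row returned.
ok_second, current_stamp, current_row = server.write("row-1", {"seats": 98}, stamp)
```

The second client receives the current master row and time stamp in the refusal, which it can write to its own cache before deciding how to handle the discrepancy.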
The write procedure described above ensures that all clients 130 are able to access current data if necessary. However, the procedure does incur an extra processing overhead on the fileserver 100 above that which would normally be required in a prior art system. The extra overhead is in step 725, in which the fileserver 100 has to write to the index 220 to update the index time stamp: normally, the index 220 is only written to if the row data in the data table 215 is re-arranged for some reason, for example a new record is created or an old record is deleted.
It is envisaged, however, that as SGAs increase in size and more index entries are stored in SGAs, this overhead will decrease in significance.
In some embodiments, rows of data may be arranged into a set which is stored in one or more pages of a database. This might be the case if a client commonly deals with several rows of data at the same time on a regular basis. Then, the client defines a single query which reads the set of rows, rather than requiring a separate query for each row each time. The fileserver 100 has a matching set definition to support this facility. Each row in the set has its own time stamp. Thus, the overall time stamp of the set can be taken to be the latest time stamp of any row in the set. In this way, the database system need only compare set time stamps instead of the time stamps of each row to determine if the data is current. This raises the problem, however, that if another client deletes a row which is included in the set of another client, no latest time stamp is registered: the row is removed so there is nowhere for the time stamp to go. Therefore, the set time stamp remains the same even though the set has been amended (by deletion).
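The set time stamp and the deletion problem can be illustrated with a short Python sketch. The function name set_time_stamp, the row identifiers and the time stamp values are invented for the example; time stamps are modelled as plain integers.

```python
def set_time_stamp(row_stamps):
    """Overall time stamp of a set: the latest time stamp of any row in it."""
    return max(row_stamps.values())


rows = {"r1": 10, "r2": 25, "r3": 17}   # row id -> time stamp (made-up values)
before = set_time_stamp(rows)

rows["r3"] = 30                          # an amendment moves the set time stamp
after_amend = set_time_stamp(rows)

del rows["r1"]                           # deleting an older row does not
after_delete = set_time_stamp(rows)
```

Here the amendment changes the set time stamp from 25 to 30, whereas the subsequent deletion leaves it at 30, so a comparison of set time stamps alone would not reveal that the set has changed.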
One solution to this problem is to use some form of cyclic redundancy check (CRC) algorithm to ensure that the CRC of some property of the rows in the set or the index entries to the rows in the set is unchanged. The code could be incorporated into the time stamp field itself, or into a specific field. For example, a 32 bit CRC check would give a 1 in 4,294,967,296 chance of an undetected error.
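A minimal sketch of such a check is given below, using the CRC-32 function from Python's standard zlib module. The choice of property to checksum (the sorted pairs of row identifier and time stamp) and the function name set_crc are assumptions made for illustration; the description leaves the checked property open.

```python
import zlib


def set_crc(row_stamps):
    """CRC-32 over the sorted (row id, time stamp) pairs of a set."""
    payload = ";".join(f"{rid}:{ts}" for rid, ts in sorted(row_stamps.items()))
    return zlib.crc32(payload.encode("utf-8"))


rows = {"r1": 10, "r2": 25, "r3": 30}   # made-up row ids and time stamps
crc_before = set_crc(rows)

del rows["r1"]                           # deletion leaves the latest stamp at 30
crc_after = set_crc(rows)

distinct_crc_values = 2 ** 32            # 4,294,967,296 possible 32 bit codes
```

Unlike the latest time stamp of the set, the code changes when a row is deleted, because the deleted row no longer contributes to the checksummed data.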
So far in this description the invention has been described in relation to a, so called, two tier computing environment in which a plurality of clients have access to a master database. The present invention is, however, not limited in any sense to use on such a system; for example, the present invention can be implemented on multi-tier systems. One such multi-tier system is shown in Figure 8.
In Figure 8, a back-end fileserver 800 supports a master database on a hard disk drive 805. The fileserver 800 is equivalent in operation to the fileserver 100 in Figure 1. The fileserver might be an IBM mainframe computer. The back-end fileserver 800 is connected by a network 810 to mid-tier fileservers 815 which each have a hard disk drive 820 which includes a storage area for a cache database. The mid-tier fileservers might be UNIX-based computing platforms. The network 810 is an international network comprising private circuit connections supporting an ATM protocol. The mid-tier fileservers 815 operate, in relation to the back-end database 800, in a similar fashion to the clients 130 of Figure 1.
To each mid-tier fileserver 815 there is or are attached, via suitable networks 825, one or more clients 830. Suitable networks might comprise national networks such as, for example, a packet-switched data network. A client might be, for example, an IBM-compatible PC.
In this example, the clients 830 have access to the data in the back-end fileserver 800 via the mid-tier fileservers 815. The clients 830 are unaware of the relationship between the mid-tier 815 and the back-end 800 fileservers: the relationship is said to be transparent to the clients.
An example of when such a system might be implemented is for an international company with local main offices in London (mid-tier fileserver A) and Hong Kong (mid-tier fileserver B) and a head office in Munich (mid-tier fileserver C and back-end fileserver 800). The clients 830 are based in respective regional offices which are connected to the local main offices via the national network 825.
In this example, the company's master database would be held on the back-end fileserver 800 at the head office. Thus, access by the clients 830 to static data would be direct from the mid-tier fileserver 815 cache databases, onto which the necessary data is up-loaded regularly (for example, on a daily basis), or when required. Access to the more dynamic data would be from the back-end fileserver 800, via the mid-tier fileservers 815, using the reading and writing processes described in relation to Figures 6 and 7 respectively. Thus, international direct access by the clients 830 to the back-end fileserver 800 is restricted to that required to carry out transactions based critically on dynamic data. Again, a flight booking is a good example of such a transaction.
Another possible system on which the present invention may be implemented is that illustrated in Figure 9.
The arrangement illustrated in Figure 9 is a variation on Figure 8. The difference is the inclusion of a middleware system 908. The middleware system 908 is, in effect, an interface between the back-end fileserver 900 database and the mid-tier fileservers 915. The middleware system 908 typically resides on a separate computing platform to the fileserver 900. The middleware system 908 converts the format of data available from the back-end database, for example a legacy database system, to a format of data readable by the mid-tier fileservers 915 and their clients 930. Such an arrangement is also useful if the data required by, or the technology of, the mid-tier or client systems changes. Then, only the middleware system needs to be upgraded to accommodate the changes, rather than there being a need to up-grade the, typically, very expensive and complex, back-end database system.
Another variation on the system illustrated in Figure 8 is to incorporate a middleware system as a process on each mid-tier fileserver. This has the advantage of removing the need for a dedicated middleware computing platform, but suffers with the disadvantage that more than one middleware process needs to be amended in the case of an up-grade.
A further variation on the systems illustrated in Figures 8 and 9 is that one or more of the clients 830 and 930 have their own cache databases on their hard disk drives 835 and 935. In this way, it would be possible for the clients to hold copies of data, typically static data, from the mid-tier or back-end databases. However, dynamic data could be stored in a client's cache database as long as cache coherency were maintained, for example by using the read and write procedures illustrated in Figures 6 and 7 respectively.
Figure 10a illustrates a test harness for testing a three-tier embodiment of the present invention. The harness comprises a back-end fileserver 1000 which supports a master Oracle database 1010 stored on an external disk drive device: all master database access is controlled by the back-end fileserver 1000. The master database 1010 holds the master version of the data. A mid-tier fileserver 1020 is connected to the back-end fileserver via a ten Mbit/s 802.3 10baseT communications link. The mid-tier fileserver 1020 supports an Oracle cache database 1030 which stores data persistently on an external disk drive, but which also stores recently used data in main memory. A client 1040 is connected to the mid-tier fileserver 1020 and has a test result log 1050. Although the client 1040 is shown to be logically separate, it actually resides on the mid-tier fileserver for testing purposes.
The mid-tier fileserver 1020 controls both the cacheing process and accesses to the back-end fileserver 1000. The client 1040 controls the performance tests which are described below, and collates and stores accumulated test results.
The test harness uses Orbix (TM) as the communications mechanism for carrying information between the back-end fileserver 1000 and the mid-tier fileserver 1020. The mid-tier fileserver 1020 and the back-end fileserver 1000 are both SUN (TM) SPARCstation 20 computer systems. In practice, however, whilst the mid-tier fileserver 1020 might well be a SPARCstation 20, the back-end fileserver 1000 would more likely be a large Unix (TM) based fileserver or even an IBM mainframe fileserver. The mid-tier fileserver 1020 and the back-end fileserver 1000 use Pro*C (TM) as the interface between Oracle (TM) and the cache control mechanisms, which are software routines written in the C and the C++ programming languages. The back-end fileserver 1000 has an SGA (not shown) which is held in virtual memory, which will usually be in main memory (RAM) in a well-tuned system. Each read access made to the master database 1010 has the effect of copying the data read from the master database to the SGA, if it is not already present in the SGA. The implications of this are discussed in more detail below.
All tests were based on a 50 MB table stored in the master database 1010. The table comprised 260,000 rows of data. Each row was arranged as follows:

ID          version     data (x200 columns)
8 digits    8 digits    1 character each
Each row had an 8-digit numeric identity (ID), an 8-digit numeric version and 200 data columns, each data column containing one character.
Each row had a corresponding index entry, although, as has already been mentioned, the index for Oracle may be arranged as a B-Tree optimised for rapid searching on the "key" fields in the data rows. One advantage of the B-Tree approach is that search time does not increase rapidly with size of indexed table.
The index is created so that, for each row referenced, the index contains both row address and version number (or time stamp). The effect of this arrangement is that whenever the client (or mid-tier fileserver) requests the version number of a row stored in the master database, the back-end fileserver retrieves the index for the row and, in doing so, finds the version number for the row within the index. Recognising the version number in the index, the back-end fileserver does not access the actual row of data to retrieve the version number.
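The effect of holding the version number in the index can be sketched as follows. This Python fragment is illustrative only: the class BackEnd and its method names are hypothetical, the index is a plain dictionary rather than a B-Tree, and a counter named row_reads stands in for accesses to the data table.

```python
class BackEnd:
    """Toy back-end: the index holds (row address, version) for each row ID."""

    def __init__(self):
        self.table = []       # row address -> row data
        self.index = {}       # row id -> (row address, version)
        self.row_reads = 0    # counts accesses to the data table itself

    def store(self, row_id, data, version):
        self.index[row_id] = (len(self.table), version)
        self.table.append(data)

    def conditional_read(self, row_id, cached_version):
        address, version = self.index[row_id]   # index access only
        if version == cached_version:
            return version, None                # cached copy is consistent
        self.row_reads += 1                     # row fetched only on mismatch
        return version, self.table[address]


backend = BackEnd()
backend.store(1, "x" * 200, version=7)

hit = backend.conditional_read(1, cached_version=7)
reads_after_hit = backend.row_reads

miss = backend.conditional_read(1, cached_version=6)
```

When the versions match, the request is answered from the index entry alone and the data table is never touched; the row is read only when the cached version is out of date.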
Tests were carried out by randomly accessing the table to retrieve a varying number of columns ranging from one column to 200 columns in steps of 10. Random accesses were selected as this was thought to be relatively realistic, in particular because statistics based on sequential access would be distorted by Oracle's (TM) internal SGA cacheing mechanisms.
Tests were carried out within the following scenarios:
no cacheing;
cacheing with an empty cache database;
cacheing with a full cache database, but unknown data in the SGA; and
cacheing with a full cache database, with the Oracle (TM) index in the SGA.
The arrangement in Figure 10b was the basis for direct client-to-fileserver tests. The arrangement comprises a fileserver 1060 having a master database 1070. The fileserver 1060 is connected to a client 1080 via an Orbix (TM)-based communications link. The client 1080 has a test result log 1090. In this scenario, the cacheing technique was not used since the client did not have any cached data to access. Instead, communications and data retrieval by the client 1080 were always directly to and from the fileserver 1060, where the fileserver had access to the data required by the client in the master database 1070. Therefore, the data in the master database 1070 was accessed each time the client 1080 requested data. The results obtained from this test scenario are used as a baseline for comparison against the test results from the scenarios which employ cacheing, which are described below.
The second test scenario used cacheing, but started with no cached data in the cache database 1030. Thus, the mid-tier fileserver 1020 had to retrieve all data from the back-end fileserver 1000. This is the worst-case scenario for the cacheing technique. It represents the case where data has to be retrieved from the back-end fileserver 1000 and written to the cache database 1030. This will occur either if the data is absent from the cache database 1030 or if the data in the cache database is inconsistent.
The third scenario was arranged so that the cache database 1030 contained identical data to that held on the master database 1010. However, in this scenario, there was no control over which data started in the SGA of the back-end fileserver 1000. That is to say, the index, which contains the ID and the version of each database table row, was not fixed into the main memory of the back-end fileserver 1000. Therefore, the back-end fileserver 1000 needed potentially to retrieve the requested data and the respective index from the master database 1010 and place them into the SGA several times during each test.
In the final scenario, as in the previous scenario, the cache database 1030 on the mid-tier fileserver 1020 contained identical data to the master data held on the master database 1010. However, this time, the index was purposely placed into the SGA on the back-end fileserver 1000. This was achieved by sequentially accessing the ID and time stamp of each row of the master data (but not the data itself) on the master database 1010 from the back-end fileserver 1000 before the test was carried out. This had the effect of creating an index entry in the SGA for each database row accessed from the database table. With the whole index copied into the SGA, the index could be accessed to compare the version of the master data with versions of the cached, mid-tier data more rapidly and efficiently than if none of the index, or only parts thereof, resided in the SGA. That is to say, if the index is in the SGA, the fileserver only needs to make main memory accesses to compare versions instead of needing to make disk accesses to the master database 1010.
The results obtained from the tests related to CPU utilisation on the back-end fileserver 1000 and input/output (I/O) between the back-end fileserver 1000 and the master database 1010. The CPU utilisation was measured using the UNIX 'clock()' system call, which measures CPU time used by the process issuing the call. I/O was measured using the Oracle (TM) view 'v$filestat', which enabled measurement of I/O activity on the Oracle (TM) table space used for the tests.
Figure 11 is a graph illustrating a comparison between the CPU utilisation for: the scenario using no cacheing; the scenario using a full cache; and the scenario using an empty cache. The graph shows that CPU utilisation is significantly better in the case where the cache is fully utilised, compared with the direct client/server case (no cacheing). The utilisation is comparable when the complexity (the number of columns read) of the data being retrieved is low. However, as the complexity rises, so the advantages of the fully-cached arrangement become apparent. As can be seen from the graph, there is a penalty to pay in the case of an empty or inconsistent cache, when data must be retrieved from the back-end fileserver 1000. These results show that by incorporating the version, or time stamp, in the index, irrespective of whether the index entries are in the SGA or the master database, the CPU processing overhead is greatly reduced. Clearly, this improvement in performance will be particularly significant where many mid-tier fileservers or clients are attempting to access the fileserver at substantially the same time.
In carrying out the experiments, the present inventors appreciated that the results plotted in Figure 11 represent combined Orbix (TM) and Oracle (TM) CPU utilisation. Further tests, described below, were carried out to determine how the CPU utilisation for database access only, using Oracle (TM), is affected by the various cacheing and non-cacheing arrangements.
Figure 12 is a graph showing the CPU utilisation for only the Orbix (TM) parts of the processing. The graph shows that CPU utilisation is similar for all three arrangements. Figure 13 is a graph which shows CPU utilisation on the fileserver due only to Oracle (TM) database processing. It is apparent that the combined data plotted in Figure 12 and in Figure 13 matches closely the data plotted in Figure 11, which gives a good level of confidence that the separate tests were constructed correctly.
The data plotted in Figure 13 indicates more clearly that the difference between the full, consistent cache and the no cacheing or the empty or inconsistent cache situations is much more marked. It is shown that at high complexities, a four-fold reduction in CPU utilisation can be achieved by using data cached at the mid-tier.
It is also clear from Figure 13 that there is a fixed overhead in using the cacheing technique when data has to be retrieved from the master database at the fileserver. When data complexity is low, the overhead forms a significant proportion of the overall time, but as complexity rises, the overhead decreases as a proportion of the total. That is to say, for one column of data only, the overhead is about one half of the total access time, while for two hundred columns of data, the overhead is only about one sixth of the total access time.
On the basis of this information, it is clear that Orbix is responsible for using a high proportion of the CPU's time. Indeed, Orbix is known to be a complicated communications mechanism which incurs a high CPU overhead. Therefore, it is envisaged that by using a simpler communications mechanism, for example a Unix "sockets" approach, the CPU overhead could be reduced significantly in terms of communications processing, thereby greatly improving overall CPU efficiency.
It was expected that the use of the cacheing technique would bring significant I/O benefits in the case where data is correctly cached in the mid-tier fileserver. The graph in Figure 14 plots the number of reads against the complexity of the reads. The graph indicates that for both the direct, client/server access and the empty/inconsistent cache cases, reads run at a significant level irrespective of complexity. On the other hand, for the full, consistent cache case, reads appear to decline with complexity. This, however, is an artefact of the experimental method used. As outlined above, in the full cache case, only the index is being accessed: the index is about 6MB in size. Each time a row from a table or an index entry is read, the whole block, or page, containing the row or entry is in fact transferred to an area in the SGA called the data buffer cache. A block is typically 2048 bytes large and therefore contains roughly ten data rows. The data buffer cache is about 8MB in size. Therefore, as more index entries are read, a greater proportion of the index is found in the data buffer cache and a respective greater proportion of index look-ups is satisfied by main memory access only. In the limit, if the entire index is read, and if no other processes are reading from or writing to the master database in the meantime, then the entire index will be copied to the data buffer cache area of the SGA. Then, no I/O will be required whatsoever. Figure 15 is a graph which plots test results which show this theory to be correct. As can be seen from the graph, there are no reads when the complete index is in the SGA, and there are declining reads when the index is not initially in the SGA.
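The behaviour of the data buffer cache described above can be illustrated with a small simulation. The Python sketch below is an assumption-laden model and does not reproduce the test harness: the function name physical_reads, the number of index entries, the number of entries per block and the look-up counts are all invented for illustration, and the cache is assumed to be large enough that no block is ever evicted.

```python
import random


def physical_reads(num_entries, entries_per_block, lookups, preload=False, seed=0):
    """Count block reads from disk for random index look-ups.

    A block is read from disk only the first time any entry in it is needed;
    afterwards it is assumed to stay in the buffer cache (no eviction).
    """
    rng = random.Random(seed)
    num_blocks = -(-num_entries // entries_per_block)   # ceiling division
    cached = set(range(num_blocks)) if preload else set()
    reads = 0
    for _ in range(lookups):
        block = rng.randrange(num_entries) // entries_per_block
        if block not in cached:
            cached.add(block)
            reads += 1
    return reads


cold_short = physical_reads(10_000, 100, lookups=50)
cold_long = physical_reads(10_000, 100, lookups=5_000)
warm = physical_reads(10_000, 100, lookups=5_000, preload=True)
```

With the index preloaded, the model performs no disk reads at all. Starting cold, the total number of disk reads can never exceed the number of blocks in the index, so the reads per look-up decline as more look-ups are made.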
The results overall indicate that the cacheing technique decreases CPU overhead as a result of having the version number in the index entries. Also, the read overhead reduces as more index entries move to the SGA. Although detailed tests have not been carried out under multiple client conditions, it is a safe assumption that both a reduced CPU overhead and a reduced read requirement should improve fileserver performance and response times for clients. Indeed, the performance improvement should become more significant as more clients or mid-tier fileservers are connected to the fileserver and have access to the back-end, master database. Also, other procedures controlled by the fileserver, for example user authentication or database house keeping, will benefit from the increase in both CPU and database access performance.
Finally, the results also indicate that it is essential to select a high performance communications mechanism in order to gain a maximum benefit from the novel cacheing technique described above.
The skilled addressee will appreciate that the invention can be applied to almost any environment implementing cache databases, or equivalent. The decision to use the invention is purely one based on whether the invention provides an optimum system in terms of performance.






1. A method for checking the consistency of an item of data in a cache database with a respective item of data in a master database by comparing a first key stored in association with the item of data in the cache database with a second key stored in association with an index entry for the respective item of data in the master database.
2. A method for retrieving an item of data from one of a cache or a master database, the master database comprising a plurality of items of master data and an index containing entries corresponding to one or more of the items of master data, the cache database containing a cached copy of at least one item of the master data, the method comprising the steps of:
reading a first key stored in association with a cached copy of a required item of data from the cache database;
reading a second key stored in association with an index entry for a respective item of master data from the master database;
comparing the first key with the second key; and
retrieving in the event the first and second keys are the same the cached copy of the item of data or in the event the first and second keys are different the respective item of master data.
3. A method according to claim 1 or claim 2, wherein the first and second keys are time-stamps.
4. Use of a method according to claim 1 or claim 2 in a client/server system.
5. A database fileserver apparatus comprising:
input means for receiving a conditional read request for an item of data stored in the database, the request including a first key for a previously retrieved copy of the item of data;
means for accessing an index of the database and reading an index entry for the requested item of data, the index entry including a second key for the stored item of information;
means for comparing the first and second keys; and
means if the keys are the same for returning an indication that the previously retrieved copy of the item of data is consistent or if the keys are different for reading from the database and returning a copy of the item of data.
6. A database index, wherein at least one index entry in the index includes at least:
identity information for identifying an item of information in the database;
location information for indicating the location in the database of the item of information; and
version information which changes each time the respective information in the database changes.
7. A method for checking the consistency of an item of data substantially as herein described with reference to the accompanying drawings.
8. A database fileserver apparatus substantially as herein described with reference to the accompanying drawings.




Patent Number 207277
Indian Patent Application Number 2179/MAS/1996
PG Journal Number 26/2007
Publication Date 29-Jun-2007
Grant Date 04-Jun-2007
Date of Filing 04-Dec-1996
Name of Patentee M/S. BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY
Applicant Address 81 NEWGATE STREET, LONDON EC1A 7AJ.
Inventors:
# Inventor's Name Inventor's Address
1 BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY 81 NEWGATE STREET, LONDON EC1A 7AJ.
PCT International Classification Number G06F17/30
PCT International Application Number N/A
PCT International Filing date
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 95308682.4 1995-12-01 EUROPEAN UNION