Title of Invention

A METHOD AND A SYSTEM FOR COMMUNICATION OF INFORMATION DATA BETWEEN DIFFERENT COMPUTER SYSTEMS

Abstract The invention relates to a method of organizing storage of data in a dababase (2), the database (2) having conclusion sets (20, 24, 26, 28, 30; 40, 50, 60) arranged in a hierachical structure, the conclusion sets being arranged such that items are inserted into a selected conclusion set at a first level of significance (level 1) until the number of items reaches a threshold value for the selected conclusion set, and then the contents of the selected conclusion set are migrated to subordinate conclusion sets, thereby emptying the selected conclusion set, and wherein following migration of the contents from the selected conclusion set, additional insertions can be made into that conclusion set.
Full Text ORGANISING DATA IN A DATABASE
The present invention relates to a method of organising data within a database, and to a
database implementing such a method.
Typically databases, whethei' they are of the type described in the applicant's co-pending
British patent application GB 0029238.3, or ^i^b^her they are of other known types, such as
"B-tree" structure have a decision gr^h or other index which points towards conclusion
sets which store data which matches the search criteria. Additionally and/or alternatively
the conclusion sets may store pointere which point to the location of the data which
matches the search key.
In any database of any reasonable size, the conclusion sets are stored on mass storage
media, which at the moment typically means hard disc drives. Hard disc devices tend to be
considerably slower than semiconductor memory and consequently database performance
can be compromised by having to paform these input/output (I/O) operations with these
mass storage devices. Even with a imimnal index overhead a database typically has lo
perform 2 I/O operations as part of the read, modify and rewrite cycle to insert data into the
index and conclusion set.
According to a first asp«:t of the presait invention, there is provided a method of
organismg storage of data in a database, in which conclusion sets for the database are
arranged in a hierarchical structure, and in which the conclusion sets are arranged such that
items are inserted into a selected conclusion set at a first level of significance until the
number of items reaches a threshold value for tlie selected conclusion set, and then the
contents of the selected conclusion set are migrated to subordinate conclusion sets, thereby
emptying tlie selected conclusion set.
It is thus possible to provide a modified conclusion set structure wliich significantly
reduces the number of conclusion sets which immediately follow the output of the decision
graph.
Furthermore, it is also possible to distribute conclusion sets within the decision graph.

A prior art database will have a single "layer" of conclusion sets accessible from the
decision graph. Every new item of data inserted into the conclusion set will require at least
2 I/O operations to include that data (if tl^ data can belong to only one conclusion set, and
possibly more I/O operations if the insrated data can belong to more than one conclusion
set).
By organisii^ the conclusion sets in a hkiarchical structure, the number of conclusion sets
immediately accessible ficom the d^iision grz^h can be much reduced. Indeed, it becomes
possible to keep the most hierardiic^y sigpificant, (fliat is top level) conclusion sets in fast
m«nory, such as semiconductor taemoxy. This is especially true in those embodiments of
the invention in which conclusion :^ts are distributed through the decision graph.
By holdic^ the top level conclusi when inserting data into the database. It is thus possible to provide a significant
improvemoit in datalKise perfomumce during key and data insertion operations.
Advantageously the higji level oHK^lusion sets effectively cache data until such time as the
conclusion set becomes full or the numbo: of entries therein exceeds a predetermined level.
The conclusion set is then emptied by migrating its contents to subordinate conclusion sets.
During the migration process the data is sorted with reference to a search criterion, that is a
search key, such that the data can be expected to be randomly distributed between the
inctmediately subordinate conclusion sets. This filUng and migrating process can be
repeated for a plurality of hierarchical levels within the conclusion set structure.
The migration of data may require, and indeed often requires, transfer of data between one
or more conclusion sets held cm mass media storage. Thus disc read and disc write
operations are incurred, but now these occur for a conclusion set as a whole rather than for
each individual item within the conclusion set and consequently the I/O cost per entry
becomes much reduced.
Advantageously, during key retrieval or deletion the appropriate top level conclusion set
and each subordinate conclusion set whose decision criteria match the search key are
examined in order to see if matching data ai-e stored therein. Thus, the database query
overhead is increased compared to prior art databases, but this is acceptable in some
database structures where the numba: of inserts is large, but the number of queries is
relatively low.
In those embodiments of the invention where the conclusion sets are distributed througliout
the decision graph, the inter-conclusion set distance may be constrained so as to prevent
conclusion sets from occurring too frequently.
Advantageously a conclusion set distmx^ |»rameter is defined by an integer Q. Q can take
a number greater than or equal to zrao. Thus, for example, a database may be created
where
Q = 0 : A conclusion set can be formed at every branch nocte.
Q = 1 : A conclusion set can be formed at every other branch node.
Q = 2 : A conclusion sdt can be fonned at every second branch node.
and so on.
These rules can be maintained throughout the decision graph until the final Q layers of the
decision gr^h are reached. In these layers the rules concerning the hierarchical distance
between conclusion sets become unraiforceable and hence are not rigorously applied.
According to a second aspect of the ]^:^eat invention, there is provided a database in
which conclusion sets for the database are arranged in the hierarchical structure, and in
which the conclusion sets are arranged such that items ai-e inserted into a selected
conclusion set at a first level of significance until the number of items tlierein reaches a
threshold value, and then the contents of the selected conclusion set are migrated to
subordinate conclusion sets, thereby emptying the selected conclusion set.
According to a third aspect of the present invention, there is provided a computer program
product for causing a data processor to operate in accordance with the first aspect of the
present invention.
The present invention will fiirther be described, by way of example, with reference to the
accompanying drawings, in which:
nimiDer bt level one conclusion^ets becomes progressively diminished.
Figure 1 schematicaUy illustrates a database having conclusion sets arranged in a
conventioiial manner;
Figure 2 schonatically illustrates a databa:^ having concliision sets arranged in accordance
with the present invention; and
Figure 3 schematically illustrates a dai^Amac having conclusion sets distributed within the
decision graj^ and constituting sm embodiment of the present invention.
The database shown in Figure 1 has an mdex 2 which comprises a decision graph 4 and a
plurality of conclusion s^ 6, 8,10,12 and 14. Each conclusion set is reached by one, and
only one, path through the dedsaon graph. However each conclusion Ihen. points to
relevant entries within a data store 16.
The decision graph 4 comprises a plurality of decision nodes at which a search key is
matched with dedsimi critraia in order to define which path should be taken through the
decision graph. The internal organisation of the keys within decision graph does not
constitute part of the presrait invraition, and consequently need not be described in detail
here. However prior art indexing structures, such as the B-rtree index may be utilised
within the decision graph.
In the arrangement shewn in Figure 1, all the conclusion sets 6, 8, 10, 12 and 14 have equal
significance thus no conclusion set is more hierarchically significant than any other
conclusion set and indeed there may be many hundreds or indeed thousands of conclusion
sets.
In the arrangement shown in Figure 2 the conclusion sets are arranged in a hierarchical
structure. In the arrangement illustrated there are three levels of conclusion sets with level
one being the hierarchically most significant, and level three being the hierarchically least
significant. Thus in this arrangement, there is only one quarter of the number of level one
conclusion sets as there are level three conclusion sets, and again in this example one level
one conclusion set marks the entry to six other conclusion sets. Cleai-ly, as the number of
levels increase tlien for a given number of conclusion sets at tlie least significant level, the
number of le\'el one conclusion sets becomes progressively diminished.
Suppose now that it is desired to insert an entry into the database. The decision graph is
navigated in accordance with the insertion key for the eatry, as would be the case in prior
art databases, to discover which conclusion set the entry belongs to. hi the database
illustrated in Figure 1, this would lead to one conclusion set being uniquely identified.
However, in the present invention this results in one conclusion set 20 of a number of level
one conclusion sets (of which only two 20 and 22 are shown for clarity) being identified.
AdNmntageously the level one omcljUK^m sets 20 and 22 are also held in fast memory, that
is either a fast mass media storage device or better still semiconductor memory, such that
the time overhead in inserting data into conclusion set 20 is small compared to the time
required to insert data into one of the conclusion sets 6, 8, 10, 12 and 14 of a conventional
database, indeed, if the level one conclusion sets 20 and 22 are held in semiconductor
memory then there is no I/O cost incurred in writing to them.
During time, as more and more d^a is inserted into the database, the conclusion set 20
begins to fiU. Once the niunber of entries in the conclusion set 20 reaches a predetermined
number, corresponding to the conclusicm set being full, the aitries in the conclusion set 20
are migrated down to the immediately subordinate conclusion sets 24 and 26 which belong
to level 2 of the hierarchical structure.
The decision on which of the lower level conclusion sets 22 and 24 receives an entry from
the level one conclusion sets 20 is determined by continuing the navigation rules which
exist within the decision graph 4. Thus, for example, if the decision graph 4 has rules
based on kidividual bit values in increasing order of bit number, then the iiile for migrating
data from the top level conclusion set to the second level conclusion set 24 and 26 will use
the next bit in the search key to determine which of the conclusion sets 24 or 26 should be
recipient for each item of data. During this migration process, the conclusion set 20
becomes emptied.
The process of migration from an Nth level to an N+lth level occui-s at each level within
the conclusion set hierarchy as each conclusion set therein fills up. Thus once conclusion
set 26 becomes full, it in turn migrates its data to its subordinate conclusion sets 28 and 30
on the third level of the hierarchy. In this example, the third level is the lowest level of the
hierarchy and the conclusion sets 28 and 30 ai'e not able to pass their data down to
subordinate conclusion.sets. However, if four or more levels of conclusion sets were
included within this.hiMarchical structure, then the conclusion sets 28 and 30 could indeed
migrate their data to their own subordm^e conclusion sets as and when they became full.
It can be expected that, assumii^ a Tdrndoai distribution of keys, that during migr^on half
of the entries in tlie conclusion set 20 will go to coiKlusion set 24 and the other half will go
to conclusion set 26. This process is tqiodted at each level of migration such that all of the
entries are sufostaatUdly equally di^riimled tfazougfaout the lower level conclusion sets.
The VO cost of a migration can be illuMtated in the operation of migratuig data from
conclusion set 26 to conclusion sets 28 ami 30. In this process, the data must be read from
the conclusion set 26, ^A^iich recpnres (Hie read from disc. The lower level conclusion sets
28 and 30 must also have their data rotd from disc, this requires two read operations. The
data frx}m the conclusion set 26 is then sorted to correctly point to the destination set 28 or
30, the entries are tl^n updatwi aaad tinm the data for the two lower level conclusion sets 28
and 30 are written back to the disc, requiring two write operations. Then the corxclusion set
26 is emptied, v^ch requires one write operation. Thus this gives a total cost of six
input/ou^ut operations per migration. However this single migration may effect hundreds
of entries. It should also be noted tiiat the migration from the top level conclusion set 20
will be less because it will typically reside in memory and consequently only the two reads
to sets 24 and 26 and tlw writes to diese sets 24 and 26 must be performed.
Assuming the hioarchical structure as shown in Figure 2, where the conclusion set has S
sub-levels (where S = 2 in this case, S={L-1) where L is the number of levels) the number
of I/O operations reqturwi to move an entry to the bottom most level is 6S, However, as
each migration operation moves an entire (frill) conclusion set which contains E entries,
then the I/O cost per entry is given by:
yO cost = -f
therefore if E is large enough and/or S is small enough the I/O cost can be significmitly
reduced compared to the prior art scheme illustrated in Figure 1 which has an I/O cost of 2
per entry. Thus, with the conclusion set that holds 100 entries, and a conclusion set deptli
of 8, the I/O cost per insert is 6 x -j|o = 0.48.
It should be noted that for a fixed index size, whilst reducing S increases the insert
throughput, it also increases the numbar of conclusion sets that must be held in memory for
the benefit to be realised..
When querying the index for a specific key, each conclusion set in a single path rurming to
the very least significant conclusion set must be queried. Thus, if the hierarchical structure
consists of L levels, then L conclusion sets must be queried in order to find all results that
match the key, since the key may reside at any level within the hierarchy. Thus this
indexing scheme increases throughput, or the ease at which entries may be added to the
index, at the expense of degrading queiy performance. However, this trade-off is
acceptable in any applic^on which has to accept large amounts of data but queries it
infi-equently. An example of such an application is a fraud detection system thai has to
load every transaction, but only queries those transactions relating to suspicious activity.
It is thus possible to provide an improved database performance by modifying the structure
of the conclusion sets into a hierarchical structure.
The inventor has reahsed that tlie conclusion sets need not all be at the lowest level of the
database structure. As noted hereinabove, in the distributed structure shown in Figure 2,
data is inserted into the first conclusion set found, ie level 1 conclusion sets, and is tlien
progressively migrated to lower levels.
Tlius the insert overhead can be fijrther reduced if the conclusion sets are distributed
through the decision index structure. *.
Figure 3 shows a modified database where the conclusion sets are arranged in a
hierarchical structure, and wherein the structure extends into the decision graph.
Figure 3 illustrates a portion of a decision graph wherein the decision graph has been
constructed with a inter-conclusion set distance Q = 3, such tliat a conclusion set 40, 50 and
60 is allowed at every tliird branching node. Suppose that v\'e join tiie database at a time
where the data contained therein is such that conclusion s£t 40 has just been created, but
that conclusion sets 50 and 60 have not yet been created.
During use, data is added to the database, and the content of that data is such that the
conclusion set 40 becomes full. Once this occurs, the infonnation within the conclusion set
is redistributed en masse by creating tiie lower level node 42, optionally node 43, and
distributing the data from conclusion set 40 into new temporary conclusion sets 44 and 45-
shown in outline in Figure 3. Filling of the database then continues until conclusion set 40
fills again. It then attempts to redistribute its contents to its subordinate conclusion sets.
This in turn may make one of them fuU mxd so it becomes necessary- to remove the full
temporary conclusion set from the exit patih of node 42 and to insert one or more further
nodes and additional conclusion sets as required.
This can continue until such time as conclusion set 50 is established as an exit to node 46.
At this time, the inter-coiKlusion set distance between conclusion sets 40 and 50 then
corresponds to the distance Q specified by the designer or user of die database. Under
these conditions, a check is made to see if an interaiediate conclusion set exists in the
decision path extending between nodes 41 and 46. Any conclusion sets in this decision
path are removed and their contents are moved to conclusion set 50.
As conclusion set 50 fills up, new temporary conclusion sets may be created until such time
as conclusion set 60 is created, and so on. Thus it becomes possible for the
inter-conclusion set distance between conclusion sets which are not in the lowermost layers
of the decision index to be held at a specified inter-conclusion set spacing. In the
arrangement shown in Figure 3 conclusion sets 62 and 70 represent the lowermost layers of
the database, and mdeed conclusion sets 60, 62 and 70 can represent levels 1, 2 and 3 of
conclusion sets in the arrangement shown in Figure 2.
As before, the mass redistribution of conclusion set entries to lower level conclusion sets
results in a reduced disc input/output penalty compared with the writing of individual items
of data. Furthermore, the data entry cost into the structure shown in Figure 3 is further
reduced since an entry is inserted into the first conclusion set found. Each entry fi-om a full
conclusion set is then moved to the appropriate lower level conclusion set which is found
by following the decision graph from the fiill conclusion set via the entry to be moved.

When querying the database, the decision palh for the relevant key has to be navigated right
to the lowermost level conclusion set and each conclusion set found on route must also be
queried as it may contain data matching the sfcarch key criteria.



WE CLAIM

1. A method of orgmizing storage of data in a database (2), the database
(2) having conclusion sets (20,24,26,28,30; 40,50,60) arranged in a
hierarchical structure, the conclusion sets being arrangedsuch that items
are inserted into a selected conclusion set at a frst level of significance
(level 1) until the number of items reaches a threshold value for the
selected conclusion set, and then the contents of the selected conclusion
set are migrated to subordinate conclusion sets, thereby emptying the
selected conclusion set, and wherein following migration of the contents
firom the selected conclusion set, additional insertions can be made into
that conckjsion set.
2. A method as claimed in claim 1, wherein each conclusion set
(20,24,26,28,30), comprises zero, one or two immediately subordinate
conclusion sets.
3. A method as claimed in claim 1 or 2, wherein during migrrtion the date ri
the selected conclusion set is compared with a search criterion in order to
select which of the immediately subordinate conclusion sets an entry Is
transferred to.
4. A method as claimed in clam 3, wherein the database comprises a
decision graph and the search criterion constitutes an extension of the
decision graph.
5. A method as ciMnad ni claim any one of tfw preceding claHns, wlierein
the hier«'diKal structu-e of cora:li»iGMi sets is arrangbd into a layered
stricture.
6. A method as claimed in claim 5, wherein the layered structure xaipijifises L
levels, with each conclusion set in the 1^ to (L-1) th level having two
immediately sutxN'dinate conclusicNi sets in tlw 2"^ to Uh level,
respectively.
7. A method as claimed in cMm 6, wherem once a given concii»ktfi set m
the Nth level, where N is an integer from 1 to (L-l) has a jM'edetermwied
numt)er of entries hi it, its contents are migrated to it's immediately
siAxNrdinate corKlusion sets.
8. A method as claimed in any one of the preceding claims, wherein
conclusion sets having a hierarchical significance above a predetermined
vakje are stored, in use, wi fast memory.
9. A method 9S claimed in claim 1, wherein the ccNichisiGn sets (40, 50, 60}
are dBtrfeuted tivoi^ a decsi«i graph of the database.
10.A method as dakned in claim 1 or 9, whereri the decision gri^ is
constructed so as to maintein a specified inter conch^ion set distaiKe
tlvougNMJt the majority of the decision grai^.
11.A method as claimed in claim 9 or 10, wherein during migration of data
from a conclusion set, the data is migrated to subordinate conclusion sets
on the basis of the specific data content.
12.A method as claimed in claims 1, 9, 10 or 11 wherein the contents of a
conclusion set are migrated enmasse.
13.A device comprising conclusion sets of a database arranged in a
hierarchical structure, the conclusion sets are arranged such that items
are inserted into a selected conclusion set at a first level of significance
until the number of items reaches a threshold value for the selected
conclusion set, the contents of the selected conclusion set being migrated
to subordinate conclusion sets, thereby emptying the elected conclusion
set, and wherein following migration of the contents from the selected
conclusion set, additional insertions can be made into that condition set.
14.A database as claimed ri claim 13, wherein the conclusion sets (40,50,60)
are distributed throughout a decision graph.
15.A computer program product for causing a data processor to implement
the method as claimed in claim 1 to 12.


The invention relates to a method of organizing storage of data in a dababase (2), the database (2) having conclusion sets (20, 24, 26, 28, 30; 40, 50, 60) arranged in a hierachical structure, the conclusion sets being arranged such that items are inserted into a selected conclusion set at a first level of significance (level 1) until the number of items reaches a threshold value for the selected conclusion set, and then the contents of the selected conclusion set are migrated to subordinate conclusion sets, thereby emptying the selected conclusion set, and wherein following migration of the contents
from the selected conclusion set, additional insertions can be made into that conclusion set.

Documents:

1035-kolnp-2004-assignment.pdf

1035-KOLNP-2004-CORRESPONDENCE.pdf

1035-kolnp-2004-drawings.pdf

1035-kolnp-2004-form 1.pdf

1035-kolnp-2004-form 13.pdf

1035-kolnp-2004-form 2.pdf

1035-kolnp-2004-form 26.pdf

1035-kolnp-2004-granted-abstract.pdf

1035-kolnp-2004-granted-assignment.pdf

1035-kolnp-2004-granted-claims.pdf

1035-kolnp-2004-granted-correspondence.pdf

1035-kolnp-2004-granted-description (complete).pdf

1035-kolnp-2004-granted-drawings.pdf

1035-kolnp-2004-granted-examination report.pdf

1035-kolnp-2004-granted-form 1.pdf

1035-kolnp-2004-granted-form 18.pdf

1035-kolnp-2004-granted-form 2.pdf

1035-kolnp-2004-granted-form 26.pdf

1035-kolnp-2004-granted-form 3.pdf

1035-kolnp-2004-granted-form 5.pdf

1035-kolnp-2004-granted-priority document.pdf

1035-kolnp-2004-granted-reply to examination report.pdf

1035-kolnp-2004-granted-specification.pdf

1035-KOLNP-2004-OTHERS.pdf


Patent Number 241847
Indian Patent Application Number 1035/KOLNP/2004
PG Journal Number 31/2010
Publication Date 30-Jul-2010
Grant Date 28-Jul-2010
Date of Filing 21-Jul-2004
Name of Patentee FLINDERS APS
Applicant Address NORDRE FASANVEJ 108B, 2.SAL, DK 2000 FREDERIKSBERG
Inventors:
# Inventor's Name Inventor's Address
1 SJOBERG, HANS, HAKAN KALLTORPSVAGEN 67 S-136 70 HANINGE
2 WYON, KIM, NEEL YRSAVEJ 13 1.SAL, DK-2000 FREDERIKSBERG
PCT International Classification Number G06F 13/00
PCT International Application Number PCT/SE2002/02450
PCT International Filing date 2002-12-23
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 60/344,101 2001-12-28 Sweden
2 0104414-8 2001-12-21 Sweden