Title of Invention	"A SYSTEM OF STORING RETRIEVING, ENCRYPTING & DECRYPTING MINIATURISED DATA"
Abstract	A method of storing data including the steps of providing a first index of first location identifiers, a second index of second location identifiers and a dictionary data base of data items, wherein the first location identifiers are adapted to identify the location of data items in the dictionary data base, receiving data and separating the data into a plurality of data items and storing the data items in a main data base, whereby at least one of the data items is stored in the main data base as at least one first location identifier, which identifies at least one second location identifier, which identifies the or each item in the dictionary data base.

Full Text
FIELD OF THE INVENTION
The present invention relates primarily, although not exclusively, to a system of storing, retrieving, encrypting and decrypting miniaturised data. It relates to techniques used for storing electronic data in a form which requires less storage space.
BACKGROUND OF THE INVENTION
A typical method for compressing data involves the use of a dictionary data base which lists commonly occurring data and replaces this commonly occurring data with a coded "token" which effectively represents that data using a reduced number of data bits. Whenever an item of data occurs repeatedly, the data item is replaced by its equivalent "token" and accordingly is stored in a compressed form. When data is stored in the compressed form, each token can be replaced by its equivalent data item by using a look-up table, so that the original data can be re-formed.
The above conventional compression technique has a number of drawbacks. One is that the number of data bits required to represent a token can itself be significant, with the result that significant storage space is required to store each token. In addition, searching a data base which includes tokens can be quite cumbersome, because tokens need to be reconverted to their original data items before a search of the data items can be properly conducted.
OBJECT OF THE INVENTION
The present invention provides an alternative to existing methods of storing data in a miniaturised form and extends to methods for encrypting data as well as systems for implementing the method, computer programs and storage media for storing electronic data which are able to implement the method and system.
DISCLOSURE OF THE INVENTION
According to the present invention there is provided a method of storing data including the steps of providing a first index of first location identifiers, a second index of second location identifiers and a dictionary data base of data items, wherein the first location identifiers are adapted to identify the location of second location identifiers in the second index and the second location identifiers are adapted to identify the location of data items in the dictionary data base, receiving data and separating the data into a plurality of data items and storing the data items in a main data base, whereby at least one of the data items is stored in the main data base as at least one first location identifier, which identifies at least one second location identifier, which identifies the or each data item in the dictionary data base.
According to another aspect of the present invention there is provided a method of retrieving data stored in a miniaturised form in a main data base, including the steps of accessing the main data base, retrieving one or more items of data including at least one first location identifier from the main data base, using the first location identifier to access and retrieve the location of a second location identifier identified in the first index by the first location identifier, and accessing and retrieving from the second location identifier in the second index the location of an item of data in a dictionary data base.
It is preferred that the method of storing data includes the step of searching the dictionary data base for at least one data item and replacing the data item with one first location identifier which indicates the location of one location identifier in the second index, which second location identifier indicates the location of the data item in the dictionary data base.
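The storing and retrieving methods described above can be sketched as follows. This is a minimal illustration under assumptions, not the implementation of the invention: the names `dictionary`, `second_index`, `first_index`, `store` and `retrieve` are hypothetical, the data is split on whitespace, and the value kept in the main data base is taken to be a position in the first index, which is only one possible reading of the three-step lookup.

```python
# Hypothetical sketch; none of these names or values come from the specification.

dictionary = ["frontpage", "install", "the"]   # dictionary data base of data items
second_index = [2, 0, 1]   # second location identifiers: positions in dictionary
first_index = [1, 2, 0]    # first location identifiers: positions in second_index


def store(data: str) -> list:
    """Separate data into items; replace known items by a first-index position."""
    main = []
    for item in data.split():
        if item in dictionary:
            dict_pos = dictionary.index(item)           # where the item sits
            second_pos = second_index.index(dict_pos)   # which entry points to it
            main.append(first_index.index(second_pos))  # which entry points to that
        else:
            main.append(item)   # not in the dictionary: keep the item verbatim
    return main


def retrieve(main: list) -> str:
    """Follow first index -> second index -> dictionary for each stored pointer."""
    items = []
    for entry in main:
        if isinstance(entry, int):
            entry = dictionary[second_index[first_index[entry]]]
        items.append(entry)
    return " ".join(items)


main_data_base = store("the frontpage install wizard")
print(main_data_base)
print(retrieve(main_data_base))
```

Integers stand for pointers and strings for verbatim items, so the two kinds of entry can be told apart in the main data base without an extra flag. A real system would need an explicit encoding for that distinction.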
It is preferred that the method includes the step of searching the dictionary data base for each data item and identifying if the data item occurs in the dictionary data base and, if the data item occurs in the dictionary data base, retrieving the second location identifier in the second index that identifies the location of the data item in the dictionary data base, retrieving the first location identifier in the first index which identifies the location of the second location identifier in the second index and storing the first location identifier in a main data base in place of the data item.
It is preferred that the data item includes a string of data, a field of data or other group of data that can represent information in a predetermined format. The or each data item preferably represents a stream of data which represents information which can be searched. Each data item preferably represents a name, initial, address, phone number or other words or numbers or initials or characters or character strings or number strings.
Each first location identifier preferably includes a pointer to the second index. Each second location identifier may include a pointer to the dictionary data base. The first index may comprise a plurality of pointers. Preferably the second index comprises a plurality of pointers. The first index may comprise a sequential list of pointers. The second index may comprise a sequential list of pointers.
The dictionary data base preferably includes a plurality of data bases each with unique addresses which are represented by the location identifiers. Each index may include a plurality of sub-indexes. Preferably each second index is divided into different sections representing locations of predetermined types of data items. Each first index is preferably divided into different sections representing the location of second location identifiers associated with predetermined types of data items.
The method may include providing additional indexes with additional location identifiers.
According to another aspect of the present invention there is provided a system for storing data, the system including at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items, a first one of the index data bases comprising a plurality of data item location identifiers, which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases including a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base, and wherein the system includes a processing means which is adapted to receive data including data items and to store the data in a compressed form by storing in place of each data item occurring in the dictionary data base, each corresponding first location identifier, whereby each data item occurring in the dictionary data base can be retrieved by referencing the data item location identifier identified by the first location identifier.
Preferably the at least two index databases include separate lists of location identifiers in a common data base or which are part of other data bases. Preferably there is provided a storage medium including a sequence of instructions adapted to control a data processor to set up the system. The first index data base may be part of the dictionary data base. The system may include one or more additional index data bases each with location identifiers which identify the location of another location identifier of another index data base. It is preferred that the system includes a main data base which is adapted to store a stream of data as a combination of data items which are not represented in the dictionary data base and first location identifiers.
According to one embodiment the stream of data stored in the main data base may have data items and first location identifiers which are stored in an order determined by a further index data base and a reprocessing means which is adapted to control the ordering of data in the main data base with reference to the further index data base.
Preferably the dictionary data base has data items stored in a predetermined order which is determined by how commonly or frequently each data item stored therein is expected to occur in a data stream of data items. It is preferred that the most common data items have a location in the dictionary data base that is identified by a dictionary data base location identifier having minimal bits compared to an uncommon data item. Preferably the dictionary data base index comprises dictionary data base location identifiers arranged sequentially from lowest number to highest number of bits required to define them.
Each first location identifier may comprise a pointer having a number which identifies a position of one data item location identifier in the dictionary data base index. Preferably each data item location identifier comprises a pointer having a number which identifies the position of one data item in the dictionary data base. The dictionary data base may be divided into different sections which have data items with locations which are identified by data item location identifiers from different dictionary data base indexes. The dictionary data base preferably includes storage space into which data items can be added.
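The frequency-based ordering described above, in which the most common data items receive the location identifiers needing the fewest bits, might be sketched as follows. The function names and the sample text are hypothetical, and counting occurrences in a sample is an assumed stand-in for "how frequently each data item is expected to occur".

```python
from collections import Counter


def bits_needed(count: int) -> int:
    """Smallest number of bits able to address `count` distinct positions."""
    return max(1, (count - 1).bit_length())


def pointer_width(position: int) -> int:
    """Bits needed to write one position; low positions need fewer bits."""
    return max(1, position.bit_length())


def build_dictionary(sample_items: list[str]) -> list[str]:
    """Order unique items from most to least frequent (ties alphabetical)."""
    counts = Counter(sample_items)
    return sorted(counts, key=lambda item: (-counts[item], item))


sample = "the cat and the dog and the bird".split()
ordered = build_dictionary(sample)
for position, item in enumerate(ordered):
    print(position, item, pointer_width(position), "bit(s)")
```

With this ordering the most frequent item sits at position 0 and its pointer needs a single bit, while rarer items further down the list need more. For a whole list, `bits_needed(29456)` evaluates to 15 and `bits_needed(3128)` to 12.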
According to another aspect of the present invention there is provided a computer program which is adapted to control a computer to provide at least one dictionary data base and at least two index data bases, wherein the dictionary data base comprises a plurality of data items, a first one of the index data bases comprises a plurality of data item location identifiers, which respectively identify the location of at least one data item in the dictionary data base, and a second one of the index data bases includes a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base, and wherein the computer program includes instructions to control the computer to receive data including data items and to store the data in a compressed form by storing in place of each data item which occurs in the dictionary data base, each corresponding first location identifier, whereby each data item which occurs in the dictionary data base can be retrieved by referencing the data item location identifier identified by the first location identifier.
Preferably the at least two index data bases include separate lists of location identifiers in a common data base or which are part of other data bases. The first index data base may be part of the dictionary data base.
It is preferred that the computer program includes an exception means for storing data items which do not occur in the dictionary data base. The exception means preferably includes a predetermined part of the dictionary data base. The exception means may be adapted to provide a dictionary index location identifier for any new data item stored in the dictionary data base. According to another embodiment of the present invention the exception means includes an exceptions data base which is adapted to store data items which do not occur in the dictionary data base.
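The two forms of exception means described above can be sketched side by side. This is an illustration under assumptions only: the `Dictionary` class, its `capacity` parameter, and the `("D", n)` and `("X", n)` entry tags are hypothetical choices made to keep the example small.

```python
class Dictionary:
    """Hypothetical dictionary data base with spare space for new data items."""

    def __init__(self, items, capacity):
        if len(items) > capacity:
            raise ValueError("capacity is smaller than the initial item count")
        self.items = list(items)
        self.capacity = capacity
        self.index = {item: pos for pos, item in enumerate(self.items)}

    def locate(self, item):
        """Return the item's location identifier, or None if it is absent."""
        return self.index.get(item)

    def add_exception(self, item):
        """Store a new item in the spare space and return its new identifier."""
        if item in self.index:
            return self.index[item]
        if len(self.items) >= self.capacity:
            raise OverflowError("no spare space left in the dictionary")
        self.items.append(item)
        self.index[item] = len(self.items) - 1
        return self.index[item]


def encode(data, dictionary, exceptions=None):
    """Encode whitespace-separated items as ("D", n) or ("X", n) entries.

    With exceptions=None, unknown items go into the dictionary's spare space;
    otherwise they are appended to the separate `exceptions` list.
    """
    out = []
    for item in data.split():
        pos = dictionary.locate(item)
        if pos is None and exceptions is not None:
            exceptions.append(item)
            out.append(("X", len(exceptions) - 1))
            continue
        if pos is None:
            pos = dictionary.add_exception(item)
        out.append(("D", pos))
    return out
```

In the first mode an unknown item is given a location identifier in the dictionary itself; in the second it is kept in a separate exceptions store and the dictionary is left unchanged.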
The computer program may include a means for storing different types of data items in different dictionary data bases. Each dictionary data base may have predetermined dictionary location identifiers in the first one of the index data bases, which provide the location of data items in that data base. Each dictionary data base may be split into a plurality of data types or fields each having a plurality of data items of that type or field.
According to another aspect of the present invention there is provided a storage medium having computer software stored thereon which is adapted to control a computer to set up a system according to any one of the previously described embodiments of the invention.
According to another aspect of the present invention there is provided a system for retrieving data items stored in a miniaturised form, the system including at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items and a first one of the index data bases comprises a plurality of data item location identifiers which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases includes a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base and a processing means, wherein the processing means is adapted to receive a first data stream including a plurality of first location identifiers and produce a second data stream including the data items without first location identifiers and wherein first location identifiers are replaced by corresponding data items.
It is preferred that the first data stream is adapted to be received by reading a data base. The data base may be in a storage medium which is readable by a computer hardware device. The data base may be stored in a computer memory.
The data stream preferably is transmitted and received from a communication system. The first data stream may be received and stored in a data processor before being read and compressed or decompressed.
According to another aspect of the present invention there is provided a system which includes the system for compressing data and the system for retrieving data. According to another embodiment of the present invention there is provided a storage medium having a computer program stored thereon which is adapted to control a computer to set up/implement the combined system.
According to another aspect of the present invention there is provided a method of encrypting data using the system for compressing data. According to one embodiment of the method of encrypting data, the location of data items may be changed by using a coding means for changing the data item location identifiers in a reconvertible manner. It is preferred that the data item location identifiers are able to be reordered so that the first location identifiers identify different data item location identifiers to those before reordering. According to another embodiment of the present invention any one of the systems includes a scrambling means for reordering data item location identifiers in the first index data base and for storing the method of reordering whereby the reordering can be reversed.
According to another aspect of the present invention there is provided a method of decryption using the system for retrieving data. It is preferred that the method of decryption includes the scrambling means for reversing any reordering which has taken place of the dictionary item location identifiers. According to another embodiment of the present invention the method of decryption includes a descrambling means which includes means for reversing any reordering of dictionary item location identifiers in the first index data base.
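A reversible reordering of this kind can be sketched as a keyed permutation of the index. The sketch below is an assumption-laden illustration: the names `scramble`, `descramble`, `encode` and `decode` are hypothetical, the stored "method of reordering" is reduced to an integer key, and Python's `random` module is used only for convenience (it is not a cryptographically secure generator).

```python
import random

dictionary = ["frontpage", "install", "the"]   # data items
index = [2, 0, 1]     # data item location identifiers (positions in dictionary)


def _order(length, key):
    """Permutation of range(length) that is reproducible from `key`."""
    order = list(range(length))
    random.Random(key).shuffle(order)
    return order


def scramble(index, key):
    """Reorder the index so each pointer now selects a different entry."""
    return [index[i] for i in _order(len(index), key)]


def descramble(scrambled, key):
    """Reverse scramble() by rebuilding the same permutation from the key."""
    original = [0] * len(scrambled)
    for new_pos, old_pos in enumerate(_order(len(scrambled), key)):
        original[old_pos] = scrambled[new_pos]
    return original


def encode(items, index):
    """Replace each item by the position of its identifier in `index`."""
    return [index.index(dictionary.index(item)) for item in items]


def decode(pointers, index):
    """Replace each pointer by the item its index entry identifies."""
    return [dictionary[index[p]] for p in pointers]


key = 2024
scrambled_index = scramble(index, key)
pointers = encode(["the", "install", "the"], scrambled_index)
print(decode(pointers, scrambled_index))
```

A receiver holding the same key can rebuild the scrambled index and recover the items, and `descramble` restores the original ordering. A reader holding only the unscrambled index will generally resolve the same pointers to different items.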
According to another embodiment of the present invention there is provided a method of encrypting and decrypting data which incorporates the combined system for compressing and retrieving data, the method also including the step of at predetermined times using a scrambling means to reorder dictionary item location identifiers in accordance with a predetermined ordering technique which is stored or able to be stored and received by a descrambling means at a receiving end of the system.
A preferred embodiment of the present invention will now be described by way of example only with reference to HTML script or text.
BEST MODE OF CARRYING OUT THE INVENTION
As an example the following HTML text will be minimised in accordance with the preferred embodiment of the invention:
The Frontpage install
The above text can be split into a number of groups which for convenience will be referred to as data items. Thus the word "the" constitutes one data item, the word "frontpage" constitutes another data item and so on for the word "install".
Using the miniaturisation technique in accordance with the present invention two indexing lists are set up as shown in Figure 1. A first list 11 is set up which is effectively a data base of pointers. For convenience only some of the pointers are shown, being those pointers required to identify text which is stored relating to the sample of HTML text referred to above. The first list 11 is generated by analysing the repetitive structure of HTML text and script that exists in documents or data transfer streams. This list has common HTML text type documents. All items of a repetitive nature that can be identified exist in this list. This text list could be a superset of other common lists, for example the English language list or the French language list. A second list 12 contains a dictionary of the HTML text which is to be miniaturised.
Each data item is located at a specific position in the list 12 and this position is identified by a number which is pointed to by a pointer from list 11. The list 12 is effectively a dictionary data base which is generated by coding the entries in the HTML text and script list. The 128 most common items are located first in the list and are assigned first level representation (typically 8 bits) in alphabetic sequence. The rest of the list is organised alphabetically and is assigned the minimum number of bits to uniquely identify the location of the original data in the list 11. As an example, if the total number of data items (e.g. characters) in the first list is 29,456 then 15 bits (values 0 to 32,767) would be needed to represent the unique location of the start of a particular data item. The number of unique entries is then calculated. If, for example, there are 3,128 unique entries in the list 11, then 12 bits (values 0 to 4,095) will be required to identify the unique data items in the list.
It follows from the above that by setting up the first list 11 a reduced number of pointers are required to represent the data items in the second list, because data items that are repeated do not need to have an associated pointer. Accordingly if a data item occurs 1,000 times in the second list or dictionary data base, a single pointer is all that is required in the first list 11 and accordingly the single pointer is all that needs to be stored in a general data base 13.
Thus referring back to the example of HTML text given above, the word "the" is the first data item which is to be stored in the general data base 13. Because the word "the" is a common word, it occurs in the most common section of the second list 12 and may be located at position 3406. The corresponding pointer from the first list 11 may be located at position 8A. Accordingly the word "the" does not need to be stored in the general data base 13 nor does the second list pointer 3406.
Instead the first list pointer 8A can be stored in the general data base 13; this requires fewer bits to describe and accordingly requires less space for storage.
The next word in the HTML text is "Frontpage" which is not as common as the word "the", but does exist many times in normal HTML text. It therefore is located in the less common section of the dictionary list 12 at a location 23456. In the first list 11 location 23456 is represented by pointer 2408. It follows therefore that pointer 2408 is placed in the general data base 13 straight after pointer 8A.
The word "install" is the next data item in the HTML text and is an uncommon word which is located at position 26578. The corresponding pointer in first list 11 is located at position 2458. Accordingly this pointer 2458 is stored in the general data base 13 after pointer 2408.
Finally the script string " The word "PurpleText" is not common in either HTML script or text and therefore does not occur in the dictionary list 12. As a result this word is represented by an exception flag "00" in the general data base 13 and has no associated pointer. Similarly any other script or text which is not represented in the dictionary data base 12 is also classified as an exception and is copied verbatim into the general data base 13.
Reconstruction of the original data represented in the general data base is simply achieved by using a reverse look-up algorithm. Thus if the pointer 8A is read, a look-up algorithm is used to access the first list 11 which gives the location of the corresponding data item at location 3406 in the second list 12. At location 3406 the word "the" is located and this word is then retrieved and substituted for the pointer 8A.
The above example discloses what is in effect a double index technique, utilising two pointers. However the present invention may equally be applicable to any number of indexes and pointers, depending on the data which is to be miniaturised.
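The worked example above can be traced in a few lines of code. This is a sketch under assumptions rather than the implementation of the embodiment: the pointer labels 8A, 2408 and 2458 and the positions 3406, 23456 and 26578 are taken from the example, while the Python names, the treatment of pointers as opaque text labels, and the representation of an exception as a `("00", item)` pair are hypothetical.

```python
EXCEPTION_FLAG = "00"

# Second list 12: position -> data item (only the entries used in the example).
list_12 = {3406: "the", 23456: "Frontpage", 26578: "install"}
# First list 11: pointer -> position in list 12.
list_11 = {"8A": 3406, "2408": 23456, "2458": 26578}

# Reverse maps, built once so that encoding does not scan the lists.
_position_of = {item: pos for pos, item in list_12.items()}
_pointer_to = {pos: ptr for ptr, pos in list_11.items()}


def miniaturise(items):
    """Build the general data base 13 from a sequence of data items."""
    general = []
    for item in items:
        pos = _position_of.get(item)
        if pos is None:
            general.append((EXCEPTION_FLAG, item))   # flagged verbatim copy
        else:
            general.append(_pointer_to[pos])         # first list pointer only
    return general


def reconstruct(general):
    """Reverse look-up: pointer -> list 11 -> list 12 -> data item."""
    items = []
    for entry in general:
        if isinstance(entry, tuple):
            items.append(entry[1])
        else:
            items.append(list_12[list_11[entry]])
    return items


general_13 = miniaturise(["the", "Frontpage", "install", "PurpleText"])
print(general_13)
print(reconstruct(general_13))
```

Only the short pointers and the flagged exception reach the general data base; the positions in list 12 are never stored there.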
Thus one application would be in miniaturising data located in telephone white pages. In such a situation a number of dictionary lists would be required, such as a names list, a streets list and a locations list. Each of these lists would have its own separate first and second list pointers using the examples outlined above. Furthermore, each list could have an associated list which would also require a double index pointer system. Thus a streets list having the names of various streets may also require a sub-list of street types such as "ST", "PL", "CR" etc.
According to another example image data may be represented by multi-level indexing techniques. Thus the first level may be the fact that the area is black, the second level may indicate the shape, the third level may indicate the size, and similarly further levels may relate to further deconstruction of the original data.
Clearly the above compression technique is not limited to text based data, but is also able to be used in connection with foreign languages, foreign character sets (e.g. Arabic and Chinese), music and speech phonemes. The only requirement is that the data has a repetitive nature that can be analysed and represented as uniquely coded and identifiable items.
An important advantage of the miniaturisation technique which is described above lies in the ability to search data items in their miniaturised format. Thus instead of searching for the word "the" in the preferred embodiment given above, a search could be conducted for the pointer 8A. This is in contrast to conventional searching techniques of compressed text, where it is necessary to continually convert and reconvert text in order to complete the search.
Although the main focus of the present invention is miniaturisation of data, the invention is equally applicable to encrypting/decrypting data. This is because the indexing system described above in effect replaces common data items with associated pointers which act as tokens.
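The search advantage described above, looking for the pointer 8A rather than the word "the", can be sketched as follows. The function names are hypothetical, the two small lists contain only illustrative entries, and the representation of exceptions as `("00", item)` pairs is an assumption made for this sketch.

```python
list_12 = {3406: "the", 23456: "Frontpage"}   # position -> data item
list_11 = {"8A": 3406, "2408": 23456}         # pointer -> position in list 12


def pointer_for(word):
    """Translate the search term once into its list 11 pointer, if any."""
    for pointer, position in list_11.items():
        if list_12[position] == word:
            return pointer
    return None


def search(general, word):
    """Return the positions in the general data base where `word` occurs."""
    pointer = pointer_for(word)
    # Words without a pointer can only be present as flagged verbatim copies.
    target = pointer if pointer is not None else ("00", word)
    return [i for i, entry in enumerate(general) if entry == target]


general_data_base = ["8A", "2408", "8A", ("00", "PurpleText")]
print(search(general_data_base, "the"))
print(search(general_data_base, "PurpleText"))
```

The search term is converted to a pointer once, and the stored data is then compared pointer against pointer without expanding any entry back into text.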
Because each token and data item is easily retrievable, the list of tokens/pointers can easily be manipulated in a reversible manner to make unauthorised decryption more difficult.
The present invention is therefore applicable to any data which includes repetitive elements. This is because these repetitive elements can be represented in an index of pointers/tokens which obviates the need for pointers for each repeated element. It follows therefore that theoretically any data stored, for example in computer memory, can be stored in a miniaturised form by the majority of repeated data items.
WE CLAIM:
1. A system of storing miniaturised data comprising a main database, a first index of first location identifiers, a second index of second location identifiers and a dictionary data base of data items, wherein the first location identifiers are adapted to identify the location of second location identifiers in the second index and the second location identifiers are adapted to identify the location of data items in the dictionary data base, the system including receiving data and separating the data into a plurality of data items and storing the data items in the main data base, whereby at least one of the data items is stored in the main data base as at least one first location identifier, which identifies at least one second location identifier, which identifies the or each data item in the dictionary data base. 2. A system as claimed in claim 1 comprising the step of searching the dictionary data base for at least one data item and replacing the data item with one first location identifier which indicates the location of one location identifier in the second index, which second location identifier indicates the location of the data item in the dictionary data base. 3.
A system as claimed in claim 2 comprising the step of searching the dictionary data base for each data item and identifying if the data item occurs in the dictionary data base and if the data item occurs in the dictionary data base, retrieving the second location identifier in the second index that identifies the location of the data item in the dictionary data base, retrieving the first location identifier in the first index which identifies the location of the second location identifier in the second index and storing the first location identifier in a main data base in place of the data item. 4. A system as claimed in claim 3 wherein the data item comprises any one of a string of data, a field of data or other group of data that can represent information in a predetermined format. 5. A system as claimed in claim 3 wherein the or each data item represents a stream of data which represents information which can be searched. 6. A system as claimed in claim 3 wherein each first location identifier includes a pointer to the second index. 7. A system as claimed in claim 6 wherein each second location identifier includes a pointer to the dictionary database. 8. A system as claimed in claim 7 wherein the first and second indexes comprise a plurality of pointers. 9. A system as claimed in claim 8 wherein the dictionary data base comprises a plurality of data bases each with unique addresses which are represented by the location identifiers. 10. A system as claimed in claim 9 wherein each index comprises a plurality of sub-indexes. 11. A system as claimed in claim 9 wherein each second index is divided into different sections representing locations of predetermined types of data items. 12. A system as claimed in claim 11 wherein each first index is divided into different sections representing the location of second location identifiers associated with predetermined types of data items. 13.
A system for storing data, the system comprising at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items, a first one of the index data bases comprising a plurality of data item location identifiers, which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases including a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base, and wherein the system includes a processing means which is adapted to receive data comprising data items and to store the data in a miniaturised form by storing in place of each data item occurring in the dictionary data base, each corresponding first location identifier, whereby each data item occurring in the dictionary data base can be retrieved by referencing the data item location identifier identified by the first location identifier. 14. A system as claimed in claim 13 wherein the at least two index data bases include separate lists of location identifiers in one or more other data bases. 15. A system as claimed in claim 14 comprising a storage medium having a sequence of instructions adapted to control a data processor to set up the system. 16. A system as claimed in claim 15 wherein the first index database is part of the dictionary database. 17. A system as claimed in claim 14 comprising one or more additional index databases each with location identifiers which identify the location of another location identifier of another index data base. 18. A system as claimed in claim 17 comprising a main data base which is adapted to store a stream of data as a combination of data items which are not represented in the dictionary data base and first location identifiers. 19. 
A system as claimed in claim 18 wherein the stream of data stored in the main database may have data items and first location identifiers which are stored in an order determined by a further index data base and reprocessing means which is adapted to control the ordering of data in the main data base with reference to the further index data base. 20. A system as claimed in claim 19 wherein the dictionary data base has data items stored in a predetermined order which is determined by how frequently each data item stored therein is expected to occur in a data stream of data items. 21. A system as claimed in claim 20 wherein the most common data items have a location in the dictionary data base that is identified by a dictionary data base location identifier having minimal bytes compared to an uncommon data item. 22. A system as claimed in claim 21 wherein the dictionary data base index comprises dictionary data base location identifiers arranged sequentially from lowest number to highest number of bytes required to define them. 23. A system as claimed in claim 22 wherein each first location identifier comprises a pointer having a number which identifies a position of one data item location identifier in the dictionary data base index. 24. A system as claimed in claim 23 wherein each data item location identifier comprises a pointer having a number which identifies the position of one data item in the dictionary database. 25. A system as claimed in claim 24 wherein the dictionary data base is divided into different sections which have data items with locations which are identified by data item location identifiers from different dictionary data base indexes. 26. A system as claimed in claim 25 wherein the dictionary database comprises storage space into which data items can be added. 27.
A system for retrieving data items stored in a miniaturised form, the system comprising at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items and a first one of the index data bases comprises a plurality of data item location identifiers which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases includes a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base and a processing means, wherein the processing means is adapted to receive a first data stream including a plurality of first location identifiers and produce a second data stream including the data items without the first location identifiers and wherein first location identifiers are replaced by corresponding data items. 28. A system as claimed in claim 27 wherein the first data stream is adapted to be received by reading a data base. 29. A system as claimed in claim 28 wherein the data base is located in a storage medium which is readable by a computer hardware device. 30. A system as claimed in claim 29 wherein the data stream is transmitted and received from a communications system. 31. A system as claimed in claim 30 wherein the first data stream is received and stored in a data processor before being read and compressed or decompressed. 32. A system as claimed in claim 31 comprising a scrambling means for reordering data item location identifiers in the first index data base and storing a method of reordering data item location identifiers utilised by the scrambling means, whereby reordering data item location identifiers can be reversed. 33. A system as claimed in claim 32, wherein the scrambling means includes reversing means for reversing any reordering which has taken place of the dictionary item location identifiers. 34.
A system for encrypting data, the system comprising at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items, a first one of the index data bases comprising a plurality of data item location identifiers, which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases including a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base, and wherein the system includes a processing means which is adapted to receive data including data items and to store the data in a miniaturised form by storing in place of each data item occurring in the dictionary data base, each corresponding first location identifier, whereby each data item occurring in the dictionary data base can be retrieved by referencing the data item location identifier identified by the first location identifier. 35. A system for decrypting data items stored in a miniaturised form, the system comprising at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items and a first one of the index data bases comprises a plurality of data item location identifiers which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases includes a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base and a processing means, wherein the processing means is adapted to receive a first data stream including a plurality of first location identifiers and produce a second data stream including the data items without the first location identifiers and wherein first location identifiers are replaced by corresponding data items. 36. 
A system as claimed in claim 35 including a scrambling means for reordering data item location identifiers in the first index data base and storing a method of reordering data item location identifiers utilised by the scrambling means, whereby reordering data item location identifiers can be reversed.

Full Text

FIELD OF THE INVENTION
The present invention relates primarily, although not exclusively, to a system of
storing, retrieving, encrypting and decrypting miniaturised data. It relates to
techniques used for storing electronic data in a form which requires less storage
space.
BACKGROUND OF THE INVENTION
A typical method for compressing data involves the use of a dictionary data base which lists commonly occurring data and replaces this commonly occurring data with a coded "token" which effectively represents that data using a reduced number of data bits.
Whenever an item of data occurs repeatedly this data item is replaced by its equivalent "token" and accordingly that data item is stored in a compressed form.
When data is stored in the compressed form, by using a look-up table each token can be replaced by its equivalent data item so that the original data can be reformed.
The above conventional compression technique has a number of drawbacks. One drawback is that the number of data bits required to represent a token can itself be significant, with the result that significant storage space is required to store each token. In addition, searching a data base which includes tokens can be quite cumbersome because tokens need to be reconverted to their original data items before a search of each of the data items can be properly conducted.
OBJECT OF THE INVENTION
The present invention provides an alternative to existing methods of storing data in a miniaturised form and extends to methods for encrypting data as well as systems for implementing the method, computer programs and storage medium for storing electronic data which is able to implement the method and system.
DISCLOSURE OF THE INVENTION
According to the present invention there is provided a method of storing data
including the steps of
providing a first index of first location identifiers, a second index of second location identifiers and a dictionary data base of data items, wherein the first location identifiers are adapted to identify the location of second location identifiers in the second index and the second location identifiers are adapted to identify the location of data items in the dictionary data base, receiving data and separating the data into a plurality of data items and storing the data items in a main data base, whereby at least one of the data items is stored in the main data base as at least one first location identifier, which identifies at least one second location identifier, which identifies the or each data item in the dictionary data base.
According to another aspect of the present invention there is provided a method of retrieving data stored in a miniaturised form in a main data base, including the steps of accessing the main data base, retrieving one or more items of data including at least one first location identifier from the main data base, using the first location identifier to access and retrieve the location of a second location identifier identified in the first index by the first location identifier, accessing and retrieving from the second location identifier in the second index the location of an item of data in a dictionary data base.
It is preferred that the method of storing data includes the step of searching the dictionary data base for at least one data item and replacing the data item with one first location identifier which indicates the location of one location identifier in the second index, which second location identifier indicates the location of the data item in the dictionary data base.
It is preferred that the method includes the step of searching the dictionary data base for each data item and identifying if the data item occurs in the dictionary data base and, if the data item occurs in the dictionary data base, retrieving the second location identifier in the second index that identifies the location of the data item in the dictionary data base, retrieving the first location identifier in the first index which identifies the location of the second location identifier in the second index and storing the first location identifier in a main data base in place of the data item.
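By way of illustration only, the storing steps described above might be sketched in Python as follows. The function name `store`, the tuple tags `"id"` and `"raw"`, and the modelling of the second index as a list whose positions serve as the first location identifiers are assumptions made for this sketch and are not taken from the specification.

```python
def store(items, dictionary, second_index):
    """Replace each item found in the dictionary by a first location identifier.

    dictionary      -- list of data items (the dictionary data base)
    second_index    -- second_index[k] is the position of an item in `dictionary`
    The first location identifier of an item is assumed to be the position k
    of its entry in `second_index`.
    """
    position_of = {item: pos for pos, item in enumerate(dictionary)}
    first_id_of = {dict_pos: k for k, dict_pos in enumerate(second_index)}

    main_db = []
    for item in items:
        if item in position_of:
            # Item occurs in the dictionary: store only its first location identifier.
            main_db.append(("id", first_id_of[position_of[item]]))
        else:
            # Item does not occur in the dictionary: keep it verbatim.
            main_db.append(("raw", item))
    return main_db


dictionary = ["install", "the", "frontpage"]
second_index = [1, 2, 0]  # hypothetical ordering of dictionary positions
main_db = store(["the", "frontpage", "xyz", "install"], dictionary, second_index)
```

In this sketch an item that is absent from the dictionary is simply carried through unchanged; how such items are marked in the main data base is a design choice left open here.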
It is preferred that the data item includes a string of data, a field of data or other group of data that can represent information in a predetermined format.
The or each data item preferably represents a stream of data which represents information which can be searched.
Each data item preferably represents a name, initial, address, phone number or other words or numbers or initials or characters or character strings or number strings.
Each first location identifier preferably includes a pointer to the second index.
Each second location identifier may include a pointer to the dictionary data base.
The first index may comprise a plurality of pointers.
Preferably the second index comprises a plurality of pointers.
The first index may comprise a sequential list of pointers.
The second index may comprise a sequential list of pointers.
The dictionary data base preferably includes a plurality of data bases each with unique addresses which are represented by the location identifiers.
Each index may include a plurality of sub-indexes.
Preferably each second index is divided into different sections representing locations of predetermined types of data items.
Each first index is preferably divided into different sections representing the location of second location identifiers associated with predetermined types of data items.
The method may include providing additional indexes with additional location identifiers.
According to another aspect of the present invention there is provided a system for storing data, the system including at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items, a first one of the index data bases comprising a plurality of data item location identifiers, which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases including a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base, and wherein the system includes a processing means which is adapted to receive data including data items and to store the data in a compressed form by storing in place of each data item occurring in the dictionary data base, each corresponding first location identifier, whereby each data item occurring in the dictionary data base can be retrieved by referencing the data item location identifier identified by the first location identifier.
Preferably the at least two index databases include separate lists of location identifiers in a common data base or which are part of other data bases.
Preferably there is provided a storage medium including a sequence of instructions adapted to control a data processor to set up the system.
The first index data base may be part of the dictionary data base.
The system may include one or more additional index data bases each with location identifiers which identify the location of another location identifier of
another index data base.
It is preferred that the system includes a main data base which is adapted to store a stream of data as a combination of data items which are not represented in the dictionary data base and first location identifiers.
According to one embodiment the stream of data stored in the main data base may have data items and first location identifiers which are stored in an order determined by a further index data base and a reprocessing means which is adapted to control the ordering of data in the main data base with reference to the further index data base.
Preferably the dictionary data base has data items stored in a predetermined order which is determined by how commonly or frequently each data item stored therein is expected to occur in a data stream of data items.
It is preferred that the most common data items have a location in the dictionary data base that is identified by a dictionary data base location identifier having minimal bits compared to an uncommon data item.
Preferably the dictionary data base index comprises dictionary data base location identifiers arranged sequentially from lowest number to highest number of bits required to define them.
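As a hedged illustration of the frequency ordering described above, the following Python sketch orders the distinct items of a sample stream so that the most frequent items receive the lowest positions, and reports how many bits each position needs. The function names and the sample stream are hypothetical; the specification does not prescribe how expected frequencies are obtained.

```python
from collections import Counter


def frequency_ordered(sample_items):
    """Return the distinct items, most frequent first (ties broken alphabetically)."""
    counts = Counter(sample_items)
    return sorted(counts, key=lambda item: (-counts[item], item))


def identifier_widths(dictionary):
    """Pair each item with its position and the bits needed to write that position."""
    return [(item, pos, max(1, pos.bit_length()))
            for pos, item in enumerate(dictionary)]


sample = "the cat the dog the cat".split()  # hypothetical sample stream
ordered = frequency_ordered(sample)
widths = identifier_widths(ordered)
```

Because positions are assigned in descending order of frequency, the bit widths in `widths` never decrease along the list, which corresponds to the arrangement from lowest to highest number of bits.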
Each first location identifier may comprise a pointer having a number which identifies a position of one data item location identifier in the dictionary data base index.
Preferably each data item location identifier comprises a pointer having a number which identifies the position of one data item in the dictionary data base.
The dictionary data base may be divided into different sections which have data items with locations which are identified by data item location identifiers from different dictionary data base indexes.
The dictionary data base preferably includes storage space into which data items can be added.
According to another aspect of the present invention there is provided a computer program which is adapted to control a computer to provide at least one dictionary data base and at least two index data bases, wherein the dictionary data base comprises a plurality of data items, a first one of the index data bases comprises a plurality of data item location identifiers, which respectively identify the location of at least one data item in the dictionary data base, and a second one of the index data bases includes a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base, and wherein the computer program includes instructions to control the computer to receive data including data items and to store the data in a compressed form by storing in place of each data item which occurs in the dictionary data base, each corresponding first location identifier, whereby each data item which occurs in the dictionary data base can be retrieved by referencing the data item location identifier identified by the first location identifier.
Preferably the at least two index data bases include separate lists of location identifiers in a common data base or which are part of other data bases.
The first index data base may be part of the dictionary data base.
It is preferred that the computer program includes an exception means for storing data items which do not occur in the dictionary data base.
The exception means preferably includes a predetermined part of the dictionary data base.
The exception means may be adapted to provide a dictionary index location identifier for any new data item stored in the dictionary data base.
According to another embodiment of the present invention the exception means includes an exceptions data base which is adapted to store data items which do not occur in the dictionary data base.
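The first of the exception arrangements described above, in which a new data item is added to the dictionary data base and given its own identifier, might be sketched as follows. This is a minimal sketch under assumptions: the function name `identifier_for` is hypothetical, and the dictionary and index are modelled as plain Python lists that grow at the end.

```python
def identifier_for(item, dictionary, second_index):
    """Return the first location identifier for `item`, adding the item if it is new.

    dictionary   -- list of data items; new items are appended to spare space at the end
    second_index -- second_index[k] is a position in `dictionary`; k is the identifier
    """
    if item not in dictionary:
        dictionary.append(item)                    # store the new item
        second_index.append(len(dictionary) - 1)   # give it a dictionary location entry
    return second_index.index(dictionary.index(item))


dictionary = ["the"]
second_index = [0]
known = identifier_for("the", dictionary, second_index)
new = identifier_for("purple", dictionary, second_index)
again = identifier_for("purple", dictionary, second_index)
```

The linear searches keep the sketch short; a practical implementation would more likely keep a reverse lookup table.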
The computer program may include a means for
storing different types of data items in different
dictionary data bases.
Each dictionary data base may have predetermined dictionary location identifiers in the first one of the index data bases, which provide the location of data items in that data base.
Each dictionary data base may be split into a plurality of data types or fields each having a plurality of data items of that type or field.
According to another aspect of the present
invention there is provided a storage medium having
computer software stored thereon which is adapted to
control a computer to set up a system according to any one of the previously described embodiments of the invention.
According to another aspect of the present invention there is provided a system for retrieving data items stored in a miniaturised form, the system including at least one dictionary data base and at least two index data bases wherein the dictionary data base comprises a plurality of data items and a first one of the index data bases comprises a plurality of data item location identifiers which respectively identify the location of at least one data item in the dictionary data base and a second one of the index data bases includes a plurality of first location identifiers which respectively identify the location of at least one data item location identifier in the first index data base and a processing means, wherein the processing means is adapted to receive a first data stream including a plurality of first location identifiers and produce a second data stream including the data items without first location identifiers and wherein first location identifiers are replaced by corresponding data items.
It is preferred that the first data stream is
adapted to be received by reading a data base.
The data base may be in a storage medium which is readable by a computer hardware device.
The data base may be stored in a computer memory. The data stream preferably is transmitted and received from a communication system.
The first data stream may be received and stored in a data processor before being read and compressed or decompressed.
According to another aspect of the present invention there is provided a system which includes the system for compressing data and the system for retrieving data.
According to another embodiment of the present invention there is provided a storage medium having a computer program stored thereon which is adapted to control a computer to set up/implement the combined system.
According to another aspect of the present invention there is provided a method of encrypting data using the system for compressing data.
According to one embodiment of the method of encrypting data, the location of data items may be changed by using a coding means for changing the data item location identifiers in a reconvertible manner.
It is preferred that the data item location identifiers are able to be reordered so that the first location identifiers identify different data item location identifiers to those before reordering.
According to another embodiment of the present invention any one of the systems includes a scrambling means for reordering data item location identifiers in the first index data and for storing the method of reordering whereby the reordering can be reversed.
According to another aspect of the present invention there is provided a method of decryption using the system for retrieving data.
It is preferred that the method of decryption includes the scrambling means for reversing any reordering which has taken place of the dictionary item location identifiers.
According to another embodiment of the present invention the method of decryption includes a descrambling means which includes means for reversing any reordering of dictionary item location identifiers in the first index data base.
According to another embodiment of the present invention there is provided a method of encrypting and decrypting data which incorporates the combined system for compressing and retrieving data, the method also including the step of at predetermined times using a scrambling means to reorder dictionary item location identifiers in accordance with a predetermined ordering technique which is stored or able to be stored and received by a descrambling means at a receiving end of the system.
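Purely as an illustration of reversible reordering, the following Python sketch shuffles an index of data item location identifiers and keeps the permutation so that the reordering can be undone. The function names, the use of a seeded shuffle and the sample identifiers are assumptions of this sketch. Python's `random` module is not a cryptographically secure generator, so the sketch shows only the reversibility and makes no claim as to the strength of the resulting encryption.

```python
import random


def scramble(index, seed):
    """Return (scrambled_index, permutation).

    permutation[i] is the original position of the entry now found at position i.
    """
    permutation = list(range(len(index)))
    random.Random(seed).shuffle(permutation)
    return [index[p] for p in permutation], permutation


def descramble(scrambled, permutation):
    """Undo `scramble` using the stored permutation."""
    original = [None] * len(scrambled)
    for new_pos, old_pos in enumerate(permutation):
        original[old_pos] = scrambled[new_pos]
    return original


index = [3406, 23456, 26578, 17]  # hypothetical data item location identifiers
scrambled, permutation = scramble(index, seed=42)
restored = descramble(scrambled, permutation)
```

Here the stored permutation plays the part of the stored method of reordering: a receiving end that holds it can restore the original index, while first location identifiers read against the scrambled index resolve to different data items.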
A preferred embodiment of the present invention
will now be described by way of example only with
reference to HTML script or text.
BEST MODE OF CARRYING OUT THE INVENTION
As an example the following HTML text will be
minimised in accordance with the preferred embodiment of
the invention:
The Frontpage install
The above text can be split into a number of groups which for convenience will be referred to as data items. Thus the word "the" constitutes one data item, the word "frontpage" constitutes another data item and so on for the word "install".
Using the miniaturisation technique in accordance with the present invention two indexing lists are set up as shown in Figure 1.
A first list 11 is set up which is effectively a data base of pointers.
For convenience only some of the pointers are shown, being those pointers required to identify text which is stored relating to the sample of HTML text referred to above.
The first list 11 is generated by analysing the repetitive structure of HTML text and script that exists in documents or data transfer streams. This list has common HTML text type documents. All items that are of repetitive nature that can be identified exist in this list. This text list could be a super set of other common lists, for example, the English language list or the French language list.
A second list 12 contains a dictionary of the HTML text which is to be miniaturised. Each data item is located at a specific position in the list 12 and this position is identified by a number which is pointed to by a pointer from list 11.
The list 12 is effectively a dictionary data base which is generated by coding the entries in the HTML text and script list.
The 128 most common items are located first in the list and are assigned first level representation (typically 8 bits) in alphabetic sequence. The rest of the list is organised alphabetically and is assigned the
minimum number of bits to uniquely identify the location of the original data in the list 11.
As an example, if the total number of data items (e.g. characters) in the first list is 29,456 then 15 bits (32,768 values, 0 to 32,767) would be needed to represent the unique location of the start of a particular data item. The number of unique entries is then calculated. If, for example, there are 3,128 unique entries in the list 11, then 12 bits (4,096 values, 0 to 4,095) will be required to identify the unique data items in the list.
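The bit counts in the example above can be checked with a short calculation. The following Python sketch uses a hypothetical helper, `bits_needed`, which returns the smallest number of bits able to distinguish a given number of locations.

```python
def bits_needed(count):
    """Smallest number of bits that gives at least `count` distinct values."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return max(1, (count - 1).bit_length())


total_items_bits = bits_needed(29456)    # total data items in the example
unique_entries_bits = bits_needed(3128)  # unique entries in the example
```

For the figures given, `total_items_bits` is 15 and `unique_entries_bits` is 12, since 2 to the power 14 is 16,384 (too few for 29,456) and 2 to the power 11 is 2,048 (too few for 3,128).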
It follows from the above that by setting up the first list 11 a reduced number of pointers are required to represent the data items in the second list, because data items that are repeated do not need to have an associated pointer.
Accordingly if a data item occurs 1,000 times in the second list or dictionary data base, a single pointer is all that is required in the first list 11 and accordingly the single pointer is all that needs to be stored in a general data base 13.
Thus referring back to the example of HTML text given above, the word "the" is the first data item which is to be stored in the general data base 13. Because the word "the" is a common word, it therefore occurs in the most common section of the second list 12 and may be located at position 3406. The corresponding pointer from the first list 11 may be located at position 8A. Accordingly the word "the" does not need to be stored in the general data base 13 nor does the second list pointer 3406. Instead the first list pointer 8A can be stored in the general data base 13 and this obviously has a lower number of bits required to describe it and accordingly requires less space for storage.
The next word in the HTML text is "Frontpage" which is not as common as the word "the", but does exist many times in normal HTML text. It therefore is located in the less common section of the dictionary list 12 at a location 23456. In the first list 11 location 23456 is represented by pointer 2408.
It follows therefore that pointer 2408 is placed in the general data base 13 straight after pointer 8A. The word "install" is the next data item in the HTML text and is an uncommon word which is located at position 26578. The corresponding pointer in first list 11 is located at position 2458. Accordingly this pointer 2458 is stored in the general data base 13 after pointer 2408.
Finally the script string " The word "PurpleText" is not common in either HTML script or text and therefore does not occur in the dictionary list 12. As a result this word is represented by an exception flag "00" in the general data base 13 and has no associated pointer. Similarly any other script or text which is not represented in the dictionary data base 12 is also classified as an exception and is copied verbatim into the general data base 13.
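The worked example above can be sketched in Python using the positions and pointers given in the text. The pointers are held as strings because the text does not state their number base. The tuple used to carry an exception, the case-insensitive matching and the name `miniaturise` are assumptions of this sketch only.

```python
# Figures taken from the worked example; the data layout is illustrative.
LIST_12 = {3406: "the", 23456: "frontpage", 26578: "install"}  # location -> data item
LIST_11 = {"8A": 3406, "2408": 23456, "2458": 26578}           # pointer -> location in list 12
EXCEPTION_FLAG = "00"


def miniaturise(words):
    """Replace each known word by its list 11 pointer; flag unknown words."""
    location_of = {item: loc for loc, item in LIST_12.items()}
    pointer_of = {loc: ptr for ptr, loc in LIST_11.items()}
    general_db = []
    for word in words:
        loc = location_of.get(word.lower())  # assumed: matching ignores case
        if loc is None:
            # Not in the dictionary: exception flag plus the verbatim text.
            general_db.append((EXCEPTION_FLAG, word))
        else:
            general_db.append(pointer_of[loc])
    return general_db


general_db = miniaturise(["The", "Frontpage", "install", "PurpleText"])
```

Matching on lower-cased words means the original capitalisation is not recorded by this sketch; a fuller implementation would need to store it separately or keep case-distinct dictionary entries.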
Reconstruction of the original data represented in the general data base is simply achieved by using a reverse look-up algorithm.
Thus if the pointer 8A is read, a look-up algorithm is used to access the first list 11 which gives the location of the corresponding data item at location 3406 in the second list 12.
At location 3406 the word "the" is located and this word is then retrieved and substituted for the
pointer 8A.
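A reverse look-up of the kind just described might, as a non-authoritative sketch, be written as follows. The list contents reuse the figures from the example; the tuple form of an exception entry and the name `reconstruct` are assumptions of the sketch.

```python
LIST_12 = {3406: "the", 23456: "frontpage", 26578: "install"}  # location -> data item
LIST_11 = {"8A": 3406, "2408": 23456, "2458": 26578}           # pointer -> location in list 12
EXCEPTION_FLAG = "00"


def reconstruct(general_db):
    """Replace each pointer by its data item; copy flagged exceptions verbatim."""
    items = []
    for entry in general_db:
        if isinstance(entry, tuple) and entry[0] == EXCEPTION_FLAG:
            items.append(entry[1])                 # verbatim text stored after the flag
        else:
            items.append(LIST_12[LIST_11[entry]])  # pointer -> location -> data item
    return items


restored = reconstruct(["8A", "2408", "2458", ("00", "PurpleText")])
```

Each pointer costs two look-ups, one in list 11 and one in list 12, which is the double index technique in its simplest form.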
The above example discloses what is in effect a double index technique, utilising two pointers. However the present invention may equally be applicable to any number of indexes and pointers, depending on the data which is to be miniaturised. Thus one application would be in miniaturising data located in telephone white pages. In such a situation a number of dictionary lists would be required, such as a names list, a streets list and a locations list.
Each of these lists would have their own separate first and second list pointers using the examples outlined above. Furthermore, each list could have an associated list which would also require a double index pointer system.
Thus a streets list having the names of various streets may also require a sub-list of street types such as "ST", "PL", "CR" etc.
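The white pages arrangement could be sketched with one dictionary list per field, as below. Apart from the street types "ST", "PL" and "CR", every value, field name and function name here is hypothetical, and the pointer lists are collapsed into a direct position look-up to keep the sketch short.

```python
# One dictionary list per field; contents other than the street types are invented.
DICTIONARIES = {
    "name": ["smith", "jones"],
    "street": ["high", "station"],
    "street_type": ["ST", "PL", "CR"],
    "location": ["springfield"],
}
EXCEPTION_FLAG = "00"


def encode_record(record):
    """Encode each field against its own dictionary list."""
    encoded = {}
    for field, value in record.items():
        entries = DICTIONARIES[field]
        if value in entries:
            encoded[field] = entries.index(value)      # position within that field's list
        else:
            encoded[field] = (EXCEPTION_FLAG, value)   # not listed: keep verbatim
    return encoded


encoded = encode_record(
    {"name": "jones", "street": "high", "street_type": "PL", "location": "shelbyville"}
)
```

Because each field has its own list, the same small number can stand for a name in one field and a street type in another without ambiguity.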
According to another example image data may be represented by multi-level indexing techniques. Thus the first level may be the fact that the area is black, the second level may indicate the shape, the third level may indicate the size, similarly the levels may relate to further deconstruction of the original data.
Clearly the above compression technique is not limited to text based data, but is also able to be used in connection with foreign languages, foreign character sets (e.g. Arabic and Chinese), music and speech phonemes. The only requirement is that the data has a repetitive nature that can be analysed and represented as uniquely coded and identifiable items.
An important advantage of the miniaturisation technique which is described above lies with the ability to search data items in their miniaturised format. Thus instead of searching for the word "the" in the preferred embodiment given above, a search could be conducted for the pointer 8A. This is in contrast to conventional searching techniques of compressed text, where it is necessary to continually convert and reconvert text in order to complete the search.
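Searching in the miniaturised format might be sketched as follows: the search word is converted to its pointer once, and the general data base is then scanned for that pointer without any entry being converted back to text. The helper names and the sample general data base are assumptions of this sketch; the pointers and positions are those of the preferred embodiment.

```python
LIST_12 = {3406: "the", 23456: "frontpage", 26578: "install"}  # location -> data item
LIST_11 = {"8A": 3406, "2408": 23456, "2458": 26578}           # pointer -> location in list 12


def pointer_for(word):
    """Return the list 11 pointer for `word`, or None if it is not in the dictionary."""
    for pointer, location in LIST_11.items():
        if LIST_12[location] == word:
            return pointer
    return None


def search(general_db, word):
    """Return the positions in the general data base at which `word` occurs."""
    pointer = pointer_for(word)
    if pointer is None:
        # Word is not in the dictionary, so it can only appear as a verbatim exception.
        return [i for i, e in enumerate(general_db)
                if isinstance(e, tuple) and e[1] == word]
    return [i for i, e in enumerate(general_db) if e == pointer]


general_db = ["8A", "2408", ("00", "PurpleText"), "8A"]  # hypothetical stored stream
hits_the = search(general_db, "the")
hits_exception = search(general_db, "PurpleText")
```

Only the query is translated; the stored stream is compared pointer against pointer.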
Although the main focus of the present invention is miniaturisation of data, the invention is equally applicable to encrypting/decrypting data. This is because the indexing system described above in effect replaces common data items with associated pointers which act as tokens.
Because each token and data item is easily retrievable, the list of tokens/pointers can easily be manipulated in a reversible manner to make unauthorised decryption more difficult.
The present invention is therefore applicable to any data which includes repetitive elements. This is because these repetitive elements can be represented in an index of pointers/tokens which obviates the need for pointers for each repeated element. It follows therefore that theoretically any data stored, for example in computer memory, can be stored in a miniaturised form by the majority of repeated data items.

WE CLAIM:
1. A system of storing miniaturised data comprising a main database, a first
index of first location identifiers, a second index of second location identifiers and
a dictionary data base of data items, wherein the first location identifiers are
adapted to identify the location of second location identifiers in the second index
and the second location identifiers are adapted to identify the location of data
items in the dictionary data base, the system including receiving data and
separating the data into a plurality of data items and storing the data items in the
main data base, whereby at least one of the data items is stored in the main data
base as at least one first location identifier, which identifies at least one second
location identifier, which identifies the or each data item in the dictionary data
base.
2. A system as claimed in claim 1 comprising the step of searching the
dictionary data base for at least one data item and replacing the data item with
one first location identifier which indicates the location of one location identifier in
the second index, which second location identifier indicates the location of the
data item in the dictionary data base.
3. A system as claimed in claim 2 comprising the step of searching the
dictionary data base for each data item and identifying if the data item occurs in
the dictionary data base and if the data item occurs in the dictionary data base,
retrieving the second location identifier in the second index that identifies the
location of the data item in the dictionary data base, retrieving the first location
identifier in the first index which identifies the location of the second location
identifier in the second index and storing the first location identifier in a main data
base in place of the data item.
4. A system as claimed in claim 3 wherein the data item comprises any one of
a string of data, a field of data or other group of data that can represent
information in a predetermined format.

5. A system as claimed in claim 3 wherein the or each data item represents a
stream of data which represents information which can be searched.
6. A system as claimed in claim 3 wherein each first location identifier
includes a pointer to the second index.
7. A system as claimed in claim 6 wherein each second location identifier
includes a pointer to the dictionary database.
8. A system as claimed in claim 7 wherein the first and second indexes
comprise a plurality of pointers.
9. A system as claimed in claim 8 wherein the dictionary data base comprises
a plurality of data bases each with unique addresses which are represented by the
location identifiers.
10. A system as claimed in claim 9 wherein each index comprises a plurality of
sub-indexes.

11. A system as claimed in claim 9 wherein each second index is divided into
different sections representing locations of predetermined types of data items.
12. A system as claimed in claim 11 wherein each first index is divided into
different sections representing the location of second location identifiers
associated with predetermined types of data items.
13. A system for storing data, the system comprising at least one dictionary
data base and at least two index data bases wherein the dictionary data base
comprises a plurality of data items, a first one of the index data bases comprising
a plurality of data item location identifiers, which respectively identify the location
of at least one data item in the dictionary data base and a second one of the index
data bases including a plurality of first location identifiers which respectively
identify the location of at least one data item location identifier in the first index
data base, and wherein the system includes a processing means which is adapted to receive data comprising data items and to store the data in a miniaturised form by storing in place of each data item occurring in the dictionary data base, each corresponding first location identifier, whereby each data item occurring in the dictionary data base can be retrieved by referencing the data item location identifier identified by the first location identifier.
14. A system as claimed in claim 13 wherein the at least two index data bases
include separate lists of location identifiers in one or more other data bases.
15. A system as claimed in claim 14 comprising a storage medium having a
sequence of instructions adapted to control a data processor to set up the system.
16. A system as claimed in claim 15 wherein the first index database is part of
the dictionary database.
17. A system as claimed in claim 14 comprising one or more additional index
databases each with location identifiers which identify the location of another
location identifier of another index data base.
18. A system as claimed in claim 17 comprising a main data base which is
adapted to store a stream of data as a combination of data items which are not
represented in the dictionary data base and first location identifiers.
19. A system as claimed in claim 18 wherein the stream of data stored in the
main database may have data items and first location identifiers which are stored
in an order determined by a further index data base and reprocessing means
which is adapted to control the ordering of data in the main data base with
reference to the further index data base.
20. A system as claimed in claim 19 wherein the dictionary data base has
data items stored in a predetermined order which is determined by how frequently
each data item stored therein is expected to occur in a data stream of data items.

21. A system as claimed in claim 20 wherein the most common data items
have a location in the dictionary data base that is identified by a dictionary data
base location identifier having minimal bytes compared to an uncommon data
item.
22. A system as claimed in claim 21 wherein the dictionary data base index
comprises dictionary data base location identifiers arranged sequentially from
lowest number to highest number of bytes required to define them.
23. A system as claimed in claim 22 wherein each first location identifier
comprises a pointer having a number which identifies a position of one data item
location identifier in the dictionary data base index.
24. A system as claimed in claim 23 wherein each data item location identifier
comprises a pointer having a number which identifies the position of one data
item in the dictionary database.
25. A system as claimed in claim 24 wherein the dictionary data base is
divided into different sections which have data items with locations which are
identified by data item location identifiers from different dictionary data base
indexes.
26. A system as claimed in claim 25 wherein the dictionary database
comprises storage space into which data items can be added.
27. A system for retrieving data items stored in a miniaturised form, the
system comprising at least one dictionary data base and at least two index data
bases wherein the dictionary data base comprises a plurality of data items and a
first one of the index data bases comprises a plurality of data item location
identifiers which respectively identify the location of at least one data item in the
dictionary data base and a second one of the index data bases includes a plurality
of first location identifiers which respectively identify the location of at least one
data item location identifier in the first index data base and a processing means,
wherein the processing means is adapted to receive a first data stream including
a plurality of first location identifiers and produce a second data stream including
the data items without the first location identifiers and wherein first location
identifiers are replaced by corresponding data items.

28. A system as claimed in claim 27 wherein the first data stream is adapted to
be received by reading a data base.
29. A system as claimed in claim 28 wherein the data base is located in a
storage medium which is readable by a computer hardware device.
30. A system as claimed in claim 29 wherein the data stream is transmitted and
received from a communications system.
31. A system as claimed in claim 30 wherein the first data stream is received
and stored in a data processor before being read and compressed or
decompressed.
32. A system as claimed in claim 31 comprising a scrambling means for
reordering data item location identifiers in the first index data base and storing a
method of reordering data item location identifiers utilised by the scrambling
means, whereby reordering data item location identifiers can be reversed.
33. A system as claimed in claim 32, wherein the scrambling means includes
reversing means for reversing any reordering which has taken place of the
dictionary item location identifiers.
34. A system for encrypting data, the system comprising at least one dictionary
data base and at least two index data bases wherein the dictionary data base
comprises a plurality of data items, a first one of the index data bases comprising
a plurality of data item location identifiers, which respectively identify the location
of at least one data item in the dictionary data base and a second one of the index
data bases including a plurality of first location identifiers which respectively
identify the location of at least one data item location identifier in the first index
data base, and wherein the system includes a processing means which is adapted
to receive data including data items and to store the data in a miniaturised form by
storing in place of each data item occurring in the dictionary data base, each
corresponding first location identifier, whereby each data item occurring in the
dictionary data base can be retrieved by referencing the data item location
identifier identified by the first location identifier.
35. A system for decrypting data items stored in a miniaturised form, the
system comprising at least one dictionary data base and at least two index data
bases wherein the dictionary data base comprises a plurality of data items and a
first one of the index data bases comprises a plurality of data item location
identifiers which respectively identify the location of at least one data item in the
dictionary data base and a second one of the index data bases includes a plurality
of first location identifiers which respectively identify the location of at least one
data item location identifier in the first index data base and a processing means,
wherein the processing means is adapted to receive a first data stream including a
plurality of first location identifiers and produce a second data stream including the
data items without the first location identifiers and wherein first location identifiers
are replaced by corresponding data items.
36. A system as claimed in claim 35 including a scrambling means for reordering
data item location identifiers in the first index data base and storing a method of
reordering data item location identifiers utilised by the scrambling means, whereby
reordering data item location identifiers can be reversed.
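
The two-level lookup recited in the claims above can be illustrated with a minimal sketch. It is not the patentee's implementation: the function names (build_tables, encode, decode, scramble, unscramble), the ("id", n) / ("lit", s) tagging of stream entries and the use of a seeded shuffle as the stored "method of reordering" are all assumptions made for illustration. A first location identifier is modelled as a position in the index data base, and each index entry holds a data item location identifier, modelled as a position in the dictionary data base.

```python
from collections import Counter
import random


def build_tables(sample_items):
    """Build a dictionary data base and an index data base (hypothetical layout)."""
    counts = Counter(sample_items)
    # Dictionary: distinct items, most frequent first (ties broken alphabetically).
    dictionary = sorted(counts, key=lambda item: (-counts[item], item))
    # Index: entry i holds a data item location identifier (a dictionary position).
    item_index = list(range(len(dictionary)))
    return dictionary, item_index


def encode(items, dictionary, item_index):
    """Replace known items by first location identifiers; keep others as literals."""
    first_id = {dictionary[loc]: pos for pos, loc in enumerate(item_index)}
    # The ("id", n) / ("lit", s) tags are an assumption of this sketch.
    return [("id", first_id[i]) if i in first_id else ("lit", i) for i in items]


def decode(stream, dictionary, item_index):
    """Follow first location identifier -> index entry -> dictionary item."""
    return [dictionary[item_index[v]] if kind == "id" else v for kind, v in stream]


def scramble(item_index, seed):
    """Reorder the index entries and return the permutation used, so that
    the reordering can be reversed later."""
    perm = list(range(len(item_index)))
    random.Random(seed).shuffle(perm)
    return [item_index[p] for p in perm], perm


def unscramble(scrambled, perm):
    """Reverse a reordering produced by scramble()."""
    restored = [None] * len(scrambled)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = scrambled[new_pos]
    return restored


dictionary, item_index = build_tables("the cat and the dog and the bird".split())
stream = encode("the dog saw the cat".split(), dictionary, item_index)
scrambled, perm = scramble(item_index, seed=7)
```

In this sketch a stream decoded with the original index yields the original items, whereas decoding with a scrambled index generally yields different items until unscramble() is applied with the stored permutation. Items absent from the dictionary ("saw" here) remain in the stream as literals, in the manner of claim 18.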

Documents:

abstract.jpg

in-pct-2002-715-del-abstract.pdf

in-pct-2002-715-del-claims.pdf

in-pct-2002-715-del-correspondence-others.pdf

in-pct-2002-715-del-correspondence-po.pdf

in-pct-2002-715-del-description (complete).pdf

in-pct-2002-715-del-form-1.pdf

in-pct-2002-715-del-form-19.pdf

in-pct-2002-715-del-form-2.pdf

in-pct-2002-715-del-form-3.pdf

in-pct-2002-715-del-form-5.pdf

in-pct-2002-715-del-gpa.pdf

in-pct-2002-715-del-pct-210.pdf

in-pct-2002-715-del-pct-409.pdf

in-pct-2002-715-del-petition-137.pdf

Patent Number	220745
Indian Patent Application Number	IN/PCT/2002/00715/DEL
PG Journal Number	30/2008
Publication Date	25-Jul-2008
Grant Date	04-Jun-2008
Date of Filing	22-Jul-2002
Name of Patentee	ZENTRONIX PTY. LTD.

Applicant Address

Inventors:

#	Inventor's Name	Inventor's Address
1	GRAZIONE MELE
2	JOHN ARCHBOLD

PCT International Classification Number	G06F 7/00
PCT International Application Number	PCT/AU00/01594
PCT International Filing date	2000-12-21

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	PQ 4865	1999-12-23	Australia