Title of Invention

"AN OLIGONUCLEOTIDE TAG OR COMPLEMENT FOR USE IN A MULTIPLEX ASSAY"

Abstract An oligonucleotide tag or tag complement for use in a multiplex assay, wherein the tag or tag complement is selected from the group of oligonucleotides consisting of: GATTTGTATTGATTGAGATTAAAG (SEQ ID NO: 1), TGATTGTAGTATGTATTGATAAAG (SEQ ID NO: 2), GATTGTAAGATTTGATAAAGTGTA (SEQ ID NO: 3), GATTTGAAGATTATTGGTAATGTA (SEQ ID NO: 4), GATTGATTATTGTGATTTGAATTG (SEQ ID NO: 5), GATTTGATTGTAAAAGATTGTTGA (SEQ ID NO: 6), ATTGGTAAATTGGTAAATGAATTG (SEQ ID NO: 7), GTAAGTAATGAATGTAAAAGGATT (SEQ ID NO: 9), TGTAGATTTGTATGTATGTATGAT (SEQ ID NO: 13), and GATTAAAGTGATTGATGATTTGTA (SEQ ID NO: 15); including oligonucleotides complementary thereto; wherein the number of false positives and false negatives is minimized by reducing cross-reactivity with other nucleic acids in an assay for analyzing the presence of a mutation or polymorphism at the loci of each nucleic acid and determining the presence of a suspected target contained in a biological mixture.
Full Text POLYNUCLEOTIDES FOR USE AS TAGS AND TAG COMPLEMENTS,
MANUFACTURE AND USE THEREOF
FIELD OF THE INVENTION
This invention relates to families of oligonucleotide tags for use, for
example, in sorting molecules. Members of a given family of tags can be
distinguished one from the other by specific hybridization to their tag
complements.
BACKGROUND OF THE INVENTION
Specific hybridization of oligonucleotides and their analogs is a
fundamental process that is employed in a wide variety of research, medical,
and industrial applications, including the identification of disease-related
polynucleotides in diagnostic assays, screening for clones of novel target
polynucleotides, identification of specific polynucleotides in blots of
mixtures of polynucleotides, therapeutic blocking of inappropriately
expressed genes and DNA sequencing. Sequence specific hybridization is
critical in the development of high throughput multiplexed nucleic acid
assays. As formats for these assays expand to encompass larger amounts of
sequence information acquired through projects such as the Human Genome
project, the challenge of sequence specific hybridization with high fidelity
is becoming increasingly difficult to achieve.
In large part, the success of hybridization using oligonucleotides
depends on minimizing the number of false positives and false negatives.
Such problems have made the simultaneous use of multiple hybridization probes
in a single experiment i.e. multiplexing, particularly in the analysis of
multiple gene sequences on a gene microarray, very difficult. For example,
in certain binding assays, a number of nucleic acid molecules are bound to a
chip with the desire that a given "target" sequence will bind selectively to
its complement attached to the chip. Approaches have been developed that
involve the use of oligonucleotide tags attached to a solid support that can
be used to specifically hybridize to the tag complements that are coupled to
probe sequences. Chetverin et al. (WO 93/17126) uses sectioned, binary
oligonucleotide arrays to sort and survey nucleic acids. These arrays have a
constant nucleotide sequence attached to an adjacent variable nucleotide
sequence, both bound to a solid support by a covalent linking moiety. These
binary arrays have advantages compared with ordinary arrays in that they can
be used to sort strands according to their terminal sequences so that each
strand binds to a fixed location on an array. The design of the terminal
sequences in this approach comprises the use of constant and variable
sequences. United States Patent Nos. 6,103,463 and 6,322,971 issued to
Chetverin et al. on August 15, 2000 and November 27, 2001, respectively.
This concept of using molecular tags to sort a mixture of molecules is
analogous to molecular tags developed for bacterial and yeast genetics
(Hensel et al., Science; 269, 400-403: 1995 and Shoemaker et al., Nature
Genetics; 14, 450-456: 1996). Here, a method termed "signature tagged"
mutagenesis in which each mutant is tagged with a different DNA sequence is
used to recover mutant genes from a complex mixture of approximately 10,000
bacterial colonies. In the tagging approach of Barany et al. (WO 9731256),
known as the "zip chip", a family of nucleic acid molecules, the "zip-code
addresses", each different from each other, are set out on a grid. Target
molecules are attached to oligonucleotide sequences complementary to the
"zipcode addresses," referred to as "zipcodes", which are used to
specifically hybridize to the address locations on the grid. While the
selection of these families of polynucleotide sequences used as addresses is
critical for correct performance of the assay, the performance has not been
described.
Working in a highly parallel hybridization environment requiring
specific hybridization imposes very rigorous selection criteria for the
design of families of oligonucleotides that are to be used. The success of
these approaches is dependent on the specific hybridization of a probe and
its complement. Problems arise as the family of nucleic acid molecules
cross-hybridize or hybridize incorrectly to the target sequences. While it
is common to obtain incorrect hybridization resulting in false positives or
an inability to form hybrids resulting in false negatives, the frequency of
such results must be minimized. In order to achieve this goal certain
thermodynamic properties of forming nucleic acid hybrids must be considered.
The temperature at which oligonucleotides form duplexes with their
complementary sequences known as the Tm (the temperature at which 50% of the
nucleic acid duplex is dissociated) varies according to a number of sequence
dependent properties including the hydrogen bonding energies of the canonical
pairs A-T and G-C (reflected in GC or base composition), stacking free energy
and, to a lesser extent, nearest neighbour interactions. These energies vary
widely among oligonucleotides that are typically used in hybridization
assays. For example, hybridization of two probe sequences composed of 24
nucleotides, one with a 40% GC content and the other with a 60% GC content,
with its complementary target under standard conditions theoretically may
have a 10°C difference in melting temperature (Mueller et al., Current
Protocols in Mol. Biol.; 15, 5:1900). Problems in hybridization occur when
the hybrids are allowed to form under hybridization conditions that include a
single hybridization temperature that is not optimal for correct
hybridization of all oligonucleotide sequences of a set. Mismatch
hybridization of non-complementary probes can occur forming duplexes with
measurable mismatch stability (Santalucia et al., Biochemistry; 38: 3458-77,
1999). Mismatching of duplexes in a particular set of oligonucleotides can
occur under hybridization conditions where the mismatch results in a decrease
in duplex stability but the mismatched duplex still has a higher Tm than the
least stable correct duplex of that particular set. For example, if hybridization is
carried out under conditions that favor the AT-rich perfect match duplex
sequence, the possibility exists for hybridizing a GC-rich duplex sequence
that contains a mismatched base having a melting temperature that is still
above the correctly formed AT-rich duplex. Therefore design of families of
oligonucleotide sequences that can be used in multiplexed hybridization
reactions must include consideration for the thermodynamic properties of
oligonucleotides and duplex formation that will reduce or eliminate cross
hybridization behavior within the designed oligonucleotide set.
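The dependence of melting temperature on GC content discussed above can be illustrated with a minimal sketch. It assumes a commonly quoted empirical approximation for duplex DNA, Tm = 81.5 + 16.6 x log10[Na+] + 0.41 x (%GC) - 675/N, which is not taken from this specification; the function name approx_tm and the 0.2 M salt value are likewise illustrative assumptions. Under that approximation, two 24mers with 40% and 60% GC content differ by roughly 8°C, the same order as the 10°C figure cited.

```python
import math

def approx_tm(length, gc_percent, na_molar=0.2):
    """Rough duplex Tm in degrees C.

    Assumption: empirical formula
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    This is a textbook approximation, not the method of the specification,
    and it ignores nearest-neighbour and stacking effects.
    """
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / length

tm_low = approx_tm(24, 40.0)   # 24mer, 40% GC
tm_high = approx_tm(24, 60.0)  # 24mer, 60% GC
print(f"40% GC: {tm_low:.1f} C, 60% GC: {tm_high:.1f} C, "
      f"difference: {tm_high - tm_low:.1f} C")
```

Because the salt and length terms cancel in the difference, only the 0.41 x (%GC) term contributes to the gap between the two probes.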
A multiplex sequencing method has been described in United States
Patent No. 4,942,124, which issued to Church on July 17, 1990. The
method requires at least two vectors which differ from each other at a
tag sequence. It is stated in the specification that a tag sequence in
one vector will not hybridize under stringent hybridization conditions
to a tag sequence in another vector, i.e. a complementary probe of a
tag in one vector does not cross-hybridize with a tag sequence in
another vector. Exemplary stringent hybridization conditions are given
as 42°C in 500-1000 mM sodium phosphate buffer. A set of 42 20-mer tag
sequences, all of which lack G residues, is given in Figure 3 of
Church's specification. Details of how the sequences were obtained are
not provided, although Church states that initially 92 were chosen on
the basis of their having sufficient sequence diversity to insure
uniqueness.
There have been other attempts at the development of families of tags.
There are a number of different approaches for selecting sequences for use in
multiplexed hybridization assays. The selection of sequences that can be
used as zipcodes or tags in an addressable array has been described in the
patent literature in an approach taken by Brenner and co-workers. United
States Patent No. 5,654,413 describes a population of oligonucleotide tags
(and corresponding tag complements) in which each oligonucleotide tag
includes a plurality of subunits, each subunit consisting of an
oligonucleotide having a length of from three to six nucleotides and each
subunit being selected from a minimally cross hybridizing set, wherein a
subunit of the set would have at least two mismatches with any other sequence
of the set. Table II of the Brenner patent specification describes exemplary
groups of 4mer subunits that are minimally cross hybridizing according to the
aforementioned criteria. The approach taken by Brenner to constructing non
cross-hybridizing oligonucleotides relies on the use of subunits that form a
duplex having at least two mismatches with the complement of any other
subunit of the same set. The ordering of subunits in the construction of
oligonucleotide tags is not specifically defined.
Parameters used in the design of tags based on subunits are discussed
in Barany et al. (WO 9731256). For example, they describe the design of
polynucleotide sequences that are, for example, 24 nucleotides in length
(24mers) derived from a set of four possible tetramers, in which each 24mer
"address" differs from its nearest 24mer neighbour by 3 tetramers. They discuss further that, if
each tetramer differs from each other by at least two nucleotides, then each
24mer will differ from the next by at least six nucleotides. This is
determined without consideration for insertions or deletions when forming the
alignment between any two sequences of the set. In this way a unique "zip
code" sequence is generated. The zip code is ligated to a label in a target
dependent manner, resulting in a unique "zip code" which is then allowed to
hybridize to its address on the chip. To minimize cross-hybridization of a
"zip code" to other "addresses", the hybridization reaction is carried out at
temperatures of 75-80°C. Due to the high temperature conditions for
hybridization, 24mers that have partial homology hybridize to a lesser extent
than sequences with perfect complementarity and represent 'dead zones'. This
approach of implementing stringent hybridization conditions, for example,
involving high temperature hybridization, is also practiced by Brenner et
al.
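The tetramer arithmetic attributed above to WO 9731256 (three differing tetramers, each differing in at least two positions, giving at least six differing nucleotides) can be sketched as follows. The tetramers and the two 24mer "addresses" used here are illustrative choices only and are not taken from that publication; the helper name hamming is an assumption. As the text notes, a position-by-position count of this kind does not account for insertions or deletions.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

# Illustrative tetramers (hypothetical example set).
tetramers = ["GATT", "TGAT", "AAAG", "TGTA"]
min_pairwise = min(hamming(a, b) for a, b in combinations(tetramers, 2))

# Two hypothetical 24mer addresses that differ in three tetramer slots.
address_a = "".join(["GATT", "TGAT", "AAAG", "TGTA", "GATT", "TGAT"])
address_b = "".join(["TGAT", "TGTA", "AAAG", "TGTA", "GATT", "AAAG"])

print("minimum tetramer distance:", min_pairwise)
print("24mer distance:", hamming(address_a, address_b))
```

With every tetramer pair at least two positions apart, three differing slots guarantee a lower bound of six differences; the actual count for a given pair can be higher.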
The current state of technology for designing non-cross hybridizing
tags based on subunits does not provide sufficient guidance to construct a
family of sequences with practical value in assays that require stringent
non-cross hybridizing behavior.
Thus, while it is desirable with such arrays to have, at once, a large
number of address molecules, the address molecules should each be highly
selective for its own complement sequence. While such an array provides the
advantage that the family of molecules making up the grid is entirely of
design, and does not rely on sequences as they occur in nature, the provision
of a family of molecules, which is sufficiently large and where each
individual member is sufficiently selective for its complement over all the
other zipcode molecules (i.e., where there is sufficiently low cross-hybridization,
or cross-talk) continues to elude researchers.
SUMMARY OF INVENTION
Using the method of Benight et al. (described in commonly-owned
international patent application No. PCT/CA 01/00141 published under
WO 01/59151 on August 16, 2001) a family of 100 nucleotide sequences was
obtained using a computer algorithm to have optimal hybridization properties
for use in nucleic acid detection assays. The sequence set of 100
oligonucleotides was characterized in hybridization assays, demonstrating the
ability of family members to correctly hybridize to their complementary
sequences with an absence of cross hybridization. These are the sequences
having SEQ ID NOs: 1 to 100 of Table I. This set of sequences has been
expanded to include an additional 110 sequences that can be grouped with the
original 100 sequences as having non-cross hybridizing properties, based on
the characteristics of the original set of 100 sequences. These additional
sequences are identified as SEQ ID NOs: 101 to 210 of the sequences in Table
I. How these sequences were obtained is described below.
Variant families of sequences (seen as tags or tag complements)
of a family of sequences taken from Table I are also part of the
invention. For the purposes of discussion, families of tag complements
will be described.
A family of complements is obtained from a set of
oligonucleotides based on a family of oligonucleotides such as those of
Table I. For illustrative purposes, providing a family of complements
based on the oligonucleotides of Table I will be described.
Firstly, sequences based on the oligonucleotides of Table I can
be represented as follows:
Table IA: Numeric sequences corresponding to
word patterns of a set of
oligonucleotides
Here, each of the numerals 1 to 22 (numeric identifiers)
represents a 4mer and the pattern of numerals 1 to 22 of the sequences
in the above list corresponds to the pattern of tetrameric
oligonucleotide segments present in the oligonucleotides of Table I,
which oligonucleotides have been found to be non-cross-hybridizing, as
described further in the detailed examples. Each 4mer is selected from
the group of 4mers consisting of WWWW, WWWX, WWWY, WWXW, WWXX, WWXY,
WWYW, WWYX, WWYY, WXWW, WXWX, WXWY, WXXW, WXXX, WXXY, WXYW, WXYX, WXYY,
WYWW, WYWX, WYWY, WYXW, WYXX, WYXY, WYYW, WYYX, WYYY, XWWW, XWWX, XWWY,
XWXW, XWXX, XWXY, XWYW, XWYX, XWYY, XXWW, XXWX, XXWY, XXXW, XXXX, XXXY,
XXYW, XXYX, XXYY, XYWW, XYWX, XYWY, XYXW, XYXX, XYXY, XYYW, XYYX, XYYY,
YWWW, YWWX, YWWY, YWXW, YWXX, YWXY, YWYW, YWYX, YWYY, YXWW, YXWX, YXWY,
YXXW, YXXX, YXXY, YXYW, YXYX, YXYY, YYWW, YYWX, YYWY, YYXW, YYXX, YYXY,
YYYW, YYYX, and YYYY. Here W, X and Y represent nucleotide bases, A,
G, C, etc., the assignment of bases being made according to rules
described below.
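The group of 4mers listed above is simply every word of length four over the three placeholder symbols W, X and Y, which gives 3^4 = 81 words. A minimal sketch that regenerates the list in the same order follows; the variable name FOUR_MERS is illustrative.

```python
from itertools import product

# Every length-4 word over the placeholder symbols W, X and Y.
FOUR_MERS = ["".join(word) for word in product("WXY", repeat=4)]

print(len(FOUR_MERS))   # number of candidate 4mers
print(FOUR_MERS[:5])    # first few entries, in the order used in the text
```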
Given this numeric pattern, a 4mer is assigned to a numeral. For
example, 1 = WXYY, 2 = YWXY, etc. Once a given 4mer has been assigned
to a given numeral, it is not assigned for use in the position of a
different numeral. It is possible, however, to assign a different 4mer
to the same numeral. That is, for example, the numeral 1 in one
position could be assigned WXYY and another numeral 1, in a different
position, could be assigned XXXW, but none of the other numerals 2 to
22 can then be assigned WXYY or XXXW. A different way of saying this
is that each of 1 to 22 is assigned a 4mer from the list of eighty-one
4mers indicated so as to be different from all of the others of 1 to
22.
In the case of the specific oligonucleotides given in Table I, 1 =
WXYY, 2 = YWXY, 3 = XXXW, 4 = YWYX, 5 = WYXY, 6 = YYWX, 7 = YWXX, 8 =
WYXX, 9 = XYYW, 10 = XYWX, 11 = YYXW, 12 = WYYX, 13 = XYXW, 14 = WYYY,
15 = WXYW, 16 = WYXW, 17 = WXXW, 18 = WYYW, 19 = XYYX, 20 = YXYX, 21 =
YXXY and 22 = XYXY.
Once the 4mers are assigned to positions according to the above
pattern, a particular set of oligonucleotides can be created by
appropriate assignment of bases, A, T/U, G, C to W, X, Y. These
assignments are made according to one of the following two sets of
rules:
(i) Each of W, X and Y is a base in which:
(a) W = one of A, T/U, G, and C,
X = one of A, T/U, G, and C,
Y = one of A, T/U, G, and C,
and each of W, X and Y is selected so as to be different
from all of the others of W, X and Y, and
(b) an unselected said base of (i)(a) can be substituted any
number of times for any one of W, X and Y.
or
(ii) Each of W, X and Y is a base in which:
(a) W = G or C,
X = A or T/U,
Y = A or T/U,
and X ≠ Y, and
(b) a base not selected in (ii)(a) can be inserted into each
sequence at one or more locations, the location of each
insertion being the same in each sequence as that of every
other sequence of the set.
In the case of the specific oligonucleotides given in Table I, W
= G, X = A and Y = T.
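The two-step assignment described above (numeral to W/X/Y pattern, then W/X/Y to bases) can be sketched as below, using the numeral-to-pattern list and the base assignment W = G, X = A, Y = T stated for Table I. The helper names four_mer, build and decompose are illustrative. Since the numeric patterns of Table IA are not reproduced in this section, the numeric pattern shown for SEQ ID NO: 1 is obtained here by decomposing the sequence given in the abstract, not by quoting the table.

```python
# Numeral -> W/X/Y pattern, as listed in the text for the Table I oligonucleotides.
PATTERNS = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY", 6: "YYWX",
    7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX", 11: "YYXW", 12: "WYYX",
    13: "XYXW", 14: "WYYY", 15: "WXYW", 16: "WYXW", 17: "WXXW", 18: "WYYW",
    19: "XYYX", 20: "YXYX", 21: "YXXY", 22: "XYXY",
}
BASES = {"W": "G", "X": "A", "Y": "T"}  # assignment stated for Table I

def four_mer(numeral):
    """Translate one numeral into its 4mer of bases."""
    return "".join(BASES[symbol] for symbol in PATTERNS[numeral])

def build(numerals):
    """Concatenate the 4mers for a numeric pattern into one oligonucleotide."""
    return "".join(four_mer(n) for n in numerals)

def decompose(sequence):
    """Split a sequence into consecutive 4mers and look up their numerals."""
    lookup = {four_mer(n): n for n in PATTERNS}
    return [lookup[sequence[i:i + 4]] for i in range(0, len(sequence), 4)]

seq_id_1 = "GATTTGTATTGATTGAGATTAAAG"  # SEQ ID NO: 1, from the abstract
pattern_1 = decompose(seq_id_1)
print(pattern_1)
print(build(pattern_1) == seq_id_1)
```

Each numeral maps to a distinct pattern, so the lookup from 4mer back to numeral is unambiguous for this assignment.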
In any case, given a set of oligonucleotides generated according
to one of these sets of rules, it is possible to modify the members of
a given set in relatively minor ways and thereby obtain a different set
of sequences while more or less maintaining the cross-hybridization
properties of the set subject to such modification. In particular, it
is possible to insert up to 3 of A, T/U, G and C at any location of any
sequence of the set of sequences. Alternatively, or additionally, up
to 3 bases can be deleted from any sequence of the set of sequences.
A person skilled in the art would understand that given a set of
oligonucleotides having a set of properties making it suitable for use
as a family of tags (or tag complements) one can obtain another family
with the same property by reversing the order of all of the members of
the set. In other words, all the members can be taken to be read 5' to
3' or to be read 3' to 5'.
A family of complements of the present invention is based on a
given set of oligonucleotides defined as described above. Each
complement of the family is based on a different oligonucleotide of the
set and each complement contains at least 10 consecutive (i.e.,
contiguous) bases of the oligonucleotide on which it is based. When
selecting a sequence of contiguous bases, preference is given to those
sets in which the contiguous bases of each oligonucleotide of a set are
selected such that the position of the first base of each said
oligonucleotide within the sequence on which it is based is the same
for all nucleotides of the set. Thus, for example, if a nucleotide
sequence of twenty contiguous bases corresponds to bases 3 to 22 of the
sequence on which the nucleotide sequence is based, then preferably,
the twenty contiguous bases for all nucleotide sequences corresponds to
bases 3 to 22 of the sequences on which the nucleotides sequences are
based. For a given family of complements where one is seeking to
reduce or minimize inter-sequence similarity that would result in
cross-hybridization, each and every pair of complements meets
particular homology requirements. Particularly, subject to limited
exceptions, described below, any two complements within a set of
complements are generally required to have a defined amount of
dissimilarity.
In order to notionally understand these requirements for
dissimilarity as they exist for a given pair of complements of a
family, a phantom sequence is generated from the pair of complements.
A "phantom" sequence is a single sequence that is generated from a pair
of complements by selection, from each complement of the pair, of a
string of bases wherein the bases of the string occur in the same order
in both complements. An object of creating such a phantom sequence is
to create a convenient and objective means of comparing the sequence
identity of the two parent sequences from which the phantom sequence is
created.
A phantom sequence can be considered to be similar in concept to
a consensus sequence which a person skilled in the art would be
familiar with, except that a consensus sequence typically is comprised
of all bases from both parent sequences with each position reflecting
the most common choice of base at each position (the union of both
sequences), whereas the "phantom" sequence is comprised of only bases
which occur in the same order in both parent sequences (the
intersection of both sequences). Also, a consensus sequence usually is
indicative of a common phylogenetic ancestry for the two sequences (or
more than 2 sequences depending on how many sequences are used to
generate the consensus sequence), whereas the "phantom" sequence
definition has been created to specifically address the sequence
similarity between 2 complementary sequences which have no ancestral
history but may have a propensity to cross-hybridize under certain
conditions.
A phantom sequence may thus be generated from exemplary Sequence
1 and Sequence 2 as follows:
Sequence 1: ATGTTTAGTGAAAAGTTAGTATTG
Sequence 2: ATGTTAGTGAATAGTATAGTATTG
Phantom Sequence: ATGTTAGTGAAAGTTAGTATTG
The phantom sequence generated from these two sequences is thus
22 bases in length. That is, one can see that there are 22 identical
bases with identical sequence (the same order) in Sequence Nos. 1 and
2. There is a total of three insertions/deletions and mismatches
present in the phantom sequence when compared with the sequences from
which it was generated:
ATGT-TAGTGAA-AGT-TAGTATTG
The dashed lines in this latter representation of the phantom sequence
indicate the locations of the insertions/deletions and mismatches in
the phantom sequence relative to the parent sequences from which it was
derived. Thus, the "T" marked with an asterisk in Sequence 1, the "A"
marked with a diamond in Sequence 2 and the "A-T" mismatch of Sequences
1 and 2 marked with two dots were deleted in generating the phantom
sequence.
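One way to compute a phantom sequence of the kind described above is a standard longest-common-subsequence (LCS) calculation. This is a sketch under the assumption that the longest such string is the most stringent phantom to test; the specification itself does not name an algorithm, and the function names are illustrative. When several strings of maximal length exist, the routine returns only one of them, which need not be character-for-character the phantom shown in the text.

```python
def longest_common_subsequence(a, b):
    """Return one longest string whose bases occur in the same order in a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover one maximal subsequence.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

def is_subsequence(candidate, parent):
    """True if the bases of candidate occur, in order, within parent."""
    remaining = iter(parent)
    return all(base in remaining for base in candidate)

sequence_1 = "ATGTTTAGTGAAAAGTTAGTATTG"
sequence_2 = "ATGTTAGTGAATAGTATAGTATTG"
phantom = "ATGTTAGTGAAAGTTAGTATTG"  # phantom sequence given in the text

computed = longest_common_subsequence(sequence_1, sequence_2)
print(len(computed), len(phantom))
print(is_subsequence(phantom, sequence_1), is_subsequence(phantom, sequence_2))
```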
A person skilled in the art will appreciate that the term
"insertion/deletion" is intended to cover the situations indicated by
the asterisk and diamond. Whether the change is considered, strictly
speaking, an insertion or deletion is merely one of vantage point.
That is, one can see that the fourth base of Sequence 1 can be deleted
therefrom to obtain the phantom sequence, or a "T" can be inserted
after the third base of the phantom sequence to obtain Sequence 1.
One can thus see that if it were possible to create a phantom
sequence by elimination of a single insertion/deletion from one of the
parent sequences, that the two parent sequences would have identical
homology over the length of the phantom sequence except for the
presence of a single base in one of the two sequences being compared.
Likewise, one can see that if it were possible to create a phantom
sequence through deletion of a mismatched pair of bases, one base in
each parent, that the two parent sequences would have identical
homology over the length of the phantom sequence except for the
presence of a single base in each of the sequences being compared. For
this reason, the effect of an insertion/deletion is considered
equivalent to the effect of a mismatched pair of bases when comparing
the homology of two sequences.
Once a phantom sequence is generated, the compatibility of the
pair of complements from which it was generated within a family of
complements can be systematically evaluated.
According to one embodiment of the invention, a pair of
complements is compatible for inclusion within a family of complements
if any phantom sequence generated from the pair of complements has the
following properties:
(1) Any consecutive sequence of bases in the phantom sequence which is
identical to a consecutive sequence of bases in each of the first and
second complements from which it is generated is no more than ((3/4 x
L) - 1) bases in length;
(2) The phantom sequence, if greater than or equal to (5/6 x L) in length
contains at least 3 insertions/deletions or mismatches when compared
to the first and second complements from which it is generated; and
(3) The phantom sequence is not greater than or equal to (11/12 x L) in
length.
Here, L1 is the length of the first complement, L2 is the length
of the second complement, and L = L1, or if L1 ≠ L2, L is the greater of
L1 and L2.
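The three properties above reduce to simple threshold checks once the relevant quantities have been measured for a pair. The sketch below takes those measurements as inputs and uses integer arithmetic to avoid rounding; the function name check_pair and the dictionary keys are illustrative. For L = 24 the thresholds work out to a maximum shared run of 17 bases, a trigger length of 20 bases for property (2), and a phantom length below 22 bases for property (3). The example values are those of the exemplary Sequence 1 and Sequence 2 given earlier (phantom of 22 bases, three insertions/deletions or mismatches, and a longest shared run of 8 bases as counted by hand); under this reading that illustrative pair would satisfy properties (1) and (2) but not property (3).

```python
def check_pair(longest_run, phantom_len, n_differences, len1, len2):
    """Evaluate properties (1) to (3) for one pair of complements.

    longest_run   -- longest stretch of the phantom that is contiguous in both parents
    phantom_len   -- length of the phantom sequence
    n_differences -- insertions/deletions or mismatches relative to the parents
    len1, len2    -- lengths L1 and L2 of the two complements
    """
    L = max(len1, len2)  # L = L1, or the greater of L1 and L2 if they differ
    return {
        # (1) run <= (3/4 x L) - 1, rewritten as 4 * (run + 1) <= 3 * L
        "run": 4 * (longest_run + 1) <= 3 * L,
        # (2) only applies when phantom_len >= (5/6 x L)
        "differences": 6 * phantom_len < 5 * L or n_differences >= 3,
        # (3) phantom_len must be below (11/12 x L)
        "length": 12 * phantom_len < 11 * L,
    }

result = check_pair(longest_run=8, phantom_len=22, n_differences=3, len1=24, len2=24)
print(result)
print("compatible:", all(result.values()))
```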
In particular preferred embodiments of the invention, all pairs
of complements of a given set have the properties set out above. Under
particular circumstances, it may be advantageous to have a limited
number of complements that do not meet all of these requirements when
compared to every other complement in a family.
In one case, for any first complement there are at most two
second complements in the family which do not meet all of the three
listed requirements. For two such complements, there would thus be a
greater chance of cross-hybridization between their tag counterparts
and the first complement. In another case, for any first complement
there is at most one second complement which does not meet all of three
listed requirements.
It is also possible, given this invention, to design a family of
complements where a specific number or specific portion of the
complements do not meet the three listed requirements. For example, a
set could be designed where only one pair of complements within the set
do not meet the requirements when compared to each other. There could
be two pairs, three pairs, and any number of pairs up to and including
all possible pairs. Alternatively, it may be advantageous to have a
given proportion of pairs of complements that do not meet the
requirements, say 10% of pairs, when compared with other sequences that
do not meet one or more of the three requirements listed. This number
could instead be 5%, 15%, 20%, 25%, 30%, 35%, or 40%.
The foregoing comparisons would generally be largely carried out
using appropriate computer software. Although notionally described in
terms of a phantom sequence for the sake of clarity and understanding,
it will be understood that a competent computer programmer can carry
out pairwise comparisons of complements in any number of ways using
logical steps that obtain equivalent results.
The symbols A, G, T/U, C take on their usual meaning in the art
here. In the case of T and U, a person skilled in the art would
understand that these are equivalent to each other with respect to the
inter-strand hydrogen-bond (Watson-Crick) binding properties at work in
the context of this invention. The two bases are thus interchangeable
and hence the designation of T/U.
Analogues of the naturally occurring bases can be inserted in
their respective places where desired. An analogue is any non-natural
base, such as peptide nucleic acids and the like that undergoes normal
Watson-Crick pairing in the same way as the naturally occurring
nucleotide base to which it corresponds.
In one broad aspect, the present invention is thus a composition
comprising molecules for use as tags or tag complements wherein each
molecule comprises an oligonucleotide selected from a set of
oligonucleotides based on a group of sequences having numeric patterns
as set out in Table IA wherein:
(A) each of 1 to 22 is a 4mer selected from the group of 4mers consisting
of WWWW, WWWX, WWWY, WWXW, WWXX, WWXY, WWYW, WWYX, WWYY, WXWW, WXWX,
WXWY, WXXW, WXXX, WXXY, WXYW, WXYX, WXYY, WYWW, WYWX, WYWY, WYXW,
WYXX, WYXY, WYYW, WYYX, WYYY, XWWW, XWWX, XWWY, XWXW, XWXX, XWXY,
XWYW, XWYX, XWYY, XXWW, XXWX, XXWY, XXXW, XXXX, XXXY, XXYW, XXYX,
XXYY, XYWW, XYWX, XYWY, XYXW, XYXX, XYXY, XYYW, XYYX, XYYY, YWWW,
YWWX, YWWY, YWXW, YWXX, YWXY, YWYW, YWYX, YWYY, YXWW, YXWX, YXWY,
YXXW, YXXX, YXXY, YXYW, YXYX, YXYY, YYWW, YYWX, YYWY, YYXW, YYXX,
YYXY, YYYW, YYYX, and YYYY, and
(B) each of 1 to 22 is selected so as to be different from all of the
others of 1 to 22;
(C) each of W, X and Y is a base in which either (i) or (ii) is true:
(i) (a) W = one of A, T/U, G, and C,
X = one of A, T/U, G, and C,
Y = one of A, T/U, G, and C,
and each of W, X and Y is selected so as to be different
from all of the others of W, X and Y, and
(b) an unselected said base of (i)(a) can be substituted any
number of times for any one of W, X and Y,
(ii) (a) W = G or C,
X = A or T/U,
Y = A or T/U,
and X ≠ Y, and
(b) a base not selected in (ii) (a) can be inserted into each
sequence at one or more locations, the location of each
insertion being the same in all the sequences;
(D) up to three bases can be inserted at any location of any of the
sequences or up to three bases can be deleted from any of the
sequences;
(E) all of the sequences of a said group of oligonucleotides are read 5'
to 3' or are read 3' to 5'; and
wherein each oligonucleotide of a said set has a sequence of at least ten
contiguous bases of the sequence on which it is based, provided that:
(F) (I) the quotient of the sum of G and C divided by the sum of A, T/U,
G and C for all combined sequences of the set is between about
0.1 and 0.40 and said quotient for each sequence of the set does
not vary from the quotient for the combined sequences by more
than 0.2; and
(II) for any phantom sequence generated from any pair of first and
second sequences of the set L1 and L2 in length, respectively,
by selection from the first and second sequences of identical
bases in identical sequence with each other:
(i) any consecutive sequence of bases in the phantom
sequence which is identical to a consecutive sequence of
bases in each of the first and second sequence from
which it is generated is less than ((3/4 x L) - 1) bases
in length;
(ii) the phantom sequence, if greater than or equal to (5/6 x
L) in length, contains at least three
insertions/deletions or mismatches when compared to the
first and second sequences from which it is generated;
and
(iii) the phantom sequence is not greater than or equal to
(11/12 x L) in length;
where L = L1, or if L1 ≠ L2, where L is the greater of L1 and L2;
and
wherein any base present may be substituted by an analogue thereof.
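The base-composition condition in (F)(I) can be checked with a few lines of code. The sketch below assumes that "between about 0.1 and 0.40" may be treated as inclusive bounds and that the "combined" quotient is computed over the concatenation of all sequences in the set; both readings, and the function names, are assumptions made for illustration. The three input sequences are SEQ ID NOs: 1 to 3 as given in the abstract.

```python
def gc_fraction(sequence):
    """(G + C) divided by the total number of bases in the sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

def gc_rule_ok(sequences, low=0.1, high=0.40, max_deviation=0.2):
    """Check the combined quotient and each sequence's deviation from it."""
    combined = gc_fraction("".join(sequences))
    within_range = low <= combined <= high
    within_deviation = all(
        abs(gc_fraction(s) - combined) <= max_deviation for s in sequences
    )
    return within_range and within_deviation

tags = [
    "GATTTGTATTGATTGAGATTAAAG",  # SEQ ID NO: 1
    "TGATTGTAGTATGTATTGATAAAG",  # SEQ ID NO: 2
    "GATTGTAAGATTTGATAAAGTGTA",  # SEQ ID NO: 3
]
print([gc_fraction(t) for t in tags])
print(gc_rule_ok(tags))
```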
In a preferred embodiment, a set of oligonucleotides of the
invention is based on the numeric patterns of sequences tested in
Example 2.
Preferably,
(G) for the group of 24mer sequences in which each 1 = GATT, each 2 =
TGAT, each 3 = AAAG, each 4 = TGTA, each 5 = GTAT, each 6 = TTGA, each
7 = TGAA, each 8 = GTAA, each 9 = ATTG, each 10 = ATGA, each 11 =
TTAG, each 12 = GTTA, each 13 = ATAG, each 14 = GTTT, each 15 = GATG,
each 16 = GTAG, each 17 = GAAG, each 18 = GTTG, each 19 = ATTA, each
20 = TATA, each 21 = TAAT and each 22 = ATAT, under a defined set of
conditions in which the
maximum degree of hybridization between a sequence and any complement
of a different sequence of the group of 24mer sequences does not
exceed 30% of the degree of hybridization between said sequence and
its complement, for all oligonucleotides of the set, the maximum
degree of hybridization between an oligonucleotide and a complement of
any other oligonucleotide of the set does not exceed 50% of the degree
of hybridization of the oligonucleotide and its complement.
It can thus be seen that it is possible to routinely determine
whether all oligonucleotides of a selected set are minimally cross-hybridizing.
Preferably in (G), under said defined set of conditions
in which the maximum degree of hybridization between a sequence and any
complement of a different sequence does not exceed 30% of the degree of
hybridization between said sequence and its complement, it is also true
that the degree of hybridization between each sequence and its
complement varies by a factor of between 1 and 10, more preferably
between 1 and 9, and more preferably between 1 and 8. It is
demonstrated in Example 2, below, for a preferred set of
oligonucleotides, that the degree of hybridization between each
sequence and its specific complement varies by a factor of between 1
and 8.25 and the maximum degree of hybridization between a sequence and
any complement of a different sequence does not exceed 10.2% of the
degree of hybridization between the sequence and its specific
complement.
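As a rough illustration of the factor referred to above, the spread of the perfect-match signals can be expressed as the ratio of the strongest to the weakest signal. The sketch below assumes that reading of "varies by a factor of"; the function name and the signal values are hypothetical and are not data from Example 2.

```python
def match_signal_factor(matched_signals):
    """Ratio of the strongest to the weakest perfect-match signal."""
    return max(matched_signals) / min(matched_signals)


# Hypothetical perfect-match signals (arbitrary units).
matched = [1200.0, 800.0, 400.0, 150.0]
factor = match_signal_factor(matched)
print(factor)        # 8.0
print(factor <= 10)  # True
```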
Preferably, the maximum degree of hybridization in (G) between a
sequence and any complement of a different sequence of the group of
24mer sequences does not exceed 25%, more preferably wherein the
maximum degree of hybridization in (G) between a sequence and any
complement of a different sequence of the group of 24mer sequences does
not exceed 20%, more preferably wherein the maximum degree of
hybridization in (G) between a sequence and any complement of a
different sequence of the group of 24mer sequences does not exceed 15%,
more preferably wherein the maximum degree of hybridization in (G)
between a sequence and any complement of a different sequence of the
group of 24mer sequences does not exceed 11%.
Preferably, under the defined set of conditions of (G), the
maximum degree of hybridization between a sequence and a complement of
any other sequence of the set is no more than 15% greater than the
maximum degree of hybridization between a sequence and any complement
of a different sequence of the said group of 24mer sequences, more
preferably no more than 10% greater, more preferably no more than 5%
greater.
According to Example 2, described below, under conditions of 0.2
M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37°C, the maximum
degree of hybridization between a sequence and any complement of a
different sequence of the group of 24mer sequences does not exceed
10.2% when 24mer nucleotide sequences are covalently linked to a solid
support, in this case microparticles or beads.
In another preferred aspect of the composition, in (G) for the
group of 24mers the maximum degree of hybridization between a sequence
and any complement of a different sequence does not exceed 15% of the
degree of hybridization between said sequence and its complement and
the degree of hybridization between each sequence and its complement
varies by a factor of between 1 and 9, and for all oligonucleotides of
the set, the maximum degree of hybridization between an oligonucleotide
and a complement of any other oligonucleotide of the set does not
exceed 20% of the degree of hybridization of the oligonucleotide and
its complement.
In a preferred aspect, each of the 4mers represented by numerals
1 to 22 is selected from the group of 4mers consisting of WXXX, WXXY,
WXYX, WXYY, WYXX, WYXY, WYYX, WYYY, XWXX, XWXY, XWYX, XWYY, XXWX, XXWY,
XXXW, XXYW, XYWX, XYWY, XYXW, XYYW, YWXX, YWXY, YWYX, YWYY, YXWX, YXWY,
YXXW, YXYW, YYWX, YYWY, YYXW, and YYYW.
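The thirty-two 4mers listed above appear to be exactly those 4mers over W, X and Y that contain a single W. The following sketch, offered only as an illustration (the function name is hypothetical), enumerates them in the same alphabetical order.

```python
from itertools import product


def one_w_4mers():
    """All 4mers over W, X, Y that contain exactly one W, sorted."""
    return sorted(
        "".join(p) for p in product("WXY", repeat=4) if p.count("W") == 1
    )


patterns = one_w_4mers()
print(len(patterns))  # 32
print(patterns[:4])   # ['WXXX', 'WXXY', 'WXYX', 'WXYY']
```

When W is G or C and X and Y are each A or T/U, every 4mer in this list has one G or C in four bases.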
In another aspect, each of the 4mers represented by numeral 1 are
identical to each other, each of the 4mers represented by numeral 2 are
identical to each other, each of the 4mers represented by numeral 3 are
identical to each other, each of the 4mers represented by numeral 4 are
identical to each other, each of the 4mers represented by numeral 5 are
identical to each other, each of the 4mers represented by numeral 6 are
identical to each other, each of the 4mers represented by numeral 7 are
identical to each other, each of the 4mers represented by numeral 8 are
identical to each other, each of the 4mers represented by numeral 9 are
identical to each other, each of the 4mers represented by numeral 10
are identical to each other, each of the 4mers represented by numeral
11 are identical to each other, each of the 4mers represented by
numeral 12 are identical to each other, each of the 4mers represented
by numeral 13 are identical to each other, each of the 4mers
represented by numeral 14 are identical to each other, each of the
4mers represented by numeral 15 are identical to each other, each of
the 4mers represented by numeral 16 are identical to each other, each
of the 4mers represented by numeral 17 are identical to each other,
each of the 4mers represented by numeral 18 are identical to each
other, each of the 4mers represented by numeral 19 are identical to
each other, each of the 4mers represented by numeral 20 are identical
to each other, each of the 4mers represented by numeral 21 are
identical to each other, and each of the 4mers represented by numeral
22 are identical to each other.
In another aspect, at least one of the 4mers represented by the
numeral 1 has the sequence WXYY, at least one of the 4mers represented
by the numeral 2 has the sequence YWXY, at least one of the 4mers
represented by the numeral 3 has the sequence XXXW, at least one of the
4mers represented by the numeral 4 has the sequence YWYX, at least one
of the 4mers represented by the numeral 5 has the sequence WYXY, at
least one of the 4mers represented by the numeral 6 has the sequence
YYWX, at least one of the 4mers represented by the numeral 7 has the
sequence YWXX, at least one of the 4mers represented by the numeral 8
has the sequence WYXX, at least one of the 4mers represented by the
numeral 9 has the sequence XYYW, at least one of the 4mers represented
by the numeral 10 has the sequence XYWX, at least one of the 4mers
represented by the numeral 11 has the sequence YYXW, at least one of
the 4mers represented by the numeral 12 has the sequence WYYX, at least
one of the 4mers represented by the numeral 13 has the sequence XYXW,
at least one of the 4mers represented by the numeral 14 has the
sequence WYYY, at least one of the 4mers represented by the numeral 15
has the sequence WXYW, at least one of the 4mers represented by the
numeral 16 has the sequence WYXW, at least one of the 4mers represented
by the numeral 17 has the sequence WXXW, at least one of the 4mers
represented by the numeral 18 has the sequence WYYW, at least one of
the 4mers represented by the numeral 19 has the sequence XYYX, at least
one of the 4mers represented by the numeral 20 has the sequence YXYX,
at least one of the 4mers represented by the numeral 21 has the
sequence YXXY, and/or at least one of the 4mers represented by the
numeral 22 has the sequence XYXY.
In one preferred aspect, the invention is a composition in which
each 1 = WXYY, each 2 = YWXY, each 3 = XXXW, each 4 = YWYX, each 5 =
WYXY, each 6 = YYWX, each 7 = YWXX, each 8 = WYXX, each 9 = XYYW, each
10 = XYWX, each 11 = YYXW, each 12 = WYYX, each 13 = XYXW, each 14 =
WYYY, each 15 = WXYW, each 16 = WYXW, each 17 = WXXW, each 18 = WYYW,
each 19 = XYYX, each 20 = YXYX, each 21 = YXXY and each 22 = XYXY.
In one broad aspect, the invention is a composition wherein a
group of sequences is based on those having numeric patterns of those
with numeric identifiers 1 to 173 of Table IA and wherein each of the
4mers represented by numerals 1 to 14 in (A) is selected from the group
of 4mers consisting of WXYY, YWXY, XXXW, YWYX, WYXY, YYWX, YWXX, WYXX,
XYYW, XYWX, YYXW, WYYX, XYXW, and WYYY.
In such a composition it is preferred that each of the 4mers
represented by numeral 1 are identical to each other, each of the 4mers
represented by numeral 2 are identical to each other, each of the 4mers
represented by numeral 3 are identical to each other, each of the 4mers
represented by numeral 4 are identical to each other, each of the 4mers
represented by numeral 5 are identical to each other, each of the 4mers
represented by numeral 6 are identical to each other, each of the 4mers
represented by numeral 7 are identical to each other, each of the 4mers
represented by numeral 8 are identical to each other, each of the 4mers
represented by numeral 9 are identical to each other, each of the 4mers
represented by numeral 10 are identical to each other, each of the
4mers represented by numeral 11 are identical to each other, each of
the 4mers represented by numeral 12 are identical to each other, each
of the 4mers represented by numeral 13 are identical to each other,
and/or each of the 4mers represented by numeral 14 are identical to
each other.
It is also preferred that at least one of the 4mers represented
by the numeral 1 has the sequence WXYY, at least one of the 4mers
represented by the numeral 2 has the sequence YWXY, at least one of the
4mers represented by the numeral 3 has the sequence XXXW, at least one
of the 4mers represented by the numeral 4 has the sequence YWYX, at
least one of the 4mers represented by the numeral 5 has the sequence
WYXY, at least one of the 4mers represented by the numeral 6 has the
sequence YYWX, at least one of the 4mers represented by the numeral 7
has the sequence YWXX, at least one of the 4mers represented by the
numeral 8 has the sequence WYXX, at least one of the 4mers represented
by the numeral 9 has the sequence XYYW, at least one of the 4mers
represented by the numeral 10 has the sequence XYWX, at least one of
the 4mers represented by the numeral 11 has the sequence YYXW, at least
one of the 4mers represented by the numeral 12 has the sequence WYYX,
at least one of the 4mers represented by the numeral 13 has the
sequence XYXW, and/or at least one of the 4mers represented by the
numeral 14 has the sequence WYYY.
More preferably, each 1 = WXYY, each 2 = YWXY, each 3 = XXXW,
each 4 = YWYX, each 5 = WYXY, each 6 = YYWX, each 7 = YWXX, each 8 =
WYXX, each 9 = XYYW, each 10 = XYWX, each 11 = YYXW, each 12 = WYYX,
each 13 = XYXW, and each 14 = WYYY.
In another broad aspect, the invention is a composition in which
a group of sequences is based on those sequences having the numeric
patterns of those with sequence identifiers 1 to 100 set out in Table IA
and wherein each of the 4mers represented by numerals 1 to 10 in (A) is
selected from the group of 4mers consisting of WXYY, YWXY, XXXW, YWYX,
WYXY, YYWX, YWXX, WYXX, XYYW, and XYWX.
In such a composition it is preferred that each of the 4mers
represented by numeral 1 are identical to each other, each of the 4mers
represented by numeral 2 are identical to each other, each of the 4mers
represented by numeral 3 are identical to each other, each of the 4mers
represented by numeral 4 are identical to each other, each of the 4mers
represented by numeral 5 are identical to each other, each of the 4mers
represented by numeral 6 are identical to each other, each of the 4mers
represented by numeral 7 are identical to each other, each of the 4mers
represented by numeral 8 are identical to each other, each of the 4mers
represented by numeral 9 are identical to each other, and/or each of
the 4mers represented by numeral 10 are identical to each other.
It is also preferred that at least one of the 4mers represented by
the numeral 1 has the sequence WXYY, at least one of the 4mers
represented by the numeral 2 has the sequence YWXY, at least one of the
4mers represented by the numeral 3 has the sequence XXXW, at least one
of the 4mers represented by the numeral 4 has the sequence YWYX, at
least one of the 4mers represented by the numeral 5 has the sequence
WYXY, at least one of the 4mers represented by the numeral 6 has the
sequence YYWX, at least one of the 4mers represented by the numeral 7
has the sequence YWXX, at least one of the 4mers represented by the
numeral 8 has the sequence WYXX, at least one of the 4mers represented
by the numeral 9 has the sequence XYYW, and/or at least one of the
4mers represented by the numeral 10 has the sequence XYWX.
More preferably, each 1 = WXYY, each 2 = YWXY, each 3 = XXXW,
each 4 = YWYX, each 5 = WYXY, each 6 = YYWX, each 7 = YWXX, each 8 =
WYXX, each 9 = XYYW, and each 10 = XYWX.
In the most preferred compositions, in (C)(i)(a): W = one of G
and C; X = one of A and T/U; and Y = one of A and T/U, maintaining the
provisos of (F). More preferably, in (C)(i)(a): W = G; X = one of A and
T/U; and Y = one of A and T/U. Even more preferably, W = G; X
= A; and Y = T/U.
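By way of illustration only, the substitution of bases for W, X and Y can be sketched as follows, taking W = G, X = A and Y = T. The dictionary reproduces the 4mer patterns recited above for numerals 1 to 22. The numeric pattern 1 4 6 6 1 3 used in the example is an assumption: it was obtained by splitting the 24mer of SEQ ID NO: 1 into six 4mers, and is not quoted from Table IA. The function name is hypothetical.

```python
# 4mer patterns for numerals 1 to 22, as recited in the text.
PATTERNS = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY", 6: "YYWX",
    7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX", 11: "YYXW", 12: "WYYX",
    13: "XYXW", 14: "WYYY", 15: "WXYW", 16: "WYXW", 17: "WXXW", 18: "WYYW",
    19: "XYYX", 20: "YXYX", 21: "YXXY", 22: "XYXY",
}


def expand(numerals, w="G", x="A", y="T"):
    """Turn a numeric pattern into a base sequence for a given W, X, Y."""
    bases = {"W": w, "X": x, "Y": y}
    return "".join(bases[c] for n in numerals for c in PATTERNS[n])


# Assumed numeric pattern, derived from SEQ ID NO: 1 (see lead-in).
print(expand([1, 4, 6, 6, 1, 3]))  # GATTTGTATTGATTGAGATTAAAG
```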
A person skilled in the art will appreciate that the closer a
given oligonucleotide sequence variant is to one of the most preferred
sequences (Table I), the more closely it will resemble the preferred
sequence as a member of a minimally cross-hybridizing set of
oligonucleotides.
It will be understood that when it is stated herein that a group
of sequences (oligonucleotides) is minimally cross-hybridizing, it is
meant that any given member of the group of sequences
(oligonucleotides) only minimally hybridizes with the complement of any
other sequence (oligonucleotide) of that group.
Preferably, in (F)(I), the quotient for each sequence of the set
does not vary from the quotient for the combined sequences by more than
0.1, more preferably, the quotient for each sequence of the set does
not vary from the quotient for the combined sequences by more than
0.05, more preferably the quotient for each sequence of the set does
not vary from the quotient for the combined sequences by more than
0.01.
Also, it is preferred in (F)(I) that the quotient of the sum of G
and C divided by the sum of A, T/U, G and C for all combined sequences
of the set is between about 0.15 and 0.35, more preferably between
about 0.2 and 0.3, more preferably between about 0.21 and 0.29, more
preferably between about 0.22 and 0.28, more preferably between about
0.23 and 0.27, even more preferably between about 0.24 and 0.26, and
most preferably the quotient is 0.25.
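The quotient referred to above can be computed directly from the sequences. The sketch below is illustrative only; the function names are hypothetical, and the three sequences are SEQ ID NOs: 1 to 3 as recited in the abstract.

```python
def gc_quotient(seq):
    """Sum of G and C divided by the sum of A, T/U, G and C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = sum(seq.count(b) for b in "ATUGC")
    return gc / total


def combined_gc_quotient(seqs):
    """Quotient for all sequences of the set taken together."""
    return gc_quotient("".join(seqs))


def max_deviation(seqs):
    """Largest difference between any one quotient and the combined one."""
    combined = combined_gc_quotient(seqs)
    return max(abs(gc_quotient(s) - combined) for s in seqs)


tags = [
    "GATTTGTATTGATTGAGATTAAAG",  # SEQ ID NO: 1
    "TGATTGTAGTATGTATTGATAAAG",  # SEQ ID NO: 2
    "GATTGTAAGATTTGATAAAGTGTA",  # SEQ ID NO: 3
]
print(combined_gc_quotient(tags))  # 0.25
print(max_deviation(tags))         # 0.0
```

Each of these 24mers is built from six 4mers containing one G, so each quotient is 6/24.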
Preferably, in (D) up to two bases can be inserted at any
location of any of the sequences or up to two bases can be deleted from
any of the sequences, more preferably only one base can be inserted at
any location of any of the sequences or one base can be deleted from
any of the sequences, and most preferably no base is inserted at any
location of any of the sequences.
Also, it is preferred that in (D), no base can be deleted from
any of the sequences, and most preferably, in (D) no base can be
inserted at or deleted from any location of any of the sequences.
In preferred compositions, each of the oligonucleotides of a set
has a sequence of at least eleven contiguous bases of the sequence on
which it is based; or more preferably each of the oligonucleotides of a
set has a sequence of at least twelve contiguous bases of the sequence on
which it is based; or more preferably each of the oligonucleotides of a
set has a sequence of at least thirteen contiguous bases of the sequence
on which it is based; or more preferably each of the oligonucleotides
of a set has a sequence of at least fourteen contiguous bases of the
sequence on which it is based; or more preferably each of the
oligonucleotides of a set has a sequence of at least fifteen contiguous
bases of the sequence on which it is based; or more preferably each of
the oligonucleotides of a set has a sequence of at least sixteen
contiguous bases of the sequence on which it is based; or more
preferably each of the oligonucleotides of a set has a sequence of at
least seventeen contiguous bases of the sequence on which it is based;
or more preferably each of the oligonucleotides of a set has a sequence
of at least eighteen contiguous bases of the sequence on which it is
based; or more preferably each of the oligonucleotides of a set has a
sequence of at least nineteen contiguous bases of the sequence on which it
is based; or more preferably each of the oligonucleotides of a set has
a sequence of at least twenty contiguous bases of the sequence on which it
is based; or more preferably each of the oligonucleotides of a set has
a sequence of at least twenty-one contiguous bases of the sequence on
which it is based; or more preferably each of the oligonucleotides of a
set has a sequence of at least twenty-two contiguous bases of the sequence
on which it is based; or more preferably each of the oligonucleotides
of a set has a sequence of at least twenty-three contiguous bases of the
sequence on which it is based; or more preferably each of the
oligonucleotides of a set has a sequence of at least twenty-four
contiguous bases of the sequence on which it is based.
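One way to check the contiguous-base requirement is to find the longest run of bases shared by an oligonucleotide and the sequence on which it is based. The following is a minimal sketch under that assumption; the function name is hypothetical and the 18mer is an arbitrary fragment chosen for illustration.

```python
def longest_shared_run(oligo, base_seq):
    """Length of the longest stretch of contiguous bases common to both."""
    best = 0
    # prev[j] = length of the common run ending at the previous oligo base
    # and at base_seq[j - 1]
    prev = [0] * (len(base_seq) + 1)
    for a in oligo:
        curr = [0] * (len(base_seq) + 1)
        for j, b in enumerate(base_seq, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


base = "GATTTGTATTGATTGAGATTAAAG"  # SEQ ID NO: 1
variant = base[2:20]               # hypothetical 18mer taken from it
print(longest_shared_run(variant, base))        # 18
print(longest_shared_run(variant, base) >= 11)  # True
```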
Preferably, each of the oligonucleotides of a set is up to thirty
bases in length; or more preferably each of the oligonucleotides of a
set is up to twenty-nine bases in length; or more preferably each of
the oligonucleotides of a set is up to twenty-eight bases in length; or
more preferably each of the oligonucleotides of a set is up to
twenty-seven bases in length; or more preferably each of the oligonucleotides
of a set is up to twenty-six bases in length; or more preferably each
of the oligonucleotides of a set is up to twenty-five bases in length;
or more preferably each of the oligonucleotides of a set is up to
twenty-four bases in length.
In certain preferred embodiments, each of the oligonucleotides of
a set has a length of within five bases of the average length of all of
the oligonucleotides in the set; or more preferably each of the
oligonucleotides of a set has a length of within four bases of the
average length of all of the oligonucleotides in the set; or more
preferably each of the oligonucleotides of a set has a length of within
three bases of the average length of all of the oligonucleotides in the
set; or more preferably each of the oligonucleotides of a set has a
length of within two bases of the average length of all of the
oligonucleotides in the set; or more preferably each of the
oligonucleotides of a set has a length of within one base of the
average length of all of the oligonucleotides in the set.
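The length condition above can be checked as follows. This is a sketch only; the function name is hypothetical, and the placeholder strings stand in for oligonucleotides of 24, 24, 22 and 26 bases rather than for any sequence of the invention.

```python
def within_average_length(seqs, tolerance):
    """True if every length is within `tolerance` bases of the mean length."""
    mean = sum(len(s) for s in seqs) / len(seqs)
    return all(abs(len(s) - mean) <= tolerance for s in seqs)


# Placeholder strings of 24, 24, 22 and 26 bases (mean length 24).
lengths_demo = ["A" * 24, "A" * 24, "A" * 22, "A" * 26]
print(within_average_length(lengths_demo, 2))  # True
print(within_average_length(lengths_demo, 1))  # False
```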
Preferably, the string of contiguous bases of each
oligonucleotide of a said set are selected such that the position of
the first base of each string within the sequence on which it is based
is the same for all nucleotides of the set.
In preferred embodiments, the composition includes at least ten
said molecules, or at least eleven said molecules, or at least twelve
said molecules, or at least thirteen said molecules, or at least
fourteen said molecules, or at least fifteen said molecules, or at
least sixteen said molecules, or at least seventeen said molecules, or
at least eighteen said molecules, or at least nineteen said molecules,
or at least twenty said molecules, or at least twenty-one said
molecules, or at least twenty-two said molecules, or at least
twenty-three said molecules, or at least twenty-four said molecules, or at
least twenty-five said molecules, or at least twenty-six said
molecules, or at least twenty-seven said molecules, or at least
twenty-eight said molecules, or at least twenty-nine said molecules, or at
least thirty said molecules, or at least thirty-one said molecules, or
at least thirty-two said molecules, or at least thirty-three said
molecules, or at least thirty-four said molecules, or at least
thirty-five said molecules, or at least thirty-six said molecules, or at least
thirty-seven said molecules, or at least thirty-eight said molecules,
or at least thirty-nine said molecules, or at least forty said
molecules, or at least forty-one said molecules, or at least forty-two
said molecules, or at least forty-three said molecules, or at least
forty-four said molecules, or at least forty-five said molecules, or at
least forty-six said molecules, or at least forty-seven said molecules,
or at least forty-eight said molecules, or at least forty-nine said
molecules, or at least fifty said molecules, or at least sixty said
molecules, or at least seventy said molecules, or at least eighty said
molecules, or at least ninety said molecules, or at least one hundred
said molecules, or at least, depending upon the size of the group of
sequences on which the oligonucleotides are based, one hundred and ten
said molecules, or at least one hundred and twenty said molecules, or
at least one hundred and thirty said molecules, or at least one hundred
and forty said molecules, or at least one hundred and fifty said
molecules, or at least one hundred and sixty said molecules, or at
least one hundred and seventy said molecules, or at least one hundred
and eighty said molecules, or at least one hundred and ninety said
molecules, or at least two hundred said molecules.
A person skilled in the art will appreciate that, depending upon
the use to which a family of oligonucleotides of the invention are to
be put, it may or may not be desirable to include with sequences that
can be distinguished one from the other (i.e., are minimally
cross-hybridizing) a number of sequences that do cross-hybridize with each
other.
In a preferred aspect, the invention is a composition wherein in
(II)(i), any consecutive sequence of bases in the phantom sequence
which is identical to a consecutive sequence of bases in each of the
first and second sequences from which it is generated is no more than
((2/3 x L) - 1) bases in length. More preferably, the phantom
sequence, if greater than or equal to (3/4 x L) in length, contains at
least 3 insertions/deletions or mismatches when compared to the first
and second sequences from which it is generated, and even more
preferably, the phantom sequence, if greater than or equal to (2/3 x L)
in length, contains at least 3 insertions/deletions or mismatches when
compared to the first and second sequences from which it is generated.
In another preferred aspect, in (II)(iii), the phantom sequence
is not greater than or equal to (5/6 x L) in length, more preferably,
the phantom sequence is not greater than or equal to (3/4 x L) in
length.
In another broad aspect, the invention is a composition
containing molecules for use as tags or tag complements wherein each
molecule comprises an oligonucleotide selected from a set of
oligonucleotides based on a group of sequences having the numeric
patterns of the sequences tested in Example 2, as set out in Table IA,
wherein:
(A) each 1 = WXYY, each 2 = YWXY, each 3 = XXXW, each 4 = YWYX, each 5
= WYXY, each 6 = YYWX, each 7 = YWXX, each 8 = WYXX, each 9 = XYYW,
each 10 = XYWX, each 11 = YYXW, each 12 = WYYX, each 13 = XYXW, each
14 = WYYY, each 15 = WXYW, each 16 = WYXW, each 17 = WXXW, each 18 =
WYYW, each 19 = XYYX, each 20 = YXYX, each 21 = YXXY and each 22 =
XYXY;
(B) each of W, X and Y is a base in which either:
(i) (a) W = one of A, T/U, G, and C,
X = one of A, T/U, G, and C,
Y = one of A, T/U, G, and C,
and each of W, X and Y is selected so as to be different
from all of the others of W, X and Y,
(b) an unselected said base of (i)(a) can be substituted any
number of times for any one of W, X and Y, or
(ii) (a) W = G or C,
X = A or T/U,
Y = A or T/U,
and X ≠ Y, and
(b) a base not selected in (ii) (a) can be inserted into each
sequence at one or more locations, the location of each
insertion being the same in all the sequences;
(C) up to three bases can be inserted at any location of any of the
sequences or up to three bases can be deleted from any of the
sequences;
(D) all of the sequences of a said group of oligonucleotides are read 5'
to 3' or are read 3' to 5'; and
wherein each oligonucleotide of a said set has a sequence of at least ten
contiguous bases of the sequence on which it is based, provided that:
(E) the quotient of the sum of G and C divided by the sum of A, T/U, G and
C for all combined sequences of the set is between about 0.1 and 0.40
and said quotient for each sequence of the set does not vary from the
quotient for the combined sequences by more than 0.2; and
(F) for the group of 24mer sequences in which each 1 = GATT, each 2 =
TGAT, each 3 = AAAG, each 4 = TGTA, each 5 = GTAT, each 6 = TTGA, each
7 = TGAA, each 8 = GTAA, each 9 = ATTG, each 10 = ATGA, each 11 =
TTAG, each 12 = GTTA, each 13 = ATAG, each 14 = GTTT, each 15 = GATG,
each 16 = GTAG, each 17 = GAAG, each 18 = GTTG, each 19 = ATTA, each
20 = TATA, each 21 = TAAT and each 22 = ATAT, for the group of
sequences in which each 1 = GATT, each 2 = TGAT, each 3 = AAAG, each 4
= TGTA, each 5 = GTAT, each 6 = TTGA, each 7 = TGAA, each 8 = GTAA,
each 9 = ATTG, each 10 = ATGA, each 11 = TTAG, each 12 = GTTA, each 13
= ATAG, each 14 = GTTT, each 15 = GATG, each 16 = GTAG, each 17 =
GAAG, each 18 = GTTG, each 19 = ATTA, each 20 = TATA, each 21 = TAAT
and each 22 = ATAT, under a defined set of conditions in which the
maximum degree of hybridization between a sequence and any complement
of a different sequence of the group of 24mer sequences does not
exceed 30% of the degree of hybridization between said sequence and
its complement, for all oligonucleotides of the set, the maximum
degree of hybridization between an oligonucleotide and a complement of
any other oligonucleotide of the set does not exceed 50% of the degree
of hybridization of the oligonucleotide and its complement;
wherein any base present may be substituted by an analogue thereof.
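For illustration, the base assignments permitted by alternative (B)(ii)(a) above (W = G or C; X and Y each A or T/U, with X and Y different) can be enumerated as follows. The sketch writes T for T/U, which assumes DNA, and the function name is hypothetical.

```python
from itertools import product


def assignments_b_ii():
    """Assignments with W in {G, C}, X and Y in {A, T}, and X != Y."""
    return [
        {"W": w, "X": x, "Y": y}
        for w, x, y in product("GC", "AT", "AT")
        if x != y
    ]


for assignment in assignments_b_ii():
    print(assignment)
```

Four assignments result, the first being W = G, X = A, Y = T.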
Again, preferably, the contiguous bases of each oligonucleotide
of a set are selected such that the position of the first base of each
oligonucleotide within the sequence on which it is based is the same
for all nucleotides of the set.
In a preferred aspect, subject to the provisos of (E) and (F)
above, each oligonucleotide of a said set comprises a said sequence of
twenty-four contiguous bases of the sequence on which it is based.
More preferably, subject to the proviso of (F) each
oligonucleotide of a said set comprises a said sequence of twenty-four
contiguous bases of the sequence on which it is based.
In particularly preferred aspects, in (B), W = one of G and C; X
= one of A and T/U; and Y = one of A and T/U.
Even more preferred, in (B): W = G; X = one of A and T/U; and Y
= one of A and T/U.
In another broad aspect, the invention is a composition that
includes fifty minimally cross-hybridizing molecules for use as tags or
tag complements wherein each molecule comprises an oligonucleotide
comprising a sequence of nucleotide bases for which, under a defined
set of conditions, the maximum degree of hybridization between a said
oligonucleotide and any complement of a different oligonucleotide does
not exceed about 10% of the degree of hybridization between said
oligonucleotide and its complement.
A preferred set of such defined conditions results in a level of
hybridization that is the same as the level of hybridization obtained
when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08%
Triton X-100, pH 8.0 at 37°C, and the sequences are covalently linked
to microparticles. Of course, these conditions are preferably used
directly.
Preferably, under the defined set of conditions, whatever the
conditions are, the degree of hybridization between each
oligonucleotide and its complement varies by a factor of between 1 and
8.
Preferably, each oligonucleotide is the same length and is at
least twenty nucleotide bases in length. More preferably, each
oligonucleotide is twenty-four nucleotide bases in length.
In certain embodiments, each molecule of a composition is linked
to a solid phase support so as to be distinguishable from a mixture of
said molecules by hybridization to its complement. Each such molecule
can be linked to a defined location on such a solid phase support, the
defined location for each molecule being different than the defined
location for other, different, molecules.
In one preferred embodiment, the solid phase support is a
microparticle and each said molecule is covalently attached to a
different microparticle than each other different said molecule.
The invention includes kits for sorting and identifying
polynucleotides. Such a kit can include one or more solid phase
supports each having one or more spatially discrete regions, each such
region having a uniform population of substantially identical tag
complements covalently attached. The tag complements are made up of a
set of oligonucleotides of the invention.
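As an illustration of how tag complements might be laid out on such a support, the sketch below derives the complement of each tag and assigns it to its own region. It assumes DNA tags written 5' to 3' and takes the tag complement to be the reverse complement; the region indices and function name are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tag_complement(tag):
    """Reverse complement of a DNA tag, written 5' to 3'."""
    return tag.translate(COMPLEMENT)[::-1]


tags = {
    "SEQ ID NO: 1": "GATTTGTATTGATTGAGATTAAAG",
    "SEQ ID NO: 2": "TGATTGTAGTATGTATTGATAAAG",
}

# Hypothetical layout: one region (or bead type) index per tag complement.
regions = {i: tag_complement(seq) for i, seq in enumerate(tags.values())}
print(regions[0])  # CTTTAATCTCAATCAATACAAATC
```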
The one or more solid phase supports can be a planar substrate in
which the one or more spatially discrete regions is a plurality of
spatially addressable regions.
The tag complements can also be coupled to microparticles.
Microparticles preferably each have a diameter in the range of from 5
to 40 µm.
Such a kit preferably includes microparticles that are
spectrophotometrically unique, and therefore distinguishable from each
other according to conventional laboratory techniques. Of course for
such kits to work, each type of microparticle would generally have only
one tag complement associated with it, and usually there would be a
different oligonucleotide tag complement associated with (attached to)
each type of microparticle.
The invention includes methods of using families of
oligonucleotides of the invention.
One such method is of analyzing a biological sample containing a
biological sequence for the presence of a mutation or polymorphism at a
locus of the nucleic acid. The method includes:
(A) amplifying the nucleic acid molecule in the presence of a first primer
having a 5'-sequence having the sequence of a tag complementary to the
sequence of a tag complement belonging to a family of tag complements
of the invention to form an amplified molecule with a 5'-end with a
sequence complementary to the sequence of the tag;
(B) extending the amplified molecule in the presence of a polymerase and a
second primer having a 5'-end complementary to the 3'-end of the amplified
sequence, with the 3'-end of the second primer extending to immediately
adjacent said locus, in the presence of a plurality of nucleoside
triphosphate derivatives each of which is: (i) capable of
incorporation during transcription by the polymerase onto the 3'-end of
a growing nucleotide strand; (ii) causes termination of polymerization;
and (iii) capable of differential detection, one from the other,
wherein there is a said derivative complementary to each possible
nucleotide present at said locus of the amplified sequence;
(C) specifically hybridizing the second primer to a tag complement having
the tag complement sequence of (A); and
(D) detecting the nucleotide derivative incorporated into the second primer
in (B) so as to identify the base located at the locus of the nucleic
acid.
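The detection logic of steps (B) and (D) can be pictured with a small simulation. This is a simplified sketch, not the laboratory method: it assumes that the primer's 3'-end stops immediately adjacent to the locus, that exactly one labelled terminator is added opposite the base at the locus, and that the label identifies the terminator. The template, the locus index and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def incorporated_terminator(template, locus):
    """Terminator added opposite template[locus].

    `template` is the strand the second primer anneals to. The primer's
    3'-end is assumed to stop immediately adjacent to `locus`, so the
    single labelled terminator incorporated pairs with that base.
    """
    return COMPLEMENT[template[locus]]


def base_at_locus(detected_terminator):
    """Infer the template base from the detected terminator label."""
    return COMPLEMENT[detected_terminator]


template = "ACGTTGCA"  # hypothetical amplified strand
locus = 3              # hypothetical position of the polymorphism
label = incorporated_terminator(template, locus)
print(label)                 # A
print(base_at_locus(label))  # T
```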
In another method of the invention, a biological sample
containing a plurality of nucleic acid molecules is analyzed for the
presence of a mutation or polymorphism at a locus of each nucleic acid
molecule, for each nucleic acid molecule. This method includes steps
of:
(A) amplifying the nucleic acid molecule in the presence of a first primer
having a 5'-sequence having the sequence of a tag complementary to the
sequence of a tag complement belonging to a family of tag complements
of the invention to form an amplified molecule with a 5'-end with a
sequence complementary to the sequence of the tag;
(B) extending the amplified molecule in the presence of a polymerase and a
second primer having a 5'-end complementary to the 3'-end of the amplified
sequence, the 3'-end of the second primer extending to immediately
adjacent said locus, in the presence of a plurality of nucleoside
triphosphate derivatives each of which is: (i) capable of
incorporation during transcription by the polymerase onto the 3'-end of
a growing nucleotide strand; (ii) causes termination of polymerization;
and (iii) capable of differential detection, one from the other,
wherein there is a said derivative complementary to each possible
nucleotide present at said locus of the amplified molecule;
(C) specifically hybridizing the second primer to a tag complement having
the tag complement sequence of (A); and
(D) detecting the nucleotide derivative incorporated into the second primer
in (B) so as to identify the base located at the locus of the nucleic
acid;
wherein each tag of (A) is unique for each nucleic acid molecule and steps
(A) and (B) are carried out with said nucleic acid molecules in the presence of
each other.
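Because each tag is unique to one nucleic acid molecule, results read out per tag can be traced back to the locus each tag reports on. The following sketch illustrates that bookkeeping only; the locus names, the base calls and the function names are hypothetical, and the two tags are SEQ ID NOs: 1 and 2.

```python
def assign_tags(loci, tags):
    """Pair each locus with its own tag; fail if tags run out or repeat."""
    if len(set(tags)) != len(tags):
        raise ValueError("tags must be unique")
    if len(loci) > len(tags):
        raise ValueError("not enough tags for the loci")
    return dict(zip(loci, tags))


def decode(readout, locus_to_tag):
    """Map per-tag base calls back to the loci they report on."""
    tag_to_locus = {tag: locus for locus, tag in locus_to_tag.items()}
    return {tag_to_locus[tag]: base for tag, base in readout.items()}


loci = ["locus_1", "locus_2"]        # hypothetical names
tags = ["GATTTGTATTGATTGAGATTAAAG",  # SEQ ID NO: 1
        "TGATTGTAGTATGTATTGATAAAG"]  # SEQ ID NO: 2
plan = assign_tags(loci, tags)
calls = decode({tags[0]: "A", tags[1]: "G"}, plan)
print(calls)  # {'locus_1': 'A', 'locus_2': 'G'}
```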
Another method includes analyzing a biological sample that
contains a plurality of double stranded complementary nucleic acid
molecules for the presence of a mutation or polymorphism at a locus of
each nucleic acid molecule, for each nucleic acid molecule. The method
includes steps of:
(A) amplifying the double stranded molecule in the presence of a pair of
first primers, each primer having an identical 5'-sequence having the
sequence of a tag complementary to the sequence of a tag complement
belonging to a family of tag complements of the invention to form
amplified molecules with 5'-ends with a sequence complementary to the
sequence of the tag;
(B) extending the amplified molecules in the presence of a polymerase and a
pair of second primers, each second primer having a 5'-end complementary
to a 3'-end of the amplified sequence, the 3'-end of each said second
primer extending to immediately adjacent said locus, in the presence of
a plurality of nucleoside triphosphate derivatives each of which is:
(i) capable of incorporation during transcription by the polymerase onto
the 3'-end of a growing nucleotide strand; (ii) causes termination of
polymerization; and (iii) capable of differential detection, one from
the other;
(C) specifically hybridizing each of the second primers to a tag complement
having the tag complement sequence of (A); and
(D) detecting the nucleotide derivative incorporated into the second
primers in (B) so as to identify the base located at said locus;
wherein the sequence of each tag of (A) is unique for each nucleic acid
molecule and steps (A) and (B) are carried out with said nucleic molecules in
the presence of each other.
In yet another aspect, the invention is a method of analyzing a
biological sample containing a plurality of nucleic acid molecules for
the presence of a mutation or polymorphism at a locus of each nucleic
acid molecule, for each nucleic acid molecule, the method including
steps of:
(a) hybridizing the molecule and a primer, the primer having a 5'-sequence
having the sequence of a tag complementary to the sequence of a tag
complement belonging to a family of tag complements of the invention
and a 3'-end extending to immediately adjacent the locus;
(b) enzymatically extending the 3'-end of the primer in the presence of a
plurality of nucleoside triphosphate derivatives each of which is: (i)
capable of enzymatic incorporation onto the 3'-end of a growing
nucleotide strand; (ii) causes termination of said extension; and (iii)
capable of differential detection, one from the other, wherein there is
a said derivative complementary to each possible nucleotide present at
said locus;
(c) specifically hybridizing the extended primer formed in step (b) to a
tag complement having the tag complement sequence of (a); and
(d) detecting the nucleotide derivative incorporated into the primer in
step (b) so as to identify the base located at the locus of the nucleic
acid molecule;
wherein each tag of (a) is unique for each nucleic acid molecule and steps
(a) and (b) are carried out with said nucleic molecules in the presence of
each other.
The derivative can be a dideoxy nucleoside triphosphate.
Each respective complement can be attached as a uniform
population of substantially identical complements in spatially discrete
regions on one or more solid phase support(s).
Each tag complement can include a label, each such label being
different for respective complements, and step (d) can include
detecting the presence of the different labels for respective
hybridization complexes of bound tags and tag complements.
Another aspect of the invention includes a method of determining
the presence of a target suspected of being contained in a mixture.
The method includes the steps of:
(i) labelling the target with a first label;
(ii) providing a first detection moiety capable of specific binding to the
target and including a first tag;
(iii) exposing a sample of the mixture to the detection moiety under
conditions suitable to permit (or cause) said specific binding of the
molecule and target;
(iv) providing a family of suitable tag complements of the invention wherein
the family contains a first tag complement having a sequence
complementary to that of the first tag;
(v) exposing the sample to the family of tag complements under conditions
suitable to permit (or cause) specific hybridization of the first tag
and its tag complement;
(vi) determining whether a said first detection moiety hybridized to a first
said tag complement is bound to a said labelled target in order to
determine the presence or absence of said target in the mixture.
Preferably, the first tag complement is linked to a solid
support at a specific location of the support and step (vi) includes
detecting the presence of the first label at said specified location.
Also, the first tag complement can include a second label and
step (vi) includes detecting the presence of the first and second
labels in a hybridized complex of the moiety and the first tag
complement.
Further, the target can be selected from the group consisting of
organic molecules, antigens, proteins, polypeptides, antibodies and
nucleic acids. The target can be an antigen and the first molecule can
be an antibody specific for that antigen.
The antigen is usually a polypeptide or protein and the labelling
step can include conjugation of fluorescent molecules, digoxigenin,
biotinylation and the like.
The target can be a nucleic acid and the labelling step can
include incorporation of fluorescent molecules, radiolabelled
nucleotide, digoxigenin, biotinylation and the like.
DETAILED DESCRIPTION OF THE INVENTION
FIGURES
Reference is made to the attached figures in which,
Figures 1A and 1B illustrate results obtained in the cross-hybridization
experiments described in Example 1. Figure 1A shows the
hybridization pattern found when a microarray containing all 100 probes (SEQ
ID NOs:1 to 100) was hybridized with a 24mer oligonucleotide having the
complementary sequence to SEQ ID NO:3 (target). Figure 1B shows the pattern
observed when a similar array was hybridized with a mix of all 100 targets,
i.e., oligonucleotides having the sequences complementary to SEQ ID NOs:1 to
100.
Figure 2 shows the intensity of the signal (MFI) for each perfectly
matched sequence (indicated in Table I) and its complement obtained as
described in Example 2.
Figure 3 is a three dimensional representation showing cross-hybridization
observed for the sequences of Figure 2 as described in
Example 2. The results shown in Figure 2 are reproduced along the
diagonal of the drawing.
Figure 4 is illustrative of results obtained for an individual
target (SEQ ID NO:23, target No. 16) when exposed to the 100 probes of
Example 2. The MFI for each bead is plotted.
DETAILED EMBODIMENTS
The invention provides a family of minimally cross-hybridizing
sequences. The invention includes a method for sorting complex mixtures of
molecules by the use of families of the sequences as oligonucleotide sequence
tags. The families of oligonucleotide sequence tags are designed so as to
provide minimal cross hybridization during the sorting process. Thus any
sequence within a family of sequences will not cross hybridize with any other
sequence derived from that family under appropriate hybridization conditions
known by those skilled in the art. The invention is particularly useful in
highly parallel processing of analytes.
Families of Oligonucleotide Sequence Tags
The present invention includes a family of 24mer polynucleotides, that
have been demonstrated to be minimally cross-hybridizing with each other.
This family of polynucleotides is thus useful as a family of tags, and their
complements as tag complements.
The oligonucleotide sequences that belong to families of sequences that
do not exhibit cross hybridization behavior can be derived by computer
programs (described in international patent publication No. WO 01/59151).
The programs use a method of generating a maximum number of minimally
cross-hybridizing polynucleotide sequences that can be summarized as follows.
First, a set of sequences of a given length are created based on a given
number of block elements. Thus, if a family of polynucleotide sequences 24
nucleotides (24mer) in length is desired from a set of 6 block elements, each
element comprising 4 nucleotides, then a family of 24mers is generated
considering all positions of the 6 block elements. In this case, there will
be 6^6 (46,656) ways of assembling the 6 block elements to generate all
possible polynucleotide sequences 24 nucleotides in length.
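The enumeration step can be illustrated with a minimal Python sketch. The six block elements listed below are assumptions chosen for illustration (4-base words read off SEQ ID NOs: 1 and 2); they are not asserted to be the block set actually used to derive Table I.

```python
from itertools import product

# Hypothetical set of six 4-base block elements (an assumption for illustration).
BLOCKS = ["GATT", "TGTA", "TTGA", "AAAG", "TGAT", "GTAT"]

def all_24mers(blocks, words_per_tag=6):
    """Yield every sequence formed by placing any block at each word position."""
    for combo in product(blocks, repeat=words_per_tag):
        yield "".join(combo)

# Six choices at each of six positions gives 6**6 candidate sequences.
count = sum(1 for _ in all_24mers(BLOCKS))
print(count)
```

With these assumed blocks, the candidate set contains SEQ ID NO: 1, which is consistent with a tag being an ordered assembly of six blocks.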
Constraints are imposed on the sequences and are expressed as a set of
rules on the identities of the blocks such that homology between any two
sequences will not exceed the degree of homology desired between these two
sequences. All polynucleotide sequences generated which obey the rules are
saved. Sequence comparisons are performed in order to generate an incidence
matrix. The incidence matrix is presented as a simple graph and the sequences
with the desired property of being minimally cross hybridizing are found from
a clique of the simple graph, which may have multiple cliques. Once a clique
containing a suitably large number of sequences is found, the sequences are
experimentally tested to determine if it is a set of minimally cross
hybridizing sequences. This method has been used to obtain the 100 non
cross-hybridizing tags of Table I that are the subject of this patent application.
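The incidence matrix and clique search may be sketched as follows. This is a simplified illustration only: sequences are compared position by position without sliding, the similarity limit `max_matches` is an assumed parameter, and a greedy heuristic is substituted for whatever clique-finding procedure the referenced programs actually use.

```python
def matches(a, b):
    """Number of positions at which two equal-length sequences are identical."""
    return sum(x == y for x, y in zip(a, b))

def incidence_matrix(seqs, max_matches):
    """adj[i][j] is True when sequences i and j are similar at no more than max_matches positions."""
    n = len(seqs)
    return [[i != j and matches(seqs[i], seqs[j]) <= max_matches
             for j in range(n)] for i in range(n)]

def greedy_clique(adj):
    """Grow a clique greedily: keep a vertex only if it is adjacent to every vertex kept so far."""
    clique = []
    for v in range(len(adj)):
        if all(adj[v][u] for u in clique):
            clique.append(v)
    return clique

# Toy 8-base sequences (hypothetical) with an assumed limit of 4 matches.
toy = ["GATTTGTA", "GATTTGTT", "TTGAAAAG", "AAAGTTGA"]
clique = greedy_clique(incidence_matrix(toy, max_matches=4))
```

A greedy pass yields a clique but not necessarily the largest one; an exhaustive or branch-and-bound search would be needed to guarantee a maximum clique.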
The method includes a rational approach to the selection of groups of
sequences that are used to describe the blocks. For example there are n^4
different tetramers that can be obtained from n different nucleotides,
non-standard bases or analogues thereof. In a more preferred embodiment there are
4^4 or 256 possible tetramers when natural nucleotides are used. More
preferably 81 possible tetramers when only 3 bases are used: A, T and G. Most
preferably 32 different tetramers when all sequences have only one G.
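The tetramer counts stated above can be checked by direct enumeration. The following Python sketch is illustrative only; the function name `tetramers` is an assumption.

```python
from itertools import product

def tetramers(alphabet):
    """All 4-base words over the given alphabet (len(alphabet)**4 of them)."""
    return ["".join(p) for p in product(alphabet, repeat=4)]

natural = tetramers("ACGT")      # four natural bases
three_base = tetramers("ATG")    # only A, T and G
# Words with exactly one G: 4 positions for the G times 2**3 choices of A/T elsewhere.
one_g = [t for t in three_base if t.count("G") == 1]

print(len(natural), len(three_base), len(one_g))
```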
Block sequences can be composed of a subset of natural bases most
preferably A, T and G. Sequences derived from blocks that are deficient in
one base possess useful characteristics, for example, in reducing potential
secondary structure formation or reduced potential for cross hybridization
with nucleic acids in nature. Sets of block sequences that are most
preferable in constructing families of non cross hybridizing tag sequences
should contribute approximately equivalent stability to the formation of the
correct duplex as all other block sequences of the set. This should provide
tag sequences that behave isothermally. This can be achieved for example by
maintaining a constant base composition for all block sequences such as one G
and three A's or T's for each block sequence. Preferably, non-cross
hybridizing sets of block sequences will be comprised of blocks of
sequences that are isothermal. The block sequences should be different from
each other by at least one mismatch. Guidance for selecting such sequences
is provided by methods for selecting primer and/or probe sequences that can
be found in published techniques (Robertson et al., Methods Mol Biol;98:121-
54 (1998); Rychlik et al, Nucleic Acids Research, 17:8543-8551 (1989);
Breslauer et al., Proc Natl Acad Sci., 83:3746-3750 (1986)) and the like.
Additional sets of sequences can be designed by extrapolating on the original
family of non cross hybridizing sequences by simple methods known to those
skilled in the art.
A preferred family of 100 tags is shown as SEQ ID NOs:1 to 100 in
Table I. Characterization of the family of 100 sequence tags was performed
to determine the ability of these sequences to form specific duplex
structures with their complementary sequences and to assess the potential for
cross hybridization. The 100 sequences were synthesized and spotted onto
glass slides where they were coupled to the surface by amine linkage.
Complementary tag sequences were Cy3-labeled and hybridized individually to
the array containing the family of 100 sequence tags. Formation of duplex
structures was detected and quantified for each of the positions on the
array. Each of the tag sequences performed as expected, that is the perfect
match duplex was formed in the absence of significant cross hybridization
under stringent hybridization conditions. The results of a sample
hybridization are shown in Figure 1. Figure 1A shows the hybridization
pattern seen when a microarray containing all 100 probes was hybridized with
the target complementary to probe 181234. The 4 sets of paired spots
correspond to the probe complementary to the target. Figure 1B shows the
pattern seen when a similar array was hybridized with a mix of all 100
targets. These results indicate that the family of sequences which is the
subject of this patent can be used as a family of non-cross hybridizing (tag)
sequences.
The family of 100 non-cross-hybridizing sequences can be expanded by
incorporating additional tetramer sequences that are used in constructing
further 24mer oligonucleotides. In one example, four additional words were
included in the generation of new sequences to be considered for inclusion as
non-cross talkers in a family of sequences that were obtained from the above
method using 10 tetramers. In this case, the four additional words were
selected to avoid potential homologies with all potential combinations of
other words: YYXW (TTAG); WYYX (GTTA); XYXW (ATAG) and WYYY (GTTT). The
total number of sequences containing six words using the 14 possible words is
14^6 or 7,529,536. These sequences were screened to eliminate sequences that
contain repetitive regions that present potential hybridization problems such
as four or more of a similar base [...]. Each of these sequences was compared to
the sequence set of the original family of
100 non-cross-hybridizing sequences (SEQ ID NOs:1 to 100). Any new sequence
that contained a minimal threshold of homology (that does not include the use
of insertions or deletions) such as 15 or more matches with any of the
original family of sequences was eliminated. In other words, if it was
possible to align a new sequence with one or more of the original 100
sequences so as to obtain a maximum simple homology of 15/24 or more, the new
sequence was dropped. "Simple homology" between a pair of sequences is
defined here as the number of pairs of nucleotides that are matching (are the
same as each other) in a comparison of two aligned sequences divided by the
total number of potential matches. "Maximum simple homology" is obtained
when two sequences are aligned with each other so as to have the maximum
number of paired matching nucleotides. In any event, the set of new
sequences so obtained was referred to as the "candidate sequences". One of
the candidate sequences was arbitrarily chosen and referred to as sequence
101. All the candidate sequences were checked against sequence 101, and
sequences that contained 15 or more non-consecutive matches (i.e., a maximum
simple homology of 15/24 (62.5%) or more) were eliminated. This results in a
smaller set of candidate sequences from which another sequence is selected
that is now referred to as sequence 102. The smaller set of candidate
sequences is now compared to sequence 102 eliminating sequences that
contained 15 or more non-consecutive matches and the process is repeated
until there are no candidate sequences remaining. Also, any sequence
selected from the candidate sequences is eliminated if it has 13 or more
consecutive matches with any other previously selected candidate sequence.
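The screening procedure described above may be sketched in Python as follows. This is a minimal illustration under stated assumptions: "maximum simple homology" is computed as the largest number of matching paired bases over all ungapped relative offsets of the two sequences, the "arbitrarily chosen" sequence is simply the first one remaining, and the function names are hypothetical.

```python
def max_simple_matches(a, b):
    """Largest count of identical paired bases over all ungapped offsets of a against b."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        m = sum(1 for i, ch in enumerate(a)
                if 0 <= i - shift < len(b) and ch == b[i - shift])
        best = max(best, m)
    return best

def longest_shared_run(a, b):
    """Length of the longest stretch of consecutive matches (longest common substring)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, 1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def select_tags(candidates, original, max_matches=14, max_run=12):
    """Keep candidates below the homology threshold, then pick sequences one at a time."""
    pool = [c for c in candidates
            if all(max_simple_matches(c, o) <= max_matches for o in original)]
    selected = []
    while pool:
        pick = pool.pop(0)          # stands in for the arbitrary choice
        selected.append(pick)
        pool = [c for c in pool
                if max_simple_matches(c, pick) <= max_matches
                and longest_shared_run(c, pick) <= max_run]
    return selected
```

The defaults correspond to rejecting 15 or more matches out of 24 and 13 or more consecutive matches; smaller limits can be passed when experimenting with short toy sequences.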
The additional set of 73 tag sequences so obtained (SEQ ID NOs:101 to
173) is composed of sequences that when compared to any of SEQ ID NOs:1 to
100 of Table I have no greater similarity than the sequences of the original
100 sequence tags of Table I. The sequence set as derived from the original
family of non cross hybridizing sequences, SEQ ID NOs:1 to 173, is expected
to behave with similar hybridization properties to the sequences having SEQ
ID NOs:1 to 100 since it is understood that sequence similarity correlates
directly with cross hybridization (Southern et al., Nat. Genet.; 21, 5-9:
1999).
The set of 173 24mer oligonucleotides was expanded to include
those having SEQ ID NOs:174 to 210 as follows. The 4mers WXYW, XYXW,
WXXW, WYYW, XYYX, YXYX, YXXY and XYXY where W=G, X=A, and Y=U/T were
used in combination with the fourteen 4mers used in the generation of
SEQ ID NOs:1 to 173 to generate potential 24-base oligonucleotides.
Excluded from the set were those containing the sequence patterns GG,
AAAA and TTTT. To be included in the set of additional 24mers, a
sequence also had to have at least one of the 4mers containing two G's:
WXYW (GATG), WYXW (GTAG), WXXW (GAAG), WYYW (GTTG) while also
containing exactly six G's. Also required for a 24mer to be included
was that there be at most six bases between every neighboring pair of
G's. Another way of putting this is that there are at most six non-G's
between any two G's. Also, each G nearest the 5'-end of its
oligonucleotide (the left-hand side as written in Table I) was required
to occupy one of the first to seventh positions (counting the 5'-
terminal position as the first position.) A set of candidate sequences
was obtained by eliminating any new sequence that was found to have a
maximum simple homology of 16/24 or more with any of the previous set
of 173 oligonucleotides (SEQ ID NOs:1 to 173). As above, an arbitrary
174th sequence was chosen and candidate sequences eliminated by
comparison therewith. In this case the permitted maximum degree of
simple homology was 16/24. A second sequence was also eliminated if
there were ten consecutive matches between the two (i.e., it was
notionally possible to generate a phantom sequence containing a
sequence of 10 bases that is identical to a sequence in each of the
sequences being compared). A second sequence was also eliminated if it
was possible to generate a phantom sequence 20 bases in length or
greater.
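The composition rules stated for the additional 24mers can be expressed as simple predicates. The following Python sketch is an illustration only; the function names and the example sequence are hypothetical, and the homology and phantom-sequence comparisons are not covered here.

```python
TWO_G_WORDS = ("GATG", "GTAG", "GAAG", "GTTG")

def passes_g_rules(seq):
    """Check the excluded patterns, the G count and the G spacing rules."""
    if any(p in seq for p in ("GG", "AAAA", "TTTT")):
        return False
    if seq.count("G") != 6:
        return False
    g_pos = [i for i, ch in enumerate(seq) if ch == "G"]
    if g_pos[0] > 6:        # first G must occupy one of positions 1 to 7 (index 0 to 6)
        return False
    # at most six non-G bases between neighbouring G's
    return all(b - a - 1 <= 6 for a, b in zip(g_pos, g_pos[1:]))

def has_two_g_word(seq):
    """True if at least one of the six 4mer words is one of the two-G words."""
    words = [seq[i:i + 4] for i in range(0, len(seq), 4)]
    return any(w in TWO_G_WORDS for w in words)

# Hypothetical 24mer assembled from the words GATG ATTA GATT AAAG TGTA TTGA.
example = "GATGATTAGATTAAAGTGTATTGA"
ok = passes_g_rules(example) and has_two_g_word(example)
```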
A property of the polynucleotide sequences shown in Table I is that the
maximum block homology between any two sequences is never greater than 66 2/3
percent. This is because the computer algorithm by which the sequences were
initially generated was designed to prevent such an occurrence. It is within
the capability of a person skilled in the art, given the family of sequences
of Table I, to modify the sequences, or add other sequences while largely
retaining the property of minimal-cross hybridization which the
polynucleotides of Table I have been demonstrated to have.
There are 210 polynucleotide sequences given in Table I. Since all 210
of this family of polynucleotides can work with each other as a minimally
cross-hybridizing set, then any plurality of polynucleotides that is a subset
of the 210 can also act as a minimally cross-hybridizing set of
polynucleotides. An application in which, for example, 30 molecules are to
be sorted using a family of polynucleotide tags and tag complements could
thus use any group of 30 sequences shown in Table I. This is not to say that
some subsets may not be found in a practical sense to be more preferred than
others. For example, it may be found that a particular subset is more
tolerant of a wider variety of conditions under which hybridization is
conducted before the degree of cross-hybridization becomes unacceptable.
It may be desirable to use polynucleotides that are shorter in length
than the 24 bases of those in Table I. A family of subsequences (i.e.,
subframes of the sequences illustrated) based on those contained in Table I
having as few as 10 bases per sequence could be chosen, so long as the
subsequences are chosen to retain homological properties between any two of
the sequences of the family important to their non cross-hybridization.
The selection of sequences using this approach would be amenable to a
computerized process. Thus for example, a string of 10 contiguous bases of
the first 24mer of Table II could be selected: GATTTGTATTGATTGAGATTAAAG.
A string of contiguous bases from the second 24mer could then be
selected and compared for maximum homology against the first chosen sequence:
TGATTGTAGTATGTATTGATAAAG
Systematic pairwise comparison could then be carried out to determine
if the maximum homology requirement of 66 2/3 percent is violated:
As can be seen, the maximum homology between the two selected
subsequences is 50 percent (5 matches out of the total length of 10), and so
these two sequences are compatible with each other.
A 10mer subsequence can be selected from the third 24mer sequence of
Table I, and pairwise compared to each of the first two 10mer sequences to
determine its compatibility therewith, etc. and in this way a family of 10mer
sequences developed.
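The computerized selection of subsequences may be sketched as follows. This Python illustration makes several assumptions: homology between two 10mers is taken as the largest number of matching paired bases over all ungapped offsets divided by 10, the first compatible subframe of each parent is accepted, and the function names are hypothetical. The three parent sequences are SEQ ID NOs: 1 to 3.

```python
def max_simple_matches(a, b):
    """Largest count of identical paired bases over all ungapped offsets of a against b."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        m = sum(1 for i, ch in enumerate(a)
                if 0 <= i - shift < len(b) and ch == b[i - shift])
        best = max(best, m)
    return best

def subframes(seq, k=10):
    """All contiguous k-base windows of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_family(parents, k=10, limit=2 / 3):
    """Take, from each parent in turn, the first subframe compatible with those already chosen."""
    family = []
    for parent in parents:
        for cand in subframes(parent, k):
            if all(max_simple_matches(cand, f) / k <= limit for f in family):
                family.append(cand)
                break
    return family

SEQ_ID_1 = "GATTTGTATTGATTGAGATTAAAG"
SEQ_ID_2 = "TGATTGTAGTATGTATTGATAAAG"
SEQ_ID_3 = "GATTGTAAGATTTGATAAAGTGTA"
family = build_family([SEQ_ID_1, SEQ_ID_2, SEQ_ID_3])
```

A parent contributes no subsequence if none of its windows satisfies the limit, so the resulting family may be smaller than the number of parents.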
It is within the scope of this invention, to obtain families of
sequences containing 11mer, 12mer, 13mer, 14mer, 15mer, 16mer, 17mer, 18mer,
19mer, 20mer, 21mer, 22mer and 23mer sequences by analogy to that shown for
10mer sequences.
It may be desirable to have a family of sequences in which there are
sequences greater in length than the 24mer sequences shown in Table I. It is
within the capability of a person skilled in the art, given the family of
sequences shown in Table I, to obtain such a family of sequences. One
possible approach would be to insert into each sequence at one or more
locations a nucleotide, non natural base or analogue such that the longer
sequence should not have greater similarity than any two of the original non
cross hybridizing sequences of Table I and the addition of extra bases to the
tag sequences should not result in a major change in the thermodynamic
properties of the tag sequences of that set; for example, the GC content must
be maintained between 10%-40% with a variance from the average of 20%. This
method of inserting bases could be used to obtain a family of sequences up to
40 bases long.
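The GC content constraint on lengthened sequences can be checked with a short Python sketch. This is illustrative only: the function names and the chosen insertion are assumptions, and the "variance from the average" criterion is not modelled because the text does not define it precisely.

```python
def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def gc_in_window(seq, low=10.0, high=40.0):
    """True when the GC content lies within the stated 10% to 40% range."""
    return low <= gc_percent(seq) <= high

def insert_base(seq, position, base):
    """Return seq with one extra base inserted before the given 0-based position."""
    return seq[:position] + base + seq[position:]

tag = "GATTTGTATTGATTGAGATTAAAG"        # SEQ ID NO: 1
longer = insert_base(tag, 4, "A")        # hypothetical single-base insertion
still_ok = gc_in_window(longer)
```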
Given a particular family of sequences that can be used as a family of
tags (or tag complements), e.g., those of Table I or Table II, or the
combined sequences of these two tables, a skilled person will readily
recognize variant families that work equally as well.
Again taking the sequences of Table I for example, every T could be
converted to an A and vice versa and no significant change in the
cross-hybridization properties would be expected to be observed. This would also
be true if every G were converted to a C.
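One reason such a substitution is expected to leave sequence similarity unchanged is that a one-to-one relabelling of bases preserves, position by position, whether two sequences match. A minimal Python sketch, using SEQ ID NOs: 1 and 2 and hypothetical function names, illustrates this for the A/T exchange; it addresses sequence identity only and does not model duplex thermodynamics.

```python
SWAP_AT = str.maketrans("AT", "TA")

def swap_a_and_t(seq):
    """Convert every A to T and every T to A, leaving other bases unchanged."""
    return seq.translate(SWAP_AT)

def positional_matches(a, b):
    """Number of positions at which two equal-length sequences are identical."""
    return sum(x == y for x, y in zip(a, b))

s1 = "GATTTGTATTGATTGAGATTAAAG"   # SEQ ID NO: 1
s2 = "TGATTGTAGTATGTATTGATAAAG"   # SEQ ID NO: 2
before = positional_matches(s1, s2)
after = positional_matches(swap_a_and_t(s1), swap_a_and_t(s2))
```

Here `before` and `after` are equal, and the same holds at any relative offset of the two sequences.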
Also, all of the sequences of a family could be taken to be constructed
in the 5'-3' direction, as is the convention, or all of the constructions of
sequences could be in the opposite direction (3'-5').
There are additional modifications that can be carried out. For
example, C has not been used in the family of sequences. Substitution of C
in place of one or more G's of a particular sequence would yield a sequence
that is at least as low in homology with every other sequence of the family
as the particular sequence chosen to be modified was. It is thus possible to
substitute C in place of one or more G's in any of the sequences shown in
Table I. Analogously, substituting of C in place of one or more A's is
possible, or substituting C in place of one or more T's is possible.
It is preferred that the sequences of a given family are of the same,
or roughly the same length. Preferably, all the sequences of a family of
sequences of this invention have a length that is within five bases of the
base-length of the average of the family. More preferably, all sequences are
within four bases of the average base-length. Even more preferably, all or
almost all sequences are within three bases of the average base-length of the
family. Better still, all or almost all sequences have a length that is
within one of the base-length of the average of the family.
It is also possible for a person skilled in the art to derive sets of
sequences from the family of sequences that is the subject of this patent and
remove sequences that would be expected to have undesirable hybridization
properties.
Methods For Synthesis Of Oligonucleotide Families
Preferably oligonucleotide sequences of the invention are
synthesized directly by standard phosphoramidite synthesis approaches
and the like (Caruthers et al, Methods in Enzymology; 154, 287-313:
1987; Lipshutz et al, Nature Genet.; 21, 20-24: 1999; Fodor et al,
Science; 251, 763-773: 1991). Alternative chemistries involving non
natural bases such as peptide nucleic acids or modified nucleosides
that offer advantages in duplex stability may also be used (Hacia et
al; Nucleic Acids Res ;27: 4034-4039, 1999; Nguyen et al, Nucleic
Acids Res.; 27, 1492-1498: 1999; Weiler et al, Nucleic Acids Res.; 25,
2792-2799:1997). It is also possible to synthesize the oligonucleotide
sequences of this invention with alternate nucleotide backbones such as
phosphorothioate or phosphoroamidate nucleotides. Methods involving
synthesis through the addition of blocks of sequence in a step wise
manner may also be employed (Lyttle et al, Biotechniques, 19: 274-280
(1995)). Synthesis may be carried out directly on the substrate to be
used as a solid phase support for the application or the
oligonucleotide can be cleaved from the support for use in solution or
coupling to a second support.
Solid Phase Supports
There are several different solid phase supports that can be used with
the invention. They include but are not limited to slides, plates, chips,
membranes, beads, microparticles and the like. The solid phase supports can
also vary in the materials that they are composed of including plastic,
glass, silicon, nylon, polystyrene, silica gel, latex and the like. The
surface of the support is coated with the complementary sequence of the same.
In preferred embodiments, the family of tag complement sequences are
derivatized to allow binding to a solid support. Many methods of
derivatizing a nucleic acid for binding to a solid support are known in the
art (Hermanson G., Bioconjugate Techniques; Acad. Press: 1996). The sequence
tag may be bound to a solid support through covalent or non-covalent bonds
(Iannone et al, Cytometry; 39: 131-140, 2000; Matson et al, Anal. Biochem.;
224: 110-106, 1995; Proudnikov et al, Anal Biochem; 259: 34-41, 1998;
Zammatteo et al, Analytical Biochemistry; 280:143-150, 2000). The sequence
tag can be conveniently derivatized for binding to a solid support by
incorporating modified nucleic acids in the terminal 5' or 3' locations.
A variety of moieties useful for binding to a solid support (e.g.,
biotin, antibodies, and the like), and methods for attaching them to nucleic
acids, are known in the art. For example, an amine-modified nucleic acid
base (available from, e.g., Glen Research) may be attached to a solid support
(for example, Covalink-NH, a polystyrene surface grafted with secondary amino
groups, available from Nunc) through a bifunctional crosslinker (e.g.,
bis(sulfosuccinimidyl-suberate), available from Pierce). Additional spacing
moieties can be added to reduce steric hindrance between the capture moiety
and the surface of the solid support.
Attaching Tags to Analytes for Sorting
A family of oligonucleotide tag sequences can be conjugated to a
population of analytes most preferably polynucleotide sequences in several
different ways including but not limited to direct chemical synthesis,
chemical coupling, ligation, amplification, and the like. Sequence tags that
have been synthesized with primer sequences can be used for enzymatic
extension of the primer on the target for example in PCR amplification.
Detection of Single Nucleotide Polymorphisms Using Primer Extension
There are a number of areas of genetic analysis where families of non
cross hybridizing sequences can be applied including disease diagnosis, single
nucleotide polymorphism analysis, genotyping, expression analysis and the
like. One such approach for genetic analysis referred to as the primer
extension method (also known as Genetic Bit Analysis (Nikiforov et al,
Nucleic Acids Res.; 22, 4167-4175: 1994; Head et al Nucleic Acids Res.; 25,
5065-5071: 1997)) is an extremely accurate method for identification of the
nucleotide located at a specific polymorphic site within genomic DNA. In
standard primer extension reactions, a portion of genomic DNA containing a
defined polymorphic site is amplified by PCR using primers that flank the
polymorphic site. In order to identify which nucleotide is present at the
polymorphic site, a third primer is synthesized such that the polymorphic
position is located immediately 3' to the primer. A primer extension
reaction is set up containing the amplified DNA, the primer for extension, up
to 4 dideoxynucleoside triphosphates, each labelled with a different
fluorescent dye and a DNA polymerase such as the Klenow subunit of DNA
Polymerase I. The use of dideoxy nucleotides ensures that a single base is
added to the 3' end of the primer, a site corresponding to the polymorphic
site. In this way the identity of the nucleotide present at a specific
polymorphic site can be determined by the identity of the fluorescent
dye-labelled nucleotide that is incorporated in each reaction. One major
drawback to this approach is its low throughput. Each primer extension
reaction is carried out independently in a separate tube.
Universal sequences can be used to enhance the throughput of primer
extension assay as follows. A region of genomic DNA containing multiple
polymorphic sites is amplified by PCR. Alternately, several genomic regions
containing one or more polymorphic sites each are amplified together in a
multiplexed PCR reaction. The primer extension reaction is carried out as
described above except that the primers used are chimeric, each containing a
unique universal tag at the 5' end and the sequence for extension at the 3'
end. In this way, each gene-specific sequence would be associated with a
specific universal sequence. The chimeric primers would be hybridized to the
amplified DNA and primer extension carried out as described above. This
would result in a mixed pool of extended primers, each with a specific
fluorescent dye characteristic of the incorporated nucleotide. Following the
primer extension reaction, the mixed extension reactions are hybridized to
an array containing probes that are reverse complements of the universal
sequences on the primers. This would segregate the products of a number of
primer extension reactions into discrete spots. The fluorescent dye present
at each spot would then identify the nucleotide incorporated at each specific
location.
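The relationship between chimeric primers, array probes and base calls can be illustrated with a small Python sketch. The tag is SEQ ID NO: 1; the locus-specific sequence, the locus name and the dye labels are hypothetical placeholders, and the functions are illustrative rather than part of any described software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def chimeric_primer(tag, extension_seq):
    """Universal tag at the 5' end followed by the locus-specific sequence at the 3' end."""
    return tag + extension_seq

def call_bases(spot_dyes, probe_to_locus, dye_to_base):
    """Translate the dye observed at each probe spot into the base called at that locus."""
    return {probe_to_locus[p]: dye_to_base[d] for p, d in spot_dyes.items()}

TAG_1 = "GATTTGTATTGATTGAGATTAAAG"          # SEQ ID NO: 1
LOCUS_SEQ = "ACCTGAAGTCCATGGACT"            # hypothetical locus-specific sequence
DYE_TO_BASE = {"dye_A": "A", "dye_C": "C", "dye_G": "G", "dye_T": "T"}  # hypothetical labels

primer = chimeric_primer(TAG_1, LOCUS_SEQ)
probe = reverse_complement(TAG_1)           # array probe that captures the tagged primer
calls = call_bases({probe: "dye_G"}, {probe: "locus_1"}, DYE_TO_BASE)
```

Because each locus is associated with its own tag, the spot position identifies the locus and the dye identifies the incorporated nucleotide.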
Kits Using Families Of Tag Sequences
The families of non cross-hybridizing sequences may be provided in kits
for use in, for example, genetic analysis. Such kits include at least one set
of non cross hybridizing sequences in solution or on a solid support.
Preferably the sequences are attached to microparticles and are provided with
buffers and reagents that are appropriate for the application. Reagents may
include enzymes, nucleotides, fluorescent labels and the like that would be
required for specific applications. Instructions for correct use of the kit
for a given application will be provided.
EXAMPLES
EXAMPLE 1 - Demonstration of Non Cross Talk Behavior on Solid Array
One hundred oligonucleotide probes corresponding to a family of non-cross
talking oligonucleotides from Table I were synthesized by Integrated DNA
Technologies (IDT, Coralville IA). These oligonucleotides incorporated an
aminolink group coupled to the 5' end of the oligo through a C18 ethylene
glycol spacer. These probes were used to prepare microarrays as follows. The
probes were resuspended at a concentration of 50 µM in 150 mM NaPO4, pH 8.5.
The probes were spotted onto the surface of a SuperAldehyde slide (Telechem
Int., Sunnyvale CA) using an SDDC-II microarray spotter (ESI, Toronto
Ontario, Canada). The spots formed were approximately 120 µm in diameter
with 200 µm centre-to-centre spacing. Each probe was spotted 8 times on each
microarray. Following spotting, the arrays were processed essentially as
described by the slide manufacturer. Briefly, the arrays were treated with
67 mM sodium borohydride in PBS/EtOH (3:1) for 5 minutes then washed with 4
changes of 0.1% SDS. The arrays were not boiled.
One hundred labelled oligonucleotide targets were also synthesized by
IDT. The sequence of these targets corresponded to the reverse complement
of the 100 probe sequences. The targets were labelled at the 5' end with
Cy3.
Each Cy3-labeled target oligonucleotide was hybridized separately to
two microarrays each of which contained all 100 oligonucleotide probes.
Hybridizations were carried out at 42°C for 2 hours in a 40 uL reaction and
contained 40 nM of the labelled target suspended in 10 mM TrisHCl, pH 8.3, 50
mM KCl, 0.1% Tween 20. These are low stringency hybridization conditions
designed to provide a rigorous test of the performance of the family of non
cross-hybridizing sequences. Hybridizations were carried out by depositing
the hybridization solution on a clean cover slip then carefully positioning
the microarray slide over the cover slip in order to avoid bubbles. The
slide was then inverted and transferred to a humid chamber for incubation.
Following hybridization, the cover slip was removed and the microarray was
washed in hybridization buffer for 15 minutes at room temperature. The slide
was then dried by brief centrifugation.
Hybridized microarrays were scanned using a ScanArray Lite (GSI Lumonics,
Billerica MA). The laser power and photomultiplier tube voltage
used for scanning each hybridized microarray were optimized in order to
maximize the signal intensity from the spots representing the perfect match.
The results of a sample hybridization are shown in Figures 1A and 1B.
Figure 1A shows the hybridization pattern seen when a microarray containing
all 100 probes was hybridized with the target complementary to probe 181234.
The 4 sets of paired spots correspond to the probe complementary to the
target. Figure 1B shows the pattern seen when a similar array was hybridized
with a mix of all 100 targets.
Similar results to those illustrated in Figure 1A were obtained for all
of the sequences tested, and the feasibility of the use of molecules
containing oligonucleotides containing SEQ ID NOs: 1 to 100 as a set of tags
(or tag complements) is thus established.
EXAMPLE 2 - Cross Talk Behavior of Sequence on Beads
A group of 100 of the sequences of Table I was tested for feasibility
for use as a family of minimally cross-hybridizing oligonucleotides. The 100
sequences selected are separately indicated in Table I along with the numbers
assigned to the sequences in the tests.
The tests were conducted using the Luminex LabMAP™ platform available
from Luminex Corporation, Austin, Texas, U.S.A. The one hundred sequences,
used as probes, were synthesized as oligonucleotides by Integrated DNA
Technologies (IDT, Coralville, Iowa, U.S.A.). Each probe included a C6
aminolink group coupled to the 5'-end of the oligonucleotide through a C12
ethylene glycol spacer. The C6 aminolink molecule is a six carbon spacer
containing an amine group that can be used for attaching the oligonucleotide
to a solid support. One hundred oligonucleotide targets (probe complements),
the sequence of each being the reverse complement of the 100 probe sequences,
were also synthesized by IDT. Each target was labelled at its 5'-end with
biotin. All oligonucleotides were purified using standard desalting
procedures, and were reconstituted to a concentration of approximately 200 uM
in sterile, distilled water for use. Oligonucleotide concentrations were
determined spectrophotometrically using extinction coefficients provided by
the supplier.
Each probe was coupled by its amino linking group to a
carboxylated fluorescent microsphere of the LabMAP system according to
the Luminex100 protocol. The microsphere, or bead, for each probe
sequence has unique, or spectrally distinct, light absorption
characteristics which permits each probe to be distinguished from the
other probes. Stock bead pellets were dispersed by sonication and then
vortexing. For each bead population, approximately five million
microspheres (400 uL) were removed from the stock tube using barrier
tips and added to a 1.5 mL Eppendorf tube (USA Scientific). The
microspheres were then centrifuged, the supernatant was removed, and
beads were resuspended in 25 uL of 0.2 M MES (2-(N-morpholino)ethane
sulfonic acid) (Sigma), pH 4.5, followed by vortexing and sonication.
One nmol of each probe (in a 25 uL volume) was added to its
corresponding bead population. A volume of 2.5 uL of EDC cross-linker
(1-ethyl-3-(3-dimethylaminopropyl) carbodiimide hydrochloride) (Pierce),
prepared immediately before use by adding 1.0 mL of sterile ddH2O to 10
mg of EDC powder, was added to each microsphere population. Bead mixes
were then incubated for 30 minutes at room temperature in the dark with
periodic vortexing. A second 2.5 uL aliquot of freshly prepared EDC
solution was then added followed by an additional 30 minute incubation
in the dark. Following the second EDC incubation, 1.0 mL of 0.02%
Tween-20 (BioShop) was added to each bead mix and vortexed. The
microspheres were centrifuged, the supernatant was removed, and the
beads were resuspended in 1.0 mL of 0.1% sodium dodecyl sulfate
(Sigma). The beads were centrifuged again and the supernatant removed.
The coupled beads were resuspended in 100 uL of 0.1 M MES pH 4.5. Bead
concentrations were then determined by diluting each preparation 100-
fold in ddH2O and enumerating using a Neubauer BrightLine
Hemacytometer. Coupled beads were stored as individual populations at
2-8°C protected from light.
The relative oligonucleotide probe density on each bead
population was assessed by Terminal Deoxynucleotidyl Transferase (TdT)
end-labelling with biotin-ddUTPs. TdT was used to label the 3'-ends of
single-stranded DNA with a labeled ddNTP. Briefly, 180 uL of the pool
of 100 bead populations (equivalent to about 4000 of each bead type) to
be used for hybridizations was pipetted into an Eppendorf tube and
centrifuged. The supernatant was removed, and the beads were washed in
1x TdT buffer. The beads were then incubated with a labelling reaction
mixture, which consisted of 5x TdT buffer, 25 mM CoCl2, and 1000 pmol of
biotin-16-ddUTP (all reagents were purchased from Roche). The total
reaction volume was brought up to 85.5 uL with sterile, distilled H2O,
and the samples were incubated in the dark for 1 hour at 37°C. A
second aliquot of enzyme was added, followed by a second 1 hour
incubation. Samples were run in duplicate, as was the negative
control, which contained all components except the TdT. In order to
remove unincorporated biotin-ddUTP, the beads were washed 3 times with
200 uL of hybridization buffer, and the beads were resuspended in 50 uL
of hybridization buffer following the final wash. The biotin label was
detected spectrophotometrically using SA-PE (streptavidin-phycoerythrin
conjugate). The streptavidin binds to biotin and the phycoerythrin is
spectrally distinct from the probe beads. The 10 mg/mL stock of SA-PE
was diluted 100-fold in hybridization buffer, and 15 uL of the diluted
SA-PE was added directly to each reaction and incubated for 15 minutes
at 37°C. The reactions were analyzed on the Luminex100 LabMAP.
Acquisition parameters were set to measure 100 events per bead using a
sample volume of 50 uL.
The results obtained are shown in Figure 2. As can be seen the
Mean Fluorescent Intensity (MFI) of the beads varies from 277.75 to
2291.08, a range of 8.25-fold. Assuming that the labelling reactions
are complete for all of the oligonucleotides, this illustrates the
signal intensity that would be obtained for each type of bead at this
concentration if the target (i.e., labelled complement) was bound to
the probe sequence to the full extent possible.
The cross-hybridization of targets to probes was evaluated as
follows. 100 oligonucleotide probes linked to 100 different bead
populations, as described above, were combined to generate a master
bead mix, enabling multiplexed reactions to be carried out. The pool
of microsphere-immobilized probes was then hybridized individually with
each biotinylated target. Thus, each target was examined individually
for its specific hybridization with its complementary bead-immobilized
sequence, as well as for its non-specific hybridization with the other
99 bead-immobilized universal sequences present in the reaction. For
each hybridization reaction, 25 uL bead mix (containing about 2500 of
each bead population in hybridization buffer) was added to each well of
a 96-well Thermowell PCR plate and equilibrated at 37°C. Each target
was diluted to a final concentration of 0.002 fmol/uL in hybridization
buffer, and 25 uL was added to each well, giving a reaction volume of 50 uL. Hybridization buffer consisted of 0.2 M
NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 and hybridizations were
performed at 37 °C for 30 minutes. Each target was analyzed in
triplicate and six background samples (i.e. no target) were included in
each plate. A SA-PE conjugate was used as a reporter, as described
above. The 10 mg/mL stock of SA-PE was diluted 100-fold in
hybridization buffer, and 15 uL of the diluted SA-PE was added directly
to each reaction, without removal of unbound target, and incubated for
15 minutes at 37°C. Finally, an additional 35 uL of hybridization
buffer was added to each well, resulting in a final volume of 100 uL
per well prior to analysis on the Luminex100 LabMAP. Acquisition
parameters were set to measure 100 events per bead using a sample
volume of 80 uL.
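The volume bookkeeping for one hybridization well can be checked with a short sketch. It reads all volumes as microlitres and assumes that 25 uL of diluted target is added to the 25 uL bead mix, an assumption chosen because it makes the additions sum to the stated final volume of 100 uL. The variable names are hypothetical.

```python
# Volumes in microlitres. The 25 uL target volume is an assumption; the
# other values are taken from the description of the hybridization above.
additions_ul = {
    "bead mix": 25.0,
    "diluted target (assumed)": 25.0,
    "diluted SA-PE": 15.0,
    "additional hybridization buffer": 35.0,
}
final_volume_ul = sum(additions_ul.values())

# Amount of target per well at the stated 0.002 fmol/uL dilution.
target_conc_fmol_per_ul = 0.002
target_amount_fmol = target_conc_fmol_per_ul * additions_ul["diluted target (assumed)"]

# Working SA-PE concentration after the stated 100-fold dilution of a
# 10 mg/mL stock.
sa_pe_working_mg_per_ml = 10.0 / 100

print(final_volume_ul, target_amount_fmol, sa_pe_working_mg_per_ml)
```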
The percent hybridization was calculated for any event in which
the net MFI was at least 3 times the zero target background. In other
words, a calculation was made for any sample where (MFIsample - MFIzero target
background) >= 3 x MFIzero target background.
A "positive" cross-talk event (i.e., significant mismatch or cross-hybridization)
was defined as any event in which the net median fluorescent
intensity (MFIsample - MFIzero target background) generated by a mismatched hybrid was
greater than or equal to the arbitrarily set limit of 10% that of the
perfectly matched hybrid determined under identical conditions. As there are
100 probes and 100 targets, there are 100 x 100 = 10,000 possible
interactions, of which 100 are the result of perfect hybridizations.
The remaining 9900 result from hybridization of a target with a mismatched
probe.
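The cross-talk criterion defined above can be expressed as a small Python sketch. It assumes that raw[t][p] holds the MFI of probe bead p in the reaction containing target t, that the perfect match for target t is the probe with the same index, and that each bead has one zero target background value. The function names and the 3 x 3 data set are invented for illustration and are not experimental results.

```python
from typing import List, Tuple


def net_mfi(raw: List[List[float]], background: List[float]) -> List[List[float]]:
    """Subtract each bead's zero target background from its raw MFI."""
    return [[value - background[p] for p, value in enumerate(row)] for row in raw]


def cross_talk_events(net: List[List[float]], limit: float = 0.10) -> List[Tuple[int, int, float]]:
    """Return (target, probe, fraction) for every mismatched pair whose net
    MFI is at least `limit` times the net MFI of that target's perfect match."""
    events = []
    for t, row in enumerate(net):
        match = row[t]
        if match <= 0:
            continue  # no usable perfect-match signal for this target
        for p, value in enumerate(row):
            if p != t and value >= limit * match:
                events.append((t, p, value / match))
    return events


# Invented example: 3 targets x 3 probes, background of 50 on every bead.
raw = [
    [1050.0, 60.0, 45.0],
    [55.0, 2050.0, 260.0],
    [48.0, 52.0, 850.0],
]
background = [50.0, 50.0, 50.0]
events = cross_talk_events(net_mfi(raw, background))
print(events)
```

With 100 probes and 100 targets the same loop inspects the 9900 mismatched interactions, leaving the 100 perfect matches as the reference signals.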
The results obtained are illustrated in Figure 3. The ability of
each target to be specifically recognized by its matching probe is
shown. Of the possible 9900 non-specific hybridization events that
could have occurred when the 100 targets were each exposed to the pool
of 100 probes, 6 events were observed. Of these 6 events, the highest
non-specific event generated a signal equivalent to 10.2% of the
signal observed for the perfectly matched pair (i.e. specific
hybridization event).
Each of the 100 targets was thus examined individually for
specific hybridization with its complement sequence as incorporated
onto a microsphere, as well as for non-specific hybridization with the
complements of the other 99 target sequences. Representative
hybridization results for target 16 (complement of probe 16, Table I)
are shown in Figure 4. Probe 16 was found to hybridize only to its
perfectly-matched target. No cross-hybridization with any of the other
99 targets was observed.
The foregoing results demonstrate the possibility of
incorporating the 210 sequences of Table I, or any subset thereof, into
a multiplexed system with the expectation that most if not all
sequences can be distinguished from the others by hybridization. That
is, it is possible to distinguish each target from the other targets by
hybridization of the target with its precise complement and minimal
hybridization with complements of the other targets.
EXAMPLE 3 - Tag Sequences used in Sorting Polynucleotides
The family of non cross hybridizing sequence tags or a subset thereof
can be attached to oligonucleotide probe sequences during synthesis and used
to generate amplified probe sequences. In order to test the feasibility of
PCR amplification with non cross hybridizing sequence tags and subsequently
addressing each respective sequence to its appropriate location on
two-dimensional or bead arrays, the following experiment was devised. A 24mer
tag sequence was connected in a 5'-3' specific manner to a p53 exon specific
sequence (20mer reverse primer). The connecting p53 sequence represented the
inverse complement of the nucleotide gene sequence. To facilitate the
subsequent generation of single stranded DNA post-amplification the
tag-Reverse primer was synthesized with a phosphate modification (PO4) on the
5'-end. A second PCR primer was also generated for each desired exon, which
represented the Forward (5'-3') amplification primer. In this instance the
Forward primer was labeled with a 5'-biotin modification to allow detection
with Cy3-avidin or equivalent.
A practical example of the aforementioned description is as follows:
For exon 1 of the human p53 tumor suppressor gene sequence the following tag-
Reverse primer was generated:
                                222087            222063
5'-PO4-GATTGTAAGATTTGATAAAGTGTA-TCCAGGGAAGCCTGTCACCCTCGT-3'
       Tag Sequence # 3         Exon 1 Reverse
The numbering above the Exon-1 reverse primer represents the genomic
nucleotide positions of the indicated bases.
The corresponding Exon-1 Forward primer sequence is as follows:
          221873            221896
5'-Biotin-TCATGGCGACTGTCCAGCTTTGTG-3'
In combination these primers will amplify a product of 214 bp plus a
24 bp tag extension yielding a total size of 238 bp.
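The primer construction and size arithmetic above can be reproduced with a brief sketch. The tag is SEQ ID NO: 3 as recited in the claims and the exon-specific portions are copied from the primers shown above; the variable and function names are hypothetical, and the 214 bp amplicon length is taken from the text rather than computed.

```python
# Sketch of the tagged reverse primer described above (names are illustrative).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


tag = "GATTGTAAGATTTGATAAAGTGTA"            # Tag Sequence # 3 (SEQ ID NO: 3)
exon1_reverse = "TCCAGGGAAGCCTGTCACCCTCGT"  # exon 1 specific portion
exon1_forward = "TCATGGCGACTGTCCAGCTTTGTG"  # biotinylated forward primer

# The tag sits 5' of the exon-specific sequence in the tag-Reverse primer.
tag_reverse_primer = tag + exon1_reverse

# Amplicon length stated in the text, plus the tag extension.
amplicon_bp = 214
total_bp = amplicon_bp + len(tag)

# Sequence exposed on the opposite strand once the 5'-phosphorylated strand
# has been digested: the complement of the tag (the anti-tag).
anti_tag = reverse_complement(tag)

print(len(tag_reverse_primer), total_bp, anti_tag)
```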
Once amplified, the PCR product was purified using a QIAquick PCR
purification kit and the resulting DNA was quantified. To generate
single stranded DNA, the DNA was subjected to λ-exonuclease digestion
thereby resulting in the exposure of a single stranded sequence (antitag)
complementary to the tag-sequence covalently attached to the solid
phase array. The resulting product was heated to 95°C for 5 minutes
and then directly applied to the array at a concentration of 10-50 nM.
Following hybridization and concurrent sorting, the tag-Exon 1
sequences were visualized using Cy3-streptavidin. In addition to
direct visualization of the biotinylated product, the product itself
can now act as a substrate for further analysis of the amplified
region, such as SNP detection and haplotype determination.
A number of additional methods for the detection of single
nucleotide polymorphisms, including but not limited to, allele specific
polymerase chain reaction (ASPCR), allele specific primer extension
(ASPE) and oligonucleotide ligation assay (OLA) can be performed by
those skilled in the art in combination with the tag sequences
described herein.
DEFINITIONS
Non cross hybridization: Describes the absence of hybridization
between two sequences that are not perfect complements of each other.
Cross Hybridisation: The hydrogen bonding of a single-stranded
DNA sequence that is partially but not entirely complementary to a
single-stranded substrate.
Homology: How closely related two or more separate strands of DNA
are to each other, based on their base sequences.
Analogue: A chemical which resembles a nucleotide base. A base
which does not normally appear in DNA but can substitute for the ones
which do, despite minor differences in structure.
Complement: The opposite or "mirror" image of a DNA sequence. A
complementary DNA sequence has an "A" for every "T" and a "C" for every
"G". Two complementary strands of single stranded DNA, for example a
tag sequence and its complement, will join to form a double-stranded
molecule.
Complementary DNA (cDNA): DNA that is synthesized from a
messenger RNA template; the single-stranded form is often used as a
probe in physical mapping.
Oligonucleotide: Refers to a short nucleotide polymer whereby the
nucleotides may be natural nucleotide bases or analogues thereof.
Tag: Refers to an oligonucleotide that can be used for specifically
sorting analytes with at least one other oligonucleotide that when used
together do not cross hybridize.
Similar Homology: In the context of this invention, pairs of sequences
are compared with each other based on the amount of "homology" between the
sequences. By way of example, two sequences are said to have a 50% "maximum
homology" with each other if, when the two sequences are aligned side-by-side
with each other so to obtain the (absolute) maximum number of identically
paired bases, the number of identically paired bases is 50% of the total
number of bases in one of the sequences. (If the sequences being compared
are of different lengths, then it would be of the total number of bases in
the shorter of the two sequences.) Examples of determining maximum homology
are as follows:
EXAMPLE 4 - Determining Maximum Homology
A-A-B-B-C-C
    B-D-C-D-D-D   (2 out of 4 paired bases are the same)
    *   *
A-A-B-B-C-C
      B-D-C-D-D-D (2 out of 3 paired bases are the same)
In this case, the maximum number of identically paired bases is two and
there are two possible alignments yielding this maximum number. The total
number of possible pairings is six giving 33 1/3 % (2/6) homology. The
maximum amount of homology between the two sequences is thus 1/3.
EXAMPLE 5 - Determining Maximum Homology
A-A-B-B-C-A
A-A-D-D-C-D (3 out of 6 paired bases are the same)
In this alignment, the number of identically paired bases is three and
the total number of possibly paired bases is six, so the homology between the
two sequences is 3/6 (50%).
A-A-B-B-C-A
          A-A-D-D-C-D (1 out of 1 paired bases are the same)
In this alignment, the number of identically paired bases is 1, so the
homology between the two sequences is 1/6 (16 2/3 %).
The maximum homology between these two sequences is thus 50%.
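The calculation illustrated in Examples 4 and 5 can be sketched as follows. This is a minimal, ungapped sliding comparison written for illustration; the function name max_homology is hypothetical, and the sketch does not attempt the local rules or any treatment of insertions and deletions mentioned elsewhere in the specification.

```python
from typing import Sequence, Tuple


def max_homology(a: Sequence[str], b: Sequence[str]) -> Tuple[int, float]:
    """Slide b along a over every possible offset and return the largest
    number of identically paired positions, together with that number as a
    fraction of the length of the shorter sequence."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, base in enumerate(b)
            if 0 <= i + shift < len(a) and a[i + shift] == base
        )
        best = max(best, matches)
    return best, best / min(len(a), len(b))


# The two worked examples, with the hyphens dropped.
example_4 = max_homology("AABBCC", "BDCDDD")
example_5 = max_homology("AABBCA", "AADDCD")
print(example_4, example_5)
```

For Example 4 the function reports two identical pairs out of six positions, and for Example 5 it reports three out of six, in agreement with the values given above.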
Block sequence: Refers to a symbolic representation of a sequence of
blocks. In its most general form a block sequence is a representative
sequence in which no particular value, mathematical variable, or other
designation is assigned to each block of the sequence.
Incidence Matrix: As used herein is a well-defined term in the field of
Discrete Mathematics. However, an incidence matrix cannot be defined without
first defining a "graph". In the method described herein a subset of general
graphs called simple graphs is used. Members of this subcategory are further
defined as follows.
A simple graph G is a pair (V, E) where V represents the set of
vertices of the simple graph and E is a set of un-oriented edges of the
simple graph. An edge is defined as a 2-component combination of members of
the set of vertices. In other words, in a simple graph G there are some
pairs of vertices that are connected by an edge. In our application a graph
is based on nucleic acid sequences generated using sequence templates and
vertices represent DNA sequences and edges represent a relative property of
any pair of sequences.
The incidence matrix is a mathematical object that allows one to
describe any given graph. For the subset of simple graphs used herein, the
simple graph G=(V,E), and for a pre-selected and fixed ordering of vertices,
V={v1,v2, ... ,vn}, elements of the incidence matrix A(G) = [aij] are defined by
the following rules:
(1) aij=1 for any pair of vertices {vi,vj} that is a member of the
set of edges; and
(2) aij=0 for any pair of vertices {vi,vj} that is not a member of
the set of edges.
This is an exact unequivocal definition of the incidence matrix. In effect,
one selects the indices: 1,2,...n of the vertices and then forms an (n x n)
square matrix with elements aij=1 if the vertices vi and vj are connected by an
edge and aij=0 if the vertices vi and vj are not connected by an edge.
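A matrix of this kind can be built with a short sketch. It assumes that two sequences are connected by an edge when their maximum number of identically paired bases, found by ungapped sliding, does not exceed a chosen limit; the limit, the function names and the four toy sequences (borrowed from Examples 4 and 5) are illustrative only.

```python
from typing import List, Sequence


def max_identical_pairs(a: Sequence[str], b: Sequence[str]) -> int:
    """Largest number of identically paired positions over all offsets."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, base in enumerate(b)
            if 0 <= i + shift < len(a) and a[i + shift] == base
        )
        best = max(best, matches)
    return best


def incidence_matrix(sequences: List[str], max_allowed: int) -> List[List[int]]:
    """aij = 1 if sequences i and j (i != j) stay within the threshold,
    otherwise 0. Diagonal elements are 0."""
    n = len(sequences)
    return [
        [
            1
            if i != j
            and max_identical_pairs(sequences[i], sequences[j]) <= max_allowed
            else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


toy_sequences = ["AABBCC", "BDCDDD", "AABBCA", "AADDCD"]
matrix = incidence_matrix(toy_sequences, max_allowed=2)
for row in matrix:
    print(row)
```

The resulting matrix is symmetric because the pairwise comparison does not depend on the order of the two sequences.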
To define the term "class property" as used herein, the term
"complete simple graph" or "clique" must first be defined. The complete
simple graph is required because all sequences that result from the method
described herein should collectively share the relative property of any pair
of sequences defining an edge of graph G, for example not violating the
threshold rule, that is, not having a "maximum simple homology" greater than
a predetermined amount, whatever pair of the sequences are chosen from the
final set. It is possible that additional "local" rules, based on known or
empirically determined behavior of particular nucleotides, or nucleotide
sequences, are applied to sequence pairs in addition to the basic threshold
rule.
In the language of a simple graph, G=(V, E), this means in the
final graph there should be no pair of vertices (no sequence pair) not
connected by an edge (because an edge means that the sequences represented by
vi and vj do not violate the threshold rule).
Because the incidence matrix of any simple graph can be generated
by the above definition of its elements, the consequence of defining a simple
complete graph is that the corresponding incidence matrix for a simple
complete graph will have all off-diagonal elements equal to 1 and all
diagonal elements equal to 0. This is because if one aligns a sequence with
itself, the threshold rule is of course violated, and all other sequences are
connected by an edge.
For any simple graph, there might be a complete subgraph. First,
the definition of a subgraph of a graph is as follows. The subgraph
Gs=(Vs,Es) of a simple graph G=(V,E) is a simple graph that contains the
subsets of vertices Vs of the set V of vertices and inclusion of the set Vs
into the set V is immersion (a mathematical term). This means that one
generates a subgraph Gs=(Vs,Es) of a simple graph G in two steps. First
select some vertices Vs from G. Then select those edges Es from G that
connect the chosen vertices and do not select edges that connect selected
with non selected vertices.
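The two-step construction of a subgraph can be sketched as follows. Edges are represented here as two-element frozensets of vertex labels, which is a representation chosen for this illustration; the function name and the four-vertex example graph are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple


def induced_subgraph(
    vertices: List[str],
    edges: Set[FrozenSet[str]],
    chosen: Iterable[str],
) -> Tuple[List[str], Set[FrozenSet[str]]]:
    """Step 1: keep the chosen vertices. Step 2: keep only those edges whose
    two end points are both among the chosen vertices."""
    chosen_set = set(chosen)
    sub_vertices = [v for v in vertices if v in chosen_set]
    sub_edges = {edge for edge in edges if edge <= chosen_set}
    return sub_vertices, sub_edges


vertices = ["v1", "v2", "v3", "v4"]
edges = {
    frozenset({"v1", "v2"}),
    frozenset({"v2", "v3"}),
    frozenset({"v3", "v4"}),
}
sub_vertices, sub_edges = induced_subgraph(vertices, edges, ["v1", "v2", "v3"])
print(sub_vertices, sorted(sorted(e) for e in sub_edges))
```

The edge between v3 and v4 is dropped because it connects a selected vertex with a non-selected one.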
We desire a subgraph of G that is a complete simple graph. By
using this property of the complete simple graph generated from the simple
graph G of all sequences generated by the template based algorithm, the
pairwise property of any pair of the sequences (violating/non-violating the
threshold rule) is converted into the property of all members of the set,
termed "the class property".
By selecting a subgraph of a simple graph G that is a complete
simple graph, this assures that, up to the tests involving the local rules
described herein, there are no pairs of sequences in the resulting set that
violate the threshold rule, also described above, independent of which pair
of sequences in the set are chosen. This feature is called the "desired class
property".
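One simple way to pick a complete subgraph from a 0/1 matrix of the kind defined above is a greedy pass over the vertices. The sketch below is offered only as an illustration of the class property; it is not stated in the specification to be the selection method actually used, it does not guarantee the largest possible complete subgraph, and the function names and example matrix are hypothetical.

```python
from typing import List


def greedy_complete_subgraph(matrix: List[List[int]]) -> List[int]:
    """Visit vertices in index order and keep a vertex only if it is
    connected by an edge to every vertex kept so far."""
    chosen: List[int] = []
    for v in range(len(matrix)):
        if all(matrix[v][u] == 1 for u in chosen):
            chosen.append(v)
    return chosen


def has_class_property(matrix: List[List[int]], members: List[int]) -> bool:
    """True if every pair of members is connected by an edge."""
    return all(
        matrix[u][v] == 1
        for i, u in enumerate(members)
        for v in members[i + 1:]
    )


# Example matrix: edges between vertices 0-1 and 1-2 only.
matrix = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
selected = greedy_complete_subgraph(matrix)
print(selected, has_class_property(matrix, selected))
```

Whatever selection procedure is used, a check such as has_class_property confirms that no pair in the final set violates the threshold rule.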
The present invention thus includes reducing the potential for
cross-hybridization behavior by taking into account local
homologies of the sequences and appears to have greater rigor than
known approaches. For example, the method described herein involves
the sliding of one sequence relative to the other sequence in order to
form a sequence alignment that would accommodate insertions or
deletions. (Kane et al., Nucleic Acids Res.; 28, 4552-4557: 2000).
(1) Oligonucleotides having SEQ ID NOs: 1 to 100 were used in
experiments of Example 1.
(2) Oligonucleotides used in experiments of Example 2 are indicated
in this column by the numbers assigned to them in the
experiments.
All references referred to in this specification are incorporated
herein by reference.
The scope of protection sought for the invention described herein
is defined by the appended claims. It will also be understood that any
elements recited above or in the claims, can be combined with the
elements of any claim. In particular, elements of a dependent claim
can be combined with any element of a claim from which it depends, or
with any other compatible element of the invention.
This application claims priority from United States Provisional
Patent Application Nos. 60/263,710 and 60/303,799, filed January 25,
2001 and July 10, 2001. Both of these documents are incorporated herein by reference.






We claim:
1. An oligonucleotide tag or tag complement for use in a multiplex assay, wherein the tag or tag complement is selected from the group of oligonucleotides consisting of:
GATTTGTATTGATTGAGATTAAAG (SEQ ID NO: 1), TGATTGTAGTATGTATTGATAAAG (SEQ ID NO: 2), GATTGTAAGATTTGATAAAGTGTA (SEQ ID NO: 3), GATTTGAAGATTATTGGTAATGTA (SEQ ID NO: 4), GATTGATTATTGTGATTTGAATTG (SEQ ID NO: 5), GATTTGATTGTAAAAGATTGTTGA (SEQ ID NO: 6), ATTGGTAAATTGGTAAATGAATTG (SEQ ID NO: 7), GTAAGTAATGAATGTAAAAGGATT (SEQ ID NO: 9), TGTAGATTTGTATGTATGTATGAT (SEQ ID NO: 13), and GATTAAAGTGATTGATGATTTGTA (SEQ ID NO: 15); wherein
including oligonucleotides complementary thereto;
wherein the number of false positives and false negatives are minimized by reducing cross reactivity with other nucleic acids in assay for analyzing presence of mutation or polymorphism at the loci of each nucleic acid and determining presence of suspected target contained in a biological mixture.
2. An oligonucleotide as claimed in any one of claims 1-3, wherein each molecule is linked to a solid phase support so as to be distinguishable from a mixture of said molecules by hybridization to its complement.
3. An oligonucleotide as claimed in claim 4, wherein each molecule is linked to a defined location on a solid phase support, the defined location for each said molecule being different than the defined location for different other said molecules.
4. An oligonucleotide as claimed in claim 4 or 5, wherein each said solid phase support is a microparticle and each said molecule is covalently linked to a different microparticle than each other different said molecule.
5. An oligonucleotide as claimed in claim 6, wherein each microparticle is spectrophotometrically unique from each other microparticle having a different oligonucleotide attached thereto.
6. An oligonucleotide as claimed in any one of claims 1 to 7, wherein each said molecule comprises a tag complement.



Patent Number 240038
Indian Patent Application Number 01209/DELNP/2003
PG Journal Number 30/04/2010
Publication Date 30-Apr-2010
Grant Date 23-Apr-2010
Date of Filing 31-Jul-2003
Name of Patentee TM BIOSCIENCE CORPORATION
Applicant Address 439 UNIVERSITY AVENUE, SUITE 1100, TORONTO, ONTARIO M5G 1Y8, CANADA.
Inventors:
# Inventor's Name Inventor's Address
1 PANCOSKA PETR. 901 HINMAN AVENUE #2C, EVANSTON, IL 60202, U.S.A.
2 JANOTA VIT OVENECKA 27, 170 00 PRAHA 7, CZECH REPUBLIC.
3 BENIGHT ALBERT S. 1630 VALLEY VIEW DRIVE, SCHAUMBURG, IL 60193, U.S.A.
4 BULLOCK RICHARD S. 3500 NORTH LAKE SHORE DRIVE, CHICAGO, IL 60657, U.S.A.
5 RICCELLI PETER V. 16830 RICHARDS DRIVE, TINLEY PARK, IL 60477, U.S.A.
6 KOBLER DANIEL 33 WOOD STREET, APARTMENT 1102, TORONTO, ONTARIO M4Y 2P8, CANADA.
7 FIELDHOUSE DANIEL CHAPLIN COURT, BOLTON, ONTARIO L7E 5Y1, CANADA.
PCT International Classification Number C12Q 1/68
PCT International Application Number PCT/CA02/00087
PCT International Filing date 2002-01-25
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 60/263,710 2001-01-25 U.S.A.