Title of Invention

A POLYNUCLEOTIDE HAVING THE NUCLEIC ACID SEQUENCE ACCORDING TO SEQ ID NO:1, 2 OR 3 OR A SEQUENCE FROM THE NUCLEOTIDE NUMBERS

Abstract

A polynucleotide having the nucleic acid sequence according to SEQ ID NO:1, 2 or 3 or a sequence from the nucleotide numbers 167 to 1654, 1447 to 4458, 5589 to 8168, 4403 to 4984, 4924 to 5214, 5426 to 5671, 8170 to 8790, 5195 to 5409, 7730 to 7821, 5334 to 5409 or 7730 to 7821, or a continuous sequence on a polynucleotide from the nucleotide numbers 5195 to 5409 and 7730 to 7821, or 5334 to 5409 and 7730 to 7821, each referring to SEQ ID NO:1.

Full Text

The present invention refers to a polynucleotide comprising the nucleic acid sequence as
depicted in SEQ ID NO:1, 2 or 3 or the fragment or derivative thereof, or a polynucleotide
hybridizing with the nucleic acid sequence as depicted in SEQ ID NO:1, 2 or 3. The present
invention further refers to polypeptides encoded by the nucleic acid sequence or the fragment or
derivative thereof as depicted in SEQ ID NO:1, 2 or 3. The polynucleotides and polypeptides
may be used as medicaments, vaccines or diagnostic substances, preferably for the treatment,
prevention or diagnosis of HIV infections.
Regarding the extent of the global distribution of the Human Immunodeficiency Virus (HIV)
pandemic, with an estimated number of more than 40 million infected people worldwide by the
end of this century and more than 90 percent thereof living in developing countries, the
development of an HIV vaccine is considered to be one of the major challenges of modern
industrialized societies. However, so far the development of a successful HIV vaccine is still
limited by the complicated biology of the virus and its complex interaction with the host's
immune system. The few candidate vaccines that have been tested to date in developing
countries in clinical phase 3 trials were mostly based on the HIV type 1 external glycoprotein
gp120 or gp160. However, the outcome of these studies was somewhat disappointing in that the
vaccines not only failed to induce broadly cross neutralizing antibody and T cell responses, but
could not even prevent breakthrough infections that have been reported for some of the
vaccinated individuals. One of the reasons for that is certainly the extensive sequence variation
between the used antigens that were derived from lab adapted virus strains and the genetically
divergent viruses circulating throughout the testing areas such as Thailand.
Phylogenetic analyses of globally circulating HIV strains have identified a major group (M) of 10
different sequence subtypes (A-J) (Kostrikis et al. 1995; Leitner and Albert, 1995; Gaywee et al.
1996; World Health Organisation Network for HIV Isolation and Characterization, 1994)
exhibiting sequence variations in the envelope protein of up to 24%, in addition to group O viruses,
which differ from group M viruses by more than 40% in some reading frames (Loussert Ajaka et
al. 1995; Myers et al. 1996; Sharp et al. 1995; Sharp et al. 1999). HIV evolves by the rapid
accumulation of mutations and intersubtype recombination. Different subtypes cocirculating in
the population of a geographic region represent the molecular basis for the generation and
distribution of interclade mosaic viruses. Although the global HIV-1 variants have been studied
intensively by means of serology and heteroduplex DNA analysis, most phylogenetic studies are
based on envelope sequences, because many of the prevalent subtypes and a variety of
recombinant forms lack fully sequenced genomes.
Non-subtype B viruses cause the vast majority of new HIV-1 infections worldwide. Among
those, clade C HIV-1 strains play a leading role both regarding the total number of infected
people as well as the high incidence of new infections especially in South America and Asia. For
that reason, characterization of clade C viruses is one of the top priorities for diagnostic,
preventive and therapeutic purposes.
With the exception of Thailand, limited information has been available until recently regarding
the distribution and molecular characteristics of HIV-1 strains circulating throughout Asia. WHO
estimates that South and Southeast Asia have the most rapid rate of HIV spread and will soon
become the world's largest HIV epidemic region. China has very similar social and economic
conditions and direct ethnic and economic connections to these regions. Since early 1995, a rapid
increase of HIV infection was clearly seen in many provinces of China. Compared with the
1774 cumulative cases of HIV and AIDS detected from 1985 to 1994, 1421 cases were
detected in 1995 and more than 4000 cases in 1997 alone. The WHO estimated more than
400,000 HIV infections in China by the end of 1997, with an estimated 6400 cumulative deaths and
4000 people dying of AIDS in 1997 alone. In the recent national HIV molecular epidemiology
survey, it was found that the subtype prototype B and B'-subtype Thai strains in Yunnan, a
southwestern province of China bordering the drug triangle of Myanmar, Laos and Thailand
(Graf et al. 1998) were spread to central and eastern China by drug users, contaminated blood
and plasma collection services. The second epidemic was imported to the same area, most
probably by Indian intravenous drug users (IDUs) carrying subtype C strains, in the early 1990s
(Luo et al. 1995; Shao et al. 1999). Within a few years, subtype C viruses spread rapidly in South,
Central and even Northwest China by drug trafficking and caused a widespread epidemic in China.
According to a recent Chinese nationwide HIV molecular epidemiology survey, almost all the
individuals infected with subtype C are IDUs and account for about 40% of HIV infected IDUs in China,
suggesting subtype C virus to be one of the major HIV-1 subtypes prevalent among IDUs in
China (Shao et al. 1998; Shao et al. 1994).
This suggests that the HIV epidemic among IDUs in China extended from a single predominant
subtype (B) within a few years to at least two predominant subtypes, B-Thai and C, increasing
the possibility of intersubtype recombination. According to our present knowledge on the
variability and the antigenicity of different virus strains, diagnostic tools, therapeutic agents and
vaccines should be adapted to local virus strains. However, the number of molecular reagents for
non-subtype B viruses is still extremely limited. Currently, there are only a few non-recombinant
molecular clones and a few mosaic genomes available for viruses other than B or C. Regarding
clade C HIV-1 viruses, only non-recombinant representatives and 4 A/C recombinants have been
published so far, all of them originating from Africa, South America or India (Luo et al. 1995;
Gao et al. 1998; Lole et al. 1999). Furthermore, all of the previous data on subtype C viruses in
China were limited to the genetic subtyping of the env gene (Luo et al. 1995; Yu et al. 1997;
Salminen et al. 1995).
Several clinical trials against HIV infections have been performed with vaccines so far. The
disappointing results that were observed in clinical trials include repeatedly reported
breakthrough infections in the vaccinated people. This outcome has been attributed to major
sequence variations between the administered envelope proteins and the infectious input virus,
which in fact was primarily due to an insufficient characterization of the virus population
circulating in a distinct geographic region. This resulted in the generation of humoral and - to a
lesser extent - cell mediated immune responses towards viral antigens, that were not relevant for
the viruses circulating throughout the population in the test field. Moreover, low affinity binding,
envelope specific antibodies have been reported not only to lack neutralizing capacity but even
to contribute to an enhancement of infection via complement- or Fc-receptors. Furthermore, the
selected antigens and delivery systems turned out to be extremely weak inducers of the cell
mediated immune response.
In view of a lack of precise knowledge on cross-clade protective immune responses and
regarding the complex situation in developing countries, where multiple subtypes of HIV-1 are
known to cocirculate, vaccine preparations should include mixtures of representative antigens.
Thus, there is a need for isolation and characterization of clade C viruses, especially for cloning
the coding region.

The problem of the invention is solved by the subject matter of the claims.
The present invention is further illustrated by the figures.
Figure 1 shows an illustration of the phylogenetic relationship of the env gene C2V3 coding
region from clone 97cn54 with the representatives of the major HIV-1 (group M) subtypes.
cn-con-c represents the env consensus sequence of HIV-1 subtype C strains prevalent in China.
The phylogenetic tree was constructed using the neighbour joining method. Values at the nodes
indicate the percent bootstraps in which the cluster to the right was supported. Bootstraps of 70%
and higher only are shown. Brackets on the right represent the major subtype sequences of HIV-
1 group M.
Figure 2 shows an illustration of the Recombinant Identification Program analysis (RIP, version
1.3) of the complete gagpol coding region of 97cn54 (window size: 200, threshold for statistical
significance: 90%, gap handling: STRIP). Positions of the gag and pol open reading frames are
indicated by arrows on top of the diagram. RIP analysis was based on background alignments
using reference sequences derived from selected virus strains representing the most relevant
HIV-1 subtypes. The standard representatives are marked by different colors as indicated. The x
axis indicates the nucleotide positions along the alignment. The y axis indicates the similarity of
97cn54 to the listed reference subtypes.
Figure 3 shows an illustration of the phylogenetic relationship of different regions within the
97cn54 derived gagpol reading frames with standard representatives of the major HIV-1 (group
M) subtypes. Phylogenetic trees were constructed using the neighbour joining method based on
the following sequence stretches: (A) nucleotides 1-478, (B) 479-620, (C) 621-1290, (D) 1291-1830,
(E) 1831-2220, (F) 2221-2520 and (G) 2521-2971. Given positions refer to the first
nucleotide of the gag open reading frame. Grey areas highlight clustering of the analyzed
sequences either with clade C- (A, C, E, G) or B- (B, D, F) derived reference strains. Values at
the nodes indicate the percent bootstraps in which the cluster to the right was supported.
Bootstraps of 70% and higher only are shown.
Figure 4 shows an illustration of the Recombinant Identification Program analysis (RIP, version
1.3) of different regions of 97cn54 (window size: 200, threshold for statistical significance: 90%,
gap handling: STRIP). Analysis included (A) a sequence stretch of 1500 bp from the start codon
of the vif gene to the 5' end of env, including vif, vpr, first exon of tat and rev, vpu and first 200
bp of the env gene and (B) a fragment of about 700 bp overlapping 300 bp from the 3' end of env
and encompassing the complete nef gene and parts of the 3' LTR. Positions of the start codons of vpr,
tat, vpu, env and nef as well as the 5' end of the 3'-LTR are indicated by arrows on top of the
diagrams, respectively. RIP analysis was based on background alignments using sequences
derived from selected virus strains representing the most relevant HIV-1 subtypes. The indicated
standard representatives are marked by different colors. The x axis indicates the nucleotide
positions along the alignment. The y axis indicates the similarity of 97cn54 to the listed
reference subtypes. (C) and (D) RIP analysis of sequences from two independent clade C isolates
(xj24 and xj158) from China overlapping the vpr and vpu genes including the first exon of tat.
Figure 5 shows a phylogenetic tree analysis. Phylogenetic trees were constructed using the
neighbour joining method based (A) on a 380 bp fragment overlapping the 3' 150 bp of the vpr
gene to the end of the vpu reading frame, (B) on the first 290 bp of the nef coding region and (C)
on the 3' 320 bp of the nef gene. Values at the nodes indicate the percent bootstraps in which the
cluster to the right was supported. Bootstraps of 70% and higher only are shown. Brackets on the
right represent the major subtype sequences of HIV-1 group M.
Figure 6 is an illustration of the schematic representation of the mosaic genome organization of
97cn54.
Figure 7 is an illustration of the comparison between known and experimentally proven
prototype B (HIV-1 LAI) derived CTL epitopes and the corresponding amino acid sequences in the
gag, pol and env polypeptides of the clade C strain 97cn54. Functional domains in GAG (p17
matrix, p24 capsid, p15 nucleocapsid and linker protein), POL (PR protease, RT reverse
transcriptase, IN integrase) and ENV (gp120 external glycoprotein, gp41 transmembrane
protein) are indicated. Numbers underneath the open reading frames indicate amino acid positions
relative to the amino termini of the respective polypeptides. Haplotype restrictions of the
known HIV-1 LAI derived CTL epitopes are indicated at the left and right margins, respectively.
Green bars represent sequence identity between the known epitope and the corresponding clade
C sequence, blue bars indicate 2 or fewer conservative mismatches. Red bars represent clade C
derived sequence stretches with more than 2 conservative mismatches or any nonconservative
substitution as compared to the corresponding LAI derived epitope.

Figure 8 shows the full length coding nucleotide sequence of clade C HIV-1 97cn54 (SEQ ID
NO: 1) with the corresponding amino acids in one letter code. All three reading frames are given.
The asterisks represent stop codons.
Figure 9 shows an illustration of the activities of cytotoxic T cells in BALB/c mouse
spleen cells after intramuscular immunization with the respective DNA plasmids. Lymphoid
cells obtained 3 weeks after a primary immunization from 5 mice per group were
co-cultured with syngeneic P815 mastocytoma cells (irradiated with 20,000 rad) loaded with a gag
polypeptide having the amino acid sequence AMQMLKETI. Controls included spleen cells from
non-immunized mice which were stimulated with peptide loaded P815 cells. Cytotoxic effector
cell populations were harvested after 5 days of culture in vitro. The cytotoxic responses were
read against A20 cells loaded with the above mentioned nonameric peptide or against unloaded
A20 cells in a standard 51Cr release assay. The data shown represent the mean values from
experiments each performed three times. The determined standard deviations were each lower
than 15% of the mean value.
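The RIP analyses described for Figures 2 and 4 compare 97cn54 with reference sequences in a sliding window along an alignment. The following Python sketch is not the RIP program itself; it only illustrates the underlying idea of a windowed percent identity after stripping gap columns. The function name, the gap character and the toy sequences are assumptions made for illustration, and the 90% significance threshold of RIP is not modelled.

```python
def window_similarity(query, reference, window=200, step=1):
    """Percent identity of two aligned sequences in sliding windows.

    Columns in which either sequence carries a gap ('-') are removed
    first, loosely mirroring a STRIP-style gap handling (an assumption).
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    # Keep only alignment columns without gaps.
    pairs = [(q, r) for q, r in zip(query.upper(), reference.upper())
             if q != "-" and r != "-"]
    scores = []
    for start in range(0, len(pairs) - window + 1, step):
        chunk = pairs[start:start + window]
        matches = sum(1 for q, r in chunk if q == r)
        scores.append(100.0 * matches / window)
    return scores


# Toy alignment with a window of 4 positions (real analyses use 200).
print(window_similarity("ACGTACGT", "ACGTACGA", window=4))
```

Plotting one such score series per reference subtype against the window position would give curves of the kind described for the x and y axes of Figures 2 and 4.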
The term "epitope" or "antigenic determinant" as used herein refers to an immunological
determinant group of an antigen which is specifically recognized by an antibody. An epitope
may comprise amino acids in a spatial or discontinuous conformation comprising at least 3,
preferably at least 5 amino acids. An epitope may also comprise a single segment of a
polypeptide chain comprising a continuous amino acid sequence.
The term "polynucleotide" as used herein refers to a single-stranded or double-stranded
heteropolymer of nucleotide units of any length, either of ribonucleotides or
deoxyribonucleotides. The term also includes modified nucleotides.
The term "derivative" as used herein refers to a nucleic acid also coding the one or more
polypeptide(s) which is or are coded by another nucleic acid sequence although its nucleic acid
sequence differs from the other nucleic acid sequence. In this sense the term "derivative" refers
also to equivalents of the other nucleic acid sequence which exist because of the degeneracy of
the genetic code. Thus, the term derivative includes e.g. nucleic acids coding the same
polypeptides as the nucleic acids according to SEQ ID NO: 1, 2 or 3 but having another nucleic
acid sequence. Furthermore, the term includes nucleic acid fragments coding the same
polypeptide as the nucleic acid fragments of the nucleic acid sequence according to SEQ ID NO:
1, 2 or 3.
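The notion of a derivative that arises from the degeneracy of the genetic code can be illustrated with a short Python sketch. It translates two different nucleotide sequences with the standard genetic code and checks that both encode the same polypeptide. The two 27-nucleotide sequences are made up for illustration and are not taken from SEQ ID NO:1; they merely encode the nonamer AMQMLKETI mentioned for Figure 9. All function and variable names are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding DNA (or RNA) sequence into one letter amino acid code."""
    dna = dna.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes_same_polypeptide(seq_a, seq_b):
    """True if two different nucleotide sequences code the same polypeptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


# Hypothetical example sequences, both coding AMQMLKETI.
SEQ_A = "GCTATGCAAATGTTAAAAGAAACTATT"
SEQ_B = "GCCATGCAGATGCTGAAGGAGACCATC"

print(translate(SEQ_A), translate(SEQ_B))
print(encodes_same_polypeptide(SEQ_A, SEQ_B))
```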
The term "polypeptide" as used herein refers to a chain of at least two amino acid residues
connected by peptide linkages. The term comprises, therefore, any amino acid chains, e.g.
oligopeptides and proteins. The term also refers to such amino acid chains wherein one or more
amino acid(s) is(are) modified, e.g. by acetylation, glycosylation or phosphorylation.
The term "continuous sequence" or "fragment" as used herein refers to a linear nucleotide or
amino acid stretch derived from a reference sequence, e.g. the sequences of the present invention
set forth in the sequence listing.
The term "selective hybridization" or "selectively hybridizable" as used herein refers to
hybridization conditions wherein two polynucleotides form duplex nucleotide molecules under
stringent hybridization conditions. Those conditions are known in the state of the art and are set
forth e.g. in Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory (1989), ISBN
0-87969-309-6. Examples for stringent hybridization conditions are: (1) hybridization in 4 x SSC
at 65° C or (2) hybridization in 50% formamide in 4 x SSC at 42° C, both followed by several
washing steps in 0.1 x SSC at 65° C for 1 hour.
The term "viral vector" or "bacterial vector" as used herein refers to genetically modified viruses
or bacteria useful for the introduction of the DNA sequences according to SEQ ID NO: 1, 2 or 3
or derivatives, fragments, sequences thereof coding for epitopes or epitope strings into different
cells, preferably into antigen presenting cells, e.g. dendritic cells. In addition, a bacterial vector
may be suitable to directly express a polypeptide encoded by SEQ ID NO: 1, 2 or 3 or derived
epitopes or epitope strings therefrom.
One aspect of the present invention refers to a nucleotide sequence as depicted in SEQ ID NO:1,
SEQ ID NO:2 or SEQ ID NO:3. In order to gather necessary information on representative and
virtually full-length viral genomes, a molecular epidemiology study was first conducted among
more than one hundred HIV-1 subtype C seropositive intravenous drug users (IDUs) from China.
Genotyping based on the constant region 2 and variable region 3 (C2V3) within the viral
envelope glycoprotein gene revealed highest homology of the most prevalent virus strains
circulating throughout China to subtype C sequences of Indian origin. Based on these results a
virtually full length genome representing the most prevalent class of clade C strains circulating
throughout China was amplified and subcloned from peripheral blood mononuclear cells
(PBMCs) of a selected, HIV infected IDU. Sequence analysis identified a mosaic structure
suggesting extensive intersubtype recombination events between genomes of the prevalent clade
C and (B')-subtype Thai virus strains of that geographic region. RIP (Recombinant Identification
Program) analysis and phylogenetic bootstrapping suggested altogether ten break points (i) in the
gagpol coding region, (ii) in vpr and at the 3' end of the vpu gene as well as (iii) in the nef open
reading frame. Thai (B')-sequences therefore include (i) several insertions in the gagpol coding
region (nucleotides 478-620, 1290-1830, 2221-2520, referring each to the first nucleotide within
the start codon of the gag and gagpol reading frame, respectively), (ii) 3'-vpr, complete vpu, the
first exons of tat and rev (approx. 1000 nucleotides starting from nucleotide 138 referring to the
start codon of the Vpr reading frame) as well as (iii) the 5' half of the nef gene (nucleotides
1-300). The remainder of the sequence comprising 9078 nucleotides (SEQ ID
NO: 1, table 3) shows highest homology to the known subtype C isolates. Breakpoints located in
the vpr/vpu coding region as well as in the nef gene of 97cn54 were found at similar positions in
many subtype C strains isolated from IDUs living in different areas of China, suggesting a
common ancestor for the C/B' recombinant strains. More than 50% of well-defined subtype
B-derived CTL epitopes within Gag and Pol and 10% of the known epitopes in Env were found to
exactly match sequences within this clade C/B' chimeric reference strain. These results may
substantially facilitate vaccine-related efforts in China by providing highly relevant templates for
vaccine design and developing reagents for the most appropriate immunological/virological
readouts.
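The mosaic organization of the gagpol coding region can be written down as a simple lookup table. The following Python sketch uses the segment boundaries (A) to (G) and the clade assignments given in the description of Figure 3, with positions counted from the first nucleotide of the gag open reading frame. The data structure and function names are assumptions chosen for illustration only.

```python
# (segment label, first nucleotide, last nucleotide, clustering clade),
# taken from the description of Figure 3; "B" stands for the B'/Thai-like parts.
GAGPOL_SEGMENTS = [
    ("A", 1, 478, "C"),
    ("B", 479, 620, "B"),
    ("C", 621, 1290, "C"),
    ("D", 1291, 1830, "B"),
    ("E", 1831, 2220, "C"),
    ("F", 2221, 2520, "B"),
    ("G", 2521, 2971, "C"),
]


def subtype_at(position):
    """Return (segment label, clade) for a 1-based gagpol nucleotide position."""
    for label, first, last, clade in GAGPOL_SEGMENTS:
        if first <= position <= last:
            return label, clade
    raise ValueError("position outside the analyzed gagpol region (1-2971)")


def breakpoints():
    """Positions at which the clade assignment changes between adjacent segments."""
    return [nxt[1] for cur, nxt in zip(GAGPOL_SEGMENTS, GAGPOL_SEGMENTS[1:])
            if cur[3] != nxt[3]]


print(subtype_at(500))   # a position inside segment (B)
print(breakpoints())     # first nucleotide of each segment after a clade switch
```

In this representation the six clade switches within gagpol fall directly out of the table.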
The use of the described HIV-1 sequence of the present invention representing the most
prevalent C type virus strain of China as a basis and source is of advantage for the development
of preventive or therapeutic vaccines. Necessary prerequisites for the development of a
successful HIV candidate vaccine are (i) a detailed knowledge of the respective epidemiological
situation and (ii) the availability of a cloned coding sequence representing the most prevalent
virus strain within a geographic region or distinct population. Such sequences represent the basis
(i) for the rational design of preventively and therapeutically applicable HIV candidate vaccines, (ii)
for the development of specific therapeutic medicaments e.g. therapeutically effective decoy
oligonucleotides and proteins, antisense constructs, ribozymes and transdominant negative
mutants, (iii) for the development of lentiviral vectors for gene therapy and (iv) for the
production of reagents which may be utilized for diagnostics and monitoring of HIV infections
and for immunological/viral monitoring of the vaccination process.
This is especially true for candidate vaccines that are based on the HIV envelope proteins, which
were shown to be most variable among all HIV proteins. Besides that, a successful vaccine will
most probably have to induce both arms of the immune system: neutralizing antibodies directed
ideally to conformational epitopes in the envelope protein as well as cell mediated immune
responses (CD4 positive T-helper cells, CD8 positive cytolytic T-cells, Th1 type cytokines,
β-chemokines) generated against epitopes of different viral proteins. The conformational epitope
according to the present invention consists of at least 3 amino acids involved in the antibody
binding and preferably of 5 or more amino acids. Conformational epitopes may also consist of
several segments either of a single protein or - in case of oligomeric complexes e.g. of the trimeric
glycoprotein envelope complex - of several segments of different subunits. A linear epitope
according to the present invention normally varies in length comprising from at least 8 amino
acids to about 15 amino acids or longer, preferably comprising 9 to 11 amino acids, in particular
in case of MHC class I restricted CTL epitopes.
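To make the preferred length range concrete, the following Python sketch enumerates all continuous stretches of 9 to 11 amino acids contained in a polypeptide, i.e. the candidate linear epitopes in the sense described above. The input sequence is a made-up 12-residue string (the nonamer AMQMLKETI from Figure 9 extended by three arbitrary residues) and the function name is an assumption.

```python
def candidate_peptides(protein, min_len=9, max_len=11):
    """Yield (1-based start position, peptide) for every continuous stretch
    of min_len to max_len amino acids within the given polypeptide."""
    for length in range(min_len, max_len + 1):
        for start in range(0, len(protein) - length + 1):
            yield start + 1, protein[start:start + length]


# Hypothetical 12-residue polypeptide, for illustration only.
example = "AMQMLKETIGGS"
for position, peptide in candidate_peptides(example):
    print(position, peptide)
```

A polypeptide of n residues yields n - k + 1 stretches of length k, so the 12-residue example gives 4 nonamers, 3 decamers and 2 undecamers.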
Thus, the present invention further relates to polypeptides encoded by the nucleic acid sequence
or fragment or derivative of the nucleic acid sequence according to SEQ ID NO:1, 2 or 3. The
present invention further relates to polypeptides comprising a continuous sequence of at least 8
amino acids encoded by the nucleic acid sequence or fragments or derivatives of the nucleic acid
sequence according to SEQ ID NO:1, 2 or 3. Preferably the polypeptide of the present invention
comprises an antigenic determinant naturally causing an immune reaction in infected subjects.
More preferred are polypeptides comprising an amino acid sequence encoded by the nucleic
acid sequence according to SEQ ID NO:2 or 3 or the fragment or derivative thereof. Most
preferred are epitopes comprising a continuous region of 9 to 11 amino acids which are identical
in the polypeptides encoded by SEQ ID NO:1 and in an HIV-1 LAI reference isolate, or which
contain 2 or fewer conservative amino acid substitutions within the sequence comprising 9 to 11 amino
acids. Examples for such epitopes are given in example 11. The polypeptides of the present
invention may be used e.g. as vaccines and therapeutic substances or for diagnostics.
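The comparison of a clade C sequence stretch with a known LAI derived epitope, as colour coded in Figure 7 (identity, 2 or fewer conservative mismatches, otherwise), can be sketched in Python as follows. The text does not define which substitutions count as conservative, so the grouping of amino acids below is purely an assumption, as are the function names and the variant peptides used in the usage lines.

```python
# Assumed grouping: substitutions within one group are treated as conservative.
# The source does not specify this; other groupings are equally possible.
CONSERVATIVE_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("ST"),     # hydroxyl
    set("NQ"),     # amide
    set("DE"),     # acidic
    set("KRH"),    # basic
]


def is_conservative(a, b):
    """True if residues a and b fall into the same assumed group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_epitope(reference, candidate):
    """Return 'green' (identical), 'blue' (2 or fewer conservative mismatches)
    or 'red' (more than 2 conservative or any nonconservative mismatch)."""
    if len(reference) != len(candidate):
        raise ValueError("epitope sequences must have the same length")
    mismatches = [(r, c) for r, c in zip(reference, candidate) if r != c]
    if not mismatches:
        return "green"
    if len(mismatches) <= 2 and all(is_conservative(r, c) for r, c in mismatches):
        return "blue"
    return "red"


# The reference nonamer is the gag peptide of Figure 9; the variants are hypothetical.
print(classify_epitope("AMQMLKETI", "AMQMLKETI"))
print(classify_epitope("AMQMLKETI", "AMQMLKDTI"))
print(classify_epitope("AMQMLKETI", "AMQMLKGTI"))
```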
A further aspect of the present invention relates to a polynucleotide according to SEQ ID NO:1,
2 or 3. The present invention further relates to a polynucleotide fragment of the nucleotide
sequence according to SEQ ID NO:1, 2 or 3 or to a polynucleotide comprising at least one
continuous sequence of nucleotides capable of selectively hybridizing to the nucleotide sequence
as depicted in SEQ ID NO:1, 2 or 3. Further, the present invention relates to derivatives of the
polynucleotides or polynucleotide fragments of the present invention. Preferably the
polynucleotide or the polynucleotide fragment comprises a continuous sequence of at least 9
nucleotides, preferably at least 15 nucleotides, more preferably at least 27 nucleotides, or longer.
The polynucleotide or the polynucleotide fragment may also comprise the coding region of
individual HIV genes, e.g. gag, pol, env. Examples are set forth in SEQ ID NO:2 and SEQ ID NO:3.
Another aspect of the present invention relates to a polynucleotide comprising at least two
polynucleotide fragments of the present invention wherein the sequences of the polynucleotide
fragments can overlap or can be separated by a nucleotide sequence spacer. The sequences of the
polynucleotide fragments may be identical or different. The polynucleotides or polynucleotide
fragments of the present invention can be used as vaccines or therapeutic substances or for
diagnostics.
The cloned clade C HIV-1 97cn54 coding sequence and derivatives thereof according to SEQ ID
NO: 1 can be used as the basis for the following applications:
Development of clade-C specific HIV-1 vaccines for therapeutic and preventive purposes. These
clade-specific vaccines can be used worldwide in all geographic regions, where clade C virus
strains play a major role in the HIV epidemic, e.g. in Latin America, in Africa as well as
in Asia. More specifically, HIV vaccines to be tested in and developed for Southeast Asia and
China should be based on the described 97cn54 coding sequence in order to induce subtype
specific humoral and cell mediated immune responses. Furthermore, such clade-C specific
HIV-1 vaccines can be used as a component in a cocktail vaccine considering either all or a defined
selection of the relevant worldwide HIV subtypes.
The antigens or coding sequences to be delivered to the immune system include (i) short
continuous stretches from at least 3 to about 5 amino acids or longer stretches derived from one
of the open reading frames depicted in table 3, (ii) stretches of preferably 9 to 11 amino acids,
(iii) combinations of these stretches delivered either separately or as polypeptide strings (epitope
strings) wherein the epitope strings and their amino acid sequences, respectively, either overlap
or may be separated by amino acid or other spacers, and most preferably complete proteins or the
corresponding coding sequences or variations thereof that may also include extended deletions in
order to induce proper humoral and cell mediated immune responses in the vaccinees. Therefore,
another object of the present invention relates to polypeptides which are encoded by the
nucleotide sequence or fragments thereof depicted in SEQ ID NO:1, SEQ ID NO:2 and SEQ ID
NO:3. Preferably the polypeptide comprises a continuous sequence of at least 8 amino acids,
preferably at least 9 to 11 amino acids, more preferably at least 15 amino acids or longer
sequences or discontinuous epitopes preferably composed of at least three amino acids of a
single polypeptide chain or in case of oligomeric protein complexes of different polypeptide
chains. Vaccine constructs on the basis of the 97cn54 coding sequence include all antigenic
forms known in the state of the art and include all known delivery systems.
Short epitopes encoded by fragments of the nucleic acid sequences according to SEQ ID NO: 1 to
3 and comprising in each case 3 to 5 amino acids, preferably 9 to 11 or more amino acids, can
preferably be produced synthetically. Such peptides consist of either a B cell epitope, an MHC class
II restricted T helper epitope, an MHC class I restricted cytotoxic T cell epitope or a combination
of the mentioned variants. Individual epitopes may overlap or be separated by a spacer,
preferentially consisting of glycine and/or serine moieties. Branched peptides may, according to
the state of the art, be generated either during the synthesis or by means of the known and
commercially available homo- and heterobifunctional chemical crosslinkers after the synthesis
and purification of the respective peptides. Alternatively, peptides that are per se poorly
immunogenic may be conjugated to selected carrier proteins e.g. ovalbumin by crosslinking, inserted
genetically into carrier proteins or fused to their N and C termini, respectively. Preferably, such
carrier proteins are able to form particulate structures in which B cell epitopes preferably lie on the
surface of the particulate carrier (i) during expression in suitable cell culture systems (see below) or (ii)
after suitable refolding of the purified denatured protein. Numerous examples of polypeptides
tending to form particulate structures are now known, e.g. the Hepatitis B
virus (HBV) core antigen (HBcAg), the HBV surface protein (HBsAg), the HIV group specific
antigen, the polyomavirus VP1 protein, the papillomavirus L1 protein or the yeast TyA protein.
Due to the fact that the majority of the particle forming proteins described so far are derived
from the capsid or structural proteins of different viruses they are also named virus like particles;
see special edition Vaccine (1999), Vol. 18, Advances in Protein and Nucleic Acid Vaccine
Strategies, edited by Prof. P.T.P. Kauyama.
Epitope strings and polypeptides encoded by fragments of the nucleic acid sequences according
to SEQ ID NO:1 to 3 having a length of more than 30, preferably more than 50 amino acids, and
polypeptides having a tendency to form particulate structures (VLP) can be produced and purified
in prokaryotes by means known in the state of the art. The plasmids accordingly include a
bacterial origin of replication such as ColE1, generally a selection marker such as a resistance
against kanamycin or ampicillin, a constitutively active or inducible transcription control unit such
as the LacZ or Tac promoter, and translation start and stop signals. For a simplified expression
and affinity purification, optionally cleavable fusion partners and purification tags such as
glutathione-S-transferase or oligohistidine tags may be used.
The DNA or RNA sequences used (i) for the production of said epitope strings, complete
proteins or virus like structures in eukaryotic cell cultures such as yeast cells, fungi, insect cells
or mammalian cells or (ii) for the direct delivery of DNA for immunization purposes may rely on
a codon usage that is utilized by the virus itself. Alternatively, wherever technically feasible the
codon usage may be adapted to the most or second most frequently used codons in genes
being highly expressed in the respective production system. Examples for the optimization of
the codon usage in a polygene optimized for safety aspects including the genes Gag, Pol and
Nef as well as in the envelope gene are set forth in SEQ ID NO:2 and 3. SEQ ID NO:2 and
3 are specified in more detail in example 15.
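The adaptation of the codon usage described above can be sketched in Python as a codon by codon replacement that leaves the encoded polypeptide unchanged. This is a minimal illustration and not the procedure used to derive SEQ ID NO:2 and 3. The table of preferred codons is a hypothetical placeholder covering only the amino acids of the example; a real table would be compiled from highly expressed genes of the chosen production system. The input sequence is made up and encodes the nonamer AMQMLKETI.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical preferred codons (placeholder values, not measured frequencies).
PREFERRED_CODON = {"A": "GCC", "M": "ATG", "Q": "CAG", "L": "CTG",
                   "K": "AAG", "E": "GAG", "T": "ACC", "I": "ATC"}


def translate(dna):
    """Translate a coding DNA sequence into one letter amino acid code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def adapt_codon_usage(dna, preferred=PREFERRED_CODON):
    """Replace each codon by the preferred synonymous codon, if one is listed."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    adapted = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        replacement = preferred.get(amino_acid, codon)
        # Guard against a faulty table that would change the polypeptide.
        if CODON_TABLE[replacement] != amino_acid:
            raise ValueError("preferred codon does not encode " + amino_acid)
        adapted.append(replacement)
    return "".join(adapted)


original = "GCTATGCAAATGTTAAAAGAAACTATT"  # made-up sequence coding AMQMLKETI
optimized = adapt_codon_usage(original)
print(optimized, translate(original) == translate(optimized))
```

Codons whose amino acid is not listed in the table are kept unchanged, so a partial table never alters the encoded sequence.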
The establishment of cell lines to produce epitope strings, polypeptides or virus like particles in the
mentioned cell culture systems may be based on vectors according to the state of the art. Said
vectors again may include a bacterial origin of replication, a positive or negative selection
marker and primarily the respective control regions for the normal transcription and translation
of the foreign protein. The components of the DNA vaccine constructs described below
also exemplify those modules which are found in vectors to express epitope strings,
polypeptides or complete proteins in different mammalian cell cultures.
The simplest form of immunization is the direct application of a pure DNA vaccine. Said
vaccine essentially includes (i) 5' of the coding region a transcription control region, also called
promoter/enhancer region, optionally followed by a functional intron to enhance the gene
expression, (ii) a Kozak consensus sequence including a translation start codon as well as (iii) a
translation termination codon followed by a polyadenylation signal at the 3' end of the foreign
gene. Preferentially, the promoter/enhancer region may support the constitutive expression of the
desired gene product and is derived e.g. from the transcription control region of a
cytomegalovirus immediate early gene (CMV-IE) or the Rous sarcoma virus long terminal
repeat (RSV-LTR). Alternatively, an inducible form of a transcription control region may be
used such as a Tet on let off promotor regulating the transcription e.g. by the application of

tetracycline or respective analoga. Furthermore, the use of cell type specific regulated
transcription control regions is advantageous e.g. the upstream of the muscle creatin kinase gene
(MCK gene; muscle specific expression) or of the CD4 receptor gene or the MHC class II gene
(preferential expression in antigen presenting cells) positioned promotor/enhancer regions In
some cases also chimeric combinations from (i) cell type specific promotors and (ii) viral
enhancer regions are used to combine the advantages of a tissue specific expression with those of
a strong transcription activity of viral enhancers. The enhancement of the gene expression by
integration of a functional intron positioned normally 5' of an open reading frame is due to an
enhanced export rate from the nucleus of spliced transcripts in comparison to unspliced and is
obtained by the insertion of an intron positioned in the B-globin gene.
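The order of the modules named above (promoter/enhancer, optional intron, Kozak sequence with start codon, coding region with stop codon, polyadenylation signal) can be made explicit in a short Python sketch. The class name, the field names and the lower-case "n" placeholder strings are hypothetical; no real promoter, intron or polyadenylation sequence is given here, and the Kozak context GCCACC is the commonly cited consensus immediately preceding the ATG, used as an assumption.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Cassette:
    """Hypothetical container for the modules of a DNA vaccine expression cassette."""
    promoter: str          # placeholder for e.g. a CMV-IE promoter/enhancer sequence
    orf: str               # coding region, start codon to stop codon
    polya: str             # placeholder for a polyadenylation signal
    intron: str = ""       # optional functional intron 5' of the reading frame
    kozak: str = "GCCACC"  # assumed Kozak context placed directly before ATG

    def assemble(self):
        """Check the reading frame and join the modules in 5' to 3' order."""
        orf = self.orf.upper()
        if not orf.startswith("ATG"):
            raise ValueError("coding region must begin with the start codon ATG")
        if len(orf) % 3 != 0 or orf[-3:] not in STOP_CODONS:
            raise ValueError("coding region must be in frame and end with a stop codon")
        return self.promoter + self.intron + self.kozak + orf + self.polya


# Usage with placeholder modules ("n" stands for sequence not specified here).
construct = Cassette(promoter="nnnn", orf="ATGGCCTGA", polya="nnnn").assemble()
print(construct)
```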
A preferred DNA vaccine based on SEQ ID NO:1, 2 or 3 additionally includes a replicon derived
from alphaviruses such as Semliki Forest virus (SFV) or Venezuelan equine encephalitis virus (VEE).
Here, the aforementioned nuclear transcription control region and the optional intron
are followed first by the coding region for the VEE or SFV non-structural proteins (NS). Only 3'
thereof follows the actual foreign gene, whose cytoplasmic transcription is regulated by an
NS-sensitive promoter. Correspondingly, a long transcript spanning several open reading frames is
generated starting from the nuclear transcription control unit and is then translocated into the
cytoplasm. The NS proteins synthesized there then activate the cytoplasmic transcription of the
foreign genes by binding to the respective control region. This amplification effect normally leads
to abundant RNA synthesis and hence to a high synthesis rate of the foreign protein. The latter
normally allows a significant reduction of the amount of plasmid to be administered, with at least
comparable immunogenicity, in direct comparison with conventional plasmids which lack the described
effect of cytoplasmic RNA amplification.
The peptides, proteins, virus like particles and DNA constructs described above can be
administered by intramuscular, subcutaneous, intradermal or intravenous injection, whereby the
respective prior art is used for the administration of the proteinaceous antigens. Either
conventional syringes with injection needles or needle-free means, which normally introduce the
DNA by air pressure directly into the desired tissue, may be used for the DNA immunization.
This comprises in particular also the intranasal and oral application of DNA containing vaccine
formulations by spray-type means. Alternatively, the DNA can also be conjugated to solid
supports such as gold beads and be administered via air pressure into the respective tissues.

To enhance or modulate the immune response, the mentioned proteinaceous antigens and DNA
constructs may be administered in combination or sequentially with so called
adjuvants, which are normally stimulators of the immune response. Conventional adjuvants such
as aluminium hydroxide or aluminium hydroxyphosphate result in a stimulation of the humoral
immune response showing a high antibody titer of the IgG1 subtype. More modern adjuvants
such as CpG oligonucleotides (consensus core sequence: purine-purine-CpG-pyrimidine-pyrimidine)
or chemically modified derivatives thereof (phosphorothioate oligonucleotides,
oligonucleotides with a peptide backbone) usually enhance the cellular arm of the immune
response and primarily support the cell mediated immunity of the Th1 type, which is
characterized by a high antibody titer of the IgG2a subtype and the induction of Th1 cytokines
such as γIFN, IL-2 and IL-12.
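The consensus core sequence given above (purine-purine-CpG-pyrimidine-pyrimidine) corresponds to a simple sequence pattern. The following Python sketch locates such cores in an oligonucleotide; the function name and the test sequence are chosen for illustration only, and the sketch makes no statement about the immunological activity of any hit.

```python
import re

# Purine = A or G, pyrimidine = C or T; the lookahead also reports overlapping hits.
CPG_CORE = re.compile(r"(?=([AG][AG]CG[CT][CT]))")


def find_cpg_cores(seq):
    """Return (0-based position, hexamer) for every purine-purine-CG-pyrimidine-pyrimidine core."""
    return [(m.start(), m.group(1)) for m in CPG_CORE.finditer(seq.upper())]


# Usage on an arbitrary example oligonucleotide.
print(find_cpg_cores("TCCATGACGTTCCTGACGTT"))
```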
The administration and uptake of peptides, proteins and DNA vaccine constructs can be
improved in particular by binding to or incorporation into higher molecular structures such as
biodegradable particles, multilamellar, preferably cationic liposomes, immune stimulating
complexes (ISCOMS), virosomes or in vitro assembled virus particles. Said biodegradable
particles are e.g. PLA (poly-L-lactic acid), PGA (polyglycolic acid) or PLGA
[poly(D,L-lactide-co-glycolide)] microspheres or derivatives thereof, cationic microparticles or
carrier substances derived from bacterial polysaccharide capsules. The collective term ISCOMS
designates immune stimulating complexes which are derived from water soluble extracts from the
bark of Quillaja saponaria and are purified by chromatographic methods. A detailed summary of the
prior art on the various adjuvants and administration means is given in
http://www.niaid.gov/aidsvaccine/pdf/compendium.pdf [Vogel, F. R., Powell, M. F. and Alving,
C. R. "A Compendium of Vaccine Adjuvants and Excipients (2nd Edition)"].
Furthermore, viral and alternatively bacterial vectors may be used for a suitable presentation of
epitope strings, polypeptides and virus like particles.
According to the current state of the art, e.g. genetically modified salmonellae and listeriae are
particularly suited, due to their natural cell tropism, to introduce DNA vaccine constructs into
antigen presenting cells like monocytes, macrophages and primarily dendritic cells. Besides
the benefit of cell type specificity, the genetic modifications can help to ensure that the
DNA enters the cytoplasm of the antigen presenting cell without damage. In this case a DNA
vaccine construct enters the cell nucleus, where the respective reading frame is transcribed via a
eukaryotic, preferably viral or cell type specific promoter using the cellular resources and
proteins. After the transport of the RNA into the cytoplasm, the respective gene product is
translated and, according to the respective conditions, modified posttranslationally and
assigned to the respective cellular compartment.
Bacterial vectors (salmonellae, listeriae, yersiniae etc.) may also be used for the induction of
mucosal immunity, preferably after oral administration. In this case the respective antigens are
produced by the bacterial transcription and translation machinery and are therefore not subject to
the posttranslational modifications usually present in mammalian cells (no corresponding
glycosylation; no secretory pathway).
In addition, a plurality of attenuated viral vectors now exists which are helpful in expressing the
desired antigens successfully and in high yields. Besides their capability for mere antigen
production, such viral vectors can be used directly for immunization. Said production may take
place either ex vivo, e.g. by infection of antigen presenting cells which are subsequently
administered to the vaccinee, or directly in vivo by subcutaneous, intradermal, intracutaneous,
intramuscular or intranasal immunization with the recombinant virus, resulting in a beneficial
antigen presentation with the respective immunization success. Thus, for example, adequate
humoral and cell mediated immune responses may be induced in the vaccinated subjects by
immunization with recombinant vaccine viruses such as Modified Vaccinia Ankara virus (MVA),
attenuated by passage through chicken cells, the genetically attenuated vaccinia strain New York
(NYVAC) or the avian pox viruses endemic in birds (fowlpox, canarypox). Alternatively,
several other viruses are also suitable, e.g. recombinant alphaviruses such as the Semliki Forest
virus or the Venezuelan equine encephalitis virus, recombinant adenoviruses, recombinant Herpes
simplex viruses, influenza viruses and others.
Finally, attenuated HIV viruses may also be generated based on SEQ ID NO:1, 2 or 3 and used
for immunization purposes, if the regulatory sequences (LTR, long terminal repeat) flanking the
coding part are added by cloning methods according to the prior art. A sufficient
attenuation can subsequently be obtained by one or more deletions, e.g. in the nef gene, according
to the prior art.
The nucleic acid sequences depicted in SEQ ID NO:1 and 3 as well as the peptides,
proteins and virus like particles derived therefrom may also be used as components of viral vectors
for gene transfer.
The polypeptides encoded by the GagPol gene (SEQ ID NO:1, nucleotides 177-4458; Table 3) can
e.g. provide the packaging and receptor functions of e.g. lentiviral or retroviral vectors. Virus
particles which are also able to transduce resting, postmitotic or terminally
differentiated cells may be generated e.g. after transient transfection of mammalian cells with
suitable plasmid vectors which support the simultaneous expression of the GagPol and VSV-G
(vesicular stomatitis virus envelope protein) genes and ensure the packaging of the therapeutic
transgene. Said method of generating transduction competent virus particles can be significantly
facilitated and made more efficient e.g. by establishing stable cell lines, e.g. based on human
embryonic kidney cells (HEK293), which express the GagPol polyprotein constitutively or under
the control of an inducible promoter. Alternatively, recombinant adenoviruses may be generated
which encode the packaging functions, the receptor functions and the transgene function or
combinations thereof and, thus, serve as a tool for the ex vivo, in situ and in vivo delivery of
retroviral or lentiviral vectors.
The envelope proteins or derivatives thereof encoded by SEQ ID NO:3 can provide the receptor
function for lentiviral, spumaviral or retroviral vectors or other vectors based on enveloped viruses
by incorporation into the lipid bilayer. For this purpose e.g. packaging cell lines may be generated
which express the GagPol proteins from retroviruses, spumaviruses and preferably
lentiviruses as well as the envelope proteins derived from SEQ ID NO:1 and 3, either constitutively
or under the control of an inducible or, alternatively, a regulable promoter. Alternatively, chimeric
viruses based on the genome of type C or D retroviruses or other enveloped viruses such
as influenza virus or herpes virus may be generated which carry on their surface, in addition to or
instead of the naturally occurring envelope protein, an envelope protein derived from SEQ ID
NO:1 or 3.
Against the peptides, proteins or virus like particles derived from SEQ ID NO:1 to 3, (i)
polyclonal antisera, (ii) monoclonal antibodies (murine, human, camel), (iii) antibody derivatives
such as single-chain antibodies, humanized antibodies, bi-specific antibodies, antibody phage
libraries or (iv) other high affinity binding polypeptides such as derivatives of hPSTI (human
pancreatic secretory trypsin inhibitor) may be generated. Said reagents may be used for
therapeutic purposes, e.g. for the treatment of HIV infections, or for diagnostic purposes, e.g. for
the production of test kits.

Similarly, analogous peptides, proteins or nucleic acid sequences derived from SEQ ID NO:1, 2
or 3 may be used for diagnostic purposes, e.g. for serodiagnosis, by applying nucleic acid
hybridization technologies, by employing nucleic acid amplification systems or by combinations
thereof. Preferably, the polynucleotide fragments of the nucleotide sequence according to SEQ
ID NO:1 according to the invention may be used in a polymerase chain reaction. Particularly
preferred is the use of the polynucleotide fragments of the nucleotide sequence according to SEQ
ID NO:1 according to the invention for diagnostics by means of DNA chip technology.
The invention is illustrated but not limited by the following examples.
Examples
Example 1. Blood samples. All the blood samples used in this study were collected from HIV-1
subtype C seropositive injection drug users (IDUs) in the national molecular epidemiology
survey during 1996-1997 in several HIV epidemic areas in China. Peripheral blood mononuclear
cells (PBMCs) were separated by Ficoll gradients. Viruses were isolated by cocultivating the
PBMCs from seropositive IDUs with phytohemagglutinin (PHA) stimulated donor PBMCs.
Positive virus cultures were detected in cell culture supernatants using the HIV-1 p24 Core Profile
ELISA kit (DuPont Inc., Boston, MA).
Example 2. Polymerase chain reactions and DNA sequencing. Proviral DNA was extracted from
productively infected PBMCs of more than one hundred preselected HIV-1 positive IDUs from
the Northwestern provinces of China (Qiagen Inc., Valencia, CA). Nested PCR was used to
amplify the envelope C2V3 coding region. PCR products were directly sequenced by Taq cycle
sequencing using fluorescent dye-labeled terminators (Applied Biosystems, 373A, Foster City,
CA) as previously described (Bai et al. 1997; Yu et al. 1997). Multiple sequence alignments
were performed using the Wisconsin software package of the Genetics Computer Group with
the correction methods of Kimura (GCG, 1997, version 9).
Example 3. Phylogenetic tree analysis of all obtained sequences was performed using the
PHYLIP software package. Evolutionary distances were calculated by the maximum parsimony
method and are indicated by cumulative horizontal branch length. The statistical robustness of the
neighbour joining tree was tested by bootstrap resampling as described (Graf et al. 1998).

Example 4: Selection of a representative C-clade HIV-1 isolate from Chinese IDUs. The
calculated average intra-group distances within the C2V3 coding region were as low as
2.26±1.43 at the DNA level, indicating that the epidemic in this area is still very young. Inter-group
differences between the Chinese clade C sequences and those of Indian, African and South
American origin were 9.67±2.31 (India), 15.02±4.13 (Africa) and 8.78±3.41 (South America),
respectively. This demonstrates a close phylogenetic relationship between Indian and Chinese
clade C sequences (Lole et al. 1999) and a substantial genetic distance to the per se relatively
heterogeneous group of African clade C HIV-1 strains.
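How intra-group and inter-group distances of this kind can be obtained from aligned sequences is sketched below in Python. The sketch assumes Kimura's two-parameter model, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], with P and Q the proportions of transitions and transversions; the text above only refers to "correction methods of Kimura", so the exact model, the function names and the toy sequences are assumptions, and the sketch does not reproduce the reported values.

```python
from itertools import combinations, product
from math import log, sqrt
from statistics import mean, stdev

PURINES = {"A", "G"}


def k2p_distance(a, b):
    """Kimura two-parameter distance of two aligned sequences (gap/ambiguous columns skipped)."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    n = len(pairs)
    # Transition: both purines or both pyrimidines; transversion: one of each.
    ts = sum(1 for x, y in pairs if x != y and (x in PURINES) == (y in PURINES))
    tv = sum(1 for x, y in pairs if x != y and (x in PURINES) != (y in PURINES))
    p, q = ts / n, tv / n
    # Note: log() raises ValueError for very divergent pairs (argument <= 0).
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


def intra_group(seqs):
    """Mean and standard deviation of all pairwise distances within one group."""
    d = [k2p_distance(a, b) for a, b in combinations(seqs, 2)]
    return mean(d), stdev(d)


def inter_group(group1, group2):
    """Mean and standard deviation of all pairwise distances between two groups."""
    d = [k2p_distance(a, b) for a, b in product(group1, group2)]
    return mean(d), stdev(d)


# Usage with toy sequences (not patient data).
print(intra_group(["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]))
```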
Example 5: Identification of a virus isolate best representing the prevalent clade C virus strain
circulating throughout China. From the analyzed specimens, a representative isolate referred to
as 97cn54 was identified which exhibited the highest homology (99.6%) to a calculated consensus
sequence (cn-con-V3), which was established on the basis of the characterized local HIV
sequences (Table 1). Multiple amino acid sequence alignments including V3-loop sequences of
primary C-clade representatives selected from different epidemic regions as well as consensus
sequences of other clades (A-H, O, CPZ) underlined the subtype C character of the selected
primary isolate 97cn54 (Table 1). Compared with an overall V3 consensus sequence (consensus),
97cn54 as well as cn-con-c show amino acid alterations at positions 13 (H→R) and 19 (A→T),
both of which are characteristic of subtype C isolates (C consensus).



Table 1. V3 amino acid alignment of consensus sequences from different HIV-1 clades (A-O)
and selected subtype C isolates from different countries. The overall V3 consensus sequence was
constructed by aligning consensus sequences from different clades (A-O). cn-con-V3 represents
the consensus sequence of HIV-1 subtype C strains prevalent in China. 97cn54 has been selected
as the standard representative isolate of the most prevalent clade C HIV-1 strains circulating
throughout China. "-" indicates no exchange relative to the V3 consensus sequence, lower case
letters indicate an amino acid substitution and "." indicates a gap. All consensus and isolate
sequences for multiple alignments were obtained from the Los Alamos database.
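The notation used in Table 1 can be generated mechanically from an alignment. The following Python sketch renders one aligned sequence against a consensus using "-" for identity, a lower case letter for a substitution and "." for a gap. The function name is hypothetical, the input is assumed to mark gaps with ".", and the sequences in the usage lines are made-up toy strings, not the V3 sequences of the table.

```python
def render_against_consensus(consensus, seq):
    """Render an aligned sequence in Table 1 notation relative to a consensus."""
    if len(consensus) != len(seq):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for c, s in zip(consensus, seq):
        if s == ".":                    # gap in the aligned sequence
            out.append(".")
        elif s.upper() == c.upper():    # identical to the consensus
            out.append("-")
        else:                           # substitution, shown in lower case
            out.append(s.lower())
    return "".join(out)


# Usage with toy strings.
print(render_against_consensus("CTRPNHIGA", "CTRPNRIGT"))
print(render_against_consensus("CTRPNHIGA", "CTR.NRIGT"))
```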
Example 6: The 97cn54 envelope protein coding sequence is most closely related to Indian
clade C virus strains. Phylogenetic tree analysis, initially based on the C2V3 sequences of the
envelope gene, revealed that both 97cn54 and the consensus sequence of Chinese clade C
isolates cluster to the subtype C strains from India (ind8, d1024, c-93in905, c-93in999,
c-93in11246), Africa (c-eth2220, c-ug286a2) and South America (92br025, nof, cam20 and
sm145). This suggests that the Indian C-clade virus strains might be the source of the HIV-1
subtype C epidemic in China (Fig. 1). This hypothesis is also in agreement with our earlier
epidemiological finding that HIV-1 subtype C infected individuals in Yunnan
shared needles with Indian jewellery traders in the border area (Shao et al.
1999).
Example 7. Cloning of the virtually full-length HIV-1 genome. Virtually full-length HIV-1
genomes were amplified using the Expand Long Template PCR system (Boehringer-Mannheim,
Mannheim, Germany) as described (Graf et al. 1998; Salminen et al. 1995). Primers were
positioned in conserved regions within the HIV-1 long terminal repeats (LTR): TBS-A1
(5'-ATC TCT AGC AGT GGC GGC CGA A) and NP-6 (5'-GCA CTC AAG GCA AGC TTT ATT
G). Purified PCR fragments were blunt-end ligated into a SrfI digested pCR-Script vector
(Stratagene, Heidelberg, Germany) and transformed into E. coli strain DH5α. Several
recombinant clones containing virtually full-length HIV-1 genomes were identified by restriction
fragment length polymorphism (RFLP) analysis and sequencing of the V3-loop coding sequence.
According to RFLP analysis using different combinations of restriction endonucleases, followed
by sequencing of the V3-loop coding sequence, 77% of the positive full-length constructs were
close to identical. A provirus construct representing the vast majority of the positive clones was
selected and sequenced as described above using the primer walking approach (primers were
designed approximately every 300 bp along the genome for both strands).
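The primer spacing of the primer walking approach can be illustrated with a few lines of Python. The sketch only computes nominal start coordinates at 300 bp intervals on both strands; the function name is hypothetical, and the actual primer positions, which depend on local sequence suitability, are not given in this description. The length used in the usage line is the 9078 bp reported for 97cn54 in Example 9.

```python
def primer_walk_starts(length, step=300):
    """Nominal 1-based primer start coordinates on the forward and reverse strand."""
    forward = list(range(1, length + 1, step))   # reading 5' to 3' from position 1
    reverse = list(range(length, 0, -step))      # reading back from the last position
    return forward, reverse


# Usage for a genome of 9078 bp.
fwd, rev = primer_walk_starts(9078)
print(len(fwd), fwd[:3], len(rev), rev[:3])
```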
Example 8. DNA sequences were assembled using Lasergene software (DNASTAR, Inc.,
Madison, WI) on Macintosh computers. All the reference subtype sequences in this study are
from the Los Alamos HIV database. Nucleotide sequence similarities were calculated by the
local homology algorithm of Smith and Waterman. Multiple alignments of sequences with
available sequence data of other subtypes were performed using the Wisconsin software package
of the Genetics Computer Group (GCG, 1997, version 9).
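For orientation, the local homology algorithm of Smith and Waterman can be sketched in Python as follows. The scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration; the parameters actually used by the software named above are not stated in this description.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b) of the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the score matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if h[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical positions over the length of the local alignment."""
    same = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


# Usage with toy sequences.
score, x, y = smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG")
print(score, x, y, percent_identity(x, y))
```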
Example 9. Overall structure of the 97cn54 coding sequence. The 9078 bp genomic sequence
derived from isolate 97cn54 contained all known structural and regulatory genes of an HIV-1
genome. No major deletions, insertions or rearrangements were found. Nucleotide sequence
similarities were examined by comparing all coding sequences (CDS) of 97cn54 to consensus
sequences of different genotypes and selected subtype isolates (Table 2). The highest homologies
of the gag, pol, env and vif reading frames to the corresponding clade-C consensus sequences were
within a range of 93.93%-95.06%. This observation considerably extended the above C2V3
based sequence comparison and phylogenetic tree analysis (see Table 1 and Figure 1) and
therefore clearly confirmed that the selected virus isolate belongs to the group of previously
published C-clade virus strains. However, the homology values determined by this kind of
analysis for the tat, vpu, vpr and nef genes were not sufficient to allow a clear assignment of
these reading frames to clade-B or C virus strains (Table 2). For the vpu gene, the highest
homologies were noted to clade-B (94.24%) compared with only 78.23% to a clade-C
consensus sequence. Similar observations were made for the tat gene, with the highest homology to
the B'-rl42 isolate (>91%) as compared to 87.9% (C-92br025) and 85.5% (C-eth2220) for
selected primary C-clade representatives or 89.01% for the clade-C consensus sequence. These
data, together with the occurrence of B, C and E genotypes throughout the epidemic area of
Yunnan, suggested that the analyzed virus isolate might represent a mosaic virus strain that
resulted from a B'/C interclade recombination event.

Table 2: Nucleotide sequence comparison of all coding sequences (CDS) between 97cn54 and
DNA sequences representing either (1) consensus sequences of distinct HIV-1 clades (obtained
from the Los Alamos HIV database) or (2) standard subtype C (92br025 and eth2220) and B (mn
and rl42) isolates. The data indicate the percentage identity of a given sequence to 97cn54.
Ambiguous nucleotide positions within consensus sequences were scored as a match. The
highest degrees of homology are highlighted in boldface. "/" indicates that no consensus sequence
was available from the Los Alamos database.
Example 10. Determination of intersubtype recombinations. The Recombinant Identification
Program (RIP, version 1.3; http://hiv-web.lanl.gov/tools) was used to identify potential mosaic
structures within the full-length sequence of this clone (Window size: 200; Threshold for
statistical significance: 90%; Gap handling: STRIP; Informative mode: OFF). Gaps were
introduced in order to create the alignment. The background subtype sequences in this analysis
were: u455 (subtype A), RL42 (Chinese subtype B-Thai (B')), eth2220 (subtype C), z2d2
(subtype D), 93th2 (subtype A/E).
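The principle of such a window-based analysis can be sketched in Python as follows. This is not the RIP program and does not reproduce its statistics: the sketch merely slides a window along a query that has been aligned to several background sequences and reports the most similar background per window. The function names are hypothetical, the treatment of gap columns (dropping them, as an interpretation of the STRIP setting) is an assumption, and no significance threshold is applied.

```python
def window_identity(query, ref, start, size):
    """Fraction of identical positions in one window, ignoring columns with a gap."""
    pairs = [
        (q, r)
        for q, r in zip(query[start:start + size], ref[start:start + size])
        if q != "-" and r != "-"   # assumed meaning of gap stripping
    ]
    if not pairs:
        return 0.0
    return sum(q == r for q, r in pairs) / len(pairs)


def best_background_per_window(query, backgrounds, size=200, step=1):
    """Return (window start, best matching background name, identity) for each window."""
    results = []
    for start in range(0, len(query) - size + 1, step):
        scores = {
            name: window_identity(query, ref, start, size)
            for name, ref in backgrounds.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, best, scores[best]))
    return results


# Usage with toy aligned sequences: the best match switches from "B" to "C",
# which is the kind of signal that indicates a candidate breakpoint.
query = "A" * 10 + "C" * 10
backgrounds = {"B": "A" * 10 + "G" * 10, "C": "T" * 10 + "C" * 10}
print(best_background_per_window(query, backgrounds, size=10, step=10))
```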
Example 11. Interclade recombination in the Gag-Pol coding region of 97cn54. Although
substantial homologies to C-clade virus strains were observed within the highly conserved gag
and pol reading frames, RIP analysis identified 3 areas of interclade recombination within gagpol
around positions 478-620, 1290-1830 and 2221-2520 upstream of the gag start codon. These
dispersed stretches are located within the gag and pol reading frames and show the highest
homology to prototype B (data not shown) and in particular to a subtype-B(B') isolate
originating from Yunnan (Fig. 2). This observation clearly underlines the importance of RIP
analysis, since simple homology alignments based on complete genes were not able to identify
these small interspersed fragments of a different subtype. In order to confirm the data obtained
by RIP analysis, we created several phylogenetic trees using regions either flanking or spanning
the stretches of proposed recombination (Fig. 3). Using various standard representatives of
different subtypes and some selected C-clade primary isolates, all proposed areas of
recombination could be confirmed by differential clustering of 97cn54 with the respective C
(Fig. 3 A, C, E, G) or B-clade reference isolates (Fig. 3 B, D, F).
Example 12. Interclade recombination in the Env coding region of 97cn54. As expected from
the sequence alignments summarized in Table 2, the RIP analysis clearly confirmed the
intersubtype recombination between subtype (B')-Thai and C (Fig. 4). A fragment of about 1000
bp extending from the 3' 150 bp of vpr through the first exon of tat and rev to vpu showed the
highest degree of homology with the local subtype (B') representative (rl42) (Fig. 4 A).
Furthermore, a sequence stretch of about 300 bp overlapping the 5' half of the nef gene showed the
highest homology to the (B')-Thai subtype, whereas the remaining part, including a 300 bp
fragment extending to the 3'-LTR, clustered with subtype C (Fig. 4 B).
Extending the RIP analysis, phylogenetic trees showed the closest relationship of vpr/vpu and the
5' portion of the nef gene to clade-B isolates (Fig. 5 A, B), whereas the 3' nef fragment clearly
clustered with subtype C representatives (Fig. 5 C). Further analysis confirmed that the subtype
B sequence within this mosaic is more closely related to a very recently described Thai-(B')
strain (rl42) isolated from a Chinese IDU (Graf et al. 1998) than to prototype B isolates (mn and
sf2) (Table 2).
Example 13. Representative character of 97cn54. Breakpoints located in the vpr/vpu coding
region as well as in the nef gene of 97cn54 were found at almost identical positions in all
subtype C strains isolated from IDUs living in the Northwestern provinces of China. Two RIP
analyses representative of 8 independently isolated and analyzed HIV-1 strains from different
HIV-1 infected individuals in the Xinjiang autonomous region are shown in Fig. 4 C and D.
Considering the origins of 97cn54 (southwest of China) and of xj24 and xj15 (northwest area), these
data suggest a common ancestor for the C/B' recombinant strains circulating throughout China.
In conclusion, our results demonstrate that 97cn54 represents a C/(B') interclade mosaic virus
with 10 breakpoints of interclade recombination that is most prevalent among the IDUs within
the Northwestern provinces of China. A schematic representation of the (B'/C) mosaic genome
of isolate 97cn54 is given in Figure 6.
Example 14. Prediction of cross-clade specific epitopes for HIV specific cytolytic T cells.
Genomic sequences offer the opportunity to assess the conservation of known CTL epitopes, which
may have an impact on the efficacy of HIV-1 candidate vaccines. Most reagents and data on CTL
epitopes are derived from clade B HIV-1 LAI sequences. In order to provide an estimate of
cross-clade CTL epitope conservation, the predicted protein sequences of 97cn54 were compared to
the known and best mapped LAI specific CTL epitopes. Of 194 reported HIV-1 CTL epitopes,
75, 55, 40 and 24 are located in Gag (p17, p24, p15), in the reverse transcriptase (RT), in gp120
and gp41, respectively. Whereas almost 50% or more of the epitopes in Gag and RT are
completely identical, only 5% and 17% of the gp120 and gp41 HIV-1 LAI derived CTL epitopes
exactly matched the predicted amino acid sequences of 97cn54. However, allowing as many as 2
conservative mismatches in a given CTL epitope, an additional portion of 48% (p17), 33% (p24),
40% (RT), 57% (gp120) and 33% (gp41) of the known HIV-1 LAI CTL epitopes was related to
the sequences in the corresponding 97cn54 derived polypeptides. Of course, the latter
consideration has to be taken with some caution, as even nonconservative changes might
abrogate HLA binding or T-cell receptor recognition of an antigenic peptide. However, taken
together, these observations clearly predict a considerable cross-clade CTL reactivity, especially
regarding the functionally and immunologically conserved HIV-1 proteins. In addition, these
data suggest that a considerable portion of the reagents (peptides, vaccinia virus constructs) that
have been synthesized and established for the mapping and characterization of clade B CTL
epitopes may also be useful in determining CTL reactivities on the basis of clade C HIV
sequences.
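The kind of comparison described in Example 14 can be sketched in Python as follows. Each known epitope is classified as identical, related (at most two mismatches at some position in the protein) or divergent. The function names and the toy protein strings are hypothetical; the sketch counts every mismatch and does not distinguish conservative from nonconservative exchanges, which would require an additional amino acid similarity table that is not specified in this description.

```python
from collections import Counter


def classify_epitope(epitope, protein, max_mismatches=2):
    """Classify an epitope against a protein as 'identical', 'related' or 'divergent'."""
    if epitope in protein:
        return "identical"
    k = len(epitope)
    # Slide the epitope along the protein and count mismatches in each window.
    for i in range(len(protein) - k + 1):
        mismatches = sum(a != b for a, b in zip(epitope, protein[i:i + k]))
        if mismatches <= max_mismatches:
            return "related"
    return "divergent"


def conservation_summary(epitopes, protein, max_mismatches=2):
    """Count how many epitopes fall into each conservation class."""
    return Counter(classify_epitope(e, protein, max_mismatches) for e in epitopes)


# Usage with a nonamer peptide and made-up protein fragments.
print(classify_epitope("AMQMLKETI", "MGARASAMQMLKETINEE"))
print(classify_epitope("AMQMLKETI", "MGARASAMQMLKDTINEE"))
```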

Numbering refers to the 5' end of the DNA sequence depicted in SEQ ID NO: 1.
Example 15. (A) Description of the synthetic C54 gp160 coding region: C-gp160. The C-gp120
gene was cloned into the unique KpnI/SacI restriction sites of the pCR-Script amp(+) cloning
vector (Stratagene, Genbank Accession: U46017). The synthetic C54 gp160 coding region, which
is codon-optimized according to highly expressed mammalian genes, is set forth in SEQ ID NO:3. The
synthetic signal sequence encodes a transport signal for the import of the encoded polypeptide
into the endoplasmic reticulum.


(B) Description of the synthetic C54 gagpolnef sequence: C-gpnef. The C-gpnef gene was
cloned into the unique KpnI/SacI restriction sites of the pCR-Script amp(+) cloning
vector (Stratagene). The synthetic C54 gagpolnef sequence, which is codon-optimized according to
highly expressed mammalian genes, is set forth in SEQ ID NO:2. In the present construct the
N-terminal glycine is replaced by alanine (nucleotide sequence GGC) to prevent targeting of the
polypeptide to the cytoplasmic membrane and the subsequent secretion of assembled virus like
particles via budding. Simultaneously, a (-1) frameshift was introduced at the natural
frameshift sequence to guarantee an obligatory read-through of the ribosomes from the Gag into the
Pol reading frame and, thus, the synthesis of a GagPolNef polyprotein.
Positions of the different coding regions are as follows:

Example 16. The GagPolNef polygene encoded by SEQ ID NO:1 was inserted via a KpnI/XhoI
site into the vector pcDNA3.1 and transformed into E. coli strain XL1-Blue. The capability of the
GagPolNef expression vector to induce a Gag specific antibody response was analyzed in female
BALB/c mice (Fig. 9). Two groups of 5 animals each received a first intramuscular (i.m.)
immunization with 100 µg DNA per immunization, followed by two further i.m.
immunizations after 3 and 6 weeks (group 1: pcDNA-GagPolNef; group 2: pcDNA). A control
group (group 3) was immunized with PBS only. The total titer of Gag specific IgG was
determined against purified Gag protein by ELISA. The immunization with pcDNA-GagPolNef
resulted in a rapid induction of a high titer of Gag specific antibodies (1:4,000) characterized by
a typical Th1 profile of antibody isotypes (IgG2a » IgG1). Both control groups 2 and 3 yielded
no evidence for a generation of Gag specific antibodies. The antibody titer increased nearly
a hundredfold (1:20,000) 1 week after the first further immunization and resulted in a Gag
specific end titer of 1:80,000 1 week after the second boost. At no time could a significant Gag
specific antibody response be verified in the two control groups.
Example 17. Antigen specific cytokine secretion was analyzed in spleen cells, each
dissected 5 days after the second further immunization, as evidence for the induction of a T
helper memory response. The spleen cells of those mice that had received three i.m. immunizations
with pcDNA-GagPolNef responded to the Gag specific antigen stimulus with a significant γIFN
secretion (Table 4). A comparatively reduced γIFN production was observed in spleen cells which
were dissected from mice after triple subcutaneous (s.c.) or intradermal (i.d.) immunization with
pcDNA-GagPolNef according to the same scheme as above. In none of the immunization groups was
a significant IL4 or IL5 secretion from the specifically restimulated spleen cells detected in vitro,
independently of the immunization route. Cytokine secretion from non-stimulated
spleen cells was not observed.
According to this, the i.m. immunization with pcDNA-GagPolNef resulted in a strong Th1
cytokine profile, whereas the s.c. administration induced a weaker Th1 response.
Table 4: Cytokine profile of mouse spleen cells stimulated in vitro with Gag after
immunization (injections with a needle) or i.d. or s.c. immunization with the mentioned DNA
constructs by means of a particle gun.

Example 18. To verify the capability of pcDNA-GagPolNef to induce Gag specific
CTLs, spleen cells were specifically restimulated in vitro 3 weeks after a first immunization with
pcDNA-GagPolNef (group 1), pcDNA (group 2) or PBS (group 3) in a mixed lymphocyte
tumor cell culture for 6 days and subsequently investigated for their cytotoxic activity. It is
known that the nonameric AMQMLKETI peptide (single letter code) derived from the Gag
protein of the subtype B virus (MB isolate) is a Dd restricted CTL epitope in BALB/c mice. Said
peptide was used in the experiment to restimulate the specific cytotoxic activity in vitro as well
as to determine said activity. Gag specific cytotoxic T cells could be detected after a single
i.m. injection of the pcDNA-GagPolNef plasmid but not in the control groups 2 and 3. The
treatment of spleen cells with said plasmid did not result in an in vitro priming of Gag specific
cytotoxic T cells. These results confirmed (i) the capability of pcDNA-GagPolNef to induce
specific cytotoxic T cells which (ii) are active across subtypes (Fig. 9).

References
Bai, X., Su, L., Zhang, Y., and et al(1997). Subtype and sequence analysis of the C2V3 region of
gp120 gene among HIV-1 strains in Xinjiang. Chin. ./. Virology 13.
Carr, J. K., Salminen, M O., Koch, C, Gotte, D , Artenstein, A. W„ Hegerich, P. A., St Louis,
D., Burke, D. S., and McCutchan, F. E.(1996). Full-length sequence and mosaic structure of a
human immunodeficiency virus type 1 isolate from Thailand. J. Virol. 70, 5935-5943.
Carr, J. K, Salminen, M. O , Albert, J., Sanders Buell, E, Gotte, D., Birx, D. L„ and
McCutchan, F. E.(1998). Full genome sequences of human immunodeficiency virus type 1
subtypes G and A/G intersubtype recombinants. Virology 247, 22-31.
Esparza, J., Osmanov, S., and Heyward, W. L.(1995). HIV preventive vaccines. Progress to date.
Drugs 50, 792-804.
Expert group of joint United Nations programme on HIV/ATDS(1999). Implications of HIV
variability for transmission, scientific and policy issues. AIDS 11, UN AIDS 1-UNAIDS 15.
Gao, F., Robertson, D L., Morrison, S. G., Hui, H., Craig, S., Decker, J., Fultz, P. N., Girard,
M, Shaw, G. M, Hahn, B. H„ and Sharp, P M.(1996). The heterosexual human
immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E)
recombinant of African origin../. Virol. 70, 7013-7029.
Gao, F., Robertson, D. L , Carruthers, C. D., Morrison, S. G, Jian, B , Chen, Y., Barre Smoussi,
F., Girard, M., Srinivasan, A., Abimiku, A. G., Shaw, G. M„ Sharp, P. M., and Hahn, B.
H.(1998). A comprehensive panel of near-full-length clones and reference sequences for non-
subtype B isolates of human immunodeficiency virus type I.J. Virol. 72, 5680-5698.
Gaywee, J., Artenstein, AW., VanCott, T. C , Trichavaroj, R., Sukchamnong, A., Amlee, P., de
Souza, M., McCutchan, F. E., Carr, J. K., Markowitz, L. E., Michael, R., and Nittayaphan,
S.(1996) Correlation of genetic and serologic approaches to HIV-1 subtyping in Thailand. J.
Acquir. Immune. Defic. Syndr. Hum. Retrovirol. 13, 392-396.
Graf, M„ Shao, Y„ Zhao, Q., Seidl, T., Kostler, J., Wolf, H., and Wagner, R.(1998). Cloning and

characterization of a virtually full-length HIV type 1 genome from a subtype B'-Thai strain
representing the most prevalent B-clade isolate in China. AIDS Res. Hum. Retroviruses 14, 285-
288.
Graham, B. S. and Wright, P. F.(1995). Candidate AIDS vaccines. N. Engl. ./. Med. 333, 1331-
1339.
Kostrikis, L. G , Bagdades, E., Cao, Y., Zhang, L., Dimitriou, D., and Ho, D. D.(1995). Genetic
analysis of human immunodeficiency virus type 1 strains from patients in Cyprus: identification
of a new subtype designated subtype I../. Virol. 69, 6122-6130.
Leitner, T. and Albert, J.(1995). Human Retroviruses and AIDS 1995: a compilation and
analysis of nucleic acid and amino acid sequences. (Myers, G., Korber, B., Wain-Hobson, S..
Jeang, K., Mellors, J, McCutchan, F., Henderson, L, and Pavlakis, G. Eds.) Los Alamos
National Laboratory, Los Alamos, N. Mex. 111147-10150
Lole, K. S„ Bollinger, R. C , Paranjape, R. S., Gadkari, D, Kutkami, S. S., Novak, N. G,
Ingersotl, R., Sheppard, H. W, and Ray, S. C.(1999). Full-length human immunodeficiency
virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of
intersubtype recombination../. Virol. 73, 152-160
Loussert Ajaka, I., Chaix, M. L., Korber, B., Letourneur, F., Gomas, E., Allen, E., Ly, T. D.,
Brun Vezinet, F., Simon, F., and Saragosti, S(1995). Variability of human immunodeficiency
virus type I group O strains isolated from Cameroonian patients living in France. . ./. Virol. 69,
5640-5649.
Luo, C. C, Tian, C, Hu, D. J., Kai, M, Dondero, T., and Zheng, X.(1995). HIV-1 subtype C in
China [letter]. Lancet 345, 1051-1052.
Myers, G., Korber, B., Foley, B., Jeang, K. T., Mellors, J. W , and Wain Hobson, S (1996)
Human retroviruses and AIDS: a compilation and analysis of nucleic acid and amino acid
sequences. (Anonymous Theoretical Biology and Biophysics Group, Los Alamos, N. Mex
Salminen, M. O., Koch, C, Sanders Buell, E„ Ehrenberg, P. K., Michael, N. L., Carr, J. K ,

Burke, D. S., and McCutchan, F. E(1995). Recovery of virtually full-length HIV-1 provirus of
diverse subtypes from primary virus cultures using the polymerase chain reaction. Virology 213,
80-86.
Shao, Y , Zhao, Q , Wang B , and et al(1994). Sequence analysis of HIV env gene among HIV
infected IDUs in Yunnan epidemic area of China Chin. ./. Virology 10, 291-299.
Shao, Y., Su, L„ Sun, X., and et al(1998). Molecular Epidemiology of HIV infection in China.
12th world AIDS conference, Geneva 13132, (Abstract)
Shao, Y., Guan, Y., Zhao, Q., and et al(1999). Genetic variation and molecular epidemiology of
the Ruily HIV-1 strains of Yunnan in 1995. Chin. J. Virol. 12, 9.
Sharp, P. M., Robertson, D L., and Hahn, B. H.(1995). Cross-species transmission and
recombination of'AIDS' viruses. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 349, 41-47.
Sharp, P. M, Bailes, E., Robertson, D. L., Gao. F , and Hahn, B. H.(1999) Origins and evolution
of AIDS viruses. Biol. Bull. 196, 338-342.
World Health Organisation Network for HIV Isolation and Characterization(I994). HIV-1
variation in WHO-sponsored vaccine-evaluation sites.genetic screening, sequence analysis and
preliminary biological characterization of selected viral strains. AIDS Res. Hum. Retroviruses
10, 1327-1344.
Yu, H, Su, L., and Shao, Y.(1997). Identification of the HIV-1 subtypes by HMA and
sequencing. Chin. J. Epidemiol. 18, 201-204

WE CLAIM :
1. A polynucleotide having the nucleic acid sequence as depicted in SEQ ID NO: 1, 2 or 3
or a sequence from the nucleotide numbers 177 to 1654, 1447 to 4458, 5589 to 8168,
4403 to 4984, 4924 to 5214, 5426 to 5671, 8170 to 8790, 5195 to 5409, 7730 to 7821,
5334 to 5409 or 7730 to 7821, or a continuous sequence on a polynucleotide from the
nucleotide numbers 5195 to 5409 and 7730 to 7821, or 5334 to 5409 and 7730 to
7821, each referring to SEQ ID NO: 1.
2. The polynucleotide having more than one continuous nucleic acid sequence according
to claim 1.
3. The polynucleotide according to claim 2, wherein at least two of the continuous
nucleic acid sequences are separated by a nucleotide sequence spacer.
4. Codon-optimized polynucleotide encoding a polypeptide which is encoded by a
polynucleotide according to any one of claims 1 to 3.
5. DNA, bacterial or viral vectors, comprising the polynucleotide according to any one of
claims 1 to 4.
6. The polynucleotide according to any one of claims 1 to 4 as a medicament, vaccine or
diagnostic substance, or a polynucleotide having one or more at least 27 nucleotide
long fragment(s) of one or more polynucleotide(s) according to any one of claims 1 to
4 as a medicament, vaccine or diagnostic substance, or a polynucleotide having two or
more at least 27 nucleotide long fragments separated by nucleotide sequence spacer of
one or more polynucleotide(s) according to any one of claims 1 to 4 as a medicament,
vaccine or diagnostic substance.
7. The use of the polynucleotide according to any one of claims 1 to 4, or of a
polynucleotide having one or more at least 27 nucleotide long fragment(s) of one or
more polynucleotide(s) according to any one of claims 1 to 4, or of a polynucleotide
having two or more at least 27 nucleotide long fragments separated by nucleotide
sequence spacer of one or more polynucleotide(s) according to any one of claims 1 to
4, for the manufacture of a medicament or vaccine for the treatment or prevention of
HIV infections, or of a diagnostic substance for the diagnosis of HIV infections.
8. Polypeptide, encoded by the nucleic acid sequence according to any one of claims 1 to
4 or an at least 9 amino acid long fragment thereof.
9. The polypeptide according to claim 8, having more than one continuous fragment of at
least 9 amino acids.
10. The polypeptide according to claim 9, said fragments being separated by an amino
acid spacer.
11. The polypeptide according to any one of claims 8 to 10, said amino acid sequence
corresponds to the HIV envelope protein or a fragment thereof.
12. The polypeptide according to any one of claims 8 to 11, further comprising a B cell
epitope, an MHC class II restricted T helper epitope, an MHC class I restricted
cytotoxic T cell epitope or a combination thereof.
13. The polypeptide according to claim 12, said B cell epitope being a conformational or a
linear epitope, and said MHC class II restricted T helper epitope or said MHC class I
restricted cytotoxic T cell epitope being a linear epitope.
14. The polypeptide according to any one of claims 8 to 13 as a medicament, vaccine or
diagnostic substance.
15. The use of the polypeptide according to any one of claims 8 to 13 for the manufacture
of a medicament or vaccine for the treatment or prevention of HIV infections, or of a
diagnostic substance for the diagnosis of HIV infections.
16. Isolated polypeptide specifically binding a polypeptide according to any one of claims
8 to 13.

17. The isolated polypeptide according to claim 16, said polypeptide being an antibody,
antibody derivative or a derivative of the human pancreatic secretory trypsin
inhibitor (hPSTI).
18. The isolated polypeptide according to claim 16 or 17 as a medicament or diagnostic
substance.
19. The use of the isolated polypeptide according to any one of claims 16 to 18 for the
manufacture of a medicament for the treatment or prevention of HIV infections, or of a
diagnostic substance for the diagnosis of HIV infections.
20. Eukaryotic packaging cell line transformed with a polynucleotide having a nucleic
acid sequence as depicted in SEQ ID NO:1 and/or 3, or a codon-optimized
polynucleotide encoding a polypeptide which is encoded by a polynucleotide
according to any one of claims 1 and/or 3.
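
The nucleotide ranges recited in claim 1 can be illustrated with a minimal sketch. It is not part of the claims and rests on several assumptions: that the nucleotide numbers are 1-based and inclusive (the usual convention for sequence listings, but not stated here), that a "continuous sequence" from two ranges means their direct concatenation, and that the 27 nucleotide minimum of claims 6 and 7 corresponds to the 9 amino acid minimum of claim 8 via three nucleotides per codon. The function names and the placeholder sequence are hypothetical; the real SEQ ID NO:1 is not reproduced.

```python
from typing import Iterable, Tuple


def slice_1based(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, counted 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside 1..{len(seq)}")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


def join_segments(seq: str, ranges: Iterable[Tuple[int, int]]) -> str:
    """Concatenate several 1-based inclusive ranges into one continuous sequence."""
    return "".join(slice_1based(seq, s, e) for s, e in ranges)


def min_fragment_nt(n_amino_acids: int) -> int:
    """Nucleotides needed to encode n amino acids (3 per codon, no stop codon)."""
    return 3 * n_amino_acids


# Placeholder of arbitrary bases, NOT the real SEQ ID NO:1 (assumption for the demo).
placeholder = ("ACGT" * 2250)[:9000]

# Two ranges taken from claim 1, joined into one continuous sequence.
segment_ranges = [(5195, 5409), (7730, 7821)]
joined = join_segments(placeholder, segment_ranges)
print(len(joined))
print(min_fragment_nt(9))
```

Under these assumptions the two joined ranges contribute 215 and 92 nucleotides respectively, and a 9 amino acid fragment requires 27 nucleotides.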




Patent Number 239166
Indian Patent Application Number IN/PCT/2002/507/KOL
PG Journal Number 11/2010
Publication Date 12-Mar-2010
Grant Date 09-Mar-2010
Date of Filing 23-Apr-2002
Name of Patentee GENEART AG
Applicant Address JOSEF-ENGERT-STR. 11, 93053 REGENSBURG, DEUTSCHLAND
Inventors:
# Inventor's Name Inventor's Address
1 WOLF, HANS JOSEF JAGERHUBERSTRASSE 9, 82319 STARNBERG, DEUTSCHLAND
2 WAGNER, RALF FRANZ-VON-TAXIS-RING 59, 93949 REGENSBURG
3 GRAF, MARCUS SPIEGELGASSE 3 B, 93047 REGENSBURG, DEUTSCHLAND
PCT International Classification Number C12N 15/49
PCT International Application Number PCT/DE2000/04073
PCT International Filing date 2000-11-16
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 19955089.1 1999-11-16 Germany