Poster presentation at:
The XI International HIV Drug Resistance Workshop
Seville, 2-5 July 2002
A Collaborative HIV Resistance Response Database Initiative: Predicting
Virological Response Using Neural Network Models.
D. Wang, V. DeGruttola, S. Hammer, R. Harrigan, B. Larder, S. Wegner,
D. Winslow & M. Zazzi, on behalf of the HIV Resistance Response
Database Initiative (RDI).
Background: The goal of the RDI is to create a relational database large
enough to correlate baseline HIV-1 resistance genotypes and drug therapy
with virological response. Our initial aims were to estimate the
approximate database size required to derive statistically meaningful
relational algorithms, and to develop predictive neural network (NN) models.
Methods: Baseline genotype, viral load (VL) and treatment data, plus
week 24 VL from clinical studies, are being collected in a customised
Oracle database. We aim to collect data from 5000-10,000 individuals.
Power calculations for database size were based on either complex (12
drugs and 49 mutations) or simplified (6 drugs and 31 mutations) input
parameters. Samples from 400 patients in the Vigilance II trial were used
to derive 700 'cases' (VL from weeks 0-8, 0-16 or 0-24) for training a
complex and a simplified NN model.
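A minimal sketch of how patient records might be expanded into training 'cases' and encoded as network inputs. The abstract does not describe the encoding, so everything here is an assumption: the short MUTATIONS and DRUGS lists are hypothetical stand-ins for the real 49 mutations and 12 drugs, the inputs are assumed to be binary presence flags plus baseline log10 VL and follow-up week, the target is assumed to be the change in log10 VL, and the names Patient, encode and derive_cases are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, truncated feature lists for illustration only. The complex
# model used 49 mutations and 12 drugs, which the abstract does not list.
MUTATIONS = ["RT41", "RT67", "RT118", "RT210", "RT215",
             "PR10", "PR46", "PR82", "PR90"]
DRUGS = ["AZT", "3TC", "d4T", "ddI", "IDV", "LPV/rtv"]

# Follow-up windows named in the abstract: weeks 0-8, 0-16 and 0-24.
FOLLOW_UP_WEEKS = (8, 16, 24)


@dataclass
class Patient:
    mutations: set[str]        # baseline resistance mutations
    regimen: set[str]          # drugs in the treatment regimen
    baseline_log_vl: float     # week-0 VL, assumed to be log10 copies/ml
    # week -> log10 VL, holding only the follow-up weeks actually measured
    follow_up_log_vl: dict[int, float] = field(default_factory=dict)


def encode(patient: Patient, week: int) -> list[float]:
    """Assumed input layout: mutation flags, drug flags, baseline VL, week."""
    genotype = [1.0 if m in patient.mutations else 0.0 for m in MUTATIONS]
    drugs = [1.0 if d in patient.regimen else 0.0 for d in DRUGS]
    return genotype + drugs + [patient.baseline_log_vl, float(week)]


def derive_cases(patients: list[Patient]) -> list[tuple[list[float], float]]:
    """One (inputs, VL change) case per follow-up week that was measured."""
    cases = []
    for p in patients:
        for week in FOLLOW_UP_WEEKS:
            if week in p.follow_up_log_vl:
                change = p.follow_up_log_vl[week] - p.baseline_log_vl
                cases.append((encode(p, week), change))
    return cases
```

Under this scheme each patient contributes between zero and three cases, depending on which follow-up VL measurements are available, which is how a set of 400 patients can yield a larger number of cases.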
Results: Using the complex input parameters, the estimated database size
required for high predictive accuracy was >9500 patient samples; using
the simplified parameters, >5000 samples were estimated to be required.
Training of both NN models with the 700 patient cases was surprisingly
successful. For the complex NN model, the correlation between predicted
and actual absolute VL change in the training set gave an r² value of
0.85 (p<0.0001). In independent test and validation sets this correlation
was weaker, but still highly significant, with an average r² value of
0.5±0.05 (p<0.0001).
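Assuming the reported statistic is the squared Pearson correlation (r²) between predicted and observed VL changes, it can be computed with the standard library alone, as in this minimal sketch. The function name pearson_r_squared is hypothetical, and the p-values quoted in the abstract are not reproduced here.

```python
import math


def pearson_r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and observed VL changes."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / math.sqrt(var_p * var_a)
    return r * r
```

Squaring discards the sign of r, so a model whose predictions run opposite to the observed changes would also score highly; checking the sign of r separately guards against that.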
The complex NN model correctly predicted the VL trajectory in 75%
(±1.8%) of cases in the test sets. The simplified NN model appeared
slightly less accurate, both in predicting absolute VL change (average
r² = 0.47±0.05 for the test and validation sets) and in predicting VL
trajectory (72±1.8% correct for the test and validation sets).
However, these differences between the two models were not statistically
significant (p>0.05). The complex model was then used to predict VL
response 'in silico' for a hypothetical genotype (RT mutations at codons
41, 67, 118, 210 and 215; protease inhibitor resistance mutations at
codons 10, 46, 82 and 90) combined with a variety of treatment regimens.
The regimens tested ranked as follows, in decreasing order of predicted
potency: AZT/3TC/LPV/rtv > d4T/ddI/LPV/rtv > AZT/3TC/IDV > d4T/ddI/IDV.
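The two summaries used in the Results, the percentage of correctly predicted VL trajectories and the ordering of regimens by predicted potency, can be sketched as follows. The abstract does not define 'trajectory', so the up/down labelling and the threshold parameter are assumptions, as are the function names trajectory, trajectory_accuracy and rank_regimens. The ranking assumes that a more negative predicted change in log10 VL means a more potent regimen.

```python
from __future__ import annotations


def trajectory(change: float, threshold: float = 0.0) -> str:
    """Hypothetical label: 'down' if log10 VL fell by more than threshold, else 'up'."""
    return "down" if change < -threshold else "up"


def trajectory_accuracy(predicted, actual, threshold=0.0) -> float:
    """Percentage of cases whose predicted trajectory matches the observed one."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        trajectory(p, threshold) == trajectory(a, threshold)
        for p, a in zip(predicted, actual)
    )
    return 100.0 * hits / len(predicted)


def rank_regimens(predicted_change: dict[str, float]) -> list[str]:
    """Most to least potent: a more negative predicted VL change ranks first."""
    return sorted(predicted_change, key=predicted_change.get)
```

For the in silico experiment, predicted_change would map each regimen to the model's predicted VL change for the hypothetical genotype; the abstract reports only the resulting order, not the predicted values.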
Conclusions: Sample size estimation indicated that a large database will
be required to derive sophisticated algorithms for predicting virological
response. The initial NN models trained successfully on a limited dataset,
demonstrating that this approach can be used to predict absolute VL change
and VL trajectory from complex input variables. However, training with
larger datasets will be required to improve accuracy and to determine
which of the two NN models is superior.