Poster presentation at:
International HIV Drug Resistance Workshop
2nd July 2002 - 5th July 2002
Wang D, DeGruttola V, Hammer S, Harrigan R, Larder BA, Wegner S, Winslow D & Zazzi M, on behalf of the HIV Resistance Response Database Initiative (RDI)
Background: The goal of the RDI is to create a relational database large enough to correlate baseline HIV-1 resistance genotypes and drug therapy with virological response. As a first step, we estimated the approximate database size required to derive significant relational algorithms and to develop predictive neural network (NN) models.
Methods: Baseline genotype, viral load (VL) and treatment data, plus week 24 VL, from clinical studies are being collected in a customised Oracle database. We aim to collect data from 5000-10,000 individuals. Power calculations for database size were based on either complex (12 drugs and 49 mutations) or simplified (6 drugs and 31 mutations) input parameters. Four hundred patient samples from the Vigilance II trial were used to derive 700 'cases' (VL from week 0 to week 8, 16 or 24) for training a complex and a simplified NN model.
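One plausible reading of how 400 patients yield 700 cases is that each patient contributes one case per follow-up week (8, 16 or 24) for which a VL measurement exists alongside the baseline. The sketch below illustrates that reading only; the record layout, the field names (`Patient`, `Case`, `derive_cases`) and the assumption that VL is recorded in log10 copies/mL are all hypothetical and are not taken from the RDI database schema.

```python
from dataclasses import dataclass

# Hypothetical record layout; the abstract does not describe the RDI schema.
@dataclass
class Patient:
    patient_id: str
    mutations: frozenset  # e.g. {"RT215", "PR90"} (hypothetical encoding)
    regimen: tuple        # e.g. ("AZT", "3TC", "IDV")
    vl: dict              # week -> viral load, assumed log10 copies/mL

@dataclass
class Case:
    patient_id: str
    mutations: frozenset
    regimen: tuple
    week: int
    vl_change: float  # follow-up VL minus baseline (week 0) VL

FOLLOW_UP_WEEKS = (8, 16, 24)

def derive_cases(patients):
    """Create one training case per available follow-up week.

    Patients without a baseline (week 0) VL contribute no cases.
    """
    cases = []
    for p in patients:
        if 0 not in p.vl:
            continue
        for week in FOLLOW_UP_WEEKS:
            if week in p.vl:
                cases.append(
                    Case(p.patient_id, p.mutations, p.regimen, week,
                         p.vl[week] - p.vl[0])
                )
    return cases
```

Under this reading, a patient with VL measured at weeks 0, 8 and 24 yields two cases, while a patient lacking a baseline yields none, so the case count depends on how complete each patient's follow-up is.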
Results: Using the complex input parameters, the estimated database size required for high predictive accuracy was >9500 patient samples; using the simplified parameters, >5000 samples were estimated to be required. Training of both NN models with the 700 cases was surprisingly successful. For the complex NN model, the correlation between predicted and actual absolute VL change in the training set gave a value of 0.85 (p<0.0001). In independent test and validation sets this correlation was weaker but still highly significant, with an average value of 0.5±0.05 (p<0.0001). The complex NN model correctly predicted the VL trajectory in 75% (±1.8%) of cases in the test sets. Of interest, the simplified NN model appeared slightly less accurate in predicting both absolute VL change (average 0.47±0.05 for the test and validation sets) and VL trajectory (72±1.8% correct for the test and validation sets). However, these differences between the two models were not statistically significant (p>0.05). The complex model was then used to predict 'in silico' VL response for a hypothetical genotype (RT mutations at codons 41, 67, 118, 210 and 215; protease mutations at codons 10, 46, 82 and 90) under a variety of treatment regimens. The regimens tested ranked in the following order of predicted potency, most potent first: AZT/3TC/LPV/rtv > d4T/ddI/LPV/rtv > AZT/3TC/IDV > d4T/ddI/IDV.
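The two accuracy measures reported above can be illustrated with a short sketch: a Pearson correlation between predicted and actual VL change, and the fraction of cases whose predicted VL trajectory matches the observed one. The abstract does not state whether the reported correlation values are r or r², nor how a "trajectory" was defined; this sketch computes Pearson's r and assumes a hypothetical three-way classification (down / stable / up) with a hypothetical threshold of 0.5 log10 copies/mL.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def trajectory(change, threshold=0.5):
    """Hypothetical classification of a VL change (assumed log10 copies/mL)."""
    if change <= -threshold:
        return "down"
    if change >= threshold:
        return "up"
    return "stable"

def trajectory_accuracy(predicted, actual, threshold=0.5):
    """Fraction of cases whose predicted trajectory class matches the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty equal-length sequences")
    hits = sum(
        trajectory(p, threshold) == trajectory(a, threshold)
        for p, a in zip(predicted, actual)
    )
    return hits / len(actual)
```

For example, with actual changes `[-2.0, -1.2, 0.1, 0.8]` and predictions `[-1.8, -0.4, -0.2, 0.9]`, three of the four trajectory classes agree, giving an accuracy of 0.75, even though the second prediction underestimates the decline. The trajectory measure therefore rewards getting the direction right rather than the magnitude, which is one reason it can look more favourable than the correlation.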
Conclusions: Sample size estimation indicated that the database will require large numbers of patient samples to derive sophisticated algorithms for predicting virological response. Initial NN models were trained successfully on a limited dataset, demonstrating that this approach can be used to predict absolute VL change and VL trajectory from complex input variables. However, training with larger datasets will be required to increase accuracy and to determine which of the two NN models is superior.