Abstract for XV International HIV Drug Resistance Workshop, Sitges, Spain.
Accurate prediction of virologic response to HAART using three computational modelling techniques
BA Larder1, D Wang1, F De Wolf2, J Lange3, A Revell1, S Wegner4, J Montaner5, R Harrigan5, JA Metcalf6, HC Lane6.
1: HIV Resistance Response Database Initiative (RDI), London, UK; 2: Netherlands HIV Monitoring Foundation, Amsterdam, the Netherlands; 3: Academic Medical Centre of the University of Amsterdam, Amsterdam, the Netherlands; 5: BC Centre for Excellence in HIV/AIDS, Vancouver, Canada; 6: National Institute of Allergy and Infectious Diseases (NIAID), Bethesda, MD, USA.
Introduction
Virologic response to different antiretroviral (ARV) combinations is highly complex and dependent on many factors. Accurate prediction of response to optimize ARV selection is likely to require sophisticated modelling. The RDI has demonstrated that artificial neural networks (ANN) can predict response to combination therapy from genotype, viral load, CD4 count and treatment history. Here we compare ANN with alternative high-level modelling methods: Support Vector Machines (SVM) and Random Forests (RF).
Methods
Ten ANN, RF and SVM models were trained to predict virologic response (ΔVL) using 76 input variables (55 mutations, antiretroviral drugs, viral load, CD4 count, treatment history and time to follow-up) from 1,154 treatment change episodes (TCEs) in the RDI database, which draws on a large number of clinical sources. These models were then tested using the input data from two independent test sets:
50 TCEs selected at random from the RDI database (from patients not in the training set)
50 TCEs from clinics in the Netherlands (Athena database) that had no data in the training set.
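A patient-disjoint random test set of the first kind could be drawn as in the minimal sketch below. The abstract does not describe the selection procedure, so this is only one plausible approach: the function name, the `patient_id` and `tce_id` fields and the toy data are all hypothetical.

```python
import random

def split_by_patient(tces, n_test, seed=0):
    """Pick n_test TCEs at random, then keep for training only the TCEs
    whose patient does not appear in the test set (assumed procedure)."""
    rng = random.Random(seed)
    test = rng.sample(tces, n_test)
    test_patients = {t["patient_id"] for t in test}
    train = [t for t in tces if t["patient_id"] not in test_patients]
    return train, test

# Toy data: 20 hypothetical patients with 3 TCEs each.
tces = [{"patient_id": p, "tce_id": f"{p}-{k}"} for p in range(20) for k in range(3)]
train, test = split_by_patient(tces, n_test=5)
```

Removing every TCE of a test patient from the training data, rather than only the sampled TCE, is what keeps the two sets independent at the patient level.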
The averaged predictions of the models for virologic response (ΔVL) were then compared with the actual ΔVL values. The outputs of the three methods were then combined for each test TCE, and these predictions were also compared with the actual ΔVL values.
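The averaging and combination steps described above could look like the following minimal sketch. The abstract does not say how the outputs of the three methods were combined, so a simple unweighted mean is assumed here. The function name and all numbers are made up for illustration.

```python
from statistics import mean

def committee_average(predictions_per_model):
    """Average several models' predictions, TCE by TCE.

    predictions_per_model: list of equal-length lists, one per model.
    """
    return [mean(values) for values in zip(*predictions_per_model)]

# Hypothetical predicted viral load changes for 3 test TCEs,
# from 2 models per method (the study trained more).
ann_models = [[-1.8, -0.4, -2.6], [-2.0, -0.6, -2.4]]
svm_models = [[-1.5, -0.7, -2.1], [-1.7, -0.9, -2.3]]
rf_models = [[-2.1, -0.3, -2.7], [-2.3, -0.5, -2.9]]

# Step 1: average within each method.
ann = committee_average(ann_models)
svm = committee_average(svm_models)
rf = committee_average(rf_models)

# Step 2: combine the three methods (assumed: unweighted mean).
combined = committee_average([ann, svm, rf])
```

A weighted combination would be an equally reasonable choice; the unweighted mean is used only because it needs no extra parameters.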
Results
Correlations between predicted and actual ΔVL for the RDI test set gave r² values of 0.69, 0.62 and 0.71 for ANN, SVM and RF respectively. The corresponding r² values for the Athena test set were 0.46, 0.48 and 0.47. There were no statistically significant differences between methods, but the predictions for the RDI test set were significantly more accurate than those for the Athena test set. The combined outputs produced an r² value of 0.73 for the RDI test set and 0.52-0.53 for the Athena data.
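A correlation of this kind between predicted and actual viral load change can be computed as in the sketch below. It assumes the reported statistic is the squared Pearson correlation coefficient, which the abstract does not state explicitly. The function names and sample values are illustrative only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared(predicted, actual):
    """Squared Pearson correlation between predictions and observations."""
    return pearson_r(predicted, actual) ** 2

# Toy check with made-up values: r = 0.5, so r squared = 0.25.
toy_r2 = r_squared([1, 2, 3], [1, 3, 2])
```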
Discussion
These results suggest that all three methods can yield fairly accurate predictions of response, and that combining them may provide a modest improvement in accuracy over any single method. As in previous studies, the predictions for clinics with data in the training set were more accurate than those for ‘unfamiliar’ clinics. The accuracy of these models, trained using relatively small datasets, was encouraging.