Poster presentation at:
XV International HIV Drug Resistance Workshop
13th June 2006 - 17th June 2006
Sitges, Spain
Larder BA1, Wang D1, De Wolf F2, Lange J3, Revell AD1, Wegner S4, Montaner JS5, Harrigan R5, Metcalf JA6, Lane HC6.
1: RDI, London, UK; 2: Netherlands HIV Monitoring Foundation, Amsterdam, the Netherlands; 3: Academic Medical Centre of the University of Amsterdam, the Netherlands; 4: US Military HIV Research Program, Rockville, USA; 5: BC Centre for Excellence in HIV/AIDS, Vancouver, Canada; 6: National Institute of Allergy and Infectious Diseases (NIAID), Bethesda, MD, USA.
Introduction
Virologic response to different antiretroviral (ARV) combinations is highly complex and dependent on many factors. Accurate prediction of response to optimize ARV selection is likely to require sophisticated modelling. The RDI has demonstrated that artificial neural networks (ANN) can predict response to combination therapy from genotype, viral load, CD4 count and treatment history. Here we compare ANN with alternative high-level modelling methods: Support Vector Machines (SVM) and Random Forests (RF).
Methods
Ten ANN, RF and SVM models were trained to predict virologic response (ΔVL) using 76 input variables (55 mutations, antiretroviral drugs, viral load, CD4 count, treatment history and time to follow-up) from 1,154 treatment change episodes (TCEs) drawn from the RDI database, which pools data from a large number of clinical sources. These models were then tested using the input data from two independent test sets:
50 TCEs selected at random from the RDI database (different patients from the training set)
50 TCEs from clinics in the Netherlands (Athena database) that had no data in the training set.
The averaged models' predictions of virologic response (ΔVL) were then compared with the actual ΔVL values. The outputs of the three methods were then combined for each test TCE and these predictions were also compared with actual ΔVL values.
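The averaging and combination steps described above can be sketched as follows. This is a minimal illustration only: the abstract does not state how the three methods' outputs were combined, so the unweighted mean used here is an assumption, and the function names and numbers are hypothetical rather than taken from the study.

```python
from statistics import mean

def committee_average(model_predictions):
    """Average per-TCE predictions across the models of one method.

    model_predictions: list of lists, one inner list per trained model,
    each holding that model's predicted delta-VL for every test TCE.
    """
    return [mean(per_tce) for per_tce in zip(*model_predictions)]

def combine_methods(method_averages):
    """Combine methods per TCE with an unweighted mean (an assumption)."""
    return [mean(per_tce) for per_tce in zip(*method_averages.values())]

# Hypothetical delta-VL predictions for two test TCEs, with two models
# per method for brevity. These are illustrative values, not study data.
ann = committee_average([[-1.0, -2.0], [-2.0, -3.0]])
svm = committee_average([[-0.5, -2.5], [-1.5, -2.5]])
rf = committee_average([[-2.0, -3.0], [-2.0, -3.0]])

combined = combine_methods({"ANN": ann, "SVM": svm, "RF": rf})
print(combined)
```

Each method's committee is averaged first, and the three averaged outputs are then combined per TCE, mirroring the two comparison stages described in the text.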
Results
Correlations between predicted and actual ΔVL for the RDI test set gave r² values of 0.69, 0.62 and 0.71 for ANN, SVM and RF respectively. The corresponding r² values for the Athena test set were 0.46, 0.48 and 0.47. There were no statistically significant differences between methods, but the predictions for the RDI test set were significantly more accurate than those for the Athena test set. The combined outputs produced an r² value of 0.73 for the RDI test set and 0.52-0.53 for the Athena data.
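For readers who want to reproduce this kind of comparison on their own data, the sketch below computes a squared correlation between predicted and actual ΔVL. It assumes the reported values are squared Pearson correlation coefficients, which the abstract does not state explicitly; the function names and sample values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared(predicted, actual):
    """Squared Pearson correlation (assumed to match the reported metric)."""
    return pearson_r(predicted, actual) ** 2

# Hypothetical delta-VL values for four test TCEs, not study data.
predicted = [-0.5, -1.0, -1.5, -2.0]
actual = [-1.0, -2.0, -3.0, -4.0]   # exactly linear in the predictions
unrelated = [1.0, 0.0, 0.0, 1.0]    # zero covariance with the predictions

print(r_squared(predicted, actual))
print(r_squared(predicted, unrelated))
```

A perfectly linear relationship gives a value of 1, while predictions with no linear relationship to the observed responses give a value of 0.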
Discussion
These results suggest that all three methods can yield fairly accurate predictions of response and their combination may provide a modest improvement in accuracy over any one method. As with previous studies, the predictions for clinics with data in the training data set were somewhat more accurate than for 'unfamiliar' clinics. The accuracy of these models, trained using relatively small datasets, was encouraging.