RDI - HIV Resistance Response Database Initiative

The genetic basis of HIV drug resistance is extremely complex because of the number of individual mutations involved and the interactions between them. The aim of the RDI is to improve the interpretation of genotypic information and to predict virological response to combination HIV therapy. Current interpretation algorithms take into account only a subset of mutations: those with the largest and best-documented effects on susceptibility. Rules are applied that predict the effects of these mutations on the susceptibility of the virus to individual drugs. The predictions are categorical, typically classifying the virus as resistant, intermediate or sensitive to each drug.
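
As a rough illustration of how such a rules-based algorithm behaves, the sketch below scores a genotype against a small rule table and maps the score to one of the three categories. The mutation names, drug names, penalty scores and thresholds are all hypothetical placeholders, not real resistance data or any published algorithm.

```python
# Hypothetical rule table: drug -> {mutation: penalty score}.
# All names and numbers are placeholders for illustration only.
RULES = {
    "drug_x": {"mut_a": 60, "mut_b": 20},
    "drug_y": {"mut_b": 15, "mut_c": 10},
}


def classify(genotype, drug, rules=RULES, intermediate_at=15, resistant_at=60):
    """Return a categorical call for one drug from the mutations in a genotype."""
    # Only mutations listed in the rule table contribute; all others are ignored.
    score = sum(penalty for mutation, penalty in rules[drug].items()
                if mutation in genotype)
    if score >= resistant_at:
        return "resistant"
    if score >= intermediate_at:
        return "intermediate"
    return "sensitive"


genotype = {"mut_b", "mut_z"}  # "mut_z" is not in the table, so it has no effect
calls = {drug: classify(genotype, drug) for drug in RULES}
```

Note that a mutation absent from the rule table cannot change the result, which is the limitation the modelling approach described below is intended to address.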

The RDI is using a variety of computational modelling techniques to explore the relationship between variations throughout the protease and reverse transcriptase genes and drug susceptibility. The main techniques currently being employed are artificial neural networks, random forests and support vector machines. These models are trained using large amounts of data to predict the virological response to combination antiretroviral therapy.

The models are trained using data from large numbers of treatment change episodes (TCEs) in the RDI database. The following input data are provided, and the models are trained to predict a single output variable, the follow-up viral load:

  • Baseline viral load
  • Baseline genotype
  • Baseline CD4 count
  • Treatment history information
  • Drugs in new regimen
  • Time to follow-up
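
One way to picture a single training case is as a fixed-length numeric vector built from the six inputs listed above. The sketch below is only an assumption about how such an encoding might look: the `TCE` class, its field names, the units and the short `MUTATIONS` and `DRUGS` vocabularies are hypothetical and are not taken from the RDI database.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; a real model would cover many more items.
MUTATIONS = ["mut_a", "mut_b", "mut_c"]
DRUGS = ["drug_x", "drug_y", "drug_z"]


@dataclass
class TCE:
    """One treatment change episode (field names and units are assumptions)."""
    baseline_log_vl: float         # baseline viral load, log10 copies/ml
    baseline_mutations: frozenset  # mutations found in the baseline genotype
    baseline_cd4: float            # baseline CD4 count, cells/mm3
    prior_drugs: frozenset         # treatment history
    new_regimen: frozenset         # drugs in the new regimen
    weeks_to_follow_up: float      # time to follow-up


def flags(vocabulary, present):
    """Binary indicator (1.0 or 0.0) for each vocabulary item."""
    return [1.0 if item in present else 0.0 for item in vocabulary]


def encode(tce):
    """Flatten a TCE into the input vector presented to a model."""
    return ([tce.baseline_log_vl, tce.baseline_cd4, tce.weeks_to_follow_up]
            + flags(MUTATIONS, tce.baseline_mutations)
            + flags(DRUGS, tce.prior_drugs)
            + flags(DRUGS, tce.new_regimen))


example = TCE(4.7, frozenset({"mut_b"}), 250.0,
              frozenset({"drug_x"}), frozenset({"drug_y", "drug_z"}), 12.0)
vector = encode(example)  # 3 continuous values followed by 9 binary flags
```

The target for each such vector would be the follow-up viral load, which is held separately from the inputs.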

Once trained, the models are tested using the input variables from an independent test dataset. The models’ predictions of virological response for these test cases are compared with the actual virological responses in terms of the correlation and the mean absolute difference between them.
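
The two comparison measures can be sketched as follows, assuming the correlation meant here is the Pearson correlation coefficient (the text does not specify which). The predicted and actual values are invented numbers used only to exercise the functions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_absolute_difference(xs, ys):
    """Average of |predicted - actual| over all test cases."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Invented follow-up viral loads (log10 copies/ml) for three test cases.
predicted = [2.0, 3.0, 4.0]
actual = [2.5, 3.5, 4.5]
r = pearson_r(predicted, actual)
mad = mean_absolute_difference(predicted, actual)
```

In this invented example the predictions track the actual values perfectly in rank and spacing but are offset by a constant amount, so the correlation is high while the mean absolute difference is not zero. This is why the two measures are reported together.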

More details of computational modelling

Artificial neural networks (ANN)
An ANN model consists of several layers of neural units that are connected from the input layer to the hidden layers and from the hidden layers to the output layer. The relationship between the follow-up viral load and the baseline information is expressed by the weights on the connections between the neural units. The weights are adjusted during the training procedure and their final values are obtained by minimising an error function. In theory, a three-layer network (input, one hidden and output layer) with enough hidden units can approximate any continuous function, so only networks with a single hidden layer are used. A cross-validation scheme is used in the ANN modelling to assess the accuracy of the models during training and to generate the member models of an ANN committee.
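
The sketch below illustrates the ideas in this paragraph with a single-hidden-layer network trained by gradient descent on a squared-error function, plus a committee formed from the networks trained on each cross-validation fold. It is a minimal illustration, not the RDI's implementation: the tanh activation, the learning rate, the number of hidden units, the fold assignment and the averaging of committee outputs are all assumptions.

```python
import math
import random


def init_net(n_in, n_hidden, rng):
    """Random small weights for an input -> hidden -> single output network."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (hidden activations, output) for one input vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    return hidden, sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]


def train(net, data, epochs=500, lr=0.05):
    """Adjust the weights to reduce the error 0.5 * (output - target)**2."""
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(net, x)
            err = out - y
            for j, h in enumerate(hidden):
                grad_h = err * net["w2"][j] * (1.0 - h * h)  # back through tanh
                net["w2"][j] -= lr * err * h
                net["b1"][j] -= lr * grad_h
                for i, xi in enumerate(x):
                    net["w1"][j][i] -= lr * grad_h * xi
            net["b2"] -= lr * err
    return net


def mse(net, data):
    return sum((forward(net, x)[1] - y) ** 2 for x, y in data) / len(data)


def train_committee(data, k=3, n_hidden=3, seed=0):
    """Train one network per fold; keep each with its held-out error."""
    rng = random.Random(seed)
    committee = []
    for fold in range(k):
        train_part = [d for i, d in enumerate(data) if i % k != fold]
        held_out = [d for i, d in enumerate(data) if i % k == fold]
        net = train(init_net(len(data[0][0]), n_hidden, rng), train_part)
        committee.append((net, mse(net, held_out)))
    return committee


def committee_predict(committee, x):
    """Average the outputs of all committee members."""
    return sum(forward(net, x)[1] for net, _ in committee) / len(committee)
```

Each committee member is validated on the fold it did not see during training, so the held-out errors give an estimate of accuracy without touching the independent test dataset.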

Random Forests (RF)
An RF model consists of an ensemble of individual trees. The individual trees are built using different sets of samples from the original training dataset. At each node of a tree, the splitting feature is selected from a randomly chosen sample of features. There is no need for separate cross-validation in RF modelling because the training dataset for each individual tree is built by bootstrap replication. This leaves about one-third of the samples out of each bootstrap sample, and these left-out samples can be used for validation. The outputs of all trees are aggregated to produce a final prediction.
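
The bootstrap and out-of-bag ideas can be sketched in a few lines. To stay short, the example below replaces full trees with one-split regression "stumps", so the random choice of candidate features happens once per tree rather than at every node, and it aggregates by simple averaging. These simplifications, and all function names, are assumptions made for illustration.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def fit_stump(rows, targets, features):
    """Best single split among the candidate features, by squared error."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((y - ml) ** 2 for y in left)
                   + sum((y - mr) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:  # no usable split: always predict the mean
        m = sum(targets) / len(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, t, ml, mr = stump
    return ml if row[f] <= t else mr


def fit_forest(rows, targets, n_trees=50, n_candidates=1, seed=0):
    """Each tree sees a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = bootstrap_indices(n, rng)
        feats = rng.sample(range(n_features), n_candidates)
        stump = fit_stump([rows[i] for i in idx],
                          [targets[i] for i in idx], feats)
        forest.append((stump, set(range(n)) - set(idx)))  # keep out-of-bag set
    return forest


def predict_forest(forest, row):
    """Aggregate the trees by averaging their outputs."""
    return sum(predict_stump(s, row) for s, _ in forest) / len(forest)


def oob_mean_abs_error(forest, rows, targets):
    """Validate each sample using only the trees that never saw it."""
    errors = []
    for i, (row, y) in enumerate(zip(rows, targets)):
        preds = [predict_stump(s, row) for s, oob in forest if i in oob]
        if preds:
            errors.append(abs(sum(preds) / len(preds) - y))
    return sum(errors) / len(errors)
```

The "about one-third" figure follows from sampling with replacement: the chance that a given sample is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e, or roughly 0.37, as n grows.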

Support Vector Machines (SVM)

The principle of SVM is to map the data into a high-dimensional feature space and perform linear regression in that space. SVM training searches for a global solution, and model complexity is not controlled by keeping the number of input variables small. It is thought to be more resistant to ‘over-fitting’ the training dataset and, hence, potentially more generalisable to new data. The drawbacks of SVM are its high algorithmic complexity and the long time required for training.
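
The "map, then fit a linear model" principle can be sketched as below. This is a deliberately simplified illustration under several assumptions: the feature map `phi` is written out explicitly for a single input, whereas practical SVMs usually perform the mapping implicitly through a kernel function; the fit uses plain sub-gradient descent on a regularised epsilon-insensitive loss rather than a dedicated solver; and all parameter values are arbitrary.

```python
def phi(x):
    """Hypothetical feature map: lift a scalar input to (x, x squared)."""
    return [x, x * x]


def eps_loss(residual, eps):
    """Epsilon-insensitive loss: errors smaller than eps cost nothing."""
    return max(0.0, abs(residual) - eps)


def predict(w, b, x):
    """Linear model evaluated in the mapped feature space."""
    return sum(wi * fi for wi, fi in zip(w, phi(x))) + b


def mean_loss(w, b, xs, ys, eps):
    return sum(eps_loss(predict(w, b, x) - y, eps)
               for x, y in zip(xs, ys)) / len(xs)


def fit_linear_svr(xs, ys, eps=0.1, lam=0.01, lr=0.01, epochs=2000):
    """Minimise 0.5*lam*|w|^2 + mean epsilon-insensitive loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(xs, ys):
            residual = predict(w, b, x) - y
            if abs(residual) > eps:  # only points outside the tube contribute
                sign = 1.0 if residual > 0 else -1.0
                for j, fj in enumerate(phi(x)):
                    grad_w[j] += sign * fj / n
                grad_b += sign / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# A curved relationship that is linear in the mapped space.
xs = [i / 5 - 1.0 for i in range(11)]
ys = [x * x for x in xs]
w, b = fit_linear_svr(xs, ys)
```

Because the objective is convex, any minimum found is a global one, which is the sense in which SVM training searches for a global solution. The regularisation term, rather than a limit on the number of inputs, is what restrains model complexity in this sketch.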

 

©Copyright RDI 2003-2007 - All Rights Reserved