The predictions of treatment response are made by a selection of random forest models. The models used depend on the data available and entered into the system by the user. For example, if a recent viral genotype is available and can be uploaded (or the mutations entered), then models that have been trained to use this information will be used. If not, models that do not require a genotype will be used. For each case, the prediction is made by a ‘committee’ of 5-10 appropriate models, whose outputs are averaged to give the final prediction.
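The selection-and-averaging step described above can be sketched in Python as follows. This is a minimal illustration, not RDI's implementation. The `TrainedModel` class, the input names and the rule "prefer the models that use the most inputs" are all hypothetical, and each `predict` callable stands in for a trained random forest.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Mapping, Optional, Sequence

# Hypothetical representation of the user's inputs; None marks a value that was not entered.
Inputs = Mapping[str, Optional[object]]


@dataclass(frozen=True)
class TrainedModel:
    name: str
    required: frozenset                  # names of the inputs this model needs
    predict: Callable[[Inputs], float]   # returns P(viral load < 50 copies/ml)


def select_committee(models: Sequence[TrainedModel], inputs: Inputs) -> List[TrainedModel]:
    """Pick the models that can run on the entered data, preferring those using the most inputs."""
    available = {key for key, value in inputs.items() if value is not None}
    usable = [m for m in models if m.required <= available]
    if not usable:
        raise ValueError("no model can be run with the data provided")
    # Hypothetical rule: when a genotype is present, genotype models outrank models that ignore it.
    most = max(len(m.required) for m in usable)
    return [m for m in usable if len(m.required) == most]


def committee_prediction(models: Sequence[TrainedModel], inputs: Inputs) -> float:
    """Average the outputs of the selected committee to give the final prediction."""
    committee = select_committee(models, inputs)
    return mean(m.predict(inputs) for m in committee)
```

With a genotype entered, only the genotype models vote; without one, the committee falls back to the models that do not need it.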
There are two classes of models: those that estimate the probability of a virological response (plasma HIV RNA <50 copies/ml), which are called ‘Classifier’ (C) models, and those that estimate the absolute viral load at different time points, which are called ‘Absolute’ (A) models.
The models vary in accuracy. The C models, which estimate the probability of response (plasma HIV RNA <50 copies/ml), achieve an accuracy of around 80% in independent testing. Those that use a genotype in their predictions usually perform 2-3 percentage points better than those that do not. Models that can make predictions with missing values (e.g. when the baseline CD4 count is not available) are generally a couple of percentage points less accurate. The exact accuracy of the models in training and testing can be found in our version history and in our publications.
Models that estimate the absolute viral load (A models) achieve a correlation coefficient of around 0.7 with actual viral load values in independent testing, and a mean absolute error of about 0.7 log10 copies/ml.
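The two test statistics quoted above can be computed as in the following sketch. It assumes predicted and observed viral loads are compared on the log10 scale; the function names are illustrative and not part of any RDI tool.

```python
import math
from statistics import mean
from typing import List, Sequence


def to_log10(copies_per_ml: Sequence[float]) -> List[float]:
    """Convert viral loads from copies/ml to log10 copies/ml."""
    return [math.log10(v) for v in copies_per_ml]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference, in the units of the inputs (here log10 copies/ml)."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))
```

Note that the two statistics measure different things: a model whose predictions are all exactly 0.5 log10 too high has a perfect correlation of 1.0 but still a mean absolute error of 0.5 log10.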
The output of the C models is an estimate of the probability of the HIV viral load going below 50 copies HIV RNA/ml following a change of antiretroviral treatment. During the development and cross-validation of the models, we identified the optimum operating point (OOP): a cut-off value for response and failure that provides the best overall accuracy of the system. Any estimated probability of response below that value is classified as a prediction of failure, and any above it as a prediction of success. To optimise the performance of the system across different drugs and regimens, we use different OOPs for regimens containing certain drugs.
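A minimal sketch of how an OOP can be chosen and then applied is shown below. The accuracy-maximising search over candidate cut-offs follows the description above. Several details are assumptions: a probability exactly equal to the OOP counts as a response, the drug-specific OOP table is hypothetical, and when a regimen contains several drugs with their own OOP the highest one is used. None of these is stated in the source.

```python
from typing import Iterable, Mapping, Sequence


def find_oop(probabilities: Sequence[float], responded: Sequence[bool]) -> float:
    """Return the cut-off giving the highest overall accuracy on validation cases."""
    best_cut, best_accuracy = 0.5, -1.0
    for cut in sorted(set(probabilities)):
        # A case is predicted to respond when its probability is at or above the cut-off.
        correct = sum((p >= cut) == r for p, r in zip(probabilities, responded))
        accuracy = correct / len(probabilities)
        if accuracy > best_accuracy:
            best_cut, best_accuracy = cut, accuracy
    return best_cut


def classify(probability: float, regimen: Iterable[str],
             default_oop: float, drug_oops: Mapping[str, float]) -> str:
    """Turn a probability into a response/failure prediction using a regimen-specific OOP."""
    # Hypothetical rule: if several drugs in the regimen have their own OOP, use the highest.
    specific = [drug_oops[d] for d in regimen if d in drug_oops]
    oop = max(specific) if specific else default_oop
    return "response" if probability >= oop else "failure"
```

For example, an estimated probability of 0.55 would be a predicted response under a default OOP of 0.5, but a predicted failure for a regimen containing a drug assigned an OOP of 0.6.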
The RDI database contains data from over 250,000 patients. Treatment change episodes (TCEs) that meet the criteria required for training each of the different types of models are extracted from this database. The training sets typically contain between 20,000 and 50,000 TCEs. The models are then tested with independent test sets that are typically 5% of the size of the training set.
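One way to carry out the extraction and the independent test split is sketched below. The `TCE` fields and the selection criteria are simplified, hypothetical stand-ins for RDI's actual criteria. Holding out whole patients is an assumption about how independence might be achieved; the source does not say how the test sets are drawn.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class TCE:
    """Hypothetical, simplified treatment change episode."""
    patient_id: str
    baseline_viral_load: Optional[float]   # copies/ml
    follow_up_viral_load: Optional[float]  # copies/ml
    genotype: Optional[Tuple[str, ...]]    # mutations, if a genotype is available


def meets_criteria(tce: TCE, needs_genotype: bool) -> bool:
    """Keep only TCEs carrying every field a given model type needs."""
    if tce.baseline_viral_load is None or tce.follow_up_viral_load is None:
        return False
    return tce.genotype is not None or not needs_genotype


def split_by_patient(tces: Sequence[TCE], test_ratio: float = 0.05,
                     seed: int = 0) -> Tuple[List[TCE], List[TCE]]:
    """Hold out whole patients so that no test patient also appears in training.

    test_ratio is relative to the training set, so 0.05 gives a test set
    about 5% of the size of the training set (counted in patients, not TCEs).
    """
    patients = sorted({t.patient_id for t in tces})
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_ratio / (1 + test_ratio))
    test_ids = set(patients[:n_test])
    train = [t for t in tces if t.patient_id not in test_ids]
    test = [t for t in tces if t.patient_id in test_ids]
    return train, test
```

Because the split is made on patients, the TCE counts only approximate the 5% ratio when patients contribute different numbers of episodes.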
The system predicts response to a new antiretroviral treatment based on the individual patient’s treatment history (the drugs to which they have been exposed and an estimate of the time they have spent on treatment), baseline genotype, viral load and CD4 count, and the time to follow-up, as entered by the healthcare professional. Different models have been trained to make predictions when some of these data are missing.
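How these inputs might be turned into model features is sketched below. The encoding (binary drug flags, log10 viral load, follow-up time in weeks, and simply leaving out any input that was not entered) is an assumption for illustration; RDI's actual feature set is not described here. The drug and mutation names in the usage are illustrative.

```python
import math
from typing import Dict, Iterable, Optional, Set


def encode_case(
    known_drugs: Iterable[str],        # hypothetical list of drugs the models were trained on
    known_mutations: Iterable[str],    # hypothetical list of mutations the models use
    drugs_exposed: Set[str],
    new_regimen: Set[str],
    years_on_treatment: float,
    baseline_viral_load: float,        # copies/ml
    follow_up_weeks: float,
    baseline_cd4: Optional[float] = None,      # cells/mm3, if available
    mutations: Optional[Set[str]] = None,      # None when no genotype is available
) -> Dict[str, float]:
    """Build a named feature set; missing inputs are simply left out."""
    features: Dict[str, float] = {}
    for drug in known_drugs:
        features[f"prior_{drug}"] = 1.0 if drug in drugs_exposed else 0.0
        features[f"new_{drug}"] = 1.0 if drug in new_regimen else 0.0
    features["years_on_treatment"] = years_on_treatment
    features["baseline_log10_vl"] = math.log10(baseline_viral_load)
    features["follow_up_weeks"] = follow_up_weeks
    if baseline_cd4 is not None:
        features["baseline_cd4"] = baseline_cd4
    if mutations is not None:
        for mutation in known_mutations:
            features[f"mut_{mutation}"] = 1.0 if mutation in mutations else 0.0
    return features
```

For example, `encode_case(["AZT", "3TC", "EFV"], ["M184V", "K103N"], {"AZT", "3TC"}, {"EFV"}, 3.5, 10000.0, 8, mutations={"M184V"})` yields genotype features but no `baseline_cd4` entry, so only models trained without a baseline CD4 count could use it.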
Each time we develop a new set of models, we select cases from our database with all the information required to develop those particular models. We then check how much data there is on treatment with each of the available drugs. Sometimes there is insufficient data on a particular drug to include it in those models. This particularly applies to drugs that have not been in use for long, as the clinics may not yet have collected and provided sufficient data to RDI at the time of model development.
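The data check described above amounts to counting, for each drug, the selected TCEs in which it appears. In the sketch below, the threshold `min_tces` is a hypothetical parameter; the source does not state the minimum amount of data RDI requires before a drug can be included.

```python
from collections import Counter
from typing import Iterable, Sequence, Set, Tuple


def split_drugs_by_data(regimens: Iterable[Sequence[str]],
                        min_tces: int) -> Tuple[Set[str], Set[str]]:
    """Return (drugs with enough TCEs to model, drugs with too few)."""
    # Count each drug at most once per TCE, even if listed twice in a regimen.
    counts = Counter(drug for regimen in regimens for drug in set(regimen))
    included = {d for d, n in counts.items() if n >= min_tces}
    excluded = set(counts) - included
    return included, excluded
```

A recently introduced drug would typically fall into the second set until enough TCEs involving it have been collected.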
To date, the system has over 700 users in 88 countries. New users sign up and new cases are entered every month.
Log in to TRePS, then select 'My Account' in the top right-hand corner of the home screen. On that screen, select 'Account Details'. At the bottom of the screen you will see an option entitled 'Email Reports'; set it to 'Yes' or 'No' and then save your preference.
The HIV Resistance Response Database Initiative is a not-for-profit group set up in 2002 as a wholly independent international body to:
The RDI consists of a small research team based in the UK and a large global network of advisors, research partners and data donors.
The data is donated to the RDI by hospitals, clinics, research programmes, pharmaceutical companies and other institutions and groups around the world.