Frequently Asked Questions

The Models

What models does the system use to make its predictions?

The predictions of treatment response are made by a selection of random forest models. The models used depend on the data available and entered into the system by the user. For example, if a recent viral genotype is available and can be uploaded (or the mutations entered), models that have been trained to use this information will be used. If not, models that do not require a genotype will be used. For each case, the predictions are made by a ‘committee’ of 5-10 appropriate models and their outputs are averaged to give the final prediction.
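
The committee averaging described above can be illustrated with a minimal sketch. This is not the system's actual code: the function name and the five probability values are hypothetical, and it assumes a simple unweighted mean of the individual model outputs.

```python
from statistics import mean


def committee_prediction(model_outputs):
    """Return the unweighted mean of the committee members' outputs."""
    if not model_outputs:
        raise ValueError("the committee needs at least one model output")
    return mean(model_outputs)


# Hypothetical probabilities of response from a committee of five models.
outputs = [0.62, 0.70, 0.58, 0.66, 0.64]
final_prediction = committee_prediction(outputs)
print(f"Committee estimate: {final_prediction:.2f}")
```

Averaging several models in this way tends to smooth out the errors of any single model, which is one common reason for using a committee.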

There are two classes of models: those that estimate the probability of a virological response (plasma HIV RNA <50 copies/ml), which are called ‘Classifier’ (C) models and those that estimate the absolute viral load at different time points, which are called ‘Absolute’ (A) models.

How reliable are the predictions made by the system when the genotype is included?

The models vary in accuracy. Models that estimate the probability of response (viral load <50 copies HIV RNA/ml), the C models, achieve accuracy of around 80% in independent testing. Those that use a genotype in their predictions usually perform 2-3 percent better than those that do not. Models that can make predictions with missing values (for example, when the baseline CD4 count is not available) are generally a couple of percentage points less accurate. The exact accuracy of the models in training and testing can be found in our version history and in our publications.

Models that estimate the absolute viral load correlate with actual viral load values in independent testing with a correlation coefficient of around 0.7 and a mean absolute error of about 0.7 log.
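
For readers unfamiliar with these two measures, the sketch below shows how a correlation coefficient and a mean absolute error could be computed for a set of predictions. It assumes Pearson's correlation and viral loads expressed in log10 copies/ml; the function names and the three data points are hypothetical and are not taken from the RDI's test sets.

```python
from math import sqrt


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical viral loads in log10 copies/ml.
predicted_log_vl = [2.0, 3.0, 4.0]
actual_log_vl = [2.5, 2.5, 4.5]

mae = mean_absolute_error(predicted_log_vl, actual_log_vl)
r = pearson_r(predicted_log_vl, actual_log_vl)
print(f"MAE = {mae:.2f} log, r = {r:.2f}")
```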

How do you classify the system’s outputs as response or failure?

The output of the models is an estimate of the probability of the HIV viral load going below 50 copies HIV RNA/ml following a change of antiretroviral treatment. During the development and cross validation of the models we identified the optimum operating point (OOP) - a cut-off value for response and failure that provides the best overall accuracy of the system. Any estimated probability of response below that value is classified as a prediction of failure and any above as a prediction of success. In order to optimise the performance of the system across all different drugs and regimens, we use different OOPs for regimens containing certain drugs.
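
The cut-off logic can be sketched as follows. The OOP values here are hypothetical and are not the ones used by the system, and the treatment of a probability exactly equal to the OOP (classified here as failure) is an assumption, since only values below and above the cut-off are described.

```python
def classify(probability, oop):
    """Classify an estimated probability of response against a cut-off.

    Values above the OOP are predictions of response; values below are
    predictions of failure. A value exactly equal to the OOP is treated
    as failure here, which is an assumption of this sketch.
    """
    return "response" if probability > oop else "failure"


# The same estimate can be classified differently under different OOPs,
# which is why regimen-specific cut-offs matter. Both OOPs are hypothetical.
estimate = 0.55
print(classify(estimate, oop=0.50))  # response
print(classify(estimate, oop=0.60))  # failure
```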

How much data was used to train the models used by the system?

The RDI database contains data from over 250,000 patients. From this database, treatment change episodes (TCEs) are extracted that meet the criteria required for training each of the different types of models. The training sets typically contain between 20,000 and 50,000 TCEs. The models are then tested with independent test sets that are typically 5% of the size of the training set.

What data are the predictions based on?

The system makes its predictions of response to a new antiretroviral treatment based on the individual patient’s treatment history (the drugs to which they have been exposed and an estimate of the time they have spent on treatment), the baseline genotype, viral load, CD4 count and the time to follow-up, as entered by the healthcare professional. Different models have been trained to make predictions with certain of these data missing.


General questions about HIV-TRePS

Why can’t I get predictions for rilpivirine or dolutegravir from your system?

Each time we develop a new set of models we select cases from our database with all the information required to develop those particular models.  Having done so we then check to see how much data there is on treatment with each of the available drugs.  Sometimes there is insufficient data on a particular drug to include it in those particular models.  This particularly applies to drugs that have not been in use for long as there may be insufficient data collected by the clinics and provided to RDI at the time of model development.

How many people are using the HIV-TRePS system?

To date, over 700 users in 88 countries are using the system. New users are signing up and new cases are being entered each month.

How can I turn on/off TRePS reports via email?

Log in to TRePS and select 'My Account' in the top right-hand corner of the home screen, then select 'Account Details'. At the bottom of that screen you will see an option entitled 'Email Reports'. Change it to 'Yes' or 'No' and then save your preference.

Where can I read the TRePS Terms and Conditions and Privacy Policy?

You can read the TRePS terms and conditions and TRePS privacy policy here.



What is the RDI?

The HIV Resistance Response Database Initiative is a not-for-profit group set up in 2002 as a wholly independent international body to:

  1. Be a global repository for HIV resistance and outcome data
  2. Use these data to develop computational models to predict patients' response to antiretroviral treatment
  3. Make such models available free-of-charge over the internet

More information

Who are the RDI?

The RDI consists of a small research team based in the UK and a large global network of advisors, research partners and data donors.

Where does your data come from?

The data is donated to the RDI by hospitals, clinics, research programmes, pharmaceutical companies and other institutions and groups around the world. More information.