Predictive analytics in health care: how can we know it works?

Abstract

The current interest in predictive analytics for improving health care is reflected by a surge in long-term investment in developing new technologies using artificial intelligence and machine learning to forecast future events (possibly in real time) to improve the health of individuals. Predictive algorithms, or clinical prediction models as they have historically been called, help identify individuals at increased likelihood of disease, for both diagnosis and prognosis (see Supplementary Material Table S1 for a glossary of terms used in this manuscript). In an era of personalized medicine, predictive algorithms are used to make clinical management decisions based on individual patient characteristics (rather than on population averages) and to counsel patients. The rate at which new algorithms are published shows no sign of abating, particularly with the increasing availability of Big Data, medical imaging, routinely collected electronic health records, and national registry data. The scientific community is making efforts to improve data sharing, increase study registration beyond clinical trials, and make reporting transparent and comprehensive with full disclosure of study results. We discuss the importance of transparency in the context of medical predictive analytics.

ALGORITHM PERFORMANCE IS NOT GUARANTEED: FULLY INDEPENDENT EXTERNAL VALIDATION IS KEY

Before recommending a predictive algorithm for clinical practice, it is important to know whether and for whom it works well. First, predictions should discriminate between individuals with and without the disease (ie, predictions should be higher in those with the disease than in those without). Risk predictions should also be accurate (often referred to as calibrated). Algorithm development may suffer from overfitting, which usually results in poorer discrimination and calibration when the algorithm is evaluated on new data. Although the clinical literature tends to focus on discrimination, calibration is clearly crucial: inaccurate risk predictions can lead to inappropriate decisions or expectations, even when discrimination is good. Calibration has therefore been labeled the Achilles heel of prediction.
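As a minimal sketch of how these two aspects can be checked on a validation data set, the c-statistic (area under the ROC curve) summarizes discrimination, and a logistic regression of the observed outcome on the log-odds of the predicted risks yields a calibration intercept and slope (ideally close to 0 and 1). The predicted risks and outcomes below are simulated placeholders, not data from any real algorithm.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

# hypothetical validation data: predicted probabilities and observed 0/1 outcomes
rng = np.random.default_rng(1)
pred_risk = rng.uniform(0.01, 0.99, 1000)
y = rng.binomial(1, pred_risk)

# discrimination: c-statistic (area under the ROC curve)
c_stat = roc_auc_score(y, pred_risk)

# calibration: regress the outcome on the log-odds of the predictions;
# good calibration gives an intercept near 0 and a slope near 1
lp = np.log(pred_risk / (1 - pred_risk))
fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
cal_intercept, cal_slope = fit.params
print(c_stat, cal_intercept, cal_slope)
```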

In addition, there is often substantial heterogeneity between populations, as well as changes in populations over time. For example, there may be differences between patients in academic versus regional hospitals, between ethnic groups, or between past and contemporary patients due to advances in patient care. Recent work indicated that the half-life of clinical data relevance can be remarkably short. Hence, algorithms are likely to perform differently across centers, settings, and time. On top of overfitting and heterogeneity between populations, operational heterogeneity can affect algorithm performance. Different hospitals may, for example, use different electronic health record (EHR) software, imaging machines, or marker kits. As a result, the clinical utility of predictive algorithms for decision-making may vary greatly.

It is well established that “internal validation” of performance using, for example, a train-test split of the available data is insufficient. Rather, algorithms should undergo “external validation” on a different data set. Notably, algorithms developed using traditional study designs may not validate well when applied to EHR data. It is important to stress 3 issues. First, external validation should be extensive: it should take place at various sites in contemporary cohorts of patients from the targeted population. Second, performance should be monitored over time. Third, external validation by independent investigators is imperative. It is a good evolution to include an external validation as part of the algorithm development study, but one can imagine that algorithms with poor performance on a different data set may be less likely to get published in the first place.

If performance in a specific setting is poor, an algorithm can be updated, specifically its calibration. To counter temporal changes in populations, continual updating strategies may help. For example, the QRISK2 models (www.qrisk.org) are updated regularly as new data are continually collected.
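One common form of such an update is recalibration-in-the-large: keep the original coefficients fixed and re-estimate only the intercept on local data. The sketch below illustrates this under the assumption that the local outcomes and the original algorithm's predicted risks are available; all variable names and numbers are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

# hypothetical local validation data: predicted risks from the original
# algorithm (pred_risk) and observed 0/1 outcomes (y_local)
rng = np.random.default_rng(2)
pred_risk = rng.uniform(0.05, 0.6, 800)
y_local = rng.binomial(1, np.clip(pred_risk * 1.4, 0, 1))  # deliberately miscalibrated

lp = np.log(pred_risk / (1 - pred_risk))  # original linear predictor (log-odds)

# recalibration-in-the-large: re-estimate only the intercept, keeping the
# original coefficients fixed by entering the linear predictor as an offset
fit = sm.GLM(y_local, np.ones((len(y_local), 1)),
             family=sm.families.Binomial(), offset=lp).fit()
delta = fit.params[0]                          # correction to the intercept
updated_risk = 1 / (1 + np.exp(-(lp + delta)))
```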

POTENTIAL HURDLES FOR MAKING PREDICTIVE ALGORITHMS PUBLICLY AVAILABLE

To allow others to independently evaluate an algorithm's predictive accuracy, it is important to describe in full detail how the algorithm was developed. Algorithms should also be available in a format that can readily be implemented by others. Not adhering to these principles severely limits the usefulness of the findings and is surely research waste. An analogous situation would be an article describing the findings of a randomized clinical trial without actually reporting the intervention effect or how to implement the intervention.

Transparent and full reporting

The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, a reporting guideline for studies on predictive algorithms, recommends that the equation behind an algorithm be presented in the publication describing its development. More explicitly, the mathematical formula of an algorithm should be available in full. This includes details such as which predictors are included, how they are coded (including the ranges of any continuous predictors and the units of measurement), and the values of the regression coefficients. Publications presenting new algorithms often fail to include key information such as the specification of the baseline risk (namely, the intercept in logistic regression models for binary outcomes, or the baseline hazard at 1 or more clinically relevant time points for time-to-event regression models). Without this information, making predictions is not possible. Below, we expand on modern artificial intelligence methods that do not produce straightforward mathematical equations.
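For illustration, the sketch below applies a fully reported logistic regression equation to obtain an individual prediction; the intercept, coefficients, and predictor coding are entirely hypothetical. Without the intercept, the same calculation would be impossible.

```python
import math

# hypothetical published logistic model: intercept plus coefficients,
# with predictor coding and units stated alongside (as TRIPOD recommends)
intercept = -5.2
coefs = {"age_per_10y": 0.45, "smoker": 0.80, "sbp_per_10mmHg": 0.30}

def predicted_risk(age_years, smoker, sbp_mmHg):
    lp = (intercept
          + coefs["age_per_10y"] * (age_years / 10)
          + coefs["smoker"] * int(smoker)
          + coefs["sbp_per_10mmHg"] * (sbp_mmHg / 10))
    return 1 / (1 + math.exp(-lp))

print(predicted_risk(65, True, 140))  # predicted probability for one individual
```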

Online calculators and mobile apps

It has become customary to implement algorithms as online calculators or mobile apps. We then depend on the researchers’ openness to provide clear and honest information about algorithm development and the results of validation studies, with references to relevant publications. For example, FRAX predicts the 10-year probability of hip fracture and major osteoporotic fracture (www.sheffield.ac.uk/FRAX/). FRAX is a collection of algorithms (eg, 68 country-specific equations), which are freely available via a website interface and commercially available via a desktop application. However, none of these algorithms has been published in full. The release notes indicate that the algorithms are continually revised, but do not offer detailed information. This lack of full disclosure prohibits independent evaluation. In theory, we can try “reverse engineering” by reconstructing the equation from risk estimates for a sample of patients (see Supplementary Material). However, such reverse engineering is not a realistic solution. The solution is to avoid hidden algorithms.
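To show why reverse engineering is possible in principle yet impractical, the sketch below assumes the hidden calculator is a plain logistic regression (which it need not be) and simulates querying it for a handful of patient profiles; in reality, each query would require manual entry into the web tool, and the functional form would be unknown. The calculator, its coefficients, and the profiles are all hypothetical.

```python
import numpy as np

# stand-in for the hidden calculator: a secret logistic model simulates
# the risk returned by the web tool for a given patient profile
_secret = np.array([-6.0, 0.05, 0.7, 0.02])  # intercept, age, smoker, SBP
def query_calculator(age, smoker, sbp):
    lp = _secret[0] + _secret[1] * age + _secret[2] * smoker + _secret[3] * sbp
    return 1 / (1 + np.exp(-lp))

profiles = np.array([[60, 0, 120], [60, 1, 120], [70, 0, 140],
                     [70, 1, 160], [50, 0, 110], [80, 1, 150]], dtype=float)
risks = np.array([query_calculator(*p) for p in profiles])

# if risk = 1 / (1 + exp(-(b0 + X @ b))), the logit of the risk is linear in the
# predictors, so least squares on the logits recovers intercept and coefficients
X = np.column_stack([np.ones(len(profiles)), profiles])
recovered, *_ = np.linalg.lstsq(X, np.log(risks / (1 - risks)), rcond=None)
print(recovered)  # approximately equal to the hidden coefficients
```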

Online or mobile calculators allow the inclusion of algorithms in daily clinical routine, which is a positive evolution. However, they are impractical for large-scale independent validation studies, because the information for every single patient has to be entered manually.

Machine learning algorithms

Machine learning methods, such as random forests or deep learning, are becoming increasingly popular for developing predictive algorithms. The architecture of these algorithms is often too complex to fully disentangle and report the relation between a set of predictors and the outcome (“black box”). This is the problem most commonly addressed when discussing the transparency of predictive analytics based on machine learning. We argue that algorithm availability is at least as important. A similar problem can affect regression-based algorithms that use complex spline functions to model continuous predictors. Software implementations are therefore imperative for validation purposes, particularly because these algorithms have a higher risk of overfitting and unstable performance. Machine learning algorithms can be stored in computer files that may be transferred to other computers to allow validation studies. Recently, initiatives in this direction have been set up.
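As a minimal sketch of this idea, a fitted model object can be serialized to a file and shared so that independent investigators can generate predictions on their own patients without retraining. The example uses scikit-learn and joblib on synthetic stand-in data; the file name and data are hypothetical, and other serialization formats could equally be used.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for the development data (hypothetical)
X_dev, y_dev = make_classification(n_samples=2000, n_features=10, random_state=0)

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(X_dev, y_dev)

# store the fitted model so independent investigators can load it and run an
# external validation on their own patients
joblib.dump(model, "risk_model_v1.joblib")

# at the validating site
model = joblib.load("risk_model_v1.joblib")
# external_risk = model.predict_proba(X_external)[:, 1]  # X_external: local data
```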

Proprietary algorithms

Developers may choose not to disclose an algorithm and instead offer it on a fee-for-service basis. For example, a biomarker-based algorithm to diagnose ovarian cancer has a cost of $897 per patient (http://vermillion.com/2436-2/). Assume we want to validate this algorithm in a center where 20% of the target population has a malignancy. If we want to recruit at least 100 patients in each outcome group, following current recommendations for validation studies, the study needs at least 500 patients. This implies a minimum cost of $448 500 to obtain useful information about whether this algorithm works in this particular center. It is important to emphasize that this is just the cost required to judge whether the algorithm has any validity in this setting; there is no guarantee that it will be clinically useful.
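The arithmetic behind this estimate is straightforward, using the price and minimum group sizes stated above:

```python
# sample-size and cost arithmetic for the fee-for-service validation example
cost_per_patient = 897        # US dollars per prediction
prevalence = 0.20             # proportion with malignancy in the target population
min_per_group = 100           # minimum patients required in each outcome group

n_needed = max(min_per_group / prevalence, min_per_group / (1 - prevalence))
print(n_needed)                       # 500 patients
print(n_needed * cost_per_patient)    # 448500 dollars
```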

Many predictive algorithms have been developed with financial support from public institutions. In such cases, we believe that the results belong to the community and should be fully and publicly available. Even then, asking a small installation fee for an attractive and user-friendly calculator is defensible to cover software development and to generate resources for maintenance and improvements. Such implementations facilitate uptake and inclusion in daily workflow.

Private companies may invest in the development of an algorithm that uses predictors for which the company offers measurement tools (eg, kits, biomarkers). In these instances, the return on investment should focus on the measurement tools, not on selling the algorithm. We argue that it is ethically unacceptable to have a business model that focuses on selling an algorithm. However, such business models may facilitate Food and Drug Administration (FDA) approval or Conformité Européenne (CE) marking of predictive algorithms (eg, https://www.hcanews.com/news/predictive-patient-surveillance-system-receives-fda-clearance). It is important to realize that regulatory approval does not imply clinical validity or usefulness of a predictive algorithm in a specific clinical setting.

THE IMPORTANCE OF ALGORITHM METADATA IN ORDER TO MAKE ALGORITHMS WORK

Although making algorithms fully and publicly available is imperative, the context of the algorithm is equally important. This extends the abovementioned issue of full and transparent reporting according to the TRIPOD guidelines. Reporting should provide full details of algorithm development practices. This includes, but is not limited to, the source of study data (eg, retrospective EHR data, randomized controlled trial data, or prospectively collected cohort data), the number and type of participating centers, the patient recruitment period, inclusion and exclusion criteria, clear definitions of the predictors and the outcome, details on how variables were measured, detailed information on missing values and how these were handled, and a full account of the modeling strategy (eg, predictor selection, handling of continuous variables, hyperparameter tuning). Unfortunately, studies reveal time and again that such metadata are poorly reported. Even when authors develop an algorithm using sensible procedures (eg, with low risk of overfitting), poor reporting will lead to poor understanding of the context, which may contribute to decreased performance at external validation. Initiatives such as the Observational Health Data Sciences and Informatics (OHDSI; http://ohdsi.org) collaborative focus on such contextual differences and aim to standardize procedures (eg, in terms of terminology, data formats, and definitions of variables) to yield better and more applicable predictive algorithms. In addition, when an algorithm is made available electronically, we recommend that it include an indication of the extent to which it has been validated.
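As a hedged illustration of what such metadata might look like in machine-readable form, the record below lists the kinds of items enumerated above; the field names and values are hypothetical and do not represent a proposed standard.

```python
# hypothetical machine-readable metadata accompanying a published algorithm;
# field names and values are illustrative only
algorithm_metadata = {
    "name": "example_risk_model",
    "version": "1.2",
    "data_source": "retrospective EHR extract",
    "centers": {"number": 4, "type": "academic and regional hospitals"},
    "recruitment_period": "2012-01 to 2017-12",
    "inclusion_criteria": ["adults (>=18 y)", "first admission for the target condition"],
    "exclusion_criteria": ["palliative care at admission"],
    "outcome": {"definition": "30-day mortality", "ascertainment": "registry linkage"},
    "predictors": {"age": "years", "sbp": "mm Hg, value at admission"},
    "missing_data": "multiple imputation (m = 20)",
    "modeling": {"method": "logistic regression",
                 "predictor_selection": "none (prespecified)",
                 "continuous_predictors": "restricted cubic splines"},
    "validation_status": "internally validated; external validation ongoing",
}
```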

CONCLUSION

Predictive algorithms should be fully and publicly available to facilitate independent external validation across various settings (Table 1). For complex algorithms, alternative and innovative solutions are needed; a calculator is a minimal requirement, but downloadable software to batch process multiple records is more efficient. We believe that selling predictions from an undisclosed algorithm is unethical. This article does not touch on legal consequences of using predictive algorithms, where issues such as algorithm availability or black-box predictions cannot be easily ignored. When journals consider manuscripts introducing a predictive algorithm, its availability should be a minimum requirement before acceptance. Clinical guideline documents should focus on publicly available algorithms that have been independently validated.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

FUNDING

This work was funded by the Research Foundation – Flanders (grant G0B4716N) and Internal Funds KU Leuven (grant C24/15/037). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

AUTHOR CONTRIBUTORS

Conception: BVC, LW, DT, EWS, GSC. Writing – original draft preparation: BVC. Writing – review and editing: BVC, LW, DT, EWS, GSC. All authors approved the submitted version and agreed to be accountable.

Conflict of interest statement. LW is a postdoctoral fellow of the Research Foundation – Flanders. GSC was supported by the NIHR Biomedical Research Centre, Oxford.
