Algorithm to predict pancreatic cancer risk based on disease trajectory

The main question here is: is it possible to predict pancreatic cancer? The answer is less obvious than we would have thought. However, we came across a publication describing a deep learning algorithm that predicts the risk of pancreatic cancer from disease trajectories.

The study used machine learning to exploit the information encoded in the time sequence of clinical events. It initially drew on the Danish National Patient Registry (DNPR), which covers 8.6 million patients between 1977 and 2018. Subsequently, a smaller group of patients from the United States Veterans Affairs (US-VA) Corporate Data Warehouse (CDW) was included. To extract as much predictive information as possible from these records, a variety of machine learning methods were evaluated.

Pancreatic cancer is a very insidious disease: in its initial stages it often causes no symptoms, so it is frequently detected only at a late stage. Patients diagnosed early have a chance of being cured through a combination of treatments, including surgery, chemotherapy, and radiotherapy. It is therefore important to better understand the risk factors for pancreatic cancer in order to detect it as early as possible.

Risk Factors: Challenges and new approaches in diagnosis

Age is one of the main risk factors for this disease. However, pancreatic cancer screening based solely on age is impractical, because it would lead to costly clinical examinations for large numbers of patients with false-positive results.

Medical practitioners assess the risk of developing pancreatic cancer by considering family history, behavioral and clinical risk factors, and, more recently, circulating biomarkers and genetic predisposition.

The study aimed to predict the risk of pancreatic cancer from the real-world longitudinal clinical records of a large number of patients. The goal was to identify a moderate number of high-risk patients and thereby facilitate the design of affordable surveillance programs for early detection.

Use of deep neural networks for pancreatic cancer risk prediction from sequential medical data

Developing realistic risk prediction models necessitates the careful selection of suitable machine learning (ML) methods. Deep learning (DL) techniques, specifically designed to handle large and noisy sequential data sets, are particularly promising in this regard.

To assess pancreatic cancer risk, ML predictive models have previously been built using data from:

health interview questionnaires,
family physicians' medical records, with patients who had other types of cancer used as controls,
actual hospital system data.

Although previous studies have demonstrated the informative value of medical records for cancer risk, they only used the occurrence of disease codes and not the time sequence of disease states on the patient’s trajectory.
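The difference between the two views can be illustrated with a minimal Python sketch. The two toy trajectories and the ICD-10-style codes below are hypothetical and are not taken from the study; the point is only that two patients can share exactly the same set of codes while differing in order and timing.

```python
from collections import Counter

# Hypothetical toy trajectories: (age_in_years, diagnosis_code) pairs.
# The codes are illustrative only and do not come from the study data.
patient_a = [(52.0, "E11"), (55.5, "K85"), (56.0, "R63")]
patient_b = [(41.0, "R63"), (48.0, "K85"), (56.0, "E11")]


def bag_of_codes(trajectory):
    """Occurrence-only view: which codes appear, and how often."""
    return Counter(code for _, code in trajectory)


def ordered_sequence(trajectory):
    """Sequence view: codes sorted by time, plus the gaps between events."""
    events = sorted(trajectory)
    codes = [code for _, code in events]
    gaps = [round(later[0] - earlier[0], 2)
            for earlier, later in zip(events, events[1:])]
    return codes, gaps


# The occurrence-only view cannot tell the two patients apart...
same_bag = bag_of_codes(patient_a) == bag_of_codes(patient_b)
# ...whereas the sequence view can.
same_sequence = ordered_sequence(patient_a) == ordered_sequence(patient_b)
print(same_bag, same_sequence)
```

A model that only sees the occurrence-only view treats these two patients identically, which is the limitation of earlier work described above.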

Pancreatic cancer: a new (improved) prediction algorithm?

A machine learning model for predicting cancer risk based on disease trajectory consists of:

input data for each event on the trajectory (diagnosis code and timestamps);
embedding event features on vectors of real numbers;
encoding a trajectory in lower dimensional latent space;
predicting time-dependent cancer risk.
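The components listed above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the vocabulary, the random (untrained) weights, the recency-weighted pooling used as the encoder, and the list of prediction intervals are all hypothetical stand-ins, and the paper itself evaluated several trained neural network architectures.

```python
import math
import random

random.seed(0)  # reproducible toy weights; a real model would learn these

VOCAB = {"E11": 0, "K85": 1, "R63": 2}   # hypothetical diagnosis-code vocabulary
EMBED_DIM = 4                            # assumed embedding size
INTERVALS_MONTHS = [3, 6, 12, 36, 60]    # assumed prediction intervals
DECAY_YEARS = 5.0                        # assumed recency scale for pooling


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


code_embedding = rand_matrix(len(VOCAB), EMBED_DIM)
output_weights = rand_matrix(len(INTERVALS_MONTHS), EMBED_DIM + 1)


def embed_event(age_years, code):
    """Embedding: map one event to a real-valued vector (code vector + scaled age)."""
    return code_embedding[VOCAB[code]] + [age_years / 100.0]


def encode_trajectory(trajectory):
    """Encoding: collapse a variable-length trajectory into one fixed-size vector.

    Recency-weighted averaging is a simple stand-in for a sequence encoder:
    events closer to the most recent event get a larger weight.
    """
    events = sorted(trajectory)
    last_age = events[-1][0]
    weights = [math.exp(-(last_age - age) / DECAY_YEARS) for age, _ in events]
    vectors = [embed_event(age, code) for age, code in events]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(EMBED_DIM + 1)]


def predict_risk(trajectory):
    """Prediction: one probability per interval, non-decreasing over time."""
    latent = encode_trajectory(trajectory)
    risks, running = [], 0.0
    for weights in output_weights:
        score = sum(w * x for w, x in zip(weights, latent))
        probability = 1.0 / (1.0 + math.exp(-score))
        running = max(running, probability)  # cumulative risk cannot decrease
        risks.append(running)
    return dict(zip(INTERVALS_MONTHS, risks))


trajectory = [(52.0, "E11"), (55.5, "K85"), (56.0, "R63")]
risk_by_interval = predict_risk(trajectory)
print(risk_by_interval)
```

Because the weights are random, the printed numbers carry no clinical meaning; the sketch only shows how data flows from events to embeddings, to a latent vector, and finally to one risk estimate per time interval.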

A personalised cancer risk prediction should ideally take into account the possibility of disease in both the short and the long term. The researchers therefore developed an AI method that not only predicts the probability of cancer occurrence but also assesses risk at regular intervals following the initial prediction. This approach allows for continuous monitoring and adjustment of risk assessments over time.

To enhance the interpretability of the trained models, the researchers analyzed the diagnosis

Training and prediction of pancreatic cancer risk from disease trajectories.

Source: https://www.nature.com/articles/s41591-023-02332-5/figures/1

Pancreatic cancer predictive algorithm results

For cancer occurrence within 36 months, the best DNPR model achieved an AUROC of 0.88. The value dropped to 0.83 when disease events from the last 3 months before the cancer diagnosis were excluded. For the 1,000 highest-risk patients over the age of 50, the model's estimated relative risk was 59.
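To make these two evaluation ideas concrete, here is a minimal Python sketch of (a) dropping events recorded shortly before diagnosis and (b) computing an AUROC from labels and risk scores. The function names, the month-based timestamps, and the toy data are assumptions made for illustration; the paper's actual evaluation pipeline is not reproduced here.

```python
def exclude_recent_events(events, diagnosis_month, window_months=3):
    """Keep only events recorded at least `window_months` before diagnosis.

    `events` is a list of (month, code) pairs on a hypothetical month timeline.
    """
    return [(month, code) for month, code in events
            if diagnosis_month - month >= window_months]


def auroc(labels, scores):
    """Probability that a random positive case outscores a random negative one.

    Ties count as half a win. This pairwise form is quadratic in the number
    of patients, which is fine for a toy example.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy trajectory: the last event falls inside the 3-month exclusion window.
events = [(0.0, "E11"), (30.0, "K85"), (34.5, "R63")]
kept = exclude_recent_events(events, diagnosis_month=36.0)

# Toy evaluation set: 1 = developed cancer within the interval, 0 = did not.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auroc = auroc(labels, scores)
print(kept, round(toy_auroc, 3))
```

Excluding the final months removes diagnoses that may already reflect the workup of a suspected cancer, which is one plausible reason the reported AUROC falls when those events are left out.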

By contrast, when the Danish model was applied to the US data, its performance was lower (AUROC = 0.71). Retraining on the US data was therefore necessary, which improved the results to AUROC = 0.78, or 0.76 when events from the last 3 months before diagnosis were excluded.

These results could aid in developing effective monitoring programs for high-risk patients. Early detection of aggressive cancer through these programs could potentially improve patients’ length and quality of life.

Predicting Pancreatic Cancer: A promising algorithm?

In one of our previous posts, we explored the concept of a gold standard in clinical predictive analytics. Today, we have delved deeper into a specific topic: predicting pancreatic cancer.

In conclusion, the presented prediction model shows promise for clinical risk monitoring. AI applied to clinical data can create scalable tools for early cancer detection, improving patient outcomes and cost-effectiveness.

References:

Placido, D., Yuan, B., Hjaltelin, J.X. et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat Med 29, 1113–1122 (2023). https://doi.org/10.1038/s41591-023-02332-5