Using visualizations to communicate data science – Case: cancer risk analysis

Data science sounds complex, and it often is. However, the results of data analysis need not – and, indeed, should not – be complex. In this post, we show how interactive visualization can be used to effectively communicate risk estimates for cancer relapse.

The applications of advanced number crunching, or data science, show huge potential in the field of healthcare. It was recently estimated that a one percent reduction in process inefficiency in healthcare would lead to $63 billion in savings. In order to tap into this potential, data science and predictive analytics solutions need to be implemented and integrated as part of the clinical healthcare systems.

This requires applying the latest medical research results systematically to clinical care processes, and communicating the outcomes to the clinical staff and patients. Multiple experts are needed to bridge this gap between research and practice.

The first big step is to integrate the latest computational methods from scientific publications into actual production systems in healthcare. Implementing reliable computational software for healthcare requires an understanding of both the medical domain and the computational methods – a task for experienced data scientists.

The second step is to make sure that the results are understood by their intended audience. In this case, that means patients and doctors. While state-of-the-art predictive analytics algorithms may be pure gibberish inside, their outcomes can often be nicely visualised for effective communication.

Case: Online tool for evaluating the risk of GIST recurrence   

Together with Reaktor data scientists and user interface designers, Netmedi developed an online calculator that visualises the risk of gastrointestinal stromal tumor (GIST) recurrence and helps doctors evaluate the need for additional treatment after surgery. GIST Risk calculator is based on a study of an international sample of 2000 patients diagnosed with GIST, published in 2012 by HUCH Cancer Center and Aalto University.

“The nonlinear mathematic model predicts the combined effect of the main prognostic factors more accurately than any known model before”, describes Aki Vehtari, Professor at Aalto University and the researcher behind the mathematical model of the risk analysis.


GIST Risk calculator provides the user with a risk prediction based on four parameters: the size of the tumor, the mitotic count (the number of dividing cells, which indicates the aggressiveness of the tumor), the tumor site (location of the tumor) and whether or not the tumor has ruptured.

The tool then computes and visualizes estimates of the probability of the tumor recurring within a given time after surgery (up to 10 years). In addition, the uncertainty in the estimates is shown as the 90% credible interval (the light blue area between the grey curves), meaning that, according to the model, the risk lies within the given range with 90% probability.
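To make the idea of a credible interval concrete, here is a minimal sketch of how a central 90% interval could be read off a set of posterior draws of a recurrence probability. It is an illustration only, not the calculator's code: the function names `quantile` and `credible_interval` are hypothetical, and the draws are simulated from an arbitrary Beta distribution rather than taken from the published model.

```python
import random


def quantile(ordered, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def credible_interval(samples, mass=0.90):
    """Central interval that contains `mass` of the posterior samples."""
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Assumption: simulated stand-in for posterior draws of the recurrence
# probability at a single time point (NOT output of the real model).
rng = random.Random(0)
draws = [rng.betavariate(8, 12) for _ in range(4000)]

low, high = credible_interval(draws, mass=0.90)
print(f"90% credible interval: {low:.2f} to {high:.2f}")
```

Repeating this for every time point on the horizontal axis would give the two boundary curves of a shaded band like the one described above.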

The method is based on Gaussian Processes and Bayesian inference, tailored for this specific application to be suitable for assessing nonlinear effects and implicit interactions between input variables. The model also gives a robust estimate of the uncertainty in the results it provides in the form of credible intervals.
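For readers unfamiliar with Gaussian processes, the following is a minimal sketch of textbook Gaussian process regression with NumPy. It is not the model behind the calculator, which is tailored to recurrence data; it assumes a squared-exponential kernel, Gaussian noise and toy one-dimensional data, and the names `rbf_kernel` and `gp_posterior` are hypothetical. It only illustrates how such a model yields both a prediction and an uncertainty band from the same linear algebra.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise_var=0.01):
    """Posterior mean and variance of the latent function at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss_diag = np.diag(rbf_kernel(x_test, x_test))

    # Cholesky factorisation avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    var = k_ss_diag - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


# Toy data (assumption): noise-free samples of a sine curve.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 4.0, 9)

mean, var = gp_posterior(x_train, y_train, x_test)

# Central 90% band under the Gaussian posterior (z is about 1.645).
z90 = 1.6449
lower = mean - z90 * np.sqrt(var)
upper = mean + z90 * np.sqrt(var)
```

The band between `lower` and `upper` plays the same role as the credible interval in the calculator's chart, although the real model uses a different likelihood and several input variables.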

As the computational method is customised for this specific application, we could not use the existing machine learning libraries directly, and had to implement the required linear algebra computations manually. While implementing the calculator, we wanted to explore how technologies that have recently gained popularity in modern software development, such as Node.js and Docker, would fit the task.

We began by testing whether the results could be computed live with Node.js, but its support for standard native libraries was not yet good enough. In the end, we concluded that Node.js’s event loop model does not suit compute-intensive linear algebra very well either, and ended up using a popular solution that was an order of magnitude faster: Python, fuelled with libraries such as NumPy and SciPy.

One of our goals was to make the calculator and the results of the research widely available and easy to use for everyone. The resulting web service uses modern web technologies to combine state-of-the-art computational biology research with a convenient user interface, and is available to all cancer care and research professionals.
