Ground truth in a medical machine learning project

Objective, well-prepared data and a well-prepared ground truth are the foundation of a medical machine learning project. After all, a model trained on a very good data set is far more likely to produce very good results. That much is obvious. Just how demanding professional data preparation can be, however, we learned only recently, in a project where we used machine learning algorithms.

Ground truth in a medical machine learning project – the challenge

In one of our recent medical machine learning projects, we faced a major organizational challenge. The client had several hundred brain MRI scans available for us to use. This was very good news, and that number of studies boded well for the results of the work. However, one obstacle prevented their immediate use: the studies were “raw”, that is, not annotated or labeled in a way that would allow them to be used to develop artificial intelligence models. Our task therefore expanded to include preparing the data “from scratch” so that it could be used in the project, and the data had to be ready in just a few months.

Ground truth – the preparation

Preparing several hundred studies to the standard we needed for a medical machine learning project required a completely new approach. We had to organize the whole process, including both the annotation of the data and the verification of the accuracy of those annotations. We realized that these two elements were crucial to the whole undertaking: if we trained on poor-quality data, we should expect poor-quality results from the model.

The project involved medical imaging, so to prepare the data we engaged radiologists with the expertise needed to properly analyze an MRI study. We assembled a team of seven physicians, including two specialists with more than 10 years of experience in brain MRI diagnosis. A major challenge was to limit the influence of any individual physician’s personal experience, which is a common problem in the analysis of medical imaging data. It was important to make the analysis of each study as objective as possible.


To achieve this, the doctors worked in two teams. The first, consisting of five people, was in charge of preparing the annotations, that is, marking the areas of interest for this project on the scans, divided into four subregions. The second team, consisting of the two most experienced radiologists, evaluated each annotation and could either accept a study or send it back for correction. Thus one specialist annotated each study, and a second, more experienced specialist then reviewed that annotation. In this way, the annotation of each study was checked twice: once when it was prepared and again when it was evaluated. Our task was to ensure a smooth flow of studies and annotations between the doctors and to monitor the whole process.
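The annotate-then-review flow described above can be sketched as a small state machine. This is only an illustration under our own assumptions, not the tooling used in the project: the `Study` class, the status names, and the radiologist identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"      # waiting for an annotator
    ANNOTATED = "annotated"  # waiting for a reviewer
    NEEDS_FIX = "needs_fix"  # sent back for correction
    ACCEPTED = "accepted"    # approved by a reviewer


@dataclass
class Study:
    """Hypothetical record tracking one MRI study through the process."""
    study_id: str
    status: Status = Status.PENDING
    annotator: Optional[str] = None
    reviewers: List[str] = field(default_factory=list)


def submit_annotation(study: Study, annotator: str) -> None:
    """An annotator delivers a new or corrected annotation."""
    if study.status not in (Status.PENDING, Status.NEEDS_FIX):
        raise ValueError(f"{study.study_id} is not waiting for annotation")
    study.annotator = annotator
    study.status = Status.ANNOTATED


def review(study: Study, reviewer: str, accepted: bool) -> None:
    """A reviewer accepts the annotation or sends it back for correction."""
    if study.status is not Status.ANNOTATED:
        raise ValueError(f"{study.study_id} is not waiting for review")
    if reviewer == study.annotator:
        raise ValueError("the reviewer must differ from the annotator")
    if reviewer not in study.reviewers:
        study.reviewers.append(reviewer)
    study.status = Status.ACCEPTED if accepted else Status.NEEDS_FIX
```

A study that is sent back simply re-enters the same loop: `submit_annotation` moves it to `ANNOTATED` again, and it stays out of the accepted pool until a reviewer approves it.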


In our medical machine learning project we placed special emphasis on validating the model and evaluating its performance. For this reason, once the annotation process was complete and all studies had been accepted, we set aside a test set from the studies prepared for training. We then forwarded the annotations of the test-set studies to the second member of the evaluation team, the one who had not previously evaluated the study in question.

As a result, each study in the test set and its annotations were evaluated three times: first when the annotations were prepared, second when they were evaluated, and third by another expert. Only after both radiologists with more than 10 years of experience had accepted a study could it be used for model validation.
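The test-set step can be sketched in a few lines. This is a minimal illustration under assumptions of our own: the text does not say how the test set was chosen, so the random selection, the `fraction` parameter, and the reviewer identifiers below are hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical identifiers for the two senior radiologists.
REVIEWERS = ("senior_1", "senior_2")


def select_test_set(study_ids: List[str], fraction: float, seed: int = 0) -> List[str]:
    """Pick a share of the accepted studies as the test set.

    Random selection is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    k = max(1, round(len(study_ids) * fraction))
    return sorted(rng.sample(sorted(study_ids), k))


def second_reviewer(first_reviewer: str) -> str:
    """Return the senior radiologist who has not yet reviewed the study."""
    if first_reviewer not in REVIEWERS:
        raise ValueError(f"unknown reviewer: {first_reviewer}")
    return next(r for r in REVIEWERS if r != first_reviewer)


def validation_ready(acceptances: Dict[str, bool]) -> bool:
    """A test study qualifies only if both senior radiologists accepted it."""
    return all(acceptances.get(r, False) for r in REVIEWERS)
```

For example, a study first reviewed by `senior_1` would be routed to `second_reviewer("senior_1")`, and it would enter the validation pool only once `validation_ready` holds for it.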

Objective analysis of the study in a medical machine learning project

This process allowed us to ensure a high degree of objectivity in the analysis of each study and to minimize the risk of an incorrect assessment resulting from an individual doctor’s personal experience.

Ground truth in a medical project – summary

To sum up, a lot depends on reliable execution of the ground-truth stage in a medical machine learning project. Objective, properly prepared data for training the model is what makes the expected results achievable. And very good results are, after all, what make the entire project a success.