Everything you’d like to know about AI projects, but are afraid to ask

Four years ago, I joined Graylight Imaging as an Account Manager knowing little about Machine Learning. On my first or second day, I met with our current Head of Project Management, who explained the mysteries of AI projects in medicine to me.

He then used an analogy that resonated with me: “How does a child learn what an elephant is?” Initially, the child has no knowledge. Next, we expose them to stories, picture books, and images of elephants. Finally, cartoons enter the mix. Whenever an elephant appears, we point it out, saying, “Look! An elephant!” One day, the child points at a picture and exclaims, “Mommy/Daddy, elephant!” What element triggered their recognition? The color? The trunk? The ears? The shape? The tusks? It’s difficult to say for sure. All we know is the child can identify an elephant and differentiate it from, say, a giraffe. Machine learning algorithms in medical AI projects work the same way.

How to get started?

Once we know that we want to identify an elephant (or in the medical field: tumors, organs, bones, lymph nodes, coronary plaques, and anything else that can be identified in medical imaging studies), we should precisely define the acceptance criteria for the project. Acceptance criteria are a set of parameters that tell us whether the algorithm meets our expectations (whether it recognizes the elephant as well as we need it to).

One example of such a parameter is the Dice score, a metric that quantifies the overlap between two sets. In image segmentation, it is used to assess the resemblance between a predicted segmentation mask and the corresponding ground truth mask. It is calculated as twice the area shared by both masks divided by the sum of their areas. In our analogy, it tells us how closely the outline the algorithm draws around the elephant matches the outline drawn by an expert: a perfect match gives a Dice score of 1, no overlap at all gives 0, and a score of 0.8 means the two outlines largely, but not fully, coincide.
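For readers who like to see the arithmetic, here is a minimal sketch of the Dice score in Python. It treats each mask as a set of pixel coordinates; the function name `dice_score` and the toy masks are made up for this illustration, and real projects typically work on image arrays rather than sets.

```python
def dice_score(predicted, ground_truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    total = len(predicted) + len(ground_truth)
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2 * len(predicted & ground_truth) / total


# Toy example: the expert outlined 4 pixels; the algorithm found 3 of them
# and marked 1 pixel that does not belong to the object.
ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(0, 0), (0, 1), (1, 0), (2, 2)}

score = dice_score(predicted, ground_truth)
print(score)  # 0.75, i.e. 2 * 3 / (4 + 4)
```

An acceptance criterion could then be phrased as a minimum average Dice score over a set of test studies.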

Is there any magic behind it, that is, what can the algorithm identify?

The easiest way to say it is that we are able to identify everything that a radiologist, cardiologist or other doctor specialized in clinical evaluation can identify from a medical imaging study. Standard medical AI algorithms do not detect things invisible to the naked eye (there is a field called Radiomics for that, but that’s a topic for a whole other story).

To answer the title question, no, there is no magic here. The algorithm will always recognize only those elements that we first point out to it in the learning data. Its advantage at this point is its objectivity (it will always evaluate the same study in the same way) and its ability to make quick volumetric calculations or, for example, to compare studies of a single patient over time.
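To make “quick volumetric calculations” a little more concrete, here is a rough sketch of how a volume can be derived from a segmentation mask and compared between two studies of the same patient. The function name, the tiny masks, and the voxel spacing are all invented for this example; in practice the spacing comes from the image metadata.

```python
def mask_volume_ml(mask, voxel_spacing_mm):
    """Volume of a binary 3D mask in millilitres.

    mask: nested lists indexed as [slice][row][column], values 0 or 1.
    voxel_spacing_mm: (slice thickness, row spacing, column spacing) in mm.
    """
    dz, dy, dx = voxel_spacing_mm
    voxel_count = sum(value for slc in mask for row in slc for value in row)
    return voxel_count * dz * dy * dx / 1000.0  # 1 ml = 1000 mm^3


# Hypothetical masks of the same lesion at two time points.
baseline = [[[1, 1], [1, 1]], [[1, 1], [0, 0]]]   # 6 marked voxels
follow_up = [[[1, 1], [1, 0]], [[0, 0], [0, 0]]]  # 3 marked voxels
spacing = (5.0, 1.0, 1.0)  # assumed spacing in mm

v_baseline = mask_volume_ml(baseline, spacing)
v_follow_up = mask_volume_ml(follow_up, spacing)
change_percent = (v_follow_up - v_baseline) / v_baseline * 100

print(v_baseline, v_follow_up, change_percent)
```

Because the calculation is just counting voxels, the algorithm gives the same answer every time for the same study.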

So what is learning data then?

Just as we show the child from the opening example many pictures of an elephant, we must first show the algorithm what it is supposed to detect. For this, we need image data of the same type as the data it will later analyze.

Raw data alone isn’t enough. Every example you provide needs annotation. This means outlining the elephant each time, so the AI learns the specific shape, color, or other parameters to identify.

How many of these images do we need?

Unfortunately, there is no single right answer to this question. The simplest answer is “it depends.” On average, in AI projects in medicine, we assume that a proof of concept requires about 100 annotated images, while the final product requires between 300 and 1000 exams.

We use Proofs of Concept (POCs) to investigate the feasibility of a technology and assess the algorithm’s ability to detect a specific change. POCs often serve as the go/no-go decision point for the entire project, leading to potentially significant cost savings. The number of studies required for the main project depends primarily on the project’s complexity and the desired level of accuracy.

How long does such a project take?

A POC project in medicine typically takes 4-6 weeks to complete after you provide us with the annotated studies. There are some factors that can influence this timeframe, though.

  • Algorithm Accuracy Needs: How precise does the algorithm need to be? Higher accuracy might require more time for refinement.
  • Certification Requirements: Will you need documentation for CE or FDA certification? Factor in extra time for this.

Main Project Timeline: Plan on 3-6 months to complete the main project from start to finish. Here’s a critical point:

  • Early Results Review: If you don’t see promising preliminary results within 2-3 months, strongly consider stopping the project to save resources.

Can any doctor annotate data?

Unfortunately, the answer to this question is no. The quality of the final algorithm depends largely on the source data it has been “fed”. If this data is prepared superficially, without due care, the algorithm trained on it can be very inaccurate and thus practically useless.

Therefore, it is worthwhile for the “ground truth” to be prepared by physicians who already have experience with annotation, or at least under the supervision of someone who does. It is also desirable that the prepared data be consistent. As the saying goes, “1 patient, 2 doctors, 3 opinions”: the evaluation of medical imaging studies can differ significantly between annotators. Unfortunately, this has a very negative impact on the algorithm, which may then completely misunderstand what it is supposed to detect and how.
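One simple way to check consistency is to have several annotators outline the same study and measure how much their outlines overlap. The sketch below does this with pairwise Dice scores; the reader names, the toy annotations, and the 0.7 threshold are assumptions made up for this illustration, not a standard.

```python
from itertools import combinations


def dice(a, b):
    """Dice overlap between two sets of pixel coordinates."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * len(a & b) / total


# Hypothetical outlines of the same lesion drawn by three annotators.
annotations = {
    "reader_a": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "reader_b": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "reader_c": {(1, 1), (2, 2)},
}

AGREEMENT_THRESHOLD = 0.7  # assumed cut-off for "consistent enough"

# Compare every pair of annotators on the same study.
pairwise = {
    (first, second): dice(annotations[first], annotations[second])
    for first, second in combinations(sorted(annotations), 2)
}

# Pairs below the threshold are candidates for review before training.
needs_review = [pair for pair, value in pairwise.items()
                if value < AGREEMENT_THRESHOLD]

for pair, value in pairwise.items():
    print(pair, round(value, 2))
print("Needs review:", needs_review)
```

Studies where annotators disagree strongly can be discussed and corrected before they are used as learning data.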

Why does project experience in medical AI matter?

AI projects in medicine can, in theory, be outsourced to any software development outsourcing company that specializes in ML. Why am I writing “in theory”? Because experience and specialization in medical algorithms provide several benefits.

First of all, it allows you to better evaluate the algorithm’s performance. It is easier to see when the algorithm is not working and is not marking the target areas correctly, which in turn makes it easier to correct it and to show it what it should ignore. Previously developed workflows also help speed up the algorithm development process itself, saving time and costs.

Is Machine Learning in medicine only applicable in medical imaging research?

ML algorithms excel in various medical applications beyond diagnosis. They can predict disease occurrence in patients by analyzing their health data. This is done through predictive algorithms, which use patient data like gender, height, weight, blood results, and more to calculate the likelihood of developing a specific condition.

We are also able to collect data directly from medical instruments, such as ventilators, blood pressure monitors and others, and enrich our predictions with actual data coming from these devices.
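As a very rough sketch of what such a predictive algorithm does under the hood, the example below turns a handful of patient values into a probability using a logistic model. Everything in it is an assumption for illustration: the feature names, the weights, and the bias are invented and have no clinical meaning. A real model would learn its weights from patient data.

```python
import math

# Hypothetical weights, invented for illustration only (not clinically valid).
WEIGHTS = {
    "weight_kg": 0.03,
    "height_cm": -0.02,
    "fasting_glucose_mmol_l": 0.6,
    "is_male": 0.3,
}
BIAS = -4.0


def predicted_risk(patient):
    """Map patient features to a probability between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * patient[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


# Two made-up patients who differ only in one blood result.
patient = {"weight_kg": 80, "height_cm": 175,
           "fasting_glucose_mmol_l": 5.5, "is_male": 1}
patient_high_glucose = dict(patient, fasting_glucose_mmol_l=8.0)

risk = predicted_risk(patient)
risk_high_glucose = predicted_risk(patient_high_glucose)
print(round(risk, 3), round(risk_high_glucose, 3))
```

Readings from devices such as blood pressure monitors would simply become additional features in the same kind of calculation.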


In this blog post, I set out to create a clear and concise guide that simplifies the initial stages of AI projects in medicine, particularly for those new to the field. If you’re considering a project involving medical algorithms, this article equips you with non-technical answers to those initial questions swirling in your head. Further, it allows us to dive right into the heart of the matter and focus on your specific challenges.

Do you have any additional questions? Contact Przemyslaw Urbanski: