AI set to improve diagnostics on the basis of measurement data from blood samples

Symbolic picture for the article. (Image: OTV / Peter Nickl)

14 October 2024

Vast quantities of information can be gained from blood samples today using modern testing methods. Evaluating this wealth of data is complex, as is drawing reliable conclusions from it to diagnose diseases.

Collaborative project from FAU and biotech company BioVariance

Researchers from Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and biotech company BioVariance are hoping to develop new artificial intelligence (AI) methods for this purpose. The system will be trained both with actual measurement data and with generated synthetic datasets that will help it to find irregularities that frequently occur in certain diseases. The project, called BioSamp, has received around one million euros of funding from the Free State of Bavaria, with around a third of this sum going to FAU.

Physicians currently often have only a few dozen criteria on which to base their diagnoses. Omics analysis has the potential to change this. Using omics methods, measurement data for tens of thousands of genes can be obtained from a single drop of blood. For example, which quantities of which proteins does the sample contain? Which lipids and metabolic products? Which genes are currently being transcribed in the person who gave the blood sample?

“Essentially, everything that’s found in the blood is measured,” explains Prof. Dr. Daniel Tenbrinck, Professor for Data Science at FAU. “This huge amount of data has the potential to tell us a great deal about patients’ health – not only what diseases they are suffering from, but also which variants of them. Also, whether they may have an increased risk of having a heart attack or developing diabetes, but are currently completely healthy, thus allowing these conditions to be prevented through prophylaxis.”

Search for the needle in the haystack

Researchers around the globe are therefore searching for irregularities in omics data that are associated with certain diseases. Given the amount of data involved, however, this task is like searching for a needle in a haystack. For this reason, machine learning methods are frequently used to make it easier. “Artificial intelligence is trained using a large quantity of omics data from patients together with the diagnoses they have received,” explains Tenbrinck. “This enables the algorithm to learn to identify suspicious traces in new measurement values and interpret them accordingly.”

Omics data from thousands of people are needed to train the AI. Obtaining these data is not only costly but also takes a long time. Tenbrinck would therefore like to pursue another strategy together with the company BioVariance, known in the field as “synthetic data generation”. “For this method, we analyze up to 100 omics datasets using statistical models and look for patterns and regularities,” he says. “We then use these to produce new datasets that cannot be distinguished from the data from actual blood tests.”
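As a rough illustration of this idea (not the BioSamp method, which the article does not specify), the following Python sketch fits a deliberately simple statistical model to a small dataset and then draws new synthetic samples from it. Everything here is an assumption made for illustration: the function names, the three made-up features, and the choice of an independent normal distribution per feature. Real omics data would call for models that also capture correlations between features.

```python
import random
import statistics

def fit_feature_model(samples):
    """Estimate mean and standard deviation for each feature (column)."""
    columns = list(zip(*samples))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def generate_synthetic(model, n_samples, seed=0):
    """Draw synthetic samples, one independent normal draw per feature."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in model]
            for _ in range(n_samples)]

# Toy stand-in for 100 measured datasets with 3 hypothetical features.
toy_rng = random.Random(42)
real = [[toy_rng.gauss(5.0, 1.0), toy_rng.gauss(20.0, 4.0), toy_rng.gauss(0.5, 0.1)]
        for _ in range(100)]

model = fit_feature_model(real)               # learn patterns from the small dataset
synthetic = generate_synthetic(model, 1000)   # produce many new samples
```

The synthetic samples follow the per-feature statistics estimated from the small dataset, which is the basic pattern the quote describes: learn regularities from few measurements, then generate many more.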

This synthetically created information can then be used to train the AI. What sounds like a trick has actually proven itself many times in practice already. “Synthetic data generation is therefore a very active area of research in our field at the moment,” says Tenbrinck. For example, facial recognition software is often fed with portraits that have been distorted or where image noise has been added. The algorithm becomes more robust using this method, which means it does not allow itself to be misled so easily by an unfavorable angle in the image of a person or by poor lighting conditions.
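The noise-based approach mentioned for facial recognition can be sketched in a few lines of Python. This is a minimal illustration only: a short list of numbers stands in for pixel intensities, and the function names, the noise level and the clipping to the range 0 to 1 are assumptions rather than details from the article.

```python
import random

def add_noise(values, sigma=0.05, seed=None):
    """Return a copy of values with Gaussian noise added, clipped to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in values]

def augment(values, n_copies=5, sigma=0.05, seed=0):
    """Return the original plus n_copies noisy variants of it."""
    return [list(values)] + [add_noise(values, sigma, seed + i)
                             for i in range(n_copies)]

image = [0.0, 0.25, 0.5, 0.75, 1.0]   # toy "image" of five pixel intensities
augmented = augment(image)            # original followed by five noisy copies
```

Training on the original together with its noisy variants exposes the algorithm to small perturbations it may meet in practice, which is the intuition behind the robustness gain described above.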

The software can even be trained using completely new, artificially generated images. “To do so, we have to ensure that the faces in these images look realistic,” says Tenbrinck. For example, if they all have only one eye, the detection performance of software trained with them will probably decrease. “We are investigating how we can generate synthetic omics data that are so realistic that they make the diagnoses produced by AI more robust and more precise,” emphasizes the researcher. “An important point to remember is that medical experts look at the synthetic datasets and evaluate how plausible they are.” Metaphorically speaking, this means the images with one-eyed faces would be discarded.

In focus: Long Covid and depression

The partners in the BioSamp project intend to make progress in the diagnosis of two medical conditions – severe depression and chronic fatigue syndrome, which frequently occurs with long Covid. “Both are conditions that cause a great deal of suffering,” emphasizes Tenbrinck. “Investigations into depression are already underway at BioVariance that we can use in our research.” The aim is, on the one hand, to reliably identify these conditions and potentially split them into different variants. For example, some patients suffering from depression respond better to certain treatment strategies and medication than others.

“We also want to make a contribution to finding out what is going wrong in the body and what is causing these conditions,” explains Tenbrinck. For example, the AI could detect a certain gene in the omics data that is particularly active in people suffering from depression. “We can then look at what is already known about this gene and make predictions about the potential development of the disease,” says researcher Tenbrinck. “Our findings could not only improve the diagnosis of diseases, but also treatments and preventive measures. This is what I think is so fascinating about this topic.”

Further information

Prof. Dr. Daniel Tenbrinck
Professorship for Data Science
Phone: +49 9131 85 67233
daniel.tenbrinck@fau.de