Image Processing: Techniques, Types, & Applications [2023]

Rohit Kundu

Deep learning has revolutionized the world of computer vision—the ability for machines to “see” and interpret the world around them.

In particular, Convolutional Neural Networks (CNNs) were designed to process image data more efficiently than traditional Multi-Layer Perceptrons (MLPs).

Since the patterns in an image typically span several neighboring pixels, processing an image one pixel at a time, as MLPs do, is inefficient.

This is why CNNs, which process images in patches or windows, are now the de facto choice for image processing tasks.

But let’s start from the beginning—


Here’s what we’ll cover:

What is Image Processing?

  • How Machines “See” Images?

Phases of Image Processing

Image Processing Techniques


What is Image Processing?

Digital image processing is the class of methods that deal with manipulating digital images through the use of computer algorithms. It is an essential preprocessing step in many applications, such as face recognition, object detection, and image compression.

Image processing is done to enhance an existing image or to extract important information from it. This is important in several Deep Learning-based Computer Vision applications, where such preprocessing can dramatically boost the performance of a model. Manipulating images, for example, adding objects to or removing objects from images, is another application, especially in the entertainment industry.

This paper addresses a medical image segmentation problem, where the authors used image inpainting in their preprocessing pipeline for the removal of artifacts from dermoscopy images. Examples of this operation are shown below.


The authors achieved a 3% boost in performance with this simple preprocessing procedure, which is a considerable improvement, especially in a biomedical application where diagnostic accuracy is crucial for AI systems. The quantitative results obtained with and without preprocessing for the lesion segmentation problem on three different datasets are shown below.


Types of Images / How Machines “See” Images?

Digital images are interpreted as 2D or 3D matrices by a computer, where each value or pixel in the matrix represents the amplitude, known as the “intensity” of the pixel. Typically, we are used to dealing with 8-bit images, wherein the amplitude value ranges from 0 to 255.


Thus, a computer “sees” digital images as a function I(x, y) or I(x, y, z), where I is the pixel intensity and (x, y) or (x, y, z) represent the coordinates of the pixel in the image (for binary/grayscale or RGB images, respectively).
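To make the I(x, y) and I(x, y, z) notation concrete, here is a minimal sketch in plain Python. It assumes images are stored as nested lists indexed row first (img[y][x]), which is also the convention of most array libraries. The helper names `intensity` and `intensity_rgb` are illustrative, not part of any library.

```python
# A tiny 3x4 grayscale image stored row by row, so gray[y][x] holds I(x, y).
gray = [
    [0, 64, 128, 255],
    [10, 20, 30, 40],
    [255, 255, 0, 0],
]

# A 2x2 RGB image: each pixel is an (R, G, B) tuple, so a third coordinate z
# (0 = red, 1 = green, 2 = blue) is needed to pick out a single value.
rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def intensity(img, x, y):
    """Return I(x, y): the value at column x, row y of a single-channel image."""
    return img[y][x]


def intensity_rgb(img, x, y, z):
    """Return I(x, y, z): channel z of the pixel at column x, row y."""
    return img[y][x][z]


height, width = len(gray), len(gray[0])
print(height, width)                # 3 4
print(intensity(gray, 3, 0))        # 255
print(intensity_rgb(rgb, 1, 0, 1))  # 255 (green channel of the green pixel)
```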


Computers deal with different “types” of images based on their function representations. Let us look into them next.

1. Binary Image

Images that have only two unique values of pixel intensity, 0 (representing black) and 1 (representing white), are called binary images. Such images are generally used to highlight a discriminating portion of a colored image. For example, they are commonly used for image segmentation, as shown below.


2. Grayscale Image

Grayscale or 8-bit images can take 256 unique intensity values, where a pixel intensity of 0 represents black and a pixel intensity of 255 represents white. The 254 values in between are different shades of gray.

An example of an RGB image converted to its grayscale version is shown below. Notice that the shape of the histogram remains the same for the RGB and grayscale images.
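Grayscale conversion is usually a weighted sum of the three color channels. The article does not say which weights were used for the figure; this sketch assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which many imaging libraries use by default, and a nested-list image whose pixels are (R, G, B) tuples. `rgb_to_gray` is an illustrative name.

```python
# ITU-R BT.601 luma weights (assumed); other standards such as BT.709 differ.
WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_gray(img):
    """Convert an image of (R, G, B) tuples into a single-channel 8-bit image."""
    return [
        [round(sum(w * c for w, c in zip(WEIGHTS, pixel))) for pixel in row]
        for row in img
    ]


rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
print(rgb_to_gray(rgb))  # [[76, 150], [29, 255]]
```

Green contributes most to the result because the weights roughly follow how sensitive human vision is to each primary.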


3. RGB Color Image

The images we are used to in the modern world are RGB (color) images, which computers typically store as 24-bit matrices, with 8 bits for each of the three channels. That is, 256 × 256 × 256 = 16,777,216 different colors are possible for each pixel. “RGB” refers to the Red, Green, and Blue “channels” of an image.

Up until now, we had images with only one channel, so two coordinates were enough to locate any value in the matrix. In an RGB image, three equal-sized matrices (called channels), each with values ranging from 0 to 255, are stacked on top of each other, so three coordinates are needed to specify the value of a matrix element.

Thus, a pixel in an RGB image is black when its value is (0, 0, 0) and white when it is (255, 255, 255). Every other combination of values produces one of the many other colors an RGB image can represent. For example, (255, 0, 0) is red (since only the red channel is activated for this pixel). Similarly, (0, 255, 0) is green and (0, 0, 255) is blue.

An example of an RGB image split into its channel components is shown below. Notice that the shapes of the histograms for each of the channels are different.
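Splitting an RGB image into its channels and computing a histogram for each channel can be sketched as follows. This is a minimal illustration on nested lists of (R, G, B) tuples; `split_channels` and `histogram` are hypothetical helper names.

```python
from collections import Counter


def split_channels(img):
    """Split an image of (R, G, B) tuples into three single-channel images."""
    return tuple([[pixel[c] for pixel in row] for row in img] for c in range(3))


def histogram(channel, bins=256):
    """Count how many pixels take each intensity value 0..bins-1."""
    counts = Counter(v for row in channel for v in row)
    return [counts.get(v, 0) for v in range(bins)]


rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
red, green, blue = split_channels(rgb)
print(red)                  # [[255, 0], [0, 255]]
print(histogram(red)[255])  # 2
```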


4. RGBA Image

RGBA images are RGB color images with an extra channel, known as “alpha”, that stores the opacity of each pixel. Opacity ranges from 0% to 100% and essentially describes how “see-through” a pixel is.

In physics, opacity describes how much light an object blocks rather than lets through. For instance, cellophane paper is transparent (close to 0% opacity), frosted glass is translucent, and wood is opaque (100% opacity). The alpha channel in RGBA images mimics this property. An example of this is shown below.
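How the alpha channel is used when an RGBA pixel is drawn over a background can be sketched with the standard “over” compositing rule, out = α · foreground + (1 − α) · background. This sketch assumes straight (non-premultiplied) alpha stored as 0 to 255 and an opaque RGB background; `blend_over` is an illustrative name.

```python
def blend_over(fg_rgba, bg_rgb):
    """Composite one straight-alpha RGBA pixel over an opaque RGB pixel."""
    r, g, b, a = fg_rgba
    alpha = a / 255  # map the 0..255 alpha channel to a 0..1 opacity
    return tuple(
        round(alpha * f + (1 - alpha) * k) for f, k in zip((r, g, b), bg_rgb)
    )


print(blend_over((255, 0, 0, 255), (0, 0, 255)))  # (255, 0, 0): fully opaque red
print(blend_over((255, 0, 0, 0), (0, 0, 255)))    # (0, 0, 255): fully transparent
print(blend_over((255, 0, 0, 128), (0, 0, 255)))  # roughly half red, half blue
```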


Phases of Image Processing

The fundamental steps in any typical Digital Image Processing pipeline are as follows:

1. Image Acquisition

The image is captured by a camera and digitized (if the camera output is not digitized automatically) using an analogue-to-digital converter for further processing in a computer.

2. Image Enhancement

In this step, the acquired image is manipulated to meet the requirements of the specific task for which it will be used. Enhancement techniques, such as contrast and brightness adjustment, primarily aim to highlight hidden or important details in an image. Image enhancement is highly subjective in nature.

3. Image Restoration

This step also deals with improving the appearance of an image, but unlike enhancement it is an objective operation, since the degradation of an image can be attributed to a mathematical or probabilistic model. Examples include removing noise or blur from images.
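Noise removal can be illustrated with a median filter, a classic choice for isolated “salt-and-pepper” noise. The article does not prescribe a specific restoration method; this is a minimal sketch on a grayscale nested-list image, where border pixels use only the neighbors that fall inside the image.

```python
def median_filter(img, radius=1):
    """Replace each pixel with the median of its (2*radius+1)^2 neighborhood.

    Windows at the border are clipped to the image; for an even number of
    values the upper median is taken.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


noisy = [
    [10, 10, 10],
    [10, 255, 10],  # an isolated "salt" pixel
    [10, 10, 10],
]
print(median_filter(noisy))  # [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```

Unlike averaging, the median ignores a single extreme value, so the outlier disappears without being smeared into its neighbors.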

4. Color Image Processing

This step handles the processing of color images (24-bit RGB or 32-bit RGBA images), for example, performing color correction or color modeling.

5. Wavelets and Multi-Resolution Processing

Wavelets are the building blocks for representing images at various degrees of resolution. Images are successively subdivided into smaller regions for data compression and for pyramidal representation.
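A minimal sketch of multi-resolution processing, assuming the simplest wavelet (Haar) with an averaging normalization: each 2×2 block is replaced by its average plus three detail coefficients, and repeating the step on the averages builds a pyramid of successively smaller images. Function names are illustrative, and the height and width are assumed to be divisible by 2 at every level.

```python
def haar_level(img):
    """One level of a 2D Haar decomposition (averaging convention).

    Returns four half-size bands: an approximation (2x2 averages) and three
    detail bands. For each block, top-left = avg + lr + tb + diag.
    """
    approx, d_lr, d_tb, d_diag = [], [], [], []
    for y in range(0, len(img), 2):
        rows = ([], [], [], [])
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]          # top-left, top-right
            c, d = img[y + 1][x], img[y + 1][x + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 4)  # average
            rows[1].append((a - b + c - d) / 4)  # left minus right
            rows[2].append((a + b - c - d) / 4)  # top minus bottom
            rows[3].append((a - b - c + d) / 4)  # diagonal
        for band, row in zip((approx, d_lr, d_tb, d_diag), rows):
            band.append(row)
    return approx, d_lr, d_tb, d_diag


def pyramid(img, levels):
    """Successively halve the resolution by keeping only the approximation band."""
    out = [img]
    for _ in range(levels):
        out.append(haar_level(out[-1])[0])
    return out


img = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
approx, d_lr, d_tb, d_diag = haar_level(img)
print(approx)               # [[1.0, 2.0], [3.0, 4.0]]
print(pyramid(img, 2)[-1])  # [[2.5]]
```

Smooth regions produce near-zero detail coefficients, which is what makes this representation useful for compression.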

6. Image Compression

Images often need to be compressed rather than kept at their original size, whether for transfer to other devices or because of storage constraints. This is also important when displaying images over the internet; for example, on Google, the small thumbnail of an image is a highly compressed version of the original, and the image is shown in its original resolution only when you click on it. This saves bandwidth on the servers.

7. Morphological Processing

Image components that are useful in the representation and description of shape need to be extracted for further processing or downstream tasks. Morphological Processing provides the tools (which are essentially mathematical operations) to accomplish this. For example, erosion shrinks objects by stripping pixels from their boundaries, while dilation grows them by adding pixels along their boundaries.
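Erosion and dilation on a binary mask can be sketched as follows, assuming a 3×3 square structuring element and treating pixels outside the image as background. This is a plain-Python illustration with hypothetical function names, not a library implementation.

```python
def _morph(mask, keep_if):
    """Apply a 3x3 square structuring element to a binary mask of 0/1 values.

    keep_if receives the 9 neighborhood values; out-of-image pixels count as 0.
    """
    h, w = len(mask), len(mask[0])

    def neighborhood(y, x):
        return [
            mask[j][i] if 0 <= j < h and 0 <= i < w else 0
            for j in range(y - 1, y + 2)
            for i in range(x - 1, x + 2)
        ]

    return [[1 if keep_if(neighborhood(y, x)) else 0 for x in range(w)] for y in range(h)]


def erode(mask):
    # A pixel stays foreground only if its whole 3x3 neighborhood is foreground.
    return _morph(mask, all)


def dilate(mask):
    # A pixel becomes foreground if any pixel in its 3x3 neighborhood is foreground.
    return _morph(mask, any)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(sum(map(sum, erode(mask))))   # 1: the 3x3 square shrinks to its center
print(sum(map(sum, dilate(mask))))  # 25: it grows to fill the 5x5 image
```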

8. Image Segmentation

This step involves partitioning an image into different key parts to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation allows computers to focus on the more important parts of the image and discard the rest, which improves the performance of automated systems.

9. Representation and Description

This step generally follows image segmentation. Representation involves deciding whether a segmented region should be depicted by its boundary or as a complete region. Description deals with extracting attributes that yield quantitative information of interest or that are basic for differentiating one class of objects from another.
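As a sketch of the two representations, the function below describes a segmented region both as a complete region (area, centroid, and bounding box) and by its boundary pixels. It assumes a binary mask with a single non-empty region and uses 4-connectivity to decide which pixels lie on the boundary; the choice of descriptors is illustrative.

```python
def describe_region(mask):
    """Describe the foreground (pixels equal to 1) of a binary mask.

    Assumes the mask contains at least one foreground pixel.
    """
    h, w = len(mask), len(mask[0])
    pixels = [(x, y) for y in range(h) for x in range(w) if mask[y][x]]
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]

    def is_background(x, y):
        # Positions outside the image count as background.
        return not (0 <= x < w and 0 <= y < h) or mask[y][x] == 0

    # Boundary representation: foreground pixels with a 4-connected background neighbor.
    boundary = [
        (x, y)
        for x, y in pixels
        if any(is_background(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    ]
    return {
        "area": len(pixels),
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),  # (x_min, y_min, x_max, y_max)
        "boundary": boundary,
    }


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
info = describe_region(mask)
print(info["area"], info["centroid"], info["bbox"], len(info["boundary"]))  # 9 (2.0, 2.0) (1, 1, 3, 3) 8
```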

10. Object Detection and Recognition

After the objects are segmented from an image and the representation and description phases are complete, the automated system needs to assign a label to the object—to let the human users know what object has been detected, for example, “vehicle” or “person”, etc.

11. Knowledge Base

Knowledge may be as simple as the bounding box coordinates for an object of interest that has been found in the image, along with the object label assigned to it. Anything that will help in solving the problem for the specific task at hand can be encoded into the knowledge base.


Image Processing Techniques

Image processing can be used to improve the quality of an image, remove undesired objects from an image, or even create new images from scratch. For example, image processing can be used to remove the background from an image of a person, leaving only the subject in the foreground.

Image processing is a vast and complex field, with many different algorithms and techniques that can be used to achieve different results. In this section, we will focus on some of the most common image processing tasks and how they are performed.

Task 1: Image Enhancement

One of the most common image processing tasks is image enhancement, or improving the quality of an image. It has crucial applications in Computer Vision, Remote Sensing, and surveillance. One common approach is adjusting the image's contrast and brightness.

Contrast is the difference in brightness between the lightest and darkest areas of an image. Increasing the contrast makes light areas lighter and dark areas darker, which makes details easier to distinguish. Brightness is the overall lightness or darkness of an image; increasing it makes the whole image lighter. Both contrast and brightness can be adjusted automatically by most image editing software, or they can be adjusted manually.
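Contrast and brightness adjustments are point operations: each output pixel depends only on the input pixel at the same location. This sketch assumes a simple linear model on 8-bit grayscale values, scaling distances from mid-gray (128) for contrast and adding a constant for brightness; image editors may use different curves, and `adjust` is an illustrative name.

```python
def adjust(img, contrast=1.0, brightness=0):
    """Linear contrast/brightness adjustment of an 8-bit grayscale image.

    contrast > 1 pushes intensities away from mid-gray (128), contrast < 1
    pulls them toward it; brightness shifts every pixel by a constant.
    Results are clipped to the valid 0..255 range.
    """
    def clip(v):
        return max(0, min(255, round(v)))

    return [[clip(contrast * (p - 128) + 128 + brightness) for p in row] for row in img]


img = [[100, 128, 156]]
print(adjust(img, contrast=2.0))   # [[72, 128, 184]]: larger light/dark gap
print(adjust(img, brightness=50))  # [[150, 178, 206]]: everything lighter
```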


However, adjusting the contrast and brightness of an image are elementary operations. An image with perfect contrast and brightness can still become blurry when upscaled, because the same pixels are spread over a larger area (lower pixel density). To address this issue, a relatively new and much more advanced technique called Image Super-Resolution is used, wherein a high-resolution image is obtained from its low-resolution counterpart(s). Deep Learning techniques are popularly used to accomplish this.


For example, one of the earliest uses of Deep Learning for the Super-Resolution problem is the SRCNN model, where a low-resolution image is first upscaled using traditional Bicubic Interpolation and then used as the input to a CNN model. The CNN extracts overlapping patches from the upscaled image and represents them as feature maps, non-linearly maps these features to high-resolution patch representations, and then uses a final convolution layer to aggregate them into the reconstructed high-resolution image. The model framework is depicted visually below.
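SRCNN's first stage is classical interpolation. The paper uses bicubic interpolation; as a shorter stand-in, this sketch shows bilinear interpolation, which blends the 4 nearest input pixels instead of bicubic's 16. It assumes a grayscale nested-list image of at least 2×2 pixels, output sizes of at least 2, and corner-aligned sampling; it is not SRCNN's preprocessing code.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a grayscale image with bilinear interpolation.

    Uses corner-aligned sampling: output corners coincide with input corners.
    """
    in_h, in_w = len(img), len(img[0])
    sy = (in_h - 1) / (out_h - 1)
    sx = (in_w - 1) / (out_w - 1)
    out = []
    for y in range(out_h):
        fy = y * sy
        y0 = min(int(fy), in_h - 2)  # top row of the 2x2 input neighborhood
        ty = fy - y0                 # vertical blend weight
        row = []
        for x in range(out_w):
            fx = x * sx
            x0 = min(int(fx), in_w - 2)
            tx = fx - x0
            top = (1 - tx) * img[y0][x0] + tx * img[y0][x0 + 1]
            bottom = (1 - tx) * img[y0 + 1][x0] + tx * img[y0 + 1][x0 + 1]
            row.append((1 - ty) * top + ty * bottom)
        out.append(row)
    return out


low = [[0, 100], [100, 200]]
print(bilinear_resize(low, 3, 3))
# [[0.0, 50.0, 100.0], [50.0, 100.0, 150.0], [100.0, 150.0, 200.0]]
```

Interpolation only fills in smooth in-between values; recovering sharp detail is the part SRCNN learns.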


An example of the results obtained by the SRCNN model compared to its contemporaries is shown below.


Task 2: Image Restoration

Image quality can degrade for several reasons, especially in photos from the era before cloud storage was commonplace. For example, images scanned from hard copies taken with old instant cameras often have scratches on them.


Image Restoration is particularly fascinating because advanced techniques in this area could potentially restore damaged historical documents. Powerful Deep Learning-based image restoration algorithms may be able to reveal large chunks of missing information from torn documents.

Image inpainting, for example, falls under this category, and it is the process of filling in the missing pixels in an image. This can be done by using a texture synthesis algorithm, which synthesizes new textures to fill in the missing pixels. However, Deep Learning-based models are the de facto choice due to their pattern recognition capabilities.
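For intuition, here is a much simpler non-learned fill than either texture synthesis or a deep model: a diffusion-style method that repeatedly replaces each missing pixel with the average of its 4 neighbors while leaving known pixels untouched. It assumes a grayscale image and a mask that marks missing pixels with 1. It produces smooth fills and cannot recreate texture, which is exactly where the methods discussed here do better.

```python
def diffuse_inpaint(img, mask, iterations=200):
    """Fill pixels where mask[y][x] == 1 by repeatedly averaging their 4-neighbors.

    Known pixels (mask == 0) are never modified. Every missing pixel needs at
    least one in-bounds neighbor, which holds for any image larger than 1x1.
    """
    h, w = len(img), len(img[0])
    out = [[0.0 if mask[y][x] else float(img[y][x]) for x in range(w)] for y in range(h)]
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    nbrs = [
                        out[j][i]
                        for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= j < h and 0 <= i < w
                    ]
                    nxt[y][x] = sum(nbrs) / len(nbrs)
        out = nxt
    return out


strip = [[0, 0, 0, 0, 100]]  # a 1x5 strip with three missing values
mask = [[0, 1, 1, 1, 0]]
print([round(v, 3) for v in diffuse_inpaint(strip, mask)[0]])  # [0.0, 25.0, 50.0, 75.0, 100.0]
```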


An example of an image inpainting framework (based on the U-Net autoencoder) was proposed in this paper, which uses a two-step approach to the problem: a coarse estimation step and a refinement step. The main feature of this network is the Coherent Semantic Attention (CSA) layer, which fills the occluded regions in the input images through iterative optimization. The architecture of the proposed model is shown below.


Some example results obtained by the authors and other competing models are shown below.


Task 3: Image Segmentation

Image segmentation is the process of partitioning an image into multiple segments or regions. Each segment represents a different object in the image, and image segmentation is often used as a preprocessing step for object detection.

There are many different algorithms that can be used for image segmentation, but one of the most common approaches is to use thresholding. Binary thresholding, for example, is the process of converting an image into a binary image, where each pixel is either black or white. The threshold value is chosen such that all pixels with a brightness level below the threshold are turned black, and all pixels with a brightness level above the threshold are turned white. This results in the objects in the image being segmented, as they are now represented by distinct black and white regions.
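Binary thresholding can be sketched in a few lines. The threshold of 127 in the example is chosen by hand. Treating pixels exactly equal to the threshold as black is an assumption, since the text leaves that case open. White is written as 255 here for display; the 0/1 binary-image convention works the same way.

```python
def binary_threshold(img, t):
    """Map pixels brighter than t to white (255) and all others to black (0)."""
    return [[255 if p > t else 0 for p in row] for row in img]


gray = [[12, 200, 90], [250, 30, 180]]
print(binary_threshold(gray, 127))  # [[0, 255, 0], [255, 0, 255]]
```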


In multi-level thresholding, as the name suggests, different parts of an image are converted to different shades of gray depending on the number of levels. This paper, for example, used multi-level thresholding for medical imaging, specifically for brain MRI segmentation, an example of which is shown below.
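Multi-level thresholding generalizes the same idea to several thresholds. In this sketch (an assumption about how levels are assigned), k sorted thresholds split the intensity range into k + 1 bands, and each band is mapped to an evenly spaced gray level. With a single threshold it reduces to binary thresholding.

```python
from bisect import bisect_left


def multilevel_threshold(img, thresholds):
    """Map each pixel to one of len(thresholds) + 1 evenly spaced gray levels.

    thresholds must be sorted; a pixel's band index is the number of
    thresholds strictly below its value.
    """
    levels = len(thresholds)
    return [[round(bisect_left(thresholds, p) * 255 / levels) for p in row] for row in img]


img = [[10, 90, 170, 250]]
print(multilevel_threshold(img, [85, 170]))  # [[0, 128, 128, 255]]
```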


Modern techniques use deep learning to automate image segmentation for both binary and multi-label segmentation problems. For example, PFNet (Positioning and Focus Network) is a CNN-based model that addresses the camouflaged object segmentation problem. It consists of two key modules: the positioning module (PM), designed for object detection, which mimics the way predators first identify the coarse position of their prey; and the focus module (FM), which mimics the identification step of predation by refining the initial segmentation results and focusing on the ambiguous regions. The architecture of the PFNet model is shown below.


The PFNet model outperformed contemporary state-of-the-art models; example results are shown below.


Task 4: Object Detection

Object Detection is the task of identifying and localizing objects in an image and is often used in applications such as security and surveillance. Many different algorithms can be used for object detection, but the most common approach is to use Deep Learning models, specifically Convolutional Neural Networks (CNNs).


CNNs are a type of Artificial Neural Network that were specifically designed for image processing tasks since the convolution operation in their core helps the computer “see” patches of an image at once instead of having to deal with one pixel at a time. CNNs trained for object detection will output a bounding box (as shown in the illustration above) depicting the location where the object is detected in the image along with its class label.
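The convolution at the core of a CNN can be sketched as sliding a small kernel over the image and summing element-wise products over each patch. Like common deep learning frameworks, this sketch does not flip the kernel (so it is strictly a cross-correlation), and it only computes positions where the kernel fits entirely (“valid” padding). The edge-detecting kernel is hand-picked for illustration; a CNN learns its kernel weights during training.

```python
def conv2d(img, kernel):
    """'Valid' 2D convolution as used in CNN layers (no kernel flip).

    Each output value summarizes one kernel-sized patch of the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [
        [
            sum(kernel[j][i] * img[y + j][x + i] for j in range(kh) for i in range(kw))
            for x in range(w - kw + 1)
        ]
        for y in range(h - kh + 1)
    ]


# Responds where intensity increases from left to right (a vertical edge).
edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
img = [[0, 0, 0, 255, 255]] * 3
print(conv2d(img, edge))  # [[0, 765, 765]]: zero over the flat area, large at the edge
```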

An example of such a network is the popular Faster R-CNN (Region-based Convolutional Neural Network) model, whose Region Proposal Network (RPN) is a fully convolutional network that can be trained end-to-end. Faster R-CNN can be trained by alternating between fine-tuning for the region proposal task (predicting regions in the image where an object might be present) and fine-tuning for object detection (detecting what object is present) while keeping the proposals fixed. The architecture and some examples of region proposals are shown below.
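Region proposals and detections are usually compared with ground-truth boxes by their Intersection over Union (IoU); Faster R-CNN, for instance, uses IoU thresholds to decide which anchor boxes count as object or background when training its region proposal step. A minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) in continuous coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)  # 0 if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```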


Task 5: Image Compression

Image compression is the process of reducing the file size of an image while still trying to preserve the quality of the image. This is done to save storage space, especially to run Image Processing algorithms on mobile and edge devices, or to reduce the bandwidth required to transmit the image.

Traditional approaches use lossy compression algorithms, which slightly reduce the quality of the image in order to achieve a smaller file size. The JPEG format, for example, uses the Discrete Cosine Transform (DCT) for image compression.
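The DCT step of JPEG-style compression can be sketched on a single 8×8 block: shift pixel values so they are centered on zero, apply an orthonormal 2D DCT, and quantize the coefficients so that small ones become zero. This is a simplified illustration. It assumes one uniform quantization step instead of JPEG's per-frequency quantization tables and omits color conversion, zig-zag ordering, and entropy coding.

```python
import math

N = 8  # JPEG operates on 8x8 pixel blocks


def _basis(k, n):
    """Orthonormal DCT-II basis function for frequency k evaluated at sample n."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct_1d(v):
    return [sum(v[n] * _basis(k, n) for n in range(N)) for k in range(N)]


def idct_1d(c):
    return [sum(c[k] * _basis(k, n) for k in range(N)) for n in range(N)]


def transform_2d(fn, block):
    """Apply a 1D transform to every row, then to every column."""
    rows = [fn(r) for r in block]
    cols = [fn([rows[y][x] for y in range(N)]) for x in range(N)]
    return [[cols[x][y] for x in range(N)] for y in range(N)]


def compress_block(block, step=20):
    """Level-shift by 128, transform, and uniformly quantize one 8x8 block."""
    coeffs = transform_2d(dct_1d, [[p - 128 for p in row] for row in block])
    return [[round(c / step) for c in row] for row in coeffs]


def decompress_block(quantized, step=20):
    """Dequantize, inverse-transform, undo the level shift, and clip to 0..255."""
    coeffs = [[q * step for q in row] for row in quantized]
    pixels = transform_2d(idct_1d, coeffs)
    return [[min(255, max(0, round(p + 128))) for p in row] for row in pixels]


# A smooth horizontal gradient: its energy concentrates in a few coefficients.
block = [[100 + 4 * x for x in range(N)] for _ in range(N)]
q = compress_block(block)
nonzero = sum(1 for row in q for v in row if v != 0)
restored = decompress_block(q)
max_error = max(abs(a - b) for ra, rb in zip(block, restored) for a, b in zip(ra, rb))
print(nonzero, max_error)  # only a handful of the 64 coefficients survive
```

Storing a few non-zero integers instead of 64 pixel values is where the saving comes from; the price is the small reconstruction error.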

Modern approaches to image compression use Deep Learning to encode images into a lower-dimensional feature space and then recover them on the receiver’s side using a decoding network. Such models are called autoencoders: an encoder branch learns an efficient encoding scheme, and a decoder branch tries to reconstruct the image from the encoded features as faithfully as possible.


For example, this paper proposed a variable rate image compression framework using a conditional autoencoder. The conditional autoencoder is conditioned on the Lagrange multiplier, i.e., the network takes the Lagrange multiplier as input and produces a latent representation whose rate depends on the input value. The authors also train the network with mixed quantization bin sizes for fine-tuning the rate of compression. Their framework is depicted below.
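The role of the Lagrange multiplier can be written out as a rate-distortion objective. Writing R for the bit rate of the quantized latent representation and D for the distortion between the original and the reconstructed image, a common form of the training objective is shown below; the paper's exact definitions and weighting may differ.

```latex
J = R + \lambda \, D
```

A larger λ penalizes distortion more, pushing the optimum toward higher rates and better quality. Feeding λ to the network as an input lets a single model cover many points on the rate-distortion curve instead of training one model per λ.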


The authors obtained better results than popular methods like JPEG in terms of both bits per pixel and reconstruction quality. An example of this is shown below.


Task 6: Image Manipulation

Image manipulation is the process of altering an image to change its appearance. This may be desired for several reasons, such as removing an unwanted object from an image or adding an object that is not present in the image. Graphic designers often do this to create posters, films, etc.

An example of Image Manipulation is Neural Style Transfer, a technique that uses Deep Learning models to adapt an image to the style of another. For example, a regular image could be rendered in the style of “Starry Night” by van Gogh. Neural Style Transfer also enables AI to generate art.


An example of such a model is the one proposed in this paper, which can transfer arbitrary new styles in real time (other approaches often require much longer inference times) using an autoencoder-based framework. The authors proposed an adaptive instance normalization (AdaIN) layer that adjusts the mean and variance of the content input (the image to be changed) to match those of the style input (the image whose style is to be adopted). The AdaIN output is then decoded back to image space to obtain the final style-transferred image. An overview of the framework is shown below.


Examples of images transferred to other artistic styles are shown below and compared to existing state-of-the-art methods.


Task 7: Image Generation

Synthesis of new images is another important task in image processing, especially for Deep Learning algorithms, which require large quantities of labeled data to train. Image generation methods typically use Generative Adversarial Networks (GANs), another distinctive neural network architecture.


GANs consist of two separate models: the generator, which generates the synthetic images, and the discriminator, which tries to distinguish synthetic images from real images. The generator tries to synthesize images that look realistic to fool the discriminator, and the discriminator trains to better critique whether an image is synthetic or real. This adversarial game allows the generator to produce photo-realistic images after several iterations, which can then be used to train other Deep Learning models.
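The adversarial game can be summarized by the two losses that the models minimize. This sketch assumes the discriminator outputs probabilities in (0, 1) and shows only the loss computations, not the networks or the training loop. For the generator it uses the “non-saturating” form, −log D(G(z)), which the original GAN paper suggests in practice instead of minimizing log(1 − D(G(z))).

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real images should score 1, generated images 0.

    d_real, d_fake: discriminator outputs for batches of real and generated images.
    """
    real_term = sum(-math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: low when generated images are scored as real."""
    return sum(-math.log(p + eps) for p in d_fake) / len(d_fake)


# A confident discriminator has a low loss; the generator it fools rarely has a high one.
print(round(discriminator_loss([0.9], [0.1]), 3))  # 0.211
print(round(generator_loss([0.1]), 3))             # 2.303
```

Each side's improvement raises the other's loss, which is what drives the generator toward more realistic images over training.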

Task 8: Image-to-Image Translation

Image-to-Image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. For example, a free-hand sketch can be drawn as an input to get a realistic image of the object depicted in the sketch as the output, as shown below.


Pix2pix is a popular model in this domain that uses a conditional GAN (cGAN) for general-purpose image-to-image translation; that is, the same framework can be applied to several image processing problems, such as semantic segmentation, sketch-to-image translation, and image colorization. In a cGAN, the generator produces images conditioned on additional input. For example, image generation can be conditioned on a class label to generate images specific to that class.


Pix2pix consists of a U-Net generator network and a PatchGAN discriminator network, which, unlike the discriminators of traditional GAN models, classifies each N×N patch of an image as real or fake instead of judging the whole image at once. The authors argue that such a discriminator enforces more constraints that encourage sharp high-frequency detail. Examples of results obtained by the pix2pix model on image-to-map and map-to-image tasks are shown below.


Key Takeaways

The information technology era we live in has made visual data widely available. However, a lot of processing is required for this data to be transferred over the internet or used for purposes like information extraction and predictive modeling.

The advancement of deep learning technology gave rise to CNN models, which were specifically designed for processing images. Since then, several advanced models have been developed that cater to specific tasks in the Image Processing niche. We looked at some of the most critical techniques in Image Processing and popular Deep Learning-based methods that address these problems, from image compression and enhancement to image synthesis.

Recent research is focused on reducing the need for ground truth labels for complex tasks like object detection and semantic segmentation by employing concepts like Semi-Supervised Learning and Self-Supervised Learning, which make models more suitable for broad practical applications.

If you’re interested in learning more about computer vision, deep learning, and neural networks, have a look at these articles:

  • Deep Learning 101: Introduction [Pros, Cons & Uses]
  • What Is Computer Vision? [Basic Tasks & Techniques]
  • Convolutional Neural Networks: Architectures, Types & Examples


Rohit Kundu is a Ph.D. student in the Electrical and Computer Engineering department of the University of California, Riverside. He is a researcher in the Vision-Language domain of AI and has published several papers in top-tier conferences and notable peer-reviewed journals.


Review Article | Open access | Published: 12 April 2022

Machine learning for medical imaging: methodological failures and recommendations for the future

Gaël Varoquaux & Veronika Cheplygina

npj Digital Medicine, volume 5, Article number 48 (2022)


Research in computer analysis of medical images bears many promises to improve patients’ health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that at every step, potential biases can creep in. On a positive note, we also discuss ongoing efforts to counteract these problems. Finally, we provide recommendations on how to further address these problems in the future.


Introduction

Machine learning, the cornerstone of today’s artificial intelligence (AI) revolution, brings new promises to clinical practice with medical images [1, 2, 3]. For example, to diagnose various conditions from medical images, machine learning has been shown to perform on par with medical experts [4]. Software applications are starting to be certified for clinical use [5, 6]. Machine learning may be the key to realizing the vision of AI in medicine sketched several decades ago [7].

The stakes are high, and there is a staggering amount of research on machine learning for medical images. But this growth does not inherently lead to clinical progress. The higher volume of research could be aligned with academic incentives rather than with the needs of clinicians and patients. For example, there can be an oversupply of papers showing state-of-the-art performance on benchmark data, but no practical improvement for the clinical problem. On the topic of machine learning for COVID, Roberts et al. [8] reviewed 62 published studies, but found none with potential for clinical use.

In this paper, we explore avenues to improve the clinical impact of machine learning in medical imaging. After sketching the situation and documenting uneven progress in Section “It’s not all about larger datasets”, we study a number of failures frequent in medical imaging papers, at different steps of the “publishing lifecycle”: what data to use (Section “Data, an imperfect window on the clinic”), what methods to use and how to evaluate them (Section “Evaluations that miss the target”), and how to publish the results (Section “Publishing, distorted incentives”). In each section, we first discuss the problems, supported with evidence from previous research as well as our own analyses of recent papers. We then discuss a number of steps to improve the situation, sometimes borrowed from related communities. We hope that these ideas will help shape research practices that are even more effective at addressing real-world medical challenges.

It’s not all about larger datasets

The availability of large labeled datasets has enabled solving difficult machine learning problems, such as natural image recognition in computer vision, where datasets can contain millions of images. As a result, there is widespread hope that similar progress will happen in medical applications, and that algorithm research will eventually solve a clinical problem posed as a discrimination task. However, medical datasets are typically smaller, on the order of hundreds or thousands of subjects: ref. [9] shares a list of sixteen “large open source medical imaging datasets”, with sizes ranging from 267 to 65,000 subjects. Note that in medical imaging we refer to the number of subjects, but a subject may have multiple images, for example, taken at different points in time. For simplicity, here we assume a diagnosis task with one image/scan per subject.

Few clinical questions come as well-posed discrimination tasks that can be naturally framed as machine-learning tasks. But even for these, larger datasets have not, to date, led to the progress hoped for. One example is the early diagnosis of Alzheimer’s disease (AD), a growing health burden due to the aging population. Early diagnosis would open the door to early-stage interventions, which are the most likely to be effective. Substantial efforts have acquired large brain-imaging cohorts of aging individuals at risk of developing AD, on which early biomarkers can be developed using machine learning [10]. As a result, there have been steady increases in the typical sample size of studies applying machine learning to develop computer-aided diagnosis of AD, or of its predecessor, mild cognitive impairment. This growth is clearly visible in publications, as shown in Fig. 1a, a meta-analysis compiling 478 studies from 6 systematic reviews [4, 11, 12, 13, 14, 15].

Figure 1: A meta-analysis across 6 review papers, covering more than 500 individual publications. The machine-learning problem is typically formulated as distinguishing various related clinical conditions: Alzheimer’s Disease (AD), Healthy Control (HC), and Mild Cognitive Impairment, which can signal prodromal Alzheimer’s. Distinguishing progressive mild cognitive impairment (pMCI) from stable mild cognitive impairment (sMCI) is the most relevant machine-learning task from the clinical standpoint. (a) Reported sample size as a function of the publication year of a study. (b) Reported prediction accuracy as a function of the number of subjects in a study. (c) Same plot, distinguishing studies published in different years.

However, the increase in data size (with the largest datasets containing over a thousand subjects) did not come with better diagnostic accuracy, in particular for the most clinically relevant question, distinguishing pathological from stable evolution in patients with symptoms of prodromal Alzheimer’s (Fig. 1b). Rather, studies with larger sample sizes tend to report worse prediction accuracy. This is worrisome, as these larger studies are closer to real-life settings. On the other hand, research efforts across time did lead to improvements even on large, heterogeneous cohorts (Fig. 1c), as studies published later show improvements for large sample sizes (statistical analysis in Supplementary Information). Current medical-imaging datasets are much smaller than those that brought breakthroughs in computer vision. Although a one-to-one comparison of sizes cannot be made, as computer vision datasets have many classes with high variation (compared to few classes with less variation in medical imaging), reaching better generalization in medical imaging may require assembling significantly larger datasets, while avoiding biases created by opportunistic data collection, as described below.

Data, an imperfect window on the clinic

Datasets may be biased: they reflect an application only partly.

Available datasets only partially reflect the clinical situation for a particular medical condition, leading to dataset bias 16 . As an example, a dataset collected as part of a population study might differ from one made of people referred to the hospital for treatment, who have a higher incidence of the disease. As the researcher may be unaware of the corresponding dataset bias, it can lead to important shortcomings of the study. Dataset bias occurs when the data used to build the decision model (the training data) have a different distribution than the data on which it should be applied 17 (the test data). To assess clinically relevant predictions, the test data must match the actual target population, rather than being a random subset of the same data pool as the training data, which is the common practice in machine-learning studies. With such a mismatch, algorithms that score high in benchmarks can perform poorly in real-world scenarios 18 . In medical imaging, dataset bias has been demonstrated in chest X-rays 19 , 20 , 21 , retinal imaging 22 , brain imaging 23 , 24 , histopathology 25 , and dermatology 26 . Such biases are revealed by training and testing a model across datasets from different sources and observing a performance drop across sources.
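The cross-source check described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reproduction of any cited study: two hypothetical acquisition sites generate one synthetic image feature, and site B differs only by an intensity offset (a stand-in for a different scanner). A threshold classifier fitted on site A is scored on a random held-out split of site A and on site B; `make_site`, `fit_threshold`, and the offset of 0.8 are illustrative choices.

```python
import random

def make_site(n, shift, seed):
    """Simulate one hypothetical acquisition site: a single image feature whose
    mean depends on the label (0 = healthy, 1 = diseased), plus a site offset."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(1.0 * label, 0.5) + shift
        rows.append((feature, label))
    return rows

def accuracy(rows, threshold):
    """Fraction of rows where 'feature > threshold' matches the label."""
    return sum((f > threshold) == (y == 1) for f, y in rows) / len(rows)

def fit_threshold(train):
    """Decision stump: the observed feature value that maximizes training accuracy."""
    return max((f for f, _ in train), key=lambda t: accuracy(train, t))

site_a = make_site(1000, shift=0.0, seed=0)
train_a, test_a = site_a[:600], site_a[600:]  # random split within one source
site_b = make_site(400, shift=0.8, seed=1)    # same task, different source

threshold = fit_threshold(train_a)
acc_same = accuracy(test_a, threshold)
acc_other = accuracy(site_b, threshold)
print(f"held-out site A: {acc_same:.2f}   site B: {acc_other:.2f}")
```

Only the cross-source score exposes the shift: the random split, drawn from the same pool as the training data, reports the optimistic figure.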

There are many potential sources of dataset bias in medical imaging, introduced at different phases of the modeling process 27 . First, a cohort may not appropriately represent the range of possible patients and symptoms, a bias sometimes called spectrum bias 28 . A detrimental consequence is that model performance can be overestimated for some groups of patients, and can differ, for example, between male and female individuals 21 , 26 . Yet medical imaging publications do not always report the demographics of the data.
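Reporting performance per subgroup, rather than only in aggregate, is one way to surface this problem. A minimal Python sketch with hypothetical predictions (all patients here are positive cases, so accuracy equals sensitivity); the group sizes and the helper `accuracy_by_group` are illustrative assumptions.

```python
from collections import defaultdict

def accuracy_by_group(groups, y_true, y_pred):
    """Return the overall accuracy and the accuracy within each subgroup."""
    hits, counts = defaultdict(int), defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        counts[g] += 1
        hits[g] += int(t == p)
    overall = sum(hits.values()) / sum(counts.values())
    return overall, {g: hits[g] / counts[g] for g in counts}

# Hypothetical evaluation set dominated by one group: 80 male, 20 female patients.
groups = ["male"] * 80 + ["female"] * 20
y_true = [1] * 100
# Illustrative predictions: 72/80 correct for males, 12/20 correct for females.
y_pred = [1] * 72 + [0] * 8 + [1] * 12 + [0] * 8

overall, per_group = accuracy_by_group(groups, y_true, y_pred)
print(overall, per_group)  # 0.84 {'male': 0.9, 'female': 0.6}
```

The aggregate figure of 0.84 hides that the under-represented group fares much worse.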

Imaging devices or procedures may lead to specific measurement biases. A bias particularly harmful to clinically relevant automated diagnosis arises when the data capture medical interventions. For instance, in chest X-ray datasets, images for the “pneumothorax” condition sometimes show a chest drain, which is a treatment for this condition and would not yet be present before diagnosis 29 . Similar spurious correlations can appear in skin lesion images due to markings placed by dermatologists next to the lesions 30 .
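Such shortcuts can be reproduced on synthetic data. The Python sketch below is a hypothetical simulation, not data from the cited study: in the retrospective dataset a visible chest drain accompanies most pneumothorax images, while at diagnosis time no drain is present yet. A decision stump fitted on the retrospective data picks the drain over the weaker genuine signal; the probabilities 0.9 and 0.05 and the signal strength are arbitrary assumptions.

```python
import random

def simulate(n, p_drain_if_sick, p_drain_if_healthy, seed):
    """Each row: ((genuine_signal, drain_visible), label), label 1 = pneumothorax."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        genuine = rng.gauss(0.6 * y, 0.5)  # weak but real imaging signal
        p = p_drain_if_sick if y else p_drain_if_healthy
        drain = 1 if rng.random() < p else 0  # treatment visible on the image
        rows.append(((genuine, drain), y))
    return rows

def stump_accuracy(rows, j, t):
    return sum((x[j] > t) == (y == 1) for x, y in rows) / len(rows)

def fit_stump(rows):
    """Pick the single feature j and threshold t with the best training accuracy."""
    candidates = [(j, t) for j in (0, 1) for t in sorted({x[j] for x, _ in rows})]
    return max(candidates, key=lambda jt: stump_accuracy(rows, *jt))

def sensitivity(rows, j, t):
    sick = [x for x, y in rows if y == 1]
    return sum(x[j] > t for x in sick) / len(sick)

# Retrospective data: many pneumothorax images were taken after drain placement.
train = simulate(500, 0.9, 0.05, seed=0)
test_same = simulate(500, 0.9, 0.05, seed=1)
# At diagnosis time, no drain has been placed yet.
deployment = simulate(500, 0.0, 0.0, seed=2)

j, t = fit_stump(train)
print("feature used:", ["genuine", "drain"][j])
print("accuracy, same distribution:", round(stump_accuracy(test_same, j, t), 2))
print("sensitivity at deployment:", sensitivity(deployment, j, t))
```

A held-out split of the retrospective data rewards the shortcut; only data reflecting the moment of diagnosis reveals that the model detects treatment rather than disease.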

Labeling errors can also introduce biases. Expert human annotators may have systematic biases in the way they assign different labels 31 , and it is seldom possible to compensate with multiple annotators. Using automatic methods to extract labels from patient reports can also lead to systematic errors 32 . For example, a report on a follow-up scan that does not mention previously known findings can lead to an incorrect “negative” label.
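The follow-up-report failure can be illustrated with a deliberately naive, hypothetical keyword labeler in Python. Real report-mining tools are far more elaborate, but they face the same issue: a report that does not mention a finding is read as evidence of its absence. The function `label_from_report` and the example reports are illustrative assumptions.

```python
def label_from_report(report: str, finding: str = "nodule") -> int:
    """Hypothetical rule-based labeler: positive iff the finding is mentioned
    and not explicitly negated."""
    text = report.lower()
    if finding not in text:
        return 0  # silence is read as absence: the failure mode discussed above
    negations = (f"no {finding}", f"without {finding}", f"no evidence of {finding}")
    return 0 if any(neg in text for neg in negations) else 1

reports = {
    "baseline": "Right upper lobe nodule, 8 mm.",
    "follow-up": "Stable appearance compared with prior study.",
    "normal": "No nodule identified.",
}
for name, text in reports.items():
    print(name, label_from_report(text))
# baseline 1
# follow-up 0   <- wrong: the nodule is still present, just not restated
# normal 0
```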

Dataset availability distorts research

The availability of datasets can influence which applications are studied more extensively. A striking example can be seen in two applications of oncology: detecting lung nodules and detecting breast tumors in radiological images. Lung datasets are widely available on Kaggle or grand-challenge.org, whereas (to our knowledge) only one challenge focuses on mammograms. We look at the popularity of these topics, here defined by the fraction of papers focusing on lung or breast imaging, either in the literature on general medical oncology or in the literature on AI. In medical oncology this fraction is relatively constant across time for both lung and breast imaging, but in the AI literature lung imaging publications show a substantial increase in 2016 (Fig. 2; methodological details in Supplementary Information). We suspect that the Kaggle lung challenges published around that time contributed to this disproportionate increase. A similar point on dataset trends has been made throughout the history of machine learning in general 33 .

Figure 2: Percentage of papers on lung cancer (in blue) vs breast cancer (in red), relative to all papers within two fields: medical oncology (solid line) and AI (dotted line). Details on how the papers are selected are given in the Supplementary Information. The percentages are relatively constant, except for lung cancer in AI, which shows an increase after 2016.

Let us build awareness of data limitations

Addressing such problems arising from the data requires critical thinking about the choice of datasets, at the project level, i.e. which datasets to select for a study or a challenge, and at a broader level, i.e. which datasets we work on as a community.

At the project level, the choice of the dataset will influence the models trained on the data, and the conclusions we can draw from the results. An important step is using datasets from multiple sources, or creating robust datasets from the start when feasible 9 . However, existing datasets can still be critically evaluated for dataset bias 34 , hidden subgroups of patients 29 , or mislabeled instances 35 . A checklist for such evaluation on computer vision datasets is presented in Zendel et al. 18 . When problems are discovered, relabeling a subset of the data can be a worthwhile investment 36 .

At the community level, we should foster understanding of the datasets’ limitations. Good documentation of datasets should describe their characteristics and data collection 37 . Distributed models should detail their limitations and the choices made to train them 38 .

Meta-analyses that look at the evolution of dataset use in different areas are another way to reflect on current research efforts. For example, a survey of crowdsourcing in medical imaging 39 shows a different distribution of applications than surveys focusing on machine learning 1 , 2 . Contrasting more clinically oriented venues with more technical venues can reveal opportunities for machine learning research.

Evaluations that miss the target

Evaluation error is often larger than algorithmic improvements.

Research on methods often focuses on outperforming other algorithms on benchmark datasets. But too strong a focus on benchmark performance can lead to diminishing returns, where increasingly large efforts achieve smaller and smaller performance gains. Is this also visible in the development of machine learning in medical imaging?

We studied performance improvements in 8 Kaggle medical-imaging challenges, 5 on detection or diagnosis of diseases and 3 on image segmentation (details in Supplementary Information). We use the differences in the algorithms’ performance between the public and private leaderboards (two test sets used in the challenge) to quantify the evaluation noise, that is, the spread of performance differences between the public and private test sets (Fig. 3). We compare its distribution to the winner gap: the difference in performance between the best algorithm and the “top 10%” algorithm.

Figure 3: The blue violin plot shows the evaluation noise, the distribution of differences between public and private leaderboards. A systematic shift between public and private set (positive means that the private leaderboard is better than the public leaderboard) indicates overfitting or dataset bias. The width of this distribution shows how noisy the evaluation is, or how representative the public score is for the private score. The brown bar is the winner gap, the improvement between the top-most model (the winner) and the 10% best model. It is interesting to compare this improvement to the shift and width in the difference between the public and private sets: if the winner gap is smaller, the 10% best models reached diminishing returns and did not lead to an actual improvement on new data.
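Both quantities can be computed directly from leaderboard scores. The Python sketch below assumes one reading of the definitions above (evaluation noise summarized by the mean and standard deviation of private-minus-public differences; winner gap taken on the private leaderboard as the best score minus the score at the top-10% cutoff) and uses synthetic scores for 200 hypothetical teams of nearly equal skill. The exact procedure behind the figure is described in the Supplementary Information and may differ.

```python
import random
import statistics

def evaluation_noise(public, private):
    """Shift (mean) and width (standard deviation) of private - public differences."""
    diffs = [b - a for a, b in zip(public, private)]
    return statistics.mean(diffs), statistics.stdev(diffs)

def winner_gap(scores):
    """Best score minus the score of the model at the top-10% rank."""
    ranked = sorted(scores, reverse=True)
    return ranked[0] - ranked[max(len(ranked) // 10, 1) - 1]

# Synthetic leaderboards (hypothetical): 200 teams of nearly equal skill, each
# scored on two small test sets with independent sampling noise.
rng = random.Random(0)
skill = [rng.gauss(0.80, 0.005) for _ in range(200)]
public = [s + rng.gauss(0, 0.02) for s in skill]
private = [s + rng.gauss(0, 0.02) for s in skill]

shift, width = evaluation_noise(public, private)
gap = winner_gap(private)
print(f"shift {shift:+.3f}, width {width:.3f}, winner gap {gap:.3f}")
# Crude criterion (an assumption): a gap within the noise width is not informative.
print("diminishing returns" if gap < width else "winner clearly ahead")
```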

Overall, 6 of the 8 challenges are in the diminishing-returns category. For 5 challenges, including lung cancer, schizophrenia, prostate cancer diagnosis, and intracranial hemorrhage detection, the evaluation noise is larger than the winner gap. In other words, the gains made by the top 10% of methods are smaller than the expected noise when evaluating a method.

For another challenge, pneumothorax segmentation, the performance on the private set is worse than on the public set, revealing an overfit larger than the winner gap. Only two challenges (COVID-19 abnormality and nerve segmentation) display a winner gap larger than the evaluation noise, meaning that the winning method made substantial improvements compared to the top-10% competitor.

Improper evaluation procedures and leakage

Unbiased evaluation of model performance relies on training and testing the models with independent sets of data 40 . However, incorrect implementations of this procedure can easily leak information, leading to overoptimistic results. For example, some studies classifying ADHD based on brain imaging have engaged in circular analysis 41 , performing feature selection on the full dataset before cross-validation. Another example of leakage arises when repeated measures of an individual are split across the train and test sets: the algorithm then learns to recognize the individual patient rather than markers of a condition 42 .
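Circular analysis is easy to reproduce on pure noise. The Python sketch below is an illustration, not the analysis of the cited studies: it draws 1000 random features for 40 hypothetical subjects whose labels are unrelated to the data, keeps the 10 features that best separate the classes, and scores a nearest-centroid classifier with 5-fold cross-validation. The data sizes and helper names are assumptions chosen for speed.

```python
import random

def make_noise_data(n_subjects=40, n_features=1000, seed=0):
    """Pure noise: the labels carry no information about the features."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_subjects)]
    y = [i % 2 for i in range(n_subjects)]
    return X, y

def select_features(X, y, rows, k):
    """The k features with the largest class-mean difference, computed on `rows` only."""
    def separation(j):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(len(X[0])), key=separation, reverse=True)[:k]

def nearest_centroid(X, y, train, test, feats):
    """Predict the class whose training centroid (on `feats`) is closest."""
    centroids = {}
    for c in (0, 1):
        members = [X[i] for i in train if y[i] == c]
        centroids[c] = [sum(m[j] for m in members) / len(members) for j in feats]
    def sq_dist(i, c):
        return sum((X[i][j] - mu) ** 2 for j, mu in zip(feats, centroids[c]))
    return [min((0, 1), key=lambda c: sq_dist(i, c)) for i in test]

def cv_accuracy(X, y, k=10, n_folds=5, select_inside_folds=True):
    everyone = list(range(len(y)))
    if not select_inside_folds:
        leaky_feats = select_features(X, y, everyone, k)  # sees the test subjects
    correct = 0
    for fold in range(n_folds):
        test = everyone[fold::n_folds]
        train = [i for i in everyone if i % n_folds != fold]
        feats = select_features(X, y, train, k) if select_inside_folds else leaky_feats
        preds = nearest_centroid(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(y)

X, y = make_noise_data()
leaky = cv_accuracy(X, y, select_inside_folds=False)
proper = cv_accuracy(X, y, select_inside_folds=True)
print(f"selection before CV: {leaky:.2f}   selection inside CV: {proper:.2f}")
```

Selecting features on the full dataset yields well-above-chance accuracy on data with no signal at all; moving the selection inside each training fold brings the estimate back to about chance level.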

A related issue, yet more difficult to detect, is what we call “overfitting by observer”: even when using cross-validation, overfitting may still occur when the researcher adjusts the method to improve the observed cross-validation performance, which effectively turns the test folds into a validation set for the model. Skocik et al. 43 illustrate this phenomenon by showing how adjusting the model this way can lead to better-than-random cross-validation performance on randomly generated data. This can explain some of the overfitting visible in challenges (Section Evaluation error is often larger than algorithmic improvements), though in challenges a private test set reveals the overfitting, which is often not the case for published studies. Another recommendation for challenges would be to hold out several datasets (rather than a part of the same dataset), as is done, for example, in the Decathlon challenge 44 .
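The same effect appears even without any learning. In this hypothetical Python sketch, 200 “model variants” (stand-ins for successive tweaks of a pipeline) make pseudo-random predictions that carry no signal; keeping the variant with the best validation score produces an apparently better-than-chance result that vanishes on fresh data. The sample sizes and the helper `variant_predict` are illustrative assumptions.

```python
import random

def variant_predict(variant, sample_ids):
    """A hypothetical model variant with no real signal: its prediction for a
    sample is a fixed pseudo-random bit depending on the variant and the sample."""
    return [random.Random(variant * 1_000_003 + i).randint(0, 1) for i in sample_ids]

def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

rng = random.Random(42)
val_ids, test_ids = range(0, 60), range(1000, 1600)
val_labels = [rng.randint(0, 1) for _ in val_ids]
test_labels = [rng.randint(0, 1) for _ in test_ids]

# "I tried a bunch of things": keep whichever of 200 variants scores best.
scores = {v: accuracy(variant_predict(v, val_ids), val_labels) for v in range(200)}
best = max(scores, key=scores.get)
fresh = accuracy(variant_predict(best, test_ids), test_labels)
print(f"best validation accuracy: {scores[best]:.2f}")
print(f"same variant on fresh data: {fresh:.2f}")
```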

Metrics that do not reflect what we want

Evaluating models requires choosing a suitable metric. However, our understanding of “suitable” may change over time. For example, an image similarity metric that was widely used to evaluate image registration algorithms was later shown to be ineffective, as scrambled images could obtain high scores 45 .

In medical image segmentation, Maier-Hein et al. 46 review 150 challenges and show that the rankings of algorithms are sensitive to the choice among different variants of the same metric, casting doubt on the objectivity of any individual ranking.
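The sensitivity of rankings to metric variants can be seen with as few as two algorithms and two images. The Python sketch below uses hypothetical voxel counts and two common ways of aggregating the Dice score: averaging per-image Dice, versus pooling voxels over all images before computing Dice. It is an illustration, not an example taken from the cited review.

```python
def dice(tp, fp, fn):
    """Dice coefficient from true-positive, false-positive, false-negative counts."""
    return 2 * tp / (2 * tp + fp + fn)

def mean_per_image_dice(cases):
    return sum(dice(*c) for c in cases) / len(cases)

def pooled_dice(cases):
    tp, fp, fn = (sum(counts) for counts in zip(*cases))
    return dice(tp, fp, fn)

# Hypothetical (TP, FP, FN) voxel counts: image 1 contains a large structure,
# image 2 a small one.
results = {
    "A": [(900, 100, 100), (2, 8, 8)],  # excellent on large, poor on small
    "B": [(800, 200, 200), (8, 2, 2)],  # decent on both
}

def rank(metric):
    """Algorithm names sorted from best to worst under `metric`."""
    scores = {name: metric(cases) for name, cases in results.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank(mean_per_image_dice))  # ['B', 'A']  (0.80 vs 0.55)
print(rank(pooled_dice))          # ['A', 'B']  (0.89 vs 0.80)
```

Pooling lets the large structure dominate, whereas per-image averaging weights the small structure equally, so the "winner" depends on an aggregation choice that is rarely reported.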

Important metrics may be missing from evaluation. Besides typical classification metrics (sensitivity, specificity, area under the curve), several authors argue for a calibration metric that compares the predicted and observed probabilities 28 , 47 .
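One common calibration measure, used here as an assumption rather than the metric the cited authors prescribe, is the binned expected calibration error (ECE). The Python sketch below builds two hypothetical models that rank patients identically, and hence have identical AUC, but model B systematically underestimates risk. Discrimination metrics alone cannot tell them apart.

```python
import random

def expected_calibration_error(y, prob, n_bins=10):
    """Weighted mean gap between predicted and observed frequency, per probability bin."""
    bins = [[] for _ in range(n_bins)]
    for outcome, p in zip(y, prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((outcome, p))
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(o for o, _ in b) / len(b)
            predicted = sum(p for _, p in b) / len(b)
            ece += len(b) / len(y) * abs(observed - predicted)
    return ece

def auc(y, score):
    """Rank-based (Mann-Whitney) AUC, assuming no tied scores."""
    order = sorted(range(len(y)), key=lambda i: score[i])
    rank_sum = sum(r + 1 for r, i in enumerate(order) if y[i] == 1)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = random.Random(0)
true_risk = [rng.random() for _ in range(5000)]
y = [int(rng.random() < r) for r in true_risk]
prob_a = true_risk                    # calibrated by construction
prob_b = [r ** 3 for r in true_risk]  # same ordering, underestimated risk

print(f"AUC  A={auc(y, prob_a):.3f}  B={auc(y, prob_b):.3f}")
print(f"ECE  A={expected_calibration_error(y, prob_a):.3f}  "
      f"B={expected_calibration_error(y, prob_b):.3f}")
```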

Finally, the metrics used may not be synonymous with practical improvement 48 , 49 . For example, typical metrics in computer vision do not reflect important aspects of image recognition, such as robustness to out-of-distribution examples 49 . Similarly, in medical imaging, improvements in traditional metrics may not translate into better clinical outcomes; e.g., robustness may be more important than accurate delineation in a segmentation application.

Incorrectly chosen baselines

Developing new algorithms relies on comparing them to baselines. However, if these baselines are poorly chosen, the reported improvement may be misleading.

Baselines may not properly account for recent progress, as revealed in machine-learning applications to healthcare 50 , but also in other applications of machine learning 51 , 52 , 53 .

Conversely, one should not forget simple approaches effective for the problem at hand. For example, Wen et al. 14 show that convolutional neural networks do not outperform support vector machines for Alzheimer’s disease diagnosis from brain imaging.

Finally, minute implementation details of algorithms may matter, and many researchers are not aware of the impact of such implementation factors 54 .

Statistical significance not tested, or misunderstood

Experimental results are by nature noisy: results may depend on which specific samples were used to train the models, the random initializations, or small differences in hyper-parameters 55 . However, benchmarking predictive models currently lacks well-adopted statistical good practices to separate out noise from generalizable findings.

A first, well-documented source of brittleness arises from machine-learning experiments with too small sample sizes 56 . Indeed, testing predictive models requires many samples, more than conventional inferential studies; otherwise the measured prediction accuracy may be a distant estimate of real-life performance. Sample sizes are growing, albeit slowly 57 . On a positive note, a meta-analysis of public vs private leaderboards on Kaggle 58 suggests that overfitting is less of an issue with “large enough” test data (at least several thousand samples).
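To see why small test sets make for brittle estimates, consider the width of a confidence interval on accuracy. The Python sketch below uses the Wilson score interval for a binomial proportion (one standard choice among several; the cited works do not prescribe it) for an observed accuracy of 80% at different hypothetical test-set sizes.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for an accuracy of correct / n."""
    p = correct / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

widths = {}
for n in (50, 200, 1000, 5000):
    lo, hi = wilson_interval(round(0.8 * n), n)
    widths[n] = hi - lo
    print(f"n = {n:5d}: 95% CI [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

With 50 test samples the interval spans roughly 22 accuracy points; with 5000 it shrinks to about 2, which is the scale at which typical method improvements become measurable.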

Another challenge is that strong validation of a method requires it to be robust to details of the data. Hence validation should go beyond a single dataset, and rather strive for statistical consensus across multiple datasets 59 . Yet, the corresponding statistical procedures require dozens of datasets to establish significance and are seldom used in practice. Rather, medical imaging research often reuses the same datasets across studies, which raises the risk of finding an algorithm that performs well by chance, in an implicit multiple comparison problem 60 .

But overall medical imaging research seldom analyzes how likely empirical results are to be due to chance: only 6% of segmentation challenges surveyed 61 , and 15% out of 410 popular computer science papers published by ACM used a statistical test 62 .

However, null-hypothesis tests are often misinterpreted 63 , with two notable pitfalls: (1) the lack of statistically significant results does not demonstrate the absence of an effect, and (2) any trivial effect can be significant given enough data 64 , 65 . For these reasons, Bouthillier et al. 66 recommend replacing traditional null-hypothesis testing with superiority testing, which tests that the improvement is above a given threshold.
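The second pitfall is a matter of arithmetic. The Python sketch below applies a two-proportion z-test to the same 0.5-point accuracy gain (80.0% versus 80.5%) measured on test sets of different hypothetical sizes. It simplifies by treating the two accuracies as measured on independent samples; comparing two models on one shared test set would call for a paired test such as McNemar's.

```python
import math

def two_proportion_p_value(acc_a, acc_b, n):
    """Two-sided p-value for acc_b != acc_a, each estimated on n independent samples."""
    pooled = (acc_a + acc_b) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (acc_b - acc_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

p_small = two_proportion_p_value(0.800, 0.805, n=1_000)
p_large = two_proportion_p_value(0.800, 0.805, n=1_000_000)
print(f"n = 1,000: p = {p_small:.2f};  n = 1,000,000: p = {p_large:.1e}")
```

The effect is identical in both cases; only the sample size decides whether it is declared "significant", which says nothing about whether a 0.5-point gain matters clinically.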

Let us redefine evaluation

Higher standards for benchmarking.

Good machine-learning benchmarks are difficult. We compile below several recognized best practices for medical machine learning evaluation 28 , 40 , 67 , 68 :

Safeguarding from data leakage by separating out all test data from the start, before any data transformation.

A documented way of selecting model hyper-parameters (including architectural parameters for neural networks, the use of additional (unlabeled) datasets or transfer learning 2 ), without ever using data from the test set.

Enough data in the test set to bring statistical power, at least several hundred samples, ideally thousands or more 9 , and confidence intervals on the reported performance metric (see Supplementary Information). In general, more research on appropriate sample sizes for machine learning studies would be helpful.

Rich data to represent the diversity of patients and disease heterogeneity, ideally multi-institutional data including all relevant patient demographics and disease state, with explicit inclusion criteria; other cohorts with different recruitment go the extra mile to establish external validity 69 , 70 .

Strong baselines that reflect the state of the art of machine-learning research, but also historical solutions including clinical methodologies not necessarily relying on medical imaging.

A discussion of the variability of the results due to arbitrary choices (random seeds) and data sources, with an eye on statistical significance (see Supplementary Information).

Using different quantitative metrics to capture the different aspects of the clinical problem and relating them to relevant clinical performance metrics. In particular, the potential health benefits from detecting the outcome of interest should be used to choose the right trade-off between false detections and misses 71 .

Adding qualitative accounts and involving groups that will be most affected by the application in the metric design 72 .
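The first two practices can be sketched in Python under illustrative assumptions: each hypothetical record holds a patient identifier, one image-derived intensity, and a label. Whole patients are held out before any transformation, so that repeated scans of one person never straddle the split, in line with the leakage issues discussed earlier; normalization statistics are then estimated on the training set alone. The helpers `split_by_patient` and `fit_standardizer` are hypothetical names.

```python
import random
import statistics

def split_by_patient(records, test_fraction=0.2, seed=0):
    """Hold out whole patients: every record of a patient lands on the same side."""
    patients = sorted({r["patient"] for r in records})
    random.Random(seed).shuffle(patients)
    n_test = max(1, int(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r["patient"] not in test_patients]
    test = [r for r in records if r["patient"] in test_patients]
    return train, test

def fit_standardizer(train):
    """Estimate normalization statistics on training data only."""
    values = [r["intensity"] for r in train]
    mu, sigma = statistics.mean(values), statistics.stdev(values)
    return lambda r: {**r, "intensity": (r["intensity"] - mu) / sigma}

# Hypothetical data: 30 patients with 1 to 4 scans each.
rng = random.Random(1)
records = [
    {"patient": f"P{p:02d}", "intensity": rng.gauss(100, 15), "label": p % 2}
    for p in range(30)
    for _ in range(rng.randint(1, 4))
]

train, test = split_by_patient(records)  # 1. set the test data aside first
standardize = fit_standardizer(train)    # 2. fit preprocessing on train only
train = [standardize(r) for r in train]
test = [standardize(r) for r in test]    # 3. apply, never refit, on test
```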

More than beating the benchmark

Even with proper validation and statistical significance testing, measuring a tiny improvement on a benchmark is seldom useful. Rather, one view is that, beyond rejecting a null hypothesis, a method should be accepted based on evidence that it brings a sizable improvement upon the existing solutions. This type of criterion is related to superiority tests sometimes used in clinical trials 73 , 74 , 75 . These tests are easy to implement in predictive modeling benchmarks, as they amount to comparing the observed improvement to the variation of the results due to arbitrary choices such as data sampling or random seeds 55 .
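Such a comparison can be sketched as a one-sided superiority check on paired runs. The Python sketch below assumes each method was run with the same five seeds (or data splits), uses hypothetical scores, and declares the new method superior only if the lower confidence bound of its mean improvement exceeds a chosen margin. The margin of 0.01 and the critical value 2.132 (the one-sided 95% Student t quantile with 4 degrees of freedom) are illustrative choices, not values from the cited works.

```python
import math
import statistics

def superiority_test(baseline, new, margin, critical=1.645):
    """One-sided check on paired runs: is the mean improvement reliably above `margin`?"""
    diffs = [b - a for a, b in zip(baseline, new)]
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    lower = mean - critical * se
    return lower > margin, mean, lower

# Hypothetical accuracies from five paired runs (same seeds and data splits).
baseline = [0.812, 0.805, 0.820, 0.809, 0.815]
new = [0.818, 0.809, 0.823, 0.817, 0.818]

for margin in (0.0, 0.01):
    ok, mean, lower = superiority_test(baseline, new, margin, critical=2.132)
    print(f"margin {margin:.2f}: mean gain {mean:.4f}, lower bound {lower:.4f}, superior: {ok}")
```

With these numbers the gain is reliably above zero but not above the 0.01 margin: a null-hypothesis test would call it significant, while a superiority test would not accept it as a sizable improvement.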

Organizing blinded challenges, with a hidden test set, mitigates the winner’s curse. But to bring progress, challenges should not only focus on the winner. Instead, more can be learned by comparing the competing methods and analyzing the determinants of success, as well as failure cases.

Evidence-based medicine good practices

A machine-learning algorithm deployed in clinical practice is a health intervention. There is a well-established practice to evaluate the impact of health interventions, building mostly on randomized clinical trials 76 . These require actually modifying patients’ treatments and thus should be run only after thorough evaluation on historical data.

A solid trial evaluates a well-chosen measure of patient health outcome, as opposed to the predictive performance of an algorithm. Many indirect mechanisms may affect this outcome, including how the full care process adapts to the computer-aided decision. For instance, a positive consequence of even imperfect predictions may be reallocating human resources to complex cases. But a negative consequence may be over-confidence leading to an increase in diagnostic errors. Cluster randomized trials can account for how modifications at the level of the care unit impact the individual patient: care units, rather than individuals, are randomly allocated to receive the intervention (the machine-learning algorithm) 77 . Often, double blinding is impossible: the care provider is aware of which arm of the study is used, the baseline condition or the system evaluated. Providers’ expectations can contribute to the success of a treatment, for instance via indirect placebo or nocebo effects 78 , making objective evaluation of the health benefits challenging if these are small.
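The allocation step of a cluster randomized trial can be sketched in Python: care units, not patients, are shuffled and split between arms, and every patient inherits the arm of their unit. Unit and patient names are hypothetical, and real trials typically add stratification or matching of units, which this sketch omits.

```python
import random

def cluster_randomize(patients_by_unit, seed=0):
    """Assign each care unit to an arm; patients inherit their unit's arm."""
    units = sorted(patients_by_unit)
    random.Random(seed).shuffle(units)
    half = len(units) // 2
    arm_of_unit = {u: "algorithm" if i < half else "usual care" for i, u in enumerate(units)}
    return {p: arm_of_unit[u] for u, patients in patients_by_unit.items() for p in patients}

# Hypothetical care units and their patients.
units = {
    "ward_1": ["p01", "p02", "p03"],
    "ward_2": ["p04", "p05"],
    "ward_3": ["p06", "p07", "p08"],
    "ward_4": ["p09", "p10"],
}
allocation = cluster_randomize(units)
for unit, patients in units.items():
    print(unit, allocation[patients[0]])
```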

Publishing, distorted incentives

No incentive for clarity.

The publication process does not create incentives for clarity. Efforts to impress may give rise to unnecessary “mathiness” of papers or suggestive language 79 (such as “human-level performance”).

Important details may be omitted, from ablation experiments showing what part of the method drives improvements 79 , to reporting how algorithms were evaluated in a challenge 46 . This in turn undermines reproducibility: the ability to reproduce the exact results, or even to draw the same conclusions 80 , 81 .

Optimizing for publication

As researchers, our goal should be to solve scientific problems. Yet the reality of the culture we exist in can distort this objective. Goodhart’s law summarizes the problem well: when a measure becomes a target, it ceases to be a good measure. As our academic incentive system is based on publications, it erodes their scientific content via Goodhart’s law.

Methods publications are selected for their novelty. Yet, comparing 179 classifiers on 121 datasets shows no statistically significant differences between the top methods 82 . In order to sustain novelty, researchers may be introducing unnecessary complexity into the methods, which does not improve their predictions but rather contributes to technical debt, making systems harder to maintain and deploy 83 .

Another metric emphasized is obtaining “state-of-the-art” results, which leads to several of the evaluation problems outlined in Section Evaluations that miss the target. The pressure to publish “good” results can aggravate methodological loopholes 84 , for instance gaming the evaluation in machine learning 85 . It is then all too appealing to find after-the-fact theoretical justifications of positive yet fragile empirical findings. This phenomenon, known as HARKing (hypothesizing after the results are known) 86 , has been documented in machine learning 87 and computer science in general 62 .

Finally, the selection of publications creates the so-called “file drawer problem” 88 : positive results, some due to experimental flukes, are more likely to be published than corresponding negative findings. For example, among the 410 most downloaded papers from the ACM, 97% of the papers that used significance testing had a finding with a p-value of less than 0.05 62 . It seems highly unlikely that only 3% of the initial working hypotheses, even for impactful work, were not confirmed.

Let us improve our publication norms

Fortunately, there are various avenues to improve reporting and transparency. For instance, the growing set of open datasets could be leveraged for collaborative work beyond the capacities of a single team 89 . The set of metrics studied could then be broadened, shifting the publication focus away from a single-dimension benchmark. More metrics can indeed help understand a method’s strengths and weaknesses 41 , 90 , 91 , exploring for instance calibration metrics 28 , 47 , 92 or learning curves 93 . The medical-research literature has several reporting guidelines for prediction studies 67 , 94 , 95 . They underline many points raised in previous sections: reporting on how representative the study sample is, on the separation between train and test data, on the motivation for the choice of outcome, evaluation metrics, and so forth. Unfortunately, algorithmic research in medical imaging seldom refers to these guidelines.

Methods should be studied on more than prediction performance: reproducibility 81 , carbon footprint 96 , or a broad evaluation of costs should be put in perspective with the real-world patient outcomes, from a putative clinical use of the algorithms 97 .

Preregistration or registered reports can bring more robustness and trust: the motivation and experimental setup of a paper are reviewed before empirical results are available, and thus the paper is accepted before the experiments are run 98 . Translating this idea to machine learning faces the challenge that new data are seldom acquired in a machine learning study, yet it would bring sizeable benefits 62 , 99 .

More generally, accelerating progress in science calls for accepting that some published findings will be wrong 100 . Popularizing different types of publications may help, for example publishing negative results 101 , replication studies 102 , commentaries 103 , and reflections on the field 68 , or the recent NeurIPS Retrospectives workshops. Such initiatives should ideally be led by more established academics, and be welcoming of newcomers 104 .

Conclusions

Despite great promise, the extensive research in medical applications of machine learning seldom achieves a clinical impact. Studying the academic literature and data-science challenges reveals troubling trends: accuracy on diagnostic tasks progresses more slowly on research cohorts that are closer to real-life settings; methods research is often guided by dataset availability rather than clinical relevance; many model developments bring improvements smaller than the evaluation errors. We have surveyed challenges of clinical machine-learning research that can explain these difficulties. The challenges start with the choice of datasets, plague model evaluation, and are amplified by publication incentives. Understanding these mechanisms enables us to suggest specific strategies to improve the various steps of the research cycle, promoting publication best practices 105 . None of these strategies is a silver-bullet solution. They rather require changing procedures, norms, and goals. But implementing them will help fulfill the promise of machine learning in healthcare: better health outcomes for patients with less burden on the care system.

Data availability

For reproducibility, all data used in our analyses are available on https://github.com/GaelVaroquaux/ml_med_imaging_failures .

Code availability

For reproducibility, all code for our analyses is available on https://github.com/GaelVaroquaux/ml_med_imaging_failures .

Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42 , 60–88 (2017).

Cheplygina, V., de Bruijne, M. & Pluim, J. P. W. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Med. Image Anal. 54 , 280–296 (2019).

Zhou, S. K. et al. A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE 1–19 (2020).

Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet Digital Health (2019).

Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25 , 44–56 (2019).

Sendak, M. P. et al. A path for translation of machine learning products into healthcare delivery. Eur. Med. J. Innov. 10 , 19–00172 (2020).

Schwartz, W. B., Patil, R. S. & Szolovits, P. Artificial intelligence in medicine (1987).

Roberts, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 3 , 199–217 (2021).

Willemink, M. J. et al. Preparing medical imaging data for machine learning. Radiology 192224 (2020).

Mueller, S. G. et al. Ways toward an early diagnosis in Alzheimer’s disease: the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimer’s Dement. 1 , 55–66 (2005).

Dallora, A. L., Eivazzadeh, S., Mendes, E., Berglund, J. & Anderberg, P. Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review. PLoS ONE 12 , e0179804 (2017).

Arbabshirani, M. R., Plis, S., Sui, J. & Calhoun, V. D. Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls. NeuroImage 145 , 137–165 (2017).

Sakai, K. & Yamada, K. Machine learning studies on major brain diseases: 5-year trends of 2014–2018. Jpn. J. Radiol. 37 , 34–72 (2019).

Wen, J. et al. Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Medical Image Analysis 101694 (2020).

Ansart, M. et al. Predicting the progression of mild cognitive impairment using machine learning: a systematic, quantitative and critical review. Medical Image Analysis 101848 (2020).

Torralba, A. & Efros, A. A. Unbiased look at dataset bias. In Computer Vision and Pattern Recognition (CVPR) , 1521–1528 (2011).

Dockès, J., Varoquaux, G. & Poline, J.-B. Preventing dataset shift from breaking machine-learning biomarkers. GigaScience 10 , giab055 (2021).

Zendel, O., Murschitz, M., Humenberger, M. & Herzner, W. How good is my test data? introducing safety analysis for computer vision. Int. J. Computer Vis. 125 , 95–109 (2017).

Pooch, E. H., Ballester, P. L. & Barros, R. C. Can we trust deep learning models diagnosis? the impact of domain shift in chest radiograph classification. In MICCAI workshop on Thoracic Image Analysis (Springer, 2019).

Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15 , e1002683 (2018).

Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H. & Ferrante, E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences (2020).

Tasdizen, T., Sajjadi, M., Javanmardi, M. & Ramesh, N. Improving the robustness of convolutional networks to appearance variability in biomedical images. In International Symposium on Biomedical Imaging (ISBI), 549–553 (IEEE, 2018).

Wachinger, C., Rieckmann, A., Pölsterl, S. & Initiative, A. D. N. et al. Detect and correct bias in multi-site neuroimaging datasets. Med. Image Anal. 67 , 101879 (2021).

Ashraf, A., Khan, S., Bhagwat, N., Chakravarty, M. & Taati, B. Learning to unlearn: building immunity to dataset bias in medical imaging studies. In NeurIPS workshop on Machine Learning for Health (ML4H) (2018).

Yu, X., Zheng, H., Liu, C., Huang, Y. & Ding, X. Classify epithelium-stroma in histopathological images based on deep transferable network. J. Microsc. 271 , 164–173 (2018).

Abbasi-Sureshjani, S., Raumanns, R., Michels, B. E., Schouten, G. & Cheplygina, V. Risk of training diagnostic algorithms on data with demographic bias. In Interpretable and Annotation-Efficient Learning for Medical Image Computing , 183–192 (Springer, 2020).

Suresh, H. & Guttag, J. V. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002 (2019).

Park, S. H. & Han, K. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 286 , 800–809 (2018).

Oakden-Rayner, L., Dunnmon, J., Carneiro, G. & Ré, C. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In ACM Conference on Health, Inference, and Learning, 151–159 (2020).

Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155 , 1135–1141 (2019).

Joskowicz, L., Cohen, D., Caplan, N. & Sosna, J. Inter-observer variability of manual contour delineation of structures in CT. Eur. Radiol. 29 , 1391–1399 (2019).

Oakden-Rayner, L. Exploring large-scale public medical image datasets. Academic Radiol. 27 , 106–112 (2020).

Langley, P. The changing science of machine learning. Mach. Learn. 82 , 275–279 (2011).

Rabanser, S., Günnemann, S. & Lipton, Z. C. Failing loudly: an empirical study of methods for detecting dataset shift. In Neural Information Processing Systems (NeurIPS) (2018).

Rädsch, T. et al. What your radiologist might be missing: using machine learning to identify mislabeled instances of X-ray images. In Hawaii International Conference on System Sciences (HICSS) (2020).

Beyer, L., Hénaff, O. J., Kolesnikov, A., Zhai, X. & Oord, A. v. d. Are we done with ImageNet? arXiv preprint arXiv:2006.07159 (2020).

Gebru, T. et al. Datasheets for datasets. In Workshop on Fairness, Accountability, and Transparency in Machine Learning (2018).

Mitchell, M. et al. Model cards for model reporting. In Fairness, Accountability, and Transparency (FAccT) , 220–229 (ACM, 2019).

Ørting, S. N. et al. A survey of crowdsourcing in medical image analysis. Hum. Comput. 7 , 1–26 (2020).

Poldrack, R. A., Huckins, G. & Varoquaux, G. Establishment of best practices for evidence for prediction: a review. JAMA Psychiatry 77 , 534–540 (2020).

Pulini, A. A., Kerr, W. T., Loo, S. K. & Lenartowicz, A. Classification accuracy of neuroimaging biomarkers in attention-deficit/hyperactivity disorder: Effects of sample size and circular analysis. Biol. Psychiatry.: Cogn. Neurosci. Neuroimaging 4 , 108–120 (2019).

Saeb, S., Lonini, L., Jayaraman, A., Mohr, D. C. & Kording, K. P. The need to approximate the use-case in clinical machine learning. Gigascience 6 , gix019 (2017).

Hosseini, M. et al. I tried a bunch of things: The dangers of unexpected overfitting in classification of brain data. Neuroscience & Biobehavioral Reviews (2020).

Simpson, A. L. et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint arXiv:1902.09063 (2019).

Rohlfing, T. Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable. IEEE Trans. Med. Imaging 31, 153–163 (2011).

Maier-Hein, L. et al. Why rankings of biomedical image analysis competitions should be interpreted with care. Nat. Commun. 9, 5217 (2018).


Van Calster, B., McLernon, D. J., Van Smeden, M., Wynants, L. & Steyerberg, E. W. Calibration: the Achilles heel of predictive analytics. BMC Med. 17, 1–7 (2019).

Wagstaff, K. L. Machine learning that matters. In International Conference on Machine Learning (ICML), 529–536 (2012).

Shankar, V. et al. Evaluating machine accuracy on ImageNet. In International Conference on Machine Learning (ICML) (2020).

Bellamy, D., Celi, L. & Beam, A. L. Evaluating progress on machine learning for longitudinal electronic healthcare data. arXiv preprint arXiv:2010.01149 (2020).

Oliver, A., Odena, A., Raffel, C., Cubuk, E. D. & Goodfellow, I. J. Realistic evaluation of semi-supervised learning algorithms. In Neural Information Processing Systems (NeurIPS) (2018).

Dacrema, M. F., Cremonesi, P. & Jannach, D. Are we really making much progress? a worrying analysis of recent neural recommendation approaches. In ACM Conference on Recommender Systems, 101–109 (2019).

Musgrave, K., Belongie, S. & Lim, S.-N. A metric learning reality check. In European Conference on Computer Vision, 681–699 (Springer, 2020).

Pham, H. V. et al. Problems and opportunities in training deep learning software systems: an analysis of variance. In IEEE/ACM International Conference on Automated Software Engineering, 771–783 (2020).

Bouthillier, X. et al. Accounting for variance in machine learning benchmarks. In Machine Learning and Systems (2021).

Varoquaux, G. Cross-validation failure: small sample sizes lead to large error bars. NeuroImage 180, 68–77 (2018).

Szucs, D. & Ioannidis, J. P. Sample size evolution in neuroimaging research: an evaluation of highly-cited studies (1990–2012) and of latest practices (2017–2018) in high-impact journals. NeuroImage 117164 (2020).

Roelofs, R. et al. A meta-analysis of overfitting in machine learning. In Neural Information Processing Systems (NeurIPS), 9179–9189 (2019).

Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006).

Thompson, W. H., Wright, J., Bissett, P. G. & Poldrack, R. A. Meta-research: dataset decay and the problem of sequential analyses on open datasets. eLife 9, e53498 (2020).

Maier-Hein, L. et al. Is the winner really the best? a critical analysis of common research practice in biomedical image analysis competitions. Nat. Commun. (2018).

Cockburn, A., Dragicevic, P., Besançon, L. & Gutwin, C. Threats of a replication crisis in empirical computer science. Commun. ACM 63, 70–79 (2020).

Gigerenzer, G. Statistical rituals: the replication delusion and how we got there. Adv. Methods Pract. Psychol. Sci. 1, 198–218 (2018).

Benavoli, A., Corani, G. & Mangili, F. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. 17, 152–161 (2016).

Berrar, D. Confidence curves: an alternative to null hypothesis significance testing for the comparison of classifiers. Mach. Learn. 106, 911–949 (2017).

Bouthillier, X., Laurent, C. & Vincent, P. Unreproducible research is reproducible. In International Conference on Machine Learning (ICML), 725–734 (2019).

Norgeot, B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat. Med. 26, 1320–1324 (2020).

Drummond, C. Machine learning as an experimental science (revisited). In AAAI workshop on evaluation methods for machine learning, 1–5 (2006).

Steyerberg, E. W. & Harrell, F. E. Prediction models need appropriate internal, internal–external, and external validation. J. Clin. Epidemiol. 69, 245–247 (2016).

Woo, C.-W., Chang, L. J., Lindquist, M. A. & Wager, T. D. Building better biomarkers: brain models in translational neuroimaging. Nat. Neurosci. 20, 365 (2017).

Van Calster, B. et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur. Urol. 74, 796 (2018).

Thomas, R. & Uminsky, D. The problem with metrics is a fundamental problem for AI. arXiv preprint arXiv:2002.08512 (2020).

European Agency for the Evaluation of Medicinal Products. Points to consider on switching between superiority and non-inferiority. Br. J. Clin. Pharmacol. 52, 223–228 (2001).

D’Agostino Sr, R. B., Massaro, J. M. & Sullivan, L. M. Non-inferiority trials: design concepts and issues–the encounters of academic consultants in statistics. Stat. Med. 22, 169–186 (2003).

Christensen, E. Methodology of superiority vs. equivalence trials and non-inferiority trials. J. Hepatol. 46, 947–954 (2007).

Hendriksen, J. M., Geersing, G.-J., Moons, K. G. & de Groot, J. A. Diagnostic and prognostic prediction models. J. Thrombosis Haemost. 11, 129–141 (2013).

Campbell, M. K., Elbourne, D. R. & Altman, D. G. CONSORT statement: extension to cluster randomised trials. BMJ 328, 702–708 (2004).

Blasini, M., Peiris, N., Wright, T. & Colloca, L. The role of patient–practitioner relationships in placebo and nocebo phenomena. Int. Rev. Neurobiol. 139, 211–231 (2018).

Lipton, Z. C. & Steinhardt, J. Troubling trends in machine learning scholarship: some ML papers suffer from flaws that could mislead the public and stymie future research. Queue 17, 45–77 (2019).

Tatman, R., VanderPlas, J. & Dane, S. A practical taxonomy of reproducibility for machine learning research. In ICML workshop on Reproducibility in Machine Learning (2018).

Gundersen, O. E. & Kjensmo, S. State of the art: Reproducibility in artificial intelligence. In AAAI Conference on Artificial Intelligence (2018).

Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D. & Amorim Fernández-Delgado, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181 (2014).

Sculley, D. et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NeurIPS), 2503–2511 (2015).

Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).

Teney, D. et al. On the value of out-of-distribution testing: an example of Goodhart’s Law. In Neural Information Processing Systems (NeurIPS) (2020).

Kerr, N. L. HARKing: hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2, 196–217 (1998).


Gencoglu, O. et al. HARK side of deep learning–from grad student descent to automated machine learning. arXiv preprint arXiv:1904.07633 (2019).

Rosenthal, R. The file drawer problem and tolerance for null results. Psychological Bull. 86, 638 (1979).

Kellmeyer, P. Ethical and legal implications of the methodological crisis in neuroimaging. Camb. Q. Healthc. Ethics 26, 530–554 (2017).

Japkowicz, N. & Shah, M. Performance evaluation in machine learning. In Machine Learning in Radiation Oncology, 41–56 (Springer, 2015).

Santafe, G., Inza, I. & Lozano, J. A. Dealing with the evaluation of supervised classification algorithms. Artif. Intell. Rev. 44, 467–508 (2015).

Han, K., Song, K. & Choi, B. W. How to develop, validate, and compare clinical prediction models involving radiological parameters: study design and statistical methods. Korean J. Radiol. 17, 339–350 (2016).

Richter, A. N. & Khoshgoftaar, T. M. Sample size determination for biomedical big data with limited labels. Netw. Modeling Anal. Health Inform. Bioinforma. 9, 12 (2020).

Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. J. Br. Surg. 102, 148–158 (2015).

Wolff, R. F. et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann. Intern. Med. 170, 51–58 (2019).

Henderson, P. et al. Towards the systematic reporting of the energy and carbon footprints of machine learning. J. Mach. Learn. Res. 21, 1–43 (2020).

Bowen, A. & Casadevall, A. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc. Natl Acad. Sci. 112, 11335–11340 (2015).

Chambers, C. D., Dienes, Z., McIntosh, R. D., Rotshtein, P. & Willmes, K. Registered reports: realigning incentives in scientific publishing. Cortex 66, A1–A2 (2015).

Forde, J. Z. & Paganini, M. The scientific method in the science of machine learning. In ICLR workshop on Debugging Machine Learning Models (2019).

Firestein, S. Failure: Why science is so successful (Oxford University Press, 2015).

Borji, A. Negative results in computer vision: a perspective. Image Vis. Comput. 69, 1–8 (2018).

Voets, M., Møllersen, K. & Bongo, L. A. Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. arXiv preprint arXiv:1803.04337 (2018).

Wilkinson, J. et al. Time to reality check the promises of machine learning-powered precision medicine. The Lancet Digital Health (2020).

Whitaker, K. & Guest, O. #bropenscience is broken science. Psychologist 33, 34–37 (2020).

Kakarmath, S. et al. Best practices for authors of healthcare-related artificial intelligence manuscripts. NPJ Digital Med. 3, 134 (2020).


Acknowledgements

We would like to thank Alexandra Elbakyan for help with the literature review. We thank Pierre Dragicevic for providing feedback on early versions of this manuscript, and Pierre Bartet for comments on the preprint. We also thank the reviewers, Jack Wilkinson and Odd Erik Gundersen, for excellent comments which improved our manuscript. GV acknowledges funding from grant ANR-17-CE23-0018, DirtyData.

Author information

Authors and affiliations

INRIA, Versailles, France

Gaël Varoquaux

McGill University, Montreal, Canada

Mila, Montreal, Canada

IT University of Copenhagen, Copenhagen, Denmark

Veronika Cheplygina


Contributions

Both V.C. and G.V. collected the data; conceived, designed, and performed the analysis; reviewed the literature; and wrote the paper.

Corresponding authors

Correspondence to Gaël Varoquaux or Veronika Cheplygina .

Ethics declarations

Competing interests

The authors declare that there are no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

LaTeX source files

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Varoquaux, G., Cheplygina, V. Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digit. Med. 5, 48 (2022). https://doi.org/10.1038/s41746-022-00592-y


Received: 21 June 2021

Accepted: 09 March 2022

Published: 12 April 2022

DOI: https://doi.org/10.1038/s41746-022-00592-y




Specialty Grand Challenge article: Grand Challenges in Image Processing

www.frontiersin.org

  • Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et Systèmes, Gif-sur-Yvette, France

Introduction

The field of image processing has been the subject of intensive research and development activities for several decades. This broad area encompasses topics such as image/video processing, image/video analysis, image/video communications, image/video sensing, modeling and representation, computational imaging, electronic imaging, information forensics and security, 3D imaging, medical imaging, and machine learning applied to these respective topics. Hereafter, we will consider both image and video content (i.e., sequences of images), and more generally all forms of visual information.

Rapid technological advances, especially in terms of computing power and network transmission bandwidth, have resulted in many remarkable and successful applications. Nowadays, images are ubiquitous in our daily life. Entertainment is one class of applications that has greatly benefited, including digital TV (e.g., broadcast, cable, and satellite TV), Internet video streaming, digital cinema, and video games. Beyond entertainment, imaging technologies are central to many other applications, including digital photography, video conferencing, video monitoring and surveillance, and satellite imaging, as well as more distant domains such as healthcare and medicine, distance learning, digital archiving, cultural heritage, and the automotive industry.

In this paper, we highlight a few research grand challenges for future imaging and video systems that must be addressed to achieve the breakthroughs needed to meet the growing expectations of end users. Given the vastness of the field, this list is by no means exhaustive.

A Brief Historical Perspective

We first briefly discuss a few key milestones in the field of image processing. Key inventions in the development of photography and motion pictures can be traced to the 19th century. The earliest surviving photograph of a real-world scene was made by Nicéphore Niépce in 1827 (Hirsch, 1999). The Lumière brothers made the first cinematographic film in 1895, with a public screening the same year (Lumiere, 1996). After decades of remarkable developments, the second half of the 20th century saw the emergence of new technologies launching the digital revolution. While the first prototype digital camera using a Charge-Coupled Device (CCD) was demonstrated in 1975, the first commercial consumer digital cameras started appearing in the early 1990s. These digital cameras quickly overtook film cameras, and the digital revolution in the field of imaging was underway. As a key consequence, the digital process enabled computational imaging, that is, the use of sophisticated processing algorithms to produce high-quality images.

In 1992, the Joint Photographic Experts Group (JPEG) released the JPEG standard for still image coding (Wallace, 1992). In parallel, in 1993, the Moving Picture Experts Group (MPEG) published its first standard for coding of moving pictures and associated audio, MPEG-1 (Le Gall, 1991), and a few years later MPEG-2 (Haskell et al., 1996). By guaranteeing interoperability, these standards have been essential in many successful applications and services, for both the consumer and business markets. In particular, it is remarkable that, almost 30 years later, JPEG remains the dominant format for still images and photographs.

In the late 2000s and early 2010s, we observed a paradigm shift with the appearance of smartphones integrating a camera. Thanks to advances in computational photography, these new smartphones soon became capable of rivaling the quality of the consumer digital cameras of the time. Moreover, these smartphones were also capable of acquiring video sequences. Almost concurrently, another key evolution was the development of high-bandwidth networks. In particular, the launch of 4G wireless services circa 2010 enabled users to quickly and efficiently exchange multimedia content. Since then, most of us carry a camera anywhere and anytime, allowing us to capture images and videos at will and to seamlessly exchange them with our contacts.

As a direct consequence of the above developments, we are currently observing a boom in the usage of multimedia content. It is estimated that today 3.2 billion images are shared each day on social media platforms, and 300 hours of video are uploaded every minute on YouTube¹. In a 2019 report, Cisco estimated that video content represented 75% of all Internet traffic in 2017, and this share is forecast to grow to 82% in 2022 (Cisco, 2019). While Internet video streaming and Over-The-Top (OTT) media services account for the bulk of this traffic, other applications are also expected to see significant increases, including video surveillance and Virtual Reality (VR)/Augmented Reality (AR).

Hyper-Realistic and Immersive Imaging

A major direction and key driver of research and development activities over the years has been the goal of delivering ever-improving image quality and user experience.

For instance, in the realm of video, we have observed constantly increasing spatial and temporal resolutions, with Ultra High Definition (UHD) now emerging. Another aim has been to provide a sense of depth in the scene. For this purpose, various 3D video representations have been explored, including stereoscopic 3D and multi-view (Dufaux et al., 2013).

In this context, the ultimate goal is to faithfully represent the physical world and to deliver an immersive and perceptually hyperrealistic experience. For this purpose, we discuss hereafter some emerging innovations. These developments are also very relevant to VR and AR applications (Slater, 2014). Finally, while this paper focuses only on the visual information processing aspects, emerging display technologies (Masia et al., 2013) and audio also play key roles in many application scenarios.

Light Fields, Point Clouds, Volumetric Imaging

To represent a scene completely, the light coming from all directions has to be represented. For this purpose, the 7D plenoptic function is a key concept (Adelson and Bergen, 1991), although it is unmanageable in practice.
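As a point of reference, Adelson and Bergen's plenoptic function gives the radiance observed from every viewing position, in every direction, at every wavelength and instant, which accounts for its seven dimensions. The reduction to a 4D light field shown on the right is one standard reading of the "additional constraints" mentioned next (a static scene, wavelength folded into a few color channels, and radiance assumed constant along each ray in free space); the text does not spell these constraints out, and the two-plane coordinates (u, v, x, y) are an illustrative choice.

```latex
P(\theta, \phi, \lambda, t, V_x, V_y, V_z)
\;\xrightarrow{\ \text{fix } t,\ \text{sample } \lambda \text{ as RGB},\ \text{constant radiance along rays}\ }\;
L(u, v, x, y)
```

Here (θ, φ) is the viewing direction, λ the wavelength, t the time, and (V_x, V_y, V_z) the viewing position; in L, (u, v) selects a viewpoint on one plane and (x, y) a position on a second, parallel plane, so each sub-aperture image L(u, v, ·, ·) is an ordinary 2D image.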

By introducing additional constraints, the light field representation collects radiance from rays in all directions. It therefore contains much richer information than traditional 2D imaging, which captures a 2D projection of the light in the scene by integrating over the angular domain. For instance, this allows post-capture processing such as refocusing and changing the viewpoint. However, it also entails several technical challenges, in terms of acquisition and calibration, as well as computational image processing steps including depth estimation, super-resolution, compression and image synthesis (Ihrke et al., 2016; Wu et al., 2017). The trade-off between spatial and angular resolution is a fundamental issue. While much of the earlier work focused on static light fields, dynamic light field video is expected to attract more interest in the future. In particular, dense multi-camera arrays are becoming more tractable. Finally, the development of efficient light field compression and streaming techniques is a key enabler in many applications (Conti et al., 2020).
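Post-capture refocusing can be illustrated with the classic shift-and-add approach, sketched below under simplifying assumptions: the light field is stored as a NumPy array `light_field[u, v, y, x]` of sub-aperture images, shifts are whole pixels, and `np.roll` wraps content around the borders. Real pipelines use sub-pixel interpolation and proper border handling; the function name `refocus` and the array layout are illustrative, not taken from the text.

```python
import numpy as np


def refocus(light_field: np.ndarray, shift: int) -> np.ndarray:
    """Shift-and-add refocusing of a 4D light field stored as L[u, v, y, x].

    Each sub-aperture image is translated in proportion to its offset from the
    central view, then all views are averaged. `shift` is the per-view
    disparity (in whole pixels) of the depth plane brought into focus.
    """
    n_u, n_v = light_field.shape[:2]
    u_c, v_c = n_u // 2, n_v // 2                 # central view
    acc = np.zeros(light_field.shape[2:], dtype=np.float64)
    for u in range(n_u):
        for v in range(n_v):
            dy = -shift * (v - v_c)               # undo the vertical parallax
            dx = -shift * (u - u_c)               # undo the horizontal parallax
            acc += np.roll(light_field[u, v], (dy, dx), axis=(0, 1))
    return acc / (n_u * n_v)


# Synthetic 3x3-view light field: a single point with a disparity of 2 px per view.
lf = np.zeros((3, 3, 32, 32))
for u in range(3):
    for v in range(3):
        lf[u, v, 16 + 2 * (v - 1), 16 + 2 * (u - 1)] = 1.0

print(refocus(lf, 2)[16, 16])  # 1.0: all nine views align on the point
print(refocus(lf, 0)[16, 16])  # about 0.111 (1/9): only the central view contributes
```

Choosing `shift` selects the depth plane in focus: points whose disparity matches it add up coherently, while points at other depths are spread out and appear blurred.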

Another promising direction is to consider a point cloud representation. A point cloud is a set of points in 3D space represented by their spatial coordinates and additional attributes, including color values, normals, or reflectance. Point clouds are often very large, easily ranging into the millions of points, and are typically sparse. One major distinguishing feature of point clouds is that, unlike images, they do not have a regular structure, calling for new algorithms. To remove the noise often present in acquired data while preserving its intrinsic characteristics, effective 3D point cloud filtering approaches are needed (Han et al., 2017). It is also important to develop efficient techniques for Point Cloud Compression (PCC). For this purpose, MPEG is developing two standards: Geometry-based PCC (G-PCC) and Video-based PCC (V-PCC) (Graziosi et al., 2020). G-PCC considers the point cloud in its native form and compresses it using 3D data structures such as octrees. Conversely, V-PCC projects the point cloud onto 2D planes and then applies existing video coding schemes. More recently, deep learning-based approaches for PCC have been shown to be effective (Guarda et al., 2020). Another challenge is to develop generic and robust solutions able to handle the widely varying characteristics of point clouds, e.g., in terms of size and non-uniform density. Efficient solutions for dynamic point clouds are also needed. Finally, while many techniques process the geometric information or the attributes independently, it is paramount to process them jointly.
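The octree idea mentioned for G-PCC can be sketched in a few lines: the bounding cube is split recursively into eight children, and each occupied node is described by one byte whose bits flag its occupied children. This is a minimal illustration assuming points already quantized to integer voxel coordinates; a real codec would also entropy-code these bytes, among other refinements, and the function `octree_occupancy` and its child-bit ordering are hypothetical choices for this example.

```python
import numpy as np


def octree_occupancy(points: np.ndarray, depth: int) -> list[int]:
    """Breadth-first octree occupancy codes for integer voxel coordinates.

    points: (N, 3) integers in [0, 2**depth). Returns one 8-bit code per
    occupied node, level by level; bit k of a code is set when child
    k = 4*bx + 2*by + bz holds at least one point.
    """
    pts = np.unique(np.asarray(points, dtype=np.int64), axis=0)  # drop duplicate voxels
    if pts.min() < 0 or pts.max() >= 2 ** depth:
        raise ValueError("coordinates must lie in [0, 2**depth)")
    codes: list[int] = []
    for level in range(depth):
        shift = depth - level - 1
        parents = pts >> (shift + 1)          # node each point falls in at this level
        bits = (pts >> shift) & 1             # which half of that node along x, y, z
        child = 4 * bits[:, 0] + 2 * bits[:, 1] + bits[:, 2]
        occupancy: dict[tuple, int] = {}
        for parent, k in zip(map(tuple, parents), child):
            occupancy[parent] = occupancy.get(parent, 0) | (1 << int(k))
        codes.extend(occupancy[p] for p in sorted(occupancy))  # fixed node order
    return codes


cloud = np.array([[0, 0, 0], [3, 3, 3]])      # two opposite corners of a 4x4x4 grid
print(octree_occupancy(cloud, depth=2))       # [129, 1, 128]
```

Each level refines every point's position by one bit per axis, so the octree depth sets the geometric precision of the decoded cloud.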

High Dynamic Range and Wide Color Gamut

The human visual system is able to perceive, using various adaptation mechanisms, a broad range of luminous intensities, from very bright to very dark, as experienced every day in the real world. Nonetheless, current imaging technologies are still limited in their ability to capture or render such a wide range of conditions. High Dynamic Range (HDR) imaging aims to address this issue. Wide Color Gamut (WCG) is also often associated with HDR in order to provide a wider range of reproducible colors.

HDR has reached a certain level of maturity in the context of photography. However, extending HDR to video sequences raises scientific challenges in providing high-quality and cost-effective solutions, impacting the whole image processing pipeline, including content acquisition, tone reproduction, color management, coding, and display (Dufaux et al., 2016; Chalmers and Debattista, 2017). Backward compatibility with legacy content and traditional systems is another issue. Despite recent progress, the potential of HDR has not yet been fully exploited.
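Tone reproduction, one of the pipeline stages listed above, maps scene luminances that may span many orders of magnitude into a displayable range. As a minimal sketch, the code below implements one classic global operator in the style of Reinhard et al. (2002), chosen here for illustration rather than taken from the text; it assumes a linear luminance channel and leaves color handling aside. The default `key` of 0.18 is a conventional choice, and `reinhard_global` is a hypothetical name.

```python
import numpy as np


def reinhard_global(luminance: np.ndarray, key: float = 0.18, eps: float = 1e-6) -> np.ndarray:
    """Global tone mapping in the style of Reinhard et al. (2002).

    Scales scene luminance so that its log-average maps to `key`, then
    compresses with L / (1 + L), which maps [0, inf) into [0, 1) for display.
    """
    lum = np.asarray(luminance, dtype=np.float64)
    log_average = np.exp(np.mean(np.log(lum + eps)))  # geometric mean; eps avoids log(0)
    scaled = key / log_average * lum
    return scaled / (1.0 + scaled)


# Luminances spanning six orders of magnitude (arbitrary linear units).
hdr = np.array([0.01, 1.0, 100.0, 10_000.0])
ldr = reinhard_global(hdr)
print(np.round(ldr, 3))
```

Because L / (1 + L) is monotonic, the ordering of luminances is preserved while the brightest values are compressed toward 1 instead of clipping.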

Coding and Transmission

Three decades of standardization activities have continuously improved the hybrid video coding scheme based on the principles of transform coding and predictive coding. The Versatile Video Coding (VVC) standard was finalized in 2020 (Bross et al., 2021), achieving a bit rate reduction of approximately 50% at the same subjective quality compared to its predecessor, High Efficiency Video Coding (HEVC). While substantially outperforming VVC in the short term may be difficult, one encouraging direction is to rely on improved perceptual models to further optimize compression in terms of visual quality. Another direction, which has already shown promising results, is to apply deep learning-based approaches (Ding et al., 2021). Here, one key issue is the ability of these deep models to generalize to a wide diversity of video content. A second key issue is implementation complexity, in terms of both computation and memory requirements, which is a significant obstacle to widespread deployment. In addition, the emergence of new video formats targeting immersive communications calls for new coding schemes (Wien et al., 2019).
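To make "hybrid" concrete, the toy coder below combines the two principles on a single 8x8 block: a prediction (for instance from a previous frame) is subtracted, the residual is transformed with an orthonormal DCT, and the coefficients are quantized. It is a sketch under simplifying assumptions, not how VVC or HEVC actually work: real codecs use integer transforms, adaptive block partitioning, rate-distortion decisions, in-loop filters, and entropy coding, and the names `dct_matrix` and `code_block` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def code_block(current: np.ndarray, prediction: np.ndarray, qstep: float):
    """One pass of a toy hybrid coder for a square block.

    Returns the quantized coefficient levels (what would be entropy-coded)
    and the block the decoder would reconstruct from them.
    """
    c = dct_matrix(current.shape[0])
    residual = current - prediction                        # predictive coding: send only the error
    coeffs = c @ residual @ c.T                            # 2D transform concentrates energy
    levels = np.round(coeffs / qstep).astype(np.int64)     # quantization: the only lossy step
    recon_residual = c.T @ (levels * qstep) @ c            # decoder-side inverse transform
    return levels, prediction + recon_residual


rng = np.random.default_rng(0)
current = rng.uniform(0, 255, size=(8, 8))
prediction = current + rng.normal(0, 3, size=(8, 8))       # e.g. a motion-compensated guess
levels, recon = code_block(current, prediction, qstep=10.0)
print(np.count_nonzero(levels), "nonzero levels out of", levels.size)
```

Raising `qstep` zeroes more levels (fewer bits to send) at the price of a larger reconstruction error; because the transform is orthonormal, the Frobenius-norm error stays below `qstep / 2` times the square root of the number of coefficients.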

Since in many application scenarios videos are processed by intelligent analysis algorithms rather than viewed by humans, another interesting track is the development of video coding for machines (Duan et al., 2020). In this context, compression is optimized with respect to the performance of the video analysis tasks.
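Encoders commonly select coding options by minimizing a Lagrangian rate-distortion cost. One way to express the machine-oriented variant described above, given here as an illustrative formulation rather than one stated in the text, is to replace pixel-domain distortion with a task-based one:

```latex
\min_{\text{coding choices}} \; J = D + \lambda R
\qquad \longrightarrow \qquad
\min_{\text{coding choices}} \; J_{\text{machine}} = D_{\text{task}} + \lambda R
```

where R is the bit rate, D a pixel-domain distortion such as the mean squared error, D_task the degradation of the analysis output on decoded video (for instance a drop in detection accuracy), and λ ≥ 0 sets the trade-off between the two terms.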

The push toward hyper-realistic and immersive visual communications most often entails an increasing raw data rate. Despite improved compression schemes, more transmission bandwidth is needed. Moreover, some emerging applications, such as VR/AR, autonomous driving, and Industry 4.0, impose strong low-latency requirements, with implications for both the image processing pipeline and the transmission channel. In this context, the emergence of 5G wireless networks will positively contribute to the deployment of new multimedia applications, and the development of future wireless communication technologies points toward promising advances (Da Costa and Yang, 2020).
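A quick calculation shows how resolution, frame rate and bit depth drive the raw data rate up. The helper below is a hypothetical convenience function, and the two formats compared (1080p at 30 frames/s with 8-bit samples, and 2160p UHD at 60 frames/s with 10-bit samples, both with 4:2:0 chroma subsampling) are illustrative choices rather than figures from the text.

```python
def raw_bitrate(width: int, height: int, fps: float, bit_depth: int,
                samples_per_pixel: float = 1.5) -> float:
    """Uncompressed video bit rate in bit/s.

    samples_per_pixel is 3.0 for 4:4:4 sampling and 1.5 for 4:2:0 chroma
    subsampling (one luma sample plus two quarter-resolution chroma samples).
    """
    return width * height * samples_per_pixel * bit_depth * fps


hd = raw_bitrate(1920, 1080, fps=30, bit_depth=8)
uhd = raw_bitrate(3840, 2160, fps=60, bit_depth=10)
print(f"HD  1080p30, 8-bit 4:2:0:  {hd / 1e9:.2f} Gbit/s")   # 0.75 Gbit/s
print(f"UHD 2160p60, 10-bit 4:2:0: {uhd / 1e9:.2f} Gbit/s")  # 7.46 Gbit/s
print(f"ratio: {uhd / hd:.1f}x")                             # 10.0x
```

A tenfold increase in raw rate before any compression illustrates why better coding schemes and more bandwidth are both needed.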

Human Perception and Visual Quality Assessment

It is important to develop effective models of human perception. On the one hand, such models can contribute to the development of perceptually inspired algorithms. On the other hand, perceptual quality assessment methods are needed in order to optimize and validate new imaging solutions.

The notion of Quality of Experience (QoE) relates to the degree of delight or annoyance of the user of an application or service (Le Callet et al., 2012). QoE is strongly linked to subjective and objective quality assessment methods. Many years of research have resulted in the successful development of perceptual visual quality metrics based on models of human perception (Lin and Kuo, 2011; Bovik, 2013). More recently, deep learning-based approaches have also been successfully applied to this problem (Bosse et al., 2017). While these perceptual quality metrics have achieved good performance, several significant challenges remain. First, when applied to video sequences, most current perceptual metrics are applied to individual images, neglecting temporal modeling. Second, whereas color is a key attribute, there are currently no widely accepted perceptual quality metrics explicitly considering color. Finally, new modalities, such as 360° videos, light fields, point clouds, and HDR, require new approaches.
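The temporal blind spot mentioned above is easy to reproduce with any image metric applied frame by frame. The sketch below uses PSNR (a pixel-fidelity baseline, not a perceptual metric) and a simplified SSIM computed over a single global window, whereas the published SSIM index averages over local windows; both are chosen for illustration, and `framewise_average` is a hypothetical helper. A video with a constant brightness offset and one whose offset flips sign every frame (visible flicker) receive the same averaged PSNR.

```python
import numpy as np


def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def global_ssim(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """SSIM expression evaluated over the whole image as one window (simplified)."""
    x, y = ref.astype(np.float64), test.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)))


def framewise_average(metric, ref_video, test_video) -> float:
    """Score a video as the mean of per-frame image scores (no temporal model)."""
    return float(np.mean([metric(r, t) for r, t in zip(ref_video, test_video)]))


rng = np.random.default_rng(0)
ref_video = rng.uniform(20, 235, size=(10, 32, 32))                # 10 frames
steady = ref_video + 5.0                                           # constant offset
flicker = ref_video + 5.0 * (-1.0) ** np.arange(10)[:, None, None]  # offset flips sign
print(framewise_average(psnr, ref_video, steady))
print(framewise_average(psnr, ref_video, flicker))  # same score: the flicker goes unnoticed
```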

Another closely related topic is image esthetic assessment (Deng et al., 2017). The esthetic quality of an image is affected by numerous factors, such as lighting, color, contrast, and composition. It is useful in different application scenarios such as image retrieval and ranking, recommendation, and photo enhancement. While earlier attempts used handcrafted features, most recent techniques to predict esthetic quality are data-driven and based on deep learning approaches, leveraging the availability of large annotated datasets for training (Murray et al., 2012). One key challenge is the inherently subjective nature of esthetic assessment, resulting in ambiguity in the ground-truth labels. Another important issue is to explain the behavior of deep esthetic prediction models.

Analysis, Interpretation and Understanding

Another major research direction has been the goal of efficiently analyzing, interpreting, and understanding visual data. This goal is challenging, due to the high diversity and complexity of visual data. It has led to many research activities, involving both low-level and high-level analysis, addressing topics such as image classification and segmentation, optical flow, image indexing and retrieval, object detection and tracking, and scene interpretation and understanding. Hereafter, we discuss some trends and challenges.

Keypoint Detection and Local Descriptors

Local image matching has been the cornerstone of many analysis tasks. It involves the detection of keypoints, i.e., salient visual points that can be robustly and repeatedly detected, and descriptors, i.e., compact signatures locally describing the visual features at each keypoint. Pairwise matching between these features then reveals local correspondences. In this context, several frameworks have been proposed, including Scale Invariant Feature Transform (SIFT) (Lowe, 2004) and Speeded Up Robust Features (SURF) (Bay et al., 2008), and later binary variants including Binary Robust Independent Elementary Features (BRIEF) (Calonder et al., 2010), Oriented FAST and Rotated BRIEF (ORB) (Rublee et al., 2011) and Binary Robust Invariant Scalable Keypoints (BRISK) (Leutenegger et al., 2011). Although these approaches exhibit scale and rotation invariance, they are less suited to deal with large 3D distortions such as perspective deformations, out-of-plane rotations, and significant viewpoint changes. Moreover, they tend to fail under significantly varying and challenging illumination conditions.
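For binary descriptors such as BRIEF, ORB and BRISK, matching boils down to Hamming distances between bit strings. With OpenCV this is typically done with `cv2.ORB_create()`, `detectAndCompute()` and `cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)`; the NumPy sketch below reproduces only the brute-force, cross-checked matching step so the logic is visible. It assumes descriptors packed as `uint8` rows (ORB's are 32 bytes), uses random stand-in descriptors, and the function names are hypothetical.

```python
import numpy as np


def hamming_distances(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """All-pairs Hamming distances between packed binary descriptors (uint8 rows)."""
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])  # differing bits
    return np.unpackbits(xor, axis=-1).sum(axis=-1)                # popcount per pair


def cross_checked_matches(desc_a: np.ndarray, desc_b: np.ndarray) -> list[tuple[int, int]]:
    """Keep (i, j) only if j is i's nearest neighbour and i is j's nearest neighbour."""
    d = hamming_distances(desc_a, desc_b)
    best_in_b = d.argmin(axis=1)
    best_in_a = d.argmin(axis=0)
    return [(i, int(j)) for i, j in enumerate(best_in_b) if best_in_a[j] == i]


rng = np.random.default_rng(0)
desc_a = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)  # five 256-bit descriptors
perm = np.array([3, 0, 4, 1, 2])
desc_b = desc_a[perm].copy()   # the same keypoints, listed in another order...
desc_b[:, 0] ^= 1              # ...with one bit flipped by noise
print(cross_checked_matches(desc_a, desc_b))  # [(0, 1), (1, 3), (2, 4), (3, 0), (4, 2)]
```

The cross-check discards one-sided nearest neighbours, a cheap way to reject many ambiguous matches before any geometric verification.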

These traditional approaches based on handcrafted features have been successfully applied to problems such as image and video retrieval, object detection, visual Simultaneous Localization And Mapping (SLAM), and visual odometry. In addition, the emerging imaging modalities introduced above can also benefit image analysis tasks, including light fields (Galdi et al., 2019), point clouds (Guo et al., 2020), and HDR (Rana et al., 2018). However, when applied to high-dimensional visual data for semantic analysis and understanding, these approaches based on handcrafted features have been supplanted in recent years by approaches based on deep learning.

Deep Learning-Based Methods

Data-driven deep learning-based approaches (LeCun et al., 2015), and in particular the Convolutional Neural Network (CNN) architecture, currently represent the state of the art in performance for complex pattern recognition tasks in scene analysis and understanding. By combining multiple processing layers, deep models are able to learn data representations with different levels of abstraction.
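The layered structure can be seen in a bare-bones forward pass: each layer applies a convolution (implemented, as in most CNN libraries, as cross-correlation), a ReLU non-linearity and 2x2 max pooling, and the next layer operates on the resulting feature map. The sketch below hard-codes a vertical-edge kernel purely for illustration; in an actual CNN the kernels are learned from data and each layer has many channels, and the helper names are hypothetical.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out


def max_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def forward(image: np.ndarray, kernels: list[np.ndarray]) -> np.ndarray:
    """Stack conv -> ReLU -> pool layers; each layer sees the previous feature map."""
    x = image.astype(np.float64)
    for k in kernels:
        x = max_pool2(np.maximum(conv2d(x, k), 0.0))
    return x


edge = np.array([[-1.0, 0.0, 1.0]] * 3)           # responds to dark-to-bright vertical edges
image = np.zeros((16, 16))
image[:, 8:] = 1.0                                # a single vertical edge
print(forward(image, [edge]))                     # 7x7 map, nonzero only in column 3
print(forward(image, [edge, edge]).shape)         # (2, 2): deeper layer, coarser map
```

Each pooling step halves the resolution, so deeper layers summarize larger regions of the input, which is one way to read the "different levels of abstraction" mentioned above.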

Supervised learning is the most common form of deep learning. It requires a large, fully labeled training dataset, whose annotation is typically time-consuming and expensive and must be repeated whenever a new application scenario is tackled. Moreover, in some specialized domains, e.g., medical data, it can be very difficult to obtain annotations. To alleviate this major burden, methods such as transfer learning and weakly supervised learning have been proposed.

In another direction, deep models have been shown to be vulnerable to adversarial attacks (Akhtar and Mian, 2018). These attacks consist of introducing subtle perturbations to the input, such that the model predicts an incorrect output. For instance, in the case of images, imperceptible pixel differences are able to fool deep learning models. Such adversarial attacks are definitely an important obstacle to the successful deployment of deep learning, especially in applications where safety and security are critical. While some early solutions have been proposed, a significant challenge is to develop effective defense mechanisms against these attacks.
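The mechanism can be seen on the simplest differentiable classifier. The sketch below applies the Fast Gradient Sign Method (FGSM), a well-known attack not named in the text, to a logistic-regression model rather than a deep network; the weights and input are random stand-ins and the function name is hypothetical. Each of the 1000 input features moves by at most 0.01, yet because every change pushes the score in the same direction, their combined effect flips the prediction.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_linear(x, y, w, b, eps):
    """Fast Gradient Sign Method against logistic regression.

    Model: p(y = +1 | x) = sigmoid(w @ x + b), with labels y in {-1, +1}.
    Loss: log(1 + exp(-y * (w @ x + b))). Each feature moves by eps in the
    direction that increases the loss.
    """
    margin = y * (w @ x + b)
    grad_x = -y * sigmoid(-margin) * w   # gradient of the loss with respect to x
    return x + eps * np.sign(grad_x)


rng = np.random.default_rng(0)
d = 1000                                 # number of input features (think pixels)
w = rng.normal(size=d)
x = rng.uniform(0.0, 1.0, size=d)
b = 2.0 - w @ x                          # place x on the positive side with logit 2
x_adv = fgsm_linear(x, y=1, w=w, b=b, eps=0.01)
print("clean logit:      ", w @ x + b)
print("adversarial logit:", w @ x_adv + b)
print("largest change:   ", np.abs(x_adv - x).max())
```

Clipping `x_adv` back to the valid pixel range, which a real attack on images would do, is omitted for brevity.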

Finally, another challenge is to enable low-complexity, efficient implementations, which is especially important for mobile and embedded applications. Here, closer interaction between signal processing and machine learning can bring additional benefits. For instance, one direction is to compress deep neural networks so that they require less memory and computation. Moreover, combining traditional processing techniques with deep learning models makes it possible to develop low-complexity solutions while preserving high performance.
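One common compression step is post-training quantization, which stores weights as 8-bit integers plus a floating-point scale. The following is a minimal sketch of symmetric, per-tensor int8 quantization in NumPy. The weight matrix is random, and the scheme is only one simple option among many (per-channel quantization, pruning, and distillation are others).

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: weights ≈ scale * q, with q in [-127, 127]."""
    max_abs = np.max(np.abs(weights))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)   # hypothetical layer weights

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)                          # 4: int8 needs a quarter of float32 storage
print(np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6) # rounding error is at most half a step
```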

Explainability in Deep Learning

While data-driven deep learning models often achieve impressive performance on many visual analysis tasks, their black-box nature makes it very difficult to understand how they reach a predicted output and how that output relates to particular characteristics of the input data. This opacity is a major impediment in many decision-critical application scenarios, where it is important not only to have confidence in the proposed solution but also to gain further insights from it. Based on these considerations, some deep learning systems aim to promote explainability ( Adadi and Berrada, 2018 ; Xie et al., 2020 ). This can be achieved by exhibiting traits related to confidence, trust, safety, and ethics.

However, explainable deep learning is still in its early phase. More developments are needed, in particular to develop a systematic theory of model explanation. Important aspects include the need to understand and quantify risk, to comprehend how the model makes predictions for transparency and trustworthiness, and to quantify the uncertainty in the model prediction. This challenge is key in order to deploy and use deep learning-based solutions in an accountable way, for instance in application domains such as healthcare or autonomous driving.

Self-Supervised Learning

Self-supervised learning refers to methods that learn general visual features from large-scale unlabeled data, without the need for manual annotations. Self-supervised learning is therefore very appealing, as it makes it possible to exploit the vast amount of unlabeled images and videos available. Moreover, it is widely believed to be closer to how humans actually learn. One common approach is to let the data provide the supervision by leveraging its structure. More generally, a pretext task can be defined by withholding some parts of the data and training the neural network to predict them ( Jing and Tian, 2020 ). Examples include image inpainting, colorizing grayscale images, and predicting future frames in videos. By optimizing the objective of the pretext task, the network is forced to learn relevant visual features in order to solve the problem. Self-supervised learning has also been successfully applied to autonomous vehicle perception. More specifically, the complementarity between analytical and learning methods can be exploited to address various autonomous driving perception tasks without requiring an annotated dataset ( Chiaroni et al., 2021 ).
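To illustrate how a pretext task turns unlabeled data into supervised training pairs, here is a minimal NumPy sketch of the inpainting pretext task. A random square patch is hidden, and the hidden pixels become the target. The patch size, the zero fill value, and the function names are assumptions made for illustration. The network that would consume these pairs is omitted.

```python
import numpy as np

def make_inpainting_pair(image, patch=8, rng=None):
    """Create (masked_input, mask, target) from one unlabeled grayscale image.

    The 'label' is simply the withheld patch of the image itself.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)

    mask = np.zeros(image.shape, dtype=bool)
    mask[top:top + patch, left:left + patch] = True

    masked_input = image.copy()
    masked_input[mask] = 0.0          # withhold the patch from the network
    target = image[mask]              # pixels the network must predict
    return masked_input, mask, target

def inpainting_loss(prediction, mask, target):
    """Mean squared error computed only on the withheld pixels."""
    return np.mean((prediction[mask] - target) ** 2)

rng = np.random.default_rng(42)
img = rng.random((32, 32))            # stands in for an unlabeled image
x, m, t = make_inpainting_pair(img, patch=8, rng=rng)
print(m.sum(), inpainting_loss(img, m, t))   # 64 hidden pixels; a perfect prediction scores 0.0
```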

While good performance has already been obtained using self-supervised learning, further work is still needed. A few promising directions are outlined below. A first interesting path is to combine self-supervised learning with other learning methods. For instance, semi-supervised learning ( Van Engelen and Hoos, 2020 ) and few-shot learning ( Fei-Fei et al., 2006 ) methods have been proposed for scenarios where only limited labeled data is available. The performance of these methods can potentially be boosted by incorporating self-supervised pre-training, and the pretext task can also serve as a regularizer. Another interesting trend in self-supervised learning is to train neural networks with synthetic data; the challenge here is to bridge the domain gap between synthetic and real data. Finally, another compelling direction is to exploit data from different modalities. A simple example is to consider both the video and audio signals in a video sequence. In the context of autonomous driving, vehicles are typically equipped with multiple sensors, including cameras, Light Detection And Ranging (LIDAR), Global Positioning System (GPS), and Inertial Measurement Units (IMU). In such cases, it is easy to acquire large unlabeled multimodal datasets in which the different modalities can be effectively exploited by self-supervised learning methods.

Reproducible Research and Large Public Datasets

The reproducible research initiative is another way to ensure high-quality research for the benefit of our community ( Vandewalle et al., 2009 ). Reproducibility, i.e. the ability of someone working independently to accurately reproduce the results of an experiment, is a key principle of the scientific method. In image and video processing, a detailed description of the proposed algorithm is usually not sufficient; access to the code and data is most often essential as well. This is even more imperative for deep learning-based models.

In parallel, the availability of large public datasets is also highly desirable in order to support research activities. This is especially critical for new emerging modalities or specific application scenarios, where it is difficult to get access to relevant data. Moreover, with the emergence of deep learning, large datasets, along with labels, are often needed for training, which can be another burden.

Conclusion and Perspectives

The field of image processing is very broad and rich, with many successful applications in both the consumer and business markets. However, many technical challenges remain before the limits of imaging technologies can be pushed further. Two main trends stand out. The first is continually improving the quality and realism of image and video content. The second is effectively interpreting and understanding this vast and complex amount of visual data. This list is certainly not exhaustive, and there are many other interesting problems, e.g. related to computational imaging, information security and forensics, or medical imaging. Key innovations will be found at the crossroads of image processing, optics, psychophysics, communication, computer vision, artificial intelligence, and computer graphics. Multidisciplinary collaborations involving actors from both academia and industry are therefore critical to drive these breakthroughs.

The “Image Processing” section of Frontiers in Signal Processing aims to give the research community a forum to exchange, discuss, and improve new ideas. Its goal is to contribute to the further advancement of the field of image processing and to bring exciting innovations in the foreseeable future.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

A Survey of Forensic Applications using Digital Image Processing: Image Improvement Case Study



Image Processing Case Study

Let’s look at a case of extensive image processing from the transportation industry.

Two video cameras viewed boxes moving quickly along a conveyor belt. To provide sufficiently high image resolution, the cameras were placed close to the belt, so neither could cover the whole belt cross-section. They were mounted on either side of the belt, and each saw only part of every box. The customer wanted good images of the texture on the tops of the boxes, so the images from the two cameras had to be stitched together.

The two cameras see the same object from different angles and distances. Before the images can be merged, each must be transformed from its camera's coordinate system into one common coordinate system and placed in one common plane in XYZ space. The software we developed performs this transformation automatically, based on the known geometry of the camera positions relative to the conveyor belt.
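The case study computes this mapping from the known camera geometry and does not describe the math. Because the box tops lie in a plane, a standard way to express such a mapping is a planar homography. The sketch below takes a different but related route and estimates the homography with the direct linear transform (DLT) from four or more point correspondences, for example markers whose belt-plane positions are known. The correspondences are synthetic, and the function names are hypothetical. This is not the authors' software.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) points through the 3x3 matrix H and de-homogenize."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def _normalize(pts):
    """Hartley normalization: zero mean, mean distance sqrt(2) from the origin."""
    mean = pts.mean(axis=0)
    mean_dist = np.sqrt(((pts - mean) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * mean[0]],
                  [0.0, s, -s * mean[1]],
                  [0.0, 0.0, 1.0]])
    return apply_homography(T, pts), T

def estimate_homography(src, dst):
    """DLT: find H with dst ~ H @ src from N >= 4 correspondences (shape (N, 2))."""
    src_n, T_src = _normalize(np.asarray(src, dtype=float))
    dst_n, T_dst = _normalize(np.asarray(dst, dtype=float))
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # H's entries are the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H_n = vt[-1].reshape(3, 3)
    H = np.linalg.inv(T_dst) @ H_n @ T_src   # undo both normalizations
    return H / H[2, 2]

# Hypothetical ground truth mapping camera pixels to belt-plane coordinates.
H_true = np.array([[1.2, 0.1, 30.0],
                   [0.05, 0.9, -12.0],
                   [1e-4, 2e-4, 1.0]])
image_pts = np.array([[0, 0], [640, 0], [640, 480], [0, 480], [200, 300]], dtype=float)
belt_pts = apply_homography(H_true, image_pts)

H_est = estimate_homography(image_pts, belt_pts)
print(np.allclose(H_est, H_true, atol=1e-6))   # True for noise-free correspondences
```

Once each camera has its own homography into the shared belt plane, both images can be resampled onto one canvas before merging.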

Even after this transformation, the multi-megapixel grayscale images from the left and right cameras are still shifted relative to each other in the common plane:


Here, the grayscale images from the two cameras are shown in false color. The scale on the right gives the mapping between 8-bit pixel signal strength and false color. Note that the two images also differ in brightness.
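The case study does not say how the residual shift is measured. Phase correlation is one common technique for estimating a pure translation between two overlapping images. Here is a minimal NumPy sketch that recovers an integer (dy, dx) shift. It assumes the two crops show the same overlap region and differ mainly by a circular translation. Real data would typically also need windowing and sub-pixel refinement.

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate integer (dy, dx) such that moved ≈ np.roll(ref, (dy, dx), axis=(0, 1))."""
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(moved)
    cross_power = F_mov * np.conj(F_ref)
    cross_power /= np.abs(cross_power) + 1e-12   # keep only the phase difference
    corr = np.fft.ifft2(cross_power).real        # ideally a single peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Peaks past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(1)
left = rng.random((128, 128))                         # synthetic overlap crop
right = np.roll(left, shift=(5, -12), axis=(0, 1))    # same content, translated
print(phase_correlation_shift(left, right))           # (5, -12)
```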

Our algorithms adjust the brightness and shift the images from the left and right cameras so that the two can be merged into a single image. The resulting combined image is shown using a different false-color scheme:


Pixels from the right image are shown in magenta, and pixels from the left image in green.
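The case study does not specify the brightness adjustment. One simple option, assumed here, is a linear gain-and-offset correction that matches the mean and standard deviation of one camera's pixels to the other's over their shared overlap. The sketch below applies that correction and then averages the two aligned images where they overlap. The function names and synthetic data are illustrative. Exact recovery happens here only because the synthetic mismatch is exactly linear.

```python
import numpy as np

def match_brightness(src, ref, overlap):
    """Linear gain/offset correction: make src's mean and std inside `overlap` match ref's."""
    s = src[overlap].astype(float)
    r = ref[overlap].astype(float)
    gain = r.std() / s.std()
    offset = r.mean() - gain * s.mean()
    return np.clip(gain * src.astype(float) + offset, 0.0, 255.0)

def merge(left, right, left_valid, right_valid):
    """Average the aligned images where both cover a pixel; otherwise use the one that does."""
    out = np.zeros(left.shape, dtype=float)
    both = left_valid & right_valid
    only_left = left_valid & ~right_valid
    only_right = right_valid & ~left_valid
    out[both] = 0.5 * (left[both] + right[both])
    out[only_left] = left[only_left]
    out[only_right] = right[only_right]
    return np.round(out).astype(np.uint8)

# Synthetic 8-bit scene on a common canvas; columns 3-6 are seen by both cameras.
rng = np.random.default_rng(7)
scene = rng.integers(0, 200, size=(4, 10)).astype(float)
cols = np.arange(10)
left_valid = np.broadcast_to(cols < 7, scene.shape)
right_valid = np.broadcast_to(cols >= 3, scene.shape)
left = np.where(left_valid, scene, 0.0)
right = np.where(right_valid, 0.8 * scene + 10.0, 0.0)   # right camera: darker, offset

overlap = left_valid & right_valid
right_matched = match_brightness(right, left, overlap)
stitched = merge(left, right_matched, left_valid, right_valid)
print(np.array_equal(stitched, scene.astype(np.uint8)))  # True
```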

Here is the zoomed version of the overlap region of the stitched image:


If the stitching were perfect, every pixel in the overlap region would be gray. Our engineer saw small fringes of color on the edges of the black digits and stripes, but the overall stitching accuracy is good. This is not trivial, because stitching images from different cameras that view a nearby object from different angles is difficult.
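The color coding explains why a perfect overlap looks gray. The sketch below assumes the coding described above: the left image drives the green channel, and the right image drives the red and blue channels (magenta). Wherever the two grayscale images agree, R = G = B. The composite function and the "colored fraction" mismatch measure are illustrative choices, not the authors' tool.

```python
import numpy as np

def green_magenta_composite(left, right):
    """RGB uint8 overlay: green = left image, red and blue = right image."""
    left = np.asarray(left, dtype=np.uint8)
    right = np.asarray(right, dtype=np.uint8)
    return np.stack([right, left, right], axis=-1)

def colored_fraction(rgb, tol=10):
    """Fraction of pixels whose green and magenta channels differ by more than tol."""
    g = rgb[..., 1].astype(int)
    m = rgb[..., 0].astype(int)
    return float(np.mean(np.abs(g - m) > tol))

# White background with one black stripe, like the stripes on the boxes.
img = np.full((20, 20), 255, dtype=np.uint8)
img[:, 8:12] = 0
shifted = np.roll(img, 2, axis=1)   # simulate a 2-pixel misalignment

print(colored_fraction(green_magenta_composite(img, img)))      # 0.0: all gray
print(colored_fraction(green_magenta_composite(img, shifted)))  # 0.2: fringes at stripe edges
```

As in the zoomed figure, a small misalignment shows up only as colored fringes along the edges of dark features.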

For comparison, here is an example of less successful stitching:


Avantier Inc.’s engineering team, with over 30 years of experience, developed software that performs all the necessary transformations for the customer automatically, without any operator intervention.


A new hybrid feature reduction method by using MCMSTClustering algorithm with various feature projection methods: a case study on sleep disorder diagnosis

  • Original Paper
  • Published: 30 March 2024


  • Ali Şenol, ORCID: orcid.org/0000-0003-0364-2837 (affiliation 1)
  • Tarık Talan, ORCID: orcid.org/0000-0002-5371-4520 (affiliation 2)
  • Cemal Aktürk, ORCID: orcid.org/0000-0003-3764-3862 (affiliation 2)


In machine learning, a large number of features that are irrelevant, or only weakly relevant, to the target can reduce both classification accuracy and run-time performance. For this reason, feature selection and feature reduction methods are widely used. Their aim is to eliminate irrelevant features or to transform the features into a smaller set of new features that are relevant to the target. In some cases, however, feature reduction alone is not sufficient to improve performance. In this study, we propose a new hybrid feature projection model to increase the classification performance of classifiers. To this end, the MCMSTClustering algorithm is used in the data preprocessing stage of classification, together with various feature projection methods (PCA, LDA, SVD, t-SNE, NCA, Isomap, and PR), to increase classification performance in sleep disorder diagnosis. To determine the best parameters of the MCMSTClustering algorithm, we used the VIASCKDE Index, Dunn Index, Silhouette Index, Adjusted Rand Index, and Accuracy as cluster quality evaluation measures. To evaluate the proposed model, we first appended the cluster labels produced by MCMSTClustering to the dataset as a new feature. We then applied the selected feature projection methods to reduce the number of features, ran the kNN algorithm on the resulting dataset, and compared the results. To assess the efficiency of the proposed model, we tested it on a sleep disorder diagnosis dataset. We compared it with two baselines: pure kNN, and kNN with the same feature projection methods used in the proposed approach. According to the experimental results, the proposed method with Kernel PCA as the feature projection method was the best model, with a classification accuracy of 0.9627. In addition, the MCMSTClustering algorithm increases the performance of PCA, Kernel PCA, SVD, t-SNE, and PR, whereas the performance of LDA, NCA, and Isomap remains the same.
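The pipeline described in the abstract has three steps: cluster, append the cluster label as a feature, then project and classify with kNN. A rough sketch follows. MCMSTClustering is not available in common libraries, so this sketch substitutes plain k-means as a stand-in clustering step and uses PCA (via SVD) as the projection. The data are synthetic. As in the abstract, clustering runs on the whole dataset without using class labels. The sketch does not reproduce the paper's implementation or results.

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Plain k-means; a stand-in for MCMSTClustering, which is not in common libraries."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pca_fit(X, n_components):
    """Return the mean and the top principal axes of X (computed with SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

# Synthetic stand-in data: three classes in six dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(5.0 * c, 1.0, size=(50, 6)) for c in range(3)])
y = np.repeat(np.arange(3), 50)

# Step 1: append the unsupervised cluster label as an extra feature.
X_aug = np.hstack([X, kmeans_labels(X, k=3)[:, None]])

# Step 2: split, then fit the projection on the training part only.
idx = rng.permutation(len(X_aug))
train, test = idx[:100], idx[100:]
mean, comps = pca_fit(X_aug[train], n_components=2)
Z_train = pca_transform(X_aug[train], mean, comps)
Z_test = pca_transform(X_aug[test], mean, comps)

# Step 3: classify the projected data with kNN.
pred = knn_predict(Z_train, y[train], Z_test, k=5)
accuracy = float(np.mean(pred == y[test]))
print(f"accuracy = {accuracy:.3f}")
```

Swapping `pca_fit`/`pca_transform` for another projection (Kernel PCA, t-SNE, random projection, and so on) would mirror the comparison the abstract describes.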


Data availability

The data are publicly available.


The authors declare that this paper was not funded by any institution.

Author information

Authors and affiliations.

Tarsus University, Mersin, Turkey

Ali Şenol

Gaziantep Islam Science and Technology University, Gaziantep, Turkey

Tarık Talan & Cemal Aktürk


Contributions

All of the authors contributed equally to this work and reviewed the manuscript.

Corresponding author

Correspondence to Tarık Talan .

Ethics declarations

Conflict of interest.

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Şenol, A., Talan, T. & Aktürk, C. A new hybrid feature reduction method by using MCMSTClustering algorithm with various feature projection methods: a case study on sleep disorder diagnosis. SIViP (2024). https://doi.org/10.1007/s11760-024-03097-1


Received : 11 January 2024

Revised : 10 February 2024

Accepted : 16 February 2024

Published : 30 March 2024

DOI : https://doi.org/10.1007/s11760-024-03097-1


Keywords

  • Feature projection
  • Breast cancer diagnosis
  • MCMSTClustering
  • Classification


COMMENTS

  1. PDF Digital Image Processing in Medical Applications: A Case Study

    ... also stored ... computer ... image. Ultrasonography is a technique in which energy is used to detect the state of the internal body organs. ... ultrasonic energy transmitted from a piezoelectric or magnetostrictive transducer through the ... and internal ... strikes an interface between two tissues of different acoustical impedance,

  2. An Industrial Case study on Deep learning image classification

    Image enhancement is a fundamental process in the field of image processing, aimed at improving the perceptual quality of an image or… 18 min read · Feb 24, 2024 Luís Fernando Torres

  3. Machine Learning and Genetic Algorithms: A case study on image

    3. The image reconstruction problem as a case study. In this section, we describe the problem that was used to demonstrate the benefits of integrating machine learning and metaheuristics. The image reconstruction problem has several applications in image processing and can be used as a filter on an input image.

  4. Image Processing: Techniques, Types, & Applications [2023]

    Task 1: Image Enhancement. One of the most common image processing tasks is image enhancement, or improving the quality of an image. It has crucial applications in Computer Vision tasks, Remote Sensing, and surveillance. One common approach is adjusting the image's contrast and brightness.

  5. Case Study: Image Processing

    The most common ordering system is RGB: the first channel represents the red intensity of the pixel, the second channel represents green, and the third blue. Using three bytes to represent each color gives us a much larger colorspace than greyscale images: 256^3 = 16,777,216 possible values for each pixel.

  6. Machine learning for medical imaging: methodological failures and

    A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE, 1-19 (2020). Liu, X. et al.

  7. Image Processing on IOPA Radiographs: A comprehensive case study on

    Abstract: With the recent advancements in Image Processing Techniques and development of new robust computer vision algorithms, new areas of research within Medical Diagnosis and Biomedical Engineering are picking up pace. This paper provides a comprehensive in-depth case study of Image Processing, Feature Extraction and Analysis of ...

  8. Deep Decomposition Network for Image Processing: A Case Study for

    Image decomposition is a crucial subject in the field of image processing. It can extract salient features from the source image. We propose a new image decomposition method based on convolutional neural network. This method can be applied to many image processing tasks. In this paper, we apply the image decomposition network to the image fusion task. We input infrared image and visible light ...

  9. Case Study: Image Processing

    Download Citation | Case Study: Image Processing | This chapter introduces techniques for working with imagery in Python. First, we review the data formats commonly used for storing images.

  10. Accelerating embedded image processing for real time: a case study

    Many image processing applications need real-time performance, while having restrictions of size, weight and power consumption. Common solutions, including hardware/software co-designs, are based on Field Programmable Gate Arrays (FPGAs). Their main drawback is long development time. In this work, a co-design methodology for processor-centric embedded systems with hardware acceleration using ...

  11. Image Processing Using Artificial Intelligence: Case Study on

    The ML-based classification approaches have been found effective on many benchmark RS datasets. This chapter explores the issues and challenges of image-processing techniques in the classification of high-dimensional RS datasets. It discusses the potential of AI/ML-based approaches by showcasing a case study on the classification of Airborne ...

  12. (PDF) Studies on application of image processing in ...

    This study, which is Part II, provides an in-depth exploration of the concepts and methodologies underlying UV bandpass-filtered imaging, advanced image processing techniques, and the mechanisms ...

  13. A Deep Decomposition Network for Image Processing: A Case

    for Image Processing: A Case Study of Visible and Infrared Image Fusion. Yu Fu, Tianyang Xu, Xiao-Jun Wu, ... Abstract: Image decomposition into constituent components has many applications in the field of image processing. It aims to extract salient features from the source image for subsequent pattern recognition. In this paper, we propose a new ...

  14. (PDF) A Deep Decomposition Network for Image Processing: A Case Study

    A Case Study for Visible and Infrared Image Fusion. Yu Fu, Xiao-Jun Wu, Josef Kittler, Life Fellow, IEEE. Abstract: Image decomposition is a crucial subject in the field

  15. Frontiers

    The field of image processing has been the subject of intensive research and development activities for several decades. This broad area encompasses topics such as image/video processing, image/video analysis, image/video communications, image/video sensing, modeling and representation, computational imaging, electronic imaging, information forensics and security, 3D imaging, medical imaging ...

  16. PDF X-ray Computed Tomography Image Processing & Segmentation: A Case Study

    The image filtering (in the spatial domain) is based on a spatial convolution operation between the image itself, represented by an M × N matrix (in the simplest, 2-D case), and a predefined K × K matrix, known as the kernel or mask, which results in an image with the same original dimensions (M × N). In other

  17. Super-resolution Image Processing for Hemoglob

    Conjunctival/fingertip pallor may be quantified and used to detect anemia using digital pictures acquired with a camera or a simple smartphone. Super-resolution image reconstruction can augment the spatial resolution of captured imageries, allowing for rich feature extraction based on color and texture, which is a prerequisite for hemoglobin ...

  18. Image Processing on IOPA Radiographs: A comprehensive case study on

    A comprehensive in-depth case study of Image Processing, Feature Extraction and Analysis of Apical Periodontitis diagnostic cases in IOPA Radiographs, a common case in oral diagnostic pipeline. With the recent advancements in Image Processing Techniques and development of new robust computer vision algorithms, new areas of research within Medical Diagnosis and Biomedical Engineering are ...

  19. PDF Using Intel® FPGAs for Real-Time Image Processing in High-Performance

    In addition to excellent low-noise performance and quantitative measurement capability, the ORCA-Quest qCMOS has the following features: high quantum efficiency (90% at 475 nm), high resolution (9.4 megapixels), and a high-speed frame rate (120 fps). In the ORCA-Quest system, Intel FPGAs perform unique image ... Implemented image processing to maximize ...

  20. (PDF) A Review on Image Processing

    Image Processing includes changing the nature of an image in order to improve its pictorial information for human interpretation and for autonomous machine perception. ... After doing the case study ...

  21. A Survey of Forensic Applications using Digital Image Processing: Image

    To gather reliable evidence and submit it to the court, forensic applications are used. Due to recent advancements in technology, many crimes now involve the modification of photos. Finding the original evidence and presenting it to the court has become simpler with the development of forensic software. The notion of digital image processing is combined with forensic applications to discover the ...

  22. X-ray Computed Tomography Image Processing & Segmentation: A Case Study

    The correct delimitation of the peaks permits the identification of the phases of interest (segmentation) in the X-ray CT image. In the case of a 'dried' soil sample, the left and the right peak are associated with the soil pore space and the soil solid matrix, respectively (Fig. 5.1). However, it is common to find histograms from soil images presenting overlapping peaks, which requires ...

  23. Image Processing Case Study

    Image Processing Case Study. Let's look at the transportation industry-based case of extensive image processing. Two video cameras were looking at the boxes moving fast on the conveyor belt. To provide high enough image resolution the cameras were placed close to the belt but they could not cover all the belt cross-section. They were placed ...

  24. Security Improvements of JPEG Images Using Image De-Identification

    In the case of images, de-identification is performed by mosaic processing or applying various effects. ... evaluated ways of protecting data by encrypting image data. These studies allow us to protect the privacy of users by making an original image unidentifiable when an unauthorized person accesses the data. ... they are easy to process, and ...

  25. A new hybrid feature reduction method by using ...

    A new hybrid feature reduction method by using MCMSTClustering algorithm with various feature projection methods: a case study on sleep disorder diagnosis. Original Paper; Published: 30 March 2024 (2024) ... such as pattern recognition, bioinformatics, biomedical engineering, and image processing. LDA is calculated as follows:
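The ultrasonography snippet in the first entry ("Digital Image Processing in Medical Applications: A Case Study") describes echoes that arise where ultrasonic energy strikes an interface between tissues of different acoustic impedance. As a hedged illustration that is not taken from that PDF, the textbook normal-incidence intensity reflection coefficient, R = ((Z2 - Z1) / (Z2 + Z1))^2, can be computed as follows. The function name and the unitless example impedances are assumptions chosen only for the demonstration.

```python
def intensity_reflection(z1: float, z2: float) -> float:
    """Fraction of incident ultrasound intensity reflected at a flat interface
    between two media with acoustic impedances z1 and z2 (normal incidence)."""
    if z1 <= 0 or z2 <= 0:
        raise ValueError("acoustic impedances must be positive")
    return ((z2 - z1) / (z2 + z1)) ** 2


# Matched impedances give no echo; a 3:1 mismatch reflects a quarter of the intensity.
print(intensity_reflection(1.0, 1.0))  # 0.0
print(intensity_reflection(1.0, 3.0))  # 0.25
```

The larger the impedance mismatch, the stronger the echo, which is why boundaries between very dissimilar tissues stand out in the reconstructed image.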
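The "Image Processing: Techniques, Types, & Applications [2023]" snippet names contrast and brightness adjustment as a common enhancement approach. One standard way to express it is the linear point operation g(x, y) = alpha * f(x, y) + beta, where alpha > 1 stretches contrast and beta shifts brightness. The pure-Python sketch below assumes an 8-bit grayscale image stored as nested lists; the function name and defaults are illustrative, not the article's code.

```python
def adjust_contrast_brightness(image, alpha=1.0, beta=0.0):
    """Apply g(x, y) = alpha * f(x, y) + beta to an 8-bit grayscale image
    stored as a list of rows, rounding and clipping results to [0, 255]."""
    out = []
    for row in image:
        # round() then clamp so every value is a valid 8-bit intensity
        out.append([max(0, min(255, round(alpha * p + beta))) for p in row])
    return out


img = [[10, 100], [200, 250]]
print(adjust_contrast_brightness(img, alpha=1.2, beta=10))  # [[22, 130], [250, 255]]
```

Clipping is what makes bright pixels saturate at 255, so large alpha or beta values trade highlight detail for overall contrast.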
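The "Case Study: Image Processing" snippet counts 256^3 possible RGB values per pixel because each of the three channels occupies one byte. A minimal sketch of that arithmetic, assuming the common 0xRRGGBB packing order (the helper names are hypothetical):

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channel intensities into one 24-bit integer (0xRRGGBB)."""
    for c in (r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("each channel must fit in one byte (0-255)")
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Recover the (red, green, blue) channels from a 24-bit integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(256 ** 3)                    # 16777216 distinct colors
print(hex(pack_rgb(255, 128, 0)))  # 0xff8000
print(unpack_rgb(0xFF8000))        # (255, 128, 0)
```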
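The first X-ray CT snippet ("X-ray Computed Tomography Image Processing & Segmentation: A Case Study") defines spatial filtering as a convolution between an M × N image and a K × K kernel that yields an M × N result. A minimal pure-Python sketch, assuming an odd K and zero padding at the borders (the padding scheme and the name `convolve2d` are our assumptions; the cited chapter may treat borders differently):

```python
def convolve2d(image, kernel):
    """Filter an M x N image with a K x K kernel (K odd).

    The kernel is flipped (true convolution) and pixels outside the image
    are treated as zero, so the output keeps the original M x N size.
    """
    m, n = len(image), len(image[0])
    k = len(kernel)
    if k % 2 == 0 or any(len(row) != k for row in kernel):
        raise ValueError("kernel must be square with an odd size K")
    r = k // 2
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Flipped kernel offset mapped to an image coordinate
                    y, x = i + r - u, j + r - v
                    if 0 <= y < m and 0 <= x < n:  # zero padding outside
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]  # unnormalized 3 x 3 box filter
print(convolve2d(img, box))
# [[12.0, 21.0, 16.0], [27.0, 45.0, 33.0], [24.0, 39.0, 28.0]]
```

For symmetric kernels such as the box filter, flipping changes nothing, so convolution and correlation coincide; the distinction only matters for asymmetric kernels.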
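The second X-ray CT snippet segments soil pore space from the solid matrix by separating two histogram peaks, and notes that the peaks often overlap. Otsu's method is swapped in here as one common automatic way to place a threshold between two modes; the snippet does not say which rule the authors used. The sketch assumes integer gray levels in [0, 255], and the sample values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing between-class variance when pixels
    are split into background (p <= t) and foreground (p > t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2
        score = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


pores = [20, 22, 25, 30, 28, 24]         # dark peak (hypothetical data)
solids = [180, 190, 200, 185, 195, 210]  # bright peak (hypothetical data)
t = otsu_threshold(pores + solids)
mask = [p > t for p in pores + solids]   # True = solid matrix
```

When the two peaks overlap heavily, a single global threshold like this will misclassify some voxels, which is the difficulty the snippet alludes to.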
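The "Security Improvements of JPEG Images Using Image De-Identification" snippet names mosaic processing as one way to de-identify an image. It can be sketched as block averaging: each tile is replaced by its mean, so fine identifying detail is lost. The tile size and integer averaging below are illustrative assumptions, not the cited paper's settings.

```python
def mosaic(image, block=2):
    """De-identify a grayscale image (list of rows) by replacing every
    block x block tile with the integer mean of its pixels."""
    m, n = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, m, block):
        for left in range(0, n, block):
            # Edge tiles may be smaller when m or n is not a multiple of block
            rows = range(top, min(top + block, m))
            cols = range(left, min(left + block, n))
            tile = [image[i][j] for i in rows for j in cols]
            mean = sum(tile) // len(tile)
            for i in rows:
                for j in cols:
                    out[i][j] = mean
    return out


img = [[0, 4, 8, 8],
       [4, 8, 8, 8],
       [1, 1, 9, 9],
       [1, 1, 9, 9]]
print(mosaic(img))  # [[4, 4, 8, 8], [4, 4, 8, 8], [1, 1, 9, 9], [1, 1, 9, 9]]
```

Larger blocks hide more detail; in practice the block size is chosen so that faces or text become unreadable.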
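The formula that followed "LDA is calculated as follows:" in the last snippet did not survive extraction. For orientation only, the textbook two-class Fisher LDA projects samples onto w = S_W^{-1}(mu_1 - mu_0), where S_W is the within-class scatter matrix; this is the standard form and not necessarily the exact formulation used in the cited paper. A pure-Python sketch for two-dimensional features, with hypothetical sample data:

```python
def fisher_lda_direction(class0, class1):
    """Two-class Fisher LDA for 2-D samples: w = S_W^{-1} (mu1 - mu0)."""
    def mean(points):
        k = len(points)
        return [sum(p[0] for p in points) / k, sum(p[1] for p in points) / k]

    mu0, mu1 = mean(class0), mean(class1)
    # Within-class scatter: sum over both classes of (x - mu)(x - mu)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, mu in ((class0, mu0), (class1, mu1)):
        for x in points:
            d = [x[0] - mu[0], x[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(x, w):
    """Project one 2-D sample onto the LDA direction (a 1-D feature)."""
    return x[0] * w[0] + x[1] * w[1]


a = [(0, 0), (1, 0), (0, 1), (1, 1)]  # hypothetical class 0
b = [(4, 0), (5, 0), (4, 1), (5, 1)]  # hypothetical class 1
w = fisher_lda_direction(a, b)
print(w)  # [2.0, 0.0]
```

Projecting onto w reduces each sample to a single feature while keeping the two classes as separable as a linear projection allows, which is why LDA appears among feature projection methods.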