Medical image diagnosis in leukemia
Previous article on Acute Myeloid Leukemia in general: here.
In leukemia, and in hematology in general, machine learning techniques have many potential applications: supporting differential diagnosis, aiding therapy decisions, predicting risk and reducing medical errors. The main applications today are predictive modelling, diagnostics and medical image analysis (1). In this section, we focus on machine learning and deep learning for medical image diagnosis. The growth in available data, hardware capabilities and cloud computing is driving rapid progress in the field, and medicine is benefiting from it. Many algorithms can now be run on a personal computer or on a cloud service, which widens the pool of potential users and researchers.
Figure 1. Examples of different blood cells according to morphology. N (normal lymphocytes), CLL (chronic lymphocytic leukemia), SMZL (splenic marginal zone lymphoma), MCL (mantle cell lymphoma), HCL (hairy cell leukemia), FL (follicular lymphoma), PL (B and T prolymphocytic leukemia), LGL-T (large granular lymphocyte lymphoma), SS (Sézary syndrome), BC (blast cells), PC (plasma cells), RL (reactive lymphocytes). Figure source: (2).
Medical image diagnosis is a widespread method for identifying cancer in different organs, from blood and skin cancer to brain and breast cancer. Many imaging techniques are used to look for abnormalities in human organs, such as MRI, PET, CT scans and radiography. In several medical tasks, deep learning (DL) models have matched or outperformed physicians: for example, DL has reached superior sensitivity in identifying retinopathy, malignant skin lesions and breast lesions in mammograms (1).
Peripheral blood is easily accessible, and collecting it is minimally invasive. Moreover, peripheral blood cells can be examined by optical microscopy (which is not expensive), so morphological images can be acquired at low cost. Morphological analysis is an important diagnostic step, especially in leukemia. Traditionally, the morphological analysis of a blood smear is conducted by a clinician, which is time-consuming, dependent on experience, error-prone and subjective. Myeloblasts are immature white blood cells (WBC) of the myeloid lineage, which accumulate abnormally in acute myeloid leukemia. WBC comprise different cell subsets (lymphocytes, monocytes, granulocytes and so on). Normally, these cell populations are present in fairly stable proportions. A sudden increase in one of these populations can indicate a blood-related disease. A marked increase in lymphocytes can be a sign of acute lymphoblastic leukemia, whereas a sudden increase in neutrophils can be caused by chronic myeloid leukemia. Generally, the different blood cell types are counted with diagnostic instruments such as flow cytometers. In low-resource settings, however, medical doctors still rely on microscope images, which is time-consuming work and can be subjective.
Before a myeloblast can be distinguished from a non-malignant cell, WBC segmentation is normally necessary. This requires identifying the cell and separating it from the background. WBC segmentation can also involve identifying the cytoplasm and the nucleus separately. Another issue when working with blood cells is adjacent cells: blood cells can appear in groups, which makes it harder to extract features from the cell image. For instance, single leukocytes are round but grouped leukocytes are not, so measuring roundness is a way to identify grouped leukocytes for further preprocessing. The analysis of the nucleus and the cytoplasm is useful for identifying the different WBC subsets. Cells differ in size, but some subsets have a similar size and a different nucleus shape. Monocytes and lymphocytes, for instance, have no granules in the cytoplasm, which separates them from granulocytes. Moreover, the shape of the nucleus (structure, presence of lobes) gives important information for cell identification. For instance, neutrophils have a multi-lobed nucleus (up to 5 lobes). This information can be incorporated in the model to reach better accuracy.
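The roundness measure mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the cell outline is assumed to be already available as a closed polygon of (x, y) points, roundness is taken to be the common circularity score 4πA/P² (1.0 for a perfect circle, lower for elongated or clumped shapes), and the function names, the synthetic outlines and the 0.9 cut-off are hypothetical rather than taken from any of the cited studies.

```python
import math


def ellipse_outline(a, b, points=180):
    """Return a closed polygon approximating an ellipse with semi-axes a and b."""
    return [
        (a * math.cos(2 * math.pi * i / points), b * math.sin(2 * math.pi * i / points))
        for i in range(points)
    ]


def circularity(contour):
    """Return 4*pi*area / perimeter**2 for a closed polygon of (x, y) points."""
    n = len(contour)
    if n < 3:
        raise ValueError("a contour needs at least three points")
    twice_area = 0.0  # the shoelace formula accumulates twice the signed area
    perimeter = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    if perimeter == 0.0:
        raise ValueError("degenerate contour: all points coincide")
    return 4.0 * math.pi * (abs(twice_area) / 2.0) / perimeter ** 2


def looks_grouped(contour, min_circularity=0.9):
    """Flag outlines that deviate from a circle (hypothetical cut-off)."""
    return circularity(contour) < min_circularity


# Synthetic outlines, not real cell data.
single_cell = ellipse_outline(10, 10)  # round outline
cell_clump = ellipse_outline(20, 10)   # elongated outline, as two touching cells might give

print(round(circularity(single_cell), 3), looks_grouped(single_cell))
print(round(circularity(cell_clump), 3), looks_grouped(cell_clump))
```

In practice the cut-off would have to be tuned on labelled images, and outlines flagged as grouped would be passed to a further splitting step before feature extraction.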
The nucleus can be separated from the cytoplasm by applying a threshold, since the nucleus contrasts with the cytoplasm. This can be achieved with the image processing functions of medical software (for example, watershed segmentation, clustering, contrast stretching, morphological filtering and so on). WBC images can be grayscale or color, but they are often grayscale, and since WBC are darker than other blood components they can be separated using contrast. Preprocessing can simplify segmentation and improve classification accuracy without a high computational cost.
As an example, Putzu proposed a method for background separation and leukocyte identification based on color-space conversion (RGB to CMYK) and thresholding, obtaining accurate results (92 %) (3). In another example, Patel proposed an automated method for identifying leukemia cells at an early stage. After preprocessing, they extracted color features, geometric features (perimeter, radius and so on), texture features (entropy, energy and so on) and statistical features, and on top of these they built a support vector machine (SVM) classifier (4). Sharma and Kumar used a neural network to build a classifier for leukemia images. They first applied principal component analysis (PCA) to reduce the high-dimensional feature data. PCA is a dimensionality reduction technique that keeps the components with the highest variance, which can improve the performance of the algorithm and also helps to prevent overfitting. On the extracted features they trained a neural network with the artificial bee colony (ABC) algorithm. ABC is a global search technique, used here to improve the global search and to counter the slow convergence of neural network training. They reported an average accuracy of 98 %, but they used a small dataset (5). Zhang used a dataset of 5,000 images to identify WBC populations. They used a residual segmentation network combined with a discriminator network for adversarial training. They used a convolutional neural network (CNN) as a feature extractor and combined the resulting feature vectors with histogram of oriented gradients (HOG) features, which were then fed to an SVM classifier (6).
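To make the idea of color-space conversion followed by thresholding more concrete, here is a minimal sketch. It is not the pipeline of any cited paper: the choice of the magenta channel, the use of Otsu's method to pick the threshold, the function names and the pixel colors of the tiny synthetic image are all assumptions made for illustration.

```python
def rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB to CMYK components in [0, 1] (simple device-independent formula)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(r, g, b)
    if k >= 1.0:  # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k


def otsu_threshold(values, bins=256):
    """Pick the threshold in [0, 1] that maximizes the between-class variance."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_score = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(bins):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is already in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_bin = score, t
    return (best_bin + 1) / bins  # upper edge of the last lower-class bin


def segment_by_magenta(image):
    """image: rows of (r, g, b) tuples. Returns (boolean mask, threshold)."""
    magenta = [[rgb_to_cmyk(*pixel)[1] for pixel in row] for row in image]
    threshold = otsu_threshold([v for row in magenta for v in row])
    mask = [[v >= threshold for v in row] for row in magenta]
    return mask, threshold


# Synthetic 4x4 image: pale background with a purple-stained 2x2 "nucleus".
B = (230, 220, 225)  # assumed background color
N = (100, 40, 150)   # assumed stained-nucleus color
image = [
    [B, B, B, B],
    [B, N, N, B],
    [B, N, N, B],
    [B, B, B, B],
]
mask, threshold = segment_by_magenta(image)
for row in mask:
    print("".join("#" if inside else "." for inside in row))
```

On real smears the best channel and thresholding method depend on the stain and the acquisition setup, so both should be treated as parameters to validate rather than fixed choices.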
In the last five years, CNN-based methods have become the standard, reaching more than 95 % accuracy (although some misclassification errors persist for cell subpopulations that look similar). Transfer learning is also making the CNN approach much more accurate than traditional systems (7). Interestingly, where resources are limited, CNNs can be the basis for triage and referral tools. A 2018 paper showed a proof of concept of a mobile device for lymphoma detection based on a deep learning approach (8). These models can also be useful when the morphological analysis of a specific disease is complex and requires extensive experience. However, standards are needed to allow higher confidence in the models, and also for the training data and labels used in the studies.
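A high overall accuracy can coexist with frequent confusion between similar-looking subpopulations, which is why per-class results are worth reporting alongside it. The sketch below illustrates this with made-up labels (the class names, counts and function names are hypothetical and do not come from any of the cited studies).

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_recall(true_labels, predicted_labels):
    """For each true class, the fraction of its cells that were recognized."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def confusion_counts(true_labels, predicted_labels):
    """Count every (true class, predicted class) pair."""
    return Counter(zip(true_labels, predicted_labels))


# Made-up example: an imbalanced set of 100 cells.
true_labels = ["neutrophil"] * 90 + ["lymphocyte"] * 6 + ["monocyte"] * 4
predicted_labels = (
    ["neutrophil"] * 90
    + ["lymphocyte"] * 4 + ["monocyte"] * 2  # two lymphocytes called monocytes
    + ["monocyte"] * 3 + ["lymphocyte"] * 1  # one monocyte called a lymphocyte
)

print("accuracy:", overall_accuracy(true_labels, predicted_labels))
print("recall per class:", per_class_recall(true_labels, predicted_labels))
print("confusions:", confusion_counts(true_labels, predicted_labels))
```

Here the overall accuracy is dominated by the majority class, while the recall of the two rarer classes is visibly lower, which is the pattern the accuracy figure alone would hide.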
Dehghan followed an alternative approach to detecting leukemia: they did not start from blood images but from microarray images. They pre-processed the microarray images to extract gene expression values and built a classifier on top of them. Pre-processing consisted of detecting rotated images, removing the background, and identifying the gene blocks and spot locations. Since a microarray contains a large number of genes, it is necessary to select the relevant ones, so they performed gene selection and then used a tree classifier (9).
Figure 2. Overview of a leukemia detection classifier. Figure source: (10).
Figure 3. Overview of a convolutional neural network. Figure source: (11).
As mentioned before, leukemia can infiltrate the bone marrow. Rehman proposed an approach for identifying leukemia cells in bone marrow microscope images. They worked on acute lymphoblastic leukemia (ALL) recognition, building a classifier that assigns leukocytes to the normal class or to one of three ALL subtypes. They pre-processed the images before feeding them to a CNN. They compared their classifier with SVM and naive Bayes classifiers and observed a higher accuracy with the CNN (12).
On average, most of the models proposed for leukemia diagnosis easily exceed 90 % accuracy. However, most of the machine learning models proposed in leukemia use small sample sizes, and the data in each study come from a single medical center. This limitation raises questions about how well these models generalize to data acquired in other centers or hospitals. Some studies claim to use thousands of images, but most of these images were obtained from just a small number of patients (13). Thus, more robust datasets are needed, together with large dataset libraries that help to avoid overfitting. Moreover, the proposed models need prospective validation to show that they generalize. Addressing these challenges is important for integrating these models into daily clinical care (13).
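One practical consequence of having many images from few patients is that a random split over images can place cells from the same patient in both the training and the test set, which tends to inflate the reported accuracy. A patient-level split avoids this. The sketch below is a minimal illustration: the function name, the patient identifiers and the image counts are hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, test_fraction=0.25, seed=0):
    """Split image indices so that no patient appears in both sets.

    patient_ids[i] is the patient that image i came from.
    Returns (train_indices, test_indices).
    """
    images_by_patient = defaultdict(list)
    for image_index, patient in enumerate(patient_ids):
        images_by_patient[patient].append(image_index)

    patients = sorted(images_by_patient)  # fixed order so the seed is reproducible
    if len(patients) < 2:
        raise ValueError("need images from at least two patients")
    random.Random(seed).shuffle(patients)

    # Hold out whole patients, keeping at least one patient on each side.
    n_test = min(len(patients) - 1, max(1, round(len(patients) * test_fraction)))
    test_patients, train_patients = patients[:n_test], patients[n_test:]

    train = [i for p in train_patients for i in images_by_patient[p]]
    test = [i for p in test_patients for i in images_by_patient[p]]
    return train, test


# Made-up dataset: 1,000 images that come from only four patients.
patient_ids = ["P1"] * 500 + ["P2"] * 300 + ["P3"] * 150 + ["P4"] * 50
train, test = split_by_patient(patient_ids)
print(len(train), "training images,", len(test), "test images")
print("test patients:", sorted({patient_ids[i] for i in test}))
```

With so few patients, the test set reflects a single individual, which makes the point of the paragraph above tangible: the number of images says little about how many independent cases a model has actually seen.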
Following article on machine learning using other data sources: here.
Selected bibliography:
1. Shouval R, Fein JA, Savani B, Mohty M, Nagler A. Machine learning and artificial intelligence in haematology. British Journal of Haematology [Internet]. [cited 2020 Oct 5];n/a. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1111/bjh.16915
2. Rodellar J, Alférez S, Acevedo A, Molina A, Merino A. Image processing and machine learning in the morphological analysis of blood cells. International Journal of Laboratory Hematology. 2018;40:46–53.
3. Putzu L, Ruberto CD. White Blood Cells Identification and Counting from Microscopic Blood Image. 2013;7:9.
4. Patel N, Mishra A. Automated Leukaemia Detection Using Microscopic Images. Procedia Computer Science. 2015;58:635–42.
5. Sharma R, Kumar R. A Novel Approach for the Classification of Leukemia Using Artificial Bee Colony Optimization Technique and Back-Propagation Neural Networks. In: Krishna CR, Dutta M, Kumar R, editors. Proceedings of 2nd International Conference on Communication, Computing and Networking. Singapore: Springer; 2019. p. 685–94.
6. Zhang C, Wu S, Lu Z, Shen Y, Wang J, Huang P, et al. Hybrid adversarial-discriminative network for leukocyte classification in leukemia. Medical Physics. 2020;47:3732–44.
7. Hegde RB, Prasad K, Hebbar H, Singh BMK. Comparison of traditional image processing and deep learning approaches for classification of white blood cells in peripheral blood smear images. Biocybernetics and Biomedical Engineering. 2019;39:382–92.
8. Im H, Pathania D, McFarland PJ, Sohani AR, Degani I, Allen M, et al. Design and clinical validation of a point-of-care device for the diagnosis of lymphoma via contrast-enhanced microholography and machine learning. Nature Biomedical Engineering. 2018;2:666–74.
9. Dehghan Khalilabad N, Hassanpour H. Employing image processing techniques for cancer detection using microarray images. Computers in Biology and Medicine. 2017;81:139–47.
10. Saba T. Recent advancement in cancer detection using machine learning: Systematic survey of decades, comparisons and challenges. Journal of Infection and Public Health. 2020;13:1274–89.
11. Radakovich N, Nagy M, Nazha A. Machine learning in haematological malignancies. The Lancet Haematology. 2020;7:e541–50.
12. Rehman A, Abbas N, Saba T, Rahman SI ur, Mehmood Z, Kolivand H. Classification of acute lymphoblastic leukemia using deep learning. Microscopy Research and Technique. 2018;81:1310–7.
13. Salah HT, Muhsen IN, Salama ME, Owaidah T, Hashmi SK. Machine learning applications in the diagnosis of leukemia: Current trends and future directions. International Journal of Laboratory Hematology. 2019;41:717–25.