Improving lesion detection in mammograms by leveraging a Cycle-GAN-based lesion remover

Dataset

Under an approved IRB protocol, we collected 10,310 screening Full-Field Digital Mammograms (FFDMs) from 4,832 women who visited the University of Pittsburgh Medical Center (UPMC) for routine breast cancer screening. The Selenia Dimensions system (Hologic Inc., Marlborough, MA, USA) was used for all mammogram exams. We used the four standard views, the left and right Cranio-Caudal (CC) and the left and right Medio-Lateral Oblique (MLO), for this study. The dataset included 4,942 mammograms showing a recalled lesion (BI-RADS 0) from 2,416 women and 5,368 mammograms randomly selected from exams with normal readings (BI-RADS 1) from 2,416 women. MQSA radiologists marked the location of the lesion for the recalled cases. Note that we had only the BI-RADS classification information at the time of the screening. As a result, further details about the lesions, such as pathology (benign, malignant) and type (mass, calcification), were not available at the time of data acquisition.

To develop the lesion remover and test its potential as a lesion highlighter for improving the performance of a lesion detection algorithm, we divided our dataset into a development set and an independent testing set. The development set included 3,959 mammograms of 1,909 women with recalled lesions and 4,263 mammograms of 1,429 women with normal/healthy breasts, while the test set included 983 mammograms of 507 women with recalled lesions and 1,105 mammograms of 987 women with normal/healthy breasts. We further divided the development set into training and validation sets with a ratio of 8:2.

Preprocessing

Using the lesion locations marked by the MQSA radiologists, we segmented patches of 400 by 400 pixels (2.8 cm by 2.8 cm) containing the recalled lesions for the cases. For normal controls, we segmented a 400 by 400 pixel patch centered at the centroid of the breast area. We treated patches from the same woman but different views (e.g., MLO and CC) as independent samples for the development dataset (training and validation). For testing, we randomly selected only one image patch per patient to prevent possible data correlation between the two different views of the same lesion. Figure 1 illustrates this preprocessing process.
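
The patch-extraction step above can be sketched as follows (a minimal numpy version; the function names and the thresholding used to locate the breast area are our assumptions, as the paper does not give implementation details):

```python
import numpy as np

def extract_patch(image, center_row, center_col, size=400):
    """Extract a size-by-size patch centered at (center_row, center_col),
    shifting the window inward when it would cross the image border."""
    h, w = image.shape
    half = size // 2
    top = min(max(center_row - half, 0), h - size)
    left = min(max(center_col - half, 0), w - size)
    return image[top:top + size, left:left + size]

def breast_centroid(image, threshold=0):
    """Centroid of the breast area, approximated here as all pixels above
    a background threshold (an assumption; the paper does not detail how
    the breast area was delineated)."""
    rows, cols = np.nonzero(image > threshold)
    return int(rows.mean()), int(cols.mean())
```

For a recalled case, the radiologist-marked lesion location replaces the centroid as the patch center.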

Fig. 1

Example lesion and normal patches. This figure illustrates how we extracted 400 by 400 pixel patches from mammograms. For the cases with recalled lesions, we segmented the patch containing the lesion. For normal controls, we extracted the patch at the centroid of the breast area

Cycle-GAN

A Cycle-GAN consists of two generators, one for the mapping function G:X → Y and another for the mapping function F:Y → X, where X and Y are two different image domains. We set the dataset of normal patches as the source domain X, and recalled lesion patches as the target domain Y.

The loss function of the Cycle-GAN for this study is given as:

$$L\left(G,F,{D}_{X},{D}_{Y}\right)={L}_{GAN}\left(G,{D}_{Y},X,Y\right)+{L}_{GAN}\left(F,{D}_{X},Y,X\right)$$

$$+{\lambda }_{1}{L}_{Cyc}\left(G,F\right)+{\lambda }_{2}{L}_{Identity}\left(G,F\right),$$

(1)

where LGAN, LCyc, and LIdentity refer to the adversarial loss, the cycle-consistency loss, and the identity loss, respectively. In addition, λ1 and λ2 are the weights that control the relative importance of LCyc and LIdentity compared to LGAN.

With the associated generator Gen, discriminator Dis, and images in the two domains, LGAN can be formulated as follows:

$${L}_{GAN}\left(Gen,Dis,X,Y\right)={\mathbb{E}}_{y\sim {p}_{data}(y)}\left[\mathrm{log}\,Dis\left(y\right)\right]+{\mathbb{E}}_{x\sim {p}_{data}(x)}\left[\mathrm{log}\left(1-Dis\left(Gen\left(x\right)\right)\right)\right],$$

(2)

where x and y are samples from the two image distributions X and Y. Gen and Dis are optimized adversarially, that is, \(\underset{Gen}{\mathrm{min}}\,\underset{Dis}{\mathrm{max}}\,{L}_{GAN}(Gen, Dis, X, Y)\). In this study, we used G–DY and F–DX as the Gen–Dis pairs, and X and Y as the images in the two different distributions/domains.

LCyc was introduced to ensure the consistency of style-transferred images; i.e., an image translated from X to Y and then back to X should be similar to the original image, and vice versa. LCyc can be formulated as:

$${L}_{Cyc}\left(G,F\right)={\mathbb{E}}_{x\sim {p}_{data}(x)}\left[{\left\Vert F\left(G\left(x\right)\right)-x\right\Vert }_{1}\right]+{\mathbb{E}}_{y\sim {p}_{data}(y)}\left[{\left\Vert G\left(F\left(y\right)\right)-y\right\Vert }_{1}\right].$$

(3)

LIdentity is the loss that constrains each generator to act as a near-identity mapping when given real samples from its own output domain (i.e., G:Y → Y and F:X → X). This loss preserves the original characteristics of the real samples after they pass through the generator. LIdentity can be formulated as:

$${L}_{Identity}\left(G,F\right)={\mathbb{E}}_{y\sim {p}_{data}(y)}\left[{\left\Vert G\left(y\right)-y\right\Vert }_{1}\right]+{\mathbb{E}}_{x\sim {p}_{data}(x)}\left[{\left\Vert F\left(x\right)-x\right\Vert }_{1}\right].$$

(4)
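
For concreteness, Eqs. (1)–(4) can be written out as a small numerical sketch (numpy, operating on arrays of samples or discriminator scores; the function names are ours, and this is an illustration rather than the training implementation):

```python
import numpy as np

def gan_loss(dis_real, dis_fake):
    """Eq. (2): Dis scores real target-domain samples (dis_real = Dis(y))
    and translated samples (dis_fake = Dis(Gen(x)))."""
    eps = 1e-8  # numerical safety for the logarithms
    return np.mean(np.log(dis_real + eps)) + np.mean(np.log(1.0 - dis_fake + eps))

def cycle_loss(x, x_reconstructed, y, y_reconstructed):
    """Eq. (3): L1 distance between each sample and its reconstruction
    after a round trip through both generators, e.g., x -> G -> F -> x."""
    return np.mean(np.abs(x_reconstructed - x)) + np.mean(np.abs(y_reconstructed - y))

def identity_loss(x, f_of_x, y, g_of_y):
    """Eq. (4): L1 distance when each generator receives a real sample
    from its own output domain (G(y) vs. y, F(x) vs. x)."""
    return np.mean(np.abs(g_of_y - y)) + np.mean(np.abs(f_of_x - x))

def total_loss(gan_g, gan_f, cyc, ident, lam1=10.0, lam2=0.5):
    """Eq. (1), using the weights reported later in this study
    (lambda1 = 10, lambda2 = 0.5)."""
    return gan_g + gan_f + lam1 * cyc + lam2 * ident
```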

Lesion remover

Once the Cycle-GAN is trained, the two mapping functions G and F can transfer the style from one domain to another domain. As we used patches with normal tissue and with a recalled lesion as the images in two independent domains, the generator with mapping function G will work as the lesion simulator by translating the normal patch to be similar to the lesion patch. Likewise, the generator with the mapping function F will work as the lesion remover by changing the style of the lesion patch to that of a normal patch. We refer to generator G as the lesion simulator and generator F as the lesion remover. Note that the focus of this paper is using the lesion remover as the lesion highlighter to improve the detection performance of CADe algorithms in mammograms. Discussing the potential use of the lesion simulator is beyond the scope of this paper.

We optimized the Cycle-GAN using the Adam optimizer [14] with a learning rate of 0.0002 and momentum parameters β1 = 0.5 and β2 = 0.999. In addition, we set the maximum number of epochs to 100, the weights of the L1-based cycle-consistency and identity losses, λ1 and λ2, to 10 and 0.5, and the minibatch size to 4. We used a random left–right (horizontal) flip as data augmentation. We used an NVIDIA Titan X GPU with 12 GB of memory for training the networks. Figure 2 shows the outputs of the lesion remover over the course of the training.

Fig. 2

The lesion remover outcomes over the course of the training. Images in the first column show the patch with a recalled lesion at epochs 5, 50, and 100, and images in the second column are the corresponding outputs. As the number of training epochs increases, the lesion remover starts working as expected; it removes the existing lesion or makes it subtle

Lesion remover as lesion highlighter

Once trained, the lesion remover can remove an existing lesion in a given mammogram. We hypothesized that one can combine the lesion-removed image with its original to highlight the existing lesion, such that a CADe algorithm can detect the lesion better from the combined image than from the original.

We used a color fusion scheme (imfuse in MATLAB) to combine the lesion-removed image with its original. The color fusion scheme colorizes a pixel (green or magenta) if the pixel values from the two images differ, while retaining the gray value where the pixel values are the same. As the lesion remover should remove only the lesion while keeping the other tissue intact, the resulting color-fused image should highlight the lesion, as shown in Fig. 3. Hence, the lesion remover can be used as a lesion highlighter if we combine the lesion-removed image with its original.
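
The green–magenta fusion can be reproduced outside MATLAB with a few lines of numpy (a sketch; which image drives the green channel versus the magenta channels is our assumption about imfuse's default 'green-magenta' mapping):

```python
import numpy as np

def fuse_green_magenta(original, lesion_removed):
    """Stack the two grayscale images into an RGB image: one image fills
    the green channel, the other fills red and blue (magenta). Where the
    images agree, the pixel renders gray; where they differ, it is tinted
    green or magenta, highlighting the removed lesion."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(lesion_removed, dtype=float)
    return np.stack([b, a, b], axis=-1)
```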

Fig. 3

Explanation of the Lesion Highlighter. This figure illustrates how we used the lesion remover as a lesion highlighter to increase the contrast of a given lesion to its background. The left side of this figure shows when the lesion highlighter was applied to a case that contains a lesion, while the right side of the figure shows a normal control image. The yellow arrow indicates the location of a recalled lesion. After applying the lesion remover on the given input image, we fused the image with its original to create a lesion highlighted image, as shown in the bottom left. Note that the lesion remover on the normal tissue kept the original characteristics intact such that there was no highlight shown in the resulting image on the bottom right

Note that one may think the lesion remover could be counterproductive on images with normal tissue, as it was trained to remove lesion-like appearances in a mammogram and might therefore create false-positive detections by falsely enhancing normal tissue. However, the Cycle-GAN has an identity loss to ensure that F(x) ≈ x and G(y) ≈ y, as shown in Eq. (4), such that the generator F is unlikely to remove any lesion-like normal breast tissue.

We applied the above lesion highlighter scheme on both image patches with normal and recalled lesions. Figure 3 illustrates how we applied the lesion highlighter for improving computer-aided detection of lesions.

Lesion detector

We used various state-of-the-art deep learning architectures for image classification as our lesion detector to classify the given image patch as a recalled lesion or normal. We employed ResNet18 [15], DenseNet201 [16], EfficientNetV2 [17], and Vision Transformer (ViT) [18]. All the networks we used were pretrained on ImageNet [19].

We updated the last few layers of each ImageNet-pretrained network to match our purpose: to classify a patch as containing a recalled lesion or not. We then used the images from the training set to train each network. We refer to these networks trained on the original mammogram patches as baseline. Likewise, we trained each network using the training set after the lesion highlighter was applied. We refer to these networks as highlighted. We validated the networks after each training epoch using the validation set. As the input size of all networks was 224 by 224 pixels, we randomly segmented 224 by 224 patches from the original 400 by 400 pixel patch images. In addition, we employed random vertical and horizontal flips, random rotations of ±30°, and random scaling of ±25%.
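
The 224 by 224 random-crop step can be sketched as follows (numpy; the function name is ours, and the flips, rotations, and scaling listed above are omitted for brevity):

```python
import numpy as np

def random_crop(patch, out_size=224, rng=np.random.default_rng()):
    """Randomly crop an out_size x out_size network input from a larger
    patch (400 x 400 in this study)."""
    h, w = patch.shape[:2]
    top = int(rng.integers(0, h - out_size + 1))
    left = int(rng.integers(0, w - out_size + 1))
    return patch[top:top + out_size, left:left + out_size]
```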

For training ResNet18 and DenseNet201, we used the MATLAB training environment. Specifically, we used the Adam optimizer [14] with an initial learning rate of 0.001, a learning rate drop factor of 0.1 every 10 epochs, and momentum parameters β1 = 0.5 and β2 = 0.999. In addition, we set the maximum number of epochs to 50 and the minibatch size to 128. We also employed early stopping when the validation accuracy dropped more than five times. For training EfficientNetV2 and ViT, we used the PyTorch training environment [20] with an augmentation setup similar to that in MATLAB, except that we set the number of epochs for ViT to 100. We used an NVIDIA Titan X GPU with 12 GB of memory for training all networks.

Evaluation methods

We refer to a network trained solely on the original mammograms as baseline (or base), and one trained on the lesion-highlighted mammograms as highlighted (or hi-lited). It is possible that mammograms before and after applying the lesion highlighter provide different but complementary information for lesion detection. Therefore, we developed a logistic regression classifier to combine the diagnostic information from the baseline and highlighted versions. Specifically, we trained the logistic regression classifier using the scores of both versions on the validation set. We refer to the resulting logistic regression classifier for each network as combined (or comb). Figure 4 illustrates how we constructed the baseline, highlighted, and combined lesion detectors for this study.
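
The score-combination step can be illustrated with a minimal logistic regression fitted by gradient descent (a self-contained stand-in for a library implementation; the function names and hyperparameters are ours):

```python
import numpy as np

def fit_logistic(scores_base, scores_hi, labels, lr=0.5, epochs=2000):
    """Fit p(lesion) = sigmoid(w0 + w1*base + w2*highlighted) on the
    validation-set scores of the baseline and highlighted networks."""
    X = np.column_stack([np.ones_like(scores_base), scores_base, scores_hi])
    y = np.asarray(labels, dtype=float)
    w = np.zeros(3)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient of the log loss
    return w

def combined_score(w, base, hi):
    """Combined model output for one patch given its two network scores."""
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1] * base + w[2] * hi)))
```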

Fig. 4

Explanation of the lesion detectors. This figure illustrates how we trained lesion detectors using the original and lesion-highlighted patches. We used four different deep network architectures, including ResNet18, DenseNet201, EfficientNetV2, and Vision Transformer (ViT), as our lesion detectors. For each architecture, we built a baseline model using the original patches, a highlighted model using the highlighted patches, and a combined model by combining the scores from the baseline and highlighted models using logistic regression

We used the Area Under the Receiver Operating Characteristic curve (AUC) for classifying a given patch as containing a lesion or not as our figure of merit. Our hypothesis is that the lesion highlighter would increase the AUC of a classifier in identifying patches containing a lesion. Hence, for each network architecture, we compared the performance of the highlighted and combined models against the baseline model using DeLong's method [21].
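
The figure of merit itself is straightforward to compute: the empirical AUC equals the normalized Mann–Whitney U statistic, which is also the quantity underlying DeLong's variance estimate (a sketch; in practice one would use a statistics package):

```python
import numpy as np

def auc(scores, labels):
    """Empirical AUC: the probability that a randomly chosen lesion patch
    outscores a randomly chosen normal patch, with ties counted 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties
```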
