Impact of Tissue Thickness on Computational Quantification of Features in Whole Slide Images for Diagnostic Pathology

This work studied how TST impacts the presentation of tissue, both visually and via quantitative characterization, at both slide and cell nuclear levels. Extensive laboratory efforts were made to eliminate sources of potential batch effects by processing all samples in a single batch for cutting, staining, coverslipping, and scanning. Slides were all prepared by the same technician in the same batch using the same equipment. Two slides from each patient were also taken to capture intra-patient variability and provide a more comprehensive context. To account for the possibility that adjacent slices from the same patient might display similar nuclear features due to shared structures, we ensured that slides taken at the same thickness were sampled from sections positioned as far apart as possible (see ordering in Fig. 2). In the end, in our well-controlled experiments, our results show significant covariation in tissue presentation with TST, suggesting the potential of TST as a confounding variable in downstream analysis.

TST plays a critical role in the current visual assessment of histology slides for diagnostic purposes. Optimal TST ensures cellular and structural details are clearly visible, allowing for accurate interpretation of tissue architecture and pathology. If sections are too thick, overlapping cells and structures can obscure critical diagnostic features, leading to potential misinterpretation or missed diagnoses. Conversely, overly thin sections may lack sufficient detail, making it difficult to identify key histological characteristics, such as nuclear morphology, mitotic figures, or subtle changes in tissue organization. In our assessment, thinner sections (≤ 3 µm) appeared to produce the highest-quality slides: they more clearly delineate cell and structure boundaries, while thicker sections begin to exhibit blurriness and overlapping structures.

Our results also strongly suggest that TST affects the visualization and quantification of features, from the overall color space to smaller nuclear details. As TST increases, the texture feature contrast rises due to differences between adjacent pixels, with the brightest regions often being the white background visible through tissue tears. Our study shows that increasing TST leads to a decrease in brightness, resulting in darker tissue overall. We hypothesize that this darker tissue, when set against the white background, enhances measured contrast. The finding that increased thickness predictably raises contrast is not immediately intuitive, as one might expect the darker tissue to diminish contrast in certain regions.
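The hypothesis that darker tissue set against a white background raises measured contrast can be illustrated with a minimal sketch. It assumes a toy 4 × 6 patch, 8 gray levels, and a horizontal one-pixel offset; the helper names `glcm_contrast` and `toy_patch` and the intensity values 200 and 120 are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

def glcm_contrast(image, levels=8):
    """Haralick contrast from a symmetric co-occurrence matrix (offset: one pixel right)."""
    # Quantize 0-255 intensities into `levels` gray levels.
    q = [[min(v * levels // 256, levels - 1) for v in row] for row in image]
    pairs = Counter()
    for row in q:
        for a, b in zip(row, row[1:]):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1  # count both directions so the matrix is symmetric
    total = sum(pairs.values())
    # Contrast = sum over gray-level pairs of p(i, j) * (i - j)^2
    return sum(n / total * (i - j) ** 2 for (i, j), n in pairs.items())

def toy_patch(tissue_intensity, background=255):
    """Hypothetical patch: tissue with a two-pixel-wide tear exposing white background."""
    row = [tissue_intensity] * 2 + [background] * 2 + [tissue_intensity] * 2
    return [row[:] for _ in range(4)]

contrast_bright_tissue = glcm_contrast(toy_patch(200))  # stands in for a thinner section
contrast_dark_tissue = glcm_contrast(toy_patch(120))    # stands in for a thicker section
print(contrast_bright_tissue, contrast_dark_tissue)
```

In this toy case the only gray-level transitions occur at the edge of the tear, so darkening the tissue widens the gap to the background and the contrast value rises, consistent with the explanation proposed above.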

In comparing the binned color distributions of the reference images to WSI using MSE, only one reference image, specifically the darkest one, exhibited a statistically significant trend. This result suggests that brightness, rather than broader variations in color across different channels, primarily influences the similarity between the color spaces of the images. Additionally, while all color channels exhibited a similar absolute drop in intensity, the green and blue channels were disproportionately affected in relative terms. Red channel brightness also varied only minimally until a thickness of 10 µm (see Fig. 7 and Supplemental 5).
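As a rough illustration of comparing binned color distributions with MSE, the sketch below bins single-channel pixel intensities into normalized histograms and computes the mean squared error between them. The bin count, the function names, and all pixel values are assumptions made for illustration; this is not HistoQC's implementation.

```python
def binned_histogram(values, bins=8, max_value=255):
    """Normalized histogram of integer intensities in [0, max_value]."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // (max_value + 1), bins - 1)] += 1
    n = len(values)
    return [c / n for c in counts]

def histogram_mse(h1, h2):
    """Mean squared error between two histograms with the same number of bins."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) / len(h1)

# Hypothetical single-channel pixel samples.
dark_reference = binned_histogram([40, 60, 80, 100])
thin_section = binned_histogram([180, 200, 220, 240])   # brighter image
thick_section = binned_histogram([60, 80, 100, 120])    # darker image

mse_thin = histogram_mse(dark_reference, thin_section)
mse_thick = histogram_mse(dark_reference, thick_section)
print(mse_thin, mse_thick)
```

With these made-up values the darker sample lies closer to the dark reference than the brighter one, which is the kind of brightness-driven trend the paragraph above describes.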

Fig. 7

Color histograms from the same patient, produced by HistoQC, as TST increases from top-left (0.5 µm) to bottom-right (10 µm). While there is a noticeable leftward shift in the distributions toward lower intensities across all channels as TST increases, the blue and green channels appear to be most affected.

Intensity and brightness features have already been shown to vary with TST in studies in IHC [22]. Our study confirms these results in H&E on a WSI and nuclear level. This suggests that as TST increases, the amount of stain absorbed can change, affecting the intensity of the signal captured in the image. This, in concert with the Beer-Lambert law [23], in which the absorbance of light through a material is proportional to both the concentration of the absorbing substance and the path length (thickness), supports the notion that thicker tissues absorb more stain and more light.
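The Beer-Lambert relationship can be sketched numerically. The example below assumes a uniform stain concentration and folds the molar absorptivity and concentration into a single hypothetical constant (`ASSUMED_ABSORPTIVITY`, in absorbance units per micrometre); that value is illustrative only and was not measured in this study.

```python
def transmitted_fraction(absorptivity_per_um, thickness_um):
    """Fraction of incident light transmitted, I / I0 = 10 ** (-A)."""
    # Beer-Lambert law: A = epsilon * c * l. Here epsilon * c is folded into
    # one hypothetical constant, so A grows linearly with path length l.
    absorbance = absorptivity_per_um * thickness_um
    return 10 ** (-absorbance)

ASSUMED_ABSORPTIVITY = 0.05  # hypothetical value, not taken from the study
thicknesses_um = [0.5, 3.0, 10.0]
fractions = [transmitted_fraction(ASSUMED_ABSORPTIVITY, t) for t in thicknesses_um]

for t, f in zip(thicknesses_um, fractions):
    print(f"{t:>4} um -> {f:.3f} of incident light transmitted")
```

Under these assumptions the transmitted fraction falls exponentially as path length grows, so thicker sections appear darker even before any change in stain uptake is considered.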

When computed on segmented nuclei, the Haralick feature of difference entropy showed significant variation with TST. A potential explanation is that as more of the nucleus is included in thicker sections, the overlapping chromatin fibers within the nucleus interact with more stain, creating a more homogeneous texture. Increased homogeneity leads to lower difference entropy values. This effect appears to result in some level of covariation of all Haralick texture features with TST. Haralick features are commonly used in the development of image-based biomarkers [24]; thus, TST may hamper their development, reproducibility, and reliability. Our results suggest that additional studies in this vein are warranted.
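To make the link between homogeneity and difference entropy concrete, here is a minimal sketch that computes difference entropy over horizontally adjacent pixel pairs of an already quantized patch. The offset, the base-2 logarithm, and both toy patches are assumptions; feature libraries differ in these conventions, and this is not the feature extraction code used in the study.

```python
import math
from collections import Counter

def difference_entropy(image):
    """Entropy of the distribution of |i - j| over horizontally adjacent gray levels."""
    diffs = Counter()
    for row in image:
        for a, b in zip(row, row[1:]):
            diffs[abs(a - b)] += 1
    total = sum(diffs.values())
    # -sum_k p(k) * log2 p(k), where p(k) is the probability of gray-level difference k
    return -sum(n / total * math.log2(n / total) for n in diffs.values())

# Hypothetical quantized nuclear patches.
homogeneous_patch = [[4, 4, 4, 4], [4, 4, 4, 4]]
heterogeneous_patch = [[0, 1, 3, 6], [6, 3, 1, 0]]

entropy_homogeneous = difference_entropy(homogeneous_patch)
entropy_heterogeneous = difference_entropy(heterogeneous_patch)
print(entropy_homogeneous, entropy_heterogeneous)
```

The uniform patch has a single difference value and therefore zero entropy, while the varied patch spreads its differences across several values and scores higher, matching the direction of the effect described above.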

Another notable finding was the decreased clarity of nuclei boundaries in thicker samples (see Fig. 5), likely due to increased blurriness, which, despite good segmentation correspondence, may explain the collapse of the positive area trend at higher TSTs (see Fig. 6). Initially, we hypothesized that as thickness increases, the likelihood of capturing the widest part of the nucleus would increase, resulting in a larger average nuclear area. However, confirming this hypothesis is challenging due to the degraded clarity of the boundaries at higher thicknesses.
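The initial hypothesis can be explored with a simple geometric toy model, offered here only as a sketch. It assumes an idealized spherical nucleus of hypothetical radius 3 µm cut by a randomly positioned section, and takes the observed area to be the largest cross-section contained in that section. It ignores blur, staining, and segmentation effects, which is precisely where the real data become difficult to interpret.

```python
import math

def mean_projected_area(radius, thickness, samples=10_000):
    """Mean largest cross-sectional area of a sphere inside a randomly placed slab."""
    # The sphere is centered at z = 0. The slab spans [z0, z0 + thickness] and
    # intersects the sphere when z0 lies in (-radius - thickness, radius).
    lo, hi = -radius - thickness, radius
    total = 0.0
    for k in range(samples):
        z0 = lo + (k + 0.5) * (hi - lo) / samples  # midpoint sampling of slab positions
        z1 = z0 + thickness
        # Distance from the equator (widest plane) to the nearest plane inside the slab.
        d = 0.0 if z0 <= 0.0 <= z1 else min(abs(z0), abs(z1))
        total += math.pi * (radius ** 2 - d ** 2)
    return total / samples

ASSUMED_RADIUS_UM = 3.0  # hypothetical nuclear radius, for illustration only
areas = {t: mean_projected_area(ASSUMED_RADIUS_UM, t) for t in (0.5, 3.0, 10.0)}
for t, area in areas.items():
    print(f"{t:>4} um section -> mean area {area:.2f} um^2")
```

In this idealized model the mean area grows with section thickness because thicker slabs are more likely to contain the equatorial plane, which is the trend the hypothesis predicts before boundary degradation is taken into account.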

Z-plane focusing algorithms can account for some of these errors [25]. As TST increases, the potential for intra-slide variation in thickness may become more pronounced, causing the algorithm to focus sharply on structures within one plane while potentially blurring other structures located at different depths. This disproportionally affects thicker sections, where the increased depth and complexity make it difficult to consistently identify focal points which maintain clarity across the sample, leading to blur.

Although 10 µm samples are not common in clinical practice, they are increasingly required by spatial transcriptomics platforms to provide sufficient genetic material for the sequencing to succeed [26]. Researchers have suggested applying models trained on standard thicknesses to 10 µm spatial transcriptomics slides to correlate image-based features with genetic data [27]; however, our study indicates that this approach may be problematic, with models trained at thinner TSTs not being able to function optimally at higher TSTs.

Such disparities in tissue preparation highlight a key challenge in the reproducibility of machine learning models across diverse datasets, as site-specific variations have already been shown to introduce bias into deep learning algorithms [13]. To mitigate these effects, approaches like stain normalization [28] have been widely adopted. Our findings suggest that, in addition to these measures, sample TST should also be carefully considered as a factor influencing algorithmic performance. Appropriate laboratory quality control methods such as automatic sectioning machines and routine microtome calibration can help to ameliorate variable TST. However, inherent differences in, e.g., the fragments’ consistency, will remain key factors influencing TST irregularity. As such, while efforts to address TST variability are likely to yield improved consistency, it is unlikely that it can ever be entirely eliminated.

While directional trends can be observed—such as thinner tissue sections generally yielding higher-quality WSIs—defining specific optimal TST ranges for an algorithm remains challenging. This complexity arises from the interplay of multiple factors, including tissue characteristics, staining methodologies, and the specific algorithms employed for analysis. Algorithms trained on tissue from a particular TST range are likely to perform optimally within that range but may struggle with sections outside this learned distribution. As a result, acceptable TST ranges are often algorithm-specific rather than universally applicable. Additionally, different algorithms exhibit varying levels of robustness and failure characteristics. For example, low-magnification tasks, such as gross cancer detection, may remain relatively unaffected by TST variability due to the strength of the signal. In contrast, higher-magnification tasks that rely on more nuanced morphological features (e.g., mitosis detection) may be more sensitive to variations in TST, leading to increased susceptibility to failure. Unfortunately, many workflows currently do not mandate the documentation of TST, a gap that may hinder the reproducibility of studies and the reliability of diagnoses.

Addressing the lack of standardization requires a dual approach involving both clinical laboratories and developers of computational algorithms. From the clinical laboratory perspective, there is a need to harmonize and standardize tissue preparation practices, including the reporting of TST. On the computational side, algorithms should specify the TST they are optimized for and might also explore innovative solutions for automatically detecting and adjusting to it.

While our results are striking, this study has some limitations. These include the lack of a definitive ground truth for nuclei boundaries. There is notable blur in the thicker sections, which affects segmentation performance. Also, though HoverFast has demonstrated acceptable performance in the past, false positives are unavoidable when operating on such a large scale. To compensate, thorough data cleaning and visual verification were employed. This study also exclusively looks at thyroid tissue; however, it could be reasoned that similar trends will be present in a variety of tissue types. Also, this study exclusively analyzed benign tissue due to its consistent presentation, enabling a more controlled investigation of TST as the primary variable. In contrast, pathologic tissue can exhibit substantial heterogeneity due to the presence of disease, including variations in cellular architecture, stromal composition, and inflammatory response. These factors introduce additional complexity and could confound inter-patient TST comparisons. Given that we observed significant differences with TST even within the benign state, this highlights the need for follow-up studies that specifically examine TST in the context of distinct disease states.

Future research should aim to delineate how TST influences the diagnosis of malignancy and other pathological conditions while also focusing on developing standardized protocols for tissue sample preparation, establishing use case-specific thickness guidelines, and advancing computational algorithms that are more resilient to inherent tissue variability. This includes systematically testing algorithmic failure modes across different tasks and quantifying the impact of TST on existing biomarkers. A deeper understanding of these interactions will be crucial for ensuring the reliability and generalizability of machine and deep learning–driven pathology tools across diverse clinical settings. Lack of reproducibility in DP studies undermines the credibility of the field and hampers clinical implementation. By increasing the robustness of algorithms to pre-analytic variables, such as TST, the field of digital pathology can achieve more reliable and reproducible results.

ENDOCRINE PATHOLOGY
