Image-Based Artificial Intelligence in Psoriasis Assessment: The Beginning of a New Diagnostic Era?

The main applications of automated image analysis in psoriasis include detecting and outlining lesion borders, differentiating psoriatic lesions from other skin conditions, objectively calculating area involvement and severity scores, and selecting treatments and predicting the response to them.

3.1 Image Segmentation of Lesions

In addition to correctly identifying psoriasis on skin photographs, a critical step in performing next-level tasks such as assessing disease severity is the automated detection and delineation of individual lesions. Manual image segmentation is a tedious task for dermatologists, so researchers have focused on developing automated image segmentation algorithms. A major advantage for this task is that psoriatic lesions are usually easy to distinguish from the surrounding unaffected skin. However, challenges arise from poor image quality, including insufficient illumination, blur, or artifacts such as camera reflections, as well as from the polymorphic appearance of lesions [26]. Previous algorithms often relied on feature engineering (e.g., a feature-based Bayesian framework), lacked accuracy, or failed to segment challenging input images correctly (e.g., a Markov random field combined with a support vector machine). These limitations have been partially overcome by the use of CNNs [1, 26]. Dash et al. developed PsLSNet, a 29-layer deep U-net-based CNN that automatically extracts spatial information; it was validated on 5241 images from 1026 psoriasis patients, including more challenging images [26]. (U-net is designed for image segmentation and features a U-shaped architecture that effectively captures context in images and enables precise localization.) Results showed an accuracy of 94.8%, outperforming all previous approaches [26]. In addition, two deep learning models (DLMs) based on a U-net architecture with a ResNet backbone (which enables training of very deep models with hundreds or thousands of layers) were developed and trained by Amruthalingam et al. to anatomically map and segment hand eczema lesions with high accuracy [27]. According to the authors, this model could also be applied to psoriasis, as both conditions can present very similarly with red, scaly patches and plaques on the dorsal and palmar aspects of the hands [27].

At the histopathological level, CNNs are expected to provide future clinical support by automatically analyzing skin biopsy images. As a first step, a U-net-based CNN was applied by Pal et al. to successfully segment psoriasis skin biopsy images into epidermis, dermis, and non-tissue, which is a prerequisite for the development of more sophisticated models that can recognize characteristic pathological features of the disease within each skin layer [28]. Such forms of image segmentation are not only valuable at the microscopic level, but can also be applied to macroscopic images to evaluate the presence of lesions, as well as disease extent and severity, as outlined in the following sections.

3.2 Diagnosis and Subtype Classification

For proper treatment, psoriasis must first be correctly diagnosed. In clinical routine, diagnosis is usually based on an inspection of the entire skin surface, including scalp and nails, while taking into account the patient’s medical and family history. Significant advances have been made by several research groups in developing image-based AI algorithms trained on large datasets of annotated psoriasis images to extract quantitative image features and automatically detect and classify lesions [29,30,31,32,33].

Aggarwal [29] was able to improve the performance of a CNN model discriminating five dermatological diseases (acne, atopic dermatitis, impetigo, psoriasis, and rosacea) by augmenting the input data with image transformations such as zooming, shearing, rotating, and horizontal and vertical flipping. Zhao et al. developed a two-stage CNN using 8021 images to discriminate nine different diagnoses based on clinical photographs, which made 9% fewer errors in diagnosing psoriasis compared with 25 dermatologists using a test set of 100 images (accuracy of CNN: 0.96, mean human accuracy: 0.87) [30]. Using Xiangya-Derm, the largest dermatology data set of the Chinese population with over 150,000 clinical images of 571 different skin diseases, Huang et al. developed a CNN to differentiate six common skin diseases, outperforming the accuracy of 31 dermatologists by 6.6% [31]. Several other CNNs have been developed to discriminate psoriasis from other dermatological diagnoses, with overall accuracy mostly comparable to or better than dermatologists [32, 33]. However, there is a lack of research on real-world applicability and open-source training data for currently published algorithms.
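The flipping and rotation transformations used for data augmentation can be illustrated with a minimal sketch. It assumes a toy image stored as a nested list of pixel values and is not the implementation used by Aggarwal [29]; zooming and shearing require interpolation and are omitted here. All function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values (assumed grayscale for brevity)


def flip_horizontal(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus three transformed copies."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90_clockwise(img)]


# Hypothetical 2x2 "image" used only to make the transformations visible.
toy_image = [[1, 2],
             [3, 4]]
for variant in augment(toy_image):
    print(variant)
```

Each transformed copy keeps the diagnostic label of the original image, which is how augmentation enlarges a training set without new photographs.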

Furthermore, image-based AI applications need not be limited to the analysis of macroscopic images. Dermoscopic images offer high-resolution visualization of the skin, revealing subtle details such as vascular or pigment patterns through magnification of epidermal and upper dermal layers, potentially enhancing diagnostic accuracy depending on the clinical task. However, acquiring and interpreting these images requires time, specialized equipment, and expertise. For CNN classification purposes, dermoscopic image data sets tend to be more standardized, improving model generalizability.

In contrast, macroscopic images are more accessible, faster to acquire, and provide a broader clinical overview of lesions, making them preferable for initial screenings. Based on macroscopic assessment, clinicians can determine whether additional dermoscopic examination is necessary. A combined approach, utilizing both macroscopic and dermoscopic images, can be advantageous in providing both context and detail.

For instance, differentiating between psoriasis and seborrheic dermatitis on the scalp can be challenging using macroscopic assessment alone. Dermoscopy can offer additional diagnostic clues, such as the presence of annular and hairpin blood vessels indicative of psoriasis, or unstructured white areas and atypical vessels suggestive of seborrheic dermatitis, aiding in more accurate diagnosis [34]. Yu et al. trained GoogLeNet, a 22-layer deep CNN pre-trained on the ImageNet dataset, to differentiate scalp psoriasis from seborrheic dermatitis using dermoscopic images [34]. The algorithm outperformed five dermatologists with varying levels of experience, with a sensitivity 26.7 percentage points higher and a specificity 6.8 percentage points higher (sensitivity: CNN 96.1%, dermatologists [mean] 69.4%; specificity: CNN 88.2%, dermatologists [mean] 81.4%) [34]. Furthermore, non-qualified physicians were able to achieve diagnostic performance similar to that of dermoscopy-proficient dermatologists through assistance from the model (mean sensitivity 79.1%, mean specificity 81.9%) [34].
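As a brief illustration of the metrics reported for this comparison, the following sketch computes sensitivity and specificity from confusion counts and expresses the gap between model and dermatologists in percentage points. The confusion counts are hypothetical and serve only to demonstrate the formulas; the percentages passed to `gap_in_points` (a hypothetical helper) are the means reported in the text [34].

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of diseased cases that were correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-diseased cases that were correctly ruled out."""
    return true_neg / (true_neg + false_pos)


def gap_in_points(model_pct: float, human_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(model_pct - human_pct, 1)


# Hypothetical counts, NOT taken from the study.
print(sensitivity(true_pos=45, false_neg=5))   # 0.9
print(specificity(true_neg=40, false_pos=10))  # 0.8

# Mean values reported in the text for the CNN and the five dermatologists.
print(gap_in_points(96.1, 69.4))  # 26.7 (sensitivity)
print(gap_in_points(88.2, 81.4))  # 6.8 (specificity)
```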

This suggests that physicians without specialized training (e.g., in remote areas) or teledermatological applications could directly benefit from additional AI expertise to optimize patient management with dermatologists referred to when needed. The Telemedicine Working Group of the International Psoriasis Council recently determined that managing psoriasis through teledermatology is feasible in most cases, with exceptions for special affected areas such as the genitals or scalp [35]. A previous study has demonstrated that both online and in-office dermatologic follow-ups for psoriasis result in comparable improvements in psoriasis severity and Dermatology Life Quality Index scores [36]. While diagnostic AI holds significant potential to enhance these services, further studies are necessary to assess its implementation and effectiveness.

In terms of subtype classification, a CNN was used by Aijaz et al. to differentiate plaque, guttate, inverse, erythrodermic, and pustular psoriasis with high accuracy (84.2%) [37]. The training sets used included 80% of 172 images of normal skin and 301 images of psoriasis from the Dermnet dataset, while the remaining 20% were used for validation and testing [37]. Plaque and guttate psoriasis images were overrepresented in the dataset (plaque: n = 99, guttate: n = 96), followed by pustular (n = 48), erythrodermic (n = 33), and inverse psoriasis (n = 25) [37]. Regarding the classification performance for individual subtypes, the highest accuracy was achieved for inverse psoriasis (100%), followed by a sensitivity of 96.5% for normal skin (28/29), 87.2% for guttate (34/39), 85.2% for erythrodermic (23/27), 73.3% for pustular (22/30), and 70% for plaque psoriasis (28/40) [37].
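The class imbalance described above can be quantified with a short sketch. The subtype counts are those reported in the text [37]; the inverse-frequency class weights are a common mitigation heuristic added here as an assumption and were not reported as part of the study's method.

```python
# Image counts per subtype as reported in the text [37].
subtype_counts = {
    "plaque": 99,
    "guttate": 96,
    "pustular": 48,
    "erythrodermic": 33,
    "inverse": 25,
}


def class_shares(counts: dict) -> dict:
    """Percentage of the dataset contributed by each class."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def inverse_frequency_weights(counts: dict) -> dict:
    """Weight each class by total / (number of classes * class count).

    Assumption: a widely used heuristic for imbalanced data, not the
    approach described by the study's authors.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


print(sum(subtype_counts.values()))  # 301, matching the reported psoriasis image total
print(class_shares(subtype_counts))
print(inverse_frequency_weights(subtype_counts))
```

Under this weighting, errors on the rare inverse subtype would count roughly four times as much during training as errors on plaque psoriasis.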

A major limitation of these reported results is the lack of external test sets with diverse patient populations in different clinical settings, which would provide more insight into the generalizability of algorithms and their potential for real-world clinical use. In addition to psoriasis subtypes, other differential diagnoses presenting with red, scaly plaques such as atopic dermatitis, tinea corporis, mycosis fungoides, pityriasis rosea, or cutaneous lupus erythematosus must be distinguished from psoriasis by AI. To make an accurate diagnosis, CNNs must be trained using large datasets containing these differential diagnoses to recognize subtle differences in appearance and distribution patterns. As dermatology training sets become larger and include more images of psoriasis subtypes, differential diagnoses, and diverse patient populations, future algorithms are expected to become more comprehensive. In addition to diagnostic applications, AI has great potential to facilitate the assessment of the extent and severity of psoriasis, as detailed in the following section.

3.3 Assessment of Disease Extent and Severity

Automated assessment of psoriasis disease extent and severity has the potential to significantly reduce physician workload while ensuring a high degree of standardization and reproducibility.

3.3.1 Clinical Scores

Dermatologists currently mainly use the PASI, BSA, or PGA systems to grade clinical severity of plaque psoriasis [2, 14].

PASI is most commonly used in research studies and assesses the intensity of erythema, induration, and desquamation on different anatomical areas using a scale from 0 to 72 (maximum disease activity) [38]. It is often used as a standard measurement tool in the validation of new scores and usually correlates well with physician-based assessments, as measured by Spearman or Pearson correlation coefficients [13]. For example, Bozek and Reich evaluated the reliability of PASI, BSA, and PGA in the examination of nine patients by ten dermatologists, with each subject being assessed twice by the physicians [14]. Significant Pearson correlations were observed between all three scales, and no assessment instrument was significantly superior [14]. Major criticisms of the PASI score include its complexity, extensive time requirements, high variability, low responsiveness in mild disease, and non-linear scale [13,14,15]. Since PASI uses a discontinuous score from 0 to 6 to assess area involvement (0: 0%, 1: 1–9%, 2: 10–29%, 3: 30–49%, 4: 50–69%, 5: 70–89%, 6: 90–100%), changes within a score interval are not adequately reflected [39]. To address these inaccuracies, the linearly increasing PrecisePASI score was developed to accurately reflect the severity of lower BSA ranges by using the actual percentage of area involvement as opposed to imprecise area class intervals [39].
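The PASI calculation and the loss of information caused by the area classes can be made concrete with a small worked example. The sketch assumes the conventional regional weights (head 0.1, arms 0.2, trunk 0.3, legs 0.4) and severity subscores of 0 to 4 for erythema, induration, and desquamation; the patient values are hypothetical, and any involvement below 10% is treated as class 1.

```python
# Conventional PASI regional weights (assumed; not stated in the text above).
REGION_WEIGHTS = {"head": 0.1, "arms": 0.2, "trunk": 0.3, "legs": 0.4}


def area_score(percent: float) -> int:
    """Map the percentage of involved area to the 0-6 PASI area class."""
    if percent <= 0:
        return 0
    if percent < 10:
        return 1
    if percent < 30:
        return 2
    if percent < 50:
        return 3
    if percent < 70:
        return 4
    if percent < 90:
        return 5
    return 6


def pasi(regions: dict) -> float:
    """regions maps each region name to
    (erythema, induration, desquamation, percent of area involved)."""
    total = 0.0
    for name, weight in REGION_WEIGHTS.items():
        erythema, induration, desquamation, percent = regions[name]
        total += weight * (erythema + induration + desquamation) * area_score(percent)
    return round(total, 1)


# Hypothetical patient, for illustration only.
patient = {
    "head": (1, 1, 1, 5),
    "arms": (2, 2, 1, 15),
    "trunk": (2, 1, 2, 35),
    "legs": (3, 2, 2, 45),
}
print(pasi(patient))

# Changes within an area class do not alter the score:
print(area_score(30), area_score(49))
```

With maximum subscores and complete involvement of every region, the function returns the upper bound of 72. The final line shows why scores that use the actual percentage of area involvement were proposed: 30% and 49% involvement fall into the same area class.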

BSA calculation is often included in the assessment of psoriasis severity and can be estimated using the ‘rule of nines’ or the number of patient hand areas affected (with one hand representing approximately 1%) [13]. While computation is easily feasible in clinical routine and results in a linear measure, BSA is prone to overestimation and inter-rater reliability is variable [13].

PGA provides an ordinal 5- to 7-point rating ranging from ‘clear’ to ‘very severe psoriasis’, with good reliability independent of observer experience [13]. Bozek and Reich showed that PGA has the highest inter-rater reliability in comparison with BSA and PASI (coefficients of variation [%]: PGA 29.3, PASI 36.9, BSA 57.1) [14]. It can be used statically to assess a single time point or dynamically for baseline comparison. Disadvantages include residual inter-rater variability and the lack of body surface area assessment [14]. Given these limitations, a more reproducible, standardized, and time-efficient estimation of disease severity is needed, which could be provided by image-based AI algorithms.
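The coefficient of variation used to compare the instruments is the standard deviation of the raters' scores expressed as a percentage of their mean; lower values indicate better agreement. A minimal sketch follows, assuming the sample standard deviation (the text does not specify which variant was used) and hypothetical ratings.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores: list) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(scores) / mean(scores)


# Hypothetical scores given by three raters to the same patient.
ratings = [10, 12, 14]
print(round(coefficient_of_variation(ratings), 1))  # 16.7
```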

3.3.2 Automated Severity Scoring of Plaque Psoriasis

A prerequisite for automated severity scoring is the implementation of an accurate image segmentation algorithm [1, 26,27,28]. With the advancement of ML methods, CNNs (e.g., U-net models) have already been developed that can estimate BSA at the level of a dermatologist [40]. However, the automated assessment of individual clinical PASI subcriteria from two-dimensional images is more technically challenging, especially with regard to three-dimensional features such as induration. Schaap et al. addressed this challenge by using a CNN structure that takes ordinal scales into account and by training a separate network for each anatomical region (trunk, arms, and legs) and each PASI subscore category (erythema, induration, desquamation, and area), resulting in 12 CNNs [41]. The models demonstrated performance similar to that of dermatologists in the scoring of erythema, desquamation, and induration, while outperforming physician assessment in image-based area scoring [41]. A single-shot PASI system (SS-PASI) was developed by Okamoto et al., which assesses a simplified psoriasis severity score from a single input image of the trunk, since photographs of this anatomical area are usually readily available, fairly standardized, and show a large skin surface [42]. The CNN performed consistently with the SS-PASI scores of human raters (13 dermatologists, 9 medical students) using a test set of 10 images that were excluded from the training images [42]. However, since the training set used by the authors contained only 670 psoriasis images, there is a risk of overfitting [19, 43].
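The link between segmentation and area estimation can be sketched as follows: once a model has produced a skin mask and a lesion mask, the affected area is the share of skin pixels labelled as lesion. This is a simplified illustration and not the method of any cited study; the masks are hypothetical, and the calculation ignores body curvature and differing camera distances.

```python
def affected_area_percent(skin_mask: list, lesion_mask: list) -> float:
    """Percentage of skin pixels that are also labelled as lesion.

    Both masks are nested lists of 0/1 values with identical dimensions.
    """
    skin_pixels = 0
    lesion_pixels = 0
    for skin_row, lesion_row in zip(skin_mask, lesion_mask):
        for is_skin, is_lesion in zip(skin_row, lesion_row):
            if is_skin:
                skin_pixels += 1
                if is_lesion:
                    lesion_pixels += 1
    if skin_pixels == 0:
        raise ValueError("skin mask contains no skin pixels")
    return 100 * lesion_pixels / skin_pixels


# Hypothetical 4x4 masks: 12 skin pixels, 3 of them lesional.
skin = [[0, 1, 1, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 1, 1, 0]]
lesion = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
print(affected_area_percent(skin, lesion))  # 25.0
```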

While these and further examples of research applications have previously been reviewed by Liu et al. [1], we would like to focus on currently available clinical tools.

3.3.3 Commercially Available Systems For Semi-Automated Severity Scoring

The use of total body photography (TBP) lends itself to automated psoriasis severity calculations in routine practice. Currently, there are two commercially available systems that use standardized photo documentation, automated segmentation, and subsequent semi-automated computer-assisted PASI calculation for patient assessment and follow-up.

3.3.3.1 Automated Total Body Mapping

FotoFinder ATBM® Systems GmbH (Bad Birnbach, Germany) uses Automated Total Body Mapping (ATBM) to provide a standardized, two-dimensional overview of the skin surface by allowing patients to assume various anatomical positions in front of a dynamic mount with a cross-polarized, xenon-flash, high-resolution camera [44]. Using FotoFinder’s PASIscan® analysis software, the underlying psoriasis type can be selected and automated lesion segmentation is performed to estimate PASI pre-score values, including affected body surface area of the head, arms, trunk, and legs, as well as erythema, plaque thickness, and scaling [44]. These values can then be manually adjusted by the physician for final PASI calculation, which may be particularly necessary for areas covered by hair, such as the scalp, or body parts covered by underwear. During follow-up, images can be viewed side by side for direct comparison and improvement is automatically quantified by PASI 50, 75, 90, or 100 (indicating 50%, 75%, 90%, or 100% improvement from baseline) [44]. The accuracy and reproducibility of this algorithm were evaluated in a comparative observational study involving three trained physicians and 120 plaque psoriasis patients, which showed a high level of human–AI agreement and demonstrated superior repeatability of AI assessment compared with physicians [45]. Based on the promising precision and reproducibility, it may be recommended for use in clinics with financial access to such technologies or for research trials after further studies have been conducted. Limitations include the inability of some patients (especially the elderly) to reach predefined positions for image acquisition, and the time resources and/or additional personnel required to capture the respective image series [45]. In addition, fully automated score calculation remains limited by the lack of automated psoriasis subtype identification and by body sites such as the genital area or hairy scalp, which still require additional, thorough clinical examination by a dermatologist.
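The PASI 50, 75, 90, and 100 response levels used during follow-up are thresholds on the relative improvement from the baseline PASI. The following sketch shows the arithmetic under that definition; the function names and example values are hypothetical, and it does not reproduce the vendor's software.

```python
def pasi_improvement(baseline: float, follow_up: float) -> float:
    """Relative improvement from baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline PASI must be greater than zero")
    return 100 * (baseline - follow_up) / baseline


def response_levels(baseline: float, follow_up: float) -> list:
    """Return every PASI response threshold that has been reached."""
    improvement = pasi_improvement(baseline, follow_up)
    return [f"PASI {threshold}" for threshold in (50, 75, 90, 100)
            if improvement >= threshold]


# Hypothetical patient: PASI 20.0 at baseline, 4.0 at follow-up.
print(pasi_improvement(20.0, 4.0))  # 80.0
print(response_levels(20.0, 4.0))   # ['PASI 50', 'PASI 75']
```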

3.3.3.2 3D Total Body Photography

In recent years, 3D TBP has become commercially available with the VECTRA® WB360 (Canfield Scientific, Parsippany, New Jersey, USA), which overcomes some of these limitations. This system uses images captured instantaneously by 92 cameras in a single anatomical position to create a digital avatar of the patient’s skin surface from two-dimensional images in macro-quality resolution, excluding plantar surfaces, mucous membranes, and areas covered by hair (Fig. 3). A psoriasis assessment tool has recently been developed for the software that allows automated segmentation of the 3D avatar and calculates the lesion coverage of each anatomical region (head and neck, arms, trunk, legs, and whole body) [46]. Physicians can then manually score the erythema, induration, and desquamation of each region to calculate an automated whole-body PASI score. Potential benefits include a simplified, more time-efficient image acquisition process. This novel algorithm has, however, not yet been validated in clinical trials. For melanoma screening, it has already been shown that patients prefer the 3D TBP system to the 2D TBP system, mainly because the imaging process is easier and more time-efficient [47]. Further real-world comparative studies are needed to determine patient and physician preferences for psoriasis applications and to demonstrate the true benefit of the Canfield algorithm in clinical use. Limitations of this system include its high acquisition cost and the significant space needed for setup, which restrict its clinical availability mainly to larger centers. Additionally, time and personnel resources are required to manually score erythema, induration, and desquamation for each region to calculate the whole-body PASI score. As with Automated Total Body Mapping, automatic psoriasis subtype identification is not yet possible. Furthermore, special areas such as the scalp or plantar surfaces are not imaged and must be examined separately, limiting the potential use in remote settings (Fig. 3).

Fig. 3

VECTRA® WB360 avatar of a psoriasis patient captured by 3D total body photography. Clinical image courtesy of the University Hospital Basel, used with patient permission

3.3.4 Automated Severity Scoring of Other Psoriasis Subtypes

While the above-mentioned algorithms focus mainly on severity analysis of plaque psoriasis, research has recently shifted towards other subtypes. Several well-established clinical scores have been developed to assess disease severity in psoriasis subtypes such as generalized pustular psoriasis (e.g., Generalized Pustular Psoriasis Area and Severity Index [GPPASI]), or for involvement of specific locations such as the nails (Nail Psoriasis Severity Index [NAPSI]) [48, 49]. Similar to plaque psoriasis assessments, calculation in a clinical setting can be tedious and time consuming, a task that could potentially be facilitated and standardized by the use of AI.

Folle et al. used a transformer DLM, which uses self-attention mechanisms to weigh the importance of different parts of the input image, to automatically quantify NAPSI scores with high agreement with human annotations (Pearson correlation of 90%) [49]. Amruthalingam et al. quantified pustular psoriasis efflorescences using a DLM to objectively evaluate disease activity [50]. A very high agreement was reached between the model’s predictions and expert labelling using a test set (intraclass correlation coefficients [ICC]: 0.97 for count and 0.93 for surface percentage) [50]. Reliability was confirmed by application to an unstandardized test set with multiple pustular disorders (Spearman correlation [SC] coefficients compared with dermatologist evaluation: 0.66 for count and 0.80 for surface percentage) [50].
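Agreement statistics such as the Pearson and Spearman coefficients cited above can be computed as in the following sketch. The paired values are hypothetical, standing in for model predictions and expert annotations, and the ICC is not covered because it depends on the chosen model variant.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values: list) -> list:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: list, y: list) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical pustule counts per image: model prediction vs. expert label.
model_counts = [2, 5, 9, 14, 30]
expert_counts = [3, 4, 10, 12, 40]
print(round(pearson(model_counts, expert_counts), 2))
print(round(spearman(model_counts, expert_counts), 2))
```

Because Spearman's coefficient depends only on rank order, it is less sensitive than Pearson's to a few images with very high counts.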

While an automated severity score of plaque psoriasis would certainly meet the most common demand, we believe that it is important to continue a parallel investigation of AI applications in these rarer subtypes. If the accuracy and reliability of such algorithms continue to improve and even surpass human performance in future studies, we predict that semi- to fully automated severity scoring will soon serve as the gold standard in centers where respective technologies are available and for clinical trial assessments. By offering the advantages of consistency, objectivity, efficiency, precision, and scalability, AI could potentially overcome the limitations of current clinical assessment scores.

3.4 Treatment Selection and Response

Predicting treatment response and personalizing drug selection have great potential to improve the quality of life of psoriasis patients and optimize long-term outcomes. Currently, clinical treatment strategy is based on disease severity, subtype, location, presence of psoriatic arthritis and other co-morbidities, as well as patient preference and satisfaction [8].

Several AI applications have been developed that attempt to identify potential biomarkers and predict individual short- and long-term response to biologics [1, 51]. For example, the quantification of systemic inflammatory proteins measured before and four weeks after initiation of systemic treatment with tofacitinib and etanercept was used to develop an ML model that accurately predicted long-term response [52]. Unsupervised cluster analysis has been used to categorize psoriasis patients into three subgroups based on their lesional and non-lesional skin transcriptome to predict treatment effects of methotrexate and various biologicals using an ML algorithm [53].

Since AI has the capacity to analyze extensive datasets including patient records, clinical photographs, and molecular characteristics, personalized treatment plans may very well be our near future as new patterns continue to be discovered. ML approaches have already been used to show which patients with psoriatic arthritis would benefit from a higher starting dose of secukinumab [54]. We anticipate that image-based AI will also play a central role in the development of automated treatment decision algorithms for psoriasis patients. By integrating imaging data with clinical and genetic information, AI models could identify optimal treatment regimens tailored to individual patient characteristics, improving therapeutic efficacy and reducing potential side effects. Features such as the clinical phenotype, lesion distribution, and severity could be extracted from photographs using CNNs to serve as input for such treatment recommendation models. In addition, potentially influential variables for treatment success, such as patient age, gender, ethnicity, comorbidities, co-medication, or previous treatments, as well as molecular profiles, could be considered to optimize treatment choice once further research has been conducted.
